Sharepoint Forum

PDF iFilter

  Asked By: Lucas    Date: May 05    Category: Sharepoint    Views: 1682

In my current environment, I have 2 FEs, 1 Index Server and a dedicated SQL
Server. I've installed the PDF iFilter on my Index Server and did a full crawl,
but found that it's not searching the content of the PDF files. Searching on
the filename and metadata fields works, but searching on the content does not.
Does this have anything to do with whether the PDF file is searchable or not?



10 Answers Found

Answer #1    Answered By: Alexandra Lewis     Answered On: May 05

There are a couple of things that could be causing the issue.

1. Are the PDFs from a scanned source or created by printing from Acrobat
or a word processing application? If they are scanned, then the PDFs are image
files and the contents can't be indexed without an OCR engine to translate the
scanned image of text into actual text.

2. Did you add PDF as one of the extensions to index in MOSS? Installing
the PDF iFilter makes it possible to index PDF text files, but you still need to
tell MOSS to index the files.

Answer #2    Answered By: Himanta Barthakur     Answered On: May 05

For #1, can the content of files that are saved from MS Word 2007 as PDF be
indexed and searched? For #2, yes, I did add the file extension.

Answer #3    Answered By: Mansi Revenkar     Answered On: May 05

Yes, any time you print as PDF -- assuming you're printing from a text-based
format -- the resulting document should be indexable.

Answer #4    Answered By: Lizette Mcconnell     Answered On: May 05

One easy way to be sure is to open the PDF in Acrobat Reader and try to
do a Find on the text in question. If Acrobat can't find it, then the
iFilter won't be able to index it.
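
If you have a lot of PDFs and opening each one in Reader isn't practical, the same check can be roughly scripted. The following is only a minimal sketch, not how the iFilter itself works: it assumes the PDFs are unencrypted and that their page content streams are either uncompressed or Flate-compressed, and it simply looks for text-showing operators (Tj/TJ) outside of image streams. The function name `looks_text_based` is made up for illustration.

```python
import re
import zlib

# Matches "<< dictionary >> stream ... endstream" without crossing "endobj".
STREAM_RE = re.compile(
    rb"<<((?:(?!endobj).)*?)>>\s*stream\r?\n(.*?)endstream", re.DOTALL
)
# A text-showing operator: a string or array operand followed by Tj or TJ.
SHOW_TEXT_RE = re.compile(rb"[)\]>]\s*T[jJ]\b")


def looks_text_based(pdf_bytes: bytes) -> bool:
    """Heuristic (assumption, not a full PDF parser): return True if any
    non-image stream contains a text-showing operator."""
    for match in STREAM_RE.finditer(pdf_bytes):
        dictionary, data = match.group(1), match.group(2)
        if b"/Image" in dictionary:
            continue  # scanned page images carry no text operators
        if b"FlateDecode" in dictionary:
            try:
                # decompressobj tolerates the newline before "endstream"
                data = zlib.decompressobj().decompress(data)
            except zlib.error:
                continue  # not plain Flate data; ignored in this sketch
        if b"BT" in data and SHOW_TEXT_RE.search(data):
            return True
    return False
```

To try it on a file, read the bytes with `open(path, "rb").read()` and pass them in. A False result suggests an image-only scan that would need OCR first; a True result only says there is some text layer, not that the iFilter is configured correctly. PDFs using other stream filters or encryption would need a real PDF library.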

Answer #5    Answered By: Rosanna Parrish     Answered On: May 05

I did what you recommended. I was able to search for a word within Acrobat
Reader, but when the file is in SharePoint, the content is not indexed.

Any recommendations?

Answer #6    Answered By: Kalash Karmakar     Answered On: May 05

1) Make sure you have followed the full setup process (I saw
someone else send a link to that)

2) Try uploading a new copy of the PDF before indexing. At times
I've seen documents created before the changes in #1 not get picked up
in crawls.

Answer #7    Answered By: Mauricio Tanner     Answered On: May 05

If the PDF is a scanned document with no OCR (text behind the image), then it's
essentially non-indexable.

You can check by just downloading a PDF from a web site like Microsoft's and
seeing if it indexes that.

Answer #8    Answered By: Gina Freeman     Answered On: May 05
Answer #9    Answered By: Pablo Igualada     Answered On: Jun 08

I had the same requirement. Scanned PDFs, unless you have OCR'd them, can't be searched within SharePoint with the standard Adobe IFilter.

We have developed a custom solution for this; you can look for it here:


Answer #10    Answered By: Buyi Wen     Answered On: Dec 03

I found a free online PDF OCR tool, http://www.online-code.net/pdf-to-word.html, that can recognize and extract text from PDF documents.
