Sharepoint Forum


Portal Search Not Working for PDF Image Files

  Asked By: Travon    Date: Mar 26    Category: Sharepoint    Views: 1984

I have installed Adobe's IFilter 6.0, which allows WSS to search PDF
documents. That works fine.

I have run into a problem because several of our SharePoint users are scanning
paper documents. These paper documents are converted to PDF and then
uploaded to SharePoint. Because they are scanned, each page is stored in the
PDF as an image, so Search cannot index their content.



10 Answers Found

Answer #1    Answered By: Laura Walker     Answered On: Mar 26

Scanned PDFs need to be processed with Optical Character
Recognition (OCR) software at some stage in the process. This can be
done during the scan (some scanners/copiers support this) or after
the scan. If you have to do it after the scan, we've found that Adobe
Acrobat Professional works very well (Document -> Recognize Text using
OCR). There are also more elaborate solutions that can automate this for
you. OCR accuracy will vary with the software, the quality of the scan,
whether the page is aligned exactly horizontally, the font used, the
colour of the original paper, and the resolution of the scan.
Once the document has been processed with the OCR software and saved,
both the image and the recognized text are stored in the PDF. The image is
what is displayed when the document is opened in Acrobat Reader, and
the text is stored as an invisible layer behind it. It is this
text layer that the IFilter indexes.
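As a rough way to find which uploaded PDFs still need OCR, the Python sketch below checks whether a file contains any text-drawing operators (`Tj`/`TJ`), which is what an OCR text layer adds. It is a heuristic under stated assumptions, not a PDF parser: it only inflates FlateDecode streams, so files that use object streams, encryption, or other filters may be misreported as image-only. The names `has_text_layer` and `_decoded` are hypothetical.

```python
import re
import sys
import zlib

# Heuristic: grab the raw bytes between "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# Tj / TJ are the PDF operators that paint text (OCR's invisible text included).
TEXT_OP_RE = re.compile(rb"\bT[jJ]\b")


def _decoded(body):
    """Inflate a FlateDecode stream; return any other stream unchanged."""
    try:
        # decompressobj tolerates a truncated checksum or trailing bytes.
        return zlib.decompressobj().decompress(body)
    except zlib.error:
        return body


def has_text_layer(pdf_bytes):
    """Return True if any stream in the file appears to draw text."""
    return any(TEXT_OP_RE.search(_decoded(m.group(1)))
               for m in STREAM_RE.finditer(pdf_bytes))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            found = has_text_layer(f.read())
        print(f"{path}: {'has text' if found else 'image only? needs OCR'}")
```

Run it against a folder of scans (for example `python check_pdfs.py *.pdf`, where the script name is arbitrary) and send anything reported as image-only through OCR before uploading it.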

Answer #2    Answered By: Nina Banks     Answered On: Mar 26

Well, I have another problem.
I am using an e-mail-enabled document library that receives PDF files from a
public folder as attachments.
But the content of these PDF files is not found by search on the WSS site, even
though it is pure PDF content, not image format.
I have even indexed the portal content.
Are there any other settings this search requires?

Answer #3    Answered By: Sharonda Mcfarland     Answered On: Mar 26

Just wanted to check whether you had installed the PDF IFilter
on the SQL Server as well. If you haven't, you may need to. This will
enable the SQL Server full-text search that WSS uses to index the PDFs.

Answer #4    Answered By: Kalyan Pujari     Answered On: Mar 26

My portal and SQL Server are on the same machine. I installed PDF
IFilter 6.0, and I am using Adobe 7.0 and SQL Server 2005.
SQL Server 2005 needs one extra setting, which is applied by
executing the following script:

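-- Let the full-text engine load word breakers and filters (IFilters) registered with the OS
exec sp_fulltext_service 'load_os_resources', 1;

-- Don't require the loaded filter binaries to be signed
exec sp_fulltext_service 'verify_signature', 0;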

I have run this on the server.

It is still not searching. What could the issue be?
Is this related to the public folder document library issue?
Any idea?

Answer #5    Answered By: Christop Mcfadden     Answered On: Mar 26

You'll also need to add PDF as a valid file type in the content index.
Go to

http://<SERVER:PORT>/ssp/admin/_layouts/managefiletypes.aspx and
select New File Type.

That should trigger SharePoint Search to begin indexing documents
with a PDF extension.

I can't remember how to get the little PDF icon to show up on that
page, but it can be done.
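For the icon: on WSS 3.0/MOSS 2007, document icons are commonly mapped by extension in DOCICON.XML (TEMPLATE\XML under the 12 hive), through a `<Mapping Key="pdf" Value="..."/>` entry inside `<ByExtension>`, with the image itself copied to TEMPLATE\IMAGES. The Python sketch below adds or updates such an entry. The function name and `pdficon.gif` are placeholders; ElementTree drops XML comments when it rewrites the file, so back up DOCICON.XML first, and an `iisreset` is usually needed before the icon appears.

```python
import xml.etree.ElementTree as ET


def add_icon_mapping(docicon_xml, extension, icon_file):
    """Return DOCICON.XML content with `extension` mapped to `icon_file`.

    Updates the existing <Mapping> for the extension if there is one
    (case-insensitive match on Key); otherwise appends a new one.
    """
    root = ET.fromstring(docicon_xml)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> element in DOCICON.XML")
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == extension.lower():
            mapping.set("Value", icon_file)  # replace the existing icon
            break
    else:
        ET.SubElement(by_ext, "Mapping", Key=extension, Value=icon_file)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = ('<DocIcons><ByExtension>'
              '<Mapping Key="doc" Value="icdoc.gif"/>'
              '</ByExtension></DocIcons>')
    print(add_icon_mapping(sample, "pdf", "pdficon.gif"))
```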

Answer #6    Answered By: Gopal Jamakhandi     Answered On: Mar 26

I have done all these steps and search is working now for some words,
but sometimes it does not find certain words.

Why does this happen in PDF search?
Any ideas?

Answer #7    Answered By: Chantal Rosa     Answered On: Mar 26

You'll often find words misspelled due to inaccuracies in the OCR
process. To test:

- Open the PDF in Acrobat Reader
- Select the text stored in the background with <CTRL>+A and copy it
- Paste the text into a text editor and compare the spelling of the
word in question
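Once the background text is pasted into a file, the comparison can be automated. The Python sketch below, assuming the text is already in a string, uses the standard `difflib.get_close_matches` to list words that look like OCR misspellings of the term a user searched for. The function name and the 0.75 similarity cutoff are illustrative choices, not SharePoint settings.

```python
import difflib
import re


def likely_ocr_variants(extracted_text, term, cutoff=0.75):
    """Return words from the OCR'd text that look like misspellings of `term`.

    Returns an empty list when `term` itself occurs in the text, because
    search should already find it in that case.
    """
    words = set(re.findall(r"[a-z0-9]+", extracted_text.lower()))
    term = term.lower()
    if term in words:
        return []
    return difflib.get_close_matches(term, words, n=5, cutoff=cutoff)


if __name__ == "__main__":
    text = "Please pay this lnvoice by Friday."  # 'l' misread for 'I'
    print(likely_ocr_variants(text, "invoice"))  # ['lnvoice']
```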

We've looked at ways of improving OCR accuracy for scanned documents,
with mixed results. The only recommendation I can offer is to test
different scanners, resolutions, scanning methods, and OCR software.
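To compare scanners, resolutions, and OCR packages on equal terms, it helps to score each run against a hand-typed transcription of the same page. The Python sketch below uses `difflib.SequenceMatcher` over word lists to compute the fraction of reference words that the OCR output reproduces in order. This is a simplified metric chosen for illustration, not the standard word error rate, and `word_accuracy` is a hypothetical name.

```python
import difflib


def word_accuracy(reference_text, ocr_text):
    """Fraction of reference words that the OCR output reproduces, in order."""
    ref = reference_text.lower().split()
    hyp = ocr_text.lower().split()
    if not ref:
        return 1.0
    matcher = difflib.SequenceMatcher(None, ref, hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox"
    print(word_accuracy(reference, "the qu1ck brown fox"))  # 0.75
```

Scoring the same page once per scanner setting or OCR package makes it easy to keep whichever combination scores highest.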

The way to eliminate this problem, by the way, is to generate the PDF
from within a document editor such as Word. The document editor
generates the background text without going through OCR, so it is 100%
accurate.
Other things to watch for:

- SharePoint Search 'word stemming' (search for swim, hit on swam)
- SharePoint Search 'noise word' list (search for 'go the distance',
hit on 'go for distance').
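Both effects can surprise users, so here is a toy Python model of them. It assumes a hypothetical noise-word list and a hypothetical table of word forms standing in for SharePoint's real, language-specific noise lists and stemmer: noise words are dropped from both the query and the text, and a query term also matches its other forms. It only illustrates why the hits above happen; it is not how SharePoint implements search.

```python
# Hypothetical noise-word list; SharePoint uses its own language-specific list.
NOISE_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}

# Hypothetical table of word forms standing in for SharePoint's stemmer.
WORD_FORMS = {"swim": {"swim", "swims", "swam", "swum", "swimming"}}


def significant_terms(phrase):
    """Lower-case the phrase and drop noise words (toy model)."""
    return [w for w in phrase.lower().split() if w not in NOISE_WORDS]


def forms_of(term):
    """All word forms that count as a hit for `term` (just itself if unknown)."""
    return WORD_FORMS.get(term, {term})


def phrase_matches(query, text):
    """True if the query's significant terms appear consecutively in the text."""
    q = significant_terms(query)
    t = significant_terms(text)
    for start in range(len(t) - len(q) + 1):
        if all(t[start + i] in forms_of(word) for i, word in enumerate(q)):
            return True
    return False


if __name__ == "__main__":
    print(phrase_matches("go the distance", "go for distance"))  # True
    print(phrase_matches("swim", "she swam the channel"))  # True
```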

Answer #8    Answered By: Kyla Eckert     Answered On: Mar 26

Now my search is working fine.

Answer #9    Answered By: Alisha Holmes     Answered On: Mar 26

To change the file icon, I followed these steps:

Answer #10    Answered By: Buyi Wen     Answered On: Dec 03

You can try this free online OCR tool, http://www.online-code.net/ocr.html, to extract text from images and scanned PDF documents.
