MOSS Forum


Indexing PDFs

  Asked By: Tameka    Date: Sep 02    Category: MOSS    Views: 957

Has anyone found a solution for searching non-OCR PDF files? Any recommendations
for an application to convert them? We are about to set up sites for contracts,
a large number of which are scanned. Any help would be appreciated.



11 Answers Found

Answer #1    Answered By: Upendra Bordoloi     Answered On: Sep 02

When you say "non-OCR", do you mean they're just scanned as images? You
aren't going to be able to index those based on content; any searchable
attributes will have to be added as metadata.

Answer #2    Answered By: Ali Javed     Answered On: Sep 02

What are non-OCR PDFs?

Answer #3    Answered By: Karrie Wooten     Answered On: Sep 02

Did you scan the documents and save them as images (JPG, GIF, BMP, etc.), or
did you convert and save them in PDF format?

Answer #4    Answered By: Alan West     Answered On: Sep 02

I do remember having to specifically allow files with the .pdf extension, and
there was another setting that needed to be set too. I can't remember off the
top of my head what it is, though. But I'm guessing what you need is not that
simple.

Answer #5    Answered By: Maribel Todd     Answered On: Sep 02

I assume he means PDFs that are pictures, not text.

Answer #6    Answered By: Akshara Negalur     Answered On: Sep 02

Yes, the documents are received as hard copy and scanned in, which creates a PDF
that is in essence an image. The fix can be as simple as Acrobat, which has an
option to convert the document by adding a layer of text that it reads from the
page. This is easy. Acrobat can also convert folders, meaning multiple documents
at once. Another option for volume is adding this capability to the scanner
(Xerox has an option) so that the document is made OCR compatible when it is
scanned. This is more expensive, but worth it for large volumes. In our case, we
have a library of documents that we would like in SP for historical purposes.
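
For anyone who wants to script the folder conversion instead of doing it in Acrobat, here is a rough sketch. Note that it swaps in a different tool: it assumes the open-source OCRmyPDF command-line program (`ocrmypdf`) is installed, which is not what the post above describes. The folder names and the helper names `build_ocr_commands` and `run_all` are made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ocr_commands(src_dir: str, dst_dir: str) -> List[List[str]]:
    """Build one ocrmypdf command per PDF found directly in src_dir.

    Assumption: OCRmyPDF is the OCR tool (not Acrobat). The --skip-text
    flag tells it to leave pages that already contain text untouched.
    """
    commands = []
    # Note: this glob is case-sensitive on most non-Windows systems.
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        target = Path(dst_dir) / pdf.name
        commands.append(["ocrmypdf", "--skip-text", str(pdf), str(target)])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Something like `run_all(build_ocr_commands("scans", "searchable"))` would then write searchable copies into a separate folder, leaving the original scans alone. The output folder has to exist first.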

Answer #7    Answered By: Timothy Davis     Answered On: Sep 02

If you are saving the files as images, that is, you can't highlight words or
sentences, then you are going to have to use metadata in order to search.
Metadata is much less effective and efficient.

Why is the OCR saving the files as text-based PDFs?
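
If there are too many files to open and try highlighting text by hand, a crude check can be scripted. The sketch below is only a heuristic: it assumes a PDF with a text layer will have a `/Font` entry visible in its raw bytes. PDFs that keep their objects in compressed streams can fool it, so treat the result as a hint. The function names are made up for illustration.

```python
def looks_image_only(pdf_bytes: bytes) -> bool:
    """Return True if no /Font token appears in the raw PDF bytes.

    Assumption: no font resources means no text layer. Fonts declared
    inside compressed object streams will not be seen by this scan.
    """
    return b"/Font" not in pdf_bytes


def looks_image_only_file(path: str) -> bool:
    """Apply the same heuristic to a file on disk."""
    with open(path, "rb") as handle:
        return looks_image_only(handle.read())
```

Files flagged by this check are the ones worth running through OCR or tagging with metadata by hand.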

Answer #8    Answered By: Adya Deshmane     Answered On: Sep 02

I understand that sometimes you need an image for archival purposes. There
are third-party scanning utilities that can use OCR to scan key portions of
a page and use the resulting data to fill metadata fields on the document
when it's checked in.
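
To give an idea of what such a utility does once it has OCR text for a page, here is a minimal sketch. The field names (`ContractNumber`, `EffectiveDate`) and the patterns are hypothetical; real contracts would need patterns matched to their own layout, and pushing the values into SharePoint columns is not shown.

```python
import re
from typing import Dict, Optional

# Hypothetical fields and patterns, for illustration only.
FIELD_PATTERNS = {
    "ContractNumber": re.compile(
        r"Contract\s+(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "EffectiveDate": re.compile(
        r"Effective\s+Date\s*:?\s*(\d{1,2}/\d{1,2}/\d{4})", re.IGNORECASE
    ),
}


def extract_metadata(ocr_text: str) -> Dict[str, Optional[str]]:
    """Pull the first match for each field out of OCR text.

    Fields that are not found are returned as None so that a person
    can fill them in at check-in.
    """
    result: Dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        result[field] = match.group(1) if match else None
    return result
```

OCR output is often noisy, so any values extracted this way should probably be reviewed before they are relied on for search.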

Answer #9    Answered By: Siobhan Waller     Answered On: Sep 02

Perhaps these articles may help:

Answer #10    Answered By: Lorenzo Steele     Answered On: Sep 02

Yes, that IFilter looks familiar. I believe I had to download something from
Adobe and add .pdf to the allowed file types, and I was good to go.

Answer #11    Answered By: Buyi Wen     Answered On: Dec 03

I found a free online OCR tool, http://www.online-code.net/ocr.html, that supports 40+ languages and can convert images to plain text files and searchable PDF documents.
