Sharepoint Forum

Content Sources

  Asked By: Abhilash    Date: Mar 28    Category: Sharepoint    Views: 1069

I need assistance creating a content source. I'm trying to create a
content source pointing to another workspace on the same server, but
when I run the update I get "update failed". Here are the details:

9/17/2002 3:00:44 PM Error fetching URL, (80040e09 - Permission
de... ide.portal-dynamics.net/syntroleumactd/documents/

9/17/2002 3:00:44 PM
Document Added
Error fetching URL, (80040e09 - Permission denied. )
http://fqdn/workspace/documents/
For more information about this message, click Here.
I have the administrator as the content access account as well as the
propagation access account. What am I missing?
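
As a side note on reading the error code itself, here is a minimal sketch that splits 80040e09 into the standard HRESULT fields (failure bit, facility, code). The function name split_hresult is made up for illustration. The breakdown only shows that the code is an interface-defined failure (facility 4, FACILITY_ITF); it does not say which account was refused.

```python
def split_hresult(value):
    """Split a 32-bit HRESULT into (is_failure, facility, code)."""
    is_failure = bool((value >> 31) & 0x1)   # top bit: 1 means failure
    facility = (value >> 16) & 0x7FF         # 11-bit facility field
    code = value & 0xFFFF                    # low 16 bits: the error code
    return is_failure, facility, code


if __name__ == "__main__":
    failed, facility, code = split_hresult(0x80040E09)
    print(failed, facility, hex(code))
```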



7 Answers Found

Answer #1    Answered By: Aastha Acharya     Answered On: Mar 28

I have a question about content sources. I set up a content source to
crawl a web site, i.e., www.cnn.com. Is the content source indexing
everything on the web site or just the name of the content source? When
I search for an item on the web site, I don't get any results. But
when I search for CNN, I get a result.

Answer #2    Answered By: Glenda Roth     Answered On: Mar 28

You can check your logs for any errors in crawling the site. If you're
really crawling CNN, then you're probably getting some info in the logs
where it's skipping the site because there's a robots.txt file (or a "no
robots" meta tag in one of the top pages). By default, SharePoint
plays nicely as a crawler and respects the destination's wishes.
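
To see what "respecting robots.txt" means in practice, here is a small sketch using Python's standard urllib.robotparser. It is not how SharePoint does it internally; the robots.txt content, the crawler name "ExampleCrawler" and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/report.html"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleCrawler", url))
```

A polite crawler runs a check like this before every fetch and logs the URL as excluded when the answer is False.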

When you created the content source, did you specify "this folder and
all subfolders" or just "this folder"?

Also, depending on what you find in the logs, you may need to tweak the
hops/depths, and don't forget about site path rules. Site path rules are
set by going to http://server/workspace/Management/Content%20Sources,
then double-clicking on Additional Settings. Under Rules there will be
"Site Paths".

For example, this is where the SHADOW directory is excluded. The SHADOW
directory contains all of the unpublished versions of documents (1.1,
1.2, 1.3, etc). It is by using site paths that SharePoint only indexes
published versions (1.0, 2.0, 3.0, etc). It is also by using site paths
that you can specify which NT account or credentials to use per content
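
The include/exclude idea behind site path rules can be sketched roughly like this. This is only an illustration, not SharePoint's actual rule engine: the rule list, the first-match-wins ordering, the default of excluding unmatched URLs and the http://server/workspace/SHADOW/ path are all assumptions.

```python
from fnmatch import fnmatchcase

# Hypothetical rules as (pattern, include) pairs; the first match wins in this sketch.
RULES = [
    ("http://server/workspace/SHADOW/*", False),  # skip unpublished versions
    ("http://server/workspace/*", True),          # index the rest of the workspace
]


def is_included(url, rules, default=False):
    """Return the include flag of the first rule whose pattern matches url."""
    for pattern, include in rules:
        if fnmatchcase(url, pattern):
            return include
    return default


if __name__ == "__main__":
    for url in ("http://server/workspace/documents/plan.doc",
                "http://server/workspace/SHADOW/plan.doc",
                "http://other/site/page.htm"):
        print(url, is_included(url, RULES))
```

Putting the narrower exclude rule before the broader include rule is what keeps the SHADOW folder out in this sketch.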

Answer #3    Answered By: Jada Clemons     Answered On: Mar 28

I selected "This site - follow links to all pages on this site".

Answer #4    Answered By: Brooke Lewis     Answered On: Mar 28

And what about the logs? And the stats for the content source? For
example, go to what I listed below, then properties of the content
source. The 'General' tab should have something like

[name of the share]

Type: File Share Content Source
Address: \\server\share

Status: Processing notifications

Created: Wednesday, April 24, 2002, 9:11:12 PM

Last Built: Using notifications

Indexed: 8,721 Items

Not Found: 84 Items
Access Denied: 0 Items

Other Errors: 26 Items

Excluded: 17,463 Items

Aha, now you can see that there are errors. So then go to Windows
Explorer, browse to content sources, enable web content in Windows
Explorer (Tools->Options), then you'll be able to see the logs. There
will be a link saying "Show detailed logs..."

You can sort and filter by type of error, warning, whatever.
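
If you copy the General tab text out, the counters can be tallied with a few lines of script. This is just a sketch that assumes the "Name: N Items" layout quoted above; parse_counters and the sample text are my own, not part of SharePoint.

```python
import re

# Sample text in the "Name: N Items" layout quoted from the General tab.
SAMPLE = """\
Indexed: 8,721 Items
Not Found: 84 Items
Access Denied: 0 Items
Other Errors: 26 Items
Excluded: 17,463 Items
"""

COUNTER_LINE = re.compile(r"^\s*(?P<name>[^:]+):\s*(?P<count>[\d,]+)\s+Items\s*$")


def parse_counters(text):
    """Return {counter name: integer count} for every 'Name: N Items' line."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            count = int(match.group("count").replace(",", ""))
            counters[match.group("name").strip()] = count
    return counters


if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    failed = sum(counters.get(k, 0)
                 for k in ("Not Found", "Access Denied", "Other Errors"))
    print(counters)
    print("items with errors:", failed)
```

Lines that do not end in a count followed by "Items" (Type, Address, Status and so on) are simply ignored.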

Most of my errors are for files and directories with "@" in the name.

Answer #5    Answered By: Talia Johns     Answered On: Mar 28

Under site paths, I entered http://www.cnn.com/*. I checked the box
to include this path and enabled complex links.

After running a full update on this content source, this is what I get:

Type: Web Site Content Source
Address: http://www.cnn.com
Status: Idle
Last Built: 4/4/2003 11:09:14 AM
Health: Red line with 0%
Indexed: 0 Items
Not Found: 0 Items
Access Denied: 0 Items
Other Errors: 0 Items
Excluded by Rules: 0 Items

And when I clicked on Detailed Log, I don't see any errors or warnings.

Answer #6    Answered By: Tera Callahan     Answered On: Mar 28

I finally figured it out. That one server was so tightly locked down by
my sys. adm. that it didn't allow me to do what I was trying to set up
in SPS. I tried the content source on three other servers and they work

Answer #7    Answered By: Mark Davis     Answered On: Mar 28

Truly interesting. It sure does look like the update hasn't run.

I just created a new one on a test workspace, pointed to
http://www.cnn.com (no site paths yet) and with a site hop max of 0 and
a depth of 1. It immediately indexed 90 items and excluded 13,000.
Took about 30 seconds.

A lot of the reasons are "URL is excluded by the server (robots.txt,
no-index attribute on the URL, encrypted file, or a search folder), or
redirected to an excluded URL". But of course it excluded 13,000 because
I set the depth restriction. No reason to get CNN mad at *me* :)
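
To show why a depth of 1 with 0 site hops indexes so little and excludes so much, here is a toy breadth-first crawl over an in-memory link graph. It is only a sketch of the idea, not SharePoint's crawler: the link graph, the a.example/b.example hosts and my reading of "depth" (link distance from the start page) and "site hops" (number of times the crawl crosses to another host) are assumptions.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical in-memory link graph standing in for real HTTP fetches.
LINKS = {
    "http://a.example/": ["http://a.example/news", "http://b.example/"],
    "http://a.example/news": ["http://a.example/news/story1"],
    "http://b.example/": ["http://b.example/about"],
}


def crawl(start, links, max_depth, max_site_hops):
    """Breadth-first walk returning (indexed, excluded) URL lists."""
    indexed, excluded = [], []
    seen = {start}
    queue = deque([(start, 0, 0)])  # (url, depth, site hops so far)
    while queue:
        url, depth, hops = queue.popleft()
        indexed.append(url)
        for target in links.get(url, []):
            if target in seen:
                continue
            seen.add(target)
            crossed = urlsplit(target).netloc != urlsplit(url).netloc
            next_hops = hops + int(crossed)
            if depth + 1 > max_depth or next_hops > max_site_hops:
                excluded.append(target)  # over a limit: record it, do not follow
            else:
                queue.append((target, depth + 1, next_hops))
    return indexed, excluded


if __name__ == "__main__":
    indexed, excluded = crawl("http://a.example/", LINKS,
                              max_depth=1, max_site_hops=0)
    print("indexed:", indexed)
    print("excluded:", excluded)
```

With these limits only the start page and its same-host links get indexed; everything deeper or on another host lands in the excluded list, which is the same pattern as the small indexed count and large excluded count described above.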

I added a sitepath of http://www.cnn.com/* and enabled complex links and
it did the same thing -- took about 40 seconds, excluded 13,000 items
and indexed 91 items.

Have you tried crawling a different web site? Do your other content
sources show errors, indexed items and exclusions? What does your event
log show during the time of the crawl?
