Sharepoint Forum

Thesaurus and how it works

  Asked By: Viral    Date: Apr 19    Category: Sharepoint    Views: 1495

Does anyone know how to add elements to the thesaurus? I found the
file, but I am not sure which keywords to use. What are the different
data definitions, tags, etc., and how do I use them? Is there any
documentation out there?



5 Answers Found

Answer #1    Answered By: Quentin Cummings     Answered On: Apr 19

This is something that I'm interested in using inside SPS and have done some
work on.

Some answers to your questions are:

1. Not sure which keywords to use?

This is one of the strengths of the facility: it is meant to be flexible.
The idea is that you use a subject-specific thesaurus that depends on the
namespace you are in. So a computing firm would use a computing thesaurus,
a law firm a legal one, and so on. You are standardising the terminology
used within the namespace and, as a result, making information retrieval
with the SPS search engine much more

So the original file in SPS is intentionally a "blank" one into which you
substitute your own relevant namespace thesaurus.

As an example, have a look at part of a computing thesaurus (from the ACM)
that I prepared to test SPS. I included it for a Workspace against my own
personal document set (about 3k of docs focused on enterprise computing) to
improve precision and recall. (If you want, I can send you the whole set of
files that I used to set this up. Note that I haven't altered the sub
weighting in the files at all.)

<XML ID="Microsoft Search Thesaurus">
  <thesaurus>

    <expansion>
      <sub weight="1.0">PROGRAMMING TECHNIQUES (E)</sub>
      <sub weight="1.0">General</sub>
      <sub weight="1.0">Applicative (Functional) Programming</sub>
      <sub weight="1.0">Automatic Programming</sub>
      <sub weight="1.0">Concurrent Programming</sub>
      <sub weight="1.0">Sequential Programming</sub>
      <sub weight="1.0">Object-oriented Programming</sub>
      <sub weight="1.0">Logic Programming</sub>
      <sub weight="1.0">Visual Programming</sub>
      <sub weight="1.0">Miscellaneous</sub>
    </expansion>
    <expansion>
      <sub weight="1.0">Concurrent Programming</sub>
      <sub weight="1.0">Distributed programming</sub>
      <sub weight="1.0">Parallel programming</sub>
    </expansion>
    <expansion>
      <sub weight="1.0">SOFTWARE ENGINEERING</sub>
      <sub weight="1.0">General</sub>
      <sub weight="1.0">Requirements/Specifications</sub>
      <sub weight="1.0">Design Tools and Techniques (REVISED)</sub>
      <sub weight="1.0">Coding Tools and Techniques (REVISED)</sub>
      <sub weight="1.0">Software/Program Verification (F.3.1) (REVISED)</sub>
      <sub weight="1.0">Testing and Debugging</sub>
      <sub weight="1.0">Programming Environments</sub>
      <sub weight="1.0">Distribution, Maintenance, and Enhancement</sub>
      <sub weight="1.0">Metrics</sub>
      <sub weight="1.0">Management</sub>
      <sub weight="1.0">Design</sub>
      <sub weight="1.0">Software Architectures</sub>
      <sub weight="1.0">Interoperability</sub>
      <sub weight="1.0">Reusable Software</sub>
      <sub weight="1.0">Miscellaneous</sub>
    </expansion>

  </thesaurus>
</XML>

2. What are the different data definitions?

My understanding is the following:

The <expansion> and <sub> elements simply expand the query.

Whenever a user passes a query to MSSearch, the query analyser accesses the
thesaurus and does a look-up to see whether any of the words in the query
match any of the <sub> terms in the expansion elements shown above. If they
do, the query is expanded to include the other terms in the matching
expansion, and all of them are matched against the index.

You can see the power of this when you search for, say, "programming": the
query is massively expanded. Note that the <sub> elements in an expansion
increase recall (i.e. the number of documents retrieved measured against the
ideal document subset in any given document collection).
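
The look-up described above can be sketched roughly as follows. This is only an illustration of the idea, not how MSSearch is actually implemented: the function names load_expansions and expand_query are made up for this example, the weights are ignored, and the match is a simple case-insensitive comparison of the whole query against each term.

```python
import xml.etree.ElementTree as ET

# A small fragment in the same shape as the excerpt above (one expansion only).
THESAURUS_XML = """
<thesaurus>
  <expansion>
    <sub weight="1.0">Concurrent Programming</sub>
    <sub weight="1.0">Distributed programming</sub>
    <sub weight="1.0">Parallel programming</sub>
  </expansion>
</thesaurus>
"""


def load_expansions(xml_text):
    """Return one list of lower-cased terms per <expansion> element."""
    root = ET.fromstring(xml_text.strip())
    groups = []
    for expansion in root.iter("expansion"):
        terms = [sub.text.strip().lower()
                 for sub in expansion.findall("sub") if sub.text]
        groups.append(terms)
    return groups


def expand_query(query, groups):
    """Hypothetical look-up: if the query matches a term in an expansion,
    search for every term in that expansion as well."""
    wanted = query.strip().lower()
    expanded = [wanted]
    for terms in groups:
        if wanted in terms:
            for term in terms:
                if term not in expanded:
                    expanded.append(term)
    return expanded


groups = load_expansions(THESAURUS_XML)
print(expand_query("Parallel programming", groups))
```

A query that matches nothing in the thesaurus comes back unchanged, which is why adding expansions can only widen a search (more recall), never narrow it.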

The <replacement> and <pat> (pattern) elements simply perform a synonym
substitution (with weighting) on the query.

This is illustrated by the following XML for the query "database":

<replacement>
  <pat>database</pat>

  <sub>Relational databases</sub>
  <sub weight="0.5">RDBMS</sub>
  <sub>Object-oriented databases</sub>
  <sub weight="0.5">OODBMS</sub>

  <sub weight="0.5">Distributed databases</sub>
  <sub weight="0.5">Multimedia databases</sub>

  <sub weight="0.25">Parallel databases</sub>
  <sub weight="0.25">Rule-based databases</sub>
  <sub weight="0.25">Textual databases</sub>
</replacement>


This simply means that the term "database", when entered by a user, gets
replaced in the query by the sub terms, with the relevant weightings. If
the thesaurus is set up carefully, this type of query improves "precision"
(i.e. the accuracy of the documents retrieved measured against the ideal
document subset within the document collection). A word of warning, however:
you have to be very careful in setting this particular approach up.

My advice would be to go for the query expansion rather than out-and-out

3. Is there any documentation out there?

To my knowledge there is nothing written on this - although there is
supposed to be a book coming out on MS Search from MS Press.

Answer #2    Answered By: Bhoomi Chabaria     Answered On: Apr 19

There's some documentation inside the SharePoint administrator's help file.
This file is installed on any PC where the SP client has been installed.
It's located at C:\winnt\help\pkmadmin.chm

Answer #3    Answered By: Richa Verma     Answered On: Apr 19

One other question: is the thesaurus used on the client side or the
server side?

Answer #4    Answered By: Corrine Potts     Answered On: Apr 19

Does anyone know how to dynamically change the title of a web part? I am
using VBScript.

Answer #5    Answered By: Kabeer Karkare     Answered On: Apr 19

Server side. You can add items to the thesaurus, and it will affect what is
returned when a search is done. So, for instance, if you add SPS as the pattern
and SharePoint Portal Server as the substitution, then when someone searches for
SPS, they will get results returned for SharePoint Portal Server.
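
As a rough sketch of that SPS example: the entry below uses the <replacement>, <pat> and <sub> elements mentioned in this thread, and the small script shows the substitution it is meant to produce. The entry itself and the function names load_replacements and rewrite_query are assumptions made for illustration, not taken from a real tseng.xml, and the matching logic is a simplification rather than the search engine's actual behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical thesaurus entry: pattern "SPS", substitution "SharePoint Portal Server".
ENTRY = """
<thesaurus>
  <replacement>
    <pat>SPS</pat>
    <sub>SharePoint Portal Server</sub>
  </replacement>
</thesaurus>
"""


def load_replacements(xml_text):
    """Map each lower-cased <pat> to the <sub> terms of its <replacement>."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for replacement in root.iter("replacement"):
        subs = [sub.text.strip()
                for sub in replacement.findall("sub") if sub.text]
        for pat in replacement.findall("pat"):
            if pat.text:
                table[pat.text.strip().lower()] = subs
    return table


def rewrite_query(query, table):
    """Return the substitutions if the query matches a pattern,
    otherwise return the query unchanged."""
    cleaned = query.strip()
    return table.get(cleaned.lower(), [cleaned])


table = load_replacements(ENTRY)
print(rewrite_query("sps", table))
print(rewrite_query("thesaurus", table))
```

Note that in this sketch the pattern itself is dropped from the search once it matches, which is the difference between a replacement and an expansion.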

Look here:
C:\SharePoint Portal Server\Data\FTData\SharePointPortalServer\Config\tseng.xml
(or whatever the appropriate language is for your install... that one is

You can make search replacements by making entries like this:

<sub>Windows 2000</sub>

See the "Administrators Help.chm" file on the CD for more info.
