Friday, June 30, 2006

Microsoft Office SharePoint Server 2007 Search and Search Ranking Algorithm Enhancements

One of the biggest benefits that SharePoint Server 2007 brings is its enhanced search engine. The enhancements include a customizable search results interface, customizable search scope definitions, Boolean and cross-site search, and Business Data Catalog (BDC) search.

One of the problems current SharePoint users must address is the ranking algorithm. Users want search not only to return as many documents as possible, but also to pick out the relevant documents and display them at the top of the result list. The ranking algorithm of the search engine shipped with SPS2003 is based on probabilistic relevance scoring, a technique developed in the 1970s and known as the Okapi algorithm. Its ranking measure is based on collection frequency, term frequency, document length, and term position.
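To make those ingredients concrete, here is a minimal sketch of the textbook Okapi BM25 formula in Python. It is not the SPS2003 implementation, which I have not seen; the function name `bm25_scores` and the parameter defaults `k1=1.2` and `b=0.75` are my own assumptions, and term position is left out because plain BM25 does not use it.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook BM25.

    Hypothetical helper for illustration only; not SharePoint's code.
    `documents` is a list of token lists, `query_terms` a list of tokens.
    """
    if not documents:
        return []

    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs

    # Collection frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            # Terms that are rare in the collection get a higher weight.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Documents longer than average are penalized through b.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores
```

Calling `bm25_scores(["search", "ranking"], docs)` returns one score per document, and sorting the documents by that score in descending order gives the result list. A document that contains none of the query terms scores zero.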

For the new version, SharePoint Server 2007, the claim is that the ranking algorithm will be at least as good as any best-of-breed enterprise search engine on the market today. According to the Beta 2 evaluation guide, ranking will be measured by:
  • Click distance
  • Hyperlink anchor text
  • URL surf depth
  • URL text matching
  • Automated metadata extraction
  • Automatic language detection
  • File type relevancy biasing
  • Enhanced text analysis
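
The first and third items are the easiest to picture in code. The sketch below is only my reading of the terms, not anything taken from the evaluation guide: I assume click distance means the smallest number of link clicks needed to reach a page from a set of authoritative pages, and URL depth means the number of path segments in the URL. The function names `click_distances` and `url_depth` and the sample host `portal` are hypothetical.

```python
from collections import deque
from urllib.parse import urlparse


def click_distances(links, authoritative_pages):
    """Breadth-first search over the link graph.

    `links` maps a page to the pages it links to. Returns the minimum
    number of clicks from any authoritative page to each reachable page.
    """
    distances = {page: 0 for page in authoritative_pages}
    queue = deque(authoritative_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distances:
                # First visit in BFS order is the shortest path.
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances


def url_depth(url):
    """Count the non-empty path segments of a URL."""
    return len([part for part in urlparse(url).path.split("/") if part])
```

Under these assumptions, a ranker would favor pages with a small click distance and a shallow URL, on the idea that important content tends to sit close to the top of a site. How the real engine weighs these signals against the others is not something the list tells us.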

But I still need to see reports that compare it with SPS2003 and with other search products such as Ultraseek, Google Mini …
