This blog post is Part 2/4 of blog post series that will help you to get familiar with few concepts and terminologies referred in any search technology. As this blog series is more focused towards ‘Fast Search for SharePoint’ you may see jargon relevant to this.
The following graphic gives ten thousand foot view of what I am trying to capture and explain in this blog post series.
5. Recall and Precision
The total number of results in the result set for a query. You have to find a fair balance between Recall and Precision. The results set should not be too large and should avoid noise as much as possible. If the Recall is too large or too small it will hamper Precision.
There are various techniques to improve Recall such as Synonyms, Stemming etc.
This is pretty common technique and a very obvious one. For instance, if you search for “happy” the search would also query for “joy”, “elated”, “merry” etc.
The use of Stemming is to get to the root form of a word. Stemming compares the root forms of the search terms to the documents in its content sources. For example, if the user enters “viewer” as the query, the search engine searches for “view” and returns all documents with view, viewer, viewing, preview, review etc
Corpus is Latin term for body. In the Search world, it refers to the scope of all the content sources the crawler would crawl and indexes. Following gives an example of what Corpus can include.