Solr Search Relevancy

In this document, we'll introduce the basic concepts of how Lucene/Solr ranks documents, as well as how to tune the way Solr ranks and returns search results.

Lucene scoring model

To be adept at tuning search relevancy, it helps to understand the Lucene scoring algorithm, also known as the tf.idf model. This scoring model involves a number of scoring factors:

tf - Term Frequency. The frequency with which a term appears in a document. Given a search query, the higher the term frequency, the higher the document score.

idf - Inverse Document Frequency. The rarer a term is across all documents in the index, the higher its contribution to the score.

coord - Coordination Factor. The more query terms that are found in a document, the higher its score.

fieldNorm - Field length. The more words that a field contains, the lower its score. This factor penalizes documents with longer field values.

The exact scoring formula that brings these factors together can be found at http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/search/Similarity.html
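
To make the interplay of these factors concrete, here is a minimal Python sketch of how they could be combined. It assumes the default curves of Lucene 3.x's DefaultSimilarity as we understand them (square-root tf, logarithmic idf, inverse-square-root length norm), and it deliberately leaves out queryNorm, boosts and the lossy single-byte encoding of norms. The function and argument names are illustrative only and are not part of any Lucene or Solr API.

```python
import math

def tf(freq):
    # Term frequency: grows with the square root of the raw count.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Inverse document frequency: rarer terms get a larger value.
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

def coord(overlap, max_overlap):
    # Fraction of the query terms that the document actually contains.
    return overlap / max_overlap

def length_norm(num_terms):
    # Longer fields get a smaller norm.
    return 1.0 / math.sqrt(num_terms)

def score(query_terms, doc_term_freqs, field_length, doc_freqs, num_docs):
    # Simplified score: coord * sum(tf * idf^2 * lengthNorm) over matched terms.
    matched = [t for t in query_terms if doc_term_freqs.get(t, 0) > 0]
    if not matched:
        return 0.0
    norm = length_norm(field_length)
    total = sum(
        tf(doc_term_freqs[t]) * idf(doc_freqs[t], num_docs) ** 2 * norm
        for t in matched
    )
    return coord(len(matched), len(query_terms)) * total

# Hypothetical index statistics: 100 documents, "foo" in 10 of them, "bar" in 2.
doc_freqs = {"foo": 10, "bar": 2}
short_doc = score(["foo", "bar"], {"foo": 1, "bar": 1}, 4, doc_freqs, 100)
long_doc = score(["foo", "bar"], {"foo": 1, "bar": 1}, 100, doc_freqs, 100)
partial_doc = score(["foo", "bar"], {"foo": 1}, 4, doc_freqs, 100)
print(short_doc, long_doc, partial_doc)
```

In this sketch the short document that matches both terms scores highest, the long document is penalized by its length norm, and the document that matches only one term is penalized by both the missing term and coord.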

Lucene scores cannot be meaningfully compared across queries, or even for the same query run against a different index. Scores are also not normalized to a fixed range: a raw score can exceed 1.0, so it should not be read as a percentage.

Boosts

Beyond the scoring factors mentioned above, the primary way to modify document scores is boosting.

There are two kinds of boosts: index-time and query-time.

Index-time boosts are applied when adding documents, and apply to the entire document or to specific fields.

Query-time boosts are applied when constructing a search query, and apply to specific fields.
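
As an illustration of an index-time boost, the sketch below builds a Solr XML update message that carries a boost attribute on the document and on one field. It assumes the XML update format of Solr 3.x and assumes that the boosted field keeps norms (omitNorms="false"). The helper name build_add_message and the field names are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_add_message(doc_fields, doc_boost=None, field_boosts=None):
    # Build an <add><doc>...</doc></add> message with optional boost attributes.
    field_boosts = field_boosts or {}
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if doc_boost is not None:
        # Document-level boost: applies to every field of the document.
        doc.set("boost", str(doc_boost))
    for name, value in doc_fields.items():
        field = ET.SubElement(doc, "field", name=name)
        if name in field_boosts:
            # Field-level boost: applies to this field only.
            field.set("boost", str(field_boosts[name]))
        field.text = value
    return ET.tostring(add, encoding="unicode")

message = build_add_message(
    {"id": "1", "title": "foo bar"},  # hypothetical field names
    doc_boost=2.5,
    field_boosts={"title": 2.0},
)
print(message)
```

The resulting string would be posted to the update handler. Because the boost is stored with the document at index time, changing it means reindexing the document.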

Query boosts are applied by appending the caret character ^ followed by a positive number to query clauses.

title:foo OR (title:foo AND title:bar)^2.0 OR title:"foo bar"^10
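
The query above can also be assembled programmatically. The following sketch builds the same boosted query string and URL-encodes it into a request. The host, port and path are assumptions about a local default install, and boosted is a hypothetical helper, not a Solr client API.

```python
from urllib.parse import urlencode

def boosted(clause, boost):
    # Append a query-time boost to a clause; only positive boosts are accepted.
    if boost <= 0:
        raise ValueError("boost must be a positive number")
    return f"{clause}^{boost}"

clauses = [
    "title:foo",
    boosted("(title:foo AND title:bar)", 2.0),
    boosted('title:"foo bar"', 10),
]
q = " OR ".join(clauses)

# fl adds the score to each result; debugQuery asks Solr to explain the scoring.
params = urlencode({"q": q, "fl": "id,title,score", "debugQuery": "true"})
url = "http://localhost:8983/solr/select?" + params  # assumed local instance
print(url)
```

Requesting the score field and the debug output makes it easier to see how each boost contributes to the final ranking.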

Negative boosts

Whilst Lucene allows negative boosts, Solr does not.

The only way to meaningfully perform a negative boost is to apply a positive boost to a negative query. For example:

(*:* -title:foo)^2.0

This boosts all documents which don't have "foo" in the title by 2.0, thereby effectively applying a down boost to documents which do.
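
A small helper can generate such clauses. The sketch below produces the clause shown above and, as one possible use, places it in the bq (boost query) parameter of a dismax request. The helper name down_boost and the field names are hypothetical, and the choice of dismax with bq is an assumption about how the clause might be applied rather than something this document prescribes.

```python
from urllib.parse import urlencode

def down_boost(field, term, factor=2.0):
    # Boost every document that does NOT match field:term, which
    # effectively pushes matching documents down the ranking.
    if factor <= 0:
        raise ValueError("factor must be a positive number")
    return f"(*:* -{field}:{term})^{factor}"

clause = down_boost("title", "foo")

# Assumed usage: a dismax request searching for "bar" in the title field.
params = urlencode({
    "defType": "dismax",
    "q": "bar",
    "qf": "title",
    "bq": clause,
})
print(clause)
print(params)
```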

Function queries

Solr provides another way of boosting documents via function queries.

See http://wiki.apache.org/solr/FunctionQuery for a list of function queries and how to apply them to your query.
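
As an example of what a function query computes, the sketch below reimplements the recip function in Python, using the definition given on that wiki page (recip(x,m,a,b) = a / (m*x + b)), to show how a recency boost decays with document age. The field name created_at is hypothetical, and passing the function through the dismax bf parameter is one option among several.

```python
def recip(x, m, a, b):
    # Reciprocal function: a / (m * x + b).
    return a / (m * x + b)

MS_PER_YEAR = 365 * 24 * 3600 * 1000
M = 3.16e-11  # roughly 1 / (milliseconds in a year)

# Boost value for documents of various ages, with a = b = 1.
for years in (0, 1, 5):
    age_ms = years * MS_PER_YEAR
    print(years, round(recip(age_ms, M, 1, 1), 3))

# The equivalent function query string, with a hypothetical date field.
bf = "recip(ms(NOW,created_at),3.16e-11,1,1)"
print(bf)
```

With these constants a brand new document gets a boost of 1.0 and a document about one year old gets roughly half of that, so newer documents are favored without excluding older ones.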