scoring module¶
This module contains classes for scoring (and sorting) search results.
Base classes¶
- class whoosh.scoring.WeightingModel[source]¶
Abstract base class for scoring models. A WeightingModel object provides a method, scorer, which returns an instance of whoosh.scoring.Scorer.
Basically, WeightingModel objects store the configuration information for the model (for example, the values of B and K1 in the BM25F model), and then create a scorer instance based on additional run-time information (the searcher, the fieldname, and term text) to do the actual scoring.
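The split between stored configuration and run-time scoring can be illustrated with a small standalone sketch. It does not import Whoosh: `ToyModel` and `ToyScorer` are hypothetical names, the `boost` setting is invented for illustration, and the `scorer()` signature is simplified to the run-time information mentioned above.

```python
class ToyScorer:
    """Holds the run-time state for one (field, term) pair and does the scoring."""

    def __init__(self, boost, fieldname, text):
        self.boost = boost
        self.fieldname = fieldname
        self.text = text

    def score(self, term_weight):
        # Hypothetical rule: scale the term's weight by the configured boost.
        return self.boost * term_weight


class ToyModel:
    """Stores configuration only and builds a scorer on request."""

    def __init__(self, boost=1.0):
        self.boost = boost

    def scorer(self, searcher, fieldname, text):
        # A real model could ask the searcher for statistics here;
        # this sketch ignores it.
        return ToyScorer(self.boost, fieldname, text)


model = ToyModel(boost=2.0)
title_scorer = model.scorer(searcher=None, fieldname="title", text="whoosh")
print(title_scorer.score(1.5))
```

One model object can hand out many scorers, one per field and term, while the configuration lives in a single place.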
- final(searcher, docnum, score)[source]¶
Returns a final score for each document. You can use this method in subclasses to apply document-level adjustments to the score, for example using the value of a stored field to influence the score (although that would be slow).
WeightingModel sub-classes that use final() should have the attribute use_final set to True.
- Parameters:
searcher – whoosh.searching.Searcher for the index.
docnum – the doc number of the document being scored.
score – the document’s accumulated term score.
- Return type:
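As a rough illustration of a document-level adjustment, the standalone sketch below multiplies the accumulated score by a per-document factor. It does not import Whoosh: `RecencyBoostModel` and its `factor_by_docnum` table are hypothetical, and the factors are precomputed in memory as one way to avoid the slow stored-field lookup mentioned above.

```python
class RecencyBoostModel:
    """Hypothetical model that adjusts each document's score in final()."""

    # Marks the model as one whose final() method should be applied.
    use_final = True

    def __init__(self, factor_by_docnum):
        # Precomputed per-document factors (an assumption of this sketch).
        self.factor_by_docnum = factor_by_docnum

    def final(self, searcher, docnum, score):
        # Documents without an entry keep their accumulated term score.
        return score * self.factor_by_docnum.get(docnum, 1.0)


model = RecencyBoostModel({0: 1.5, 1: 0.5})
adjusted = [model.final(None, docnum, 2.0) for docnum in (0, 1, 2)]
print(adjusted)
```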
- class whoosh.scoring.BaseScorer[source]¶
Base class for “scorer” implementations. A scorer provides a method for scoring a document, and sometimes methods for rating the “quality” of a document and a matcher’s current “block”, to implement quality-based optimizations.
Scorer objects are created by WeightingModel objects. Basically, WeightingModel objects store the configuration information for the model (for example, the values of B and K1 in the BM25F model), and then create a scorer instance.
- block_quality(matcher)[source]¶
Returns the maximum limit on the possible score the matcher can give in its current “block” (whatever concept of “block” the backend might use). This can be an estimate and not necessarily the actual maximum score possible, but it must never be less than the actual maximum score.
If this score is less than the minimum score required to make the “top N” results, then we can tell the matcher to skip ahead to another block with better “quality”.
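The skipping rule can be sketched without any Whoosh machinery. In the standalone example below, each "block" is a hypothetical pair of an upper bound and a list of `(docnum, score)` postings; the function name and data layout are assumptions made for illustration, not the backend's actual structures.

```python
import heapq


def top_n_with_block_skipping(blocks, n):
    """Collect the n best (score, docnum) pairs, skipping hopeless blocks.

    blocks is a list of (upper_bound, [(docnum, score), ...]) pairs, where
    upper_bound is never less than the best score inside the block.
    """
    heap = []      # min-heap: heap[0] is the weakest of the current top n
    skipped = 0
    for upper_bound, postings in blocks:
        # Once the top n is full, a block whose bound is below the weakest
        # kept score cannot contribute, so it is skipped unread.
        if len(heap) == n and upper_bound < heap[0][0]:
            skipped += 1
            continue
        for docnum, score in postings:
            if len(heap) < n:
                heapq.heappush(heap, (score, docnum))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, docnum))
    return sorted(heap, reverse=True), skipped


blocks = [
    (5.0, [(0, 4.0), (1, 5.0)]),
    (3.0, [(2, 3.0), (3, 1.0)]),   # bound 3.0 is below the current minimum
    (9.0, [(4, 8.5)]),
]
best, skipped = top_n_with_block_skipping(blocks, n=2)
print(best, skipped)
```

The bound may overestimate without harming correctness, since an overestimate only causes a block to be read unnecessarily. An underestimate could cause a block containing a top result to be skipped, which is why the bound must never be less than the actual maximum.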
- class whoosh.scoring.WeightScorer(maxweight)[source]¶
A scorer that simply returns the weight as the score. This is useful for more complex weighting models to return when they are asked for a scorer for fields that aren’t scorable (don’t store field lengths).
- class whoosh.scoring.WeightLengthScorer[source]¶
Base class for scorers where the only per-document variables are term weight and field length.
Subclasses should override the _score(weight, length) method to return the score for a document with the given weight and length, and call the setup() method at the end of the initializer to set up common attributes.
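The subclassing pattern can be sketched with a stand-in base class. The example does not import Whoosh: `ToyWeightLengthScorer`, `DensityScorer`, and the arguments of this `setup()` are hypothetical. In particular, what the real `setup()` computes is not described here, so the sketch simply assumes it derives an upper bound on the score.

```python
class ToyWeightLengthScorer:
    """Stand-in base class: subclasses only supply _score(weight, length)."""

    def setup(self, max_weight, min_length):
        # Assumption of this sketch: the common attribute is an upper bound,
        # taken from the most favourable weight and length.
        self.max_quality = self._score(max_weight, min_length)

    def score(self, weight, length):
        return self._score(weight, length)

    def _score(self, weight, length):
        raise NotImplementedError


class DensityScorer(ToyWeightLengthScorer):
    """Hypothetical scorer: term weight divided by field length."""

    def __init__(self, scale, max_weight, min_length):
        self.scale = scale
        # setup() goes last because it calls _score(), which needs self.scale.
        self.setup(max_weight, min_length)

    def _score(self, weight, length):
        return self.scale * weight / length


scorer = DensityScorer(scale=10.0, max_weight=4.0, min_length=2)
print(scorer.score(3.0, 6), scorer.max_quality)
```

Calling `setup()` before `self.scale` is assigned would raise an `AttributeError` in this sketch, which is one plausible reason for placing the call at the end of the initializer.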
Scoring algorithm classes¶
- class whoosh.scoring.BM25F(B=0.75, K1=1.2, **kwargs)[source]¶
Implements the BM25F scoring algorithm.
>>> from whoosh import scoring
>>> # Set a custom B value for the "content" field
>>> w = scoring.BM25F(B=0.75, content_B=1.0, K1=1.5)
- Parameters:
B – free parameter, see the BM25 literature. Keyword arguments of the form fieldname_B (for example, body_B) set field-specific values for B.
K1 – free parameter, see the BM25 literature.
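To give a feel for what B and K1 control, the standalone sketch below evaluates the textbook BM25 term formula and resolves a field-specific B from `fieldname_B` style keyword arguments. Both helper functions are hypothetical and do not import Whoosh; the library's own implementation may differ in details.

```python
def resolve_b(fieldname, default_b, **field_kwargs):
    """Pick the B for a field from fieldname_B style keyword arguments."""
    return field_kwargs.get(fieldname + "_B", default_b)


def bm25_term_score(idf, tf, length, avg_length, B=0.75, K1=1.2):
    """Textbook BM25 score of one term in one field of one document."""
    # B controls how strongly field length is normalized:
    # B=0 ignores length, B=1 normalizes fully against the average length.
    length_norm = (1.0 - B) + B * (length / avg_length)
    # K1 controls how quickly repeated occurrences stop adding score.
    return idf * (tf * (K1 + 1.0)) / (tf + K1 * length_norm)


b_content = resolve_b("content", 0.75, content_B=1.0)  # field-specific value
b_title = resolve_b("title", 0.75, content_B=1.0)      # falls back to default

# Same term statistics in a field four times longer than average.
score_full_norm = bm25_term_score(2.0, 3, 400, 100, B=b_content)
score_default = bm25_term_score(2.0, 3, 400, 100, B=b_title)
print(score_full_norm, score_default)
```

With the larger B, the long field is penalized more heavily, so the first score comes out lower than the second.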
Scoring utility classes¶
- class whoosh.scoring.FunctionWeighting(fn)[source]¶
Uses a supplied function to do the scoring. For simple scoring functions and experiments this may be simpler to use than writing a full weighting model class and scorer class.
The function should accept the arguments searcher, fieldname, text, matcher.
For example, the following function will score documents based on the earliest position of the query term in the document:

    from whoosh import scoring

    def pos_score_fn(searcher, fieldname, text, matcher):
        # Positions of the term in the current document
        poses = matcher.value_as("positions")
        # An earlier first occurrence gives a higher score
        return 1.0 / (poses[0] + 1)

    pos_weighting = scoring.FunctionWeighting(pos_score_fn)
    # myindex and q stand for an existing index and query object
    with myindex.searcher(weighting=pos_weighting) as s:
        results = s.search(q)

Note that the searcher passed to the function may be a per-segment searcher for performance reasons. If you want to get global statistics inside the function, you should use
searcher.get_parent()
to get the top-level searcher. (However, if you are using global statistics, you should probably write a real model/scorer combo so you can cache them on the object.)
- class whoosh.scoring.MultiWeighting(default, **weightings)[source]¶
Chooses from multiple scoring algorithms based on the field.
The only non-keyword argument specifies the default Weighting instance to use. Keyword arguments specify Weighting instances for specific fields.
For example, to use BM25F for most fields, but Frequency for the id field and TF_IDF for the keys field:
mw = MultiWeighting(BM25F(), id=Frequency(), keys=TF_IDF())
- Parameters:
default – the Weighting instance to use for fields not specified in the keyword arguments.
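The per-field selection amounts to a lookup with a fallback. The standalone sketch below shows that idea only: `PerFieldChooser` is a hypothetical class that does not import Whoosh, and plain strings stand in for weighting instances.

```python
class PerFieldChooser:
    """Hypothetical stand-in: one default plus per-field overrides."""

    def __init__(self, default, **per_field):
        self.default = default
        self.per_field = per_field

    def for_field(self, fieldname):
        # Fields without an override fall back to the default.
        return self.per_field.get(fieldname, self.default)


chooser = PerFieldChooser("bm25f", id="frequency", keys="tf_idf")
print(chooser.for_field("id"), chooser.for_field("body"))
```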