spelling module
See correcting errors in user queries.
This module contains helper functions for correcting typos in user queries.
Corrector objects
- class whoosh.spelling.Corrector
Base class for spelling correction objects. Concrete sub-classes should implement the _suggestions method.
- suggest(text, limit=5, maxdist=2, prefix=0)
- Parameters:
text – the text to check. This word will not be added to the suggestions, even if it appears in the word graph.
limit – only return up to this many suggestions. If there are not enough terms in the field within maxdist of the given word, the returned list will be shorter than this number.
maxdist – the largest edit distance from the given word to look at. Values higher than 2 are not very effective or efficient.
prefix – require suggestions to share a prefix of this length with the given word. This is often justifiable since most misspellings do not involve the first letter of the word. Using a prefix dramatically decreases the time it takes to generate the list of words.
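How `suggest` can sit on top of a sub-class's `_suggestions` method is easiest to see in a small, stdlib-only sketch. It does not import Whoosh and is not the library's implementation: `SketchCorrector`, `WordListCorrector`, and `edit_distance` are hypothetical names, and the `_suggestions(text, maxdist, prefix)` signature, the `(distance, word)` tuples it yields, and the ranking (closest first, then alphabetical) are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


class SketchCorrector:
    """Hypothetical base class: sub-classes supply _suggestions()."""

    def _suggestions(self, text: str, maxdist: int,
                     prefix: int) -> Iterable[Tuple[int, str]]:
        raise NotImplementedError

    def suggest(self, text: str, limit: int = 5, maxdist: int = 2,
                prefix: int = 0) -> List[str]:
        # Never return the input word itself, even if it is a known term.
        candidates = {(dist, word)
                      for dist, word in self._suggestions(text, maxdist, prefix)
                      if word != text}
        # Closest words first, ties broken alphabetically; keep at most `limit`.
        return [word for _, word in sorted(candidates)][:limit]


class WordListCorrector(SketchCorrector):
    """Hypothetical sub-class that scans an in-memory word list."""

    def __init__(self, words: Iterable[str]):
        self.words = sorted(set(words))

    def _suggestions(self, text, maxdist, prefix):
        head = text[:prefix]
        for word in self.words:
            # The cheap prefix test skips words before computing a distance.
            if not word.startswith(head):
                continue
            dist = edit_distance(text, word)
            if dist <= maxdist:
                yield dist, word


corrector = WordListCorrector(["spelling", "spell", "selling", "smelling", "dwelling"])
print(corrector.suggest("speling", limit=3))   # ['spelling', 'selling', 'smelling']
print(corrector.suggest("speling", prefix=2))  # ['spelling']
```

The prefix test runs before the quadratic distance computation, which is one reason a non-zero prefix can cut the cost of generating suggestions so sharply.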
QueryCorrector objects
- class whoosh.spelling.QueryCorrector(fieldname)
Base class for objects that correct words in a user query.
- correct_query(q, qstring)
Returns a Correction object representing the corrected form of the given query.
- Parameters:
q – the original whoosh.query.Query tree to be corrected.
qstring – the original user query. This may be None if the original query string is not available, in which case the Correction.string attribute will also be None.
- Return type: Correction
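The correct_query contract, in particular that the corrected string is None whenever qstring is None, can be sketched without Whoosh. In this hypothetical example a query is modeled as a plain list of `(fieldname, text)` terms rather than a `whoosh.query.Query` tree, and `CorrectionRecord`, `QueryCorrectorSketch`, and `FixedMapCorrector` are illustrative names, not library classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Term = Tuple[str, str]  # (fieldname, text); stands in for a Query tree


@dataclass
class CorrectionRecord:
    """Hypothetical stand-in for a Correction result."""
    original_query: List[Term]
    original_string: Optional[str]
    query: List[Term]
    string: Optional[str]


class QueryCorrectorSketch:
    """Hypothetical base class: sub-classes implement correct_query()."""

    def __init__(self, fieldname: str):
        self.fieldname = fieldname

    def correct_query(self, q: List[Term],
                      qstring: Optional[str]) -> CorrectionRecord:
        raise NotImplementedError


class FixedMapCorrector(QueryCorrectorSketch):
    """Replaces known typos in one field using a fixed lookup table."""

    def __init__(self, fieldname: str, fixes: Dict[str, str]):
        super().__init__(fieldname)
        self.fixes = fixes

    def correct_query(self, q, qstring):
        # Only terms in this corrector's field are eligible for correction.
        corr_q = [(fn, self.fixes.get(text, text) if fn == self.fieldname else text)
                  for fn, text in q]
        changed = {old: new for (_, old), (_, new) in zip(q, corr_q) if old != new}
        # With no original string there is nothing to rewrite, so the
        # corrected string is None too.
        corr_s = None
        if qstring is not None:
            # Naive whitespace tokenization; a real query string needs a parser.
            corr_s = " ".join(changed.get(w, w) for w in qstring.split())
        return CorrectionRecord(q, qstring, corr_q, corr_s)


corrector = FixedMapCorrector("content", {"speling": "spelling"})
query = [("content", "speling"), ("content", "errors")]
print(corrector.correct_query(query, "speling errors").string)  # spelling errors
print(corrector.correct_query(query, None).string)              # None
```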
- class whoosh.spelling.SimpleQueryCorrector(correctors, terms, aliases=None, prefix=0, maxdist=2)
A simple query corrector based on a mapping of field names to Corrector objects and a list of ("fieldname", "text") tuples to correct. Any terms in the query that appear in the list of term tuples are corrected using the appropriate corrector.
- Parameters:
correctors – a dictionary mapping field names to Corrector objects.
terms – a sequence of ("fieldname", "text") tuples representing terms to be corrected.
aliases – a dictionary mapping field names in the query to field names for spelling suggestions.
prefix – suggested replacement words must share this number of initial characters with the original word. Increasing this even to just 1 can dramatically speed up suggestions, and may be justifiable since spelling mistakes rarely involve the first letter of a word.
maxdist – the maximum number of “edits” (insertions, deletions, substitutions, or transpositions of letters) allowed between the original word and any suggestion. Values higher than 2 may be slow.
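How the correctors, terms, and aliases arguments fit together can be sketched as a stdlib-only function. This is not Whoosh's implementation: `suggest_replacements` and `StubCorrector` are hypothetical, and the sketch assumes that a term's field name is first mapped through aliases, that only the top suggestion is used, and that terms with no corrector or no suggestion are left unchanged.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class StubCorrector:
    """Hypothetical corrector returning canned suggestions per word."""

    def __init__(self, table: Dict[str, List[str]]):
        self.table = table

    def suggest(self, text, limit=5, maxdist=2, prefix=0):
        # A real corrector would honor maxdist and prefix; this stub ignores them.
        return self.table.get(text, [])[:limit]


def suggest_replacements(correctors: Dict[str, StubCorrector],
                         terms: Iterable[Tuple[str, str]],
                         aliases: Optional[Dict[str, str]] = None,
                         prefix: int = 0,
                         maxdist: int = 2) -> Dict[Tuple[str, str], str]:
    """Map each correctable (fieldname, text) term to its best suggestion."""
    aliases = aliases or {}
    replacements = {}
    for fieldname, text in terms:
        # Look up the corrector under the aliased field name, if there is one.
        corrector = correctors.get(aliases.get(fieldname, fieldname))
        if corrector is None:
            continue
        sugs = corrector.suggest(text, limit=1, maxdist=maxdist, prefix=prefix)
        if sugs:
            replacements[(fieldname, text)] = sugs[0]
    return replacements


correctors = {"content": StubCorrector({"speling": ["spelling", "spieling"]})}
terms = [("body", "speling"), ("title", "speling")]
print(suggest_replacements(correctors, terms, aliases={"body": "content"}))
# {('body', 'speling'): 'spelling'}
```

Here the "body" term is corrected through its alias to "content", while the "title" term is skipped because no corrector is registered for that field.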
- class whoosh.spelling.Correction(q, qstring, corr_q, tokens)
Represents the corrected version of a user query string. Has the following attributes:
query
The corrected whoosh.query.Query object.
string
The corrected user query string.
original_query
The original whoosh.query.Query object that was corrected.
original_string
The original user query string.
tokens
A list of token objects representing the corrected words.
You can also use the Correction.format_string() method to reformat the corrected query string using a whoosh.highlight.Formatter object. For example, to display the corrected query string as HTML with the changed words emphasized:

    from whoosh import highlight

    # mysearcher is an open Searcher, q is the parsed query,
    # and qstring is the text the user typed
    correction = mysearcher.correct_query(q, qstring)
    # Mark the changed words with HTML tags using the CSS class "change"
    hf = highlight.HtmlFormatter(classname="change")
    html = correction.format_string(hf)
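To give a rough idea of what emphasizing the changed words involves, here is a stdlib-only sketch that compares the original and corrected strings word by word and wraps changed words in an HTML element with a CSS class. It is not HtmlFormatter and does not reproduce its exact markup: the `emphasize_changes` name, the `<strong class="...">` tag, and the assumption that both strings contain the same number of words are illustrative only.

```python
import html


def emphasize_changes(original: str, corrected: str,
                      classname: str = "change") -> str:
    """Wrap words that differ between the two strings in <strong> tags."""
    css = html.escape(classname, quote=True)
    out = []
    # Assumes corrections replace words one-for-one so positions line up;
    # zip() silently drops trailing words if the counts differ.
    for old, new in zip(original.split(), corrected.split()):
        word = html.escape(new)
        if old != new:
            word = '<strong class="%s">%s</strong>' % (css, word)
        out.append(word)
    return " ".join(out)


print(emphasize_changes("speling errors", "spelling errors"))
# <strong class="change">spelling</strong> errors
```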