qparser module
Parser object
- class whoosh.qparser.QueryParser(fieldname, schema, plugins=None, termclass=<class 'whoosh.query.terms.Term'>, phraseclass=<class 'whoosh.query.positional.Phrase'>, group=<class 'whoosh.qparser.syntax.AndGroup'>)
A hand-written query parser built on modular plug-ins. The default configuration implements a powerful fielded query language similar to Lucene’s.
You can use the plugins argument when creating the object to override the default list of plug-ins, and/or use add_plugin() and/or remove_plugin_class() to change the plug-ins included in the parser.
>>> from whoosh import qparser
>>> parser = qparser.QueryParser("content", schema)
>>> parser.remove_plugin_class(qparser.WildcardPlugin)
>>> parser.add_plugin(qparser.PrefixPlugin())
>>> parser.parse(u"hello there")
And([Term("content", u"hello"), Term("content", u"there")])
- Parameters:
fieldname – the default field. The parser uses this as the field for any terms without an explicit field.
schema – a whoosh.fields.Schema object to use when parsing. The appropriate fields in the schema will be used to tokenize terms/phrases before they are turned into query objects. You can specify None for the schema to create a parser that does not analyze the text of the query, usually for testing purposes.
plugins – a list of plugins to use. WhitespacePlugin is automatically included; do not put it in this list. This overrides the default list of plugins. Classes in the list will be automatically instantiated.
termclass – the query class to use for individual search terms. The default is whoosh.query.Term.
phraseclass – the query class to use for phrases. The default is whoosh.query.Phrase.
group – the default grouping. AndGroup makes terms required by default. OrGroup makes terms optional by default.
- filterize(nodes, debug=False)
Takes a group of nodes and runs the filters provided by the parser’s plugins.
- filters()
Returns a prioritized list of filter functions provided by the parser’s currently configured plugins.
- multitoken_query(spec, texts, fieldname, termclass, boost)
Returns a query for multiple texts. This method implements the intention specified in the field’s multitoken_query attribute, which specifies what to do when strings that look like single terms to the parser turn out to yield multiple tokens when analyzed.
- Parameters:
spec – a string describing how to join the text strings into a query. This is usually the value of the field’s multitoken_query attribute.
texts – a list of token strings.
fieldname – the name of the field.
termclass – the query class to use for single terms.
boost – the original term’s boost in the query string, which should be applied to the returned query object.
- parse(text, normalize=True, debug=False)
Parses the input string and returns a whoosh.query.Query object/tree.
- Parameters:
text – the unicode string to parse.
normalize – whether to call normalize() on the query object/tree before returning it. This should be left on unless you’re trying to debug the parser output.
- Return type:
whoosh.query.Query
- process(text, pos=0, debug=False)
Returns a group of syntax nodes corresponding to the given text, tagged by the plugin Taggers and filtered by the plugin filters.
- Parameters:
text – the text to tag.
pos – the position in the text to start tagging at.
- replace_plugin(plugin)
Removes any plugins of the class of the given plugin and then adds it. This is a convenience method to keep from having to call remove_plugin_class followed by add_plugin each time you want to reconfigure a default plugin.
>>> qp = qparser.QueryParser("content", schema)
>>> qp.replace_plugin(qparser.NotPlugin("(^| )-"))
- tag(text, pos=0, debug=False)
Returns a group of syntax nodes corresponding to the given text, created by matching the Taggers provided by the parser’s plugins.
- Parameters:
text – the text to tag.
pos – the position in the text to start tagging at.
Pre-made configurations
The following functions return pre-configured QueryParser objects.
- whoosh.qparser.MultifieldParser(fieldnames, schema, fieldboosts=None, **kwargs)
Returns a QueryParser configured to search in multiple fields.
Instead of assigning unfielded clauses to a default field, this parser transforms them into an OR clause that searches a list of fields. For example, if the list of multi-fields is “f1”, “f2” and the query string is “hello there”, the parser will treat it as “(f1:hello OR f2:hello) (f1:there OR f2:there)”. This is very useful when you have two textual fields (e.g. “title” and “content”) you want to search by default.
- Parameters:
fieldnames – a list of field names to search.
fieldboosts – an optional dictionary mapping field names to boosts.
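The expansion described above can be modelled with a few lines of plain Python. This is only a string-level sketch of the idea, not Whoosh's implementation: the helper name expand_multifield is hypothetical, and the real parser works on syntax nodes rather than strings.

```python
def expand_multifield(query, fieldnames):
    """Toy model: rewrite each unfielded word as an OR across fieldnames.

    Hypothetical helper for illustration only; it is not part of Whoosh.
    """
    clauses = []
    for word in query.split():
        if ":" in word:
            # The clause already names a field, so leave it untouched.
            clauses.append(word)
        else:
            alternatives = " OR ".join(f"{name}:{word}" for name in fieldnames)
            clauses.append(f"({alternatives})")
    return " ".join(clauses)


print(expand_multifield("hello there", ["f1", "f2"]))
# (f1:hello OR f2:hello) (f1:there OR f2:there)
```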
- whoosh.qparser.SimpleParser(fieldname, schema, **kwargs)
Returns a QueryParser configured to support only +, -, and phrase syntax.
- whoosh.qparser.DisMaxParser(fieldboosts, schema, tiebreak=0.0, **kwargs)
Returns a QueryParser configured to support only +, -, and phrase syntax, and which converts individual terms into DisjunctionMax queries across a set of fields.
- Parameters:
fieldboosts – a dictionary mapping field names to boosts.
Plug-ins
- class whoosh.qparser.Plugin
Base class for parser plugins.
- class whoosh.qparser.SingleQuotePlugin(expr=None)
Adds the ability to specify single “terms” containing spaces by enclosing them in single quotes.
- class whoosh.qparser.PrefixPlugin(expr=None)
Adds the ability to specify prefix queries by ending a term with an asterisk.
This plugin is useful if you want the user to be able to create prefix but not wildcard queries (for performance reasons). If you are including the wildcard plugin, you should not include this plugin as well.
>>> qp = qparser.QueryParser("content", myschema)
>>> qp.remove_plugin_class(qparser.WildcardPlugin)
>>> qp.add_plugin(qparser.PrefixPlugin())
>>> q = qp.parse("pre*")
- class whoosh.qparser.RegexPlugin(expr=None)
Adds the ability to specify regular expression term queries.
The default syntax for a regular expression term is r"termexpr".
>>> qp = qparser.QueryParser("content", myschema)
>>> qp.add_plugin(qparser.RegexPlugin())
>>> q = qp.parse('foo title:r"bar+"')
- class whoosh.qparser.BoostPlugin(expr=None)
Adds the ability to boost clauses of the query using the circumflex.
>>> qp = qparser.QueryParser("content", myschema)
>>> q = qp.parse("hello there^2")
- class whoosh.qparser.GroupPlugin(openexpr='[(]', closeexpr='[)]')
Adds the ability to group clauses using parentheses.
- class whoosh.qparser.FieldsPlugin(expr='(?P<text>\\w+|[*]):', remove_unknown=True)
Adds the ability to specify the field of a clause.
- Parameters:
expr – the regular expression to use for tagging fields.
remove_unknown – if True, converts field specifications for fields that aren’t in the schema into regular text.
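To see what the default expr accepts as a field specifier, the pattern from the signature above can be tried with Python's re module alone. This is a minimal sketch: the helper leading_field is hypothetical, and it only exercises the regular expression, not the plugin's tagging logic.

```python
import re

# Default `expr` copied from the FieldsPlugin signature.
FIELD_EXPR = re.compile(r"(?P<text>\w+|[*]):")


def leading_field(text):
    """Return the field name at the start of `text`, or None (hypothetical helper)."""
    match = FIELD_EXPR.match(text)
    return match.group("text") if match else None


print(leading_field("title:hello"))  # title
print(leading_field("*:hello"))      # *
print(leading_field("hello"))        # None
```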
- class whoosh.qparser.PhrasePlugin(expr='"(?P<text>.*?)"(~(?P<slop>[1-9][0-9]*))?')
Adds the ability to specify phrase queries inside double quotes.
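The default expr also captures an optional slop value written after a tilde. The sketch below runs that pattern with the standard re module to show the two named groups. The helper split_phrase is hypothetical, and how the plugin uses the captured values is not shown here.

```python
import re

# Default `expr` copied from the PhrasePlugin signature.
PHRASE_EXPR = re.compile(r'"(?P<text>.*?)"(~(?P<slop>[1-9][0-9]*))?')


def split_phrase(text):
    """Return (phrase text, slop string or None), or None if nothing matches."""
    match = PHRASE_EXPR.match(text)
    if match is None:
        return None
    return match.group("text"), match.group("slop")


print(split_phrase('"hello there"~5'))  # ('hello there', '5')
print(split_phrase('"hello there"'))    # ('hello there', None)
```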
- class whoosh.qparser.RangePlugin(expr=None, excl_start='{', excl_end='}')
Adds the ability to specify term ranges.
- class whoosh.qparser.OperatorsPlugin(ops=None, clean=False, And='(?<=\\s)AND(?=\\s)', Or='(?<=\\s)OR(?=\\s)', AndNot='(?<=\\s)ANDNOT(?=\\s)', AndMaybe='(?<=\\s)ANDMAYBE(?=\\s)', Not='(^|(?<=(\\s|[()])))NOT(?=\\s)', Require='(^|(?<=\\s))REQUIRE(?=\\s)')
By default, adds the AND, OR, ANDNOT, ANDMAYBE, and NOT operators to the parser syntax. This plugin scans the token stream for subclasses of Operator and calls their Operator.make_group() methods to allow them to manipulate the stream.
There are two levels of configuration available.
The first level is to change the regular expressions of the default operators, using the And, Or, AndNot, AndMaybe, and/or Not keyword arguments. The keyword value can be a pattern string or a compiled expression, or None to remove the operator:
qp = qparser.QueryParser("content", schema)
cp = qparser.OperatorsPlugin(And="&", Or=r"\|", AndNot="&!", AndMaybe="&~", Not=None)
qp.replace_plugin(cp)
You can also specify a list of (OpTagger, priority) pairs as the first argument to the initializer to use custom operators. See Creating custom operators for more information on this.
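The default operator patterns rely on lookbehind and lookahead assertions so that an operator word only counts when it is delimited by whitespace. The following sketch tests two of the default patterns from the signature with the standard re module. The helper finds_operator is hypothetical, and the sketch does not involve the plugin itself.

```python
import re

# Default patterns copied from the OperatorsPlugin signature.
AND_EXPR = re.compile(r"(?<=\s)AND(?=\s)")
NOT_EXPR = re.compile(r"(^|(?<=(\s|[()])))NOT(?=\s)")


def finds_operator(pattern, text):
    """True if the operator pattern matches anywhere in text (hypothetical helper)."""
    return pattern.search(text) is not None


print(finds_operator(AND_EXPR, "alfa AND bravo"))  # True
print(finds_operator(AND_EXPR, "ANDROID phone"))   # False: not surrounded by whitespace
print(finds_operator(NOT_EXPR, "NOT alfa"))        # True: allowed at the start of the text
print(finds_operator(NOT_EXPR, "(NOT alfa)"))      # True: allowed after a parenthesis
```

A replacement pattern passed as And, Or, and so on can be checked the same way before it is handed to the plugin.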
- class whoosh.qparser.PlusMinusPlugin(plusexpr='\\+', minusexpr='-')
Adds the ability to use + and - in a flat OR query to specify required and prohibited terms.
This is the basis for the parser configuration returned by SimpleParser().
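As a rough mental model of the + and - syntax, each marked term can be thought of as landing in a required or prohibited bucket, with unmarked terms staying optional. The sketch below is a plain-Python illustration under that assumption. The function classify_plus_minus is hypothetical and works on whitespace-separated words, whereas the plugin operates on syntax nodes.

```python
def classify_plus_minus(query):
    """Toy model: split words into (required, prohibited, optional) lists."""
    required, prohibited, optional = [], [], []
    for word in query.split():
        if word.startswith("+") and len(word) > 1:
            required.append(word[1:])
        elif word.startswith("-") and len(word) > 1:
            prohibited.append(word[1:])
        else:
            optional.append(word)
    return required, prohibited, optional


print(classify_plus_minus("+alfa -bravo charlie"))
# (['alfa'], ['bravo'], ['charlie'])
```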
- class whoosh.qparser.GtLtPlugin(expr=None)
Allows the user to use greater than/less than symbols to create range queries:
a:>100 b:<=z c:>=-1.4 d:<mz
This is the equivalent of:
a:{100 to] b:[to z] c:[-1.4 to] d:[to mz}
The plugin recognizes >, <, >=, <=, =>, and =< after a field specifier. The field specifier is required. You cannot do the following:
>100
This plugin requires the FieldsPlugin and RangePlugin to work.
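The equivalence between the comparison syntax and the range syntax can be written out as a small translation function. This is a string-level sketch only: the pattern GTLT and the function to_range are hypothetical names, and the plugin itself rewrites syntax nodes rather than text.

```python
import re

# Hypothetical pattern: field name, one of the six operators, then the value.
GTLT = re.compile(r"(?P<field>\w+):(?P<op>>=|<=|=>|=<|>|<)(?P<value>\S+)")


def to_range(clause):
    """Translate e.g. 'a:>100' into the equivalent range syntax 'a:{100 to]'."""
    match = GTLT.fullmatch(clause)
    if match is None:
        # No field specifier or no operator: leave the clause unchanged.
        return clause
    field, op, value = match.group("field", "op", "value")
    if op == ">":
        return f"{field}:{{{value} to]"   # exclusive start
    if op in (">=", "=>"):
        return f"{field}:[{value} to]"    # inclusive start
    if op == "<":
        return f"{field}:[to {value}}}"   # exclusive end
    return f"{field}:[to {value}]"        # inclusive end (<= or =<)


query = "a:>100 b:<=z c:>=-1.4 d:<mz"
print(" ".join(to_range(clause) for clause in query.split()))
# a:{100 to] b:[to z] c:[-1.4 to] d:[to mz}
```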
- class whoosh.qparser.MultifieldPlugin(fieldnames, fieldboosts=None, group=<class 'whoosh.qparser.syntax.OrGroup'>)
Converts any unfielded terms into OR clauses that search for the term in a specified list of fields.
>>> qp = qparser.QueryParser(None, myschema)
>>> qp.add_plugin(qparser.MultifieldPlugin(["a", "b"]))
>>> qp.parse("alfa c:bravo")
And([Or([Term("a", "alfa"), Term("b", "alfa")]), Term("c", "bravo")])
This plugin is the basis for the MultifieldParser.
- Parameters:
fieldnames – a list of fields to search.
fieldboosts – an optional dictionary mapping field names to a boost to use for that field.
group – the group to use to relate the fielded terms to each other.
- class whoosh.qparser.FieldAliasPlugin(fieldmap)
Adds the ability to use “aliases” of fields in the query string.
This plugin is useful for allowing users of languages that can’t be represented in ASCII to use field names in their own language, and translate them into the “real” field names, which must be valid Python identifiers.
>>> # Allow users to use 'body' or 'text' to refer to the 'content' field
>>> parser.add_plugin(FieldAliasPlugin({"content": ["body", "text"]}))
>>> parser.parse("text:hello")
Term("content", "hello")
- class whoosh.qparser.CopyFieldPlugin(map, group=<class 'whoosh.qparser.syntax.OrGroup'>, mirror=False)
Looks for basic syntax nodes (terms, prefixes, wildcards, phrases, etc.) occurring in a certain field and replaces each with a group (by default OR) containing the original token and the token copied to a new field.
For example, the query:
hello name:matt
could be automatically converted by CopyFieldPlugin({"name": "author"}) to:
hello (name:matt OR author:matt)
This is useful where one field was indexed with a differently-analyzed copy of another, and you want the query to search both fields.
You can specify a different group type with the group keyword. You can also specify group=None, in which case the copied node is inserted “inline” next to the original, instead of in a new group:
hello name:matt author:matt
- Parameters:
map – a dictionary mapping names of fields to copy to the names of the destination fields.
group – the type of group to create in place of the original token. You can specify group=None to put the copied node “inline” next to the original node instead of in a new group.
mirror – if True, the plugin copies both ways, so if the user specifies a query in the ‘toname’ field, it will be copied to the ‘fromname’ field.
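The two rewriting modes described above (a new OR group versus an inline copy) can be sketched at the string level. This is an illustration only: copy_fields and its inline flag are hypothetical stand-ins for the plugin's map and group=None behaviour, and the real plugin manipulates syntax nodes.

```python
def copy_fields(query, fieldmap, inline=False):
    """Toy model: copy clauses in mapped fields to their destination fields.

    `fieldmap` maps a source field name to a destination field name,
    mirroring the dictionary described for the map parameter.
    """
    out = []
    for clause in query.split():
        field, sep, text = clause.partition(":")
        if sep and field in fieldmap:
            copy = f"{fieldmap[field]}:{text}"
            if inline:
                # Comparable to group=None: the copy sits next to the original.
                out.extend([clause, copy])
            else:
                # Comparable to the default: wrap both in an OR group.
                out.append(f"({clause} OR {copy})")
        else:
            out.append(clause)
    return " ".join(out)


print(copy_fields("hello name:matt", {"name": "author"}))
# hello (name:matt OR author:matt)
print(copy_fields("hello name:matt", {"name": "author"}, inline=True))
# hello name:matt author:matt
```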
Syntax node objects
Base nodes
- class whoosh.qparser.SyntaxNode
Base class for nodes that make up the abstract syntax tree (AST) of a parsed user query string. The AST is an intermediate step, generated from the query string, then converted into a whoosh.query.Query tree by calling the query() method on the nodes.
Instances have the following required attributes:
has_fieldname
True if this node has a fieldname attribute.
has_text
True if this node has a text attribute.
has_boost
True if this node has a boost attribute.
startchar
The character position in the original text at which this node started.
endchar
The character position in the original text at which this node ended.
- query(parser)
Returns a whoosh.query.Query instance corresponding to this syntax tree node.
- r()
Returns a basic representation of this node. The base class’s __repr__ method calls this, then does the extra busy work of adding fieldname and boost where appropriate.
- set_boost(boost)
Sets the boost associated with this node.
For nodes that don’t have a boost, this is a no-op.
Nodes
- class whoosh.qparser.FieldnameNode(fieldname, original)
Abstract syntax tree node for field name assignments.
- class whoosh.qparser.TextNode(text)
Intermediate base class for basic nodes that search for text, such as term queries, wildcards, prefixes, etc.
Instances have the following attributes:
qclass
If a subclass does not override query(), the base class will use this class to construct the query.
tokenize
If True and the subclass does not override query(), the node’s text will be tokenized before constructing the query.
removestops
If True and the subclass does not override query(), and the field’s analyzer has a stop word filter, stop words will be removed from the text before constructing the query.
Group nodes
- class whoosh.qparser.GroupNode(nodes=None, boost=1.0, **kwargs)
Base class for abstract syntax tree node types that group together sub-nodes.
Instances have the following attributes:
merging
True if side-by-side instances of this group can be merged into a single group.
qclass
If a subclass doesn’t override query(), the base class will simply wrap this class around the queries returned by the subnodes.
This class implements a number of list methods for operating on the subnodes.
Operators
- class whoosh.qparser.Operator(text, grouptype, leftassoc=True)
Base class for PrefixOperator, PostfixOperator, and InfixOperator.
Operators work by moving the nodes they apply to (e.g. for a prefix operator, the following node; for an infix operator, the nodes on either side) into a group node. The group provides the code for what to do with the nodes.
- Parameters:
text – the text of the operator in the query string.
grouptype – the type of group to create in place of the operator and the node(s) it operates on.
leftassoc – for infix operators, whether the operator is left associative. Use leftassoc=False for right-associative infix operators.
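The effect of leftassoc on a chain of infix operators can be illustrated with a toy model that folds a flat node list into nested groups. This is a sketch under simplifying assumptions, not Whoosh's algorithm: group_infix is a hypothetical function, nodes are plain strings, groups are tuples, and the input is assumed to alternate operand, operator, operand.

```python
def group_infix(nodes, op, leftassoc=True):
    """Fold [a, op, b, op, c] into nested ("group", left, right) tuples."""
    operands = [node for node in nodes if node != op]
    if leftassoc:
        # ((a op b) op c): combine from the left.
        result = operands[0]
        for right in operands[1:]:
            result = ("group", result, right)
    else:
        # (a op (b op c)): combine from the right.
        result = operands[-1]
        for left in reversed(operands[:-1]):
            result = ("group", left, result)
    return result


tokens = ["a", "OP", "b", "OP", "c"]
print(group_infix(tokens, "OP"))
# ('group', ('group', 'a', 'b'), 'c')
print(group_infix(tokens, "OP", leftassoc=False))
# ('group', 'a', ('group', 'b', 'c'))
```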
- class whoosh.qparser.PrefixOperator(text, grouptype, leftassoc=True)
- Parameters:
text – the text of the operator in the query string.
grouptype – the type of group to create in place of the operator and the node(s) it operates on.
leftassoc – for infix operators, whether the operator is left associative. Use leftassoc=False for right-associative infix operators.
- class whoosh.qparser.PostfixOperator(text, grouptype, leftassoc=True)
- Parameters:
text – the text of the operator in the query string.
grouptype – the type of group to create in place of the operator and the node(s) it operates on.
leftassoc – for infix operators, whether the operator is left associative. Use leftassoc=False for right-associative infix operators.
- class whoosh.qparser.InfixOperator(text, grouptype, leftassoc=True)
- Parameters:
text – the text of the operator in the query string.
grouptype – the type of group to create in place of the operator and the node(s) it operates on.
leftassoc – for infix operators, whether the operator is left associative. Use leftassoc=False for right-associative infix operators.