qparser module

Parser object

class whoosh.qparser.QueryParser(fieldname, schema, plugins=None, termclass=<class 'whoosh.query.terms.Term'>, phraseclass=<class 'whoosh.query.positional.Phrase'>, group=<class 'whoosh.qparser.syntax.AndGroup'>)[source]

A hand-written query parser built on modular plug-ins. The default configuration implements a powerful fielded query language similar to Lucene’s.

Use the plugins argument when creating the object to override the default list of plug-ins, or call add_plugin() and remove_plugin_class() afterwards to change which plug-ins the parser includes.

>>> from whoosh import qparser
>>> parser = qparser.QueryParser("content", schema)
>>> parser.remove_plugin_class(qparser.WildcardPlugin)
>>> parser.add_plugin(qparser.PrefixPlugin())
>>> parser.parse(u"hello there")
And([Term("content", u"hello"), Term("content", u"there")])

Parameters:
  • fieldname – the default field – the parser uses this as the field for any terms without an explicit field.

  • schema – a whoosh.fields.Schema object to use when parsing. The appropriate fields in the schema will be used to tokenize terms/phrases before they are turned into query objects. You can specify None for the schema to create a parser that does not analyze the text of the query, usually for testing purposes.

  • plugins – a list of plugins to use, overriding the default list. WhitespacePlugin is included automatically, so do not put it in this list. Classes in the list are instantiated automatically.

  • termclass – the query class to use for individual search terms. The default is whoosh.query.Term.

  • phraseclass – the query class to use for phrases. The default is whoosh.query.Phrase.

  • group – the default grouping. AndGroup makes terms required by default. OrGroup makes terms optional by default.

add_plugin(pin)[source]

Adds the given plugin to the list of plugins in this parser.

add_plugins(pins)[source]

Adds the given list of plugins to the list of plugins in this parser.

default_set()[source]

Returns the default list of plugins to use.

filterize(nodes, debug=False)[source]

Takes a group of nodes and runs the filters provided by the parser’s plugins.

filters()[source]

Returns a prioritized list of filter functions provided by the parser’s currently configured plugins.

multitoken_query(spec, texts, fieldname, termclass, boost)[source]

Returns a query for multiple texts. This method implements the intention specified in the field’s multitoken_query attribute, which specifies what to do when strings that look like single terms to the parser turn out to yield multiple tokens when analyzed.

Parameters:
  • spec – a string describing how to join the text strings into a query. This is usually the value of the field’s multitoken_query attribute.

  • texts – a list of token strings.

  • fieldname – the name of the field.

  • termclass – the query class to use for single terms.

  • boost – the original term’s boost in the query string, which should be applied to the returned query object.

parse(text, normalize=True, debug=False)[source]

Parses the input string and returns a whoosh.query.Query object/tree.

Parameters:
  • text – the unicode string to parse.

  • normalize – whether to call normalize() on the query object/tree before returning it. This should be left on unless you’re trying to debug the parser output.

Return type:

whoosh.query.Query

process(text, pos=0, debug=False)[source]

Returns a group of syntax nodes corresponding to the given text, tagged by the plugin Taggers and filtered by the plugin filters.

Parameters:
  • text – the text to tag.

  • pos – the position in the text to start tagging at.

remove_plugin(pi)[source]

Removes the given plugin object from the list of plugins in this parser.

remove_plugin_class(cls)[source]

Removes any plugins of the given class from this parser.

replace_plugin(plugin)[source]

Removes any plugins of the class of the given plugin and then adds it. This is a convenience method to keep from having to call remove_plugin_class followed by add_plugin each time you want to reconfigure a default plugin.

>>> qp = qparser.QueryParser("content", schema)
>>> qp.replace_plugin(qparser.NotPlugin("(^| )-"))

tag(text, pos=0, debug=False)[source]

Returns a group of syntax nodes corresponding to the given text, created by matching the Taggers provided by the parser’s plugins.

Parameters:
  • text – the text to tag.

  • pos – the position in the text to start tagging at.

taggers()[source]

Returns a prioritized list of tagger objects provided by the parser’s currently configured plugins.

term_query(fieldname, text, termclass, boost=1.0, tokenize=True, removestops=True)[source]

Returns the appropriate query object for a single term in the query string.

Pre-made configurations

The following functions return pre-configured QueryParser objects.

whoosh.qparser.MultifieldParser(fieldnames, schema, fieldboosts=None, **kwargs)[source]

Returns a QueryParser configured to search in multiple fields.

Instead of assigning unfielded clauses to a default field, this parser transforms them into an OR clause that searches a list of fields. For example, if the list of multi-fields is “f1”, “f2” and the query string is “hello there”, the parser will interpret it as “(f1:hello OR f2:hello) (f1:there OR f2:there)”. This is very useful when you have two textual fields (e.g. “title” and “content”) you want to search by default.

Parameters:
  • fieldnames – a list of field names to search.

  • fieldboosts – an optional dictionary mapping field names to boosts.

whoosh.qparser.SimpleParser(fieldname, schema, **kwargs)[source]

Returns a QueryParser configured to support only +, -, and phrase syntax.

whoosh.qparser.DisMaxParser(fieldboosts, schema, tiebreak=0.0, **kwargs)[source]

Returns a QueryParser configured to support only +, -, and phrase syntax, and which converts individual terms into DisjunctionMax queries across a set of fields.

Parameters:

fieldboosts – a dictionary mapping field names to boosts.

Plug-ins

class whoosh.qparser.Plugin[source]

Base class for parser plugins.

filters(parser)[source]

Should return a list of (filter_function, priority) tuples to add to parser. Lower priority numbers run first.

Filter functions will be called with (parser, groupnode) and should return a group node.

taggers(parser)[source]

Should return a list of (Tagger, priority) tuples to add to the syntax the parser understands. Lower priorities run first.

class whoosh.qparser.SingleQuotePlugin(expr=None)[source]

Adds the ability to specify single “terms” containing spaces by enclosing them in single quotes.

class whoosh.qparser.PrefixPlugin(expr=None)[source]

Adds the ability to specify prefix queries by ending a term with an asterisk.

This plugin is useful if you want the user to be able to create prefix but not wildcard queries (for performance reasons). If you are including the wildcard plugin, you should not include this plugin as well.

>>> qp = qparser.QueryParser("content", myschema)
>>> qp.remove_plugin_class(qparser.WildcardPlugin)
>>> qp.add_plugin(qparser.PrefixPlugin())
>>> q = qp.parse("pre*")

class whoosh.qparser.WildcardPlugin(expr=None)[source]

class whoosh.qparser.RegexPlugin(expr=None)[source]

Adds the ability to specify regular expression term queries.

The default syntax for a regular expression term is r"termexpr".

>>> qp = qparser.QueryParser("content", myschema)
>>> qp.add_plugin(qparser.RegexPlugin())
>>> q = qp.parse('foo title:r"bar+"')

class whoosh.qparser.BoostPlugin(expr=None)[source]

Adds the ability to boost clauses of the query using the circumflex.

>>> qp = qparser.QueryParser("content", myschema)
>>> q = qp.parse("hello there^2")

class whoosh.qparser.GroupPlugin(openexpr='[(]', closeexpr='[)]')[source]

Adds the ability to group clauses using parentheses.

class whoosh.qparser.EveryPlugin(expr=None)[source]
class whoosh.qparser.FieldsPlugin(expr='(?P<text>\\w+|[*]):', remove_unknown=True)[source]

Adds the ability to specify the field of a clause.

Parameters:
  • expr – the regular expression to use for tagging fields.

  • remove_unknown – if True, converts field specifications for fields that aren’t in the schema into regular text.

class whoosh.qparser.PhrasePlugin(expr='"(?P<text>.*?)"(~(?P<slop>[1-9][0-9]*))?')[source]

Adds the ability to specify phrase queries inside double quotes.

class whoosh.qparser.RangePlugin(expr=None, excl_start='{', excl_end='}')[source]

Adds the ability to specify term ranges.

class whoosh.qparser.OperatorsPlugin(ops=None, clean=False, And='(?<=\\s)AND(?=\\s)', Or='(?<=\\s)OR(?=\\s)', AndNot='(?<=\\s)ANDNOT(?=\\s)', AndMaybe='(?<=\\s)ANDMAYBE(?=\\s)', Not='(^|(?<=(\\s|[()])))NOT(?=\\s)', Require='(^|(?<=\\s))REQUIRE(?=\\s)')[source]

By default, adds the AND, OR, ANDNOT, ANDMAYBE, and NOT operators to the parser syntax. This plugin scans the token stream for subclasses of Operator and calls their Operator.make_group() methods to allow them to manipulate the stream.

There are two levels of configuration available.

The first level is to change the regular expressions of the default operators, using the And, Or, AndNot, AndMaybe, and/or Not keyword arguments. The keyword value can be a pattern string or a compiled expression, or None to remove the operator:

qp = qparser.QueryParser("content", schema)
cp = qparser.OperatorsPlugin(And="&", Or=r"\|", AndNot="&!",
                             AndMaybe="&~", Not=None)
qp.replace_plugin(cp)

You can also specify a list of (OpTagger, priority) pairs as the first argument to the initializer to use custom operators. See Creating custom operators for more information on this.

class whoosh.qparser.PlusMinusPlugin(plusexpr='\\+', minusexpr='-')[source]

Adds the ability to use + and - in a flat OR query to specify required and prohibited terms.

This is the basis for the parser configuration returned by SimpleParser().

class whoosh.qparser.GtLtPlugin(expr=None)[source]

Allows the user to use greater than/less than symbols to create range queries:

a:>100 b:<=z c:>=-1.4 d:<mz

This is the equivalent of:

a:{100 to] b:[to z] c:[-1.4 to] d:[to mz}

The plugin recognizes >, <, >=, <=, =>, and =< after a field specifier. The field specifier is required. You cannot do the following:

>100

This plugin requires the FieldsPlugin and RangePlugin to work.

class whoosh.qparser.MultifieldPlugin(fieldnames, fieldboosts=None, group=<class 'whoosh.qparser.syntax.OrGroup'>)[source]

Converts any unfielded terms into OR clauses that search for the term in a specified list of fields.

>>> qp = qparser.QueryParser(None, myschema)
>>> qp.add_plugin(qparser.MultifieldPlugin(["a", "b"]))
>>> qp.parse("alfa c:bravo")
And([Or([Term("a", "alfa"), Term("b", "alfa")]), Term("c", "bravo")])

This plugin is the basis for the MultifieldParser.

Parameters:
  • fieldnames – a list of fields to search.

  • fieldboosts – an optional dictionary mapping field names to a boost to use for that field.

  • group – the group to use to relate the fielded terms to each other.

class whoosh.qparser.FieldAliasPlugin(fieldmap)[source]

Adds the ability to use “aliases” of fields in the query string.

This plugin is useful for allowing users of languages that can’t be represented in ASCII to use field names in their own language, and translate them into the “real” field names, which must be valid Python identifiers.

>>> # Allow users to use 'body' or 'text' to refer to the 'content' field
>>> parser.add_plugin(FieldAliasPlugin({"content": ["body", "text"]}))
>>> parser.parse("text:hello")
Term("content", "hello")

class whoosh.qparser.CopyFieldPlugin(map, group=<class 'whoosh.qparser.syntax.OrGroup'>, mirror=False)[source]

Looks for basic syntax nodes (terms, prefixes, wildcards, phrases, etc.) occurring in a certain field and replaces it with a group (by default OR) containing the original token and the token copied to a new field.

For example, the query:

hello name:matt

could be automatically converted by CopyFieldPlugin({"name": "author"}) to:

hello (name:matt OR author:matt)

This is useful where one field was indexed with a differently-analyzed copy of another, and you want the query to search both fields.

You can specify a different group type with the group keyword. You can also specify group=None, in which case the copied node is inserted “inline” next to the original, instead of in a new group:

hello name:matt author:matt

Parameters:
  • map – a dictionary mapping names of fields to copy to the names of the destination fields.

  • group – the type of group to create in place of the original token. You can specify group=None to put the copied node “inline” next to the original node instead of in a new group.

  • mirror – if True, the plugin copies both ways, so if the user specifies a query in a destination field, it is also copied to the corresponding source field.

Syntax node objects

Base nodes

class whoosh.qparser.SyntaxNode[source]

Base class for nodes that make up the abstract syntax tree (AST) of a parsed user query string. The AST is an intermediate step, generated from the query string, then converted into a whoosh.query.Query tree by calling the query() method on the nodes.

Instances have the following required attributes:

has_fieldname

True if this node has a fieldname attribute.

has_text

True if this node has a text attribute.

has_boost

True if this node has a boost attribute.

startchar

The character position in the original text at which this node started.

endchar

The character position in the original text at which this node ended.

is_ws()[source]

Returns True if this node is ignorable whitespace.

query(parser)[source]

Returns a whoosh.query.Query instance corresponding to this syntax tree node.

r()[source]

Returns a basic representation of this node. The base class’s __repr__ method calls this, then does the extra busy work of adding fieldname and boost where appropriate.

set_boost(boost)[source]

Sets the boost associated with this node.

For nodes that don’t have a boost, this is a no-op.

set_fieldname(name, override=False)[source]

Sets the fieldname associated with this node. If override is False (the default), the fieldname will only be replaced if this node does not already have a fieldname set.

For nodes that don’t have a fieldname, this is a no-op.

set_range(startchar, endchar)[source]

Sets the character range associated with this node.

Nodes

class whoosh.qparser.FieldnameNode(fieldname, original)[source]

Abstract syntax tree node for field name assignments.

class whoosh.qparser.TextNode(text)[source]

Intermediate base class for basic nodes that search for text, such as term queries, wildcards, prefixes, etc.

Instances have the following attributes:

qclass

If a subclass does not override query(), the base class will use this class to construct the query.

tokenize

If True and the subclass does not override query(), the node’s text will be tokenized before constructing the query.

removestops

If True and the subclass does not override query(), and the field’s analyzer has a stop word filter, stop words will be removed from the text before constructing the query.

class whoosh.qparser.WordNode(text)[source]

Syntax node for term queries.

class whoosh.qparser.MarkerNode[source]

Base class for nodes that only exist to mark places in the tree.

Group nodes

class whoosh.qparser.GroupNode(nodes=None, boost=1.0, **kwargs)[source]

Base class for abstract syntax tree node types that group together sub-nodes.

Instances have the following attributes:

merging

True if side-by-side instances of this group can be merged into a single group.

qclass

If a subclass doesn’t override query(), the base class will simply wrap this class around the queries returned by the subnodes.

This class implements a number of list methods for operating on the subnodes.

class whoosh.qparser.BinaryGroup(nodes=None, boost=1.0, **kwargs)[source]

Intermediate base class for group nodes that have two subnodes and whose qclass initializer takes two arguments instead of a list.

class whoosh.qparser.ErrorNode(message, node=None)[source]
class whoosh.qparser.AndGroup(nodes=None, boost=1.0, **kwargs)[source]
class whoosh.qparser.OrGroup(nodes=None, boost=1.0, **kwargs)[source]
class whoosh.qparser.AndNotGroup(nodes=None, boost=1.0, **kwargs)[source]
class whoosh.qparser.AndMaybeGroup(nodes=None, boost=1.0, **kwargs)[source]
class whoosh.qparser.DisMaxGroup(nodes=None, boost=1.0, **kwargs)[source]
class whoosh.qparser.RequireGroup(nodes=None, boost=1.0, **kwargs)[source]
class whoosh.qparser.NotGroup(nodes=None, boost=1.0, **kwargs)[source]

Operators

class whoosh.qparser.Operator(text, grouptype, leftassoc=True)[source]

Base class for PrefixOperator, PostfixOperator, and InfixOperator.

Operators work by moving the nodes they apply to (for a prefix operator, the following node; for a postfix operator, the preceding node; for an infix operator, the nodes on either side) into a group node. The group provides the code for what to do with the nodes.

Parameters:
  • text – the text of the operator in the query string.

  • grouptype – the type of group to create in place of the operator and the node(s) it operates on.

  • leftassoc – for infix operators, whether the operator is left associative. Use leftassoc=False for right-associative infix operators.

class whoosh.qparser.PrefixOperator(text, grouptype, leftassoc=True)[source]
Parameters:
  • text – the text of the operator in the query string.

  • grouptype – the type of group to create in place of the operator and the node(s) it operates on.

  • leftassoc – for infix operators, whether the operator is left associative. Use leftassoc=False for right-associative infix operators.

class whoosh.qparser.PostfixOperator(text, grouptype, leftassoc=True)[source]
Parameters:
  • text – the text of the operator in the query string.

  • grouptype – the type of group to create in place of the operator and the node(s) it operates on.

  • leftassoc – for infix operators, whether the operator is left associative. Use leftassoc=False for right-associative infix operators.

class whoosh.qparser.InfixOperator(text, grouptype, leftassoc=True)[source]
Parameters:
  • text – the text of the operator in the query string.

  • grouptype – the type of group to create in place of the operator and the node(s) it operates on.

  • leftassoc – for infix operators, whether the operator is left associative. Use leftassoc=False for right-associative infix operators.