query module

See also whoosh.qparser, which contains code for parsing user queries into query objects.

Base classes

The following abstract base classes are subclassed to create the “real” query operations.

class whoosh.query.Query

Abstract base class for all queries.

Note that this base class implements __or__, __and__, and __sub__ to allow slightly more convenient composition of query objects:

>>> Term("content", u"a") | Term("content", u"b")
Or([Term("content", u"a"), Term("content", u"b")])

>>> Term("content", u"a") & Term("content", u"b")
And([Term("content", u"a"), Term("content", u"b")])

>>> Term("content", u"a") - Term("content", u"b")
And([Term("content", u"a"), Not(Term("content", u"b"))])
accept(fn)

Applies the given function to this query’s subqueries (if any) and then to this query itself:

from whoosh.query import Phrase

def boost_phrases(q):
    # Double the boost of every Phrase node; return other nodes as-is
    if isinstance(q, Phrase):
        q.boost *= 2.0
    return q

myquery = myquery.accept(boost_phrases)

This method automatically creates copies of the nodes in the original tree before passing them to your function, so your function can change attributes on nodes without altering the original tree.

This method is less flexible than using Query.apply() (in fact it’s implemented using that method) but is often more straightforward.
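The copy-then-recurse behavior described above can be modeled on a toy tree. This is a minimal sketch, not Whoosh's implementation: `Node` and its `apply()`/`accept()` methods are hypothetical stand-ins that follow the documented contracts (`apply()` rebuilds a node from its children's results without copying them; `accept()` copies each node, processes its subtree first, then calls the function on the node itself).

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for a query node; not a Whoosh class."""
    kind: str
    boost: float = 1.0
    children: List["Node"] = field(default_factory=list)

    def apply(self, fn: Callable[["Node"], "Node"]) -> "Node":
        # Mirrors the documented Query.apply(): call fn on each child and
        # return a new copy of this node holding the results; a leaf simply
        # returns itself. The children are NOT copied before fn sees them.
        if not self.children:
            return self
        clone = copy.copy(self)
        clone.children = [fn(child) for child in self.children]
        return clone

    def accept(self, fn: Callable[["Node"], "Node"]) -> "Node":
        # Mirrors the documented Query.accept(): copy each node, rebuild its
        # subtree bottom-up via apply(), then pass the copy to fn, so fn may
        # change attributes without touching the original tree.
        def visit(node: "Node") -> "Node":
            return fn(copy.copy(node).apply(visit))
        return visit(self)


def boost_phrases(q: Node) -> Node:
    if q.kind == "phrase":
        q.boost *= 2.0
    return q


original = Node("and", children=[Node("term"), Node("phrase")])
rewritten = original.accept(boost_phrases)
print(rewritten.children[1].boost, original.children[1].boost)  # 2.0 1.0
```

Because `accept()` hands the function copies, the phrase node in the original tree keeps its boost of 1.0.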

all_terms(phrases=True)

Returns a set of all terms in this query tree.

This method exists for backwards-compatibility. Use iter_all_terms() instead.

Parameters:

phrases – Whether to add words found in Phrase queries.

Return type:

set

all_tokens(boost=1.0)

Returns an iterator of analysis.Token objects corresponding to all terms in this query tree. The Token objects will have the fieldname, text, and boost attributes set. If the query was built by the query parser, the Token objects will also have startchar and endchar attributes indexing into the original user query.

apply(fn)

If this query has children, calls the given function on each child and returns a new copy of this node with the new children returned by the function. If this is a leaf node, simply returns this object.

This is useful for writing functions that transform a query tree. For example, this function changes all Term objects in a query tree into Variations objects:

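from whoosh.query import And, Not, Or, Term, Variations

def term2var(q):
    # Replace each Term leaf with a Variations query on the same word;
    # recurse into compound queries via apply()
    if isinstance(q, Term):
        return Variations(q.fieldname, q.text)
    else:
        return q.apply(term2var)

q = And([Term("f", "alfa"),
         Or([Term("f", "bravo"),
             Not(Term("f", "charlie"))])])
q = term2var(q)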

Note that this method does not automatically create copies of nodes. To avoid modifying the original tree, your function should call the Query.copy() method on nodes before changing their attributes.

children()

Returns an iterator of the subqueries of this object.

copy()

Deprecated; use copy.deepcopy() instead.

deletion_docs(searcher)

Returns an iterator of docnums matching this query for the purpose of deletion. The delete_by_query() method will use this method when deciding what documents to delete, allowing special queries (e.g. nested queries) to override what documents are deleted. The default implementation just forwards to Query.docs().

docs(searcher)

Returns an iterator of docnums matching this query.

>>> with my_index.searcher() as searcher:
...     list(my_query.docs(searcher))
[10, 34, 78, 103]
Parameters:

searcher – A whoosh.searching.Searcher object.

estimate_min_size(ixreader)

Returns an estimate of the minimum number of documents this query could potentially match.

estimate_size(ixreader)

Returns an estimate of how many documents this query could potentially match (for example, the estimated size of a simple term query is the document frequency of the term). It is permissible to overestimate, but not to underestimate.
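To see why overestimating is the safe direction, consider how estimates could combine up a tree: a union can match at most the sum of its parts, and an intersection at most its smallest part. This is a sketch under assumptions: the tuple query format, the `DOC_FREQ` table, and `estimate()` are hypothetical and only model the rule stated above, not Whoosh's code.

```python
from typing import Dict, Tuple

# Hypothetical document frequencies for a few (fieldname, text) terms.
DOC_FREQ: Dict[Tuple[str, str], int] = {
    ("content", "render"): 120,
    ("content", "shade"): 45,
    ("content", "texture"): 80,
}


def estimate(q: tuple) -> int:
    """Upper-bound size estimate for a toy query tree.

    Toy format (hypothetical): ("term", fieldname, text), ("and", [subqueries])
    or ("or", [subqueries]).
    """
    kind = q[0]
    if kind == "term":
        # A term query matches at most the documents containing the term.
        return DOC_FREQ.get((q[1], q[2]), 0)
    sizes = [estimate(sub) for sub in q[1]]
    if kind == "and":
        # An intersection is never larger than its smallest input.
        return min(sizes, default=0)
    if kind == "or":
        # A union is never larger than the sum of its inputs; overlapping
        # matches make this an overestimate, which is permitted.
        return sum(sizes)
    raise ValueError("unknown query kind: %r" % (kind,))


q = ("or", [("term", "content", "render"),
            ("and", [("term", "content", "shade"),
                     ("term", "content", "texture")])])
print(estimate(q))  # 165
```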

existing_terms(ixreader, phrases=True, expand=False, fieldname=None)

Returns a set of all byteterms in this query tree that exist in the given ixreader.

Parameters:
  • ixreader – A whoosh.reading.IndexReader object.

  • phrases – Whether to add words found in Phrase queries.

  • expand – If True, queries that match multiple terms will return all matching expansions.

Return type:

set

field()

Returns the field this query matches in, or None if this query does not match in a single field.

has_terms()

Returns True if this specific object searches for one or more specific terms, as opposed to a pattern as in Wildcard and Prefix (that is, whether the replace() method does something meaningful on this instance).

is_leaf()

Returns True if this is a leaf node in the query tree, or False if this query has sub-queries.

is_range()

Returns True if this object searches for values within a range.

iter_all_terms(phrases=True)

Returns an iterator of (fieldname, text) pairs for all terms in this query tree.

>>> qp = qparser.QueryParser("text", myindex.schema)
>>> q = qp.parse("alfa bravo title:charlie")
>>> # List the terms in a query
>>> list(q.iter_all_terms())
[("text", "alfa"), ("text", "bravo"), ("title", "charlie")]
>>> # Get a set of all terms in the query that don't exist in the index
>>> r = myindex.reader()
>>> missing = set(t for t in q.iter_all_terms() if t not in r)
>>> missing
set([("text", "alfa"), ("title", "charlie")])
>>> # All terms in the query that occur in fewer than 5 documents in
>>> # the index
>>> [t for t in q.iter_all_terms() if r.doc_frequency(t[0], t[1]) < 5]
[("text", "alfa"), ("title", "charlie")]
Parameters:

phrases – Whether to add words found in Phrase queries.

leaves()

Returns an iterator of all the leaf queries in this query tree as a flat series.

matcher(searcher, context=None)

Returns a Matcher object you can use to retrieve documents and scores matching this query.

Return type:

whoosh.matching.Matcher

normalize()

Returns a recursively “normalized” form of this query. The normalized form removes redundancy and empty queries. This is called automatically on query trees created by the query parser, but you may want to call it yourself if you’re writing your own parser or building your own queries.

>>> q = And([And([Term("f", u"a"),
...               Term("f", u"b")]),
...          Term("f", u"c"), Or([])])
>>> q.normalize()
And([Term("f", u"a"), Term("f", u"b"), Term("f", u"c")])

Note that this returns a new, normalized query. It does not modify the original query “in place”.
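The flattening that produces the result above can be sketched on a toy representation of query trees. This is a simplified, hypothetical `normalize()` that ignores boosts and applies only three rules: merge nested queries of the same type into their parent, drop empty compound queries, and collapse single-child compounds. The last rule is an assumption about what "removes redundancy" covers; Whoosh's own method does more.

```python
def normalize(q):
    """Flatten a toy query tree and drop empty compound queries.

    Toy format (hypothetical): ("term", fieldname, text) or (op, [subqueries]).
    Returns a new tuple (or None if nothing is left); the input is not modified.
    """
    if q[0] == "term":
        return q
    op, subqueries = q
    flat = []
    for sub in subqueries:
        sub = normalize(sub)
        if sub is None:
            # An empty compound query such as Or([]) contributes nothing.
            continue
        if sub[0] == op:
            # And([And([a, b]), c]) is equivalent to And([a, b, c]).
            flat.extend(sub[1])
        else:
            flat.append(sub)
    if not flat:
        return None
    if len(flat) == 1:
        # A compound query with a single subquery reduces to that subquery.
        return flat[0]
    return (op, flat)


a, b, c = ("term", "f", "a"), ("term", "f", "b"), ("term", "f", "c")
q = ("and", [("and", [a, b]), c, ("or", [])])
print(normalize(q))  # ('and', [('term', 'f', 'a'), ('term', 'f', 'b'), ('term', 'f', 'c')])
```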

phrases()

Recursively gets all individual terms and phrases that are part of this query.

replace(fieldname, oldtext, newtext)

Returns a copy of this query with oldtext replaced by newtext (if oldtext was anywhere in this query).

Note that this returns a new query with the given text replaced. It does not modify the original query “in place”.

requires()

Returns a set of queries that are known to be required to match for the entire query to match. Note that other queries might also turn out to be required, but this cannot be determined by examining the static query.

>>> a = Term("f", u"a")
>>> b = Term("f", u"b")
>>> And([a, b]).requires()
set([Term("f", u"a"), Term("f", u"b")])
>>> Or([a, b]).requires()
set([])
>>> AndMaybe(a, b).requires()
set([Term("f", u"a")])
>>> a.requires()
set([Term("f", u"a")])
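The examples above follow a small set of rules, which can be sketched on a toy tree: a leaf requires itself, And requires everything any child requires, AndMaybe requires only what its first argument requires, and Or requires only what every branch requires. The tuple format and `requires()` function are hypothetical; treating Or as an intersection of its branches is an assumption that agrees with, but is not confirmed by, the documented examples.

```python
from functools import reduce


def requires(q):
    """Toy version of the requires() rules.

    Toy format (hypothetical): ("term", fieldname, text), ("and", [subqueries]),
    ("or", [subqueries]) or ("andmaybe", [a, b]).
    """
    kind = q[0]
    if kind == "term":
        # A leaf query requires itself.
        return {q}
    if kind == "and":
        # Every subquery of an And must match.
        return set().union(*(requires(sub) for sub in q[1]))
    if kind == "andmaybe":
        # Only the first query of AndMaybe(a, b) must match.
        return requires(q[1][0])
    if kind == "or":
        # Only queries required by every branch are required overall.
        branches = [requires(sub) for sub in q[1]]
        return reduce(set.intersection, branches) if branches else set()
    raise ValueError("unknown query kind: %r" % (kind,))


a, b = ("term", "f", "a"), ("term", "f", "b")
print(requires(("and", [a, b])) == {a, b})  # True
print(requires(("or", [a, b])))             # set()
print(requires(("andmaybe", [a, b])))       # {('term', 'f', 'a')}
```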
simplify(ixreader)

Returns a recursively simplified form of this query, where “second-order” queries (such as Prefix and Variations) are re-written into lower-level queries (such as Term and Or).

terms(phrases=False)

Yields zero or more (fieldname, text) pairs queried by this object. You can check whether a query object targets specific terms before you call this method using Query.has_terms().

To get all terms in a query tree, use Query.iter_all_terms().

tokens(boost=1.0, exreader=None)

Yields zero or more analysis.Token objects corresponding to the terms searched for by this query object. You can check whether a query object targets specific terms before you call this method using Query.has_terms().

The Token objects will have the fieldname, text, and boost attributes set. If the query was built by the query parser, the Token objects will also have startchar and endchar attributes indexing into the original user query.

To get all tokens for a query tree, use Query.all_tokens().

Parameters:

exreader – a reader to use to expand multiterm queries such as prefixes and wildcards. The default is None, meaning do not expand.

with_boost(boost)

Returns a COPY of this query with the boost set to the given value.

If a query type does not accept a boost itself, it will try to pass the boost on to its children, if any.

class whoosh.query.CompoundQuery(subqueries, boost=1.0)

Abstract base class for queries that combine or manipulate the results of multiple sub-queries.

class whoosh.query.MultiTerm

Abstract base class for queries that operate on multiple terms in the same field.

class whoosh.query.ExpandingTerm

Intermediate base class for queries such as FuzzyTerm and Variations that expand into multiple queries, but come from a single term.

class whoosh.query.WrappingQuery(child)

Query classes

class whoosh.query.Term(fieldname, text, boost=1.0, minquality=None)

Matches documents containing the given term (fieldname+text pair).

>>> Term("content", u"render")
class whoosh.query.Variations(fieldname, text, boost=1.0)

Query that automatically searches for morphological variations of the given word in the same field.

class whoosh.query.FuzzyTerm(fieldname, text, boost=1.0, maxdist=1, prefixlength=1, constantscore=True)

Matches documents containing words similar to the given term.

Parameters:
  • fieldname – The name of the field to search.

  • text – The text to search for.

  • boost – A boost factor to apply to scores of documents matching this query.

  • maxdist – The maximum edit distance from the given text.

  • prefixlength – The matched terms must share this many initial characters with ‘text’. For example, if text is “light” and prefixlength is 2, then only terms starting with “li” are checked for similarity.
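To make maxdist and prefixlength concrete, here is a sketch that filters a made-up list of candidate terms the way the parameters are described: keep only terms sharing the first prefixlength characters with text, then keep those within maxdist edits. It uses plain Levenshtein distance as a stand-in; Whoosh's own similarity computation may differ (for example, in how it counts transpositions), and `fuzzy_matches()` is a hypothetical helper.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_matches(text: str, terms: Iterable[str],
                  maxdist: int = 1, prefixlength: int = 1) -> List[str]:
    # Terms must share the prefix before similarity is even checked.
    prefix = text[:prefixlength]
    return [t for t in terms
            if t.startswith(prefix) and levenshtein(text, t) <= maxdist]


terms = ["light", "lights", "fight", "lint", "might", "lime"]
print(fuzzy_matches("light", terms, maxdist=1, prefixlength=2))
# ['light', 'lights']
print(fuzzy_matches("light", terms, maxdist=1, prefixlength=0))
# ['light', 'lights', 'fight', 'might']
```

A longer prefix prunes candidates such as "fight" before any distance is computed, which is why it speeds up fuzzy matching at the cost of missing edits near the start of the word.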

class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0, char_ranges=None)

Matches documents containing a given phrase.

Parameters:
  • fieldname – the field to search.

  • words – a list of words (unicode strings) in the phrase.

  • slop – the number of words allowed between each “word” in the phrase; the default of 1 means the phrase must match exactly.

  • boost – a boost factor to apply to the raw score of documents matched by this query.

  • char_ranges – if a Phrase object is created by the query parser, it will set this attribute to a list of (startchar, endchar) pairs corresponding to the words in the phrase.

class whoosh.query.And(subqueries, boost=1.0)

Matches documents that match ALL of the subqueries.

>>> And([Term("content", u"render"),
...      Term("content", u"shade"),
...      Not(Term("content", u"texture"))])
>>> # You can also do this
>>> Term("content", u"render") & Term("content", u"shade")
class whoosh.query.Or(subqueries, boost=1.0, minmatch=0, scale=None)

Matches documents that match ANY of the subqueries.

>>> Or([Term("content", u"render"),
...     And([Term("content", u"shade"), Term("content", u"texture")]),
...     Not(Term("content", u"network"))])
>>> # You can also do this
>>> Term("content", u"render") | Term("content", u"shade")
Parameters:
  • subqueries – a list of Query objects to search for.

  • boost – a boost factor to apply to the scores of all matching documents.

  • minmatch – not yet implemented.

  • scale – a scaling factor for a “coordination bonus”. If this value is not None, it should be a floating point number greater than 0 and less than 1. The scores of the matching documents are boosted/penalized based on the number of query terms that matched in the document. This number scales the effect of the bonuses.

class whoosh.query.DisjunctionMax(subqueries, boost=1.0, tiebreak=0.0)

Matches all documents that match any of the subqueries, but scores each document using the maximum score from the subqueries.
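The difference from Or can be sketched with per-document score maps: both match the same documents, but Or (with no coordination scale) is assumed here to add up subquery scores, while DisjunctionMax keeps only the best one. The score dictionaries are made up, the helper names are hypothetical, and the tiebreak parameter is not modeled.

```python
from typing import Dict, List

Scores = Dict[int, float]  # docnum -> score from one subquery


def or_scores(per_subquery: List[Scores]) -> Scores:
    # Union of matching documents; scores from each subquery are added.
    combined: Scores = {}
    for scores in per_subquery:
        for doc, score in scores.items():
            combined[doc] = combined.get(doc, 0.0) + score
    return combined


def dismax_scores(per_subquery: List[Scores]) -> Scores:
    # Same matching documents, but each keeps only its highest score.
    combined: Scores = {}
    for scores in per_subquery:
        for doc, score in scores.items():
            combined[doc] = max(combined.get(doc, score), score)
    return combined


title = {1: 2.0, 2: 0.5}  # made-up scores from a query on a "title" field
body = {1: 1.0, 3: 1.5}   # made-up scores from a query on a "body" field
print(or_scores([title, body]))      # {1: 3.0, 2: 0.5, 3: 1.5}
print(dismax_scores([title, body]))  # {1: 2.0, 2: 0.5, 3: 1.5}
```

Taking the maximum stops a document from outranking others simply because the same words appear in several fields.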

class whoosh.query.Not(query, boost=1.0)

Excludes any documents that match the subquery.

>>> # Match documents that contain 'render' but not 'texture'
>>> And([Term("content", u"render"),
...      Not(Term("content", u"texture"))])
>>> # You can also do this
>>> Term("content", u"render") - Term("content", u"texture")
Parameters:
  • query – A Query object. The results of this query are excluded from the parent query.

  • boost – Boost is meaningless for excluded documents but this keyword argument is accepted for the sake of a consistent interface.

class whoosh.query.Prefix(fieldname, text, boost=1.0, constantscore=True)

Matches documents that contain any terms that start with the given text.

>>> # Match documents containing words starting with 'comp'
>>> Prefix("content", u"comp")
class whoosh.query.Wildcard(fieldname, text, boost=1.0, constantscore=True)

Matches documents that contain any terms that match a “glob” pattern. See the Python fnmatch module for information about globs.

>>> Wildcard("content", u"in*f?x")
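Because the pattern follows fnmatch glob rules, you can check which terms it selects with Python's fnmatch module directly. The term list is made up, and this only shows which terms the pattern u"in*f?x" accepts, not how Whoosh enumerates the terms in a field.

```python
from fnmatch import fnmatchcase

# Made-up terms from the "content" field of an index.
terms = ["index", "inflex", "infix", "interfax"]

# fnmatchcase() applies glob rules without case folding:
# "*" matches any run of characters, "?" matches exactly one character.
matches = [t for t in terms if fnmatchcase(t, "in*f?x")]
print(matches)  # ['infix', 'interfax']
```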
class whoosh.query.Regex(fieldname, text, boost=1.0, constantscore=True)

Matches documents that contain any terms that match a regular expression. See the Python re module for information about regular expressions.

class whoosh.query.TermRange(fieldname, start, end, startexcl=False, endexcl=False, boost=1.0, constantscore=True)

Matches documents containing any terms in a given range.

>>> # Match documents where the indexed "id" field is greater than or equal
>>> # to 'apple' and less than or equal to 'pear'.
>>> TermRange("id", u"apple", u"pear")
Parameters:
  • fieldname – The name of the field to search.

  • start – Match terms equal to or greater than this.

  • end – Match terms equal to or less than this.

  • startexcl – If True, the range start is exclusive. If False, the range start is inclusive.

  • endexcl – If True, the range end is exclusive. If False, the range end is inclusive.

  • boost – Boost factor that should be applied to the raw score of results matched by this query.
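If the terms of a field are assumed to be available in sorted order, the startexcl and endexcl flags amount to choosing which side of an equal term to cut at. A sketch over a plain sorted list of made-up terms; `term_range()` is a hypothetical helper, and comparisons are ordinary string comparisons.

```python
from bisect import bisect_left, bisect_right
from typing import List


def term_range(sorted_terms: List[str], start: str, end: str,
               startexcl: bool = False, endexcl: bool = False) -> List[str]:
    """Select the terms between start and end, honoring the exclusivity flags."""
    if startexcl:
        lo = bisect_right(sorted_terms, start)  # skip terms equal to start
    else:
        lo = bisect_left(sorted_terms, start)   # include terms equal to start
    if endexcl:
        hi = bisect_left(sorted_terms, end)     # stop before terms equal to end
    else:
        hi = bisect_right(sorted_terms, end)    # include terms equal to end
    return sorted_terms[lo:hi]


terms = ["apple", "banana", "kiwi", "pear", "plum"]
print(term_range(terms, "apple", "pear"))                  # ['apple', 'banana', 'kiwi', 'pear']
print(term_range(terms, "apple", "pear", startexcl=True))  # ['banana', 'kiwi', 'pear']
```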

class whoosh.query.NumericRange(fieldname, start, end, startexcl=False, endexcl=False, boost=1.0, constantscore=True)

A range query for NUMERIC fields. Takes advantage of tiered indexing to speed up large ranges by matching at a high resolution at the edges of the range and a low resolution in the middle.

Example usage:

# Match numbers from 10 to 5925 in the "number" field
nr = NumericRange("number", 10, 5925)
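The idea behind tiered indexing can be illustrated with decimal tiers. Suppose each value were also indexed at coarser resolutions, for example one term per aligned block of 10, 100, or 1000 values. A range could then be covered by a few coarse blocks in the middle and fine-grained ones at the edges. This is an illustration only: Whoosh's actual tiers are based on bit shifts of the encoded number (see the NUMERIC field's shift_step), and `split_range()` is a hypothetical helper for non-negative integers.

```python
from typing import List, Tuple


def split_range(lo: int, hi: int, base: int = 10) -> List[Tuple[int, int]]:
    """Cover the inclusive range [lo, hi] with aligned blocks.

    Each (start, size) block covers start .. start + size - 1, where size is
    a power of base and start is a multiple of size. Small blocks end up at
    the edges of the range and large ones in the middle.
    """
    blocks = []
    while lo <= hi:
        size = 1
        # Grow the block while it stays aligned and fits inside the range.
        while lo % (size * base) == 0 and lo + size * base - 1 <= hi:
            size *= base
        blocks.append((lo, size))
        lo += size
    return blocks


blocks = split_range(10, 5925)
print(len(blocks))  # 39 blocks instead of 5916 individual values
print(blocks[:3])   # [(10, 10), (20, 10), (30, 10)]
```

Each block corresponds to one coarse-resolution term, so a search only needs to read a few dozen posting lists instead of one per distinct value.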

__init__(self, fieldname, start, end, startexcl=False, endexcl=False, boost=1.0, constantscore=True)

Initializes a NumericRange object with the specified parameters.

simplify(self, ixreader)

Simplifies the range query by compiling it and calling the simplify method on the compiled query.

estimate_size(self, ixreader)

Estimates the size of the range query by compiling it and calling the estimate_size method on the compiled query.

estimate_min_size(self, ixreader)

Estimates the minimum size of the range query by compiling it and calling the estimate_min_size method on the compiled query.

docs(self, searcher)

Retrieves the documents that match the range query by compiling it and calling the docs method on the compiled query.

_compile_query(self, ixreader)

Compiles the range query by preparing the start and end values, generating subqueries for different resolutions, and combining them into a single query.

matcher(self, searcher, context=None)

Retrieves the matcher for the range query by compiling it and calling the matcher method on the compiled query.

Parameters:
  • fieldname – The name of the field to search.

  • start – Match terms equal to or greater than this number. This should be a number type, not a string.

  • end – Match terms equal to or less than this number. This should be a number type, not a string.

  • startexcl – If True, the range start is exclusive. If False, the range start is inclusive.

  • endexcl – If True, the range end is exclusive. If False, the range end is inclusive.

  • boost – Boost factor that should be applied to the raw score of results matched by this query.

  • constantscore – If True, the compiled query returns a constant score (the value of the boost keyword argument) instead of actually scoring the matched terms. This gives a nice speed boost and won’t affect the results in most cases since numeric ranges will almost always be used as a filter.

class whoosh.query.DateRange(fieldname, start, end, startexcl=False, endexcl=False, boost=1.0, constantscore=True)

This is a very thin subclass of NumericRange that only overrides the initializer and __repr__() methods to work with datetime objects instead of numbers. Internally this object converts the datetime objects it’s created with to numbers and otherwise acts like a NumericRange query.

>>> DateRange("date", datetime(2010, 11, 3, 3, 0),
...           datetime(2010, 11, 3, 17, 59))
Parameters:
  • fieldname – The name of the field to search.

  • start – Match dates equal to or later than this datetime object.

  • end – Match dates equal to or earlier than this datetime object.

  • startexcl – If True, the range start is exclusive. If False, the range start is inclusive.

  • endexcl – If True, the range end is exclusive. If False, the range end is inclusive.

  • boost – Boost factor that should be applied to the raw score of results matched by this query.

  • constantscore – If True, the compiled query returns a constant score (the value of the boost keyword argument) instead of actually scoring the matched terms. This gives a nice speed boost and won’t affect the results in most cases since numeric ranges will almost always be used as a filter.

class whoosh.query.Every(fieldname=None, boost=1.0)

A query that matches every document containing any term in a given field. If you don’t specify a field, the query matches every document.

>>> # Match any documents with something in the "path" field
>>> q = Every("path")
>>> # Match every document
>>> q = Every()

The unfielded form (matching every document) is efficient.

The fielded form is more efficient than a prefix query with an empty prefix or a ‘*’ wildcard, but it can still be very slow on large indexes. It requires the searcher to read the full posting list of every term in the given field.

Instead of using this query, it is much more efficient to include, when you create the index, a single term that appears in all documents that have the field you want to match.

For example, instead of this:

# Match all documents that have something in the "path" field
q = Every("path")

Do this when indexing:

from whoosh import fields

# Add an extra field that indicates whether a document has a path
schema = fields.Schema(text=fields.TEXT, path=fields.ID, has_path=fields.ID)

# When indexing, set the "has_path" field based on whether the document
# has anything in the "path" field
writer.add_document(text=text_value1)
writer.add_document(text=text_value2, path=path_value2, has_path="t")

Then to find all documents with a path:

q = Term("has_path", "t")
Parameters:

fieldname – the name of the field to match, or None or * to match all documents.

whoosh.query.NullQuery

alias of <_NullQuery>

Binary queries

class whoosh.query.Require(a, b)

A binary query that returns results from the first query that also appear in the second query, but only uses the scores from the first query. This lets you filter results without affecting scores.

class whoosh.query.AndMaybe(a, b)

A binary query that takes results from the first query. If and only if the same document also appears in the results from the second query, the score from the second query is added to the score from the first query.

class whoosh.query.AndNot(a, b)

Binary boolean query of the form ‘a ANDNOT b’, where documents that match b are removed from the matches for a.

class whoosh.query.Otherwise(a, b)

A binary query that only matches the second clause if the first clause doesn’t match any documents.
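The four binary queries differ only in which documents survive and whose scores are kept. A sketch using made-up docnum-to-score maps to stand for the results of a and b; the helper names are hypothetical, and the real queries work through matchers rather than whole result maps.

```python
from typing import Dict

Scores = Dict[int, float]  # docnum -> score


def require(a: Scores, b: Scores) -> Scores:
    # Documents matching both, scored by a alone.
    return {doc: s for doc, s in a.items() if doc in b}


def and_maybe(a: Scores, b: Scores) -> Scores:
    # All documents from a; b's score is added where b also matches.
    return {doc: s + b.get(doc, 0.0) for doc, s in a.items()}


def and_not(a: Scores, b: Scores) -> Scores:
    # Documents from a that do not match b.
    return {doc: s for doc, s in a.items() if doc not in b}


def otherwise(a: Scores, b: Scores) -> Scores:
    # Fall back to b only when a matches no documents at all.
    return a if a else b


a = {1: 1.0, 2: 2.0}    # made-up results for the first query
b = {2: 0.5, 3: 4.0}    # made-up results for the second query
print(require(a, b))     # {2: 2.0}
print(and_maybe(a, b))   # {1: 1.0, 2: 2.5}
print(and_not(a, b))     # {1: 1.0}
print(otherwise({}, b))  # {2: 0.5, 3: 4.0}
```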

Span queries

class whoosh.query.Span(start, end=None, startchar=None, endchar=None, boost=1.0)
classmethod merge(spans)

Merges overlapping and touching spans in the given list of spans.

Note that this modifies the original list.

>>> spans = [Span(1,2), Span(3)]
>>> Span.merge(spans)
>>> spans
[<1-3>]
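The same merging rule can be sketched on plain (start, end) tuples, treating a span with no end as (start, start), as with Span(3) above. `merge_spans()` is hypothetical and sorts the spans first, which is an assumption about input order rather than a description of Span.merge() internals.

```python
from typing import List, Tuple

SpanTuple = Tuple[int, int]  # (start, end) positions, end inclusive


def merge_spans(spans: List[SpanTuple]) -> None:
    """Merge overlapping or touching spans, modifying the list in place."""
    merged: List[SpanTuple] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping, or touching (starting right after the previous
            # end): extend the previous span instead of adding a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    # Replace the contents of the caller's list, like Span.merge().
    spans[:] = merged


spans = [(1, 2), (3, 3), (7, 9), (8, 12)]
merge_spans(spans)
print(spans)  # [(1, 3), (7, 12)]
```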
class whoosh.query.SpanQuery

Abstract base class for span-based queries. Each span query type wraps a “regular” query that implements the basic document-matching functionality (for example, SpanNear wraps an And query, because SpanNear requires that the two sub-queries occur in the same documents). The wrapped query is stored in the q attribute.

Subclasses usually only need to implement the initializer to set the wrapped query, and matcher() to return a span-aware matcher object.

class whoosh.query.SpanFirst(q, limit=0)

Matches spans that end within the first N positions. This lets you, for example, only match terms near the beginning of the document.

Parameters:
  • q – the query to match.

  • limit – the query must match within this position at the start of a document. The default is 0, which means the query must match at the first position.

class whoosh.query.SpanNear(a, b, slop=1, ordered=True, mindist=1)

Note: for new code, use SpanNear2 instead of this class. SpanNear2 takes a list of sub-queries instead of requiring you to create a binary tree of query objects.

Matches queries that occur near each other. By default, only matches queries that occur right next to each other (slop=1) and in order (ordered=True).

For example, to find documents where “whoosh” occurs next to “library” in the “text” field:

from whoosh import query, spans
t1 = query.Term("text", "whoosh")
t2 = query.Term("text", "library")
q = spans.SpanNear(t1, t2)

To find documents where “whoosh” occurs at most 5 positions before “library”:

q = spans.SpanNear(t1, t2, slop=5)

To find documents where “whoosh” occurs at most 5 positions before or after “library”:

q = spans.SpanNear(t1, t2, slop=5, ordered=False)

You can use the phrase() class method to create a tree of SpanNear queries to match a list of terms:

q = spans.SpanNear.phrase("text", ["whoosh", "search", "library"],
                          slop=2)
Parameters:
  • a – the first query to match.

  • b – the second query that must occur within “slop” positions of the first query.

  • slop – the number of positions within which the queries must occur. Default is 1, meaning the queries must occur right next to each other.

  • ordered – whether a must occur before b. Default is True.

  • mindist – the minimum distance allowed between the queries.
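The slop, ordered, and mindist parameters can be read as constraints on the gap between two matched spans. The sketch below is one interpretation. It assumes the gap is measured from the end of a's span to the start of b's, so adjacent words have a gap of 1. `is_near()` is a hypothetical helper, and Whoosh's exact boundary handling may differ.

```python
def is_near(a, b, slop=1, ordered=True, mindist=1):
    """a and b are (start, end) word positions, end inclusive."""
    # Gap from the end of a to the start of b; adjacent words give 1.
    forward = b[0] - a[1]
    if mindist <= forward <= slop:
        return True
    if not ordered:
        # When order doesn't matter, also allow b to come before a.
        backward = a[0] - b[1]
        return mindist <= backward <= slop
    return False


# "whoosh" at position 4; "library" at position 5 or 9.
print(is_near((4, 4), (5, 5)))                          # True
print(is_near((4, 4), (9, 9)))                          # False
print(is_near((4, 4), (9, 9), slop=5))                  # True
print(is_near((9, 9), (4, 4), slop=5, ordered=False))   # True
```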

class whoosh.query.SpanNear2(qs, slop=1, ordered=True, mindist=1)

Matches queries that occur near each other. By default, only matches queries that occur right next to each other (slop=1) and in order (ordered=True).

New code should use this query type instead of SpanNear.

(Unlike SpanNear, this query takes a list of subqueries instead of requiring you to build a binary tree of query objects. This query should also be slightly faster due to less overhead.)

For example, to find documents where “whoosh” occurs next to “library” in the “text” field:

from whoosh import query, spans
t1 = query.Term("text", "whoosh")
t2 = query.Term("text", "library")
q = spans.SpanNear2([t1, t2])

To find documents where “whoosh” occurs at most 5 positions before “library”:

q = spans.SpanNear2([t1, t2], slop=5)

To find documents where “whoosh” occurs at most 5 positions before or after “library”:

q = spans.SpanNear2([t1, t2], slop=5, ordered=False)
Parameters:
  • qs – a sequence of sub-queries to match.

  • slop – the number of positions within which the queries must occur. Default is 1, meaning the queries must occur right next to each other.

  • ordered – whether the sub-queries must occur in the given order. Default is True.

  • mindist – the minimum distance allowed between the queries.

class whoosh.query.SpanNot(a, b)

Matches spans from the first query only if they don’t overlap with spans from the second query. If there are no non-overlapping spans, the document does not match.

For example, to match documents that contain “bear” at most 2 places after “apple” in the “text” field but don’t have “cute” between them:

from whoosh import query, spans
t1 = query.Term("text", "apple")
t2 = query.Term("text", "bear")
near = spans.SpanNear(t1, t2, slop=2)
q = spans.SpanNot(near, query.Term("text", "cute"))
Parameters:
  • a – the query to match.

  • b – do not match any spans that overlap with spans from this query.

class whoosh.query.SpanOr(subqs)

Matches documents that match any of a list of sub-queries. Unlike query.Or, this class merges together matching spans from the different sub-queries when they overlap.

Parameters:

subqs – a list of queries to match.

class whoosh.query.SpanContains(a, b)

Matches documents where the spans of the first query contain any spans of the second query.

For example, to match documents where “apple” occurs at most 10 places before “bear” in the “text” field and “cute” is between them:

from whoosh import query, spans
t1 = query.Term("text", "apple")
t2 = query.Term("text", "bear")
near = spans.SpanNear(t1, t2, slop=10)
q = spans.SpanContains(near, query.Term("text", "cute"))
Parameters:
  • a – the query to match.

  • b – the query whose spans must occur within the matching spans of the first query.

class whoosh.query.SpanBefore(a, b)

Matches documents where the spans of the first query occur before any spans of the second query.

For example, to match documents where “apple” occurs anywhere before “bear”:

from whoosh import query, spans
t1 = query.Term("text", "apple")
t2 = query.Term("text", "bear")
q = spans.SpanBefore(t1, t2)
Parameters:
  • a – the query that must occur before the second.

  • b – the query that must occur after the first.

class whoosh.query.SpanCondition(a, b)

Matches documents that satisfy both subqueries, but only uses the spans from the first subquery.

This is useful when you want to place conditions on matches but not have those conditions affect the spans returned.

For example, to get spans for the term alfa in documents that also must contain the term bravo:

SpanCondition(Term("text", u"alfa"), Term("text", u"bravo"))

Special queries

class whoosh.query.NestedParent(parents, subq, per_parent_limit=None, score_fn=<built-in function sum>)

A query that allows you to search for “nested” documents, where you can index (possibly multiple levels of) “parent” and “child” documents using the group() and/or start_group() methods of a whoosh.writing.IndexWriter to indicate that hierarchically related documents should be kept together:

schema = fields.Schema(type=fields.ID, text=fields.TEXT(stored=True))

with ix.writer() as w:
    # Say we're indexing chapters (type=chap) and each chapter has a
    # number of paragraphs (type=p)
    with w.group():
        w.add_document(type="chap", text="Chapter 1")
        w.add_document(type="p", text="Able baker")
        w.add_document(type="p", text="Bright morning")
    with w.group():
        w.add_document(type="chap", text="Chapter 2")
        w.add_document(type="p", text="Car trip")
        w.add_document(type="p", text="Dog eared")
        w.add_document(type="p", text="Every day")
    with w.group():
        w.add_document(type="chap", text="Chapter 3")
        w.add_document(type="p", text="Fine day")

The NestedParent query wraps two sub-queries: the “parent query” matches a class of “parent documents”. The “sub query” matches nested documents you want to find. For each “sub document” the “sub query” finds, this query acts as if it found the corresponding “parent document”.

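>>> # The "parent" documents are the chapters
>>> parents = query.Term("type", "chap")
>>> # Find paragraphs containing "day", but return their chapters
>>> q = query.NestedParent(parents, query.Term("text", "day"))
>>> with ix.searcher() as s:
...   r = s.search(q)
...   for hit in r:
...     print(hit["text"])
...
Chapter 2
Chapter 3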
Parameters:
  • parents – a query, DocIdSet object, or Results object representing the documents you want to use as the “parent” documents. Where the sub-query matches, the corresponding document in these results will be returned as the match.

  • subq – a query matching the information you want to find.

  • per_parent_limit – a maximum number of “sub documents” to search per parent. The default is None, meaning no limit.

  • score_fn – a function to use to combine the scores of matching sub-documents to calculate the score returned for the parent document. The default is sum, that is, add up the scores of the sub-documents.
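Because a group is indexed as a parent document followed by its children, each matching child can be mapped to the nearest parent docnum at or before it. A sketch assuming the chapter example above is indexed in order starting at docnum 0, so chapters are docnums 0, 3, and 7, with made-up child scores. `nested_parent()` is a hypothetical helper, not Whoosh's matcher, and per_parent_limit is not modeled.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def nested_parent(parent_docs: Iterable[int], child_scores: Dict[int, float],
                  score_fn: Callable[[List[float]], float] = sum) -> Dict[int, float]:
    """Map matching child docnums to their parents and combine their scores."""
    parents = sorted(parent_docs)
    grouped: Dict[int, List[float]] = defaultdict(list)
    for child, score in child_scores.items():
        # The parent is the closest parent docnum at or before the child.
        i = bisect_right(parents, child) - 1
        if i < 0 or parents[i] == child:
            # Assumption: matches outside any group, or on a parent
            # document itself, are ignored.
            continue
        grouped[parents[i]].append(score)
    return {parent: score_fn(scores) for parent, scores in grouped.items()}


# Chapters are docnums 0, 3 and 7; "Every day" is 6 and "Fine day" is 8.
print(nested_parent([0, 3, 7], {6: 1.0, 8: 1.5}))  # {3: 1.0, 7: 1.5}
```

Passing a different score_fn, such as max, changes how several matching paragraphs in one chapter contribute to that chapter's score.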

class whoosh.query.NestedChildren(parents, subq, boost=1.0)

This is the reverse of a NestedParent query: instead of taking a query that matches children but returns the parent, this query matches parents but returns the children.

This is useful, for example, to search for an album title and return the songs in the album:

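from whoosh import fields, query
from whoosh.filedb.filestore import RamStorage
from whoosh.qparser import QueryParser
from whoosh.query import NestedChildren

schema = fields.Schema(type=fields.ID(stored=True),
                       artist=fields.TEXT(stored=True),
                       album_name=fields.TEXT(stored=True),
                       track_num=fields.NUMERIC(stored=True),
                       track_name=fields.TEXT(stored=True),
                       lyrics=fields.TEXT)
ix = RamStorage().create_index(schema)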

# Indexing
with ix.writer() as w:
    # For each album, index a "group" of a parent "album" document and
    # multiple child "track" documents.
    with w.group():
        w.add_document(type="album",
                       artist="The Cure", album_name="Disintegration")
        w.add_document(type="track", track_num=1,
                       track_name="Plainsong")
        w.add_document(type="track", track_num=2,
                       track_name="Pictures of You")
        # ...
    # ...


# Find songs where the song name has "heaven" in the title and the
# album the song is on has "hell" in the title
qp = QueryParser("lyrics", ix.schema)
with ix.searcher() as s:
    # A query that matches all parents
    all_albums = qp.parse("type:album")

    # A query that matches the parents we want
    albums_with_hell = qp.parse("album_name:hell")

    # A query that matches the desired albums but returns the tracks
    songs_on_hell_albums = NestedChildren(all_albums, albums_with_hell)

    # A query that matches tracks with heaven in the title
    songs_with_heaven = qp.parse("track_name:heaven")

    # A query that finds tracks with heaven in the title on albums
    # with hell in the title
    q = query.And([songs_on_hell_albums, songs_with_heaven])
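Going the other way, each matching parent expands to the run of documents indexed after it, up to the next parent. A sketch under a made-up layout: three album documents at docnums 0, 3, and 7 in a nine-document index, with each album's tracks indexed right after it. `nested_children()` is a hypothetical helper, not Whoosh's implementation.

```python
from typing import Iterable, List


def nested_children(parent_docs: Iterable[int], matching_parents: Iterable[int],
                    doc_count: int) -> List[int]:
    """Return docnums of the children belonging to each matching parent."""
    parents = sorted(parent_docs)
    wanted = set(matching_parents)
    children: List[int] = []
    for i, parent in enumerate(parents):
        if parent not in wanted:
            continue
        # A group's children run from just after its parent up to the next
        # parent, or to the end of the index for the last group.
        stop = parents[i + 1] if i + 1 < len(parents) else doc_count
        children.extend(range(parent + 1, stop))
    return children


# Made-up layout: albums at docnums 0, 3 and 7 in a nine-document index.
print(nested_children([0, 3, 7], [3], 9))     # [4, 5, 6]
print(nested_children([0, 3, 7], [0, 7], 9))  # [1, 2, 8]
```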
class whoosh.query.ConstantScoreQuery(child, score=1.0)

Wraps a query and uses a matcher that always gives a constant score to all matching documents. This is a useful optimization when you don’t care about scores from a certain branch of the query tree because it is simply acting as a filter. See also the AndMaybe query.

Exceptions

exception whoosh.query.QueryError

Error encountered while running a query.