support.bitvector module

An implementation of an object that acts like a collection of on/off bits.

Base classes

class whoosh.idsets.DocIdSet

Base class for a set of positive integers, implementing a subset of the built-in set type’s interface with extra docid-related methods.

This is a superclass for alternative set implementations that are more memory-efficient than the built-in set and specialized for storing sorted positive integers. Because they are pure Python, they will inevitably be slower than set for most operations.

after(i)

Returns the next integer in the set after i, or None.

before(i)

Returns the previous integer in the set before i, or None.
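The neighbor lookups can be illustrated with a plain sorted list and the standard bisect module. This is only a sketch of the documented behavior, not the library's implementation; the free functions after and before below are hypothetical stand-ins for the methods, and it assumes "after" and "before" mean strictly greater and strictly less than i.

```python
from bisect import bisect_left, bisect_right


def after(sorted_ids, i):
    """Return the smallest member strictly greater than i, or None."""
    pos = bisect_right(sorted_ids, i)
    return sorted_ids[pos] if pos < len(sorted_ids) else None


def before(sorted_ids, i):
    """Return the largest member strictly less than i, or None."""
    pos = bisect_left(sorted_ids, i)
    return sorted_ids[pos - 1] if pos > 0 else None


ids = [1, 2, 7, 10, 15]
print(after(ids, 7))    # 10
print(before(ids, 7))   # 2
print(after(ids, 15))   # None: nothing follows the last member
print(before(ids, 1))   # None: nothing precedes the first member
```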

first()

Returns the first (lowest) integer in the set.

invert_update(size)

Updates the set in place to contain every number in the half-open range [0, size) that is not currently in this set.
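As a rough illustration of the inversion, the same operation can be written against a built-in set. The helper below is a hypothetical stand-in for the method; it assumes that members greater than or equal to size are simply dropped, which the description implies but does not state outright.

```python
def invert_update(ids, size):
    """Replace the contents of ids, in place, with the numbers in
    range(size) that were not members before the call."""
    complement = set(range(size)) - ids
    ids.clear()
    ids.update(complement)


docs = {1, 3, 4}
invert_update(docs, 6)
print(sorted(docs))  # [0, 2, 5]
```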

last()

Returns the last (highest) integer in the set.

class whoosh.idsets.BaseBitSet

Implementation classes

class whoosh.idsets.BitSet(source=None, size=0)

A DocIdSet backed by an array of bits. This can also be useful as a bit array (e.g. for a Bloom filter). It is much more memory efficient than a large built-in set of integers, but wastes memory for sparse sets.

Parameters:
  • maxsize – the maximum size of the bit array.

  • source – an iterable of positive integers to add to this set.

  • bits – an array of unsigned bytes (“B”) to use as the underlying bit array. This is used by some of the object’s methods.
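To make the memory trade-off concrete, here is a minimal sketch of a set stored as an array of unsigned bytes, one bit per possible member. TinyBitSet is a hypothetical class written for illustration, not the library's BitSet, and the bit order within each byte (least significant bit first) is an assumption.

```python
from array import array


class TinyBitSet:
    """Illustrative bit-array set: bit n is on when n is a member."""

    def __init__(self, source=None, size=0):
        # One byte holds 8 bits, so round the byte count up.
        self.bits = array("B", [0] * ((size + 7) // 8))
        for n in source or ():
            self.add(n)

    def add(self, n):
        byte_index = n >> 3
        if byte_index >= len(self.bits):
            # Grow the array with zero bytes so that byte_index is valid.
            self.bits.extend([0] * (byte_index - len(self.bits) + 1))
        self.bits[byte_index] |= 1 << (n & 7)

    def __contains__(self, n):
        byte_index = n >> 3
        if n < 0 or byte_index >= len(self.bits):
            return False
        return bool(self.bits[byte_index] & (1 << (n & 7)))

    def __iter__(self):
        # Walking bytes and bits in order yields members in sorted order.
        for byte_index, byte in enumerate(self.bits):
            for bit in range(8):
                if byte & (1 << bit):
                    yield byte_index * 8 + bit


bs = TinyBitSet([1, 10, 15, 7, 2])
print(list(bs))       # [1, 2, 7, 10, 15]
print(len(bs.bits))   # 2 bytes cover ids 0..15

sparse = TinyBitSet([1_000_000])
print(len(sparse.bits))  # 125001 bytes to hold a single member
```

The last line shows the cost of sparsity: the array must span every id up to the largest member, regardless of how few bits are set.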

class whoosh.idsets.OnDiskBitSet(dbfile, basepos, bytecount)

A DocIdSet backed by an array of bits on disk.

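>>> from whoosh.filedb.filestore import RamStorage
>>> from whoosh.idsets import BitSet, OnDiskBitSet
>>> st = RamStorage()
>>> f = st.create_file("test.bin")
>>> bs = BitSet([1, 10, 15, 7, 2])
>>> bytecount = bs.to_disk(f)
>>> f.close()
>>> # ...
>>> f = st.open_file("test.bin")
>>> # The bits were written at the start of the file, so basepos is 0
>>> odbs = OnDiskBitSet(f, 0, bytecount)
>>> list(odbs)
[1, 2, 7, 10, 15]

Parameters: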
  • dbfile – a StructFile object to read from.

  • basepos – the base position of the bytes in the given file.

  • bytecount – the number of bytes to use for the bit array.
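The roles of basepos and bytecount can be sketched with an ordinary in-memory file object. This is an illustration under stated assumptions rather than the library's code: bit_is_set is a hypothetical helper, the 4-byte header is invented for the example, and the least-significant-bit-first layout is assumed.

```python
import io


def bit_is_set(f, basepos, bytecount, n):
    """Test bit n of a bit array stored in file f.

    The array occupies bytecount bytes starting at offset basepos.
    """
    byte_index = n >> 3
    if n < 0 or byte_index >= bytecount:
        return False
    # Seek straight to the one byte that holds bit n and read only it.
    f.seek(basepos + byte_index)
    byte = f.read(1)[0]
    return bool(byte & (1 << (n & 7)))


# A made-up 4-byte header followed by a 2-byte bit array encoding
# the set {1, 2, 7, 10, 15}.
payload = bytes([0b10000110, 0b10000100])
f = io.BytesIO(b"HEAD" + payload)

members = [n for n in range(16) if bit_is_set(f, 4, len(payload), n)]
print(members)  # [1, 2, 7, 10, 15]
```

Because each lookup reads a single byte, the whole array never has to be loaded into memory.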

class whoosh.idsets.SortedIntSet(source=None, typecode='I')

A DocIdSet backed by a sorted array of integers.
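A sorted typed array keeps one fixed-size slot per member, so its memory use grows with the number of members rather than with the largest id. The sketch below shows the general idea using the standard array and bisect modules; TinySortedIntSet is a hypothetical class, not the library's SortedIntSet.

```python
from array import array
from bisect import bisect_left


class TinySortedIntSet:
    """Illustrative set kept as a sorted array of unsigned integers."""

    def __init__(self, source=None, typecode="I"):
        # Remove duplicates and sort once up front.
        self.data = array(typecode, sorted(set(source or ())))

    def add(self, n):
        pos = bisect_left(self.data, n)
        # Insert only if n is not already stored at its sorted position.
        if pos == len(self.data) or self.data[pos] != n:
            self.data.insert(pos, n)

    def __contains__(self, n):
        # Binary search: O(log n) comparisons instead of a linear scan.
        pos = bisect_left(self.data, n)
        return pos < len(self.data) and self.data[pos] == n

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)


s = TinySortedIntSet([10, 1, 7, 1_000_000])
s.add(2)
s.add(7)               # already present, so nothing changes
print(list(s))         # [1, 2, 7, 10, 1000000]
print(len(s.data))     # 5 slots, however large the ids are
```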

class whoosh.idsets.MultiIdSet(idsets, offsets)

Wraps multiple SERIAL sub-DocIdSet objects and presents them as an aggregated, read-only set.

Parameters:
  • idsets – a list of DocIdSet objects.

  • offsets – a list of offsets corresponding to the DocIdSet objects in idsets.
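One way to picture the aggregation is a read-only wrapper that shifts each sub-set's members by the matching offset. The sketch below is a guess at the general idea, not the library's implementation: TinyMultiIdSet is hypothetical, and it assumes that offsets are in ascending order and that each sub-set stores ids relative to its own offset.

```python
from bisect import bisect_right


class TinyMultiIdSet:
    """Illustrative read-only view over several offset sub-sets."""

    def __init__(self, idsets, offsets):
        if len(idsets) != len(offsets):
            raise ValueError("idsets and offsets must have equal length")
        self.idsets = idsets
        self.offsets = offsets

    def __contains__(self, n):
        # Pick the last sub-set whose offset is <= n, then test the
        # id relative to that sub-set's offset.
        i = bisect_right(self.offsets, n) - 1
        return i >= 0 and (n - self.offsets[i]) in self.idsets[i]

    def __iter__(self):
        for idset, offset in zip(self.idsets, self.offsets):
            for n in sorted(idset):
                yield n + offset


multi = TinyMultiIdSet([{0, 2}, {1, 3}], [0, 10])
print(list(multi))   # [0, 2, 11, 13]
print(11 in multi)   # True: local id 1 in the second sub-set
print(3 in multi)    # False: local id 3 is not in the first sub-set
```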