support.bitvector module
An implementation of an object that acts like a collection of on/off bits.
Base classes
- class whoosh.idsets.DocIdSet
Base class for a set of positive integers, implementing a subset of the built-in set type’s interface with extra docid-related methods.
This is a superclass for alternative set implementations to the built-in set which are more memory-efficient and specialized toward storing sorted lists of positive integers, though they will inevitably be slower than set for most operations since they’re pure Python.
Implementation classes
- class whoosh.idsets.BitSet(source=None, size=0)
A DocIdSet backed by an array of bits. This can also be useful as a bit array (e.g. for a Bloom filter). It is much more memory efficient than a large built-in set of integers, but wastes memory for sparse sets.
- Parameters:
maxsize – the maximum size of the bit array.
source – an iterable of positive integers to add to this set.
bits – an array of unsigned bytes (“B”) to use as the underlying bit array. This is used by some of the object’s methods.
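To make the "array of bits" idea concrete, here is a minimal sketch of a bit-backed integer set. It is not the whoosh.idsets.BitSet implementation: the class name TinyBitSet, its method bodies, and the least-significant-bit-first layout are all assumptions chosen for illustration. Only the constructor shape (source, size) and the unsigned-byte ("B") array mirror the description above.

```python
from array import array


class TinyBitSet:
    """Hypothetical illustration; not the whoosh.idsets.BitSet implementation."""

    def __init__(self, source=None, size=0):
        # One unsigned byte ("B") holds eight bits; round the size up.
        self.bits = array("B", bytes((size + 7) // 8))
        for n in source or ():
            self.add(n)

    def add(self, n):
        byte, offset = divmod(n, 8)
        # Grow the array when n falls beyond the current capacity.
        if byte >= len(self.bits):
            self.bits.extend(bytes(byte + 1 - len(self.bits)))
        # Assumed layout: bit 0 of each byte is the lowest number.
        self.bits[byte] |= 1 << offset

    def __contains__(self, n):
        byte, offset = divmod(n, 8)
        return byte < len(self.bits) and bool(self.bits[byte] & (1 << offset))

    def __iter__(self):
        # Walking the bytes in order yields members in sorted order.
        for byte, value in enumerate(self.bits):
            for offset in range(8):
                if value & (1 << offset):
                    yield byte * 8 + offset

    def __len__(self):
        return sum(bin(value).count("1") for value in self.bits)


bs = TinyBitSet([1, 10, 15, 7, 2])
```

In this sketch the five members fit in two bytes, while a set containing only the number 1,000,000 would still need 125,001 bytes, which is the sparse-set waste mentioned above.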
- class whoosh.idsets.OnDiskBitSet(dbfile, basepos, bytecount)
A DocIdSet backed by an array of bits on disk.
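>>> from whoosh.filedb.filestore import RamStorage
>>> from whoosh.idsets import BitSet, OnDiskBitSet
>>> st = RamStorage()
>>> f = st.create_file("test.bin")
>>> bs = BitSet([1, 10, 15, 7, 2])
>>> bytecount = bs.to_disk(f)
>>> f.close()
>>> # ...
>>> f = st.open_file("test.bin")
>>> # The bits were written at the start of the file, so basepos is 0.
>>> odbs = OnDiskBitSet(f, 0, bytecount)
>>> list(odbs)
[1, 2, 7, 10, 15]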
- Parameters:
dbfile – a StructFile object to read from.
basepos – the base position of the bytes in the given file.
bytecount – the number of bytes to use for the bit array.
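The role of basepos and bytecount can be sketched with a plain file-like object. This is an illustration only, not the whoosh.idsets.OnDiskBitSet implementation: TinyOnDiskBitSet is a hypothetical class, io.BytesIO stands in for a StructFile, and the bit order within each byte is an assumption.

```python
import io


class TinyOnDiskBitSet:
    """Hypothetical illustration; not whoosh.idsets.OnDiskBitSet."""

    def __init__(self, dbfile, basepos, bytecount):
        self.dbfile = dbfile
        self.basepos = basepos
        self.bytecount = bytecount

    def _byte(self, index):
        # Seek to the requested byte relative to the start of the bit array.
        self.dbfile.seek(self.basepos + index)
        return self.dbfile.read(1)[0]

    def __contains__(self, n):
        byte, offset = divmod(n, 8)
        return byte < self.bytecount and bool(self._byte(byte) & (1 << offset))

    def __iter__(self):
        # Read one byte at a time instead of loading the array into memory.
        for byte in range(self.bytecount):
            value = self._byte(byte)
            for offset in range(8):
                if value & (1 << offset):
                    yield byte * 8 + offset


# A 4-byte header followed by the two bytes encoding {1, 2, 7, 10, 15}
# (assumed layout: bit 0 of each byte is the lowest number).
f = io.BytesIO(b"HEAD" + bytes([0b10000110, 0b10000100]))
odbs = TinyOnDiskBitSet(f, basepos=4, bytecount=2)
```

Here basepos skips the header so that bit 0 maps to the first byte of the array, and bytecount bounds every lookup so that reads never run past the array into whatever follows it in the file.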