support.charset module
This module contains tools for working with Sphinx charset table files. These tables
are useful for case and accent folding.
See whoosh.analysis.CharsetTokenizer and whoosh.analysis.CharsetFilter.
- whoosh.support.charset.default_charset
An extensive case- and accent-folding charset table, taken from http://speeple.com/unicode-maps.txt.
- whoosh.support.charset.charset_table_to_dict(tablestring)
Takes a string containing the contents of a Sphinx charset table file and returns a mapping object (actually a defaultdict) of the kind expected by the unicode.translate() method (str.translate() in Python 3). That is, it maps a character's code point to a Unicode character, or to None if the character is not a valid word character.
The Sphinx charset table format is described at http://www.sphinxsearch.com/docs/current.html#conf-charset-table.
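To make the shape of the returned mapping concrete, here is a minimal sketch, assuming only a small subset of the charset table syntax: single characters, U+XXXX code points, a..b ranges, and equal-length range mappings such as A..Z->a..z. The helper names (parse_char, parse_range, simple_charset_table_to_dict) are hypothetical and made up for this example; this is not Whoosh's implementation and it ignores the rest of the Sphinx syntax. It builds a defaultdict whose default is None and applies it with str.translate().

```python
from collections import defaultdict


def parse_char(spec):
    """Parse one character: a literal like 'a' or a code point like 'U+C9'."""
    spec = spec.strip()
    if spec.upper().startswith("U+"):
        return int(spec[2:], 16)
    if len(spec) != 1:
        raise ValueError(f"bad character spec: {spec!r}")
    return ord(spec)


def parse_range(spec):
    """Return the list of code points for 'a..z', or for a single character."""
    if ".." in spec:
        start, end = spec.split("..", 1)
        return list(range(parse_char(start), parse_char(end) + 1))
    return [parse_char(spec)]


def simple_charset_table_to_dict(tablestring):
    """Hypothetical, simplified parser for a subset of the charset table syntax."""
    # Unlisted characters map to None: they are not word characters,
    # and str.translate() deletes them.
    table = defaultdict(lambda: None)
    for entry in tablestring.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if "->" in entry:
            # Mapping entry, e.g. "A..Z->a..z" or "U+C9->e".
            src, dst = entry.split("->", 1)
            src_points, dst_points = parse_range(src), parse_range(dst)
            if len(src_points) != len(dst_points):
                raise ValueError(f"range lengths differ: {entry!r}")
            for s, d in zip(src_points, dst_points):
                table[s] = chr(d)
        else:
            # Plain entry, e.g. "a..z": the characters map to themselves.
            for cp in parse_range(entry):
                table[cp] = chr(cp)
    return table


table = simple_charset_table_to_dict("A..Z->a..z, a..z, 0..9, U+C9->e, U+E9->e")
print("Caf\u00e9".translate(table))  # cafe
print("R2-D2!".translate(table))     # r2d2
```

Because unlisted characters such as "-" and "!" map to None, str.translate() drops them; a tokenizer such as CharsetTokenizer can instead use the same None entries to decide where one token ends and the next begins.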