Tokenization Cache API

get_tokens(idx_code, cache)

Return the tokens for a specific NAICS index or code from the cache.

tokenization_cache(cfg=TokenizationConfig(), use_locking=True)

Return the tokenization cache, loading it from disk or building it if necessary.

This function is safe for multi-worker environments. It uses file locking to ensure only one worker builds the cache, while others wait and then load it.

Parameters:

Name         Type                Description                                                Default
cfg          TokenizationConfig  TokenizationConfig                                         TokenizationConfig()
use_locking  bool                If False, skip locking (for fast reads when cache exists)  True
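
The locking behavior described for tokenization_cache can be illustrated with a minimal sketch. This is not the library's implementation: load_or_build, its cache_path and build arguments, the JSON file format, and the ".lock" file name are all hypothetical, and fcntl makes the sketch Unix-only. It shows one common way to let a single worker build a cache file while the others wait and then load it.

```python
import fcntl
import json
import os
import tempfile
from pathlib import Path


def load_or_build(cache_path, build, use_locking=True):
    """Load a JSON cache from cache_path, building it once if it is missing.

    Hypothetical helper for illustration; not part of the documented API.
    """
    cache_path = Path(cache_path)

    # Fast path: with locking disabled, read the file directly if it exists.
    if not use_locking and cache_path.exists():
        return json.loads(cache_path.read_text())

    # Assumed convention: the lock file sits next to the cache file.
    lock_path = cache_path.with_suffix(cache_path.suffix + ".lock")
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the exclusive lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Re-check under the lock: another worker may have finished
            # building the cache while this one was waiting.
            if cache_path.exists():
                return json.loads(cache_path.read_text())

            data = build()

            # Write to a temporary file, then rename it into place so that
            # readers never observe a partially written cache.
            fd, tmp_name = tempfile.mkstemp(dir=cache_path.parent)
            with os.fdopen(fd, "w") as tmp:
                json.dump(data, tmp)
            os.replace(tmp_name, cache_path)
            return data
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

In this sketch, a call with use_locking=False that finds no cache file falls back to the locked path, so the cache is still built only once. Whether tokenization_cache behaves the same way in that case is not stated above.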