Tokenization Cache API
get_tokens(idx_code, cache)
Get tokens for a specific NAICS index or code from cache.
tokenization_cache(cfg=TokenizationConfig(), use_locking=True)
Get tokenization cache, loading from disk or building if necessary.
This function is safe for multi-worker environments. It uses file locking to ensure only one worker builds the cache, while others wait and then load it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | TokenizationConfig | TokenizationConfig | TokenizationConfig() |
| use_locking | bool | If False, skip locking (for fast reads when cache exists) | True |
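The "one worker builds, the others wait and then load" behavior described for `tokenization_cache` can be illustrated with a minimal, standard-library sketch. This is not the library's implementation: `load_or_build`, `_atomic_write`, the JSON on-disk format, the `.lock` sidecar file, and the polling interval are all hypothetical names and choices made for illustration. In particular, what the real function does when `use_locking=False` and no cache exists is not documented here; the sketch simply builds without a lock in that case.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def _atomic_write(path: Path, data) -> None:
    # Hypothetical helper: write to a temp file in the same directory, then
    # rename it over the target so readers never see a half-written cache.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh)
    os.replace(tmp, path)


def load_or_build(cache_path, build, use_locking=True, poll_s=0.05):
    """Return cached data, building it at most once across workers (sketch)."""
    path = Path(cache_path)
    if path.exists():
        # Fast path: the cache is already on disk, so no lock is needed.
        return json.loads(path.read_text())
    if not use_locking:
        # Assumption: without locking, build and write unconditionally.
        data = build()
        _atomic_write(path, data)
        return data

    lock = Path(str(path) + ".lock")
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists, so
            # exactly one worker gets past this line at a time.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            # Another worker holds the lock: wait, then load if it finished.
            time.sleep(poll_s)
            if path.exists():
                return json.loads(path.read_text())
            continue
        try:
            if path.exists():
                # Another worker finished between our check and our lock.
                return json.loads(path.read_text())
            data = build()
            _atomic_write(path, data)
            return data
        finally:
            os.close(fd)
            os.unlink(lock)
```

Calling `load_or_build(path, build)` twice with the same path invokes `build` only on the first call. A sidecar lock file like this can be left behind if the building process is killed abruptly, so a production version would typically add a timeout or use an OS-level advisory lock instead.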