Utilities API

download_with_retry(url, max_retries=3, initial_delay=1.0, backoff_factor=2.0, timeout=30.0)

Download content from a URL, retrying failed requests with exponential backoff.

Returns:

Type Description
Optional[bytes]

The downloaded content.

Raises:

Type Description
(HTTPError, TimeoutException, ValueError)

If all retries fail
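
The retry policy described above can be sketched as a small generic helper. This is a minimal sketch, not the library's implementation. The name retry_with_backoff, the fetch callable, and the retry_on and sleep parameters are hypothetical, and it assumes max_retries counts retries after the first attempt. The per-request timeout is left to whatever fetch does.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    max_retries: int = 3,
    initial_delay: float = 1.0,
    backoff_factor: float = 2.0,
    retry_on: tuple = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on failure with exponentially growing delays.

    Hypothetical helper. Assumes fetch() runs at most max_retries + 1 times.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: re-raise the last error
            sleep(delay)
            delay *= backoff_factor  # 1.0, 2.0, 4.0, ... with the defaults
```

If the library uses httpx (an assumption suggested by the HTTPError and TimeoutException names), fetch could call httpx.get(url, timeout=30.0), call raise_for_status() on the response, and return response.content, with retry_on=(httpx.HTTPError,). Injecting sleep keeps the helper testable without real delays.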

get_indices_codes(return_type)

Extract indices and NAICS codes from a parquet file.

Parameters:

Name Type Description Default
return_type Literal['codes', 'indices', 'code_to_idx', 'idx_to_code']

Which structure to return: one of 'codes', 'indices', 'code_to_idx', or 'idx_to_code'.

required

Returns:

Type Description
Union[List[str], List[int], Dict[str, int], Dict[int, str]]

One of the following, depending on return_type:
codes (List[str]): list of unique NAICS codes.
indices (List[int]): list of indices for the NAICS codes.
code_to_idx (Dict[str, int]): mapping from NAICS codes to indices.
idx_to_code (Dict[int, str]): mapping from indices to NAICS codes.
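
A minimal sketch of how the four return types relate to one another. The parquet path and column name are not documented here, so this hypothetical build_code_index takes the raw codes directly (in practice they might come from a column read with pandas.read_parquet). It assumes indices are 0-based positions in the sorted list of unique codes.

```python
from typing import Dict, Iterable, List, Literal, Union

ReturnType = Literal["codes", "indices", "code_to_idx", "idx_to_code"]


def build_code_index(
    raw_codes: Iterable[str], return_type: ReturnType
) -> Union[List[str], List[int], Dict[str, int], Dict[int, str]]:
    """Hypothetical stand-in for get_indices_codes, without the file I/O."""
    # Deduplicate and sort so the code -> index assignment is deterministic
    # (the sorting rule is an assumption, not documented behavior).
    codes = sorted(set(raw_codes))
    if return_type == "codes":
        return codes
    if return_type == "indices":
        return list(range(len(codes)))
    if return_type == "code_to_idx":
        return {code: i for i, code in enumerate(codes)}
    if return_type == "idx_to_code":
        return {i: code for i, code in enumerate(codes)}
    raise ValueError(f"unknown return_type: {return_type!r}")
```

Because both dictionaries are built from the same ordered list, code_to_idx and idx_to_code are exact inverses of each other.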

pick_device(device_str='auto')

Select the device to use for PyTorch operations.

Parameters:

Name Type Description Default
device_str str

Device string ('auto', 'cuda', 'cpu', 'mps')

'auto'

Returns:

Type Description
torch.device

The selected device.
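
The resolution logic can be sketched independently of PyTorch so it is easy to test. This hypothetical resolve_device_name assumes that 'auto' prefers CUDA, then MPS, then CPU, and that an explicit request for an unavailable backend raises rather than silently falling back. Neither behavior is confirmed by the documentation above.

```python
VALID_DEVICES = ("auto", "cuda", "cpu", "mps")


def resolve_device_name(
    device_str: str = "auto", *, cuda_available: bool, mps_available: bool
) -> str:
    """Map a device string to a concrete backend name: 'cuda', 'mps' or 'cpu'."""
    if device_str not in VALID_DEVICES:
        raise ValueError(f"device_str must be one of {VALID_DEVICES}, got {device_str!r}")
    if device_str == "auto":
        # Assumed preference order: CUDA GPU, then Apple-silicon MPS, then CPU.
        if cuda_available:
            return "cuda"
        if mps_available:
            return "mps"
        return "cpu"
    # Explicit requests fail loudly instead of falling back (an assumption).
    if device_str == "cuda" and not cuda_available:
        raise ValueError("CUDA requested but not available")
    if device_str == "mps" and not mps_available:
        raise ValueError("MPS requested but not available")
    return device_str
```

With PyTorch installed, the result would be wrapped as torch.device(resolve_device_name(device_str, cuda_available=torch.cuda.is_available(), mps_available=torch.backends.mps.is_available())). Passing availability in as arguments keeps the decision logic free of hardware dependencies.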

setup_directory(dir_path)

Set up a directory, creating it if it does not already exist.

Parameters:

Name Type Description Default
dir_path str

Path to the directory.

required

Returns:

Type Description
Path

Path object for the directory.
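
A minimal sketch of the same contract using pathlib. The name ensure_directory is hypothetical, and the use of parents=True (creating missing intermediate directories) is an assumption about how setup_directory treats nested paths.

```python
from pathlib import Path
from typing import Union


def ensure_directory(dir_path: Union[str, Path]) -> Path:
    """Create dir_path if needed and return it as a Path (hypothetical sketch)."""
    path = Path(dir_path)
    # exist_ok=True makes repeated calls harmless; parents=True (an assumption)
    # also creates missing intermediate directories such as "outputs/run1".
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Note that Path.mkdir with exist_ok=True still raises FileExistsError if the path already exists as a regular file rather than a directory.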