Utilities API
download_with_retry(url, max_retries=3, initial_delay=1.0, backoff_factor=2.0, timeout=30.0)
Download content from URL with exponential backoff retry logic.
Returns:
| Name | Type | Description |
|---|---|---|
| bytes | Optional[bytes] | The downloaded content |
Raises:
| Type | Description |
|---|---|
| (HTTPError, TimeoutException, ValueError) | If all retries fail |
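
The retry behavior described above can be illustrated with a minimal, standard-library sketch. It is not the implementation of `download_with_retry`: the helper name `retry_with_backoff`, the injectable `fetch` and `sleep` callables, and the choice not to sleep after the final failed attempt are all assumptions made for illustration. It only shows how `max_retries`, `initial_delay`, and `backoff_factor` could interact.

```python
import time
from typing import Callable, Optional


def retry_with_backoff(
    fetch: Callable[[], bytes],
    max_retries: int = 3,
    initial_delay: float = 1.0,
    backoff_factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Hypothetical helper: call fetch() up to max_retries times.

    After each failed attempt except the last, wait `delay` seconds and
    multiply the delay by backoff_factor (exponential backoff).
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")

    delay = initial_delay
    last_error: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception as exc:  # a real downloader would catch narrower types
            last_error = exc
            if attempt < max_retries - 1:
                sleep(delay)
                delay *= backoff_factor
    # All attempts failed: re-raise the most recent error.
    assert last_error is not None
    raise last_error


# Simulated download that fails twice, then succeeds (no network needed).
attempts = []
waits = []


def flaky_fetch() -> bytes:
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("simulated timeout")
    return b"payload"


# Record the requested delays instead of actually sleeping.
result = retry_with_backoff(flaky_fetch, sleep=waits.append)
```

With the default arguments, this sketch waits 1.0 s after the first failure and 2.0 s after the second, so `waits` ends up as `[1.0, 2.0]` and `result` is `b"payload"`.
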
get_indices_codes(return_type)
Extract indices and NAICS codes from a parquet file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| return_type | Literal['codes', 'indices', 'code_to_idx', 'idx_to_code'] | One of 'codes', 'indices', 'code_to_idx', 'idx_to_code'. | required |
Returns:
| Type | Description |
|---|---|
| Union[List[str], List[int], Dict[str, int], Dict[int, str]] | One of the following, depending on return_type: codes (List[str]): list of unique NAICS codes; indices (List[int]): list of indices for the NAICS codes; code_to_idx (Dict[str, int]): mapping from NAICS codes to indices; idx_to_code (Dict[int, str]): mapping from indices to NAICS codes. |
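
To make the four return shapes concrete, here is a small sketch that builds them from an in-memory list of codes. The helper name `build_code_views` is hypothetical, the parquet-reading step is omitted, and the ordering is an assumption (codes sorted and de-duplicated, indices running from 0). The actual function may order or index codes differently.

```python
from typing import Dict, List, Literal, Union

ReturnType = Literal["codes", "indices", "code_to_idx", "idx_to_code"]


def build_code_views(
    raw_codes: List[str], return_type: ReturnType
) -> Union[List[str], List[int], Dict[str, int], Dict[int, str]]:
    """Hypothetical helper mirroring the four documented return types."""
    # Assumption: unique codes in sorted order, indexed 0..n-1.
    codes = sorted(set(raw_codes))
    if return_type == "codes":
        return codes
    if return_type == "indices":
        return list(range(len(codes)))
    if return_type == "code_to_idx":
        return {code: idx for idx, code in enumerate(codes)}
    if return_type == "idx_to_code":
        return dict(enumerate(codes))
    raise ValueError(f"Unknown return_type: {return_type!r}")


# Made-up sample values standing in for a parquet column.
sample = ["11", "111", "21", "11"]

codes = build_code_views(sample, "codes")
indices = build_code_views(sample, "indices")
code_to_idx = build_code_views(sample, "code_to_idx")
idx_to_code = build_code_views(sample, "idx_to_code")
```

Under these assumptions the two mappings are inverses of each other, so `idx_to_code[code_to_idx[c]] == c` holds for every code `c`.
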
pick_device(device_str='auto')
Pick device for PyTorch operations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device_str | str | Device string ('auto', 'cuda', 'cpu', 'mps') | 'auto' |
Returns:
| Type | Description |
|---|---|
| device | torch.device object |
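
The page does not say how `'auto'` is resolved. The sketch below shows one plausible policy, assuming a preference order of CUDA, then MPS, then CPU. The helper `resolve_device_name` is hypothetical and takes availability flags as plain booleans so that it runs without PyTorch installed. In a PyTorch environment the flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the returned name could be passed to `torch.device(...)`.

```python
VALID_DEVICES = ("cuda", "mps", "cpu")


def resolve_device_name(
    device_str: str = "auto",
    cuda_available: bool = False,
    mps_available: bool = False,
) -> str:
    """Hypothetical helper: turn a device string into a concrete device name."""
    if device_str == "auto":
        # Assumed preference order: CUDA GPU, then Apple MPS, then CPU.
        if cuda_available:
            return "cuda"
        if mps_available:
            return "mps"
        return "cpu"
    if device_str not in VALID_DEVICES:
        raise ValueError(f"Unsupported device string: {device_str!r}")
    # Explicit requests are passed through unchanged.
    return device_str


on_gpu_box = resolve_device_name("auto", cuda_available=True, mps_available=False)
on_mac = resolve_device_name("auto", cuda_available=False, mps_available=True)
on_plain_cpu = resolve_device_name("auto")
explicit = resolve_device_name("cpu", cuda_available=True)
```

Keeping the availability checks as parameters makes the selection logic easy to test on machines without a GPU.
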
setup_directory(dir_path)
Setup directory, creating it if it doesn't exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dir_path | str | Path to directory | required |
Returns:
| Type | Description |
|---|---|
| Path | Path object to the directory |
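
A create-if-missing helper of this kind is commonly written with `pathlib`. The sketch below uses a hypothetical name, `ensure_directory`, and assumes that missing parent directories should also be created. Whether `setup_directory` does the same is not stated here.

```python
import tempfile
from pathlib import Path


def ensure_directory(dir_path: str) -> Path:
    """Hypothetical helper: create dir_path if needed and return it as a Path."""
    path = Path(dir_path)
    # parents=True creates intermediate directories (an assumption);
    # exist_ok=True makes repeated calls a no-op.
    path.mkdir(parents=True, exist_ok=True)
    return path


# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_directory(str(Path(tmp) / "outputs" / "run_1"))
    created = target.is_dir()
    # Calling it again on an existing directory does not raise.
    again = ensure_directory(str(target))
    same_path = again == target
```

Returning a `Path` rather than a string lets callers build file paths with the `/` operator, for example `target / "metrics.json"`.
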