API Reference
Public API of bls_release_dates, generated from docstrings (Google style).
Package
bls_release_dates
BLS news release scraper for CES, SAE, and QCEW release dates.
Publication(name, series, index_url, frequency)
dataclass
BLS publication: name, series code, index URL, and frequency.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Short name (e.g. "ces", "sae", "qcew"). |
| series | str | BLS series code used in archive URLs (e.g. "empsit", "laus"). |
| index_url | str | Full URL of the news release archive index page. |
| frequency | str | Either "monthly" or "quarterly". |
main()
Run full pipeline: download, build release_dates, build vintage_dates.
Source code in src/bls_release_dates/__main__.py
build_dataframe()
Parse all downloaded HTML files into a release_dates DataFrame.
Source code in src/bls_release_dates/__main__.py
download_all_publications()
async
Download release HTML files for all configured publications.
Source code in src/bls_release_dates/__main__.py
read_release_dates(path=None)
Read release_dates parquet if it exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str \| None | Optional path to the parquet file. Defaults to data/release_dates.parquet relative to the current working directory. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame \| None | Polars DataFrame with columns publication, ref_date, vintage_date, or None if the file has not been created yet. |
Source code in src/bls_release_dates/read.py
read_vintage_dates(path=None)
Read vintage_dates parquet if it exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str \| None | Optional path to the parquet file. Defaults to data/vintage_dates.parquet relative to the current working directory. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame \| None | Polars DataFrame with columns publication, ref_date, vintage_date, revision, benchmark_revision, or None if the file has not been created yet. |
Source code in src/bls_release_dates/read.py
Configuration
bls_release_dates.config
Publication definitions and paths.
Publication(name, series, index_url, frequency)
dataclass
BLS publication: name, series code, index URL, and frequency.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Short name (e.g. "ces", "sae", "qcew"). |
| series | str | BLS series code used in archive URLs (e.g. "empsit", "laus"). |
| index_url | str | Full URL of the news release archive index page. |
| frequency | str | Either "monthly" or "quarterly". |
Parser
bls_release_dates.parser
Extract release (vintage) date from downloaded BLS release HTML files.
parse_vintage_date(html_content)
Extract release (vintage) date from embargo line in HTML.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| html_content | str | Raw HTML of a BLS release page. | required |
Returns:
| Type | Description |
|---|---|
| date \| None | The release (vintage) date if found in the embargo line, None otherwise. |
Source code in src/bls_release_dates/parser.py
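To illustrate the idea of pulling a date out of an embargo line, here is a minimal sketch. It assumes the embargo line contains wording like "embargoed until ... April 2, 2010"; that wording, the regex, and the name `sketch_parse_vintage_date` are assumptions for illustration, not the package's actual parser.

```python
from __future__ import annotations

import re
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Assumed embargo wording; the real parser may match a different pattern.
_EMBARGO_RE = re.compile(
    r"embargoed\s+until.*?(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s+(\d{4})",
    re.IGNORECASE | re.DOTALL,
)


def sketch_parse_vintage_date(html_content: str) -> date | None:
    """Return the first date following 'embargoed until', or None."""
    match = _EMBARGO_RE.search(html_content)
    if match is None:
        return None
    month_name, day, year = match.groups()
    month = MONTHS.index(month_name.capitalize()) + 1
    try:
        return date(int(year), month, int(day))
    except ValueError:
        # For example "February 30": matched the pattern but is not a real date.
        return None
```

Returning None instead of raising mirrors the documented contract, so a caller can skip files whose embargo line is missing or malformed.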
parse_ref_from_path(path)
Parse reference year and month from a release filename.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to a file named like {pub}_{yyyy}_{mm}.htm (e.g. ces_2010_03.htm). | required |
Returns:
| Type | Description |
|---|---|
| tuple[int, int] \| None | (year, month) if the stem matches the expected pattern and values are valid, None otherwise. Month is 1-12, year is 2000-2100. |
Source code in src/bls_release_dates/parser.py
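The filename contract above can be sketched with a small regex over the file stem. This is a hedged illustration only: the pattern and the name `sketch_parse_ref_from_path` are assumptions, and the real function may accept other stems.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed stem layout: lowercase publication name, 4-digit year, 2-digit month.
_STEM_RE = re.compile(r"^[a-z]+_(\d{4})_(\d{2})$")


def sketch_parse_ref_from_path(path: Path) -> tuple[int, int] | None:
    """Return (year, month) from a stem like 'ces_2010_03', or None."""
    match = _STEM_RE.match(path.stem)
    if match is None:
        return None
    year, month = int(match.group(1)), int(match.group(2))
    # Range checks follow the documented limits: year 2000-2100, month 1-12.
    if not (2000 <= year <= 2100 and 1 <= month <= 12):
        return None
    return year, month
```

For example, `sketch_parse_ref_from_path(Path("ces_2010_03.htm"))` gives `(2010, 3)`, while a month of 13 or a year before 2000 gives None.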
ref_date_from_year_month(year, month)
Return the reference date for a given year and month.
The reference date is always the 12th of the reference month.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| year | int | Reference year. | required |
| month | int | Reference month (1-12). | required |
Returns:
| Type | Description |
|---|---|
| date | date(year, month, 12). |
Source code in src/bls_release_dates/parser.py
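As a one-line illustration of the rule that the reference date is the 12th of the reference month (the name `sketch_ref_date` is hypothetical):

```python
from datetime import date


def sketch_ref_date(year: int, month: int) -> date:
    # Always the 12th of the reference month, as described above.
    return date(year, month, 12)
```

An out-of-range month raises ValueError from `datetime.date` in this sketch; whether the package validates earlier is not stated here.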
parse_release_file(path, publication_name)
Read a release HTML file and extract publication, ref_date, and vintage_date.
ref_date is the 12th of the reference month (from the filename); vintage_date is parsed from the embargo line in the HTML.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to the release .htm file. | required |
| publication_name | str | Publication name (e.g. "ces", "sae", "qcew"). | required |
Returns:
| Type | Description |
|---|---|
| tuple[str, date, date] \| None | (publication_name, ref_date, vintage_date) if both dates could be parsed, None otherwise. |
Source code in src/bls_release_dates/parser.py
collect_release_dates(publication_name, releases_dir)
Walk a publication's release directory and yield parsed release rows.
Glob pattern used: {publication_name}_*.htm. Logs a warning and skips files where the vintage date cannot be parsed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| publication_name | str | Publication name (e.g. "ces", "sae", "qcew"). | required |
| releases_dir | Path | Directory containing release .htm files. | required |
Yields:
| Type | Description |
|---|---|
| tuple[str, date, date] | Tuples of (publication_name, ref_date, vintage_date) for each valid file. |
Source code in src/bls_release_dates/parser.py
Scraper
bls_release_dates.scraper
Fetch BLS archive index pages and download release HTML files.
ReleaseEntry(ref_year, ref_month, url)
dataclass
A single release: reference year, month, and archive URL.
Attributes:
| Name | Type | Description |
|---|---|---|
| ref_year | int | Reference year (e.g. 2010). |
| ref_month | int | Reference month 1-12. |
| url | str | Full URL to the release HTML (e.g. .../archives/empsit_04022010.htm). |
archive_href_re(series)
Build a regex that matches archive hrefs for the given BLS series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| series | str | BLS series code (e.g. "empsit", "laus", "cewqtr"). | required |
Returns:
| Type | Description |
|---|---|
| Pattern | Compiled regex matching paths like /news.release/archives/{series}_MMDDYYYY.htm. |
Source code in src/bls_release_dates/scraper.py
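A regex of the shape described above might look like the following sketch. The capture groups and the name `sketch_archive_href_re` are assumptions; the package's actual pattern may differ in anchoring or grouping.

```python
import re


def sketch_archive_href_re(series: str) -> "re.Pattern[str]":
    """Match /news.release/archives/{series}_MMDDYYYY.htm and capture MM, DD, YYYY."""
    return re.compile(
        r"/news\.release/archives/"
        + re.escape(series)  # escape in case the series code has regex metacharacters
        + r"_(\d{2})(\d{2})(\d{4})\.htm"
    )
```

With the series "empsit", searching the path `/news.release/archives/empsit_04022010.htm` captures the groups `("04", "02", "2010")`, and a "laus" href does not match.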
parse_index_page(html, publication_name, series, frequency)
Parse an archive index page into release entries.
Only includes entries for years >= START_YEAR. For monthly publications, parses "Month YYYY" from list/link text; for quarterly publications, parses "First/Second/Third/Fourth Quarter" and uses the section year.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| html | str | Raw HTML of the BLS news release archive index page. | required |
| publication_name | str | Publication name (e.g. "ces", "sae", "qcew"). | required |
| series | str | BLS series code used to match archive links. | required |
| frequency | str | "monthly" or "quarterly". | required |
Returns:
| Type | Description |
|---|---|
| list[ReleaseEntry] | List of ReleaseEntry (ref_year, ref_month, url) for each release found. |
Source code in src/bls_release_dates/scraper.py
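The period parsing described for monthly and quarterly publications could be sketched as below. The function name `sketch_parse_period`, its `section_year` parameter, and the sample link texts are illustrative assumptions. For quarterly text the sketch returns the quarter number (1-4), because how a quarter maps to ref_month is not stated here.

```python
from __future__ import annotations

import re

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
QUARTERS = {"first": 1, "second": 2, "third": 3, "fourth": 4}

_MONTH_YEAR_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b")
_QUARTER_RE = re.compile(r"\b(First|Second|Third|Fourth)\s+Quarter\b", re.IGNORECASE)


def sketch_parse_period(
    text: str, frequency: str, section_year: int | None = None
) -> tuple[int, int] | None:
    """Return (year, month) for monthly text or (year, quarter) for quarterly text."""
    if frequency == "monthly":
        match = _MONTH_YEAR_RE.search(text)
        if match is None:
            return None
        return int(match.group(2)), MONTHS.index(match.group(1)) + 1
    if frequency == "quarterly" and section_year is not None:
        match = _QUARTER_RE.search(text)
        if match is None:
            return None
        # The year comes from the enclosing section, not from the link text.
        return section_year, QUARTERS[match.group(1).lower()]
    return None
```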
fetch_index(client, url)
async
Fetch index page HTML.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| client | AsyncClient | HTTP client to use. | required |
| url | str | URL of the archive index page. | required |
Returns:
| Type | Description |
|---|---|
| str | Response body text. Raises on HTTP errors. |
Source code in src/bls_release_dates/scraper.py
download_one(client, semaphore, entry, publication_name, out_dir)
async
Download one release HTML to out_dir/{pub}_{yyyy}_{mm}.htm.
Skips download if the file already exists. Uses the semaphore to limit concurrency when called from download_all.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| client | AsyncClient | HTTP client to use. | required |
| semaphore | Semaphore | Semaphore for concurrency control. | required |
| entry | ReleaseEntry | Release entry with ref_year, ref_month, and url. | required |
| publication_name | str | Publication name for the filename. | required |
| out_dir | Path | Directory to write the .htm file into. | required |
Returns:
| Type | Description |
|---|---|
| Path \| None | Path to the written or existing file, or None if skipped. |
Source code in src/bls_release_dates/scraper.py
download_all(entries, publication_name, concurrency=5)
async
Download all release HTMLs for a publication; skip existing files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| entries | list[ReleaseEntry] | List of ReleaseEntry from parse_index_page. | required |
| publication_name | str | Publication name (e.g. "ces", "sae", "qcew"). | required |
| concurrency | int | Max concurrent requests (default 5). | 5 |
Returns:
| Type | Description |
|---|---|
| list[Path] | List of paths to written or already-existing .htm files. |
Source code in src/bls_release_dates/scraper.py
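The semaphore-limited download pattern can be illustrated with plain asyncio. This sketch replaces the HTTP request with `asyncio.sleep`, does not write files, and uses hypothetical names (`sketch_download_one`, `sketch_download_all`); it only shows how a shared semaphore caps the number of tasks in flight.

```python
import asyncio


async def sketch_download_one(semaphore, name, stats):
    # Only `concurrency` coroutines can be inside this block at once.
    async with semaphore:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for the real HTTP request
        stats["active"] -= 1
        return name


async def sketch_download_all(names, concurrency=5):
    semaphore = asyncio.Semaphore(concurrency)
    stats = {"active": 0, "peak": 0}
    # gather() returns results in the same order as the input awaitables.
    results = await asyncio.gather(
        *(sketch_download_one(semaphore, name, stats) for name in names)
    )
    return list(results), stats["peak"]
```

Calling `asyncio.run(sketch_download_all(names, concurrency=5))` returns the names in input order along with the peak number of simultaneous tasks, which never exceeds the concurrency limit.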
Read helpers
bls_release_dates.read
Read release_dates or vintage_dates parquet files if they exist.
read_release_dates(path=None)
Read release_dates parquet if it exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str \| None | Optional path to the parquet file. Defaults to data/release_dates.parquet relative to the current working directory. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame \| None | Polars DataFrame with columns publication, ref_date, vintage_date, or None if the file has not been created yet. |
Source code in src/bls_release_dates/read.py
read_vintage_dates(path=None)
Read vintage_dates parquet if it exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str \| None | Optional path to the parquet file. Defaults to data/vintage_dates.parquet relative to the current working directory. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame \| None | Polars DataFrame with columns publication, ref_date, vintage_date, revision, benchmark_revision, or None if the file has not been created yet. |
Source code in src/bls_release_dates/read.py
Vintage dates (revisions)
bls_release_dates.vintage_dates
Build vintage_dates dataset from release_dates.parquet with revision codes.
Revision semantics (publication-specific; may not hold for most recent ref_dates):
- 0: initial release (vintage_date from release_dates.parquet)
- 1, 2, ...: subsequent revisions (vintage_date shifted by 1, 2, ... months)
- 9: benchmark revision (CES and SAE only)
benchmark_revision: 0 = not a benchmark row; 1 = first benchmark; 2 = second benchmark (SAE re-replacement only).
- CES: revisions 0, 1, 2, and 9. Benchmark 9 only for March ref_date (vintage = Jan release next year); benchmark_revision=1.
- SAE: revisions 0, 1, and 9. Benchmark 9 twice for April–September ref_dates (double-revision): first at March Y+1 (benchmark_revision=1), second at March Y+2 (benchmark_revision=2).
- QCEW: by quarter of ref_date — Q1: 0,1,2,3,4; Q2: 0,1,2,3; Q3: 0,1,2; Q4: 0,1. No benchmarks (benchmark_revision=0).
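The QCEW rule above (fewer revisions for later quarters) reduces to a small formula. The sketch below is an illustration under that reading, with the hypothetical name `sketch_qcew_revisions`; it covers only which revision codes exist, not how each vintage_date is derived.

```python
from datetime import date


def sketch_qcew_revisions(ref_date: date) -> list[int]:
    """Revision codes for a QCEW ref_date: Q1 -> 0..4, Q2 -> 0..3, Q3 -> 0..2, Q4 -> 0..1."""
    quarter = (ref_date.month - 1) // 3 + 1
    max_revision = 5 - quarter
    return list(range(max_revision + 1))
```

For a first-quarter ref_date such as `date(2020, 3, 12)` this yields `[0, 1, 2, 3, 4]`, and for a fourth-quarter ref_date it yields `[0, 1]`.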
build_vintage_dates(release_dates_path=None)
Build vintage_dates DataFrame from release_dates parquet.
Applies publication-specific revision logic (CES 0,1,2 + benchmark; SAE 0,1 + benchmarks; QCEW 0..max by quarter), filters to vintage_date <= today, and sorts by publication, ref_date, vintage_date, revision, benchmark_revision.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| release_dates_path | Path \| None | Path to release_dates.parquet. Defaults to config.PARQUET_PATH. | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | Polars DataFrame with columns publication, ref_date, vintage_date, revision, benchmark_revision. |
Source code in src/bls_release_dates/vintage_dates.py
main()
Build vintage_dates from release_dates and write data/vintage_dates.parquet.
Reads data/release_dates.parquet, applies revision logic, and writes data/vintage_dates.parquet. Creates the output directory if needed.
Source code in src/bls_release_dates/vintage_dates.py
Entry point / main
bls_release_dates.__main__
CLI entry point: download BLS releases, build release_dates and vintage_dates.
download_all_publications()
async
Download release HTML files for all configured publications.
Source code in src/bls_release_dates/__main__.py
build_dataframe()
Parse all downloaded HTML files into a release_dates DataFrame.
Source code in src/bls_release_dates/__main__.py
main()
Run full pipeline: download, build release_dates, build vintage_dates.