NAICS Hyperbolic Embedding System¶
This documentation describes a unified hyperbolic representation learning framework for the North American Industry Classification System (NAICS).
The system consists of four sequential stages:
- Multi-channel transformer-based text encoding
- Mixture-of-Experts fusion
- Lorentz-model hyperbolic contrastive learning
- Hyperbolic Graph Convolutional Network (HGCN) refinement
The final output is a set of geometry-aware embeddings aligned with the hierarchical structure of the NAICS taxonomy. These Lorentz-model hyperbolic embeddings are suitable for similarity search, hierarchical modeling, graph-based reasoning, and downstream machine learning applications.
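For orientation, the sketch below shows the Lorentz-model inner product and geodesic distance that the contrastive and refinement stages operate on. The helper names (`lorentz_inner`, `lorentz_distance`) and the clamping constant are illustrative assumptions, not the project's API:

```python
import torch

def lorentz_inner(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi (time axis stored first)."""
    return -x[..., 0] * y[..., 0] + (x[..., 1:] * y[..., 1:]).sum(dim=-1)

def lorentz_distance(x: torch.Tensor, y: torch.Tensor, curvature: float = 1.0) -> torch.Tensor:
    """Geodesic distance on the hyperboloid of curvature -curvature.

    The clamp keeps acosh's argument >= 1 for numerical stability.
    """
    k = curvature
    arg = torch.clamp(-lorentz_inner(x, y) * k, min=1.0 + 1e-7)
    return torch.acosh(arg) / (k ** 0.5)
```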
Key Features¶
Advanced Training Techniques¶
- Hard Negative Mining: Selects geometrically challenging negatives using Lorentzian distances (see the sketch after this list)
- Router-Guided Sampling: Prevents expert collapse by selecting negatives that confuse the MoE gating network
- Global Batch Sampling: Enables hard negative mining across all GPUs in distributed training
- Structure-Aware Dynamic Curriculum: Progressively enables advanced features based on training progress
- Multi-Level Supervision: Supports multiple positive examples at different hierarchy levels
- Hyperbolic K-Means Clustering: Clusters embeddings directly in Lorentz space for false-negative mitigation
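As referenced above, here is a minimal sketch of distance-based hard negative mining. It reuses the `lorentz_distance` helper sketched earlier; `mine_hard_negatives` and its tensor layout are illustrative assumptions rather than the project's actual interface:

```python
import torch

def mine_hard_negatives(anchors: torch.Tensor,
                        candidates: torch.Tensor,
                        positive_mask: torch.Tensor,
                        k: int = 16) -> torch.Tensor:
    """Pick the k geometrically closest non-positive candidates for each anchor.

    anchors:       (B, D) Lorentz-model embeddings
    candidates:    (N, D) Lorentz-model embeddings (e.g. the global batch), with k <= N
    positive_mask: (B, N) bool, True where a candidate is a positive (or the anchor itself)
    returns:       (B, k) indices of hard negatives
    """
    # Pairwise Lorentzian distances between anchors and candidates (helper sketched earlier).
    dists = lorentz_distance(anchors.unsqueeze(1), candidates.unsqueeze(0))  # (B, N)
    # Exclude positives and self-matches by pushing them to +inf.
    dists = dists.masked_fill(positive_mask, float("inf"))
    # The smallest remaining distances are the hardest negatives.
    return dists.topk(k, dim=-1, largest=False).indices
```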
Loss Functions¶
- Decoupled Contrastive Learning (DCL): Improved gradient flow and numerical stability (see the sketch after this list)
- Hierarchy Preservation Loss: Directly optimizes embedding distances to match tree structure
- LambdaRank Loss: Position-aware ranking optimization using NDCG
- Radius Regularization: Prevents hyperbolic embeddings from collapsing or expanding too far
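A minimal sketch of a DCL-style objective over Lorentzian distances, showing the defining step of excluding the positive pair from the denominator. `dcl_loss` and its signature are illustrative; the actual loss weighting and temperature handling in the codebase may differ:

```python
import torch

def dcl_loss(anchor: torch.Tensor,
             positive: torch.Tensor,
             negatives: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """Decoupled contrastive loss using negative Lorentzian distance as similarity.

    anchor: (B, D), positive: (B, D), negatives: (B, K, D)
    """
    # Similarity = -distance, so closer pairs score higher (helper sketched earlier).
    pos_sim = -lorentz_distance(anchor, positive) / temperature                 # (B,)
    neg_sim = -lorentz_distance(anchor.unsqueeze(1), negatives) / temperature   # (B, K)
    # DCL: the positive term is dropped from the denominator, i.e. only negatives
    # appear inside the logsumexp, which decouples the positive and negative gradients.
    return (-pos_sim + torch.logsumexp(neg_sim, dim=-1)).mean()
```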
Distributed Training¶
- Multi-GPU Support: Automatic global batch sampling for better hard negative mining
- Memory Efficient: Monitors and logs VRAM usage for distributed operations
- Gradient Flow: Proper gradient propagation through all_gather operations
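One common pattern for preserving gradients through `all_gather` is to re-insert the local tensor after the collective, as sketched below. This is a generic approach used in many contrastive-learning codebases (it assumes an initialized process group) and is shown only as an illustration, not the project's exact implementation:

```python
import torch
import torch.distributed as dist

def gather_with_grad(local_emb: torch.Tensor) -> torch.Tensor:
    """All-gather embeddings across ranks while keeping gradients for the local shard."""
    world_size = dist.get_world_size()
    gathered = [torch.zeros_like(local_emb) for _ in range(world_size)]
    # The collective itself does not propagate gradients.
    dist.all_gather(gathered, local_emb)
    # Re-insert the local tensor so its slice of the global batch stays in the autograd graph.
    gathered[dist.get_rank()] = local_emb
    return torch.cat(gathered, dim=0)
```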
Performance Optimizations¶
- torch.compile Support: Fused operations for Lorentz geometry via PyTorch 2.0+ compilation (see the sketch after this list)
- Compiled Core Operations: Exponential/logarithmic maps, distance computations, and gating optimized for throughput
- Modular Mixin Architecture: Model decomposed into functional mixins for maintainability
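A minimal illustration of compiling the Lorentz distance helper sketched earlier with `torch.compile` (PyTorch 2.0+). The `project_to_hyperboloid` helper and the tensor shapes are illustrative assumptions:

```python
import torch

def project_to_hyperboloid(v: torch.Tensor, curvature: float = 1.0) -> torch.Tensor:
    """Lift spatial coordinates onto the hyperboloid by solving <x, x>_L = -1/curvature for the time axis."""
    x0 = torch.sqrt(1.0 / curvature + (v * v).sum(dim=-1, keepdim=True))
    return torch.cat([x0, v], dim=-1)

# torch.compile traces the elementwise Lorentz ops and fuses them into fewer kernels.
compiled_distance = torch.compile(lorentz_distance)

v = torch.randn(1024, 64)                        # arbitrary spatial coordinates for illustration
x = project_to_hyperboloid(v)
y = project_to_hyperboloid(torch.randn(1024, 64))
d = compiled_distance(x, y)                      # same result as the eager version, lower per-call overhead
```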
Use the navigation menu to explore system architecture, training procedures, and API references for each module.