NAICS Hyperbolic Embedding System

This documentation describes a unified hyperbolic representation learning framework for the North American Industry Classification System (NAICS).

The system consists of four sequential stages:

  • Multi-channel transformer-based text encoding
  • Mixture-of-Experts fusion
  • Lorentz-model hyperbolic contrastive learning
  • Hyperbolic Graph Convolutional refinement (HGCN)

The final output is a set of geometry-aware embeddings aligned with the hierarchical structure of the NAICS taxonomy. These Lorentz-model hyperbolic embeddings are suitable for similarity search, hierarchical modeling, graph-based reasoning, and downstream machine learning applications.
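
Because every later stage works with Lorentz-model embeddings, it helps to see the two primitives they rest on: the Minkowski inner product and the geodesic distance derived from it. The following is a minimal sketch in plain Python, not the system's implementation. The function names and the curvature parameter c are assumptions made for illustration, and a real pipeline would use batched tensor operations.

```python
import math

def lorentz_inner(x, y):
    """Minkowski inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lift_to_hyperboloid(v, c=1.0):
    """Lift spatial coordinates v onto the hyperboloid <x, x> = -1/c
    by solving for the time coordinate x0."""
    x0 = math.sqrt(1.0 / c + sum(a * a for a in v))
    return [x0] + list(v)

def lorentz_distance(x, y, c=1.0):
    """Geodesic distance: (1/sqrt(c)) * arcosh(-c * <x, y>)."""
    arg = max(-c * lorentz_inner(x, y), 1.0)  # clamp against rounding error
    return math.acosh(arg) / math.sqrt(c)

# Illustrative points only: a "parent" near the origin and a "child" farther out.
origin = [1.0, 0.0, 0.0]
parent = lift_to_hyperboloid([0.3, 0.1])
child = lift_to_hyperboloid([1.2, 0.5])
print(lorentz_distance(origin, parent), lorentz_distance(origin, child))
```

In hierarchical embeddings of this kind, general categories typically sit closer to the origin and specific ones farther out, which is why distance from the origin is a useful quantity to inspect.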

Key Features

Advanced Training Techniques

  • Hard Negative Mining: Selects geometrically challenging negatives using Lorentzian distances
  • Router-Guided Sampling: Prevents expert collapse by selecting negatives that confuse the MoE gating network
  • Global Batch Sampling: Enables hard negative mining across all GPUs in distributed training
  • Structure-Aware Dynamic Curriculum: Progressively enables advanced features based on training progress
  • Multi-Level Supervision: Supports multiple positive examples at different hierarchy levels
  • Hyperbolic K-Means Clustering: Clusters embeddings directly in Lorentz space for false-negative mitigation
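
Hard negative mining with Lorentzian distances can be illustrated as follows. This is a simplified sketch under stated assumptions: the function mine_hard_negatives, its arguments, and the demo labels are hypothetical, curvature is fixed at -1, and the system's actual sampler (including router-guided and global-batch variants) is not shown.

```python
import math

def lorentz_distance(x, y):
    """Geodesic distance on the unit hyperboloid (curvature -1)."""
    inner = -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))
    return math.acosh(max(-inner, 1.0))  # clamp against rounding error

def mine_hard_negatives(anchor, candidates, positive_ids, k):
    """Return the ids of the k non-positive candidates closest to the anchor.
    Close negatives are the 'hard' ones: they are easiest to confuse with positives."""
    scored = [
        (lorentz_distance(anchor, emb), cid)
        for cid, emb in candidates.items()
        if cid not in positive_ids
    ]
    scored.sort()
    return [cid for _, cid in scored[:k]]

def point_at(r):
    """Demo helper: point at signed distance r from the origin along one geodesic."""
    return [math.cosh(r), math.sinh(r)]

anchor = point_at(0.5)
candidates = {
    "pos": point_at(0.6),   # treated as a positive, so it is excluded
    "near": point_at(0.9),  # distance 0.4 from the anchor
    "far": point_at(3.0),   # distance 2.5
    "mid": point_at(-1.0),  # distance 1.5
}
hard = mine_hard_negatives(anchor, candidates, positive_ids={"pos"}, k=2)
print(hard)  # ['near', 'mid']
```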

Loss Functions

  • Decoupled Contrastive Learning (DCL): Removes the positive pair from the contrastive denominator, improving gradient flow and numerical stability
  • Hierarchy Preservation Loss: Directly optimizes embedding distances to match tree structure
  • LambdaRank Loss: Position-aware ranking optimization using NDCG
  • Radius Regularization: Prevents hyperbolic embeddings from collapsing or expanding too far
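
Two of these losses are compact enough to sketch. The example below assumes similarity is defined as negative hyperbolic distance and that radius regularization is a hinge penalty on the distance from the origin. The temperature and the bounds r_min and r_max are hypothetical values chosen for illustration, not the system's defaults.

```python
import math

def dcl_loss(pos_dist, neg_dists, temperature=0.1):
    """Decoupled contrastive loss for one anchor, with similarity = -distance.

    The positive term is left out of the log-sum-exp over negatives;
    that omission is what separates DCL from the standard InfoNCE loss.
    """
    logits = [-d / temperature for d in neg_dists]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return pos_dist / temperature + lse

def radius_penalty(x, r_min=0.5, r_max=5.0):
    """Squared hinge penalty when a point on the unit hyperboloid lies
    closer to the origin than r_min or farther than r_max."""
    radius = math.acosh(max(x[0], 1.0))  # distance from origin is arcosh(x0)
    return max(0.0, r_min - radius) ** 2 + max(0.0, radius - r_max) ** 2

# Closer negatives raise the loss; a point at the origin is penalized for collapse.
print(dcl_loss(0.2, [1.0, 2.0]), dcl_loss(0.2, [0.3, 2.0]))
print(radius_penalty([1.0, 0.0]))  # 0.25
```

The lower bound discourages all embeddings from collapsing toward the origin, while the upper bound keeps coordinates from growing so large that cosh and sinh terms overflow.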

Distributed Training

  • Multi-GPU Support: Automatic global batch sampling for better hard negative mining
  • Memory Monitoring: Monitors and logs VRAM usage for distributed operations
  • Gradient Flow: Propagates gradients through all_gather operations so that embeddings gathered from other GPUs contribute to the backward pass

Performance Optimizations

  • torch.compile Support: Fused operations for Lorentz geometry via PyTorch 2.0+ compilation
  • Compiled Core Operations: Exponential/logarithmic maps, distance computations, and gating optimized for throughput
  • Modular Mixin Architecture: Model decomposed into functional mixins for maintainability
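
For reference, the exponential and logarithmic maps named above have a simple closed form at the origin of the unit hyperboloid. The sketch below uses plain Python to show the math only. The function names are assumptions, curvature is fixed at -1, and it does not reproduce the compiled tensor kernels the system uses.

```python
import math

def exp_map_origin(u):
    """Map a tangent vector u at the origin onto the unit hyperboloid:
    x = (cosh(n), sinh(n) * u / n), where n is the Euclidean norm of u."""
    n = math.sqrt(sum(a * a for a in u))
    if n < 1e-12:  # zero vector maps to the origin itself
        return [1.0] + [0.0] * len(u)
    scale = math.sinh(n) / n
    return [math.cosh(n)] + [scale * a for a in u]

def log_map_origin(x):
    """Inverse of exp_map_origin: recover the tangent vector at the origin."""
    spatial = x[1:]
    s = math.sqrt(sum(a * a for a in spatial))
    if s < 1e-12:  # the origin maps back to the zero vector
        return [0.0] * len(spatial)
    d = math.acosh(max(x[0], 1.0))  # geodesic distance from the origin
    return [d * a / s for a in spatial]

u = [0.3, -0.4]
x = exp_map_origin(u)
print(x, log_map_origin(x))  # the second value recovers u up to rounding
```

Functions like these consist of a handful of elementwise operations, which is the kind of workload where operator fusion through torch.compile tends to help in a tensor implementation.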

Use the navigation menu to explore system architecture, training procedures, and API references for each module.