Live model versions
Measured model versions from this workspace's core-api /v1/models, ranked by effective rank.
asn-head-8080041 · erank 27.1 · nDCG@10 0.942
asn-head-1825460 · deployed · erank 26.8 · trunc 256 · nDCG@10 0.954
asn-head-8282654 · erank 26.0 · nDCG@10 0.908
Based on 891 runs and 1,190 trainings · evidence: EVIDENCE.md §3.1–§3.7, SWEEP_FINDINGS.md, SWEEP_REPORT.md · updated 2026-06-28
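The page does not define "erank". A common definition of effective rank is the exponential of the Shannon entropy of the normalized singular values of the embedding matrix. The sketch below assumes that definition and assumes the embeddings are mean-centered first; the function name and the centering step are illustrative, not taken from core-api.

```python
import numpy as np

def effective_rank(embeddings: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of an (n_samples, dim) embedding matrix."""
    # Assumption: center first so the spectrum reflects variance, not the mean offset.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    # Normalize the spectrum into a probability distribution.
    p = singular_values / (singular_values.sum() + eps)
    p = p[p > eps]  # drop numerically zero directions before taking the log
    entropy = -(p * np.log(p)).sum()
    return float(np.exp(entropy))
```

Under this definition the value ranges from 1 (all variance in one direction, i.e. collapse) up to the embedding dimension (perfectly uniform spectrum).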
How the pipeline works
The research org tests every embedding method on measured benchmarks. Methods that prove their worth are promoted to Business Development for real-world tenant pilots, then the Execution team implements the winners in production. Dead ends are archived with the evidence that killed them.
In Research (2) → Promoted to Business Development (3) → In Execution (2) → Archived (rejected) (4)
In Research
2 methods
Under active investigation in the research org: measured, but not yet promoted.
VICReg (variance + covariance rank floor)
Measured (regime-specific)
Loss-space rank floor. Prevents collapse where it occurs (non-contrastive / low-negative regimes) but is neutral on standard InfoNCE real-text training. Kept as insurance, not a default.
Key metric · +0.32 kNN for SimSiam; ~0 for InfoNCE batch≥16; neutral on real text
Evidence: EVIDENCE §3.4, §3.6, §3.7 (H-A)
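As a rough illustration of what a variance + covariance rank floor computes, here is a minimal NumPy sketch of the two VICReg regularization terms as they are usually formulated: a hinge on the per-dimension standard deviation, and a penalty on off-diagonal covariance. The function name, `gamma`, and `eps` are assumptions for illustration; the weights and exact form used in the measured runs are not stated on this page.

```python
import numpy as np

def vicreg_rank_floor(z: np.ndarray, gamma: float = 1.0, eps: float = 1e-4):
    """Return (variance_term, covariance_term) for a batch z of shape (n, d)."""
    n, d = z.shape
    z = z - z.mean(axis=0, keepdims=True)
    # Variance term: hinge that is active only when a dimension's std falls below gamma.
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    variance_term = float(np.mean(np.maximum(0.0, gamma - std)))
    # Covariance term: squared off-diagonal covariance, scaled by the dimension.
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    covariance_term = float((off_diag ** 2).sum() / d)
    return variance_term, covariance_term
```

Both terms are zero for a batch whose dimensions are well spread and uncorrelated, which is consistent with the term being neutral when the base objective already avoids collapse.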
Differentiable rank-floor regularizer
Measured (mid-pack)
Adds a differentiable effective-rank maximization term to InfoNCE. Works, but ranked below Barlow Twins and InfoNCE in the search.
Key metric · robust score 1.31 (#3 of 6)
Evidence: EVIDENCE §3.7 wave-2
Promoted to Business Development
3 methods
Earned their keep in research. Handed to Business Development for real-world tenant pilots.
Domain fine-tuning
Measured
Cheap per-tenant fine-tuning on the tenant's domain. The defensible product moat: beats general commercial embeddings in-domain, with no out-of-domain forgetting at this scale.
Key metric · +1.5% (300 pairs) → +3.0% (1,200 pairs) in-domain; OOD improved
Evidence: EVIDENCE §3.6, §3.7 (H-C)
Barlow Twins
Validating on real text
Redundancy-reduction objective. Top method in the Bayesian search; keeps full-dim quality while resisting collapse. Real-text confirmation vs VICReg in progress.
Key metric · robust score 1.42 (#1 of 6 methods, synthetic)
Evidence: EVIDENCE §3.7 wave-2 TPE
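For readers unfamiliar with the redundancy-reduction objective, the following is a minimal NumPy sketch of the standard Barlow Twins loss: it pushes the cross-correlation matrix between two views toward the identity. The function name and the `lambda_offdiag` default are illustrative assumptions, not the settings used in the search.

```python
import numpy as np

def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray,
                      lambda_offdiag: float = 5e-3, eps: float = 1e-8) -> float:
    """Barlow Twins loss for two views of the same batch, each of shape (n, d)."""
    n = z_a.shape[0]
    # Standardize each dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (d, d).
    c = (z_a.T @ z_b) / n
    diag = np.diag(c)
    on_diag = ((diag - 1.0) ** 2).sum()           # invariance: matching dims correlate
    off_diag = (c ** 2).sum() - (diag ** 2).sum()  # redundancy reduction: others do not
    return float(on_diag + lambda_offdiag * off_diag)
```

The off-diagonal term is what discourages collapse: if every dimension carried the same signal, the off-diagonal correlations would all be close to 1 and the penalty would be large.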
Matryoshka (MRL) truncation training
Validating on real text
Trains nested prefixes of the served representation so it degrades gracefully when truncated. The real lever for cheap truncated serving (decorrelation is not).
Key metric · best truncated-dim kNN among methods (synthetic)
Evidence: EVIDENCE §3.7 (B, wave-2)
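One way to picture nested-prefix training: compute the same contrastive loss on the first d dimensions of each embedding for several values of d, then average. The sketch below assumes an InfoNCE base loss and the prefix sizes 64/128/256; both are illustrative choices, and the helper names are hypothetical.

```python
import numpy as np

def info_nce(q: np.ndarray, p: np.ndarray, temperature: float = 0.07) -> float:
    """In-batch InfoNCE: row i of p is the positive for row i of q."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_loss(queries: np.ndarray, positives: np.ndarray,
                    prefix_dims=(64, 128, 256)):
    """Average the base loss over nested prefixes; also return per-prefix losses."""
    dim = queries.shape[1]
    per_prefix = {}
    for d in prefix_dims:
        if d > dim:
            raise ValueError(f"prefix {d} exceeds embedding dim {dim}")
        # Each prefix is re-normalized inside info_nce, as it would be when served truncated.
        per_prefix[d] = info_nce(queries[:, :d], positives[:, :d])
    total = float(np.mean(list(per_prefix.values())))
    return total, per_prefix
```

Because every prefix contributes to the loss, the leading dimensions are pushed to carry most of the signal, which is what makes truncation cheap at serving time.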
In Execution
2 methods
Validated and shipped. Part of the production serving / training path today.
InfoNCE contrastive (default)
Measured
Standard contrastive objective. Inherently resists collapse with adequate negatives (batch ≥16); the production default training path.
Key metric · AG News kNN 0.892 / nDCG 0.822 after fine-tune
Evidence: EVIDENCE §3.6, §3.7 (H-A)
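To make the role of negatives concrete, here is a minimal NumPy sketch of in-batch InfoNCE, where every other item in the batch acts as a negative. The function name and the temperature value are assumptions for illustration; the production training code is not shown on this page.

```python
import numpy as np

def info_nce_loss(queries: np.ndarray, positives: np.ndarray,
                  temperature: float = 0.07) -> float:
    """In-batch InfoNCE: row i of positives is the positive for row i of queries."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    # Cosine similarity of every query against every candidate, shape (batch, batch).
    logits = (q @ p.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (the true pair) as the target class.
    return float(-np.mean(np.diag(log_probs)))
```

With a batch of 16 each query is contrasted against 15 negatives. If all embeddings collapsed to one point, every logit would be equal and the loss would sit at log(batch size), so the gradient pushes away from collapse whenever enough negatives are present.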
int8 quantized serving
Measured
Per-vector int8 quantization for edge serving. Essentially lossless; ship it as the default cheap-serving tier.
Key metric · kNN_int8 ≈ kNN_full everywhere
Evidence: EVIDENCE §3.7 (B)
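A minimal sketch of per-vector int8 quantization, assuming a symmetric scheme with one float scale per vector (the page does not say which scheme is deployed; the function names are illustrative):

```python
import numpy as np

def quantize_int8(vectors: np.ndarray):
    """Symmetric per-vector quantization: one scale per row, codes in [-127, 127]."""
    scales = np.abs(vectors).max(axis=1, keepdims=True) / 127.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid division by zero on all-zero rows
    codes = np.clip(np.round(vectors / scales), -127, 127).astype(np.int8)
    return codes, scales.astype(np.float32)

def dequantize_int8(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float vectors from codes and per-row scales."""
    return codes.astype(np.float32) * scales
```

Each component is off by at most half a quantization step, so the direction of the vector barely moves. That is one plausible reason nearest-neighbor results would stay close to full precision, in line with the kNN_int8 ≈ kNN_full metric.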
Archived (rejected)
4 methods
Tested honestly and rejected. Kept here so we don't re-litigate dead ends.
Three-tier spectral surgery (original ASN)
Rejected
Periodic weight-space SVD surgery shrinking the weak singular band. Fights anisotropy, not collapse. In a real collapse regime it made collapse worse (rank 3.4→1.0).
Key metric · served rank 3.4 → 1.0; kNN down 12 to 21 pts
Evidence: EVIDENCE §3.2
spectral_lift (rank-floor surgery)
Rejected
Redesigned surgery that lifts weak singular values. Fixes the harm of three-tier but still loses to doing nothing; weight-space surgery can't outrun the collapse gradient.
Key metric · rank 2.06 vs do-nothing 3.40
Evidence: EVIDENCE §3.3
Sleep / SHY consolidation (AwakenedSleepNet)
Rejected
Bio-inspired wake/sleep phasing with synaptic downscaling, dream pruning, and replay. Phasic consolidation fails; only a continuous in-loss rank floor works. Downscaling survives as a collapse-neutral renormalizer.
Key metric · phasic rank 9.99 / 5.71 vs continuous 21.0
Evidence: EVIDENCE §3.5
DINO-style centering
Rejected
Self-distillation with centering + sharpening. Worst method in the Bayesian search on this task.
Key metric · robust score 1.08 (#6 of 6)
Evidence: EVIDENCE §3.7 wave-2