In production
- InfoNCE contrastive (default)AG News kNN 0.892 / nDCG 0.822 · EVIDENCE §3.6, §3.7 (H-A)
- int8 quantized servingkNN_int8 ≈ kNN_full everywhere · EVIDENCE §3.7 (B)
The org recipe behind arxiviq.com, with the dated evidence that supports each choice. Every claim traces to a row in EVIDENCE.md. RAG Embeddings is not shipped on a hypothesis. See the Research Registry for the full method table.
This site's serving model is trained in production on the platform's own lifecycle: corpus → pair synthesis → head-only training (frozen backbone, features extracted once) → evaluation gates on the served representation → charter → deploy. The trained artifact is a ~3 MB projection head stored in Postgres; the backbone loads from the baked HuggingFace cache at serve time, with no GPU, no shared filesystem, and no plan upgrade.
| Model | nDCG@10 | Effective rank | Provenance |
|---|---|---|---|
asn-head-1825460 (deployed) | 0.9539 | 26.82 | prod eval, gate slice |
| BAAI/bge-small-en-v1.5 | 0.9193 | 22.83 | local, same pairs + metric code |
| all-MiniLM-L6-v2 (raw) | 0.8847 | 25.97 | local, same pairs + metric code |
Method notes: no ASN weight surgery on this path (rejected 0/4 in fleet evidence); gates grade the representation that actually serves; baseline numbers were measured on the identical pair set and metric code (provenance differs by host and is labeled). Full row: EVIDENCE.md § production head-only split.
Based on 891 runs and 1,190 trainings. Sources: EVIDENCE.md §3.1–§3.8, SWEEP_FINDINGS.md, SWEEP_REPORT.md.
Encoder effective-rank collapse traced to a projector-window trigger bug, not the contrastive loss. Re-run with the fixed trigger.
Original ASN weight-space SVD surgery makes collapse worse in a real collapse regime (served rank 3.4 → 1.0; kNN −12–21pts). Archived.
VICReg prevents collapse where it occurs (non-contrastive / low-negative regimes). Neutral on standard InfoNCE real-text. Kept as insurance.
AG News kNN 0.892 / nDCG 0.822 after fine-tune. Per-tenant domain fine-tuning is the defensible moat: it beats general commercial embeddings in-domain with no OOD forgetting. Reconfirmed 2026-06-28.
Barlow Twins top method in Bayesian TPE search (robust 1.42, #1 of 6). Matryoshka (MRL) truncation training promoted, the real lever for cheap truncated serving. int8 quantized serving essentially lossless (kNN_int8 ≈ kNN_full).
VICReg rank floor confirmed regime-specific: +0.32 kNN for SimSiam, ~0 for InfoNCE batch≥16. Not a default term; insurance only.
Removed the silent demo-pair fallback in core-api eval. Gates cannot pass on hard-coded demo pairs; MIN_REAL_PAIRS_EVAL=8 enforced. Deploy spine is honestly certifying.
A model reaches arxiviq.com only when every gate passes for real. Gates fail closed when a dimension is unmeasured, never stubbed true.
rankAboveBaselineeffective_rank > 8.0measuredndcgNonRegressionnDCG ≥ 0.35 (k=2 proxy; k=10 benchmark pending exam port)measuredmrlWithinToleranceknn_t8 ≥ 0.30 AND (knn_full − knn_t8) ≤ 0.05; fail-closed if unmeasuredmeasured (fail-closed)sufficientEvalPairs≥ 8 real collection pairs (no demo-pair fallback)measured (REV-905)data/corpora/research/corpus.jsonl. Re-kickoff with pnpm harvest:arxiv && pnpm kickoff:orgs after the stack is up.