FOR: HYPERNYM OPS / CTO TEAM
All three Hypernym APIs tested · operational state per service
Smoke-tested Hypernym Omnifact, Compressed Repo Analyze, and the new Modulum API. All three respond to requests (no DNS / TLS / hard outages) but each has a distinct operational issue blocking production-grade use. Three concrete asks for ops at the bottom.
Omnifact: begin call returns 202 in 1.1 s. The job is queued but a worker never picks it up; elapsed_seconds: 0.0 after 42 s of polling.
Repo Analyze: endpoint healthy but rejects the current keychain key. Needs HYPERNYM_REPO_INGEST_API_KEY (rotation pending since 2026-05-09).
Modulum: returns valid responses. Decode is normal (38-60 tok/s). Prefill has regressed ~190× since yesterday: 1 tok/s today vs 193 tok/s on 2026-05-11.
zephyr.hypernym.ai — Diagnosis: the begin endpoint queues experiments correctly and returns valid experiment IDs, but the worker pool that processes them isn't dequeuing. elapsed_seconds: 0.0 after 42 seconds of polling confirms the worker timer never started; the job is sitting in the queue waiting for a processor. Likely a stalled worker process or zero workers running.
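The failure signature can be classified mechanically when polling. A minimal sketch: the `elapsed_seconds` field is from the observed response, but the helper name and the stall threshold are my own assumptions, not part of the Hypernym API.

```python
def classify_job(elapsed_seconds: float, seconds_since_enqueue: float,
                 stall_threshold: float = 30.0) -> str:
    """Classify a queued experiment from its polled status payload.

    elapsed_seconds == 0.0 means the worker timer never started; if the
    job has also sat in the queue longer than the threshold, the queue
    is almost certainly not being drained.
    """
    if elapsed_seconds > 0.0:
        return "running"
    if seconds_since_enqueue > stall_threshold:
        return "stalled"  # queued, never dequeued
    return "queued"

# Today's observation: elapsed_seconds 0.0 after 42 s of polling.
print(classify_job(0.0, 42.0))  # stalled
```

Anything a monitoring probe reports as "stalled" here would look healthy to a plain reachability check, which is exactly the gap described above.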
zephyr-b-gpu.hypernym.ai — Diagnosis: the standard HYPERNYM_API_KEY (which works for Omnifact) is rejected here. Per session memory dated 2026-05-09, Repo Analyze uses a separate key, HYPERNYM_REPO_INGEST_API_KEY, whose rotation has been pending since then. The endpoint itself is healthy: it responds with a structured 401 in 1.1 s, confirming reachability and TLS termination.
gemma4.hypernym.ai — Diagnosis: the model serves correct responses, and speculative drafting (draft_n=2, 100% acceptance) works as designed. Decode throughput is normal at 60 tok/s. The regression is entirely in prefill: yesterday's 193 tok/s is now 1 tok/s. Same Tundra backend, same model file. This is the same backend that returned 503 on 2026-05-10 and timed out my BABILong probe at 64k+ on 2026-05-11. The prefill regression is what's blocking all longer-context validation.
At 1 tok/s prefill, processing common context lengths takes:
32k context = 32,000 s = 8.9 hours just to read the input
64k context = 17.8 hours
128k context = 35.6 hours
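The arithmetic above as a one-liner, for re-running against whatever prefill rate the re-test measures:

```python
def prefill_hours(context_tokens: int, prefill_tok_per_s: float) -> float:
    """Hours spent just reading the input at a given prefill rate."""
    return context_tokens / prefill_tok_per_s / 3600.0

for ctx in (32_000, 64_000, 128_000):
    print(f"{ctx:>7} tokens @ 1 tok/s -> {prefill_hours(ctx, 1.0):.1f} h")
```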
This is why yesterday's BABILong probe timed out at 64k+ but completed at 32k. The BABILong +9pp at 128k claim cannot be reproduced via the public API in its current state. Internal benchmarks presumably ran on different infrastructure with intact prefill performance.
Same endpoint, same model file, same auth key. Only variable is time.
| Metric | 2026-05-10 | 2026-05-11 | 2026-05-12 (today) |
|---|---|---|---|
| Endpoint reachable | 503 backend down | 200 OK | 200 OK |
| Total latency · short prompt | n/a (503) | 1.2s | 22.0s |
| Prefill speed (tokens/sec) | n/a | 193 tok/s | 1.0 tok/s |
| Decode speed (tokens/sec) | n/a | 57 tok/s | 38-60 tok/s |
| Speculative draft acceptance | n/a | 2/2 (100%) | 2/2 short · 23/42 (55%) long |
| BABILong 32k retrieval | n/a | 5/5 = 100% | untested today (probe yesterday) |
| BABILong 64k retrieval | n/a | 0/5 (timeouts + 500s) | not retested · prefill regression makes it worse |
| BABILong 128k retrieval | n/a | 0/3 (timeouts + 500s) | not retested · prefill regression makes it worse |
Two-day pattern: 2026-05-10 the backend was down (503). 2026-05-11 it came back at full performance (193 tok/s prefill, 1.2s short-prompt latency, all BABILong 32k samples correct). 2026-05-12 the endpoint is up but prefill is at 0.5% of yesterday's speed. Something changed between 2026-05-11 and 2026-05-12 in the Tundra deployment. Worth checking deploy logs / kernel config / GPU memory pressure in that window.
Each of the three APIs responds to requests. None are hard-down. But each is operationally non-functional for production use:
The common factor: the API code/infrastructure is functional, but the operational state (worker pools, key rotation, prefill performance) drifts between healthy and broken without obvious correlation to anything. Omnifact's job timer sits at elapsed_seconds: 0.0; the key-rotation item flagged in SESSION_STARTUP_2026_05_10.md is still open; Modulum's prefill is at 0.5% of yesterday's rate. This is the "old Hypernym problem": services that look up in monitoring but fail when actually exercised.
Ask 1 — Omnifact worker pool: job f4c50888-4493-4a17-8cab-47ba41baf150 was queued at zephyr.hypernym.ai and never dequeued. Check worker pool health and restart if stalled. If the worker process is alive but blocked, check the queue config, dead-letter queue, and Redis (or whatever message broker is in front of it).
Ask 2 — Repo Analyze key rotation: the current key in our keychain (used for Omnifact and tested against Repo Analyze) returns 401 on zephyr-b-gpu.hypernym.ai/api/repo/analyze. Session memory dated 2026-05-09 notes the rotation has been pending. Please provide the new key value so we can update the macOS keychain entry: `security add-generic-password -U -s HYPERNYM_REPO_INGEST_API_KEY -a $USER -w "<new-key>"`.
Ask 3 — Modulum prefill regression: same endpoint, same model file, yet prefill dropped from 193 tok/s on 2026-05-11 to 1 tok/s on 2026-05-12. Decode is unaffected (38-60 tok/s, consistent). Suspected causes: (a) GPU memory pressure forcing CPU-side prefill, (b) batch-size config regression, (c) cache-line alignment / KV-cache config issue, (d) an inadvertently flipped debug-mode flag. Check deploy logs between 2026-05-11 21:00 UTC and 2026-05-12 19:00 UTC; that window contains the regression event.
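When re-testing, prefill and decode rates can be separated from a single streamed response: time-to-first-token is dominated by prefill, and the rest of the wall time is decode. A sketch of that split; the function and the token counts in the example are illustrative, not from any Hypernym tooling.

```python
def split_rates(prompt_tokens: int, output_tokens: int,
                ttft_s: float, total_s: float) -> tuple[float, float]:
    """Estimate (prefill tok/s, decode tok/s) from one streamed request.

    ttft_s is time to first streamed token; total_s is total wall time.
    Prefill dominates ttft; the remaining time is attributed to decode.
    """
    prefill = prompt_tokens / ttft_s
    decode = output_tokens / (total_s - ttft_s)
    return prefill, decode

# Illustrative: a 20-token prompt taking 20 s before the first token,
# then 60 output tokens in the remaining 1.5 s.
print(split_rates(20, 60, 20.0, 21.5))  # (1.0, 40.0)
```

This matches today's signature: healthy decode sitting behind a prefill phase two orders of magnitude too slow.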
Run these after ops changes to confirm fix.
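As a concrete pass/fail gate for that re-test, a sketch of per-service checks derived from the healthy 2026-05-11 numbers. The threshold values (any nonzero timer, HTTP 200, 50% of the 193 tok/s baseline) are my suggestions, not agreed SLOs.

```python
def verify_fixes(omnifact_elapsed_s: float,
                 repo_analyze_status: int,
                 modulum_prefill_tok_s: float) -> dict[str, bool]:
    """Pass/fail per service after the three ops asks are actioned."""
    return {
        # Worker must have started the job timer.
        "omnifact_worker": omnifact_elapsed_s > 0.0,
        # New key must authenticate (no more structured 401).
        "repo_analyze_auth": repo_analyze_status == 200,
        # Prefill back within 50% of the 193 tok/s baseline.
        "modulum_prefill": modulum_prefill_tok_s >= 0.5 * 193,
    }

print(verify_fixes(0.0, 401, 1.0))  # today's readings: all False
```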
Hypernym's published benchmark (BABILong +9pp at 128k on Gemma 4 31B + Modulum, 2026-05-08 proof doc) is structurally valid. The architecture works. But external customers / partners attempting to reproduce the benchmark via the public API will hit the prefill regression and conclude the system doesn't work as advertised.
The R19 build proposal for Substrate Delta Masks + MTP compound assumes a working Tundra backend as the production validation surface. Phase 1 (baseline reproduction) needs the API to actually serve 64k+ contexts in finite time. Phase 5 (production hardening) was already scoped as critical-path; this latest regression confirms why.
Distribution timing: IHC benchmarking, Chris-direction R19 doc, and any Year-1 commercial wedge (Hypernym Router, Reasoning State SDK, Cursor partnership) all require the public API to be production-grade. The three asks above are the unblock-list.