Same briefings on a different model. Confirms that the edge comes from the briefing content, not the specific LLM.
Key Statistics
Period: Sep 1 - Nov 22, 2025
Ticks: 83
Arms: 3
Model: Sonnet 4.5
BTC Move: -21.4%
Arm Returns
Arm: Return
Treatment (Briefings): -2.03%
Control (Price Only): -6.49%
Placebo (Stale Briefings): -6.79%
Deltas
Treatment vs Control: +4.46pp
Information Value (Treatment minus Placebo): +4.76pp
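The two deltas follow directly from the arm returns. As a minimal sketch of that arithmetic, the snippet below subtracts the reported percentage returns and expresses the result in percentage points. The names `returns` and `delta_pp` are illustrative assumptions, not part of the experiment's actual tooling.

```python
# Arm returns in percent, copied from the Arm Returns section.
returns = {
    "treatment": -2.03,  # Treatment (Briefings)
    "control": -6.49,    # Control (Price Only)
    "placebo": -6.79,    # Placebo (Stale Briefings)
}


def delta_pp(a: float, b: float) -> float:
    """Difference between two percentage returns, in percentage points (hypothetical helper)."""
    return round(a - b, 2)


treatment_vs_control = delta_pp(returns["treatment"], returns["control"])
information_value = delta_pp(returns["treatment"], returns["placebo"])

print(f"Treatment vs Control: {treatment_vs_control:+.2f}pp")
print(f"Information Value (Treatment minus Placebo): {information_value:+.2f}pp")
```

Because both deltas subtract one percentage return from another, they are reported in percentage points (pp) rather than as relative percentages.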
What Changed
Switched from Opus 4.6 to Sonnet 4.5. Everything else identical to Run 1.
Market Context
BTC declined from $108K to roughly $85K between Sep 1 and Nov 22, 2025, with the steepest drop occurring in the final week. This window ends 8 days earlier than Run 1, cutting off just before the November selloff bottomed. The macro backdrop featured rising Treasury yields and broad risk aversion.
Observations
The only variable changed was the model, from Opus 4.6 to Sonnet 4.5. The treatment arm still outperformed control by +4.46pp, confirming that the edge comes from briefing content rather than model-specific reasoning. Sonnet 4.5 is a smaller, faster model, yet it extracted nearly the same value from the same briefings.
The control arm on Sonnet 4.5 behaved similarly to Run 1's control: it held long through most of the window, occasionally trimming on large red candles but never going short. Its -6.49% return was slightly better than Run 1's control (-8.44%), likely because this window ends earlier and misses the deepest part of the November selloff.
The placebo arm performed worse here (-6.79%) than in Run 1 (-5.19%). With stale briefings, Sonnet 4.5 appeared more confident in directional calls that turned out to be wrong: it went long aggressively in late October based on outdated regime signals, then held through the drawdown. This hints that smaller models may be more susceptible to stale data than larger ones.