How Structured Context Improves LLM Trading Decisions

These are the research findings from 7 controlled backtests. They show a clear hierarchy: structured briefings outperform raw web search, which outperforms stale briefings, which outperform price data alone.

The Core Finding

LLMs are language processors, not calculators. When given raw numbers (price: $84,231, hash rate: 612 EH/s), they have no framework for interpretation. When given pre-analyzed context (price is 15% below 200-day MA, hash ribbon in capitulation, regime: risk-off), they can reason about what the data means.
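A minimal sketch of that pre-analysis step, assuming a hypothetical helper named describe_price_vs_ma and a made-up 200-day moving average chosen so the result matches the example above. It is not the PreReason implementation; it only illustrates turning a raw number into a relative statement.

```python
def describe_price_vs_ma(price: float, ma_200: float) -> str:
    """Turn a raw price into a relative statement a language model can reason about."""
    pct = (price - ma_200) / ma_200 * 100.0
    direction = "above" if pct >= 0 else "below"
    return f"price is {abs(pct):.0f}% {direction} 200-day MA"


# The moving-average value here is illustrative, not taken from the backtests.
print(describe_price_vs_ma(84_231, 99_095))  # prints "price is 15% below 200-day MA"
```

The raw figure ($84,231) carries no meaning on its own; the derived sentence states where the price sits relative to a reference level.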

The performance hierarchy across 7 runs: Treatment (structured briefings) > Control-WS (web search) > Placebo (stale briefings) > Control (price only).

The Three-Gear System

Gear 1: Briefings (The Context Layer)

The context layer consists of four modular briefings: btc.energy (mining economics), cross.regime (regime classification), cross.breadth (market breadth), and btc.momentum (trend signals). Each briefing contains pre-computed trend directions, percentile rankings, confidence scores, and signal interpretations. The model receives structured intelligence, not raw data.
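One way to picture a briefing is as a small typed record rendered to text before it reaches the model. The sketch below is an assumption about shape only: the class names, field names, and all example values are hypothetical and are not the actual PreReason schema or real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    trend: str           # e.g. "rising", "falling", "flat"
    percentile: float    # 0-100 rank against the signal's own history
    confidence: float    # 0.0-1.0
    interpretation: str  # plain-language reading of the signal


@dataclass(frozen=True)
class Briefing:
    briefing_id: str     # e.g. "btc.energy"
    signals: tuple[Signal, ...]

    def render(self) -> str:
        """Render the briefing as text that can be placed in a prompt."""
        lines = [f"[{self.briefing_id}]"]
        for s in self.signals:
            lines.append(
                f"- {s.name}: trend={s.trend}, percentile={s.percentile:.0f}, "
                f"confidence={s.confidence:.2f}; {s.interpretation}"
            )
        return "\n".join(lines)


# Illustrative values only; they are not results from any run.
example = Briefing(
    briefing_id="btc.energy",
    signals=(
        Signal("hash_ribbon", "falling", 12.0, 0.80, "hash ribbon in capitulation"),
    ),
)
print(example.render())
```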

Gear 2: The Preamble (The Strategy Framework)

The preamble is a signal weighting guide that tells the model how to prioritize signals across three tiers: regime signals (highest weight), structural signals (medium weight), and tactical signals (lowest weight). It also sets position sizing rules and risk management guidelines. The preamble turns briefing data into a decision framework.
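The preamble itself is guidance given to the model, not a formula. Still, the tier ordering can be illustrated numerically. The sketch below assumes hypothetical weights (0.5, 0.3, 0.2) and a hypothetical weighted_bias helper; only the ordering regime > structural > tactical comes from the text.

```python
# Hypothetical weights; the source specifies only the priority order.
TIER_WEIGHTS = {"regime": 0.5, "structural": 0.3, "tactical": 0.2}


def weighted_bias(signals: list[tuple[str, float]]) -> float:
    """Combine (tier, score) pairs into a weighted average.

    Each score is assumed to lie in [-1, 1], where -1 is fully bearish
    and +1 is fully bullish.
    """
    total_weight = sum(TIER_WEIGHTS[tier] for tier, _ in signals)
    if total_weight == 0:
        return 0.0
    return sum(TIER_WEIGHTS[tier] * score for tier, score in signals) / total_weight


# A bearish regime signal outweighs a bullish tactical signal.
bias = weighted_bias([("regime", -1.0), ("tactical", 1.0)])
print(f"net bias: {bias:+.2f}")
```

With these assumed weights, a risk-off regime reading dominates a conflicting short-term signal, which is the behavior the tiering is meant to encourage.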

Gear 3: Portfolio Execution (The Harness)

The harness simulates trade mechanics with a realistic fee structure (0.045% per transaction, 0.01% funding per 8h), position limits (80% gross exposure), and deterministic portfolio carryover. The same harness processes all arms identically.
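The fee and limit arithmetic can be sketched as follows. The rates and the 80% cap come from the text; the function names are hypothetical, and the sketch assumes a single-asset portfolio, fees charged on traded notional, and funding always treated as a cost charged once per completed 8-hour period. The actual harness may differ on each of these points.

```python
TX_FEE_RATE = 0.00045         # 0.045% per transaction
FUNDING_RATE_PER_8H = 0.0001  # 0.01% per 8-hour period
MAX_GROSS_EXPOSURE = 0.80     # 80% of equity


def cap_notional(equity: float, requested_notional: float) -> float:
    """Clamp a requested position (long > 0, short < 0) to the exposure limit."""
    limit = MAX_GROSS_EXPOSURE * equity
    return max(-limit, min(limit, requested_notional))


def trade_fee(notional_traded: float) -> float:
    """Transaction fee on the notional amount traded."""
    return abs(notional_traded) * TX_FEE_RATE


def funding_cost(position_notional: float, hours_held: int) -> float:
    """Funding charged once per completed 8-hour period (assumed always a cost)."""
    periods = hours_held // 8
    return abs(position_notional) * FUNDING_RATE_PER_8H * periods


# Illustrative numbers: $10,000 equity, a requested $9,000 long held for 24 hours.
position = cap_notional(10_000, 9_000)
print(f"position after cap: {position:.2f}")
print(f"entry fee: {trade_fee(position):.2f}")
print(f"funding over 24h: {funding_cost(position, 24):.2f}")
```

In this example the request is cut to $8,000 by the exposure cap, the entry fee is about $3.60, and three funding periods cost about $2.40.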

Key Findings

Briefing evolution matters -- Modular briefings (Run 4+) tripled the treatment delta compared to monolithic briefings (Runs 1-3). Adding btc.energy and btc.momentum to the set contributed the largest improvement.
The defensive edge -- Treatment arms detected regime shifts 1-2 ticks before crashes. Short entries on Nov 8, 2025 and Feb 3, 2026 captured the majority of the alpha. The edge is concentrated in crash avoidance.
Model-agnostic -- Both Opus 4.6 and Sonnet 4.5 showed positive treatment deltas with the same briefings. The edge comes from the context, not the model.
Structured > raw information -- Control-WS (web search) beat Control (price only) by 10pp, and Treatment beat Control-WS by a further 2 to 9pp. Having information helps. Having structured analysis helps more.

Implications

Context pipeline matters more than the model. The same briefings produced positive deltas on both Opus 4.6 and Sonnet 4.5. Improving the context pipeline (modular briefings, better signals) had a larger impact than switching models.
Raw information helps, but structured analysis helps more. Web search gave agents access to real-time headlines and data, but without an analytical framework, agents overreacted to noise. PreReason briefings provided the framework.
Pre-analyzed context outperforms unstructured context. The gap between Treatment and Control-WS is the commercial case for PreReason: the value is not just in having data, but in having it pre-analyzed with trend directions, regime classification, and signal hierarchy.

Explore

Evidence Hub -- All 7 runs with aggregate results
Methodology -- 4-arm RCT design and controlled variables
Tick-by-Tick -- Every decision with full reasoning