Follow the trace clues, choose a root cause, then compare your answer with the diagnosis and production fix.
Wrong Model Routing (medium scenario)
Report cost increased unexpectedly while quality stayed flat.
Representative sanitized scenario. Customer data, private prompts, internal traces, and exact company costs omitted.
plan_insights (llm, 1800 ms, 6 units): Planner selected 24 low-risk summary tasks after schema inspection.
summary.rewrite (llm, 9200 ms, 64 units): Premium model handled 240 low-value summary rewrites that should have used a cheaper route.
judge.sample_high_risk (judge, 2400 ms, 9 units): Judge coverage stayed correctly limited to high-risk insights.
deck.render (render, 3100 ms, 4 units): Deck rendered normally, so the cost anomaly came from model routing, not rendering.
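One way to triage a trace like this is to rank spans by their share of total cost units and then check the routing decision behind the top span. The following is a minimal Python sketch, assuming the four spans above make up the whole trace. The `Span` type, `cost_shares`, and the `choose_model_tier` rule with its `"cheap"`/`"premium"` tiers are hypothetical illustrations, not the production router.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    kind: str
    latency_ms: int
    cost_units: int


# Span values copied from the trace; cost units are the trace's own abstract units.
TRACE = [
    Span("plan_insights", "llm", 1800, 6),
    Span("summary.rewrite", "llm", 9200, 64),
    Span("judge.sample_high_risk", "judge", 2400, 9),
    Span("deck.render", "render", 3100, 4),
]


def cost_shares(spans):
    """Return (span name, share of total cost units), largest share first."""
    total = sum(s.cost_units for s in spans)
    shares = [(s.name, s.cost_units / total) for s in spans]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Hypothetical routing rule (an assumption for illustration): low-risk
# summary work goes to a cheaper tier; everything else stays on premium.
def choose_model_tier(task_type: str, high_risk: bool) -> str:
    if task_type == "summary" and not high_risk:
        return "cheap"
    return "premium"


if __name__ == "__main__":
    for name, share in cost_shares(TRACE):
        print(f"{name:<24} {share:.0%}")
    print("summary, low risk ->", choose_model_tier("summary", high_risk=False))
```

Here `summary.rewrite` accounts for 64 of the 83 units (about 77%). Flat quality combined with a single dominant LLM span points at the route rather than the prompt or the renderer.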
Judge False Positive (hard scenario)
An insight was marked verified, but a business reviewer flagged the conclusion as wrong.
analysis.select_denominator (llm, 2100 ms, 8 units): Generated code used all respondents instead of the filtered buyer segment.
sandbox.execute (sandbox, 3400 ms, 7 units): Python executed successfully and produced a valid table.
judge.verify_execution (judge, 2800 ms, 10 units): Judge verified code execution but did not validate question intent.
narrative.summarize (llm, 1800 ms, 5 units): Narrative over-trusted the verified table and missed the denominator mismatch.
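The gap here is that a successful execution says nothing about whether the denominator matches the question. The following is a minimal Python sketch of an intent-level check, assuming the question asks for a share within the buyer segment. `Respondent`, `share_who_chose`, `check_denominator`, the segment labels, and the toy counts are all hypothetical. They exist only to show how the two denominators diverge while both runs succeed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Respondent:
    segment: str        # illustrative labels: "buyer" or "non_buyer"
    chose_option: bool  # hypothetical yes/no answer being summarized


def share_who_chose(rows):
    """Fraction of rows that chose the option; the denominator is len(rows)."""
    if not rows:
        raise ValueError("empty denominator")
    return sum(r.chose_option for r in rows) / len(rows)


def check_denominator(rows, segment, used_denominator):
    """Hypothetical intent check: the denominator must equal the size of the
    filtered segment, independent of whether the code ran without errors."""
    expected = sum(1 for r in rows if r.segment == segment)
    return used_denominator == expected


if __name__ == "__main__":
    # Toy data, not real survey results.
    rows = (
        [Respondent("buyer", True)] * 30
        + [Respondent("buyer", False)] * 10
        + [Respondent("non_buyer", False)] * 60
    )
    buyers = [r for r in rows if r.segment == "buyer"]
    print(f"all respondents: {share_who_chose(rows):.0%}")
    print(f"buyers only:     {share_who_chose(buyers):.0%}")
    print("denominator OK?", check_denominator(rows, "buyer", used_denominator=len(rows)))
```

With the toy data, the all-respondent denominator gives 30% and the buyer-only denominator gives 75%. Both computations execute cleanly, so an execution-only judge cannot tell them apart. A check tied to the question's filter can.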