Production AI beliefs

Stack opinions backed by systems I have operated

Strong opinions, lightly held. The point is not tool tribalism; it is knowing where correctness, cost, and recovery boundaries belong.

Production agents need explicit boundaries.

Free-form loops are useful for exploration, but production systems need inspectable state, typed interfaces, retries, and failure recovery.

Autonomy is valuable only when the surrounding system makes behavior observable and recoverable.

Evidence: DAG orchestration and task-level execution in the market research platform.
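The boundary idea above can be made concrete with a small sketch: a task that carries typed, inspectable state, bounded retries, and a recorded failure reason. This is an illustrative shape, not the platform's actual task model; the `Task` and `TaskState` names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class TaskState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Task:
    """One unit of work with explicit, inspectable execution state."""
    name: str
    fn: Callable[[dict], dict]
    max_retries: int = 2
    state: TaskState = TaskState.PENDING
    attempts: int = 0
    error: Optional[str] = None


def run_task(task: Task, payload: dict) -> Optional[dict]:
    """Execute with bounded retries; on exhaustion, fail loudly with state intact."""
    task.state = TaskState.RUNNING
    while task.attempts <= task.max_retries:
        task.attempts += 1
        try:
            result = task.fn(payload)
            task.state = TaskState.SUCCEEDED
            return result
        except Exception as exc:  # record the failure, then retry or give up
            task.error = str(exc)
    task.state = TaskState.FAILED
    return None
```

Because state and attempt counts live on the task object rather than inside a free-form loop, an operator can ask "which task failed, on which attempt, with what error" without replaying the run.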

Frameworks help prototypes, but architecture lives in boundaries.

LangChain and LangGraph can be useful, but durable systems need domain-specific evaluation, observability, retry, and cost boundaries.

The most useful question is not which library is fashionable but where correctness, state, and recovery live.

Evidence: Shared Python agent platform with multi-provider routing, tracing, and generated APIs.
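One boundary worth owning outside any framework is provider routing. A minimal sketch, assuming hypothetical `Provider` and `Router` types: try providers cheapest-first, skip any call that would blow the budget, and fall through on failure. Real routing would also weigh latency and capability, not just price.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Provider:
    """A model provider with a per-1K-token price and a call function."""
    name: str
    cost_per_1k_tokens: float
    call: Callable[[str], str]


class Router:
    """Cheapest-first routing with a hard spend ceiling and failover."""

    def __init__(self, providers: List[Provider], budget_usd: float):
        self.providers = sorted(providers, key=lambda p: p.cost_per_1k_tokens)
        self.budget = budget_usd
        self.spent = 0.0

    def complete(self, prompt: str, est_tokens: int = 1000) -> str:
        for provider in self.providers:
            est_cost = provider.cost_per_1k_tokens * est_tokens / 1000
            if self.spent + est_cost > self.budget:
                continue  # cost boundary: never exceed the budget
            try:
                out = provider.call(prompt)
                self.spent += est_cost
                return out
            except Exception:
                continue  # failover to the next provider
        raise RuntimeError("no provider within budget succeeded")
```

The point is that budget enforcement and failover are explicit, testable application code, not behavior inherited from a framework default.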

AI observability should be artifact-aware, not only prompt-aware.

Prompts matter, but debugging production AI needs spans, costs, retries, model routing, generated artifacts, and business correctness checks.

A pretty trace is only useful if it helps explain and fix the failure mode.

Evidence: OpenTelemetry and Langfuse-backed instrumentation for AI workflow execution.
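The difference between prompt-aware and artifact-aware instrumentation is what a span carries. This dependency-free sketch (a stand-in for an OpenTelemetry or Langfuse span, with hypothetical names) records model, cost, retries, and the generated artifact's id alongside timing, so a trace answers business questions, not just "what was the prompt".

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Span:
    """A span enriched with AI-specific attributes: cost, retries, artifacts."""
    name: str
    attributes: dict = field(default_factory=dict)
    duration_ms: float = 0.0


SPANS: List[Span] = []  # stand-in for a real exporter


@contextmanager
def ai_span(name: str, **attrs) -> Iterator[Span]:
    """Record timing plus whatever artifact-level attributes the caller attaches."""
    span = Span(name, dict(attrs))
    start = time.perf_counter()
    try:
        yield span
    finally:
        span.duration_ms = (time.perf_counter() - start) * 1000
        SPANS.append(span)
```

Usage mirrors how a workflow step would annotate its own execution:

```python
with ai_span("generate_report", model="small-model", cost_usd=0.02) as s:
    s.attributes["retries"] = 1
    s.attributes["artifact_id"] = "report-42"
```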

Intermediate representations make AI artifacts debuggable.

When AI generates complex deliverables, emitting the final format directly is fragile. An intermediate representation keeps the system inspectable, testable, and renderer-agnostic.

Deck IR separated reasoning about slide structure from rendering into HTML preview and native PowerPoint.

Evidence: Deck IR to HTML preview to native PPTX export in the market research platform.
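A toy version of the idea, with a hypothetical IR shape (the real Deck IR is richer): the model produces `Deck` objects, validation runs on the IR, and rendering to HTML (or, equally, PPTX) is a separate, swappable step.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slide:
    title: str
    bullets: List[str] = field(default_factory=list)


@dataclass
class Deck:
    """Renderer-agnostic intermediate representation of a presentation."""
    title: str
    slides: List[Slide] = field(default_factory=list)


def validate(deck: Deck) -> List[str]:
    """Quality checks run against the IR, before any rendering happens."""
    issues = []
    for i, slide in enumerate(deck.slides):
        if not slide.title:
            issues.append(f"slide {i}: missing title")
        if len(slide.bullets) > 6:
            issues.append(f"slide {i}: too many bullets")
    return issues


def render_html(deck: Deck) -> str:
    """One of several renderers; a PPTX renderer would consume the same IR."""
    parts = [f"<h1>{deck.title}</h1>"]
    for slide in deck.slides:
        parts.append(f"<section><h2>{slide.title}</h2><ul>")
        parts.extend(f"<li>{b}</li>" for b in slide.bullets)
        parts.append("</ul></section>")
    return "".join(parts)
```

Because structural checks target the IR, a bad deck is caught once, regardless of which renderer would have consumed it.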

Generated analytics need artifact boundaries.

AI products need generated analytical objects that can be inspected, edited in the web app, and translated into native presentation artifacts.

The hard part is joining generated Highcharts-style chart specs, editable web chart objects, quality checks, and native PPTX chart rendering without losing meaning.

Evidence: 15-25 Highcharts charts per report with multi-threshold quality scoring in the market research workflow.
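A sketch of what multi-threshold quality scoring can look like on a Highcharts-style spec. The checks, penalty weights, and thresholds here are illustrative, not the platform's actual rubric; only the keys (`title`, `series`, `xAxis`) follow Highcharts conventions.

```python
from typing import List, Tuple


def score_chart(spec: dict) -> Tuple[float, List[str]]:
    """Structural quality score for a Highcharts-style chart spec (hypothetical weights)."""
    score, issues = 1.0, []
    if not spec.get("title", {}).get("text"):
        score -= 0.3
        issues.append("missing title")
    series = spec.get("series", [])
    if not series:
        score -= 0.5
        issues.append("no series")
    elif any(not s.get("data") for s in series):
        score -= 0.2
        issues.append("empty series data")
    x_axis = spec.get("xAxis", {})
    if not x_axis.get("categories") and not x_axis.get("title"):
        score -= 0.1
        issues.append("unlabeled x axis")
    return max(score, 0.0), issues


def disposition(score: float, publish: float = 0.9, review: float = 0.6) -> str:
    """Multi-threshold routing: ship, send to human review, or regenerate."""
    if score >= publish:
        return "publish"
    if score >= review:
        return "needs_review"
    return "regenerate"
```

The two thresholds are the boundary that matters: they decide whether a generated chart ships, gets a human look, or goes back to the model.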

Unit economics are part of AI architecture.

A workflow that works but is too expensive to run is not production-ready.

Cost controls belong in routing, caching, retry limits, judge coverage, and infra choices, not in a post-launch spreadsheet.

Evidence: ML infrastructure cost reduction and cost-aware AI workflow patterns.
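Two of those controls, caching and cost accounting, fit in a few lines. A minimal sketch with hypothetical prices and names: identical requests hit a cache at zero marginal cost, and every uncached call is metered at the call site rather than reconstructed later from invoices.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical $/1K-token prices, not real provider pricing.
PRICES = {"small-model": 0.0004, "large-model": 0.01}


class CostTracker:
    """Meters spend per call and caches identical prompts to avoid repeat cost."""

    def __init__(self) -> None:
        self.total_usd = 0.0
        self.cache: Dict[str, str] = {}

    def complete(self, model: str, prompt: str,
                 call: Callable[[str], str], tokens: int) -> str:
        key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
        if key in self.cache:
            return self.cache[key]  # cache hit: zero marginal cost
        out = call(prompt)
        self.total_usd += PRICES[model] * tokens / 1000
        self.cache[key] = out
        return out
```

Because `total_usd` accumulates inside the workflow, unit economics become a metric the system can alert on, not a surprise in next month's bill.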