Volume 05 · Agentic · 2026

dealbrief

Five models in a row, each doing one thing well. None pretending to be the whole answer.

// five steps. one brief.
    const pipeline = [
      ["extract",         "haiku"],
      ["classify",        "haiku"],
      ["cross-reference", "voyage-3-lite"],
      ["compliance",      "haiku"],
      ["synthesize",      "sonnet · streaming"],
    ];

    // run sequentially, type the handoffs
    for (const [step, model] of pipeline) {
      await run(step, model);
    }

    // total: 4 LLM calls + 1 vector match
    // cost: ~$0.02 per transcript
    // time: 30-90s end-to-end

Fig. 5 — Pipeline at rest

Five steps · ~$0.02 per transcript

Role: Design & build

Year: 2026

Stack: Next · Anthropic · Voyage AI

Live: dealbrief ↗

Source: github ↗

Most products that touch a sales call do the same thing — paste a transcript, call a model, get a paragraph back. dealbrief asks what changes when you stop treating the call as a summary problem and start treating it as a pipeline.

A call is not one document. It's five: who was there, what they pushed back on, how those objections map to a playbook the rep can act on, what the rep said that legal will eventually flag, and — only at the end — a brief that synthesizes everything into something a sales manager would actually hand to a rep. Each is a different shape of work, and each gets its own pass through its own model. Three Haiku calls handle the structured extraction — typed entities, classified objections, compliance flags — because Haiku is fast and inexpensive when you ask it to fill out a Zod schema. Sonnet only runs once, on the synthesis step, where the prose actually has to read like something a senior wrote.
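The typed handoff described above can be sketched without any dependencies. The project names Zod for this job; the hand-written guard below only stands in for a Zod schema so the example runs on its own. The `Objection` shape, its category names, and `parseObjections` are all hypothetical, since the real schemas are not shown here.

```typescript
// Hypothetical shape for one classified objection (not taken from the project).
type ObjectionCategory = "pricing" | "timing" | "competitor" | "authority";

interface Objection {
  quote: string;
  category: ObjectionCategory;
  confidence: number; // expected to be within 0..1
}

const CATEGORIES: string[] = ["pricing", "timing", "competitor", "authority"];

// Type guard: true only when the value has every field with the right type and range.
function isObjection(value: unknown): value is Objection {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.quote === "string" &&
    typeof v.category === "string" &&
    CATEGORIES.indexOf(v.category) !== -1 &&
    typeof v.confidence === "number" &&
    v.confidence >= 0 &&
    v.confidence <= 1
  );
}

// Parse raw model text into typed data, or throw so the caller can retry or stop.
function parseObjections(raw: string): Objection[] {
  const data: unknown = JSON.parse(raw);
  if (!Array.isArray(data) || !data.every(isObjection)) {
    throw new Error("model output did not match the Objection schema");
  }
  return data;
}
```

Whatever does the validating, the design point is the same: a step either hands the next step data of a known shape or fails loudly, so nothing downstream has to read free text.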

The cross-reference step is where the pipeline gets honest about what's a reasoning problem and what isn't. Matching an objection to a playbook pattern isn't reasoning — it's lookup. Calling an LLM for similarity search would add seconds and cost to every run. Voyage AI's embeddings handle it in under five hundred milliseconds for a fraction of the cost, ranked by cosine similarity and filtered by category. The result is structured enough for Sonnet to consume in the next step.
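A minimal sketch of that lookup, assuming the objection and the playbook patterns have already been embedded. The `PlaybookPattern` shape, the `crossReference` helper, and the toy three-dimensional vectors are invented for illustration; real embeddings would come from the embedding model and have far more dimensions.

```typescript
// Hypothetical playbook entry; in practice the vector would come from an embedding model.
interface PlaybookPattern {
  id: string;
  category: string;
  embedding: number[];
}

interface Match {
  id: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Filter by category first, then rank the survivors by similarity, best first.
function crossReference(
  query: number[],
  category: string,
  playbook: PlaybookPattern[],
  topK = 3
): Match[] {
  return playbook
    .filter((p) => p.category === category)
    .map((p) => ({ id: p.id, score: cosineSimilarity(query, p.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy vectors, made up for illustration only.
const playbook: PlaybookPattern[] = [
  { id: "price-anchor", category: "pricing", embedding: [1, 0, 0] },
  { id: "roi-reframe", category: "pricing", embedding: [0.6, 0.8, 0] },
  { id: "pilot-offer", category: "timing", embedding: [1, 0, 0] },
];

const matches = crossReference([0.9, 0.1, 0], "pricing", playbook);
```

Here `matches` lists `price-anchor` ahead of `roi-reframe`, and `pilot-offer` never appears because the category filter removes it before any scoring happens. No model call is involved once the vectors exist, which is why this step behaves like lookup rather than reasoning.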

The synthesis streams. Sonnet writes the brief token by token over Server-Sent Events, which means the first sentence appears within two seconds of the step starting instead of thirty. It is the only place in the pipeline where the user sees the model thinking; everywhere else, the model returns typed objects the application treats as data. The prompt is plumbing. The schema is the contract. The stream is the courtesy.
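The wire format behind that stream is small enough to sketch. The snippet below assumes each token is wrapped in a JSON payload and that the stream ends with a `done` event; both are illustrative choices, not details confirmed by the project, and `toSseFrame`, `readFrames`, and `DONE_FRAME` are hypothetical names.

```typescript
// Encode one token as a Server-Sent Events frame. JSON-encoding the payload
// keeps a newline inside a token from breaking the single "data:" line.
function toSseFrame(token: string): string {
  return `data: ${JSON.stringify({ token })}\n\n`;
}

// Hypothetical end-of-stream marker; the event name is an assumption.
const DONE_FRAME = "event: done\ndata: {}\n\n";

// Client side: rebuild the text from whatever frames have arrived so far.
// Frames are separated by a blank line; only plain "data:" frames carry tokens.
function readFrames(wire: string): string {
  return wire
    .split("\n\n")
    .filter((frame) => frame.indexOf("data: ") === 0)
    .map((frame) => (JSON.parse(frame.slice("data: ".length)) as { token: string }).token)
    .join("");
}

// Stand-in for the model's output; a real stream would deliver these over time.
const tokens = ["The rep ", "handled pricing\n", "well."];
const wire = tokens.map(toSseFrame).join("") + DONE_FRAME;
```

Because every frame is complete on its own, the client can render after the first one arrives: `readFrames(toSseFrame("The rep "))` already yields readable text, which is what lets the first sentence appear long before the step finishes.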

“A call is not a summary problem. It's a pipeline problem. The interesting work begins on the second model call.”

Evidence — three details that earned their place

i.

Specialized models, not one big one

Three Haiku calls return validated structured data in under a second each, at a combined cost below the roughly two cents the full pipeline runs for. Sonnet earns its place on the one step where prose quality is the deliverable.

ii.

Embeddings are a tool, not a feature

Cross-reference uses Voyage AI's voyage-3-lite: under five hundred milliseconds, deterministic, category-filtered. Calling an LLM for what is fundamentally a similarity search would be slower, more expensive, and less reliable.

iii.

Streaming is honesty

The synthesis step takes thirty to seventy seconds. Streaming token-by-token tells the user the system is working. The same response delivered all at once at the end would feel broken.