Zoriah Cocio · Tucson, AZ · MMXXVI
Volume 05 · Agentic · 2026

dealbrief

Five models in a row, each doing one thing well. None pretending to be the whole answer.

Role
Design & build
Year
2026
Stack
Next · Anthropic · Voyage AI

Most products that touch a sales call do the same thing — paste a transcript, call a model, get a paragraph back. dealbrief asks what changes when you stop treating the call as a summary problem and start treating it as a pipeline.

A call is not one document. It's five: who was there, what they pushed back on, how those objections map to a playbook the rep can act on, what the rep said that legal will eventually flag, and, only at the end, a brief that synthesizes everything into something a sales manager would actually hand to a rep. Each is a different shape of work, and each gets its own pass through its own model. Three Haiku calls handle the structured extraction (typed entities, classified objections, compliance flags) because Haiku is fast and inexpensive when you ask it to fill out a Zod schema. Sonnet runs only once, on the synthesis step, where the prose actually has to read like something a senior wrote.
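To make "fill out a schema" concrete, here is a minimal sketch of the validation boundary for one extraction step. It is dependency-free on purpose: a hand-written check stands in for the Zod schema, and the `Objection` fields, the category names, and `parseObjections` are all assumptions for illustration, not dealbrief's actual types.

```typescript
// Hypothetical shape for one classified objection. Field names and the
// category list are assumptions, not taken from the project.
type ObjectionCategory = "pricing" | "timing" | "competitor" | "authority";

interface Objection {
  category: ObjectionCategory;
  quote: string;    // what the prospect actually said
  severity: number; // integer, 1 (mild) to 5 (deal-threatening)
}

const CATEGORIES: readonly string[] = ["pricing", "timing", "competitor", "authority"];

type ParseResult<T> = { ok: true; data: T } | { ok: false; error: string };

// Validate the raw JSON text a model returned before the application
// treats it as data. A schema library such as Zod would play this role.
function parseObjections(raw: string): ParseResult<Objection[]> {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch (err) {
    return { ok: false, error: "not valid JSON" };
  }
  if (!Array.isArray(value)) {
    return { ok: false, error: "expected an array" };
  }

  const out: Objection[] = [];
  for (let i = 0; i < value.length; i++) {
    const item: unknown = value[i];
    if (typeof item !== "object" || item === null) {
      return { ok: false, error: `item ${i}: expected an object` };
    }
    const { category, quote, severity } = item as Record<string, unknown>;
    if (typeof category !== "string" || CATEGORIES.indexOf(category) === -1) {
      return { ok: false, error: `item ${i}: unknown category` };
    }
    if (typeof quote !== "string" || quote.length === 0) {
      return { ok: false, error: `item ${i}: quote must be a non-empty string` };
    }
    if (
      typeof severity !== "number" ||
      Math.floor(severity) !== severity ||
      severity < 1 ||
      severity > 5
    ) {
      return { ok: false, error: `item ${i}: severity must be an integer from 1 to 5` };
    }
    out.push({ category: category as ObjectionCategory, quote, severity });
  }
  return { ok: true, data: out };
}
```

A failed parse is a value, not an exception, so the caller can decide whether to retry the model call or drop the step.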

The cross-reference step is where the pipeline gets honest about what's a reasoning problem and what isn't. Matching an objection to a playbook pattern isn't reasoning; it's lookup. Calling an LLM for similarity search would add seconds and cost to every run. Voyage AI's embeddings handle it in under five hundred milliseconds for a fraction of the cost: candidate patterns are filtered by category, then ranked by cosine similarity. The result is structured enough for Sonnet to consume in the next step.
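The lookup itself is small enough to sketch. The version below assumes the objection and every playbook pattern have already been embedded (the call to the embeddings API is left out), and the names `PlaybookPattern` and `matchPlaybook` and the `topK` default are hypothetical.

```typescript
// A playbook entry with a precomputed embedding. The shape is an assumption.
interface PlaybookPattern {
  id: string;
  category: string;
  embedding: number[];
}

interface Match {
  id: string;
  score: number; // cosine similarity, in [-1, 1]
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat a zero vector as "no similarity"
}

// Filter to the objection's category first, then rank what is left by similarity.
function matchPlaybook(
  objectionEmbedding: number[],
  category: string,
  patterns: PlaybookPattern[],
  topK: number = 3
): Match[] {
  return patterns
    .filter((p) => p.category === category)
    .map((p) => ({ id: p.id, score: cosineSimilarity(objectionEmbedding, p.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Filtering before ranking keeps a high-scoring pattern from the wrong category out of the brief, and the same inputs always produce the same ranking.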

The synthesis streams. Sonnet writes the brief token by token over Server-Sent Events, which means the first sentence appears within two seconds of the step starting rather than after the thirty-plus seconds the full brief takes. It is the only place in the pipeline where the user sees the model thinking; everywhere else, the model returns typed objects the application treats as data. The prompt is plumbing. The schema is the contract. The stream is the courtesy.
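On the wire, a Server-Sent Events stream is plain text: each event is one or more `field: value` lines closed by a blank line. The sketch below shows one way the framing could look on both ends. The `{ token }` payload, the `done` event name, and the function names are assumptions, not dealbrief's actual protocol.

```typescript
// Server side: encode one token as an SSE frame. JSON-encoding the payload
// escapes any newlines in the token, so a frame never contains a blank line.
function sseFrame(token: string): string {
  return "data: " + JSON.stringify({ token }) + "\n\n";
}

// A named event telling the client the brief is complete (hypothetical name).
const SSE_DONE = "event: done\ndata: {}\n\n";

interface ParsedStream {
  tokens: string[];
  done: boolean;
}

// Client side: split received text into frames and recover the tokens.
// Assumes the buffer holds only complete frames; a real client would keep
// any trailing partial frame and wait for the next chunk.
function parseSseFrames(buffer: string): ParsedStream {
  const tokens: string[] = [];
  let done = false;
  for (const block of buffer.split("\n\n")) {
    if (block.trim() === "") continue;
    let event = "message"; // the default event type in SSE
    let data = "";
    for (const line of block.split("\n")) {
      if (line.indexOf("event: ") === 0) {
        event = line.slice("event: ".length);
      } else if (line.indexOf("data: ") === 0) {
        data += line.slice("data: ".length);
      }
    }
    if (event === "done") {
      done = true;
      continue;
    }
    const payload = JSON.parse(data) as { token: string };
    tokens.push(payload.token);
  }
  return { tokens, done };
}
```

In a browser, `EventSource` does this parsing for you; the hand-written parser is here only to show what travels between server and client.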

A call is not a summary problem. It's a pipeline problem. The interesting work begins on the second model call.

Evidence — three details that earned their place
i.

Specialized models, not one big one

Three Haiku calls cost pennies and return validated structured data in under a second each. Sonnet earns its place on the one step where prose quality is the deliverable. The full pipeline runs for roughly two cents.

ii.

Embeddings are a tool, not a feature

Cross-reference uses Voyage AI's voyage-3-lite: under five hundred milliseconds, deterministic, category-filtered. Calling an LLM for what is fundamentally a similarity search would be slower, more expensive, and less reliable.

iii.

Streaming is honesty

The synthesis step takes thirty to seventy seconds. Streaming token-by-token tells the user the system is working. The same response delivered all at once at the end would feel broken.