01 · A working pipeline that cost too much to scale at high volume
The Challenge
The customer's job was straightforward to describe: read a piece of unstructured text (article, mention, social post), extract every named entity referenced, and normalize each to a canonical entry in a database of ~250K records - with disambiguation for the cases where the same surface form could map to several different entries depending on context.
A frontier model with a carefully tuned prompt did the job at the customer's target F1. The prompt was already locked. The eval suite was already built. The model worked.
The bill was the problem. At the volume the customer wanted to ship at, the per-token cost on the frontier-API tier didn't survive contact with the unit economics. Swapping in a smaller frontier model (Gemini 2.5 Flash Lite) with the prompt unchanged sent F1 well below the bar - saving the cost but breaking the product. The team was stuck on the expensive tier, looking for a way to keep the quality without keeping the bill.





