We've just released our 2026 Buyer's Guide to Intelligent Document Processing, and if you're a technical leader evaluating document extraction solutions—or if you've already built something internally using GPT or Claude—you need to read this.
Here's the uncomfortable truth we lay out in the guide: LLMs have made document extraction easier to start, but they haven't solved the fundamental problems of determinism, reliability, and governance. The future belongs to hybrid architectures that combine LLM reasoning with deterministic validation—and pure LLM approaches are costing teams far more than they realize.
The hidden tax on internal builds
Most teams calculate the cost of building document extraction in-house by estimating the initial build time—maybe a few weeks or a month for an engineer to wrap an LLM API. But what they miss is what we call the "prompt janitor" problem.
Every time Anthropic deprecates a model, every time you need to normalize schemas across a wide variety of document formats, every time you want to post-process derived values from the extracted data, it becomes a janitorial problem. A platform vendor absorbs this burden. An internal team owns it forever.
Our analysis shows the true Year 1 cost of an internal build runs between $140,000 and $210,000.
Sensible founder Josh Lewis learned that lesson firsthand at Newfront (recently acquired for $1.3B by Willis Towers Watson). He says:
“As one of the early engineers, I actually built document automation for the company—and then killed the project. I realized: why are we building PDF automation when we're not going to sell it? If you're building something that is not part of the core value proposition of your product, you're probably wasting your engineering dollars.”
That experience became the genesis for Sensible, but the lesson applies broadly: most companies shouldn't be building document extraction any more than they should be building their own database.
The hybrid future
The guide identifies six critical failure modes that teams encounter when moving LLM-based extraction from prototype to production. Its answer to these failures isn't to abandon AI; it's to combine LLM reasoning with deterministic extraction and validation in hybrid systems.
“Any workflow where the cost of an error is high needs more than just a single point of failure. If you're using an LLM to do the initial document structuring, you want to come at it from another angle to verify it—whether that's a second observer LLM, deterministic validations, or human review.”
Josh Lewis, founder of Sensible
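One way to picture that second angle: the LLM produces structured output, and deterministic checks gate whether it can be accepted. The sketch below is illustrative only, not Sensible's implementation; `llm_output`, the field names, and the tolerance are assumptions, and a hard-coded dict stands in for a real model call.

```python
# Hypothetical LLM output for an invoice. In a real pipeline this JSON
# would come from a model call; it is hard-coded here for illustration.
llm_output = {
    "invoice_number": "INV-1042",
    "subtotal": 480.00,
    "tax": 38.40,
    "total": 520.00,  # deliberately inconsistent: 480.00 + 38.40 = 518.40
}

def validate(doc: dict) -> list[str]:
    """Deterministic checks applied after LLM extraction."""
    errors = []
    # Schema check: every required field must be present.
    for field in ("invoice_number", "subtotal", "tax", "total"):
        if field not in doc:
            errors.append(f"missing field: {field}")
            return errors
    # Cross-field arithmetic check: amounts must reconcile to the cent.
    if abs(doc["subtotal"] + doc["tax"] - doc["total"]) > 0.005:
        errors.append("subtotal + tax does not equal total")
    return errors

errors = validate(llm_output)
if errors:
    # Route to a second observer LLM or human review instead of accepting.
    print("flag for review:", errors)
```

The point is that the validation layer is boring, cheap, and exactly repeatable, so an LLM's plausible-but-wrong answer can't flow silently downstream.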
To help you evaluate these hybrid systems, the guide maps out:
- How to evaluate build vs. buy decisions
- The real cost differences between per-page, per-document, and token-based pricing
- A complete decision framework for evaluating intelligent document processing (IDP) platforms
- Industry-specific deep dives showing where hybrid IDP proves critical
From script to infrastructure
Our central thesis: Document processing is shifting from software to infrastructure. By 2030, successful organizations will treat document intelligence with the same operational rigor they apply to databases, authentication systems, and message queues—complete with versioning, monitoring, SLAs, and testing. Teams that continue treating document extraction as a collection of scripts will find themselves perpetually cleaning up after model updates.
"Document automation is infrastructure. It should be robust, accurate, and audited."
Josh Lewis, founder of Sensible
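What that rigor looks like day to day can be as simple as a pinned regression test. The sketch below is a minimal illustration under stated assumptions: `run_extraction`, `GOLDEN`, and the fixture path are hypothetical stand-ins, with a hard-coded return in place of the real pipeline.

```python
# Hypothetical golden snapshot: extraction output that was reviewed and
# frozen; in practice it lives in version control next to the config.
GOLDEN = {"policy_number": "POL-7781", "premium": 1250.00}

def run_extraction(pdf_path: str) -> dict:
    # Stand-in for the real pipeline (LLM pass plus deterministic
    # post-processing); hard-coded here so the sketch is runnable.
    return {"policy_number": "POL-7781", "premium": 1250.00}

def regression_test(pdf_path: str) -> bool:
    """Fail loudly when a model or config update changes known-good output."""
    return run_extraction(pdf_path) == GOLDEN

assert regression_test("fixtures/policy_7781.pdf")
```

Running a suite of these against real documents on every model or config change is the extraction-world equivalent of a database migration test: drift is caught in CI, not in production.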
Get the full guide
This is just a glimpse of what's in the full 14-page guide. We cover vendor positioning, pricing model breakdowns, security considerations, regression testing strategies, and industry-specific examples from companies that got this right (and wrong).
Download the complete 2026 Buyer's Guide to Intelligent Document Processing →
Whether you're evaluating vendors or defending an internal build, this guide will give you the framework to make the right architectural decision for your organization.
