Sensible Updates

PDFs are no longer where data goes to die

Josh Lewis and Ming Lu

There is a lot to like about paper documents and PDFs. They're easy for people to use and understand. Businesses use them to work with other businesses without pre-planning and coordination. Some documents are even elegant in their design and their representation of data. But as software eats the world, these documents are a major cause of indigestion.

We believe documents should be as accessible to software as they are to people, and that's why we started Sensible.

The challenge in making documents accessible to software is that documents are really just containers for diverse data. A single document might contain tables, paragraphs, boxes, labels, and images. As a result, the best practice for extracting structured data from documents is to use an ensemble of methods.
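To make the idea of an ensemble concrete, here is a minimal sketch in Python. It is not Sensible's implementation: the sample lines, the two extraction functions, and the field names are all hypothetical, and real documents would need far more methods than these. It only shows the shape of the approach, where several narrow methods are tried in turn because no single one fits every layout.

```python
import re
from typing import Callable, List, Optional

# Hypothetical sample: text lines as they might come out of OCR or a PDF text layer.
LINES = [
    "Policy Number: ABC-123",
    "Insured  Jane Doe",
    "Item      Qty   Price",
    "Widget    2     9.50",
]

# An extraction method takes the lines and a label, and returns a value or None.
Extractor = Callable[[List[str], str], Optional[str]]


def by_colon_label(lines: List[str], label: str) -> Optional[str]:
    """Match 'Label: value' on a single line."""
    pattern = re.compile(rf"\s*{re.escape(label)}\s*:\s*(.+)", re.IGNORECASE)
    for line in lines:
        match = pattern.match(line)
        if match:
            return match.group(1).strip()
    return None


def by_column_gap(lines: List[str], label: str) -> Optional[str]:
    """Match 'Label   value' where a run of 2+ spaces separates exactly two cells."""
    for line in lines:
        cells = re.split(r"\s{2,}", line.strip())
        if len(cells) == 2 and cells[0].lower() == label.lower():
            return cells[1]
    return None


def extract(lines: List[str], label: str, methods: List[Extractor]) -> Optional[str]:
    """Try each method in order and keep the first value found."""
    for method in methods:
        value = method(lines, label)
        if value is not None:
            return value
    return None


METHODS: List[Extractor] = [by_colon_label, by_column_gap]
result = {
    label: extract(LINES, label, METHODS)
    for label in ("Policy Number", "Insured", "Premium")
}
print(result)
```

Each method stays small and easy to reason about, and a field that no method can find comes back as None rather than as a wrong guess.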

For software developers, this creates a tremendous amount of work even when building on top of existing OCR APIs or text extracted directly from PDFs. Many companies have spent months of engineering effort to integrate external documents into their workflows.

For operations leaders, unlocking the data stored in documents too often requires people power. We know the pain of watching headcount grow linearly with servicing volume, whether for customer onboarding, data entry, compliance, or discovery.

In both cases, this effort is not part of the core value that the company is creating, but rather a technical and operational hurdle the company must clear to create value elsewhere.

With Sensible, developers can turn PDFs and document images into structured data in a single afternoon. In turn, this allows operations leaders to focus on high-skill, high-ROI work rather than routine document management.

We accomplish this by providing developers with a wide range of data extraction primitives in a powerful, configurable domain-specific language. These primitives put machine learning and natural language processing techniques at developers' fingertips, which provides transparency and control while being far more concise and less brittle than rule-based methods.

More broadly, Sensible is creating a service for developers to quickly map unstructured and imperfectly structured data to schemas. Documents are the most impactful initial application of this technology, but the same core challenge is present when working with audio, websites, third-party APIs, and other sources of data over which the developer does not have full control.
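As a rough illustration of what mapping imperfectly structured data to a schema involves, the sketch below takes loosely formatted strings and coerces them into typed fields. Everything in it is an assumption made for the example: the Invoice schema, the raw field names, and the accepted date formats are hypothetical and do not describe Sensible's API.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass
class Invoice:
    """Hypothetical target schema: the typed shape downstream software expects."""
    invoice_number: str
    issued_on: date
    total: Decimal


def parse_date(text: str) -> date:
    """Accept a few common date spellings; fail loudly on anything else."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def parse_money(text: str) -> Decimal:
    """Strip the currency symbol and thousands separators before converting."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def to_invoice(raw: dict) -> Invoice:
    """Map raw extracted strings (hypothetical keys) onto the schema."""
    return Invoice(
        invoice_number=raw["Invoice #"].strip(),
        issued_on=parse_date(raw["Date"]),
        total=parse_money(raw["Total Due"]),
    )


# Hypothetical raw values, as they might be read off a scanned invoice.
raw = {"Invoice #": " INV-0042 ", "Date": "March 5, 2021", "Total Due": "$1,250.00"}
invoice = to_invoice(raw)
print(invoice)
```

The same pattern applies to sources other than documents: the raw dictionary could just as well come from a scraped web page or a third-party API response, and the schema is what stays under the developer's control.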

Sensible is the only service you need to connect documents to software, and our API is live in private beta. We're working with companies in the logistics, insurance, real estate, and legal domains to make the documents they handle every day accessible to software. Reach out to us if you'd like to do the same.
