In the healthcare industry, the CMS-1500 form (formerly known as the HCFA-1500) is a standardized paper claim form used by healthcare providers to bill Medicare, Medicaid, and most other insurance carriers. For companies in healthcare tech, automatically extracting data from these forms is critical for streamlining claims processing, reducing manual entry errors, and accelerating reimbursement timelines.
Enter Sensible. With Sensible, you can easily parse key information from CMS-1500 forms using SenseML, Sensible's query language for extracting data from documents. We've written a library of open-source SenseML configurations, so you don't need to write queries from scratch for common documents. From there, your extracted healthcare data is accessible via API, Sensible's UI, or thousands of other software integrations through Zapier.
Note that while Sensible offers powerful LLM-based SenseML methods to parse these documents, CMS-1500 forms have a standardized layout that makes them excellent candidates for our layout-based methods. These methods are not only fast but also extremely accurate for forms with consistent structures. So, this tutorial will focus on layout-based methods to extract from CMS-1500 forms.
What we'll cover
This blog post will walk you through extracting specific pieces of information from an example CMS-1500 form:

By the end, you'll know several SenseML methods and you'll be on your way to extracting any data you choose using our documentation or our prebuilt open-source configurations.
Here’s the example document we’ll use with dummy patient data:

To follow along, you can sign up for a Sensible account, then import an example CMS-1500 PDF and prebuilt open-source configurations directly to the Sensible app using the docs for Out-of-the-box extractions.
Our configurations for CMS-1500 extractions are comprehensive. To keep the example in this post simple, let's extract just the:
- carrier name
- patient name
- patient’s marital status
- lines of service
Identify form revision with fingerprints
First, let's identify the revision number (08-05) for the CMS-1500 form in order to optimize the extraction process. We’ll use fingerprints to do so. (Note that classifying the form generally as a CMS-1500 happens upstream and isn’t covered in this tutorial.)
This fingerprint tests for the CMS-1500 revision by checking that every page contains the text "FORM CMS-1500 (08-05)" at the end of a line. If the test passes, Sensible uses the set of queries written for that revision. Running this check first lets Sensible choose the appropriate queries before it attempts to extract data from the document.
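To make the logic of that test concrete, here is a minimal Python sketch of the same check: every page must contain at least one line ending with the revision text. This is an illustration of the idea only, not Sensible's implementation; the function name `passes_fingerprint` and the sample page contents are hypothetical.

```python
def passes_fingerprint(pages, marker="FORM CMS-1500 (08-05)"):
    """Return True only if every page has a line that ends with `marker`.

    `pages` is a list of pages, and each page is a list of text lines.
    """
    if not pages:
        return False  # nothing to confirm on an empty document
    return all(
        any(line.rstrip().endswith(marker) for line in page)
        for page in pages
    )


# Hypothetical page text, one list of lines per page.
revision_0805 = [["HEALTH INSURANCE CLAIM FORM", "PLEASE PRINT OR TYPE FORM CMS-1500 (08-05)"]]
other_revision = [["HEALTH INSURANCE CLAIM FORM", "PLEASE PRINT OR TYPE FORM 1500 (02-12)"]]

print(passes_fingerprint(revision_0805))   # first document matches the revision text
print(passes_fingerprint(other_revision))  # second document does not
```

A document that fails the check would simply be routed to a different set of queries.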
Extract the carrier information
Let's extract the carrier information in the top right of the form:

Here are the queries we’ll use:
This field uses the Region method to extract carrier information. We anchor on the word "CARRIER" and define a rectangular region (2.5 × 1 inches) that's positioned 3 inches to the left and 0.6 inches up from the anchor (displayed as a green rectangular overlay in the preceding screenshot). The Region method extracts all text within this defined area. The sortLines parameter ensures that if there are multiple lines of text in this region, they're read in the correct order.
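The geometry described above can be sketched in a few lines of Python. This is a simplified illustration, not Sensible's implementation. It assumes coordinates in inches measured from the page's top-left corner, treats the offsets as locating the region's top-left corner relative to the anchor, and decides containment using only each line's top-left corner. The `Line` type, the function name, and the sample text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    x: float  # left edge, in inches from the page's left edge
    y: float  # top edge, in inches from the page's top edge


def region_text(lines, anchor, offset_x=-3.0, offset_y=-0.6, width=2.5, height=1.0):
    """Collect the text of lines whose top-left corner falls inside the region."""
    left = anchor.x + offset_x   # 3 inches to the left of the anchor
    top = anchor.y + offset_y    # 0.6 inches above the anchor
    inside = [
        ln for ln in lines
        if left <= ln.x <= left + width and top <= ln.y <= top + height
    ]
    # Sort top-to-bottom, then left-to-right, so multi-line text reads in order.
    inside.sort(key=lambda ln: (ln.y, ln.x))
    return " ".join(ln.text for ln in inside)


# Hypothetical positions; lines are deliberately listed out of order.
anchor = Line("CARRIER", 8.0, 1.0)
lines = [
    Line("ANYTOWN, ST 00000", 5.2, 0.9),
    Line("EXAMPLE INSURANCE CO", 5.2, 0.5),
    Line("HEALTH INSURANCE CLAIM FORM", 0.5, 1.2),  # outside: too far left
    Line("PO BOX 0000", 5.2, 0.7),
    Line("1. MEDICARE", 5.2, 1.8),                  # outside: below the region
]

print(region_text(lines, anchor))
```

The sort step plays the role the text attributes to sortLines: without it, the lines would come back in whatever order they were stored.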
Extracted value:
Extract the patient's name
Now let's extract the patient's name from the form:

This field uses the Label method to extract the patient's name. We anchor on the text "patient's name" and specify that we want to extract the text directly below this anchor. The editDistance parameter allows for minor OCR errors in the anchor text, making the extraction more robust with scanned documents.
The Label method is well suited to the CMS-1500 form, because much of the data is structured in a label-value format, where a label (like "patient's name") appears near the actual data we want to extract.
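To show why an edit-distance tolerance helps with scanned documents, here is a small Python sketch of fuzzy anchor matching based on Levenshtein distance. It is not how Sensible implements editDistance; the sliding-window approach, the function names, and the tolerance of 2 are assumptions chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_contains(line, anchor, max_distance=2):
    """True if some anchor-length window of `line` is within `max_distance` edits."""
    line, anchor = line.lower(), anchor.lower()
    n = len(anchor)
    if len(line) <= n:
        return edit_distance(line, anchor) <= max_distance
    return any(
        edit_distance(line[i:i + n], anchor) <= max_distance
        for i in range(len(line) - n + 1)
    )


clean = "2. PATIENT'S NAME (Last Name, First Name, Middle Initial)"
noisy = "2. PAT1ENT'S NANE (Last Name, First Name, Middle Initial)"  # two OCR slips
other = "4. INSURED'S NAME (Last Name, First Name, Middle Initial)"

for text in (clean, noisy, other):
    print(fuzzy_contains(text, "patient's name"), text)
```

With a tolerance of 2, the noisy label still matches while the similar-looking "INSURED'S NAME" label does not, which is the trade-off to keep in mind when choosing a tolerance.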
Extracted value:
Extract patient's marital status
To determine the patient's marital status, we’ll check if the "Single" checkbox is selected:

This field uses the Nearest Checkbox method to determine if the "Single" checkbox is selected. We anchor on the text "single" and search for the nearest checkbox to the right of this text.
The Nearest Checkbox method can handle a wide variety of checkbox formats. It uses the document's own metadata about form fields if available, and otherwise falls back to advanced OCR to detect checkbox selections.
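The "nearest checkbox to the right" search can be pictured with a short Python sketch. It covers only the selection step and assumes the checkboxes and their checked states have already been detected. It is not Sensible's implementation, and the `Checkbox` type, function name, and coordinates are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Checkbox:
    x: float        # horizontal position, in inches from the page's left edge
    y: float        # vertical position, in inches from the page's top edge
    checked: bool


def nearest_checkbox_right(anchor_x, anchor_y, boxes):
    """Return the closest checkbox that lies to the right of the anchor point."""
    candidates = [box for box in boxes if box.x > anchor_x]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda box: math.hypot(box.x - anchor_x, box.y - anchor_y),
    )


# Hypothetical layout: the anchor point is the right edge of the word "Single".
boxes = [
    Checkbox(3.9, 3.0, False),  # a box to the left of the anchor, ignored
    Checkbox(4.6, 3.0, True),   # the box next to "Single"
    Checkbox(5.3, 3.0, False),
    Checkbox(6.0, 3.0, False),
]

match = nearest_checkbox_right(4.5, 3.0, boxes)
print(match.checked if match else None)
```

Restricting candidates to one direction matters on a dense form, where a closer checkbox belonging to a neighboring label could otherwise be picked.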
Extracted value:
Extract service line items
CMS-1500 forms can have multiple service line items in field 24:

We can extract these using Sections:
This field uses the Sections method to extract multiple service line items from field 24. For each service line, you can define multiple subfields that extract specific pieces of information like service dates, place of service, procedure codes, and charges.
The Sections method is powerful for handling repeating data structures. It allows us to define a range where these sections appear and then extract consistent data from each section.
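As a rough Python illustration of that pattern, the sketch below finds a range between a start marker and a stop marker, then applies the same subfield extraction to each repeated item inside it. It is heavily simplified and is not Sensible's implementation: it assumes each service line arrives as a single line of whitespace-separated text, and the marker strings, function names, subfield names, and dummy values are hypothetical.

```python
def extract_sections(lines, start_marker, stop_marker, parse_section):
    """Apply `parse_section` to each non-blank line between the two markers."""
    sections = []
    in_range = False
    for line in lines:
        if not in_range:
            in_range = start_marker in line  # the marker line itself is skipped
            continue
        if stop_marker in line:
            break
        if line.strip():
            sections.append(parse_section(line))
    return sections


def parse_service_line(line):
    """Split one service line into the same subfields every time."""
    service_date, place, procedure, charges = line.split()
    return {
        "service_date": service_date,
        "place_of_service": place,
        "procedure_code": procedure,
        "charges": charges,
    }


# Dummy document text for illustration only.
lines = [
    "21. DIAGNOSIS OR NATURE OF ILLNESS OR INJURY",
    "24. A. DATE(S) OF SERVICE",
    "01/02/2024 11 99213 150.00",
    "01/09/2024 11 99214 200.00",
    "25. FEDERAL TAX I.D. NUMBER",
    "text after the stop marker is never parsed",
]

service_lines = extract_sections(
    lines, "24. A. DATE(S) OF SERVICE", "25. FEDERAL TAX", parse_service_line
)
for item in service_lines:
    print(item)
```

Each service line produces a dictionary with identical keys, which is what makes the result easy to load into a table or a claims system.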
Extracted data:
Putting it all together
When you run this configuration against a CMS-1500 form, Sensible extracts all the defined fields and returns them in a structured JSON format that's ready to be integrated with your systems.
Sample output for the extracted fields covered in this tutorial:
Extract more data
We've covered how to extract a few key pieces of data from CMS-1500 forms. Our prebuilt configuration extracts much more information, including insurance details, diagnosis codes, referring provider information, and billing provider details. That full extraction coverage enables use cases such as:
- Automated claims processing
- Real-time eligibility verification
- Integration with EHR systems
- Compliance and audit preparation
Start extracting
Congratulations, you've learned some key methods for extracting structured data from CMS-1500 forms! There's more extraction power for you to uncover. Book a demo or check out our managed services for customized implementation support. Or explore on your own: sign up for an account, check out our prebuilt healthcare templates in our open-source library, and peruse our docs to start extracting data from your own documents.