Employment verification forms, also called verification of employment (VOE) forms, are essential documents in financial services. They give lenders crucial information about applicants' income history, employment status, and financial stability. Whether you're processing mortgage applications, personal loans, or credit checks, automating data extraction from these forms can significantly streamline your underwriting process and reduce manual data entry errors.
Each verification provider has its own document format, which presents an interesting challenge for document automation. Enter Sensible, which allows you to handle these variations using SenseML, Sensible's query language for extracting data from documents. We've written a library of open-source SenseML configurations, so you don't need to write queries from scratch for common documents. From there, your extracted employment verification data is accessible via API, Sensible's UI, or thousands of other software integrations through Zapier.
Note that Sensible offers powerful AI-based methods to parse these documents. For example, we offer tutorials on extracting from rent rolls and resumes using LLMs. In contrast to such free-form documents, employment verification forms have consistent layouts that make them excellent candidates for our layout-based methods. These methods are not only fast but also extremely accurate for forms with structured formats. So, this tutorial will focus on layout-based methods to extract from VOEs.
What we'll cover
This blog post walks you through extracting data from two employment verification providers, Truework and Equifax.

We'll examine how the same information requires different extraction approaches based on each provider's document layout. The example documents we'll use contain dummy data.


By the end, you'll understand several SenseML methods and you'll be on your way to extracting any data you choose using our documentation or our prebuilt open-source configurations.
Prerequisites
To follow along, you can sign up for a Sensible account, then import example employment verification PDFs and prebuilt open-source configurations directly to the Sensible app using the Out-of-the-box extractions tutorial.
Our configurations for employment verification extractions are comprehensive. To keep the example in this post simple, let's extract solely the following:
- employee name
- employer address
- second-year base pay
We'll also show how fingerprints identify document subtypes.
Pre-extraction provider identification
First, let's walk through identifying different VOE providers, so we use the appropriate queries for each format. We'll use “fingerprints” to do so. (Note that classifying the document generally as a VOE happens upstream and isn't covered in this tutorial.) Fingerprints help Sensible quickly determine the appropriate queries before attempting to extract data from a document.
Truework fingerprint
The Truework fingerprint tests for the Truework format by checking that every page contains the standard report title and Truework branding.
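As a rough illustration, a fingerprint of this kind could look like the following sketch. The match strings are placeholders, not the text of the real report, and the layout of the tests is an assumption rather than the prebuilt configuration. Substitute the title and branding text that appear on every page of your Truework documents.

```json
{
  "fingerprint": {
    "tests": [
      {
        "page": "every",
        "match": [
          { "text": "PLACEHOLDER: report title as printed on each page", "type": "includes" }
        ]
      },
      {
        "page": "every",
        "match": [
          { "text": "PLACEHOLDER: Truework branding text", "type": "includes" }
        ]
      }
    ]
  }
}
```

Both tests use "page": "every", so a document passes only if each page contains both pieces of text.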
Equifax fingerprint
This fingerprint tests the Equifax format by checking that the document contains specific text patterns unique to Equifax reports. If these tests pass, Sensible will use the Equifax-specific extraction queries for this document. The key difference when writing fingerprints for these providers is that Truework maintains consistent branding throughout their shorter documents, while Equifax uses a more complex document structure requiring multi-page validation.
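For the Equifax side of that comparison, a sketch might spread its tests across different pages instead of requiring the same text on every page. The match strings are placeholders, and the "first" and "any" page values are assumptions to confirm against the fingerprint reference. The prebuilt configuration may test different text.

```json
{
  "fingerprint": {
    "tests": [
      {
        "page": "first",
        "match": [
          { "text": "PLACEHOLDER: text unique to the first page", "type": "includes" }
        ]
      },
      {
        "page": "any",
        "match": [
          { "text": "PLACEHOLDER: text unique to a later page", "type": "includes" }
        ]
      }
    ]
  }
}
```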
Extract employee name
Let’s compare and contrast different methods for extracting the employee name from different providers’ document layouts. We’ll also look at fallback strategies for handling document variations from a single provider.
Truework employee name extraction
The Truework form clearly labels the employee name.

To extract this data, let’s use the following SenseML query:
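A minimal sketch of such a query, assuming a field id of employee_name (a hypothetical name) and the "Full Name" label described in this section. The prebuilt configuration may differ in its details.

```json
{
  "fields": [
    {
      "id": "employee_name",
      "anchor": "full name",
      "method": {
        "id": "row"
      }
    }
  ]
}
```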
Truework's clean layout allows for a simple approach:
- Employee name has a "Full Name" label, also called an “anchor”
- We can use the Row method to extract the text that’s horizontally aligned with the anchor.
A note on layout-based extraction
This first field example demonstrates some basic principles of SenseML layout-based methods:
- Each “field” is a basic query unit in Sensible. Each field outputs a piece of data from the document that you want to extract. Sensible uses the field id as the key in the key/value JSON output.
- Sensible searches first for a text "anchor" because it's a computationally quick way to narrow down the location of the target data to extract.
- Then, Sensible uses a "method" to expand its search out from the anchor and extract the data you want.
Equifax employee name extraction (Primary method)
To extract the Equifax employee name, we’ll use a primary field and a fallback field. This accounts for layout variation where the social security number (SSN) can be present but redacted, unredacted, or missing completely. In our example document, the redacted SSN is present, so the primary method works.

We’ll use the following SenseML queries:
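The primary field could be sketched as follows. The regular expression is an assumption that covers a redacted SSN such as "xxx-xx-1234" as well as a fully numeric one. The field id is hypothetical. The "position" value assumes that the Label method's position describes where the anchor sits relative to the target line (here, the SSN to the right of the name), so confirm that against the Label method reference before relying on it.

```json
{
  "fields": [
    {
      "id": "employee_name",
      "anchor": {
        "match": {
          "type": "regex",
          "pattern": "[x\\d]{3}-[x\\d]{2}-\\d{4}",
          "flags": "i"
        }
      },
      "method": {
        "id": "label",
        "position": "right"
      }
    }
  ]
}
```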
The primary Equifax method uses the SSN as an anchor because:
- Employee names consistently appear to the left of SSN information
- SSNs appear in a predictable format (either redacted as "xxx-xx-####" or as digits)
- The Label method can extract text positioned close to the anchor
Equifax employee name extraction (Fallback method)
When the social security number is missing, we’ll fall back to the following query:
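One way to sketch such a fallback is a second field that reuses the primary field's id, on the assumption that Sensible falls back to a later field with the same id when an earlier one returns nothing. The all-caps pattern, the use of the anchor's "end" parameter to stop searching at "order information", and the Passthrough method to return the matched line are all assumptions. The step that filters out false matches is not shown.

```json
{
  "fields": [
    {
      "id": "employee_name",
      "anchor": {
        "end": "order information",
        "match": {
          "type": "regex",
          "pattern": "^[A-Z]+(?: [A-Z.'-]+)+$"
        }
      },
      "method": {
        "id": "passthrough"
      }
    }
  ]
}
```

The pattern requires at least two space-separated, all-caps words on a line, so single-word headings don't match, although multi-word all-caps headings still could.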
The fallback method works as follows:
- Uses regex patterns to identify all-caps name formats directly
- Searches the document header area before "order information"
- Filters out false matches that might fit the name pattern
Extracted values:
Truework:
Equifax:
Extract employer address
To extract the employer address, we’ll use the Region method for both Truework and Equifax, employing different strategies to find the region.
Truework employer address extraction
Truework uses a consistent label for the employer address.

To extract this address, let’s use the following query:
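The sketch below shows the general shape of a Region query. The anchor text, the field id, and every offset and dimension (in inches) are placeholder assumptions, not measurements from the Truework form. Adjust them until the region covers the address in your documents.

```json
{
  "fields": [
    {
      "id": "employer_address",
      "type": "address",
      "anchor": "PLACEHOLDER: employer address label text",
      "method": {
        "id": "region",
        "start": "below",
        "offsetX": 0,
        "offsetY": 0,
        "width": 3,
        "height": 0.8
      }
    }
  ]
}
```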
The Truework approach uses a Region method to capture the address positioned in a specific area relative to the label.
Equifax employer address extraction
The Equifax employer address is multiline and has varying labels (“headquarters address” or “address 1”).

We’ll use the following query to extract this information:
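A sketch of one possible approach follows. It assumes that a single regex anchor can cover both label variants and that the address sits in a rectangle below the label. The field id, offsets, and dimensions (in inches) are placeholders, and the prebuilt configuration may handle the label variation differently.

```json
{
  "fields": [
    {
      "id": "employer_address",
      "type": "address",
      "anchor": {
        "match": {
          "type": "regex",
          "pattern": "headquarters address|address 1",
          "flags": "i"
        }
      },
      "method": {
        "id": "region",
        "start": "below",
        "offsetX": 0,
        "offsetY": 0,
        "width": 3,
        "height": 1
      }
    }
  ]
}
```

The height is slightly larger than in a single-line case so that the region can capture every line of a multiline address.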
Extracted values:
Truework:
Equifax:
Extract salary data
To extract base pay data, we’ll use a simple row-based approach for Truework and a complex table intersection approach for Equifax.
Truework base pay extraction (Year 2)
Truework uses a table where Base pay is a dedicated row.

To extract the second year of base pay, use the following query:
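A minimal sketch of such a query, using the "base" anchor, Row method, and tiebreaker that this section describes. The field id is hypothetical. The currency type is an assumption that makes the tiebreaker choose among currency values rather than all text in the row.

```json
{
  "fields": [
    {
      "id": "base_pay_year_2",
      "type": "currency",
      "anchor": "base",
      "method": {
        "id": "row",
        "tiebreaker": "second"
      }
    }
  ]
}
```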
Truework's layout allows for a straightforward approach:
- Anchors on the "base" salary row
- Uses the Row method to extract horizontally aligned values
- "tiebreaker": "second" selects the second currency value in the row (year 2)
Equifax base pay extraction (Year 2)
Equifax labels base pay using a column header, not a row label.

To extract this data, use the following query:
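The sketch below shows how an Intersection query of this kind might be put together. The field id, the year pattern, and the "base salary" vertical anchor text are assumptions. As written, the pattern matches any line that consists of a four-digit year, so narrowing it to the second year in the income table is left out of the sketch.

```json
{
  "fields": [
    {
      "id": "base_pay_year_2",
      "type": "currency",
      "anchor": {
        "match": {
          "type": "regex",
          "pattern": "^20\\d{2}$"
        }
      },
      "method": {
        "id": "intersection",
        "verticalAnchor": "base salary",
        "offsetX": 0.2,
        "whitespaceFilter": "all"
      }
    }
  ]
}
```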
The Equifax approach handles the table’s column headers using the following strategies:
- The anchor finds the second year in the income table using regex pattern matching
- The Intersection method locates where the year row meets the "Base Salary" column
- offsetX: 0.2 fine-tunes the horizontal position of the row/column intersection to account for the alignment of the column header text
- whitespaceFilter: "all" cleans up any spacing issues in the extracted currency
Extracted data:
Truework:
Equifax:
Summing up layout-based extraction strategies
In this post, you’ve learned how to use a small subset of Sensible’s extraction methods and how to apply them to different document layouts:
- Use the Row or Label methods for cleanly labeled single-line data
- Use the Region method for multi-line data in defined rectangular areas
- Use the Intersection method for complex tables where you need to find data at the meeting point of rows and columns
This general guidance is a bit oversimplified, but it can inform your extraction strategies for different providers:
- Truework employs a cleaner, more modern layout with clear visual separation and consistent labeling, allowing for simpler Row-based extraction.
- Equifax uses a formal, dense structure typical of enterprise reporting systems, requiring precise methods like the Intersection method for tables and multiple fallback strategies for variable data positioning.
Putting it all together
When you run these configurations against employment verification forms, Sensible extracts all the defined fields and returns them in a structured JSON format that's ready to be integrated with your systems.
Sample output for the defined fields:
Complete Truework Output:
Complete Equifax Output:
Extract more data
We've covered how to extract a few key pieces of data from employment verification forms. Our prebuilt configurations extract much more information, including multi-year income histories, bonus and overtime details, hire dates, and reference numbers. That full extraction coverage enables use cases such as:
- Automated loan application processing
- Real-time income verification
- Integration with underwriting systems
- Compliance and audit preparation
Start extracting
Congratulations, you've learned some key methods for extracting structured data from employment verification forms! There's more extraction power to uncover. Book a demo or check out our managed services for customized implementation support. Or explore on your own: sign up for an account, check out our prebuilt financial services templates in our open-source library, and peruse our docs to start extracting data from your own documents.