How to extract data from employment verification forms with Sensible

Updated on June 19, 2025

Employment verification forms (VOEs) serve as essential documents in financial services, providing lenders with crucial information about applicants' income history, employment status, and financial stability. Whether you're processing mortgage applications, personal loans, or credit checks, automating data extraction from these forms can significantly streamline your underwriting process and reduce manual data entry errors.

Each verification provider has its own document format, which presents an interesting challenge for document automation. Enter Sensible, which allows you to handle these variations using SenseML, Sensible's query language for extracting data from documents. We've written a library of open-source SenseML configurations, so you don't need to write queries from scratch for common documents. From there, your extracted employment verification data is accessible via API, Sensible's UI, or thousands of other software integrations through Zapier.

Note that Sensible offers powerful AI-based methods to parse these documents. For example, we offer tutorials on extracting from rent rolls and resumes using LLMs. In contrast to such free-form documents, employment verification forms have consistent layouts that make them excellent candidates for our layout-based methods. These methods are not only fast but also extremely accurate for forms with structured formats. So, this tutorial will focus on layout-based methods to extract from VOEs.


What we'll cover

This blog post will walk you through extracting data from two different employment verification providers: Truework and Equifax:

Sensible app showing queries, sample document, and extracted document field

We'll examine how the same information requires different extraction approaches based on each provider's document layout. Here are the example documents we’ll use with dummy data:

Truework employment verification document

Equifax employment verification document

By the end, you'll understand several SenseML methods and you'll be on your way to extracting any data you choose using our documentation or our prebuilt open-source configurations.

Prerequisites

To follow along, you can sign up for a Sensible account, then import example employment verification PDFs and prebuilt open-source configurations directly to the Sensible app using the Out-of-the-box extractions tutorial.

Our configurations for employment verification extractions are comprehensive. To keep the example in this post simple, let's extract solely the following:

  • employee name
  • employer address
  • second-year base pay

We'll also show how fingerprints identify document subtypes.


Pre-extraction provider identification

First, let's walk through identifying different VOE providers, so we use the appropriate queries for each format. We'll use “fingerprints” to do so. (Note that classifying the document generally as a VOE happens upstream and isn't covered in this tutorial.) Fingerprints help Sensible quickly determine the appropriate queries before attempting to extract data from a document.


Truework fingerprint


{
  /* Sensible uses JSON5 to support in-line comments */
  "fingerprint": {
    "tests": [
      {
        /* test every page for consistent Truework branding */
        "page": "every",
        "match": [
          {
            /* look for the standard report title */
            "text": "Verification of Income Report",
            "type": "endsWith",
            "isCaseSensitive": true
          },
          {
            /* verify Truework branding is present */
            "text": "truework",
            "type": "endsWith"
          }
        ]
      }
    ]
  }
}

The Truework fingerprint checks that every page contains the standard report title and Truework branding.

Equifax fingerprint


{
  "fingerprint": {
    "tests": [
      {
        /* test the first page for unique Equifax elements */
        "page": "first",
        "match": [
          {
            /* look for the standard ORDER INFORMATION header */
            "text": "ORDER INFORMATION",
            "type": "equals",
            "isCaseSensitive": true
          },
          {
            /* verify this is a verification document */
            "text": "Verified On:",
            "type": "equals",
            "isCaseSensitive": true
          },
          {
            /* confirm it contains 'verification type' text */
            "text": "Verification Type:",
            "type": "equals",
            "isCaseSensitive": true
          }
        ]
      },
      {
        /* test the last page for Equifax footer content */
        "page": "last",
        "match": [
          {
            /* match a line that consists solely of a 4-digit year from 2000 to 2099 (e.g., 2021, 2022) */
            "pattern": "^20\\d{2}$",
            "type": "regex"
          },
          {
            /* verify the standard Equifax verification statement */
            "text": "The statement above is an official verification generated",
            "type": "includes"
          }
        ]
      }
    ]
  }
}

This fingerprint tests the Equifax format by checking that the document contains specific text patterns unique to Equifax reports. If these tests pass, Sensible will use the Equifax-specific extraction queries for this document. The key difference when writing fingerprints for these providers is that Truework maintains consistent branding throughout their shorter documents, while Equifax uses a more complex document structure requiring multi-page validation.
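
To make the matching logic concrete, here is a minimal Python sketch of how a fingerprint could be evaluated. It is an illustration only, not Sensible's implementation. The helper names (`line_matches`, `page_passes`, `passes_fingerprint`), the page representation (a list of text lines per page), and the sample page text are all hypothetical. The sketch also assumes that text matching is case-insensitive unless isCaseSensitive is set, and it treats a test as passing when every match is found somewhere on the tested page, which simplifies whatever ordering and scoring rules Sensible actually applies.

```python
import re
from typing import Dict, List

Page = List[str]  # hypothetical representation: one string per text line


def line_matches(line: str, match: Dict) -> bool:
    """Return True if a single line satisfies one match object."""
    if match["type"] == "regex":
        return re.search(match["pattern"], line) is not None
    text = match["text"]
    # Assumption: matching ignores case unless isCaseSensitive is true.
    if not match.get("isCaseSensitive", False):
        line, text = line.lower(), text.lower()
    if match["type"] == "equals":
        return line == text
    if match["type"] == "startsWith":
        return line.startswith(text)
    if match["type"] == "endsWith":
        return line.endswith(text)
    return text in line  # "includes"


def page_passes(page: Page, matches: List[Dict]) -> bool:
    """Simplification: each match must be found on some line of the page."""
    return all(any(line_matches(line, m) for line in page) for m in matches)


def passes_fingerprint(pages: List[Page], tests: List[Dict]) -> bool:
    for test in tests:
        if test["page"] == "first":
            targets = pages[:1]
        elif test["page"] == "last":
            targets = pages[-1:]
        else:  # "every"
            targets = pages
        if not all(page_passes(page, test["match"]) for page in targets):
            return False
    return True


TRUEWORK_TESTS = [{
    "page": "every",
    "match": [
        {"text": "Verification of Income Report", "type": "endsWith",
         "isCaseSensitive": True},
        {"text": "truework", "type": "endsWith"},
    ],
}]

# Made-up page text, for illustration only.
sample_pages = [
    ["Truework", "Verification of Income Report", "Full Name"],
    ["Truework", "Verification of Income Report", "Base"],
]
is_truework = passes_fingerprint(sample_pages, TRUEWORK_TESTS)
```

With the made-up pages above, the Truework tests pass because both strings appear on every page. A document whose pages lack either string would fail the fingerprint.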

Extract employee name

Let’s compare and contrast different methods for extracting the employee name from different providers’ document layouts. We’ll also look at fallback strategies for handling document variations from a single provider.

Truework employee name extraction

The Truework form clearly labels the employee name:

Truework employee name

To extract this data, let’s use the following SenseML query:


{
  "fields": [
    {
      /* user-friendly ID for the extracted data */
      "id": "employee_name",
      "anchor": {
        "match": {
          /* search for target data near anchor text 'full name' in doc */
          "text": "full name",
          "type": "startsWith"
        }
      },
      "method": {
        /*   target text is to the right of anchor in a row */
        "id": "row",
        "position": "right"
      }
    }
  ]
}

Truework's clean layout allows for a simple approach:

  • Employee name has a "Full Name" label, also called an “anchor”
  • We can use the Row method to extract the text that’s horizontally aligned with the anchor.

A note on layout-based extraction

This first field example demonstrates some basic principles of SenseML layout-based methods:

  • Each “field” is a basic query unit in Sensible. Each field outputs a piece of data from the document that you want to extract. Sensible uses the field id as the key in the key/value JSON output.
  • Sensible searches first for a text "anchor" because it's a computationally quick way to narrow down the location of the target data to extract. 
  • Then, Sensible uses a "method" to expand its search out from the anchor and extract the data you want.
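
The anchor-then-method flow described above can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not how Sensible works internally: the `Line` class, the coordinates, the alignment tolerance, and the helper names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Line:
    text: str
    x: float  # left edge in inches (made-up coordinates)
    y: float  # top edge in inches


def find_anchor(lines: List[Line], prefix: str) -> Optional[Line]:
    """Step 1: cheap text search for the first line starting with the prefix."""
    for line in lines:
        if line.text.lower().startswith(prefix.lower()):
            return line
    return None


def row_right(lines: List[Line], anchor: Line,
              tolerance: float = 0.05) -> Optional[str]:
    """Step 2: take the nearest line to the right in the same row."""
    candidates = [
        ln for ln in lines
        if abs(ln.y - anchor.y) <= tolerance and ln.x > anchor.x
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda ln: ln.x).text


def extract_field(field_id: str, lines: List[Line], prefix: str) -> Dict:
    """The field id becomes the key in the key/value output."""
    anchor = find_anchor(lines, prefix)
    value = row_right(lines, anchor) if anchor else None
    return {field_id: {"type": "string", "value": value}}


lines = [
    Line("Full Name", x=1.0, y=2.0),
    Line("Jack Bauer", x=3.0, y=2.0),
    Line("Employer", x=1.0, y=2.5),
]
result = extract_field("employee_name", lines, "full name")
```

Here `result` has the same shape as the extracted values shown later in this post: the field id is the key, and the value comes from the line that shares a row with the anchor.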

Equifax employee name extraction (Primary method)

To extract the Equifax employee name, we’ll use a primary field and a fallback field. This accounts for layout variation where the social security number (SSN) can be present but redacted, unredacted, or missing completely. In our example document, the redacted SSN is present, so the primary method works.

Equifax employee name

We’ll use the following SenseML queries:


{
  "id": "employee_name",
  "anchor": {
    "match": {
      /* anchor can be any of the following matches */
      "type": "any",
      "matches": [
        {
          /* look for redacted SSN format as anchor */
          "text": "xxx-xx",
          "type": "includes"
        },
        {
          /* or look for a 9-10 digit number without dashes (unredacted SSN). Note we allow matching on erroneously formatted, 10-digit SSNs because we've encountered them in the wild with Equifax forms. */
          "pattern": "\\d{9,10}$",
          "type": "regex"
        }
      ]
    }
  },
  "method": {
    /* extract the name that appears to the left of the SSN */
    "id": "label",
    "position": "left"
  }
}

The primary Equifax method uses the SSN as an anchor because:

  • Employee names consistently appear to the left of SSN information
  • SSNs appear in a predictable format (either redacted as "xxx-xx-####" or as digits)
  • The Label method can extract text positioned close to the anchor
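
To see which lines the two anchor matches accept, here is a small Python check of the same patterns. The `is_ssn_anchor` helper and the sample strings are hypothetical, and the sketch assumes that the "includes" match ignores case.

```python
import re

# Same pattern as the anchor's regex match above.
UNREDACTED_SSN = re.compile(r"\d{9,10}$")


def is_ssn_anchor(line: str) -> bool:
    """True if the line looks like a redacted or unredacted SSN."""
    redacted = "xxx-xx" in line.lower()  # assumption: case-insensitive
    return redacted or UNREDACTED_SSN.search(line) is not None


# Made-up sample lines, for illustration only.
checks = {
    "XXX-XX-1234": is_ssn_anchor("XXX-XX-1234"),          # redacted
    "123456789": is_ssn_anchor("123456789"),              # 9 digits
    "1234567890": is_ssn_anchor("1234567890"),            # 10 digits
    "12345678": is_ssn_anchor("12345678"),                # too short
    "Shannon Brown": is_ssn_anchor("Shannon Brown"),      # not an SSN
    "Ref 12345678901": is_ssn_anchor("Ref 12345678901"),  # longer digit run
}
```

Because the pattern is anchored only at the end of the line, any line that ends in nine or more digits matches, including the hypothetical 11-digit reference number in the last sample. If your documents contain such lines near the name, a stricter pattern may be worth testing.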

Equifax employee name extraction (Fallback method)

When the social security number is missing, we’ll fall back to the following query:


{
  "id": "employee_name",
  "anchor": {
    "match": {
      /* regex pattern for names in all-caps */
      "pattern": "^[A-Z]* [A-Z]* [A-Z]*|^[A-Z]* [A-Z]*",
      "type": "regex"
    },
    /* stop searching before the order information section */
    "end": "order information"
  },
  "method": {
    /* use regex to extract names matching capitalized patterns */
    "id": "regex",
    "pattern": "^[A-Z]* [A-Z]* [A-Z]*|^[A-Z]* [A-Z]*",
    /* filter out all-capped, unwanted lines that might match the pattern */
    "lineFilters": [
      {
        "type": "includes",
        "text": "VERIFICATION SERVICES",
        "isCaseSensitive": true
      },
      {
        "type": "includes",
        "text": "CURRENT AS OF",
        "isCaseSensitive": true
      }
    ]
  }
}

This fallback method activates when the SSN is missing from the document:

  • Uses regex patterns to identify all-caps name formats directly
  • Searches the document header area before "order information"
  • Filters out false matches that might fit the name pattern
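
The following Python sketch approximates the fallback logic so you can test the name pattern against your own header lines. It is a simplification, not Sensible's implementation: the `fallback_name` helper and the sample lines are hypothetical, and it assumes that the "end" string is matched case-insensitively anywhere in a line.

```python
import re
from typing import List, Optional

# Same pattern as the Regex method above: three or two all-caps words.
NAME_PATTERN = re.compile(r"^[A-Z]* [A-Z]* [A-Z]*|^[A-Z]* [A-Z]*")
# Case-sensitive line filters from the config.
FILTERED = ("VERIFICATION SERVICES", "CURRENT AS OF")
END_MARKER = "order information"


def fallback_name(lines: List[str]) -> Optional[str]:
    """Return the first all-caps name found before the end marker."""
    for line in lines:
        if END_MARKER in line.lower():
            break  # stop searching at the order information section
        if any(text in line for text in FILTERED):
            continue  # skip all-caps boilerplate that fits the pattern
        match = NAME_PATTERN.search(line)
        if match:
            return match.group(0)
    return None


# Made-up header lines, for illustration only.
header = [
    "VERIFICATION SERVICES",
    "CURRENT AS OF 01/01/2024",
    "Verified On: 01/01/2024",
    "JANE Q PUBLIC",
    "ORDER INFORMATION",
    "ACME CORP",
]
name = fallback_name(header)
```

In this sample, the first two lines fit the all-caps pattern but are removed by the line filters, the mixed-case line doesn't match, and the search never reaches "ACME CORP" because it stops at the order information section.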

Extracted values:

Truework:


"employee_name": {
  "type": "string",
  "value": "Jack Bauer"
}

Equifax:


"employee_name": {
  "type": "string",
  "value": "Shannon Brown"
}

Extract employer address

To extract the employer address, we’ll use the Region method for both Truework and Equifax, employing different strategies to find the region.

Truework employer address extraction

Truework uses a consistent label for the employer address:

Truework employer address

To extract this address, let’s use the following query:


{
  "id": "employer_address",
  "anchor": {
    "match": {
      /* search for 'employer address' anchor */
      "text": "employer address",
      "type": "startsWith"
    }
  },
  "method": {
    /* define a rectangular region in inches relative to the anchor, and extract all text in the region. The region starts 4 inches to the right of the anchor's left edge and 0.2 inches above the anchor, and is 3.7" wide by 0.5" high */
    "id": "region",
    "start": "left",
    "offsetX": 4,
    "offsetY": -0.2,
    "width": 3.7,
    "height": 0.5
  }
}


The Truework approach uses a Region method to capture the address positioned in a specific area relative to the label.
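
The region arithmetic can be sketched in Python as follows. Treat this as a rough model rather than Sensible's exact geometry: the `Box` class, the coordinates, and the sample text are hypothetical. The sketch assumes that offsets are measured from the top-left corner of the anchor, that y increases down the page, and that a line is captured only when it sits entirely inside the region. Check the Region method documentation for the precise reference point.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    text: str
    left: float    # inches from the left edge of the page
    top: float     # inches from the top edge of the page
    width: float
    height: float


def region_from_anchor(anchor: Box, offset_x: float, offset_y: float,
                       width: float, height: float
                       ) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of the region in inches."""
    left = anchor.left + offset_x
    top = anchor.top + offset_y  # negative offset moves the region up
    return (left, top, left + width, top + height)


def text_in_region(boxes: List[Box],
                   region: Tuple[float, float, float, float]) -> str:
    """Join the text of every box fully inside the region, in reading order."""
    left, top, right, bottom = region
    inside = [
        b for b in boxes
        if left <= b.left and b.left + b.width <= right
        and top <= b.top and b.top + b.height <= bottom
    ]
    inside.sort(key=lambda b: (b.top, b.left))
    return " ".join(b.text for b in inside)


# Made-up layout, for illustration only.
anchor = Box("Employer Address", left=0.5, top=3.0, width=1.2, height=0.15)
boxes = [
    anchor,
    Box("1 Example St, Springfield, ST 00000", 4.6, 3.0, 2.5, 0.15),
    Box("Employment Type", 0.5, 3.6, 1.2, 0.15),
]
region = region_from_anchor(anchor, offset_x=4, offset_y=-0.2,
                            width=3.7, height=0.5)
address = text_in_region(boxes, region)
```

With these made-up coordinates, the region spans roughly 4.5 to 8.2 inches horizontally and 2.8 to 3.3 inches vertically, so it captures the address line and excludes both the anchor and the next label.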

Equifax employer address extraction

The Equifax employer address is multiline and has varying labels (“headquarters address” or “address 1”).

Equifax employer address

We’ll use the following query to extract this information:


{
  "id": "employer_address",
  /* format output as a properly structured address */
  "type": "address",
  "anchor": {
    "match": {
      "type": "any",
      "matches": [
        {
          /* look for either address anchor format */
          "text": "headquarters address:",
          "type": "startsWith"
        },
        {
          /* our example document uses this address anchor format */
          "text": "address 1:",
          "type": "startsWith"
        }
      ]
    }
  },
  "method": {
    /* define a rectangular region to capture multi-line address */
    "id": "region",
    /* start from the left edge of the anchor */
    "start": "left",
    /* move 1.35 inches to the right of the anchor */
    "offsetX": 1.35,
    /* move slightly up from the anchor */
    "offsetY": -0.1,
    /* region is 2.5 inches wide */
    "width": 2.5,
    /* region is 0.9 inches tall to capture multiple lines */
    "height": 0.9
  }
}

Extracted values:

Truework:


"employer_address": {
  "type": "string",
  "value": "111 Drake Street, Livonia, MI 4423 3"
}

Equifax:


"employer_address": {
  "value": "2223 Trunis Street Data not provided\nChanhassen MN 55317",
  "type": "address"
}

Extract salary data

To extract base pay data, we’ll use a simple row-based approach for Truework and a complex table intersection approach for Equifax.

Truework base pay extraction (Year 2)

Truework uses a table where Base pay is a dedicated row:

Truework base pay

To extract the second year of base pay, use the following query:


{
  "id": "basepay_2",
  /* format output as currency */
  "type": "currency",
  "anchor": {
    /* start search for anchor after employment type section */
    "start": "employment type",
    "match": {
      /* find the base pay row */
      "type": "startsWith",
      "text": "base"
    }
  },
  "method": {
    /* extract values horizontally aligned with the base pay row */
    "id": "row",
    /* look to the right of the anchor */
    "position": "right",
    /* select the second currency value (year 2 data) */
    "tiebreaker": "second"
  }
}

Truework's layout allows for a straightforward approach:

  • Anchors on the "base" salary row
  • Uses the Row method to extract horizontally aligned values
  • "tiebreaker": "second" selects the second currency value in the row (year 2)
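
As a rough model of how the currency type and the tiebreaker interact, the Python sketch below keeps only the cells in a row that parse as currency and then picks the second one. It assumes that type filtering happens before the tiebreaker is applied, as the description above suggests. The helper names, the currency pattern, and all row values other than the second-year figure are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Simplified currency pattern: optional "$", digits with optional
# thousands separators, and optional cents.
CURRENCY = re.compile(r"^\$?\s*(\d{1,3}(?:,\d{3})*|\d+)(\.\d{2})?$")


def parse_currency(text: str) -> Optional[float]:
    match = CURRENCY.match(text.strip())
    if not match:
        return None
    digits = match.group(1) + (match.group(2) or "")
    return float(digits.replace(",", ""))


def pick_currency(row_cells: List[str], position: int) -> Optional[Dict]:
    """Keep cells that parse as currency, then take the Nth (1-based)."""
    parsed = [(cell, parse_currency(cell)) for cell in row_cells]
    currency = [(cell, value) for cell, value in parsed if value is not None]
    if len(currency) < position:
        return None
    source, value = currency[position - 1]
    return {"source": source, "value": value, "unit": "$", "type": "currency"}


# Cells to the right of the "Base" anchor; only the second amount
# comes from this post's example output, the rest are made up.
row = ["Annual", "$54,000.00", "$55,520.77", "$57,000.10"]
basepay_2 = pick_currency(row, position=2)
```

The non-currency cell "Annual" is skipped, so the second currency value is the year 2 figure even though it is the third cell in the row.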

Equifax base pay extraction (Year 2)

Equifax labels base pay using a column header, not a row label:

Equifax base pay

To extract this data, use the following query:


{
  "id": "basepay_2",
  /* format output as currency */
  "type": "currency",
  "anchor": {
    "start": {
      /* begin search at the income summary section */
      "text": "ANNUAL INCOME SUMMARY",
      "type": "equals",
      "isCaseSensitive": true
    },
    "match": [
      {
        /* find the first year */
        "pattern": "^20\\d{2}$",
        "type": "regex"
      },
      {
        /* find the second year - this is our target row */
        "pattern": "^20\\d{2}$",
        "type": "regex"
      }
    ],
    "end": [
      /* end conditions to scope the search - find two years followed by a footer */
      {
        "pattern": "^20\\d{2}$",
        "type": "regex"
      },
      {
        "pattern": "^20\\d{2}$",
        "type": "regex"
      },
      {
        /* stop before footer or next section */
        "pattern": "^20\\d{2}$|TWN|the statement above",
        "type": "regex",
        "flags": "i"
      }
    ]
  },
  "method": {
    /* find data at the intersection of the year row and the base salary column */
    "id": "intersection",
    "verticalAnchor": {
      "start": {
        /* scope the vertical search to the income summary section */
        "type": "equals",
        "text": "ANNUAL INCOME SUMMARY"
      },
      "match": {
        /* find the "base" salary column */
        "text": "base",
        "type": "startsWith"
      }
    },
    /* fine-tune horizontal position of intersection point between 2nd-year row and 'base' column */
    "offsetX": 0.2,
    /* clean up spacing issues in extracted currency values */
    "whitespaceFilter": "all"
  }
}

The Equifax approach handles the table’s column headers using the following strategies:

  • The anchor finds the second year in the income table using regex pattern matching
  • The Intersection method locates where the year row meets the "base" salary column
  • offsetX: 0.2 fine-tunes the horizontal position of the row/column intersection to account for the alignment of the column header text
  • whitespaceFilter: "all" cleans up any spacing issues in the extracted currency
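
The geometry behind these strategies can be sketched in Python. This is a conceptual model under stated assumptions, not Sensible's implementation: the `Word` class, the coordinates, the year label, and the raw cell text with a stray space are hypothetical, and the sketch assumes the intersection point is taken from the midpoints of the row label and the column header.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    left: float    # inches
    top: float     # inches
    width: float
    height: float


def intersection_value(words: List[Word], row_anchor: Word,
                       column_anchor: Word, offset_x: float = 0.0,
                       strip_whitespace: bool = False) -> Optional[str]:
    """Return the text under the point where the row and column cross."""
    x = column_anchor.left + column_anchor.width / 2 + offset_x
    y = row_anchor.top + row_anchor.height / 2
    for word in words:
        in_x = word.left <= x <= word.left + word.width
        in_y = word.top <= y <= word.top + word.height
        if in_x and in_y:
            text = word.text
            # Mimic whitespaceFilter "all": drop every whitespace character.
            return "".join(text.split()) if strip_whitespace else text
    return None


# Made-up layout: the cell is right-aligned, so it sits to the right
# of the column header's midpoint.
column = Word("Base Salary", left=2.0, top=5.0, width=0.6, height=0.15)
year_row = Word("2022", left=0.5, top=5.6, width=0.3, height=0.15)
cell = Word("$ 443.13", left=2.4, top=5.6, width=0.6, height=0.15)
words = [column, year_row, cell]

without_offset = intersection_value(words, year_row, column)
with_offset = intersection_value(words, year_row, column,
                                 offset_x=0.2, strip_whitespace=True)
```

In this made-up layout, the unshifted intersection point falls just left of the cell and finds nothing, while shifting it 0.2 inches to the right lands inside the cell. Removing whitespace then turns the raw text into a clean currency string.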

Extracted data:

Truework:


"basepay_2": {
  "source": "$55,520.77",
  "value": 55520.77,
  "unit": "$",
  "type": "currency"
}

Equifax:


"basepay_2": {
  "source": "$443.13",
  "value": 443.13,
  "unit": "$",
  "type": "currency"
}

Summing up layout-based extraction strategies

In this post, you’ve learned how to use a small subset of Sensible’s extraction methods and how to apply them to different document layouts:

  • Use the Row or Label methods for cleanly labeled single-line data
  • Use the Region method for multi-line data in defined rectangular areas
  • Use the Intersection method for complex tables where you need to find data at the meeting point of rows and columns

This general guidance is a bit oversimplified, but it can inform your extraction strategies for different providers: 

  • Truework employs a cleaner, more modern layout with clear visual separation and consistent labeling, allowing for simpler Row-based extraction.
  • Equifax uses a formal, dense structure typical of enterprise reporting systems, requiring precise methods like the Intersection method for tables and multiple fallback strategies for variable data positioning.

Putting it all together

When you run these configurations against employment verification forms, Sensible extracts all the defined fields and returns them in a structured JSON format that's ready to be integrated with your systems.


Sample output for the defined fields:

Complete Truework Output:


{
  "employee_name": {
    "type": "string",
    "value": "Jack Bauer"
  },
  "employer_address": {
    "type": "string",
    "value": "111 Drake Street, Livonia, MI 4423 3"
  },
  "basepay_2": {
    "source": "$55,520.77",
    "value": 55520.77,
    "unit": "$",
    "type": "currency"
  }
}

Complete Equifax Output:


{
  "employee_name": {
    "type": "string",
    "value": "Shannon Brown"
  },
  "employer_address": {
    "value": "2223 Trunis Street Data not provided\nChanhassen MN 55317",
    "type": "address"
  },
  "basepay_2": {
    "source": "$443.13",
    "value": 443.13,
    "unit": "$",
    "type": "currency"
  }
}
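
Once you have this output, downstream code can work with it like any other JSON. The Python sketch below flattens the Equifax sample above into plain key/value pairs. The `flatten_fields` helper is hypothetical, and the sketch assumes you have already pulled the fields object shown above out of whatever response or export you receive.

```python
import json
from typing import Any, Dict

# The Equifax sample output from this post, as a raw string so that
# the "\n" escape is decoded by the JSON parser rather than by Python.
EQUIFAX_OUTPUT = r"""
{
  "employee_name": {
    "type": "string",
    "value": "Shannon Brown"
  },
  "employer_address": {
    "value": "2223 Trunis Street Data not provided\nChanhassen MN 55317",
    "type": "address"
  },
  "basepay_2": {
    "source": "$443.13",
    "value": 443.13,
    "unit": "$",
    "type": "currency"
  }
}
"""


def flatten_fields(fields: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Map each field id to its extracted value, dropping the metadata."""
    return {field_id: field["value"] for field_id, field in fields.items()}


fields = json.loads(EQUIFAX_OUTPUT)
flat = flatten_fields(fields)
address_lines = flat["employer_address"].splitlines()
```

Keeping the "type" metadata around can still be useful, for example to decide whether a value needs address or currency handling before it goes into an underwriting system.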


Extract more data

We've covered how to extract a few key pieces of data from employment verification forms. Our prebuilt configurations extract much more information, including multi-year income histories, bonus and overtime details, hire dates, and reference numbers. That full extraction coverage enables use cases such as:

  • Automated loan application processing
  • Real-time income verification
  • Integration with underwriting systems
  • Compliance and audit preparation

Start extracting

Congratulations, you've learned some key methods for extracting structured data from employment verification forms! There's more extraction power to uncover. Book a demo or check out our managed services for customized implementation support. Or explore on your own: sign up for an account, check out our prebuilt financial services templates in our open-source library, and peruse our docs to start extracting data from your own documents.

Frances Elliott