How to extract data from resumes with LLMs and Sensible

Updated on

January 25, 2024

Author

Frances Elliott

Many companies face challenges when automating hiring and recruitment. Aggregating and analyzing candidate data can improve hiring, but is often a taxing manual process. HR Tech products, such as applicant tracking systems (ATS), address these challenges. For example, they can enrich search capabilities for internal or external candidates and power analytics on hiring trends. To build solutions like these, companies in the HR Tech space need intelligent document automation tools. A key use case for such tools is to extract structured data from resumes.

Companies often lack access to resumes in any format other than PDFs, which makes data extraction a potentially difficult problem. Enter Sensible. With Sensible you can easily extract key information from documents using SenseML, Sensible’s query language. SenseML uses a combination of layout-based rules and LLM prompts to extract from the full spectrum of free-form to structured documents. We’ve written a library of open-source SenseML configurations, so you don’t need to write queries from scratch for common documents. After extraction, the document data is accessible via Sensible’s API, SDK, app, or 5,000 other software integrations through Zapier.

What we'll cover

This blog post briefly walks you through configuring extractions for resumes. By the end, you’ll know a few methods for extracting document data using our query language, and you’ll be on your way to extracting any data you choose using our documentation or our prebuilt open-source configurations.

Write document extraction queries with SenseML

Let's extract data from a resume. Here's an example of a resume PDF with redacted or dummy data:

To extract from this document, take the following prerequisite steps:

Sign up for a Sensible account
Add prebuilt extraction support for resumes to your Sensible account. To add support, follow the steps in Out-of-the-box extractions and select resumes.

Our configurations for resumes are comprehensive. To keep the example in this post simple, let's extract just the:

Candidate’s name
Candidate’s experiences

Configure the LLM preprocessor

You’ll use LLM-based methods to extract from resumes since they’re documents with highly variable layouts. To improve accuracy and performance, you’ll first configure some global parameters for all the LLM prompts in the resume configuration.

See the following screenshot for an overview of how to configure the global LLM parameters:

To configure the global LLM parameters as shown in the preceding screenshot:

Navigate to the resume document type you created in a previous step.
Click Create configuration and create a new test configuration, named for example test_resume.
Click the configuration you created to edit it.
Switch to the JSON editor view by clicking Switch to SenseML.
Paste the following code into the left pane of the Sensible app. This preprocessor code will configure all the data extraction queries you author in succeeding steps in this tutorial:


{
  "preprocessors": [
    {
      /* Sensible uses JSON5 to support code comments */
      "type": "nlp",
      /* describe the document to extract data from, in this case, resumes */
      "contextDescription": "the following context is an excerpt from a resume",
      /* For each field, submit up to two one-page excerpts to the LLM. Limiting the context to two pages improves performance.
         Sensible finds the most relevant document excerpts, or chunks, for each field. */
      /* each excerpt is 1 page long */
      "chunkSize": 1,
      /* submit a total of two excerpts to the LLM */
      "chunkCount": 2,
      /* don't overlap the excerpts */
      "chunkOverlapPercentage": 0
    }
  ],
  "fields": []
}

The preceding code configures the prompts that Sensible submits to the LLMs for each field in the config. Its parameters describe the document to extract from (resumes) and set the size and number of the document excerpts that Sensible submits to the LLMs as context for the prompts. For more information about these global parameters, see Advanced prompt configuration.
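To make the effect of the chunk parameters concrete, here is a minimal Python sketch of the idea: split a document into page-sized chunks without overlap, score each chunk against a prompt, and keep the top two. This is an illustration only, not Sensible's implementation. The function name `select_chunks`, the sample pages, and the word-overlap score are all assumptions made for the example; this post doesn't describe how Sensible actually ranks excerpts.

```python
import re


def select_chunks(pages, query, chunk_size=1, chunk_count=2):
    """Group pages into non-overlapping chunks of `chunk_size` pages and
    return the `chunk_count` chunks that share the most words with `query`."""
    # Non-overlapping chunks, mirroring "chunkOverlapPercentage": 0
    chunks = [
        " ".join(pages[i:i + chunk_size])
        for i in range(0, len(pages), chunk_size)
    ]
    query_words = set(re.findall(r"\w+", query.lower()))

    def score(chunk):
        # Stand-in relevance score (an assumption): count of query words in the chunk
        return len(query_words & set(re.findall(r"\w+", chunk.lower())))

    # Highest-scoring chunks first; sorted() is stable, so ties keep page order
    ranked = sorted(chunks, key=score, reverse=True)
    return ranked[:chunk_count]


# Dummy three-page document, one string per page
pages = [
    "Jane Doe, Principal Engineer. Contact details.",
    "Experience: job title, company, and dates for each role.",
    "Education and certifications.",
]
top = select_chunks(pages, "job title and company for each role")
```

With these dummy pages, only the two best-matching pages would reach the LLM, which is the point of keeping `chunkCount` small: less context per prompt generally means faster responses.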

Extract candidate name

Now that you’ve set up baselines for the LLM prompts, let’s start extracting individual pieces of information. We’ll start with the candidate’s name.

See the following screenshot for an overview of how to extract the candidate’s name:

The queries in the left pane in the preceding image search for the candidate’s name in a number of relevant chunks that Sensible excerpts from the document. The PDF is displayed in the middle pane, and the extracted data are in the right pane.

To try this out yourself, paste the following query, or "field", into the left pane of the Sensible app in the `fields` array:


{
  /* ID for target data output */
  "id": "name",
  "method": {
    "id": "query",
    /* instructions for LLM for extracting data from document excerpt. Sensible uses LLMs to identify the most relevant excerpts, then prompts the LLM with information from this description and from the preprocessor */
    "description": "first and last name"
  }
}

You'll get this output:


{
  "name": {
    "type": "string",
    "value": "Shunral Logla"
  }
}
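If you consume this output in your own code, reading the value is a dictionary lookup. The following Python sketch parses the example output above; the helper name `field_value` is hypothetical, and the handling of missing fields assumes that a field Sensible can't find is either absent or null in the output.

```python
import json

# Output copied from the example above
raw_output = '{"name": {"type": "string", "value": "Shunral Logla"}}'


def field_value(parsed_document, field_id):
    """Return the extracted value for `field_id`, or None if the field
    is absent or null (assumed behavior for fields that aren't found)."""
    field = parsed_document.get(field_id)
    return field["value"] if field else None


parsed = json.loads(raw_output)
candidate_name = field_value(parsed, "name")
```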

Extract candidate experience

See the following screenshot for an overview of how to extract the work experiences mentioned in the resume:

To try this out yourself, paste the following query, or "field", into the left pane of the Sensible app in the `fields` array:


{
      "id": "_experience",
      /* returns the output data as a table */
      "type": "table",
      "method": {
        /* use the list method to extract repeating data of unknown format, including tables */
        "id": "list",
        /* overall description of the list's contents */
        "description": "list every and all of the various job titles the person has held for each company and when they held each job title. there can be multiple job titles listed under a single company. output all job titles along with the respective information. job titles and companies can repeat",
        "properties": [
          {
            /* for each item in the list, provide a user-friendly ID and 
               description of the data you want to extract
               and optional instructions to filter or reformat the data */
            "id": "title",
            "description": "entire job title verbatim"
          },
          {
            "id": "company",
            "description": "company name listed above the job title. if no company is found, repeat the previous company name"
          },
          {
            "id": "date_start",
            "description": "start date of job title"
          },
          {
            "id": "date_end",
            "description": "output the end date or \"Present\" if none"
          },
          {
            "id": "date_range",
            "description": "date range in which the job title was held"
          },
          {
            "id": "duration",
            "description": "duration listed in months and years that the position was held. if duration is not listed, output \"none\""
          },
          {
            "id": "highlights",
            "description": "summarize the job description or contributions listed under the role. if no description, output none"
          }
        ]
      }
    },

You’ll get back the following output:


"_experience": {
    "columns": [
      {
        "id": "title",
        "values": [
          {
            "value": "Project Intern",
            "type": "string"
          },
          {
            "value": "Principal Engineer",
            "type": "string"
          },
          {
            "value": "Principal Engineer",
            "type": "string"
          },
          {
            "value": "none",
            "type": "string"
          },
          {
            "value": "none",
            "type": "string"
          }
        ]
      },
      {
        "id": "company",
        "values": [
          {
            "value": "Cleanest",
            "type": "string"
          },
          {
            "value": "Samsung Engineering",
            "type": "string"
          },
          {
            "value": "Samsung Engineering",
            "type": "string"
          },
          {
            "value": "Corporation & Corporation",
            "type": "string"
          },
          {
            "value": "Corporation & Corporation",
            "type": "string"
          }
        ]
      },
      {
        "id": "date_start",
        "values": [
          {
            "value": "Sep-22",
            "type": "string"
          },
          {
            "value": "Jul-10",
            "type": "string"
          },
          {
            "value": "Apr-18",
            "type": "string"
          },
          {
            "value": "Jul-07",
            "type": "string"
          },
          {
            "value": "Sep-18",
            "type": "string"
          }
        ]
      },
/* JSON output abbreviated */

Click Switch to Sensible Instruct and click Show full output on the _experience field to see this JSON output in a table format:

Transform extracted data

Sensible extracts the candidate’s experience as an array of columns in a table. You can transform this table to fit your data consumption needs using Sensible’s Computed Field methods.

For example, paste the following code into the left pane of the SenseML editor in the fields array to zip the extracted experiences into an array of row objects, where each object contains all the properties of a given work experience:


{
      "id": "experience",
      "method": {
        /* returns an array of zipped row objects for a table */
        "id": "zip",
        "source_ids": [
          /* the ID of the field containing the extracted data
             you want to transform */
          "_experience"
        ]
      }
    },

You’ll get back the following output:


"experience": [
    {
      "title": {
        "value": "Project Intern",
        "type": "string"
      },
      "company": {
        "value": "Cleanest",
        "type": "string"
      },
      "date_start": {
        "value": "Sep-22",
        "type": "string"
      },
      "date_end": {
        "value": "Onwards",
        "type": "string"
      },
      "date_range": {
        "value": "Sep-22 ~ Onwards",
        "type": "string"
      },
      "duration": {
        "value": "none",
        "type": "string"
      },
      "highlights": {
        "value": "Teamed to vet solar output to battery energy storage for EV fast charging stations ensuring technical matching for totally DC-coupled microgrids",
        "type": "string"
      }
    },
    {
      "title": {
        "value": "Principal Engineer",
        "type": "string"
      },
      "company": {
        "value": "Samsung Engineering",
        "type": "string"
      },
      "date_start": {
        "value": "Jul-10",
        "type": "string"
      },
      "date_end": {
        "value": "Sep-16",
        "type": "string"
      },
      "date_range": {
        "value": "Jul-10 ~ Sep-16",
        "type": "string"
      },
      "duration": {
        "value": "6 years 2 months",
        "type": "string"
      },
      "highlights": {
        "value": "Headed 20-member teams to successfully execute Oil & Gas, Petrochemical, and Refinery projects totaling $5Bn in CapEx across the US, Middle East, South-East Asia, and Russia; delivered large capital projects within limited budget and tight schedules\nDirected teams to prepare cost estimates for bid proposals aggregating to $10Bn in value; won high-profile contracts\nIdeated system improvement processes by implementing \"System Thinking\" fundamentals to improve system efficiency by 25%\nOwned interactions with project stakeholders for milestone meetings, constructability & design reviews, involving negotiations and critical interface management issues\nConceived value engineering ideas in consultation with senior management to reduce project CapEx by $10Mn across different projects\nPrevented $150Mn as liquidated damages by streamlining delivery of structural steel and pre -emptying the delay in construction at project site in UAE\nConceptualized system development initiatives; achieved a 30% reduction in engineering man -hours through standardization",
        "type": "string"
      }
    },
    {
      "title": {
        "value": "Principal Engineer",
        "type": "string"
      },
      "company": {
        "value": "Samsung Engineering",
        "type": "string"
      },
      "date_start": {
        "value": "Apr-18",
        "type": "string"
      },
      "date_end": {
        "value": "Jul-22",
        "type": "string"
      },
      "date_range": {
        "value": "Apr-18 ~ Jul-22",
        "type": "string"
      },
      "duration": {
        "value": "4 years 3 months",
        "type": "string"
      },
      "highlights": {
        "value": "Headed 20-member teams to successfully execute Oil & Gas, Petrochemical, and Refinery projects totaling $5Bn in CapEx across the US, Middle East, South-East Asia, and Russia; delivered large capital projects within limited budget and tight schedules\nDirected teams to prepare cost estimates for bid proposals aggregating to $10Bn in value; won high-profile contracts\nIdeated system improvement processes by implementing \"System Thinking\" fundamentals to improve system efficiency by 25%\nOwned interactions with project stakeholders for milestone meetings, constructability & design reviews, involving negotiations and critical interface management issues\nConceived value engineering ideas in consultation with senior management to reduce project CapEx by $10Mn across different projects\nPrevented $150Mn as liquidated damages by streamlining delivery of structural steel and pre -emptying the delay in construction at project site in UAE\nConceptualized system development initiatives; achieved a 30% reduction in engineering man -hours through standardization",
        "type": "string"
      }
    },
    {
      "title": {
        "value": "none",
        "type": "string"
      },
      "company": {
        "value": "Corporation & Corporation",
        "type": "string"
      },
      "date_start": {
        "value": "Jul-07",
        "type": "string"
      },
      "date_end": {
        "value": "Jun-10",
        "type": "string"
      },
      "date_range": {
        "value": "Jul-07 ~ Jun-10",
        "type": "string"
      },
      "duration": {
        "value": "2 years 11 months",
        "type": "string"
      },
      "highlights": {
        "value": "Led a 15- member team for the Front-End Engineering Design of a $3.9Bn refinery upgradation in Indonesia\nSpearheaded coordination between cross-functional teams and vendors; delivered drawings 1 month ahead of schedule at Lyondell Basell's project site in Texas\nCollaborated with a multi-disciplinary team for the design of 700-ton modular structures in a $300Mn refinery upgradation project in Roxanna, Illinois for Conoco Phillips",
        "type": "string"
      }
    },
/* JSON output abbreviated */
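To see what the zip transformation does to the data's shape, here is a rough Python equivalent that turns the column-oriented `_experience` table into row objects. It's a sketch for illustration, not Sensible's implementation: the function name `zip_columns` is hypothetical, and the sample table is an abbreviated copy of the output shown earlier.

```python
def zip_columns(table):
    """Turn a column-oriented table into a list of row dictionaries."""
    ids = [column["id"] for column in table["columns"]]
    value_lists = [column["values"] for column in table["columns"]]
    # zip(*value_lists) yields one tuple per row: the i-th cell of every column
    return [dict(zip(ids, cells)) for cells in zip(*value_lists)]


# Abbreviated copy of the _experience output shown earlier
experience_table = {
    "columns": [
        {"id": "title", "values": [
            {"value": "Project Intern", "type": "string"},
            {"value": "Principal Engineer", "type": "string"},
        ]},
        {"id": "company", "values": [
            {"value": "Cleanest", "type": "string"},
            {"value": "Samsung Engineering", "type": "string"},
        ]},
    ]
}
rows = zip_columns(experience_table)
```

Each element of `rows` then holds every property of one work experience, which is usually easier to load into a database or an ATS than parallel columns.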

Extract more resume data

This tutorial extracts just a few pieces of data from resumes. Our prebuilt config extracts more data. Check it out, use the Sensible app to modify existing queries or add your own extraction queries, and then publish your config so you can extract data from resumes in volume.

Start extracting from your documents

Congratulations, you've learned some key methods for extracting structured data from documents. There's more extraction power for you to uncover. Sign up for an account (no credit card required), check out our prebuilt configs in our open-source library, and peruse our docs to start extracting data from your own documents.

‍
