Automating data extraction from loss runs

Updated on

October 16, 2023

Author

Ming Lu

Co-Founder, Sensible

Claims history is central to the insurance underwriting process. This history usually comes in the form of loss runs, which are reports generated by insurance carriers that summarize the claims an insured business or individual made while covered by a particular policy. Carriers look at the kinds of claims, the frequency of the claims, and the financial impact of the claims to assess how risky a business is to insure. Carriers charge higher premiums or even decline coverage if they deem a business to be sufficiently risky.

Loss run reports are typically PDF files, where each claim is listed in a table-like structure.

Here's an example of a loss run:

Glancing at this document, it's clear that transcribing this data manually would be painful. There’s a lot of data, and far too many opportunities for human error.

It’s better to automatically extract the claims history.

Beyond the usual trickiness of parsing PDFs, loss runs pose some challenges of their own:

The list of claims is presented in a table-like structure, but this isn't your average table. In the case above, there's a single column containing five different pieces of information and a table contained within a "row".
A loss run can consist of one claim, twenty claims, or even zero claims! An extraction should capture all of the claim information no matter how many claims there are.
A loss run report may contain claims from multiple policies that the business has with a single carrier. In this case, each claim should be associated with the correct policy number.

How do we get around these challenges?

Enter Sections, Sensible's new feature built to support parsing data out of documents with complex, repeating sections. With this feature, you first define the start and end of your target section and the various data fields you're interested in within those bounds. For the loss run above, we might define our section to be each individual claim. Each claim starts with a claim number prefixed with "CWC" and ends just underneath the word "Total." You can then use any of Sensible's methods to define the data to extract from each claim — for example, the injury date, the injury description, the amount of money that was paid out, etc.
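To make the start-and-end idea concrete, here is a minimal sketch in plain Python. It is not Sensible's implementation or configuration syntax; the function name `split_sections`, its parameters, and the sample lines are all hypothetical, and it assumes the document has already been reduced to lines of text.

```python
from typing import List, Optional


def split_sections(
    lines: List[str],
    start_prefix: str = "CWC",
    end_word: str = "Total",
) -> List[List[str]]:
    """Group lines into sections.

    A section opens at a line beginning with start_prefix and closes at the
    first following line beginning with end_word (that line is included).
    A section that never reaches end_word is dropped.
    """
    sections: List[List[str]] = []
    current: Optional[List[str]] = None  # None means "not inside a section"
    for line in lines:
        text = line.strip()
        if current is None:
            if text.startswith(start_prefix):
                current = [text]
        else:
            current.append(text)
            if text.startswith(end_word):
                sections.append(current)
                current = None
    return sections


# Hypothetical text lines, invented for illustration only.
sample_lines = [
    "Policy summary",
    "CWC-0001 Indemnity",
    "Paid 100.00",
    "Total 100.00",
    "CWC-0002 Medical",
    "Total 0.00",
    "Report footer",
]

claims = split_sections(sample_lines)
print(len(claims))
```

Because the function returns an empty list when nothing matches, a report with zero claims needs no special handling in this sketch.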

Sensible then looks through the entire document to identify which parts match your section definition, and pulls out your target data fields for each matching section. The output is a list of elements, where each element is an individual claim and all its accompanying data.


{
   "claim_details":[
      {
         "claim_number":{
            "type":"string",
            "value":"CWC1---"
         },
         "claim_type":{
            "type":"string",
            "value":"Indemnity"
         },
         "claim_date":{
            "source":"05/15/2019",
            "value":"2019-05-15T00:00:00.000Z",
            "type":"date"
         },
         "loss_description":{
            "value":"Strain or injury by",
            "type":"string"
         }
      },
      {
         "claim_number":{
            "type":"string",
            "value":"CWC1---"
         },
         "claim_type":{
            "type":"string",
            "value":"Medical"
         },
         "claim_date":{
            "source":"05/28/2019",
            "value":"2019-05-28T00:00:00.000Z",
            "type":"date"
         },
         "loss_description":{
            "value":"Striking against or stepping on",
            "type":"string"
         }
      },
      {
         "..."
      }
   ]
}
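Downstream code often wants one flat record per claim rather than the nested field objects. The following is a small sketch of that step, assuming the output has the shape shown above; the helper name `flatten_claims` is hypothetical and is not part of any Sensible SDK.

```python
import json
from typing import Any, Dict, List

# Sample copied from the output above (claim numbers are redacted in the source).
raw = """
{
  "claim_details": [
    {
      "claim_number": {"type": "string", "value": "CWC1---"},
      "claim_type": {"type": "string", "value": "Indemnity"},
      "claim_date": {"source": "05/15/2019", "value": "2019-05-15T00:00:00.000Z", "type": "date"},
      "loss_description": {"value": "Strain or injury by", "type": "string"}
    },
    {
      "claim_number": {"type": "string", "value": "CWC1---"},
      "claim_type": {"type": "string", "value": "Medical"},
      "claim_date": {"source": "05/28/2019", "value": "2019-05-28T00:00:00.000Z", "type": "date"},
      "loss_description": {"value": "Striking against or stepping on", "type": "string"}
    }
  ]
}
"""


def flatten_claims(output: Dict[str, Any], key: str = "claim_details") -> List[Dict[str, Any]]:
    """Return one {field_name: value} dict per element of the section list."""
    rows: List[Dict[str, Any]] = []
    for claim in output.get(key, []):
        # Keep only the "value" of each field; skip anything that is not a field object.
        rows.append(
            {name: field.get("value") for name, field in claim.items() if isinstance(field, dict)}
        )
    return rows


rows = flatten_claims(json.loads(raw))
for row in rows:
    print(row["claim_date"], row["claim_type"], row["loss_description"])
```

An empty or missing `claim_details` list yields an empty result, so a loss run with no claims flows through the same code path.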

Notably, sections can also be nested inside other sections. This is useful when working with loss runs because each policy can be a section in itself, containing its own list of individual claims. With nested sections, you can extract claims from multiple policies in one loss run report and associate each claim with the correct policy, without having to split up the PDF by policy number.
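The nesting idea can be illustrated with a short plain-Python sketch: an outer section per policy, and inner claim sections collected under it. This is only a conceptual model, not how Sensible is configured; the "Policy Number:" marker, the function name `group_claims_by_policy`, and the sample lines are assumptions made up for the example.

```python
from typing import Dict, List, Optional


def group_claims_by_policy(
    lines: List[str],
    policy_prefix: str = "Policy Number:",  # hypothetical outer-section marker
    claim_prefix: str = "CWC",
    end_word: str = "Total",
) -> Dict[str, List[List[str]]]:
    """Map each policy number to the list of claim sections found under it."""
    policies: Dict[str, List[List[str]]] = {}
    policy: Optional[str] = None       # current outer section, if any
    claim: Optional[List[str]] = None  # current inner section, if any
    for line in lines:
        text = line.strip()
        if text.startswith(policy_prefix):
            # A new outer section begins; any unfinished claim is discarded.
            policy = text[len(policy_prefix):].strip()
            policies.setdefault(policy, [])
            claim = None
        elif policy is None:
            continue  # ignore text that appears before the first policy
        elif claim is None:
            if text.startswith(claim_prefix):
                claim = [text]
        else:
            claim.append(text)
            if text.startswith(end_word):
                policies[policy].append(claim)
                claim = None
    return policies


# Hypothetical text lines, invented for illustration only.
sample_lines = [
    "Policy Number: P-100",
    "CWC-0001 Indemnity",
    "Total 100.00",
    "Policy Number: P-200",
    "CWC-0002 Medical",
    "Paid 50.00",
    "Total 50.00",
    "CWC-0003 Medical",
    "Total 0.00",
]

by_policy = group_claims_by_policy(sample_lines)
for number, policy_claims in by_policy.items():
    print(number, len(policy_claims))
```

Keying the result by policy number keeps every claim attached to its policy, and a policy with no claims still appears with an empty list.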

At Sensible, we're continuously building out tools to handle even the most complex documents, and Sections is just one of the tools in our belt. If you're interested in using Sensible to parse loss runs or any other kind of document, request a demo today.

Ming started her software career at Intercom, where she worked in a variety of roles across analytics, engineering, and product on the Growth and Data teams. Before Sensible, Ming was the Head of Product at Lattice as the company grew from 0 to 20MM+ ARR.
