
Automating data extraction from loss runs

Ming Lu
Monday, November 15, 2021

Loss runs are full of valuable information about an entity's risk profile, but the density of information means that there's more room for human error if you're relying on manual data entry. Learn how Sensible handles a loss run's complex structure to extract clean, tabular data automatically.


Claims history is central to the insurance underwriting process. This history usually comes in the form of loss runs, which are reports generated by insurance carriers that summarize the claims an insured business or individual made while covered by a particular policy. Carriers look at the kinds of claims, the frequency of the claims, and the financial impact of the claims to assess how risky a business is to insure. Carriers charge higher premiums or even decline coverage if they deem a business to be sufficiently risky.

Loss run reports are typically PDF files, where each claim is listed in a table-like structure.

Here's an example of a loss run:


Glancing at this document, it's clear that transcribing this data manually would be painful. There’s a lot of data, and far too many opportunities for human error. 

It’s better to automatically extract the claims history. 

In addition to the standard PDF parsing trickiness (some examples of which we've enumerated here), here are some additional challenges specific to extracting data from loss runs:

  • The list of claims is presented in a table-like structure, but this isn't your average table. In the case above, there's a single column containing five different pieces of information and a table contained within a "row".
  • A loss run can consist of one claim, twenty claims, or even zero claims! An extraction should capture all of the claim information no matter how many claims there are.
  • A loss run report may contain claims from multiple policies that the business has with a single carrier. In this case, each claim should be associated with the correct policy number.

How do we get around these challenges?

Enter Sections, Sensible's new feature built to support parsing data out of documents with complex, repeating sections. With this feature, you first define the start and end of your target section, then the data fields you're interested in within those bounds. For the loss run above, we might define our section to be each individual claim. Each claim starts with a claim number prefixed with "CWC" and ends just underneath the word "Total". You can then use any of Sensible's methods to define the data to extract from each claim, such as the injury date, the injury description, and the amount of money that was paid out.
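To make the start-and-end idea concrete, here's a minimal Python sketch of the concept. It is not Sensible's configuration syntax or implementation. It only shows how lines of text can be grouped into repeating sections that open on a claim number prefixed with "CWC" and close on a line starting with "Total". The sample lines, the claim numbers, the amounts, and the `split_sections` helper are all made up for illustration.

```python
import re

# Hypothetical text lines, standing in for text already pulled out of a loss run.
LINES = [
    "Policy WC-0001",
    "CWC1001 Indemnity 05/15/2019",
    "Strain or injury by",
    "Total 1,200.00",
    "CWC1002 Medical 05/28/2019",
    "Striking against or stepping on",
    "Total 300.00",
    "Report generated 11/15/2021",
]

CLAIM_START = re.compile(r"^CWC\w*")  # a claim number prefixed with "CWC"
CLAIM_END = re.compile(r"^Total\b")   # the line that closes a claim


def split_sections(lines):
    """Group lines into sections that run from a start line to an end line."""
    sections, current = [], None
    for line in lines:
        if CLAIM_START.match(line):
            current = [line]          # open a new section at the claim number
        elif current is not None:
            current.append(line)      # collect lines inside the open section
        if current is not None and CLAIM_END.match(line):
            sections.append(current)  # close the section at "Total"
            current = None
    return sections


claims = split_sections(LINES)
print(len(claims))
```

Lines outside any section, like the policy header and the report footer in this sample, are simply skipped. The function also works the same way whether the document holds zero claims or twenty.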

Sensible then looks through the entire document to identify which parts match your section definition, and pulls out your target data fields for each matching section. The output is a list of elements, where each element is an individual claim and all its accompanying data.


{
   "claim_details":[
      {
         "claim_number":{
            "type":"string",
            "value":"CWC1---"
         },
         "claim_type":{
            "type":"string",
            "value":"Indemnity"
         },
         "claim_date":{
            "source":"05/15/2019",
            "value":"2019-05-15T00:00:00.000Z",
            "type":"date"
         },
         "loss_description":{
            "value":"Strain or injury by",
            "type":"string"
         }
      },
      {
         "claim_number":{
            "type":"string",
            "value":"CWC1---"
         },
         "claim_type":{
            "type":"string",
            "value":"Medical"
         },
         "claim_date":{
            "source":"05/28/2019",
            "value":"2019-05-28T00:00:00.000Z",
            "type":"date"
         },
         "loss_description":{
            "value":"Striking against or stepping on",
            "type":"string"
         }
      },
      {
         "..."
      }
   ]
}
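Once you have this output, turning it into rows for a spreadsheet or database takes only a few lines. The Python sketch below assumes the output has exactly the shape shown above: a `claim_details` list in which every field carries a `value`. The `to_rows` and `to_csv` helpers are our own illustrative names, not part of any Sensible SDK.

```python
import csv
import io
import json

# A trimmed copy of the output above, with the same field names and values.
OUTPUT = json.loads("""
{
  "claim_details": [
    {
      "claim_number": {"type": "string", "value": "CWC1---"},
      "claim_type": {"type": "string", "value": "Indemnity"},
      "claim_date": {"source": "05/15/2019", "value": "2019-05-15T00:00:00.000Z", "type": "date"},
      "loss_description": {"value": "Strain or injury by", "type": "string"}
    },
    {
      "claim_number": {"type": "string", "value": "CWC1---"},
      "claim_type": {"type": "string", "value": "Medical"},
      "claim_date": {"source": "05/28/2019", "value": "2019-05-28T00:00:00.000Z", "type": "date"},
      "loss_description": {"value": "Striking against or stepping on", "type": "string"}
    }
  ]
}
""")


def to_rows(output):
    """Keep only each field's "value", giving one flat dict per claim."""
    return [
        {name: field.get("value") for name, field in claim.items()}
        for claim in output.get("claim_details", [])
    ]


def to_csv(rows):
    """Write the rows as CSV text, using the first row's keys as the header."""
    if not rows:
        return ""  # a loss run with zero claims produces no rows
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = to_rows(OUTPUT)
print(to_csv(rows))
```

Because the claims arrive as a list, the same code handles a report with one claim, twenty claims, or none at all.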

Notably, sections can also be nested inside other sections. This is useful when working with loss runs because each policy can be a section in itself, containing its own list of individual claims. With nested sections, you're able to extract and properly associate claims information from multiple policies contained in one loss run report without having to split up the PDF by policy number.
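As a rough illustration of what nesting buys you, here's a Python sketch that attaches a policy number to each claim. The nested shape is an assumption made for this example, not documented Sensible output: the `policies` and `policy_number` names, and the placeholder policy numbers, are hypothetical. Only `claim_details` and the per-field `value` come from the output shown earlier.

```python
# Hypothetical nested output: "policies", "policy_number", and the policy
# numbers themselves are assumptions made for this sketch.
NESTED = {
    "policies": [
        {
            "policy_number": {"type": "string", "value": "POLICY-A"},
            "claim_details": [
                {"claim_type": {"type": "string", "value": "Indemnity"}},
                {"claim_type": {"type": "string", "value": "Medical"}},
            ],
        },
        {
            "policy_number": {"type": "string", "value": "POLICY-B"},
            "claim_details": [],  # a policy with zero claims
        },
    ]
}


def claims_with_policy(nested):
    """Flatten nested sections into one row per claim, tagged with its policy."""
    rows = []
    for policy in nested.get("policies", []):
        number = policy["policy_number"]["value"]
        for claim in policy.get("claim_details", []):
            row = {"policy_number": number}
            row.update({name: field.get("value") for name, field in claim.items()})
            rows.append(row)
    return rows


rows = claims_with_policy(NESTED)
for row in rows:
    print(row)
```

Each claim keeps the policy number of the section that encloses it, so there's no need to split the report by policy before extraction.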

At Sensible, we're continuously building out tools to handle even the most complex documents, and Sections is just one of the tools in our belt. If you're interested in using Sensible to parse loss runs or any other kind of document, request a demo today.
