How to extract data from bank statements with Sensible

Updated on
October 16, 2023
5
min read
Contributors
No items found.
Author
How to extract data from bank statements with Sensible
Table of contents
Turn documents into structured data
Get started free
Share this post

Companies that build lending and mortgage solutions in proptech often need to automate a solution to parse information in bank statements. By extracting and analyzing financial data from bank statements, these companies can more accurately assess an individual's or business's financial health and risk profile to make better lending and mortgage decisions. In addition to aiding analysis, bank statement data can help streamline the loan application process. Borrowers can automate aspects of review and approval, and potential borrowers can get approved more quickly.

Companies often lack access to bank statements in any format other than PDFs, which makes data extraction a potentially difficult problem. Enter Sensible. With Sensible you can easily extract key information out of bank statement PDFs using SenseML, Sensible’s query language for extracting data from documents. We’ve written a library of open-source SenseML configurations, so you don’t need to write queries from scratch for common documents. From there, your bank data is accessible via API, Sensible’s UI, or 5,000 other software integrations thanks to Zapier.

What we'll cover

This blog post briefly walks you through configuring extractions for Chase bank statements. By the end, you’ll know a couple of SenseML methods and you’ll be on your way to extracting any data you choose using our documentation or our prebuilt open-source configurations, which currently support Chase, Bank of America, and Wells Fargo bank statements.

Write document extraction queries with SenseML

Let's walk through extracting specific pieces of data from a bank statement. Here's an example of a bank statement PDF with redacted or dummy data:

Chase bank statement

To follow along, you can sign up for a Sensible account, then download an example PDF and upload it to the Sensible app, or import the PDF and prebuilt open-source configurations directly to the Sensible app.

Our configurations for bank statement extractions are comprehensive. To keep the example in this post simple, let's extract just the:

  • Date range of statement
  • Electronic withdrawals for each account

Extract date range of statement

See the following screenshot for an overview of how to extract the date range:

Extract date (left pane: query. middle pane: document. right pane: output)

 The queries in the left pane in the preceding image searches for dates near the text “through.”  The PDF is displayed in the middle pane, and the extracted dates are in the right pane.

To try this out yourself, paste the following queries, or "fields" into the left pane of the Sensible app.


{
	/* Sensible uses JSON5 to support code comments */
	"fields": [{
			/* search for target data 
			    near 1st instance of text "through" */
			"anchor": "through",
			/* ID for target data */
			"id": "start_date",
			/* target data is a date, else return null */
			"type": "date",
			"method": {
				/* the target date is in the anchor,
				   so return, or passthrough, the anchor */
				"id": "passthrough",
				/* grab the smaller date in the 
				   returned anchor line */
				"tiebreaker": "<"
			},
		},
		{
			/* same logic as start_date, except return
			   the greater date */
			"id": "end_date",
			"type": "date",
			"method": {
				"id": "passthrough",
				"tiebreaker": ">"
			},
			"anchor": "through"


		}
	]
}

You'll get this output:


{
  "start_date": {
    "source": "January 14, 2021",
    "value": "2021-01-14T00:00:00.000Z",
    "type": "date"
  },
  "end_date": {
    "source": "February 10, 2021",
    "value": "2021-02-10T00:00:00.000Z",
    "type": "date"
  }
}

Extract account type and withdrawals

See the following screenshot for an overview of how to extract the first account type and withdrawals mentioned in the document:

extract account type and withdrawals for first account in statement

The query in the left pane in the preceding image extracts the withdrawals by finding the last cell in a row to the right of the anchoring text using the Row method.

To try this out yourself, paste the following query into the left pane of the Sensible app:


{
  /* SenseML uses JSON5 to support code comments */
  "fields": [
    {
      "id": "electronic_withdrawals",
      "anchor": {
        /* start looking for anchor after this text */
        "start": "account number",
        "match": {
          /* target is near anchor  "electronic withdrawals" */
          "text": "electronic withdrawals",
          "type": "includes"
        },
        /* stop looking for anchor at this text */
        "end": "ending balance"
      },
      "type": {
        /* target data is currency */
        "id": "currency",
        /* if the bank statement is scanned,
           allow for common currency OCR errors
           for more info see https://docs.sensible.so/docs/types#currency  */
        "relaxedWithCents": true,
        /* remove whitespaces between currency symbols and values for better recognition */
        "removeSpaces": true
      },
      "method": {
        /* target is in a row, last cell to right of anchor 
           for more info, see https://docs.sensible.so/docs/row
        */
        "id": "row",
        "position": "right",
        "tiebreaker": "last"
      }
    },
    {
      "id": "account_type",
      "method": {
        /* target is to immediate left of anchor */
        "id": "label",
        "position": "left"
      },
      "anchor": {
        "start": "account number",
        "match": {
          /* target data is near text "SUMMARY" */
          "text": "SUMMARY",
          "type": "endsWith",
          "isCaseSensitive": true
        }
      }
    }
  ]
}

You’ll get the following output:


{
  "electronic_withdrawals": {
    "source": "6,320.35",
    "value": 6320.35,
    "unit": "$",
    "type": "currency"
  },
  "account_type": {
    "type": "string",
    "value": "CHECKING"
  }
}

Extract each account’s details

You just saw how to extract a single currency value for an account. In real life, you’ll want to extract each account’s information in the consolidated bank statement. See the following screenshot for an overview of how to extract an array of accounts with details, instead of a single account’s information:

extract information for each account

 

The query in the left pane in the preceding image slices the document into account sections, each of which starts with “account number”.  Each account section is its own mini document with its own set of fields.  For example, you just learned about using the Row method to get a withdrawal. When you use the Row method in a section, you get back the electronic withdrawal for each account. It happens to be null for the savings account.

To try it out, paste the following query into the left pane of the Sensible app:


{
  "fields": [],
  /* Get array of accounts, each containing info
    like balances and transactions.
    Each account is a 'section', or mini-document,
    with its own fields isolated from the rest of the doc.
    For more info see https://docs.sensible.so/docs/sections */
  "sections": [
    {
      "id": "accounts_sections",
      /* Display section boundaries in the Sensible app 
         as green brackets */
      "display": true,
      "range": {
        "anchor": {
          /* start looking for account sections at
             the text "account number" 
             preceded by "TOTAL ASSETS" */
          "start": "total assets",
          "match": {
            "text": "account number:",
            "type": "startsWith"
          }
        },
      },
      "fields": [
        /* same fields extracted in previous example,
           now repeated for each account  */
        {
          "id": "account_type",
          "method": {
            "id": "label",
            "position": "left"
          },
          "anchor": {
            "match": {
              "text": "SUMMARY",
              "type": "endsWith",
              "isCaseSensitive": true
            }
          }
        },
        {
          "id": "electronic_withdrawals",
          "type": {
            "id": "currency",
            "relaxedWithCents": true,
            "removeSpaces": true
          },
          "anchor": {
            "match": {
              "text": "electronic withdrawals",
              "type": "includes"
            },
            "end": "ending balance"
          },
          "method": {
            "id": "row",
            "position": "right",
            "tiebreaker": "last"
          }
        }
      ]
    }
  ]
}

You'll get this output:


{
  "accounts_sections": [
    {
      "account_type": {
        "type": "string",
        "value": "CHECKING"
      },
      "electronic_withdrawals": {
        "source": "6,320.35",
        "value": 6320.35,
        "unit": "$",
        "type": "currency"
      }
    },
    {
      "account_type": {
        "type": "string",
        "value": "SAVINGS"
      },
      "electronic_withdrawals": null
    }
  ]
}

Extract and transform table data 

 

Each account contains transaction tables. By default, Sensible extracts these tables’ columns as parallel arrays. For an overview of how to zip the “column” arrays into “rows” of corresponding data for each transaction, see the following screenshot:

extracted table data

 

To try this out yourself, paste the following query into the left pane of the Sensible app:


{
  "fields": [
    {
      "id": "_electronic_withdrawals_table_raw",
      /* target data is a table */
      "type": "table",
      /* table starts after "electronic withdrawals" line
           preceded by "total atm" line */
      "anchor": {
        "start": [
          {
            "text": "total atm",
            "type": "includes"
          }
        ],
        "match": {
          "text": "electronic withdrawals",
          "type": "startsWith"
        }
      },
      "method": {
        /* of several table methods,
           textTable is fastest */
        "id": "textTable",
        "columns": [
          {
            /* column starts 0.3" from left page edge
               and ends 1" from left edge */
            "id": "date_column",
            "minX": 0.3,
            "maxX": 1
          },
          {
            /* column starts 6.3" from left page edge
               and ends 8.1" from left edge */
            "id": "amount_column",
            "minX": 6.3,
            "maxX": 8.1,
            /* cells in columns must be currency */
            "type": {
              "id": "currency",
              "relaxedWithCents": true,
              "removeSpaces": true
            },
            /* if cell doesn't contain currency,
               omit its row from output */
            "isRequired": true
          }
        ],
        /* (recommended for performance)
        table ends at "total electronic withdrawals" */
        "stop": {
          "text": "total electronic withdrawals",
          "type": "startsWith"
        }
      }
    }
  ],
  "computed_fields": [
    {
      /* by default, table methods return
      column objects. transform these to
      row objects using the Zip method */
      "id": "electronic_withdrawals_table",
      "method": {
        "id": "zip",
        "source_ids": [
          "_electronic_withdrawals_table_raw"
        ]
      }
    },
    /* to avoid redundant output, return
       the zipped table row objects and suppress the 
       original column objects */
    {
      "id": "cleanup_output",
      "method": {
        "id": "suppressOutput",
        "source_ids": [
          "_electronic_withdrawals_table_raw"
        ]
      }
    }
  ]
}

 You'll get this output


{
  "electronic_withdrawals_table": [
    {
      "date_column": {
        "value": "01/15",
        "type": "string"
      },
      "amount_column": {
        "source": "200.00",
        "value": 200,
        "unit": "$",
        "type": "currency"
      }
    },
    {
      "date_column": {
        "value": "01/19",
        "type": "string"
      },
      "amount_column": {
        "source": "250.00",
        "value": 250,
        "unit": "$",
        "type": "currency"
      }
    },
    {
      "date_column": {
        "value": "01/25",
        "type": "string"
      },
      "amount_column": {
        "source": "189.14",
        "value": 189.14,
        "unit": "$",
        "type": "currency"
      }
    },
    {
      "date_column": {
        "value": "02/02",
        "type": "string"
      },
      "amount_column": {
        "source": "173.51",
        "value": 173.51,
        "unit": "$",
        "type": "currency"
      }
    },
    {
      "date_column": {
        "value": "02/02",
        "type": "string"
      },
      "amount_column": {
        "source": "3,108.24",
        "value": 3108.24,
        "unit": "$",
        "type": "currency"
      }
    },
    {
      "date_column": {
        "value": "02/08",
        "type": "string"
      },
      "amount_column": {
        "source": "899.46",
        "value": 899.46,
        "unit": "$",
        "type": "currency"
      }
    },
    {
      "date_column": {
        "value": "02/08",
        "type": "string"
      },
      "amount_column": {
        "source": "1,500.00",
        "value": 1500,
        "unit": "$",
        "type": "currency"
      }
    }
  ]
}

Extract more data

We've covered how to extract a few pieces of data from bank statements. Our prebuilt config extracts much more information. Check it out! In the following screenshot, every blue- or green-outlined line is a piece of extracted data:

Sensible's prebuilt open-source parser

 

Start extracting

Congratulations, you've learned some key methods for extracting structured data from bank statements. There's more extraction power for you to uncover. Sign up for a free account (100 docs a month, no credit card required), check out our prebuilt bank statement config in our open-source library, and peruse our docs to start extracting data from your own documents.

Turn documents into structured data
Get started free
Share this post

Turn documents into structured data

Stop relying on manual data entry. With Sensible, claim back valuable time, your ops team will thank you, and you can deliver a superior user experience. It’s a win-win.