Why extract data from a trucking rate confirmation?
Better data and analytics is key to shock-proofing supply chains against future disruptions, as the COVID-19 pandemic brought home. But high-quality real-world data can be hard to come by in logistics. Take the trucking industry: it’s an essential part of the supply chain and, like many other transportation industries, still has an analog backbone. In particular, getting information about what’s on a truck and where it’s going might seem trivial. But freight brokers don’t often have APIs where this information is easily accessible. Instead, it’s reported in a PDF: the ubiquitous "rate confirmation". Anytime a truckload goes anywhere, a rate confirmation describes what’s on the truck, how much it costs to ship, and where it’s going.
Fortunately, with Sensible you can easily extract key information out of rate confirmation PDFs using SenseML. SenseML is Sensible’s JSON-formatted query language for extracting data from documents.
What we'll cover
In this blog post, we’ll briefly walk through configuring extractions for rate confirmations. At the end, you’ll know a couple of SenseML methods and you’ll be on your way to extracting any data you choose using our documentation or our prebuilt rate confirmation configurations in our open-source library.
Write document extraction queries with SenseML
Let's walk through extracting specific pieces of data from a top global transportation company, CH Robinson. Here's an example of a USA-based trucking rate confirmation PDF with fake data:
To follow along, you can download this example PDF and upload it to the Sensible app:
Our open-source rate confirmation config extracts pretty much everything from this PDF, but for the example in this post, let's keep it simple. We'll extract just the:
- Load ID and total rate
- Receivers' names and delivery dates
- Transportation equipment type
You can also skip ahead to the end of this post to see all the queries we cover in this post wrapped up in a single code sample.
Extract the load ID and total cost
To extract the load ID, #123456789:
Paste the following query, or "field" into the right pane of the Sensible app:
The query treats the string "confirmation - #" as a label, and searches to the right of the label for the load ID. You'll get this output:
To extract the total rate, $4,5000:
Use a query that finds the text "total" preceded by the text "service for load," then treat that text as part of a row. Sensible searches for the first row cell to the right of the text that contains a currency.
The query outputs:
Extract receivers
Since a rate confirmation can list multiple receivers, let's output arrays of receiver data. You can either A) output separate arrays for each piece of receiver data, or you can B) output an array of receiver objects, each containing receiver data.
In our open-source library, our customer-contributed config uses approach A with a "matchAll" parameter. In this post, we'll instead illustrate approach B by treating each receiver as a repeating section. The following image shows the receiver sections, each of which contains name and delivery date properties.
The following query looks for the receiver name and delivery date in a group of receiver sections that repeatedly start with the text "RECEIVER #". You can append this query after the fields array in the Sensible app:
The output is:
Extracting trailer type
Finally, let's take a look at how to refine extracted data. We can get a string for the equipment, "Van - Min L=53":
Using this query:
Let's refine this output so we only capture the trailer type ("van"). We'll use a computed field to split the field's output on the hyphen. You can append this query after the sections array in the Sensible app :
To get output like this:
Wrapping up
To wrap it all up, you can paste all the queries mentioned in this post into the Sensible app like this:
Start extracting
Congratulations, you've learned some key methods for extracting structured data from rate confirmation documents. There's more extraction power for you to uncover. Check out our prebuilt rate confirmation config in our open-source library, check out our docs, and sign up for a free account (150 docs a month, no credit card required) to start extracting data from your own documents.