Managing traffic violations across thousands of commercial vehicles is hard. For fleet management companies, data from traffic citations is essential to operations, but citation processing is difficult to automate because the citations themselves are inconsistent.
A global fleet management and insurance company (let's call them FMC) tackled this challenge. FMC's competitive advantage lies in its integrated services model. They combine fleet management, insurance, upfitting, and dealerships under one roof, and they wanted violations data to flow seamlessly across all these services. For example, they wanted to use the same processed document data to flag repeat-offender drivers for training, automatically pay violation fines, and use accident statistics to underwrite commercial auto insurance policies. Slow data entry from traffic citations blocked the company from fully leveraging its integrated platform.
The Challenge: Variable formats at massive scale
Traffic citations are surprisingly variable. Citations from different states, counties, and municipalities use completely different layouts. Some states (notably New York and New Jersey) list multiple historical violations alongside the current offense. Most citations are one to two pages long, with critical data packed into compact, poorly formatted layouts.
The company needed to extract violation type and code, date and location of offense, fine amount, driver information, vehicle identification, and due dates from each citation.
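As a rough illustration, the target fields listed above could be modeled as a single record type. This is a minimal sketch, not FMC's or Sensible's actual schema: the class name, field names, types, and the `is_overdue` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class CitationRecord:
    """Hypothetical container for the fields extracted from one citation."""
    violation_type: str        # e.g. "Speeding"
    violation_code: str        # jurisdiction-specific code, as printed
    offense_date: date
    offense_location: str
    fine_amount: Decimal       # Decimal avoids float rounding on money
    driver_name: str           # stand-in for "driver information"
    vehicle_id: str            # stand-in for "vehicle identification"
    due_date: Optional[date] = None  # some citations may omit it

    def is_overdue(self, today: date) -> bool:
        """True when a due date exists and has already passed."""
        return self.due_date is not None and today > self.due_date
```

A shared record like this is one way the same extracted data could feed several downstream uses, such as fine payment and driver training flags.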
The Solution: Flipping the usual LLM approach on its head
Through biweekly meetings, Sensible worked closely with FMC's technology product management team to refine the extraction strategy as edge cases emerged.
The team implemented an unconventional approach that flipped the usual document automation strategy. Generally, you'd use LLMs to handle the 20% of long-tail, variable formats, and use deterministic methods for the 80% of high-volume, consistent formats. FMC did the reverse. They used LLMs for the 80% of variable citation formats, and precise deterministic extraction only for the 20% of problematic formats where LLMs struggled. This strategy made sense because traffic citations hit the sweet spot for LLMs. They have wide format variation, but they're short, don't contain repeating data structures, and have clear field labels. Their wide variability meant that building deterministic extractors for every jurisdiction would be impractical, but building them for just the problematic exceptions was feasible.
The trickiest challenge emerged with citations listing multiple historical violations. The team initially tried a two-step LLM process to classify and route documents, but this added complexity and created new classification errors. The refined solution used pattern-based routing with text matches to identify problematic formats upfront and route them to deterministic extraction, while using LLM-based extraction for everything else. Finally, post-processing logic normalized violation codes across all sources.
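The routing idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the production configuration: the route name, the regular expression, and both function names are hypothetical, and the normalization rule (trim, collapse whitespace, uppercase) is an assumed stand-in for whatever rules the real post-processing applies.

```python
import re
from typing import List, Tuple

# Hypothetical text patterns that mark a "problematic" format.
# Each entry maps a deterministic extractor name to a pattern.
DETERMINISTIC_ROUTES: List[Tuple[str, re.Pattern]] = [
    (
        "multi_violation_history",
        re.compile(r"prior\s+violations|violation\s+history", re.IGNORECASE),
    ),
]


def choose_extractor(document_text: str) -> str:
    """Return a deterministic extractor name on a text match, else "llm"."""
    for name, pattern in DETERMINISTIC_ROUTES:
        if pattern.search(document_text):
            return name
    return "llm"  # default path: LLM-based extraction


def normalize_violation_code(raw_code: str) -> str:
    """Assumed normalization: trim, collapse whitespace, uppercase."""
    return re.sub(r"\s+", " ", raw_code.strip()).upper()
```

Because the routing is a plain text match, it runs before any model call and fails predictably: a document either matches a known pattern or falls through to the LLM path, with no separate classification step to get wrong.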
The Results: Scaled violation processing
FMC’s hybrid approach of combining LLM extraction with deterministic methods for identifiable edge cases has proven itself at scale. Currently, they're processing thousands of violations every day. Same-day fine payment became possible, and driver risk profiles update in real time as new violations arrive. Sensible is working closely with FMC to expand into new use cases that require automatic data extraction.
Get started with traffic violation automation
Whether you're managing a commercial fleet or building vehicle insurance technology, Sensible's solution engineering team can help design a citation extraction pipeline for your specific needs.
Book a demo to discuss your citation processing requirements, or explore our managed services to see how we can handle template creation and maintenance for you.





