Extract tax forms to structured JSON
Tax forms like W-2s, 1099s, and 1040s contain earnings, withholdings, and income data in IRS-mandated formats. Scan quality and multi-form packages add extraction complexity. Sensible turns tax form data into structured JSON for income verification, compliance, and lending.
Why tax forms need more than basic OCR
Dense field grids, IRS box-number referencing, and multi-form packages demand extraction that is both flexible enough to read any layout and precise enough to never misalign a box value.
A single misread digit in Box 1 changes a W-2's meaning entirely. Extraction is anchored to exact box positions, then validated against expected data types and ranges, a level of precision that raw OCR alone cannot deliver.
1040, Schedule C, W-2s, 1099s: tax returns arrive as packages. Each form within the package gets classified and extracted with its own configuration. No manual splitting required.
Tax forms come in multiple copies (for example, W-2 Copy A, B, C, and D) with identical data but different layouts and shading. Copy A is printed in red drop-out ink, which is designed for machine reading but can confuse standard OCR. The IRS also revises form layouts between tax years. Sensible handles both copy variants and revision years automatically.
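The type-and-range validation described above can be illustrated with a minimal sketch. It is not Sensible's implementation: the `RULES` table, its limits, and the function names are assumptions made up for this example, and the cross-box check is only a plausibility flag.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical rules: box number -> (label, minimum, maximum).
# The limits are illustrative assumptions, not IRS or Sensible values.
RULES = {
    "1": ("wages", Decimal("0"), Decimal("10000000")),
    "2": ("federal_tax_withheld", Decimal("0"), Decimal("10000000")),
}


def parse_amount(raw: str) -> Decimal:
    """Convert an OCR'd money string such as '$52,340.18' to a Decimal."""
    cleaned = raw.replace("$", "").replace(",", "").strip()
    value = Decimal(cleaned)  # raises InvalidOperation on stray characters
    if not value.is_finite():
        raise InvalidOperation
    return value


def validate_boxes(boxes: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    parsed = {}
    for box, (label, low, high) in RULES.items():
        raw = boxes.get(box)
        if raw is None:
            problems.append(f"box {box} ({label}): missing")
            continue
        try:
            value = parse_amount(raw)
        except InvalidOperation:
            problems.append(f"box {box} ({label}): not a number: {raw!r}")
            continue
        if not low <= value <= high:
            problems.append(f"box {box} ({label}): {value} out of range")
            continue
        parsed[box] = value
    # Plausibility flag: withholding larger than wages usually means a misread.
    if "1" in parsed and "2" in parsed and parsed["2"] > parsed["1"]:
        problems.append("box 2 exceeds box 1")
    return problems


if __name__ == "__main__":
    print(validate_boxes({"1": "$52,340.18", "2": "6,120.00"}))
    print(validate_boxes({"1": "5Z,340.18", "2": "6,120.00"}))
```

Amounts are parsed as `Decimal` rather than `float` so that cent values compare exactly.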
Fields we extract
Box-level extraction maps to your verification schema. Add cross-form validation as needed.
W-2: Employer name, EIN, employee name, SSN (masked), wages (Box 1), federal tax withheld (Box 2), Social Security wages, Medicare wages, state wages, state tax
1099: Payer name, payer TIN, recipient name, recipient TIN, income type, amount reported, federal tax withheld, state tax withheld
1040: Filing status, adjusted gross income, taxable income, total tax, refund/amount owed, dependents, income sources by schedule
1099-DIV: Dividends and Distributions from investments.
1040: Individual Income Tax Return with schedules and supporting forms.
1099-INT: Interest Income from banks and financial institutions.
W-2: Wage and Tax Statement showing employer info, earnings, and federal/state withholdings.
1099-MISC: Miscellaneous income reported for rent, royalties, and other payments.
1099-NEC: Nonemployee Compensation for independent contractor payments of $600 or more.
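To make "structured JSON" concrete, here is a hedged sketch of how the W-2 fields listed above might be arranged in a record. The key names, the `build_w2_record` and `mask_ssn` helpers, and the sample values are all hypothetical and do not reflect Sensible's actual output schema.

```python
import json
import re


def mask_ssn(ssn: str) -> str:
    """Keep only the last four digits of a 9-digit SSN."""
    digits = re.sub(r"\D", "", ssn)
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return f"***-**-{digits[-4:]}"


def build_w2_record(fields: dict) -> dict:
    """Arrange flat extracted fields into a nested, JSON-ready record.

    The layout below is an assumed example schema, not a vendor format.
    """
    return {
        "form_type": "W-2",
        "employer": {"name": fields["employer_name"], "ein": fields["ein"]},
        "employee": {
            "name": fields["employee_name"],
            "ssn_masked": mask_ssn(fields["ssn"]),
        },
        "amounts": {
            "wages": fields["wages"],  # Box 1
            "federal_tax_withheld": fields["federal_tax_withheld"],  # Box 2
            "social_security_wages": fields["social_security_wages"],
            "medicare_wages": fields["medicare_wages"],
            "state_wages": fields["state_wages"],
            "state_tax": fields["state_tax"],
        },
    }


if __name__ == "__main__":
    # Made-up sample values; amounts stay as strings to avoid float rounding.
    sample = {
        "employer_name": "Example Corp",
        "ein": "12-3456789",
        "employee_name": "Jane Doe",
        "ssn": "123-45-6789",
        "wages": "52340.18",
        "federal_tax_withheld": "6120.00",
        "social_security_wages": "52340.18",
        "medicare_wages": "52340.18",
        "state_wages": "52340.18",
        "state_tax": "2093.61",
    }
    print(json.dumps(build_w2_record(sample), indent=2))
```

A nested layout like this keeps party details separate from dollar amounts, which tends to make mapping onto a verification schema simpler.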
Supported tax forms
Sensible supports all major IRS forms (W-2, 1099 variants, 1040, 1120, K-1) with pre-built configurations. Our hybrid approach handles layout changes between IRS revision years automatically.
W-2, 1099-MISC, 1099-NEC, 1099-INT, 1099-DIV, 1099-R, 1040, Schedule C, Schedule E, Schedule K-1
State W-2 supplements, state income tax returns, W-9, 4506-T, tax transcripts



Common Questions
Answers about IRS form support, box-level precision, and cross-form validation.
Yes. Sensible detects the tax year from the form and handles layout changes between IRS revision years. You can process returns from different years without changing your extraction configuration.
Sensible processes W-2s, 1099 variants (MISC, NEC, INT, DIV, R), 1040 returns, and schedules. Pre-built templates are available in the configuration library.
Yes. Validation rules check that wage totals match across boxes, that EIN and SSN formats are correct, and that calculated fields reconcile with their components.
Sensible can be configured for any state tax form. Pre-built templates focus on federal forms, but SenseML configurations for state-specific forms can be set up quickly.
Yes. Sensible sends extraction results to your webhook endpoint when processing completes. You can also poll the API for status.
Yes. Sensible flags extractions with low confidence for human review. You can configure review thresholds and workflows.
Sensible is SOC 2 Type II certified and HIPAA compliant. Data is encrypted in transit and at rest.
Document data is stored indefinitely by default. Custom retention policies are available and can be configured for same-day deletion if needed.
Yes. Sensible offers a 14-day free trial on the Growth plan. No credit card required to start.
Sensible uses per-document pricing for predictable costs. No token-based billing or usage surprises. Volume discounts are available for higher throughput.
Sensible provides REST APIs and SDKs for Python and Node.js. Most integrations take a few hours. Webhooks, Zapier, and direct API calls are all supported.
Sensible processes PDFs (native or scanned), Microsoft Word (DOC, DOCX), spreadsheets (XLSX, XLS, CSV), single-page images (JPEG, PNG), multi-page images (TIFF), and email bodies with attachments.
Accuracy depends on document quality and configuration. Most production deployments achieve 95%+ accuracy with proper validation rules and confidence signals.
Processing speed depends on document size, page count, OCR requirements, and which extraction methods are used. Simple single-page documents process in seconds. Larger or more complex documents that use LLM-based extraction take longer.
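Two of the answers above, on validation rules and on confidence-based review, lend themselves to a short sketch. Everything here is an assumption for illustration: the function names, the numeric confidence scores, and the 0.9 threshold are hypothetical and are not taken from Sensible's API.

```python
import re
from decimal import Decimal

# EINs are written as two digits, a hyphen, then seven digits.
EIN_RE = re.compile(r"^\d{2}-\d{7}$")
# SSNs are written as 3-2-4 digits; a masked form (***-**-1234) is also accepted.
SSN_RE = re.compile(r"^(?:\d{3}-\d{2}|\*{3}-\*{2})-\d{4}$")


def check_formats(ein: str, ssn: str) -> list[str]:
    """Return the names of the fields whose format looks wrong."""
    bad = []
    if not EIN_RE.match(ein):
        bad.append("ein")
    if not SSN_RE.match(ssn):
        bad.append("ssn")
    return bad


def reconciles(total: Decimal, components: list[Decimal],
               tolerance: Decimal = Decimal("0.01")) -> bool:
    """True if a reported total equals the sum of its parts within tolerance."""
    return abs(total - sum(components, Decimal("0"))) <= tolerance


def needs_review(confidences: dict[str, float],
                 threshold: float = 0.9) -> list[str]:
    """List fields whose (hypothetical) confidence score is below threshold."""
    return sorted(name for name, score in confidences.items()
                  if score < threshold)


if __name__ == "__main__":
    print(check_formats("12-3456789", "***-**-6789"))
    print(reconciles(Decimal("150.00"), [Decimal("100.00"), Decimal("50.00")]))
    print(needs_review({"wages": 0.98, "ein": 0.62}))
```

In practice the threshold would be tuned per field, since a low-confidence EIN and a low-confidence dollar amount carry different risks.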
