Extract tax forms to structured JSON

Tax forms like W-2s, 1099s, and 1040s contain earnings, withholdings, and income data in IRS-mandated formats. Scan quality and multi-form packages add extraction complexity. Sensible turns tax form data into structured JSON for income verification, compliance, and lending.

Why tax forms need more than basic OCR

Dense field grids, IRS box-number referencing, and multi-form packages demand extraction that is both flexible enough to read any layout and precise enough to never misalign a box value.

Box-Level Precision

A single misread digit in Box 1 changes a W-2's meaning entirely. Extraction is anchored to exact box positions, then validated against expected data types and ranges. That is a level of precision raw OCR alone cannot deliver.
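As a rough illustration of the type-and-range check described above, the following Python sketch parses a box value as currency and rejects anything implausible. The function name `parse_box_amount` and the upper bound are assumptions made for this example, not part of Sensible's product.

```python
from decimal import Decimal, InvalidOperation


def parse_box_amount(raw: str, max_value: Decimal = Decimal("100000000")) -> Decimal:
    """Parse an OCR'd currency string and reject values outside a plausible range.

    The default max_value is an arbitrary illustrative ceiling, not an IRS limit.
    """
    # Strip common currency formatting before parsing.
    cleaned = raw.strip().replace("$", "").replace(",", "").replace(" ", "")
    try:
        amount = Decimal(cleaned)
    except InvalidOperation:
        # A letter misread in place of a digit (for example "O" for "0") lands here.
        raise ValueError(f"not a currency amount: {raw!r}") from None
    if not amount.is_finite() or amount < 0 or amount > max_value:
        raise ValueError(f"amount out of range: {raw!r}")
    return amount
```

A value such as "$52,340.18" parses cleanly, while "52,34O.18" (letter O in place of zero) raises a `ValueError` instead of silently producing a wrong wage figure.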

Multi-Form Packages

1040, Schedule C, W-2s, 1099s: tax returns arrive as packages. Each form within the package gets classified and extracted with its own configuration. No manual splitting required.

Copy Designation and Version Handling

Tax forms come in multiple copies (W-2 Copy A, B, C, D) with identical data but different layouts and shading. Copy A is printed in red dropout ink, which is designed for machine reading but can confuse standard OCR. The IRS also revises form layouts between tax years. Sensible handles both copy variants and revision years automatically.

Fields we extract

Box-level extraction maps to your verification schema. Add cross-form validation as needed.

W-2 fields

Employer name, EIN, employee name, SSN (masked), wages (Box 1), federal tax withheld (Box 2), Social Security wages, Medicare wages, state wages, state tax

1099 fields

Payer name, payer TIN, recipient name, recipient TIN, income type, amount reported, federal tax withheld, state tax withheld

1040 fields

Filing status, adjusted gross income, taxable income, total tax, refund/amount owed, dependents, income sources by schedule


{ /* SenseML: tax form extraction (W-2 example) */
  "fields": [
    {
      "method": {
        "id": "queryGroup",
        "queries": [
          {
            // Employer Identification Number (Box b)
            "id": "employer_ein",
            "description": "employer identification number, EIN, employer's EIN",
            "type": {
              "id": "custom",
              "pattern": "[0-9]{2}-[0-9]{7}"
            }
          },
          {
            // Wages, tips, other compensation (Box 1)
            "id": "wages_box1",
            "description": "wages tips other compensation, box 1 wages, W-2 box 1",
            "type": { "id": "currency" }
          },
          {
            // Federal income tax withheld (Box 2)
            "id": "federal_tax_withheld",
            "description": "federal income tax withheld, box 2, FIT withheld",
            "type": { "id": "currency" }
          },
          {
            // Employee SSN (Box a) - masked for security
            "id": "employee_ssn",
            "description": "employee SSN, social security number, SSN",
            "type": {
              "id": "custom",
              "pattern": "[0-9X*]{3}-[0-9X*]{2}-[0-9]{4}"
            }
          }
          // Additional fields for state wages, employer name, employee address, etc.
        ]
      }
    }
  ]
}
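To see what the two custom patterns in the configuration above accept, here is a small Python sketch that applies the same regular expressions to sample values. The helper `matches` is hypothetical, and the use of full-string matching is an assumption of this sketch; how the extraction engine anchors a custom pattern is not stated here.

```python
import re

# Patterns copied from the SenseML example above.
EIN_PATTERN = re.compile(r"[0-9]{2}-[0-9]{7}")
SSN_PATTERN = re.compile(r"[0-9X*]{3}-[0-9X*]{2}-[0-9]{4}")


def matches(pattern: re.Pattern, value: str) -> bool:
    """Return True only if the whole (trimmed) value fits the pattern."""
    return pattern.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    for label, pattern, value in [
        ("EIN", EIN_PATTERN, "12-3456789"),
        ("EIN", EIN_PATTERN, "123-456789"),
        ("SSN", SSN_PATTERN, "XXX-XX-1234"),
        ("SSN", SSN_PATTERN, "***-**-1234"),
    ]:
        print(label, value, matches(pattern, value))
```

The SSN pattern allows `X` or `*` in the first five positions, so both masked and unmasked values pass, while the last four positions must be digits.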

Supported tax forms

Sensible supports all major IRS forms (W-2, 1099 variants, 1040, 1120, K-1) with pre-built configurations. Our hybrid approach handles layout changes between IRS revision years automatically.

W-2

Wage and Tax Statement showing employer info, earnings, and federal/state withholdings.

1099-NEC

Nonemployee Compensation for independent contractor payments of $600 or more.

1099-MISC

Miscellaneous income reported for rent, royalties, and other payments.

1099-INT

Interest Income from banks and financial institutions.

1099-DIV

Dividends and Distributions from investments.

1040

Individual Income Tax Return with schedules and supporting forms.

Federal forms

W-2, 1099-MISC, 1099-NEC, 1099-INT, 1099-DIV, 1099-R, 1040, Schedule C, Schedule E, Schedule K-1

State and supplemental

State W-2 supplements, state income tax returns, W-9, 4506-T, tax transcripts

Common Questions

Answers about IRS form support, box-level precision, and cross-form validation.

Can Sensible handle tax forms from multiple years?

Yes. Sensible detects the tax year from the form and handles layout changes between IRS revision years. You can process returns from different years without changing your extraction configuration.

Which tax forms does Sensible support?

Sensible processes W-2s, 1099 variants (MISC, NEC, INT, DIV, R), 1040 returns, and schedules. Pre-built templates are available in the configuration library.

Does Sensible validate tax form data?

Yes. Validation rules check that wage totals match across boxes, that EIN and SSN formats are correct, and that calculated fields reconcile with their components.
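The kind of reconciliation described here can be sketched in a few lines of Python. The function names, the one-dollar tolerance, and the default 6.2% employee Social Security rate are assumptions chosen for illustration; the rate should be confirmed for the tax year being processed, and this is not Sensible's own validation syntax.

```python
from decimal import Decimal
from typing import Iterable


def reconciles(total: Decimal, components: Iterable[Decimal],
               tolerance: Decimal = Decimal("1.00")) -> bool:
    """Check that a reported total equals the sum of its components, within a tolerance."""
    return abs(total - sum(components, Decimal("0"))) <= tolerance


def social_security_tax_consistent(ss_wages: Decimal, ss_tax: Decimal,
                                   rate: Decimal = Decimal("0.062"),
                                   tolerance: Decimal = Decimal("1.00")) -> bool:
    """Check that Social Security tax withheld is close to rate * Social Security wages."""
    return abs(ss_wages * rate - ss_tax) <= tolerance
```

For example, two W-2s with Box 1 wages of 50,000.00 and 25,000.00 reconcile against a return reporting 75,000.00 in wages, and a misread withholding of 310.00 on 50,000.00 of Social Security wages is flagged as inconsistent.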

Does Sensible handle state tax forms?

Sensible can be configured for any state tax form. Pre-built templates focus on federal forms, but SenseML configurations for state-specific forms can be set up quickly.

Do you support webhooks?

Yes. Sensible sends extraction results to your webhook endpoint when processing completes. You can also poll the API for status.
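If you poll rather than use a webhook, the loop might look like the Python sketch below. It deliberately takes a caller-supplied `fetch_status` function instead of calling any real endpoint, and the status values "COMPLETE" and "FAILED" are placeholders assumed for this example; check the API reference for the actual response shape.

```python
import time
from typing import Callable


def wait_for_extraction(fetch_status: Callable[[], dict],
                        interval: float = 2.0,
                        timeout: float = 60.0,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status until it reports a terminal status or the timeout passes.

    fetch_status is a hypothetical callable that returns the latest status
    payload as a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        # Placeholder terminal states, assumed for illustration only.
        if result.get("status") in {"COMPLETE", "FAILED"}:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("extraction did not finish before the timeout")
        sleep(interval)
```

Injecting `fetch_status` and `sleep` keeps the loop testable without network access; in production, a webhook avoids the repeated requests entirely.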

Does Sensible support human review?

Yes. Sensible flags extractions with low confidence for human review. You can configure review thresholds and workflows.
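A review threshold of this kind can be pictured with a short Python sketch. The field shape used here (a dict with `value` and a numeric `confidence` per field) and the 0.9 default are hypothetical, chosen only to show the routing logic; Sensible's actual confidence signals may be represented differently.

```python
def fields_needing_review(fields: dict, threshold: float = 0.9) -> list:
    """Return the names of fields that are missing or fall below the confidence threshold.

    `fields` maps a field id to a dict such as {"value": ..., "confidence": 0.95}.
    This shape is an assumption for the example, not a documented response format.
    """
    return sorted(
        name
        for name, field in fields.items()
        if field.get("value") is None or field.get("confidence", 0.0) < threshold
    )


if __name__ == "__main__":
    sample = {
        "wages_box1": {"value": "52340.18", "confidence": 0.98},
        "employee_ssn": {"value": "XXX-XX-1234", "confidence": 0.71},
        "employer_ein": {"value": None, "confidence": 0.99},
    }
    print(fields_needing_review(sample))
```

Fields with no extracted value are routed to review regardless of score, since a confident "nothing found" on a required box still needs a human look.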

What security certifications does Sensible have?

Sensible is SOC 2 Type II certified and HIPAA compliant. Data is encrypted in transit and at rest.

How long is document data retained?

Document data is stored indefinitely by default. Custom retention policies are available and can be configured for same-day deletion if needed.

Is there a free trial?

Yes. Sensible offers a 14-day free trial on the Growth plan. No credit card required to start.

How is pricing structured?

Sensible uses per-document pricing for predictable costs. No token-based billing or usage surprises. Volume discounts are available for higher throughput.

How do I integrate with Sensible?

Sensible provides REST APIs and SDKs for Python and Node.js. Most integrations take a few hours. Webhooks, Zapier, and direct API calls are all supported.

What file formats does Sensible support?

Sensible processes PDFs (native or scanned), Microsoft Word (DOC, DOCX), spreadsheets (XLSX, XLS, CSV), single-page images (JPEG, PNG), multi-page images (TIFF), and email bodies with attachments.

How accurate is the extraction?

Accuracy depends on document quality and configuration. Most production deployments achieve 95%+ accuracy with proper validation rules and confidence signals.

How fast is document processing?

Processing speed depends on document size, page count, OCR requirements, and which extraction methods are used. Simple single-page documents process in seconds. Larger or more complex documents that use LLM-based extraction take longer.