Scanned bank statement OCR (Beta)
To a computer, a scanned or photographed statement is just an image with no extractable text layer, so a normal conversion returns a blank table. Scanned OCR first runs a local Tesseract engine to recognize the text in the image, using a digits-only character allowlist on amount columns and grayscale/threshold preprocessing to improve accuracy, and then runs the same table extraction. To be honest about the limits: numeric and date OCR on financial documents is a well-known hard problem, and measured accuracy is far below that of clean printed text; low-quality scans, handwriting and ornate fonts make it worse. That is why this feature is in Beta: after export, verify every single amount and lean on the balance check to catch errors. Everything runs locally; nothing is uploaded.
Beta: financial-number OCR has limited accuracy and degrades with scan quality. Verify every amount row by row after export and rely on the balance check. If you can re-download a text-based PDF from online banking, use Bank statement to Excel instead; it is far more accurate.
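To make the grayscale/threshold preprocessing concrete, here is a minimal pure-Python sketch. It is not StatementSift's actual pipeline: the function names, the luma weights and the fixed cutoff of 128 are assumptions chosen for illustration, and a real implementation would more likely use an image library and an adaptive threshold.

```python
def to_grayscale(pixels):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray values.

    Uses the common ITU-R BT.601 luma weights (an assumption, not
    necessarily what StatementSift uses).
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in pixels
    ]


def threshold(gray, cutoff=128):
    """Binarize a grayscale image: paper becomes 255, ink becomes 0."""
    return [[255 if value >= cutoff else 0 for value in row] for row in gray]


# A tiny 2x2 "scan": light paper on the left, dark ink on the right.
scan = [
    [(250, 250, 250), (40, 40, 40)],
    [(200, 210, 190), (90, 80, 100)],
]
binary = threshold(to_grayscale(scan))
```

Binarizing first removes paper tint and uneven lighting, so the OCR engine sees crisp black digits on a white background.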
How to use Scanned statement OCR (Beta)
- 1. Drag in a scanned or photographed statement PDF or image.
- 2. Pick the recognition language (English, or English + Chinese).
- 3. StatementSift runs local OCR + preprocessing (the language pack downloads on first use), then splits rows and columns.
- 4. Verify every figure (really!), use the balance check to locate suspect rows, then export.
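The balance check in the last step can be pictured as a running-balance reconciliation. The sketch below is one assumed way to do it, not StatementSift's actual implementation: the function name `find_suspect_rows` and the `(amount, balance)` row shape are hypothetical.

```python
from decimal import Decimal


def find_suspect_rows(opening_balance, rows, tolerance=Decimal("0.00")):
    """Return indexes of rows whose stated balance does not equal
    the previous balance plus the row's amount."""
    suspects = []
    previous = opening_balance
    for index, (amount, balance) in enumerate(rows):
        if abs(previous + amount - balance) > tolerance:
            suspects.append(index)
        # Continue from the stated balance so one misread amount
        # does not flag every later row as well.
        previous = balance
    return suspects


# Row 1 was printed as -18.40, but OCR misread the 8 as a 3.
rows = [
    (Decimal("-25.00"), Decimal("975.00")),
    (Decimal("-13.40"), Decimal("956.60")),
    (Decimal("200.00"), Decimal("1156.60")),
]
suspects = find_suspect_rows(Decimal("1000.00"), rows)
```

Here `suspects` contains only index 1, the row with the misread amount. If OCR misreads a balance rather than an amount, both that row and the following one are flagged, which still narrows the search to two rows.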
Why use StatementSift's Scanned statement OCR (Beta)?
- Works even with no text layer: scans and phone photos that you’d otherwise retype get OCR’d into structured rows, sparing you most of the data entry.
- Tuned for figures: a digits-only allowlist (0-9 . , - ( )) on amount columns plus image preprocessing push numeric recognition toward the usable zone.
- Honest Beta + balance backstop: we don’t pretend a scan is as reliable as a text PDF — it’s clearly marked Beta, and the balance check helps you find the rows OCR got wrong.
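As an illustration of the digits-only allowlist, the sketch below validates a recognized amount cell against the character set listed above and converts it to a number. It is an assumed post-processing step, not StatementSift's code: `parse_amount` is a hypothetical helper, and it assumes commas are thousands separators and that parentheses or a leading minus mark a negative amount. (Tesseract itself offers a `tessedit_char_whitelist` setting for restricting recognized characters; whether StatementSift uses it is not stated here.)

```python
from decimal import Decimal, InvalidOperation

# The allowlist from the text: digits plus . , - ( )
ALLOWED = set("0123456789.,-()")


def parse_amount(text):
    """Return the amount as a Decimal, or None if the cell is not a clean amount."""
    cleaned = text.strip()
    if not cleaned or any(ch not in ALLOWED for ch in cleaned):
        return None  # e.g. a letter O read in place of a zero
    negative = False
    if cleaned.startswith("(") and cleaned.endswith(")"):
        negative = True  # accounting style: (1,234.56) means -1234.56
        cleaned = cleaned[1:-1]
    elif cleaned.startswith("-"):
        negative = True
        cleaned = cleaned[1:]
    digits = cleaned.replace(",", "")  # assumes comma = thousands separator
    if any(ch in "-()" for ch in digits):
        return None  # stray sign or bracket left inside the number
    try:
        value = Decimal(digits)
    except InvalidOperation:
        return None  # e.g. "1.2.3" or an empty string
    return -value if negative else value
```

Cells that come back as `None` are good candidates for manual review before export.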
Frequently asked questions
Honestly: less reliable than a text-based PDF. OCR does well on clean print, but numeric and date recognition on financial documents is a known hard problem with measurably lower accuracy; skewed, blurry, dark or handwritten scans are worse. That’s why it’s Beta — treat it as a tool that saves most of the typing, verify every amount by hand, and use the balance check to catch errors.
The OCR language pack downloads on first use (Chinese is ~20MB); your browser caches it, so later runs are faster. Overall speed also depends on page count and your device.
If you can re-download the official PDF statement from online banking (instead of scanning the paper copy), that’s a text-based PDF and Bank statement to Excel will be far faster and more accurate. Scanned OCR is the fallback when that isn’t possible.
Find the page for your bank
Want to see how StatementSift reads your bank’s statement? Open the list of supported banks, where every major bank has its own layout notes and an embedded converter.
Updated · StatementSift team