
BodhiLekhan

Why does your OCR forget who's writing?

The Problem

A doctor writes the same prescription shorthand every day. A student fills out answer sheets in the same loopy cursive all semester. A government clerk processes forms in the same tight print, page after page. The handwriting doesn't change. But the OCR system reading it starts from zero every single time.

State-of-the-art handwriting recognition treats every page as if it came from a stranger. Models like TrOCR and PARSeq achieve strong average accuracy across large datasets, but averages hide the damage. Writers whose style drifts from the training distribution — and that's most people — get significantly worse results. The model doesn't learn from seeing them before. It can't.

Recent work has tried to fix this. MetaHTR (CVPR 2021) adapts to individual writers using meta-learning, but requires labeled samples and gradient computation at every inference. MetaWriter (CVPR 2025) improves accuracy further, but still recomputes adaptation from scratch each time the model sees a page. Both methods pay the adaptation cost on every single inference, even when the writer hasn't changed since yesterday.
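To make that recurring cost concrete, here is a minimal sketch of the test-time adaptation pattern this family of methods relies on. It is not MetaHTR or MetaWriter code; the model, loss function, and hyperparameters are placeholder assumptions, and the only point is that the gradient-step inner loop runs again for every page read.

    # Illustrative sketch of per-inference adaptation; not the actual
    # MetaHTR/MetaWriter implementations. Model, loss, and data names
    # are hypothetical stand-ins for the "adapt, then recognize" pattern.
    import copy
    import torch

    def recognize_with_test_time_adaptation(model, support_pages, support_labels,
                                             query_page, loss_fn, lr=1e-3, steps=3):
        """Adapt a copy of the generic model to one writer, then read one page.

        The adaptation (gradient steps on labeled support samples) repeats on
        every inference call; that is the recurring tax described above.
        """
        adapted = copy.deepcopy(model)              # fresh copy: nothing persists
        optimizer = torch.optim.SGD(adapted.parameters(), lr=lr)

        for _ in range(steps):                      # inner-loop gradient updates
            optimizer.zero_grad()
            loss = loss_fn(adapted(support_pages), support_labels)
            loss.backward()
            optimizer.step()

        with torch.no_grad():                       # finally, read the query page
            return adapted(query_page)

    # Every call pays a model copy plus `steps` backward passes, even for a
    # writer the system transcribed yesterday; the adapted weights are discarded.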

This is especially wasteful in institutional settings. A school has a known set of students. A hospital has a known set of doctors. These are stable cohorts. Paying the cost of writer adaptation at every inference, for writers you've already seen, is an engineering failure dressed up as a research limitation.

What We're Exploring

We think writer adaptation for known cohorts should be a one-time cost, not a recurring tax.

Current approaches tie personalization to inference. Every time the system reads a page, it re-learns the writer. We're investigating whether writer-specific recognition can be made persistent and reusable — learn a writer once, recognize their handwriting from that point on with no additional cost per page. The same recognition system serving a cohort of hundreds without slowing down for any of them.

The picture we're working toward: a school enrolls its students once. From that point on, handwritten answer sheets are recognized at the accuracy of a personalized model, at the speed of a generic one. A new student joins mid-year? One enrollment session. No retraining, no infrastructure changes.
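As a design sketch, that enrollment picture might look something like the following. The WriterProfileStore class, the idea that a profile is a single cached tensor, and the conditioning interface are assumptions made for illustration, not a committed architecture.

    # A minimal sketch of the enroll-once / recognize-many pattern we're
    # exploring. All names and the profile representation are hypothetical.
    import torch

    class WriterProfileStore:
        """Persistent map from writer ID to a precomputed writer profile."""

        def __init__(self):
            self._profiles: dict[str, torch.Tensor] = {}

        def enroll(self, writer_id: str, encoder, enrollment_pages: torch.Tensor):
            """One-time cost: distill a writer's samples into a stored profile."""
            with torch.no_grad():
                # e.g. average a style embedding over the enrollment pages
                profile = encoder(enrollment_pages).mean(dim=0)
            self._profiles[writer_id] = profile

        def recognize(self, writer_id: str, recognizer, page: torch.Tensor) -> str:
            """Per-page cost: one forward pass conditioned on the cached profile."""
            profile = self._profiles[writer_id]      # lookup, no gradient steps
            with torch.no_grad():
                return recognizer(page, profile)     # generic-model speed

    # Usage: enroll each student once, then every answer sheet is a plain
    # forward pass.
    # store = WriterProfileStore()
    # store.enroll("student_042", style_encoder, samples_042)
    # text = store.recognize("student_042", htr_model, scanned_sheet)

The design choice being probed is simple: push all writer-specific computation into enrollment so that recognition stays a single forward pass, regardless of cohort size.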

Getting there raises questions we find genuinely open:

  • Persistence versus drift. Handwriting changes — people age, get injured, switch pens. If you fix a writer's profile at enrollment, when does it go stale? How do you detect that and correct for it without starting over?
  • Cohort scaling. Personalizing for 10 writers is easy. Personalizing for 10,000 is a different problem. How do you manage a large population of writer-specific models without the storage and serving costs becoming impractical?
  • Evaluation beyond averages. Standard HTR benchmarks report corpus-level CER and WER. But the whole point of personalization is per-writer improvement. How do you build an evaluation that measures whether the worst writers got better, not just whether the average moved?
We're testing against established baselines on the IAM Handwriting Database, the standard benchmark for writer-level HTR research, with a direct comparison against both generic models and test-time adaptation methods. Same writers, same test sets, same metrics. The variable is whether the system remembers who it's reading.
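The evaluation question above is concrete enough to sketch. Below is a minimal, hypothetical per-writer scoring pass: it computes CER per writer and reports the worst decile next to the mean, which is the comparison that shows whether the hardest writers actually improved. The data layout and function names are illustrative, not our actual harness.

    # Per-writer evaluation sketch, as opposed to a single corpus-level average.
    # Data layout and function names are hypothetical.
    from collections import defaultdict
    from statistics import mean

    def edit_distance(a: str, b: str) -> int:
        """Standard Levenshtein distance via dynamic programming."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def per_writer_cer(samples):
        """samples: iterable of (writer_id, prediction, ground_truth)."""
        errors, chars = defaultdict(int), defaultdict(int)
        for writer_id, pred, truth in samples:
            errors[writer_id] += edit_distance(pred, truth)
            chars[writer_id] += len(truth)
        return {w: errors[w] / max(chars[w], 1) for w in errors}

    def report(samples):
        rates = sorted(per_writer_cer(samples).values())
        worst_decile = rates[int(0.9 * len(rates)):]   # the writers hurt most
        print(f"mean per-writer CER: {mean(rates):.3f}")
        print(f"worst-decile CER:    {mean(worst_decile):.3f}")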

Status

Active Research