Extraction accuracy is a process problem, not a model problem

When an AI feature pulls the wrong figure off an invoice, or grabs the date from the wrong line, the conversation almost always goes to the same place: the model is not good enough, and we need a better one. Occasionally that is true. But far more often, in our experience, the accuracy problem does not live in the model at all. It lives in the process around it, and that is good news, because the process is something you can actually fix.

Start with the input. A surprising amount of extraction error traces straight back to the document itself. The scan is crooked or low quality. The same field appears in three different places depending on which supplier sent it. A form was completed in a way the sender invented on the spot. No model reads a genuinely ambiguous document perfectly, because there is no single right answer to read. Tidy up how documents arrive, or capture them in a cleaner form at the source, and a good share of the errors disappears before the model is even involved.

Then there is the question of what you are asking for. "Pull the total" sounds clear until you meet a document with a subtotal, a tax line, a discount, and a grand total. The model is not wrong when it picks one of them. The instruction was underspecified. A lot of what looks like model error is really a definition that was never pinned down, the kind of thing a human only gets right because they quietly know the convention of your business. The fix is to make that convention explicit, not to demand the model read your mind.
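One way to make a convention explicit is to write the field definition down as data and generate the instruction from it. The sketch below is illustrative only: `FieldSpec`, `build_instruction`, and the wording of the definition are hypothetical names and text, not part of any particular product or library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FieldSpec:
    """A written-down definition of one field to extract (hypothetical structure)."""
    name: str
    definition: str                      # the business convention, stated in full
    not_to_be_confused_with: Tuple[str, ...] = ()


def build_instruction(spec: FieldSpec) -> str:
    """Turn a field definition into an unambiguous extraction instruction."""
    lines = [
        f"Extract the field '{spec.name}'.",
        f"Definition: {spec.definition}",
    ]
    if spec.not_to_be_confused_with:
        lines.append("Do not return: " + ", ".join(spec.not_to_be_confused_with) + ".")
    # Give the model a way to say "not sure" instead of guessing.
    lines.append("If the value is missing or ambiguous, return null.")
    return "\n".join(lines)


# Example convention (an assumption for illustration): "total" means the grand total.
TOTAL = FieldSpec(
    name="total",
    definition="The grand total payable, after discounts and including tax.",
    not_to_be_confused_with=("subtotal", "tax amount", "discount amount"),
)

if __name__ == "__main__":
    print(build_instruction(TOTAL))
```

The point is not the exact wording but that the definition lives somewhere reviewable, so a disagreement about what "total" means becomes a one-line change rather than a complaint about the model.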

Where accuracy actually comes from

The most important part, though, is what happens to the uncertain cases. Real accuracy in production does not come from a model that is right every single time, because nothing reads every messy document perfectly. It comes from a system that knows when it might be wrong and routes those cases to a person. A workflow where the confident extractions flow through and the doubtful ones get a human check will, in practice, be far more accurate than a marginally better model wired up to trust itself blindly. The judgment about which cases to escalate is a process decision, not a model setting.
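The routing idea can be sketched in a few lines. This is a minimal sketch under stated assumptions: it assumes each extraction arrives with a confidence score between 0 and 1 from some upstream step, and the names `Extraction`, `route`, and the 0.90 threshold are hypothetical placeholders, not recommended values.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Extraction:
    document_id: str
    field: str
    value: str
    confidence: float  # assumed: a score in [0, 1] supplied by the extraction step


# Illustrative threshold only. In practice this is a process decision,
# set from measured accuracy on your own documents.
REVIEW_THRESHOLD = 0.90


def route(
    extractions: Iterable[Extraction],
    threshold: float = REVIEW_THRESHOLD,
) -> Tuple[List[Extraction], List[Extraction]]:
    """Split extractions into (accepted, needs_review) by confidence."""
    accepted: List[Extraction] = []
    needs_review: List[Extraction] = []
    for item in extractions:
        if item.confidence >= threshold:
            accepted.append(item)
        else:
            needs_review.append(item)  # goes to a person for checking
    return accepted, needs_review


if __name__ == "__main__":
    batch = [
        Extraction("inv-001", "total", "120.00", 0.98),
        Extraction("inv-002", "total", "87.50", 0.61),
    ]
    ok, review = route(batch)
    print([e.document_id for e in ok], [e.document_id for e in review])
```

The threshold is deliberately a parameter outside the model: lowering it trades reviewer time for risk, and that trade belongs to whoever owns the workflow.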

And you cannot improve any of this without measuring it. If you are not logging what was extracted and checking it against what was correct, you are guessing. We build that measurement in from the start, so a client can see real accuracy on their real documents rather than a benchmark from someone else's data that may have nothing to do with their world.
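As a rough illustration of what that measurement can look like, the sketch below compares logged extractions against verified values and reports accuracy per field. The record layout and the function name `field_accuracy` are assumptions for the example, and the sample data is invented purely to show the arithmetic.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed log record: (field name, extracted value, verified correct value).
Record = Tuple[str, str, str]


def field_accuracy(records: Iterable[Record]) -> Dict[str, float]:
    """Return the share of exact matches per field, ignoring surrounding whitespace."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for field, extracted, correct in records:
        totals[field] += 1
        if extracted.strip() == correct.strip():
            hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    log = [
        ("total", "120.00", "120.00"),
        ("total", "100.00", "120.00"),  # picked the subtotal instead of the total
        ("date", "2024-01-05", "2024-01-05"),
    ]
    print(field_accuracy(log))  # {'total': 0.5, 'date': 1.0}
```

Breaking the figure down by field matters: an overall number can look healthy while one field, often the one with the fuzziest definition, is quietly wrong half the time.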

So when accuracy disappoints, our first move is not to reach for a bigger model. It is to look at the input, the definitions, the handling of uncertainty, and the measurement. Almost always the meaningful gains are there, in the process. The model is rarely the thing holding you back. The system around it usually is.

Facing something similar in your business?

Talk it through with our AI guide, or send the team a note. We will tell you straight whether and how we can help.
