Sensible — Designing Trustworthy AI Workflows for Enterprise Document Extraction

Overview

Sensible is an enterprise LLM-based document extraction platform. I designed core AI workflows that helped move the product from a manual, query-based experience toward a more AI-first one — while solving the trust and control challenges that came with automation.

Challenge

AI made extraction much faster, but users trusted it less than the manual system they understood and controlled.

The challenge was not just adding AI. It was deciding where AI should act, where users should stay in control, and how the system should communicate uncertainty in high-stakes workflows.

What I led

  • AI-first extraction workflow design

  • Confidence and human review patterns

  • Workflow design for both beginner and expert users

  • Collaboration with product and engineering on automation boundaries

Impact

  • Designed the core AI workflow for an enterprise document extraction platform

  • Introduced confidence and review patterns that made AI output more usable in high-stakes workflows

  • Helped shift users from full manual checking toward more selective, flagged review

  • Contributed during a period of 5x company growth

  • Supported product scale to 7-figure ARR

Key research findings

1. Trust was the main barrier
Users saw the speed advantage of AI, but hesitated when they could not judge whether the output was reliable enough for production.

2. Users wanted different levels of automation
Some preferred a faster, simpler workflow. Others, especially more technical users, wanted more transparency and control.

3. Neither extreme worked
Full manual review reduced the value of AI, while blind automation reduced trust. The right answer was a better middle ground.

Framing the problem

At first this looked like a trust problem, but the deeper issue was workflow design.

Through user feedback and early testing, three things became clear:

  • users were not rejecting AI itself; they were reacting to the loss of visibility and control

  • different users needed different levels of automation

  • a generic confidence score was not enough to guide action

This shifted the question from “How do we make users trust AI?” to “How do we give users the right checkpoints at the right moments?”

I considered pushing further toward full automation with audit logs, but rejected that direction because it removed too much visibility from users and displaced the trust problem rather than solving it.

Design decisions

1. AI-first extraction workflows

Insight: Users saw the speed advantage of AI, but the manual workflow still felt safer because they understood and controlled it.

Decision: I designed a more AI-first flow that reduced setup effort and made extraction more accessible.

Outcome: Lower friction and a faster path to value, especially for less technical users.

2. Confidence signals

Insight: Users were unsure when to trust AI output and when to review it. A generic confidence score did not help much, because it did not tell them what action to take.

Decision: I explored different ways to communicate uncertainty, including a generic percentage score and more visual field-level treatments. I moved toward confidence states that were tied to action instead of raw probability, so users could quickly understand what looked reliable, what needed review, and where to focus first. I also worked with engineering to make sure these states reflected real extraction behavior rather than arbitrary thresholds.

Outcome: Users could scan results faster and focus on the fields that mattered, instead of reviewing everything with the same level of effort.
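To make this concrete, here is a minimal sketch of the pattern in TypeScript. It is illustrative rather than Sensible’s actual implementation: the state names, field shape, and thresholds are hypothetical stand-ins for the idea of tying confidence to action.

```typescript
// Hypothetical sketch: map a raw extraction confidence score to an
// action-oriented state instead of surfacing the percentage directly.
type ConfidenceState = "looks-reliable" | "needs-review" | "review-first";

interface ExtractedField {
  name: string;
  value: string | null; // null when the model found nothing
  score: number;        // raw model confidence, 0..1
}

// Thresholds here are placeholders; in practice they should be
// calibrated against real extraction behavior, not picked arbitrarily.
function toConfidenceState(field: ExtractedField): ConfidenceState {
  if (field.value === null || field.score < 0.5) return "review-first";
  if (field.score < 0.85) return "needs-review";
  return "looks-reliable";
}

// Order fields so the riskiest ones surface first in the results.
function reviewOrder(fields: ExtractedField[]): ExtractedField[] {
  const rank: Record<ConfidenceState, number> = {
    "review-first": 0,
    "needs-review": 1,
    "looks-reliable": 2,
  };
  return [...fields].sort(
    (a, b) => rank[toConfidenceState(a)] - rank[toConfidenceState(b)]
  );
}
```

The point is the contract with the UI: it never shows a bare “0.87”; it shows what to do next.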

3. Human review

Insight: Blind automation felt risky, but full manual review removed much of AI’s value.

Decision: I designed a human review flow for flagged extractions, with validation, editing, source visibility, and approval.

Outcome: A safer middle ground that supported higher-stakes workflows without forcing manual review on every case.
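To show the shape of that flow, here is a hypothetical sketch (not the shipped data model) of the lifecycle a flagged extraction moves through:

```typescript
// Hypothetical review lifecycle: the statuses, data shape, and function
// names are assumptions for illustration.
type ReviewStatus = "auto-approved" | "flagged" | "in-review" | "approved";

interface FieldEdit {
  field: string;
  from: string | null; // what the model extracted
  to: string;          // what the reviewer corrected it to
}

interface ReviewItem {
  documentId: string;
  status: ReviewStatus;
  edits: FieldEdit[];
}

// Only extractions that fail the confidence check enter human review;
// everything else flows straight through.
function triage(item: ReviewItem, allFieldsReliable: boolean): ReviewItem {
  return { ...item, status: allFieldsReliable ? "auto-approved" : "flagged" };
}

// A reviewer validates a field against the source document and edits it...
function applyEdit(item: ReviewItem, edit: FieldEdit): ReviewItem {
  return { ...item, status: "in-review", edits: [...item.edits, edit] };
}

// ...then approves explicitly, so nothing uncertain ships silently.
function approve(item: ReviewItem): ReviewItem {
  return { ...item, status: "approved" };
}
```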

4. More intentional control for expert users

Insight: More technical users wanted transparency and control, not just a simpler AI experience.

Decision: I made the workflow more transparent and preserved deeper control for advanced users.

Outcome: The product felt more trustworthy and production-ready for expert use cases.

[Screenshot: dashboard]
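One way to picture “different levels of automation” is as a per-workflow setting. The sketch below is hypothetical (the mode names and threshold are mine, not the product’s), but it captures the spectrum the design had to support:

```typescript
// Hypothetical automation modes, from hands-off to fully manual.
type AutomationMode =
  | "auto"            // accept all extractions, no review step
  | "review-flagged"  // the middle ground: humans see only flagged cases
  | "review-all";     // full manual checking, for the most cautious teams

interface WorkflowConfig {
  mode: AutomationMode;
  // Expert users can tune the flagging threshold instead of
  // accepting a default.
  flagBelowScore: number;
}

function needsHumanReview(cfg: WorkflowConfig, score: number): boolean {
  switch (cfg.mode) {
    case "auto":
      return false;
    case "review-all":
      return true;
    case "review-flagged":
      return score < cfg.flagBelowScore;
  }
}
```

Exposing a knob like this, rather than hiding it, is one way to give expert users the transparency and control described above.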

What I learned

The most important lesson from this project was that trust in AI products is not built only by improving the model. It is built in the moments where the system hands work back to the user.

Users could forgive imperfect AI output. What they struggled with was not knowing whether something was wrong, or not knowing what to do next.

That is why the biggest trust gains did not come from making AI feel magical. They came from designing better handoffs: clearer review moments, better visibility into uncertainty, and more control where it mattered most.

The next opportunities I’d explore are:

  • learning from repeated user corrections

  • surfacing only cases that truly need review

  • supporting reasoning across groups of related documents, not just one file at a time
