Your AI activity ledger — every agent, every activity, 100% local.
Every session is also tagged as Value-Added (you'd pay for it), Non-Value-Added (retry / failed / harmful — pure waste), or Business-Value-Added (necessary overhead like research). Borrowed from Cooper & Kaplan's ABM (1991) — most of the easy money to save lives in NVA.
Latest week vs trailing 3-week baseline. From ACCA's variance analysis — the same machinery factories use to catch silent quality regressions.
Per Resource Consumption Accounting, pricing decisions belong to proportional cost (per-call API spend). Cancel/keep decisions belong to fixed cost (subscriptions). Most dashboards mash these together. We don't.
Each project = one value stream: end-to-end cost of one workflow you actually care about. NVA% column shows how much was waste.
| Project | Sessions | Agents | Cost | Est. value | ROI | NVA % |
|---|
Projects where two or more agents touched the same work. The smaller-spend agent is usually the cancel candidate. Each row shows the primary agent (the one you'd keep) and the secondary spend you might be able to drop.
| Project | Primary agent | Primary $ | Also used | Recoverable $ |
|---|
Every session is categorized. Each row shows total cost, total human-equivalent value, and ROI — using mid estimates of human time and market rate, multiplied by the quality factor. Hover the columns for low/high ranges.
tokenpayback auto-detects Claude Code, Codex CLI, Hermes, OpenClaw, OpenHuman, and Cursor on your machine.
| Agent | Sessions | $ spent | est. value | ROI |
|---|
| Week | Cost | Value | ROI | PRs | Commits | +Lines | Reverts |
|---|
Sessions with ROI < 1× (AI cost more than the human time it saved) or with
failed / harmful quality. Look at these before declaring victory.
A negative-ROI session burned tokens AND added cleanup work for you — those are the cases that
should make you change your prompt, your tool choice, or your task framing.
| Date | Agent | Project | Category | What happened | Quality | $ AI cost | $ Value | ROI |
|---|
Every row shows the closest professional role, a time range (low — mid — high) for that human, the market rate range for that role, and the quality factor reflecting whether the AI output was directly usable. Final value = mid_hours × mid_rate × quality_factor; the range comes from combining the low/high estimates. Nothing leaves your machine.
| Date | Agent | Via | Category | Project | Equivalent role | Human time low → mid → high |
Hourly rate low → mid → high |
Quality ×factor |
$ AI cost | $ Human-equiv low — mid — high |
ROI low — mid — high |
|---|
Per session, the LLM picks the closest professional role that would do this work and estimates hours + market rate (anchored to your benchmark: us-west / senior). Value = hours × rate × quality_factor. Quality factor: full-replacement 1.0 · with-edits 0.7 · draft-only 0.4 · failed 0.0.
Estimates are uncertain by design — that's why every number is a range, not a single value.
Tune the benchmark in ~/.tokenpayback/config.yaml (region + seniority).