Local Claude + Codex token intelligence

Find the AI session draining your token budget.

AI Spend Live shows which agent sessions to split, summarize, restart, slow down, or route differently, then helps you protect the 5-hour and weekly token windows.

5h left: 2.4M
Week left: 18.6M
Cache reuse: 81%

See the waste before it eats the day.

This example keeps the scale of a real local run while replacing private names and paths. The point is not the bill. The point is knowing which behavior to change next.

Example run: 7 days, API-equivalent estimate from local token logs (private data removed)
Selected range: 7,526.3M tokens, $12,816.46 estimate
24h burn: 153.6M tokens, $236.07 estimate
Last hour: 15.3M tokens (current pace)
Cache reuse: 81% (7,328.1M cached/read tokens)
Sessions costing the most: what to close, split, or summarize first

Session | Provider | Duration | Cache | Tokens | Do next
--- | --- | --- | --- | --- | ---
Long Context Refactor (largest repeated-context thread) | Claude | 6.2d | 99% | 3,310.4M | Summarize and restart
UI Polish Session (large ongoing design pass) | Claude | 1.1d | 99% | 1,244.8M | Split by screen
Codex Repository Sweep (long-running codebase work) | Codex | 3.0d | 49% | 928.0M | Narrow file scope
Feature Handoff Session (single thread carrying context) | Claude | 5.4d | 96% | 709.1M | Checkpoint decisions
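To make the ranking concrete, here is a minimal TypeScript sketch of how per-turn log records could be rolled up into a table like this one. The `TurnRecord` shape and every field name are hypothetical (Claude and Codex write different local log formats), cache share is computed here as cached/read tokens over the session total, and the thresholds in `suggestAction` are illustrative rather than the dashboard's actual rules.

```typescript
// Hypothetical per-turn record; real Claude and Codex logs use different shapes.
interface TurnRecord {
  sessionId: string;
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  timestampMs: number;
}

interface SessionSummary {
  sessionId: string;
  totalTokens: number;
  cacheShare: number; // cached/read tokens as a fraction of the session total
  durationDays: number;
  action: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Illustrative thresholds only; tune them for your own workflow.
function suggestAction(cacheShare: number, durationDays: number): string {
  if (durationDays >= 3 && cacheShare >= 0.9) return "Summarize and restart";
  if (cacheShare < 0.6) return "Narrow file scope";
  if (durationDays >= 1) return "Split or checkpoint";
  return "Keep going";
}

// Rolls turns up by session and returns the top N sessions by total tokens.
function rankSessions(turns: TurnRecord[], topN = 5): SessionSummary[] {
  const bySession = new Map<string, { total: number; cached: number; first: number; last: number }>();
  for (const t of turns) {
    const agg = bySession.get(t.sessionId) ?? { total: 0, cached: 0, first: t.timestampMs, last: t.timestampMs };
    agg.total += t.inputTokens + t.outputTokens + t.cacheReadTokens;
    agg.cached += t.cacheReadTokens;
    agg.first = Math.min(agg.first, t.timestampMs);
    agg.last = Math.max(agg.last, t.timestampMs);
    bySession.set(t.sessionId, agg);
  }
  const summaries: SessionSummary[] = [];
  for (const [sessionId, agg] of bySession) {
    const cacheShare = agg.total > 0 ? agg.cached / agg.total : 0;
    const durationDays = (agg.last - agg.first) / DAY_MS;
    summaries.push({ sessionId, totalTokens: agg.total, cacheShare, durationDays, action: suggestAction(cacheShare, durationDays) });
  }
  return summaries.sort((a, b) => b.totalTokens - a.totalTokens).slice(0, topN);
}
```

Ranking by raw token totals keeps the question simple ("which thread is biggest?"); the suggested action is a starting point for a human decision, not a verdict.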

Answer the spend questions people actually ask.

When someone shares a screenshot of a context meter, usage table, budget policy, or agent comparison, the useful answer is a recommendation tied to the metric, the reset window, and the session causing the burn.

Context pressure
52% full

When the context window is filling up.

Tell the user to pause, ask for a compact state summary, save it to a handoff file, and restart in a clean thread before output quality degrades.
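A hedged sketch of the context-pressure check: it converts used tokens into a fill fraction and maps it to advice. The 50% and 75% cut-offs and the function name `contextAdvice` are assumptions for illustration; the right threshold depends on the model's window and on how quality degrades in your own sessions.

```typescript
type ContextAdvice = "continue" | "plan a handoff" | "summarize and restart now";

// Hypothetical thresholds: warn at 50% full, act at 75% full.
function contextAdvice(
  usedTokens: number,
  windowTokens: number,
  warnAt = 0.5,
  actAt = 0.75,
): { fraction: number; advice: ContextAdvice } {
  if (windowTokens <= 0) throw new Error("windowTokens must be positive");
  const fraction = usedTokens / windowTokens;
  let advice: ContextAdvice = "continue";
  if (fraction >= actAt) advice = "summarize and restart now";
  else if (fraction >= warnAt) advice = "plan a handoff";
  return { fraction, advice };
}
```

For example, 104,000 tokens used in a hypothetical 200,000-token window is 52% full and returns `"plan a handoff"`.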

Agent comparison
72k vs 91k

When one tool feels more expensive.

Compare task outcome, input plus cache-write tokens, tool access, and retry count. Recommend the agent with the right connectors, not just the session with the lowest-looking token count.
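One way to make that comparison fair is to charge every attempt, including retries, to the task and divide by successful outcomes. This sketch assumes hypothetical `Attempt` and `AgentRun` records and a boolean `hasRequiredTools` flag standing in for connector access; it is not how AI Spend Live scores agents.

```typescript
// Hypothetical attempt record for one agent working on one task.
interface Attempt {
  inputTokens: number;
  cacheWriteTokens: number;
  succeeded: boolean;
}

interface AgentRun {
  agent: string;
  attempts: Attempt[]; // first try plus every retry
  hasRequiredTools: boolean; // e.g. the MCP connectors the task needs
}

// Input plus cache-write tokens per successful outcome; Infinity if nothing succeeded.
function tokensPerSuccess(run: AgentRun): number {
  const spent = run.attempts.reduce((sum, a) => sum + a.inputTokens + a.cacheWriteTokens, 0);
  const successes = run.attempts.filter((a) => a.succeeded).length;
  return successes > 0 ? spent / successes : Infinity;
}

// Prefer agents that have the required tools; among those, the lowest cost per success.
function pickAgent(runs: AgentRun[]): AgentRun | undefined {
  const eligible = runs.filter((r) => r.hasRequiredTools);
  const pool = eligible.length > 0 ? eligible : runs;
  return [...pool].sort((a, b) => tokensPerSuccess(a) - tokensPerSuccess(b))[0];
}
```

A 72k session that fails once and succeeds on retry costs 144k per success, which loses to a single 91k run with the right tools.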

Budget guardrails
5h + weekly

When budget windows get eaten early.

Compare remaining 5-hour and weekly budget against the sessions consuming each window. Pause fast loops, checkpoint the top burner, and continue only with scoped work.
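The comparison can be sketched as a per-window report: sum what each session used in the window, subtract that from the budget, and flag the top burner when little remains. `SessionWindowUsage`, `windowReport`, and the 25% `lowFraction` floor are hypothetical; the budgets in the usage note reuse the example values from the PowerShell setup below.

```typescript
interface SessionWindowUsage {
  sessionId: string;
  last5h: number; // tokens this session used in the rolling 5-hour window
  thisWeek: number; // tokens this session used in the weekly window
}

interface WindowReport {
  window: "5h" | "week";
  remaining: number;
  remainingFraction: number;
  topSession: string | undefined;
  topShare: number; // fraction of the window's usage from the top session
  checkpointTopBurner: boolean;
}

function windowReport(
  usage: SessionWindowUsage[],
  window: "5h" | "week",
  budget: number,
  lowFraction = 0.25, // hypothetical "eaten early" threshold
): WindowReport {
  const pick = (u: SessionWindowUsage) => (window === "5h" ? u.last5h : u.thisWeek);
  const used = usage.reduce((sum, u) => sum + pick(u), 0);
  const remaining = Math.max(0, budget - used);
  let top: SessionWindowUsage | undefined;
  for (const u of usage) if (!top || pick(u) > pick(top)) top = u;
  return {
    window,
    remaining,
    remainingFraction: budget > 0 ? remaining / budget : 0,
    topSession: top?.sessionId,
    topShare: used > 0 && top ? pick(top) / used : 0,
    checkpointTopBurner: budget > 0 && remaining / budget < lowFraction,
  };
}
```

With an 8,000,000-token 5-hour budget and sessions using 5,000,000 and 600,000 tokens, `windowReport(usage, "5h", 8000000)` reports 2,400,000 remaining (30% of the budget), so the top burner is not yet flagged.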

Tool routing
MCP gap

When the cheaper path lacks tools.

Route broad repo sweeps to Codex, keep MCP-heavy or product-judgment work in the environment with the right tools, then hand off a narrowed summary.

Use stronger prompts before stronger settings.

A good token plan is not just a weekly limit. It is knowing what remains in the 5-hour and weekly windows, when to spend reasoning, and when to stop a fast loop.

Model routing
5.5 -> 5.3

Plan with GPT-5.5, execute with Codex.

Use GPT-5.5 for ambiguous planning, tradeoffs, acceptance criteria, and review. Move implementation to GPT-5.3 Codex once files, constraints, and tests are clear.
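The handoff rule can be written as a small routing function. The `TaskState` fields mirror the checklist above (files, constraints, tests, tradeoffs); the model strings are display labels from this page, not verified API model identifiers, and `routeModel` is a hypothetical helper.

```typescript
// Hypothetical task descriptor; each field mirrors one item from the checklist.
interface TaskState {
  filesKnown: boolean;
  constraintsKnown: boolean;
  testsKnown: boolean;
  needsTradeoffDecision: boolean;
}

type Phase =
  | { model: "GPT-5.5"; role: "plan" }
  | { model: "GPT-5.3 Codex"; role: "execute" };

// Execute only once the work is bounded; anything still ambiguous goes back to planning.
function routeModel(task: TaskState): Phase {
  const bounded = task.filesKnown && task.constraintsKnown && task.testsKnown;
  return bounded && !task.needsTradeoffDecision
    ? { model: "GPT-5.3 Codex", role: "execute" }
    : { model: "GPT-5.5", role: "plan" };
}
```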

Fast mode
bounded only

Keep /fast for finish-line work.

Fast mode is useful for narrow edits and quick checks. It is expensive when the agent is still discovering scope because it can burn through more turns before you notice.

Prompt shape
outcome first

State the result, not every step.

Give the goal, success criteria, allowed side effects, evidence rules, and output shape. Add step-by-step process only when the path itself matters.
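A sketch of an outcome-first prompt builder, assuming a hypothetical `OutcomePrompt` shape with one field per element listed above. Steps are appended only when provided, matching the advice to add process only when the path itself matters.

```typescript
// Hypothetical prompt shape: one field per element of an outcome-first ask.
interface OutcomePrompt {
  goal: string;
  successCriteria: string[];
  allowedSideEffects: string[];
  evidenceRules: string[];
  outputShape: string;
  steps?: string[]; // only when the path itself matters
}

function buildPrompt(p: OutcomePrompt): string {
  const bullets = (items: string[]) => items.map((i) => `- ${i}`).join("\n");
  const sections = [
    `Goal: ${p.goal}`,
    `Success criteria:\n${bullets(p.successCriteria)}`,
    `Allowed side effects:\n${bullets(p.allowedSideEffects)}`,
    `Evidence rules:\n${bullets(p.evidenceRules)}`,
    `Output: ${p.outputShape}`,
  ];
  // Numbered steps go last, and only if the caller supplied them.
  if (p.steps && p.steps.length > 0) {
    sections.push(`Steps:\n${p.steps.map((s, i) => `${i + 1}. ${s}`).join("\n")}`);
  }
  return sections.join("\n\n");
}
```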

Budget pacing
5h + week

Track remaining budget, not just spend.

Compare rolling 5-hour and weekly remaining tokens against the sessions consuming them. If a session is eating the window, checkpoint it before starting another broad turn.
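Tracking remaining budget comes down to a runway calculation: divide what is left in a window by the current hourly pace and compare the result with the time until that window resets. `runway` and its inputs are hypothetical names; the same function works for the 5-hour and weekly windows.

```typescript
interface RunwayInput {
  remainingTokens: number;
  tokensLastHour: number; // current pace
  hoursUntilReset: number;
}

// Hours the window lasts at the current pace, and whether it runs out before reset.
function runway(input: RunwayInput): { hoursLeftAtPace: number; atRisk: boolean } {
  const hoursLeftAtPace = input.tokensLastHour > 0 ? input.remainingTokens / input.tokensLastHour : Infinity;
  return { hoursLeftAtPace, atRisk: hoursLeftAtPace < input.hoursUntilReset };
}
```

With 2,400,000 tokens left and a pace of 800,000 tokens per hour, the window lasts 3 hours; if it resets in 2 hours, it is not at risk. Doubling the pace to 1,600,000 tokens per hour cuts the runway to 1.5 hours and flags it.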

Do this before buying more tokens.

The dashboard is useful because it turns usage, screenshots, and team questions into a concrete next move.

1. Split the thread.

If one session dominates the range, stop using it as the everything-chat. Divide the work by feature, file group, or decision.

2. Summarize and restart.

When cache reads are huge, ask for a compact handoff and continue in a fresh session with only the useful state.

3. Narrow the context.

Replace repo-wide prompts with targeted file lists, failing tests, screenshots, or exact error output.

4. Route the work.

Use Codex for broad codebase sweeps and mechanical edits. Save Claude for architecture, UX, and review.

5. Save the evidence.

Keep the screenshot, terminal output, or usage table with the metric so the recommendation answers the actual situation.

6. Set spend rules.

Use monthly limits, exception notes, and approval thresholds so heavy users can justify their spend with real wins before costs drift.

7. Gate fast mode.

Use /fast after the write set is known. If the 5-hour remaining number drops quickly, switch back to scoped prompts and measured effort.

8. Spend reasoning intentionally.

Start with low or medium effort for bounded work. Raise effort only when the task is ambiguous and the result is worth the extra burn.
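Step 7 depends on noticing how fast the 5-hour remaining number is falling. This hedged sketch estimates the drop per minute from periodic samples and gates fast mode on a known write set plus a maximum drop rate; `RemainingSample`, `allowFastMode`, and the threshold you pass in are all hypothetical, not settings the dashboard exposes.

```typescript
// Hypothetical sample of the 5-hour remaining number at a point in time.
interface RemainingSample {
  atMs: number;
  remaining5h: number;
}

// Tokens per minute the 5-hour remaining number dropped between the first and last sample.
function dropPerMinute(samples: RemainingSample[]): number {
  if (samples.length < 2) return 0;
  const sorted = [...samples].sort((a, b) => a.atMs - b.atMs);
  const first = sorted[0];
  const last = sorted[sorted.length - 1];
  const minutes = (last.atMs - first.atMs) / 60000;
  if (minutes <= 0) return 0;
  // A window reset makes the number jump up; treat that as no drop.
  return Math.max(0, (first.remaining5h - last.remaining5h) / minutes);
}

// Fast mode only with a known write set and a tolerable drop rate.
function allowFastMode(writeSetKnown: boolean, samples: RemainingSample[], maxDropPerMinute: number): boolean {
  return writeSetKnown && dropPerMinute(samples) <= maxDropPerMinute;
}
```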

Built for agent-heavy coding days.

Claude and Codex already keep local usage traces. AI Spend Live turns them, plus supporting screenshots from other tools, into operational signal.

Session identity

Readable names before hashes.

Claude titles and Codex thread names show up first, so you know which real task is burning tokens.
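A minimal sketch of the naming fallback: prefer a human-readable title, then fall back to a shortened session ID. The `SessionMeta` field names are hypothetical stand-ins for wherever Claude titles and Codex thread names live in the local logs.

```typescript
// Hypothetical session metadata gathered from local logs.
interface SessionMeta {
  id: string; // session hash or UUID
  claudeTitle?: string;
  codexThreadName?: string;
}

// First non-blank readable name wins; otherwise show a short ID.
function displayName(s: SessionMeta): string {
  const readable = [s.claudeTitle, s.codexThreadName].map((n) => n?.trim()).find((n) => n);
  return readable ?? `session ${s.id.slice(0, 8)}`;
}
```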

Limit pressure

See stale high-burn threads.

Duration, cache reuse, and token totals reveal sessions that should be closed or summarized.

Spike diagnosis

Find the turn that exploded.

Largest-turn views catch giant asks before you repeat them all afternoon.
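Beyond sorting by size, a simple way to spot a turn that exploded is to compare each turn with its session's median. This sketch flags turns larger than `factor` times that median; the `Turn` shape and the default factor of 5 are assumptions, not the dashboard's definition of a spike.

```typescript
// Hypothetical per-turn view.
interface Turn {
  sessionName: string;
  turnIndex: number;
  totalTokens: number;
}

function median(values: number[]): number {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid];
}

// Turns larger than `factor` times their own session's median, biggest first.
function spikes(turns: Turn[], factor = 5): Turn[] {
  const bySession = new Map<string, Turn[]>();
  for (const t of turns) {
    const list = bySession.get(t.sessionName) ?? [];
    list.push(t);
    bySession.set(t.sessionName, list);
  }
  const out: Turn[] = [];
  for (const list of bySession.values()) {
    const m = median(list.map((t) => t.totalTokens));
    for (const t of list) if (t.totalTokens > factor * m) out.push(t);
  }
  return out.sort((a, b) => b.totalTokens - a.totalTokens);
}
```

Using the median rather than the mean keeps one giant turn from raising the baseline it is measured against.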

Question handling

Turn screenshots into recommendations.

Context meters, usage tables, and spend-policy threads become prompts for what to split, compact, reroute, or cap.

Workflow routing

Account for missing tools.

Token totals are weighed against MCP access, repo context, and retry loops so the cheapest-looking path does not hide extra work.

Budget policy

Support limits with evidence.

Per-user guardrails and exception reviews can point at the sessions, tasks, and outcomes behind the spend.

Prompt guidance

Recommend the prompt shape.

The dashboard can turn a spike into outcome-first asks, compact handoffs, scoped file lists, and effort changes.

Budget runway

Warn before either reset is at risk.

Selectable Claude and Codex plan presets turn official 5-hour message windows into local token estimates, then show whether one session is burning through the budget too early.
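The conversion from a message window to a token estimate is a multiplication by your own average tokens per message. In this sketch the preset's `messagesPer5h` is a placeholder to fill in from the provider's current plan documentation (this page does not state the numbers), and `PlanPreset` and both helpers are hypothetical.

```typescript
// Placeholder preset: messagesPer5h is NOT an official limit; take it from the plan docs.
interface PlanPreset {
  name: string;
  messagesPer5h: number;
}

// Average tokens per message, measured from your own local history.
function avgTokensPerMessage(messageTokenCounts: number[]): number {
  if (messageTokenCounts.length === 0) return 0;
  return messageTokenCounts.reduce((a, b) => a + b, 0) / messageTokenCounts.length;
}

// Estimated 5-hour token budget = messages allowed x average tokens per message.
function estimate5hTokenBudget(preset: PlanPreset, messageTokenCounts: number[]): number {
  return Math.round(preset.messagesPer5h * avgTokensPerMessage(messageTokenCounts));
}
```

A placeholder preset of 100 messages with measured messages of 40,000, 60,000, and 80,000 tokens yields an estimate of 6,000,000 tokens.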

Model discipline

Separate planning from execution.

Use a frontier model to plan under uncertainty, then hand Codex a bounded write set, command, and acceptance criteria.

Share it. Clone it. Run it locally.

AI Spend Live is meant to be a public tool people can use on their own machines. No hosted account, no uploaded prompts.

Clone the repo

Clone the public repo on the machine where Claude Code and Codex are already installed.

Start the dashboard

Run the local Node server. It reads only your machine's usage logs.

Change your workflow

Use the top sessions and turns to decide what to split, summarize, narrow, or reroute.

PowerShell (local):

```powershell
git clone https://github.com/AnthonyDiPerna/aispend-live.git
cd aispend-live
npm install
# Token budgets for the rolling 5-hour and weekly windows
$env:AI_SPEND_5H_TOKEN_BUDGET="8000000"
$env:AI_SPEND_WEEKLY_TOKEN_BUDGET="50000000"
npm run dashboard

# then open http://127.0.0.1:9020/ in a browser
```
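If you script around the dashboard, it helps to validate the two budget variables set in the PowerShell snippet before relying on them. This is a hypothetical helper, not code from the aispend-live repo; it takes the environment as a plain record so you can pass `process.env` in Node, and it returns `undefined` for unset variables rather than guessing a default.

```typescript
interface TokenBudgets {
  fiveHour: number | undefined;
  weekly: number | undefined;
}

// Parses AI_SPEND_5H_TOKEN_BUDGET and AI_SPEND_WEEKLY_TOKEN_BUDGET.
// Unset or blank values become undefined; non-numeric or non-positive values throw.
function readBudgets(env: Record<string, string | undefined>): TokenBudgets {
  const parse = (raw: string | undefined): number | undefined => {
    if (raw === undefined || raw.trim() === "") return undefined;
    const n = Number(raw);
    if (!Number.isFinite(n) || n <= 0) throw new Error(`Invalid token budget: ${raw}`);
    return n;
  };
  return {
    fiveHour: parse(env.AI_SPEND_5H_TOKEN_BUDGET),
    weekly: parse(env.AI_SPEND_WEEKLY_TOKEN_BUDGET),
  };
}
```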

Open the GitHub repo to star, fork, or file an issue.

Your prompts should not leave your machine just to explain where the tokens went.

The website promotes the tool. The dashboard runs locally.
100% local log parsing
0 prompt text in the UI payload
7 recommendation signals tracked
OFL self-hosted Space Grotesk font