Cost & Analytics

GPT Workbench tracks every token consumed and every dollar spent, giving you full transparency over AI usage costs. This page covers the cost tracking features available at the run, thread, and organization level.

Cost indicator on an AI response

Per-Run Cost Display

Cost Indicator

Every AI response includes a cost indicator in the message header. The indicator appearance depends on your organization's cost display mode:

Display Mode	Indicator	Trigger
USD	Green dollar sign with amount (e.g., `$ 0.0042`)	Click to expand
Credits	Blue coins icon with credit amount (e.g., `0.02`)	Click to expand
None	Blue hash icon (token count only)	Click to expand

The display mode is configured at the subscription plan level by your organization administrator. Individual users cannot override this setting.

Detailed cost breakdown showing all token categories

Click the cost indicator on any AI response to open the cost breakdown popover. The popover displays a detailed itemization of all cost components for that run.

USD Mode shows:

Line Item	Description
Model	The AI model used for this run
Input cost	Cost of all tokens sent to the model
Output cost	Cost of the AI's response tokens
Thinking cost	Cost of extended reasoning tokens (when applicable)
Cached input cost	Cost of tokens served from the provider's cache
Cache write cost	Cost of writing new tokens to the provider's cache
Live search cost	Cost of web search sources used (Gemini models)
LLM subtotal	Sum of all LLM costs (shown when tool costs are present)
Tool cost	Cost of external tool invocations
Total cost	Sum of all components

Credits Mode shows:

Total credits consumed for the run
Credits are calculated from the USD cost multiplied by the organization's credit ratio

Token Usage section (always visible):

Input tokens
Output tokens
Thinking tokens (when applicable)
Cached input tokens
Cache write tokens
Context tokens
Search sources (Gemini live search)
Total tokens

Pricing Tier Indicator

Some models have tiered pricing based on context window usage. When a run exceeds the standard context threshold (typically 128K tokens), the popover displays:

Pricing tier: Standard or High Context
Context usage: Current tokens vs threshold (e.g., "156,000 / 128,000")

High context pricing typically costs 1.5-2x more per token than standard pricing.

Thread Total Cost

The thread header displays a cumulative cost indicator summarizing all runs in the thread. Click it to see:

Thread Cost Summary: Total cost across all runs
Cost by Model: Breakdown showing each model used, its total cost, and number of runs

This is useful for understanding the total investment in a conversation, especially when switching between models during a thread.

Token Types Explained

Token type categories with descriptions

Understanding token types is essential for optimizing costs. Each type has different pricing.

Input Tokens

Your messages, system prompt, context blocks, conversation history, and tool definitions are all serialized into input tokens. This is typically the largest cost component for context-heavy workflows.

What counts as input:

The system prompt configured for the thread
All context blocks (text, documents, repositories, URLs, CRM data)
Previous messages in the conversation history
Tool schemas and descriptions
The current user prompt

Output Tokens

The AI model's response is measured in output tokens. Output tokens are generally 3-5x more expensive than input tokens per unit.

What counts as output:

The text content of the AI response
Structured data in tool call arguments
Any formatted content (code blocks, tables, lists)

Cached Input Tokens

When the same content is sent to a model repeatedly (common with system prompts and context blocks), providers can cache it. Cached tokens are significantly cheaper than regular input tokens.

Anthropic prompt caching:

Automatic for Claude models on GPT Workbench
System messages are always cached
Large content blocks (over ~1,000 tokens) are cached
The last AI message before the current turn is cached
Cached read cost is approximately 88% cheaper than regular input
Cache is maintained per-session; the first request pays full price

How to tell if caching is working:

Open the cost popover on a response
Look for the "Cached input cost" and "Cached input tokens" lines
A high ratio of cached vs uncached input tokens indicates effective caching

Thinking Tokens

Models with extended reasoning capabilities (Claude with thinking, OpenAI o-series, GPT-5) generate internal reasoning tokens before producing the final response. These are billed at the output token rate.

Key characteristics:

Thinking tokens are not visible in the response text
They represent the model's internal chain-of-thought reasoning
Billed at the same rate as output tokens
Controlled by the thinking budget setting in thread configuration
Higher thinking budgets produce more thorough analysis but cost more

Cache Write Tokens

When content is cached for the first time, providers charge a cache write fee. This is a one-time cost per cache entry.

Anthropic cache writes:

Charged at approximately 1.25x the regular input token rate
Only occurs on the first request; subsequent requests use cached reads
Up to 4 cache breakpoints per request
Cache entries expire after a provider-defined TTL (typically 5 minutes of inactivity)

Live Search Sources

Some models (Gemini with grounding) can search the web during response generation. Each search source consulted incurs a small fee.

Billed per 1,000 sources consulted
Displayed as "Search sources" in the token usage section
Cost shown as "Live search cost" in the breakdown

Cost Optimization

Choose the Right Model

Model selection has the largest impact on cost. Here is a general pricing comparison:

Model Tier	Example Models	Relative Cost
Economy	Claude Haiku, GPT-4o mini	1x (baseline)
Standard	Claude Sonnet, GPT-4o	5-10x
Premium	Claude Opus, GPT-5, o3	15-30x

Recommendations:

Use economy models for routine tasks: summarization, formatting, simple Q&A
Use standard models for most business tasks: analysis, writing, code generation
Reserve premium models for complex reasoning, multi-step analysis, or critical decisions

Leverage Prompt Caching

Prompt caching is automatic for Anthropic models and provides substantial savings:

First request to a thread pays full input cost plus cache write
Subsequent requests pay ~12% of the original input cost for cached content
For a thread with 10,000 tokens of context, savings reach ~88% after the first request
Keep conversations in the same thread to maximize cache reuse

Manage Thinking Budgets

When using models with extended thinking:

Light thinking: Fewer reasoning tokens, faster responses, lower cost
Deep thinking: More thorough analysis, slower responses, higher cost
Match the thinking budget to the task complexity
Simple factual questions do not benefit from deep thinking

Optimize Context Usage

Context blocks are included in every request as input tokens:

Remove context blocks you no longer need for the current conversation
Use repository context filters to include only relevant directories
Prefer text context blocks over full document uploads when only excerpts are needed
Monitor the context usage indicator in the thread header to track token consumption

Use Conversation Compacting

Long conversations accumulate token costs because the entire history is sent with each request:

Watch the context usage indicator for warning signs (80% capacity)
Use conversation compacting to summarize older messages
Choose the appropriate compression level: Small (last 3 messages), Medium (last 10), or Large (all)
The summary replaces original messages as a context block, reducing token count

Use Console Mode for Iteration

In Console Mode, AI responses are not committed to conversation history until you explicitly add them:

Experiment with different prompts without inflating history
Regenerate responses without adding to the token accumulation
Only commit the final version to keep the conversation lean

Team Usage Tracking

Team Statistics Card

Team usage statistics with member breakdown

Each team's Settings page includes a statistics card showing:

Total runs: Number of AI interactions by all team members
Total tokens: Combined token consumption across the team
Total cost (USD mode) or Total credits (credits mode): Aggregate spending
Last activity: When the team was last used
Per-member breakdown: Usage metrics for each team member

The statistics card respects the organization's cost display mode. If prices are hidden, only token counts are shown.

Model-Specific Analysis

Team statistics break down usage by AI model:

See which models are used most frequently within the team
Identify cost-heavy model choices
Compare efficiency across models for similar tasks

Organization Usage Reports

Organization administrators have access to comprehensive analytics through the Overview tab. Key reports include:

KPI Summary Cards:

Active users in the selected period
Total threads created
Aggregate token consumption
Total cost or credit usage

Model Consumption Chart:

Visual distribution of usage across AI models
Identify underutilized models for potential cost savings
Track model adoption over time

Credit/Cost Trends:

Historical cost trajectory with trend lines
Compare periods to identify growth patterns
Forecast future costs based on current trends

Top Spenders:

Users ranked by cost or credit consumption
Helps with internal cost allocation
Identifies users who may benefit from cost optimization training

CSV Export:

Download all statistics for the selected date range
Include in management reports or billing reconciliation
Filter by date range before exporting

See Admin Features for full details on organization-level analytics and management.

Pricing Model

GPT Workbench uses a gross margin pricing model, which is the SaaS industry standard:

price = cost / (1 - margin%)

Margin	Multiplier	Example
75%	4x	Provider charges $1 --> user pays $4
80%	5x	Provider charges $1 --> user pays $5

This is distinct from markup pricing (which would be cost x (1 + margin%)). The gross margin model means a fixed percentage of revenue is retained as profit regardless of provider cost fluctuations.

How it works in practice:

The AI provider charges a base cost per token (e.g., $0.003 per 1K input tokens)
GPT Workbench applies the configured margin to determine the user-facing price
The cost breakdown popover shows the margined price, not the raw provider cost
Organizations on custom plans may have different margin rates

Admin Features - Organization-level management and analytics
Threads - Thread cost tracking, token management, and compacting
Models & Tools - AI model selection and pricing tiers
Context Blocks - Managing context to optimize token usage
Teams - Team statistics and collaboration features

Cost & Analytics ​

Per-Run Cost Display ​

Cost Indicator ​

Cost Breakdown Popover ​

Pricing Tier Indicator ​

Thread Total Cost ​

Token Types Explained ​

Input Tokens ​

Output Tokens ​

Cached Input Tokens ​

Thinking Tokens ​

Cache Write Tokens ​

Live Search Sources ​

Cost Optimization ​

Choose the Right Model ​

Leverage Prompt Caching ​

Manage Thinking Budgets ​

Optimize Context Usage ​

Use Conversation Compacting ​

Use Console Mode for Iteration ​

Team Usage Tracking ​

Team Statistics Card ​

Model-Specific Analysis ​

Organization Usage Reports ​

Pricing Model ​

Related Documentation ​