
Cost & Analytics

GPT Workbench tracks every token consumed and every dollar spent, giving you full transparency into AI usage costs. This page covers the cost tracking features available at the run, thread, and organization levels.

[Screenshot: Cost indicator on an AI response]

Per-Run Cost Display

Cost Indicator

Every AI response includes a cost indicator in the message header. The indicator's appearance depends on your organization's cost display mode:

| Display Mode | Indicator | Trigger |
| --- | --- | --- |
| USD | Green dollar sign with amount (e.g., $0.0042) | Click to expand |
| Credits | Blue coins icon with credit amount (e.g., 0.02) | Click to expand |
| None | Blue hash icon (token count only) | Click to expand |

The display mode is configured at the subscription plan level by your organization administrator. Individual users cannot override this setting.

Cost Breakdown Popover

[Screenshot: Detailed cost breakdown showing all token categories]

Click the cost indicator on any AI response to open the cost breakdown popover. The popover displays a detailed itemization of all cost components for that run.

USD Mode shows:

| Line Item | Description |
| --- | --- |
| Model | The AI model used for this run |
| Input cost | Cost of all tokens sent to the model |
| Output cost | Cost of the AI's response tokens |
| Thinking cost | Cost of extended reasoning tokens (when applicable) |
| Cached input cost | Cost of tokens served from the provider's cache |
| Cache write cost | Cost of writing new tokens to the provider's cache |
| Live search cost | Cost of web search sources used (Gemini models) |
| LLM subtotal | Sum of all LLM costs (shown when tool costs are present) |
| Tool cost | Cost of external tool invocations |
| Total cost | Sum of all components |

Credits Mode shows:

  • Total credits consumed for the run
  • Credits are calculated from the USD cost multiplied by the organization's credit ratio
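
As a minimal sketch of that conversion (the function name, parameter names, and two-decimal rounding are illustrative assumptions, not the platform's actual implementation):

```python
def usd_to_credits(usd_cost: float, credit_ratio: float) -> float:
    """Convert a run's USD cost to credits using the organization ratio.

    The name, signature, and two-decimal rounding are illustrative
    assumptions; the platform's exact rounding rules may differ.
    """
    return round(usd_cost * credit_ratio, 2)

# A $0.0042 run under a credit ratio of 5 displays as 0.02 credits,
# matching the indicator examples above.
print(usd_to_credits(0.0042, 5))  # 0.02
```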

Token Usage section (always visible):

  • Input tokens
  • Output tokens
  • Thinking tokens (when applicable)
  • Cached input tokens
  • Cache write tokens
  • Context tokens
  • Search sources (Gemini live search)
  • Total tokens
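
These categories combine into the totals shown in the popover. The sketch below illustrates the arithmetic under assumed field names and rates; actual per-token prices vary by model and are set by the platform:

```python
from dataclasses import dataclass

@dataclass
class RunUsage:
    input_tokens: int = 0
    output_tokens: int = 0
    thinking_tokens: int = 0      # billed at the output rate (see below)
    cached_input_tokens: int = 0
    cache_write_tokens: int = 0
    search_sources: int = 0       # Gemini live search, billed per 1,000 sources

def run_cost_usd(u: RunUsage, *, input_rate: float, output_rate: float,
                 cached_rate: float, cache_write_rate: float,
                 search_rate_per_k: float, tool_cost: float = 0.0) -> float:
    """Combine the popover's line items into a total.

    All rates are USD per token except `search_rate_per_k`, which is USD
    per 1,000 search sources. Names and signature are assumptions.
    """
    llm_subtotal = (
        u.input_tokens * input_rate
        + (u.output_tokens + u.thinking_tokens) * output_rate
        + u.cached_input_tokens * cached_rate
        + u.cache_write_tokens * cache_write_rate
        + u.search_sources / 1000 * search_rate_per_k
    )
    return llm_subtotal + tool_cost
```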

Pricing Tier Indicator

Some models have tiered pricing based on context window usage. When a run exceeds the standard context threshold (typically 128K tokens), the popover displays:

  • Pricing tier: Standard or High Context
  • Context usage: Current tokens vs threshold (e.g., "156,000 / 128,000")

High context pricing typically costs 1.5-2x more per token than standard pricing.
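
A minimal sketch of threshold-based rate selection, assuming the typical 128K threshold and a 1.5x multiplier at the low end of the range quoted above (names and rates are illustrative):

```python
HIGH_CONTEXT_THRESHOLD = 128_000  # the typical threshold noted above

def input_rate_for_run(context_tokens: int, standard_rate: float,
                       high_context_mult: float = 1.5) -> float:
    """Pick the per-token input rate for a run based on context usage.

    The 1.5x multiplier is an assumption; the docs quote a 1.5-2x range.
    """
    if context_tokens > HIGH_CONTEXT_THRESHOLD:
        return standard_rate * high_context_mult
    return standard_rate

# The "156,000 / 128,000" example above lands in the High Context tier.
print(input_rate_for_run(156_000, standard_rate=0.000003))  # ~4.5e-06
```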

Thread Total Cost

The thread header displays a cumulative cost indicator summarizing all runs in the thread. Click it to see:

  • Thread Cost Summary: Total cost across all runs
  • Cost by Model: Breakdown showing each model used, its total cost, and number of runs

This is useful for understanding the total investment in a conversation, especially when switching between models during a thread.
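
Under the hood, the Cost by Model view is a simple aggregation. A sketch, assuming each run record carries a model name and a USD cost (the record shape is illustrative):

```python
from collections import defaultdict

def cost_by_model(runs: list[dict]) -> dict[str, dict]:
    """Roll per-run costs up into a per-model summary.

    Each run is assumed to carry 'model' and 'cost_usd' keys.
    """
    summary: dict[str, dict] = defaultdict(lambda: {"cost_usd": 0.0, "runs": 0})
    for run in runs:
        entry = summary[run["model"]]
        entry["cost_usd"] += run["cost_usd"]
        entry["runs"] += 1
    return dict(summary)

runs = [
    {"model": "claude-sonnet", "cost_usd": 0.0042},
    {"model": "claude-sonnet", "cost_usd": 0.0031},
    {"model": "gpt-4o-mini", "cost_usd": 0.0004},
]
print(cost_by_model(runs))  # claude-sonnet: ~$0.0073 over 2 runs, etc.
```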

Token Types Explained

[Screenshot: Token type categories with descriptions]

Understanding token types is essential for optimizing costs. Each type has different pricing.

Input Tokens

Your messages, system prompt, context blocks, conversation history, and tool definitions are all serialized into input tokens. This is typically the largest cost component for context-heavy workflows.

What counts as input:

  • The system prompt configured for the thread
  • All context blocks (text, documents, repositories, URLs, CRM data)
  • Previous messages in the conversation history
  • Tool schemas and descriptions
  • The current user prompt

Output Tokens

The AI model's response is measured in output tokens. Output tokens are generally 3-5x more expensive per token than input tokens.

What counts as output:

  • The text content of the AI response
  • Structured data in tool call arguments
  • Any formatted content (code blocks, tables, lists)

Cached Input Tokens

When the same content is sent to a model repeatedly (common with system prompts and context blocks), providers can cache it. Cached tokens are significantly cheaper than regular input tokens.

Anthropic prompt caching:

  • Automatic for Claude models on GPT Workbench
  • System messages are always cached
  • Large content blocks (over ~1,000 tokens) are cached
  • The last AI message before the current turn is cached
  • Cached reads are approximately 88% cheaper than regular input tokens
  • Cache is maintained per-session; the first request pays full price
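
GPT Workbench applies this caching automatically, so no action is required on your part. For reference, this is roughly how a client marks a cacheable block with Anthropic's Messages API (the model ID and prompt content are placeholders):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model ID
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a contract-review assistant. <large shared context>",
            # Marks the block as cacheable: the first request pays the
            # cache write premium, later requests pay the cheaper read rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize clause 4."}],
)
# usage reports cache_creation_input_tokens / cache_read_input_tokens
print(response.usage)
```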

How to tell if caching is working:

  • Open the cost popover on a response
  • Look for the "Cached input cost" and "Cached input tokens" lines
  • A high ratio of cached vs uncached input tokens indicates effective caching

Thinking Tokens

Models with extended reasoning capabilities (Claude with thinking, OpenAI o-series, GPT-5) generate internal reasoning tokens before producing the final response. These are billed at the output token rate.

Key characteristics:

  • Thinking tokens are not visible in the response text
  • They represent the model's internal chain-of-thought reasoning
  • Billed at the same rate as output tokens
  • Controlled by the thinking budget setting in thread configuration
  • Higher thinking budgets produce more thorough analysis but cost more

Cache Write Tokens

When content is cached for the first time, providers charge a cache write fee. This is a one-time cost per cache entry.

Anthropic cache writes:

  • Charged at approximately 1.25x the regular input token rate
  • Only occurs on the first request; subsequent requests use cached reads
  • Up to 4 cache breakpoints per request
  • Cache entries expire after a provider-defined TTL (typically 5 minutes of inactivity)
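
Putting the read and write multipliers together, here is a minimal sketch of the cache economics. The ~1.25x write and ~0.12x read multipliers come from the approximations above; the rate and token counts are illustrative:

```python
def cached_segment_cost(tokens: int, input_rate: float, requests: int,
                        write_mult: float = 1.25, read_mult: float = 0.12) -> float:
    """Total cost of one cached prompt segment across `requests` requests.

    Writes cost ~1.25x the input rate (first request only); cached reads
    cost ~0.12x thereafter, per the approximations above.
    """
    write = tokens * input_rate * write_mult            # one-time cache write
    reads = tokens * input_rate * read_mult * (requests - 1)
    return write + reads

uncached = 10_000 * 0.000003  # 10K tokens at an assumed $3 per million tokens
print(cached_segment_cost(10_000, 0.000003, requests=1) / uncached)        # 1.25
print(cached_segment_cost(10_000, 0.000003, requests=5) / (5 * uncached))  # ~0.35
```

As the second line shows, a single request pays a 25% premium, but by the fifth request the average per-request cost of the cached content has fallen to roughly a third of the uncached price.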

Live Search Sources

Some models (Gemini with grounding) can search the web during response generation. Each search source consulted incurs a small fee.

  • Billed per 1,000 sources consulted
  • Displayed as "Search sources" in the token usage section
  • Cost shown as "Live search cost" in the breakdown

Cost Optimization

Choose the Right Model

Model selection has the largest impact on cost. Here is a general pricing comparison:

| Model Tier | Example Models | Relative Cost |
| --- | --- | --- |
| Economy | Claude Haiku, GPT-4o mini | 1x (baseline) |
| Standard | Claude Sonnet, GPT-4o | 5-10x |
| Premium | Claude Opus, GPT-5, o3 | 15-30x |

Recommendations:

  • Use economy models for routine tasks: summarization, formatting, simple Q&A
  • Use standard models for most business tasks: analysis, writing, code generation
  • Reserve premium models for complex reasoning, multi-step analysis, or critical decisions

Leverage Prompt Caching

Prompt caching is automatic for Anthropic models and provides substantial savings:

  • First request to a thread pays full input cost plus cache write
  • Subsequent requests pay ~12% of the original input cost for cached content
  • For a thread with 10,000 tokens of context, savings reach ~88% after the first request
  • Keep conversations in the same thread to maximize cache reuse

Manage Thinking Budgets

When using models with extended thinking:

  • Light thinking: Fewer reasoning tokens, faster responses, lower cost
  • Deep thinking: More thorough analysis, slower responses, higher cost
  • Match the thinking budget to the task complexity
  • Simple factual questions do not benefit from deep thinking

Optimize Context Usage

Context blocks are included in every request as input tokens:

  • Remove context blocks you no longer need for the current conversation
  • Use repository context filters to include only relevant directories
  • Prefer text context blocks over full document uploads when only excerpts are needed
  • Monitor the context usage indicator in the thread header to track token consumption

Use Conversation Compacting

Long conversations accumulate token costs because the entire history is sent with each request:

  • Watch the context usage indicator for warning signs (80% capacity)
  • Use conversation compacting to summarize older messages
  • Choose the appropriate compression level: Small (last 3 messages), Medium (last 10), or Large (all)
  • The summary replaces original messages as a context block, reducing token count

Use Console Mode for Iteration

In Console Mode, AI responses are not committed to conversation history until you explicitly add them:

  • Experiment with different prompts without inflating history
  • Regenerate responses without adding to the token accumulation
  • Only commit the final version to keep the conversation lean

Team Usage Tracking

Team Statistics Card

[Screenshot: Team usage statistics with member breakdown]

Each team's Settings page includes a statistics card showing:

  • Total runs: Number of AI interactions by all team members
  • Total tokens: Combined token consumption across the team
  • Total cost (USD mode) or Total credits (credits mode): Aggregate spending
  • Last activity: When the team was last used
  • Per-member breakdown: Usage metrics for each team member

The statistics card respects the organization's cost display mode. If prices are hidden, only token counts are shown.

Model-Specific Analysis

Team statistics break down usage by AI model:

  • See which models are used most frequently within the team
  • Identify cost-heavy model choices
  • Compare efficiency across models for similar tasks

Organization Usage Reports

Organization administrators have access to comprehensive analytics through the Overview tab. Key reports include:

KPI Summary Cards:

  • Active users in the selected period
  • Total threads created
  • Aggregate token consumption
  • Total cost or credit usage

Model Consumption Chart:

  • Visual distribution of usage across AI models
  • Identify underutilized models for potential cost savings
  • Track model adoption over time

Credit/Cost Trends:

  • Historical cost trajectory with trend lines
  • Compare periods to identify growth patterns
  • Forecast future costs based on current trends

Top Spenders:

  • Users ranked by cost or credit consumption
  • Helps with internal cost allocation
  • Identifies users who may benefit from cost optimization training

CSV Export:

  • Download all statistics for the selected date range
  • Include in management reports or billing reconciliation
  • Filter by date range before exporting

See Admin Features for full details on organization-level analytics and management.

Pricing Model

GPT Workbench uses a gross margin pricing model, which is the SaaS industry standard:

price = cost / (1 - margin%)

| Margin | Multiplier | Example |
| --- | --- | --- |
| 75% | 4x | Provider charges $1 → user pays $4 |
| 80% | 5x | Provider charges $1 → user pays $5 |

This is distinct from markup pricing (which would be cost × (1 + margin%)). The gross margin model means a fixed percentage of revenue is retained as profit regardless of provider cost fluctuations.
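
A minimal sketch contrasting the two formulas (illustrative only):

```python
def margin_price(cost: float, margin: float) -> float:
    """Gross margin pricing: margin is a fraction of the final price."""
    return cost / (1 - margin)

def markup_price(cost: float, margin: float) -> float:
    """Markup pricing, shown for contrast: margin is a fraction of cost."""
    return cost * (1 + margin)

print(margin_price(1.00, 0.75))  # 4.0  (the 4x multiplier above)
print(margin_price(1.00, 0.80))  # ~5.0 (the 5x multiplier above)
print(markup_price(1.00, 0.75))  # 1.75 (the same 75% yields far less as markup)
```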

How it works in practice:

  1. The AI provider charges a base cost per token (e.g., $0.003 per 1K input tokens)
  2. GPT Workbench applies the configured margin to determine the user-facing price
  3. The cost breakdown popover shows the margined price, not the raw provider cost
  4. Organizations on custom plans may have different margin rates

Related Pages

  • Admin Features - Organization-level management and analytics
  • Threads - Thread cost tracking, token management, and compacting
  • Models & Tools - AI model selection and pricing tiers
  • Context Blocks - Managing context to optimize token usage
  • Teams - Team statistics and collaboration features
