Conversation Compacting
Conversation compacting is an AI-powered feature that summarizes long conversations and context blocks to reclaim space in the model's context window. As threads grow, they consume more tokens with each request. Compacting condenses older content into a structured summary, preserving key information while significantly reducing token usage.
Overview
Every message sent to an AI model includes the full conversation history, system prompt, and all attached context blocks. As a conversation progresses, this accumulated content can approach or exceed the model's context window limit. Compacting addresses this by replacing the original content with a concise AI-generated summary.
GPT Workbench provides two compacting mechanisms:
- Conversation compacting -- Condenses the entire message history of a thread into a single summary message
- Context block compacting -- Merges multiple context blocks into one summarized block
Both mechanisms use AI to read the full original content, identify key information, and produce a condensed version that retains the essential details.
When to Use Compacting
Compacting is most valuable in the following situations:
- Approaching token limits -- The token usage indicator in the thread header shows you are nearing the model's context window capacity.
- Long-running threads -- Conversations with dozens or hundreds of messages accumulate significant token overhead. Compacting older exchanges frees space for new interactions.
- Redundant context blocks -- Multiple context blocks covering overlapping topics can be merged into a single block without meaningful information loss.
- Performance optimization -- Shorter prompts process faster and cost less; compacting reduces the tokens sent with every subsequent request.
When not to compact:
- When every detail of the conversation history matters (legal, compliance, or audit threads)
- When the thread is short (fewer than 3 messages)
- When precision on specific earlier exchanges is required for the current task
Conversation Compacting
Conversation compacting replaces all committed messages in a thread with a single summary message. The AI reads the system prompt, attached context blocks, and the full message history to produce a structured summary.
How It Works
- The system collects all committed messages in the thread, ordered chronologically.
- The thread's system prompt and context block metadata are included to provide structural context.
- An AI model generates a summary based on the selected compaction size.
- All original messages are soft-deleted (preserved in the database but hidden from the conversation view).
- The summary is inserted as a new committed user message with metadata indicating it is a compacted summary.
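The flow above can be sketched in pseudocode. This is a minimal illustration, not the GPT Workbench implementation: the function names, message dictionary shape, and the `summarize` callback standing in for the AI model are all assumptions.

```python
# Minimal sketch of the conversation-compacting flow described above.
# The summarize() callable stands in for the AI model; all names are illustrative.

def compact_thread(messages, system_prompt, summarize):
    """Replace a thread's committed messages with a single summary message."""
    # Step 1: collect committed messages in chronological order.
    committed = sorted(
        (m for m in messages if m["committed"]),
        key=lambda m: m["created_at"],
    )
    # Steps 2-3: the model sees the system prompt plus the full history.
    summary_text = summarize(system_prompt, committed)
    # Step 4: soft-delete -- originals stay in storage but leave the active view.
    for m in committed:
        m["deleted"] = True
    # Step 5: insert the summary as a new committed user message with metadata.
    summary = {
        "role": "user",
        "committed": True,
        "deleted": False,
        "created_at": max(m["created_at"] for m in committed),
        "content": summary_text,
        "metadata": {"compacted": True, "original_count": len(committed)},
    }
    messages.append(summary)
    return summary
```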
Compaction Sizes
| Size | Description | Generation Time | Token Reduction |
|---|---|---|---|
| Small | Brief summary focusing on key decisions and outcomes | ~1 minute | 70-90% |
| Medium | Balanced summary preserving important details and context | ~2 minutes | 50-75% |
| Large | Maximum detail, preserving all relevant information | ~3 minutes | 30-60% |
The default is Medium, which provides a good balance between compression and detail retention.
Custom Instructions
You can provide custom instructions to guide the summarization. These instructions are appended to the standard compaction prompt and influence what the AI emphasizes or preserves.
Examples of custom instructions:
- "Focus on technical decisions and code architecture"
- "Emphasize action items and deadlines"
- "Preserve all numerical data and metrics"
- "Prioritize business outcomes over implementation details"
Summary Preview and Editing
Before committing to a compaction, you can review and edit the generated summary:
- Click Generate Summary Preview to produce the summary without modifying the conversation.
- Review the preview in the right panel of the compacting modal.
- Click Edit to modify the summary text directly.
- Click Regenerate to generate a new summary (useful after changing the compaction size or custom instructions).
- Use the Fullscreen button to review the summary in a larger view.
- When satisfied, click Confirm & Compact to apply.

The modal displays character count and approximate token count for the summary, so you can assess the savings before confirming.
Compaction Metadata
Each compacted summary message carries metadata that records:
- The date of compaction
- The number of original messages that were condensed
- The compaction size used
- Estimated token savings
- Processing time
This metadata is stored with the message and can be referenced for auditing or tracking purposes.
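The recorded fields map naturally onto a small record type. The sketch below is a hypothetical shape for this metadata; the field names and types are assumptions, not the stored schema.

```python
# Hypothetical shape of the compaction metadata listed above.
# Field names are assumptions, not the actual stored schema.
from dataclasses import dataclass
from datetime import date

@dataclass
class CompactionMetadata:
    compacted_on: date                 # date of compaction
    original_message_count: int        # messages condensed
    compaction_size: str               # "Small" | "Medium" | "Large"
    estimated_tokens_saved: int        # estimated token savings
    processing_seconds: float          # processing time
```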
Context Block Compacting
Context block compacting merges multiple context blocks into a single summarized block. This is useful when a thread has accumulated many blocks covering related topics, and the combined token consumption is high.
How It Works
- Select two or more context blocks using the checkboxes in the Thread Context tab.
- Click Compact Selected in the bulk action toolbar.
- The compacting modal opens, showing the selected blocks.
- Choose a compaction size (Small, Medium, or Large).
- Optionally add custom instructions.
- Click Generate Summary Preview to see the proposed summary.
- Review and optionally edit the summary.
- Click Confirm & Compact to replace the selected blocks with a single summary block.

Compatible Block Types
Only text and document context blocks can be compacted. If your selection includes incompatible block types (repositories, URLs, HubSpot, or other integration blocks), the modal displays a warning listing those blocks. Incompatible blocks are excluded from the summary and are not deleted.
| Block Type | Compactable | Notes |
|---|---|---|
| Text | Yes | Full content included in summary |
| Document | Yes | Extracted text included in summary |
| Repository | No | Dynamic content, cannot be statically summarized |
| URL | No | Content fetched live, cannot be statically summarized |
| HubSpot | No | CRM data refreshed on each use |
| SharePoint | No | Cloud-stored, fetched on each use |
| Google Drive | No | Cloud-stored, fetched on each use |
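The compatibility check implied by the table can be sketched as a simple partition of the selection: static block types go into the summary, dynamic ones are set aside with a warning. The type strings and data shape here are illustrative assumptions.

```python
# Sketch of the compatibility rule from the table above: only text and
# document blocks are compactable. Type names and data shape are illustrative.

COMPACTABLE_TYPES = {"text", "document"}

def split_selection(blocks):
    """Partition selected blocks into compactable and excluded groups."""
    compactable = [b for b in blocks if b["type"] in COMPACTABLE_TYPES]
    excluded = [b for b in blocks if b["type"] not in COMPACTABLE_TYPES]
    # Excluded blocks would be listed in a warning; they are never deleted.
    return compactable, excluded
```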
Result
After compacting, the selected blocks are removed and replaced by a single text context block containing the AI-generated summary. The new block is titled to indicate it is a compacted summary and is attached to the thread in place of the originals.
Token Savings
Typical token savings depend on the compaction size and the nature of the original content:
| Compaction Size | Typical Savings | Best For |
|---|---|---|
| Small | 70-90% | Quick reference, high-level overview |
| Medium | 50-75% | Balanced retention, general-purpose |
| Large | 30-60% | Maximum detail preservation |
Token savings are estimated using a ratio of approximately 1 token per 4 characters. The exact savings appear in the compaction metadata after the operation completes.
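The ~4 characters-per-token heuristic can be applied directly to estimate savings before confirming, as a rough sanity check. Real tokenizers vary by model and language, so treat the result as an approximation; the function names below are illustrative.

```python
# Applying the ~1 token per 4 characters heuristic described above.
# Real tokenizers vary; this is an approximation, not an exact count.

CHARS_PER_TOKEN = 4

def estimate_tokens(text):
    """Rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def estimate_savings(original_texts, summary_text):
    """Return (tokens saved, fraction saved) for a proposed compaction."""
    original = sum(estimate_tokens(t) for t in original_texts)
    summary = estimate_tokens(summary_text)
    return original - summary, 1 - summary / original
```

For example, compacting 8,000 characters of history into an 800-character summary estimates roughly 1,800 tokens saved, a 90% reduction, which falls in the Small size's typical range.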
Tradeoffs
Compacting is a lossy operation. While the AI is instructed to preserve key information, some detail is inevitably lost:
- Nuance and tone -- Subtle aspects of earlier exchanges may not survive summarization.
- Exact wording -- Specific phrasing, quotes, or instructions from earlier messages may be paraphrased.
- Contextual detail -- Minor details that the AI deems less relevant may be omitted.
- Irreversibility -- Original messages are soft-deleted. While they remain in the database, they are no longer part of the active conversation.
The Large compaction size mitigates these issues by preserving more detail, but at the cost of smaller token savings.
Best Practices
- Compact proactively -- Do not wait until the context window is full. Compact when you notice the token usage indicator approaching 70-80% capacity.
- Use Medium for most cases -- The Medium compaction size provides the best balance between savings and detail retention. Reserve Small for simple conversations and Large for complex technical discussions.
- Review before confirming -- Always generate a preview and review the summary before committing. Edit the summary if key information is missing.
- Add custom instructions -- If the conversation has a specific focus (technical, financial, creative), guide the AI with custom instructions to prioritize relevant details.
- Compact context blocks first -- If both conversation history and context blocks are consuming significant tokens, compact context blocks first. They are easier to review and the operation is more targeted.
- Keep recent messages -- Compacting affects all committed messages. If recent exchanges contain critical context for the next prompt, consider waiting before compacting.
- Use descriptive thread titles -- The thread title and system prompt provide structural context during summarization. Clear titles help the AI produce better summaries.
Related Documentation
- Context Blocks - Managing context blocks and token usage
- Threads - Thread management and conversation history
- Models & Tools - Context window sizes by model
- Cost Analytics - Monitoring token consumption and costs
