Conversation Compacting
Conversation compacting is an AI-powered feature that summarizes long conversations and context blocks to reclaim space in the model's context window. As threads grow, they consume more tokens with each request. Compacting condenses older content into a structured summary, preserving key information while significantly reducing token usage.
Overview
Every message sent to an AI model includes the full conversation history, system prompt, and all attached context blocks. As a conversation progresses, this accumulated content can approach or exceed the model's context window limit. Compacting addresses this by replacing the original content with a concise AI-generated summary.
GPT Workbench provides two compacting mechanisms:
- Conversation compacting -- Condenses the entire message history of a thread into a single summary message
- Context block compacting -- Merges multiple context blocks into one summarized block
Both mechanisms use AI to read the full original content, identify key information, and produce a condensed version that retains the essential details.
When to Use Compacting
Compacting is most valuable in the following situations:
- Approaching token limits -- The token usage indicator in the thread header shows you are nearing the model's context window capacity.
- Long-running threads -- Conversations with dozens or hundreds of messages accumulate significant token overhead. Compacting older exchanges frees space for new interactions.
- Redundant context blocks -- Multiple context blocks covering overlapping topics can be merged into a single block without meaningful information loss.
- Performance optimization -- Shorter prompts process faster and cost less; compacting reduces the tokens sent with every subsequent request.
When not to compact:
- When every detail of the conversation history matters (legal, compliance, or audit threads)
- When the thread is short (fewer than 3 messages)
- When precision on specific earlier exchanges is required for the current task
Conversation Compacting
Conversation compacting replaces all committed messages in a thread with a single summary message. The AI reads the system prompt, attached context blocks, and the full message history to produce a structured summary.
How It Works
- The system collects all committed messages in the thread, ordered chronologically.
- The thread's system prompt and context block metadata are included to provide structural context.
- An AI model generates a summary based on the selected compaction size.
- All original messages are soft-deleted (preserved in the database but hidden from the conversation view).
- The summary is inserted as a new committed user message with metadata indicating it is a compacted summary.
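The flow above can be sketched in pseudocode. This is a minimal illustration, not the GPT Workbench implementation: the function names, message dictionary shape, and the `summarize` callback standing in for the AI model are all assumptions.

```python
# Minimal sketch of the conversation-compacting flow described above.
# The summarize() callable stands in for the AI model; all names are illustrative.

def compact_thread(messages, system_prompt, summarize):
    """Replace a thread's committed messages with a single summary message."""
    # Step 1: collect committed messages in chronological order.
    committed = sorted(
        (m for m in messages if m["committed"]),
        key=lambda m: m["created_at"],
    )
    # Steps 2-3: the model sees the system prompt plus the full history.
    summary_text = summarize(system_prompt, committed)
    # Step 4: soft-delete -- originals stay in storage but leave the active view.
    for m in committed:
        m["deleted"] = True
    # Step 5: insert the summary as a new committed user message with metadata.
    summary = {
        "role": "user",
        "committed": True,
        "deleted": False,
        "created_at": max(m["created_at"] for m in committed),
        "content": summary_text,
        "metadata": {"compacted": True, "original_count": len(committed)},
    }
    messages.append(summary)
    return summary
```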
Compaction Sizes
| Size | Description | Generation Time | Token Reduction |
|---|---|---|---|
| Small | Brief summary focusing on key decisions and outcomes | ~1 minute | 70-90% |
| Medium | Balanced summary preserving important details and context | ~2 minutes | 50-75% |
| Large | Maximum detail, preserving all relevant information | ~3 minutes | 30-60% |
The default is Medium, which provides a good balance between compression and detail retention.
Custom Instructions
You can provide custom instructions to guide the summarization. These instructions are appended to the standard compaction prompt and influence what the AI emphasizes or preserves.
Examples of custom instructions:
- "Focus on technical decisions and code architecture"
- "Emphasize action items and deadlines"
- "Preserve all numerical data and metrics"
- "Prioritize business outcomes over implementation details"
Summary Preview and Editing
Before committing to a compaction, you can review and edit the generated summary:
- Click Generate Summary Preview to produce the summary without modifying the conversation.
- Review the preview in the right panel of the compacting modal.
- Click Edit to modify the summary text directly.
- Click Regenerate to generate a new summary (useful after changing the compaction size or custom instructions).
- Use the Fullscreen button to review the summary in a larger view.
- When satisfied, click Confirm & Compact to apply.

The modal displays character count and approximate token count for the summary, so you can assess the savings before confirming.
Compaction Metadata
Each compacted summary message carries metadata that records:
- The date of compaction
- The number of original messages that were condensed
- The compaction size used
- Estimated token savings
- Processing time
This metadata is stored with the message and can be referenced for auditing or tracking purposes.
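The recorded fields map naturally onto a small record type. The sketch below is a hypothetical shape for this metadata; the field names and types are assumptions, not the stored schema.

```python
# Hypothetical shape of the compaction metadata listed above.
# Field names are assumptions, not the actual stored schema.
from dataclasses import dataclass
from datetime import date

@dataclass
class CompactionMetadata:
    compacted_on: date                 # date of compaction
    original_message_count: int        # messages condensed
    compaction_size: str               # "Small" | "Medium" | "Large"
    estimated_tokens_saved: int        # estimated token savings
    processing_seconds: float          # processing time
```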
Context Block Compacting
Context block compacting merges multiple context blocks into a single summarized block. This is useful when a thread has accumulated many blocks covering related topics, and the combined token consumption is high.
How It Works
- Select two or more context blocks using the checkboxes in the Thread Context tab.
- Click Compact Selected in the bulk action toolbar.
- The compacting modal opens, showing the selected blocks.
- Choose a compaction size (Small, Medium, or Large).
- Optionally add custom instructions.
- Click Generate Summary Preview to see the proposed summary.
- Review and optionally edit the summary.
- Click Confirm & Compact to replace the selected blocks with a single summary block.

Compatible Block Types
Only text and document context blocks can be compacted. If your selection includes incompatible block types (repositories, URLs, HubSpot, or other integration blocks), the modal displays a warning listing those blocks. Incompatible blocks are excluded from the summary and are not deleted.
| Block Type | Compactable | Notes |
|---|---|---|
| Text | Yes | Full content included in summary |
| Document | Yes | Extracted text included in summary |
| Repository | No | Dynamic content, cannot be statically summarized |
| URL | No | Content fetched live, cannot be statically summarized |
| HubSpot | No | CRM data refreshed on each use |
| SharePoint | No | Cloud-stored, fetched on each use |
| Google Drive | No | Cloud-stored, fetched on each use |
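The compatibility check implied by the table can be sketched as a simple partition of the selection: static block types go into the summary, dynamic ones are set aside with a warning. The type strings and data shape here are illustrative assumptions.

```python
# Sketch of the compatibility rule from the table above: only text and
# document blocks are compactable. Type names and data shape are illustrative.

COMPACTABLE_TYPES = {"text", "document"}

def split_selection(blocks):
    """Partition selected blocks into compactable and excluded groups."""
    compactable = [b for b in blocks if b["type"] in COMPACTABLE_TYPES]
    excluded = [b for b in blocks if b["type"] not in COMPACTABLE_TYPES]
    # Excluded blocks would be listed in a warning; they are never deleted.
    return compactable, excluded
```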
Result
After compacting, the selected blocks are removed and replaced by a single text context block containing the AI-generated summary. The new block is titled to indicate it is a compacted summary and is attached to the thread in place of the originals.
Token Savings
Typical token savings depend on the compaction size and the nature of the original content:
| Compaction Size | Typical Savings | Best For |
|---|---|---|
| Small | 70-90% | Quick reference, high-level overview |
| Medium | 50-75% | Balanced retention, general-purpose |
| Large | 30-60% | Maximum detail preservation |
Token savings are estimated using a ratio of approximately 1 token per 4 characters. The exact savings appear in the compaction metadata after the operation completes.
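The ~4 characters-per-token heuristic can be applied directly to estimate savings before confirming, as a rough sanity check. Real tokenizers vary by model and language, so treat the result as an approximation; the function names below are illustrative.

```python
# Applying the ~1 token per 4 characters heuristic described above.
# Real tokenizers vary; this is an approximation, not an exact count.

CHARS_PER_TOKEN = 4

def estimate_tokens(text):
    """Rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def estimate_savings(original_texts, summary_text):
    """Return (tokens saved, fraction saved) for a proposed compaction."""
    original = sum(estimate_tokens(t) for t in original_texts)
    summary = estimate_tokens(summary_text)
    return original - summary, 1 - summary / original
```

For example, compacting 8,000 characters of history into an 800-character summary estimates roughly 1,800 tokens saved, a 90% reduction, which falls in the Small size's typical range.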
Tradeoffs
Compacting is a lossy operation. While the AI is instructed to preserve key information, some detail is inevitably lost:
- Nuance and tone -- Subtle aspects of earlier exchanges may not survive summarization.
- Exact wording -- Specific phrasing, quotes, or instructions from earlier messages may be paraphrased.
- Contextual detail -- Minor details that the AI deems less relevant may be omitted.
- Irreversibility -- Original messages are soft-deleted. While they remain in the database, they are no longer part of the active conversation.
The Large compaction size mitigates these issues by preserving more detail, but at the cost of smaller token savings.
Best Practices
- Compact proactively -- Do not wait until the context window is full. Compact when you notice the token usage indicator approaching 70-80% capacity.
- Use Medium for most cases -- The Medium compaction size provides the best balance between savings and detail retention. Reserve Small for simple conversations and Large for complex technical discussions.
- Review before confirming -- Always generate a preview and review the summary before committing. Edit the summary if key information is missing.
- Add custom instructions -- If the conversation has a specific focus (technical, financial, creative), guide the AI with custom instructions to prioritize relevant details.
- Compact context blocks first -- If both conversation history and context blocks are consuming significant tokens, compact context blocks first. They are easier to review and the operation is more targeted.
- Keep recent messages -- Compacting affects all committed messages. If recent exchanges contain critical context for the next prompt, consider waiting before compacting.
- Use descriptive thread titles -- The thread title and system prompt provide structural context during summarization. Clear titles help the AI produce better summaries.
Related Documentation
- Context Blocks - Managing context blocks and token usage
- Threads - Thread management and conversation history
- Models & Tools - Context window sizes by model
- Cost Analytics - Monitoring token consumption and costs
