Conversation Compacting

Conversation compacting is an AI-powered feature that summarizes long conversations and context blocks to reclaim space in the model's context window. As threads grow, they consume more tokens with each request. Compacting condenses older content into a structured summary, preserving key information while significantly reducing token usage.

Overview

Every message sent to an AI model includes the full conversation history, system prompt, and all attached context blocks. As a conversation progresses, this accumulated content can approach or exceed the model's context window limit. Compacting addresses this by replacing the original content with a concise AI-generated summary.

GPT Workbench provides two compacting mechanisms:

  • Conversation compacting -- Condenses the entire message history of a thread into a single summary message
  • Context block compacting -- Merges multiple context blocks into one summarized block

Both mechanisms use AI to read the full original content, identify key information, and produce a condensed version that retains the essential details.

When to Use Compacting

Compacting is most valuable in the following situations:

  • Approaching token limits -- The token usage indicator in the thread header shows you are nearing the model's context window capacity.
  • Long-running threads -- Conversations with dozens or hundreds of messages accumulate significant token overhead. Compacting older exchanges frees space for new interactions.
  • Redundant context blocks -- Multiple context blocks covering overlapping topics can be merged into a single block without meaningful information loss.
  • Performance optimization -- Shorter prompts process faster. Compacting reduces per-request token consumption, which directly lowers cost.

When not to compact:

  • When every detail of the conversation history matters (legal, compliance, or audit threads)
  • When the thread is short (fewer than 3 messages)
  • When precision on specific earlier exchanges is required for the current task

Conversation Compacting

Conversation compacting replaces all committed messages in a thread with a single summary message. The AI reads the system prompt, attached context blocks, and the full message history to produce a structured summary.

How It Works

  1. The system collects all committed messages in the thread, ordered chronologically.
  2. The thread's system prompt and context block metadata are included to provide structural context.
  3. An AI model generates a summary based on the selected compaction size.
  4. All original messages are soft-deleted (preserved in the database but hidden from the conversation view).
  5. The summary is inserted as a new committed user message with metadata indicating it is a compacted summary.
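The five steps above can be sketched as a short script. This is an illustrative model of the workflow, not GPT Workbench's actual API: the function name, message fields, and the stubbed-out summarization call are all assumptions.

```python
def compact_thread(messages, system_prompt, size="medium"):
    """Hypothetical sketch of the conversation-compacting workflow."""
    # 1. Collect all committed messages, ordered chronologically.
    committed = sorted(
        (m for m in messages if m["committed"]),
        key=lambda m: m["timestamp"],
    )
    # 2-3. In the real system, an AI model reads the system prompt,
    # context block metadata, and history to generate the summary.
    # A stub stands in for that call here.
    summary_text = f"[{size} summary of {len(committed)} messages]"
    # 4. Soft-delete originals: hidden from the view, kept in storage.
    for m in committed:
        m["deleted"] = True
    # 5. Insert the summary as a new committed user message with metadata.
    messages.append({
        "role": "user",
        "committed": True,
        "deleted": False,
        "timestamp": max(m["timestamp"] for m in committed) + 1,
        "content": summary_text,
        "metadata": {"compacted": True, "original_count": len(committed)},
    })
    return messages
```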

Compaction Sizes

Size   | Description                                               | Generation Time | Token Reduction
Small  | Brief summary focusing on key decisions and outcomes      | ~1 minute       | 70-90%
Medium | Balanced summary preserving important details and context | ~2 minutes      | 50-75%
Large  | Maximum detail, preserving all relevant information       | ~3 minutes      | 30-60%

The default is Medium, which provides a good balance between compression and detail retention.

Custom Instructions

You can provide custom instructions to guide the summarization. These instructions are appended to the standard compaction prompt and influence what the AI emphasizes or preserves.

Examples of custom instructions:

  • "Focus on technical decisions and code architecture"
  • "Emphasize action items and deadlines"
  • "Preserve all numerical data and metrics"
  • "Prioritize business outcomes over implementation details"

Summary Preview and Editing

Before committing to a compaction, you can review and edit the generated summary:

  1. Click Generate Summary Preview to produce the summary without modifying the conversation.
  2. Review the preview in the right panel of the compacting modal.
  3. Click Edit to modify the summary text directly.
  4. Click Regenerate to generate a new summary (useful after changing the compaction size or custom instructions).
  5. Use the Fullscreen button to review the summary in a larger view.
  6. When satisfied, click Confirm & Compact to apply.

(Screenshot: compacting trigger)

The modal displays character count and approximate token count for the summary, so you can assess the savings before confirming.

Compaction Metadata

Each compacted summary message carries metadata that records:

  • The date of compaction
  • The number of original messages that were condensed
  • The compaction size used
  • Estimated token savings
  • Processing time

This metadata is stored with the message and can be referenced for auditing or tracking purposes.
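The recorded fields might look like the following. The field names here are assumptions for illustration; the actual stored schema is not documented in this section.

```python
# Hypothetical metadata attached to a compacted summary message;
# field names are illustrative, not the product's actual schema.
compaction_metadata = {
    "compacted_at": "2024-05-21",      # date of compaction
    "original_message_count": 48,      # number of messages condensed
    "compaction_size": "medium",       # Small / Medium / Large
    "estimated_token_savings": 12400,  # estimated tokens reclaimed
    "processing_seconds": 95,          # processing time
}
```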

Context Block Compacting

Context block compacting merges multiple context blocks into a single summarized block. This is useful when a thread has accumulated many blocks covering related topics, and the combined token consumption is high.

How It Works

  1. Select two or more context blocks using the checkboxes in the Thread Context tab.
  2. Click Compact Selected in the bulk action toolbar.
  3. The compacting modal opens, showing the selected blocks.
  4. Choose a compaction size (Small, Medium, or Large).
  5. Optionally add custom instructions.
  6. Click Generate Summary Preview to see the proposed summary.
  7. Review and optionally edit the summary.
  8. Click Confirm & Compact to replace the selected blocks with a single summary block.

(Screenshot: compacting result)

Compatible Block Types

Only text and document context blocks can be compacted. If your selection includes incompatible block types (repositories, URLs, HubSpot, or other integration blocks), the modal displays a warning listing those blocks. Incompatible blocks are excluded from the summary and are not deleted.

Block Type   | Compactable | Notes
Text         | Yes         | Full content included in summary
Document     | Yes         | Extracted text included in summary
Repository   | No          | Dynamic content, cannot be statically summarized
URL          | No          | Content fetched live, cannot be statically summarized
HubSpot      | No          | CRM data refreshed on each use
SharePoint   | No          | Cloud-stored, fetched on each use
Google Drive | No          | Cloud-stored, fetched on each use

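The compatibility rule amounts to a simple partition by block type. The type strings follow the table above; the function name and block representation are hypothetical.

```python
# Only statically stored text content can be compacted; integration
# blocks fetch live data, so they are excluded (warned about, not deleted).
COMPACTABLE_TYPES = {"text", "document"}

def split_compactable(blocks):
    """Partition context blocks into (compactable, excluded) by type."""
    compactable = [b for b in blocks if b["type"] in COMPACTABLE_TYPES]
    excluded = [b for b in blocks if b["type"] not in COMPACTABLE_TYPES]
    return compactable, excluded
```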
Result

After compacting, the selected blocks are removed and replaced by a single text context block containing the AI-generated summary. The new block is titled to indicate it is a compacted summary and is attached to the thread in place of the originals.

Token Savings

Typical token savings depend on the compaction size and the nature of the original content:

Compaction Size | Typical Savings | Best For
Small           | 70-90%          | Quick reference, high-level overview
Medium          | 50-75%          | Balanced retention, general-purpose
Large           | 30-60%          | Maximum detail preservation

Token savings are estimated using a ratio of approximately 1 token per 4 characters. The exact savings appear in the compaction metadata after the operation completes.
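The documented ratio of roughly 1 token per 4 characters gives a quick back-of-the-envelope estimate of savings before you confirm a compaction. This is a sketch of that approximation, not the product's exact accounting.

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count at ~4 characters per token."""
    return max(1, len(text) // 4)

def estimated_savings(original_chars: int, summary_chars: int) -> float:
    """Fraction of tokens reclaimed by replacing the original content
    with its summary, using the same 4-chars-per-token approximation."""
    original_tokens = original_chars // 4
    summary_tokens = summary_chars // 4
    return 1 - summary_tokens / original_tokens
```

For example, a 40,000-character history condensed to a 10,000-character summary yields an estimated 75% token reduction, consistent with the Medium range above.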

Tradeoffs

Compacting is a lossy operation. While the AI is instructed to preserve key information, some detail is inevitably lost:

  • Nuance and tone -- Subtle aspects of earlier exchanges may not survive summarization.
  • Exact wording -- Specific phrasing, quotes, or instructions from earlier messages may be paraphrased.
  • Contextual detail -- Minor details that the AI deems less relevant may be omitted.
  • Irreversibility -- Original messages are soft-deleted. While they remain in the database, they are no longer part of the active conversation.

The Large compaction size mitigates these issues by preserving more detail, but at the cost of smaller token savings.

Best Practices

  1. Compact proactively -- Do not wait until the context window is full. Compact when you notice the token usage indicator approaching 70-80% capacity.
  2. Use Medium for most cases -- The Medium compaction size provides the best balance between savings and detail retention. Reserve Small for simple conversations and Large for complex technical discussions.
  3. Review before confirming -- Always generate a preview and review the summary before committing. Edit the summary if key information is missing.
  4. Add custom instructions -- If the conversation has a specific focus (technical, financial, creative), guide the AI with custom instructions to prioritize relevant details.
  5. Compact context blocks first -- If both conversation history and context blocks are consuming significant tokens, compact context blocks first. They are easier to review and the operation is more targeted.
  6. Keep recent messages -- Compacting affects all committed messages. If recent exchanges contain critical context for the next prompt, consider waiting before compacting.
  7. Use descriptive thread titles -- The thread title and system prompt provide structural context during summarization. Clear titles help the AI produce better summaries.

GPT Workbench Documentation