Metrized Brief for Assadi Capital

Why Copy-Pasting Into ChatGPT Doesn't Scale

Context rot, retrieval-augmented generation, and how to give your AI the right information at the right time. A practical guide for teams moving from casual AI use to structured AI workflows.

March 2026 · Prepared by Metrized Consulting Inc.

01 — What To Do

Practical Recommendations

These are the immediate changes that will improve how your team works with ChatGPT. The rest of this document explains the technical reasons why.

Stop copy-pasting large documents into the chat window. Upload them as files instead — either into a Custom GPT's Knowledge section or a ChatGPT Project — and let the retrieval system surface what's relevant.

When to use what

Short one-off question
Paste into chat. Context rot is negligible for short inputs.
Stable reference docs
Build a Custom GPT with those documents uploaded to its Knowledge section.
Ongoing project work
Use a ChatGPT Project. Files + memory persist across conversations.
Production app / API
Use the Assistants API with Vector Stores for full pipeline control.

Structuring your documents for retrieval

When you upload files, ChatGPT breaks them into chunks and searches them semantically. You can help it work better:

Use clear, descriptive headings. Chunks are retrieved out of context, so each section should stand on its own.
Keep one topic per section; mixed sections produce mixed, noisy chunks.
Put key facts in prose rather than in images or complex layouts, since Knowledge-file retrieval is text-based (see the note in section 03).
Avoid near-duplicate content across files; semantically similar chunks compete as distractors.


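To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap. The `chunk_size` and `overlap` values are illustrative only, not OpenAI's actual parameters; production pipelines often chunk by tokens or by document structure (headings, paragraphs) rather than by characters.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows.

    Overlap ensures a sentence straddling a chunk boundary still appears
    whole in at least one chunk.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "A" * 2000
print(len(chunk_text(doc)))  # → 3 chunks for a 2,000-character document
```

This is why headings matter: if each chunk begins under a clear heading, the retrieved fragment carries its own topic label into the model's context.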
02 — Why It Happens

Context Rot Is a Measured Phenomenon

If you've been pasting business documents into ChatGPT and finding that quality degrades the more you add, the cause is well-documented. Researchers call it context rot — the measurable degradation in LLM output quality as input length increases.

Chroma's 2025 study tested 18 frontier models (GPT-4, Claude, Gemini, and others) and found that every single one degrades at every input-length increment. Even a model with a 1M-token window shows measurable degradation by 50K tokens. The issue isn't capacity — it's noise accumulation.

The "Lost in the Middle" Effect
Liu et al., Stanford / TACL 2024 — Accuracy by position of relevant information
Accuracy is roughly 92–93% when the relevant information sits at the start or end of the context, but only about 56% when it sits in the middle: an accuracy drop of more than 30 points.

Three compounding mechanisms

01 — POSITIONAL BIAS

Lost in the Middle

Models attend well to the start and end of context but poorly to the middle. Critical information placed mid-context can sit in a blind spot even if the model technically "sees" it.

02 — ATTENTION DILUTION

Signal Drowns in Noise

Transformer attention is quadratic. At 100K tokens, the model computes 10 billion pairwise relationships. Each token's attention weight shrinks as context grows — the noise floor rises.
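The arithmetic behind the quadratic claim is easy to check. A back-of-envelope sketch (full self-attention, ignoring optimizations some models apply):

```python
def pairwise_scores(n_tokens: int) -> int:
    # Full self-attention: every token attends to every token.
    return n_tokens ** 2

for n in (5_000, 50_000, 100_000):
    print(f"{n:>7} tokens -> {pairwise_scores(n):.1e} pairwise scores")
# At 100,000 tokens this is 1.0e+10 — the "10 billion" figure above.
```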

03 — DISTRACTOR INTERFERENCE

Similar Content Misleads

Semantically similar but irrelevant content actively misleads the model. Five contracts pasted when you need one aren't neutral — they compete for the model's attention budget.

Illustrative Performance Degradation by Context Length
Based on Chroma (2025) — all 18 tested frontier models degrade at every increment

5K tokens: 95%
25K tokens: 85%
50K tokens: 72%
100K tokens: 58%
200K+ tokens: 42%

Illustrative composite — individual model curves vary. The key finding is universality: no model is immune.


03 — How It Works

What ChatGPT Does With Your Files

When you upload files to a Custom GPT's Knowledge section or to a ChatGPT Project, the system does not paste everything into the conversation. It uses a technique called Retrieval-Augmented Generation (RAG).

The RAG Pipeline
Index once → retrieve per query → inject only what's relevant
Your docs → chunk → vector store → match → relevant chunks → inject → LLM. Clean context = better output, low rot.

The key insight: RAG keeps context lean. The model only sees information relevant to your specific question — not your entire document library. This directly combats context rot.
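The retrieve-per-query step can be sketched end to end. This toy uses bag-of-words cosine similarity in place of learned embeddings, and the chunk texts are invented examples; real systems (including OpenAI's) use dense vector embeddings, but the logic is the same: index once, then retrieve only the best-matching chunks per query.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words vector over lowercase words."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Index once: embed every chunk in the document library.
chunks = [
    "Refund policy: customers may return goods within 30 days.",
    "Shipping: orders dispatch within 2 business days.",
    "Privacy: we never sell customer data to third parties.",
]
index = [(c, embed(c)) for c in chunks]

# Retrieve per query: only the best match reaches the model's context.
query = embed("what is the refund window?")
best = max(index, key=lambda item: cosine(query, item[1]))
print(best[0])  # → the refund-policy chunk; the other two stay out of context
```

Note what never happens here: the shipping and privacy chunks are not injected at all, so they cannot act as distractors.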

How the approaches compare

Method | How It Works | Context Rot | Best For
Copy-paste | Everything goes directly into the context window | High | Quick questions on short text
GPT Knowledge | Files indexed via RAG; relevant chunks retrieved per query | Low | Stable reference (SOPs, policies, FAQs)
ChatGPT Projects | RAG retrieval + cross-conversation memory | Low | Ongoing work with evolving context
Assistants API | Full control over chunking, embedding model, retrieval | Lowest | Production apps, enterprise integration

Note: PDFs uploaded as GPT Knowledge use text-only retrieval. PDFs uploaded by users during conversation can use visual retrieval for layout-aware parsing.


04 — Glossary

Key Terms

Context Window
The total amount of text (measured in tokens) that a language model can process in a single interaction. Larger windows don't automatically mean better performance.
Context Rot
The measurable degradation in LLM output quality as input context length increases. Caused by attention dilution, positional bias, and distractor interference. Affects all models.
RAG (Retrieval-Augmented Generation)
A technique where documents are indexed into vector embeddings, and only relevant chunks are retrieved and injected into the model's context at query time. This is what Custom GPTs and Projects use under the hood.
Vector Embedding
A numerical representation of text that captures its semantic meaning. Similar concepts produce similar vectors, enabling semantic search — finding content by meaning rather than exact keyword match.
Vector Store
A specialized database optimized for storing and searching vector embeddings. OpenAI uses these behind the scenes when you upload Knowledge files to a Custom GPT.
Custom GPT
A specialized ChatGPT assistant with persistent instructions, up to 20 uploaded Knowledge files, and optional API Actions. Knowledge files are indexed via RAG for retrieval.
ChatGPT Projects
Smart workspaces in ChatGPT that group chats, files, and custom instructions. They add cross-conversation memory on top of RAG retrieval, making them ideal for evolving, multi-session work.
Token
The basic unit of text that LLMs process. Roughly 1 token ≈ 0.75 words in English. A 128K-token context window fits approximately 96,000 words.
Claude Skills
Reusable instruction packages for Claude (Anthropic's AI). A SKILL.md file defines a repeatable workflow — document formatting, proposal generation, data processing — that Claude follows consistently. Skills are stored locally, version-controlled, and composable: you can chain them for complex multi-step workflows.
Cowork (by Anthropic)
An agentic mode in the Claude Desktop app. Unlike chat, Claude plans and executes multi-step tasks autonomously — organizing files, generating reports, processing data — while keeping you in the loop. It can access local folders, use connected services (Gmail, Drive, Slack), and even control your screen when no direct integration exists. Available on paid Claude plans.
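The token rule of thumb from the glossary (1 token ≈ 0.75 English words) can be sanity-checked with a small heuristic. This is a rough estimate only; exact counts require the model's own tokenizer (e.g. OpenAI's tiktoken library):

```python
def estimate_tokens(text: str) -> int:
    """Rough English-only heuristic: about 4/3 tokens per word."""
    words = len(text.split())
    return round(words / 0.75)

# A 128K-token window at 0.75 words per token:
print(round(128_000 * 0.75))  # → 96000 words, matching the glossary figure
```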

05 — What's Next

Beyond Chat: Improving Your AI Workflow

Structured retrieval (GPTs, Projects) is the first step. The next level is building repeatable AI workflows that integrate directly with your files and business processes. Here are two approaches worth exploring.

Anthropic / Claude

Cowork

Claude's desktop agent mode. Point it at a folder on your computer, describe what you want done, and it plans and executes multi-step tasks: organizing files, drafting reports from scattered notes, processing data, creating formatted documents. It works like delegating to a colleague rather than prompting a chatbot. Now supports computer use — Claude can open apps, navigate browsers, and fill in spreadsheets when no direct connector is available.

claude.com/product/cowork →
Anthropic / Claude

Claude Skills

A SKILL.md file is a set of structured instructions that Claude follows consistently for a specific task — like a playbook. Metrized builds custom skills for document formatting, proposal generation, invoice processing, and more. Skills are version-controlled, shareable, and composable: you can chain them together for complex workflows (e.g., extract data → format report → apply branding).

Learn more about Skills →

Metrized can help you scope, build, and deploy custom AI workflows — from GPT configuration to full Cowork + Skills implementations. Reach out to discuss what's possible for your team.