
🧠 Context Engineering — The Complete Guide

Everything you need to know about Context Windows, Prompt Engineering, and Building Better AI Systems



Maintained by Milan Amrut Joshi, Professor of Data Science, Northwestern University

A curated, research-backed guide to the emerging discipline of Context Engineering for Large Language Models.

Papers · Videos · Blog Posts · Tools · Techniques · Courses · Roadmap


Table of Contents

  • What is Context Engineering?
  • Context Engineering vs Prompt Engineering
  • Why It Matters in 2025-2026
  • Key Concepts
  • Context Window Sizes
  • Research Papers
  • YouTube Videos & Talks
  • Blog Posts & Articles
  • Tools & Frameworks
  • Courses & Tutorials
  • Techniques & Patterns
  • Roadmap
  • Contributing
  • Citation
  • License


What is Context Engineering?

Context Engineering is the art and science of designing, managing, and optimizing the information provided to Large Language Models (LLMs) within their context window to maximize the quality, accuracy, and relevance of their outputs.

While prompt engineering focuses on how you ask, context engineering focuses on what information surrounds your ask — the retrieval strategy, the memory architecture, the token budget allocation, the ordering of information, and the system-level design of context pipelines.

```
┌────────────────────────────────────────────────────────────┐
│                      CONTEXT WINDOW                        │
│                                                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │   SYSTEM     │  │  RETRIEVED   │  │   CONVERSATION   │  │
│  │   PROMPT     │  │  DOCUMENTS   │  │   HISTORY        │  │
│  │              │  │  (RAG)       │  │                  │  │
│  │  - Role      │  │  - Chunks    │  │  - Past turns    │  │
│  │  - Rules     │  │  - Metadata  │  │  - Summaries     │  │
│  │  - Examples  │  │  - Rankings  │  │  - Key facts     │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
│                                                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │   TOOLS &    │  │  FEW-SHOT    │  │   USER           │  │
│  │   SCHEMAS    │  │  EXAMPLES    │  │   QUERY          │  │
│  │              │  │              │  │                  │  │
│  │  - Functions │  │  - Input/    │  │  - Current       │  │
│  │  - APIs      │  │    Output    │  │    request       │  │
│  │  - Formats   │  │    pairs     │  │  - Constraints   │  │
│  └──────────────┘  └──────────────┘  └──────────────────┘  │
│                                                            │
│           ▼ Token Budget Management ▼                      │
│           ▼ Information Ordering    ▼                      │
│           ▼ Relevance Filtering     ▼                      │
└────────────────────────────────────────────────────────────┘
```

Context Engineering vs Prompt Engineering

| Dimension | Prompt Engineering | Context Engineering |
|---|---|---|
| Focus | Crafting the query/instruction | Designing the entire information environment |
| Scope | Single prompt | Full context pipeline (retrieval, memory, tools) |
| Abstraction | Text-level | System-level architecture |
| Key Question | "How do I phrase this?" | "What information does the model need, and how should it be structured?" |
| Includes | Instructions, few-shot examples | RAG, memory, tool definitions, token budgets, ordering |
| Skill Level | 🟢 Beginner to Intermediate | 🟡 Intermediate to Advanced |
| Optimization | Wording, formatting, chain-of-thought | Retrieval quality, chunking, compression, caching |
| Analogy | Writing a good exam question | Designing the entire exam prep system |
| Dynamic? | Mostly static templates | Dynamic, adapts per query and session |
| Measurable Impact | Quality of a single response | System-level accuracy, cost, latency |

Why It Matters in 2025-2026

  1. Context windows are exploding — from 2K tokens (GPT-3) to 2M+ tokens (Gemini). Managing this space effectively is a core engineering challenge.
  2. RAG is now standard — every production LLM application uses some form of retrieval. Context engineering defines how retrieved data is structured and ranked.
  3. Agentic AI demands it — AI agents that use tools, maintain memory, and plan across steps require sophisticated context management.
  4. Cost optimization — tokens cost money. Smart context engineering can cut token spend substantially (often cited at 50-90%) while maintaining quality.
  5. Accuracy at scale — the "lost in the middle" problem and context dilution mean that more context is not always better. Engineering is required.

Key Concepts

🔑 Context Window

The fixed-size buffer of tokens an LLM can process in a single forward pass. Everything the model "knows" at inference time must fit within this window: system prompt, retrieved documents, conversation history, tool schemas, and the user query.

🔑 Token Limits & Budget Allocation

Given a finite context window, context engineering involves deciding how many tokens to allocate to each component. A common allocation:

  • System prompt: 5-10%
  • Retrieved documents: 40-60%
  • Conversation history: 15-25%
  • Few-shot examples: 5-10%
  • User query + response buffer: 10-20%
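The allocation above is easy to turn into a helper that scales with the window size. A minimal sketch in plain Python (the shares are illustrative midpoints of the ranges above, and the names are hypothetical, not a standard API):

```python
# Illustrative token-budget shares (midpoints of the ranges above); tune per app.
BUDGET_SHARES = {
    "system_prompt": 0.08,
    "retrieved_documents": 0.50,
    "conversation_history": 0.20,
    "few_shot_examples": 0.07,
    "query_and_response": 0.15,
}

def allocate_budget(window_size: int) -> dict:
    """Return an integer token budget per component for a given context window."""
    return {name: int(window_size * share) for name, share in BUDGET_SHARES.items()}

budget = allocate_budget(128_000)
print(budget["retrieved_documents"])  # 64000
```

Flooring each share to an integer guarantees the total never exceeds the window; leftover tokens simply become extra response headroom.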

🔑 Retrieval-Augmented Generation (RAG)

The pattern of retrieving relevant documents from an external knowledge base and injecting them into the context window. RAG bridges the gap between parametric knowledge (model weights) and non-parametric knowledge (external data).

🔑 Memory Management

Strategies for maintaining information across sessions or long conversations: summarization, key-fact extraction, vector-based episodic memory, and hierarchical memory architectures (short-term, long-term, working memory).

🔑 Context Compression

Techniques to reduce token usage while preserving information: extractive summarization, LLMLingua-style token pruning, semantic deduplication, and information-theoretic compression.

🔑 Information Ordering

The position of information within the context window affects recall. Models exhibit primacy and recency biases. Context engineering accounts for this by placing critical information at the beginning and end of the context.
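One way to act on this is to interleave ranked chunks so the strongest evidence sits at the edges of the context and the weakest in the middle. A toy sketch (illustrative names; relevance scores are assumed to come from a retriever):

```python
def order_for_recall(scored_chunks):
    """Place the highest-scoring chunks at the start and end of the context,
    pushing the lowest-scoring ones toward the middle (primacy/recency biases)."""
    ranked = sorted(scored_chunks, key=lambda c: c[1], reverse=True)
    front, back = [], []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

chunks = [("a", 0.9), ("b", 0.5), ("c", 0.8), ("d", 0.2), ("e", 0.7)]
print(order_for_recall(chunks))  # best chunks "a" and "c" land at the two ends
```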


📋 Context Window Sizes

| Model | Context Window | Provider | Year | Notes |
|---|---|---|---|---|
| GPT-4o | 128K tokens | OpenAI | 2024 | Multimodal, widely deployed |
| o3 | 200K tokens | OpenAI | 2025 | Reasoning model, extended context |
| Claude 3.5 Sonnet | 200K tokens | Anthropic | 2024 | Strong long-context performance |
| Claude Opus 4 | 200K tokens | Anthropic | 2025 | Frontier model |
| Claude Sonnet 4 | 200K tokens | Anthropic | 2025 | Balanced performance and speed |
| Gemini 2.0 Flash | 1M tokens | Google | 2025 | Fast, extended context |
| Gemini 2.0 Pro | 2M tokens | Google | 2025 | Largest production context window |
| Llama 3.1 405B | 128K tokens | Meta | 2024 | Open-weight |
| Llama 4 Maverick | 1M tokens | Meta | 2025 | Open-weight, MoE architecture |
| Mistral Large 2 | 128K tokens | Mistral | 2024 | European AI lab |
| DeepSeek V3 | 128K tokens | DeepSeek | 2025 | MoE, cost-efficient |
| DeepSeek R1 | 128K tokens | DeepSeek | 2025 | Reasoning-focused |
| Command R+ | 128K tokens | Cohere | 2024 | RAG-optimized |
| Grok-2 | 128K tokens | xAI | 2024 | Real-time data access |
| Qwen 2.5 72B | 128K tokens | Alibaba | 2024 | Multilingual |

Note: Context window size alone does not determine quality. Effective utilization across the full window varies significantly between models. See the RULER benchmark and Needle-in-a-Haystack for empirical evaluations.


📚 Research Papers

Full details, abstracts, and annotations available in papers/README.md

Context Window & Long Context

| # | Paper | Authors | Year |
|---|---|---|---|
| 1 | Lost in the Middle: How Language Models Use Long Contexts | Liu et al. | 2023 |
| 2 | Extending Context Window of LLMs via Positional Interpolation | Chen et al. | 2023 |
| 3 | LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens | Ding et al. | 2024 |
| 4 | Ring Attention with Blockwise Transformers | Liu et al. | 2023 |
| 5 | YaRN: Efficient Context Window Extension of LLMs | Peng et al. | 2023 |
| 6 | Effective Long-Context Scaling of Foundation Models | Xiong et al. (Meta) | 2023 |
| 7 | LongLoRA: Efficient Fine-tuning of Long-Context LLMs | Chen et al. | 2023 |
| 8 | RULER: What's the Real Context Size of Your LLM? | Hsieh et al. | 2024 |
| 9 | Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | Munkhdalai et al. (Google) | 2024 |
| 10 | Data Engineering for Scaling Language Models to 128K Context | Fu et al. | 2024 |

Retrieval-Augmented Generation (RAG)

| # | Paper | Authors | Year |
|---|---|---|---|
| 11 | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | Lewis et al. | 2020 |
| 12 | Self-RAG: Learning to Retrieve, Generate, and Critique | Asai et al. | 2023 |
| 13 | RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval | Sarthi et al. | 2024 |
| 14 | Corrective Retrieval-Augmented Generation (CRAG) | Yan et al. | 2024 |
| 15 | Active Retrieval Augmented Generation (FLARE) | Jiang et al. | 2023 |
| 16 | Dense Passage Retrieval for Open-Domain QA | Karpukhin et al. | 2020 |
| 17 | ColBERT: Efficient and Effective Passage Search via Late Interaction | Khattab & Zaharia | 2020 |
| 18 | Adaptive-RAG: Learning to Adapt Retrieval-Augmented LLMs | Jeong et al. | 2024 |
| 19 | Seven Failure Points When Engineering a RAG System | Barnett et al. | 2024 |
| 20 | A Survey on RAG Meets LLMs | Fan et al. | 2024 |

Prompt Engineering & Optimization

| # | Paper | Authors | Year |
|---|---|---|---|
| 21 | Chain-of-Thought Prompting Elicits Reasoning in LLMs | Wei et al. | 2022 |
| 22 | Tree of Thoughts: Deliberate Problem Solving with LLMs | Yao et al. | 2023 |
| 23 | DSPy: Compiling Declarative Language Model Calls | Khattab et al. | 2023 |
| 24 | Automatic Prompt Optimization with Gradient Descent and Beam Search | Pryzant et al. | 2023 |
| 25 | Large Language Models Are Human-Level Prompt Engineers | Zhou et al. | 2022 |
| 26 | Principled Instructions Are All You Need | Bsharat et al. | 2023 |
| 27 | Graph of Thoughts: Solving Elaborate Problems with LLMs | Besta et al. | 2023 |

Memory & Context Management

| # | Paper | Authors | Year |
|---|---|---|---|
| 28 | MemGPT: Towards LLMs as Operating Systems | Packer et al. | 2023 |
| 29 | Reflexion: Language Agents with Verbal Reinforcement Learning | Shinn et al. | 2023 |
| 30 | LLMLingua: Compressing Prompts for Accelerated Inference | Jiang et al. | 2023 |
| 31 | Voyager: An Open-Ended Embodied Agent with LLMs | Wang et al. | 2023 |
| 32 | Cognitive Architectures for Language Agents | Sumers et al. | 2023 |
| 33 | LongMem: Augmenting LLMs with Long-Term Memory | Wang et al. | 2023 |
| 34 | Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading | Chen et al. | 2023 |

📹 YouTube Videos & Talks

Full playlist with timestamps and key takeaways in videos/README.md

| # | Title | Channel | Year | Duration | Level |
|---|---|---|---|---|---|
| 1 | Let's Build GPT: From Scratch, In Code | Andrej Karpathy | 2023 | 2h | 🔴 |
| 2 | Intro to Large Language Models | Andrej Karpathy | 2023 | 1h | 🟢 |
| 3 | Attention Is All You Need (Illustrated) | Yannic Kilcher | 2021 | 45m | 🟡 |
| 4 | But What Is a GPT? Visual Intro to Transformers | 3Blue1Brown | 2024 | 27m | 🟢 |
| 5 | Visualizing Attention in Transformers | 3Blue1Brown | 2024 | 26m | 🟢 |
| 6 | RAG from Scratch (Full Course) | LangChain | 2024 | 2h | 🟡 |
| 7 | Context Engineering for AI Agents | AI Jason | 2024 | 20m | 🟡 |
| 8 | Building Production RAG Systems | AI Engineer | 2024 | 35m | 🔴 |
| 9 | How ChatGPT Works Technically | ByteByteGo | 2023 | 15m | 🟢 |
| 10 | The Illustrated Retrieval Transformer | Jay Alammar | 2023 | 18m | 🟡 |
| 11 | Advanced RAG Techniques | DeepLearning.AI | 2024 | 1h | 🟡 |
| 12 | Prompt Engineering Full Course | freeCodeCamp | 2024 | 4h | 🟢 |
| 13 | Building AI Agents with Long-Term Memory | AI Jason | 2024 | 22m | 🟡 |
| 14 | Stanford CS25 — Transformers United | Stanford Online | 2024 | 1.5h | 🔴 |
| 15 | DSPy Explained: The Framework for Programming LLMs | Connor Shorten | 2024 | 30m | 🟡 |
| 16 | Vector Databases Explained | Fireship | 2023 | 8m | 🟢 |
| 17 | Sam Altman on the Future of AI | Lex Fridman Podcast | 2024 | 2.5h | 🟢 |
| 18 | Dario Amodei on Anthropic's Vision | Lex Fridman Podcast | 2024 | 2.5h | 🟢 |
| 19 | The REAL Problem with RAG (and How to Fix It) | James Briggs | 2024 | 25m | 🟡 |
| 20 | Building Effective Agents | Anthropic | 2025 | 45m | 🟡 |
| 21 | Context Windows Deep Dive | Weights & Biases | 2024 | 40m | 🔴 |
| 22 | Chunking Strategies for RAG | Greg Kamradt | 2023 | 30m | 🟡 |

πŸ“ Blog Posts & Articles

Full list with summaries and key takeaways in blogs/README.md

| # | Title | Author / Source | Date | Level |
|---|---|---|---|---|
| 1 | Prompt Engineering Guide | Anthropic | 2024 | 🟢 |
| 2 | OpenAI Prompt Engineering Guide | OpenAI | 2024 | 🟢 |
| 3 | Building RAG-Based LLM Applications | Anyscale | 2024 | 🟡 |
| 4 | The Illustrated Transformer | Jay Alammar | 2018 | 🟢 |
| 5 | Chunking Strategies for LLM Applications | Pinecone | 2023 | 🟡 |
| 6 | What We Learned from a Year of Building with LLMs | O'Reilly | 2024 | 🔴 |
| 7 | Patterns for Building LLM-Based Systems | Eugene Yan | 2023 | 🟡 |
| 8 | RAG Is More Than Just Vector Search | LlamaIndex | 2024 | 🟡 |
| 9 | Large Language Model Agents (MOOC Materials) | Dawn Song / UC Berkeley | 2024 | 🔴 |
| 10 | Long Context Prompting for Claude | Anthropic | 2024 | 🟡 |
| 11 | Prompt Engineering vs Context Engineering | Simon Willison | 2025 | 🟢 |
| 12 | Building Effective Agents | Anthropic | 2024 | 🟡 |
| 13 | A Visual Guide to Quantization | Maarten Grootendorst | 2024 | 🟡 |
| 14 | The Full Stack of AI Engineering | Pragmatic Engineer | 2024 | 🟢 |
| 15 | Evaluation Driven Development for LLM Apps | Hamel Husain | 2024 | 🔴 |
| 16 | Understanding Retrieval-Augmented Generation | Lilian Weng | 2024 | 🔴 |
| 17 | Agents Overview | Lilian Weng | 2023 | 🔴 |
| 18 | Prompt Engineering (Comprehensive Guide) | Lilian Weng | 2023 | 🟡 |
| 19 | How to Build an AI Agent | LangChain | 2024 | 🟡 |
| 20 | The Architecture of a Modern RAG System | LlamaIndex | 2024 | 🔴 |
| 21 | Context Engineering: The Next Frontier | Latent Space | 2025 | 🟡 |
| 22 | Embedding Models: From OpenAI to Open Source | Hugging Face | 2024 | 🟡 |
| 23 | Why RAG Systems Fail and How to Fix Them | Towards Data Science | 2024 | 🟡 |
| 24 | Structured Output from LLMs | BoundaryML | 2024 | 🟡 |
| 25 | The Rise of the AI Engineer | Latent Space | 2023 | 🟢 |
| 26 | A Practitioner's Guide to RAG | Cameron Wolfe | 2024 | 🟡 |
| 27 | Understanding Mixture of Experts | Hugging Face | 2024 | 🔴 |

πŸ› οΈ Tools & Frameworks

Full comparison with features, pricing, and use cases in tools/README.md

Orchestration Frameworks

| Tool | Description | Language | Stars | License |
|---|---|---|---|---|
| LangChain | Comprehensive LLM application framework | Python/JS | 98k+ | MIT |
| LlamaIndex | Data framework for LLM context augmentation | Python | 37k+ | MIT |
| DSPy | Programming (not prompting) language models | Python | 19k+ | MIT |
| Haystack | End-to-end NLP / RAG framework | Python | 17k+ | Apache 2.0 |
| Semantic Kernel | Microsoft's LLM orchestration SDK | C#/Python | 22k+ | MIT |
| CrewAI | Multi-agent orchestration framework | Python | 24k+ | MIT |
| AutoGen | Multi-agent conversation framework | Python | 35k+ | MIT |

Vector Databases

| Database | Type | Hosted | Open Source | Key Feature |
|---|---|---|---|---|
| Pinecone | Cloud-native | Yes | No | Fully managed, enterprise-grade |
| Weaviate | Hybrid | Yes | Yes | GraphQL API, hybrid search |
| Chroma | Embedded/Cloud | Yes | Yes | Developer-friendly, lightweight |
| Milvus | Distributed | Yes | Yes | Billion-scale vector search |
| Qdrant | Cloud/Self-hosted | Yes | Yes | Rust-based, filtering support |
| pgvector | PostgreSQL extension | No | Yes | Use existing Postgres infra |
| FAISS | Library | No | Yes | Meta's similarity search library |

Embedding Models

| Model | Provider | Dimensions | Context (tokens) | Notes |
|---|---|---|---|---|
| text-embedding-3-large | OpenAI | 3072 | 8191 | Best commercial embedding |
| text-embedding-3-small | OpenAI | 1536 | 8191 | Cost-effective |
| embed-v4 | Cohere | 1024 | 512 | Multilingual, compressed |
| voyage-3 | Voyage AI | 1024 | 32000 | Long-context embeddings |
| BGE-M3 | BAAI | 1024 | 8192 | Best open-source multilingual |
| GTE-Qwen2 | Alibaba | 1536 | 32000 | Long-context open-source |
| NomicEmbed | Nomic | 768 | 8192 | Fully open-source, auditable |

Context Management & Agents

| Tool | Purpose | Key Feature |
|---|---|---|
| MemGPT / Letta | LLM memory management | Virtual context, self-editing memory |
| Mem0 | Memory layer for AI | Personalized memory for agents |
| LangMem | Long-term memory for LangChain | Persistent conversational memory |
| Instructor | Structured output from LLMs | Pydantic-based extraction |
| Guardrails AI | LLM output validation | Structure, type, and quality checks |

🎓 Courses & Tutorials

Full list with curriculum details in courses/README.md

| # | Course | Provider | Level | Format | Cost |
|---|---|---|---|---|---|
| 1 | ChatGPT Prompt Engineering for Developers | DeepLearning.AI + OpenAI | 🟢 | Video | Free |
| 2 | Building Systems with the ChatGPT API | DeepLearning.AI + OpenAI | 🟡 | Video | Free |
| 3 | LangChain for LLM Application Development | DeepLearning.AI + LangChain | 🟡 | Video | Free |
| 4 | Building and Evaluating Advanced RAG | DeepLearning.AI + LlamaIndex | 🟡 | Video | Free |
| 5 | Stanford CS324: Large Language Models | Stanford | 🔴 | Lecture | Free |
| 6 | Stanford CS25: Transformers United | Stanford | 🔴 | Seminar | Free |
| 7 | Hugging Face NLP Course | Hugging Face | 🟢 | Interactive | Free |
| 8 | Full Stack LLM Bootcamp | FSDL | 🟡 | Video | Free |
| 9 | Practical Deep Learning for Coders | fast.ai | 🟡 | Video | Free |
| 10 | LLM University | Cohere | 🟢 | Interactive | Free |
| 11 | Prompt Engineering Specialization | DeepLearning.AI | 🟢 | Video | Paid |
| 12 | UC Berkeley LLM Agents MOOC | UC Berkeley | 🔴 | Video | Free |

📊 Techniques & Patterns

Full deep-dive with code examples in techniques/README.md

1. Chunking Strategies

The way you split documents determines retrieval quality.

```
┌──────────────────────────────────────────────────────────────┐
│                    CHUNKING STRATEGIES                       │
├──────────────────┬──────────────────┬────────────────────────┤
│   Fixed-Size     │    Semantic      │    Recursive           │
│                  │                  │                        │
│  Split every N   │  Split by        │  Try large chunks      │
│  tokens with     │  meaning/topic   │  first, then split     │
│  overlap         │  boundaries      │  smaller if needed     │
│                  │                  │                        │
│  ✅ Simple       │  ✅ Coherent     │  ✅ Adaptive           │
│  ✅ Predictable  │  ✅ Better       │  ✅ Respects           │
│  ❌ Breaks       │     retrieval    │     document           │
│     meaning      │  ❌ Expensive    │     structure          │
│                  │  ❌ Complex      │  ❌ More complex       │
└──────────────────┴──────────────────┴────────────────────────┘
```
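The simplest of the three, fixed-size chunking with overlap, fits in a few lines. A minimal sketch (tokens are represented as strings here; a real pipeline would use a tokenizer):

```python
def chunk_fixed(tokens, size, overlap):
    """Fixed-size chunking: windows of `size` tokens, each sharing
    `overlap` tokens with the previous window."""
    assert size > overlap >= 0
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

chunks = chunk_fixed(list("abcdefghij"), size=4, overlap=2)
print(chunks)  # 4 overlapping windows: abcd, cdef, efgh, ghij
```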

2. Retrieval Patterns

```
┌─────────────┐    ┌──────────────────┐    ┌───────────────────┐
│  Naive RAG  │───>│  Advanced RAG    │───>│  Modular RAG      │
│             │    │                  │    │                   │
│ Query ──>   │    │ Query Rewrite    │    │ Router ──> RAG    │
│ Retrieve -> │    │ ──> HyDE         │    │       ──> Agent   │
│ Generate    │    │ ──> Retrieve     │    │       ──> Direct  │
│             │    │ ──> Rerank       │    │                   │
│             │    │ ──> Generate     │    │ Composable        │
│             │    │                  │    │ pipelines         │
└─────────────┘    └──────────────────┘    └───────────────────┘
     🟢                   🟡                       🔴
```
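The naive pipeline on the left is only a few lines end to end. A toy sketch (word overlap stands in for embedding similarity, and `generate` stands in for the LLM call; every name here is illustrative):

```python
def naive_rag(query, corpus, k, generate):
    """Naive RAG: score documents by word overlap with the query,
    take the top k, and pass them to a generator."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return generate(query, ranked[:k])

corpus = [
    "the capital of france is paris",
    "llamas are domesticated animals",
    "france borders spain and italy",
]
# Echo the retrieved context instead of calling a model.
print(naive_rag("capital of france", corpus, k=2, generate=lambda q, docs: docs))
```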

3. Context Compression

Reduce token usage while preserving signal:

  • Extractive: Select only the most relevant sentences/paragraphs
  • Abstractive: Summarize retrieved chunks before injection
  • Token-level: Use LLMLingua to prune low-information tokens (up to 20x compression)
  • Semantic deduplication: Remove redundant information across retrieved chunks
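The last bullet, semantic deduplication, can be approximated without embeddings at all. A toy sketch using Jaccard word overlap as a stand-in for embedding similarity (the names and the 0.8 threshold are illustrative):

```python
def jaccard(a, b):
    """Jaccard similarity between the word sets of two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def dedupe_chunks(chunks, threshold=0.8):
    """Drop any chunk that is a near-duplicate of an already-kept chunk."""
    kept = []
    for chunk in chunks:
        if all(jaccard(chunk, k) < threshold for k in kept):
            kept.append(chunk)
    return kept

chunks = [
    "the cat sat on the mat",
    "the cat sat on the mat today",   # near-duplicate, dropped
    "dogs bark loudly",
]
print(dedupe_chunks(chunks))
```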

4. Context Caching

Reuse expensive context across requests:

  • Prefix caching: Cache system prompts and few-shot examples (supported by Anthropic, Google)
  • KV-cache sharing: Share key-value caches across similar requests
  • Semantic caching: Cache responses for semantically similar queries
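Provider-side prefix caching reuses attention state; the same idea can be mimicked at the application layer by keying on a hash of the static prefix. A toy sketch (the "encoding" is a placeholder for whatever expensive work would otherwise repeat; class and method names are illustrative):

```python
import hashlib

class PrefixCache:
    """Reuse an expensive computation over a static prefix
    (system prompt + few-shot examples) across requests."""
    def __init__(self):
        self._cache = {}
        self.misses = 0

    def encode_prefix(self, prefix):
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1                      # expensive path taken
            self._cache[key] = f"<encoded {len(prefix)} chars>"
        return self._cache[key]

cache = PrefixCache()
prefix = "You are a helpful assistant.\n\n<few-shot examples>"
cache.encode_prefix(prefix)
cache.encode_prefix(prefix)   # second call hits the cache
print(cache.misses)  # 1
```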

5. Sliding Window Attention

```
Full Attention (O(n^2)):     Sliding Window (O(n * w)):
┌─────────────┐              ┌─────────────┐
│ █ █ █ █ █ █ │              │ █ █ █ · · · │
│ █ █ █ █ █ █ │              │ · █ █ █ · · │
│ █ █ █ █ █ █ │              │ · · █ █ █ · │
│ █ █ █ █ █ █ │              │ · · · █ █ █ │
│ █ █ █ █ █ █ │              │ · · · · █ █ │
│ █ █ █ █ █ █ │              │ · · · · · █ │
└─────────────┘              └─────────────┘
```
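The banded pattern on the right is just a mask in which each row attends to at most w positions, which is where the O(n * w) bound comes from. A minimal sketch that reproduces the 6x6, w=3 band above (1 = attend, 0 = masked):

```python
def sliding_window_mask(n, w):
    """Banded attention mask: position i attends to positions j
    with 0 <= j - i < w, so each row has at most w ones."""
    return [[1 if 0 <= j - i < w else 0 for j in range(n)] for i in range(n)]

for row in sliding_window_mask(6, 3):
    print(row)
```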

6. Multi-hop Reasoning

Chain multiple retrieval steps to answer complex questions:

  1. Decompose query into sub-questions
  2. Retrieve for each sub-question independently
  3. Synthesize intermediate answers
  4. Use intermediate answers to refine retrieval
  5. Generate final comprehensive answer
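The five steps above reduce to a small control loop once the model and retriever calls are abstracted away. A skeleton sketch (`decompose`, `retrieve`, and `synthesize` would be LLM/retriever calls in a real system; the refinement of step 4 is folded into the optional `refine` hook, and all names are illustrative):

```python
def multi_hop_answer(query, decompose, retrieve, synthesize, refine=None):
    """Decompose -> retrieve per sub-question -> (optionally) refine -> synthesize."""
    sub_questions = decompose(query)
    evidence = {sq: retrieve(sq) for sq in sub_questions}
    if refine is not None:
        for sq in refine(query, evidence):       # follow-up retrieval hops
            evidence[sq] = retrieve(sq)
    return synthesize(query, evidence)

# Toy stand-ins for the model calls:
answer = multi_hop_answer(
    "history of acme",
    decompose=lambda q: ["who founded acme", "when was acme founded"],
    retrieve=lambda sq: [f"doc about {sq}"],
    synthesize=lambda q, ev: f"{q}: {len(ev)} sub-questions answered",
)
print(answer)  # history of acme: 2 sub-questions answered
```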

7. Few-Shot Learning Optimization

  • Dynamic few-shot: Select examples most similar to the current query
  • Diverse few-shot: Ensure coverage of edge cases and formats
  • Ordered few-shot: Place most relevant examples closest to the query (recency bias)
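Dynamic few-shot selection is a small ranking step over the example pool. A toy sketch (Jaccard word overlap stands in for embedding similarity; the example data and names are illustrative):

```python
def select_few_shot(examples, query, k=2):
    """Pick the k examples whose inputs are most similar to the query."""
    q = set(query.lower().split())
    def similarity(example):
        w = set(example["input"].lower().split())
        return len(q & w) / len(q | w) if q | w else 0.0
    return sorted(examples, key=similarity, reverse=True)[:k]

examples = [
    {"input": "translate hello to french", "output": "bonjour"},
    {"input": "sum 2 and 3", "output": "5"},
    {"input": "translate cat to french", "output": "chat"},
]
print(select_few_shot(examples, "translate dog to french"))  # the two translation examples
```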

8. System Prompt Engineering

```
┌────────────────────────────────────────────┐
│            SYSTEM PROMPT LAYERS            │
├────────────────┬───────────────────────────┤
│  1. IDENTITY   │ Role, persona, expertise  │
│  2. CONTEXT    │ Background information    │
│  3. RULES      │ Constraints, boundaries   │
│  4. FORMAT     │ Output structure          │
│  5. EXAMPLES   │ Reference behaviors       │
│  6. FALLBACK   │ Edge case handling        │
└────────────────┴───────────────────────────┘
```
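The layered structure above lends itself to assembling the system prompt programmatically, so individual layers can be swapped or A/B-tested. A minimal sketch (the layer keys and the `##` header style are illustrative choices, not a standard):

```python
LAYER_ORDER = ["identity", "context", "rules", "format", "examples", "fallback"]

def build_system_prompt(layers):
    """Join the provided layers in the canonical order; missing layers are skipped."""
    return "\n\n".join(
        f"## {name.upper()}\n{layers[name]}" for name in LAYER_ORDER if name in layers
    )

prompt = build_system_prompt({
    "identity": "You are a support agent for Acme Corp.",   # hypothetical persona
    "rules": "Never reveal internal ticket IDs.",
    "format": "Reply in JSON with keys `answer` and `confidence`.",
})
print(prompt)
```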

πŸ—ΊοΈ Roadmap

Context Engineering Learning Path

```
                    🎯 CONTEXT ENGINEERING MASTERY
                              │
                 ┌────────────┴────────────┐
                 │                         │
            FOUNDATIONS              APPLICATIONS
                 │                         │
         ┌───────┴───────┐          ┌──────┴──────┐
         │               │          │             │
    THEORY          PRACTICE    PRODUCTION    RESEARCH
         │               │          │             │
         ▼               ▼          ▼             ▼

🟢 BEGINNER (Weeks 1-4)
├── Understand transformer attention mechanisms
├── Learn token counting and context window basics
├── Master basic prompt engineering patterns
├── Study the Illustrated Transformer blog post
├── Complete DeepLearning.AI prompt engineering course
└── Build a simple chatbot with system prompts

🟡 INTERMEDIATE (Weeks 5-10)
├── Implement naive RAG with a vector database
├── Learn chunking strategies (fixed, semantic, recursive)
├── Study embedding models and similarity search
├── Implement context compression techniques
├── Build an advanced RAG system with reranking
├── Learn evaluation metrics (faithfulness, relevance, recall)
├── Study the "Lost in the Middle" paper and information ordering
└── Complete a LangChain / LlamaIndex course

🔴 ADVANCED (Weeks 11-16)
├── Design multi-agent systems with shared context
├── Implement hierarchical memory (short/long/working)
├── Build modular RAG pipelines with routing
├── Study DSPy for programmatic prompt optimization
├── Implement context caching and cost optimization
├── Learn to evaluate with RAGAS, DeepEval, or custom evals
├── Study agentic RAG patterns (CRAG, Self-RAG, FLARE)
└── Build a production system with monitoring and fallbacks

⭐ EXPERT (Ongoing)
├── Contribute to open-source context engineering tools
├── Publish research on novel context management techniques
├── Design context architectures for enterprise systems
├── Optimize for cost, latency, and quality simultaneously
└── Mentor others in context engineering practices
```

Contributing

We welcome contributions from the community. Here is how you can help:

  1. Add a resource — Open a PR with a new paper, video, blog post, or tool
  2. Fix errors — Found a broken link or incorrect information? Open an issue
  3. Improve explanations — Help make the techniques section clearer
  4. Add code examples — Contribute working code for context engineering patterns
  5. Translate — Help translate this guide to other languages

Please read our contribution guidelines before submitting.

How to Contribute

```shell
# Fork the repository on GitHub, then clone it (use your fork's URL)
git clone https://github.com/mlnjsh/context-engineering.git
cd context-engineering

# Create a feature branch
git checkout -b add-new-resource

# Make your changes and commit
git add .
git commit -m "Add [resource type]: [resource name]"

# Push to your fork and open a PR
git push origin add-new-resource
```

Citation

If you find this resource helpful in your research or work, please consider citing it:

```bibtex
@misc{joshi2025contextengineering,
  title   = {Context Engineering: The Complete Guide},
  author  = {Joshi, Milan Amrut},
  year    = {2025},
  url     = {https://github.com/mlnjsh/context-engineering},
  note    = {A curated guide to context engineering for large language models}
}
```

License

This work is licensed under the MIT License.


Built with care by Professor Milan Amrut Joshi

Professor of Data Science, Northwestern University

If this resource helped you, please consider giving it a star.



Contributors & Domain Experts

  • Milan Amrut Joshi (Project Author)
  • Simon Willison (LLM context & prompt engineering expert)
  • Brex (Prompt engineering best practices)
