Introduction to Context Engineering
Welcome back, future Harness Engineers! In the previous chapters, we laid the groundwork for building robust AI agents by focusing on systematic environments, robust state management, verification, and control systems. Now, it’s time to dive into what truly powers these agents’ decision-making: the context they operate within.
Imagine an AI agent as a brilliant but literal-minded apprentice. No matter how smart they are, their effectiveness hinges entirely on the clarity and completeness of the instructions you provide and the tools you give them. This is the essence of Context Engineering: the art and science of meticulously crafting the inputs—prompts and tool definitions—that guide an agent’s behavior to achieve desired outcomes reliably.
In this chapter, we’ll explore how to design these critical pieces of context. You’ll learn sophisticated prompt engineering techniques, how to define tools that agents can understand and use effectively, and strategies for managing the ever-present challenge of context window limitations. By the end, you’ll be equipped to give your AI agents the precise “brain food” they need to excel, moving beyond basic prompting to truly engineered context.
Prerequisites
To get the most out of this chapter, you should have a basic understanding of:
- Python programming fundamentals.
- The concepts of Large Language Models (LLMs) and their role in AI agents.
- Familiarity with the agent architecture discussed in previous chapters, particularly how agents interact with their environment and tools.
- A development environment set up with Python 3.10+ and pip.
The Agent’s World: What is Context?
At its core, an AI agent operates within a defined “world” of information. This world is its context. It includes everything the agent knows or can know at any given moment to make decisions and perform actions.
What does this context typically include?
- System Prompt (Instructions): The overarching directives, persona, goals, and constraints for the agent. This is its mission statement.
- User Prompt (Current Task): The specific request or problem the user wants the agent to solve right now.
- Conversation History: Previous turns in a dialogue, providing continuity and memory.
- Tool Definitions: Descriptions of the functions or APIs the agent can call, along with their expected inputs and outputs.
- External Knowledge: Information retrieved from databases, documentation, or the internet, often provided by a Retrieval-Augmented Generation (RAG) system.
- Observation Results: The outcomes of previous tool calls or environment interactions.
Context engineering is about carefully curating and presenting all this information to the agent’s underlying LLM, ensuring it has everything it needs—and nothing it doesn’t—to make optimal decisions.
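To make the list above concrete, here is a minimal sketch of how a harness might assemble these components into one ordered message list. The function name `build_context`, the role names, and the "Relevant reference material" label are assumptions for illustration, loosely modeled on chat-style APIs rather than taken from any specific framework. Tool definitions are left out because most APIs accept them as a separate argument.

```python
from typing import Any, Optional


def build_context(
    system_prompt: str,
    history: list[dict[str, str]],
    user_task: str,
    retrieved_docs: Optional[list[str]] = None,
) -> list[dict[str, Any]]:
    """Assemble one ordered message list from the context components."""
    messages: list[dict[str, Any]] = [{"role": "system", "content": system_prompt}]
    # Prior turns (including earlier tool observations) keep their original order.
    messages.extend(history)
    # External knowledge is attached to the current task, only when present.
    task = user_task
    if retrieved_docs:
        knowledge = "\n\n".join(retrieved_docs)
        task = f"{user_task}\n\nRelevant reference material:\n{knowledge}"
    messages.append({"role": "user", "content": task})
    return messages


context = build_context(
    system_prompt="You are a careful Python engineer.",
    history=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
    ],
    user_task="Fix the failing test in utils.py.",
    retrieved_docs=["utils.py: def divide(a, b): return a / b"],
)
print([message["role"] for message in context])
```

Keeping assembly in one function gives you a single place to decide what goes in and what stays out, which is the curation step this section describes.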
Why Context Engineering Matters
Without effective context engineering, even the most powerful LLM will struggle.
- Ambiguity: Vague prompts lead to unpredictable agent behavior or “hallucinations.”
- Inefficiency: Providing too much irrelevant information can waste precious context window tokens and confuse the agent.
- Unreliability: Poorly defined tools might be misused or ignored, leading to failed tasks.
- Security Risks: Without proper guardrails in the prompt, agents might perform unintended or harmful actions.
📌 Key Idea: Context engineering transforms an LLM from a general-purpose text generator into a focused, goal-oriented agent.
Crafting Effective Prompts: The Agent’s Mission Brief
The prompt is the agent’s primary directive. It’s where you define its identity, its purpose, and the rules of its engagement. For AI coding agents, this means setting up a persona, defining coding standards, and outlining the problem-solving process.
Let’s break down the components of a robust prompt for a coding agent.
System Prompt: Setting the Stage
The system prompt acts as the foundational layer, shaping the agent’s overall behavior.
# python_agent_harness/prompts/system_prompt.py
SYSTEM_PROMPT = """
You are a highly skilled, senior Python software engineer specializing in clean code, robust testing, and maintainable architecture.
Your primary goal is to assist the user by writing, refactoring, and debugging Python code.
Here are your core principles:
1. **Understand the Request Fully:** Always ask clarifying questions if the request is ambiguous or incomplete.
2. **Plan First:** Before writing or changing code, outline your approach. Explain your reasoning.
3. **Write Clean, Modern Python:** Adhere to PEP 8, use type hints, and favor idiomatic Python.
4. **Test Thoroughly:** When implementing new features or fixing bugs, consider unit tests or integration tests.
5. **Iterate and Reflect:** After performing an action (e.g., writing code, running tests), evaluate the outcome and adjust your plan.
6. **Safety First:** Never execute arbitrary shell commands or access sensitive files unless explicitly instructed and fully understood.
You have access to a set of tools to interact with the codebase and environment. Use them judiciously.
When asked to perform a task, think step-by-step and explain your reasoning.
"""
Explanation:
- Persona: “Highly skilled, senior Python software engineer…” establishes authority and expertise.
- Primary Goal: “assist the user by writing, refactoring, and debugging Python code” defines its core purpose.
- Core Principles: These are explicit behavioral guidelines. They prevent common pitfalls (e.g., rushing to code, ignoring tests) and enforce best practices (PEP 8, type hints).
- Tool Access: Informs the agent it has tools, hinting at their usage.
- Think Step-by-Step: Encourages chain-of-thought reasoning, making the agent’s process more transparent and debuggable.
🧠 Important: The system prompt is your most powerful lever for shaping agent behavior. Invest time in refining it.
User Prompt: The Specific Task
The user prompt is the immediate instruction from the user. For a coding agent, this might be a bug report, a feature request, or a refactoring task.
USER_PROMPT_TEMPLATE = """
The user has provided the following task:
{task_description}
Current context and relevant files:
{file_contents}
Please analyze the task and the provided code, then formulate a plan.
"""
Explanation:
- task_description: This placeholder will be filled with the user’s specific request (e.g., “Fix the bug in utils.py where division by zero occurs”).
- file_contents: This is crucial for a coding agent. It allows you to inject relevant code snippets or entire file contents, giving the agent the code it needs to understand and modify.
⚡ Real-world insight: For complex coding tasks, file_contents might be dynamically populated by a RAG system that retrieves relevant files based on the task_description.
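As a rough sketch of how the template might be filled, the helper below labels each file before joining them into file_contents. The name `render_user_prompt` and the `--- path ---` separator are assumptions for illustration; the template text is repeated so the snippet runs on its own.

```python
USER_PROMPT_TEMPLATE = """
The user has provided the following task:
{task_description}
Current context and relevant files:
{file_contents}
Please analyze the task and the provided code, then formulate a plan.
"""


def render_user_prompt(task_description: str, files: dict[str, str]) -> str:
    """Fill the template, labeling each file so the model can tell them apart."""
    sections = [f"--- {path} ---\n{source}" for path, source in files.items()]
    return USER_PROMPT_TEMPLATE.format(
        task_description=task_description,
        file_contents="\n\n".join(sections),
    )


prompt = render_user_prompt(
    "Fix the bug in utils.py where division by zero occurs.",
    {"utils.py": "def divide(a, b):\n    return a / b\n"},
)
print(prompt)
```

Because `str.format` only interprets braces in the template itself, source code that contains `{` or `}` can be passed in as a value without escaping.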
Few-Shot Examples (Optional but Powerful)
For tricky or nuanced tasks, providing a few examples of desired input/output pairs (or interaction sequences) can significantly improve agent performance. This is known as few-shot prompting.
For a coding agent, a few-shot example might demonstrate:
- How to respond when asking for clarification.
- The desired format for a code change (e.g., providing a diff).
- A typical interaction where the agent uses a tool, observes the result, and then takes the next logical step.
# Example of a few-shot interaction snippet (conceptual)
FEW_SHOT_EXAMPLE = """
User: "Refactor `calculate_total` in `order.py` to use a more functional approach."
Agent: "Okay, I understand. I will refactor the `calculate_total` function in `order.py` to be more functional.
My plan is:
1. Read the content of `order.py`.
2. Identify `calculate_total` and analyze its current implementation.
3. Propose a functional refactoring, perhaps using `map` or `reduce`.
4. Present the proposed changes as a diff.
Tool Use: read_file(file_path='order.py')
... (tool output) ...
Proposed diff:
```diff
--- a/order.py
+++ b/order.py
@@ -5,5 +5,2 @@
 def calculate_total(items):
-    total = 0
-    for item in items:
-        total += item.price * item.quantity
-    return total
+    return sum(item.price * item.quantity for item in items)
```
Does this look good?"
"""
Explanation: This example provides a concrete demonstration of the desired planning, tool usage, and output format.
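Few-shot examples can also be supplied as alternating user and assistant turns instead of one long string. The sketch below shows that variant; the helper name `with_few_shot` and the sample texts are hypothetical, and whether this works better than an inline example depends on the model you use.

```python
def with_few_shot(
    system_prompt: str,
    examples: list[tuple[str, str]],
    user_task: str,
) -> list[dict[str, str]]:
    """Place worked examples between the system prompt and the live task."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_request, example_reply in examples:
        messages.append({"role": "user", "content": example_request})
        messages.append({"role": "assistant", "content": example_reply})
    # The real task always comes last so the model answers it, not an example.
    messages.append({"role": "user", "content": user_task})
    return messages


messages = with_few_shot(
    system_prompt="You are a senior Python engineer.",
    examples=[
        (
            "Refactor calculate_total in order.py to use a more functional approach.",
            "My plan is: 1. Read order.py. 2. Propose a diff. 3. Ask for confirmation.",
        )
    ],
    user_task="Refactor apply_discount in pricing.py.",
)
print([message["role"] for message in messages])
```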
Defining Agent Tools: Extending Capabilities
An agent’s usefulness skyrockets when it can interact with its environment through tools. For a coding agent, these tools might be:
- Reading/writing files.
- Running shell commands (with extreme caution!).
- Executing Python code in a sandbox.
- Interacting with a debugger or linter.
- Performing Git operations.
Modern LLMs (like OpenAI’s GPT models or Google’s Gemini) often support function calling or tool use natively. This means the LLM can parse a user request, decide which tool to use, and generate the correct arguments for that tool based on its definition.
Tool Definition Structure
Each tool needs a clear definition that the LLM can understand. This typically involves:
1. Name: A unique identifier for the tool (e.g., read_file).
2. Description: A human-readable explanation of what the tool does and when to use it. This is crucial for the LLM’s decision-making.
3. Parameters/Schema: A structured definition (often using JSON Schema or Pydantic) of the arguments the tool expects.
Let’s define a read_file tool using Python and Pydantic (a popular library for data validation and settings management, version 2.x as of 2026).
First, ensure you have Pydantic installed:
```bash
pip install "pydantic>=2.0"
```
Then, define the tool:
# python_agent_harness/tools/file_tools.py
import os
from pydantic import BaseModel, Field
from typing import Callable, Dict, Any

# --- Pydantic Schema for Tool Input ---
class ReadFileInput(BaseModel):
    """Input schema for the read_file tool."""
    file_path: str = Field(
        ..., description="The path to the file to read. Must be a valid, existing file."
    )

# --- Tool Function ---
def read_file_tool(file_path: str) -> str:
    """
    Reads the content of a specified file.
    Returns the file content as a string.
    Raises FileNotFoundError if the file does not exist.
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")
    if not os.path.isfile(file_path):
        raise ValueError(f"Path is not a file: {file_path}")
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    return content

# --- Tool Definition (for LLM consumption) ---
# This structure is often adapted to the specific agent framework (e.g., LangChain)
def get_read_file_tool_definition() -> Dict[str, Any]:
    """
    Returns the definition of the read_file tool in a format
    compatible with common LLM function calling mechanisms.
    """
    return {
        "name": "read_file",
        "description": "Reads the content of a specified file and returns it as a string. Use this to inspect code or documentation.",
        "parameters": ReadFileInput.model_json_schema()  # Pydantic v2.x method
    }
Explanation:
- ReadFileInput (Pydantic Model): This defines the expected arguments for read_file_tool.
  - file_path: str = Field(...): Declares file_path as a required string.
  - description: Provides a clear explanation for both humans and the LLM about what file_path represents.
- read_file_tool (Python Function): This is the actual Python logic that performs the file reading.
  - Includes basic error handling (FileNotFoundError, ValueError) for robustness.
- get_read_file_tool_definition (LLM-Friendly Definition): This function packages the tool’s metadata into a dictionary format that LLM frameworks (like LangChain or directly interacting with OpenAI/Gemini APIs) can use.
  - "name": The name the LLM will use to refer to this tool.
  - "description": A detailed explanation of the tool’s purpose. This is critical for the LLM to decide when to use it.
  - "parameters": The JSON Schema generated directly from our ReadFileInput Pydantic model. This keeps the schema the LLM sees in sync with the model. To actually enforce it, the harness still has to validate incoming arguments against ReadFileInput before calling the tool.
🔥 Optimization / Pro tip: Always provide precise and descriptive description fields for your tools. The LLM relies heavily on these descriptions to understand when and how to use a tool correctly.
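Since the schema is what the LLM sees, the harness should check the arguments it gets back against that schema before running the tool. With Pydantic installed you would normally validate through the model itself. The standard-library sketch below only illustrates the idea for a tiny subset of JSON Schema (required fields and primitive types). The function name `check_tool_args` is hypothetical, and the schema is hand-written to resemble roughly what a single required string field produces.

```python
from typing import Any

# Maps JSON Schema primitive type names to Python types (small subset only).
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
}


def check_tool_args(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the args look usable."""
    problems: list[str] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in properties:
            problems.append(f"unexpected argument: {name}")
            continue
        type_name = properties[name].get("type", "")
        expected = _JSON_TYPES.get(type_name)
        if expected is not None and not isinstance(value, expected):
            problems.append(f"argument {name} should be of type {type_name}")
    return problems


# Assumed schema, written by hand for this example.
read_file_schema = {
    "type": "object",
    "properties": {"file_path": {"type": "string", "description": "Path to read."}},
    "required": ["file_path"],
}

print(check_tool_args(read_file_schema, {"file_path": "main.py"}))
print(check_tool_args(read_file_schema, {"path": 42}))
```

Returning the problems as strings, rather than raising, makes it easy to feed them back to the LLM as an observation so it can correct its own call.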
Workflow with Tools
Here’s how an agent typically uses a tool:
Explanation of the flow:
- User Request: The user provides a task.
- Agent Decision Cycle: The agent’s LLM, given the system prompt, user request, and tool definitions, decides the next best action.
- LLM Tool Call: If a tool is needed, the LLM generates a structured call to one of the defined tools with appropriate arguments (e.g., read_file(file_path='main.py')).
- Harness Executes Tool: The agent harness (your code) intercepts this tool call and executes the corresponding Python function (read_file_tool).
- Observation Result: The output of the tool (e.g., the content of main.py) is returned.
- Observation back to Agent: The tool’s output is injected back into the LLM’s context as an “observation.”
- Continue Decision Cycle: The LLM now has new information and can decide on the next step (e.g., use another tool, generate a final answer).
- Agent Response: Once the task is complete, the agent generates a final response to the user.
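The flow above can be sketched as a small loop. This is a simplified illustration, not a framework API: `run_loop`, `scripted_decide`, and the `max_steps` budget are hypothetical names, and the LLM is replaced by a scripted function so the snippet runs offline. The step budget is an addition that guards against a model that never stops calling tools.

```python
import json
from typing import Any, Callable

Action = tuple[str, dict[str, Any]]  # ("tool_call" or "response", details)


def run_loop(
    decide: Callable[[list[dict[str, str]]], Action],
    tools: dict[str, Callable[..., str]],
    user_task: str,
    max_steps: int = 5,
) -> str:
    """Alternate between model decisions and tool executions until a response."""
    messages = [{"role": "user", "content": user_task}]
    for _ in range(max_steps):
        kind, details = decide(messages)
        if kind == "response":
            return details["content"]
        # Execute the requested tool and feed the result back as an observation.
        try:
            output = tools[details["tool_name"]](**details["tool_args"])
        except Exception as exc:  # report failures to the model instead of crashing
            output = f"ERROR: {exc}"
        messages.append({"role": "tool", "content": json.dumps({"output": output})})
    return "Stopped: step budget exhausted."


def scripted_decide(messages: list[dict[str, str]]) -> Action:
    """Stand-in for the LLM: read a file once, then answer."""
    if messages[-1]["role"] == "tool":
        return "response", {"content": "Done reading."}
    return "tool_call", {"tool_name": "read_file", "tool_args": {"file_path": "main.py"}}


fake_tools = {"read_file": lambda file_path: "print('hi')"}
print(run_loop(scripted_decide, fake_tools, "Inspect main.py"))
```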
Managing Context Window Limits: The Ever-Present Challenge
LLMs have a finite context window: the maximum amount of text (measured in tokens) they can process at once. For coding agents, this is a major bottleneck because codebases can be vast, and conversations can be long. Exceeding this limit makes the request fail or forces the harness to truncate its input, and truncation causes the agent to “forget” crucial information.
Here are strategies to manage context effectively:
Summarization:
- Concept: Periodically summarize conversation history or long tool outputs.
- Application: After several turns, condense the chat log into a concise summary that preserves key decisions and information.
- Caveat: Summarization can lose fine-grained details, so use it judiciously, especially for code.
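A minimal sketch of this idea: replace older turns with one summary message and keep the newest turns verbatim. The names `compact_history` and `naive_summarize` are hypothetical, and the summarizer here is a placeholder that just clips text. A real harness would ask an LLM to write the summary. Storing the summary under the system role is also an assumption.

```python
from typing import Callable

Message = dict[str, str]


def compact_history(
    messages: list[Message],
    summarize: Callable[[list[Message]], str],
    keep_recent: int = 4,
) -> list[Message]:
    """Replace older turns with one summary message; keep the newest verbatim."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    if len(turns) <= keep_recent:
        return messages  # nothing worth compacting yet
    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(older),
    }
    return system + [summary] + recent


def naive_summarize(older: list[Message]) -> str:
    """Placeholder summarizer: clips each message instead of calling an LLM."""
    return " | ".join(m["content"][:40] for m in older)


history = [{"role": "system", "content": "You are a coding agent."}]
history += [{"role": "user", "content": f"message {i}"} for i in range(6)]
compacted = compact_history(history, naive_summarize, keep_recent=2)
print(len(compacted))
```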
Retrieval-Augmented Generation (RAG):
- Concept: Instead of putting all possible information into the context, retrieve only the most relevant pieces dynamically.
- Application: When the agent needs to reference code, documentation, or past interactions, use an embedding model and vector database to fetch relevant chunks of text.
- Benefit: Keeps context windows small and focused, allowing agents to work with large codebases. This is a cornerstone of advanced coding agents.
Sliding Window / Fixed Window:
- Concept: Keep only the most recent N tokens or messages in the context, discarding the oldest ones.
- Application: Simple to implement for conversation history, but can lead to forgetting early context.
- Trade-off: Easy, but less intelligent than summarization or RAG.
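Here is one way a token-budgeted sliding window might look. The names `sliding_window` and `estimate_tokens` are hypothetical, and the four-characters-per-token estimate is a crude assumption. A real harness should count tokens with the tokenizer that matches its model. The sketch always keeps the system prompt, which is a design choice rather than a requirement of the technique.

```python
Message = dict[str, str]


def estimate_tokens(text: str) -> int:
    """Crude heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def sliding_window(messages: list[Message], max_tokens: int) -> list[Message]:
    """Keep the system prompt plus the newest messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)
    kept: list[Message] = []
    # Walk backwards from the newest message; stop at the first that does not fit.
    for message in reversed(turns):
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))


history = [{"role": "system", "content": "s" * 40}]
history += [{"role": "user", "content": str(i) * 40} for i in range(5)]
window = sliding_window(history, max_tokens=35)
print(len(window))
```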
Hierarchical Context:
- Concept: Maintain different levels of context (e.g., a high-level project overview, a medium-level file context, and a low-level function context).
- Application: The agent can “zoom in” or “zoom out” by swapping out context based on the current task’s scope.
- Example: For a project-wide refactor, load project-level goals. When working on a specific file, load that file’s content.
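One possible shape for hierarchical context is a set of named layers plus a mapping from task scope to the layers worth loading. Everything in this sketch is hypothetical: the layer texts, the scope names, and the choice to keep broader layers in place when zooming in are assumptions for illustration.

```python
# Hypothetical layer contents; a real harness would build these from the repo.
CONTEXT_LAYERS = {
    "project": "Goal: migrate the billing service to async I/O.",
    "file": "order.py defines Order and calculate_total.",
    "function": "calculate_total(items) sums price * quantity.",
}

# Which layers to load for each task scope, from broadest to narrowest.
SCOPE_TO_LAYERS = {
    "project": ["project"],
    "file": ["project", "file"],
    "function": ["project", "file", "function"],
}


def select_context(scope: str) -> str:
    """Join the layers relevant to the current scope into one context block."""
    if scope not in SCOPE_TO_LAYERS:
        raise ValueError(f"unknown scope: {scope}")
    return "\n".join(
        f"[{name}] {CONTEXT_LAYERS[name]}" for name in SCOPE_TO_LAYERS[scope]
    )


print(select_context("project"))
print(select_context("function"))
```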
⚠️ What can go wrong: Aggressive context reduction (especially summarization) can lead to agents “forgetting” critical details, causing subtle bugs or missed requirements. Always evaluate the impact of your context management strategy.
Step-by-Step Implementation: Building a Basic Context-Aware Agent
Let’s put these concepts into practice by building a simple Python coding agent that can read files and is guided by our engineered prompts. We’ll use a conceptual agent framework (similar to LangChain, but simplified for clarity).
First, create a project structure:
project_root/
├── main.py
├── test_file.py
└── python_agent_harness/
    ├── agent.py
    ├── prompts/
    │   └── system_prompt.py
    └── tools/
        └── file_tools.py
- python_agent_harness/prompts/system_prompt.py: (Already defined above)
- python_agent_harness/tools/file_tools.py: (Already defined above)
- test_file.py: A simple file for our agent to interact with.
# test_file.py
def greet(name: str) -> str:
    """
    Greets the given name.
    """
    return f"Hello, {name}!"

def add(a: int, b: int) -> int:
    """
    Adds two numbers.
    """
    return a + b
- python_agent_harness/agent.py: This file will contain our conceptual agent class.
We’ll simulate an LLM’s function-calling ability. In a real scenario, you’d integrate with an actual LLM API (e.g., OpenAI’s client.chat.completions.create or google.generativeai.GenerativeModel).
# python_agent_harness/agent.py
import json
from typing import List, Dict, Any, Tuple

from .prompts.system_prompt import SYSTEM_PROMPT
from .tools.file_tools import read_file_tool, get_read_file_tool_definition


class SimpleCodingAgent:
    def __init__(self, llm_model_name: str = "gpt-4o-2026-06-18"):  # Simulate a modern LLM
        self.llm_model_name = llm_model_name
        self.tools = {
            "read_file": read_file_tool
        }
        self.tool_definitions = [
            get_read_file_tool_definition()
        ]
        self.messages: List[Dict[str, str]] = [{"role": "system", "content": SYSTEM_PROMPT}]
        print(f"Agent initialized with LLM: {self.llm_model_name}")

    def _call_llm(self) -> Tuple[str, Dict[str, Any]]:
        """
        Simulates an LLM call. In a real scenario, this would interact with an actual LLM API.
        For this example, we'll hardcode a tool call response.
        """
        print("\n--- Simulating LLM Call ---")
        last_message = self.messages[-1]
        # Once a tool observation is in context, answer from it. The observation
        # itself mentions the file name, so without this check the simulation
        # would request read_file again on every iteration and never terminate.
        if last_message["role"] == "tool":
            observation = json.loads(last_message["content"])
            print("LLM decided to answer using the tool observation.")
            return "response", {
                "content": f"Here are the contents of {observation['args']['file_path']}:\n{observation['output']}"
            }
        # For demonstration, let's hardcode the LLM to call 'read_file'
        # when asked about `test_file.py`.
        # In a real LLM, it would generate this based on the prompt.
        if "test_file.py" in last_message["content"]:
            print("LLM decided to call read_file for test_file.py")
            return "tool_call", {
                "tool_name": "read_file",
                "tool_args": {"file_path": "test_file.py"}
            }
        else:
            # If no specific tool call, simulate a general response
            print("LLM decided to respond directly.")
            return "response", {"content": "I am a coding agent. How can I help you with your Python code?"}

    def run(self, user_task: str) -> str:
        self.messages.append({"role": "user", "content": user_task})
        print(f"\nUser: {user_task}")
        while True:
            action_type, action_details = self._call_llm()
            if action_type == "tool_call":
                tool_name = action_details["tool_name"]
                tool_args = action_details["tool_args"]
                print(f"Agent requested tool: {tool_name} with args: {tool_args}")
                if tool_name in self.tools:
                    try:
                        # Execute the tool
                        tool_output = self.tools[tool_name](**tool_args)
                        print(f"Tool '{tool_name}' executed successfully.")
                        # Add tool output as an observation to context
                        self.messages.append({"role": "tool", "content": json.dumps(
                            {"tool_name": tool_name, "args": tool_args, "output": tool_output}
                        )})
                        print("\n--- Tool Output Added to Context ---")
                        print(tool_output[:200] + "..." if len(tool_output) > 200 else tool_output)  # Truncate for display
                        print("Agent will continue thinking with new observation.")
                    except Exception as e:
                        error_message = f"Error executing tool '{tool_name}': {e}"
                        self.messages.append({"role": "tool", "content": json.dumps(
                            {"tool_name": tool_name, "args": tool_args, "error": error_message}
                        )})
                        print("\n--- Tool Error Added to Context ---")
                        print(error_message)
                        return f"Agent encountered an error: {error_message}"
                else:
                    error_message = f"Agent requested unknown tool: {tool_name}"
                    self.messages.append({"role": "tool", "content": json.dumps(
                        {"tool_name": tool_name, "args": tool_args, "error": error_message}
                    )})
                    print("\n--- Unknown Tool Error Added to Context ---")
                    print(error_message)
                    return f"Agent encountered an error: {error_message}"
            elif action_type == "response":
                final_response = action_details["content"]
                self.messages.append({"role": "assistant", "content": final_response})
                print(f"\nAgent: {final_response}")
                return final_response
            else:
                return "Unexpected action type from LLM."
Explanation of agent.py:
- SimpleCodingAgent.__init__: Initializes the agent with a simulated LLM name, registers the read_file_tool, and sets up the initial SYSTEM_PROMPT.
- _call_llm(): This is a simulated LLM interaction. In a real system, you’d make an API call to an LLM (for example, OpenAI’s client.chat.completions.create), which would take self.messages and self.tool_definitions as input and return either a text response or a tool call. For this example, we’re hardcoding a tool call to read_file if the latest user message mentions test_file.py, and a final answer once the tool’s observation is in the context.
- run(user_task): This is the agent’s main loop.
  - It adds the user’s task to the messages (the agent’s context).
  - It calls the simulated LLM.
  - If the LLM decides to tool_call, it executes the tool, captures the output (or error), and adds it back to self.messages as a role: "tool" message (an observation). This new observation then becomes part of the context for the next simulated LLM call, enabling multi-step reasoning.
  - If the LLM decides to response, it returns the final answer.
- main.py: To run our agent.
# main.py
from python_agent_harness.agent import SimpleCodingAgent


def main():
    print("Starting Simple Coding Agent...")
    agent = SimpleCodingAgent()
    # Task 1: A general query, should not trigger tool call in our simulation
    agent.run("Tell me about the principles of clean code.")
    print("\n" + "="*50 + "\n")
    # Task 2: A specific query that should trigger the read_file tool
    agent.run("What are the contents of test_file.py? I need to understand the functions defined there.")


if __name__ == "__main__":
    main()
To run this example:
- Save the files into the python_agent_harness directory as structured above.
- Make sure test_file.py is in the root of your project, alongside the python_agent_harness directory.
- From your project root (where main.py and test_file.py are), run: python main.py
You should observe the agent responding to the first query generally, and then for the second query, it will “decide” to call read_file for test_file.py and present its contents (simulated).
Mini-Challenge: Enhance the Agent’s Toolset
Now it’s your turn! Let’s give our agent another crucial capability.
Challenge: Implement a write_file tool for our SimpleCodingAgent.
- Define a Pydantic Schema: Create a WriteFileInput model for file_path and content.
- Implement the Tool Function: Create write_file_tool(file_path: str, content: str) -> str that writes the content to file_path. Handle potential errors (e.g., directory not found).
- Create Tool Definition: Add get_write_file_tool_definition() to file_tools.py similar to read_file.
- Integrate into Agent: Register the new tool in SimpleCodingAgent.__init__.
- Modify _call_llm (simulation): Update the _call_llm method in agent.py to simulate calling write_file if the user asks to “create” or “modify” a file (e.g., “Create a new file called temp.py with content print('Hello')”).
- Test in main.py: Add a new agent.run() call to test your write_file tool.
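Hint: For write_file_tool, you might want to use os.makedirs(os.path.dirname(file_path), exist_ok=True) to ensure the directory exists before writing the file. Skip that call when os.path.dirname(file_path) is empty (a bare file name such as temp.py), because os.makedirs('') raises FileNotFoundError.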
What to observe/learn:
- How adding new tools requires defining both the Pydantic schema and the executable function.
- How the LLM’s “decision-making” (even simulated here) would rely on the tool’s description to choose the correct tool.
- The importance of error handling in tool functions for robust agent behavior.
Common Pitfalls & Troubleshooting
Vague Tool Descriptions:
- Pitfall: If your tool’s description is unclear or generic, the LLM won’t know when to use it, or it might use it incorrectly.
- Troubleshooting: Make tool descriptions highly specific, including examples of use cases if necessary. Emphasize preconditions and postconditions.
  - Bad: “A tool to get info.”
  - Good: “Retrieves the current weather for a specific city. Use this when the user asks about weather conditions in a geographical location.”
Context Window Overload:
- Pitfall: Pushing too much information (long chat histories, huge code files) into the LLM’s context. This leads to truncation, “forgetting,” and expensive API calls.
- Troubleshooting: Implement robust context management strategies:
- Start with RAG for large knowledge bases.
- Summarize conversation history after a certain number of turns or token count.
- Only pass relevant code snippets, not entire repositories.
Schema Mismatch/Validation Errors:
- Pitfall: The LLM generates arguments for a tool that don’t match your Pydantic schema, leading to runtime errors when executing the tool.
- Troubleshooting:
  - Ensure your tool’s parameter description in the tool definition is crystal clear.
  - Review the LLM’s output for malformed JSON or incorrect argument types.
  - Ensure your Pydantic models are robust and handle edge cases (e.g., optional fields, default values).
🔥 Optimization / Pro tip: Some frameworks allow you to “repair” LLM-generated tool calls if they’re slightly malformed, using another LLM call or regex.
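A lightweight form of repair can be done in the harness before resorting to another LLM call. The sketch below, with the hypothetical name `parse_tool_args`, first tries to parse the raw text as JSON and, if that fails, retries on the outermost brace-delimited span. It only handles the common case of a valid JSON object wrapped in extra prose and returns None for anything else.

```python
import json
from typing import Any, Optional


def parse_tool_args(raw: str) -> Optional[dict[str, Any]]:
    """Parse tool arguments, tolerating extra text around the JSON object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, if there is one.
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            return None
        try:
            parsed = json.loads(raw[start : end + 1])
        except json.JSONDecodeError:
            return None
    # Tool arguments must be a JSON object, not a list or a bare value.
    return parsed if isinstance(parsed, dict) else None


print(parse_tool_args('{"file_path": "main.py"}'))
print(parse_tool_args('Sure! Here are the arguments: {"file_path": "main.py"} Hope that helps.'))
print(parse_tool_args("no json here"))
```

When the function returns None, a reasonable next step is to send the parse failure back to the LLM as an observation and let it retry.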
Lack of Observability into Agent Decisions:
- Pitfall: The agent makes a decision (or doesn’t use a tool when it should), and you don’t understand why.
- Troubleshooting: Log the LLM’s input (the full context, including system prompt, user prompt, and tool definitions) and its raw output (tool calls, responses). This allows you to trace its reasoning. Tools like LangSmith (a product by LangChain) are designed specifically for this.
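If you are not using a tracing product, the standard logging module is enough to get started. This sketch writes one JSON line per LLM call containing the full input and the raw output. The function name `log_llm_exchange`, the logger name, and the record fields are assumptions, not a standard format.

```python
import json
import logging
from typing import Any

logger = logging.getLogger("agent.trace")


def log_llm_exchange(
    step: int,
    messages: list[dict[str, Any]],
    tool_definitions: list[dict[str, Any]],
    raw_output: Any,
) -> str:
    """Emit one JSON line per LLM call and return it for convenience."""
    record = {
        "step": step,
        "messages": messages,       # full context, including the system prompt
        "tools": tool_definitions,  # exactly what the model was offered
        "raw_output": raw_output,   # tool call or text, before any parsing
    }
    # default=str keeps logging from failing on values JSON cannot encode.
    line = json.dumps(record, ensure_ascii=False, default=str)
    logger.info(line)
    return line


logging.basicConfig(level=logging.INFO, format="%(message)s")
line = log_llm_exchange(
    step=1,
    messages=[{"role": "user", "content": "What is in test_file.py?"}],
    tool_definitions=[{"name": "read_file", "description": "Reads a file."}],
    raw_output={"tool_name": "read_file", "tool_args": {"file_path": "test_file.py"}},
)
```

One JSON object per line keeps the trace easy to filter and to replay step by step when an agent makes a surprising decision.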
Summary
In this chapter, we’ve explored the critical discipline of Context Engineering, understanding how to provide AI agents with the precise information they need to function effectively.
Here are the key takeaways:
- Context is King: An agent’s performance is directly tied to the quality and relevance of its input context, including system prompts, user prompts, conversation history, tool definitions, and observations.
- Prompt Engineering is Foundational: Craft robust system prompts to define the agent’s persona, goals, and principles. Use user prompts to provide specific tasks and relevant context. Few-shot examples can guide complex behaviors.
- Tools Extend Capabilities: Define tools with clear names, descriptions, and structured parameters (using Pydantic schemas) to allow agents to interact with their environment.
- Context Window Management is Essential: Employ strategies like summarization, RAG, and sliding windows to keep the context within LLM limits without losing critical information.
- Iterate and Observe: Context engineering is an iterative process. Continuously refine your prompts and tool definitions based on agent performance and observed behaviors, using thorough logging for debugging.
What’s Next?
With a solid understanding of how to engineer an agent’s context, we’re ready to dive into the mechanisms that allow us to systematically test and validate their performance. In the next chapter, we’ll explore Verification and Evaluation (Evals) Frameworks, learning how to measure and ensure the reliability of your AI coding agents. Get ready to build confidence in your agents!