Introduction to Context Engineering

Welcome back, future Harness Engineers! In the previous chapters, we laid the groundwork for building robust AI agents by focusing on systematic environments, state management, verification, and control systems. Now it’s time to dive into what truly powers these agents’ decision-making: the context they operate within.

Imagine an AI agent as a brilliant but literal-minded apprentice. No matter how smart they are, their effectiveness hinges entirely on the clarity and completeness of the instructions you provide and the tools you give them. This is the essence of Context Engineering: the art and science of meticulously crafting the inputs—prompts and tool definitions—that guide an agent’s behavior to achieve desired outcomes reliably.

In this chapter, we’ll explore how to design these critical pieces of context. You’ll learn sophisticated prompt engineering techniques, how to define tools that agents can understand and use effectively, and strategies for managing the ever-present challenge of context window limitations. By the end, you’ll be equipped to give your AI agents the precise “brain food” they need to excel, moving beyond basic prompting to truly engineered context.

Prerequisites

To get the most out of this chapter, you should have a basic understanding of:

  • Python programming fundamentals.
  • The concepts of Large Language Models (LLMs) and their role in AI agents.
  • Familiarity with the agent architecture discussed in previous chapters, particularly how agents interact with their environment and tools.
  • A development environment set up with Python 3.10+ and pip.

The Agent’s World: What is Context?

At its core, an AI agent operates within a defined “world” of information. This world is its context. It includes everything the agent knows or can know at any given moment to make decisions and perform actions.

What does this context typically include?

  1. System Prompt (Instructions): The overarching directives, persona, goals, and constraints for the agent. This is its mission statement.
  2. User Prompt (Current Task): The specific request or problem the user wants the agent to solve right now.
  3. Conversation History: Previous turns in a dialogue, providing continuity and memory.
  4. Tool Definitions: Descriptions of the functions or APIs the agent can call, along with their expected inputs and outputs.
  5. External Knowledge: Information retrieved from databases, documentation, or the internet, often provided by a Retrieval-Augmented Generation (RAG) system.
  6. Observation Results: The outcomes of previous tool calls or environment interactions.

Context engineering is about carefully curating and presenting all this information to the agent’s underlying LLM, ensuring it has everything it needs—and nothing it doesn’t—to make optimal decisions.
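
As a rough illustration of how the six components above might be assembled, the sketch below builds one ordered message list. The function name build_context, the role labels, and the layout (tool descriptions folded into the system message, retrieved material attached to the task) are assumptions made for illustration; real provider APIs usually take tool definitions as a separate structured parameter.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_context(
    system_prompt: str,
    user_task: str,
    history: Optional[List[Message]] = None,
    tool_descriptions: Optional[List[str]] = None,
    retrieved_docs: Optional[List[str]] = None,
    observations: Optional[List[str]] = None,
) -> List[Message]:
    """Assemble the context components into one message list (hypothetical layout)."""
    system_parts = [system_prompt.strip()]
    if tool_descriptions:
        listing = "\n".join(f"- {text}" for text in tool_descriptions)
        system_parts.append("Available tools:\n" + listing)
    messages: List[Message] = [
        {"role": "system", "content": "\n\n".join(system_parts)}
    ]
    # Earlier turns come before the current task.
    messages.extend(history or [])
    # Retrieved knowledge is placed next to the task it supports.
    task_parts = [user_task.strip()]
    if retrieved_docs:
        task_parts.append("Relevant material:\n" + "\n---\n".join(retrieved_docs))
    messages.append({"role": "user", "content": "\n\n".join(task_parts)})
    # Tool results from the current task come last.
    for observation in observations or []:
        messages.append({"role": "tool", "content": observation})
    return messages
```

Keeping assembly in one function makes it easy to see, and later to measure, exactly what the model receives on each call.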

Why Context Engineering Matters

Without effective context engineering, even the most powerful LLM will struggle.

  • Ambiguity: Vague prompts lead to unpredictable agent behavior or “hallucinations.”
  • Inefficiency: Providing too much irrelevant information can waste precious context window tokens and confuse the agent.
  • Unreliability: Poorly defined tools might be misused or ignored, leading to failed tasks.
  • Security Risks: Without proper guardrails in the prompt, agents might perform unintended or harmful actions.

📌 Key Idea: Context engineering transforms an LLM from a general-purpose text generator into a focused, goal-oriented agent.

Crafting Effective Prompts: The Agent’s Mission Brief

The prompt is the agent’s primary directive. It’s where you define its identity, its purpose, and the rules of its engagement. For AI coding agents, this means setting up a persona, defining coding standards, and outlining the problem-solving process.

Let’s break down the components of a robust prompt for a coding agent.

System Prompt: Setting the Stage

The system prompt acts as the foundational layer, shaping the agent’s overall behavior.

# python_agent_harness/prompts/system_prompt.py
SYSTEM_PROMPT = """
You are a highly skilled, senior Python software engineer specializing in clean code, robust testing, and maintainable architecture.
Your primary goal is to assist the user by writing, refactoring, and debugging Python code.

Here are your core principles:
1.  **Understand the Request Fully:** Always ask clarifying questions if the request is ambiguous or incomplete.
2.  **Plan First:** Before writing or changing code, outline your approach. Explain your reasoning.
3.  **Write Clean, Modern Python:** Adhere to PEP 8, use type hints, and favor idiomatic Python.
4.  **Test Thoroughly:** When implementing new features or fixing bugs, consider unit tests or integration tests.
5.  **Iterate and Reflect:** After performing an action (e.g., writing code, running tests), evaluate the outcome and adjust your plan.
6.  **Safety First:** Never execute arbitrary shell commands or access sensitive files unless explicitly instructed and fully understood.

You have access to a set of tools to interact with the codebase and environment. Use them judiciously.
When asked to perform a task, think step-by-step and explain your reasoning.
"""

Explanation:

  • Persona: “Highly skilled, senior Python software engineer…” establishes authority and expertise.
  • Primary Goal: “assist the user by writing, refactoring, and debugging Python code” defines its core purpose.
  • Core Principles: These are explicit behavioral guidelines. They prevent common pitfalls (e.g., rushing to code, ignoring tests) and enforce best practices (PEP 8, type hints).
  • Tool Access: Informs the agent it has tools, hinting at their usage.
  • Think Step-by-Step: Encourages chain-of-thought reasoning, making the agent’s process more transparent and debuggable.

🧠 Important: The system prompt is your most powerful lever for shaping agent behavior. Invest time in refining it.

User Prompt: The Specific Task

The user prompt is the immediate instruction from the user. For a coding agent, this might be a bug report, a feature request, or a refactoring task.

USER_PROMPT_TEMPLATE = """
The user has provided the following task:
{task_description}

Current context and relevant files:
{file_contents}

Please analyze the task and the provided code, then formulate a plan.
"""

Explanation:

  • task_description: This placeholder will be filled with the user’s specific request (e.g., “Fix the bug in utils.py where division by zero occurs”).
  • file_contents: This is crucial for a coding agent. It allows you to inject relevant code snippets or entire file contents, providing the agent with the necessary code context to understand and modify.

⚡ Real-world insight: For complex coding tasks, file_contents might be dynamically populated by a RAG system that retrieves relevant files based on the task_description.
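
A minimal sketch of filling the template, assuming the relevant files have already been selected and read into a dictionary. The helper name render_user_prompt and the "### path" labels are hypothetical choices for this example; the template is repeated so the block runs on its own.

```python
from typing import Dict

USER_PROMPT_TEMPLATE = """
The user has provided the following task:
{task_description}

Current context and relevant files:
{file_contents}

Please analyze the task and the provided code, then formulate a plan.
"""


def render_user_prompt(task_description: str, files: Dict[str, str]) -> str:
    """Fill the template; files maps a path to that file's text."""
    # Label each file so the model can tell where one ends and the next begins.
    blocks = [f"### {path}\n{text.rstrip()}" for path, text in files.items()]
    file_contents = "\n\n".join(blocks) if blocks else "(no files provided)"
    # str.format only interprets braces in the template itself, so braces
    # inside the file text (dict literals, f-strings) pass through untouched.
    return USER_PROMPT_TEMPLATE.format(
        task_description=task_description.strip(),
        file_contents=file_contents,
    )
```

For example, render_user_prompt("Fix the bug in utils.py", {"utils.py": source}) produces a prompt with the task followed by the labeled file.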

Few-Shot Examples (Optional but Powerful)

For tricky or nuanced tasks, providing a few examples of desired input/output pairs (or interaction sequences) can significantly improve agent performance. This is known as few-shot prompting.

For a coding agent, a few-shot example might demonstrate:

  • How to respond when asking for clarification.
  • The desired format for a code change (e.g., providing a diff).
  • A typical interaction where the agent uses a tool, observes the result, and then takes the next logical step.

# Example of a few-shot interaction snippet (conceptual)
FEW_SHOT_EXAMPLE = """
User: "Refactor `calculate_total` in `order.py` to use a more functional approach."

Agent: "Okay, I understand. I will refactor the `calculate_total` function in `order.py` to be more functional.
My plan is:
1.  Read the content of `order.py`.
2.  Identify `calculate_total` and analyze its current implementation.
3.  Propose a functional refactoring, perhaps using `map` or `reduce`.
4.  Present the proposed changes as a diff.

Tool Use: read_file(file_path='order.py')
... (tool output) ...
Proposed diff:
```diff
--- a/order.py
+++ b/order.py
@@ -5,7 +5,7 @@
 def calculate_total(items):
-    total = 0
-    for item in items:
-        total += item.price * item.quantity
-    return total
+    return sum(item.price * item.quantity for item in items)
```

Does this look good?"
"""


Explanation: This example provides a concrete demonstration of the desired planning, tool usage, and output format.

Defining Agent Tools: Extending Capabilities

An agent’s usefulness skyrockets when it can interact with its environment through tools. For a coding agent, these tools might be:

  • Reading/writing files.
  • Running shell commands (with extreme caution!).
  • Executing Python code in a sandbox.
  • Interacting with a debugger or linter.
  • Performing Git operations.

Modern LLMs (like OpenAI’s GPT models or Google’s Gemini) often support function calling or tool use natively. This means the LLM can read a user request, decide which tool to use, and emit structured arguments for that tool based on its definition. The model only proposes the call; your harness is what actually executes it, so the arguments should still be checked before use.

Tool Definition Structure

Each tool needs a clear definition that the LLM can understand. This typically involves:

  1. Name: A unique identifier for the tool (e.g., read_file).
  2. Description: A human-readable explanation of what the tool does and when to use it. This is crucial for the LLM’s decision-making.
  3. Parameters/Schema: A structured definition (often using JSON Schema or Pydantic) of the arguments the tool expects.

Let’s define a read_file tool using Python and Pydantic (a popular library for data validation and settings management, version 2.x as of 2026).

First, ensure you have Pydantic installed:
```bash
pip install "pydantic>=2.0"
```

Then, define the tool:

# python_agent_harness/tools/file_tools.py
import os
from pydantic import BaseModel, Field
from typing import Any, Dict

# --- Pydantic Schema for Tool Input ---
class ReadFileInput(BaseModel):
    """Input schema for the read_file tool."""
    file_path: str = Field(
        ..., description="The path to the file to read. Must be a valid, existing file."
    )

# --- Tool Function ---
def read_file_tool(file_path: str) -> str:
    """
    Reads the content of a specified file.
    Returns the file content as a string.
    Raises FileNotFoundError if the file does not exist.
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")
    if not os.path.isfile(file_path):
        raise ValueError(f"Path is not a file: {file_path}")
    
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    return content

# --- Tool Definition (for LLM consumption) ---
# This structure is often adapted to the specific agent framework (e.g., LangChain)
def get_read_file_tool_definition() -> Dict[str, Any]:
    """
    Returns the definition of the read_file tool in a format
    compatible with common LLM function calling mechanisms.
    """
    return {
        "name": "read_file",
        "description": "Reads the content of a specified file and returns it as a string. Use this to inspect code or documentation.",
        "parameters": ReadFileInput.model_json_schema() # Pydantic v2.x method
    }

Explanation:

  • ReadFileInput (Pydantic Model): This defines the expected arguments for read_file_tool.
    • file_path: str = Field(...): Declares file_path as a required string.
    • description: Provides a clear explanation for both humans and the LLM about what file_path represents.
  • read_file_tool (Python Function): This is the actual Python logic that performs the file reading.
    • Includes basic error handling (FileNotFoundError, ValueError) for robustness.
  • get_read_file_tool_definition (LLM-Friendly Definition): This function packages the tool’s metadata into a dictionary format that LLM frameworks (like LangChain or directly interacting with OpenAI/Gemini APIs) can use.
    • "name": The name the LLM will use to refer to this tool.
    • "description": A detailed explanation of the tool’s purpose. This is critical for the LLM to decide when to use it.
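    • "parameters": The JSON Schema generated directly from our ReadFileInput Pydantic model. This keeps the advertised schema in sync with the model. The schema by itself validates nothing: to reject bad arguments, the harness still has to validate what the LLM sends (for example with ReadFileInput.model_validate(args)) before calling the tool.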

🔥 Optimization / Pro tip: Always provide precise and descriptive description fields for your tools. The LLM relies heavily on these descriptions to understand when and how to use a tool correctly.
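
The definition above only advertises the tool; the harness still has to check the arguments the model sends back. With Pydantic available, ReadFileInput.model_validate(args) covers this. The sketch below is a dependency-free stand-in that checks required keys and primitive types only. The names validate_tool_args and READ_FILE_SCHEMA are hypothetical, and the schema is hand-written in the general shape of a JSON Schema for one required string field; it is not a full JSON Schema validator.

```python
from typing import Any, Dict, List

# Hand-written for illustration: one required string argument.
READ_FILE_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {"file_path": {"type": "string"}},
    "required": ["file_path"],
}

# Only a few primitive JSON types are covered in this sketch.
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_tool_args(args: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems: List[str] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in properties:
            problems.append(f"unexpected argument: {name}")
            continue
        type_name = properties[name].get("type")
        expected = _JSON_TYPES.get(type_name)
        if expected is not None and not isinstance(value, expected):
            problems.append(f"argument {name} should be of type {type_name}")
    return problems
```

Returning the problem list to the model as an observation, rather than raising, gives it a chance to correct the call on the next turn.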

Workflow with Tools

Here’s how an agent typically uses a tool:

graph TD
    A[User Request] --> B{Agent Decision Cycle}
    B -->|Generates Tool Call| C[LLM Tool Call]
    C --> D[Harness Executes Tool]
    D -->|Tool Output| E[Observation Result]
    E --> B
    B -->|No More Tools Needed| F[Agent Response to User]

Explanation of the flow:

  1. User Request: The user provides a task.
  2. Agent Decision Cycle: The agent’s LLM, given the system prompt, user request, and tool definitions, decides the next best action.
  3. LLM Tool Call: If a tool is needed, the LLM generates a structured call to one of the defined tools with appropriate arguments (e.g., read_file(file_path='main.py')).
  4. Harness Executes Tool: The agent harness (your code) intercepts this tool call and executes the corresponding Python function (read_file_tool).
  5. Observation Result: The output of the tool (e.g., the content of main.py) is returned.
  6. Observation back to Agent: The tool’s output is injected back into the LLM’s context as an “observation.”
  7. Continue Decision Cycle: The LLM now has new information and can decide on the next step (e.g., use another tool, generate a final answer).
  8. Agent Response: Once the task is complete, the agent generates a final response to the user.
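
The cycle above can be sketched as a short loop. This is a simplified illustration, not a framework API: run_tool_loop, scripted_decide, and the (kind, details) action format are assumptions, and the LLM is replaced by a scripted function. The step budget is an addition that stops the cycle if the model keeps requesting tools.

```python
from typing import Any, Callable, Dict, List, Tuple

Action = Tuple[str, Dict[str, Any]]  # ("tool_call" or "response", details)


def run_tool_loop(
    decide: Callable[[List[Dict[str, str]]], Action],
    tools: Dict[str, Callable[..., str]],
    user_request: str,
    max_steps: int = 5,
) -> str:
    """Drive the decision cycle until the model answers or the budget runs out."""
    context: List[Dict[str, str]] = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        kind, details = decide(context)
        if kind == "response":
            return details["content"]
        tool = tools.get(details["tool_name"])
        if tool is None:
            observation = f"error: unknown tool {details['tool_name']}"
        else:
            try:
                observation = tool(**details["tool_args"])
            except Exception as exc:  # report the failure as an observation
                observation = f"error: {exc}"
        # Feed the result back so the next decision can use it.
        context.append({"role": "tool", "content": observation})
    return "Stopped: step budget exhausted."


def scripted_decide(context: List[Dict[str, str]]) -> Action:
    """Stand-in for the LLM: call a tool once, then answer from the observation."""
    if context[-1]["role"] == "tool":
        return "response", {"content": f"The file says: {context[-1]['content']}"}
    return "tool_call", {"tool_name": "read_file", "tool_args": {"file_path": "main.py"}}
```

Passing errors back as observations, instead of aborting, lets the model try a different path or tool on the next turn.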

Managing Context Window Limits: The Ever-Present Challenge

LLMs have a finite context window—the maximum amount of text (tokens) they can process at once. For coding agents, this is a major bottleneck because codebases can be vast, and conversations can be long. When the input exceeds the limit, the API typically rejects the request or the harness has to truncate the input, and truncation causes the agent to “forget” crucial information.

Here are strategies to manage context effectively:

  1. Summarization:

    • Concept: Periodically summarize conversation history or long tool outputs.
    • Application: After several turns, condense the chat log into a concise summary that preserves key decisions and information.
    • Caveat: Summarization can lose fine-grained details, so use it judiciously, especially for code.
  2. Retrieval-Augmented Generation (RAG):

    • Concept: Instead of putting all possible information into the context, retrieve only the most relevant pieces dynamically.
    • Application: When the agent needs to reference code, documentation, or past interactions, use an embedding model and vector database to fetch relevant chunks of text.
    • Benefit: Keeps context windows small and focused, allowing agents to work with large codebases. This is a cornerstone of advanced coding agents.
  3. Sliding Window / Fixed Window:

    • Concept: Keep only the most recent N tokens or messages in the context, discarding the oldest ones.
    • Application: Simple to implement for conversation history, but can lead to forgetting early context.
    • Trade-off: Easy, but less intelligent than summarization or RAG.
  4. Hierarchical Context:

    • Concept: Maintain different levels of context (e.g., a high-level project overview, a medium-level file context, and a low-level function context).
    • Application: The agent can “zoom in” or “zoom out” by swapping out context based on the current task’s scope.
    • Example: For a project-wide refactor, load project-level goals. When working on a specific file, load that file’s content.
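
As one concrete case, the sliding-window strategy (item 3) might look like the sketch below. It assumes a crude estimate of four characters per token instead of a real tokenizer, and it pins system messages so the agent’s instructions are never discarded. The names estimate_tokens and trim_to_budget are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    """Crude estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_to_budget(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep system messages plus the most recent messages that fit the budget."""
    pinned = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in pinned)
    kept: List[Message] = []
    # Walk backwards so the newest messages are considered first.
    for message in reversed(rest):
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    # Restore chronological order after the pinned messages.
    return pinned + list(reversed(kept))
```

In a real harness you would swap estimate_tokens for the tokenizer that matches your model, since character counts can be far off for code.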

⚠️ What can go wrong: Aggressive context reduction (especially summarization) can lead to agents “forgetting” critical details, causing subtle bugs or missed requirements. Always evaluate the impact of your context management strategy.
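
To make the retrieval idea (item 2 above) concrete without an embedding model or vector database, the sketch below ranks code chunks by how many identifier-like terms they share with the task. Keyword overlap is a deliberately simple stand-in for embedding similarity, and retrieve_relevant is a hypothetical helper name.

```python
import re
from typing import Dict, List, Set


def _terms(text: str) -> Set[str]:
    """Lowercased identifier-like words found in the text."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def retrieve_relevant(task: str, chunks: Dict[str, str], top_k: int = 2) -> List[str]:
    """Return names of up to top_k chunks sharing the most terms with the task."""
    query = _terms(task)
    scored = [(len(query & _terms(text)), name) for name, text in chunks.items()]
    # Highest overlap first; ties are broken by name for a stable order.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    # Chunks with no overlap at all are never returned.
    return [name for score, name in ranked[:top_k] if score > 0]
```

Only the returned chunks would then be placed in the prompt, which keeps the context focused even when the codebase is large.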

Step-by-Step Implementation: Building a Basic Context-Aware Agent

Let’s put these concepts into practice by building a simple Python coding agent that can read files and is guided by our engineered prompts. We’ll use a conceptual agent framework (similar to LangChain, but simplified for clarity).

First, create a project structure:

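project_root/
├── main.py
├── test_file.py
└── python_agent_harness/
    ├── agent.py
    ├── prompts/
    │   └── system_prompt.py
    └── tools/
        └── file_tools.py

Here project_root is a placeholder for whatever you name the directory. main.py and test_file.py sit in the project root, next to the python_agent_harness package, so that the import in main.py and the relative path test_file.py both resolve when you run from the root.

  1. python_agent_harness/prompts/system_prompt.py: (Already defined above)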

  2. python_agent_harness/tools/file_tools.py: (Already defined above)

  3. test_file.py: A simple file for our agent to interact with.

    # test_file.py
    def greet(name: str) -> str:
        """
        Greets the given name.
        """
        return f"Hello, {name}!"
    
    def add(a: int, b: int) -> int:
        """
        Adds two numbers.
        """
        return a + b

  4. python_agent_harness/agent.py: This file will contain our conceptual agent class.

    We’ll simulate an LLM’s function-calling ability. In a real scenario, you’d integrate with an actual LLM API (e.g., client.chat.completions.create in the OpenAI Python SDK 1.x, which replaced the older openai.ChatCompletion.create, or google.generativeai.GenerativeModel).

    # python_agent_harness/agent.py
    import json
    from typing import List, Dict, Any, Tuple
    from .prompts.system_prompt import SYSTEM_PROMPT
    from .tools.file_tools import read_file_tool, get_read_file_tool_definition
    
    class SimpleCodingAgent:
        def __init__(self, llm_model_name: str = "gpt-4o-2026-06-18"):  # Placeholder name; no real API is called
            self.llm_model_name = llm_model_name
            self.tools = {
                "read_file": read_file_tool
            }
            self.tool_definitions = [
                get_read_file_tool_definition()
            ]
            self.messages: List[Dict[str, str]] = [{"role": "system", "content": SYSTEM_PROMPT}]
            print(f"Agent initialized with LLM: {self.llm_model_name}")
    
        def _call_llm(self) -> Tuple[str, Dict[str, Any]]:
            """
            Simulates an LLM call. In a real scenario, this would interact
            with an actual LLM API. For this example, we'll hardcode a
            tool call response.
            """
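            print("\n--- Simulating LLM Call ---")
            last_message = self.messages[-1]
            # Once a tool result is in context, answer from it. Without this
            # branch the loop never ends: the tool message itself contains
            # "test_file.py", so the check below would match on every turn.
            if last_message["role"] == "tool":
                print("LLM decided to respond using the tool observation.")
                observation = json.loads(last_message["content"])
                return "response", {"content": "Contents of the file:\n" + observation.get("output", "")}
            # For demonstration, let's hardcode the LLM to call 'read_file'
            # when asked about `test_file.py`.
            # In a real LLM, it would generate this based on the prompt.
            if "test_file.py" in last_message["content"]:
                print("LLM decided to call read_file for test_file.py")
                return "tool_call", {
                    "tool_name": "read_file",
                    "tool_args": {"file_path": "test_file.py"}
                }
            else:
                # If no specific tool call, simulate a general response
                print("LLM decided to respond directly.")
                return "response", {"content": "I am a coding agent. How can I help you with your Python code?"}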
    
        def run(self, user_task: str) -> str:
            self.messages.append({"role": "user", "content": user_task})
            print(f"\nUser: {user_task}")
    
            while True:
                action_type, action_details = self._call_llm()
    
                if action_type == "tool_call":
                    tool_name = action_details["tool_name"]
                    tool_args = action_details["tool_args"]
                    print(f"Agent requested tool: {tool_name} with args: {tool_args}")
    
                    if tool_name in self.tools:
                        try:
                            # Execute the tool
                            tool_output = self.tools[tool_name](**tool_args)
                            print(f"Tool '{tool_name}' executed successfully.")
                            # Add tool output as an observation to context
                            self.messages.append({"role": "tool", "content": json.dumps(
                                {"tool_name": tool_name, "args": tool_args, "output": tool_output}
                            )})
                            print("\n--- Tool Output Added to Context ---")
                            print(tool_output[:200] + "..." if len(tool_output) > 200 else tool_output) # Truncate for display
                            print("Agent will continue thinking with new observation.")
                        except Exception as e:
                            error_message = f"Error executing tool '{tool_name}': {e}"
                            self.messages.append({"role": "tool", "content": json.dumps(
                                {"tool_name": tool_name, "args": tool_args, "error": error_message}
                            )})
                            print(f"\n--- Tool Error Added to Context ---")
                            print(error_message)
                            return f"Agent encountered an error: {error_message}"
                    else:
                        error_message = f"Agent requested unknown tool: {tool_name}"
                        self.messages.append({"role": "tool", "content": json.dumps(
                            {"tool_name": tool_name, "args": tool_args, "error": error_message}
                        )})
                        print(f"\n--- Unknown Tool Error Added to Context ---")
                        print(error_message)
                        return f"Agent encountered an error: {error_message}"
                elif action_type == "response":
                    final_response = action_details["content"]
                    self.messages.append({"role": "assistant", "content": final_response})
                    print(f"\nAgent: {final_response}")
                    return final_response
                else:
                    return "Unexpected action type from LLM."

    Explanation of agent.py:

    • SimpleCodingAgent.__init__: Initializes the agent with a simulated LLM name, registers the read_file_tool, and sets up the initial SYSTEM_PROMPT.
    • _call_llm(): This is a simulated LLM interaction. In a real system, you’d make an API call to an LLM (for example client.chat.completions.create in the OpenAI Python SDK 1.x; the older openai.ChatCompletion.create was removed in that release), passing self.messages and self.tool_definitions and receiving either a text response or a tool call. For this example, we’re hardcoding a tool call to read_file if the user task mentions test_file.py.
    • run(user_task): This is the agent’s main loop.
      • It adds the user’s task to the messages (the agent’s context).
      • It calls the simulated LLM.
      • If the LLM decides to tool_call, it executes the tool, captures the output (or error), and adds it back to self.messages as a role: "tool" message (an observation). This new observation then becomes part of the context for the next simulated LLM call, enabling multi-step reasoning.
      • If the LLM decides to response, it returns the final answer.
  5. main.py: To run our agent.

    # main.py
    from python_agent_harness.agent import SimpleCodingAgent
    
    def main():
        print("Starting Simple Coding Agent...")
        agent = SimpleCodingAgent()
    
        # Task 1: A general query, should not trigger tool call in our simulation
        agent.run("Tell me about the principles of clean code.")
    
        print("\n" + "="*50 + "\n")
    
        # Task 2: A specific query that should trigger the read_file tool
        agent.run("What are the contents of test_file.py? I need to understand the functions defined there.")
    
    if __name__ == "__main__":
        main()

To run this example:

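  1. Save agent.py, prompts/system_prompt.py, and tools/file_tools.py inside the python_agent_harness package directory.
  2. Place main.py and test_file.py in the root of your project, alongside the python_agent_harness directory. The import in main.py and the relative path test_file.py both depend on this layout.
  3. From your project root (where main.py and test_file.py are), run: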
    python main.py

You should observe the agent responding to the first query generally, and then for the second query, it will “decide” to call read_file for test_file.py and present its contents (simulated).

Mini-Challenge: Enhance the Agent’s Toolset

Now it’s your turn! Let’s give our agent another crucial capability.

Challenge: Implement a write_file tool for our SimpleCodingAgent.

  1. Define a Pydantic Schema: Create a WriteFileInput model for file_path and content.
  2. Implement the Tool Function: Create write_file_tool(file_path: str, content: str) -> str that writes the content to file_path. Handle potential errors (e.g., directory not found).
  3. Create Tool Definition: Add get_write_file_tool_definition() to file_tools.py similar to read_file.
  4. Integrate into Agent: Register the new tool in SimpleCodingAgent.__init__.
  5. Modify _call_llm (simulation): Update the _call_llm method in agent.py to simulate calling write_file if the user asks to “create” or “modify” a file (e.g., “Create a new file called temp.py with content print('Hello')”).
  6. Test in main.py: Add a new agent.run() call to test your write_file tool.

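Hint: For write_file_tool, you might want to call os.makedirs(parent, exist_ok=True) with parent = os.path.dirname(file_path) to ensure the directory exists before writing the file. Guard the call with if parent:, because os.path.dirname returns an empty string for a bare filename such as temp.py, and os.makedirs('') raises an error.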

What to observe/learn:

  • How adding new tools requires defining both the Pydantic schema and the executable function.
  • How the LLM’s “decision-making” (even simulated here) would rely on the tool’s description to choose the correct tool.
  • The importance of error handling in tool functions for robust agent behavior.
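
If you get stuck on step 2, here is one possible shape for the tool function. It is a sketch rather than the reference solution: the return message format is an arbitrary choice, and a production version would also restrict which paths the agent may write to.

```python
import os


def write_file_tool(file_path: str, content: str) -> str:
    """Write content to file_path, creating parent directories when needed."""
    parent = os.path.dirname(file_path)
    # os.path.dirname("temp.py") is "", and os.makedirs("") raises an error,
    # so only create directories when the path has a parent component.
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(content)
    # A short confirmation gives the LLM a useful observation.
    return f"Wrote {len(content)} characters to {file_path}"
```

Returning a confirmation string rather than None means the observation added to the context tells the model that the write succeeded.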

Common Pitfalls & Troubleshooting

  1. Vague Tool Descriptions:

    • Pitfall: If your tool’s description is unclear or generic, the LLM won’t know when to use it, or it might use it incorrectly.
    • Troubleshooting: Make tool descriptions highly specific, including examples of use cases if necessary. Emphasize preconditions and postconditions.
      • Bad: “A tool to get info.”
      • Good: “Retrieves the current weather for a specific city. Use this when the user asks about weather conditions in a geographical location.”
  2. Context Window Overload:

    • Pitfall: Pushing too much information (long chat histories, huge code files) into the LLM’s context. This leads to truncation, “forgetting,” and expensive API calls.
    • Troubleshooting: Implement robust context management strategies:
      • Start with RAG for large knowledge bases.
      • Summarize conversation history after a certain number of turns or token count.
      • Only pass relevant code snippets, not entire repositories.
  3. Schema Mismatch/Validation Errors:

    • Pitfall: The LLM generates arguments for a tool that don’t match your Pydantic schema, leading to runtime errors when executing the tool.
    • Troubleshooting:
      • Ensure your tool’s parameter description in the tool definition is crystal clear.
      • Review the LLM’s output for malformed JSON or incorrect argument types.
      • Ensure your Pydantic models are robust and handle edge cases (e.g., optional fields, default values).
      • 🔥 Optimization / Pro tip: Some frameworks allow you to “repair” LLM-generated tool calls if they’re slightly malformed, using another LLM call or regex.
  4. Lack of Observability into Agent Decisions:

    • Pitfall: The agent makes a decision (or doesn’t use a tool when it should), and you don’t understand why.
    • Troubleshooting: Log the LLM’s input (the full context, including system prompt, user prompt, and tool definitions) and its raw output (tool calls, responses). This allows you to trace its reasoning. Tools like LangSmith (a product by LangChain) are designed specifically for this.
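
For the observability pitfall, a minimal sketch using only the standard library is shown below. It records each LLM call as one JSON line so the full input and raw output can be inspected later. The logger name agent.trace and the record layout are assumptions; a hosted tracing tool would replace this in practice.

```python
import json
import logging
from typing import Any, Dict, List

logger = logging.getLogger("agent.trace")


def log_llm_exchange(
    messages: List[Dict[str, str]],
    tool_definitions: List[Dict[str, Any]],
    raw_output: Dict[str, Any],
) -> str:
    """Log one LLM call (full input and raw output) as a single JSON line."""
    record = {
        "input": {"messages": messages, "tools": tool_definitions},
        "output": raw_output,
    }
    # default=str keeps logging from failing on values JSON cannot encode.
    line = json.dumps(record, ensure_ascii=False, default=str)
    logger.info(line)
    return line
```

Calling this right after each LLM call gives you a replayable trace: when the agent skips a tool it should have used, you can see exactly which context it was given.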

Summary

In this chapter, we’ve explored the critical discipline of Context Engineering, understanding how to provide AI agents with the precise information they need to function effectively.

Here are the key takeaways:

  • Context is King: An agent’s performance is directly tied to the quality and relevance of its input context, including system prompts, user prompts, conversation history, tool definitions, and observations.
  • Prompt Engineering is Foundational: Craft robust system prompts to define the agent’s persona, goals, and principles. Use user prompts to provide specific tasks and relevant context. Few-shot examples can guide complex behaviors.
  • Tools Extend Capabilities: Define tools with clear names, descriptions, and structured parameters (using Pydantic schemas) to allow agents to interact with their environment.
  • Context Window Management is Essential: Employ strategies like summarization, RAG, and sliding windows to keep the context within LLM limits without losing critical information.
  • Iterate and Observe: Context engineering is an iterative process. Continuously refine your prompts and tool definitions based on agent performance and observed behaviors, using thorough logging for debugging.

What’s Next?

With a solid understanding of how to engineer an agent’s context, we’re ready to dive into the mechanisms that allow us to systematically test and validate their performance. In the next chapter, we’ll explore Verification and Evaluation (Evals) Frameworks, learning how to measure and ensure the reliability of your AI coding agents. Get ready to build confidence in your agents!

References

This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.