In Part 1, we built a foundational AI agent with a modular tool system, type-safe structured outputs, and the ReAct reasoning pattern. We created an agent that could use tools, think through problems step-by-step, and provide reliable responses.
But our Part 1 agent had limitations that would make it unsuitable for production:
- No memory: Each conversation started fresh, with no context from previous interactions
- No oversight: The agent could perform any action without human approval
- Limited visibility: We couldn't easily debug or monitor agent behavior
- Fragile execution: Tool failures would crash the entire agent
There are other limitations we could address, but these four are the most important, and the solutions apply to virtually any type of agent you might build.
Today, we're upgrading our agent with four production-critical features:
- Long-term Memory: Persisting conversations across sessions
- Human-in-the-Loop (HITL): Requiring approval for critical actions
- Advanced Observability: Comprehensive logging and tracing
- Error Recovery: Graceful failure handling with retry logic
If you haven't read Part 1 yet, I strongly recommend starting there, as we'll be building directly on that foundation. You can find the complete code for both parts in these notebooks: Colab Notebook Part 1, Colab Notebook Part 2.
These posts are also part of my Master LLMs series, a blog series I'm creating to guide you from fundamentals to production-ready systems. Read the full series to go deeper into what makes these models tick and how to wield them effectively.
Why These Features Matter
Before we dive into implementation, let's understand why each feature is essential in today's advanced agents:
- Long-term Memory transforms your agent from a stateless function into a learning system that improves over time. Imagine a customer support agent that remembers past issues or a personal assistant that learns your preferences.
- Human-in-the-Loop is critical for safety and trust. You don't want an agent deleting production databases or sending emails without approval. HITL provides a safety gate for high-stakes actions.
- Observability is what separates toy agents from production systems. When something goes wrong (and it will), you need detailed logs showing exactly what the agent did, when, and why.
- Error Recovery makes your agent resilient. APIs fail, networks time out, and rate limits hit. A production agent must handle these gracefully rather than crashing.
Let's build each of these, starting with memory.
Feature 1: Long-term Memory
Any chatbot or AI agent we use nowadays has some form of history that persists either locally (on your machine) or in the cloud. But our Part 1 agent was stateless, meaning every conversation started from scratch. This works for simple tasks, but real-world agents need to learn from past interactions.
Memory vs. Context Window
You might think: "Why not just keep all messages in the history?" The problem is token limits. LLMs have finite context windows (typically 32k-128k tokens). A long conversation can easily exceed this, causing:
- Truncated history (losing important context)
- Increased latency (processing massive prompts)
- Higher costs (you pay per token)
The solution is selective memory: We choose to persist only important information to disk and inject only recent, relevant context into each request.
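This selection step can be sketched in a few lines. The snippet below is a standalone illustration, not code from the agent we build next, and the 4-characters-per-token estimate is a crude heuristic standing in for a real tokenizer:

```python
# Rough sketch of selective memory: persist everything, inject only what fits.
def estimate_tokens(text: str) -> int:
    # Very rough heuristic: ~4 characters per token on average.
    return max(1, len(text) // 4)

def select_recent(messages: list[dict], token_budget: int) -> list[dict]:
    """Walk backwards from the newest message, keeping turns until the budget is spent."""
    selected, used = [], 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if used + cost > token_budget:
            break
        selected.append(msg)
        used += cost
    return list(reversed(selected))  # restore chronological order

history = [
    {"role": "user", "content": "a" * 400},       # ~100 tokens
    {"role": "assistant", "content": "b" * 400},  # ~100 tokens
    {"role": "user", "content": "c" * 400},       # ~100 tokens
]
recent = select_recent(history, token_budget=250)  # keeps only the last two turns
```

The oldest turn is dropped once the budget runs out, which is exactly the trade-off selective memory makes: recency over completeness.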
Designing the Memory System
Our memory system needs four capabilities:
- Persistence: Survive session restarts.
- Session awareness: Track which conversations belong together.
- Selective retrieval: Load only recent or relevant memories.
- Clean separation: Memory doesn't interfere with live conversation flow.
Here's the implementation:
class MemoryStore:
    def __init__(self, file_path: str, max_entries: int = 50):
        self.file_path = file_path
        self.max_entries = max_entries
        self._ensure_file()

    def _ensure_file(self):
        if not os.path.exists(self.file_path):
            with open(self.file_path, "w") as f:
                json.dump([], f)

    def load_all(self) -> List[dict]:
        try:
            with open(self.file_path, "r") as f:
                return json.load(f)
        except Exception:
            return []

    def append(self, entry: dict):
        data = self.load_all()
        data.append(entry)
        with open(self.file_path, "w") as f:
            json.dump(data, f, indent=2)

    def get_recent(self, limit: Optional[int] = None) -> List[dict]:
        data = self.load_all()
        limit = limit or self.max_entries
        return data[-limit:]

    def delete_all(self):
        with open(self.file_path, "w") as f:
            json.dump([], f)

Key design decisions:
- JSON file format: Simple, human-readable, and Colab-friendly. For production, you'd use a proper database or vector store.
- Append-only writes: Each conversation turn is appended, creating a complete audit trail.
- Lazy loading: We only load from disk when needed, keeping memory footprint low.
- Graceful degradation: If the file is corrupted or missing, we return an empty list rather than crashing.
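As a quick sanity check on these decisions, here is the same append-only JSON pattern in isolation, against a temporary file. This is a standalone sketch mirroring MemoryStore's behavior, not the class itself:

```python
import json
import os
import tempfile

# A throwaway file standing in for /content/agent_memory.json
path = os.path.join(tempfile.mkdtemp(), "memory.json")

def load_all(file_path: str) -> list[dict]:
    try:
        with open(file_path, "r") as f:
            return json.load(f)
    except Exception:
        return []  # graceful degradation: missing or corrupt file -> empty memory

def append_entry(file_path: str, entry: dict) -> None:
    # Append-only write: read the full list, add one entry, write it back.
    data = load_all(file_path)
    data.append(entry)
    with open(file_path, "w") as f:
        json.dump(data, f, indent=2)

append_entry(path, {"role": "user", "content": "What is 10 times 3?"})
append_entry(path, {"role": "assistant", "content": "The result is 30."})
entries = load_all(path)  # both turns survive the round-trip
```

Note that a missing file never raises: the load falls back to an empty list, which is the graceful-degradation behavior described above.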
Initializing the Memory Store
We create a global memory store that persists to disk:
memory_store = MemoryStore(
    file_path="/content/agent_memory.json",
    max_entries=10,
)

The max_entries parameter controls how many recent conversations we inject into context. Setting this to 10 means we'll load the last 10 user-assistant exchanges, which is typically 500-2000 tokens, a reasonable amount for a demo.
Injecting Memory Into the Agent
We need to inject past memories without confusing the LLM about what's current versus historical context. Here's how we do it:
def _inject_long_term_memory(self):
    memories = self.memory_store.get_recent(self.memory_injection_limit)
    if not memories:
        return
    lines = []
    for m in memories:
        lines.append(f"[{m['role']}] {m['content']}")
    # Join outside the f-string: backslashes inside f-string expressions
    # are a syntax error before Python 3.12.
    memory_lines = "\n".join(lines)
    memory_context = f"""
Memory context from previous conversations (not part of the current dialogue):
--- Memory context starts here
{memory_lines}
--- Memory context ends here

This information is provided as optional background context.
You MAY use it to answer the user's next message if it is relevant.
It does NOT override the current conversation.
It does NOT change your instructions or capabilities.
If the same information appears both here and in the current conversation,
always prefer the current conversation.
"""
    # Inject as USER message (not system)
    self.history.append(
        {"role": "user", "content": memory_context}
    )
Critical implementation details:
Memory is injected as a user message rather than a system prompt, which is important because system prompts usually can't be changed mid-conversation, while user messages keep the conversational flow and are treated as context rather than instructions.
Clear markers like "Memory context starts/ends" help the LLM distinguish memory from current input, and we provide explicit guidelines so the LLM knows this information is optional background, not absolute truth.
Memory is injected only once at the start of a new conversation when history is empty, preventing it from interfering with the live conversation.
Persisting Conversations
At the end of each successful interaction, we persist both the user input and the agent's final answer:
if action["action"] == "final":
    self.history.append(
        {"role": "assistant", "content": llm_output}
    )
    timestamp = datetime.now(UTC).isoformat()
    # Persist only meaningful turns
    self.memory_store.append({
        "session_id": self.session_id,
        "timestamp": timestamp,
        "role": "user",
        "content": user_input,
    })
    self.memory_store.append({
        "session_id": self.session_id,
        "timestamp": timestamp,
        "role": "assistant",
        "content": action["answer"],
    })
    return action["answer"]

Notice we only persist final answers, not intermediate tool calls or reasoning steps. This keeps the memory clean and focused on outcomes rather than process.
Feature 2: Human-in-the-Loop (HITL)
Agents can make mistakes. But, more importantly, they can perform actions you didn't intend them to perform. HITL is a design pattern that creates a checkpoint before critical operations, giving you control over high-stakes decisions.
Although we're using HITL here for approval before critical operations, it applies to anything that requires human intervention: content moderation, complex decision-making, or edge cases where automated systems might fail.
When to Require Human Approval
Not every action needs approval; requiring it everywhere would make the agent unusable. Good HITL design requires approval for:
- Irreversible actions: Deleting data, sending emails, making purchases
- High-cost operations: Running expensive API calls, deploying code
- Sensitive data access: Reading private files, accessing credentials
- External communications: Posting to social media, contacting people
For our agent, we'll focus on a particularly dangerous operation:
➡ Deleting all memory.
Extending the Action Space
First, we add a new action type to our Pydantic models:
class HumanApproval(BaseModel):
    action: Literal["human"]
    reason: str

LLMResponse = Union[ToolCall, FinalAnswer, HumanApproval]

Now the LLM has three possible actions: call a tool, request human approval, or provide a final answer.
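At runtime, dispatching on the parsed output is a simple branch on the action field. Here's a minimal stdlib-only sketch; the real agent validates against the Pydantic models above, and these hand-rolled checks are a simplified stand-in:

```python
import json

def classify_action(llm_output: str) -> str:
    """Parse the LLM's JSON output and check it against the three action shapes."""
    action = json.loads(llm_output)
    kind = action.get("action")
    if kind == "tool" and "tool_name" in action and "args" in action:
        return "tool"
    if kind == "human" and "reason" in action:
        return "human"
    if kind == "final" and "answer" in action:
        return "final"
    raise ValueError(f"Malformed action: {action!r}")

kind = classify_action(
    '{"action": "human", "reason": "Deleting all memory is irreversible."}'
)
```

A malformed payload (e.g., a "tool" action with no args) raises immediately, which is exactly the fail-fast behavior you want before executing anything.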
Updating the System Prompt
We need to teach the LLM when and how to request approval. Here's the key addition to our system prompt:
HUMAN-IN-THE-LOOP (MANDATORY):
- You have a special action called "human".
- You MUST choose the "human" action BEFORE performing any irreversible,
destructive, or sensitive operation.
- Examples include (but are not limited to): deleting memory, resetting state,
or permanently altering stored data.
- When using the "human" action, you MUST clearly explain the reason approval
is required.
- After asking for human approval, you have two options depending on the response:
1. If approval is given: You MUST continue the task by selecting the
appropriate next action (usually a tool call).
2. If approval is denied: You MUST inform the user that the original action
won't be performed because approval was not given.
- Do not repeat this action consecutively. You must always follow a "human"
  action by a "tool" action.

These rules create a clear protocol: request approval first, then act on the human's decision.
Implementing the Approval Check
In our agent's run loop, we handle the human approval action:
if action["action"] == "human":
    observer.log("human_approval_requested", {
        "reason": action["reason"]
    })
    self.history.append(
        {"role": "assistant", "content": action["reason"]}
    )
    approved = self._human_approval(action["reason"])
    observer.log("human_approval_result", {
        "approved": approved
    })
    if not approved:
        self.history.append({
            "role": "user",
            "content": "Human approval was demanded and it is not given. "
                       "You cannot perform the action that required the approval."
        })
    else:
        self.history.append({
            "role": "user",
            "content": "Human approval was demanded and it is given. "
                       "You can now proceed with the action that required the approval."
        })
    continue

The _human_approval() method is straightforward:
def _human_approval(self, reason: str) -> bool:
    print(f"Approval required: {reason}")
    choice = input("Approve? (y/n): ").strip().lower()
    return choice == "y"

In a production system, you'd replace this with a more sophisticated approval mechanism: a web interface, Slack notification, or approval queue system.
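One way to make that swap painless is to treat the approval mechanism as an injected callback with a deny-by-default timeout. This is a hedged sketch of that idea, not code from the agent; the handler lambdas stand in for a console prompt, Slack webhook, or approval queue:

```python
import queue
import threading
import time

def wait_for_approval(request_approval, timeout_sec: float = 30.0) -> bool:
    """Ask an external handler for approval; deny by default if no answer arrives in time."""
    answers: queue.Queue = queue.Queue()
    # Run the handler in its own thread so a slow human (or webhook) can't hang the agent.
    worker = threading.Thread(target=lambda: answers.put(request_approval()), daemon=True)
    worker.start()
    try:
        return bool(answers.get(timeout=timeout_sec))
    except queue.Empty:
        return False  # fail closed: no response means no approval

# Simulated handlers:
approved = wait_for_approval(lambda: True, timeout_sec=1.0)
denied = wait_for_approval(lambda: False, timeout_sec=1.0)
timed_out = wait_for_approval(lambda: (time.sleep(0.5), True)[1], timeout_sec=0.05)
```

Failing closed on timeout is the key design choice: for irreversible actions, silence should never count as consent.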
Creating the Delete Memory Tool with a Factory Pattern
Now here's where things get interesting. We need a tool that can delete memory, but tools need access to agent state (the memory_store object), while remaining stateless from the model's perspective.
This is a perfect use case for the factory pattern:
def make_delete_all_memory_tool(memory_store: MemoryStore):
    def delete_all_memory(confirm: str):
        if confirm.lower() != "true":
            raise ValueError(
                "delete_all_memory called without explicit confirmation"
            )
        memory_store.delete_all()
        return "All long-term memory has been permanently deleted."
    return delete_all_memory

delete_all_memory_fn = make_delete_all_memory_tool(memory_store)

So, why use a factory here?
➡ A quick note to explain: Tools must be stateless from the LLM's perspective, meaning they're just function signatures. But they often need access to application state (databases, API clients, configuration). The factory pattern solves this seamlessly:
- The outer function (make_delete_all_memory_tool) captures the memory_store in a closure
- The inner function (delete_all_memory) is the actual tool, with a clean signature (just the confirm argument)
- The LLM only sees the inner function's arguments, maintaining abstraction
- We can create multiple versions of the tool with different state (e.g., dev vs. prod memory stores)
This pattern is essential whenever your tools need to access resources beyond their direct parameters. You'll see it in production systems for database connections, API clients, file systems, and more.
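The same trick works for any stateful dependency. Here's a deliberately tiny, hypothetical example (not a tool from this agent) showing one factory producing two tool instances bound to different backends:

```python
# Hypothetical illustration of the factory pattern: one factory, two bound tools.
def make_counter_tool(store: dict):
    def increment(key: str) -> int:
        """The tool the LLM would see: one string argument, no visible state."""
        store[key] = store.get(key, 0) + 1
        return store[key]
    return increment

dev_store, prod_store = {}, {}
dev_increment = make_counter_tool(dev_store)    # bound to the dev backend
prod_increment = make_counter_tool(prod_store)  # bound to the prod backend

dev_increment("runs")
dev_increment("runs")
prod_increment("runs")
```

Each closure carries its own store, so the two tools share code but never share state, which is precisely what you want when pointing the same agent at dev versus prod resources.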
We register the tool with an explicit confirmation parameter:
class DeleteAllMemoryArgs(BaseModel):
    confirm: Literal["true"] = Field(
        description="Must be 'true' to confirm permanent deletion of all memory."
    )

registry.register(
    Tool(
        name="delete_all_memory",
        description="Permanently delete all long-term memory. This action is irreversible.",
        input_schema=DeleteAllMemoryArgs,
        output_schema={"result": "string"},
        func=delete_all_memory_fn,
    )
)

The Literal["true"] constraint forces the LLM to explicitly pass confirm="true", making accidental deletion nearly impossible.
The Complete HITL Flow
Here's what happens when a user asks to delete all memory:

This flow ensures dangerous operations are never automated without oversight.
Feature 3: Advanced Observability
"If you can't observe it, you can't debug it."
This is the mantra of production systems.
Our Part 1 agent was a black box: when something went wrong, we had no visibility into what happened.
What Observability Means for AI Agents
Traditional software has stack traces, logs, and debuggers. AI agents need something similar but more adapted to their multi-step probabilistic behavior.
We need to track:
- What decisions were made (which action, which tool)
- When they happened (timestamps, durations)
- Why they were made (the LLM's reasoning)
- What the outcomes were (tool results, errors)
This creates an audit trail that lets you answer questions like:
- "Why did the agent call this tool?"
- "How long did each step take?"
- "Where did the agent get stuck?"
- "What caused this error?"
Building the Observer System
We create an AgentObserver class that handles all logging:
import uuid
import time
from pathlib import Path

class AgentObserver:
    def __init__(self, log_dir="/content/logs"):
        self.trace_id = str(uuid.uuid4())
        self.events = []
        Path(log_dir).mkdir(exist_ok=True)
        self.file_path = Path(log_dir) / f"trace_{self.trace_id}.jsonl"

    def log(self, event_type, data=None):
        entry = {
            "trace_id": self.trace_id,
            "timestamp": time.time(),
            "event": event_type,
            "data": data or {}
        }
        self.events.append(entry)
        with open(self.file_path, "a") as f:
            f.write(json.dumps(entry) + "\n")

    def span(self, name):
        return Span(self, name)

Event Logging Decisions:
1. How do we track each agent run? We assign a unique trace ID (UUID) to every run. This makes it easy to correlate logs and see exactly what happened during a specific session.
2. How should logs be formatted for easy parsing? We use JSONL, where each line is a complete JSON object. This format keeps parsing simple, even for massive logs.
3. How do we balance speed and persistence? Events are stored in memory for fast access and also written to disk to ensure nothing is lost.
4. How do we keep logs consistent? Every log entry follows a structured schema: it always includes an event type, a timestamp, and any additional arbitrary data. This makes analysis and debugging straightforward.
The Span Context Manager
Spans are critical for understanding timing and performance:
class Span:
    def __init__(self, observer, name):
        self.observer = observer
        self.name = name

    def __enter__(self):
        self.start = time.time()
        self.observer.log("span_start", {"name": self.name})

    def __exit__(self, exc_type, exc, tb):
        duration = time.time() - self.start
        self.observer.log("span_end", {
            "name": self.name,
            "duration_sec": round(duration, 3)
        })

Spans use Python's context manager protocol (the with statement) to automatically measure execution time. Usage is beautifully simple:
with observer.span("llm_call"):
    llm_output = self.llm.generate(self.history)

This logs both when the LLM call started and when it finished, along with the total duration.
Integrating Observability Into the Agent
We create an observer at the start of each run and log every significant event:
def run(self, user_input: str):
    observer = AgentObserver()
    observer.log("run_start", {
        "session_id": self.session_id
    })
    # ... inject memory ...
    observer.log("user_message", {
        "text": user_input
    })
    for step in range(self.max_steps):
        with observer.span("llm_call"):
            llm_output = self.llm.generate(self.history)
        action = json.loads(llm_output)
        observer.log("llm_decision", {
            "step": step,
            "action": action["action"]
        })
        if action["action"] == "tool":
            observer.log("tool_call_requested", {
                "tool_name": action["tool_name"],
                "args": action["args"]
            })
            # Execute tool...
            observer.log("tool_call_result", {
                "tool_name": tool.name,
                "tool_response": result,
            })

Example Log Output
Here's what a real trace file looks like (formatted for readability):
{"trace_id": "a1b2c3d4", "timestamp": 1705701234.123, "event": "run_start", "data": {"session_id": "xyz789"}}
{"trace_id": "a1b2c3d4", "timestamp": 1705701234.125, "event": "user_message", "data": {"text": "What is 5 plus 3?"}}
{"trace_id": "a1b2c3d4", "timestamp": 1705701234.126, "event": "span_start", "data": {"name": "llm_call"}}
{"trace_id": "a1b2c3d4", "timestamp": 1705701234.891, "event": "span_end", "data": {"name": "llm_call", "duration_sec": 0.765}}
{"trace_id": "a1b2c3d4", "timestamp": 1705701234.892, "event": "llm_decision", "data": {"step": 0, "action": "tool"}}
{"trace_id": "a1b2c3d4", "timestamp": 1705701234.893, "event": "tool_call_requested", "data": {"tool_name": "add", "args": {"a": 5, "b": 3}}}
{"trace_id": "a1b2c3d4", "timestamp": 1705701234.894, "event": "span_start", "data": {"name": "tool:add"}}
{"trace_id": "a1b2c3d4", "timestamp": 1705701234.895, "event": "span_end", "data": {"name": "tool:add", "duration_sec": 0.001}}
{"trace_id": "a1b2c3d4", "timestamp": 1705701234.896, "event": "tool_call_result", "data": {"tool_name": "add", "success": true, "attempt": 1}}
{"trace_id": "a1b2c3d4", "timestamp": 1705701234.897, "event": "span_start", "data": {"name": "llm_call"}}
{"trace_id": "a1b2c3d4", "timestamp": 1705701235.634, "event": "span_end", "data": {"name": "llm_call", "duration_sec": 0.737}}
{"trace_id": "a1b2c3d4", "timestamp": 1705701235.635, "event": "final_answer", "data": {"text": "The result is 8."}}
{"trace_id": "a1b2c3d4", "timestamp": 1705701235.636, "event": "run_complete", "data": {"steps_used": 1}}

From this trace, you can see:
- The entire run took about 1.5 seconds
- Two LLM calls were made (0.765s and 0.737s each)
- One tool was called (add), taking 0.001s
- The agent completed in just 1 step
This is the difference between "something went wrong" and "the image generation API timed out after 5.2 seconds on the third retry attempt at step 7."
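Because each line is a self-contained JSON object, trace analysis takes only a few lines of stdlib code. Here's a sketch that recovers span durations from a trace, assuming the event schema shown above (the hard-coded lines stand in for reading the .jsonl file):

```python
import json

# Stand-in for open("trace_....jsonl").readlines()
trace_lines = [
    '{"trace_id": "a1b2c3d4", "timestamp": 1.0, "event": "span_start", "data": {"name": "llm_call"}}',
    '{"trace_id": "a1b2c3d4", "timestamp": 1.7, "event": "span_end", "data": {"name": "llm_call", "duration_sec": 0.7}}',
    '{"trace_id": "a1b2c3d4", "timestamp": 1.8, "event": "span_start", "data": {"name": "tool:add"}}',
    '{"trace_id": "a1b2c3d4", "timestamp": 1.81, "event": "span_end", "data": {"name": "tool:add", "duration_sec": 0.01}}',
]

def span_durations(lines: list[str]) -> dict[str, float]:
    """Collect the duration logged on each span_end event, keyed by span name."""
    durations: dict[str, float] = {}
    for line in lines:
        entry = json.loads(line)
        if entry["event"] == "span_end":
            durations[entry["data"]["name"]] = entry["data"]["duration_sec"]
    return durations

durations = span_durations(trace_lines)
```

From here it's one more step to aggregate per-tool latency across many traces or flag runs where an LLM call dominated the total time.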
Feature 4: Error Recovery
Failures such as API errors, network timeouts, and rate limits are unavoidable. A production agent must handle them correctly.
The Problem with Naive Tool Calling
In Part 1, our tool execution was simple:
tool = self.tool_registry.get(action["tool_name"])
result = tool(**action["args"])

If the tool throws an exception, the entire agent crashes. Game over.
Implementing Safe Tool Calls with Retries
We wrap tool execution in a retry handler:
def _safe_tool_call(self, observer, tool, args, retries=2):
    """
    Calls a tool safely with retry and error logging.
    """
    attempt = 0
    while attempt <= retries:
        try:
            with observer.span(f"tool:{tool.name}"):
                result = tool(**args)
            observer.log("tool_call_result", {
                "tool_name": tool.name,
                "success": True,
                "attempt": attempt + 1
            })
            return result
        except Exception as e:
            attempt += 1
            observer.log("tool_call_error", {
                "tool_name": tool.name,
                "attempt": attempt,
                "error": str(e)
            })
            if attempt > retries:
                # Final failure after all retries
                observer.log("tool_call_failed", {
                    "tool_name": tool.name
                })
                return None

Key features:
- Configurable retries: Default is 2 retries (3 total attempts), but adjustable per tool
- Detailed logging: Every attempt is logged with success/failure status
- Graceful degradation: Returns None on final failure rather than crashing
- Span timing: Each attempt is measured, showing if failures are slow (timeouts) or fast (validation errors)
Using the Safe Tool Call
In the agent's run loop, we replace the naive call:
if action["action"] == "tool":
    tool = self.tool_registry.get(action["tool_name"])
    result = self._safe_tool_call(observer, tool, action["args"])
    # The agent continues even if result is None
    self.history.append({
        "role": "tool",
        "tool_name": tool.name,
        "tool_response": result,
    })

Now the agent continues operating even if a tool fails. The LLM sees the failure (result is None) and can decide how to respond: maybe try a different tool, ask for clarification, or inform the user of the limitation.
When Retries Help, and When They Don't:
Retries are effective for:
- Transient network errors: Temporary connectivity issues
- Rate limiting: Brief API throttling
- Server overload: Temporary unavailability (503 errors)
Retries are NOT effective for:
- Invalid parameters: Will fail every time
- Authentication errors: Need to fix credentials, not retry
- Resource not found: Won't magically appear on retry
- Quota exhausted: Need to wait for quota reset, not immediate retry
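A simple way to encode this distinction is to classify errors before retrying. The sketch below is illustrative only: the two exception classes stand in for real API error codes (timeouts and 503s versus auth failures and 404s):

```python
# Illustrative classification: retry transient failures, fail fast on permanent ones.
class TransientError(Exception):
    """Stand-in for timeouts, 429 rate limits, 503s."""

class PermanentError(Exception):
    """Stand-in for auth failures, bad parameters, 404s."""

def call_with_retries(func, retries: int = 2):
    for attempt in range(retries + 1):
        try:
            return func()
        except TransientError:
            if attempt == retries:
                return None  # exhausted retries, degrade gracefully
        except PermanentError:
            return None  # retrying would fail identically, so stop immediately

# A flaky call that succeeds on the third attempt:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("temporary blip")
    return "ok"

result = call_with_retries(flaky)

# A permanent failure is not retried at all:
perm_calls = {"n": 0}
def always_denied():
    perm_calls["n"] += 1
    raise PermanentError("bad credentials")

denied_result = call_with_retries(always_denied)
```

The payoff is in the call counts: the transient failure gets three attempts, while the permanent one is abandoned after a single try instead of burning the retry budget.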
For production systems, you'd implement exponential backoff and jitter to avoid thundering herd problems:
import time
import random

def retry_with_backoff(func, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait_time)

Putting It All Together
Let's see how all four features work in concert. Here's the complete agent initialization (Pseudo-code with conversation example, full code here):
from google import genai

# Initialize API client
client = genai.Client(api_key=GEMINI_API_KEY)

# Create memory store
memory_store = MemoryStore(
    file_path="/content/agent_memory.json",
    max_entries=10,
)

# Create tool registry with all tools
registry = ToolRegistry()
# ... register tools (add, multiply, delete_all_memory) ...

# Create LLM wrapper
llm = GeminiLLM(client, registry)

# Create agent with all features
agent = Agent(
    llm=llm,
    tool_registry=registry,
    memory_store=memory_store,
    max_steps=5,
    memory_injection_limit=6
)

# Start chatting
chat_with_agent(agent)
Example: A Complete Interaction
Let's walk through a complex scenario that exercises all our new features:
**Conversation 1:**
```
You: What is 10 times 3?
Agent: [Uses multiply tool] The result is 30.
```
*This gets stored in memory.*
**Conversation 2 (new session):**
```
You: Do you remember what we discussed before?
Agent: [Memory injected] Yes, in our previous conversation, you asked me to
multiply 10 by 3, and the result was 30.
You: Great! Now delete all our conversation history.
Agent: [HITL triggered] The user is requesting to permanently delete all stored
conversation history. This action cannot be undone. Approval is required
before proceeding.
Approve? (y/n): y
Agent: [Calls delete_all_memory tool] All long-term memory has been permanently
deleted.
```
Behind the scenes, the trace log shows:
- Memory was injected at conversation start
- LLM requested human approval before deletion
- Delete tool was called successfully after approval
- Entire interaction took 2.3 seconds across 4 steps
The Complete System Architecture

We now have a production-grade agent with all four features working together.
Each component has a clear responsibility:
- Agent: Orchestrates the conversation flow
- Memory: Provides context from past interactions
- HITL: Gates dangerous operations
- Observer: Tracks everything for debugging
- Error Recovery: Keeps the system resilient
Performance Considerations
- Memory injection: Adds 50–200 tokens to each conversation start. With proper limits, this is negligible.
- HITL checks: Zero overhead unless approval is actually requested. When triggered, adds human wait time (unpredictable).
- Observability: Minimal overhead. File I/O happens in the background. Typical overhead is <10ms per run.
- Error recovery: Only adds overhead on failures. Successful tool calls have zero retry cost.
What Could Be Next
We've built a robust foundation, but there's always more to explore:
- Advanced Memory Systems: Move beyond recency to semantic relevance using embeddings. Implement hierarchical memory (working memory, short-term, long-term). Add memory querying as an explicit tool.
- Sophisticated HITL: Build approval queues for asynchronous review. Implement role-based permissions. Create approval rules engines.
- Production Observability: Integrate with real monitoring systems. Build real-time dashboards. Implement distributed tracing across multiple agents.
- Intelligent Error Handling: Add circuit breakers and fallback strategies. Implement predictive failure detection. Build self-healing capabilities.
- Multi-Agent Systems: Coordinate multiple specialized agents. Implement agent-to-agent communication. Build supervisor agents that manage worker agents.
Conclusion
Building AI agents from scratch teaches you what frameworks abstract away. You learn why certain design patterns exist, when to use them, and how to adapt them to your specific needs.
The agent we've built across these two posts isn't just a toy demo; it's a legitimate foundation for production systems. Many commercial AI agents use variations of these exact patterns, with additional hardening layers:
- Modular tool systems
- Provider-agnostic LLM integration
- Type-safe structured outputs
- Persistent memory
- Human oversight gates
- Comprehensive observability
- Graceful error handling
Whether you're building a customer support agent, a coding assistant, a research tool, or something entirely new, these patterns will serve you well.
If you're building with LLMs, the Master LLMs series is here to guide you from fundamentals to production-ready systems. Read the full series to go deeper into what makes these models tick and how to wield them effectively.

Leave a comment and follow me for more insights on AI, ML, and coding. You can also check out my work and socials: Website | YouTube | GitHub | LinkedIn | X
🚀 I'm launching a curated weekly AI newsletter, and you're invited to be among the first. 👉No hype. No noise. Just essential news, tools, papers, and insights handpicked for engineers and thinkers who build with AI.
Be part of the founding circle → Join free now