Docker Diffing for Terminal MCP Honeypot
Sooo, it was quite interesting to see that there is a docker diff command in the docker-py SDK. This immediately piqued my curiosity around the possible use-cases of the feature. One thought that immediately came to me was honeypots.
Honeypots
Honeypots - and honeytokens / canary tokens - are quite possibly the greatest-ROI tools in the defensive cybersecurity realm. Their main goal is to capture attacker tradecraft and provide signals around trends in global security, like the data collected at scale by companies such as GreyNoise. In comparison, honeytokens / canary tokens (article from securityengineering.dev) are used as an early detection mechanism, alerting defenders as soon as someone touches them.
Honeypots are intentionally left vulnerable so that intruders might interact with the systems. As the main goal is capturing tradecraft, one of the essential features for a honeypot is high dwell time. There are various popular honeypots available in the open-source space, like:
Dionaea - Dionaea is a honeypot intended to capture malware exploiting common vulnerabilities in services exposed to the network. It is also the first honeypot that I ever used, back when I was trying to capture malware samples in 2017 for my bachelor's final-year project at Islington. I was able to capture lots of worms, good times.
Kippo - Kippo is a medium-interaction SSH honeypot that logs brute-force attempts and the entire shell interaction performed by the attacker.
Well, you get the idea of honeypots now, I am sure. Let's move on to the LLM/agent part of the honeypot.
LLMs
You must have been in a coma for the past 4 years if you haven't heard of LLMs or AI at least once in your day-to-day, and you don't even have to be a tech worker for it. LLMs are auto-regressive neural networks that predict the next token based on the previous sequence. This is of course an oversimplification on my part, and it can be duly argued now that LLMs are more than probabilistic models, but that is a discussion for another day.
LLMs, when paired with tools and given autonomy to perform actions over n loops, are agents. Agents haven't been around as long as language models, but they have experienced a meteoric rise that is probably not comparable to anything in history - partly because of all the sci-fi movies/novels/content that most people of this generation were raised on, and partly because of moneyyyyyy. Stonks. Obligatory AI is a bubble.
Tools and MCP
Tools can be anything. It could be a wrapper around a scraper (my beloved) or a browser or a calculator or a markdown file and, of course, our own terminals. The secret sauce for integrating tools with LLMs is the harness. Yes, harnesses. A harness is sort of a tunnel that allows LLMs (which mostly only generate text) to use external environments. This is analogous to how terminals give us the ability to communicate with the computer in a predefined way (as our old and beloved ttys do). This predefined part of the communication is how MCP was born.
Here is a really easy way to define tools using the openai-agents SDK:
uv pip install openai-agents
import json

import requests

from agents import Agent, function_tool


@function_tool
async def get_webpage(url: str) -> str:
    """Fetch webpage content.

    Args:
        url: The url to fetch

    Returns:
        str: The fetched string content of the HTML.
    """
    response = requests.get(url)
    return response.content.decode("utf-8")
And you can use the tool with agents like:
agent = Agent(
    name="Assistant",
    tools=[get_webpage],
)
The function_tool decorator could be thought of as a harness that allows the agent/LLM to invoke the get_webpage function. In the backend, the agents SDK converts the function definition to a tool schema that is then passed on to the LLM.
You can print the function-args JSON schema with:
print(agent.tools[0].params_json_schema)
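For get_webpage, that prints something roughly like this (pretty-printed here; the exact titles and extra keys such as additionalProperties vary with SDK version and strict-mode settings):

{
    "properties": {
        "url": {"title": "Url", "type": "string"}
    },
    "required": ["url"],
    "title": "get_webpage_args",
    "type": "object",
    "additionalProperties": false
}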
There is a lot that happens between how the LLM knows how to call the tool - which requires training on the LLM's side - and how the schema gets templated into the prompt, but I am not gonna go that deep in this blog. If you bother me on my socials enough, I might write about it, but it's a lot and I am kinda lazy at times.
For our honeypot, this minimal run() function is sufficient:
async def run():
    await terminal_mcp_server.connect()

    agent = Agent(
        name="Terminal Assistant",
        instructions="You are a helpful assistant that can use tools to perform tasks.",
        model=model,
        mcp_servers=[
            terminal_mcp_server,
        ],
        model_settings=model_settings,
        tool_use_behavior="stop_on_first_tool",
    )

    result = await Runner.run(
        agent,
        "Create an interesting file on the server.",
    )

    await terminal_mcp_server.cleanup()
    print("final output is: \n", result.final_output)
MCP
Now let’s get on to MCP.
Created by Anthropic, MCP (Model Context Protocol) is a ✨ protocol ✨. Another oversimplification of a protocol: we both decide how we are going to talk to each other. For example, TCP/IP is like: I will talk, I will let you know how long I am going to talk and whether I have finished talking, and I will make sure my words weren't twisted and are understandable, but you must nod so that I know you heard me. UDP is kinda like blah blah blah blah blah until they decide to stop talking, and they don't care if you nod or if you understood. lol. This blog is kinda the UDP kind. Welcome to my Ted Talk!
MCP rose out of a need for a standardized way to provide LLMs with environments. Note that I am using the word environment because the moment we start talking about models as agents, we have to talk about environments: they are the basis of Reinforcement Learning, quite literally the founding idea that all agents are built on and the paradigm they are trained with. Alongside environments comes the concept of actions which, in typical oversimplification, are function calls in this case (or even individual tokens in the case of GRPO and friends, or sequences in the case of GSPO).
One of the major ideas behind MCP is the context. Note that in the function tool above there is no continuity; it occurs as a one-time event. MCP handles this by providing an execution context for the agents that connect to the server. Again an oversimplification, but in this context, I hope it's justified.
Creating an MCP server is akin to creating any other HTTP server with FastAPI or good old Flask, with added machinery for managing context. I am sure that there is a lot that goes on inside, but I haven't gone that deep into it and I am not going to pretend that I have.
Let’s create a simple MCP server, and I am going to use my own terminal harness py-agterm for it.
Install the MCP python sdk:
uv pip install mcp
First come the imports; they should be self-explanatory.
from collections.abc import AsyncIterator
from mcp.server.fastmcp import FastMCP, Context
from mcp.server.session import ServerSession
from dataclasses import dataclass
from contextlib import asynccontextmanager
from py_agterm import AGTerm
Then we define some dataclasses. I wasn't familiar with dataclasses until recently. They are just containers for data.
@dataclass
class AGTermConfig:
    command: str = "/bin/bash"
    max_history_bytes: int = 5 * 1024 * 1024
    ready_markers: list[str] | None = None
@dataclass
class TerminalContext:
    """Terminal Context for AGTerm"""

    agterm: "AGTermSession"  # quoted forward reference: AGTermSession is defined below
Next we are going to define our server session. This will help identify individual sessions when multiple clients connect to the server.
class AGTermSession:
    """Create a session with an AGTerm instance."""

    def __init__(self, agterm: AGTerm):
        self.agterm = agterm

    @staticmethod
    async def connect() -> "AGTermSession":
        """Connect to an AGTerm instance via MCP."""
        agterm = AGTerm()
        if agterm.is_alive():
            agterm.send_and_read_until_ready("")  # Initial read to get to the prompt
        return AGTermSession(agterm=agterm)

    async def disconnect(self) -> None:
        """Disconnect from the AGTerm instance."""
        self.agterm.close()

    def execute_command(self, command: str, timeout_ms: int = 10000) -> str:
        """Execute a command in the AGTerm instance and return the output."""
        return self.agterm.send_and_read_until_ready(command, timeout_ms=timeout_ms)
Note that the static connect() method acts as a factory method to initiate the session, rather than creating AGTermSession objects directly. The execute_command() method will be used to execute any commands using a raw tty fd underneath.
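As a quick sanity check, the session can be exercised on its own, outside MCP entirely; a minimal sketch:

import asyncio

async def demo():
    session = await AGTermSession.connect()
    print(session.execute_command("uname -a"))  # run a command in the pty
    await session.disconnect()

asyncio.run(demo())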
The code block below helps us manage the context and the session. A session is established and the context for the session is yielded. The context manager helps to ✨ manage ✨ the context, and disconnects when the session is terminated or halted.
@asynccontextmanager
async def agterm_session_context(server: FastMCP) -> AsyncIterator[TerminalContext]:
    """Async context manager for AGTerm session."""
    session = await AGTermSession.connect()
    try:
        yield TerminalContext(agterm=session)
    finally:
        await session.disconnect()
We define the core MCP server as we do with our Flasks and FastAPIs. I am pretty positive that it uses uvicorn under the hood.
mcp = FastMCP("agterm", lifespan=agterm_session_context, host="0.0.0.0", port=5000)
And of course our tool decorator:
@mcp.tool()
def agterm_tool(
    ctx: Context[ServerSession, TerminalContext], command: str, timeout_ms: int = 10000
) -> str:
    """Description: Tool to execute commands in AGTerm.

    Provide 'command' parameter to execute the command.
    Optionally provide 'timeout_ms' parameter for command timeout (ms).
    """
    if not command:
        return "No command provided to agterm_tool."
    output = ctx.request_context.lifespan_context.agterm.execute_command(
        command, timeout_ms=timeout_ms
    )
    return output
Because, ultimately, MCP is just a nice way to manage tools.
And once you do mcp.run(transport="streamable-http"), it runs. Voila!
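So the bottom of the server file is just the entry point:

if __name__ == "__main__":
    # Serves on 0.0.0.0:5000, as configured in the FastMCP constructor above.
    mcp.run(transport="streamable-http")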
To connect to the server, we can define it in agent.py as:
from agents.mcp import MCPServerStreamableHttp

terminal_mcp_server = MCPServerStreamableHttp(
    params={
        "url": "http://localhost:5000/mcp",
        "timeout": 10,
        "use_structured_content": True,
    },
)
And then pass it to the agent above. Sorry for the mixup!
Docker Diffs
Now we come to the main reason why I decided to pursue this little project. Diffing in the computer world is just the set of changes in files/objects (but actually files, since everything is a file in Linux).
We can diff the changes simply by:
changed_files_and_folders = container.diff()
This will give us a list of files and folders that have changed compared to the image that the container was created from. I am not going to give a talk about docker here.
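For completeness, here is how you'd grab the container handle with docker-py; the container name is hypothetical:

import docker

client = docker.from_env()
container = client.containers.get("honeypot")  # hypothetical container name
changed_files_and_folders = container.diff()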
The changed files and folders have the following schema:
{
    "Path": "/some/random/path",
    "Kind": 1  # 0 = modified, 1 = added, 2 = deleted
}
Once we filter for the newly added paths using:

changed_files = [change["Path"] for change in changed_files_and_folders if change["Kind"] == 1]
then we can copy them over to our host filesystem by utilizing the get_archive method on containers from the docker SDK.
# Lives on the honeypot class; needs `import docker, tarfile, tempfile`
# and `from pathlib import Path` at module level.
@staticmethod
def copy_file(
    container: docker.models.containers.Container,
    source_path: str,
    destination_path: str,
) -> None:
    """
    Copy a single file from container to host.
    """
    dest_path = Path(destination_path)
    dest_path.mkdir(parents=True, exist_ok=True)

    try:
        stream, stat = container.get_archive(source_path)
    except docker.errors.NotFound:
        raise FileNotFoundError(f"{source_path} not found in container")

    # Write tar stream to temp file
    with tempfile.NamedTemporaryFile() as tmp:
        for chunk in stream:
            tmp.write(chunk)
        tmp.flush()

        with tarfile.open(tmp.name) as tar:
            members = tar.getmembers()
            if not members:
                raise RuntimeError("Empty archive returned from container")

            # Extract only the file (strip internal path)
            member = members[0]
            member.name = Path(member.name).name  # Prevent nested dirs
            tar.extract(member, path=dest_path)

    dest_file = dest_path / member.name
    if dest_file.is_file():
        print(f"Successfully wrote file {dest_file.resolve()}")
    else:
        raise RuntimeError("Unable to write file")
for file in changed_files:
    self.copy_file(self.container, file, self.honeypot_save_path)
The copy_file method is just a normal function/method to copy the files from the container to the host. Since Docker tars files when copying an archive, we have to extract the file from the tar. Everything else is just sugar around it.
So, yeah, that is actually about it. I have uploaded the repository on GitHub here: ashim-mahara/mcp-terminal-honeypot, licensed under Creative Commons 1.0 Universal. I might upload screenshots proving it works on my machine, but I feel like the blog is done.
This post is dedicated to Andrew Morris (whose reply actually inspired me to write this blog; I was kinda leaning towards just releasing the GitHub repo) and GreyNoise. I know that they are well-celebrated, but it is essential that we appreciate people frequently when presented with the chance to do so.
Thanks for coming to my Ted Talk.
Ta Ta!