Connect a Large Language Model to a robot using the Model Context Protocol (MCP) and phosphobot.
This guide provides the essential code and instructions to get started with MCP for robotics. With phosphobot and MCP, you can connect an LLM such as Claude to a robot, giving it access to camera feeds and the ability to trigger actions through a standardized interface.
The Model Context Protocol (MCP) is an open standard that connects Large Language Models to real-world tools and data sources.
Think of it as a USB-C port for AI: a universal translator between an AI and any application.
MCP allows an LLM to “plug into” different systems, giving it the power to see, reason, and, most importantly, act. For MCP robotics, this means giving an AI the hands and eyes to interact with the physical world.
Before MCP, connecting an AI to a robot required custom, complex integrations for each specific model and robot. MCP robotics changes this by creating a universal standard.
With MCP for robots, developers can build powerful applications where an AI can perceive its environment and execute physical tasks.
The phosphobot MCP integration is a practical example of this protocol in action. The basic demo exposes two primary capabilities to the LLM:

- `get_camera_frame`: captures a JPEG image from the phosphobot camera.
- `pickup_object`: triggers a pre-recorded robot action.
The phosphobot MCP server manages these tools and the communication with the robot’s local API.
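Conceptually, the server is thin: each tool is a function that calls one phosphobot REST endpoint and returns the result to the model. Here is a dependency-free sketch of that shape; the endpoint paths (`/frames`, `/recording/play`) come from the API section below, while the registration step mentioned in the comment assumes the official MCP Python SDK and is not shown.

```python
import urllib.request

PHOSPHOBOT_URL = "http://localhost:80"  # phosphobot's default local API

def endpoint(path: str) -> str:
    """Build a full URL for a phosphobot REST endpoint."""
    return f"{PHOSPHOBOT_URL}{path}"

# In the real server, each function below would be registered as an MCP tool
# (e.g. with the official Python SDK's @mcp.tool() decorator), which is what
# makes it visible to Claude.

def get_camera_frame() -> bytes:
    """Fetch the latest JPEG image from the robot's camera (GET /frames)."""
    with urllib.request.urlopen(endpoint("/frames")) as resp:
        return resp.read()

def pickup_object() -> str:
    """Replay the pre-recorded pickup action (POST /recording/play)."""
    req = urllib.request.Request(endpoint("/recording/play"), method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

The LLM never talks HTTP itself: it only sees the two tool names and their docstrings, and the server translates tool calls into local requests.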
Follow these steps to set up your MCP robotics environment.
Make sure the required tools, including git, are installed on your system. phosphobot is an open-source platform that allows you to control robots, record data, and train robotics AI models.
First, install phosphobot with the command for your OS:
Next, run the phosphobot server, which will listen for commands from the MCP server.
This server exposes the robot’s controls to Claude. We recommend installing it with uv.
This command starts the phosphobot MCP server and registers its tools with Claude. When you open the Claude desktop app, you will see the server and its tools available for use.
The phosphobot MCP server communicates with the local phosphobot instance via its REST API (defaulting to http://localhost:80). It uses two endpoints:
- `GET /frames`: fetches the latest camera image.
- `POST /recording/play`: executes a pre-recorded robot action.

The PhosphoClient class manages this communication. If you run phosphobot on a port other than 80, you must update the base URL in the tools/phosphobot.py file.
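As a rough sketch of what such a client can look like, here is a minimal version using only the Python standard library. The class name matches the article's PhosphoClient, but the method names and response handling are assumptions, not phosphobot's actual implementation.

```python
import urllib.request

class PhosphoClient:
    """Thin wrapper around the local phosphobot REST API (sketch)."""

    def __init__(self, base_url: str = "http://localhost:80"):
        # If phosphobot runs on a non-default port, pass it here,
        # e.g. PhosphoClient("http://localhost:8020").
        self.base_url = base_url.rstrip("/")

    def url(self, path: str) -> str:
        """Join the base URL with an endpoint path."""
        return f"{self.base_url}{path}"

    def get_camera_frame(self) -> bytes:
        """GET /frames -> raw JPEG bytes of the latest camera image."""
        with urllib.request.urlopen(self.url("/frames")) as resp:
            return resp.read()

    def play_recording(self) -> int:
        """POST /recording/play -> HTTP status of the replayed action."""
        req = urllib.request.Request(self.url("/recording/play"), method="POST")
        with urllib.request.urlopen(req) as resp:
            return resp.status
```

Centralizing the base URL in one constructor argument is what makes the port change a one-line edit.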
You can test the server with the MCP inspector by running:
Ask Claude a question that requires vision:
“What do you see on my desk?”
Claude will use the get_camera_frame
tool to answer.
Give Claude a command:
“Pick up the banana”
Claude will use the pickup_object
tool to perform the action.
| Tool | Description |
| --- | --- |
| `pickup_object` | Triggers a pre-recorded robotic action. |
| `get_camera_frame` | Captures a JPEG image from the phosphobot camera. |
Q: What is phosphobot?
A: phosphobot
is an open-source platform for robotics that helps you control robots, collect data, and train AI models for robotic tasks.
Q: What is phosphobot MCP?
A: phosphobot MCP
refers to the integration of the phosphobot platform with the Model Context Protocol. It allows an LLM like Claude to control a robot managed by phosphobot by using standardized tools for actions and camera feeds.
Q: Can I use this with a physical robot?
A: Yes. phosphobot
is designed to control physical robots, allowing you to bridge the gap between AI and hardware.
Q: Can it only use pre-recorded actions?
A: No, while the demo uses pre-recorded actions for simplicity, you can extend the phosphobot MCP server
to include real-time control commands or trigger any AI model trained with phosphobot (e.g., ACT, gr00t).
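As an illustration of that kind of extension, the sketch below adds a real-time control tool. The /move/absolute endpoint name and its coordinate payload are assumptions made for illustration only; check phosphobot's API reference for the real route and schema before using them.

```python
import json
import urllib.request

PHOSPHOBOT_URL = "http://localhost:80"  # default phosphobot API

def build_move_request(x: float, y: float, z: float) -> urllib.request.Request:
    """Build a POST request targeting an *assumed* /move/absolute endpoint."""
    payload = json.dumps({"x": x, "y": y, "z": z}).encode()
    return urllib.request.Request(
        f"{PHOSPHOBOT_URL}/move/absolute",  # hypothetical endpoint name
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def move_absolute(x: float, y: float, z: float) -> str:
    """Send the move command; in the MCP server this would be a new tool."""
    with urllib.request.urlopen(build_move_request(x, y, z)) as resp:
        return resp.read().decode()
```

Registering `move_absolute` as an additional MCP tool, alongside the two demo tools, is all it would take to let Claude issue live motion commands instead of only replaying recordings.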