You just trained an AI model and now you want to use it to control your robot.

Here is how to do it with minimum effort.

Setup an inference server

First, you need to setup an inference server. This server runs on a beefy machine that can run the AI model. It can be your own machine, a cloud server, or a dedicated server.

If you choose a remote location, chose the closest location to minimize latency.

To setup the inference server, follow the instructions in the link below:

Setup the inference server

How to setup the inference server?

Call your inference server from a python script

We provide clients for ACT servers and Pi0 servers. You can implement the ActionModel class with your own logic here.

We also provide classes for ACT and pi0 inference server.

Example script for ACT

from phosphobot.camera import AllCameras
from phosphobot.api.client import PhosphoApi
from phosphobot.am import ACT

import time
import numpy as np

# Connect to the phosphobot server
client = PhosphoApi(base_url="http://localhost:80")

# Get a camera frame
allcameras = AllCameras()

# Need to wait for the cameras to initialize
time.sleep(1)

# Instantiate the model
model = ACT()

# Get the frames from the cameras
# We will use this model: LegrandFrederic/Orange-brick-in-black-box
# It requires 3 cameras as you can see in the config.json
# https://huggingface.co/LegrandFrederic/Orange-brick-in-black-box/blob/main/config.json

while True:
    images = [
        allcameras.get_rgb_frame(camera_id=0, resize=(240, 320)),
        allcameras.get_rgb_frame(camera_id=1, resize=(240, 320)),
        allcameras.get_rgb_frame(camera_id=2, resize=(240, 320)),
    ]

    # Get the robot state
    state = client.control.read_joints()

    inputs = {"state": np.array(state.angles_rad), "images": np.array(images)}

    # Go through the model
    actions = model(inputs)

    for action in actions:
        # Send the new joint postion to the robot
        client.control.write_joints(angles=action.tolist())
        # Wait to respect frequency control (30 Hz)
        time.sleep(1 / 30)

Example script for Pi0

from phosphobot.camera import AllCameras
from phosphobot.api.client import PhosphoApi
from phosphobot.am import Pi0

import time
import numpy as np

# Connect to the phosphobot server
client = PhosphoApi(base_url="http://localhost:80")

# Get a camera frame
allcameras = AllCameras()

# Need to wait for the cameras to initialize
time.sleep(1)

# Instantiate the model
model = Pi0(server_url="YOUR_SERVER_URL")

while True:
    # Get the frames from the cameras
    # We will use this model: PLB/pi0-so100-orangelegobrick-wristcam
    # It requires 2 cameras (a context cam and a wrist cam)
    images = [
        allcameras.get_rgb_frame(camera_id=0, resize=(240, 320)),
        allcameras.get_rgb_frame(camera_id=1, resize=(240, 320)),
    ]

    # Get the robot state
    state = client.control.read_joints()

    inputs = {
        "state": np.array(state.angles_rad),
        "images": np.array(images),
        "prompt": "Pick up the orange brick",
    }

    # Go through the model
    actions = model(inputs)

    for action in actions:
        # Send the new joint postion to the robot
        client.control.write_joints(angles=action.tolist())
        # Wait to respect frequency control (30 Hz)
        time.sleep(1 / 30)

For more information, check out the implementation here.