The phospho starter pack makes it easy to train robotics AI models by integrating with LeRobot from Hugging Face.

In this guide, we’ll show you how to train the ACT (Action Chunking Transformer) model using the phospho starter pack and LeRobot by Hugging Face.

What is LeRobot?

LeRobot is a platform designed to make real-world robotics more accessible for everyone. It provides pre-trained models, datasets, and tools in PyTorch.

It focuses on state-of-the-art approaches in imitation learning and reinforcement learning.

With LeRobot, you get access to:

  • Pretrained models for robotics applications
  • Human-collected demonstration datasets
  • Simulated environments to test and refine AI models

Step-by-step guide

In this guide, we will use the phospho starter pack to record a dataset and upload it to Hugging Face.

Prerequisites

  1. You need an assembled SO-100 robot arm and cameras. Get the phospho starter pack here.
  2. Install the phosphobot software
curl -fsSL https://raw.githubusercontent.com/phospho-app/phosphobot/main/install.sh | bash
  3. Connect your cameras to the computer. Start the phosphobot server.
phosphobot run
  4. Complete the quickstart and check that you can control your robot (for a quick programmatic check, see the sketch after this list).
  5. Install the phosphobot teleoperation app on your Meta Quest 2, Pro, 3, or 3S.
  6. Have a device to train your model. We recommend using a GPU for faster training.
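
If you want to sanity-check the setup from code, here is a minimal sketch that reads the robot's joint positions through the phosphobot Python client (the same client used in the inference example at the end of this guide). It assumes the server runs on the default port 80 on the machine executing the script.

from phosphobot.api.client import PhosphoApi

# Connect to the local phosphobot server (default port 80, as in the inference example below)
client = PhosphoApi(base_url="http://localhost:80")

# Read the current joint positions; if this succeeds, the server can talk to the robot
state = client.control.read_joints()
print("Joint angles (rad):", state.angles_rad)

If this call fails, go back to the quickstart before moving on.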

1. Set up your Hugging Face token

To sync datasets, you need a Hugging Face token with write access. Follow these steps to generate one:

  1. Log in to your Hugging Face account. You can create one here for free.

  2. Go to Profile and click Access Tokens in the sidebar.

  3. Select the Write option to grant write access to your account. This is necessary for creating new datasets and uploading files. Name your token and click Create token.

  4. Copy the token and save it in a secure place. You will need it later.

  5. Make sure the phosphobot server is running. Open a browser and go to localhost (or phosphobot.local if you’re using the control module). Then open the Admin Configuration page.

  6. Paste the Hugging Face token and save it.
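
Before relying on the token, you can optionally verify it from Python with the huggingface_hub library (installed alongside LeRobot). This is a minimal sketch; the token string is a placeholder, and remember that the token must have the Write role.

from huggingface_hub import HfApi

# Placeholder: paste the token you just created
token = "hf_xxxxxxxxxxxxxxxxxxxx"

api = HfApi(token=token)
info = api.whoami()

# If this prints your username, the token is valid
print("Logged in as:", info["name"])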

2. Set your dataset name and parameters

Go to the Admin Configuration page of your phosphobot dashboard to adjust the recording settings. The most important ones are:

  • Dataset Name: The name of the dataset you want to record.
  • Task: A text description of the task you’re about to record. For example: “Pick up the lego brick and put it in the box”. This helps you remember what you recorded and is used by some AI models to understand the task.
  • Cameras: The cameras you want to record. By default, all connected cameras are recorded; you can change the selection here.

3. Control the robot in the Meta Quest app

The easiest way to record a dataset is to use the Meta Quest app.

  1. On the Meta Quest, open the phospho teleop application. Wait a moment, and you should see a row displaying phosphobot or your computer name. Click the Connect button using the Trigger button.

    • Make sure you’re connected to the same WiFi network as the phosphobot server.
    • If you don’t see the server, check its IP address in the phosphobot dashboard and enter it manually.

  2. After connecting, you’ll see the list of connected cameras and the recording options.

    • Move the windows with the Grip button to organize your space.
    • Enable the preview to see the camera feeds. Check the camera angles and adjust their positions if needed. Once the cameras are set up, we recommend disabling the preview to save bandwidth.

  3. Press A once to start teleoperation and begin moving your controller.

    • The robot will follow the movement of your controller. Press the Trigger button to close the gripper.
    • Press A again to stop the teleoperation. The robot will stop.

  4. Press B to start recording. You can leave the default settings for your first attempt.

    • Press B again to stop the recording.
    • Press Y (left controller) to discard the recording.

  5. Continue teleoperating and press B to stop the recording when you’re done.

  6. The recording is automatically saved in the LeRobot v2 format and uploaded to your Hugging Face account.

Go to your Hugging Face profile to see the uploaded datasets.

You can view it using the LeRobot Dataset Visualizer.
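
You can also load the dataset from Python to inspect what was recorded. Below is a minimal sketch using LeRobot's dataset class; the import path matches the LeRobot version available at the time of writing and may differ in newer releases, and <HF_USERNAME>/<DATASET_NAME> is a placeholder for your repo id.

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Placeholder: replace with the repo id of the dataset you just uploaded
dataset = LeRobotDataset("<HF_USERNAME>/<DATASET_NAME>")

# The dataset behaves like a regular PyTorch dataset
print("Number of frames:", len(dataset))

# Inspect the keys of a single frame (robot state, camera images, actions, ...)
frame = dataset[0]
print("Frame keys:", list(frame.keys()))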

4. Train your first model

Train in one click with phosphobot cloud

To train a model, you can use the phosphobot cloud. This is the quickest way to train a model: see the Train with phosphobot cloud guide.

Running on your local machine

You need a GPU with at least 16GB of memory to train the model.
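
Before launching a run, you can check which accelerator PyTorch actually sees on your machine. This small sketch only mirrors the --device choices described in step 4 below.

import torch

# Pick the value to pass as --device in the training command below
if torch.cuda.is_available():
    device = "cuda"  # NVIDIA GPU
elif torch.backends.mps.is_available():
    device = "mps"   # Apple Silicon (Mac M1/M2)
else:
    device = "cpu"   # no GPU available; training will be slow

print("Use --device=" + device)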

  1. On the device where you want to run the training, install the phosphobot package:

    pip install --upgrade phosphobot
    
  2. Clone the LeRobot repository and install it:

    git clone https://github.com/huggingface/lerobot.git
    cd lerobot
    pip install -e .  # Requires Python 3.10+
    
  3. (Optional) If you want to use Weights & Biases for tracking training metrics, log in with:

    wandb login
    
  4. Run the training script from the lerobot repository with the following command. Set --device=mps for Apple Silicon (Mac M1/M2), --device=cuda if you have an NVIDIA GPU, or --device=cpu if you have no GPU. Ensure that your lerobot virtual environment is activated.

    python lerobot/scripts/train.py \
      --dataset.repo_id=<HF_USERNAME>/<DATASET_NAME> \
      --policy.type=<act or diffusion or tdmpc or vqbet> \
      --output_dir=outputs/train/phosphobot_test \
      --job_name=phosphobot_test \
      --device=mps \
      --wandb.enable=true
    
  5. Your trained model will be saved in lerobot/outputs/train/.
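
If you want to reference the trained policy by a Hugging Face model id (for example with the --model_id flag in the next section), you can push the checkpoint folder to the Hub with huggingface_hub. This is a minimal sketch: the repo id is a placeholder and the checkpoint sub-path is an assumption about the output layout, so adapt it to what you actually find under outputs/train/.

from huggingface_hub import HfApi

api = HfApi()  # uses the token from `huggingface-cli login` or the HF_TOKEN environment variable

# Placeholder repo id for the trained policy
repo_id = "<HF_USERNAME>/phosphobot_test"
api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)

# Assumed checkpoint layout; adjust to the actual folder produced by your training run
api.upload_folder(
    repo_id=repo_id,
    repo_type="model",
    folder_path="outputs/train/phosphobot_test/checkpoints/last/pretrained_model",
)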

5. Control your robot with ACT

Clone the phosphobot repository to get the inference script:

git clone https://github.com/phospho-app/phosphobot.git

Then install the dependencies with pip and run the inference script:

cd phosphobot/inference
pip install .
python ACT/inference.py --model_id=<YOUR_HF_DATASET_NAME>

This will load your model and create an /act endpoint that expects the robot’s current position and camera images. If you’re using a local model, you can pass the path to the local model instead of a Hugging Face model id with the --model_id=... flag.

This inference script is intended to run on a machine with a GPU, separately from your phosphobot installation.

from phosphobot.camera import AllCameras
from phosphobot.api.client import PhosphoApi
from phosphobot.am import ACT

import time
import numpy as np

# Connect to the phosphobot server
client = PhosphoApi(base_url="http://localhost:80")

# Initialize the cameras
allcameras = AllCameras()

# Need to wait for the cameras to initialize
time.sleep(1)

# Instantiate the model
model = ACT(
  server_url="YOUR_SERVER_URL",
  server_port="YOUR_SERVER_PORT",
)

# Get the frames from the cameras
# We will use this model: LegrandFrederic/Orange-brick-in-black-box
# It requires 3 cameras as you can see in the config.json
# https://huggingface.co/LegrandFrederic/Orange-brick-in-black-box/blob/main/config.json

while True:
    images = [
        allcameras.get_rgb_frame(camera_id=0, resize=(240, 320)),
        allcameras.get_rgb_frame(camera_id=1, resize=(240, 320)),
        allcameras.get_rgb_frame(camera_id=2, resize=(240, 320)),
    ]

    # Get the robot state
    state = client.control.read_joints()

    inputs = {"state": np.array(state.angles_rad), "images": np.array(images)}

    # Go through the model
    actions = model(inputs)

    for action in actions:
        # Send the new joint position to the robot
        client.control.write_joints(angles=action.tolist())
        # Wait to respect frequency control (30 Hz)
        time.sleep(1 / 30)

What’s next?

Next, you can use the trained model to control your robot. Head to our guide to get started!