Train SmolVLA
How to Train and Run SmolVLA with LeRobot: A Step-by-Step Guide
In this tutorial, we will walk you through the process of fine-tuning a SmolVLA model and deploying it on a real robot arm. We will cover environment setup, training, inference, and common troubleshooting issues.
This tutorial is for LeRobot by Hugging Face, which is different from phosphobot. It’s geared towards more advanced users with a good understanding of Python and machine learning concepts. If you’re new to robotics or AI, we recommend starting with the phosphobot documentation.
This tutorial may be outdated
The LeRobot library is under active development, and the codebase changes frequently. While this tutorial is accurate as of June 11, 2025, some steps or code fixes may become obsolete. Always refer to the official LeRobot documentation for the most up-to-date information.
What is LeRobot by Hugging Face?
LeRobot is a platform designed to make real-world robotics more accessible for everyone. It provides pre-trained models, datasets, and tools in PyTorch.
It focuses on state-of-the-art approaches in imitation learning and reinforcement learning.
With LeRobot, you get access to:
- Pretrained models for robotics applications
- Human-collected demonstration datasets
- Simulated environments to test and refine AI models
Introduction to SmolVLA
SmolVLA is a 450M parameter, open-source Vision-Language-Action (VLA) model from Hugging Face’s LeRobot team. It’s designed to run efficiently on consumer hardware by using several clever tricks, such as skipping layers in its Vision-Language Model (VLM) backbone and using asynchronous inference to compute the next action while the current one is still executing.
Part 1: Training the SmolVLA Model with LeRobot by Hugging Face
1.1 Environment Setup for LeRobot by Hugging Face
Setting up a clean Python environment is crucial to avoid dependency conflicts. We recommend using `uv`, a fast and modern Python package manager.
- Install `uv`.
- Clone the LeRobot repository.
  💡 Pro Tip: Before you start, run `git pull` inside the `lerobot` directory to make sure you have the latest version of the library.
- Create a virtual environment and install dependencies. This tutorial uses Python 3.10.
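The three steps above look roughly like the following. The `uv` installer URL is from Astral's documentation; the `[smolvla]` extra matches the LeRobot README at the time of writing, but check the current README before installing:

```shell
# Install uv (see https://docs.astral.sh/uv/ for alternative install methods)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone LeRobot and enter the directory
git clone https://github.com/huggingface/lerobot.git
cd lerobot

# Create a Python 3.10 virtual environment and install LeRobot with SmolVLA deps
uv venv --python 3.10
source .venv/bin/activate
uv pip install -e ".[smolvla]"
```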
1.2 Training on a GPU-enabled Machine with LeRobot by Hugging Face
Training a VLA model is computationally intensive and requires a powerful GPU. This example uses an Azure Virtual Machine with an NVIDIA A100 GPU, but any modern NVIDIA GPU with sufficient VRAM should work.
Note on MacBook Pro: While it’s technically possible to train on a MacBook Pro with an M-series chip (using the `mps` device), it is extremely slow and not recommended for serious training runs.
- The Training Command: We will fine-tune the base SmolVLA model on a “pick and place” dataset from the Hugging Face Hub.
  - `--save_freq`: Saves a model checkpoint every 5000 steps, which is useful for not losing your work.

  Note on WandB: As of June 11, 2025, Weights & Biases logging (`wandb`) may have issues in the current version of LeRobot. If you encounter errors, you can disable it by changing the flag to `--wandb.enable=false`.
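A sketch of the training invocation. The dataset id `YOUR_HF_USERNAME/pick-and-place` is a placeholder for your own dataset; the flag names follow the draccus-style CLI that `lerobot/scripts/train.py` used in mid-2025 and may have changed since, so run the script with `--help` to confirm:

```shell
python lerobot/scripts/train.py \
  --policy.path=lerobot/smolvla_base \
  --dataset.repo_id=YOUR_HF_USERNAME/pick-and-place \
  --batch_size=64 \
  --steps=20000 \
  --save_freq=5000 \
  --wandb.enable=true
```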
- Fixing config.json: You need to change `n_action_steps` in the `config.json` file. The default value is 1, but for inference on SmolVLA it should be 50. This value is only used during inference, but it’s easier to fix it now, before uploading the model to the Hugging Face Hub.
  - Locate the config.json file: it will be in the `lerobot/smolvla_base` directory.
  - Edit the file: open it in a text editor and change the line `"n_action_steps": 1` to `"n_action_steps": 50`.

  Note: If you don’t change this, inference will be very slow, as the model will only predict one action at a time instead of a sequence of actions.
Uploading the Model to the Hub: Once training is complete, you’ll need to upload your fine-tuned model to the Hugging Face Hub to use it for inference.
- Login to your Hugging Face account.
- Upload your model checkpoint: the trained model files will be in a directory like `outputs/train/YYYY-MM-DD_HH-MM-SS/`.
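With the `huggingface-cli` tool (installed alongside `huggingface_hub`), the two steps look like this. The repo name `YOUR_HF_USERNAME/my-smolvla-finetuned` is a placeholder, and the `checkpoints/last/pretrained_model` suffix is an assumption based on how recent LeRobot versions lay out checkpoints; adjust it to match your run directory:

```shell
# Authenticate with your Hugging Face account (paste a write-access token)
huggingface-cli login

# Upload the final checkpoint to the Hub
huggingface-cli upload YOUR_HF_USERNAME/my-smolvla-finetuned \
  outputs/train/YYYY-MM-DD_HH-MM-SS/checkpoints/last/pretrained_model
```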
Part 2: Training on Google Colab with LeRobot by Hugging Face
Running inference is often done on a different machine. Google Colab is a popular choice, but it comes with its own set of challenges.
- Initial Setup on Colab: Start by cloning the repository.
- Fixing the `torchcodec` Error: You will likely encounter a `RuntimeError: Could not load libtorchcodec`. This is because the default PyTorch version in Colab is incompatible with the `torchcodec` version required by LeRobot. The fix is to downgrade `torchcodec`. After downgrading, you must restart the Colab runtime for the change to take effect.
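The downgrade is a single pip command. The pin shown here (`0.2.1`) is an example, not a verified value: match it to the torch version your Colab runtime ships, using the compatibility table in the torchcodec README.

```shell
# Reinstall torchcodec at a version compatible with Colab's preinstalled torch
pip install --force-reinstall "torchcodec==0.2.1"
# Then restart the runtime: Runtime -> Restart runtime in the Colab menu
```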
- Avoiding Rate Limits: Colab instances share IP addresses, which can lead to getting rate-limited by the Hugging Face Hub when downloading large datasets. If you see `HTTP Error 429: Too Many Requests`, you have two options:
  - Wait: The client will automatically retry with an exponential backoff.
  - Use a Local Dataset: Download the dataset to your Google Drive, mount the drive in Colab, and point the script to the local path instead of the `repo_id`.
Part 3: Advanced LeRobot Training Troubleshooting & Code Fixes
Here are some other common issues you might face and how to solve them.
Issue: `ffmpeg` or `libtorchcodec` Errors on macOS
- Problem: On macOS, you might encounter `RuntimeError`s related to `ffmpeg` or shared libraries not being found, even if they are installed. This is often a dynamic library path issue.
- Fix: Explicitly set the `DYLD_LIBRARY_PATH` environment variable to include the path where Homebrew installs libraries.
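For a default Homebrew installation, that looks like the following (add it to your shell profile, e.g. `~/.zshrc`, to make it persistent):

```shell
# Prepend Homebrew's library directory so the dynamic linker can find
# ffmpeg's shared libraries (libavcodec and friends)
export DYLD_LIBRARY_PATH="$(brew --prefix)/lib:$DYLD_LIBRARY_PATH"
```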
Issue: `ImportError: cannot import name 'GradScaler'`
- Problem: This error occurs if your PyTorch version is too old. SmolVLA requires `torch>=2.3.0`.
- Fix: Upgrade PyTorch in your `uv` environment.
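The upgrade is a one-liner inside the activated virtual environment:

```shell
# Upgrade torch to a version that exports torch.amp.GradScaler
uv pip install --upgrade "torch>=2.3.0"
```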
Part 4: Running Inference on a Real SO-100 or SO-101 Robot with LeRobot by Hugging Face
The LeRobot library is integrated with the SO-100 and SO-101 robots, allowing you to run inference directly on these devices. This section will guide you through the hardware setup, calibration, and running the inference script with LeRobot.
You can use the robots from our dev kit for this step. However, the LeRobot setup is different and completely independent from phosphobot. Be careful not to mix the two setups.
4.1 LeRobot Hardware Setup and Calibration
- Hardware Connections:
  - Connect both your leader arm and follower arm to your computer via USB.
  - Connect your cameras (context camera and wrist camera).
- Finding Robot Ports: Run this script to identify the USB ports for each arm. Note the port paths (e.g., `/dev/tty.usbmodemXXXXXXXX`).
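LeRobot ships its own helper for this (`lerobot/scripts/find_motors_bus_port.py` in recent checkouts; verify the path in yours). If you just want a quick manual check, a minimal stdlib sketch that lists candidate serial devices:

```python
import glob

def list_serial_ports():
    """Return candidate USB serial devices on macOS and Linux."""
    patterns = [
        "/dev/tty.usbmodem*",  # macOS USB modem devices
        "/dev/ttyACM*",        # Linux CDC-ACM boards (e.g. motor bus adapters)
        "/dev/ttyUSB*",        # Linux USB-serial adapters
    ]
    return sorted(p for pattern in patterns for p in glob.glob(pattern))

if __name__ == "__main__":
    for port in list_serial_ports():
        print(port)
```

Unplug one arm, run it again, and the port that disappeared belongs to that arm.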
- Calibrating the Arms: The calibration process saves a file with the min/max range for each joint.
  - Follower Arm:
  - Leader Arm:
- Test Calibration with Teleoperation: Before running the AI, verify that the calibration works by teleoperating the robot. This lets you control the follower arm with the leader arm. If the follower arm correctly mimics the movements of the leader arm, your calibration is successful.
- Finding Camera Indices: Run this script to list all connected cameras and their indices. Identify the indices for your context and wrist cameras.
4.2 Running the LeRobot Inference Script
This is the main command to make the robot move.
- `--policy-path`: Note that this time we do not add the `/pretrained_model` subfolder. We will fix this in the code.
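A sketch of what the inference invocation looked like around mid-2025. The script name, the `--robot-path` flag, and the SO-100 config path are assumptions that have changed across LeRobot versions, and `YOUR_HF_USERNAME/my-smolvla-finetuned` is a placeholder; run the script with `--help` to confirm the current flags:

```shell
python lerobot/scripts/control_robot.py record \
  --robot-path lerobot/configs/robot/so100.yaml \
  --policy-path YOUR_HF_USERNAME/my-smolvla-finetuned
```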
Part 5: LeRobot Troubleshooting and Code Fixes
Issue 1: Unit Mismatch (Radians vs. Degrees)
- Problem: The SmolVLA model outputs actions in the same units as its training data, and some datasets use radians. For example, datasets recorded with phosphobot, such as `PLB/phospho-playground-mono`, use radians. However, the LeRobot SO-100 driver expects actions in degrees. This mismatch will cause the robot to move erratically or barely at all.
- Fix: Convert the model’s output from radians to degrees.
  - File: `lerobot/common/policies/smolvla/modeling_smolvla.py`
  - Location: In the `select_action` method.
  - Code: Just after the `# Unpad actions` section, scale the action tensor from radians to degrees.
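The conversion itself is a one-line scale of the action values; here it is as a standalone sketch (the exact tensor variable name inside `select_action` is left as an assumption):

```python
import math

def radians_to_degrees(joint_values):
    """Scale joint angles from radians to degrees.

    In modeling_smolvla.py you would apply the same scale directly to the
    action tensor, e.g. `actions = actions * 180.0 / math.pi`.
    """
    return [v * 180.0 / math.pi for v in joint_values]
```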
Issue 2: Flimsy Leader Arm Connection
- Problem: The leader arm can sometimes have an unstable connection, causing the calibration or teleoperation script to crash if it fails to read a motor position.
- Fix: Add a `try-except` block to gracefully handle connection errors.
  - File: `lerobot/common/robot/motors_bus.py`
  - Location: In the `record_ranges_of_motion` method.
  - Code: Wrap the `while True:` loop in a `try-except` block.
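The shape of the fix as a self-contained sketch. The real method reads from the serial motor bus; here `read_positions` and the bounded loop are stand-ins so the idea (survive a flaky read instead of crashing, keep the ranges recorded so far) is runnable on its own:

```python
def record_ranges_of_motion(read_positions, max_reads=1000):
    """Track each joint's min/max position, surviving flaky serial reads.

    `read_positions` is a stand-in for the motor-bus read call; in
    motors_bus.py you would wrap the existing `while True:` loop the same
    way instead of letting a ConnectionError abort calibration.
    """
    mins, maxs = {}, {}
    try:
        for _ in range(max_reads):
            for joint, value in read_positions().items():
                mins[joint] = min(mins.get(joint, value), value)
                maxs[joint] = max(maxs.get(joint, value), value)
    except ConnectionError as exc:
        print(f"Motor read failed, keeping ranges recorded so far: {exc}")
    return mins, maxs
```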
Issue 3: `config.json` or `model.safetensors` Not Found
- Problem: When running inference, the script may fail with `FileNotFoundError: config.json not found on the HuggingFace Hub` because it doesn’t look inside the `pretrained_model` subfolder by default.
- Fix: Modify the `from_pretrained` method to include the subfolder when downloading files.
  - File: `lerobot/common/policies/pretrained.py`
  - Location: In the `from_pretrained` class method.
  - Code: Add the `subfolder` argument to both `hf_hub_download` calls.
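`hf_hub_download` accepts a `subfolder` argument, so the fix amounts to passing `subfolder="pretrained_model"` in both calls. A sketch of the idea (the wrapper function name here is ours, not LeRobot's):

```python
from huggingface_hub import hf_hub_download

def download_policy_file(repo_id: str, filename: str) -> str:
    """Fetch a policy file from the pretrained_model/ subfolder of a Hub repo."""
    return hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        subfolder="pretrained_model",  # the key addition to both calls
    )
```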