Improving Your Dataset Collection
The quality and structure of your demonstration data will have the single biggest impact on your model’s performance. Think of data collection as teaching by example—the clearer your examples, the better your student will learn.

Control the Environment
A consistent and controlled environment is essential for collecting reliable data. Your goal is to eliminate random variables that could confuse the model. Keep the robot’s operating area free of unnecessary changes, like people walking by or other machinery moving in the background. Most importantly, ensure the lighting is even and stable across all recordings. Shadows and glare can obscure objects or change their appearance, so use diffused lamps or ring lights to maintain uniformity and avoid relying on natural light, which varies throughout the day.

Optimize Your Camera Setup
Cameras are the eyes of your robot, and their configuration directly impacts what the model can “see” and learn. For best results, try to match the camera arrangement used to pre-train your foundation model. For instance, a model like pi-zero by Physical Intelligence was trained with wrist cameras on each arm and a first-person view (FPV) camera. Positioning these cameras to clearly capture the robot’s gripper, the target object, and the overall workspace is crucial. Before recording, ask yourself: “Could I control the robot effectively using only these camera views?” If the answer is no, your model will struggle too.

Demonstrate Clear Actions
How the robot approaches and interacts with objects forms the core of the learned task. Plan the robot’s movements so the target object is visible to the cameras—especially the wrist cameras—for as long as possible. Avoid having the gripper block the view of the object during the final approach. Instead, angle the arm to keep both the gripper and the target in sight. This clarity helps the model build a strong connection between its movements and the outcome. Strive for a consistent and repeatable strategy for each task, as this helps the model learn a reliable pattern of behavior.

Build a Diverse and Balanced Dataset
A model that only sees one way to do a task will be brittle. A diverse dataset teaches the model to generalize across different but related scenarios. Introduce intentional variations by changing the starting position of the target object (e.g., left, right, center) or by using objects of different shapes, sizes, and colors. This defines the “learning space” where your model can operate successfully. However, it’s important to balance diversity with consistency. Avoid recording “outlier” episodes that are radically different from the rest, as they can mislead the model and teach it incorrect or unsafe behaviors. For example, if you’re training a model to pick up a cup, don’t include episodes where the robot fails to pick it up, or episodes where it pushes the cup instead of grasping it; such outliers confuse the model and lead to poor performance.

Collect the Right Amount of Data
While quality is key, quantity also matters. A good starting point for a single task is 40-50 high-quality episodes. An “episode” is one complete execution of the task, from start to finish. For more complex tasks or when fine-tuning large models, you may need more. For example, when fine-tuning a model like GR00T N1.5, a common recommendation is to record longer episodes (30-40 seconds each) for a total of 20 to 30 minutes of recorded data. You can study a reference dataset for a table cleanup task to understand the quality and structure to aim for. Recording data can get tedious, so try to make it fun: use VR control, take breaks, and reward yourself for reaching milestones. Pick a demo that you find meaningful and will enjoy sharing on social media; this will help you stay motivated and engaged throughout the process.

Subscribe to phospho pro to unlock VR control
Unlock access to VR Control, advanced AI training, and more.
Lifetime access to the phospho teleoperation app
Get the phospho teleoperation app on the Meta Store for Meta Quest 2, Pro, 3, and 3s.
If you bought our phospho starter pack, you should have received a link to get the phospho teleoperation Meta Quest app. Please reach out if not.
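The data-quantity guidance above is easy to sanity-check with quick arithmetic. This is only a back-of-the-envelope sketch: the 30-40 second episode lengths and 20-30 minute totals come from the GR00T N1.5 recommendation mentioned earlier, not from universal constants.

```python
# Back-of-the-envelope check of the episode budget discussed above:
# 30-40 second episodes totalling 20-30 minutes of recorded data.
def episodes_needed(total_minutes: float, episode_seconds: float) -> int:
    """How many episodes of a given length fill the target recording time?"""
    return round(total_minutes * 60 / episode_seconds)

low = episodes_needed(20, 40)   # smallest time budget, longest episodes
high = episodes_needed(30, 30)  # largest time budget, shortest episodes
print(low, high)                # a range of roughly 30 to 60 episodes
```

The resulting range of roughly 30-60 episodes is consistent with the 40-50 episode rule of thumb for a single task.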
Final Sanity Checks Before Scaling
Robotics datasets are time-consuming to create and difficult to edit. Before you invest heavily in collecting hundreds of episodes, it’s wise to perform a few checks to ensure your time is well spent. First, record just a handful of episodes and use a tool like the LeRobot Visualize Dataset space to confirm the data was saved correctly and loads without errors. Then, run a full, small-scale cycle: collect a small dataset (e.g., 10 episodes), train a model for a few epochs, and test its ability to perform the task. Once a model is trained, a great test is to see if it can at least replay one of the training episodes perfectly. If this mini-pipeline works, you can scale up your data collection with confidence.

Beyond the Dataset: Hyperparameter Tuning
If your dataset is solid but the model still struggles, the issue may lie in the training configuration. Hyperparameters are the settings that control the learning process itself. While default values are often a good start, tuning them can lead to significant performance gains. Each model has different hyperparameters, but the idea is always the same: experiment with the settings to find the best configuration for your specific task. Here are some common hyperparameters to consider:

Learning Rate
The learning rate determines how much the model adjusts its internal parameters after each batch of data. Think of it as the size of the steps it takes towards a solution. If the learning rate is too high, the model might “overshoot” the optimal solution and become unstable. If it’s too low, training can be incredibly slow, or the model might get stuck in a suboptimal state. A common strategy is to start with a default value (e.g., 1e-4) and adjust it by factors of 10 (e.g., 1e-3 or 1e-5) to see how it affects performance.

Number of Epochs or Steps
An epoch is one full pass through the entire training dataset. The number of epochs determines how many times the model gets to see the data. Too few epochs can lead to underfitting, where the model hasn’t learned the patterns in the data. Too many epochs can cause overfitting, where the model memorizes the training data perfectly but fails to generalize to new, unseen situations. Start training with a small number of epochs (e.g., 1 or 2), and then progressively scale up.

To train your models for longer, consider using phospho pro to unlock longer training times.
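The effect of these two knobs, step size and number of updates, can be seen on a toy problem. The snippet below is only an illustration, not a real training loop: plain gradient descent on a one-dimensional quadratic loss f(x) = x², whose gradient is 2x.

```python
# Toy illustration of learning rate and training length on the quadratic
# loss f(x) = x**2 (gradient 2x). Not a real training run.
def train(lr: float, steps: int, x: float = 1.0) -> float:
    """Run plain gradient descent; return |x|, the distance from the optimum."""
    for _ in range(steps):
        x -= lr * 2 * x  # one gradient update
    return abs(x)

diverged = train(lr=1.1, steps=50)      # learning rate too high: overshoots
converged = train(lr=0.1, steps=50)     # sensible step size: near the optimum
undertrained = train(lr=0.1, steps=2)   # too few updates: barely moved
```

With too large a step the iterate overshoots and grows each update; with a sensible step it converges; with too few updates it barely moves, which is the toy analogue of underfitting.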