Robotics Dataset Recording Best Practices

Garbage in, garbage out: the quality of your dataset directly impacts the performance of your model. Here are some best practices for recording a good dataset for imitation learning in robotics, based on what we learned from training our models.

Environment of your robot

A controlled environment is foundational to collecting reliable data. Here’s how to optimize it:

Static Surroundings

Keep the robot’s operating area free of unnecessary changes. Avoid placing the robot in spaces with moving objects (e.g., people walking by, machinery operating) or shifting backgrounds, as these introduce noise that can confuse the model.

Lighting Consistency

Ensure even, stable lighting across all recordings. Shadows can obscure objects, so use tools like ring lights or diffused lamps to maintain uniformity. Avoid natural light from windows, which varies with time and weather, and keep lighting consistent for each object throughout the dataset.
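You can also catch lighting drift after recording by comparing the average brightness of each episode. The sketch below is a minimal, hypothetical check and is not part of phosphobot or LeRobot. It assumes you have already decoded frames into 8-bit grayscale pixel values (plain lists of ints here, to stay dependency-free). It flags episodes whose mean brightness strays from the dataset median by more than a tolerance, and the default of 20 is an arbitrary assumption you would tune.

```python
from statistics import mean, median


def episode_brightness(frames: list[list[int]]) -> float:
    """Average 8-bit grayscale intensity over all frames of one episode."""
    return mean(mean(frame) for frame in frames)


def flag_lighting_drift(
    episodes: dict[str, list[list[int]]], tolerance: float = 20.0
) -> list[str]:
    """Return IDs of episodes whose mean brightness differs from the
    dataset median by more than `tolerance` grayscale levels."""
    brightness = {ep_id: episode_brightness(frames) for ep_id, frames in episodes.items()}
    reference = median(brightness.values())
    return [ep_id for ep_id, value in brightness.items() if abs(value - reference) > tolerance]


# Toy data: each frame is a flat list of grayscale pixels (0-255).
episodes = {
    "ep_000": [[120, 125, 130], [122, 124, 128]],
    "ep_001": [[118, 126, 131], [121, 123, 129]],
    "ep_002": [[60, 65, 70], [62, 64, 66]],  # much darker, e.g. window light faded
}
print(flag_lighting_drift(episodes))  # ['ep_002']
```

Using the median as the reference keeps a few badly lit episodes from dragging the baseline toward themselves.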

Camera setup

Cameras are the eyes of your robot during imitation learning. Their configuration directly impacts what the model can “see” and learn.

Replicate the Pretraining Setup

Match the camera arrangement used in the model’s pretraining phase. For example, if you fine-tune a model like pi0 by Physical Intelligence, include a wrist camera on each robot arm and a first-person view (FPV) context camera for a broad perspective.

Optimal Positioning

Position cameras to capture the robot’s actions and the target object clearly. Ideal angles include:

Wrist cameras aligned to track the gripper and object interaction.
Context camera providing a wide view of the workspace.

Stability and Clarity

Secure cameras to prevent movement during recording. Test the setup by asking: “Could I control the robot effectively using only these camera views?” If the answer is no, adjust until the robot and target are fully visible.
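A camera that gets bumped between episodes silently changes what the model sees. The following is a rough, hypothetical check for fixed cameras only, such as the context camera. Wrist cameras move with the arm by design, so it does not apply to them. It compares the first frame of each episode against a reference frame using the mean absolute pixel difference. The flat grayscale lists and the threshold of 15 are illustrative assumptions.

```python
def mean_abs_diff(frame_a: list[int], frame_b: list[int]) -> float:
    """Mean absolute per-pixel difference between two equally sized grayscale frames."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def find_camera_shifts(
    first_frames: dict[str, list[int]], reference: list[int], threshold: float = 15.0
) -> list[str]:
    """Return episode IDs whose first fixed-camera frame differs from the
    reference frame by more than `threshold` on average."""
    return [ep for ep, frame in first_frames.items() if mean_abs_diff(frame, reference) > threshold]


reference = [10, 200, 10, 200]  # first frame of episode 0 (toy 4-pixel image)
first_frames = {
    "ep_001": [12, 198, 11, 202],  # small sensor noise only
    "ep_002": [200, 10, 200, 10],  # camera was bumped, so the scene is shifted
}
print(find_camera_shifts(first_frames, reference))  # ['ep_002']
```

Object placement and lighting also change pixels, so in practice you would restrict the comparison to a background region that the robot and the objects never enter.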

How to Grasp Targets

How the robot approaches and grasps objects shapes the model’s understanding of tasks.

Visible Approach

Plan the robot’s trajectory so the target object enters the cameras’ field of view, especially that of the wrist cameras, as early as possible. For example, avoid occluding the object with the gripper during the approach; instead, angle the arm to keep the target in sight.

Consistency

Use a repeatable grasping strategy to help the model learn reliable patterns, while allowing slight variations for robustness (see Dataset Diversity below).

Dataset diversity

A diverse dataset ensures the model generalizes well across scenarios.

Introduce Variability

Manually repeating a task naturally adds some diversity, but you can enhance this intentionally. Examples include:

Object Placement: Vary the position of the target (e.g., left, right, center) to teach the model spatial adaptability.
Object Types: Use objects of different shapes, sizes, or colors to broaden the learning space.
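To make sure every combination of placement and object is actually covered, you can plan the recording session in advance. The sketch below is one possible approach, not a phosphobot feature. It builds a balanced schedule where each (position, object) pair appears the same number of times, then shuffles it so the recording order does not correlate with things that drift during a session, such as operator fatigue. The positions and object names are hypothetical. With 3 positions, 3 objects, and 5 repetitions you get 45 episodes, which falls within the 40-50 episode starting point recommended under Dataset size.

```python
import itertools
import random


def build_schedule(
    positions: list[str], objects: list[str], repetitions: int, seed: int = 0
) -> list[tuple[str, str]]:
    """Return a shuffled list in which every (position, object) pair
    appears exactly `repetitions` times."""
    combos = list(itertools.product(positions, objects)) * repetitions
    random.Random(seed).shuffle(combos)  # seeded so the plan is reproducible
    return combos


positions = ["left", "center", "right"]
objects = ["red cube", "blue cylinder", "green ball"]  # hypothetical objects
schedule = build_schedule(positions, objects, repetitions=5)
print(len(schedule))  # 45 = 3 positions x 3 objects x 5 repetitions

# Print the first few instructions for the person recording.
for episode_index, (position, obj) in enumerate(schedule[:3]):
    print(f"episode {episode_index}: place the {obj} on the {position}")
```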

Learning Space Concept

Think of diversity as defining the “space” where the model can operate. If objects are always on the left, the model won’t learn to pick them up from the right. Test the limits of your task to ensure coverage.
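A simple way to test coverage is to log where the object started in each episode and look for empty regions. The following is a minimal, hypothetical sketch. It assumes you noted each starting position as a distance in centimeters along the table, from left to right. It splits the workspace into equal bins and returns the bins that no episode covers. The 50 cm workspace and the 5 bins are illustrative assumptions.

```python
def find_coverage_gaps(
    positions_cm: list[float], workspace_min: float, workspace_max: float, n_bins: int = 5
) -> list[tuple[float, float]]:
    """Split [workspace_min, workspace_max) into equal bins and return
    the (start, end) of every bin with no recorded start position."""
    width = (workspace_max - workspace_min) / n_bins
    counts = [0] * n_bins
    for x in positions_cm:
        if workspace_min <= x < workspace_max:
            # min() guards against float rounding pushing x into a non-existent bin
            counts[min(int((x - workspace_min) // width), n_bins - 1)] += 1
    return [
        (workspace_min + i * width, workspace_min + (i + 1) * width)
        for i, count in enumerate(counts)
        if count == 0
    ]


# Hypothetical log: the operator always placed the object on the left half.
recorded_x_cm = [3.0, 7.5, 12.0, 18.0, 22.0, 25.0]
print(find_coverage_gaps(recorded_x_cm, 0.0, 50.0))  # [(30.0, 40.0), (40.0, 50.0)]
```

The same idea extends to two dimensions, or to other factors you vary, such as object type or orientation.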

Avoid outliers

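Outliers are episodes that deviate significantly from the rest, for example a failed grasp followed by a retry, or an unusually slow and hesitant demonstration. They can mislead the model into learning incorrect patterns. Keep your demonstrations close to the norm, and discard episodes that went wrong.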
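Episode duration is a cheap first proxy for spotting outliers, because a fumbled or retried grasp usually takes much longer than a clean one. The sketch below swaps in a standard robust statistic, the modified z-score based on the median absolute deviation (MAD) with the commonly used 3.5 cutoff. It is an illustrative choice, not a phosphobot or LeRobot feature. The durations are made-up toy values. Flagged episodes are candidates to review, not to delete blindly.

```python
from statistics import median


def find_duration_outliers(durations_s: dict[str, float], cutoff: float = 3.5) -> list[str]:
    """Flag episodes whose duration has a modified z-score above `cutoff`.

    Modified z-score = 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median.
    """
    values = list(durations_s.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # MAD is zero when at least half the durations equal the median;
        # the score is undefined, so flag nothing.
        return []
    return [ep for ep, v in durations_s.items() if 0.6745 * abs(v - med) / mad > cutoff]


durations = {"ep_000": 21.0, "ep_001": 19.5, "ep_002": 22.3, "ep_003": 20.1, "ep_004": 48.7}
print(find_duration_outliers(durations))  # ['ep_004']
```

The median and MAD are used instead of the mean and standard deviation because a single very long episode would inflate the standard deviation and partly hide itself.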

Dataset size

The quantity of data matters as much as its quality. Aim for 40-50 episodes per task as a starting point. An “episode” is a complete execution of the task, from start to finish (e.g., picking up an object and placing it elsewhere).

Sanity checks

Robotics datasets take a long time to collect and are hard to edit, so make sure the time you invest in collecting demonstrations pays off. First, record a few episodes and check that the data is good: use the Visualize Dataset space from LeRobot to verify that the data is correct and can be loaded properly. Next, run a full cycle of data collection, training, and test inference to confirm that the whole pipeline works. Only then scale up. Once a model is trained, check that it can at least replay an episode: put the robot in the same starting state as in a recorded episode and check that the model can execute the task.
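Alongside visual inspection, a short scripted check can catch structural problems before you spend time training. The sketch below is hypothetical. It does not use LeRobot's dataset format and instead assumes each episode is available as plain lists of timestamps (in seconds) and action vectors. It reports mismatched lengths, non-increasing timestamps, gaps far from the expected frame interval (which hint at dropped frames), inconsistent action dimensions, and NaN or infinite values. The 30 fps rate and the 50% tolerance are assumptions.

```python
import math


def check_episode(
    timestamps: list[float], actions: list[list[float]], fps: float, tol: float = 0.5
) -> list[str]:
    """Return human-readable problems found in one episode (empty list = passed)."""
    problems = []
    if len(timestamps) != len(actions):
        problems.append(f"{len(timestamps)} timestamps but {len(actions)} actions")

    expected_dt = 1.0 / fps
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        if dt <= 0:
            problems.append(f"timestamp not increasing at frame {i}")
        elif abs(dt - expected_dt) > tol * expected_dt:
            problems.append(f"frame {i}: gap of {dt:.3f}s, expected about {expected_dt:.3f}s")

    if actions:
        dim = len(actions[0])
        for i, action in enumerate(actions):
            if len(action) != dim:
                problems.append(f"action {i} has {len(action)} values, expected {dim}")
            if not all(math.isfinite(a) for a in action):
                problems.append(f"action {i} contains NaN or inf")
    return problems


timestamps = [0.0, 0.033, 0.067, 0.2]  # last frame arrives late: dropped frames?
actions = [[0.1, 0.2], [0.1, 0.25], [float("nan"), 0.3], [0.1, 0.3]]
for problem in check_episode(timestamps, actions, fps=30):
    print(problem)
```

On this toy episode the check reports the late frame 3 and the NaN in action 2. Running it over the first few recorded episodes is cheap and fits the "record a few, then verify" step above.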

What’s next?

Next, record your own dataset and use it to train a policy!

Datasets

Record your first dataset

AI Training

Train your first AI model

Discord

Join the Discord to ask questions, get help from others, and get updates (we ship almost daily).
