Cameras are the eyes of your robot. Robots may rely on one or more cameras to make decisions. The images, along with additional information such as joint positions, are fed into the model during both training and inference.
Different types of cameras exist, and phosphobot supports most of them.
phosphobot uses the OpenCV library to detect cameras automatically. This open source library supports most generic cameras. phosphobot ships with its own binary of OpenCV, so you don’t need to install it separately.
Camera placement matters. In robotics datasets, there are two main types of camera setups:
Context cameras: These cameras are placed to capture the environment and objects around the robot. These context cameras can be placed on the robot (e.g. on the head, on the body) or in the environment (e.g. on the walls, on the ceiling).
Wrist cameras: These cameras are placed on the wrists (hands) of the robot. They help with fine-grained manipulation tasks, so that the robot can see if it’s holding an object correctly. The phosphobot starter pack comes with two wrist cameras, one for each arm.
Adding more cameras usually helps to improve AI accuracy. However, it requires more compute at inference time (slower models) and makes the real-life setup more cumbersome. Usually, one context camera and two wrist cameras are a good trade-off.
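As a rough illustration of how automatic camera detection with OpenCV can work, the sketch below probes camera indices one by one and keeps those that return a frame. This is not phosphobot's actual detection code: `detect_cameras`, `opencv_probe`, and `max_index` are names made up for this example, and it assumes OpenCV's Python bindings (`cv2`) are installed on your machine.

```python
from typing import Callable, List


def opencv_probe(index: int) -> bool:
    """Return True if OpenCV can open camera `index` and read one frame."""
    # Imported here so the scanning logic below can be reused (or tested)
    # with another probe on machines where OpenCV is not installed.
    import cv2

    capture = cv2.VideoCapture(index)
    try:
        if not capture.isOpened():
            return False
        ok, _frame = capture.read()
        return bool(ok)
    finally:
        # Always release the device so other programs can use it.
        capture.release()


def detect_cameras(
    max_index: int = 8,
    probe: Callable[[int], bool] = opencv_probe,
) -> List[int]:
    """Try camera indices 0..max_index-1 and return the ones that respond."""
    return [index for index in range(max_index) if probe(index)]


# Usage on a machine with cameras attached:
#   print(detect_cameras())
```

The upper bound of 8 indices is an arbitrary choice for the example. Cameras that need a vendor SDK may not respond to this kind of generic probe.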
Stereo cameras have two lenses that capture two images of the same scene from slightly different angles.
The shift between the two images is used to calculate the depth of the scene. The greater the shift, the closer the object is to the camera.
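The relationship between shift (disparity) and depth can be sketched with the standard pinhole stereo formula, depth = focal length × baseline / disparity. The example assumes a rectified stereo pair; the focal length (700 px) and baseline (6 cm) are made-up values for illustration, not the specifications of any particular camera.

```python
def depth_from_disparity(
    focal_length_px: float,
    baseline_m: float,
    disparity_px: float,
) -> float:
    """Depth in meters of a point seen by a rectified stereo pair.

    focal_length_px: focal length of the lenses, in pixels
    baseline_m: distance between the two lenses, in meters
    disparity_px: horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point infinitely far away.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera parameters, chosen only for the example.
FOCAL_LENGTH_PX = 700.0
BASELINE_M = 0.06

for disparity in (10.0, 20.0, 40.0):
    depth = depth_from_disparity(FOCAL_LENGTH_PX, BASELINE_M, disparity)
    print(f"disparity {disparity:.0f} px -> depth {depth:.2f} m")
```

Doubling the disparity halves the computed depth, which is the "greater shift, closer object" rule stated above.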
Depth cameras directly return a depth map in addition to the color image. A depth map is a 2D image where each pixel represents the distance between the camera and the object in the scene.
[Figure: example of a depth image]
Intel RealSense cameras are a popular choice for depth cameras. They are more complex than standard cameras: they have multiple sensors (infrared, multiple color sensors, etc.) and a processor that combines them to compute the depth map. This means they are pricier than standard cameras and tend to be more difficult to set up.
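To make the idea of a depth map concrete, here is a minimal sketch that treats it as a 2D grid of distances and finds the closest valid pixel. It assumes raw values are integers in millimeters and that 0 marks a pixel with no valid measurement; both are assumptions for this example, so check your camera's documentation for its actual units and invalid-pixel convention. The function name `nearest_point` is also made up.

```python
from typing import List, Optional, Tuple

# Assumption for this example: raw depth values are in millimeters.
MM_PER_M = 1000.0


def nearest_point(depth_raw: List[List[int]]) -> Optional[Tuple[int, int, float]]:
    """Return (row, col, distance_m) of the closest valid pixel, or None."""
    best: Optional[Tuple[int, int, float]] = None
    for row_idx, row in enumerate(depth_raw):
        for col_idx, raw in enumerate(row):
            if raw == 0:
                # Assumed convention: 0 means "no measurement here".
                continue
            distance_m = raw / MM_PER_M
            if best is None or distance_m < best[2]:
                best = (row_idx, col_idx, distance_m)
    return best


# A tiny 3x3 depth map with two invalid pixels.
depth_raw = [
    [0, 1200, 1180],
    [950, 430, 0],
    [940, 445, 1210],
]
print(nearest_point(depth_raw))
```

Real depth maps are usually handled as NumPy arrays; plain lists are used here only to keep the example dependency-free.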
When recording a dataset with phosphobot, the images are saved in an MP4 video file using OpenCV. The frame rate (FPS, frames per second), the MP4 codec, and the video resolution can be configured for recording in the Admin Configuration.
By default, all available cameras are recorded, but you can disable some of them in the Admin Configuration. This is helpful if you don’t want to record your laptop camera, or if your iPhone’s camera ends up recording the inside of your pocket.
Anybody can contribute new types of cameras to the phospho starter packs by creating a new camera class inheriting from BaseCamera.
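The recording settings (FPS, resolution, and which cameras are enabled) directly determine how many frames a recording contains. The sketch below estimates the frame count and the uncompressed image volume for a clip. `RecordingSettings` and its field names are hypothetical and are not phosphobot's actual configuration keys; the figures describe raw 8-bit color frames before MP4 compression, so real files will be much smaller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordingSettings:
    """Hypothetical settings container, used only for this estimate."""
    fps: int
    width: int
    height: int
    enabled_cameras: int


def frames_recorded(settings: RecordingSettings, duration_s: float) -> int:
    """Total number of frames captured across all enabled cameras."""
    return settings.enabled_cameras * round(settings.fps * duration_s)


def raw_bytes(settings: RecordingSettings, duration_s: float) -> int:
    """Uncompressed size, assuming 3 bytes per pixel (8-bit color)."""
    bytes_per_frame = settings.width * settings.height * 3
    return frames_recorded(settings, duration_s) * bytes_per_frame


three_cameras = RecordingSettings(fps=30, width=640, height=480, enabled_cameras=3)
two_cameras = RecordingSettings(fps=30, width=640, height=480, enabled_cameras=2)

for label, settings in (("3 cameras", three_cameras), ("2 cameras", two_cameras)):
    frames = frames_recorded(settings, duration_s=10)
    megabytes = raw_bytes(settings, duration_s=10) / 1e6
    print(f"{label}: {frames} frames, {megabytes:.0f} MB raw")
```

Disabling an unused camera reduces both numbers proportionally, which is one practical reason to turn off cameras you don't need.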
Every camera vendor has its own SDK and drivers. This can lead to compatibility issues. Here are some tips:
Run phosphobot with sudo to avoid permission issues.
sudo phosphobot run
Use virtual cameras to avoid compatibility issues. Virtual cameras let you record your computer screen or a specific window. This is helpful as a workaround when your camera is not detected by phosphobot.
Ask for help on the phospho Discord, and include your camera model, your operating system (macOS, Linux, Windows…) and what you have tried so far. We’ll get this working together!