Policies
What are the latest implementations in AI robotics?
The phospho junior dev kit was designed to provide a platform for learning and experimenting with AI models in robotics.
With recent advances in Large Language Models (LLMs), a new family of models has emerged: Vision-Language-Action models (VLAs).
- These models are particularly well-suited for robotics because they can act as the robot's brain.
- They take both camera images and text instructions as input and predict the next action.
- Unlike text-generating models such as ChatGPT, these models output actions, such as “move left”.
Essentially, you could tell your robot to “pick up the red ball” and it would do so.
So, what is a policy? It is a function that maps the current state of the robot to an action. It tells the robot what to do in a given situation.
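To make the definition concrete, here is a minimal sketch of a policy written as a plain Python function. The `State` fields, the `simple_policy` name, and the action strings are hypothetical and chosen only for illustration. A learned policy would replace the hand-written rule with a trained model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical robot state: positions along a single axis, in meters."""
    gripper_x: float
    target_x: float


def simple_policy(state: State, tolerance: float = 0.01) -> str:
    """Map the current state to an action (a hand-written rule, not a learned model)."""
    error = state.target_x - state.gripper_x
    if abs(error) <= tolerance:
        return "stay"
    return "move right" if error > 0 else "move left"


# The target is to the left of the gripper, so the policy returns "move left".
print(simple_policy(State(gripper_x=0.30, target_x=0.10)))
```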
ACT
ACT (Action Chunking with Transformers) is a popular imitation learning policy, with an open-source repo that lets you experiment with it.
How it works:
- You record episodes of your robot performing a task.
- The model learns a policy from this data, which the robot then follows to reproduce the task.
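The two steps above can be sketched with a toy example of imitation learning (behavior cloning). This is not ACT itself: the transformer is replaced by a linear least-squares fit, and the "robot" is a hypothetical 1-D joint that an expert drives toward zero. All names (`record_episode`, `learned_policy`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def record_episode(length=50):
    """Record one hypothetical demonstration: an expert moves a 1-D joint toward 0."""
    states, actions = [], []
    x = rng.uniform(-1.0, 1.0)
    for _ in range(length):
        a = -0.5 * x              # the expert's action for this state
        states.append([x])
        actions.append([a])
        x = x + a                 # the joint moves by the commanded amount
    return np.array(states), np.array(actions)


# Step 1: record episodes of the task (30 here, matching the order of magnitude above).
episodes = [record_episode() for _ in range(30)]
S = np.concatenate([s for s, _ in episodes])   # all recorded states
A = np.concatenate([a for _, a in episodes])   # the matching expert actions

# Step 2: fit a policy to the data. A linear fit stands in for training a transformer.
W, *_ = np.linalg.lstsq(S, A, rcond=None)


def learned_policy(state):
    """Predict an action for a new state using the fitted weights."""
    return np.asarray(state) @ W


print(learned_policy([0.4]))
```

Because the demonstrations are noise-free and linear, the fit recovers the expert's rule almost exactly. Real robot data is noisier and higher-dimensional, which is why models like ACT use a neural network instead.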
Why use ACT?
- Typically requires ~30 episodes for training.
- Training can run on an RTX 30-series GPU in less than 30 minutes.
- This is a great starting point to get your hands dirty with AI in robotics.
OpenVLA
OpenVLA is an open-source vision-language-action model, also available as a repo.
It’s a more advanced model designed for complex robotics tasks.
Key differences from ACT:
- Training such a model requires more data and computational power.
- Typically needs ~100 episodes for training.
- Training takes a few hours on an NVIDIA A100 GPU.
For more details, check out NVIDIA’s blog post on OpenVLA.
Diffusion Transformer
Diffusion transformers offer a unique approach to policy learning.
Instead of deterministically mapping a state to a single action, the model generates actions: it starts from random noise and iteratively refines it into a likely next action, based on patterns learned from data.
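The idea of generating an action from noise can be illustrated with a small toy. The sketch below uses annealed Langevin dynamics, a close relative of diffusion sampling, on a hypothetical 1-D action distribution with two equally good actions (-1 and +1). The score function is written analytically here. In a real diffusion policy it would be a learned network conditioned on images and instructions. All names and constants are assumptions for illustration.

```python
import numpy as np

MODES = np.array([-1.0, 1.0])                 # hypothetical: two equally good actions
DATA_STD = 0.05                               # spread of the demonstrations around each mode
NOISE_LEVELS = np.geomspace(2.0, 0.01, 20)    # from very noisy to almost clean


def score(a, noise_std):
    """Gradient of the log-density of the two-mode action distribution after adding noise."""
    var = DATA_STD**2 + noise_std**2
    diffs = MODES[None, :] - a[:, None]               # shape (n, 2)
    logits = -0.5 * diffs**2 / var
    logits -= logits.max(axis=1, keepdims=True)       # for numerical stability
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)           # how much each mode "claims" each sample
    return (resp * diffs).sum(axis=1) / var


def sample_actions(n, rng, steps_per_level=20):
    """Start from pure noise and refine it step by step as the noise level decreases."""
    a = rng.normal(0.0, NOISE_LEVELS[0], size=n)
    for noise_std in NOISE_LEVELS:
        step = 0.1 * noise_std**2
        for _ in range(steps_per_level):
            a = a + step * score(a, noise_std) + np.sqrt(2 * step) * rng.normal(size=n)
    return a


actions = sample_actions(200, np.random.default_rng(0))
print("share of samples near +1:", np.mean(actions > 0))
```

Each sample ends up close to one of the two valid actions rather than at their average (0), which is the main appeal of generative policies when a task has several correct solutions.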
Why consider Diffusion Transformers?
- At the time of writing, the #1 robotics model on Hugging Face is a diffusion transformer called RDT-1B.
- Fine-tuning the model on your own data is expensive, but inference is fast.
LeRobot Integration
LeRobot supports multiple policy architectures for robotics. You can find them here.
They include:
- act
- diffusion
- pi0
- tdmpc
- vqbet
Other models
- AutoRT by Google DeepMind (closed-weight model). Implementation on GitHub
- pi0 by Physical Intelligence. Weights on Hugging Face