World Models for Human Manipulation
I trained a world model that predicts how humans manipulate objects from a single image and an action sequence. Given the first frame and a 16-step action sequence, the model predicts future manipulation frames. The premise is simple: if you can accurately simulate what happens when a human performs an action, you don’t need a physical robot to learn manipulation. A policy can explore thousands of candidate action sequences in imagination, evaluating outcomes before committing to real-world execution....
Large-Scale Robotics Data Collection with UMI-Style Grippers
(Additional contributors, alphabetically: Laz Kopits, Ryan Leong, Sheel Taskar, Daryl Yang, William Zang) Building robust robotic manipulation policies requires diverse, high-quality demonstration data at scale. The challenge lies not just in collecting data, but in doing so reliably across multiple devices while maintaining temporal synchronization and handling the inevitable network disruptions of real-world deployments. This post details the system architecture behind the SF Fold dataset, a large-scale robotics manipulation dataset collected using UMI-style grippers....
VR Teleoperation for Bimanual Robots
I built a system that lets you control a bimanual robot using a Meta Quest headset. Put on the headset, grab the controllers, and a Trossen AI Mobile robot mirrors your arm movements in real time through inverse kinematics. The code is available on request. [Video: full teleoperation demo, controlling the real robot] Architecture: The system bridges consumer VR hardware to research-grade robotics through a streaming pipeline optimized for low latency....
Diffusion Transformer Implementation
A PyTorch implementation of the Diffusion Transformer (DiT) model. With OpenAI’s Sora demonstrating the power of DiTs for multidimensional tasks, DiTs represent a stable and efficient approach to any diffusion task (vision, audio, robotics, etc.). This implementation provides a clean, modular codebase to extend DiT for various generative applications. Code Repository Architecture Implementation details Firstlayer: A Python class that initializes the input processing. It includes a Patchify module to convert images into patches, a learnable positional embedding, and separate embedding layers for timesteps and class labels....
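As a rough illustration of the Patchify step mentioned above, here is a minimal NumPy sketch that splits an image into flattened, non-overlapping patches. It is not the module from the repository: the function name `patchify`, the `(C, H, W)` layout, and the use of NumPy instead of PyTorch are all assumptions made for the example, and a real DiT input layer would typically also project each patch to the model width.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split a (C, H, W) image into flattened non-overlapping patches.

    Hypothetical helper for illustration only.
    Returns an array of shape (num_patches, C * patch_size * patch_size),
    with patches ordered row by row across the image.
    """
    c, h, w = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    grid_h, grid_w = h // patch_size, w // patch_size
    # (C, grid_h, p, grid_w, p) -> (grid_h, grid_w, C, p, p)
    x = image.reshape(c, grid_h, patch_size, grid_w, patch_size)
    x = x.transpose(1, 3, 0, 2, 4)
    # Flatten each patch into one token vector.
    return x.reshape(grid_h * grid_w, c * patch_size * patch_size)
```

For a 2-channel 4x4 image and `patch_size=2`, this yields 4 tokens of length 8, which is the sequence a positional embedding would then be added to.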
GPT-2 Pretraining
A nano-GPT implementation with PyTorch Lightning. The goal is to have a clean building block for other research projects: it contains just enough manual implementation to be easily modifiable, while relying on common tools for painless, optimized training and good monitoring. It contains the code to train the model, prepare the dataset, and run evals. This page also details the results I got training on HF’s FineWeb-Edu. Code Repository...
U-Net for Segmentation
A simple PyTorch U-Net implementation. The goal is to have a clean building block that can be used in other, bigger projects (e.g. diffusion). The model is tested on a segmentation task with the MIT scene-parse-150 dataset. Code Repository Architecture The network consists of a downsampling path, a bottleneck, and an upsampling path. In the downsampling path: A sequence of DoubleConv modules is applied....
RL Policy for Legged Locomotion
Quadruped robots currently have difficulty traversing rough terrain. The goal of this project is to improve the agility and robustness of legged locomotion over complex terrain using reinforcement learning. The project consists of implementing the following paper from NVIDIA, Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning, and adapting it to the Unitree A1. Problem statement Traditionally, locomotion is achieved through optimization algorithms, especially Model Predictive Control (MPC)....
MPC Ball Balancing Robot
Control a ball on a plate using a robotic manipulator and an MPC controller. This project was carried out as part of the CS206B Robotic Manipulation and Interaction course at UC Berkeley. MPC Controller MPC is a type of feedback control algorithm in which a model of the system is used to make predictions about its future behavior. The control inputs are then computed based on the predictions, with the goal of achieving the desired system behavior....
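The predict-then-choose loop described above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the controller from the project: it uses a 1-D ball-on-plate model (a solid ball rolling without slipping), a brute-force search over a small set of candidate tilts instead of a numerical optimizer, and made-up values for the time step, horizon, and cost weights.

```python
import itertools
import math

G = 9.81                   # gravitational acceleration, m/s^2
DT = 0.05                  # control period in seconds (assumed)
HORIZON = 4                # number of predicted steps (assumed)
TILTS = (-0.10, -0.05, 0.0, 0.05, 0.10)  # candidate plate tilts, rad (assumed)
Q_VEL, R_TILT = 0.1, 0.01  # cost weights on velocity and tilt (assumed)


def step(state, tilt):
    """Predict one Euler step of a solid ball rolling on a tilted plate (1-D)."""
    pos, vel = state
    acc = (5.0 / 7.0) * G * math.sin(tilt)  # rolling without slipping
    return (pos + vel * DT, vel + acc * DT)


def rollout_cost(state, tilts):
    """Sum squared position, velocity, and tilt over the predicted horizon."""
    cost = 0.0
    for tilt in tilts:
        state = step(state, tilt)
        pos, vel = state
        cost += pos ** 2 + Q_VEL * vel ** 2 + R_TILT * tilt ** 2
    return cost


def mpc_control(state):
    """Return the first tilt of the cheapest predicted tilt sequence."""
    candidates = itertools.product(TILTS, repeat=HORIZON)
    best = min(candidates, key=lambda seq: rollout_cost(state, seq))
    return best[0]


def run_closed_loop(state, n_steps):
    """Re-plan at every step and apply only the first tilt (receding horizon)."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state, mpc_control(state))
        trajectory.append(state)
    return trajectory
```

Calling `mpc_control((0.1, 0.0))` returns a negative tilt, which pushes the ball back toward the plate center at position 0. `run_closed_loop((0.1, 0.0), 100)` returns the list of visited states, which you can inspect to see how these particular weights behave. Only the first input of each plan is applied before re-planning, which is what makes the scheme feedback control.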