Julien Rineau

👋 Hi welcome to my page

This blog documents my experiments in robotics, LLMs, vision, and other AI-related subjects. Project posts are written while the work is in progress, so some are finished and others are still awaiting completion.

World Models for Human Manipulation

This post presents a world model that predicts how humans manipulate objects from a single image and an action sequence. Given the first frame and a 16-step action sequence, the model predicts future manipulation frames. Robotics lacks a cheap eval layer. Language has perplexity, vision has classification accuracy, but evaluating a manipulation policy still requires running a real robot or trusting a physics simulator that can’t model cloth. Video diffusion models pretrained on internet data already encode useful physical priors, but they generate plausible futures, not controllable ones....

January 16, 2026

Large-Scale Robotics Data Collection with UMI-Style Grippers

(Additional contributors, alphabetically: Laz Kopits, Ryan Leong, Sheel Taskar, Daryl Yang, William Zang) Building robust robotic manipulation policies requires diverse, high-quality demonstration data at scale. The challenge lies not just in collecting data, but in doing so reliably across multiple devices while maintaining temporal synchronization and handling the inevitable network disruptions of real-world deployments. This post details the system architecture behind the SF Fold dataset, a large-scale robotics manipulation dataset collected using UMI-style grippers....

November 1, 2025

VR Teleoperation for Bimanual Robots

This system lets the user control a bimanual robot using a Meta Quest headset. The user puts on the headset, grabs the controllers, and a Trossen AI Mobile robot mirrors their arm movements in real-time through inverse kinematics. [VIDEO_REAL_ROBOT_PLACEHOLDER: Full teleoperation demo - controlling the real robot] Architecture The system bridges consumer VR hardware to research-grade robotics through a streaming pipeline optimized for low latency. The Quest runs a native Android app built with OpenXR that captures controller poses at 60 Hz....

October 1, 2024

Diffusion Transformer Implementation

A PyTorch implementation of the Diffusion Transformer (DiT) model. With OpenAI’s Sora demonstrating the power of DiTs for multidimensional tasks, they represent a stable and efficient approach to any diffusion task (vision, audio, robotics, etc.). This implementation provides a clean, modular codebase to extend DiT for various generative applications. Code Repository Architecture Implementation details Firstlayer: A Python class that initializes the input processing. It includes a Patchify module to convert images into patches, a learnable positional embedding, and separate embedding layers for timesteps and class labels....
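The input-processing layer described above can be sketched as follows. This is a minimal illustrative version, not the repository's exact code: the class name, dimensions, and the assumption of 1000 diffusion timesteps are mine.

```python
import torch
import torch.nn as nn

class FirstLayer(nn.Module):
    """DiT input processing: patchify + positional, timestep, and class embeddings."""
    def __init__(self, img_size=32, patch_size=4, in_ch=3, dim=128, num_classes=10):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Patchify: a strided conv maps each patch to a token of width `dim`
        self.patchify = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        # Learnable positional embedding, one vector per patch token
        self.pos_emb = nn.Parameter(torch.zeros(1, num_patches, dim))
        # Separate embedding layers for the diffusion timestep and the class label
        self.t_emb = nn.Embedding(1000, dim)  # assumes 1000 diffusion steps
        self.y_emb = nn.Embedding(num_classes, dim)

    def forward(self, x, t, y):
        tokens = self.patchify(x).flatten(2).transpose(1, 2) + self.pos_emb  # (B, N, dim)
        cond = self.t_emb(t) + self.y_emb(y)                                 # (B, dim)
        return tokens, cond

x = torch.randn(2, 3, 32, 32)
tokens, cond = FirstLayer()(x, torch.tensor([10, 500]), torch.tensor([0, 1]))
print(tokens.shape, cond.shape)  # torch.Size([2, 64, 128]) torch.Size([2, 128])
```

The patch tokens feed the transformer blocks, while the combined timestep/class embedding typically conditions them (e.g. via adaLN in DiT).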

May 17, 2024

GPT-2 Pretraining

A nano-GPT implementation with PyTorch Lightning. The goal is to have a clean building block for other research projects: it contains just enough manual implementation to be easily modifiable, while using common tools for painless, optimized training and nice monitoring. It contains the code to train the model, prepare the dataset, and run evals. This page also details the results I got training on HF’s FineWeb-Edu. Code Repository...

November 12, 2023

U-Net for Segmentation

A simple PyTorch U-Net implementation. The goal is to have a clean building block that can be used in other, bigger projects (e.g. diffusion). The model is tested on a segmentation task using the MIT scene-parse-150 dataset. Code Repository Architecture The network consists of a downsampling path, a bottleneck, and an upsampling path. In the downsampling path, a sequence of DoubleConv modules is applied....
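A minimal sketch of one downsampling step built from the DoubleConv module mentioned above. This is an illustrative reconstruction, not the repository's code: the channel counts and the use of BatchNorm are assumptions on my part.

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 conv -> BatchNorm -> ReLU blocks, the basic U-Net unit."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class Down(nn.Module):
    """One downsampling step: DoubleConv, then a 2x2 max-pool halves the resolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = DoubleConv(in_ch, out_ch)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.conv(x)        # kept for the skip connection to the upsampling path
        return self.pool(skip), skip

x = torch.randn(1, 3, 64, 64)
pooled, skip = Down(3, 64)(x)
print(pooled.shape, skip.shape)  # torch.Size([1, 64, 32, 32]) torch.Size([1, 64, 64, 64])
```

Stacking several `Down` blocks (doubling channels each time) gives the full downsampling path; the saved `skip` tensors are concatenated into the upsampling path.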

September 13, 2023

RL Policy for Legged Locomotion

Quadruped robots currently struggle on rough terrain. The goal of this project is to improve the agility and robustness of legged locomotion over complex terrain using reinforcement learning. The project consists of implementing the NVIDIA paper Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning and adapting it to the Unitree A1. Problem statement Traditionally, locomotion is achieved through optimization algorithms, especially Model Predictive Control (MPC)....

June 25, 2023

MPC Ball Balancing Robot

Control a ball on a plate using a robotic manipulator and an MPC controller. This project was carried out as part of the CS206B Robotic Manipulation and Interaction course at UC Berkeley. MPC Controller MPC is a type of feedback control algorithm in which a model of the system is used to make predictions about its future behavior. The control inputs are then computed based on these predictions, with the goal of achieving the desired system behavior....
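The predict-then-optimize loop described above can be sketched in a few lines. This is a toy 1-D ball-on-plate model with made-up dynamics, cost weights, and horizon, not the project's actual controller; it just shows the receding-horizon idea of rolling a model forward, scoring candidate inputs, and applying only the first one.

```python
import numpy as np

# Toy 1-D ball-on-plate: state x = [position, velocity], control u = plate tilt (rad).
# Small-angle model: tilting by u accelerates the ball by roughly g*u.
g, dt = 9.81, 0.05
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([0.0, g * dt])

def mpc_step(x, horizon=10):
    """Score constant-input candidates over the horizon; return the best first input."""
    candidates = np.linspace(-0.3, 0.3, 121)      # admissible tilt angles
    best_u, best_cost = 0.0, np.inf
    for u in candidates:
        xk, cost = x.copy(), 0.0
        for _ in range(horizon):                   # predict with the model
            xk = A @ xk + B * u
            cost += xk[0] ** 2 + 0.1 * xk[1] ** 2 + 0.01 * u ** 2
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u                                  # apply first input, then re-plan

x = np.array([0.2, 0.0])                           # ball starts 20 cm off-center
for _ in range(100):                               # receding-horizon loop
    x = A @ x + B * mpc_step(x)
print(round(x[0], 3))                              # ball driven back toward the center
```

A real implementation would solve the horizon problem with a proper optimizer (e.g. a QP for linear dynamics) instead of enumerating constant inputs, but the feedback structure is the same.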

March 25, 2023