A Tutorial on Vision-Language-Action Models for Humanoid Control

Overview

This presentation was created for the Practical English Presentation Class at the University of Tokyo. It provides a tutorial on Vision-Language-Action (VLA) models and their role in humanoid robotics.

About Slidev

This presentation was built with Slidev, a slide-making tool for developers. I love it for how easy it is to write slides in Markdown and for its minimal but beautiful design!


Acknowledgments

This presentation was created as part of the coursework for the Practical English Presentation Class at the University of Tokyo.

I would like to sincerely thank Umar Jamil for creating many excellent lectures and visuals, some of which I adapted with modifications for my presentation slides.

Shinnosuke Ono
Master’s Student

I’m interested in understanding how machines learn through the lens of representations. My research interests include representation learning, multimodal models, language models, and reinforcement learning.