A Tutorial on Vision-Language-Action Models for Humanoid Control

Jul 17, 2025

Overview

This presentation was created for the Practical English Presentation Class at the University of Tokyo. It provides a tutorial on Vision-Language-Action (VLA) models and their role in humanoid robotics.

About Slidev

This presentation was built with Slidev, a slide-making tool for engineers. I love it because slides are easy to write in Markdown and the design is minimal but beautiful!

Acknowledgments

This presentation was created as part of the coursework for the Practical English Presentation Class at the University of Tokyo.

I would like to sincerely thank Umar Jamil for creating many amazing lectures and visuals, some of which I used, with modifications, in my presentation slides.

Shinnosuke Ono

Master’s Student