Ryan Hanrui Wang received his Ph.D. in computer science from MIT, advised by Prof. Song Han. His research focuses on efficient AI computing and computer architecture. His honors include Best Paper Awards at QCE and ICML RL4RL, a Best Paper Candidate nomination at DATE, first place in the ACM Student Research Competition (SRC), Best Poster at the NSF AI Institute, and fellowships from Qualcomm and the Unitary Fund. He was also named a Rising Star by both ML & Systems and ISSCC, and was a finalist for the NVIDIA Fellowship.
Ryan's work on SpAtten (Sparse Attention), an efficient GenAI compression framework that introduced cascade KV token pruning and quantization, has been widely adopted in both academia and industry; it is the most cited HPCA paper since 2020. He also introduced Hardware-Aware Transformers for optimized GenAI deployment. His open-source repositories and models have been downloaded over one million times and integrated into platforms including IBM and the PyTorch Ecosystem. He co-founded the QuCS Forum to promote quantum computing education. He earned his B.Eng. with highest honors from Fudan University.
Efficient Generative AI
Efficiency is a fundamental enabler for scaling generative AI to real-world deployment. Our research develops principled methods to improve the performance, cost-efficiency, and scalability of large models across modalities. We focus on Transformer and LLM optimization, pioneering techniques such as SpAtten (cascade KV cache pruning and quantization), SpAtten-Chip, Hardware-Aware Transformer, Lightning-Transformer, and SpArch for system-algorithm co-design. In computer vision, we introduced AMC and APQ for automated model compression. Our work leverages pruning, quantization, neural architecture search, reinforcement learning, and compiler-hardware-algorithm co-design to build highly optimized models that run efficiently on diverse hardware, from cloud GPUs to edge devices. These innovations have been widely adopted in academia and industry, powering faster, more efficient, and more accessible foundation models. A minimal sketch of cascade KV token pruning appears below.
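As a concrete illustration, here is a minimal PyTorch sketch of cascade KV token pruning in the spirit of SpAtten: token importance is accumulated from attention probabilities, and low-scoring tokens are removed from the KV cache so later layers never revisit them. The function name, keep ratio, and toy shapes are illustrative assumptions, not the actual SpAtten implementation.

```python
import torch

def prune_kv_tokens(q, k, v, cum_score, keep_ratio=0.5):
    # q: (1, d) current query; k, v: (T, d) KV cache;
    # cum_score: (T,) token importance accumulated across earlier layers.
    attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1).squeeze(0)  # (T,)
    cum_score = cum_score + attn          # cascade: scores carry over to later layers
    n_keep = max(1, int(keep_ratio * k.shape[0]))
    keep = torch.topk(cum_score, n_keep).indices.sort().values  # keep original order
    return k[keep], v[keep], cum_score[keep]

# Toy usage: once a token is pruned here, it stays pruned in all later layers.
T, d = 16, 8
k, v, score = torch.randn(T, d), torch.randn(T, d), torch.zeros(T)
q = torch.randn(1, d)
k, v, score = prune_kv_tokens(q, k, v, score)
print(k.shape)  # torch.Size([8, 8]): half the tokens kept
```

Because the retained scores are passed to the next layer, pruning decisions compound across depth, which is what makes the pruning "cascade" rather than per-layer.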
Efficient AI Systems with Emerging Technology
We explore how emerging compute platforms, such as quantum computers and photonic accelerators, can advance AI efficiency and scalability. Our work spans AI-centric hardware-software co-design, photonic neural networks (Lightning-Transformer), hybrid quantum-classical AI acceleration (QuantumNAS, QuEst, QuantumNAT, QOC, Atomique), and system-level optimization to push the limits of model training and inference performance.
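To make "hybrid quantum-classical" concrete, the sketch below simulates a one-qubit variational circuit in PyTorch and trains its rotation angle with a classical optimizer, the basic loop underlying projects like QuantumNAS and QuantumNAT. It is a toy under simplifying assumptions (one qubit, exact statevector simulation, real amplitudes), not the projects' actual APIs.

```python
import torch

def ry(theta):
    """Differentiable single-qubit RY(theta) rotation as a 2x2 real unitary."""
    c, s = torch.cos(theta / 2), torch.sin(theta / 2)
    return torch.stack([torch.stack([c, -s]), torch.stack([s, c])])

theta = torch.tensor(0.1, requires_grad=True)  # quantum circuit parameter
opt = torch.optim.Adam([theta], lr=0.1)        # classical optimizer
target_p1 = 0.8                                # desired probability of measuring |1>

for _ in range(200):
    state = ry(theta) @ torch.tensor([1.0, 0.0])  # apply circuit to |0>
    p1 = state[1] ** 2                            # Born rule (real amplitudes)
    loss = (p1 - target_p1) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(p1))  # converges toward 0.8
```

The quantum part (circuit simulation) produces measurement probabilities, while the classical part (autograd plus Adam) updates the circuit parameters, which is the division of labor in hybrid quantum-classical training.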
Email: hanrui@mit.edu
If you work on efficient AI computing, quantum computing, or GenAI and are interested in working with me, please fill out the recruiting form.