Ryan Hanrui Wang received his Ph.D. in computer science from MIT, advised by Prof. Song Han. His research focuses on efficient AI computing and computer architecture. His honors include Best Paper Awards at QCE and ICML RL4RL, a Best Paper Candidate nomination at DATE, first place in the ACM Student Research Competition (SRC), Best Poster at the NSF AI Institute, and fellowships from Qualcomm and the Unitary Fund. He was also named a Rising Star by both ML & Systems and ISSCC, and was a finalist for the NVIDIA Fellowship.
Ryan's work on SpAtten (Sparse Attention), an efficient GenAI compression framework that introduced cascade KV token pruning and quantization, has been widely adopted in both academia and industry; it is the most cited HPCA paper since 2020. He also introduced Hardware-Aware Transformers for optimized GenAI deployment. His open-source repositories and models have been downloaded over one million times and integrated into platforms including IBM and the PyTorch Ecosystem. He co-founded the QuCS Forum to promote quantum computing education. He earned his B.Eng. with highest honors from Fudan University.
Efficient Generative AI
Efficiency is a fundamental enabler for scaling generative AI to real-world deployment. Our research develops principled methods to improve the performance, cost-efficiency, and scalability of large models across modalities. We focus on Transformer and LLM optimization, pioneering techniques such as SpAtten (cascade KV cache pruning and quantization), SpAtten-Chip, Hardware-Aware Transformer, Lightning-Transformer, and SpArch for system-algorithm co-design. In computer vision, we introduced AMC and APQ for automated model compression. Our work leverages pruning, quantization, neural architecture search, reinforcement learning, and compiler-hardware-algorithm co-design to build highly optimized models that run efficiently on diverse hardware, from cloud GPUs to edge devices. These innovations have been widely adopted in academia and industry, powering faster, more efficient, and more accessible foundation models. A minimal sketch of cascade KV token pruning appears below.
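As a concrete illustration, here is a minimal PyTorch sketch of cascade KV token pruning in the spirit of SpAtten: token importance is accumulated from attention probabilities, and low-scoring tokens are removed from the KV cache so later layers never revisit them. The function name, keep ratio, and toy shapes are illustrative assumptions, not the actual SpAtten implementation.

```python
import torch

def prune_kv_tokens(q, k, v, cum_score, keep_ratio=0.5):
    # q: (1, d) current query; k, v: (T, d) KV cache;
    # cum_score: (T,) token importance accumulated across earlier layers.
    attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1).squeeze(0)  # (T,)
    cum_score = cum_score + attn          # cascade: scores carry over to later layers
    n_keep = max(1, int(keep_ratio * k.shape[0]))
    keep = torch.topk(cum_score, n_keep).indices.sort().values  # keep original order
    return k[keep], v[keep], cum_score[keep]

# Toy usage: once a token is pruned here, it stays pruned in all later layers.
T, d = 16, 8
k, v, score = torch.randn(T, d), torch.randn(T, d), torch.zeros(T)
q = torch.randn(1, d)
k, v, score = prune_kv_tokens(q, k, v, score)
print(k.shape)  # torch.Size([8, 8]): half the tokens kept
```

Because the retained scores are passed to the next layer, pruning decisions compound across depth, which is what makes the pruning "cascade" rather than per-layer.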
Efficient AI Systems with Emerging Technology
We explore how emerging compute platforms, such as quantum computers and photonic accelerators, can advance AI efficiency and scalability. Our work spans AI-centric hardware-software co-design, photonic neural networks (Lightning-Transformer), hybrid quantum-classical AI acceleration (QuantumNAS, QuEst, QuantumNAT, QOC, Atomique), and system-level optimization to push the limits of model training and inference performance.
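To make "hybrid quantum-classical" concrete, the sketch below simulates a one-qubit variational circuit in PyTorch and trains its rotation angle with a classical optimizer, the basic loop underlying projects like QuantumNAS and QuantumNAT. It is a toy under simplifying assumptions (one qubit, exact statevector simulation, real amplitudes), not the projects' actual APIs.

```python
import torch

def ry(theta):
    """Differentiable single-qubit RY(theta) rotation as a 2x2 real unitary."""
    c, s = torch.cos(theta / 2), torch.sin(theta / 2)
    return torch.stack([torch.stack([c, -s]), torch.stack([s, c])])

theta = torch.tensor(0.1, requires_grad=True)  # quantum circuit parameter
opt = torch.optim.Adam([theta], lr=0.1)        # classical optimizer
target_p1 = 0.8                                # desired probability of measuring |1>

for _ in range(200):
    state = ry(theta) @ torch.tensor([1.0, 0.0])  # apply circuit to |0>
    p1 = state[1] ** 2                            # Born rule (real amplitudes)
    loss = (p1 - target_p1) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(p1))  # converges toward 0.8
```

The quantum part (circuit simulation) produces measurement probabilities, while the classical part (autograd plus Adam) updates the circuit parameters, which is the division of labor in hybrid quantum-classical training.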
Email: hanrui@mit.edu
If you work on efficient AI computing, quantum computing, or GenAI and are interested in working with me, please fill out the recruiting form.