Publications

You can also find my articles on my Google Scholar profile.

Conference Papers

FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection

Published in HPCA 2026, 2025

FlashFuser is a compiler framework that uses the inter-core connection on modern GPUs to expand the scale of kernel fusion for compute-intensive operators.

Recommended citation: Ziyu Huang, Yangjie Zhou, Zihan Liu, Xinhao Luo, Yijia Diao, Minyi Guo, Jidong Zhai, Yu Feng, Chen Zhang, Anbang Wu, and Jingwen Leng. (2026). "FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection." HPCA 2026.

ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive

Published in NeurIPS 2025, 2025

ClusterFusion expands operator fusion scope for LLM inference by introducing cluster-level collective primitives for on-chip communication.

Recommended citation: Xinhao Luo, Zihan Liu, Yangjie Zhou, Shihan Fang, Ziyu Huang, Yu Feng, Chen Zhang, Shixuan Sun, Zhenzhe Zheng, Jingwen Leng, and Minyi Guo. (2025). "ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive." NeurIPS 2025.

GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching

Published in ASPLOS '24, 2024

GMLake is a transparent GPU memory allocation framework that mitigates fragmentation for large-scale DNN training using virtual memory stitching.

Recommended citation: Cong Guo, Rui Zhang, Jiale Xu, Jingwen Leng, Zihan Liu, Ziyu Huang, Minyi Guo, Hao Wu, Shouren Zhao, Junping Zhao, and Ke Zhang. (2024). "GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching." ASPLOS '24.