I'm a second-year Ph.D. student in the UCLA VAST Lab, advised by Prof. Jason Cong. My research focuses on improving algorithms and hardware/system designs for efficient, high-quality AI inference. Specifically, I'm working on developing
- New language model architectures and systems of models that achieve high generation quality while remaining hardware-friendly
- New architecture design paradigms and frameworks to deploy these models efficiently on heterogeneous resources (DSPs, LUTs, SRAM, SIMD cores, etc.)
News
- 2025.05.30: I will join Microsoft Research in Redmond, WA as a student research intern this summer!
- 2025.05.12: I'm honored to be selected as the AMD HACC Outstanding Researcher!
- 2025.05.05: Our paper InTAR was presented at FCCM 2025!
- 2025.04.30: Our paper HMT was presented at NAACL 2025!
Education
University of California, Los Angeles (UCLA)
Ph.D. in Computer Science
Sept. 2023 - Present
Cumulative GPA: 4.0/4.0
University of California, Los Angeles (UCLA)
B.S. in Computer Science
B.S. in Applied Mathematics
Sept. 2019 - June 2023
Cumulative GPA: 4.0/4.0
Work Experience
Microsoft Research
Student Research Intern, AI Acceleration and Hardware Design
Jun. 2025 - Sept. 2025
Meta
Production Engineer Intern, Meta Fintech Payment
Jun. 2022 - Sept. 2022
Publications
InTAR: Inter-Task Auto-Reconfigurable Accelerator Design for High Data Volume Variation in DNNs
TL;DR: A novel accelerator design methodology for DNNs that improves on-chip data utilization with auto-reconfiguration.
Zifan He, Anderson Truong, Yingqi Cao, Jason Cong
FCCM, 2025
Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference
Zongyue Qin, Ziniu Hu, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun
ICLR, 2025
Dynamic-Width Speculative Beam Decoding for LLM Inference
Zongyue Qin, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun
AAAI, 2025
HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing
TL;DR: A novel language model architecture for efficiently processing long-context inputs, achieving generation quality comparable or superior to that of long-context LLMs with 2-57x fewer parameters and 2.5-116x less inference memory.
Zifan He, Yingqi Cao, Zongyue Qin, Neha Prakriya, Yizhou Sun, Jason Cong
NAACL, 2025
LevelST: Stream-based Accelerator for Sparse Triangular Solver
TL;DR: A novel stream-based accelerator for sparse triangular solvers on FPGAs, achieving a 2.65x speedup over GPUs.
Zifan He, Linghao Song, Robert F. Lucas, Jason Cong
FPGA, 2024
Optimization of Assisted Search Over Server-Mediated Peer-to-peer Networks
Zifan He, Leonard Kleinrock
GLOBECOM, 2022
Professional Services
- Conference Reviewer: FPGA 2024, FPGA 2025
- Journal Reviewer: IEEE Transactions on Parallel and Distributed Systems (TPDS)
- Artifact Evaluator: FPGA 2024
Awards
- AMD HACC Outstanding Researcher Award 2024
- UCLA DGE Dean's Scholar Award 2023
- UCLA Outstanding CS Undergraduate Award 2023
- UCLA Internet Research Initiative Prize 2022