I'm a second-year Ph.D. student in the UCLA VAST Lab, advised by Prof. Jason Cong. My research focuses on improving algorithms and hardware/system designs for efficient, high-quality AI inference. Specifically, I'm working on developing
- New language model architectures and systems of models that achieve high generation quality while remaining hardware-friendly
- New architecture design paradigms and frameworks to deploy these models efficiently on heterogeneous resources (DSPs, LUTs, SRAM, SIMD cores, etc.)
News
- 2025.05.30: I will join Microsoft Research in Redmond, WA as a student research intern this summer!
- 2025.05.12: I'm honored to be selected as the AMD HACC Outstanding Researcher!
- 2025.05.05: Our paper InTAR was presented at FCCM 2025!
- 2025.04.30: Our paper HMT was presented at NAACL 2025!
Education
University of California, Los Angeles (UCLA)
Ph.D. in Computer Science
Sept. 2023 - Present
Cumulative GPA: 4.0/4.0
University of California, Los Angeles (UCLA)
B.S. in Computer Science
B.S. in Applied Mathematics
Sept. 2019 - June 2023
Cumulative GPA: 4.0/4.0
Work Experience
Microsoft Research
Student Research Intern, AI Acceleration and Hardware Design
Jun. 2025 - Sept. 2025
Meta
Production Engineer Intern, Meta Fintech Payment
Jun. 2022 - Sept. 2022
Publications
InTAR: Inter-Task Auto-Reconfigurable Accelerator Design for High Data Volume Variation in DNNs
TL;DR: A novel accelerator design methodology for DNNs that improves on-chip data utilization with auto-reconfiguration.
Zifan He, Anderson Truong, Yingqi Cao, Jason Cong
FCCM, 2025
Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference
Zongyue Qin, Ziniu Hu, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun
ICLR, 2025
Dynamic-Width Speculative Beam Decoding for LLM Inference
Zongyue Qin, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun
AAAI, 2025
HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing
TL;DR: A novel language model architecture for efficiently processing long-context inputs, achieving generation quality comparable or superior to that of long-context LLMs with 2-57x fewer parameters and 2.5-116x less inference memory.
Zifan He, Yingqi Cao, Zongyue Qin, Neha Prakriya, Yizhou Sun, Jason Cong
NAACL, 2025
LevelST: Stream-based Accelerator for Sparse Triangular Solver
TL;DR: A novel stream-based accelerator for sparse triangular solvers on FPGAs, achieving a 2.65x speedup over GPUs.
Zifan He, Linghao Song, Robert F. Lucas, Jason Cong
FPGA, 2024
Optimization of Assisted Search Over Server-Mediated Peer-to-peer Networks
Zifan He, Leonard Kleinrock
GLOBECOM, 2022
Professional Services
- Conference Reviewer: FPGA 2024, FPGA 2025
- Journal Reviewer: IEEE Transactions on Parallel and Distributed Systems (TPDS)
- Artifact Evaluator: FPGA 2024
Awards
- AMD HACC Outstanding Researcher Award 2024
- UCLA DGE Dean's Scholar Award 2023
- UCLA Outstanding CS Undergraduate Award 2023
- UCLA Internet Research Initiative Prize 2022