Experience

Research Assistant
Shanghai Jiao Tong University February 2024 – June 2024
High-performance Code Generation for Vector Quantization
- We design and implement an efficient fused VQ kernel generation framework. We introduce a software abstraction called codebook cache to optimize codebook access efficiency and support the integration of VQ with various computations.
- Additionally, we provide adaptive heuristics to tailor parameter selection in our optimizations to diverse VQ configurations. Our optimizations achieve an average latency reduction of 46.13% compared to unoptimized versions.
Research Intern
Microsoft Research Asia September 2023 – December 2023
Improve LLM Quantization by Searching Configurations
- Recent quantization research focuses mainly on improving quantization methods and neglects how weight sensitivity varies across groups and layers.
- I construct a search space at multiple granularities and dynamically assign a bit width to each part of the weights based on profiled sensitivity information, to achieve a better trade-off between memory and accuracy.
- Hybrid quantization is supported by reusing sparse matrix kernels from prior work.
Research Assistant
Duke University September 2022 – December 2023
Combining Graph & Tensor Transformation with Scheduling via Compilers
- New forms of computation such as State Space Models currently require researchers to design CUDA kernels by hand.
- I explore TVM and EinNet to combine graph-level and tensor-level transformations on tensor expressions with the scheduling space, and implement the approach on top of TVM tensor expressions (te).
Search Efficient Network Architectures for LLM with Linear Complexity
- Linear-complexity architectures such as State Space Models (SSMs) are efficient in memory and computation. I used NAS-based distillation to search for linear-complexity architectures that can replace the attention mechanism in transformers, and proposed a method to measure the complexity of the generated architectures.
Research Assistant
Shanghai Jiao Tong University June 2022 – May 2023
Optimize High-Dimensional ANNS with Ray-Tracing Core
- We design a threshold-based selective algorithm that leverages sparsity and spatial locality to rapidly filter out unnecessary search points, and I propose a mapping that runs the algorithm on the RT core.
- We study how to generalize the existing kNN-to-RT-core mapping to ANN search with arbitrary dimensions, in terms of approximation method, metrics, and system design, and propose JUNO, an end-to-end high-dimensional ANN search engine with both algorithmic enhancements and an optimized hardware mapping.

Education

Ph.D.

B.E. in Computer Science