Experience

  1. Research Assistant

    Shanghai Jiao Tong University

    High-performance Code Generation for Vector Quantization

    • We design and implement an efficient fused VQ kernel generation framework. We introduce a software abstraction called codebook cache to optimize codebook access efficiency and support the integration of VQ with various computations.
    • Additionally, we provide adaptive heuristics to tailor parameter selection in our optimizations to diverse VQ configurations. Our optimizations achieve an average latency reduction of 46.13% compared to unoptimized versions.
  2. Research Intern

    Microsoft Research Asia

    Improve LLM Quantization by Searching Configurations

    • Recent quantization research focuses mainly on improving quantization methods, neglecting how weight sensitivity differs across groups and layers.
    • I construct a search space at multiple granularities and dynamically assign a bit-width to each part of the weights based on profiled sensitivity information, to achieve a better trade-off between memory and accuracy.
    • Hybrid quantization is supported using sparse matrix kernels from prior work.
  3. Research Assistant

    Duke University

    Combining Graph & Tensor Transformation with Scheduling via Compilers

    • Due to the emergence of new forms of computation like State Space Models, researchers need to manually design CUDA kernels for these computations.
    • I explore TVM and EinNet to combine graph-level and tensor-level transformations on tensor expressions with the scheduling space, and implement the approach on top of TVM's te (tensor expression) API.

    Search Efficient Network Architectures for LLM with Linear Complexity

    • Linear-complexity architectures like State Space Models (SSM) are efficient in memory and computation. I used NAS-based distillation to search for linear-complexity architectures that can substitute for the attention mechanism in transformers. I also proposed a method to measure the complexity of the generated architectures.
  4. Research Assistant

    Shanghai Jiao Tong University

    Optimize High-Dimensional ANNS with Ray-Tracing Core

    • We design a threshold-based selective algorithm that leverages sparsity and spatial locality to rapidly filter out unnecessary search points, and I propose a mapping that lets the algorithm run on the RT core.
    • We study how to generalize the existing kNN-to-RT-core mapping to ANN search with arbitrary dimensions, in terms of approximation method, metrics, and system design, and propose JUNO, an end-to-end high-dimensional ANN search engine with both algorithmic enhancements and an optimized hardware mapping.

Education

  1. Ph.D.

    University of California San Diego, CSE
  2. B.E. in Computer Science

    Shanghai Jiao Tong University, ACM Honors Class