Combining Graph & Tensor Transformation with Scheduling via Compilers
- Emerging forms of computation such as State Space Models often lack optimized kernels, so researchers must hand-write CUDA kernels for them.
- I explored TVM and EinNet to combine graph-level and tensor-level transformations of tensor expressions with the scheduling search space, and implemented the approach on top of TVM's te (tensor expression) API; a minimal sketch of the style of expression involved follows.
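As a hedged illustration of what the TVM te API looks like (not the project's actual pipeline), the sketch below defines a matrix multiplication as a tensor expression and then applies scheduling primitives such as split and reorder, two of the knobs in the scheduling space mentioned above. The shapes and tile factor are placeholders, and `te.create_schedule` assumes the classic te schedule API available in mainline TVM releases.

```python
import tvm
from tvm import te

# Placeholder shapes; real workloads would come from the model graph
N, M, K = 1024, 1024, 1024
A = te.placeholder((N, K), name="A")
B = te.placeholder((K, M), name="B")
k = te.reduce_axis((0, K), name="k")

# Tensor-level definition: C[i, j] = sum_k A[i, k] * B[k, j]
C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

# Scheduling space: loop split/reorder are examples of the transformations searched
s = te.create_schedule(C.op)
i, j = s[C].op.axis
io, ii = s[C].split(i, factor=32)  # tile factor chosen arbitrarily for illustration
jo, ji = s[C].split(j, factor=32)
s[C].reorder(io, jo, ii, ji)

# Lower and build to inspect the transformed loop nest
print(tvm.lower(s, [A, B, C], simple_mode=True))
func = tvm.build(s, [A, B, C], target="llvm")
```

Graph-level rewrites (EinNet-style transformations of the tensor expression itself) would change the compute definition, while the schedule explores loop-level choices; combining the two is the point of the project.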
Searching Efficient Network Architectures for LLMs with Linear Complexity
- Linear-complexity architectures such as State Space Models (SSMs) are efficient in both memory and computation.
- I used NAS-based distillation to search for linear-complexity architectures that can substitute for the attention mechanism in Transformers, and proposed a method to measure the complexity of the generated architectures; an illustrative candidate operator is sketched below.
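The search space itself is not described here; as one hedged example of the kind of linear-complexity attention substitute such a search space might contain, the PyTorch sketch below implements a feature-map linear attention. The module name, the elu-based feature map, and all dimensions are illustrative assumptions, not the architecture found by the search.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttention(nn.Module):
    """Illustrative O(n) attention substitute using a positive feature map
    (hypothetical candidate operator; not the searched architecture)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q = F.elu(self.q_proj(x)) + 1  # positive feature map phi(q)
        k = F.elu(self.k_proj(x)) + 1  # positive feature map phi(k)
        v = self.v_proj(x)
        # Associativity: compute K^T V once, O(n * d^2) instead of O(n^2 * d)
        kv = torch.einsum("bnd,bne->bde", k, v)
        # Normalizer: phi(q) dotted with the sum of phi(k) over the sequence
        z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)
        return torch.einsum("bnd,bde,bn->bne", q, kv, z)

x = torch.randn(2, 128, 64)
print(LinearAttention(64)(x).shape)  # torch.Size([2, 128, 64])
```

Reordering the matrix products this way is what makes the cost linear in sequence length, which is the property the proposed complexity measure would need to quantify across generated candidates.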