Yixin Song 宋奕欣

Bringing AGI to life with local GPU 💓

Ph.D., SJTU

yixinsong@sjtu.edu.cn / jeremysyx@gmail.com

I am a Ph.D. candidate at Shanghai Jiao Tong University (SJTU), focusing on Large Language Models (LLMs). I am a member of the Institute of Parallel and Distributed Systems (IPADS), where I am fortunate to be supervised by Prof. Zeyu Mi and Prof. Haibo Chen.
Prior to joining SJTU, I completed my bachelor's degree at Huazhong University of Science and Technology, where I conducted research with Prof. Jin Hai. Since October 2022, I have been working as a research intern at the Shanghai AI Laboratory.

My research focuses on algorithm and hardware co-design for efficient model serving and training systems.

CV  /  Google Scholar  /  HuggingFace  /  GitHub


News

Selected Publications

*Equal contribution.

PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Yixin Song, Zeyu Mi, Haotong Xie, Haibo Chen
SOSP, 2024
project page / arXiv / code

PowerInfer is a CPU/GPU LLM inference engine leveraging activation locality for your device.

PowerInfer-2: Fast Large Language Model Inference on a Smartphone
Zhenliang Xue*, Yixin Song*, Zeyu Mi, Le Chen, Yubin Xia, Haibo Chen
arXiv, 2024
project page / arXiv

PowerInfer-2 is a highly optimized inference framework designed specifically for smartphones.

Activation sparsity of different LLaMAs with regard to the model scale.
ReLU^2 Wins: Discovering Efficient Activation Functions for Sparse LLMs
Zhengyan Zhang*, Yixin Song*, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun
arXiv, 2024
arXiv

We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold, demonstrating that non-ReLU LLMs also exhibit sparse activation.

Bamboo: A New 7B Mistral-level Open LLM with High Sparsity
Yixin Song, Haotong Xie, Zeyu Mi, Li Ma, Haibo Chen
project, 2023
github

We introduce Bamboo-v0.1, a new 7B LLM that boasts high sparsity while delivering performance equivalent to Mistral-7B. This repo provides the details of the model.

Working Experience

Shanghai AI Laboratory

Research Intern, 2022.10 ~ Present


Miscellanea

  • Artifact Evaluation Committee: OSDI 2024.
  • Invited Talk: Sparse Inference Framework for Local LLM Serving (MindSpore AI Framework Summit 2024)
  • Award: National Scholarship in 2024.
  • Award: Outstanding Undergraduate in 2022.
  • Award: Top 1.67% Nationally, CCF Certified Software Professional (CSP) Exam ’19-’22
  • Award: Third Prize, Operating System Functional Design Track, Computer System Capability Competition ’21-’22
  • Award: First Prize, Open Source Practice Teaching Group, China Software Conference ’21-’22

The website template was adapted from Jon Barron.