I am a Ph.D. candidate at Shanghai Jiao Tong University (SJTU),
focusing on Large Language Models (LLMs).
I am a member of the Institute of Parallel and Distributed Systems (IPADS),
where I am fortunate to be supervised by Prof. Zeyu
Mi and Prof. Haibo Chen.
Prior to joining SJTU, I completed my bachelor's degree at Huazhong University of Science and
Technology,
where I conducted research with Prof. Hai Jin.
Since October 2022, I have been working as a research intern at the Shanghai AI Laboratory.
My research focuses on algorithm and hardware co-design for efficient model serving and training
systems.
We introduce a general method that defines neuron activation via
neuron output magnitudes and a tailored magnitude threshold, demonstrating that
LLMs without ReLU activations also exhibit sparse activation.
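The idea above can be sketched in a few lines: a neuron counts as "activated" when the magnitude of its output exceeds a threshold, and sparsity is the fraction of neurons below it. This is a minimal illustration, not Bamboo's actual implementation; the function names (`activation_mask`, `sparsity`), the threshold value, and the simulated outputs are all assumptions for demonstration.

```python
import numpy as np

def activation_mask(outputs, threshold):
    """Mark a neuron as activated when its output magnitude exceeds the threshold."""
    return np.abs(outputs) > threshold

def sparsity(outputs, threshold):
    """Fraction of neurons whose output magnitude stays at or below the threshold."""
    return 1.0 - activation_mask(outputs, threshold).mean()

# Simulated outputs of one layer: most neurons near zero,
# a small subset strongly activated (illustrative, not real model data).
rng = np.random.default_rng(0)
outputs = rng.normal(scale=0.1, size=4096)
outputs[:200] = rng.normal(scale=2.0, size=200)

print(f"sparsity at threshold 0.5: {sparsity(outputs, 0.5):.2f}")
```

Under this definition, sparsity depends on the chosen threshold, which is why the method tailors the threshold per model rather than fixing a universal cutoff.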
We introduce Bamboo-v0.1, a new 7B LLM that achieves high activation sparsity while
delivering performance comparable to Mistral-7B. This repo provides the details of the model.