ERC-SVD: Error-Controlled SVD for Large Language Model Compression

Haolei Bai1,2  Siyong Jian1,4  Tuo Liang3  Yu Yin3  Huan Wang1† 

2026

1Westlake University, Hangzhou, China 2Nanyang Technological University, Singapore 3Case Western Reserve University, Cleveland, USA 4Nanjing University, Nanjing, China
† Corresponding author: wanghuan@westlake.edu.cn

ENCODE Lab
Left: The Kendall correlation between the final-layer error and the average zero-shot accuracy. Middle: Layer-wise error comparison between ERC-SVD and SVD-LLM of LLaMA-2-7B under 20% compression ratio. Right: The accuracy comparison of LLaMA-2-7B compressed by different methods under 20% compression ratio on seven reasoning and understanding tasks.
The overall framework of ERC-SVD and a comparison with other methods. The last k layers are selected through partial-layer compression and compressed using residual compensation with calibration data. "Intact layers" are left unchanged, while "compressed layers" are replaced by low-rank approximations. Given an overall compression ratio Ro, ERC-SVD leaves the first (N - k) layers unchanged and applies a layer compression ratio Rl = (N × Ro)/k to the last k layers.
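The relation between the overall ratio Ro and the per-layer ratio Rl in the caption above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function name `layer_compression_ratio` is hypothetical, and it assumes that all N layers hold the same number of parameters and that the ratio counts only those layers.

```python
def layer_compression_ratio(num_layers: int, overall_ratio: float, k: int) -> float:
    """Per-layer ratio Rl = (N * Ro) / k when only the last k layers are compressed.

    Assumption (not from the source): every layer has the same parameter count,
    so removing a fraction Ro of all parameters from only k layers means
    removing a fraction (N * Ro) / k from each of those k layers.
    """
    if not 0 < k <= num_layers:
        raise ValueError("k must satisfy 0 < k <= num_layers")
    ratio = num_layers * overall_ratio / k
    if ratio >= 1.0:
        # Too few layers selected: the budget cannot be met by these layers alone.
        raise ValueError("k is too small to reach the overall compression ratio")
    return ratio


if __name__ == "__main__":
    # Illustrative values only: a 32-layer model at a 20% overall ratio.
    for k in (8, 16, 32):
        print(f"k={k:2d}  Rl={layer_compression_ratio(32, 0.2, k):.2f}")
```

With N = 32 and Ro = 20%, compressing only the last 16 layers means each of them is compressed at 40%; compressing all 32 layers recovers Rl = Ro. Smaller k concentrates the budget in fewer layers, so k cannot drop below N × Ro.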

Abstract

Large language models (LLMs) have demonstrated impressive capabilities in a wide range of downstream natural language processing tasks. Nevertheless, their considerable sizes and memory demands hinder practical deployment, underscoring the importance of developing efficient compression strategies. Singular value decomposition (SVD) decomposes a matrix into orthogonal components, enabling efficient low-rank approximation. This is particularly suitable for LLM compression, where weight matrices often exhibit significant redundancy. However, current SVD-based methods neglect the residual matrix from truncation, resulting in significant truncation loss. Additionally, compressing all layers of the model results in severe error propagation. To overcome these limitations, we propose ERC-SVD, a new post-training SVD-based LLM compression method from an error-controlled perspective. Specifically, we leverage the residual matrix generated during the truncation process to reduce truncation loss. Moreover, under a fixed overall compression ratio, we selectively compress the last few layers of the model, which mitigates error propagation and improves compressed model performance. Comprehensive evaluations on diverse LLM families and multiple benchmark datasets indicate that ERC-SVD consistently achieves superior performance over existing counterpart methods, demonstrating its practical effectiveness.
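To make the idea of a truncation residual concrete, here is a minimal NumPy sketch. It is not the ERC-SVD implementation: it uses plain, data-free SVD on a random matrix, whereas the abstract describes a method that uses calibration data. The helper names `truncated_svd` and `residual_compensated`, the matrix shape, and the ranks are all illustrative assumptions.

```python
import numpy as np


def truncated_svd(matrix: np.ndarray, rank: int) -> np.ndarray:
    """Rank-`rank` approximation of `matrix` built from its top singular triplets."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def residual_compensated(weight: np.ndarray, rank_main: int, rank_residual: int) -> np.ndarray:
    """Truncate `weight`, then add a low-rank approximation of what truncation discarded."""
    main = truncated_svd(weight, rank_main)
    residual = weight - main  # the residual matrix left over by truncation
    correction = truncated_svd(residual, rank_residual)
    return main + correction


# Hypothetical stand-in for a weight matrix; real LLM weights are much larger.
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 48))

plain = truncated_svd(weight, 12)
compensated = residual_compensated(weight, 12, 4)

err_plain = np.linalg.norm(weight - plain)        # Frobenius norm of the residual
err_comp = np.linalg.norm(weight - compensated)   # error after compensation

print(f"truncation error without compensation: {err_plain:.4f}")
print(f"truncation error with compensation:    {err_comp:.4f}")
```

Note that the compensated approximation here has rank up to 16 rather than 12, so this is not a same-budget comparison. It only shows that the residual matrix still carries recoverable information after truncation; how the rank budget is split between the two steps is a design choice that this sketch does not address.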

Layer-wise Error Comparison

Layer-wise error comparison between the original model, LLaMA-7B, and OPT-6.7B compressed by ERC-SVD with different layer selection strategies on WikiText-2. The overall compression ratio is 20%, and all layer selection strategies strictly adhere to the compression constraint.

Results

Overall performance of LLaMA-2-7B compressed by ERC-SVD and baselines under 20% to 60% compression ratios, including performance on three language modeling datasets (measured by perplexity (↓)) and zero-shot performance on seven common sense reasoning datasets (measured by individual and average accuracy (↑)). The best results are marked in bold. NaN denotes evaluation failure due to numerical instability. * refers to results derived from the original paper. - means that results are not available.
Visual question answering outputs generated by LLaVA-1.5-7B compressed using ERC-SVD under 20% compression ratio. Questions (Q) and model answers (A) are provided; correct answers are highlighted in orange to show that answer quality is retained.

BibTeX

@inproceedings{bai2026ercsvd,
  title={ERC-SVD: Error-Controlled SVD for Large Language Model Compression},
  author={Bai, Haolei and Jian, Siyong and Liang, Tuo and Yin, Yu and Wang, Huan},
  booktitle={CPAL},
  year={2026}
}
        
@article{bai2025ressvd,
  title={ResSVD: Residual Compensated SVD for Large Language Model Compression},
  author={Bai, Haolei and Jian, Siyong and Liang, Tuo and Yin, Yu and Wang, Huan},
  journal={arXiv preprint arXiv:2505.20112},
  year={2025}
}