ERC-SVD: Error-Controlled SVD for Large Language Model Compression

2026

¹Westlake University, Hangzhou, China ²Nanyang Technological University, Singapore ³Case Western Reserve University, Cleveland, USA ⁴Nanjing University, Nanjing, China

^†Corresponding author: wanghuan@westlake.edu.cn

Abstract

Large language models (LLMs) have demonstrated impressive capabilities in a wide range of downstream natural language processing tasks. Nevertheless, their considerable sizes and memory demands hinder practical deployment, underscoring the importance of developing efficient compression strategies. Singular value decomposition (SVD) decomposes a matrix into orthogonal components, enabling efficient low-rank approximation. This is particularly suitable for LLM compression, where weight matrices often exhibit significant redundancy. However, current SVD-based methods neglect the residual matrix from truncation, resulting in significant truncation loss. Additionally, compressing all layers of the model results in severe error propagation. To overcome these limitations, we propose ERC-SVD, a new post-training SVD-based LLM compression method from an error-controlled perspective. Specifically, we leverage the residual matrix generated during the truncation process to reduce truncation loss. Moreover, under a fixed overall compression ratio, we selectively compress the last few layers of the model, which mitigates error propagation and improves compressed model performance. Comprehensive evaluations on diverse LLM families and multiple benchmark datasets indicate that ERC-SVD consistently achieves superior performance over existing counterpart methods, demonstrating its practical effectiveness.
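The residual idea in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `truncated_svd` and `residual_compensated_svd`, the rank split `r1`/`r2`, and the random test matrix are all hypothetical. A weight matrix is first truncated to rank `r1`, the residual left by that truncation is then approximated at rank `r2`, and the two factor pairs are concatenated into one rank `r1 + r2` pair.

```python
import numpy as np


def truncated_svd(mat, rank):
    """Return factors (a, b) such that a @ b is the best rank-`rank`
    approximation of `mat` in the Frobenius norm."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # scale the kept left vectors by their singular values
    b = vt[:rank, :]
    return a, b


def residual_compensated_svd(weight, r1, r2):
    """Hypothetical two-stage sketch: truncate to rank r1, then approximate
    the truncation residual at rank r2 and merge both factor pairs."""
    a1, b1 = truncated_svd(weight, r1)
    residual = weight - a1 @ b1          # what the first truncation discarded
    a2, b2 = truncated_svd(residual, r2)
    # Concatenating the factors gives a single low-rank pair of rank r1 + r2.
    a = np.concatenate([a1, a2], axis=1)
    b = np.concatenate([b1, b2], axis=0)
    return a, b


# Illustrative data only: a random matrix standing in for a weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))

A1, B1 = truncated_svd(W, 12)
A, B = residual_compensated_svd(W, 12, 4)

err_stage1 = np.linalg.norm(W - A1 @ B1)  # error after the first truncation only
err_comp = np.linalg.norm(W - A @ B)      # error after adding the residual term
```

In this plain, unweighted setting the two-stage result coincides with a direct rank `r1 + r2` truncation, so the sketch shows only the mechanics of reusing the residual. Any gain over a single truncation depends on how the first-stage truncation is computed, which the abstract does not specify.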

Layer-wise Error Comparison

Layer-wise error between the original and ERC-SVD-compressed models (LLaMA-7B and OPT-6.7B) under different layer selection strategies, evaluated on WikiText-2. The overall compression ratio is 20%, and every layer selection strategy strictly adheres to this compression constraint.
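The arithmetic behind keeping the overall compression ratio fixed while compressing only some layers can be sketched as follows. This is an illustration under assumptions that the page does not state: the compression ratio is taken to mean the fraction of parameters removed, all layers are assumed to hold the same number of parameters, and the function names `per_layer_ratio` and `rank_for_ratio` are hypothetical.

```python
def per_layer_ratio(overall_ratio, num_layers, num_compressed):
    """Ratio each compressed layer must reach so that the whole model meets
    `overall_ratio`, assuming all layers have equal parameter counts."""
    if not 0 < num_compressed <= num_layers:
        raise ValueError("num_compressed must be in 1..num_layers")
    ratio = overall_ratio * num_layers / num_compressed
    if ratio >= 1.0:
        raise ValueError("too few layers selected to reach the overall ratio")
    return ratio


def rank_for_ratio(rows, cols, ratio):
    """Largest rank r whose factor pair (rows x r and r x cols) stores at most
    a (1 - ratio) fraction of the original rows * cols parameters."""
    return int((1 - ratio) * rows * cols / (rows + cols))


# Illustrative numbers: 32 equal-sized layers, 20% overall ratio,
# with only the last 16 layers compressed.
ratio_last_16 = per_layer_ratio(0.2, 32, 16)
rank_4096 = rank_for_ratio(4096, 4096, ratio_last_16)
```

Under these assumptions, compressing fewer layers forces a higher ratio on each selected layer, which is the trade-off any layer selection strategy has to balance against error propagation.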

Results

Overall performance of LLaMA-2-7B compressed by ERC-SVD and baselines at compression ratios from 20% to 60%: perplexity (↓) on three language modeling datasets and zero-shot accuracy (↑, individual and average) on seven common sense reasoning datasets. The best results are marked in bold. NaN denotes evaluation failure due to numerical instability. * marks results taken from the original paper. - means that results are not available.

Visual question answering outputs generated by LLaVA-1.5-7B compressed with ERC-SVD at a 20% compression ratio. Questions (Q) and model answers (A) are shown; correct answers are highlighted in orange to show that answer quality is retained.

BibTeX

@inproceedings{bai2026ercsvd,
  title={ERC-SVD: Error-Controlled SVD for Large Language Model Compression},
  author={Bai, Haolei and Jian, Siyong and Liang, Tuo and Yin, Yu and Wang, Huan},
  booktitle={CPAL},
  year={2026}
}

@article{bai2025ressvd,
  title={Ressvd: Residual compensated svd for large language model compression},
  author={Bai, Haolei and Jian, Siyong and Liang, Tuo and Yin, Yu and Wang, Huan},
  journal={arXiv preprint arXiv:2505.20112},
  year={2025}
}