Efficient Deep Neural Network Pruning: From Convolutional Networks to Large Language Models

Date

2025

Authors

Cheng, Hongrong

Advisors

Shi, Javen Qinfeng
Zhang, Miao (Harbin Institute of Technology, Shenzhen)
Zhang, Zhen

Type

Thesis

Abstract

Neural network pruning is a widely used model compression technique. However, achieving minimal performance degradation can make pruning resource-intensive, requiring substantial computational and memory overhead; this becomes particularly challenging when resources are limited. This thesis develops effective and resource-efficient pruning methods for convolutional neural networks and large language models. We begin with an in-depth comparative analysis of contemporary neural network pruning methods, evaluating their effectiveness and efficiency. Our study includes qualitative and quantitative assessments across various pruning settings, such as unstructured vs. structured and one-shot vs. iterative, and covers models ranging from small and medium-sized to large-scale. This comprehensive analysis provides valuable guidance for developing efficient and effective pruning methods. Next, we propose a computationally efficient channel pruning method called Influence Function based Second-Order (IFSO) pruning. Our findings show that loss changes evaluated with and without retraining differ significantly, and that relying on the latter results in worse Top-1 accuracy. To address this issue efficiently, we propose a closed-form estimator, inspired by influence functions, that approximates the true loss change of each mask alteration without retraining; it allows the importance of all channels to be evaluated simultaneously. Extensive experiments validate the effectiveness of our method. Finally, we propose MINI-LLM, a Memory-effIcieNt structured prunIng procedure for LLMs. Specifically, we develop a hybrid pruning criterion called Feature Map Sensitivity (FMS), which integrates weight magnitude, activation, and gradient for scoring, and we employ zeroth-order optimization to estimate gradients solely through forward passes. Extensive experiments validate that MINI-LLM consistently outperforms gradient-free pruning methods and matches or surpasses back-propagation gradient-based methods while maintaining GPU memory usage similar to that of gradient-free methods.
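The forward-pass-only gradient estimation mentioned above can be sketched as follows. This is an illustrative SPSA-style zeroth-order estimator in NumPy, not the thesis's implementation; `hybrid_pruning_score` is likewise a hypothetical magnitude-times-activation-times-gradient saliency, a stand-in rather than the exact FMS criterion.

```python
import numpy as np

def spsa_gradient(loss_fn, w, mu=1e-3, n_samples=8, rng=None):
    """Estimate the gradient of loss_fn at w using only forward passes
    (SPSA-style zeroth-order estimation), averaged over n_samples probes.
    Each probe perturbs all coordinates at once with a Rademacher vector,
    so the cost is two forward passes per probe regardless of dimension."""
    rng = np.random.default_rng(rng)
    grad = np.zeros_like(w)
    for _ in range(n_samples):
        delta = rng.choice([-1.0, 1.0], size=w.shape)  # random +/-1 directions
        diff = loss_fn(w + mu * delta) - loss_fn(w - mu * delta)
        grad += (diff / (2.0 * mu)) * delta  # central-difference projection
    return grad / n_samples

def hybrid_pruning_score(w, activation, grad):
    """Hypothetical hybrid saliency: |weight| * |activation| * |gradient|.
    Illustrative only; the thesis's FMS criterion may combine these terms
    differently. Lower scores would mark candidates for removal."""
    return np.abs(w) * np.abs(activation) * np.abs(grad)
```

Averaged over probes, the SPSA estimate is unbiased for smooth losses up to O(mu^2) terms, which is why it can replace back-propagated gradients in a pruning score while storing no activations for a backward pass.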

School/Discipline

School of Computer and Mathematical Sciences

Dissertation Note

Thesis (Ph.D.) -- University of Adelaide, School of Computer and Mathematical Sciences, 2025

Provenance

This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
