Efficient Deep Neural Network Pruning: From Convolutional Networks to Large Language Models
dc.contributor.advisor | Shi, Javen Qinfeng | |
dc.contributor.advisor | Zhang, Miao (Harbin Institute of Technology, Shenzhen) | |
dc.contributor.advisor | Zhang, Zhen | |
dc.contributor.author | Cheng, Hongrong | |
dc.contributor.school | School of Computer and Mathematical Sciences | |
dc.date.issued | 2025 | |
dc.description.abstract | Neural network pruning is a widely used compression technique. However, achieving minimal performance degradation can make pruning resource-intensive, requiring substantial computational and memory overhead, which becomes particularly challenging when resources are limited. This thesis develops effective and resource-efficient pruning methods for convolutional neural networks and large language models. We begin with an in-depth comparative analysis of contemporary neural network pruning methods, evaluating their effectiveness and efficiency. Our study includes qualitative and quantitative assessments across various pruning settings, such as unstructured vs. structured and one-shot vs. iterative, and covers models ranging from small and medium-sized to large-scale. This comprehensive analysis provides valuable guidance for developing efficient and effective pruning methods. Subsequently, we propose a computationally efficient channel pruning method, Influence Function based Second-Order (IFSO). Our findings reveal significant differences between loss changes measured with and without retraining, with the latter yielding worse Top-1 accuracy. To address this issue in a computationally efficient manner, we propose, inspired by influence functions, a closed-form estimator of the true loss change that requires no retraining for each mask alteration and allows simultaneous evaluation of the importance of all channels. Extensive experiments validate the effectiveness of our method. Finally, we propose MINI-LLM, a Memory-effIcieNt structured prunIng procedure for LLMs. Specifically, we develop a hybrid pruning criterion, Feature Map Sensitivity (FMS), that integrates weight magnitude, activation, and gradient for scoring, and we employ zeroth-order optimization to estimate gradients solely through forward passes. Extensive experiments validate that MINI-LLM consistently outperforms gradient-free pruning methods and matches or surpasses back-propagation gradient-based methods while maintaining GPU memory usage similar to gradient-free methods. | |
dc.description.dissertation | Thesis (Ph.D.) -- University of Adelaide, School of Computer and Mathematical Sciences, 2025 | en |
dc.identifier.uri | https://hdl.handle.net/2440/144761 | |
dc.language.iso | en | |
dc.provenance | This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals | en |
dc.subject | deep neural network pruning | |
dc.subject | influence function | |
dc.subject | model compression | |
dc.subject | convolutional networks | |
dc.subject | large language models | |
dc.title | Efficient Deep Neural Network Pruning: From Convolutional Networks to Large Language Models | |
dc.type | Thesis | en |
Files
Original bundle
- Name: Cheng2025_phD.pdf
- Size: 2.2 MB
- Format: Adobe Portable Document Format