Efficient Deep Neural Network Pruning: From Convolutional Networks to Large Language Models
dc.contributor.advisor | Shi, Javen Qinfeng | |
dc.contributor.advisor | Zhang, Miao (Harbin Institute of Technology, Shenzhen) | |
dc.contributor.advisor | Zhang, Zhen | |
dc.contributor.author | Cheng, Hongrong | |
dc.contributor.school | School of Computer and Mathematical Sciences | |
dc.date.issued | 2025 | |
dc.description.abstract | Neural network pruning is a widely used compression technique. However, achieving minimal performance degradation can make pruning resource-intensive, requiring substantial computational and memory overhead, which becomes particularly challenging when resources are limited. This thesis develops effective and resource-efficient pruning methods for convolutional neural networks and large language models. We begin with an in-depth comparative analysis of contemporary neural network pruning methods, evaluating their effectiveness and efficiency. Our study includes qualitative and quantitative assessments across various pruning settings, such as unstructured vs. structured and one-shot vs. iterative, and covers models ranging from small and medium-sized to large-scale. This comprehensive analysis provides valuable guidance for developing efficient and effective pruning methods. Subsequently, we propose a computationally efficient channel pruning method, Influence Function based Second-Order (IFSO). Our findings reveal significant differences between loss changes measured with and without retraining, with the latter yielding worse Top-1 accuracy. To address this issue in a computationally efficient manner, we propose, inspired by influence functions, a closed-form estimator of the true loss change that requires no retraining for each mask alteration and allows simultaneous evaluation of the importance of all channels. Extensive experiments validate the effectiveness of our method. Finally, we propose MINI-LLM, a Memory-effIcieNt structured prunIng procedure for LLMs. Specifically, we develop a hybrid pruning criterion, Feature Map Sensitivity (FMS), that integrates weight magnitude, activation, and gradient for scoring, and we employ zeroth-order optimization to estimate gradients solely through forward passes. Extensive experiments validate that MINI-LLM consistently outperforms gradient-free pruning methods and matches or surpasses back-propagation gradient-based methods while maintaining GPU memory usage similar to gradient-free methods. | |
dc.description.dissertation | Thesis (Ph.D.) -- University of Adelaide, School of Computer and Mathematical Sciences, 2025 | en |
dc.identifier.uri | https://hdl.handle.net/2440/144761 | |
dc.language.iso | en | |
dc.provenance | This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals | en |
dc.subject | deep neural network pruning | |
dc.subject | influence function | |
dc.subject | model compression | |
dc.subject | convolutional networks | |
dc.subject | large language models | |
dc.title | Efficient Deep Neural Network Pruning: From Convolutional Networks to Large Language Models | |
dc.type | Thesis | en |
Files
Original bundle
- Name: Cheng2025_phD.pdf
- Size: 2.2 MB
- Format: Adobe Portable Document Format