Hourglass: Enabling Efficient Split Federated Learning with Data Parallelism

dc.contributor.author: He, Q.
dc.contributor.author: Wang, K.
dc.contributor.author: Dong, Z.
dc.contributor.author: Yuan, L.
dc.contributor.author: Chen, F.
dc.contributor.author: Jin, H.
dc.contributor.author: Yang, Y.
dc.contributor.conference: Twentieth European Conference on Computer Systems (EuroSys) (30 Mar 2025 - 3 Apr 2025 : Rotterdam, Netherlands)
dc.date.issued: 2025
dc.description.abstract: Federated learning (FL) has emerged as a promising solution for training machine learning (ML) models while preserving privacy. A key challenge is the computational burden that training large models places on clients. To tackle this challenge, researchers have incorporated split learning into federated learning so that an ML model can be partitioned into two parts, one trained on clients and the other on a cloud or edge server. In current split FL systems, each client's server-side model partition is trained on an individual GPU on the fed server before model aggregation. This demands massive GPU resources and does not scale in real-world scenarios. This paper presents Hourglass, a new split FL system that trains clients' server-side model partitions on multiple GPUs with data parallelism. Unlike existing systems, which maintain one model partition per client and pass each client's intermediate features through its corresponding partition, Hourglass maintains model partitions shared by clients and routes their intermediate features through GPUs in groups based on the differences between those features. In this way, Hourglass avoids the overhead of swapping model partitions in and out of GPU memory and improves knowledge sharing between clients. Extensive experiments on four widely used public datasets demonstrate that, compared with state-of-the-art systems, Hourglass accelerates model convergence by up to 35.2x and improves model accuracy by up to 9.28%.
dc.description.statementofresponsibility: Qiang He, Kaibin Wang, Zeqian Dong, Liang Yuan, Feifei Chen, Hai Jin, Yun Yang
dc.identifier.citation: Proceedings of the 20th European Conference on Computer Systems (EuroSys 2025), 2025, pp.1317-1333
dc.identifier.doi: 10.1145/3689031.3717467
dc.identifier.isbn: 979-8-4007-1196-1
dc.identifier.uri: https://hdl.handle.net/2440/145988
dc.language.iso: en
dc.publisher: Association for Computing Machinery (ACM)
dc.publisher.place: New York, NY, USA
dc.relation.grant: http://purl.org/au-research/grants/arc/DP200102491
dc.rights: © 2025 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution 4.0 International License.
dc.source.uri: https://dl.acm.org/doi/proceedings/10.1145/3689031
dc.subject: Split federated learning; data parallelism; machine learning system
dc.title: Hourglass: Enabling Efficient Split Federated Learning with Data Parallelism
dc.type: Conference paper
pubs.publication-status: Published
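
The abstract's core idea — one shared server-side partition processing many clients' intermediate features in a batch, rather than one partition per client — can be illustrated with a minimal forward-pass sketch. This is an illustrative assumption-laden toy, not the paper's implementation: the layer shapes, client count, and NumPy linear layers are all hypothetical, and the real system's grouping of features by their differences and its multi-GPU scheduling are omitted.

```python
# Toy sketch of split FL with a SHARED server-side partition:
# clients run the first layers locally; the server concatenates all
# clients' intermediate ("smashed") features and pushes them through
# one shared partition, instead of swapping per-client partitions.
import numpy as np

rng = np.random.default_rng(0)

def client_forward(w_client, x):
    # Client-side partition: one linear layer + ReLU (hypothetical split point).
    return np.maximum(x @ w_client, 0.0)

def server_forward(w_server, smashed_batch):
    # Shared server-side partition applied to the whole grouped batch at once.
    return smashed_batch @ w_server

# Hypothetical shapes: 4 clients, 8 samples each, dims 16 -> 32 -> 10.
w_client = rng.normal(size=(16, 32))
w_server = rng.normal(size=(32, 10))   # ONE copy, shared by all clients

smashed = [client_forward(w_client, rng.normal(size=(8, 16)))
           for _ in range(4)]
batch = np.concatenate(smashed, axis=0)  # group clients' features together
logits = server_forward(w_server, batch)
print(logits.shape)  # (32, 10): 4 clients x 8 samples, single shared partition
```

Because every client's features pass through the same partition weights, no per-client partition has to be swapped in and out of GPU memory, which is the overhead the abstract says Hourglass avoids.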

Files

Original bundle
Name: hdl_145988.pdf
Size: 16.39 MB
Format: Adobe Portable Document Format
Description: Published version
