Differentially private k-means clustering with convergence guarantee

Lu, Z.; Shen, H.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/131721


Full metadata record

DC Field	Value	Language
dc.contributor.author	Lu, Z.	-
dc.contributor.author	Shen, H.	-
dc.date.issued	2021	-
dc.identifier.citation	IEEE Transactions on Dependable and Secure Computing, 2021; 18(4):1541-1552	-
dc.identifier.issn	1545-5971	-
dc.identifier.issn	1941-0018	-
dc.identifier.uri	http://hdl.handle.net/2440/131721	-
dc.description.abstract	Iterative clustering around representative points is an effective technique for clustering and helps us learn insights behind data to support various important applications. Unfortunately, it also opens security holes that may allow adversaries with some background knowledge to infer private information about individuals. To protect individual privacy against such inference attacks, preserving differential privacy for iterative clustering algorithms has been extensively studied. Existing differentially private clustering algorithms adopt the same framework to compute differentially private centroids iteratively: they run Lloyd's $k$-means algorithm to obtain the actual centroids, then perturb them with a differential privacy mechanism. These algorithms suffer from the problem of no convergence guarantee, i.e., they provide no guarantee of termination at a solution of Lloyd's algorithm within a bounded number of iterations. This problem severely impacts their clustering quality and execution efficiency. To address this problem, this article follows the same centroid updating pattern as existing work in interactive settings; however, we propose a novel framework for injecting differential privacy into the actual centroids. Specifically, to ensure convergence, we maintain the perturbed centroids of the previous iteration $t-1$ to compute a convergence zone for each cluster in the current iteration $t$, where we inject differential privacy noise. To achieve a satisfactory convergence rate, we further control the orientation of centroid movement in each cluster using two strategies: one takes the orientation of centroid movement from iteration $t-1$ to iteration $t$ (past knowledge); the other uses the additional information of the orientation from iteration $t$ to iteration $t+1$ (future knowledge).
We prove that, in the expected case, our algorithm (in both strategies) converges to a solution of Lloyd's algorithm in at most twice as many iterations as Lloyd's algorithm. Furthermore, when using both past and future knowledge, we prove that our algorithm converges to the same solution as Lloyd's algorithm (for the same initial centroids) with high probability, at the cost of slower convergence than when using only past knowledge, because computing the future knowledge requires duplicated operations in each iteration. We perform experimental evaluations on seven widely used real-world datasets. The experimental results show that our algorithm outperforms the state-of-the-art methods for interactive differentially private clustering, with guaranteed convergence and better clustering quality, whilst meeting the same differential privacy requirements.	-
dc.description.statementofresponsibility	Zhigang Lu, Hong Shen	-
dc.language.iso	en	-
dc.publisher	IEEE	-
dc.rights	© 2020 IEEE.	-
dc.source.uri	http://dx.doi.org/10.1109/tdsc.2020.3043369	-
dc.subject	Differential privacy; machine learning; k-means clustering	-
dc.title	Differentially private k-means clustering with convergence guarantee	-
dc.type	Journal article	-
dc.identifier.doi	10.1109/TDSC.2020.3043369	-
dc.relation.grant	http://purl.org/au-research/grants/arc/DP150104871	-
pubs.publication-status	Published	-
dc.identifier.orcid	Shen, H. [0000-0002-3663-6591] [0000-0003-0649-0648]	-
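The abstract describes the framework shared by existing differentially private clustering algorithms: run a Lloyd's $k$-means iteration to obtain the actual centroids, then perturb them with a differential privacy mechanism. The Python sketch below illustrates only that baseline framework, not the convergence-zone method proposed in this article. It rests on hypothetical assumptions not stated in the record: every coordinate lies in [-1, 1], neighbouring datasets differ by adding or removing one point, the Laplace mechanism is applied to per-cluster counts and coordinate sums, and the total budget epsilon is split evenly over a fixed number of iterations. The names `dp_lloyd`, `nearest`, and `laplace` are illustrative.

```python
import random
from typing import List, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def nearest(point: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest to `point` in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))


def dp_lloyd(data: Sequence[Sequence[float]], init: Sequence[Sequence[float]],
             epsilon: float, iterations: int, seed: int = 0) -> List[List[float]]:
    """Baseline 'compute actual centroids, then perturb' loop (hypothetical sketch).

    Assumptions: coordinates lie in [-1, 1]; add/remove-one neighbours, so the
    vector of cluster counts has L1 sensitivity 1 and the vector of cluster sums
    has L1 sensitivity d (clusters are disjoint); the budget is split evenly
    across iterations and, per iteration, evenly between counts and sums.
    """
    rng = random.Random(seed)
    d = len(init[0])
    eps_iter = epsilon / iterations          # sequential composition over iterations
    eps_count = eps_sum = eps_iter / 2       # split between counts and sums
    centroids = [list(c) for c in init]
    for _ in range(iterations):              # fixed count: no convergence test
        k = len(centroids)
        counts = [0] * k
        sums = [[0.0] * d for _ in range(k)]
        # Assignment step of Lloyd's algorithm, using the current noisy centroids.
        for x in data:
            j = nearest(x, centroids)
            counts[j] += 1
            for i in range(d):
                sums[j][i] += x[i]
        # Update step: perturb counts and sums with Laplace noise.
        new = []
        for j in range(k):
            noisy_count = counts[j] + laplace(1.0 / eps_count, rng)
            noisy_sum = [s + laplace(d / eps_sum, rng) for s in sums[j]]
            if noisy_count <= 1.0:
                # Too few (noisy) members: keep the previous centroid.
                new.append(centroids[j])
            else:
                # Clamp to the assumed domain [-1, 1]^d (post-processing).
                new.append([max(-1.0, min(1.0, s / noisy_count)) for s in noisy_sum])
        centroids = new
    return centroids
```

Because the number of iterations is fixed in advance rather than determined by convergence, this baseline reflects the limitation the abstract raises: nothing guarantees that the noisy centroids terminate at a solution of Lloyd's algorithm.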
Appears in Collections:	Aurora harvest 8 Computer Science publications

Files in This Item:

There are no files associated with this item.


Adelaide Research & Scholarship