Efficient discovery of de-identification policy options through a risk-utility frontier

Xia, W.; Heatherly, R.; Ding, X.; Li, J.; Malin, B.

doi:10.1145/2435349.2435357

Date

2013

Authors

Xia, W.

Heatherly, R.

Ding, X.

Li, J.

Malin, B.

Type

Conference paper

Citation

Proceedings of the ACM Conference on Data and Application Security and Privacy (CODASPY), 2013, vol.2013, pp.59-70

Conference Name

3rd ACM Conference on Data and Application Security and Privacy (18 Feb 2013 - 20 Feb 2013 : San Antonio, TX, USA)

DOI

10.1145/2435349.2435357

Abstract

Modern information technologies enable organizations to capture large quantities of person-specific data while providing routine services. Many organizations hope, or are legally required, to share such data for secondary purposes (e.g., validation of research findings) in a de-identified manner. In previous work, it was shown that de-identification policy alternatives could be modeled on a lattice, which could be searched for policies that met a prespecified risk threshold (e.g., likelihood of re-identification). However, the search was limited in several ways. First, its definition of utility was syntactic (based on the level of the lattice) rather than semantic (based on the actual changes induced in the resulting data). Second, the threshold may not be known in advance. The goal of this work is to build the optimal set of policies that trade off between privacy risk (R) and utility (U), which we refer to as an R-U frontier. To model this problem, we introduce a semantic definition of utility, based on information theory, that is compatible with the lattice representation of policies. To solve the problem, we initially build a set of policies that define a frontier. We then use a probability-guided heuristic to search the lattice for policies likely to update the frontier. To demonstrate the effectiveness of our approach, we perform an empirical analysis with the Adult dataset of the UCI Machine Learning Repository. We show that our approach can construct a frontier closer to optimal than competitive approaches by searching a smaller number of policies. In addition, we show that a frequently followed de-identification policy (i.e., the Safe Harbor standard of the HIPAA Privacy Rule) is suboptimal in comparison to the frontier discovered by our approach.
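
Illustrative note: the R-U frontier described in the abstract can be read as the set of policies that no other policy beats on both risk and utility at once. The sketch below is a minimal illustration of that idea only. It assumes each policy has already been scored with a risk value (lower is better) and a utility value (higher is better), and it scans a given list exhaustively rather than using the probability-guided lattice search from the paper. The names `Policy`, `dominates`, and `ru_frontier`, and all numeric scores, are hypothetical and not taken from the paper.

```python
from typing import List, NamedTuple


class Policy(NamedTuple):
    """A hypothetical, already-scored de-identification policy."""
    name: str
    risk: float     # e.g. re-identification risk; lower is better
    utility: float  # e.g. retained information; higher is better


def dominates(a: Policy, b: Policy) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.risk <= b.risk and a.utility >= b.utility
    strictly_better = a.risk < b.risk or a.utility > b.utility
    return no_worse and strictly_better


def ru_frontier(policies: List[Policy]) -> List[Policy]:
    """Return the non-dominated policies, sorted by increasing risk."""
    frontier: List[Policy] = []
    for p in policies:
        # Skip p if a policy already on the frontier dominates it.
        if any(dominates(q, p) for q in frontier):
            continue
        # Otherwise p updates the frontier: drop anything p dominates.
        frontier = [q for q in frontier if not dominates(p, q)]
        frontier.append(p)
    return sorted(frontier, key=lambda q: q.risk)


# Made-up scores, for illustration only.
policies = [
    Policy("A", 0.10, 0.40),
    Policy("B", 0.20, 0.70),
    Policy("C", 0.25, 0.60),  # dominated by B: higher risk, lower utility
    Policy("D", 0.05, 0.20),
    Policy("E", 0.30, 0.90),
]
frontier_names = [p.name for p in ru_frontier(policies)]
```

With these made-up scores, policy C is excluded because policy B has both lower risk and higher utility. A fixed policy that falls below such a frontier is "suboptimal" in the sense used in the abstract.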

Description

Link to a related website: http://europepmc.org/articles/pmc4266184?pdf=render, Open Access via Unpaywall

Published Version

https://doi.org/10.1145/2435349.2435357

Persistent link to this record

https://hdl.handle.net/1959.8/156759
