Privacy-Preserving Internet Traffic Publication

Date

2016

Authors

Guo, L.
Shen, H.

Type

Conference paper

Citation

Proceedings of the 15th IEEE International Conference On Trust, Security And Privacy In Computing And Communications, 10th IEEE International Conference on Big Data Science and Engineering, 13th International Conference on Embedded Software and Systems (2016 IEEE Trustcom/BigDataSE/ISPA), 2016, pp. 884-891

Statement of Responsibility

Longkun Guo, Hong Shen

Conference Name

15th IEEE International Conference On Trust, Security And Privacy In Computing And Communications, 10th IEEE International Conference on Big Data Science and Engineering, 13th International Conference on Embedded Software and Systems (2016 IEEE Trustcom/BigDataSE/ISPA) (23 Aug 2016 - 26 Aug 2016 : Tianjin, China)

Abstract

As machine learning (ML)-based traffic classification develops, Internet traffic data is published publicly to serve as test data. Although the IP addresses in such data are anonymized, the data states explicitly which records belong to the same user, and an adversary can use this information to re-identify users among the anonymized ones. The paper first gives a k-anonymity method that reduces the probability of information leak to P/k, where P is the probability of information leak without k-anonymity. Assuming the number of flows belonging to an IP address follows a normal distribution, the information loss is shown to be (μ² + σ²)/(kμ² + σ²), where μ and σ are respectively the mean and the standard deviation of the normal distribution. Random noise is then added to further reduce the probability of information leak to P/k², with an expected distortion rate of approximately 2d + log k − log|X|, where d is the number of dimensions and |X| is the number of vectors. Finally, real-world Internet traffic data is used to evaluate the utility of the anonymized traffic data. According to the experimental results, the k-anonymized, noised data can be clustered with an overall accuracy close to the state-of-the-art results for non-anonymized traffic data.
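
To show how the bounds stated in the abstract scale with k, the closed-form expressions can be evaluated directly. This is a minimal sketch, not the authors' code: the function names `leak_probability` and `information_loss` are hypothetical, and σ is taken to be the standard deviation of the per-IP flow count. The distortion-rate expression is left out because the abstract does not specify the base of the logarithm.

```python
def leak_probability(p: float, k: int, noised: bool = False) -> float:
    """Leak-probability bound from the abstract (hypothetical helper).

    p      -- probability of information leak without k-anonymity (P)
    k      -- anonymity parameter
    noised -- True for the k-anonymized data with added random noise
    Returns P/k for plain k-anonymity, or P/k^2 once noise is added.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    return p / (k * k) if noised else p / k


def information_loss(mu: float, sigma: float, k: int) -> float:
    """Closed-form information loss (mu^2 + sigma^2) / (k*mu^2 + sigma^2).

    Assumes the number of flows per IP address is normally distributed
    with mean mu and standard deviation sigma (hypothetical helper).
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    numerator = mu**2 + sigma**2
    if numerator == 0:
        raise ValueError("mu and sigma cannot both be zero")
    return numerator / (k * mu**2 + sigma**2)


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the paper.
    for k in (1, 2, 5, 10):
        print(k, leak_probability(0.5, k), leak_probability(0.5, k, noised=True),
              information_loss(100.0, 20.0, k))
```

With k = 1 the information loss is exactly 1, and when σ is small relative to μ it falls off roughly as 1/k, while the noised leak bound shrinks quadratically in k.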

Rights

© 2016 IEEE