Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/64293
Citations
Scopus Web of Science® Altmetric
?
?
Type: Conference paper
Title: Approximately detecting duplicates for probabilistic data streams over sliding windows
Author: Wang, X.
Shen, H.
Citation: Proceedings of the 3rd International Symposium and Parallel Architectures, Algorithm and Programming, 2010 / Xianchao Zhang, Wenxin Liang, Hong Shen, and Guoliang Chen (eds.): pp.263-267
Publisher: IEEE
Publisher Place: USA
Issue Date: 2010
ISBN: 9780769543123
Conference Name: International Symposium on Parallel Architectures, Algorithm and Programming (3rd : 2010 : Dalian, China)
Statement of
Responsibility: 
Xiujun Wang, Hong Shen
Abstract: Proceedings of the 3rd International Symposium and Parallel Architectures, Algorithm and Programming, Xianchao Zhang, Wenxin Liang, Hong Shen, and Guoliang Chen (eds.): pp.63-267Abstract-A probabilistic data stream S is defined as a sequence of uncertain tuples <;ti, pi >;, i = 1...∞, with the semantics that element ti occurs in the stream with probability pi ϵ (0, 1). Thus each distinct element t, which occurs in tuples of S, has an existential probability based on the tuples: <; ti = t, pi >; ϵ S. Existing duplicate detection methods for a traditional deterministic data stream can't maintain these existential probabilities for elements in S, which is important query information. In this paper, we present a novel data structure, Floating Counter Bloom Filter (FCBF), as an extension of CBF, which can maintain these existential probabilities effectively. Based on FCBF, we present an efficient algorithm to approximately detect duplicates for probabilistic data streams over sliding windows. Given a sliding window size W and floating counter number N, for any t which occurs in the past sliding window, our method outputs the accurate existential probability of t with probability 1-(1/2)ln(2)*N/W. Our experimental results on the synthetic data verify the effectiveness of our approach.
Keywords: Counting Bloom Filter
Duplicate Detection
False Positive
Floating Counter Bloom Filter
Probabilistic Data Stream
Rights: © 2010 IEEE
DOI: 10.1109/PAAP.2010.16
Published version: http://dx.doi.org/10.1109/paap.2010.16
Appears in Collections:Aurora harvest
Computer Science publications

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.