Truth discovery via exploiting implications from multi-source data

Wang, X.; Sheng, Q.; Yao, L.; Li, X.; Fang, X.; Xu, X.; Benatallah, B.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/110038


Type:	Conference paper
Title:	Truth discovery via exploiting implications from multi-source data
Author:	Wang, X.; Sheng, Q.; Yao, L.; Li, X.; Fang, X.; Xu, X.; Benatallah, B.
Citation:	Proceedings of the 25th ACM International Conference on Information and Knowledge Management, 2016, vol.24-28-October-2016, pp.861-870
Publisher:	ACM
Issue Date:	2016
ISBN:	9781450340731
Conference Name:	25th ACM International Conference on Information and Knowledge Management (CIKM) (24 Oct 2016 - 28 Oct 2016 : Indianapolis, IN)
Statement of Responsibility:	Xianzhi Wang, Quan Z. Sheng, Lina Yao, Xue Li, Xiu Susie Fang, Xiaofei Xu, and Boualem Benatallah
Abstract:	Data veracity is a grand challenge for various tasks on the Web. Since web data sources are inherently unreliable and may provide conflicting information about the same real-world entities, truth discovery is emerging as a countermeasure that resolves such conflicts by discovering, from the multi-source data, the truth that conforms to reality. A major challenge for truth discovery is that different data items may have varying numbers of true values (multi-truth), which contradicts the assumption of existing truth discovery methods that each data item has exactly one true value. In this paper, we address this challenge by exploiting and leveraging the implications from multi-source data. In particular, we exploit three types of implications, namely the implicit negative claims, the distribution of positive/negative claims, and the co-occurrence of values in sources' claims, to facilitate multi-truth discovery. We propose a probabilistic approach with improvement measures that incorporate the three implications in all stages of the truth discovery process. Specifically, incorporating the negative claims enables multi-truth discovery, considering the distribution of positive/negative claims relieves truth discovery from the impact of sources' behavioral features in specific datasets, and considering values' co-occurrence relationships compensates for the information lost when each value in the same claim is evaluated individually. Experimental results on three real-world datasets demonstrate the effectiveness of our approach.
Keywords:	Truth discovery; multiple true values; probabilistic model; imbalanced claims
Rights:	© 2016 ACM
DOI:	10.1145/2983323.2983791
Published version:	http://dx.doi.org/10.1145/2983323.2983791
Appears in Collections:	Aurora harvest 3 Computer Science publications
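
One plausible reading of the "implicit negative claims" mentioned in the abstract is that when a source lists values for a data item but omits a value that other sources claim, the omission counts as a vote against that value, which lets an item end up with several true values. The following is a minimal, hypothetical Python sketch of that idea only, assuming a simple reliability-weighted voting scheme. It is not the probabilistic model proposed in the paper and ignores the claim-distribution and co-occurrence implications; the input format, the function names `implicit_claims` and `discover_truths`, and the initial reliability `prior=0.8` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical input format: (source, data_item) -> set of values the source claims.
Claims = dict[tuple[str, str], set[str]]


def implicit_claims(claims: Claims) -> dict[tuple[str, str], tuple[set[str], set[str]]]:
    """Map each (item, value) to (positive_sources, negative_sources).

    A source that covers an item but does not list a candidate value
    is counted as an implicit negative claim on that value.
    """
    candidates: dict[str, set[str]] = defaultdict(set)
    for (_, item), values in claims.items():
        candidates[item] |= values

    evidence: dict[tuple[str, str], tuple[set[str], set[str]]] = {}
    for (src, item), values in claims.items():
        for value in candidates[item]:
            pos, neg = evidence.setdefault((item, value), (set(), set()))
            (pos if value in values else neg).add(src)
    return evidence


def discover_truths(claims: Claims, iterations: int = 10, prior: float = 0.8) -> set[tuple[str, str]]:
    """Simplified reliability-weighted voting over positive and negative claims.

    Each value is judged independently, so an item may keep several true values.
    """
    evidence = implicit_claims(claims)
    reliability = {src: prior for src, _ in claims}
    prob: dict[tuple[str, str], float] = {}
    for _ in range(iterations):
        # Step 1: probability a value is true = reliability-weighted share of positive votes.
        for key, (pos, neg) in evidence.items():
            w_pos = sum(reliability[s] for s in pos)
            w_neg = sum(reliability[s] for s in neg)
            total = w_pos + w_neg
            prob[key] = w_pos / total if total > 0 else 0.5
        # Step 2: source reliability = mean agreement of its positive and
        # negative claims with the current value probabilities.
        for src in reliability:
            scores = [prob[k] if src in pos else 1.0 - prob[k]
                      for k, (pos, neg) in evidence.items()
                      if src in pos or src in neg]
            if scores:
                reliability[src] = sum(scores) / len(scores)
    return {key for key, p in prob.items() if p >= 0.5}


# Usage: three sources agree on Alice, two also list Bob, one outlier lists Carol.
example = {
    ("s1", "book1"): {"Alice", "Bob"},
    ("s2", "book1"): {"Alice", "Bob"},
    ("s3", "book1"): {"Alice"},
    ("s4", "book1"): {"Carol"},
}
print(sorted(discover_truths(example)))  # [('book1', 'Alice'), ('book1', 'Bob')]
```

In this toy example the item keeps two true values (Alice and Bob), which a single-truth method that picks one value per item could not express; s3's omission of Bob and s4's omission of Alice and Bob are what the sketch treats as negative claims.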

Files in This Item:

File	Description	Size	Format
RA_hdl_110038.pdf	Restricted Access	1.14 MB	Adobe PDF


Adelaide Research & Scholarship