Certus: An effective entity resolution approach with graph differential dependencies (GDDs)

dc.contributor.author: Kwashie, S.
dc.contributor.author: Liu, J.
dc.contributor.author: Li, J.
dc.contributor.author: Liu, L.
dc.contributor.author: Stumptner, M.
dc.contributor.author: Yang, L.
dc.date.issued: 2018
dc.description.abstract: Entity resolution (ER) is the problem of accurately identifying multiple, differing, and possibly contradicting representations of unique real-world entities in data. It is a challenging and fundamental task in data cleansing and data integration. In this work, we propose graph differential dependencies (GDDs) as an extension of the recently developed graph entity dependencies (which are formal constraints for graph data) to enable approximate matching of values. Furthermore, we investigate a special discovery of GDDs for ER by designing an algorithm for generating a non-redundant set of GDDs in labelled data. Then, we develop an effective ER technique, Certus, that employs the learned GDDs for improving the accuracy of ER results. We perform extensive empirical evaluation of our proposals on five real-world ER benchmark datasets and a proprietary database to test their effectiveness and efficiency. The results from the experiments show the discovery algorithm and Certus are efficient; and more importantly, GDDs significantly improve the precision of ER without considerable trade-off of recall.
dc.identifier.citation: Proceedings of the VLDB Endowment, 2018; 12(6):653-666
dc.identifier.doi: 10.14778/3311880.3311883
dc.identifier.issn: 2150-8097
dc.identifier.orcid: Liu, J. [0000-0002-0794-0404]
dc.identifier.uri: https://hdl.handle.net/11541.2/137224
dc.language.iso: en
dc.publisher: ASSOC COMPUTING MACHINERY
dc.relation.funding: Data to Decisions CRC (D2D CRC) DC160031 D2D CRC ILE Entity Linking & Resolution Project
dc.rights: Copyright 2019 The author(s). This work is licensed under the Creative Commons Attribution Non Commercial-No Derivatives 4.0 International License. To view a copy of this license, visit (http://creativecommons.org/licenses/by-nc-nd/4.0/)
dc.source.uri: https://doi.org/10.14778/3311880.3311883
dc.title: Certus: An effective entity resolution approach with graph differential dependencies (GDDs)
dc.type: Journal article
pubs.publication-status: Published
ror.fileinfo: 12174963050001831 13174963040001831 p653-kwashie.pdf
ror.mmsid: 9916284676701831

Files

Original bundle
Name: 9916284676701831_12174963050001831_p653-kwashie.pdf
Size: 846.33 KB
Format: Adobe Portable Document Format
Description: Published version
