Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/123956
Type: Journal article
Title: An efficient method for high quality and cohesive topical phrase mining
Author: Li, B.
Yang, X.
Zhou, R.
Wang, B.
Liu, C.
Zhang, Y.
Citation: IEEE Transactions on Knowledge and Data Engineering, 2019; 31(1):120-137
Publisher: IEEE
Issue Date: 2019
ISSN: 1041-4347
1558-2191
Statement of Responsibility: Bing Li, Xiaochun Yang, Rui Zhou, Bin Wang, Chengfei Liu, and Yanchun Zhang
Abstract: A phrase is a natural, meaningful, and essential semantic unit. In topic modeling, visualizing phrases for individual topics is an effective way to explore and understand unstructured text corpora. However, in terms of phrase quality and topical cohesion, the results of existing approaches leave room for improvement. The process of topical phrase mining usually has two parts: phrase mining and topic modeling. For phrase mining, existing approaches often suffer from order-sensitivity and inappropriate-segmentation problems, which lead them to extract low-quality phrases. For topic modeling, traditional topic models do not fully consider the constraints induced by phrases, which may weaken topical cohesion. Moreover, existing approaches often lose domain terminology because they neglect the impact of the domain-level topical distribution. In this paper, we propose an efficient method for high-quality and cohesive topical phrase mining. A high-quality phrase should satisfy the criteria of frequency, phraseness, completeness, and appropriateness. We integrate a quality-guaranteed phrase mining method, a novel topic model incorporating the constraints of phrases, and a novel document clustering method into an iterative framework to improve both phrase quality and topical cohesion. We also describe algorithmic designs that execute these methods efficiently. Empirical evaluation demonstrates that our method outperforms state-of-the-art methods in both interpretability and efficiency.
Keywords: Topical phrase mining; phrase mining; chunking; topic model
Rights: © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
DOI: 10.1109/TKDE.2018.2823758
Grant ID: http://purl.org/au-research/grants/arc/DP160102412
http://purl.org/au-research/grants/arc/DP170104747
61532021
61572122
U1736104
Appears in Collections: Aurora harvest 4
Computer Science publications

Files in This Item:
There are no files associated with this item.

