|Title:||A topic model based on Poisson decomposition|
|Citation:||Proceedings of the ACM Conference on Information and Knowledge Management (CIKM 2017), 2017 / vol.Part F131841, pp.1489-1498|
|Publisher:||Association for Computing Machinery|
|Publisher Place:||New York, NY, USA|
|Conference Name:||ACM Conference on Information and Knowledge Management (CIKM 2017) (06 Nov 2017 - 10 Nov 2017 : Singapore, SINGAPORE)|
|Author:||Haixin Jiang, Rui Zhou, Limeng Zhang, Hua Wang, Yanchun Zhang|
|Abstract:||Determining appropriate statistical distributions for modeling text corpora is important for accurate estimation of numerical characteristics. Based on the validity of a statistical test of the claim that the data conform to a Poisson distribution, we propose the Poisson decomposition model (PDM), a statistical model for count data of text corpora that can directly capture each document’s multidimensional numerical characteristics on topics. In PDM, each topic is represented as a parameter vector of a multidimensional Poisson distribution, which can be easily normalized to multinomial term probabilities, and each document is represented as measurements on topics and thereby reduced to a measurement vector on topics. We use gradient descent methods and a sampling algorithm for parameter estimation. We carry out extensive experiments on the topics produced by our models. The results demonstrate that, compared to PLSI and LDA, our approach can extract more coherent topics and is competitive in document clustering using the PDM-based features.|
|Keywords:||Topic model; Poisson decomposition; statistical testing; text classification; topic coherence|
|Description:||Session 8B: Text Analysis|
|Rights:||© 2017 Association for Computing Machinery.|
|Appears in Collections:||Computer Science publications|