Mining source code topics through topic model and words embedding

Zhang, W.; Sheng, Q.; Abebe, E.; Ali Babar, M.; Zhou, A.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/108612


Full metadata record

DC Field	Value	Language
dc.contributor.author	Zhang, W.	-
dc.contributor.author	Sheng, Q.	-
dc.contributor.author	Abebe, E.	-
dc.contributor.author	Ali Babar, M.	-
dc.contributor.author	Zhou, A.	-
dc.date.issued	2016	-
dc.identifier.citation	Lecture Notes in Artificial Intelligence, 2016, vol.10086 LNAI, pp.664-676	-
dc.identifier.isbn	9783319495859	-
dc.identifier.issn	0302-9743	-
dc.identifier.issn	1611-3349	-
dc.identifier.uri	http://hdl.handle.net/2440/108612	-
dc.description	LNCS, volume 10086	-
dc.description.abstract	Developers nowadays can leverage existing systems to build their own applications. However, a lack of documentation hinders software system reuse. We examine the problem of mining topics (i.e., topic extraction) from source code, which can help developers comprehend software systems. We propose a topic extraction method, Embedded Topic Extraction (EmbTE), that uses word embedding techniques to capture word semantics, which have not previously been considered when mining topics from source code. We also adopt Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) to extract topics from source code. Moreover, we propose an automated term selection algorithm that identifies the source code terms that contribute most to the topic extraction task. Empirical studies on GitHub (https://github.com/) Java projects show that EmbTE outperforms the other methods by producing more coherent topics. The results also indicate that method names, method comments, class names, and class comments are the types of terms that contribute most to source code topic extraction.	-
dc.description.statementofresponsibility	Wei Emma Zhang, Quan Z. Sheng, Ermyas Abebe, M. Ali Babar, and Andi Zhou	-
dc.language.iso	en	-
dc.publisher	Springer	-
dc.rights	© Springer International Publishing AG 2016	-
dc.source.uri	http://dx.doi.org/10.1007/978-3-319-49586-6_47	-
dc.subject	Source code mining; Topic model; Word embedding	-
dc.title	Mining source code topics through topic model and words embedding	-
dc.type	Conference paper	-
dc.contributor.conference	International Conference on Advanced Data Mining and Applications (ADMA) (12 Dec 2016 - 15 Dec 2016 : Gold Coast, Qld)	-
dc.identifier.doi	10.1007/978-3-319-49586-6_47	-
pubs.publication-status	Published	-
dc.identifier.orcid	Zhang, W. [0000-0002-0406-5974]	-
Appears in Collections:	Aurora harvest 8 Computer Science publications

Files in This Item:

File	Description	Size	Format
RA_hdl_108612.pdf	Restricted Access	473.27 kB	Adobe PDF


Adelaide Research & Scholarship