Challenges in analyzing software documentation in Portuguese

Treude, C.; Prolo, C.; Filho, F.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/109373

Type:	Conference paper
Title:	Challenges in analyzing software documentation in Portuguese
Author:	Treude, C.; Prolo, C.; Filho, F.
Citation:	Proceedings of the 29th Brazilian Symposium on Software Engineering, 2015, pp. 179-184
Publisher:	IEEE
Issue Date:	2015
ISBN:	9781467392723
Conference Name:	IEEE Brazilian Symposium on Software Engineering (SBES) (21 Sep 2015 - 26 Sep 2015 : Belo Horizonte, Brazil)
Statement of Responsibility:	Christoph Treude, Carlos A. Prolo, Fernando Figueira Filho
Abstract:	Many tools that automatically analyze, summarize, or transform software artifacts rely on natural language processing tooling for the interpretation of natural language text produced by software developers, such as documentation, code comments, commit messages, or bug reports. Processing natural language text produced by software developers is challenging because of unique characteristics not found in other texts, such as the presence of code terms and the systematic use of incomplete sentences. In addition, texts produced by Portuguese-speaking developers mix languages since many keywords and programming concepts are referred to by their English name. In this paper, we provide empirical insights into the challenges of analyzing software artifacts written in Portuguese. We analyzed 100 question titles from the Portuguese version of Stack Overflow with two Portuguese language tools and identified multiple problems which resulted in very few sentences being tagged completely correctly. Based on these results, we propose heuristics to improve the analysis of natural language text produced by software developers in Portuguese.
Keywords:	Documentation; natural language processing.
Rights:	© 2015 IEEE
DOI:	10.1109/SBES.2015.27
Published version:	http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=7324164
Appears in Collections:	Aurora harvest 3 Computer Science publications

Adelaide Research & Scholarship