Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/129622
Type: Thesis
Title: Analysis of World War One Diaries using Natural Language Processing
Author: Dennis-Henderson, Ashley Grace
Issue Date: 2020
School/Discipline: School of Mathematical Sciences
Abstract: World War I was a significant event in Australian history and as such it has been extensively researched. The analysis of relevant primary sources has included the close reading of war diaries. Close reading involves reading the diaries to understand what the soldiers went through. With the advancement of computational techniques, we now have the ability to analyse large volumes of text, and this concept is known as distant reading. This project focuses on 557 Australian World War I diaries collected and transcribed by the State Library of New South Wales, and aims to use distant reading methods to determine what the soldiers wrote about and how they felt over the course of the war. In order to perform our analysis over time, we first needed to extract dates from the diaries. This was done using regular expressions. However, some problems were found in the extracted dates, including missing data such as the month or year, dates which were written incorrectly, and dates that were actually within the text referring to events that happened rather than the date the diaries were written. Hence, an optimisation program was formed to fix these problems and give more accurate information about when the diaries were written. We then considered several types of analysis to understand what the soldiers wrote about, including word frequencies, tf-idf (term frequency - inverse document frequency), and topic modelling. It was found that whilst all three of these techniques gave results that would be expected when considering World War I diaries, they also showed different aspects of the war. In particular, through considering the tf-idf results for 1916 we see many words regarding places and battles in the Middle East. However, when considering topic modelling for this time period we see more words regarding Europe. Sentiment analysis, more specifically dictionary-based methods, was then used to understand the emotions of the soldiers over time. 
Using our dictionaries, each month was given an overall sentiment score from -1 (very negative) to +1 (very positive). It was found that the average sentiment of the diaries ranged between 0 and 0.2. We were also able to compare this to our topic modelling results to determine which topics corresponded to peaks and dips in our sentiment.
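The date extraction and dictionary-based scoring described in the abstract could be sketched roughly as below. This is an illustrative Python sketch only, not the thesis's implementation. The regular expression, the toy lexicon `LEXICON`, and all function names are assumptions. The scoring rule (the mean score of matched words) is just one simple way to obtain a value between -1 and +1.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon; the dictionaries used in the thesis are not
# specified in the abstract.
LEXICON = {"good": 1.0, "happy": 1.0, "fine": 0.5,
           "bad": -1.0, "killed": -1.0, "wounded": -0.5}

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

# Assumed pattern: matches dates such as "25 April 1915" or "3rd May".
# The year is optional, which mirrors the "missing year" problem in the abstract.
DATE_RE = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+(" + "|".join(MONTHS) + r")(?:\s+(\d{4}))?\b",
    re.IGNORECASE,
)


def extract_dates(text):
    """Return (day, month, year) tuples; year is None when it is not written."""
    dates = []
    for day, month, year in DATE_RE.findall(text):
        month_number = MONTHS.index(month.lower()) + 1
        dates.append((int(day), month_number, int(year) if year else None))
    return dates


def sentiment_score(text):
    """Mean lexicon score of the matched words, in [-1, 1]; 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def monthly_sentiment(entries):
    """Group (year, month)-keyed diary texts and score each month as a whole."""
    by_month = defaultdict(list)
    for key, text in entries:
        by_month[key].append(text)
    return {key: sentiment_score(" ".join(texts)) for key, texts in by_month.items()}
```

A pattern like this also matches dates that merely appear in the narrative, and it leaves missing years unresolved. Those are the kinds of problems the abstract says a later optimisation step was needed to fix.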
Advisor: Mitchell, Lewis; Roughan, Matthew; Tuke, Simon
Dissertation Note: Thesis (MPhil) -- University of Adelaide, School of Mathematical Sciences, 2020
Keywords: World War 1; sentiment analysis; topic modeling; diaries; digital humanities; distant reading
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
Appears in Collections: Research Theses

Files in This Item:
File: Dennis-Henderson2020_MPhil.pdf
Size: 6.03 MB
Format: Adobe PDF

