Shen, Chunhua; Wu, Qi; Wang, Hu
2022-01-21
2022-01-21
2021
https://hdl.handle.net/2440/134156

Deep Reinforcement Learning (DRL) is a set of algorithms for sequential decision problems. It mimics human learning behaviour through trial and error. DRL has developed steadily over decades of research and has a wide range of applications involving multi-modality data, such as robot navigation, traffic signal control and self-driving vehicles. Recently, many impressive results have been achieved. However, it is not trivial to design a good state space, action space and reward function for a reinforcement learning model to tackle a specific task. In this thesis, we propose a series of novel techniques to cope with challenging multi-modality data based on the idea of deep reinforcement learning. We first propose a DRL solution to alleviate the traffic congestion problem in urban areas. To achieve this, we introduce a DRL model with an edge-weighted graph convolutional encoder and a unified structure decoder, together with the corresponding state, action and reward function settings. Additionally, we propose a new dataset and settings, including synthetic and real traffic data in more complex scenarios. Behaviour cloning techniques directly copy expert demonstrations and therefore inevitably perform worse in unseen environments due to error accumulation. Reinforcement learning based methods generalise better, but designing a suitable reward function for a specific task is expensive. We thus propose a generic soft expert reward learning strategy that distils the expert’s behaviour directly, and successfully apply it to Vision-and-Language Navigation. The experimental results show the effectiveness of the model.
Finally, inspired by the exploration and expert behaviour distillation of deep reinforcement learning, we propose a generic random network prediction strategy for anomaly detection and clustering tasks in an unsupervised manner. We propose to learn important features without using any labelled data by training neural networks to predict data distances in a randomly projected space.

en
Deep learning; reinforcement learning; multi-modality
Multi-modality Data Analysis Using Deep Reinforcement Learning
Thesis
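
The last idea in the abstract, learning features by predicting data distances in a randomly projected space, can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes NumPy, synthetic data, squared Euclidean distances, and a single learnable linear map standing in for the neural network. The function names `random_distance_targets` and `train_distance_predictor` are hypothetical.

```python
import numpy as np


def random_distance_targets(X, pairs, out_dim, rng):
    """Squared distances between paired points after a fixed random projection.

    Assumption: a Gaussian random matrix is used as the projection.
    """
    W = rng.normal(size=(X.shape[1], out_dim)) / np.sqrt(out_dim)
    Z = X @ W
    diff = Z[pairs[:, 0]] - Z[pairs[:, 1]]
    return np.sum(diff ** 2, axis=1)


def train_distance_predictor(X, pairs, targets, rep_dim, rng, steps=500, lr=2e-4):
    """Fit a linear map A so that ||A^T (x_i - x_j)||^2 matches the targets.

    Assumption: a linear map replaces the neural network for brevity.
    No labels are used; supervision comes only from the random projection.
    """
    A = rng.normal(size=(X.shape[1], rep_dim)) * 0.1
    D = X[pairs[:, 0]] - X[pairs[:, 1]]      # pairwise differences, shape (P, d)
    losses = []
    for _ in range(steps):
        H = D @ A                            # differences in the learned space
        err = np.sum(H ** 2, axis=1) - targets
        losses.append(float(np.mean(err ** 2)))
        # Gradient of the mean squared error with respect to A.
        grad = 4.0 * (D * err[:, None]).T @ H / len(pairs)
        A -= lr * grad
    return A, losses


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                # synthetic, unlabelled data
pairs = rng.integers(0, len(X), size=(500, 2))
targets = random_distance_targets(X, pairs, out_dim=4, rng=rng)
A, losses = train_distance_predictor(X, pairs, targets, rep_dim=4, rng=rng)
features = X @ A                             # learned representation
print(f"loss: {losses[0]:.2f} -> {losses[-1]:.2f}")
```

The learned `features` could then be passed to a downstream anomaly scorer or clustering routine. How the thesis uses the representation for those tasks is not stated in the abstract, so that step is left out here.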