Acoustic Scene Classification Using Deep Residual Networks with Late Fusion of Separated High and Low Frequency Paths
Date
2020
Authors
McDonnell, M.D.
Gao, W.
Type
Conference paper
Citation
Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing / sponsored by the Institute of Electrical and Electronics Engineers Signal Processing Society. ICASSP (Conference), 2020, vol.2020-May, pp.141-145
Conference Name
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (4 May 2020 - 8 May 2020 : Barcelona, Spain)
Abstract
We investigate the problem of acoustic scene classification, using a deep residual network applied to log-mel spectrograms complemented by log-mel deltas and delta-deltas. We design the network to take into account that the temporal and frequency axes in spectrograms represent fundamentally different information. In particular, we use two pathways in the residual network, one for high frequencies and one for low frequencies, which are fused just two convolutional layers prior to the network output. We conduct experiments using two public 2019 DCASE datasets for acoustic scene classification: the first with binaural audio inputs recorded by a single device, and the second with single-channel audio inputs recorded through various devices. We show that the performance of our models is significantly enhanced by the use of log-mel deltas, and that overall our approach is capable of training strong single models, without use of any supplementary data from outside the official challenge dataset, with excellent generalization to unknown devices. In particular, our approach achieved second place in 2019 DCASE Task 1B (0.4% behind the winning entry), and the best Task 1B evaluation results (by a large margin of over 5%) on test data from a device not used to record any training data.
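The input preparation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names are hypothetical, the deltas are approximated with plain finite differences along the time axis, the low/high split is assumed to fall at the midpoint of the mel axis (with bin 0 as the lowest frequency), and the residual pathways themselves are omitted. Only the splitting and rejoining of the frequency axis is shown.

```python
import numpy as np


def add_deltas(log_mel: np.ndarray) -> np.ndarray:
    """Stack a log-mel spectrogram with its deltas and delta-deltas.

    log_mel has shape (n_mels, n_frames). Deltas are finite differences
    along the time axis (an assumption of this sketch, not the paper's
    stated method). Returns an array of shape (n_mels, n_frames, 3).
    """
    delta = np.gradient(log_mel, axis=1)   # first-order change over time
    delta2 = np.gradient(delta, axis=1)    # change of the change over time
    return np.stack([log_mel, delta, delta2], axis=-1)


def split_frequency_paths(features: np.ndarray):
    """Split features into low- and high-frequency halves.

    Assumes axis 0 is the mel axis with bin 0 as the lowest frequency,
    and that the split point is the midpoint (hypothetical choice).
    """
    half = features.shape[0] // 2
    return features[:half], features[half:]


def late_fusion(low_out: np.ndarray, high_out: np.ndarray) -> np.ndarray:
    """Rejoin the two pathway outputs along the frequency axis."""
    return np.concatenate([low_out, high_out], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.standard_normal((128, 10))      # stand-in log-mel spectrogram
    feats = add_deltas(spec)
    low, high = split_frequency_paths(feats)
    # In a real model each half would pass through its own residual stack
    # before fusion; here the halves are rejoined unchanged.
    fused = late_fusion(low, high)
    print(feats.shape, low.shape, high.shape, fused.shape)
```

In a full model, each half would be processed by its own stack of residual blocks, and the concatenation would happen shortly before the output layers, as the abstract describes.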
Rights
Copyright 2020 IEEE