Voice activity classification using beamformer-output-ratio
Date
2012
Authors
Tran, T.N.
Cowley, W.
Pollok, A.
Type
Conference paper
Citation
2012 Australian Communications Theory Workshop (AusCTW), 2012, pp.96-101
Conference Name
2012 Australian Communications Theory Workshop (AusCTW) (30 Jan 2012 - 2 Feb 2012 : Wellington, NZ)
Abstract
In a conversation between multiple speakers, each person speaks at different times, so the active speakers in any given speech segment are unknown. However, adaptive beamforming techniques such as minimum variance distortionless response beamforming and adaptive blocking beamforming (AB) require the voice activity (VA) of the speakers of interest to be identified. Considering two speakers, this paper addresses a voice activity classification (VAC) problem that focuses on identifying the active speaker(s) in each speech segment. The proposed method is based on a new concept, the beamformer-output-ratio (BOR), which is calculated from the outputs of two different beamformers steered towards the two speakers. The first part of the paper defines the BOR, describes the BOR-based VAC method and presents simulation results. The simulations are based on real recordings and show high classification accuracy. The second part of the paper presents theoretical results for the BOR of delay-and-sum (DS) beamforming, including BOR formulas derived for different environments and the behaviour of the BOR in relation to parameter errors.
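The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and this record does not give the paper's exact BOR definition. The sketch assumes that the BOR is the ratio of the mean output powers of two delay-and-sum beamformers, a narrowband signal model, a uniform linear array with half-wavelength spacing, and a symmetric decision threshold of 3 dB. The function names (`steering_vector`, `ds_output`, `bor_db`, `classify`) and all parameter values are hypothetical.

```python
import numpy as np

def steering_vector(theta_deg, n_mics=8, spacing_wl=0.5):
    """Narrowband steering vector of a uniform linear array (assumed geometry)."""
    m = np.arange(n_mics)
    return np.exp(-2j * np.pi * spacing_wl * m * np.sin(np.deg2rad(theta_deg)))

def ds_output(x, theta_deg):
    """Delay-and-sum output for snapshots x of shape (n_mics, n_samples)."""
    a = steering_vector(theta_deg, n_mics=x.shape[0])
    # Phase-align the channels towards theta_deg, then average them.
    return a.conj() @ x / x.shape[0]

def bor_db(x, theta1_deg, theta2_deg):
    """Assumed BOR: output power of beamformer 1 over beamformer 2, in dB."""
    p1 = np.mean(np.abs(ds_output(x, theta1_deg)) ** 2)
    p2 = np.mean(np.abs(ds_output(x, theta2_deg)) ** 2)
    return 10.0 * np.log10(p1 / p2)

def classify(bor, threshold_db=3.0):
    """Map a BOR value to a voice-activity class (hypothetical threshold)."""
    if bor > threshold_db:
        return "speaker 1"
    if bor < -threshold_db:
        return "speaker 2"
    return "both"
```

Under these assumptions, a segment in which only the first speaker is active gives a BOR well above 0 dB, a segment with only the second speaker gives a BOR well below 0 dB, and a segment with both speakers at similar power gives a BOR near 0 dB.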
Rights
Copyright 2012 IEEE