Reverberation can significantly degrade the perceived quality of speech, whether it is heard by a listener or captured by an audio recording device. Reverberation is caused by multiple reflections of sound off the walls, floor and ceiling of the enclosure along the path from the sound source to the microphone. Mathematically, the reverberant signal is the convolution of the original speech with the long acoustic impulse response (AIR) of the room.
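The convolution model above, x(n) = Σ_k h(k) s(n − k), can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a model of any particular room. The AIR is hypothetical: a unit direct path followed by white noise whose amplitude decays by 60 dB over an assumed reverberation time (RT60) of 0.5 s. The 16 kHz sampling rate and the helper names `synthetic_air` and `reverberate` are also assumptions. Even with these modest values the AIR needs 8000 taps, which shows how long a realistic room response is.

```python
import numpy as np

def synthetic_air(rt60, fs, rng):
    """Hypothetical AIR: unit direct path plus white noise whose amplitude
    envelope decays by 60 dB over rt60 seconds (a common simplification)."""
    num_taps = int(round(rt60 * fs))
    t = np.arange(num_taps) / fs
    # 60 dB decay at t = rt60: envelope = 10**(-3 t / rt60) = exp(-3 ln(10) t / rt60)
    envelope = np.exp(-3.0 * np.log(10.0) * t / rt60)
    h = rng.standard_normal(num_taps) * envelope
    h[0] = 1.0  # direct path
    return h

def reverberate(speech, air):
    """x(n) = sum_k h(k) s(n - k): full linear convolution of speech with the AIR."""
    return np.convolve(speech, air)

rng = np.random.default_rng(0)
fs = 16000                          # assumed sampling rate (Hz)
air = synthetic_air(rt60=0.5, fs=fs, rng=rng)
speech = rng.standard_normal(fs)    # 1 s of white noise standing in for speech
x = reverberate(speech, air)
print(len(air), len(x))             # 8000 23999
```

The reverberant output is longer than the input by len(air) − 1 samples: each input sample leaves a decaying "tail" behind it.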
This reverberation is responsible for an annoying acoustic problem known as the barrel effect: a signal recorded in a reverberant room sounds distant, as if it had been recorded at the bottom of a barrel. Audio examples are presented here to demonstrate the effect.
Another problem that can reduce speech intelligibility and degrade audio recordings is the cocktail party phenomenon, which arises when multiple speech sources are active simultaneously, along with other background noise such as running fans and street traffic. The undesired speech sources and the background noise interfere with the desired speech signal.
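The cocktail party situation can be written as a convolutive mixture: each microphone observes every source filtered by its own AIR, plus background noise, x_i(n) = Σ_j (h_ij ∗ s_j)(n) + v_i(n). The sketch below simulates this model under stated assumptions. The function name `convolutive_mixture`, the white-noise stand-ins for speech, the short exponentially decaying random AIRs and the Gaussian noise level are all hypothetical choices made for illustration.

```python
import numpy as np

def convolutive_mixture(sources, airs, noise_std, rng):
    """Hypothetical I-microphone, J-source model: x_i(n) = sum_j (h_ij * s_j)(n) + v_i(n).

    sources: list of J equal-length 1-D arrays s_j
    airs:    airs[i][j] is the impulse response h_ij from source j to microphone i
    """
    n = len(sources[0])
    max_taps = max(len(h) for row in airs for h in row)
    mics = np.zeros((len(airs), n + max_taps - 1))
    for i, row in enumerate(airs):
        for s, h in zip(sources, row):
            y = np.convolve(s, h)          # reverberant image of source j at mic i
            mics[i, :len(y)] += y
        # additive Gaussian background noise v_i(n)
        mics[i] += noise_std * rng.standard_normal(mics.shape[1])
    return mics

rng = np.random.default_rng(0)
desired, interferer = rng.standard_normal(1000), rng.standard_normal(1000)
# 2 mics x 2 sources, each with a short exponentially decaying random AIR (hypothetical)
airs = [[rng.standard_normal(64) * 0.9 ** np.arange(64) for _ in range(2)] for _ in range(2)]
x = convolutive_mixture([desired, interferer], airs, noise_std=0.1, rng=rng)
print(x.shape)  # (2, 1063)
```

From the point of view of the desired source, the interferer's images and the noise term are exactly the interference that separation has to remove, and every image is itself reverberant.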
Our goal is to reduce the detrimental effects of the barrel effect and the cocktail party phenomenon on speech using blind signal processing techniques. These techniques attempt to estimate either the unknown input sources (blind deconvolution for a single source, blind source separation for multiple sources) or the unknown channel (blind identification), using only the observed measurement signals.
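To make "using only the observed measurement signals" concrete, here is a sketch of blind identification with the classic cross-relation method for one source and two microphones. This is a well-known textbook technique swapped in for illustration; it is not the algorithm developed in this project. It assumes noise-free observations, a known channel length, and two channels with no common zeros. Because h2 ∗ x1 = h2 ∗ h1 ∗ s = h1 ∗ x2, the stacked channel vector lies in the null space of [X2, −X1], where X is a convolution matrix. The helper names are hypothetical.

```python
import numpy as np

def conv_matrix(x, num_taps):
    """Toeplitz matrix X with X @ h == np.convolve(x, h) whenever len(h) == num_taps."""
    X = np.zeros((len(x) + num_taps - 1, num_taps))
    for k in range(num_taps):
        X[k:k + len(x), k] = x
    return X

def cross_relation_identify(x1, x2, num_taps):
    """Estimate two channels (up to one common scale factor) from their outputs only.
    Noise-free, [X2, -X1] @ [h1; h2] = 0, so [h1; h2] spans that matrix's null space."""
    A = np.hstack([conv_matrix(x2, num_taps), -conv_matrix(x1, num_taps)])
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    h = vt[-1]  # right singular vector for the smallest singular value
    return h[:num_taps], h[num_taps:]

rng = np.random.default_rng(0)
L = 8                                    # assumed (known) channel length
h1, h2 = rng.standard_normal(L), rng.standard_normal(L)
s = rng.standard_normal(200)             # unknown source, never given to the estimator
x1, x2 = np.convolve(s, h1), np.convolve(s, h2)
g1, g2 = cross_relation_identify(x1, x2, L)

true, est = np.concatenate([h1, h2]), np.concatenate([g1, g2])
scale = true @ est / (est @ est)         # resolve the scale ambiguity, for comparison only
print(np.max(np.abs(scale * est - true)))
```

The estimate is only defined up to a scale factor, which is inherent to blind identification. With real rooms the channels are thousands of taps long and noisy, so this direct approach becomes expensive and fragile.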
Although our primary focus is on audio applications, blind signal processing for deconvolution and source separation has applications in a wide variety of fields, including:
The nature of typical AIRs for real rooms, as shown in the figure below, raises three major design issues that must be addressed when developing a blind signal processing algorithm for dereverberation:
To address the computational tractability issue, we use a special subband structure known as complex filter banks, or Generalized Discrete Fourier Transform (GDFT) filter banks. Provided certain requirements on the analysis filters are met, these filter banks inherently suppress aliasing in the subbands and thus (approximately) eliminate the need for cross-filters between subbands. As a result, one large, computationally intractable problem can be decomposed into many much smaller, tractable subproblems, allowing the overall problem to be solved.
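A GDFT analysis bank modulates a single lowpass prototype p(n) into M bandpass filters h_k(n) = p(n) exp(j2π(k + k0)(n + n0)/M). It then decimates each band by D < M; oversampling is what keeps aliasing low. The sketch below is a minimal illustration, not the project's design. The Hamming-windowed-sinc prototype, the 128-tap length, M = 16, D = 12, k0 = 1/2, n0 = 0 and the helper names are all assumptions. With k0 = 1/2, integer n0 and a real input, band M − 1 − k is the complex conjugate of band k, so only M/2 bands need to be computed.

```python
import numpy as np

def prototype_lowpass(num_taps, cutoff):
    """Hamming-windowed sinc lowpass; cutoff is a fraction of the Nyquist frequency."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    p = cutoff * np.sinc(cutoff * n) * np.hamming(num_taps)
    return p / p.sum()                   # unit DC gain

def gdft_analysis(x, proto, M, D, k0=0.5, n0=0):
    """Oversampled GDFT analysis: filter x with h_k(n) = p(n) exp(j 2 pi (k+k0)(n+n0)/M)
    and keep every D-th output sample. Only bands k = 0..M/2-1 are returned, since for
    real x (k0 = 1/2, integer n0) the remaining bands are their complex conjugates."""
    n = np.arange(len(proto))
    bands = []
    for k in range(M // 2):
        hk = proto * np.exp(2j * np.pi * (k + k0) * (n + n0) / M)
        bands.append(np.convolve(x, hk)[::D])
    return np.array(bands)

M, D = 16, 12
proto = prototype_lowpass(128, 1.0 / M)  # passband edge pi/M: each band is 2*pi/M wide
n = np.arange(1600)
x = np.cos(2 * np.pi * 3.5 / M * n)      # tone at the centre of band k = 3
bands = gdft_analysis(x, proto, M, D)
print(bands.shape)                                     # (8, 144)
print(np.argmax(np.sum(np.abs(bands) ** 2, axis=1)))   # 3
```

The tone lands almost entirely in the band that contains it, which is what lets each subband be processed on its own. A fullband filter of length L_a then corresponds, roughly, to M/2 independent subband filters of about L_a / D taps running at 1/D of the original rate.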
Sequential Monte Carlo (Particle Filtering) Methods:
The ill-conditioned nature of the problem can be alleviated by using SMC methods to estimate the speech waveform directly, rather than blindly estimating or equalizing the channel. The posterior distribution of the speech given the measurements, under a dynamic Bayesian state-space model, can be estimated and tracked efficiently using SMC methods. Results using particle filters are expected shortly, while results using Markov chain Monte Carlo (MCMC) techniques are available in a separate paper.
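The core SMC idea, which is to represent the posterior by weighted samples and then propagate, reweight and resample them as each measurement arrives, can be illustrated with a bootstrap particle filter. This is a toy sketch, not the project's dynamic Bayesian speech model. It assumes a scalar AR(1) state s(t) = a·s(t−1) + w(t) observed in additive Gaussian noise, y(t) = s(t) + v(t), with no room channel at all. The parameter values and the function name are hypothetical.

```python
import numpy as np

def bootstrap_particle_filter(y, a, q_std, r_std, num_particles, rng):
    """Bootstrap (SIR) particle filter for the toy model
         s(t) = a * s(t-1) + w(t),  w ~ N(0, q_std^2)
         y(t) = s(t) + v(t),        v ~ N(0, r_std^2)
    Returns the posterior-mean estimate of s(t) for every t."""
    # start from the stationary distribution of the AR(1) prior
    particles = rng.normal(0.0, q_std / np.sqrt(1 - a**2), num_particles)
    estimates = np.empty(len(y))
    for t, yt in enumerate(y):
        # propagate through the state transition (proposal = prior)
        particles = a * particles + rng.normal(0.0, q_std, num_particles)
        # weight by the Gaussian likelihood p(y_t | s_t), in the log domain for stability
        log_w = -0.5 * ((yt - particles) / r_std) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        estimates[t] = np.dot(w, particles)
        # multinomial resampling to counter weight degeneracy
        particles = particles[rng.choice(num_particles, size=num_particles, p=w)]
    return estimates

rng = np.random.default_rng(1)
a, q_std, r_std, T = 0.95, 1.0, 2.0, 300
s = np.empty(T)
s[0] = rng.normal(0.0, q_std / np.sqrt(1 - a**2))
for t in range(1, T):
    s[t] = a * s[t - 1] + rng.normal(0.0, q_std)
y = s + rng.normal(0.0, r_std, T)

est = bootstrap_particle_filter(y, a, q_std, r_std, num_particles=500, rng=rng)
rmse_pf = np.sqrt(np.mean((est - s) ** 2))
rmse_obs = np.sqrt(np.mean((y - s) ** 2))
print(rmse_pf, rmse_obs)
```

Because the filter exploits the temporal correlation in the state model, its estimate tracks the hidden signal more closely than the raw measurements do. A speech dereverberation model would replace this scalar state with a richer one that also involves the room response.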
An important aspect of the algorithm design is verifying its performance on data collected in a real reverberant room. Three sources of data are expected to be available for performance evaluation: