Noise Reduction in Car Speech

This paper presents the properties of selected multichannel algorithms for speech enhancement in a noisy environment. These methods are suitable for hands-free communication in a car cabin. Criteria for evaluating such systems are also presented; they consider both the level of noise suppression and the level of speech distortion. The performance of the multichannel algorithms is investigated for a mixed model of speech signals and car noise, and for real signals recorded in a car.


Introduction
This paper presents some possible ways of speech enhancement in a car cabin. This task is very important for speech control of devices in a car and for mobile communication. Both of these applications contribute to greater traffic safety.
Multichannel methods of digital signal processing can be successfully used for speech enhancement. This class of methods outperforms single-channel methods and achieves greater noise suppression.

Spatial filtering
A microphone array is a basic part of multichannel processing. A uniformly spaced microphone array is the simplest arrangement. The input acoustic signal is sampled in time and, due to the microphone spacing, in space. Thanks to the spatial sampling, it is possible to distinguish signals coming from different directions.
An input multichannel signal x[n] can be described as a mixture of the desired signal and interference. Most multichannel systems are described under several assumptions, under which a model of the multichannel signal is introduced. First, the microphone array is focused on the Direction Of Arrival (DOA) of the desired signal. Second, it is assumed that the source is far enough from the array, so the input acoustic signal can be assumed to be a plane wave [9]. The input signal at the m-th channel can then be expressed as
$x_m[n] = s[n] + u_m[n]$,
where s[n] denotes the n-th sample of the desired signal and u_m[n] denotes the noise and interference at the m-th sensor.
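The signal model above can be sketched numerically as follows (an illustrative Python snippet, not part of the paper; the channel count, noise level and the toy sinusoidal "speech" signal are arbitrary assumptions). Since the array is focused on the DOA, the desired signal appears identically in every channel, while the sensor noise is independent:

```python
import numpy as np

def multichannel_mixture(s, noise_std=0.1, M=4, seed=0):
    """Model of an M-channel input x_m[n] = s[n] + u_m[n]: the array is
    focused on the DOA of the desired signal, so s[n] appears identically
    in every channel, while u_m[n] is independent sensor noise."""
    rng = np.random.default_rng(seed)
    return np.stack([s + rng.normal(0.0, noise_std, len(s)) for _ in range(M)])

fs = 8000
n = np.arange(1024)
s = np.sin(2 * np.pi * 440 * n / fs)   # toy stand-in for a speech signal
x = multichannel_mixture(s)            # shape (M, N) = (4, 1024)
```

Averaging the channels already attenuates the independent noise terms, which is the idea behind the Delay-and-Sum beamformer discussed later.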

Interference in multichannel systems
Three types of interference are usually considered in a multichannel system. A criterion for their classification is the coherence function $\Gamma(e^{j\omega T})$. This function expresses the mutual dependency (correlation) of particular signals in individual frequency bands. The coherence function $\Gamma_{ij}(e^{j\omega T})$ of two signals is defined by the relation [14]
$\Gamma_{ij}(e^{j\omega T}) = \dfrac{\Phi_{ij}(e^{j\omega T})}{\sqrt{\Phi_{ii}(e^{j\omega T})\,\Phi_{jj}(e^{j\omega T})}}$,
where $\Phi_{jj}(e^{j\omega T})$ denotes the Power Spectral Density (PSD) of the signal in the j-th channel and $\Phi_{ij}(e^{j\omega T})$ the Cross-Power Spectral Density (CPSD) of the signals in the i-th and the j-th channel. The Magnitude Squared Coherence (MSC), defined as
$\mathrm{MSC}(e^{j\omega T}) = |\Gamma_{ij}(e^{j\omega T})|^2$,
is also often used.
The type of interference is distinguished according to the shape of $\mathrm{MSC}(e^{j\omega T})$. Three types of interference are recognized: spatially coherent, spatially incoherent and diffuse interference.
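The MSC defined above can be estimated in practice by averaging periodograms over windowed segments. The sketch below (illustrative code; segment length, hop and the white-noise test signals are assumptions, not taken from the paper) shows the two limiting cases used for classification: identical channels give MSC close to one, independent channels give MSC close to zero:

```python
import numpy as np

def msc(x1, x2, nfft=256, hop=128):
    """Estimate the Magnitude Squared Coherence of two channels by
    averaging (cross-)periodograms over Hamming-windowed segments."""
    w = np.hamming(nfft)
    n_seg = (len(x1) - nfft) // hop + 1
    S11 = np.zeros(nfft // 2 + 1)            # PSD estimate, channel 1
    S22 = np.zeros(nfft // 2 + 1)            # PSD estimate, channel 2
    S12 = np.zeros(nfft // 2 + 1, dtype=complex)  # CPSD estimate
    for k in range(n_seg):
        X1 = np.fft.rfft(w * x1[k * hop:k * hop + nfft])
        X2 = np.fft.rfft(w * x2[k * hop:k * hop + nfft])
        S11 += np.abs(X1) ** 2
        S22 += np.abs(X2) ** 2
        S12 += X1 * np.conj(X2)
    return np.abs(S12) ** 2 / (S11 * S22 + 1e-12)

rng = np.random.default_rng(1)
common = rng.normal(size=8192)   # one coherent source
noise = rng.normal(size=8192)    # an independent source
coh = msc(common, common)        # identical signals -> MSC near 1
inc = msc(common, noise)         # independent signals -> MSC near 0
```

Note that averaging over many segments is essential: a single-segment estimate of MSC is identically one, regardless of the signals.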

Spatial coherent interference
First, let us consider a plane wave reaching an array of two microphones under angle $\varphi_c$. This situation is illustrated in Fig. 1. The spectrum of the signal at sensor 2 is $X_2(e^{j\omega T})$. The wavefront reaching sensor 1 is attenuated by a constant A and delayed by
$\tau = \dfrac{D \cos \varphi_c}{c}$,
where c denotes the propagation speed of an acoustic signal and D denotes the sensor spacing. The spectrum of the signal at sensor 1 is then given by
$X_1(e^{j\omega T}) = A\, e^{-j\omega\tau}\, X_2(e^{j\omega T})$.
Substituting this relation into the definition of the coherence function gives
$\Gamma_{12}(e^{j\omega T}) = e^{-j\omega\tau}$,
so the MSC of spatially coherent interference is equal to one over the whole frequency band.

Spatial incoherent interference
In the case of spatially incoherent interference, the coherence computed from samples obtained at two different points in space is equal to zero over the whole frequency band, because $E\{X_1(e^{j\omega T})\, X_2^*(e^{j\omega T})\} = 0$, where $X_1$ and $X_2$ denote the spectra of the two interferences and the asterisk denotes complex conjugation. Incoherent noise is represented, for example, by electrical noise in the microphones.

Spatial diffuse interference
A reverberant environment is often encountered where many reflections occur.The delayed reflected signal reaches the array together with the direct wave.The characteristics of the delayed signal (magnitude and phase) depend on the acoustic properties of the given environment, e.g. a car cabin.This type of interference is very often present in real environments, and it is called spatial diffuse interference.
Diffuse noise can be modelled by an infinite number of independent sources distributed on a sphere [3]. A formula for the coherence derived from this model is
$\Gamma_{12}(e^{j\omega T}) = \dfrac{\sin(\omega D / c)}{\omega D / c}$,
where $\omega$ denotes angular frequency and D and c have been defined above. The shape of the MSC for diffuse noise is depicted in Fig. 2.
The shapes are depicted for microphone spacings D = 5 cm, 10 cm and 20 cm. An analysis of the above formula and of Fig. 2 shows that the closer together the microphones are placed, the wider the main lobe of the MSC is. An analysis of noise recorded in a car cabin revealed interference of a diffuse nature. Fig. 3 depicts the shapes of the MSC for various microphone distances. The shapes are very close to the model of diffuse noise.
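The diffuse-field model above is easy to evaluate numerically. The sketch below (illustrative code; the speed of sound c = 343 m/s and the frequency grid are assumptions) reproduces the qualitative behaviour of Fig. 2 — the smaller the spacing D, the wider the main lobe of the MSC:

```python
import numpy as np

def diffuse_msc(f, D, c=343.0):
    """Theoretical MSC of a spherically diffuse noise field for two
    sensors spaced D metres apart: MSC = [sin(wD/c) / (wD/c)]^2."""
    arg = 2 * np.pi * f * D / c
    # np.sinc(x) = sin(pi x)/(pi x), so divide the argument by pi
    return np.sinc(arg / np.pi) ** 2

f = np.linspace(1, 4000, 400)     # band of interest for fs = 8 kHz
for D in (0.05, 0.10, 0.20):      # 5, 10 and 20 cm spacing
    m = diffuse_msc(f, D)         # MSC near 1 at low f, lobes narrow as D grows
```

The first zero of the MSC lies at f = c / (2D), e.g. about 3.4 kHz for D = 5 cm, which is why closely spaced microphones see diffuse car noise as almost coherent over most of the telephone band.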

Processing in the frequency domain
Algorithms of multichannel processing can be implemented in the time or frequency domain. The basic algorithms, e.g. GSC [7], operate in the time domain. A speech signal cannot be assumed stationary, so adaptive algorithms are used. The coefficients of the adaptive filters are usually controlled by the LMS algorithm. However, advanced algorithms require processing in the frequency domain.
A block diagram of processing in the frequency domain is depicted in Fig. 4. First, the input signal is divided into quasi-stationary overlapping segments, and each segment is weighted by a Hamming window. A typical segment length is 16 ms. Second, the short-time spectrum is computed. Third, the short-time spectra are processed. The output signal is finally obtained using the inverse Fourier transform and the Overlap and Add (OLA) method [14].
Weight adaptation is performed block by block, according to the Minimum Mean Square Error (MMSE) criterion. The advantage of this approach is that the weights in each frequency band change according to the power of the noise in that particular band.
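The analysis-synthesis chain of Fig. 4 can be sketched as follows (illustrative code, not the paper's implementation; a 128-sample Hamming window with 50 % overlap at 8 kHz matches the 16 ms segments mentioned above, the rest is an assumption). With no spectral modification the chain reduces to an identity, which is a useful sanity check before inserting any beamformer weights:

```python
import numpy as np

def stft_ola_identity(x, nfft=128, hop=64):
    """Segment -> Hamming window -> FFT -> (processing) -> IFFT -> OLA.
    Dividing by the summed analysis windows compensates the taper,
    so with no processing the chain reconstructs the input."""
    w = np.hamming(nfft)
    y = np.zeros(len(x))
    norm = np.zeros(len(x))
    for start in range(0, len(x) - nfft + 1, hop):
        seg = w * x[start:start + nfft]
        spec = np.fft.rfft(seg)          # short-time spectrum
        # ... the short-time spectra would be processed here ...
        y[start:start + nfft] += np.fft.irfft(spec, nfft)
        norm[start:start + nfft] += w
    valid = norm > 1e-8
    y[valid] /= norm[valid]
    return y

x = np.random.default_rng(2).normal(size=2048)
y = stft_ola_identity(x)
err = np.max(np.abs(y[128:-128] - x[128:-128]))  # interior reconstruction error
```

Any per-band weighting (e.g. the MMSE weights mentioned above) would be applied to `spec` inside the loop before the inverse transform.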

Beamforming algorithms
The performance of four algorithms will be presented in this paper; their principles are explained in this section. The following algorithms will be compared: Beamformer with Adaptive Postprocessing (BAP) [16], Generalised Sidelobe Canceler (GSC) [7], Linearly Constrained Beamformer (LCB) [5] and Modified Coherence Filtering (MCF) [10].

BAP
The Delay and Sum beamformer (DAS) is the first block of BAP [16]. The output of this block, $Y_b$, is an average of the input channels; the weights $w_i$ are equal to 1/M. BAP improves the DAS beamformer by using a Wiener Filter (WF) behind the DAS structure, see Fig. 5. The main contribution of the WF is in improving the suppression of uncorrelated interference. The derivation of the WF weights can be found in [15]. In the case of the BAP structure, the PSDs in relation (11) are estimated by averaging over the signal in a particular channel [13],
$\hat{\Phi}_{ii}(e^{j\omega T}) = E\{|X_i(e^{j\omega T})|^2\}$,
where $X_i(e^{j\omega T})$ is the spectrum of the input signal in the i-th channel.
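A one-frame sketch of the BAP idea is given below (illustrative code; the channel-averaged, Zelinski-style PSD/CPSD estimates are an assumption — the exact estimators of relation (11) in [15] and [13] may differ). Cross-channel products retain the common desired signal but average out the uncorrelated noise, which yields the Wiener-type post-filter gain:

```python
import numpy as np

def bap_frame(X):
    """DAS output (weights 1/M) followed by a Wiener-type postfilter.
    X: (M, K) array of short-time channel spectra for one segment."""
    M = X.shape[0]
    Y_das = X.mean(axis=0)                    # DAS: w_i = 1/M
    # Cross-spectra approximate the desired-signal PSD (noise is
    # uncorrelated across channels, so it averages out here) ...
    num = np.zeros(X.shape[1])
    for i in range(M):
        for j in range(i + 1, M):
            num += np.real(X[i] * np.conj(X[j]))
    num *= 2.0 / (M * (M - 1))
    # ... while the auto-PSD average contains signal plus noise.
    den = np.mean(np.abs(X) ** 2, axis=0)
    W = np.clip(num / (den + 1e-12), 0.0, 1.0)  # Wiener-type gain
    return W * Y_das

M, K = 4, 129
rng = np.random.default_rng(3)
S = rng.normal(size=K) + 1j * rng.normal(size=K)    # common desired spectrum
X = np.stack([S + 0.5 * (rng.normal(size=K) + 1j * rng.normal(size=K))
              for _ in range(M)])
Y = bap_frame(X)   # lower output power than any single noisy channel
```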

GSC
The structure of GSC [7] is depicted in Fig. 6. It is equivalent to the Adaptive Beamformer [6]. The system consists of the DAS beamformer and the Adaptive Noise Canceler (ANC). The ANC serves to suppress the coherent interference.
The weights of the ANC filters are set in accordance with Wiener theory [7]. A formula for the optimal weights is
$W_i(e^{j\omega T}) = \dfrac{\Phi_{Y_i Y_w}(e^{j\omega T})}{\Phi_{Y_i Y_i}(e^{j\omega T})}$,
where $\Phi_{Y_i Y_w}(e^{j\omega T})$ denotes the CPSD of the signals $Y_i$ and $Y_w$, the meaning of which is obvious from Fig. 6.
The proper function of the ANC depends on perfect separation of the desired signal from the input signal, i.e. on removing from the lower branch any component coherent with the desired signal. This separation is arranged by the Blocking Matrix (BM).
The most commonly used BM differentiates neighbouring channels. It consists of M columns and (M − 1) rows and looks like this [7]:
$B = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & -1 \end{pmatrix}$
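The differencing BM is simple to construct and verify (illustrative code; the value of the common test signal is arbitrary). Because each row subtracts neighbouring channels, any signal that is identical in all channels — i.e. the focused desired signal — is cancelled exactly:

```python
import numpy as np

def blocking_matrix(M):
    """Differencing Blocking Matrix of GSC [7]: (M-1) x M, each row
    subtracts neighbouring channels, so a component common to all
    channels is blocked."""
    B = np.zeros((M - 1, M))
    for r in range(M - 1):
        B[r, r] = 1.0
        B[r, r + 1] = -1.0
    return B

B = blocking_matrix(4)
x = 3.7 * np.ones(4)   # desired signal after focusing: equal in all channels
blocked = B @ x        # all zeros: the desired signal cannot leak into the ANC
```

Only the noise components, which differ between channels, pass through the BM into the ANC branch.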
LCB
The direct branch of LCB, composed of BAP, suppresses the incoherent interference. The lower branch, consisting of the ANC, is responsible for coherent interference suppression.
The greatest difference between GSC and LCB is the way in which the weights of the ANC filters are computed: in LCB they are computed from the signals at the outputs of the BM and the WF.
The relation for calculating the ANC filters is then
$W_i(e^{j\omega T}) = \dfrac{\Phi_{Y_i Y_w}(e^{j\omega T})}{\Phi_{Y_i Y_i}(e^{j\omega T})}$,
where $\Phi_{Y_i Y_w}(e^{j\omega T})$ denotes the CPSD of the signals $Y_i$ and $Y_w$, the meaning of which is obvious from Fig. 7.

Coherence Filtering
Coherence Filtering differs from the other multichannel systems: it is a representative of two-channel methods. The idea of this method [2] is based on the fact that the coherence function of the spatially coherent desired signal is close to one, while the coherence of the incoherent interference is close to zero.
The authors of [10] propose a modification of Coherence Filtering in which the coherence filter is included in the BAP structure, see Fig. 8. The coefficients of the Modified Coherence Filter (MCF), C(k), are computed from W(k), where W(k) denotes the estimated frequency response of the Wiener filter, equation (9).
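The underlying idea can be sketched as plain two-channel coherence filtering (illustrative code for the method of [2], not the exact MCF rule of [10], whose formula with threshold T and exponent a is not reproduced here): recursively smoothed PSD/CPSD estimates give a per-bin MSC, which is used directly as a spectral gain:

```python
import numpy as np

def coherence_gain(X1, X2, alpha=0.8):
    """Per-bin MSC from recursively smoothed (cross-)spectra, used as a
    gain: near 1 where the channels carry the coherent desired signal,
    near 0 where only incoherent noise is present.
    X1, X2: (frames, bins) arrays of short-time spectra."""
    P11 = np.zeros(X1.shape[1])
    P22 = np.zeros(X1.shape[1])
    P12 = np.zeros(X1.shape[1], dtype=complex)
    gains = []
    for f in range(X1.shape[0]):
        P11 = alpha * P11 + (1 - alpha) * np.abs(X1[f]) ** 2
        P22 = alpha * P22 + (1 - alpha) * np.abs(X2[f]) ** 2
        P12 = alpha * P12 + (1 - alpha) * X1[f] * np.conj(X2[f])
        gains.append(np.abs(P12) ** 2 / (P11 * P22 + 1e-12))
    return np.array(gains)

rng = np.random.default_rng(4)
frames, bins_ = 50, 65
S = rng.normal(size=(frames, bins_)) + 1j * rng.normal(size=(frames, bins_))
N1 = rng.normal(size=(frames, bins_)) + 1j * rng.normal(size=(frames, bins_))
N2 = rng.normal(size=(frames, bins_)) + 1j * rng.normal(size=(frames, bins_))
g_coh = coherence_gain(S, S)[-1].mean()     # identical channels: gain near 1
g_inc = coherence_gain(N1, N2)[-1].mean()   # independent noise: gain near 0
```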

Criteria for system evaluation
The criteria for assessing the level of speech enhancement can be classified into two classes, objective and subjective. The subjective criteria are represented by listening tests. Listening tests are very demanding: it is necessary to gather several qualified listeners, and the tests consume a great deal of time. However, they can show how the output signals are perceived by human subjects. Objective criteria give exact information and are not influenced by external factors, e.g. the mood of the listener. The following criteria will be used for evaluating the algorithms: Noise Reduction (NR), Log Area Ratio (LAR), Signal to Noise Ratio Enhancement (SNRE) and spectrograms. All of the criteria will be computed from quasi-stationary segments of the signal.

Noise Reduction
NR expresses the ability of an algorithm to reduce noise. It is defined as
$NR = 10 \log_{10} \dfrac{\Phi_u(e^{j\omega T})}{\Phi_y(e^{j\omega T})}$,
where $\Phi_u(e^{j\omega T})$ is the PSD of the interference at the input of the system and $\Phi_y(e^{j\omega T})$ is the PSD of the interference processed by the system. The assumption for the NR calculation is that no desired signal is present at the input of the system.
NR considers only the influence of the system on the interference; it does not consider the influence on the desired signal. This criterion therefore has to be combined with other criteria.
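The NR criterion can be computed as sketched below (illustrative code; the 128-sample segments match the experiments later in the paper, while the exact PSD estimator and averaging used by the authors are assumptions). Noise alone is fed through the system, and the input and output PSDs are compared in dB:

```python
import numpy as np

def noise_reduction_db(u_in, u_out, nfft=128):
    """NR: ratio of the interference PSD at the system input to the
    interference PSD at the system output, in dB, with PSDs averaged
    over Hamming-windowed 128-sample segments."""
    def psd(x):
        segs = len(x) // nfft
        w = np.hamming(nfft)
        acc = np.zeros(nfft // 2 + 1)
        for k in range(segs):
            acc += np.abs(np.fft.rfft(w * x[k * nfft:(k + 1) * nfft])) ** 2
        return acc / segs
    return 10 * np.log10(np.mean(psd(u_in)) / np.mean(psd(u_out)))

rng = np.random.default_rng(5)
u = rng.normal(size=8192)
nr = noise_reduction_db(u, 0.5 * u)   # a toy "system" halving the noise amplitude
# halving the amplitude quarters the PSD: NR = 10*log10(4) ~ 6.02 dB
```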

Log Area Ratio
LAR [12] takes into account the influence of the system on the desired signal and on speech intelligibility. An advantage of this criterion is its high correlation with listening tests [4]. A precondition for using this criterion is the presence of speech. LAR is calculated on the basis of the partial correlation (PARCOR) coefficients of the autoregressive model [8].
Computing LAR requires a clean speech signal s[n] and the output signal $y_s[n]$. The computation is performed in the following steps:
1. Estimation of the PARCOR coefficients k(p, l) of the signal segment. Index p denotes the p-th PARCOR coefficient and l the signal segment. The order of the model is chosen as P = 12. The Burg algorithm can be used for estimating the coefficients [8].
2. Calculation of the area coefficients
$A(p,l) = \dfrac{1 + k(p,l)}{1 - k(p,l)}$,
where k(p, l) is the p-th PARCOR coefficient of the l-th segment. (PARCOR coefficients k(p, l) are marked in some sources [11] as the negative of the reflection coefficients.)
3. Calculation of LAR for block l. LAR expresses the "distance" of the model of signal s[n] from the model of signal y[n]. The lower LAR is, the less the speech is distorted.
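The steps above can be sketched in code (an illustrative implementation with several assumptions: Levinson-Durbin is used instead of the Burg algorithm for brevity, and the final averaging of the log-area-ratio differences is a plausible form of the criterion, not necessarily the exact formula of [12]):

```python
import numpy as np

def parcor(x, P=12):
    """PARCOR (reflection) coefficients of an order-P AR model via
    Levinson-Durbin on the biased autocorrelation estimate."""
    r = [float(np.dot(x[:len(x) - k], x[k:])) for k in range(P + 1)]
    a = [1.0]          # prediction polynomial
    e = r[0]           # prediction error power
    ks = []
    for p in range(1, P + 1):
        acc = r[p] + sum(a[j] * r[p - j] for j in range(1, p))
        k = -acc / e
        a = [1.0] + [a[j] + k * a[p - j] for j in range(1, p)] + [k]
        e *= (1.0 - k * k)
        ks.append(k)
    return np.array(ks)

def lar_distance(s, y, P=12):
    """LAR 'distance' for one segment: mean absolute difference of the
    log-area ratios log((1+k)/(1-k)) of the two AR models."""
    g_s = np.log((1 + parcor(s, P)) / (1 - parcor(s, P)))
    g_y = np.log((1 + parcor(y, P)) / (1 - parcor(y, P)))
    return float(np.mean(np.abs(g_s - g_y)))

rng = np.random.default_rng(6)
n = np.arange(512)
s = np.sin(2 * np.pi * 0.05 * n) + 0.1 * rng.normal(size=512)
d0 = lar_distance(s, s)                               # identical models: 0
d1 = lar_distance(s, s + 0.5 * rng.normal(size=512))  # distorted: > 0
```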

Experiments
Two approaches were used to verify the algorithms presented in this paper. First, a model of the desired signal mixed with noise recorded in a car was used as the input. The model of the desired signal was created by copying a clean speech signal into all channels. The purpose of this approach is to verify the performance of the algorithms alone; the influence of the properties of the microphone array is not considered.
Breaking the assumptions mentioned in Section 3 introduces additional delays of the signals between the individual microphones. These additional delays can be due to the fact that the acoustic signals cannot be represented by plane waves, and due to array imperfections. Solving these problems is a separate issue.
The purpose of the second experiment is to show the properties of the whole system.It should show that the properties of the array are significant and that it is worth taking them into account.
Each of the experiments was performed for two different environments. The first environment was a standing car with a running engine; the second was a car moving outside a village (70 km/h). The criteria NR, LAR and SNRE were calculated for segments of 128 samples, and a mean value was calculated for each criterion.
The experiments were performed for an array of 4 microphones with 4 cm spacing, and SNR_in was set to 0 dB. The sample rate was 8 kHz. The parameters of MCF were set to T = 0.2 and a = 2. Tables 1 and 2 show the results for a model of the desired signal. The results for a real signal are displayed in Tables 3 and 4.
The results are very different for a model of the desired signal and for a real signal. MCF seems to have the weakest performance: it produced high speech distortion (high values of LAR) and low SNRE and NR. Zero speech distortion is worth noting in the case of GSC; this is due to perfect separation of the desired signal at the input of the ANC filters.
The experiment with a model of the signal was performed for different values of SNR_in. The results are summarized in the graphs in Figs. 10, 11 and 12.

Conclusion
The experiments enabled a comparison of the speech enhancement methods presented here. All of the algorithms showed much worse performance for real signals: there is both high speech distortion and low enhancement. There are no very significant differences between a standing car and a moving car; NR for BAP and LCB is an exception. The lowest values of LAR and SNRE are for BAP and MCF.
The second experiment focused on the influence of SNR_in on the results. The shape of NR (Fig. 10) reveals a strong dependence on SNR_in for GSC and LCB. The NR of LCB falls below that of BAP, and GSC falls below MCF, for high values of SNR_in.
The shape of SNRE (Fig. 12) shows a very similar trend. BAP and MCF are almost independent of SNR_in with respect to both NR and SNRE.
Only BAP, LCB and MCF can be considered when observing LAR (Fig. 11). GSC does not distort speech in the case of a model of the input signal, due to perfect separation of the desired signal. BAP and LCB have the same shape of LAR for the same reason. The highest speech distortion was for MCF. The figure also shows that speech distortion decreases with growing SNR_in.
This paper has shown the properties of selected algorithms for speech enhancement in a noisy environment. The experiments with a model of the input signal showed that these methods are capable of speech enhancement. A problem occurred when the methods were applied to real signals: the assumptions of proper functionality were broken, and the input signals did not match the model that the methods were developed for. Future work must therefore focus on compensating the array imperfections and the real signal propagation.

Fig. 4: Block diagram of processing in the frequency domain
Fig. 6: GSC
Fig. 10: NR for various SNR_in
Fig. 11: LAR for various SNR_in
Table 1: Results for a model of a signal, standing car
Table 2: Results for a model of a signal, running car (70 km/h)
Table 3: Results for a real signal, standing car
Table 4: Results for a real signal, running car (70 km/h)

Acta Polytechnica Vol. 49 No. 2-3/2009