Speech Signal Recovery in Communication Networks

Interpolation approaches to recovering the shape of a speech signal transmitted over packet-switched communication networks are proposed. The samples of signal fragments are interleaved and then transmitted using the standard procedure for packet-switched transmission. After reception the reverse permutation is applied. If packets are lost, the missing samples are separated by several samples of the source signal. The correlation properties of the signal are used to recover the missing samples by first- and second-order non-adaptive and adaptive interpolation. For a loss of 25 % of the packets, second-order adaptive interpolation achieved errors in the range of 2-4 %.


Introduction
In packet speech transmission over a network, part of the information is lost. If an acceptable quality of speech is to be preserved, the permissible percentage of losses is limited. The authors of [1] and other papers accept a loss of 1-3 %. As a result, the average network load is limited. The authors of [2], [3] propose a variable speech packet-encoding rate to smooth the effect of network overloads on the received speech quality. Paper [4] proposes classifying speech segments according to their structure. Packets belonging to different classes are assigned different delivery priorities. When the network is overloaded, packets with lower priority are discarded first. At the receiving end, the lost packets are regenerated.
The articles mentioned above deal mainly with information transfer based on the encoding of speech signal parameters. This paper deals with transfer based on waveform encoding using pulse-code modulation.
Section 2 considers the main principle of loss recovery in waveform encoding. Section 3 analyses first-order interpolation methods. Section 4 analyses the possibilities of second-order interpolation. Section 5 discusses some experimental results on the recovery of phrases and suggests some future investigations.

Basic interpolation principle
For channels with error bursts, the sequence is often interleaved prior to transmission and restored at the receiving end. In this case the errors are distributed more uniformly [5]. We will use this principle for packet speech transmission. The source sequence of samples of a signal segment is stored, its samples are permuted, and the result is split into packets and transmitted. At the receiving end the reverse procedure is performed. If a packet is lost during transmission, the lost samples are separated by one or several samples of the source signal. This makes it possible to recover the losses using the correlation between samples.
Interpolation of samples was applied in [6]. A signal is divided into two groups of even and odd samples, and each group is shaped into a packet. After the reverse permutation, if one packet is lost, one source sample appears between the missing samples. This makes extrapolation and first-order interpolation applicable. The "odd-even" interleaving allows only the correlation of neighbouring samples to be used for the recovery. Using information on a greater number of samples, the values of the lost samples can be recovered more accurately. To do this, the samples of the source signal have to be interchanged over a segment containing more than two packets. For example, using a block encoder, the sequence of samples is written into an n×m matrix (m rows of length n) column-wise and is read row-wise. With the row length n equal to the packet length, reading yields m packets. If one of the m packets is lost, then after the reverse permutation the lost samples will be separated by m-1 samples of the source signal. In this case interpolation procedures ranging from zero order up to order m-1 may be applied for the recovery.
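The block permutation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names interleave and deinterleave and the small sizes n = 8, m = 4 are assumptions chosen for readability, and a lost packet is represented by None.

```python
def interleave(samples, n, m):
    """Split n*m samples into m packets of n samples each.

    Writing the samples column-wise into a matrix with m rows of length n
    and reading it row-wise places sample k into packet k % m.
    """
    assert len(samples) == n * m
    return [samples[r::m] for r in range(m)]


def deinterleave(packets, n, m):
    """Reverse permutation; a lost packet (None) leaves None in its slots."""
    out = [None] * (n * m)
    for r, packet in enumerate(packets):
        if packet is not None:
            out[r::m] = packet
    return out


n, m = 8, 4                       # illustrative sizes (assumed)
source = list(range(n * m))
packets = interleave(source, n, m)
packets[2] = None                 # simulate the loss of one packet
received = deinterleave(packets, n, m)
lost = [k for k, x in enumerate(received) if x is None]
print(lost)
```

With one of the m = 4 packets lost, every fourth sample is missing, so consecutive gaps are separated by m-1 = 3 received samples, which is what makes interpolation from the neighbours possible.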

First-order interpolation
We will assume that not more than one packet is randomly lost from a sequence of signal samples consisting of $m$ packets. The sequence $X_i$ ($i = 1, 2, \ldots$) of samples of the centered signal $X(t)$ is transmitted, followed by reception and the reverse permutation. If one packet is lost, the received signal $\hat{X}_i$ differs from $X_i$ only at the points $t_i = t_j$ where the samples are missing. The probability of this is

$$P(i = j) = \frac{1}{m}.$$

The mean square error of the transmission of a sequence consisting of $m$ packets is

$$E(\varepsilon^2) = \frac{1}{m}\, E\left[(X_i - \hat{X}_i)^2\right]. \tag{1}$$

If the estimate $\hat{X}_i$ of the values of the missing samples is set to zero, then

$$E(\varepsilon^2) = \frac{1}{m}\, R_{XX}(0), \tag{2}$$

where $R_{XX}(0)$ is the value of the correlation function for a zero shift.
When the values of $X_i$ are recovered by first-order prediction,

$$\hat{X}_i = a X_{i-1},$$

it is usually assumed that the coefficient $a = 1$. Then, in accordance with (1), the mean square error will be

$$E(\varepsilon^2) = \frac{2}{m}\left[R_{XX}(0) - R_{XX}(1)\right], \tag{3}$$

where $R_{XX}(1)$ is the value of the correlation function of the signal $X(t)$ for a shift equal to $\Delta t$.

V. Zagursky, A. Riekstinsh
For the first-order interpolation:

$$\hat{X}_i = a_{-1} X_{i-1} + a_{+1} X_{i+1}. \tag{4}$$

For the first-order interpolation it is usually assumed that $a_{-1} = a_{+1} = a$. The most commonly used procedure takes $a = 0.5$ (linear interpolation). This procedure was referred to in [6] as non-adaptive, as was prediction with $a = 1$. For

$$\hat{X}_i = 0.5\,(X_{i-1} + X_{i+1})$$

the mean square error will be

$$E(\varepsilon^2) = \frac{1}{m}\left[1.5\,R_{XX}(0) - 2\,R_{XX}(1) + 0.5\,R_{XX}(2)\right]. \tag{5}$$

Let us compare expressions (3) and (5). Subtracting (5) from (3) gives

$$\frac{2}{m}\left[R_{XX}(0) - R_{XX}(1)\right] - \frac{1}{m}\left[1.5\,R_{XX}(0) - 2\,R_{XX}(1) + 0.5\,R_{XX}(2)\right] = \frac{1}{2m}\left[R_{XX}(0) - R_{XX}(2)\right]. \tag{6}$$

Expression (6) shows that interpolation provides a smaller recovery error than prediction, since for any random signal $R_{XX}(2) < R_{XX}(0)$.
The error can be minimized by applying the adaptive approach, as was demonstrated in [6]. Calculating the interpolation error using (4) and minimizing it with respect to the coefficients $a_{-1}$ and $a_{+1}$, we obtain

$$a_{-1} = a_{+1} = \frac{\rho_{XX}(1)}{1 + \rho_{XX}(2)}, \tag{7}$$

where $\rho_{XX}(k) = R_{XX}(k)/R_{XX}(0)$ is the correlation coefficient. Figure 1 shows the recovery errors for signals corresponding to the sounds "a", "d", "c" and "sh". The vertical axis shows, in percent, the normalized mean square error $\gamma = E(\varepsilon^2)/R_{XX}(0)$.
The horizontal axis shows the ordinal numbers of the corresponding signal segments, each n = 128 samples long. A real speech signal was used in the experiments.
A situation was simulated in which every fourth packet is lost (m = 4). When no recovery is applied, the error at the receiving end equals 25 %, since from (2) we have $\gamma = 1/m$. It follows from the figure that for practically all signal segments $\gamma_{1.2} < \gamma_{1.1}$.
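The first-order procedures of this section can be compared numerically once the correlation function is estimated. The sketch below is an illustration under stated assumptions, not the authors' experiment: the test signal is a synthetic first-order autoregressive sequence rather than real speech, the helper names autocorr and first_order_errors are hypothetical, and the error expressions are those that follow from expanding the mean square error for one lost packet out of m with symmetric coefficients.

```python
import random


def autocorr(x, max_lag):
    """Biased estimates R(0..max_lag) of the correlation function of a centered sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def first_order_errors(R, m):
    """Normalized mean square errors when one packet out of m is lost."""
    r1, r2 = R[1] / R[0], R[2] / R[0]           # correlation coefficients
    a_opt = r1 / (1.0 + r2)                     # adaptive coefficient (assumed symmetric)
    return {
        "zero": 1.0 / m,                            # missing samples set to zero
        "prediction": 2.0 * (1.0 - r1) / m,         # estimate = previous sample
        "linear": (1.5 - 2.0 * r1 + 0.5 * r2) / m,  # a = 0.5 on both neighbours
        "adaptive": (1.0 - 2.0 * r1 * a_opt) / m,   # error at the optimal coefficient
    }


# Synthetic centered test signal (an assumption; not speech data).
rng = random.Random(0)
x, prev = [], 0.0
for _ in range(4096):
    prev = 0.9 * prev + rng.gauss(0.0, 1.0)
    x.append(prev)
mean = sum(x) / len(x)
x = [v - mean for v in x]

gamma = first_order_errors(autocorr(x, 2), m=4)
for name, value in gamma.items():
    print(f"{name:10s} {100 * value:6.2f} %")
```

Whatever the signal, the adaptive error cannot exceed the linear one, because the adaptive coefficient minimizes the same quadratic expression that linear interpolation evaluates at a = 0.5.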

Second-order interpolation
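The estimate of the values of the lost samples $X_i$ from the known samples with numbers $i \pm 1$, $i \pm 2$ is made using the expression

$$\hat{X}_i = b_1 (X_{i-1} + X_{i+1}) + b_2 (X_{i-2} + X_{i+2}). \tag{8}$$

The normalized mean square error of the second-order interpolation is determined in a way similar to (3):

$$\gamma = \frac{1}{m}\left[1 + 2 b_1^2 \left(1 + \rho_{XX}(2)\right) + 2 b_2^2 \left(1 + \rho_{XX}(4)\right) - 4 b_1 \rho_{XX}(1) - 4 b_2 \rho_{XX}(2) + 4 b_1 b_2 \left(\rho_{XX}(1) + \rho_{XX}(3)\right)\right]. \tag{9}$$

As with first-order interpolation, the procedure for determining the coefficients $b_1$, $b_2$ may be non-adaptive or adaptive. For a non-adaptive procedure some known family of polynomials can be used, for example, the family of second-order Chebyshev polynomials. It can be shown that $b_1 = 0.667$ and $b_2 = -0.167$. Then (9) is converted to the following form:

$$\gamma_{2.1} = \frac{1}{m}\left[1.94 - 3.11\,\rho_{XX}(1) + 1.56\,\rho_{XX}(2) - 0.44\,\rho_{XX}(3) + 0.06\,\rho_{XX}(4)\right]. \tag{11}$$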
Analysing (11), it is easy to see that the efficiency of the interpolation is determined by the type of correlation function of the signal. Specifically, $\gamma_{2.1} < \gamma_{1.1}$ for signals in which the correlation between adjacent samples is high and then decreases rapidly. This is true for many speech sounds [7]. However, for hushing sounds and fricatives, the value of $\rho_{XX}(1)$ may not be high. Then it is possible that $\gamma_{1.2} < \gamma_{1.1}$. In selecting the coefficients in (8), second-order adaptive interpolation allows us to take account of the effect of all values of the correlation function. The formulas for the optimum values of the coefficients $b_1$ and $b_2$ are obtained by equating the partial derivatives $\partial\gamma/\partial b_1$ and $\partial\gamma/\partial b_2$ to zero in (9). This is illustrated by Fig. 1, which shows the experimental results of signal recovery for 25 % losses for first- and second-order non-adaptive and adaptive interpolation. For the sounds "a" and "d", second-order interpolation provides better results than first-order interpolation. For the sounds "c" and "sh" this is true for adaptive interpolation, while with second-order Chebyshev interpolation the recovery error increases.
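Equating the two partial derivatives to zero gives a pair of linear equations in b_1 and b_2, which can be solved directly. The sketch below assumes the symmetric estimate b_1(X_{i-1} + X_{i+1}) + b_2(X_{i-2} + X_{i+2}) and expands its mean square error in terms of the correlation coefficients; the function names and the damped-cosine correlation model are assumptions made for illustration and are not measured speech statistics.

```python
import math


def second_order_error(rho, b1, b2, m):
    """Normalized MSE of the estimate b1*(x[i-1]+x[i+1]) + b2*(x[i-2]+x[i+2]).

    rho[k] is the correlation coefficient at lag k, with rho[0] == 1.
    """
    return (1.0
            + 2.0 * b1 * b1 * (1.0 + rho[2])
            + 2.0 * b2 * b2 * (1.0 + rho[4])
            - 4.0 * b1 * rho[1]
            - 4.0 * b2 * rho[2]
            + 4.0 * b1 * b2 * (rho[1] + rho[3])) / m


def optimal_coefficients(rho):
    """Solve the two equations obtained by setting both partial derivatives to zero."""
    a11, a12, a22 = 1.0 + rho[2], rho[1] + rho[3], 1.0 + rho[4]
    det = a11 * a22 - a12 * a12
    b1 = (rho[1] * a22 - rho[2] * a12) / det    # Cramer's rule
    b2 = (rho[2] * a11 - rho[1] * a12) / det
    return b1, b2


# Hypothetical correlation model (assumed): a damped cosine.
rho = [0.95 ** k * math.cos(0.3 * k) for k in range(5)]
m = 4

b1, b2 = optimal_coefficients(rho)
g_adaptive = second_order_error(rho, b1, b2, m)
g_fixed = second_order_error(rho, 2.0 / 3.0, -1.0 / 6.0, m)   # b1 = 0.667, b2 = -0.167
g_linear = second_order_error(rho, 0.5, 0.0, m)               # first-order special case
print(b1, b2, g_adaptive, g_fixed, g_linear)
```

Setting b_2 = 0 and b_1 = 0.5 reduces the expression to the first-order linear case, which gives a convenient consistency check between the two sections.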

Experimental results and conclusions
Experiments have been made on the recovery of losses in continuous speech. The signal samples are divided into packets of 128 samples each. The packets are combined into groups of 4 packets. Within each group the samples are permuted so that all 4 packets are interleaved. Every fourth packet is discarded, after which the reverse permutation and recovery are performed. First- and second-order interpolation, both non-adaptive and adaptive, were used. The table shows the integral error estimates for all recovery procedures for different speech phrases up to 5 s in length.
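The experimental procedure can be imitated end to end with a short script. This is a sketch under stated assumptions and does not reproduce the reported measurements: the input is a synthetic sinusoid with weak noise instead of speech, only first-order linear interpolation is applied, and the function names transmit_group and recover_linear are hypothetical.

```python
import math
import random

N_PACKET, M = 128, 4              # packet length and packets per group, as in the text
GROUP = N_PACKET * M


def transmit_group(group, lost_packet):
    """Interleave one group, drop one packet, and apply the reverse permutation."""
    packets = [group[r::M] for r in range(M)]
    packets[lost_packet] = None
    out = [None] * GROUP
    for r, packet in enumerate(packets):
        if packet is not None:
            out[r::M] = packet
    return out


def recover_linear(received):
    """Replace each missing sample by the mean of its available neighbours."""
    out = list(received)
    for k, x in enumerate(received):
        if x is None:
            near = [received[j] for j in (k - 1, k + 1)
                    if 0 <= j < len(received) and received[j] is not None]
            out[k] = sum(near) / len(near) if near else 0.0
    return out


# Synthetic centered test signal (an assumption; not speech data).
rng = random.Random(1)
signal = [math.sin(2 * math.pi * k / 40) + 0.05 * rng.gauss(0.0, 1.0)
          for k in range(4 * GROUP)]
mean = sum(signal) / len(signal)
signal = [s - mean for s in signal]

received = []
for start in range(0, len(signal), GROUP):
    received += transmit_group(signal[start:start + GROUP], lost_packet=M - 1)

restored = recover_linear(received)
power = sum(s * s for s in signal) / len(signal)
# Error when the lost samples are left at zero, normalized by the signal power.
gamma_zero = sum(s * s for s, r in zip(signal, received) if r is None) / len(signal) / power
# Error after linear interpolation, normalized in the same way.
gamma_linear = sum((s - r) ** 2 for s, r in zip(signal, restored)) / len(signal) / power
print(f"no recovery: {100 * gamma_zero:.2f} %, linear: {100 * gamma_linear:.2f} %")
```

The very last sample of the signal has only a left neighbour, so recover_linear falls back to repeating that neighbour there, which corresponds to prediction with a = 1.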
Our investigations testify to the efficiency of approaches that involve waveform recovery. First-order adaptive interpolation yielded results that were acceptable in terms of sound quality for a loss of 25 % of the packets. Second-order adaptive interpolation yields better results in terms of both sound quality and root-mean-square error.
The adaptive procedure requires additional processing of the signal at the transmitting end in order to calculate the correlation moments and coefficients, followed by transmission of the calculated information in the packet. The authors intend to investigate other approaches: determining the relation between the interpolation coefficients and the sign characteristics of the signal, which are easier to determine than the correlation ones, and calculating the characteristics at the receiving end directly from the signal with losses.