The Sinusoidal plus Residual model can cover a wide "compromise space" and can in fact be seen as a generalization of both the STFT and the Sinusoidal models. Using this approach, we can decide which part of the spectral information is modeled as sinusoids and which part is left as STFT. With a good analysis, the Sinusoidal plus Residual representation is very flexible while maintaining good sound fidelity, and the representation is quite efficient. In this approach, the Sinusoidal representation is used to model only the stable partials of a sound. The residual, or its approximation, models what is left, which should ideally be a stochastic component. When the residual is reduced to such an approximation, the model is less general than either the STFT or the Sinusoidal representations, but it results in an enormous gain in flexibility [Serra, 1989; Serra, 1990; Serra, 1996].
The Sinusoidal plus Residual model assumes that the sinusoids are stable partials of the sound, each with slowly changing amplitude and frequency. This restriction lets us impose major constraints on the detection of sinusoids in the spectrum and omit the detection of the phase of each peak.
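For reference, the model is commonly written as follows in the SMS literature. The symbols are introduced here for illustration and are not used elsewhere in this section: the sound s(t) is a sum of R partials, each with instantaneous amplitude A_r(t) and phase θ_r(t), plus a residual e(t). Because the phase is the integral of the instantaneous frequency ω_r(t), slowly varying amplitude and frequency tracks are enough to describe each stable partial.

```latex
s(t) = \sum_{r=1}^{R} A_r(t)\,\cos\bigl(\theta_r(t)\bigr) + e(t),
\qquad
\theta_r(t) = \int_{0}^{t} \omega_r(\tau)\,d\tau
```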
Within this model, we can either let the residual signal be the difference between the original sound and the sinusoidal component, which results in an identity system, or we can assume that it is a stochastic signal. In the latter case, the residual can be described as filtered white noise. That is, the residual is modeled by the time-domain convolution of white noise with a time-varying frequency-shaping filter.
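In discrete time, this filtered-noise description reads y[n] = Σ_k h_n[k] u[n−k], where u is white noise and h_n is the impulse response of the filter active at sample n. The following is a minimal standard-library sketch. It assumes, as a hypothetical parameterization, that the time-varying filter is an FIR filter held constant over blocks of `block_size` samples. The name `filtered_noise` and its parameters are illustrative and not part of SMS.

```python
import random


def filtered_noise(filters, block_size, seed=0):
    """Model a residual as white noise convolved with a time-varying filter.

    `filters` is a list of FIR coefficient lists, one per block of
    `block_size` samples (a hypothetical piecewise-constant filter).
    Returns y[n] = sum_k h_n[k] * u[n - k].
    """
    rng = random.Random(seed)
    n_total = len(filters) * block_size
    # Zero-mean, unit-variance white Gaussian noise u[n].
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_total)]
    out = [0.0] * n_total
    for n in range(n_total):
        h = filters[n // block_size]  # filter in effect at sample n
        acc = 0.0
        for k, coef in enumerate(h):
            if n - k >= 0:
                acc += coef * noise[n - k]
        out[n] = acc
    return out
```

For example, `filtered_noise([[0.5, 0.5], [1.0]], 256)` produces 256 samples of low-passed noise followed by 256 samples of unfiltered noise. A real implementation would more likely shape the noise spectrally, frame by frame, than convolve sample by sample.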
The implementation of the analysis for the Sinusoidal plus Residual model is more complex than that of the Sinusoidal model. Figure B.1 shows a simplified block diagram of this analysis.
The first few steps are the same as in a sinusoidal-only analysis. The major differences start in the peak continuation process: to obtain a good partial-residual decomposition, peak continuation must be refined so that it identifies the stable partials of the sound. Several strategies can be used to accomplish this. The simplest case is a monophonic, pseudo-harmonic sound. By using the fundamental frequency information in the peak continuation algorithm, we can identify the harmonic partials.
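One way to use the fundamental frequency in peak continuation is sketched below. This is a simplified illustration under stated assumptions, not the exact SMS algorithm. In each frame, harmonic h is matched to the strongest peak within a relative tolerance of h·f0, and peaks are continued across frames by harmonic number. Peaks that match no harmonic are left for the residual. The function names, the `(freq_hz, mag_db)` peak format, and the 3% tolerance are all hypothetical choices.

```python
def match_harmonics(f0, peaks, n_harmonics, tolerance=0.03):
    """For harmonics 1..n_harmonics, return the strongest peak within
    +/- tolerance * (h * f0) of the ideal harmonic frequency, or None.
    `peaks` is a list of (freq_hz, mag_db) tuples for one frame."""
    matched = []
    for h in range(1, n_harmonics + 1):
        target = h * f0
        candidates = [p for p in peaks if abs(p[0] - target) <= tolerance * target]
        matched.append(max(candidates, key=lambda p: p[1]) if candidates else None)
    return matched


def harmonic_tracks(frames, n_harmonics, tolerance=0.03):
    """Continue peaks across frames by harmonic number.

    `frames` is a list of (f0, peaks); f0 is None for unvoiced frames.
    Track h collects the peak matched to harmonic h + 1 in every frame,
    with None where nothing matched."""
    tracks = [[] for _ in range(n_harmonics)]
    for f0, peaks in frames:
        if f0 is None:
            matched = [None] * n_harmonics
        else:
            matched = match_harmonics(f0, peaks, n_harmonics, tolerance)
        for h, peak in enumerate(matched):
            tracks[h].append(peak)
    return tracks
```

With a fixed relative tolerance, the search intervals of neighbouring harmonics stay disjoint only while tolerance·h < 1/2. For 3%, that means up to about the 16th harmonic, so a real implementation would need to handle higher partials differently.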
The residual component is obtained by first generating the sinusoidal component with additive synthesis and then subtracting it from the original waveform. This is possible because the instantaneous phases of the original sound are matched, so the shape of the time-domain waveform is preserved. The spectral analysis of this time-domain residual starts by windowing it with a window that is independent of the one used to find the sinusoids, so we are free to choose a different time-frequency compromise. An amplitude correction step can reduce the time smearing produced by the sinusoidal subtraction. Then the FFT is computed, and the resulting spectrum can be modeled using several existing techniques. The spectral phases can be discarded if the residual can be approximated as a stochastic signal.
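The subtraction and residual analysis steps can be sketched for a single frame as follows. Several simplifications are assumed. Sinusoid parameters are held constant over the frame instead of being interpolated between frames. The amplitude correction step is omitted. A Hann window stands in for whichever independent window is chosen. A naive DFT replaces the FFT. All function names are hypothetical. The tests illustrate the phase-matching point from the text: with matched phases the sinusoid cancels exactly, while a phase error leaves a large residual.

```python
import cmath
import math


def additive_synth(amps, freqs, phases, n_samples, sr):
    """Sum of sinusoids with constant amplitude, frequency (Hz) and phase."""
    return [sum(a * math.cos(2 * math.pi * f * n / sr + p)
                for a, f, p in zip(amps, freqs, phases))
            for n in range(n_samples)]


def hann(n_samples):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_samples) for n in range(n_samples)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (a real system would use an FFT)."""
    n_total = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_total)
                    for n, x in enumerate(frame)))
            for k in range(n_total // 2 + 1)]


def residual_spectrum(original, amps, freqs, phases, sr):
    """Subtract the phase-matched sinusoidal component, window the residual
    with its own window, and keep only the magnitude spectrum (phases are
    discarded, assuming a stochastic residual)."""
    sines = additive_synth(amps, freqs, phases, len(original), sr)
    residual = [x - s for x, s in zip(original, sines)]
    window = hann(len(residual))
    return residual, magnitude_spectrum([r * w for r, w in zip(residual, window)])
```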
Once the different components of the SMS (Spectral Modeling Synthesis) representation have been obtained, a variety of transformations can be applied in the spectral domain [Amatriain et al., 2002b]. After processing, the spectral components must be synthesized back to produce the output sound. The diagram in Figure B.2 illustrates the SMS synthesis algorithm.
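The residual branch of the synthesis can be illustrated with a minimal sketch. It assumes the residual is stored per frame as a magnitude half-spectrum with its phases discarded. Each frame gets uniformly random phases, the spectrum is made Hermitian so that the naive inverse DFT yields a real signal, and the frames are overlap-added. The sinusoidal component would then be added to the result sample by sample. The function names are hypothetical, and the synthesis windowing is assumed to have been applied to the frames beforehand.

```python
import cmath
import math
import random


def stochastic_frame(magnitudes, seed=0):
    """Turn a magnitude half-spectrum (bins 0..N/2) into one real time frame
    by assigning uniformly random phases and taking a naive inverse DFT."""
    rng = random.Random(seed)
    n_total = 2 * (len(magnitudes) - 1)
    half = [m * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) for m in magnitudes]
    # DC and Nyquist bins must be real for the output to be a real signal.
    half[0] = complex(magnitudes[0], 0.0)
    half[-1] = complex(magnitudes[-1], 0.0)
    # Mirror the interior bins as complex conjugates (Hermitian symmetry).
    full = half + [c.conjugate() for c in reversed(half[1:-1])]
    return [
        (sum(full[k] * cmath.exp(2j * math.pi * k * n / n_total)
             for k in range(n_total)) / n_total).real
        for n in range(n_total)
    ]


def overlap_add(frames, hop):
    """Sum frames into one signal, each shifted by `hop` samples. Frames are
    assumed to be already multiplied by a synthesis window whose shifted
    copies sum to a constant at this hop."""
    length = hop * (len(frames) - 1) + max(len(f) for f in frames)
    out = [0.0] * length
    for i, frame in enumerate(frames):
        for n, x in enumerate(frame):
            out[i * hop + n] += x
    return out
```

Different seeds give different waveforms with the same magnitude spectrum, which is what treating the residual as stochastic implies.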
The original sinusoidal plus residual model has led to other spectral models that still share some of its foundations [Ding and Qian, 1997; Fitz, Haken and Christensen, 2000; Verma, 2000].
2004-10-18