The main task of the decoder is to interpret the information received through the channel so that the Synthesizer can be fed with the correct parameters. Encoded sound objects must be interpreted and prepared to meet the requirements of the next module. A very common such requirement is that the Synthesizer does not expect to receive synchronous Processing Data but rather asynchronous Controls. Because of this, and as already illustrated in Figure 5.6, the Decoder may output either Processing Data or Controls depending on the particular needs of the Synthesizer. When the Synthesizer needs Controls, the conversion from Processing Data to Controls is performed by reading the incoming encoded objects and sequencing them as events when their associated time tags correspond to the current synthesis time. From here on we will assume that the Decoder works this way by default, outputting Controls that are sequenced from the Processing Data Sound Objects received through the channel.
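As a rough illustration of this default behaviour, the following Python sketch sequences time-tagged sound objects received from the channel into asynchronous control events. The class and method names (SoundObject, Decoder.tick, dispatch_control) are hypothetical and only serve to fix ideas; they do not correspond to any particular implementation.

\begin{verbatim}
# Minimal sketch: turning synchronous Processing Data into asynchronous Controls.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SoundObject:
    time_tag: float                               # time at which the object becomes active
    parameters: Dict[str, float] = field(default_factory=dict)


class Decoder:
    def __init__(self, dispatch_control: Callable[[Dict[str, float]], None]):
        self.pending: List[SoundObject] = []      # objects received from the channel
        self.dispatch_control = dispatch_control  # callback towards the Synthesizer

    def receive(self, obj: SoundObject) -> None:
        """Store an encoded sound object until its time tag is reached."""
        self.pending.append(obj)

    def tick(self, synthesis_time: float) -> None:
        """Sequence pending objects as control events at the current synthesis time."""
        due = [o for o in self.pending if o.time_tag <= synthesis_time]
        for obj in sorted(due, key=lambda o: o.time_tag):
            self.dispatch_control(obj.parameters)
        self.pending = [o for o in self.pending if o.time_tag > synthesis_time]


# Usage example:
decoder = Decoder(dispatch_control=print)
decoder.receive(SoundObject(time_tag=0.5, parameters={"pitch": 440.0}))
decoder.tick(synthesis_time=1.0)                  # prints {'pitch': 440.0}
\end{verbatim}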
The transmitted stream may have a varying degree of abstraction that affects the way the receiver responds. The stream may contain anything from signal-related processing data with associated low-level descriptors to high-level processing data representing high-level classifying descriptors. The way the receiver has to process the input stream therefore depends on how high- or low-level the received content description is. Two main processes are involved in bringing the description to the appropriate level: abstraction and inference. We will now detail their main characteristics.
If the decoder receives low-level descriptions, there are two options, depending on the application requirements. The low-level descriptors can be fed directly into the Synthesis engine, or an intermediate abstraction process can be applied. At first sight the first approach may seem more sensible and obviously more economical. But it has an inherent problem that is difficult to solve: if the description is very low-level it also has to be exhaustive, and this, in many situations, is not easy to accomplish or is not worth it in terms of channel bandwidth (we may end up with a description that is several times the original size). An example of a situation where a non-exhaustive low-level description is received would be an input like ``sound object, centroid=120Hz''. Many sound objects comply with this low-level description, so the decoder would be in charge of adjusting the other necessary parameters.
For all these reasons it is usually interesting to include the intermediate abstraction process. In this process the decoder has to use 'real world' knowledge in order to convert low-level information into mid-level information that is more understandable from the synthesizer's point of view. If the abstraction process is omitted and the synthesizer receives low-level information that is not exhaustive, the unspecified parameters have to be taken as defaults. Thus, paradoxically, the synthesizer is granted some degrees of freedom and the result may lose concreteness.
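A minimal sketch of such an abstraction step is given below, assuming simple hand-made rules as a stand-in for 'real world' knowledge. The descriptor names, thresholds and mid-level attributes are illustrative assumptions, not part of any actual system.

\begin{verbatim}
# Sketch: mapping (possibly partial) low-level descriptors onto mid-level attributes.
def abstract(low_level: dict) -> dict:
    """Turn a non-exhaustive low-level description into a mid-level one."""
    mid_level = {}
    centroid = low_level.get("centroid")          # Hz
    if centroid is not None:
        mid_level["brightness"] = "dark" if centroid < 400 else "bright"
    energy = low_level.get("energy")              # normalised 0..1
    if energy is not None:
        mid_level["dynamics"] = "soft" if energy < 0.3 else "loud"
    return mid_level


# The non-exhaustive description from the text:
print(abstract({"centroid": 120.0}))              # {'brightness': 'dark'}
\end{verbatim}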
On the other hand, if the input to the decoder consists only of high-level semantic information, an intermediate inference process is always needed in order to make the content description understandable by the synthesis engine. This process, contrary to the abstraction process mentioned earlier, might be better understood through an example. Imagine the decoder's input is 'violin.note'. The synthesizer will be unable to interpret that content description because of its degree of abstraction. The decoder is therefore forced to lower the level of abstraction by suppressing degrees of freedom. The output of the decoder should be something like 'violin note, pitch: C4, loudness: mf'.
Both abstraction and inference are indeed one-to-many processes, that is, the same input can yield a finite set of different outputs. The way the decoding process resolves the degrees of freedom should rely on user or application preferences as well as on random processes or context awareness. In the previous example, the decision on the note and loudness to be played could be based on knowledge about the author, the style, the user's tastes, previous or future notes, the harmony, and a final random process to choose one of the best alternatives.
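The following sketch illustrates this one-to-many inference for the 'violin.note' example, resolving the remaining degrees of freedom from preferences and a final random choice among a candidate set. The candidate lists, preference keys and function names are purely illustrative assumptions.

\begin{verbatim}
# Sketch: lowering the abstraction level of a high-level content description.
import random

CANDIDATE_PITCHES = ["C4", "D4", "E4", "G4"]      # e.g. constrained by harmony or context
CANDIDATE_LOUDNESS = ["p", "mf", "f"]             # e.g. constrained by style


def infer(high_level: str, preferences: dict) -> dict:
    """Resolve the degrees of freedom of a description such as 'violin.note'."""
    instrument, event = high_level.split(".")
    pitch = preferences.get("pitch") or random.choice(CANDIDATE_PITCHES)
    loudness = preferences.get("loudness") or random.choice(CANDIDATE_LOUDNESS)
    return {"instrument": instrument, "event": event,
            "pitch": pitch, "loudness": loudness}


print(infer("violin.note", {"loudness": "mf"}))
# e.g. {'instrument': 'violin', 'event': 'note', 'pitch': 'G4', 'loudness': 'mf'}
\end{verbatim}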
In any case, the decoder must translate the input metadata into some sort of synthesis language that can be easily interpreted by the synthesizer. The key point of the language used for expressing synthesis parameters is therefore that it must meet not only the requirements of the synthesizer's input but also the needs of the decoder's output. Note that in most cases this translation means converting Processing Data into asynchronous control events, since most synthesis languages handle simple events rather than complex synchronous data.
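As a final, hedged illustration, a control event might be written out as a simple score-like text line, loosely modelled on the event statements found in conventional synthesis languages. The event fields and the exact output format below are illustrative assumptions rather than a prescription.

\begin{verbatim}
# Sketch: formatting an asynchronous control event as a score-like event line.
def to_score_line(event: dict) -> str:
    """Format a control event as 'i <instr> <start> <dur> <freq> <amp>'."""
    return "i{instr} {start:.3f} {dur:.3f} {freq:.1f} {amp:.2f}".format(**event)


event = {"instr": 1, "start": 0.0, "dur": 1.5, "freq": 440.0, "amp": 0.7}
print(to_score_line(event))                       # i1 0.000 1.500 440.0 0.70
\end{verbatim}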