Although it may sometimes be useful to conceptually separate the receiver into a decoder and a synthesizer, in many other cases a combined scheme that treats the receiver as a whole is more feasible.
In that case, the resulting receiver scheme is what we call a Content-based Synthesizer or Object-based Synthesizer, which at first sight does not differ much from a traditional synthesizer. As illustrated in Figure 5.13, the input metadata is converted to control events and mapped to synthesizer parameters.
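As a minimal sketch of this mapping stage, the following Python fragment converts a few low-level descriptors into synthesis parameters. The descriptor fields, the map_to_parameters function, and the particular parameter names are all hypothetical, chosen only to illustrate a simple one-to-one mapping; they are not part of any specific system described here.

    from dataclasses import dataclass

    @dataclass
    class Descriptor:
        pitch: float       # fundamental frequency in Hz
        loudness: float    # normalized 0..1
        brightness: float  # normalized 0..1

    def map_to_parameters(d: Descriptor) -> dict:
        """Simple one-to-one mapping from descriptors to synthesis parameters."""
        return {
            "osc_freq": d.pitch,    # oscillator frequency follows pitch
            "amp": d.loudness,      # amplitude follows loudness
            # brighter sound -> higher filter cutoff (illustrative scaling)
            "filter_cutoff": 200.0 + d.brightness * 8000.0,
        }

    print(map_to_parameters(Descriptor(pitch=440.0, loudness=0.8, brightness=0.5)))

Such a direct mapping works only while the metadata stays close to the control-parameter level; the next paragraphs show where it breaks down.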
In many situations a simple mapping strategy may suffice. But if the level of abstraction of the input metadata is higher, the gap between the transmitted information and the parameters to be fed to the synthesis engine may be impossible to fill using conventional techniques. Imagine, for example, a situation where the transmitted metadata includes a content description such as (genre: jazz, mood: sad, user_profile: musician).
The latter example shows that we are facing a problem of search and retrieval rather than one of finding an appropriate mapping strategy. We could have a database of sound files, each with an attached content description in the form of metadata. The goal of the system is then to find the object in the database that fulfils the requirements of the input metadata.
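The retrieval view of the receiver can be sketched as follows. This is a toy illustration, not an actual implementation: the database entries, field names, and the retrieve function are invented for the example.

    database = [
        {"file": "sax_solo_01.wav", "genre": "jazz", "mood": "sad"},
        {"file": "guitar_riff_03.wav", "genre": "rock", "mood": "happy"},
        {"file": "piano_ballad_02.wav", "genre": "jazz", "mood": "sad"},
    ]

    def retrieve(query: dict) -> list:
        """Return all entries whose metadata matches every query field."""
        return [entry for entry in database
                if all(entry.get(k) == v for k, v in query.items())]

    print(retrieve({"genre": "jazz", "mood": "sad"}))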
A problem we still face with such a model is the difficulty of automatically extracting parameters at such a level of abstraction from the signal itself. Existing applications that implement the system depicted in Figure 5.14 can be found, but they always require a previous step of manually annotating the content of the whole database.
A possible solution to this inconvenience is the use of machine learning techniques. It has recently become common in this sort of framework to implement, for example, collaborative filtering engines (classification based on the analysis of user preferences: if most of our users classify item X as being Y, we label it that way). In that case, though, classification and identification are performed without taking into account any inner property of the sounds. On the other hand, if what we intend is a system capable of learning from the sound features themselves, we may favor a Case-Based Reasoning (CBR) engine such as the one used in [Arcos et al., 1998].
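The collaborative-filtering idea reduces, in its simplest form, to a majority vote over user labels; the sketch below makes that explicit. The majority_label function and the sample votes are hypothetical, and, as noted above, no property of the signal is inspected.

    from collections import Counter

    def majority_label(user_labels: list) -> str:
        """Assign the label given by the majority of users."""
        counts = Counter(user_labels)
        return counts.most_common(1)[0][0]

    votes_for_x = ["sad", "sad", "melancholic", "sad", "calm"]
    print(majority_label(votes_for_x))  # -> "sad"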
In any case, a first precondition for the system's viability is to reduce the size of the resulting database. We observe that there is no need to store sounds that could easily be obtained from others already existing in the database. If no sound exactly matches the content description at the input, we can find the most similar one and adapt it in the desired direction. This adaptation step is basically a content-based transformation (see section 5.3.4).
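A rough sketch of this retrieve-and-adapt step, under invented assumptions, follows: each database entry carries a normalized feature vector, the nearest entry is retrieved by Euclidean distance, and the adaptation is stubbed out. A real system would apply an actual content-based transformation here rather than overwriting feature values.

    def distance(a: dict, b: dict) -> float:
        """Euclidean distance over the descriptors shared by a and b."""
        keys = set(a) & set(b)
        return sum((a[k] - b[k]) ** 2 for k in keys) ** 0.5

    def retrieve_and_adapt(query: dict, database: list) -> dict:
        nearest = min(database, key=lambda e: distance(query, e["features"]))
        # Placeholder adaptation: a real system would transform the retrieved
        # sound toward the requested description (a content-based transformation).
        adapted = dict(nearest["features"])
        adapted.update(query)
        return {"source": nearest["file"], "features": adapted}

    db = [{"file": "a.wav", "features": {"brightness": 0.3, "tempo": 0.5}},
          {"file": "b.wav", "features": {"brightness": 0.9, "tempo": 0.6}}]
    print(retrieve_and_adapt({"brightness": 0.8, "tempo": 0.5}, db))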
One remaining problem is what similarity measure the system should use. Similarity in sound and music is clearly a multidimensional measure that can be highly dependent on the particular application. Furthermore, our database may contain more than one case that is similar to the content description received. All of them may need further adaptation (transformation), but the problem is deciding which transformation is the most immediate and effective. In that sense, it may be interesting to identify and classify items in the database not only for what they actually are but for what they may become: a sound can thus be classified as bright-able, piano-able, or fast-able. If a solution is accepted by the user, we may add not only the resulting sound and its content description to the database but also the knowledge derived from the adaptation process.
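The application dependence of the similarity measure can be made concrete with a weighted distance over several descriptor dimensions, as in the following sketch. The dimension names and weight values are purely illustrative assumptions; each application would choose its own.

    def weighted_similarity(a: dict, b: dict, weights: dict) -> float:
        """Weighted inverse distance; higher means more similar."""
        d = sum(w * abs(a[k] - b[k]) for k, w in weights.items())
        return 1.0 / (1.0 + d)

    # An application that cares mostly about timbre weights brightness high:
    weights = {"brightness": 0.7, "pitch": 0.2, "tempo": 0.1}
    x = {"brightness": 0.4, "pitch": 0.5, "tempo": 0.6}
    y = {"brightness": 0.5, "pitch": 0.5, "tempo": 0.9}
    print(weighted_similarity(x, y, weights))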