This annex compiles the most relevant publications in which the author
has participated, so as to give a better overview of the author's
research work. For each publication we include the abstract (or the
introduction, where no abstract is available) and the chapter(s) of
this Thesis to which it is most relevant. The publications are
sorted by date, in decreasing order.
author : Amatriain, X. and Bonada, J. and Loscos, A. and Arcos, J. and Verfaille, V.
title : Content-based Transformations
year : 2003
journal : Journal of New Music Research
volume : 32
number : 1
related to chapter : 5
abstract :
Content processing is a vast and growing field that integrates different
approaches borrowed from the signal processing, information retrieval
and machine learning disciplines. In this article we deal with a particular
type of content processing: the so-called content-based transformations.
We will not focus on any particular application but rather try to
give an overview of different techniques and conceptual implications.
We first describe the transformation process itself, including the
main model schemes that are commonly used, which lead to the establishment
of the formal basis for a definition of content-based transformations.
Then we take a quick look at a general spectral-based analysis/synthesis
approach to processing audio signals and at how to extract features that
can be used in the content-based transformation context. Using this
analysis/synthesis approach we give some examples of how content-based
transformations can be applied to modify the basic perceptual axes
of a sound and how we can even combine different basic effects in
order to perform more meaningful transformations. We finish by going
a step further up the abstraction ladder and presenting transformations
that are related to musical (and thus symbolic) properties rather
than to those of the sound or the signal itself.
author : Gómez, E. and Gouyon, F. and Herrera, P. and Amatriain, X.
title : Using and enhancing the current MPEG-7 standard for a music content processing tool
year : 2003
book title : Proceedings of Audio Engineering Society, 114th Convention
related to chapter : 5
abstract :
The aim of this document is to discuss possible ways of describing
some music constructs in a dual context. First, that of the current
standard for multimedia content description: MPEG-7. Second, that
of a specific software application, the Sound Palette (a tool for
content-based management, content editing and transformation of simple
audio phrases). We discuss some MPEG-7 limitations regarding different
musical layers: melodic (present but underdeveloped), rhythmic (practically
absent) and instrumental (present though using an exclusive procedure).
Some proposals for overcoming them are presented in the context of
our application.
author : Gómez, E. and Gouyon, F. and Herrera, P. and Amatriain, X.
title : MPEG-7 for Content-based Music Processing
year : 2003
book title : Proceedings of 4th WIAMIS-Special session on Audio Segmentation and Digital Music
related to chapter : 5
abstract :
The aim of this document is to present how the MPEG-7 standard has
been used in a tool for content-based management, editing and transformation
of audio signals: the Sound Palette. We discuss some MPEG-7 limitations
regarding different musical layers, and some proposals for overcoming
them are presented.
author : Gómez, E. and Grachten, M. and Amatriain, X. and Arcos, J.
title : Melodic characterization of monophonic recordings for expressive tempo transformations
year : 2003
book title : Proceedings of Stockholm Music Acoustics Conference 2003
related to chapter : 5
abstract :
The work described in this paper aims at characterizing tempo changes in terms of expressivity, in order to develop a transformation system to perform expressive tempo transformations in monophonic instrument phrases.
For this purpose, we have developed an analysis tool that extracts a set of acoustic features from monophonic recordings. This set of features is structured and stored following a description scheme that is derived from the current MPEG-7 standard. These performance descriptions are then compared with their corresponding scores, using edit distance techniques, for automatically annotating the expressive transformations performed by the musician. Then, these annotated performance descriptions are incorporated in a case-based reasoning (CBR) system in order to build an expressive tempo transformations case base. The transformation system will use this CBR system to perform tempo transformations in an expressive manner.
Saxophone performances of jazz standards played by a professional performer have been recorded for this characterization.
In this paper, we first describe the acoustic features that have
been used for this characterization and how they are structured
and stored. Then, we explain the analysis methods that have been implemented
to extract this set of features from audio signals and how they are
processed by the CBR system. Results are finally presented and discussed.
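To make the alignment step concrete, the following Python fragment is
a minimal sketch of the kind of edit-distance comparison used to match
a performance against its score. The note tuples, unit costs and
function names are illustrative assumptions, not taken from the paper.

    # Minimal sketch of score-performance alignment via edit distance.
    # Each note is a (midi_pitch, duration) tuple; two notes match when
    # their pitches coincide. All costs are illustrative.
    def edit_distance(score, performance, sub=1, ins=1, dele=1):
        n, m = len(score), len(performance)
        d = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = i * dele
        for j in range(1, m + 1):
            d[0][j] = j * ins
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = 0 if score[i - 1][0] == performance[j - 1][0] else sub
                d[i][j] = min(d[i - 1][j - 1] + cost,  # match / substitution
                              d[i - 1][j] + dele,      # note omitted
                              d[i][j - 1] + ins)       # note inserted
        return d[n][m]

    # A performer drops one note and alters another: distance 2.
    score = [(60, 1.0), (62, 0.5), (64, 0.5), (65, 1.0)]
    performance = [(60, 1.1), (63, 0.4), (65, 0.9)]
    print(edit_distance(score, performance))  # -> 2

An alignment derived from the same dynamic-programming table is what
allows each expressive deviation to be annotated automatically.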
author : Gómez, E. and Peterschmitt, G. and Amatriain, X. and Herrera, P.
title : Content-based melodic transformations of audio for a music processing application
year : 2003
book title : Proceedings of 6th International Conference on Digital Audio Effects
related to chapter : 5
abstract :
The goal of this paper is to present a system that performs melodic
transformations on monophonic audio phrases. First, it extracts a
melodic description from the audio. This description is presented
to the user and can be stored and loaded in a structured format. The
system proposes a set of high-level melodic transformations for the
audio signal. These transformations are mapped into a set of low-level
transformations of the melodic description that are then applied to
the audio signal. The algorithms for description extraction and audio
transformation are presented.
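To illustrate the mapping this abstract refers to, here is a minimal
Python sketch of how a high-level transformation (transposition, in
this hypothetical example) can be expressed as low-level edits of the
melodic description, which in turn parameterize the audio processing.
The note representation and names are assumptions for illustration,
not the paper's actual algorithm.

    # A high-level "transpose" command becomes per-note edits of the
    # melodic description plus the pitch-shift factor that the signal
    # processing layer would apply to the audio.
    SEMITONE = 2 ** (1 / 12)

    def transpose(description, semitones):
        factor = SEMITONE ** semitones
        return [{'midi': note['midi'] + semitones,  # symbolic edit
                 'f0': note['f0'] * factor,         # frequency edit
                 'shift_factor': factor}            # applied to the audio
                for note in description]

    melody = [{'midi': 69, 'f0': 440.0}, {'midi': 71, 'f0': 493.88}]
    print(transpose(melody, 2))  # up a whole tone, factor ~1.1225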
author : Geiger, G. and Mora, A. and Rubio, X. and Amatriain, X.
title : AGNULA: A GNU Linux Audio Distribution
year : 2003
book title : Proceedings of II Jornades de Software Lliure
related to chapter : 3
abstract (translated from the original Catalan):
This document presents the AGNULA project, which is framed within the
European-level effort to promote free software. We explain its
objectives, its promoters and the different distributions that form
part of it. Finally, we summarize the main applications that are
included.
author : Arumi, P. and Garcia, D. and Amatriain, X.
title : CLAM, Una llibreria lliure per Audio i Música
year : 2003
book title : Proceedings of II Jornades de Software Lliure
related to chapter : 3
abstract (translated from the original Catalan):
CLAM is a free, object-oriented C++ framework that offers developers
design solutions and a set of reusable components for building music
and audio applications and for research in the field of signal
processing. Some of these tools, which are also free, have already
been developed at the MTG. CLAM's development methodology ensures its
quality. Above all, the incorporation of CLAM into several GNU/Linux
distributions is facilitating the appearance of technologically
advanced multimedia tools on free platforms.
author : Amatriain, X. and Bonada, J. and Loscos, A. and Serra, X.
title : Spectral Processing
year : 2002
book title : DAFX: Digital Audio Effects
editor : Udo Zölzer
publisher : John Wiley and Sons, Ltd.
related to chapter : 3
introduction :
In the context of this book, we are looking for representations of sound signals and signal processing systems that can give us ways to design sound transformations in a variety of music applications and contexts. It should have been clear throughout the book that several points of view have to be considered, including a mathematical, thus objective, perspective and a cognitive, thus mainly subjective, standpoint. Both points of view are necessary to fully understand the concept of sound effects and to be able to use the described techniques in practical situations.
The mathematical and signal processing points of view are straightforward to present (which does not mean easy), since the language of equations and flow diagrams suits them. However, the top-down implications are much harder to express, due to the huge number of variables involved and to the inherent perceptual subjectivity of the music-making process. This is clearly one of the main challenges of the book and the main reason for its existence.
The use of a spectral representation of a sound yields a perspective that is sometimes closer to the one used in a sound engineering approach. By understanding the basic concepts of frequency domain analysis, we are able to acquire the tools to use a large number of effects processors and to understand many types of sound transformation systems. Moreover, since frequency domain analysis is a process somewhat similar to the one performed by the human hearing system, it yields fairly intuitive intermediate representations.
The basic idea of spectral processing is that we can analyze a sound to obtain alternative frequency domain representations, which can then be transformed and inverted to produce new sounds. Most of the approaches start by developing an analysis/synthesis system from which the input sound is reconstructed without any perceptual loss of sound quality. The techniques described in the previous chapter are clear examples of this approach. The main issues are then what the intermediate representation is and what parameters are available for applying the desired transformations.
Perceptual or musical concepts such as timbre or pitch are clearly related to the spectral characteristics of a sound. Even some common processes for sound effects are better explained using a frequency domain representation. We usually think in terms of the frequency axis when we talk about equalizing, filtering, pitch shifting or harmonizing. In fact, some of these are specific to this signal processing approach and have no immediate counterpart in the time domain. On the other hand, most (but not all) of the sound effects presented in this book can be implemented in the frequency domain.
Another issue is whether or not this approach is the most efficient, or practical, for a given application. The process of transforming a time domain signal into a frequency domain representation is, by itself, not an immediate step. Some parameters are difficult to adjust and force us to accept several compromises. Some settings, such as the size of the analysis window, have little or nothing to do with the high-level approach we intend to favor, and require the user to have a basic understanding of signal processing.
In that sense, when we talk about higher level spectral processing we are thinking of an intermediate analysis step in which relevant features are extracted or computed from the spectrum. These relevant features should be much closer to a musical or high-level approach. We can then process the features themselves or even apply transformations that keep some of the features unchanged. For example, we can extract the fundamental frequency and the spectral shape from a sound and then modify the fundamental frequency without affecting the shape of the spectrum.
Given that no single representation and processing system is optimal for everything, our approach will be to present a set of complementary spectral models that can be combined to cover the largest possible set of sounds and musical applications.
In the next section we introduce two spectral models: Sinusoidal and Sinusoidal plus Residual. These models already represent a step up the abstraction ladder, and from either of them we can identify and extract higher-level information of a sound, such as harmonics, pitch, spectral shape, vibrato or note boundaries; that is, higher-level features. This analysis step brings the representation closer to our perceptual understanding of a sound. The complexity of the analysis will depend on the type of feature that we want to identify and on the sound to be analyzed. The benefits of going to this higher level of analysis are enormous and open up a wide range of new musical applications.
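As a rough illustration of the analysis step just described (the
chapter itself provides Matlab code; this Python/numpy fragment is
only a simplified sketch with illustrative parameter values), spectral
peaks can be detected frame by frame as local maxima of the magnitude
spectrum:

    import numpy as np

    def detect_peaks(frame, fs, rel_threshold_db=-25.0):
        # Return (frequency_hz, relative_db) spectral peaks of one frame.
        window = np.hanning(len(frame))
        mag = np.abs(np.fft.rfft(frame * window))
        mag_db = 20 * np.log10(mag + 1e-12)
        mag_db -= mag_db.max()  # 0 dB = strongest bin
        peaks = []
        for k in range(1, len(mag_db) - 1):
            if mag_db[k] > rel_threshold_db and mag_db[k-1] < mag_db[k] > mag_db[k+1]:
                peaks.append((k * fs / len(frame), mag_db[k]))
        return peaks

    fs = 44100
    t = np.arange(1024) / fs
    frame = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
    print(detect_peaks(frame, fs))  # two peaks, near 440 and 880 Hz

A complete analysis would refine each peak by parabolic interpolation
and track peaks from frame to frame to form sinusoidal trajectories.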
Having set the basis of the Sinusoidal plus Residual model, we will then give some details of the techniques used both in its analysis and synthesis processes, providing Matlab code to implement an analysis-synthesis framework. This Matlab implementation is based on the Spectral Modeling Synthesis framework. SMS [http://www.iua.upf.es/~sms] is a set of spectral-based techniques and related implementations for the analysis/transformation/synthesis of an audio signal based on the scheme presented in .
We will provide a set of basic audio effects and transformations based on the implemented Sinusoidal plus Residual analysis/synthesis. Matlab code is provided for all of them.
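In the same illustrative spirit, a transformation on the sinusoidal
part can be sketched as scaling the partial frequencies and
resynthesizing with an oscillator bank. A real implementation would
interpolate parameters from frame to frame and add the residual back;
the values below are only illustrative.

    import numpy as np

    def resynthesize(partials, fs, duration, pitch_factor=1.0):
        # Additive resynthesis of (freq_hz, amplitude) partials.
        t = np.arange(int(fs * duration)) / fs
        out = np.zeros_like(t)
        for freq, amp in partials:
            out += amp * np.sin(2 * np.pi * freq * pitch_factor * t)
        return out

    partials = [(440.0, 1.0), (880.0, 0.5)]       # e.g. from the analysis
    y = resynthesize(partials, 44100, 0.5,
                     pitch_factor=2 ** (3 / 12))  # up three semitones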
We will finish with an explanation of content-dependent processing
implementations. We introduce a real-time singing voice conversion
application that has been developed for use in karaoke applications,
and we define the basis of a nearly lossless time-scaling algorithm.
The complexity and extension of these implementations prevent us from
providing the associated Matlab code, so we leave that task as a challenge
for advanced readers.
author : Amatriain, X. and Herrera, P.
title : Transmitting Audio Content as Sound Objects
year : 2002
book title : Proceedings of AES22 International Conference on Virtual, Synthetic and Entertainment Audio
related to chapter : 5
abstract :
As audio and music applications tend towards a higher level of abstraction
and towards filling the gap between the signal processing world and the
end-user, we are more and more interested in processing content and
not (only) signal. This change in point of view leads to the redefinition
of several "classical" concepts, and a new conceptual
framework needs to be set to give support to these new trends. In
[Amatriain and Herrera 2001], a model for the transmission of
audio content was introduced. The model is now extended to include
the idea of Sound Objects. With these thoughts in mind, examples of
design decisions that have led to the implementation of the CLAM framework
are also given.
author : Amatriain, X. and de Boer, M. and Robledo, E. and Garcia, D.
title : CLAM: An OO Framework for Developing Audio and Music Applications
year : 2002
book title : Proceedings of 17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages and Applications
related to chapter : 3
abstract :
CLAM (C++ Library for Audio and Music) is a framework for audio and
music programming. It may be used for developing any type of audio
or music application as well as for doing more complex research related
to the field. In this paper we introduce the practicalities of CLAM's
first release as well as some of the sample applications that have
been developed within the framework. See [1] for a more conceptual
approach to the description of the CLAM framework.
author : Amatriain, X. and Arumi, P. and Ramírez, M.
title : CLAM, Yet Another Library for Audio and Music Processing?
year : 2002
book title : Proceedings of 17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages and Applications
related to chapter : 3
abstract :
CLAM (C++ Library for Audio and Music) is a framework that aims to
offer extensible, generic and efficient design and implementation
solutions for developing Audio and Music applications as well as for
doing more complex research related to the field. Although similar
libraries exist, some particularities make CLAM of high interest for
anyone interested in the field.
author : Garcia, D. and Amatriain, X.
title : XML as a means of control for audio processing, synthesis and analysis
year : 2001
book title : Proceedings of MOSART Workshop on Current Research Directions in Computer Music
abstract :
This paper discusses the benefits derived from providing XML support
in the component-based framework for audio systems that we are developing.
XML is used as the data format for persistence, visualization and
inter-application interfacing. Direct XML support is a very useful
feature for an audio framework because of the popularity of XML as a
data interchange format and because of the introduction of the MPEG-7
standard, an XML-based description format for multimedia content. The
formatting task has been distributed among the system objects in a
compositional way, making it easy to format a single object from its
parts. The system minimizes both the overhead added to a class and the
programmer effort needed to support XML I/O. A default XML
implementation has been provided for most of the data structures
(including future ones), while giving the chance to customize it. The
system has been designed to be reusable with other formats with
minimal impact on the system.
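The compositional formatting idea can be sketched as follows. This
Python fragment only illustrates the design (the actual framework is
written in C++, and all class names here are hypothetical): each
object serializes itself by delegating to its parts, so XML support
for a composite comes for free.

    import xml.etree.ElementTree as ET

    class Note:
        def __init__(self, pitch, duration):
            self.pitch, self.duration = pitch, duration

        def to_xml(self):
            e = ET.Element('Note')
            ET.SubElement(e, 'Pitch').text = str(self.pitch)
            ET.SubElement(e, 'Duration').text = str(self.duration)
            return e

    class Melody:
        def __init__(self, notes):
            self.notes = notes

        def to_xml(self):
            e = ET.Element('Melody')
            for note in self.notes:  # composition: delegate to the parts
                e.append(note.to_xml())
            return e

    melody = Melody([Note(60, 1.0), Note(62, 0.5)])
    print(ET.tostring(melody.to_xml(), encoding='unicode'))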
author : Amatriain, X. and Bonada, J. and Loscos, A. and Serra, X.
title : Spectral Modeling for Higher-level Sound Transformation
year : 2001
book title : Proceedings of MOSART Workshop on Current Research Directions in Computer Music
abstract :
When designing audio effects for music processing, we are always aiming
at providing higher-level representations that may somehow fill in
the gap between the signal processing world and the end-user. Spectral
models in general, and the Sinusoidal plus Residual model in particular,
can sometimes offer ways to implement such schemes.
author : Amatriain, X. and Herrera, P.
title : Audio Content Transmission
year : 2001
book title : Proceedings of COST G6 Conference on Digital Audio Effects 2001
related to chapter : 5
abstract :
Content description has become a topic of interest for many researchers
in the audiovisual field. While manual annotation has been used for
many years in different applications, the focus now is on finding
automatic content-extraction and content-navigation tools. An increasing
number of projects, in some of which we are actively involved, focus
on the extraction of meaningful features from an audio signal. Meanwhile,
standards like MPEG7 are trying to find a convenient way of describing
audiovisual content. Nevertheless, content description is usually
thought of as an additional information stream attached to the actual
content and the only envisioned scenario is that of a search and retrieval
framework. However, in this article it will be argued that if there
is a suitable content description, the actual content itself may no
longer be needed and we can concentrate on transmitting only its description.
Thus, the receiver should be able to interpret the information that,
in the form of metadata, is available at its inputs, and synthesize
new content relying only on this description. It is possibly in the
music field where this last step is most developed, and that
fact allows us to think of such a transmission scheme being available
in the near future.
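The transmission scheme argued for above can be caricatured in a few
lines of Python: only a symbolic description travels, and the receiver
synthesizes new content from it alone. The note format and the
sine-tone synthesizer are illustrative stand-ins, not the article's
actual proposal.

    import numpy as np

    description = [  # the metadata that is transmitted, instead of audio
        {'f0': 440.0, 'onset': 0.0, 'duration': 0.5},
        {'f0': 660.0, 'onset': 0.5, 'duration': 0.5},
    ]

    def synthesize(description, fs=44100):
        # Receiver side: render audio from the description alone.
        total = max(n['onset'] + n['duration'] for n in description)
        out = np.zeros(int(fs * total))
        for n in description:
            t = np.arange(int(fs * n['duration'])) / fs
            start = int(fs * n['onset'])
            out[start:start + len(t)] += np.sin(2 * np.pi * n['f0'] * t)
        return out

    audio = synthesize(description)  # content re-created from metadata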
author : Herrera, P. and Amatriain, X. and Batlle, E. and Serra, X.
title : Towards Instrument Segmentation for Music Content Description: a Critical Review of Instrument Classification Techniques
year : 2000
book title : Proceedings of International Symposium on Music Information Retrieval
related to chapter : 5
abstract :
A system capable of describing the musical content of any kind of
sound file or sound stream, as is supposed to be done in MPEG-7-compliant
applications, should provide an account of the different moments where
a certain instrument can be heard. In this paper we concentrate
on reviewing the different techniques that have so far been proposed
for the automatic classification of musical instruments. As most of the
techniques to be discussed are usable only on “solo”
performances, we will evaluate their applicability to the more complex
case of describing sound mixes. We conclude this survey by discussing
the necessity of developing new strategies for classifying sound mixes
without a priori separation of sound sources.
author : Amatriain, X. and Bonada, J. and Serra, X.
title : METRIX: A Musical Data Definition Language and Data Structure for a Spectral Modeling Based Synthesizer
year : 1998
book title : Proceedings of COST G6 Conference on Digital Audio Effects 1998
related to chapter : 6
abstract :
Since the MIDI 1.0 specification, well over 15 years ago, there have been many attempts to give a solution to all the limitations that soon became clear. None of these has had a happy ending, mainly due to commercial interests, and as a result, when trying to find an appropriate synthesis control user interface, we had few choices but to use MIDI. That is the reason why the idea of defining a new user interface arose. In this article, the main components of this interface will be discussed, paying special attention to the advantages and new features it brings to the end-user.