This annex compiles the most relevant publications in which the author
has participated, so as to give a better overview of the author's
research work. For each publication we include the abstract (or the
introduction, where no abstract is available) and the chapter(s) of
this Thesis to which it is most relevant. The publications are
sorted by date, in decreasing order.
author : Amatriain, X. and Bonada, J. and Loscos, A. and Arcos, J. and Verfaille, V.
title : Content-based Transformations
year : 2003
journal : Journal of New Music Research
volume : 32
number : 1
related to chapter : 5
abstract :
Content processing is a vast and growing field that integrates different
approaches borrowed from the signal processing, information retrieval
and machine learning disciplines. In this article we deal with a particular
type of content processing: the so-called content-based transformations.
We will not focus on any particular application but rather try to
give an overview of different techniques and conceptual implications.
We first describe the transformation process itself, including the
main model schemes that are commonly used, which lead to the establishment
of the formal basis for a definition of content-based transformations.
Then we take a quick look at a general spectral-based analysis/synthesis
approach to processing audio signals and at how to extract features that
can be used in the content-based transformation context. Using this
analysis/synthesis approach we give some examples of how content-based
transformations can be applied to modify the basic perceptual axes
of a sound and how we can even combine different basic effects in
order to perform more meaningful transformations. We finish by going
a step further up the abstraction ladder and presenting transformations
that are related to musical (and thus symbolic) properties rather
than to those of the sound or the signal itself.
author : Gómez, E. and Gouyon, F. and Herrera, P. and Amatriain, X.
title : Using and enhancing the current MPEG-7 standard for a music content processing tool
year : 2003
book title : Proceedings of Audio Engineering Society, 114th Convention
related to chapter : 5
abstract :
The aim of this document is to discuss possible ways of describing
some music constructs in a dual context. First, that of the current
standard for multimedia content description: MPEG-7. Second, that
of a specific software application, the Sound Palette (a tool for
content-based management, content editing and transformation of simple
audio phrases). We discuss some MPEG-7 limitations regarding different
musical layers: melodic (present but underdeveloped), rhythmic (practically
absent) and instrumental (present though using an exclusive procedure).
Some proposals for overcoming them are presented in the context of
our application.
author : Gómez, E. and Gouyon, F. and Herrera, P. and Amatriain, X.
title : MPEG-7 for Content-based Music Processing
year : 2003
book title : Proceedings of 4th WIAMIS-Special session on Audio Segmentation and Digital Music
related to chapter : 5
abstract :
The aim of this document is to present how the MPEG-7 standard has
been used in a tool for content-based management, editing and transformation
of audio signals: the Sound Palette. We discuss some MPEG-7 limitations
regarding different musical layers, and some proposals for overcoming
them are presented.
author : Gómez, E. and Grachten, M. and Amatriain, X. and Arcos, J.
title : Melodic characterization of monophonic recordings for expressive tempo transformations
year : 2003
book title : Proceedings of Stockholm Music Acoustics Conference 2003
related to chapter : 5
abstract :
The work described in this paper aims at characterizing tempo changes in terms of expressivity, in order to develop a transformation system to perform expressive tempo transformations in monophonic instrument phrases.
For this purpose, we have developed an analysis tool that extracts a set of acoustic features from monophonic recordings. This set of features is structured and stored following a description scheme that is derived from the current MPEG-7 standard. These performance descriptions are then compared with their corresponding scores, using edit distance techniques, for automatically annotating the expressive transformations performed by the musician. Then, these annotated performance descriptions are incorporated in a case-based reasoning (CBR) system in order to build an expressive tempo transformations case base. The transformation system will use this CBR system to perform tempo transformations in an expressive manner.
Saxophone performances of jazz standards played by a professional performer have been recorded for this characterization.
In this paper, we first describe the acoustic features that have
been used for this characterization and how they are structured
and stored. Then, we explain the analysis methods that have been implemented
to extract this set of features from audio signals and how they are
processed by the CBR system. Results are finally presented and discussed.
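To make the alignment step concrete, the following Python fragment is
a minimal sketch of the kind of edit-distance comparison used to match
a performance against its score. The note tuples, unit costs and
function names are illustrative assumptions, not taken from the paper.

    # Minimal sketch of score-performance alignment via edit distance.
    # Each note is a (midi_pitch, duration) tuple; two notes match when
    # their pitches coincide. All costs are illustrative.
    def edit_distance(score, performance, sub=1, ins=1, dele=1):
        n, m = len(score), len(performance)
        d = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = i * dele
        for j in range(1, m + 1):
            d[0][j] = j * ins
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = 0 if score[i - 1][0] == performance[j - 1][0] else sub
                d[i][j] = min(d[i - 1][j - 1] + cost,  # match / substitution
                              d[i - 1][j] + dele,      # note omitted
                              d[i][j - 1] + ins)       # note inserted
        return d[n][m]

    # A performer drops one note and alters another: distance 2.
    score = [(60, 1.0), (62, 0.5), (64, 0.5), (65, 1.0)]
    performance = [(60, 1.1), (63, 0.4), (65, 0.9)]
    print(edit_distance(score, performance))  # -> 2

An alignment derived from the same dynamic-programming table is what
allows each expressive deviation to be annotated automatically.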
author : Gómez, E. and Peterschmitt, G. and Amatriain, X. and Herrera, P.
title : Content-based melodic transformations of audio for a music processing application
year : 2003
book title : Proceedings of 6th International Conference on Digital Audio Effects
related to chapter : 5
abstract :
The goal of this paper is to present a system that performs melodic
transformations on monophonic audio phrases. First, it extracts a
melodic description from the audio. This description is presented
to the user and can be stored and loaded in a structured format. The
system proposes a set of high-level melodic transformations for the
audio signal. These transformations are mapped into a set of low-level
transformations of the melodic description that are then applied to
the audio signal. The algorithms for description extraction and audio
transformation are presented.
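To illustrate the mapping this abstract refers to, here is a minimal
Python sketch of how a high-level transformation (transposition, in
this hypothetical example) can be expressed as low-level edits of the
melodic description, which in turn parameterize the audio processing.
The note representation and names are assumptions for illustration,
not the paper's actual algorithm.

    # A high-level "transpose" command becomes per-note edits of the
    # melodic description plus the pitch-shift factor that the signal
    # processing layer would apply to the audio.
    SEMITONE = 2 ** (1 / 12)

    def transpose(description, semitones):
        factor = SEMITONE ** semitones
        return [{'midi': note['midi'] + semitones,  # symbolic edit
                 'f0': note['f0'] * factor,         # frequency edit
                 'shift_factor': factor}            # applied to the audio
                for note in description]

    melody = [{'midi': 69, 'f0': 440.0}, {'midi': 71, 'f0': 493.88}]
    print(transpose(melody, 2))  # up a whole tone, factor ~1.1225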
author : Geiger, G. and Mora, A. and Rubio, X. and Amatriain, X.
title : AGNULA: A GNU Linux Audio Distribution
year : 2003
book title : Proceedings of II Jornades de Software Lliure
related to chapter : 3
abstract (translated from the original Catalan):
This document presents the AGNULA project, which is framed within the
European-level effort to promote free software. We explain its
objectives, its promoters and the different distributions that form
part of it. Finally, we summarize the main applications that are
included.
author : Arumi, P. and Garcia, D. and Amatriain, X.
title : CLAM, Una llibreria lliure per Audio i Música
year : 2003
book title : Proceedings of II Jornades de Software Lliure
related to chapter : 3
abstract (translated from the original Catalan):
CLAM is a free, object-oriented C++ framework that offers developers
design solutions and a set of reusable components for building music
and audio applications and for research in the field of signal
processing. Some of these tools, which are also free, have already
been developed at the MTG. CLAM's development methodology ensures its
quality. Above all, the incorporation of CLAM into several GNU/Linux
distributions is facilitating the appearance of technologically
advanced multimedia tools on free platforms.
author : Amatriain, X. and Bonada, J. and Loscos, A. and Serra, X.
title : Spectral Processing
year : 2002
book title : DAFX: Digital Audio Effects
editor : Udo Zölzer
publisher : John Wiley and Sons, Ltd.
related to chapter : 3
introduction :
In the context of this book, we are looking for representations of sound signals and signal processing systems that can give us ways to design sound transformations in a variety of music applications and contexts. It should have been clear throughout the book that several points of view have to be considered, including a mathematical, thus objective, perspective and a cognitive, thus mainly subjective, standpoint. Both points of view are necessary to fully understand the concept of sound effects and to be able to use the described techniques in practical situations.
The mathematical and signal processing points of view are straightforward to present (which does not mean easy), since the language of equations and flow diagrams suits them. However, the top-down implications are much harder to express, due to the huge number of variables involved and to the inherent perceptual subjectivity of the music-making process. This is clearly one of the main challenges of the book and the main reason for its existence.
The use of a spectral representation of a sound yields a perspective that is sometimes closer to the one used in a sound engineering approach. By understanding the basic concepts of frequency domain analysis, we are able to acquire the tools to use a large number of effects processors and to understand many types of sound transformation systems. Moreover, since frequency domain analysis is a process somewhat similar to the one performed by the human hearing system, it yields fairly intuitive intermediate representations.
The basic idea of spectral processing is that we can analyze a sound to obtain alternative frequency domain representations, which can then be transformed and inverted to produce new sounds. Most of the approaches start by developing an analysis/synthesis system from which the input sound is reconstructed without any perceptual loss of sound quality. The techniques described in the previous chapter are clear examples of this approach. The main issues are then what the intermediate representation is and what parameters are available for applying the desired transformations.
Perceptual or musical concepts such as timbre or pitch are clearly related to the spectral characteristics of a sound. Even some common processes for sound effects are better explained using a frequency domain representation. We usually think in terms of the frequency axis when we talk about equalizing, filtering, pitch shifting or harmonizing. In fact, some of these are specific to this signal processing approach and have no immediate counterpart in the time domain. On the other hand, most (but not all) of the sound effects presented in this book can be implemented in the frequency domain.
Another issue is whether or not this approach is the most efficient, or practical, for a given application. The process of transforming a time domain signal into a frequency domain representation is, by itself, not an immediate step. Some parameters are difficult to adjust and force us to accept several compromises. Some settings, such as the size of the analysis window, have little or nothing to do with the high-level approach we intend to favor, and require the user to have a basic understanding of signal processing.
In that sense, when we talk about higher level spectral processing we are thinking of an intermediate analysis step in which relevant features are extracted or computed from the spectrum. These relevant features should be much closer to a musical or high-level approach. We can then process the features themselves or even apply transformations that keep some of the features unchanged. For example, we can extract the fundamental frequency and the spectral shape from a sound and then modify the fundamental frequency without affecting the shape of the spectrum.
Given that no single representation and processing system is optimal for everything, our approach will be to present a set of complementary spectral models that can be combined to cover the largest possible set of sounds and musical applications.
In the next section we introduce two spectral models: Sinusoidal and Sinusoidal plus Residual. These models already represent a step up the abstraction ladder, and from either of them we can identify and extract higher-level information of a sound, such as harmonics, pitch, spectral shape, vibrato or note boundaries; that is, higher-level features. This analysis step brings the representation closer to our perceptual understanding of a sound. The complexity of the analysis will depend on the type of feature that we want to identify and on the sound to be analyzed. The benefits of going to this higher level of analysis are enormous and open up a wide range of new musical applications.
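As a rough illustration of the analysis step just described (the
chapter itself provides Matlab code; this Python/numpy fragment is
only a simplified sketch with illustrative parameter values), spectral
peaks can be detected frame by frame as local maxima of the magnitude
spectrum:

    import numpy as np

    def detect_peaks(frame, fs, rel_threshold_db=-25.0):
        # Return (frequency_hz, relative_db) spectral peaks of one frame.
        window = np.hanning(len(frame))
        mag = np.abs(np.fft.rfft(frame * window))
        mag_db = 20 * np.log10(mag + 1e-12)
        mag_db -= mag_db.max()  # 0 dB = strongest bin
        peaks = []
        for k in range(1, len(mag_db) - 1):
            if mag_db[k] > rel_threshold_db and mag_db[k-1] < mag_db[k] > mag_db[k+1]:
                peaks.append((k * fs / len(frame), mag_db[k]))
        return peaks

    fs = 44100
    t = np.arange(1024) / fs
    frame = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
    print(detect_peaks(frame, fs))  # two peaks, near 440 and 880 Hz

A complete analysis would refine each peak by parabolic interpolation
and track peaks from frame to frame to form sinusoidal trajectories.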
Having set the basis of the Sinusoidal plus Residual model, we will then give some details of the techniques used both in its analysis and synthesis processes, providing Matlab code to implement an analysis-synthesis framework. This Matlab implementation is based on the Spectral Modeling Synthesis framework. SMS [http://www.iua.upf.es/~sms] is a set of spectral-based techniques and related implementations for the analysis/transformation/synthesis of an audio signal based on the scheme presented in .
We will provide a set of basic audio effects and transformations based on the implemented Sinusoidal plus Residual analysis/synthesis. Matlab code is provided for all of them.
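In the same illustrative spirit, a transformation on the sinusoidal
part can be sketched as scaling the partial frequencies and
resynthesizing with an oscillator bank. A real implementation would
interpolate parameters from frame to frame and add the residual back;
the values below are only illustrative.

    import numpy as np

    def resynthesize(partials, fs, duration, pitch_factor=1.0):
        # Additive resynthesis of (freq_hz, amplitude) partials.
        t = np.arange(int(fs * duration)) / fs
        out = np.zeros_like(t)
        for freq, amp in partials:
            out += amp * np.sin(2 * np.pi * freq * pitch_factor * t)
        return out

    partials = [(440.0, 1.0), (880.0, 0.5)]       # e.g. from the analysis
    y = resynthesize(partials, 44100, 0.5,
                     pitch_factor=2 ** (3 / 12))  # up three semitones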
We will finish with an explanation of content-dependent processing
implementations. We introduce a real-time singing voice conversion
application that has been developed for use in karaoke applications,
and we define the basis of a nearly lossless time-scaling algorithm.
The complexity and extension of these implementations prevent us from
providing the associated Matlab code, so we leave that task as a challenge
for advanced readers.
author : Amatriain, X. and Herrera, P.
title : Transmitting Audio Content as Sound Objects
year : 2002
book title : Proceedings of AES22 International Conference on Virtual, Synthetic and Entertainment Audio
related to chapter : 5
abstract :
As audio and music applications tend towards a higher level of abstraction
and towards filling the gap between the signal processing world and the
end-user, we are more and more interested in processing content and
not (only) signal. This change in point of view leads to the redefinition
of several "classical" concepts, and a new conceptual
framework needs to be set to give support to these new trends. In
[Amatriain and Herrera 2001], a model for the transmission of
audio content was introduced. The model is now extended to include
the idea of Sound Objects. With these thoughts in mind, examples of
design decisions that have led to the implementation of the CLAM framework
are also given.
author : Amatriain, X. and de Boer, M. and Robledo, E. and Garcia, D.
title : CLAM: An OO Framework for Developing Audio and Music Applications
year : 2002
book title : Proceedings of 17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages and Applications
related to chapter : 3
abstract :
CLAM (C++ Library for Audio and Music) is a framework for audio and
music programming. It may be used for developing any type of audio
or music application as well as for doing more complex research related
to the field. In this paper we introduce the practicalities of CLAM's
first release as well as some of the sample applications that have
been developed within the framework. See [1] for a more conceptual
approach to the description of the CLAM framework.
author : Amatriain, X. and Arumi, P. and Ramírez, M.
title : CLAM, Yet Another Library for Audio and Music Processing?
year : 2002
book title : Proceedings of 17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages and Applications
related to chapter : 3
abstract :
CLAM (C++ Library for Audio and Music) is a framework that aims to
offer extensible, generic and efficient design and implementation
solutions for developing Audio and Music applications as well as for
doing more complex research related to the field. Although similar
libraries exist, some particularities make CLAM of high interest for
anyone interested in the field.
author : Garcia, D. and Amatriain, X.
title : XML as a means of control for audio processing, synthesis and analysis
year : 2001
book title : Proceedings of MOSART Workshop on Current Research Directions in Computer Music
abstract :
This paper discusses the benefits derived from providing XML support
in the component-based framework for audio systems that we are developing.
XML is used as the data format for persistence, visualization and
inter-application interfacing. Direct XML support is a very useful
feature for an audio framework because of the popularity of XML as a
data interchange format and because of the introduction of the MPEG-7
standard, an XML-based description format for multimedia content. The
formatting task has been distributed among the system objects in a
compositional way, making it easy to format a single object from its
parts. The system minimizes both the overhead added to a class and the
programmer effort needed to support XML I/O. A default XML
implementation has been provided for most of the data structures
(including future ones), while giving the chance to customize it. The
system has been designed to be reusable with other formats with
minimal impact on the system.
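The compositional formatting idea can be sketched as follows. This
Python fragment only illustrates the design (the actual framework is
written in C++, and all class names here are hypothetical): each
object serializes itself by delegating to its parts, so XML support
for a composite comes for free.

    import xml.etree.ElementTree as ET

    class Note:
        def __init__(self, pitch, duration):
            self.pitch, self.duration = pitch, duration

        def to_xml(self):
            e = ET.Element('Note')
            ET.SubElement(e, 'Pitch').text = str(self.pitch)
            ET.SubElement(e, 'Duration').text = str(self.duration)
            return e

    class Melody:
        def __init__(self, notes):
            self.notes = notes

        def to_xml(self):
            e = ET.Element('Melody')
            for note in self.notes:  # composition: delegate to the parts
                e.append(note.to_xml())
            return e

    melody = Melody([Note(60, 1.0), Note(62, 0.5)])
    print(ET.tostring(melody.to_xml(), encoding='unicode'))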
author : Amatriain, X. and Bonada, J. and Loscos, A. and Serra, X.
title : Spectral Modeling for Higher-level Sound Transformation
year : 2001
book title : Proceedings of MOSART Workshop on Current Research Directions in Computer Music
abstract :
When designing audio effects for music processing, we are always aiming
at providing higher-level representations that may somehow fill in
the gap between the signal processing world and the end-user. Spectral
models in general, and the Sinusoidal plus Residual model in particular,
can sometimes offer ways to implement such schemes.
author : Amatriain, X. and Herrera, P.
title : Audio Content Transmission
year : 2001
book title : Proceedings of COST G6 Conference on Digital Audio Effects 2001
related to chapter : 5
abstract :
Content description has become a topic of interest for many researchers
in the audiovisual field. While manual annotation has been used for
many years in different applications, the focus now is on finding
automatic content-extraction and content-navigation tools. An increasing
number of projects, in some of which we are actively involved, focus
on the extraction of meaningful features from an audio signal. Meanwhile,
standards like MPEG7 are trying to find a convenient way of describing
audiovisual content. Nevertheless, content description is usually
thought of as an additional information stream attached to the actual
content and the only envisioned scenario is that of a search and retrieval
framework. However, in this article it will be argued that if there
is a suitable content description, the actual content itself may no
longer be needed and we can concentrate on transmitting only its description.
Thus, the receiver should be able to interpret the information that,
in the form of metadata, is available at its inputs, and synthesize
new content relying only on this description. It is possibly in the
music field where this last step is most developed, and that
fact allows us to think of such a transmission scheme being available
in the near future.
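The transmission scheme argued for above can be caricatured in a few
lines of Python: only a symbolic description travels, and the receiver
synthesizes new content from it alone. The note format and the
sine-tone synthesizer are illustrative stand-ins, not the article's
actual proposal.

    import numpy as np

    description = [  # the metadata that is transmitted, instead of audio
        {'f0': 440.0, 'onset': 0.0, 'duration': 0.5},
        {'f0': 660.0, 'onset': 0.5, 'duration': 0.5},
    ]

    def synthesize(description, fs=44100):
        # Receiver side: render audio from the description alone.
        total = max(n['onset'] + n['duration'] for n in description)
        out = np.zeros(int(fs * total))
        for n in description:
            t = np.arange(int(fs * n['duration'])) / fs
            start = int(fs * n['onset'])
            out[start:start + len(t)] += np.sin(2 * np.pi * n['f0'] * t)
        return out

    audio = synthesize(description)  # content re-created from metadata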
author : Herrera, P. and Amatriain, X. and Batlle, E. and Serra, X.
title : Towards Instrument Segmentation for Music Content Description: a Critical Review of Instrument Classification Techniques
year : 2000
book title : Proceedings of International Symposium on Music Information Retrieval
related to chapter : 5
abstract :
A system capable of describing the musical content of any kind of
sound file or sound stream, as is supposed to be done in MPEG-7-compliant
applications, should provide an account of the different moments where
a certain instrument can be heard. In this paper we concentrate
on reviewing the different techniques that have so far been proposed
for the automatic classification of musical instruments. As most of the
techniques to be discussed are usable only on “solo”
performances, we will evaluate their applicability to the more complex
case of describing sound mixes. We conclude this survey by discussing
the necessity of developing new strategies for classifying sound mixes
without a priori separation of sound sources.
author : Amatriain, X. and Bonada, J. and Serra, X.
title : METRIX: A Musical Data Definition Language and Data Structure for a Spectral Modeling Based Synthesizer
year : 1998
book title : Proceedings of COST G6 Conference on Digital Audio Effects 1998
related to chapter : 6
abstract :
Since the MIDI 1.0 specification, well over 15 years ago, there have been many attempts to give a solution to all the limitations that soon became clear. None of these has had a happy ending, mainly due to commercial interests, and as a result, when trying to find an appropriate synthesis control user interface, we had few choices but to use MIDI. That is the reason why the idea of defining a new user interface arose. In this article, the main components of this interface will be discussed, paying special attention to the advantages and new features it brings to the end-user.