Marsyas [Tzanetakis and Cook, 1999, Tzanetakis and Cook, 2002, Tzanetakis and Cook, 2000, Tzanetakis, 2002], or MusicAl Research SYStem for Analysis and Synthesis, is a framework for experimenting with, evaluating and integrating techniques for audio content analysis. Although the name includes the word Synthesis, Marsyas' focus is clearly on sound analysis tools and information retrieval techniques. The framework allows these tools to be integrated using a semi-automatic approach and a graphical interface. Marsyas is released under the GPL license and is therefore Free Software.
To arrive at a valid model for Marsyas, different algorithms and techniques were studied, and their common behavior and features were abstracted. Object-oriented programming techniques were used to implement abstract classes that provide a common API for the building blocks of the system, and inheritance was used to factor out common operations.
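The idea of an abstract class with a common API can be illustrated with a minimal Python sketch. It is not Marsyas code (the real framework is written in C++): the names `Feature`, `compute`, `process`, `RMS` and `ZeroCrossings` are hypothetical and were chosen only for this illustration. The base class factors out the framing of the signal, so each subclass only supplies the per-frame computation.

```python
from abc import ABC, abstractmethod
import math


class Feature(ABC):
    """Hypothetical base class: subclasses only implement compute()."""

    def __init__(self, frame_size=4):
        self.frame_size = frame_size

    @abstractmethod
    def compute(self, frame):
        """Return one value for a single frame of samples."""

    def process(self, signal):
        """Common operation kept in the base class: split the signal into
        non-overlapping frames and compute the feature on each of them."""
        n = self.frame_size
        frames = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
        return [self.compute(frame) for frame in frames]


class RMS(Feature):
    def compute(self, frame):
        # Root mean square energy of the frame.
        return math.sqrt(sum(x * x for x in frame) / len(frame))


class ZeroCrossings(Feature):
    def compute(self, frame):
        # Count sign changes between consecutive samples.
        return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


# Made-up signal: one alternating frame followed by one constant frame.
signal = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
rms_values = RMS().process(signal)
zc_values = ZeroCrossings().process(signal)
```

Under these assumptions, adding another feature amounts to writing one more subclass with its own `compute` method.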
The environment can combine traditional bottom-up processing (from signal to metadata) with top-down processing (according to the author, prediction-driven processing, for instance, has proven to be interesting). Although the objects form a natural bottom-up hierarchy, top-down flow of information can be expressed in the framework (e.g. a silence feature can be used by an iterator for music/speech to avoid calculating features on silent frames).
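The silence example can be sketched as follows. This is a simplified Python illustration rather than the framework's own code: the function `iterate_frames`, the threshold value and the use of RMS as the silence feature are all assumptions made for the example.

```python
import math

# Assumed RMS level below which a frame counts as silent.
SILENCE_THRESHOLD = 0.01


def rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossings(frame):
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def iterate_frames(signal, frame_size, features):
    """Yield (frame_index, values); values is None for a skipped silent frame."""
    starts = range(0, len(signal) - frame_size + 1, frame_size)
    for index, start in enumerate(starts):
        frame = signal[start:start + frame_size]
        if rms(frame) < SILENCE_THRESHOLD:
            # Top-down decision: the silence feature steers the iterator, so
            # the remaining features are never computed for this frame.
            yield index, None
            continue
        yield index, {name: fn(frame) for name, fn in features.items()}


# Made-up signal: one silent frame followed by one active frame.
signal = [0.0, 0.0, 0.0, 0.0, 0.5, -0.5, 0.5, -0.5]
results = list(iterate_frames(signal, 4, {"zero_crossings": zero_crossings}))
```

Here the first frame is skipped, and only the second frame has its zero crossings counted.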
The framework design is based on a client-server architecture. The server is written in C++ and contains all the signal processing and pattern recognition algorithms, optimized for performance. The client is written in Java, contains only the graphical interface and communicates with the server using sockets. Both the server and the client run on Solaris, SGI, Linux and Windows.
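The split between a computing server and a thin client that talk over sockets can be sketched in a few lines. The example below is written in Python for brevity, whereas the real server is C++ and the real client is Java. The one-line text protocol (`rms <samples...>` answered by `ok <value>`) is invented for the illustration, as the source does not describe the actual wire format.

```python
import math
import socket
import threading


def serve(conn):
    """Server side: read one newline-terminated request, reply with one line."""
    with conn, conn.makefile("rw") as stream:
        command, *args = stream.readline().split()
        if command == "rms":
            samples = [float(a) for a in args]
            result = math.sqrt(sum(s * s for s in samples) / len(samples))
            stream.write(f"ok {result}\n")
        else:
            stream.write("error unknown-command\n")
        stream.flush()


def request(conn, line):
    """Client side: send one request line and return the reply line."""
    with conn, conn.makefile("rw") as stream:
        stream.write(line + "\n")
        stream.flush()
        return stream.readline().strip()


# A connected socket pair stands in for a real network connection.
server_sock, client_sock = socket.socketpair()
worker = threading.Thread(target=serve, args=(server_sock,))
worker.start()
reply = request(client_sock, "rms 0.5 -0.5 0.5 -0.5")
worker.join()
```

All numeric work stays on the server side, and the client only formats requests and displays replies, which mirrors the division of labor described above.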
The main classes of the system can roughly be divided into process-like and data-structure-like.
The process-like classes can be divided into the following categories:
Implemented features in the framework include spectral centroid, spectral moments, spectral flux, pitch, harmonicity, mel-frequency cepstral coefficients (MFCC), linear prediction (LPC) reflection coefficients, zero crossings, RMS, and spectral rolloff. For all of them, means, variances and higher-order statistics can be computed using memories. New features can be added easily by writing the code that computes the feature over a frame of samples.
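As an illustration of a per-frame feature and of a memory that summarizes it, the following Python sketch computes a spectral centroid (expressed as a bin index) and keeps running statistics over the most recent values. The `Memory` class and its methods are hypothetical, and the naive DFT is used only to keep the example self-contained. A real implementation would use an FFT.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (fine for a sketch, slow in practice)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame):
    """Magnitude-weighted mean bin index; 0.0 for an all-zero frame."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    return sum(k * m for k, m in enumerate(mags)) / total if total > 0 else 0.0


class Memory:
    """Hypothetical memory: keeps the last `size` feature values and summarizes them."""

    def __init__(self, size):
        self.size = size
        self.values = []

    def update(self, value):
        self.values.append(value)
        self.values = self.values[-self.size:]

    def mean(self):
        return sum(self.values) / len(self.values)

    def variance(self):
        m = self.mean()
        return sum((v - m) ** 2 for v in self.values) / len(self.values)


# Two synthetic frames: pure cosines centered on bins 1 and 3.
frame_size = 8
memory = Memory(size=2)
for k0 in (1, 3):
    frame = [math.cos(2 * math.pi * k0 * t / frame_size) for t in range(frame_size)]
    memory.update(spectral_centroid(frame))
```

For these two frames the centroids are approximately 1 and 3, so the memory reports a mean close to 2 and a variance close to 1.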
Two classifiers have been implemented: the Gaussian (MAP) classifier and the K-Nearest Neighbor (KNN) classifier.
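Both classifiers can be sketched compactly. The Python code below is an illustration under stated assumptions, not the framework's implementation: the Gaussian model uses a diagonal covariance (the source does not say which form Marsyas uses), the function names are hypothetical, and the two-dimensional feature vectors are made-up values.

```python
import math
from collections import Counter


def fit_gaussians(samples):
    """samples: list of (feature_vector, label). Returns {label: (prior, means, variances)}."""
    by_class = {}
    for x, label in samples:
        by_class.setdefault(label, []).append(x)
    model = {}
    for label, rows in by_class.items():
        dims = list(zip(*rows))
        means = [sum(d) / len(d) for d in dims]
        # A small floor keeps the variance positive for constant features.
        variances = [
            max(sum((v - m) ** 2 for v in d) / len(d), 1e-9)
            for d, m in zip(dims, means)
        ]
        model[label] = (len(rows) / len(samples), means, variances)
    return model


def map_classify(model, x):
    """Pick the class maximizing log prior + log likelihood (diagonal covariance)."""
    def log_posterior(params):
        prior, means, variances = params
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


def knn_classify(samples, x, k=3):
    """Majority vote among the k training points closest in Euclidean distance."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up training data: two features per example.
training = [
    ([0.10, 0.20], "speech"), ([0.20, 0.10], "speech"), ([0.15, 0.25], "speech"),
    ([0.80, 0.90], "music"), ([0.90, 0.80], "music"), ([0.85, 0.95], "music"),
]
model = fit_gaussians(training)
map_label = map_classify(model, [0.2, 0.2])
knn_label = knn_classify(training, [0.9, 0.9])
```

The Gaussian classifier summarizes each class by a few parameters, while KNN keeps every training example and defers all work to query time. This is the usual trade-off between the two approaches.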
Different applications, such as a music/speech discriminator, have been implemented in order to test the architecture.
The user interface looks like a typical tape-recorder-style wave editor, but in addition it allows skipping by either user-defined fixed-duration blocks or time lines containing regions of different durations.
At the time of this writing, Marsyas is undergoing an overall rewrite towards version 0.2 of the framework.
2004-10-18