The Analysis step has two major goals: extract content from the input signal and identify objects.
The easiest way to add a content description to an audiovisual chunk of information is by means of textual or oral annotation. In that case the extraction process is performed by an expert who can interpret the content, extract useful information and classify each sound object, provided an appropriate taxonomy is available.
When thinking in terms of automatic content extraction [Scheirer, 2000], two levels of descriptors are usually distinguished: low-level and high-level content descriptors. As a first approach, and in a broad sense, low-level descriptors are those related to the signal itself and have little or no meaning to the end user. In other words, and thinking in terms of the audio domain, these descriptors cannot be heard. High-level descriptors, on the other hand, are meaningful and might be related to semantic or syntactic features of the sound. They are used to classify sound objects into the classes to which they belong.
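As a rough illustration of the two levels, the following Python sketch computes two common low-level descriptors of an audio frame (RMS energy and zero-crossing rate) and then derives a coarse high-level label from them. It is only a minimal sketch under stated assumptions: the function names, the labels ("silence", "noisy", "tonal") and the threshold values are hypothetical and chosen for illustration; they are not taken from the analysis scheme described in this text.

```python
import math


def rms(frame):
    """Root-mean-square energy of a frame: a low-level, signal-related descriptor."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (also low-level)."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def label_frame(frame, silence_rms=0.01, noisy_zcr=0.3):
    """Map low-level descriptors to a toy high-level label.

    The thresholds are hypothetical defaults, not values from the text.
    """
    if rms(frame) < silence_rms:
        return "silence"
    if zero_crossing_rate(frame) > noisy_zcr:
        return "noisy"
    return "tonal"


# Illustrative input: a 440 Hz sine sampled at 8 kHz (assumed values).
SAMPLE_RATE = 8000
sine = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(1024)]
alternating = [1.0 if n % 2 == 0 else -1.0 for n in range(1024)]
silent = [0.0] * 1024

labels = [label_frame(f) for f in (sine, alternating, silent)]
```

The numbers returned by `rms` and `zero_crossing_rate` say little to an end user, whereas the label returned by `label_frame` is directly meaningful, which is the distinction drawn above.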
The borderline between these categories is thin and not always clear.
As previously mentioned, the question of ``what is'' and ``what
is not'' meaningful is not an objective property of a descriptor
but rather a property of the whole process. Therefore, some descriptors
can be viewed as either low-level or high-level depending on the characteristics
of the extraction process and the targeted use. When using this classification,
it is better to think in terms of a multilevel analysis scheme such as
the one depicted in figure 5.7.