When training machine learning systems on audio data for tasks like speech recognition, it is useful to first transform the audio into a rich intermediate representation such as a spectrogram. Although effective models can be trained on raw audio given enough data, models that start from rich representations typically perform better. I will talk about several audio representation schemes, including spectrograms, mel filter banks, MFCCs, and wavelets. I will discuss how each representation works, what information each one preserves and destroys, and their strengths and weaknesses from a machine learning perspective.
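To make the chain of representations concrete, here is a minimal sketch, assuming NumPy, that goes from raw samples to a power spectrogram, then to log mel filter bank energies, then to MFCCs. The parameters (16 kHz audio, 400-sample frames with a 160-sample hop, a 512-point FFT, 40 mel filters, 13 coefficients) and the HTK-style mel formula `2595 * log10(1 + f / 700)` are common choices picked for illustration, not values taken from the talk; wavelets are not covered here.

```python
import numpy as np

def power_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """STFT power spectrogram, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    return np.abs(spectrum) ** 2  # squaring the magnitude discards phase

def hz_to_mel(hz):
    # HTK-style mel scale (assumed variant; other definitions exist)
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, bins)."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bin_points = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_bins))
    for m in range(1, n_mels + 1):
        left, center, right = bin_points[m - 1], bin_points[m], bin_points[m + 1]
        for k in range(left, center):       # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n_coeffs, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic 440 Hz tone as test input

spec = power_spectrogram(signal)                        # (98, 257)
fbank = mel_filterbank(n_mels=40, n_fft=512, sample_rate=sample_rate)
log_mel = np.log(spec @ fbank.T + 1e-10)                # (98, 40); epsilon avoids log(0)
mfcc = dct_ii(log_mel, n_coeffs=13)                     # (98, 13)
print(spec.shape, log_mel.shape, mfcc.shape)
```

Each step throws information away: the power spectrogram drops phase, the mel filters average neighbouring frequency bins (coarser resolution at high frequencies), and truncating the DCT to 13 coefficients keeps only the smooth spectral envelope, discarding fine harmonic detail such as pitch. This is roughly the trade-off between compactness and fidelity that the talk examines.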
