Audio Representation for Machine Learning

Tim Anderton

Friday 8 June 2018 from 15:00 to 15:50

Talk in English - US at OpenWest Conference 2018
Track Name: 200C
View Slides: https://drive.google.com/open?id=1pe6UKNe329kYscI37kBbPIwsQGAlucpi
Short URL: https://joind.in/talk/6ac77

When training machine learning systems on audio data for tasks like speech recognition, it is useful to first transform the audio into a rich intermediate representation such as a spectrogram. Although effective models can be trained on raw audio given enough data, models that begin with rich representations typically perform better. I will talk about several audio representation schemes, including spectrograms, mel filter banks, MFCCs, and wavelets. We will discuss how each of these representations works, the types of information each preserves and destroys, and their strengths and weaknesses from a machine learning perspective.
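
As a rough illustration of three of the representations named in the abstract, here is a minimal NumPy sketch that turns a waveform into a power spectrogram, then log mel filter bank energies, then MFCC-style coefficients. It is not taken from the talk or its slides. The function names, the 16 kHz sample rate, the 400-sample frame, the 160-sample hop, and the choice of 26 filters and 13 coefficients are all assumptions picked for the example, and the mel formula used is one common convention among several.

```python
import numpy as np


def power_spectrogram(signal, frame_len=400, hop=160):
    """Windowed short-time FFT power, one row per frame (phase is discarded)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    nyquist = sample_rate / 2.0
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(nyquist), n_filters + 2))
    bin_freqs = np.linspace(0.0, nyquist, n_bins)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first n_coeffs."""
    n = x.shape[1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return x @ basis.T


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

spec = power_spectrogram(signal)                      # frames x frequency bins
bank = mel_filter_bank(26, n_fft=400, sample_rate=sample_rate)
log_mel = np.log(spec @ bank.T + 1e-10)               # frames x mel bands
mfcc = dct_ii(log_mel, n_coeffs=13)                   # frames x coefficients
```

Each step discards something: the power spectrogram drops phase, the mel filter bank pools away fine frequency detail (more so at high frequencies), and truncating the DCT keeps only the smooth shape of the log mel spectrum.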
