import numpy as np
import matplotlib.pyplot as plt
from pydub import AudioSegment
import librosa
import librosa.display
import IPython.display as ipd
Audio files and concepts
In audio data analysis, we process and transform audio signals captured by digital devices. Depending on how they’re captured, they can come in many different formats such as wav, mp3, m4a, aiff, and flac.
According to iZotope.com, Waveform Audio File Format (wav) is one of the most popular digital audio formats. It is typically stored uncompressed, so it preserves the recorded samples exactly, with no quality loss from encoding. The mp3 and m4a formats (m4a is the MPEG-4 audio container popularized by Apple) usually use lossy compression: files are smaller and easier to distribute, at the cost of some audio quality. In audio data analytics, most libraries support wav file processing.
As a wave, a sound (audio) signal has these generic properties:
- Frequency: occurrences of vibrations per unit of time
- Amplitude: maximum displacement or distance moved by a point on a wave measured from its equilibrium position; impacting the sound intensity
- Speed of sound: distance traveled per unit of time by a soundwave
The information we extract from audio files is essentially a set of transformations of the main properties above.
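To make the three properties above concrete, here is a minimal sketch that synthesizes a pure tone with NumPy and then recovers its amplitude and frequency from the samples. The sample rate, tone frequency, amplitude, and the 343 m/s speed of sound (air at roughly 20 °C) are illustrative assumptions, not values taken from the demo tracks.

```python
import numpy as np

# Illustrative values (assumptions, not taken from the demo tracks)
sr = 8000        # sample rate in Hz
freq = 440.0     # tone frequency in Hz
amp = 0.8        # peak amplitude on a -1..1 scale

# One second of a pure sine tone
t = np.arange(sr) / sr
signal = amp * np.sin(2 * np.pi * freq * t)

# Amplitude: maximum displacement from the equilibrium position (zero)
peak_amplitude = np.max(np.abs(signal))

# Frequency: the strongest bin in the magnitude spectrum
spectrum = np.abs(np.fft.rfft(signal))
bin_freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
dominant_freq = bin_freqs[np.argmax(spectrum)]

# Speed of sound links frequency to wavelength: wavelength = speed / frequency
speed_of_sound = 343.0  # m/s, assumed value for air at about 20 degrees C
wavelength = speed_of_sound / dominant_freq

print(f"peak amplitude: {peak_amplitude:.3f}")
print(f"dominant frequency: {dominant_freq:.1f} Hz")
print(f"wavelength: {wavelength:.3f} m")
```

Real recordings mix many frequencies at once, so the spectrum has many peaks rather than one, but the same two measurements (how large the samples are, and which frequencies dominate) underlie most audio features.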
Exploratory analysis on audio files
For this analysis, I’m going to compare two demo tracks that our band Thirteen-Seven produced.
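The files will be analyzed mainly with these Python packages:
- pydub: reading audio files and their metadata
- librosa: loading audio and extracting features
- numpy and matplotlib: numerical work and plotting
- IPython.display: audio playback inside the notebook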
General audio parameters
Just as we usually start evaluating tabular data with a statistical summary (e.g., using the “DataFrame.describe” method), in audio analysis we can start with a summary of the audio metadata. We can get one with the AudioSegment class in pydub.
Below are some generic features that can be extracted:
- Channels: number of channels; 1 for mono, 2 for stereo audio
- Sample width: number of bytes per sample; 1 means 8-bit, 2 means 16-bit, 3 means 24-bit, 4 means 32-bit
- Frame rate (sample rate): number of samples per second (in Hertz)
- Frame width: Number of bytes for each “frame”. One frame contains a sample for each channel.
- Length: audio file length (in milliseconds)
- Frame count: total number of frames in the audio
- Intensity: loudness in dBFS (dB relative to the maximum possible loudness)
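The sketch below shows how the parameters in this list relate to each other. To keep it runnable without an audio file or extra dependencies, it uses Python's standard-library wave module on a synthetic tone rather than pydub; on a pydub AudioSegment the corresponding values are available as channels, sample_width, frame_rate, frame_width, len(segment), frame_count() and dBFS. The helper names write_sine_wav and summarize_wav are hypothetical, and the dBFS calculation assumes 16-bit samples.

```python
import io
import math
import struct
import wave


def write_sine_wav(freq_hz=440.0, seconds=1.0, rate=44100, amplitude=0.5):
    """Hypothetical helper: build an in-memory mono 16-bit WAV with a sine tone."""
    n_frames = int(seconds * rate)
    peak = int(amplitude * 32767)
    samples = [
        int(peak * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n_frames)
    ]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % n_frames, *samples))
    buf.seek(0)
    return buf


def summarize_wav(fileobj):
    """Hypothetical helper: collect the general parameters of a 16-bit WAV."""
    with wave.open(fileobj, "rb") as w:
        channels = w.getnchannels()
        sample_width = w.getsampwidth()
        frame_rate = w.getframerate()
        frame_count = w.getnframes()
        raw = w.readframes(frame_count)

    # Assumption: 16-bit little-endian samples (sample_width == 2)
    samples = struct.unpack("<%dh" % (frame_count * channels), raw)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    full_scale = 2 ** (8 * sample_width - 1)  # 32768 for 16-bit audio

    return {
        "channels": channels,
        "sample_width": sample_width,
        "frame_rate": frame_rate,
        # One frame holds one sample per channel
        "frame_width": channels * sample_width,
        "frame_count": frame_count,
        "length_ms": 1000.0 * frame_count / frame_rate,
        # RMS level relative to the maximum possible amplitude
        "dbfs": 20 * math.log10(rms / full_scale),
    }


summary = summarize_wav(write_sine_wav())
for name, value in summary.items():
    print(f"{name}: {value}")
```

A half-scale sine lands at roughly -9 dBFS: about -6 dB for the halved peak and about another -3 dB because the RMS of a sine is its peak divided by the square root of 2.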
# Load the track and create a widget to listen to it
all_the_excuses, sr = librosa.load('Audio/all_the_excuses.wav')
ipd.Audio(all_the_excuses, rate=sr)