spafe.utils.spectral

spafe.utils.spectral.audspec(p_spectrum, fs=16000, nfilts=0, fb_type='bark', low_freq=0, high_freq=0, sumpower=1, bwidth=1)[source]

Perform critical-band analysis (see PLP) on the power spectrogram.

Parameters:
  • p_spectrum (array) – the power spectrum array.
  • nfft (int) – the FFT size. (Default is 512)
  • fs (int) – sample rate/ sampling frequency of the signal. (Default 16000 Hz)
  • nfilts (int) – the number of filters in the filterbank. (Default 20)
  • fb_type (str) – type of bins [Mel, Bark, …].
  • bwidth (int) – the constant width of each band relative to standard Mel. Default is 1.
  • low_freq (int) – lowest band edge of mel filters. (Default 0 Hz)
  • high_freq (int) – highest band edge of mel filters. (Default samplerate/2)
  • sumpower (bool) – sum power if True. Default is True.
Returns:

auditory spectrum array.
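As a rough illustration of critical-band analysis, the sketch below sums one power-spectrum frame into equal-width bands on the Bark scale. It is not spafe's implementation: `hz2bark` and `critical_band_sum` are hypothetical helpers, the Bark mapping is Schroeder's approximation, and rectangular non-overlapping bands stand in for a real filterbank.

```python
import math


def hz2bark(f_hz):
    """Map a frequency in Hz to Bark (Schroeder's approximation, assumed here)."""
    return 6.0 * math.asinh(f_hz / 600.0)


def critical_band_sum(power_frame, fs, n_bands):
    """Sum one power-spectrum frame into n_bands equal-width Bark bands.

    power_frame is assumed to hold nfft // 2 + 1 bins covering 0 .. fs / 2.
    """
    n_bins = len(power_frame)
    max_bark = hz2bark(fs / 2.0)
    bands = [0.0] * n_bands
    for k, p in enumerate(power_frame):
        f = k * (fs / 2.0) / (n_bins - 1)          # centre frequency of bin k
        idx = int(hz2bark(f) / max_bark * n_bands)  # band that bin k falls into
        bands[min(idx, n_bands - 1)] += p           # Nyquist bin goes to the last band
    return bands


# A flat spectrum: low bands collect few bins, high bands collect many.
flat = [1.0] * 257
bands = critical_band_sum(flat, fs=16000, n_bands=20)
```

Because the Bark scale is compressive, the same number of Bark units spans far more FFT bins at high frequencies than at low ones, which is why the band sums grow with the band index for a flat input.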

spafe.utils.spectral.compute_stft(x, win, hop)[source]

Compute the short-time Fourier transform (STFT) of an audio signal x.

Parameters:
  • x (array) – audio signal in the time domain
  • win (int) – window to be used for the STFT
  • hop (int) – hop-size
Returns:

X – 2D array of the STFT coefficients of x
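The idea behind an STFT can be sketched in a few lines: slide the window over the signal in steps of `hop`, multiply, and take the DFT of each windowed segment. The function below is a dependency-free illustration with a hypothetical name (`stft_sketch`); it uses a direct DFT rather than an FFT, returns frames as rows (an assumption, since the layout of the returned array is not stated here), and does no padding.

```python
import cmath
import math


def stft_sketch(x, win, hop):
    """Return a list of frames, each holding len(win) // 2 + 1 complex bins."""
    n = len(win)
    n_bins = n // 2 + 1
    # DFT twiddle factors exp(-2j*pi*i/n), looked up by (k * m) mod n.
    twiddle = [cmath.exp(-2j * math.pi * i / n) for i in range(n)]
    frames = []
    for start in range(0, len(x) - n + 1, hop):
        seg = [x[start + m] * win[m] for m in range(n)]  # windowed segment
        frames.append([sum(seg[m] * twiddle[(k * m) % n] for m in range(n))
                       for k in range(n_bins)])
    return frames


# A 100 Hz sine sampled at 800 Hz, analysed with a 32-sample periodic Hann window.
fs, n, hop = 800, 32, 16
win = [0.5 - 0.5 * math.cos(2 * math.pi * m / n) for m in range(n)]
x = [math.sin(2 * math.pi * 100 * t / fs) for t in range(128)]
X = stft_sketch(x, win, hop)
```

With a bin spacing of fs / n = 25 Hz, the energy of the 100 Hz sine concentrates around bin 4 of every frame.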

spafe.utils.spectral.cqt(sig, fs=16000, low_freq=10, high_freq=3000, b=48)[source]

Compute the constant Q-transform.

  • take the absolute value of the FFT
  • warp to a Mel frequency scale
  • take the DCT of the log-Mel-spectrum
  • return the first <num_ceps> components
Parameters:
  • sig (array) – a mono audio signal (Nx1) from which to compute features.
  • fs (int) – the sampling frequency of the signal we are working with. Default is 16000.
  • low_freq (int) – lowest frequency of the analysis range (Hz). Default is 10.
  • high_freq (int) – highest frequency of the analysis range (Hz). Default is 3000.
  • b (int) – number of bins per octave. Default is 48.
Returns:

array including the Q-transform coefficients.
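For orientation, a constant-Q analysis spaces its bins geometrically, with b bins per octave, so every bin has the same ratio Q of centre frequency to bandwidth. The sketch below only computes that bin layout from `low_freq`, `high_freq` and `b`; the helper names are hypothetical and this is not a claim about how spafe computes the transform itself.

```python
import math


def cqt_bins(low_freq=10.0, high_freq=3000.0, b=48):
    """Return (centre frequencies in Hz, quality factor Q) for b bins per octave."""
    q = 1.0 / (2.0 ** (1.0 / b) - 1.0)
    n_bins = math.ceil(b * math.log2(high_freq / low_freq))
    freqs = [low_freq * 2.0 ** (k / b) for k in range(n_bins)]
    return freqs, q


def window_length(fs, f_k, q):
    """Samples needed to resolve bin f_k with quality factor q at sample rate fs."""
    return math.ceil(q * fs / f_k)


freqs, q = cqt_bins()
longest = window_length(16000, freqs[0], q)    # lowest bin needs the longest window
shortest = window_length(16000, freqs[-1], q)  # highest bin needs the shortest
```

The window length is inversely proportional to the centre frequency, which is what distinguishes a constant-Q analysis from the fixed-window STFT.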

spafe.utils.spectral.dct(x, type=2, axis=1, norm='ortho')[source]
spafe.utils.spectral.display_stft(X, fs, len_sig, low_freq=0, high_freq=3000, min_db=-10, max_db=0, normalize=True)[source]

Plot the STFT of an audio signal in the time-frequency plane.

Parameters:
  • X (array) – STFT coefficients
  • fs (int) – sampling frequency in Hz (assumed to be integer)
  • hop (int) – hop-size used in the STFT (for labeling the time axis)
  • low_freq (int) – minimum frequency to plot in Hz. Default is 0 Hz.
  • high_freq (int) – maximum frequency to plot in Hz. Default is 3000 Hz.
  • min_db (int) – minimum magnitude to display in dB. Default is -10 dB.
  • max_db (int) – maximum magnitude to display in dB. Default is 0 dB.
  • normalize (bool) – Normalize input. Default is True.
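The `min_db` and `max_db` limits can be read as a display range applied to magnitudes expressed in decibels. A minimal sketch of that mapping follows; `to_clipped_db` is a hypothetical helper, and treating `normalize` as "divide by the largest magnitude" is an assumption, not something stated in the documentation above.

```python
import math


def to_clipped_db(mags, min_db=-10.0, max_db=0.0, normalize=True):
    """Convert magnitudes to dB and clip them to [min_db, max_db]."""
    # Assumed meaning of normalize: scale so that the peak sits at 0 dB.
    ref = (max(mags) or 1.0) if normalize else 1.0
    out = []
    for m in mags:
        db = 20.0 * math.log10(max(m / ref, 1e-12))  # floor avoids log10(0)
        out.append(min(max(db, min_db), max_db))
    return out


levels = to_clipped_db([1.0, 0.5, 0.01])
```

With the default range, anything more than 10 dB below the peak is drawn at the floor value, so only the strongest components stand out in the plot.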
spafe.utils.spectral.invaudspec(aspectrum, fs=16000, nfft=512, fb_type='bark', low_freq=0, high_freq=None, sumpower=True, bwidth=1)[source]

Compute the power spectrum from the auditory spectrum. This approximately inverts the effects of audspec(); the reconstruction may not be exact.

Parameters:
  • aspectrum (array) – the auditory spectrum array.
  • nfft (int) – the FFT size. (Default is 512)
  • fs (int) – sample rate/ sampling frequency of the signal. (Default 16000 Hz)
  • nfilts (int) – the number of filters in the filterbank. (Default 20)
  • fb_type (str) – type of bins [Mel, Bark, …].
  • bwidth (int) – the constant width of each band relative to standard Mel. Default is 1.
  • low_freq (int) – lowest band edge of mel filters. (Default 0 Hz)
  • high_freq (int) – highest band edge of mel filters. (Default samplerate/2)
  • sumpower (bool) – sum power if True. Default is True.
Returns:

power spectrum array.

spafe.utils.spectral.invpostaud(y, fmax, fb_type='bark', broaden=0)[source]
Invert the effects of postaud (loudness equalization and cube root compression).
  • y = postaud output
  • x = reconstructed critical band filters
  • rows = critical bands
  • cols = frames
spafe.utils.spectral.invpowspec(y, fs, win_len, win_hop, excit=[])[source]

x = invpowspec(y, fs, wintime, steptime, excit)

Attempt to go back from a specgram-like power spectrum to an audio waveform by scaling the specgram of white noise.

default values:
fs = 8000 Hz, wintime = 25 ms (200 samples), steptime = 10 ms (80 samples), which means a 256-point FFT with a Hamming window.
excit is the input excitation; white noise is used if it is not specified.
for fs = 8000: NFFT = 256; NOVERLAP = 120; SAMPRATE = 8000; WINDOW = hamming(200);
spafe.utils.spectral.istft(X, fs=16000, win_type='hann', win_len=0.025, win_hop=0.01)[source]
Parameters:
  • X – STFT coefficients
  • win – window to be used for the STFT
  • hop – hop-size
Returns:

x – inverse STFT of X
spafe.utils.spectral.lifter(x, lift=0.6, invs=False)[source]

Apply a lifter to a matrix of cepstra (one per column).

Parameters:
  • lift (float) – exponent of x i^n liftering or, as a negative integer, the length of HTK-style sin-curve liftering.
  • invs (bool) – if True (default False), undo the liftering.
Returns:

liftered cepstra.
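The two liftering modes described by `lift` can be sketched as a per-coefficient weight vector: a positive value weights coefficient i by i raised to `lift`, and a negative integer -L selects HTK-style sine-curve weights of length L. The code below is an illustrative sketch with hypothetical helper names, written under the assumption that coefficient 0 is left unweighted; it is not taken from spafe's source.

```python
import math


def lifter_weights(n_ceps, lift=0.6):
    """Weights for n_ceps cepstral coefficients; index 0 is left unweighted."""
    if lift == 0:
        return [1.0] * n_ceps
    if lift > 0:
        # Exponential liftering: weight i ** lift for coefficient i >= 1.
        return [1.0] + [i ** lift for i in range(1, n_ceps)]
    # HTK-style sine-curve liftering of length L = -lift.
    L = int(-lift)
    return [1.0] + [1.0 + (L / 2.0) * math.sin(i * math.pi / L)
                    for i in range(1, n_ceps)]


def apply_lifter(cepstra, lift=0.6, invs=False):
    """cepstra: list of rows, one row per coefficient and one column per frame."""
    w = lifter_weights(len(cepstra), lift)
    if invs:
        w = [1.0 / v for v in w]  # undo the liftering (weights must be non-zero)
    return [[wi * c for c in row] for wi, row in zip(w, cepstra)]


cep = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
liftered = apply_lifter(cep, lift=0.6)
restored = apply_lifter(liftered, lift=0.6, invs=True)
```

Applying the same call with `invs=True` divides by the weights instead of multiplying, so the round trip recovers the input up to floating-point error.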

spafe.utils.spectral.normalize_window(win, hop)[source]

Normalize the window according to the provided hop-size so that the STFT is a tight frame.

Parameters:
  • win (int) – window to be used for the STFT
  • hop (int) – hop-size
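One way to read the tight-frame condition is that the squared window, summed over all hop-shifted copies, must equal 1 at every sample. The sketch below enforces that by dividing each window sample by the square root of the energy shared by all samples with the same index modulo `hop`. `normalize_window_sketch` is a hypothetical name, and the code assumes `hop` is small enough that every such group has non-zero energy.

```python
import math


def normalize_window_sketch(win, hop):
    """Scale win so that sum_k win[n + k*hop]**2 == 1 for every n."""
    # Samples n, n + hop, n + 2*hop, ... overlap in the overlap-add sum,
    # so they share one normalisation constant.
    energy = [0.0] * hop
    for i, w in enumerate(win):
        energy[i % hop] += w * w
    return [w / math.sqrt(energy[i % hop]) for i, w in enumerate(win)]


n, hop = 32, 8
hann = [0.5 - 0.5 * math.cos(2 * math.pi * m / n) for m in range(n)]
tight = normalize_window_sketch(hann, hop)
```

With such a window used for both analysis and synthesis, overlap-adding the windowed inverse transforms reconstructs the signal without a further gain correction.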
spafe.utils.spectral.postaud(x, fmax, fb_type='bark', broaden=0)[source]
Do loudness equalization and cube root compression.
  • x = critical band filters
  • rows = critical bands
  • cols = frames
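The two steps named here, equal-loudness weighting followed by cube root compression, and their inversion can be sketched as below. The helper names are hypothetical, the equal-loudness weights are passed in rather than derived from band frequencies, and an exponent of 0.33 is assumed as the approximation to the cube root; none of this is taken from spafe's source.

```python
def compress(bands, eql, exponent=0.33):
    """Weight each critical band by eql, then apply power-law compression.

    bands: rows = critical bands, cols = frames (non-negative power values).
    eql:   one positive equal-loudness weight per band (assumed given).
    """
    return [[(w * v) ** exponent for v in row] for w, row in zip(eql, bands)]


def expand(y, eql, exponent=0.33):
    """Undo compress(): invert the power law, then remove the weighting."""
    return [[v ** (1.0 / exponent) / w for v in row] for w, row in zip(eql, y)]


bands = [[1.0, 4.0], [9.0, 16.0]]
eql = [0.5, 0.8]  # illustrative weights only
y = compress(bands, eql)
x_rec = expand(y, eql)
```

The round trip is exact only up to floating-point error and only when every weight is strictly positive.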
spafe.utils.spectral.power_spectrum(fourrier_transform, nfft=512)[source]
spafe.utils.spectral.powspec(sig, fs=16000, nfft=512, win_type='hann', win_len=0.025, win_hop=0.01, dither=1)[source]

Compute the power spectrum and frame energy of the input signal; in effect, this outputs a power spectrogram.

Each column represents the power spectrum of one frame; each row represents a frequency.

default values:
fs = 8000 Hz, wintime = 25 ms (200 samples), steptime = 10 ms (80 samples), which means a 256-point FFT with a Hamming window.

$Header: /Users/dpwe/matlab/rastamat/RCS/powspec.m,v 1.3 2012/09/03 14:02:01 dpwe Exp dpwe $

for fs = 8000
NFFT = 256; NOVERLAP = 120; SAMPRATE = 8000; WINDOW = hamming(200);
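The sample counts quoted above follow directly from the sample rate and the window and hop durations. The sketch below reproduces that arithmetic; `frame_params` is a hypothetical helper, and choosing the FFT size as the next power of two at or above the window length is an assumption that happens to match the quoted figures.

```python
def frame_params(fs, win_len=0.025, win_hop=0.010):
    """Return (window samples, hop samples, FFT size, overlap samples)."""
    win_samples = int(round(win_len * fs))
    hop_samples = int(round(win_hop * fs))
    nfft = 1 << (win_samples - 1).bit_length()  # next power of two >= window
    noverlap = win_samples - hop_samples
    return win_samples, hop_samples, nfft, noverlap


print(frame_params(8000))   # (200, 80, 256, 120)
print(frame_params(16000))  # (400, 160, 512, 240)
```

At 16000 Hz the same durations give a 400-sample window and a 512-point FFT, which agrees with the `nfft=512` default in the signature.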
spafe.utils.spectral.pre_process_x(sig, fs=16000, win_type='hann', win_len=0.025, win_hop=0.01)[source]

Prepare the window and pad the audio signal.

spafe.utils.spectral.rfft(x, n=512)[source]

Compute the Fourier transform of the given signal frames.

spafe.utils.spectral.stft(sig, fs=16000, win_type='hann', win_len=0.025, win_hop=0.01)[source]

Compute the short-time Fourier transform (STFT) of an audio signal x.

Parameters:
  • x (array) – audio signal in the time domain
  • win (int) – window to be used for the STFT
  • hop (int) – hop-size
Returns:

X – 2D array of the STFT coefficients of x