spafe.utils.spectral

spafe.utils.spectral.audspec(p_spectrum, fs=16000, nfilts=0, fb_type='bark', low_freq=0, high_freq=0, sumpower=1, bwidth=1)

Perform critical band analysis (see PLP) based on the power spectrogram.

Parameters:

p_spectrum (array) – the power spectrum array.
nfft (int) – the FFT size. (Default is 512)
fs (int) – sample rate/ sampling frequency of the signal. (Default 16000 Hz)
nfilts (int) – the number of filters in the filterbank. (Default 20)
fb_type (str) – type of bins [Mel, Bark, …].
bwidth (int) – the constant width of each band relative to standard Mel. Default is 1.
low_freq (int) – lowest band edge of mel filters. (Default 0 Hz)
high_freq (int) – highest band edge of mel filters. (Default samplerate/2)
sumpower (bool) – sum power if True. Default is True.

Returns:

auditory spectrum array.
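As an illustration of the critical band analysis described above, the following is a minimal NumPy sketch and not spafe's implementation. The helpers `hz2bark`, `triangular_bark_fbank` and `audspec_sketch` are hypothetical names, the triangular filter shape is an assumption, and the power spectrum is assumed to have one frequency bin per row and one frame per column.

```python
import numpy as np


def hz2bark(f):
    """Warp frequencies in Hz to the Bark scale (6 * asinh(f / 600) approximation)."""
    return 6.0 * np.arcsinh(np.asarray(f, dtype=float) / 600.0)


def triangular_bark_fbank(nfilts, nfft, fs):
    """Hypothetical helper: triangular filters equally spaced on the Bark scale."""
    bin_barks = hz2bark(np.linspace(0.0, fs / 2.0, nfft // 2 + 1))
    edges = np.linspace(bin_barks[0], bin_barks[-1], nfilts + 2)
    fbank = np.zeros((nfilts, nfft // 2 + 1))
    for i in range(nfilts):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_barks - lo) / (mid - lo)
        falling = (hi - bin_barks) / (hi - mid)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank


def audspec_sketch(p_spectrum, fbank, sumpower=True):
    """Project a power spectrum (bins x frames) onto the filterbank (filters x bins)."""
    if sumpower:
        # Sum power inside each band.
        return fbank @ p_spectrum
    # Otherwise sum magnitudes inside each band, then square.
    return (fbank @ np.sqrt(p_spectrum)) ** 2
```

With `nfft=512` the power spectrum has 257 rows, so a 20-filter bank maps a `(257, n_frames)` array to a `(20, n_frames)` auditory spectrum.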

spafe.utils.spectral.compute_stft(x, win, hop)

Compute the short-time Fourier transform of an audio signal x.

Parameters:

x (array) – audio signal in the time domain.
win (int) – window to be used for the STFT.
hop (int) – hop-size.

Returns:

X, a 2D array of the STFT coefficients of x.
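A short-time Fourier transform of this kind can be sketched with NumPy as follows. This is a simplified illustration, not the library's code: `stft_sketch` is a hypothetical name, `win` is assumed to be an array of window samples, and trailing samples that do not fill a whole frame are dropped.

```python
import numpy as np


def stft_sketch(x, win, hop):
    """Return STFT coefficients with shape (len(win) // 2 + 1, n_frames)."""
    n = len(win)
    n_frames = 1 + (len(x) - n) // hop
    # Slice the signal into overlapping frames and apply the window to each one.
    frames = np.stack(
        [x[i * hop : i * hop + n] * win for i in range(n_frames)], axis=1
    )
    # One-sided FFT of every frame (frames are stored as columns).
    return np.fft.rfft(frames, axis=0)
```

For a 1 kHz sine sampled at 16 kHz with a 400-sample Hann window, the bin spacing is 40 Hz, so the energy concentrates in bin 25 of each frame.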

spafe.utils.spectral.cqt(sig, fs=16000, low_freq=10, high_freq=3000, b=48)

Compute the constant Q-transform.

take the absolute value of the FFT

warp to a Mel frequency scale

take the DCT of the log-Mel-spectrum

return the first <num_ceps> components

Parameters:

sig (array) – a mono audio signal (Nx1) from which to compute features.
fs (int) – the sampling frequency of the signal we are working with. Default is 16000.
low_freq (int) – lowest band edge of mel filters (Hz). Default is 10.
high_freq (int) – highest band edge of mel filters (Hz). Default is 3000.
b (int) – number of bins per octave. Default is 48.

Returns:

array including the Q-transform coefficients.
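To make the role of b (bins per octave) concrete, the sketch below computes the geometrically spaced center frequencies and the quality factor Q used in the standard constant-Q formulation. It only illustrates the frequency grid; `cqt_bin_frequencies` is a hypothetical helper, and whether spafe lays out its bins exactly this way is an assumption.

```python
import math


def cqt_bin_frequencies(low_freq=10.0, high_freq=3000.0, b=48):
    """Return (Q, center frequencies) for a constant-Q grid with b bins per octave."""
    # Q is the ratio of center frequency to bandwidth, identical for every bin.
    q = 1.0 / (2.0 ** (1.0 / b) - 1.0)
    # Number of bins needed to cover [low_freq, high_freq).
    n_bins = math.ceil(b * math.log2(high_freq / low_freq))
    freqs = [low_freq * 2.0 ** (k / b) for k in range(n_bins)]
    return q, freqs
```

With the defaults, bin 48 lies exactly one octave above the first bin (20 Hz), and the grid stops just below 3000 Hz.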

spafe.utils.spectral.dct(x, type=2, axis=1, norm='ortho')

spafe.utils.spectral.display_stft(X, fs, len_sig, low_freq=0, high_freq=3000, min_db=-10, max_db=0, normalize=True)

Plot the STFT of an audio signal in the time-frequency plane.

Parameters:

X (array) – STFT coefficients
fs (int) – sampling frequency in Hz (assumed to be integer)
hop (int) – hop-size used in the STFT (for labeling the time axis)
low_freq (int) – minimum frequency to plot in Hz. Default is 0 Hz.
high_freq (int) – maximum frequency to plot in Hz. Default is 3000 Hz.
min_db (int) – minimum magnitude to display in dB. Default is -10 dB.
max_db (int) – maximum magnitude to display in dB. Default is 0 dB.
normalize (bool) – Normalize input. Default is True.
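The interaction of normalize, min_db and max_db can be sketched as below. This shows only the magnitude scaling that would precede plotting; `stft_to_db` is a hypothetical helper, and the exact scaling used by display_stft is an assumption.

```python
import numpy as np


def stft_to_db(X, min_db=-10.0, max_db=0.0, normalize=True):
    """Convert STFT coefficients to dB magnitudes clipped to [min_db, max_db]."""
    mag = np.abs(X)
    if normalize:
        # Scale so that the largest magnitude maps to 0 dB.
        mag = mag / max(mag.max(), 1e-12)
    # Floor the magnitude to avoid log10(0).
    db = 20.0 * np.log10(np.maximum(mag, 1e-12))
    return np.clip(db, min_db, max_db)
```

After normalization the peak sits at 0 dB, so the default range of -10 dB to 0 dB shows only the strongest components.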

spafe.utils.spectral.invaudspec(aspectrum, fs=16000, nfft=512, fb_type='bark', low_freq=0, high_freq=None, sumpower=True, bwidth=1)

Compute the power spectrum from the auditory spectrum. This approximately inverts the effects of audspec(); the reconstruction may not be accurate.

Parameters:

aspectrum (array) – the auditory spectrum array.
nfft (int) – the FFT size. (Default is 512)
fs (int) – sample rate/ sampling frequency of the signal. (Default 16000 Hz)
nfilts (int) – the number of filters in the filterbank. (Default 20)
fb_type (str) – type of bins [Mel, Bark, …].
bwidth (int) – the constant width of each band relative to standard Mel. Default is 1.
low_freq (int) – lowest band edge of mel filters. (Default 0 Hz)
high_freq (int) – highest band edge of mel filters. (Default samplerate/2)
sumpower (bool) – sum power if True. Default is True.

Returns:

power spectrum array.
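One way to see why the inversion is only approximate: a filterbank with fewer filters than frequency bins has no exact inverse. The sketch below swaps in the Moore-Penrose pseudo-inverse as a stand-in technique, which is not necessarily what spafe does; `invaudspec_sketch` and the explicit `fbank` argument are assumptions made for illustration.

```python
import numpy as np


def invaudspec_sketch(aspectrum, fbank, sumpower=True):
    """Approximate a power spectrum (bins x frames) from band energies.

    fbank has shape (filters, bins). The pseudo-inverse gives the minimum-norm
    solution, which may contain negative values.
    """
    inv = np.linalg.pinv(fbank)
    if sumpower:
        return inv @ aspectrum
    # Bands were built from magnitudes: invert in the magnitude domain, then square.
    return (inv @ np.sqrt(aspectrum)) ** 2
```

The result reproduces the band energies when passed through the filterbank again, but detail within each band is lost.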

spafe.utils.spectral.invpostaud(y, fmax, fb_type='bark', broaden=0)

Invert the effects of postaud (loudness equalization and cube root compression).

Parameters:

y – postaud output (rows are critical bands, columns are frames).

Returns:

x, the reconstructed critical band filters.
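The inversion amounts to undoing the compression and then dividing out the per-band loudness weights. In this sketch the weights `eql` are passed in explicitly and the exponent 0.33 is assumed; both the helper name `invpostaud_sketch` and that interface are hypothetical.

```python
import numpy as np


def invpostaud_sketch(y, eql):
    """Undo cube root compression, then undo the equal-loudness weighting.

    y has shape (bands, frames); eql holds one positive weight per band.
    """
    return y ** (1.0 / 0.33) / eql[:, np.newaxis]
```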

spafe.utils.spectral.invpowspec(y, fs, win_len, win_hop, excit=[])

x = invpowspec(y, fs, wintime, steptime, excit)

Attempt to go back from a specgram-like power spectrum to an audio waveform by scaling the specgram of white noise.

Default values: fs = 8000 Hz, wintime = 25 ms (200 samples), steptime = 10 ms (80 samples), which means using a 256-point FFT with a Hamming window.

excit is the input excitation; white noise is used if it is not specified.

For fs = 8000: NFFT = 256; NOVERLAP = 120; SAMPRATE = 8000; WINDOW = hamming(200).
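The idea of scaling the spectrogram of white noise can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: `invpowspec_sketch` is a hypothetical name, window length and hop are given in samples, frames are truncated back to the window length after filtering, and no overlap-add normalization is applied.

```python
import numpy as np


def invpowspec_sketch(pspec, win_len=200, hop=80, seed=0):
    """Shape white noise so that its spectrogram follows pspec (bins x frames)."""
    nfft = 2 * (pspec.shape[0] - 1)
    n_frames = pspec.shape[1]
    win = np.hamming(win_len)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(win_len + (n_frames - 1) * hop)
    out = np.zeros_like(noise)
    for t in range(n_frames):
        seg = noise[t * hop : t * hop + win_len] * win
        # Scale the noise spectrum by the target magnitude (square root of power).
        spec = np.fft.rfft(seg, n=nfft) * np.sqrt(pspec[:, t])
        frame = np.fft.irfft(spec, n=nfft)[:win_len]
        # Overlap-add the shaped frame back into the output signal.
        out[t * hop : t * hop + win_len] += frame * win
    return out
```

The phase comes entirely from the noise excitation, so the output matches the target only in its short-time magnitude envelope.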

spafe.utils.spectral.istft(X, fs=16000, win_type='hann', win_len=0.025, win_hop=0.01)

Parameters:

X – STFT coefficients.
win – window to be used for the STFT.
hop – hop-size.

Returns:

x, the inverse STFT of X.
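A weighted overlap-add inverse can be sketched as below, together with a matching analysis step so the round trip can be checked. The names `analysis_sketch` and `istft_sketch` are hypothetical, `win` is assumed to be an array of window samples, and dividing by the summed squared window is one common normalization that spafe may or may not use.

```python
import numpy as np


def analysis_sketch(x, win, hop):
    """Forward STFT used only to demonstrate the round trip."""
    n = len(win)
    n_frames = 1 + (len(x) - n) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n] * win for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)


def istft_sketch(X, win, hop):
    """Weighted overlap-add inverse of analysis_sketch."""
    n = len(win)
    n_frames = X.shape[1]
    out = np.zeros(n + (n_frames - 1) * hop)
    norm = np.zeros_like(out)
    frames = np.fft.irfft(X, n=n, axis=0)
    for t in range(n_frames):
        out[t * hop : t * hop + n] += frames[:, t] * win
        norm[t * hop : t * hop + n] += win ** 2
    # Divide by the summed squared window wherever it is non-zero.
    nonzero = norm > 1e-10
    out[nonzero] /= norm[nonzero]
    return out
```

Away from the signal edges, where the window taper leaves little or no weight, the reconstruction matches the input.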

spafe.utils.spectral.lifter(x, lift=0.6, invs=False)

Apply lifter to a matrix of cepstra (one per column).

Parameters:

lift (float) – exponent of x i^n liftering or, as a negative integer, the length of HTK-style sin-curve liftering. Default is 0.6.
invs (bool) – if True, undo the liftering. Default is False.

Returns:

liftered cepstra.
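The exponent form of liftering multiplies the i-th cepstral coefficient by i^lift and leaves coefficient 0 unchanged. The sketch below covers only this case for a positive lift, not the HTK-style sine lifter; `lifter_sketch` is a hypothetical name, and the cepstra are assumed to be stored one frame per column.

```python
import numpy as np


def lifter_sketch(cepstra, lift=0.6, invs=False):
    """Scale row i of cepstra (coefficients x frames) by i ** lift, keeping row 0."""
    ncep = cepstra.shape[0]
    weights = np.concatenate(([1.0], np.arange(1, ncep) ** lift))
    if invs:
        # Undo a previous liftering with the same exponent.
        weights = 1.0 / weights
    return cepstra * weights[:, np.newaxis]
```

Applying the function twice, the second time with `invs=True`, returns the original cepstra.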

spafe.utils.spectral.normalize_window(win, hop)

Normalize the window according to the provided hop-size so that the STFT is a tight frame.

Parameters:

win (int) – window to be used for the STFT.
hop (int) – hop-size.
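A tight frame requires the squared, hop-shifted copies of the window to sum to one at every sample. One way to achieve this, sketched below under the assumption that `win` is an array of window samples longer than the hop, is to divide each sample by the root of the summed squares over its residue class modulo hop. The name `normalize_window_sketch` is hypothetical.

```python
import numpy as np


def normalize_window_sketch(win, hop):
    """Rescale win so that the sum over k of win[i + k * hop] ** 2 equals 1."""
    win = np.asarray(win, dtype=float)
    denom = np.zeros(len(win))
    for r in range(hop):
        # Samples r, r + hop, r + 2 * hop, ... overlap each other across frames.
        denom[r::hop] = np.sum(win[r::hop] ** 2)
    return win / np.sqrt(denom)
```

With such a window, analysis followed by overlap-add synthesis using the same window reconstructs the signal without an extra normalization step.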

spafe.utils.spectral.postaud(x, fmax, fb_type='bark', broaden=0)

Do loudness equalization and cube root compression.

Parameters:

x – critical band filters (rows are critical bands, columns are frames).
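The two steps can be sketched as a per-band weighting followed by a power-law compression. The equal-loudness curve below is a commonly used PLP approximation; that spafe uses these exact constants and the exponent 0.33 is an assumption, and `equal_loudness` and `postaud_sketch` are hypothetical helpers that take band center frequencies in Hz directly.

```python
import numpy as np


def equal_loudness(center_hz):
    """Approximate equal-loudness weight for each band center frequency (Hz)."""
    fsq = np.asarray(center_hz, dtype=float) ** 2
    return (fsq / (fsq + 1.6e5)) ** 2 * (fsq + 1.44e6) / (fsq + 9.61e6)


def postaud_sketch(x, center_hz):
    """Weight each critical band (row of x), then apply cube root compression."""
    eql = equal_loudness(center_hz)
    return (eql[:, np.newaxis] * x) ** 0.33
```

The weights rise with frequency over the speech range, so low bands are attenuated more strongly than bands around 1 to 4 kHz.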

spafe.utils.spectral.power_spectrum(fourrier_transform, nfft=512)

spafe.utils.spectral.powspec(sig, fs=16000, nfft=512, win_type='hann', win_len=0.025, win_hop=0.01, dither=1)

Compute the power spectrum and frame energy of the input signal; in effect, this outputs a power spectrogram.

Each column represents the power spectrum of a given frame, and each row represents a frequency.

Default values: fs = 8000 Hz, wintime = 25 ms (200 samples), steptime = 10 ms (80 samples), which means using a 256-point FFT with a Hamming window.

$Header: /Users/dpwe/matlab/rastamat/RCS/powspec.m,v 1.3 2012/09/03 14:02:01 dpwe Exp dpwe $

for fs = 8000: NFFT = 256; NOVERLAP = 120; SAMPRATE = 8000; WINDOW = hamming(200);
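The figures quoted above follow from simple arithmetic on the sampling rate, as this small sketch shows. The helper `frame_params` is hypothetical, and choosing the FFT size as the next power of two at or above the window length is an assumption consistent with the quoted numbers.

```python
import math


def frame_params(fs=8000, wintime=0.025, steptime=0.010):
    """Return (window length, hop, FFT size, overlap), all in samples."""
    win_len = round(wintime * fs)
    hop = round(steptime * fs)
    # Smallest power of two that holds a full window.
    nfft = 2 ** math.ceil(math.log2(win_len))
    noverlap = win_len - hop
    return win_len, hop, nfft, noverlap
```

At fs = 8000 this gives 200, 80, 256 and 120; at fs = 16000 the same durations give 400, 160, 512 and 240, which agrees with the nfft=512 default in the signature.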

spafe.utils.spectral.pre_process_x(sig, fs=16000, win_type='hann', win_len=0.025, win_hop=0.01)

Prepare the window and pad the audio signal.
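A possible shape for this preparation step is sketched below: durations in seconds are converted to samples, a Hann window is built, and the signal is zero-padded so that the last frame is complete. The padding policy and the return values are assumptions, and `pre_process_sketch` is a hypothetical name rather than the library function.

```python
import numpy as np


def pre_process_sketch(sig, fs=16000, win_len=0.025, win_hop=0.01):
    """Return (zero-padded signal, window samples, hop in samples)."""
    n = int(round(win_len * fs))
    hop = int(round(win_hop * fs))
    win = np.hanning(n)
    # Number of frames needed to cover the whole signal (at least one).
    n_frames = max(1, 1 + int(np.ceil((len(sig) - n) / hop)))
    padded_len = n + (n_frames - 1) * hop
    padded = np.pad(sig, (0, padded_len - len(sig)))
    return padded, win, hop
```

For a 1000-sample signal at 16 kHz, the window is 400 samples, the hop is 160 samples, and the signal is padded to 1040 samples so that it splits into exactly five frames.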

spafe.utils.spectral.rfft(x, n=512)

Compute the Fourier transform of the given signal frames.
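For real-valued frames, a one-sided FFT of size n returns n // 2 + 1 frequency bins per frame. The sketch below uses NumPy's `numpy.fft.rfft` directly and assumes frames are stored as rows; whether spafe wraps this exact call is an assumption.

```python
import numpy as np

# Ten hypothetical frames of 400 samples each (25 ms at 16 kHz).
frames = np.random.default_rng(0).standard_normal((10, 400))

# Each frame is zero-padded to 512 samples before the transform.
spectrum = np.fft.rfft(frames, n=512)

# Squared magnitude gives an (unnormalized) power spectrum per frame.
power = np.abs(spectrum) ** 2
```

Here `spectrum` has 257 columns, one per frequency bin from 0 Hz up to the Nyquist frequency.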

spafe.utils.spectral.stft(sig, fs=16000, win_type='hann', win_len=0.025, win_hop=0.01)

Compute the short-time Fourier transform of an audio signal x.

Parameters:

x (array) – audio signal in the time domain.
win (int) – window to be used for the STFT.
hop (int) – hop-size.

Returns:

X, a 2D array of the STFT coefficients of x.