spafe.utils.spectral
-
spafe.utils.spectral.audspec(p_spectrum, fs=16000, nfilts=0, fb_type='bark', low_freq=0, high_freq=0, sumpower=1, bwidth=1)
Perform critical band analysis (see PLP) on the power spectrogram.
Parameters: - p_spectrum (array) – the power spectrum array.
- nfft (int) – the FFT size. (Default is 512)
- fs (int) – sample rate (sampling frequency) of the signal. (Default 16000 Hz)
- nfilts (int) – the number of filters in the filterbank. (Default 20)
- fb_type (str) – type of bins [Mel, Bark, …].
- bwidth (int) – the constant width of each band relative to standard Mel. Default is 1.
- low_freq (int) – lowest band edge of the filters. (Default 0 Hz)
- high_freq (int) – highest band edge of the filters. (Default samplerate/2)
- sumpower (bool) – sum power if True. Default is True.
Returns: auditory spectrum array.
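A minimal sketch of the critical band step, assuming a Bark warping of 6·asinh(f/600) and trapezoidal weights in the Bark domain. `hz2bark`, `bark_filterbank` and `critical_band_analysis` are hypothetical helpers, not spafe's API, and spafe's actual filter shapes may differ. Here `nfilts=0` is read as "one filter per Bark", which is only one plausible interpretation of the zero default.

```python
import numpy as np

def hz2bark(f):
    # Assumed Bark warping; other Bark formulas exist
    return 6.0 * np.arcsinh(np.asarray(f, dtype=float) / 600.0)

def bark_filterbank(nfft, fs, nfilts=0, width=1.0, low_freq=0.0, high_freq=None):
    """Hypothetical helper: Bark-spaced weights of shape (nfilts, nfft // 2 + 1)."""
    high_freq = fs / 2 if high_freq is None else high_freq
    min_bark = hz2bark(low_freq)
    nyq_bark = hz2bark(high_freq) - min_bark
    if nfilts == 0:
        nfilts = int(np.ceil(nyq_bark)) + 1  # assumption: one filter per Bark
    step = nyq_bark / (nfilts - 1)
    bin_barks = hz2bark(np.arange(nfft // 2 + 1) * fs / nfft)
    wts = np.zeros((nfilts, nfft // 2 + 1))
    for i in range(nfilts):
        mid = min_bark + i * step
        lof = bin_barks - mid - 0.5
        hif = bin_barks - mid + 0.5
        # flat top of 1 within +/-0.5 Bark, exponential skirts outside
        wts[i] = 10.0 ** np.minimum(0.0, np.minimum(hif, -2.5 * lof) / width)
    return wts

def critical_band_analysis(p_spectrum, wts, sumpower=True):
    """p_spectrum: (nfft // 2 + 1, n_frames) power spectrogram -> (nfilts, n_frames)."""
    if sumpower:
        return wts @ p_spectrum              # weighted sum of powers
    return (wts @ np.sqrt(p_spectrum)) ** 2  # weighted sum of magnitudes, squared
```

With a 512-point FFT at 16 kHz this sketch yields 21 bands, one per Bark up to the Nyquist frequency.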
-
spafe.utils.spectral.compute_stft(x, win, hop)
Compute the short-time Fourier transform of an audio signal x.
Parameters: - x (array) – audio signal in the time domain
- win (array) – window to be used for the STFT
- hop (int) – hop-size
Returns: X (array) – 2d array of the STFT coefficients of x.
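The framing behind this function can be sketched as below, assuming `win` is a window array whose length sets the frame size, `hop` is given in samples, and the signal is zero-padded so that the last frames are complete. `compute_stft_sketch` is a hypothetical name; spafe's padding and output layout may differ.

```python
import numpy as np

def compute_stft_sketch(x, win, hop):
    """Hypothetical sketch: one full-length FFT per windowed frame, frames as rows."""
    x = np.asarray(x, dtype=float)
    n = len(win)
    n_frames = int(np.ceil(len(x) / hop))
    # Zero-pad so that every frame starting at m * hop has n samples available
    padded = np.concatenate([x, np.zeros(n_frames * hop + n - len(x))])
    X = np.empty((n_frames, n), dtype=complex)
    for m in range(n_frames):
        frame = padded[m * hop : m * hop + n]
        X[m] = np.fft.fft(frame * win)
    return X
```

For a constant signal and a rectangular window, the DC coefficient of each frame is simply the number of non-padded samples in that frame.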
-
spafe.utils.spectral.cqt(sig, fs=16000, low_freq=10, high_freq=3000, b=48)
Compute the constant-Q transform.
The analysis bins are spaced geometrically from low_freq to high_freq with b bins per octave, so every bin has the same ratio Q of centre frequency to bandwidth.
Parameters: - sig (array) – a mono audio signal (Nx1) from which to compute features.
- fs (int) – the sampling frequency of the signal. Default is 16000.
- low_freq (int) – lowest frequency analysed (Hz). Default is 10.
- high_freq (int) – highest frequency analysed (Hz). Default is 3000.
- b (int) – number of bins per octave. Default is 48.
Returns: array of the constant-Q transform coefficients.
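For illustration, the direct definition of the constant-Q transform for a single frame at the start of the signal can be written as below. This is a slow, hypothetical sketch (`cqt_sketch` is not spafe's function) assuming a Hamming window, Q = 1/(2^(1/b) - 1) and bin centre frequencies low_freq·2^(k/b); spafe's cqt may use a faster kernel-based formulation and return a different layout.

```python
import numpy as np

def cqt_sketch(sig, fs=16000, low_freq=10, high_freq=3000, b=48):
    """Direct constant-Q transform of one frame starting at sample 0 (illustrative)."""
    sig = np.asarray(sig, dtype=float)
    Q = 1.0 / (2.0 ** (1.0 / b) - 1.0)                   # constant f_k / bandwidth_k
    K = int(np.ceil(b * np.log2(high_freq / low_freq)))  # number of bins
    freqs = low_freq * 2.0 ** (np.arange(K) / b)         # geometric bin centres
    coeffs = np.zeros(K, dtype=complex)
    for k, f_k in enumerate(freqs):
        n_k = int(np.ceil(Q * fs / f_k))                 # longer windows for lower bins
        frame = np.zeros(n_k)
        seg = sig[:n_k]
        frame[: len(seg)] = seg                          # zero-pad short signals
        n = np.arange(n_k)
        kernel = np.hamming(n_k) * np.exp(-2j * np.pi * Q * n / n_k)
        coeffs[k] = frame @ kernel / n_k
    return freqs, coeffs
```

Because each window holds exactly Q periods of its bin's centre frequency, a 440 Hz tone analysed with b = 12 from 110 Hz peaks in the bin two octaves up (index 24).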
-
spafe.utils.spectral.display_stft(X, fs, len_sig, low_freq=0, high_freq=3000, min_db=-10, max_db=0, normalize=True)
Plot the STFT of an audio signal in the time-frequency plane.
Parameters: - X (array) – STFT coefficients
- fs (int) – sampling frequency in Hz (assumed to be integer)
- len_sig (int) – length of the signal in samples (used for labeling the time axis)
- low_freq (int) – minimum frequency to plot in Hz. Default is 0 Hz.
- high_freq (int) – maximum frequency to plot in Hz. Default is 3000 Hz.
- min_db (int) – minimum magnitude to display in dB. Default is -10 dB.
- max_db (int) – maximum magnitude to display in dB. Default is 0 dB.
- normalize (bool) – Normalize input. Default is True.
-
spafe.utils.spectral.invaudspec(aspectrum, fs=16000, nfft=512, fb_type='bark', low_freq=0, high_freq=None, sumpower=True, bwidth=1)
Compute the power spectrum from the auditory spectrum by approximately inverting the effects of audspec(); the inversion may not be exact.
Parameters: - aspectrum (array) – the auditory spectrum array.
- nfft (int) – the FFT size. (Default is 512)
- fs (int) – sample rate (sampling frequency) of the signal. (Default 16000 Hz)
- nfilts (int) – the number of filters in the filterbank. (Default 20)
- fb_type (str) – type of bins [Mel, Bark, …].
- bwidth (int) – the constant width of each band relative to standard Mel. Default is 1.
- low_freq (int) – lowest band edge of the filters. (Default 0 Hz)
- high_freq (int) – highest band edge of the filters. (Default samplerate/2)
- sumpower (bool) – sum power if True. Default is True.
Returns: power spectrum array.
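One way to see why the inversion is only approximate: the filterbank maps many FFT bins onto fewer bands, so it cannot be undone exactly. The sketch below uses a Moore-Penrose pseudo-inverse (`np.linalg.pinv`) of a hypothetical band-weight matrix `wts`; this is a swapped-in technique for illustration and not necessarily how spafe's invaudspec inverts its weights.

```python
import numpy as np

def invaudspec_sketch(aspectrum, wts, sumpower=True):
    """Approximate inverse of aspectrum = wts @ pspectrum via a pseudo-inverse."""
    inv = np.linalg.pinv(wts)              # (n_bins, n_bands)
    if sumpower:
        spec = inv @ aspectrum
    else:
        spec = (inv @ np.sqrt(aspectrum)) ** 2
    return np.maximum(spec, 0.0)           # a power spectrum cannot be negative
```

For example, with weights [[1, 1, 0, 0], [0, 0, 1, 1]] a spectrum [1, 3, 0, 0] maps to bands [4, 0] and comes back as [2, 2, 0, 0]: the power per band is preserved, but its distribution across bins within a band is lost.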
-
spafe.utils.spectral.invpostaud(y, fmax, fb_type='bark', broaden=0)
Invert the effects of postaud() (loudness equalization and cube-root compression).
Parameters: - y (array) – postaud output, with rows = critical bands and columns = frames.
Returns: x – reconstructed critical band filters, with rows = critical bands and columns = frames.
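Assuming postaud applies one loudness weight per band followed by cube-root compression, y = (eql·x)^(1/3), the inversion can be sketched as below. `invpostaud_sketch` and its explicit `eql` argument are hypothetical; spafe computes its weights internally from fmax and fb_type and may treat the edge bands differently.

```python
import numpy as np

def invpostaud_sketch(y, eql, exponent=1.0 / 3.0):
    """Undo y = (eql[:, None] * x) ** exponent; eql holds one positive weight per band."""
    eql = np.asarray(eql, dtype=float)
    x = np.asarray(y, dtype=float) ** (1.0 / exponent)  # undo the compression
    return x / eql[:, None]                             # undo the loudness weighting
```

The inverse is exact only where every weight is non-zero; a band with zero weight carries no information to recover.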
-
spafe.utils.spectral.invpowspec(y, fs, win_len, win_hop, excit=[])
x = invpowspec(y, fs, win_len, win_hop, excit)
Attempt to go back from a spectrogram-like power spectrum y to an audio waveform x by scaling the spectrogram of white noise.
- Example settings:
- fs = 8000 Hz, window length = 25 ms (200 samples), step = 10 ms (80 samples), which means a 256-point FFT with a Hamming window.
- excit is the input excitation; white noise is used if it is not specified.
- For fs = 8000: NFFT = 256; NOVERLAP = 120; SAMPRATE = 8000; WINDOW = hamming(200);
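The noise-scaling idea can be sketched as follows: take a white-noise excitation, compute its spectrum frame by frame, replace each magnitude with the square root of the target power while keeping the noise phase, then overlap-add. This is a hypothetical sketch (`invpowspec_sketch`, its Hann window and its sample-based `win_len`/`hop` are assumptions), not spafe's implementation.

```python
import numpy as np

def invpowspec_sketch(pspec, win_len, hop, nfft, seed=0):
    """pspec: (nfft // 2 + 1, n_frames) target power; win_len, hop in samples; nfft >= win_len."""
    rng = np.random.default_rng(seed)
    n_frames = pspec.shape[1]
    win = np.hanning(win_len)                   # assumed analysis/synthesis window
    out_len = (n_frames - 1) * hop + win_len
    noise = rng.standard_normal(out_len)        # white-noise excitation
    out = np.zeros(out_len)
    wsum = np.zeros(out_len)
    for m in range(n_frames):
        s = m * hop
        spec = np.fft.rfft(noise[s : s + win_len] * win, n=nfft)
        # keep the noise phase, impose the target magnitude sqrt(power)
        phase = spec / np.maximum(np.abs(spec), 1e-12)
        frame = np.fft.irfft(phase * np.sqrt(pspec[:, m]), n=nfft)[:win_len]
        out[s : s + win_len] += frame * win
        wsum[s : s + win_len] += win ** 2
    return out / np.maximum(wsum, 1e-12)
```

Calling it with win_len=200, hop=80 and nfft=256 mirrors the fs = 8000 settings listed above; the result only approximates the target spectrogram, since overlapping frames with random phases interfere.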
-
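spafe.utils.spectral.istft(X, fs=16000, win_type='hann', win_len=0.025, win_hop=0.01)
Compute the inverse short-time Fourier transform.
Parameters: - X (array) – STFT coefficients.
- fs (int) – sampling frequency of the signal in Hz. Default is 16000.
- win_type (str) – type of the window used for the STFT. Default is 'hann'.
- win_len (float) – window length in seconds. Default is 0.025.
- win_hop (float) – hop size between frames in seconds. Default is 0.01.
Returns: x – inverse STFT of X.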
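A minimal weighted overlap-add sketch of the inverse, assuming the forward STFT used the same window (given here as an array) and an integer hop in samples. `stft_frames` and `istft_sketch` are hypothetical helpers; spafe's istft instead takes win_type, win_len and win_hop.

```python
import numpy as np

def stft_frames(x, win, hop):
    """Forward transform: one rfft per complete windowed frame, frames as rows."""
    n = len(win)
    starts = range(0, len(x) - n + 1, hop)
    return np.array([np.fft.rfft(x[s : s + n] * win) for s in starts])

def istft_sketch(X, win, hop):
    """Weighted overlap-add: window each inverse frame again, divide by summed win**2."""
    n = len(win)
    out_len = (len(X) - 1) * hop + n
    out = np.zeros(out_len)
    wsum = np.zeros(out_len)
    for m, spec in enumerate(X):
        s = m * hop
        out[s : s + n] += np.fft.irfft(spec, n) * win
        wsum[s : s + n] += win ** 2
    nz = wsum > 1e-10
    out[nz] /= wsum[nz]
    return out
```

Dividing by the accumulated squared window makes the round trip exact wherever at least one frame covers a sample with a non-zero window value.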
-
spafe.utils.spectral.lifter(x, lift=0.6, invs=False)
Apply a lifter to a matrix of cepstra (one per column).
Parameters: - x (array) – matrix of cepstra, one per column.
- lift (float) – exponent n of the i^n liftering or, as a negative integer, the length of HTK-style sine-curve liftering.
- invs (bool) – if True, undo the liftering. Default is False.
Returns: liftered cepstra.
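The two liftering modes can be sketched as follows, assuming cepstral coefficient 0 is left unweighted and coefficient i (i ≥ 1) is scaled by i^lift, or by 1 + (L/2)·sin(πi/L) with L = -lift for the HTK-style case. `lifter_sketch` is a hypothetical stand-in for spafe's lifter.

```python
import numpy as np

def lifter_sketch(x, lift=0.6, invs=False):
    """x: (n_ceps, n_frames) cepstra, one column per frame."""
    ncep = x.shape[0]
    i = np.arange(1, ncep)
    if lift > 0:
        w = np.concatenate(([1.0], i ** lift))                          # i^lift weighting
    elif lift < 0:
        L = -lift
        w = np.concatenate(([1.0], 1 + L / 2 * np.sin(np.pi * i / L)))  # HTK-style sine lifter
    else:
        w = np.ones(ncep)  # assumption: lift == 0 leaves the cepstra unchanged
    if invs:
        w = 1.0 / w        # undo the liftering
    return w[:, None] * x
```

Since the weights are all positive, applying the lifter and then the inverse lifter with the same lift returns the original cepstra.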
-
spafe.utils.spectral.normalize_window(win, hop)
Normalize the window according to the provided hop size so that the STFT is a tight frame.
Parameters: - win (array) – window to be used for the STFT.
- hop (int) – hop size.
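A minimal sketch of the normalization, assuming the goal is that the squared window, shifted by every multiple of hop, sums to one at each sample (the usual tight-frame condition for a window used for both analysis and synthesis). `normalize_window_sketch` is hypothetical.

```python
import numpy as np

def normalize_window_sketch(win, hop):
    """Divide win by sqrt(sum_k win[n - k*hop]**2) so shifted squares sum to one."""
    win = np.asarray(win, dtype=float)
    n = len(win)
    acc = np.zeros(n)
    K = int(np.ceil(n / hop))
    for k in range(-K, K + 1):
        shift = k * hop
        lo, hi = max(0, shift), min(n, n + shift)
        if hi <= lo:
            continue  # this shift does not overlap the window support
        acc[lo:hi] += win[lo - shift : hi - shift] ** 2
    return win / np.sqrt(acc)
```

A sine window at 50% overlap already satisfies the condition, so it comes back unchanged; normalizing an already normalized window is likewise a no-op.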
-
spafe.utils.spectral.postaud(x, fmax, fb_type='bark', broaden=0)
Perform loudness equalization and cube-root compression.
Parameters: - x (array) – critical band filters, with rows = critical bands and columns = frames.
-
spafe.utils.spectral.powspec(sig, fs=16000, nfft=512, win_type='hann', win_len=0.025, win_hop=0.01, dither=1)
Compute the power spectrum and frame energy of the input signal; the output is essentially a power spectrogram.
Each column represents the power spectrum of one frame and each row represents one frequency.
- Example settings:
- fs = 8000 Hz, window length = 25 ms (200 samples), step = 10 ms (80 samples), which means a 256-point FFT with a Hamming window.
- For fs = 8000: NFFT = 256; NOVERLAP = 120; SAMPRATE = 8000; WINDOW = hamming(200);
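A minimal sketch of the computation, assuming a Hann window, win_len and win_hop in seconds, frames stored as columns, and frame energy taken as the sum of the power in each frame (some implementations return its logarithm instead). Dithering is omitted, and `powspec_sketch` is a hypothetical name rather than spafe's code.

```python
import numpy as np

def powspec_sketch(sig, fs=16000, nfft=512, win_len=0.025, win_hop=0.01):
    n_win = int(round(win_len * fs))     # 400 samples at 16 kHz
    n_hop = int(round(win_hop * fs))     # 160 samples at 16 kHz
    win = np.hanning(n_win)              # assumed window shape for win_type='hann'
    starts = range(0, len(sig) - n_win + 1, n_hop)
    frames = np.array([sig[s : s + n_win] * win for s in starts])
    # |FFT|^2 of each frame, zero-padded (or truncated) to nfft points
    spec = np.abs(np.fft.rfft(frames, n=nfft, axis=1)) ** 2  # (n_frames, nfft // 2 + 1)
    energy = spec.sum(axis=1)                                 # one value per frame
    return spec.T, energy   # rows = frequency bins, columns = frames
```

At 16 kHz with nfft = 512 the bins are 31.25 Hz apart, so a 1 kHz sine peaks in bin 32.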
-
spafe.utils.spectral.pre_process_x(sig, fs=16000, win_type='hann', win_len=0.025, win_hop=0.01)
Prepare the window and pad the audio signal.
-
spafe.utils.spectral.rfft(x, n=512)
Compute the Fourier transform of the signal frames.
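Assuming the frames are stacked as rows, the operation amounts to a real-input FFT along the last axis, where n zero-pads (or truncates) each frame and yields n // 2 + 1 frequency bins. `rfft_frames` is a hypothetical wrapper around `numpy.fft.rfft`, not spafe's code.

```python
import numpy as np

def rfft_frames(frames, n=512):
    """Real FFT of each row; frames shorter than n are zero-padded, longer ones truncated."""
    return np.fft.rfft(np.asarray(frames, dtype=float), n=n, axis=-1)
```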
-
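spafe.utils.spectral.stft(sig, fs=16000, win_type='hann', win_len=0.025, win_hop=0.01)
Compute the short-time Fourier transform of an audio signal sig.
Parameters: - sig (array) – audio signal in the time domain.
- fs (int) – sampling frequency of the signal in Hz. Default is 16000.
- win_type (str) – type of the window used for the STFT. Default is 'hann'.
- win_len (float) – window length in seconds. Default is 0.025.
- win_hop (float) – hop size between frames in seconds. Default is 0.01.
Returns: X (array) – 2d array of the STFT coefficients of sig.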
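A compact sketch of the same idea with window length and hop given in seconds, assuming a Hann window and keeping only complete frames. It frames the signal with `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later); `stft_sketch` is hypothetical and spafe's padding and output orientation may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def stft_sketch(sig, fs=16000, win_len=0.025, win_hop=0.01):
    n_win = int(round(win_len * fs))
    n_hop = int(round(win_hop * fs))
    # Read-only view of all length-n_win windows, keeping every n_hop-th one
    frames = sliding_window_view(np.asarray(sig, dtype=float), n_win)[::n_hop]
    # Windowing creates a new array, so the read-only view is never written to
    return np.fft.rfft(frames * np.hanning(n_win), axis=-1)  # (n_frames, n_win // 2 + 1)
```

With the defaults, frames are 400 samples long, so bins are 40 Hz apart and a 1 kHz tone lands in bin 25.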