|
HART
0.2.0
High level Audio Regression and Testing
|
Common audio-related metrics.
Classes | |
| class | MetricQuery< ValueType > |
| Manages the metrics calculations. | |
Functions | |
| template<typename SampleType > | |
| MetricQuery< double > | channelCorrelation (const AudioBuffer< SampleType > &buffer) |
| Calculates zero-lag normalized cross-correlation between two channels of an audio buffer. | |
| template<typename SampleType > | |
| MetricQuery< double > | crestFactor (const AudioBuffer< SampleType > &buffer) |
| Calculates linear crest factor for a single channel of an audio buffer. | |
| MetricQuery< double > | interpolatedPeakFrequency (const Spectrum &spectrum) |
| Returns the center frequency of the loudest FFT bin. | |
| template<typename SampleType > | |
| MetricQuery< double > | lagAtMaxCrossCorrelation (const AudioBuffer< SampleType > &bufferA, const AudioBuffer< SampleType > &bufferB, double maxLagSeconds, double minAbsBestCorrelation=0.5, CorrelationSearchMode searchMode=bestAbsoluteCorrelation) |
| Calculates lag corresponding to maximum normalized cross-correlation between two audio buffers. | |
| MetricQuery< double > | loudestBinFrequency (const Spectrum &spectrum) |
| Returns the center frequency of the loudest FFT bin. | |
| MetricQuery< double > | loudestBinMagnitude (const Spectrum &spectrum) |
| Calculates the magnitude of the loudest FFT bin. | |
| template<typename SampleType > | |
| MetricQuery< double > | maxCrossCorrelation (const AudioBuffer< SampleType > &bufferA, const AudioBuffer< SampleType > &bufferB, double maxLagSeconds, CorrelationSearchMode searchMode=bestAbsoluteCorrelation) |
| Calculates maximum normalized cross-correlation between two audio buffers. | |
| MetricQuery< double > | quinns2 (const Spectrum &spectrum) |
| Returns somewhat accurate loudest frequency in the spectrum. | |
| template<typename SampleType > | |
| MetricQuery< double > | samplePeak (const AudioBuffer< SampleType > &audioBuffer) |
| Calculates Sample Peak of an audio buffer. | |
| MetricQuery< double > | spectralCentroid (const Spectrum &spectrum, SpectralCentroid::Weighting weighting=SpectralCentroid::Weighting::magnitude) |
| Calculates spectral centroid. | |
| MetricQuery< double > | spectralFlatness (const Spectrum &spectrum, double floorLinear=1e-16) |
| Calculates spectral flatness, also known as Wiener entropy, or tonality coefficient. | |
| template<typename SampleType > | |
| MetricQuery< double > | truePeak (const AudioBuffer< SampleType > &audioBuffer, Oversampling oversamplingRatio=Oversampling::x4, typename TruePeak< SampleType >::FilterQuality filterQuality=TruePeak< SampleType >::FilterQuality::low) |
| Estimates true peak (inter-sample peak) level. | |
Common audio-related metrics.
| MetricQuery< double > channelCorrelation | ( | const AudioBuffer< SampleType > & | buffer | ) |
Calculates zero-lag normalized cross-correlation between two channels of an audio buffer.
Operates on the specified pairs of channels; use a reducer to get a scalar value (see Reducers). If a custom channel subset is not specified via ch(), it defaults to the set of all unique unordered channel pairs, e.g. {{0, 1}} for a stereo buffer, or {{0, 1}, {0, 2}, {1, 2}} for a 3-channel buffer. A specific order of pairs is not guaranteed unless you explicitly pass a custom list of channel pairs via ch().
Returns a unitless value and supports Unit::none and Unit::native, which are equivalent here, so there is no need to request a unit with a chained as() call.
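The default pair set described above can be illustrated with a small standalone sketch. The helper name allChannelPairs is hypothetical and not part of HART, and the library itself does not guarantee the order shown here.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of HART: lists every unique unordered
// channel pair {i, j} with i < j for a buffer with numChannels channels.
std::vector<std::pair<std::size_t, std::size_t>> allChannelPairs (std::size_t numChannels)
{
    std::vector<std::pair<std::size_t, std::size_t>> pairs;

    for (std::size_t i = 0; i < numChannels; ++i)
        for (std::size_t j = i + 1; j < numChannels; ++j)
            pairs.emplace_back (i, j);

    return pairs;
}
```

A buffer with C channels yields C * (C - 1) / 2 pairs, so a mono buffer produces an empty set.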
Usage examples
Be careful when you want to specify only one channel pair:
Uses the normalized cross-correlation formula:
\[ \rho = \frac{\sum_n x[n]\,y[n]} {\sqrt{\left(\sum_n x[n]^2\right)\left(\sum_n y[n]^2\right)}} \]
(sum (x[n] * y[n]) / sqrt (sum (x[n]^2) * sum (y[n]^2)))
where x and y are the selected channels of the same buffer.
The returned value is in the range [-1, 1]:
1.0 means perfectly correlated channels
0.0 means no linear correlation
-1.0 means perfectly inverted polarity
The function returns NaN if correlation is undefined, such as when:
| buffer | Input audio buffer |
Returns: MetricQuery, which calculates the normalized correlation coefficient per pair of channels, or NaN if correlation is undefined
| SampleType | Floating point sample type, typically float or double |
| hart::IndexError | if either channel index is out of bounds |
Definition at line 70 of file hart_channel_correlation.hpp.
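For reference, the zero-lag formula above can be written as a short standalone function. This is a sketch of the math only, not HART's implementation: the function name is hypothetical, and treating an empty or zero-energy channel as the undefined case is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical standalone sketch of the zero-lag normalized cross-correlation.
// Returns NaN when the denominator is zero (assumed "undefined" case).
double zeroLagCorrelation (const std::vector<double>& x, const std::vector<double>& y)
{
    assert (x.size() == y.size());  // both channels come from the same buffer

    double sumXY = 0.0, sumXX = 0.0, sumYY = 0.0;

    for (std::size_t n = 0; n < x.size(); ++n)
    {
        sumXY += x[n] * y[n];
        sumXX += x[n] * x[n];
        sumYY += y[n] * y[n];
    }

    const double denominator = std::sqrt (sumXX * sumYY);

    if (denominator == 0.0)
        return std::numeric_limits<double>::quiet_NaN();

    return sumXY / denominator;
}
```

Feeding a channel and its polarity-inverted copy gives -1, which matches the range description above.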
| MetricQuery< double > crestFactor | ( | const AudioBuffer< SampleType > & | buffer | ) |
Calculates linear crest factor for a single channel of an audio buffer.
Crest factor is defined as the ratio between the absolute peak value and RMS value:
\[ \frac{\max_n \left|x[n]\right|}{\sqrt{\frac{1}{N}\sum_n x[n]^2}} \]
(max (abs (x[n])) / sqrt ((1 / N) * sum (x[n]^2)))
Calculates values independently for each channel. Use a reducer to get a scalar value (see Reducers). Supports Unit::linear (default) and Unit::dB units. Decibel conversion is performed as 20 * log10 (x), i.e. the amplitude-ratio form, see hart::ratioToDecibels().
Usage example:
| buffer | Input audio buffer |
Returns: MetricQuery, which calculates crest factor in linear ratio units or dB
NaN if the audio buffer contains zero frames.
inf if the selected channel is silent, making RMS equal to (or close to) zero.
| SampleType | Floating point sample type of the audio buffer, typically float or double |
| hart::IndexError | if the channel index is out of bounds, or slice boundary is out of range |
Definition at line 56 of file hart_crest_factor.hpp.
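The definition above, together with the documented NaN and inf cases, can be sketched for a single channel as follows. Both function names are hypothetical stand-ins rather than HART API, and the exact-zero RMS check is a simplification of the "equal to (or close to) zero" wording.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical standalone sketch of the linear crest factor of one channel.
double crestFactorLinear (const std::vector<double>& x)
{
    if (x.empty())
        return std::numeric_limits<double>::quiet_NaN();  // zero frames

    double peak = 0.0, sumOfSquares = 0.0;

    for (const double sample : x)
    {
        peak = std::max (peak, std::abs (sample));
        sumOfSquares += sample * sample;
    }

    const double rms = std::sqrt (sumOfSquares / static_cast<double> (x.size()));

    if (rms == 0.0)
        return std::numeric_limits<double>::infinity();  // silent channel

    return peak / rms;
}

// Amplitude-ratio decibel conversion: 20 * log10 (x)
double amplitudeRatioToDecibels (double ratio)
{
    return 20.0 * std::log10 (ratio);
}
```

A square wave has a crest factor of 1 (0 dB), while a full-scale sine gives sqrt (2), roughly 3.01 dB.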
| MetricQuery< double > interpolatedPeakFrequency | ( | const Spectrum & | spectrum | ) |
inline |
Returns the center frequency of the loudest FFT bin.
Finds the maximum-magnitude FFT bin independently for each channel. Use reducers to combine multi-channel results (see Reducers).
Supports Unit::Hz (native/default) unit, so requesting a unit explicitly via MetricQuery::as() is not required.
This metric operates on FFT bins exactly as stored in the Spectrum. In the unlikely event that multiple bins have exactly the same magnitude, the lowest frequency is returned. It is not intended for precise pitch tracking, but it is still useful for tests that look for a peak within a band of frequencies rather than at one specific frequency.
For more precise peak frequency estimation, consider using Quinn's second estimator (hart::quinns2() metric) instead.
Usage examples:
| spectrum | Input frequency-domain spectrum |
| hart::UnitError | if unsupported unit is requested |
Definition at line 58 of file hart_interpolated_peak_frequency.hpp.
| MetricQuery< double > lagAtMaxCrossCorrelation | ( | const AudioBuffer< SampleType > & | bufferA, |
| const AudioBuffer< SampleType > & | bufferB, | ||
| double | maxLagSeconds, | ||
| double | minAbsBestCorrelation = 0.5, |
||
| CorrelationSearchMode | searchMode = bestAbsoluteCorrelation |
||
| ) |
Calculates lag corresponding to maximum normalized cross-correlation between two audio buffers.
Searches for the lag producing the strongest normalized cross-correlation independently for each selected pair of channels.
Cross-correlation is calculated using the following formula:
\[ \frac{\sum_n x[n]\,y[n+k]} {\sqrt{ \left(\sum_n x[n]^2\right) \left(\sum_n y[n+k]^2\right) }} \]
(sum (x[n] * y[n + k]) / sqrt (sum (x[n]^2) * sum (y[n + k]^2)))
where:
x[n] is the left-hand-side signal
y[n + k] is the right-hand-side signal shifted by lag k
k is searched in the range [-maxLag, +maxLag]
Positive lag means that bufferB is delayed relative to bufferA.
Depending on searchMode, the metric either:
Correlation is calculated independently for each selected pair of channels. Use a reducer to combine multiple lag values into a scalar.
Supports Unit::frames (default/native) and Unit::seconds. For conversion to seconds, it uses sample rate metadata contained in the provided buffers.
Usage examples:
Notes:
NaN.
| bufferA | Left-hand-side audio buffer |
| bufferB | Right-hand-side audio buffer |
| maxLagSeconds | Maximum lag to search in seconds |
| minAbsBestCorrelation | If best correlation (rectified) is under this value, then signals will be considered to not have valid overlap, and result will be NaN |
| searchMode | Controls how the best lag is selected |
| SampleType | Floating point sample type, typically float or double |
| hart::ValueError | If maxLagSeconds is negative |
| hart::SampleRateError | If sample rates differ |
| hart::IndexError | If requested channel indices are out of range |
| hart::UnitError | If unsupported unit is requested |
Definition at line 99 of file hart_lag_at_max_cross_correlation.hpp.
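The lag search can be sketched for one pair of channels as below. This is a hypothetical standalone illustration, not HART's implementation. It assumes that the sums cover only the region where x[n] and y[n + k] overlap, that the lag range is given in frames rather than seconds, and that the strongest absolute correlation is selected, which is how the bestAbsoluteCorrelation name is read here.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct LagResult
{
    double lagFrames;    // NaN when no lag passes the threshold
    double correlation;  // normalized correlation at the selected lag
};

// Hypothetical sketch: scans k in [-maxLagFrames, +maxLagFrames] and keeps the
// lag with the largest |correlation|. Sums use only overlapping samples.
LagResult bestAbsoluteLag (const std::vector<double>& x, const std::vector<double>& y,
                           int maxLagFrames, double minAbsBestCorrelation)
{
    const double nan = std::numeric_limits<double>::quiet_NaN();
    LagResult best { nan, 0.0 };

    for (int k = -maxLagFrames; k <= maxLagFrames; ++k)
    {
        double sumXY = 0.0, sumXX = 0.0, sumYY = 0.0;

        for (std::size_t n = 0; n < x.size(); ++n)
        {
            const long long m = static_cast<long long> (n) + k;  // index into y

            if (m < 0 || m >= static_cast<long long> (y.size()))
                continue;

            const double yValue = y[static_cast<std::size_t> (m)];
            sumXY += x[n] * yValue;
            sumXX += x[n] * x[n];
            sumYY += yValue * yValue;
        }

        const double denominator = std::sqrt (sumXX * sumYY);

        if (denominator == 0.0)
            continue;  // correlation is undefined at this lag

        const double rho = sumXY / denominator;

        if (std::abs (rho) > std::abs (best.correlation))
            best = { static_cast<double> (k), rho };
    }

    if (std::abs (best.correlation) < minAbsBestCorrelation)
        best.lagFrames = nan;  // no convincing overlap was found

    return best;
}
```

With y equal to x delayed by three frames, the sketch reports a lag of +3, in line with the sign convention stated above. Dividing the lag by the sample rate gives the value in seconds.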
| MetricQuery< double > loudestBinFrequency | ( | const Spectrum & | spectrum | ) |
inline |
Returns the center frequency of the loudest FFT bin.
Finds the maximum-magnitude FFT bin independently for each channel. Use reducers to combine multi-channel results (see Reducers).
Supports Unit::Hz (native/default) unit, so requesting a unit explicitly via MetricQuery::as() is not required.
This metric operates on FFT bins exactly as stored in the Spectrum. In the unlikely event that multiple bins have exactly the same magnitude, the lowest frequency is returned. It is not intended for precise pitch tracking, but it is still useful for tests that look for a peak within a band of frequencies rather than at one specific frequency.
For more precise peak frequency estimation, consider using Quinn's second estimator (hart::quinns2() metric) instead.
Usage examples:
| spectrum | Input frequency-domain spectrum |
| hart::UnitError | if unsupported unit is requested |
Definition at line 58 of file hart_loudest_bin_frequency.hpp.
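The tie-breaking rule described above can be illustrated with a standalone sketch for one channel. The function name and the bin layout are assumptions made for this example: magnitudes[k] is taken to be the magnitude of bin k, centered at k * sampleRateHz / fftSize. Nothing here describes how HART's Spectrum class stores its data.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical sketch: returns the center frequency of the loudest bin.
// The caller is expected to pass a non-empty magnitude list.
double loudestBinCenterHz (const std::vector<double>& magnitudes, double sampleRateHz, std::size_t fftSize)
{
    // std::max_element returns the first maximum, so ties resolve to the lowest bin
    const auto loudest = std::max_element (magnitudes.begin(), magnitudes.end());
    const auto binIndex = static_cast<double> (std::distance (magnitudes.begin(), loudest));

    return binIndex * sampleRateHz / static_cast<double> (fftSize);
}
```

The result is quantized to the bin spacing sampleRateHz / fftSize, which is why an interpolating estimator is the better choice when a precise frequency matters.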
| MetricQuery< double > loudestBinMagnitude | ( | const Spectrum & | spectrum | ) |
inline |
Calculates the magnitude of the loudest FFT bin.
Finds the maximum-magnitude FFT bin independently for each channel. Use reducers to combine multi-channel results (see Reducers).
Supports:
Unit::linear (native/default)
Unit::dB as linear ratio, not power
This metric operates on FFT bins exactly as stored in the Spectrum.
Typical use cases:
Usage examples:
| spectrum | Input frequency-domain spectrum |
| hart::UnitError | if unsupported unit is requested |
Definition at line 60 of file hart_loudest_bin_magnitude.hpp.
| MetricQuery< double > maxCrossCorrelation | ( | const AudioBuffer< SampleType > & | bufferA, |
| const AudioBuffer< SampleType > & | bufferB, | ||
| double | maxLagSeconds, | ||
| CorrelationSearchMode | searchMode = bestAbsoluteCorrelation |
||
| ) |
Calculates maximum normalized cross-correlation between two audio buffers.
Searches for the best normalized cross-correlation value within a specified lag range independently for each selected pair of channels.
Cross-correlation is calculated using the following formula:
\[ \frac{\sum_n x[n]\,y[n+k]} {\sqrt{ \left(\sum_n x[n]^2\right) \left(\sum_n y[n+k]^2\right) }} \]
(sum (x[n] * y[n + k]) / sqrt (sum (x[n]^2) * sum (y[n + k]^2)))
where:
x[n] is the left-hand-side signal
y[n + k] is the right-hand-side signal shifted by lag k
k is searched in the range [-maxLag, +maxLag]
The result is normalized to the range [-1, 1], where:
+1 means perfect positive correlation
-1 means perfect negative correlation (polarity inversion)
0 means no linear correlation
Depending on searchMode, the metric either:
Correlation is calculated independently for each selected pair of channels. Use a reducer to combine multiple channel-pair results into a scalar.
Usage examples:
Notes:
bestAbsoluteCorrelation mode.
NaN.
Supports only Unit::native and Unit::none units.
| bufferA | Left-hand-side audio buffer |
| bufferB | Right-hand-side audio buffer |
| maxLagSeconds | Maximum lag to search in seconds |
| searchMode | Controls how the best lag is selected, see `CorrelationSearchMode` |
| SampleType | Floating point sample type, typically float or double |
| hart::ValueError | If maxLagSeconds is negative |
| hart::SampleRateError | If sample rates differ |
| hart::IndexError | If requested channel indices are out of range |
| hart::UnitError | If unsupported unit is requested |
Definition at line 109 of file hart_max_cross_correlation.hpp.
| MetricQuery< double > quinns2 | ( | const Spectrum & | spectrum | ) |
inline |
Returns somewhat accurate loudest frequency in the spectrum.
Implements the algorithm commonly referred to as "Quinn's Second Estimator", described by B. G. Quinn in "Estimating frequency by interpolation using Fourier coefficients", IEEE Transactions on Signal Processing, Vol. 42, No. 5, pp. 1264-1268.
It provides a fairly accurate way of interpolating a frequency value that lies somewhere between FFT bin centers. Note that it is undefined near DC and Nyquist frequencies, and it will return NaN for those bins. For those edge cases, consider using the simpler hart::loudestBinFrequency() metric instead.
Supports Unit::Hz (native/default) unit, so requesting a unit explicitly via MetricQuery::as() is not required.
This metric operates on FFT bins exactly as stored in the Spectrum. In the unlikely event that multiple bins have exactly the same magnitude, the lowest frequency is returned.
Usage examples:
| spectrum | Input frequency-domain spectrum |
| hart::UnitError | if unsupported unit is requested |
Definition at line 61 of file hart_quinns2.hpp.
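For readers unfamiliar with the estimator, here is a standalone sketch of its commonly published form. It is not HART's implementation. It assumes access to complex FFT bins of a rectangular-windowed signal (whether Spectrum exposes them is not stated here), and it returns a fractional bin index rather than hertz. The function and namespace names are hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

namespace quinn_sketch
{
    // Correction term used by Quinn's second estimator
    inline double tau (double x)
    {
        const double root = std::sqrt (2.0 / 3.0);
        return 0.25 * std::log (3.0 * x * x + 6.0 * x + 1.0)
             - std::sqrt (6.0) / 24.0 * std::log ((x + 1.0 - root) / (x + 1.0 + root));
    }
}

// Hypothetical sketch: returns the interpolated peak position as a fractional
// bin index, or NaN when the peak sits on the first or last bin.
double quinnsSecondEstimate (const std::vector<std::complex<double>>& bins)
{
    const double nan = std::numeric_limits<double>::quiet_NaN();

    if (bins.size() < 3)
        return nan;

    std::size_t peak = 0;

    for (std::size_t k = 1; k < bins.size(); ++k)
        if (std::norm (bins[k]) > std::norm (bins[peak]))
            peak = k;  // strict comparison keeps the lowest bin on ties

    if (peak == 0 || peak + 1 >= bins.size())
        return nan;  // no neighbour on one side, estimator is undefined

    const double ap = (bins[peak + 1] / bins[peak]).real();
    const double am = (bins[peak - 1] / bins[peak]).real();
    const double dp = -ap / (1.0 - ap);
    const double dm = am / (1.0 - am);
    const double offset = 0.5 * (dp + dm)
                        + quinn_sketch::tau (dp * dp) - quinn_sketch::tau (dm * dm);

    return static_cast<double> (peak) + offset;
}
```

Under the usual bin layout, multiplying the returned index by sampleRate / fftSize converts it to hertz. That layout is an assumption of this sketch.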
| MetricQuery< double > samplePeak | ( | const AudioBuffer< SampleType > & | audioBuffer | ) |
Calculates Sample Peak of an audio buffer.
Calculates rectified peak values for each channel. Use a reducer to get a scalar value (see Reducers). Supports Unit::linear (default) and Unit::dB units.
Usage example:
truePeak() metric or TruePeaksBelow matcher.
| audioBuffer | Buffer to measure sample peaks in. |
| hart::IndexError | if slice's boundary is out of audio buffer's range |
| hart::UnitError | if unsupported unit is requested |
Definition at line 33 of file hart_sample_peak.hpp.
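| MetricQuery< double > spectralCentroid | ( | const Spectrum & | spectrum, |
| SpectralCentroid::Weighting | weighting = SpectralCentroid::Weighting::magnitude |
| ) |
inline |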
Calculates spectral centroid.
Commonly used to numerically express the amount of "brightness" of a sound.
You have an option to pick one of two common weighting methods:
\[ C_{mag}=\frac{\sum_{k=0}^{N-1} f_k |X_k|}{\sum_{k=0}^{N-1} |X_k|} \]
(C_mag = sum(f_k * abs(X_k)) / sum(abs(X_k))),
\[ C_{pow}=\frac{\sum_{k=0}^{N-1} f_k |X_k|^2}{\sum_{k=0}^{N-1} |X_k|^2} \]
C_pow = sum(f_k * abs(X_k)^2) / sum(abs(X_k)^2)
where f_k is the center frequency of bin k and |X_k| is the magnitude of that bin. In both cases the result is measured in hertz, so the accepted units are Unit::native and Unit::Hz, which are equivalent.
See Using Metrics And Reducers for how to use metrics that return a MetricQuery object, like this one.
| spectrum | Spectrum of a signal to operate on |
| weighting | Type of weighting (see description above) |
Returns: MetricQuery, which calculates per-channel spectral centroid values in Hz
Definition at line 55 of file hart_spectral_centroid.hpp.
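Both weighting formulas can be sketched in one standalone function. The enum and function names below are hypothetical and only mirror the two formulas, and returning NaN for an all-zero spectrum is an assumption, since the behaviour for that case is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

enum class CentroidWeighting { magnitude, power };  // hypothetical, mirrors the two formulas

// Hypothetical sketch: weighted mean of bin center frequencies for one channel.
double spectralCentroidHz (const std::vector<double>& binFrequenciesHz,
                           const std::vector<double>& binMagnitudes,
                           CentroidWeighting weighting)
{
    assert (binFrequenciesHz.size() == binMagnitudes.size());

    double weightedSum = 0.0, weightSum = 0.0;

    for (std::size_t k = 0; k < binMagnitudes.size(); ++k)
    {
        const double magnitude = binMagnitudes[k];
        const double weight = weighting == CentroidWeighting::power ? magnitude * magnitude
                                                                    : magnitude;
        weightedSum += binFrequenciesHz[k] * weight;
        weightSum += weight;
    }

    if (weightSum == 0.0)
        return std::numeric_limits<double>::quiet_NaN();  // assumed result for an all-zero spectrum

    return weightedSum / weightSum;
}
```

Power weighting pulls the centroid further towards the strongest bins than magnitude weighting does, so the two modes generally disagree unless the spectrum is symmetric around its centroid.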
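| MetricQuery< double > spectralFlatness | ( | const Spectrum & | spectrum, |
| double | floorLinear = 1e-16 |
| ) |
inline |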
Calculates spectral flatness, also known as Wiener entropy, or tonality coefficient.
Useful for judging how noise-like a spectrum is.
It is calculated as the ratio of the geometric mean to the arithmetic mean of the bin magnitudes:
\[ \mathrm{SpectralFlatness} = \frac{ \exp \left( \frac{1}{N} \sum_{n=0}^{N-1} \ln(x[n]) \right) }{ \frac{1}{N} \sum_{n=0}^{N-1} x[n] } \]
(SpectralFlatness = exp (sum (log (x[n])) / N) / (sum (x[n]) / N)),
where x[n] is the magnitude of a bin and N is the number of bins. The result can be represented in Unit::linear (default/native) or Unit::dB. The decibel value is converted as a power ratio (not voltage).
Typical values:
See Using Metrics And Reducers for how to use metrics like this one.
| spectrum | Spectrum of a signal to operate on |
| floorLinear | Bin magnitude threshold for numerical stability. Each bin's magnitude will be evaluated as x[n] = max (binMagnitudes[n], floorLinear). |
Returns: MetricQuery, which calculates per-channel spectral flatness values
Definition at line 53 of file hart_spectral_flatness.hpp.
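The flatness formula and the role of floorLinear can be sketched for one channel as below. The function names are hypothetical, the NaN result for an empty spectrum is an assumption, and the decibel helper simply applies the power-ratio form 10 * log10 (x) mentioned above.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical sketch: geometric mean divided by arithmetic mean of the
// floored bin magnitudes, x[n] = max (binMagnitudes[n], floorLinear).
double spectralFlatnessLinear (const std::vector<double>& binMagnitudes, double floorLinear = 1e-16)
{
    if (binMagnitudes.empty())
        return std::numeric_limits<double>::quiet_NaN();  // assumed result for an empty spectrum

    double sumOfLogs = 0.0, sum = 0.0;

    for (const double magnitude : binMagnitudes)
    {
        const double x = std::max (magnitude, floorLinear);  // keeps log() finite
        sumOfLogs += std::log (x);
        sum += x;
    }

    const double numBins = static_cast<double> (binMagnitudes.size());
    return std::exp (sumOfLogs / numBins) / (sum / numBins);
}

// Power-ratio decibel conversion: 10 * log10 (x)
double powerRatioToDecibels (double ratio)
{
    return 10.0 * std::log10 (ratio);
}
```

A perfectly flat spectrum gives 1 (0 dB), while a spectrum with a single non-zero bin gives a value close to zero, limited by the floor.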
| MetricQuery< double > truePeak | ( | const AudioBuffer< SampleType > & | audioBuffer, |
| Oversampling | oversamplingRatio = Oversampling::x4, |
||
| typename TruePeak< SampleType >::FilterQuality | filterQuality = TruePeak<SampleType>::FilterQuality::low |
||
| ) |
Estimates true peak (inter-sample peak) level.
It detects inter-sample peaks by examining an oversampled signal, following ITU-R BS.1770-5 guidelines. Some implementation choices are exposed via arguments, such as the oversampling factor and the number of taps in the internal polyphase FIR filter, as the standard does not specify the exact values.
Supports values in dB TP (Unit::dB) and linear domain (Unit::linear). Operating at default unit (Unit::native) will yield values in dB TP.
Shares the same implementation as TruePeaksBelow matcher, but lets you make more versatile expressions.
| audioBuffer | Buffer to estimate true peaks in |
| oversamplingRatio | Oversampling for the estimator. Higher OS ratios are expected to result in more accurate estimations. |
| filterQuality | Represents the number of taps for the internal FIR filter. Higher quality results in a more accurate estimate. Note that even the highest filter quality is far lower than what is used in actual DAC oversamplers, which is acceptable because this is only an estimate. |
Definition at line 265 of file hart_true_peak.hpp.
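To show why a true peak can exceed the sample peak, here is a deliberately simplified sketch. It is not HART's implementation and not a compliant BS.1770 meter: instead of a polyphase FIR filter it evaluates a Hann-windowed sinc interpolator directly, and the function name, the halfWidth parameter and its default are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: evaluates the signal at `oversampling` evenly spaced
// positions per input sample and returns the largest absolute value (linear).
double estimateTruePeakLinear (const std::vector<double>& x, int oversampling = 4, int halfWidth = 12)
{
    const double pi = std::acos (-1.0);
    double peak = 0.0;

    for (std::size_t n = 0; n < x.size(); ++n)
    {
        for (int phase = 0; phase < oversampling; ++phase)
        {
            const double fraction = static_cast<double> (phase) / oversampling;
            double value = 0.0;

            for (int tap = -halfWidth + 1; tap <= halfWidth; ++tap)
            {
                const long long m = static_cast<long long> (n) + tap;

                if (m < 0 || m >= static_cast<long long> (x.size()))
                    continue;  // samples outside the buffer are treated as zero

                const double distance = fraction - tap;  // position minus sample index
                const double sinc = distance == 0.0 ? 1.0
                                                    : std::sin (pi * distance) / (pi * distance);
                const double window = 0.5 + 0.5 * std::cos (pi * distance / halfWidth);

                value += x[static_cast<std::size_t> (m)] * sinc * window;
            }

            peak = std::max (peak, std::abs (value));
        }
    }

    return peak;
}
```

A sine at a quarter of the sample rate with a 45 degree phase offset has sample values of about 0.707 only, yet the sketch reports a peak close to 1.0. Converting that linear value with 20 * log10 (x) gives the level in dB TP.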