[openal] Questions about OpenAL Soft's Resampler

Tue Nov 7 21:17:55 EST 2017

On 11/07/2017 11:10 AM, Ethan Lee wrote:
> Hey there!
> 
> I'm currently working on a reimplementation of Microsoft's XACT runtime, 
> called FACT...
> 
> https://github.com/flibitijibibo/FACT

Sounds interesting.

> ... but I also have to factor in pitch changes, which SDL_AudioStream 
> doesn't account for (it only expects one input frequency for the 
> duration of its existence).
> 
> Normally I'd just use OpenAL for this sort of thing, but part of FACT's 
> job is mimicking XAudio2 accurately, which unfortunately makes the two 
> incompatible in really subtle ways (and fixing them would mean ripping 
> up the library for what is essentially one exact use case and nothing 
> else). I'd still like to use OpenAL's ideas though, but to keep the 
> permissive license intact I'd have to, at most, look at papers/documents 
> that explain how it's done, rather than just using the code directly.
> 
> TL;DR: How does OpenAL's resampler work with adjustable pitch changes, 
> and are there any resources online that were used as references for the 
> resampler? I'm not too concerned about the resampling function itself 
> (we'll probably just use a linear resampler), but I'm a lot more 
> concerned about stuff like weird step sizes, fractions of samples, 
> padding buffer sizes (for both resampling and possibly for output...?), 
> handling wildly fluctuating pitches, little details like that.

For OpenAL Soft, the general idea is that each source maintains a 
fixed-point offset, split into two 32-bit ints (could probably use a 
64-bit int these days), with 12 bits of fractional precision. A stepping 
value is calculated in the same 12-bit fixed-point format based on the 
input sample rate, output sample rate, and the pitch multiple. For 
example, a 22050Hz input sample rate with a 44100Hz output sample rate 
has an inherent 0.5 factor in the stepping value just to play at the 
correct speed, which is then multiplied by the pitch to get the desired 
pitch shift.
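
To make the stepping value concrete, here is a minimal sketch in C. It 
is not OpenAL Soft's actual code; the names FRACTION_BITS, FRACTION_ONE 
and calc_step are made up for illustration, and the clamp to a minimum 
step of 1 is my own assumption so the offset always advances.

```c
#include <stdint.h>

/* Assumed names for this sketch: 12 fractional bits, so 4096 == 1.0. */
#define FRACTION_BITS 12
#define FRACTION_ONE  (1u << FRACTION_BITS)

/* Hypothetical helper: how far the source offset moves (in 1/4096ths of
 * an input sample) for each output sample produced. */
static uint32_t calc_step(double pitch, uint32_t src_rate, uint32_t dst_rate)
{
    /* The rate ratio is the inherent factor; pitch scales it further. */
    double step = pitch * (double)src_rate / (double)dst_rate * FRACTION_ONE;

    /* Assumption: never let the step reach 0, or playback would stall. */
    if (step < 1.0)
        return 1;
    return (uint32_t)(step + 0.5); /* round to nearest */
}
```

With these definitions, a 22050Hz source on a 44100Hz output at pitch 
1.0 gives a step of 2048 (0.5 in fixed-point), and pitch 2.0 gives 4096.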

The actual resampling is essentially a FIR filter. An output sample is 
generated by filtering the input samples around the source's fixed-point 
offset using some method. Then the source's offset is incremented by the 
stepping value and it goes to the next output sample. This is repeated 
until the end of the input samples is reached or the output is filled.
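
The loop described above could be sketched roughly like this. Again 
this is only an illustration, not the real mixer: SourceOffset and 
resample_point are hypothetical names, the "filter" here is the trivial 
point method (take the sample at the integer offset), and the step is 
assumed to be small enough that frac + step fits in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define FRACTION_BITS 12
#define FRACTION_ONE  (1u << FRACTION_BITS)
#define FRACTION_MASK (FRACTION_ONE - 1u)

/* Hypothetical per-source state: the offset split into two 32-bit ints. */
typedef struct {
    uint32_t pos;  /* whole input samples already passed */
    uint32_t frac; /* fractional part, 0..FRACTION_MASK */
} SourceOffset;

/* Produces output until the output is full or the input runs out.
 * Returns the number of output samples written. */
static size_t resample_point(SourceOffset *off, uint32_t step,
                             const float *in, size_t in_len,
                             float *out, size_t out_len)
{
    size_t written = 0;
    while (written < out_len && off->pos < in_len) {
        /* "Filter" the input around the offset (point: just pick one). */
        out[written++] = in[off->pos];

        /* Advance by the step, carrying whole samples into pos. */
        off->frac += step;
        off->pos  += off->frac >> FRACTION_BITS;
        off->frac &= FRACTION_MASK;
    }
    return written;
}
```

Since the offset lives in the source state, the next call simply 
continues from wherever the previous one stopped.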

In regards to the resamplers themselves, point and linear are the 
easiest, where the former just drops the fractional offset and the 
latter uses the two samples around the current offset with the 
fractional component being the blend factor between them, but they also 
have a fair bit of noise. Because of how simple they are though, changing the 
pitch is accomplished by merely calculating a new stepping value. The 
higher order resamplers make use of precalculated, rate-specific filter 
tables to improve quality, in which case changing the pitch needs to 
calculate the new stepping value and recalculate the table offset for 
the new rate (not difficult if done right, but something to be aware of).
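
For the linear case, a single output sample could be computed along 
these lines. This is a sketch with a made-up function name 
(lerp_sample), and it assumes the caller guarantees that in[pos + 1] is 
readable.

```c
#include <stdint.h>

#define FRACTION_BITS 12
#define FRACTION_ONE  (1u << FRACTION_BITS)

/* Hypothetical helper: blend the two input samples around the offset.
 * frac is the 12-bit fractional part of the offset (0..4095).
 * Both in[pos] and in[pos + 1] must be valid to read. */
static float lerp_sample(const float *in, uint32_t pos, uint32_t frac)
{
    /* Convert the fixed-point fraction to a 0..1 blend factor. */
    float mu = (float)frac * (1.0f / FRACTION_ONE);
    return in[pos] + (in[pos + 1] - in[pos]) * mu;
}
```

Nothing in here depends on the stepping value, which is why a pitch 
change for point or linear only needs a new step; the offset itself is 
left untouched.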

As a notable detail, OpenAL Soft "loads" samples from the source input 
into some reusable scratch memory and supplies that as the resampler 
input. This helps with resamplers that want to use future samples (such 
as linear or the higher order sinc resamplers) by guaranteeing 
0-amplitude samples after the end of the input with no risk of 
overrunning. This also allows for perfect looping and multi-buffer 
queues by concatenating samples into a continuous stream for the 
resampler to act on, ensuring no glitches when crossing boundaries. It's 
also used for deinterleaving multichannel streams and converting samples 
to floats, as the resamplers can then assume single-channel float32 
input (to avoid having multiple versions of the same resampler for 
different channel configurations or sample types).
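
As a rough sketch of that "load" step, something like the following 
would pull one channel out of an interleaved 16-bit buffer, convert it 
to float, and zero-fill the rest of the scratch space. The function 
name, the parameter layout, and the 1/32768 scaling are assumptions for 
the example, not what OpenAL Soft actually does internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill scratch[0..scratch_len) with one channel of
 * an interleaved int16 buffer as floats, then pad with silence. */
static void load_channel(float *scratch, size_t scratch_len,
                         const int16_t *src, size_t src_frames,
                         size_t num_channels, size_t channel)
{
    size_t n = src_frames < scratch_len ? src_frames : scratch_len;
    size_t i;

    /* Deinterleave and convert: frame i of this channel sits at
     * src[i * num_channels + channel]. */
    for (i = 0; i < n; i++)
        scratch[i] = (float)src[i * num_channels + channel] * (1.0f / 32768.0f);

    /* Everything past the end of the input is 0-amplitude, so a
     * resampler reading ahead never overruns real data. */
    for (; i < scratch_len; i++)
        scratch[i] = 0.0f;
}
```

For looping or queued buffers, the padding region would instead be 
filled with samples from the loop start or the next buffer, so the 
resampler sees one continuous stream.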

Hope that helps with understanding it. If you still need something 
clarified, feel free to ask.