[openal] Panning, Ambisonics, and HRTF
chris.kcat at gmail.com
Sat Sep 13 21:51:38 EDT 2014
I'm not really sure who's here that could help with this (CC'ing Richard
just in case he doesn't see it on the ML), but here's something I've
been up to lately.
Over the past few days, I've been digging into ambisonic b-format
audio. For the longest time I couldn't figure out how it worked, but I
think I've finally got a handle on its structure and how to play it,
thanks in part to some reading I did on the format.
Granted, it's a simple decoding method and there are probably better
ones available, but it works. I've made a simple program that can
decode first- and second-order .amb files to a 5.1 stream. It would be
nice to support them directly with the AL_EXT_BFORMAT extension, and
have them work with the actual speaker configuration.
However, I'm a bit confused over how the coefficients were calculated. I
think I can see the basic methodology:
channel[c].w_coeff = 0.7071
channel[c].x_coeff = channel[c].dir.x
channel[c].y_coeff = channel[c].dir.y
channel[c].z_coeff = channel[c].dir.z
channel[c].r_coeff = 0.7071 * (3 * channel[c].dir.z^2 - 1)
channel[c].s_coeff = 2 * channel[c].dir.z * channel[c].dir.x
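As a rough sketch of how those per-speaker coefficients might be
tabulated in code (the struct and function names here are just
illustrative, not OpenAL Soft's actual API; dir is assumed to be a
unit vector pointing at the speaker):

```c
#include <math.h>

/* Illustrative sketch only: the first-order (w/x/y/z) and partial
 * second-order (r/s) coefficients from the formulas above, for one
 * speaker whose direction is given as a unit vector. */
typedef struct { float x, y, z; } Vec3;
typedef struct {
    float w, x, y, z; /* first-order coefficients */
    float r, s;       /* two of the second-order coefficients */
} SpeakerCoeffs;

static SpeakerCoeffs calc_speaker_coeffs(Vec3 dir)
{
    SpeakerCoeffs c;
    c.w = 0.7071f;                              /* 1/sqrt(2) */
    c.x = dir.x;
    c.y = dir.y;
    c.z = dir.z;
    c.r = 0.7071f * (3.0f*dir.z*dir.z - 1.0f);
    c.s = 2.0f * dir.z * dir.x;
    return c;
}
```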
But there's extra attenuation applied at some point in the calculations
based on the number of output speakers. I can't figure out where the
attenuation values actually come from or how they get applied.
Also for 5.0, I can't figure out how the front-center speaker is
factored into it. It seems to "steal" a little power from the other
front speakers (I imagine it does this because the front-center speaker
is not treated as a normal speaker), but again the scale is a mystery.
Figuring this out is important if I'm going to get OpenAL Soft's 6.1 and
7.1 output to work with it, and/or allow users to manually tweak speaker
positions (as they can currently). Plus, I try to avoid using "magic"
values whose origin I don't know.
Relatedly, this gave me ideas for a different way to pan sounds within
OpenAL Soft's current pipeline. A panned mono sound could basically be
encoded into b-format like:
w[s] = sample[s] * 0.7071
x[s] = sample[s] * pos.x
y[s] = sample[s] * pos.y
and get rendered like:
out[s*num_channels + c] = channel[c].w_coeff*w[s] +
channel[c].y_coeff*y[s] + ...
That can be altered to produce per-channel gains:
Gain[c] = channel[c].w_coeff*0.7071 + channel[c].x_coeff*pos.x +
channel[c].y_coeff*pos.y + ...
and get mixed normally:
out[s*num_channels + c] += sample[s] * Gain[c];
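The per-channel gain idea above could be sketched like this
(first-order and horizontal-only for brevity; all names are
illustrative, and the proper per-speaker attenuation discussed above
is omitted):

```c
/* Sketch: fold "encode a mono sound to b-format, then decode to
 * speakers" into a single gain per output channel, first order and
 * horizontal-only for brevity. Names are illustrative. */
typedef struct { float w, x, y; } ChanCoeffs;

/* pos_x/pos_y: the source direction (unit vector). */
static void compute_gains(const ChanCoeffs *chans, int num_chans,
                          float pos_x, float pos_y, float *gains)
{
    for(int c = 0; c < num_chans; c++)
        gains[c] = chans[c].w*0.7071f + chans[c].x*pos_x + chans[c].y*pos_y;
}

/* With the gains in hand, mixing is just the normal gain mix: */
static void mix_mono(const float *samples, int num_samples,
                     const float *gains, int num_chans, float *out)
{
    for(int s = 0; s < num_samples; s++)
        for(int c = 0; c < num_chans; c++)
            out[s*num_chans + c] += samples[s] * gains[c];
}
```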
This works. I have some uncommitted code locally that mixes this way to
5.1 using complete second-order coefficients (taking care of the
coordinate differences, obviously).
This has some interesting implications. The first is that a sound would
not be rendered as just a single point between the two nearest speakers,
but would instead be spread out a bit more. For 5.1 (and 6.1 and 7.1),
it also means 3D sounds would not contribute that much to the
front-center speaker, relying more on the front-left/right speakers
instead and leaving the center more open for the
AL_EFFECT_DEDICATED_DIALOGUE effect. I'm not sure if this is actually
better or not, compared to the current panning method.
But it also means it could be very easily extended to include speaker
verticality, supporting something like 3D7.1 or 8-channel cube, with the
proper coefficients. And even with 2D surround sound, it feels nicer
mathematically when it comes to sounds that move up and down (but of
course, having nice math does not mean good sound quality).
There's one other major issue with implementing b-format ambisonics,
aside from calculating the coefficients. With HRTF, you're not really
dealing with discrete output channels you mix directly to. The input
samples get delayed and then filtered before mixing together. The main
problem I see with feeding the b-format samples through HRTF is that the
different axes could have different mixing delays associated with them,
which would mess up the cancellation effect the omnidirectional feed
provides for the panned samples.
There are a few ways I'm thinking of that could maybe fix this, but I'm
not sure which would actually work, or work well (e.g. decoding to an
8-channel cube, taking averages of the HRTF's coefficients, or
premixing the w component with x, y, and z, etc).
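For the 8-channel-cube idea, a first-order decode to eight virtual
speakers at the cube corners might look something like this (each feed
would then go through that corner's fixed HRTF filter; the directions
and names are my own assumptions for the sketch):

```c
/* Sketch of the "decode to an 8-channel cube" option: decode the
 * b-format signal to 8 virtual speakers at the cube corners, then
 * run each feed through that direction's fixed HRTF filter (the
 * filtering itself isn't shown). Whatever per-speaker attenuation
 * is appropriate for 8 speakers is left out, since that's the open
 * question above. */
#define CUBE_SPEAKERS 8

/* Unit vectors to the cube corners: each component is +/- 1/sqrt(3). */
static const float CubeDirs[CUBE_SPEAKERS][3] = {
    { 0.5774f,  0.5774f,  0.5774f}, { 0.5774f,  0.5774f, -0.5774f},
    { 0.5774f, -0.5774f,  0.5774f}, { 0.5774f, -0.5774f, -0.5774f},
    {-0.5774f,  0.5774f,  0.5774f}, {-0.5774f,  0.5774f, -0.5774f},
    {-0.5774f, -0.5774f,  0.5774f}, {-0.5774f, -0.5774f, -0.5774f},
};

/* One sample of w/x/y/z in, one sample per virtual speaker out. */
static void decode_to_cube(float w, float x, float y, float z,
                           float out[CUBE_SPEAKERS])
{
    for(int i = 0; i < CUBE_SPEAKERS; i++)
        out[i] = 0.7071f*w + CubeDirs[i][0]*x
                           + CubeDirs[i][1]*y
                           + CubeDirs[i][2]*z;
}
```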
Anyway, thoughts and ideas are welcome. :)