[openal] Panning, Ambisonics, and HRTF [also, TEEM]

Sun Sep 14 21:40:58 EDT 2014

On 09/14/2014 02:13 AM, Richard Furse wrote:
> Excellent - it sounds as if you might be getting the Ambisonic Bug!
> I'd thoroughly recommend it - IMHO it's definitely the best way to
> capture a 3D audio scene if you don't need interactivity (or can mix
> it on the fly!).

It is neat. I'm kinda sad there aren't more ambisonic audio files (.amb) 
available, since even a first-order 4-channel encoding can give a pretty 
good surround sound response. But I suppose one of the big issues there, 
aside from general support, is file size. Since .amb files are stored as 
uncompressed wav, even a few minutes of audio can easily be 100+MB. 
Would be nice to see a variation of the format that's compressed and 
stored as FLAC (if not Vorbis or Opus, though perhaps the lossy nature 
of those codecs would be too much of a problem).
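
For a rough sense of scale, here's a back-of-the-envelope calculation.
The 44.1kHz rate and 16-bit samples are just assumptions for
illustration (an .amb file can use other rates and sample formats), and
the function name is made up.

```python
def pcm_payload_bytes(seconds, channels=4, sample_rate=44100, bytes_per_sample=2):
    """Size of the raw sample data for uncompressed PCM (header not counted)."""
    return seconds * channels * sample_rate * bytes_per_sample

# Five minutes of first-order (4-channel) audio at the assumed rate/format.
size = pcm_payload_bytes(5 * 60)
print(f"{size / 1_000_000:.1f} MB")
```

That comes to roughly 106MB for five minutes, and it scales linearly
with the channel count, so higher orders get large quickly.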

I'm a little confused about .amb files themselves though, since I've 
read about two different channel orderings. One is WXYZRST... (the 
Furse-Malham order) and the other is WYZXVTR... (aka ACN, Ambisonic 
Channel Number). There doesn't seem to be a way to tell which one a 
given file uses. Also, there's a bit of uncertainty with the GUIDs:
http://dream.cs.bath.ac.uk/researchdev/wave-ex/bformat.html
First, it says the B-Format integer PCM GUID is
  {00000001-0721-11d3-8644-C8C1CA000000}
and then, after explaining the GUID struct, says it gets written as
  {0x00000001,0x0000,0x0010,{0x80,0x00,0x00,0xaa,0x00,0x38,0x9b,0x71}}
which is a completely different value. I've seen both GUIDs used, but I 
have no idea what the difference is between them.
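
To compare the two notations field by field, here's a small sketch
using Python's uuid module. The helper names are made up, and the only
inputs are the two values quoted above.

```python
import uuid

def guid_to_struct(text):
    """Split a registry-style GUID string into (Data1, Data2, Data3, Data4[8])."""
    u = uuid.UUID(text)
    data1, data2, data3 = u.fields[:3]
    return data1, data2, data3, list(u.bytes[8:])

def struct_to_guid(data1, data2, data3, data4):
    """Inverse of guid_to_struct: build the registry-style string."""
    raw = (data1.to_bytes(4, "big") + data2.to_bytes(2, "big")
           + data3.to_bytes(2, "big") + bytes(data4))
    return "{" + str(uuid.UUID(bytes=raw)) + "}"

def format_struct(fields):
    """Render the fields in C initializer style."""
    data1, data2, data3, data4 = fields
    tail = ",".join(f"0x{b:02x}" for b in data4)
    return f"{{0x{data1:08x},0x{data2:04x},0x{data3:04x},{{{tail}}}}}"

# The GUID the page gives first, rewritten in struct form.
first = "{00000001-0721-11d3-8644-C8C1CA000000}"
print(format_struct(guid_to_struct(first)))

# The struct the page gives second, rewritten in registry form.
print(struct_to_guid(0x00000001, 0x0000, 0x0010,
                     [0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71]))
```

The second value comes out as {00000001-0000-0010-8000-00aa00389b71},
which as far as I know is the generic PCM subtype for
WAVE_FORMAT_EXTENSIBLE (KSDATAFORMAT_SUBTYPE_PCM) rather than anything
specific to B-Format. So they really are two different GUIDs, not two
spellings of the same one.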

That page also says the channel mask should be set to 0, but I've run 
across at least one file where it's nonzero.
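
For checking what a particular file actually carries, something along
these lines can dump the channel mask and subformat GUID. It's a sketch
that assumes the caller has already located the body of the 'fmt '
chunk and that it uses the 40-byte WAVE_FORMAT_EXTENSIBLE layout. The
function name and the returned dict are my own invention, and the
sample chunk is synthetic.

```python
import struct
import uuid

WAVE_FORMAT_EXTENSIBLE = 0xFFFE
# wFormatTag, nChannels, nSamplesPerSec, nAvgBytesPerSec, nBlockAlign,
# wBitsPerSample, cbSize, wValidBitsPerSample, dwChannelMask, SubFormat
FMT_LAYOUT = "<HHIIHHHHI16s"  # 40 bytes, little-endian

def parse_extensible_fmt(body):
    """Unpack the body of a WAVE_FORMAT_EXTENSIBLE 'fmt ' chunk."""
    size = struct.calcsize(FMT_LAYOUT)
    if len(body) < size:
        raise ValueError("fmt chunk too short for WAVE_FORMAT_EXTENSIBLE")
    (tag, channels, rate, _avg_bytes, _block_align, bits,
     _cb_size, _valid_bits, channel_mask, subformat) = struct.unpack(
        FMT_LAYOUT, body[:size])
    if tag != WAVE_FORMAT_EXTENSIBLE:
        raise ValueError("not a WAVE_FORMAT_EXTENSIBLE fmt chunk")
    return {
        "channels": channels,
        "sample_rate": rate,
        "bits": bits,
        "channel_mask": channel_mask,
        # The GUID is stored with its first three fields little-endian.
        "subformat": str(uuid.UUID(bytes_le=subformat)),
    }

# Synthetic 4-channel, 16-bit, 44.1kHz chunk body, for demonstration only.
bformat_pcm = uuid.UUID("{00000001-0721-11d3-8644-C8C1CA000000}")
example = struct.pack(FMT_LAYOUT, WAVE_FORMAT_EXTENSIBLE, 4, 44100,
                      44100 * 8, 8, 16, 22, 16, 0, bformat_pcm.bytes_le)
info = parse_extensible_fmt(example)
print(info["channel_mask"], info["subformat"])
```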

> What you're describing below is essentially how the Rapture3D OpenAL
> driver works. You might be interested in a paper on the topic I wrote
> for an AES Games conference a few years back, which seems directly
> relevant: "Building An OpenAL Implementation Using Ambisonics".

Interesting. Thanks. :)

I had also thought about rendering to B-Format internally and then doing 
a decode on the final output, so that even the rendering is oblivious to 
the speaker count. It'd be cool to modify the wave file writer to 
optionally output .amb files.

But this would have problems with multi-channel sources (where you 
generally want to be able to speaker-match input to output, particularly 
for front-center and LFE, or if the input is already binaural). I also 
fear it would reduce the apparent localization with HRTF, since it would 
effectively turn the HRTF into a post-process. While that could be a 
huge win on processing time, it would lose the exact position 
information a sound would have available during the mix (though maybe 
second- and third-order helps solve that? I'm not sure, really).
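
To illustrate the "render to B-Format, decode at the end" idea, here's
a minimal first-order sketch. It assumes the traditional W weighting of
1/sqrt(2), azimuth measured counter-clockwise from straight ahead, and
a naive decoder that feeds each speaker a virtual cardioid microphone
pointed its way. That decoder is only a stand-in to show the data flow.
It is not the tuned 5.0 decoder discussed in this thread, and all the
names are made up.

```python
import math

def encode_first_order(sample, azimuth, elevation=0.0):
    """Encode one mono sample into a first-order (W, X, Y, Z) frame."""
    cos_el = math.cos(elevation)
    return (sample / math.sqrt(2.0),
            sample * math.cos(azimuth) * cos_el,
            sample * math.sin(azimuth) * cos_el,
            sample * math.sin(elevation))

def mix_sources(sources):
    """Sum (sample, azimuth) pairs into one frame; no speaker info needed."""
    frame = [0.0, 0.0, 0.0, 0.0]
    for sample, azimuth in sources:
        for i, value in enumerate(encode_first_order(sample, azimuth)):
            frame[i] += value
    return frame

def decode_cardioid(frame, speaker_azimuths):
    """Horizontal-only decode: one virtual cardioid per speaker direction."""
    w, x, y, _z = frame
    return [0.5 * (math.sqrt(2.0) * w + x * math.cos(a) + y * math.sin(a))
            for a in speaker_azimuths]

# One source at 45 degrees to the left, mixed once...
frame = mix_sources([(1.0, math.radians(45))])

# ...then decoded for two different (assumed) speaker layouts.
quad = [math.radians(a) for a in (45, 135, 225, 315)]
stereo = [math.radians(a) for a in (30, -30)]
print(decode_cardioid(frame, quad))
print(decode_cardioid(frame, stereo))
```

The mix only ever sees the four B-Format channels; the speaker layout
shows up in the decode step alone. It also shows the concern above: by
the time an HRTF (or any decoder) sees the frame, the individual source
positions are gone and only the summed channels remain.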

> Blue Ripple Sound was actually set up after I'd spent 10+ years
> slightly(?) obsessed by Higher Order Ambisonic decoding and felt
> happy that I'd cracked it sufficiently to try releasing a commercial
> product, and games happened to be a good vehicle (because mixing
> happens live, so the user never has to worry about the horrid
> B-Format).

Heh, I'm actually coming at this from the other direction. After working 
on a number of game-related software projects for a while, I got 
involved with OpenAL because I wanted to see more commercial games be 
cross-platform, and I felt having a good, up-to-date OpenAL 
implementation would help with that. I see Ambisonics as a potentially 
good addition to it.

> The 5.0 decoder is one of Bruce's, and if you want to
> find out how it was derived you need to read "The Design and
> Optimisation of Surround Sound Decoders Using Heuristic Methods" by
> Bruce and others (can't remember exactly where that was published).
> This definitely isn't the only way to derive decoders for irregular
> layouts, particularly at higher orders with more coefficients (go on,
> just another step...) and there isn't a "right answer" as such
> because it depends on your design criteria.

Hmm, this seems to be far more complex than I thought. At this point, it 
looks like if I'm going to go through with using this new panning 
method, I'm going to have to drop the ability for users to alter speaker 
positions. At least until I get a proper handle on this stuff.

Out of curiosity, how do the different order coefficients relate? For 
instance, can I mix second- or third-order input with a first-order 
decoder, and simply treat the missing decoder coefficients as 0? And the 
same for a first-order input with a second- or third-order decoder 
(treat the missing inputs as 0)? I really hope that's the case, but I 
fear it won't be...
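
To make the question concrete, this is what "treat the missing ones as
0" would amount to mechanically. It assumes the input and the decoder
use the same channel ordering and normalization, with the lower-order
channels first, and the matrix values below are placeholders rather
than real decoder coefficients. Whether the result actually sounds
right is exactly the part I'm unsure about.

```python
def channels_for_order(order):
    """Full-sphere ambisonic channel count for a given order: (order + 1)^2."""
    return (order + 1) ** 2

def decode(matrix, frame):
    """matrix: one row of coefficients per speaker; frame: one value per channel.

    Channels present on only one side are skipped, which is the same as
    multiplying them by 0.
    """
    out = []
    for row in matrix:
        n = min(len(row), len(frame))
        out.append(sum(row[i] * frame[i] for i in range(n)))
    return out

# Placeholder first-order decoder for two speakers (4 coefficients each).
first_order_decoder = [[0.5, 0.5, 0.25, 0.0],
                       [0.5, 0.5, -0.25, 0.0]]

# Second-order input (9 channels): the last 5 are ignored by this decoder.
second_order_frame = [1.0, 0.2, 0.3, 0.0, 0.1, 0.1, 0.1, 0.1, 0.1]
print(decode(first_order_decoder, second_order_frame))

# First-order input into a (placeholder) second-order decoder works the
# same way: the decoder's last 5 coefficients never get used.
second_order_decoder = [row + [0.3] * 5 for row in first_order_decoder]
print(decode(second_order_decoder, second_order_frame[:4]))
```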

> In particular, if any experts on lossy
> compression want to get involved that would be particularly
> exciting!

I'm far from an expert on lossy compression, but that does bring to mind 
the guys at Xiph.org. I presume you know of them, but they're good at 
developing both lossy and lossless audio codecs (Vorbis, Opus, FLAC, 
etc.), and even a couple of video codecs (Theora, with a new one in the 
works called Daala), and they release them open and royalty-free.