[openal] Panning, Ambisonics, and HRTF [also, TEEM]
Chris Robinson
chris.kcat at gmail.com
Sun Sep 14 21:40:58 EDT 2014
On 09/14/2014 02:13 AM, Richard Furse wrote:
> Excellent - it sounds as if you might be getting the Ambisonic Bug!
> I'd thoroughly recommend it - IMHO it's definitely the best way to
> capture a 3D audio scene if you don't need interactivity (or can mix
> it on the fly!).
It is neat. I'm kinda sad there aren't more ambisonic audio files (.amb)
available, since even a first-order 4-channel encoding can give a pretty
good surround sound response. But I suppose one of the big issues there,
aside from general support, is file size. Since .amb files are stored as
uncompressed wav, even a few minutes of audio can easily be 100+MB.
Would be nice to see a variation of the format that's compressed and
stored as FLAC (if not Vorbis or Opus, though perhaps the lossy nature
of those codecs would be too much of a problem).
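As a rough check on the file-size point above, here is a back-of-the-envelope sketch. The 44.1kHz sample rate and 16-bit sample size are assumptions chosen for illustration (they are not stated in the thread), and the function name is made up for this example. The WAV header is ignored since it is negligible next to the sample data.

```python
def amb_size_bytes(seconds, channels=4, sample_rate=44100, bytes_per_sample=2):
    """Approximate size of the uncompressed PCM data, ignoring the WAV header.

    Defaults assume first-order B-Format (4 channels) at 44.1kHz, 16-bit;
    these are illustrative assumptions, not values from any particular file.
    """
    return int(seconds * channels * sample_rate * bytes_per_sample)


# Five minutes of first-order audio under the assumptions above.
five_minutes = amb_size_bytes(5 * 60)
print(f"{five_minutes / 1e6:.1f} MB")  # 105.8 MB
```

Higher sample rates, wider samples, or higher orders (9 channels for second-order, 16 for third) scale the result linearly.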
I'm a little confused about .amb files themselves though, since I've read
about two different channel orderings being used. One is
WXYZRST... and the other is WYZXVTR... (aka ACN, Ambisonic Channel
Number). There doesn't seem to be a way to tell which is used. Also,
there's a bit of uncertainty with the GUIDs:

http://dream.cs.bath.ac.uk/researchdev/wave-ex/bformat.html

First, it says the B-Format integer PCM GUID is
{00000001-0721-11d3-8644-C8C1CA000000}
and then, after explaining the GUID struct, says it gets written as
{0x00000001,0x0000,0x0010,{0x80,0x00,0x00,0xaa,0x00,0x38,0x9b,0x71}}
which is a completely different value. I've seen both GUIDs used, but I
have no idea what the difference is between them.
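To make the mismatch concrete, the sketch below converts between the two notations using Python's standard `uuid` module. The helper names are made up for this example. It shows that the struct initializer quoted above corresponds to the text form {00000001-0000-0010-8000-00aa00389b71}, which, as far as I know, is the generic WAVE_FORMAT_EXTENSIBLE PCM subtype (KSDATAFORMAT_SUBTYPE_PCM) rather than anything B-Format specific.

```python
import uuid


def guid_from_struct(data1, data2, data3, data4):
    """Build a UUID from a Windows-style GUID struct initializer.

    data1..data3 are the 32/16/16-bit integer fields, data4 is the
    trailing array of 8 bytes.
    """
    if len(data4) != 8:
        raise ValueError("Data4 must contain exactly 8 bytes")
    node = int.from_bytes(bytes(data4[2:]), "big")
    return uuid.UUID(fields=(data1, data2, data3, data4[0], data4[1], node))


def struct_from_guid(text):
    """Split a textual GUID into (Data1, Data2, Data3, Data4 byte list)."""
    u = uuid.UUID(text)
    data1, data2, data3 = u.fields[:3]
    return data1, data2, data3, list(u.bytes[8:])


# The B-Format integer PCM GUID as given in text form on the page.
amb_pcm = uuid.UUID("{00000001-0721-11d3-8644-C8C1CA000000}")

# The struct initializer the page says gets written.
written = guid_from_struct(
    0x00000001, 0x0000, 0x0010,
    [0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71],
)

print(written)                      # text form of the struct initializer
print(struct_from_guid(str(amb_pcm)))  # struct fields of the B-Format GUID
print(amb_pcm.bytes_le.hex())       # byte layout as a little-endian GUID struct
```

A reader that wants to accept both kinds of file would presumably need to compare the subformat field against both values.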
That page also says the channel mask should be set to 0, but I've run
across at least one file with it non-0.
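For the two channel orderings mentioned earlier, a minimal reordering sketch follows. It only covers full-sphere signals up to second order, uses the letter names from the orderings quoted above, and deliberately ignores any difference in channel normalization between the two conventions, which would need handling separately. The function and constant names are made up for this example.

```python
# Channel letters in the two orderings, up to second order.
FUMA_ORDER = "WXYZRSTUV"   # the WXYZRST... ordering
ACN_ORDER = "WYZXVTRSU"    # the WYZXVTR... (ACN) ordering


def fuma_to_acn(channels):
    """Reorder a list of channels from WXYZRST... order to ACN order.

    Only full-sphere channel counts (1, 4, or 9) are accepted here;
    horizontal-only or mixed-order layouts are out of scope for this sketch.
    No gain changes are applied.
    """
    count = len(channels)
    if count not in (1, 4, 9):
        raise ValueError("expected 1, 4, or 9 channels")
    position = {name: i for i, name in enumerate(FUMA_ORDER)}
    return [channels[position[name]] for name in ACN_ORDER[:count]]


print(fuma_to_acn(["w", "x", "y", "z"]))  # ['w', 'y', 'z', 'x']
```

Since nothing in the file appears to say which ordering was used, a loader would still have to guess or let the user choose before applying something like this.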
> What you're describing below is essentially how the Rapture3D OpenAL
> driver works. You might be interested in a paper on the topic I wrote
> for an AES Games conference a few years back, which seems directly
> relevant: "Building An OpenAL Implementation Using Ambisonics".
Interesting. Thanks. :)
I had also thought about rendering to b-format internally and then doing
a decode on the final output, so that even the rendering is oblivious to
the speaker count. It'd be cool to modify the wave file writer to
optionally output amb files.
But this would have problems with multi-channel sources (where you
generally want to be able to speaker-match input to output, particularly
for front-center and LFE, or if the input is already binaural). I also
fear it would reduce the apparent localization with HRTF, since it would
effectively turn the HRTF into a post-process. While that could be a
huge win on processing time, it would lose the exact position
information a sound would have available during the mix (though maybe
second- and third-order helps solve that? I'm not sure, really).
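The render-to-B-Format idea above can be sketched roughly as follows: encode each mono source into W, X, Y, Z at mix time, then decode once at the output. This is only an illustration, not how OpenAL Soft or Rapture3D actually work. It assumes the traditional first-order encoding with W scaled by 1/sqrt(2), azimuth measured counter-clockwise from straight ahead, and a naive "basic" decode to a regular square of four speakers. The function names are made up for this example.

```python
import math

# Assumed square layout: front-left, front-right, rear-left, rear-right.
SQUARE_AZIMUTHS = [math.radians(a) for a in (45, -45, 135, -135)]


def encode_first_order(sample, azimuth, elevation):
    """Encode one mono sample into first-order B-Format (W, X, Y, Z).

    Angles are in radians. W carries the 1/sqrt(2) scaling of the
    traditional convention (an assumption of this sketch).
    """
    cos_el = math.cos(elevation)
    w = sample / math.sqrt(2)
    x = sample * math.cos(azimuth) * cos_el
    y = sample * math.sin(azimuth) * cos_el
    z = sample * math.sin(elevation)
    return w, x, y, z


def decode_square(w, x, y):
    """Naive horizontal decode to the square layout (Z is ignored)."""
    n = len(SQUARE_AZIMUTHS)
    return [
        (math.sqrt(2) * w + 2 * (x * math.cos(a) + y * math.sin(a))) / n
        for a in SQUARE_AZIMUTHS
    ]


# A source sitting exactly at the front-left speaker position.
w, x, y, z = encode_first_order(1.0, math.radians(45), 0.0)
print(decode_square(w, x, y))
```

Note how the mixer only ever sees four channels regardless of the speaker count, and how the source direction survives only as the ratio between those channels, which is the loss of exact position information worried about above.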
> Blue Ripple Sound was actually set up after I'd spent 10+ years
> slightly(?) obsessed by Higher Order Ambisonic decoding and felt
> happy that I'd cracked it sufficiently to try releasing a commercial
> product, and games happened to be a good vehicle (because mixing
> happens live, so the user never has to worry about the horrid
> B-Format).
Heh, I'm actually coming at this from the other direction. After a while
of working on a number of game-related software projects, I got involved
with OpenAL because I wanted to see more commercial games be
cross-platform, and I felt having a good, up-to-date OpenAL
implementation would help with that. I see Ambisonics as a potentially
good addition to it.
> The 5.0 decoder is one of Bruce's, and if you want to
> find out how it was derived you need to read "The Design and
> Optimisation of Surround Sound Decoders Using Heuristic Methods" by
> Bruce and others (can't remember exactly where that was published).
> This definitely isn't the only way to derive decoders for irregular
> layouts, particularly at higher orders with more coefficients (go on,
> just another step...) and there isn't a "right answer" as such
> because it depends on your design criteria.
Hmm, this seems to be far more complex than I thought. At this point, it
looks like if I'm going to go through with using this new panning
method, I'm going to have to drop the ability for users to alter speaker
positions. At least until I get a proper handle on this stuff.
Out of curiosity, how do the different order coefficients relate? For
instance, can I mix second- or third-order input with a first-order
decoder, and simply treat the missing decoder coefficients as 0? And the
same for a first-order input with a second- or third-order decoder
(treat the missing inputs as 0)? I really hope that's the case, but I
fear it won't be...
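Mechanically, "treat the missing coefficients or inputs as 0" would look something like the sketch below. It says nothing about whether the result sounds right, which is the open question here. It also assumes the input and the decoder agree on channel order and normalization. The names and the numbers in the example matrix are made up purely for illustration.

```python
def decode(decoder, channels):
    """Apply a decoder matrix (one row of coefficients per speaker).

    zip() stops at the shorter of the two sequences, which has the same
    effect as treating missing decoder coefficients, or missing input
    channels, as zero.
    """
    return [sum(c * s for c, s in zip(row, channels)) for row in decoder]


def pad(values, count):
    """Explicitly zero-pad a sequence to `count` entries, for comparison."""
    return list(values) + [0.0] * (count - len(values))


# Hypothetical 2-speaker, first-order (4-coefficient) decoder.
first_order_decoder = [
    [0.5, 0.25, 0.25, 0.0],
    [0.5, 0.25, -0.25, 0.0],
]

# Hypothetical second-order input frame (9 channels).
second_order_input = [1.0, 0.5, 0.5, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

truncated = decode(first_order_decoder, second_order_input)
padded = decode([pad(row, 9) for row in first_order_decoder], second_order_input)
print(truncated, padded)
```

The same function covers the opposite case (first-order input into a higher-order decoder), since the extra decoder coefficients simply never get multiplied by anything.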
> In particular, if any experts on lossy
> compression want to get involved that would be particularly
> exciting!
I'm far from an expert on lossy compression, but that does bring to mind
the guys at Xiph.org. I presume you know of them, but they're good at
developing both lossy and lossless audio codecs (Vorbis, Opus, FLAC,
etc), and even a couple video codecs (Theora, with a new one in the
works called Daala), and release them open and royalty-free.