[openal] AL_SOFT_UHJ proposal

Sun Apr 3 21:00:20 EDT 2022

One of the last two extensions before the next version, as long as there are
no big concerns or issues. As mentioned here:

https://openal.org/pipermail/openal/2021-December/000812.html

this is adding support for 2-, 3-, and 4-channel UHJ buffer formats, and a 
Super Stereo processing mode for sources.

Feedback is welcome.
-------------- next part --------------
Name

    AL_SOFT_UHJ

Contributors

    Chris Robinson

Contact

    Chris Robinson (chris.kcat 'at' gmail.com)

Status

    In progress

Dependencies

    This extension is for OpenAL 1.1.
    This extension requires AL_EXT_BFORMAT.

Overview

    This extension adds support for UHJ channel formats and a Super Stereo
    (a.k.a. Stereo Enhance) processor. UHJ is a method of encoding surround
    sound from a first-order B-Format signal into a stereo-compatible signal.
    Such signals can be played as normal stereo (with more stable and wider
    stereo imaging than pan-pot mixing) or decoded back to surround sound,
    which makes it a decent choice where 3+ channel surround sound isn't
    available or desirable. When decoded, a UHJ signal behaves like B-Format,
    which allows it to be rotated through AL_EXT_BFORMAT's source orientation
    property as with its B-Format formats.

    The standard equation for decoding UHJ to B-Format is:

        S = Left + Right
        D = Left - Right

        W = 0.981532*S + 0.197484*j(0.828331*D + 0.767820*T)
        X = 0.418496*S - j(0.828331*D + 0.767820*T)
        Y = 0.795968*D - 0.676392*T + j(0.186633*S)
        Z = 1.023332*Q

    where j is a wide-band +90 degree phase shift. 2-channel UHJ excludes the
    T and Q input channels, and 3-channel excludes the Q input channel. Be
    aware that the resulting W, X, Y, and Z signals are 3dB louder than their
    FuMa counterparts, and the implementation should account for that to
    properly balance it against other sounds.
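
    As a rough illustration of the standard equation, the following C sketch
    decodes a single positive-frequency bin, where the +90 degree phase shift
    j() reduces to a multiplication by the imaginary unit. The type and
    function names (BFormatBin, uhj_decode_bin) are hypothetical and not part
    of this extension. A real decoder would need a wide-band phase shifter
    (such as an all-pass filter pair or FFT-based processing), and would also
    apply the 3dB level compensation, which is left out here.

```c
#include <complex.h>

/* One frequency bin of a decoded first-order B-Format signal.
 * Hypothetical type, for illustration only. */
typedef struct {
    float complex w, x, y, z;
} BFormatBin;

/* Decode one positive-frequency bin of a UHJ signal with the standard
 * equation. For a positive-frequency bin, the +90 degree phase shift j()
 * is a multiplication by I. Pass 0 for t and q to decode 2-channel UHJ,
 * or 0 for q to decode 3-channel UHJ. The result is not level-compensated
 * (it is 3dB louder than FuMa). */
static BFormatBin uhj_decode_bin(float complex left, float complex right,
                                 float complex t, float complex q)
{
    const float complex s = left + right;
    const float complex d = left - right;
    /* Shared phase-shifted term j(0.828331*D + 0.767820*T). */
    const float complex jdt = I * (0.828331f*d + 0.767820f*t);

    BFormatBin out;
    out.w = 0.981532f*s + 0.197484f*jdt;
    out.x = 0.418496f*s - jdt;
    out.y = 0.795968f*d - 0.676392f*t + I*(0.186633f*s);
    out.z = 1.023332f*q;
    return out;
}
```

    Note that the j(0.828331*D + 0.767820*T) term is shared by W and X, so it
    only needs to be computed once per bin.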

    An alternative equation for decoding 2-channel-only UHJ is:

        S = Left + Right
        D = Left - Right

        W = 0.981532*S + j(0.163582*D)
        X = 0.418496*S - j(0.828331*D)
        Y = 0.762956*D + j(0.384230*S)

    Which equation to use depends on the implementation and user preferences.
    It's relevant to note that the standard equation is reversible with the
    standard encoding equations, meaning decoding UHJ to B-Format and then
    encoding B-Format to UHJ results in the original UHJ signal, even for
    2-channel.
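
    For comparison, the alternative 2-channel equation can be sketched the
    same way for a single positive-frequency bin. The names BFormat2DBin and
    uhj2_decode_alt_bin are hypothetical. Note that W and X come out the same
    as the standard equation with T = 0 (0.197484 * 0.828331 is approximately
    0.163582), so only Y differs between the two.

```c
#include <complex.h>

/* One frequency bin of a horizontal-only B-Format signal.
 * Hypothetical type, for illustration only. */
typedef struct {
    float complex w, x, y;
} BFormat2DBin;

/* Decode one positive-frequency bin of 2-channel UHJ with the alternative
 * equation. j() is modeled as a multiplication by I, which only holds for
 * positive-frequency bins. */
static BFormat2DBin uhj2_decode_alt_bin(float complex left,
                                        float complex right)
{
    const float complex s = left + right;
    const float complex d = left - right;

    BFormat2DBin out;
    out.w = 0.981532f*s + I*(0.163582f*d);
    out.x = 0.418496f*s - I*(0.828331f*d);
    out.y = 0.762956f*d + I*(0.384230f*s);
    return out;
}
```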

    One additional note for decoding 2-channel UHJ is that the resulting
    B-Format signal should pass through alternate shelf filters for
    frequency-dependent processing. For the standard equation, suitable shelf
    filters are given as:

        W:   LF = 0.661, HF = 1.000
        X/Y: LF = 1.293, HF = 1.000

    And for the alternative equation, suitable shelf filters are given as:

        W:   LF = 0.646, HF = 1.000
        X/Y: LF = 1.263, HF = 1.000

    3- and 4-channel UHJ should use the normal shelf filters for B-Format.
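
    One way an implementation might keep these gains together is a small
    lookup keyed on the decode equation in use, sketched below. The enum,
    struct, and function names are hypothetical, and the sketch only returns
    the gains listed above; the shelf filter design itself (crossover
    frequency, filter order) is not covered by this document.

```c
/* Which 2-channel decode equation is in use. Hypothetical names. */
typedef enum {
    UHJ2_DECODE_STANDARD,
    UHJ2_DECODE_ALTERNATIVE
} Uhj2DecodeEq;

/* Low- and high-frequency gains for one shelf filter. */
typedef struct {
    float lf, hf;
} ShelfGains;

/* Shelf gains for the W channel and for the X/Y channels. */
typedef struct {
    ShelfGains w, xy;
} Uhj2Shelves;

/* Return the shelf gains listed above for the given decode equation. */
static Uhj2Shelves uhj2_shelf_gains(Uhj2DecodeEq eq)
{
    Uhj2Shelves g;
    if (eq == UHJ2_DECODE_ALTERNATIVE) {
        g.w  = (ShelfGains){0.646f, 1.000f};
        g.xy = (ShelfGains){1.263f, 1.000f};
    } else {
        g.w  = (ShelfGains){0.661f, 1.000f};
        g.xy = (ShelfGains){1.293f, 1.000f};
    }
    return g;
}
```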

    Super Stereo is a technique for processing a plain (non-UHJ) stereo signal
    to derive a B-Format signal. It's backed by the same functionality as UHJ
    decoding, making it an easy addition on top of UHJ support. Super Stereo
    has a variable width control, allowing the stereo soundfield to encompass
    more or less around the listener while maintaining a stable center image
    (a more naive virtual speaker approach would cause the center image to
    collapse as the soundfield widens). As this derives a B-Format signal like
    UHJ, it similarly allows such sources to be rotated through the source
    orientation property.

    There are various forms of Super Stereo, with varying equations, but a
    good option is:

        S = Left + Right
        D = Left - Right

        W = 0.6098637*S - j(0.6896511*w*D)
        X = 0.8624776*S + j(0.7626955*w*D)
        Y = 1.6822415*w*D - j(0.2156194*S)

    where w is a variable width control, in the range [0...0.7]. As with UHJ,
    the resulting W, X, and Y signals are 3dB louder than their FuMa
    counterparts. The normal shelf filters for playing B-Format should apply.
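
    The Super Stereo equation above can be sketched in C for a single
    positive-frequency bin, where j() is a multiplication by the imaginary
    unit. The names here are hypothetical, and clamping the width to 0.7 is
    just one possible way for an implementation to apply its own limit. Level
    compensation and shelf filtering are left out.

```c
#include <complex.h>

/* One frequency bin of the derived horizontal B-Format signal.
 * Hypothetical type, for illustration only. */
typedef struct {
    float complex w, x, y;
} SuperStereoBin;

/* Clamp a requested width to the [0, 0.7] range this equation expects. */
static float super_stereo_clamp_width(float width)
{
    if (width < 0.0f) return 0.0f;
    if (width > 0.7f) return 0.7f;
    return width;
}

/* Derive B-Format from one positive-frequency bin of plain stereo.
 * j() is modeled as a multiplication by I, which only holds for
 * positive-frequency bins. */
static SuperStereoBin super_stereo_bin(float complex left,
                                       float complex right, float width)
{
    const float w = super_stereo_clamp_width(width);
    const float complex s = left + right;
    const float complex d = left - right;

    SuperStereoBin out;
    out.w = 0.6098637f*s - I*(0.6896511f*w*d);
    out.x = 0.8624776f*s + I*(0.7626955f*w*d);
    out.y = 1.6822415f*w*d - I*(0.2156194f*s);
    return out;
}
```

    With identical left and right inputs, D is zero and the width factor has
    no effect on the output, which is what keeps the center image stable as
    the width changes.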

Issues

    Q: 3- and 4-channel UHJ weren't widely, if ever, used, in part because the
       extra channels aren't stereo-compatible (players need to know to drop
       them if not decoding them), which made it more practical and efficient
       to use B-Format directly. Why include them here?
    A: UHJ is a hierarchical system, where 3-channel is a subset of 4-channel,
       and 2-channel is a subset of 3-channel. There's little extra work
       necessary to support them, and there are techniques for getting 3- and
       4-channel UHJ into a stereo-compatible stream, so having the option is
       not a bad idea.

    Q: Why include Super Stereo here as it's not strictly UHJ?
    A: Super Stereo is built on the same structure as UHJ, utilizing phase
       shift filters to generate a B-Format signal from pre-existing stereo
       content. Given the similarity in functionality, it provides a good
       option for handling stereo content. Additionally, even in the hardware
       space it's not uncommon for UHJ decoders to have Super Stereo
       capabilities, so it makes sense to have it here too.

    Q: Super Stereo seems to have a width factor limit of 0.7, but the
       AL_SUPER_STEREO_WIDTH_SOFT attribute goes up to 1.0. Why?
    A: For flexibility of implementation. If a method is developed that allows
       using wider factors, an arbitrary 0.7 limit would be unnecessary. There
       is some precedent for this with the source's AL_PITCH property being
       any finite non-negative value, but an implementation internally clamps
       to its own limits.

New Procedures and Functions

    None.

New Tokens

    Accepted by the <format> parameter of alBufferData:

        AL_FORMAT_UHJ2CHN8_SOFT                  0x19A2
        AL_FORMAT_UHJ2CHN16_SOFT                 0x19A3
        AL_FORMAT_UHJ2CHN_FLOAT32_SOFT           0x19A4
        AL_FORMAT_UHJ3CHN8_SOFT                  0x19A5
        AL_FORMAT_UHJ3CHN16_SOFT                 0x19A6
        AL_FORMAT_UHJ3CHN_FLOAT32_SOFT           0x19A7
        AL_FORMAT_UHJ4CHN8_SOFT                  0x19A8
        AL_FORMAT_UHJ4CHN16_SOFT                 0x19A9
        AL_FORMAT_UHJ4CHN_FLOAT32_SOFT           0x19AA

    Accepted by the <param> parameter of alSourcei, alSourceiv, alGetSourcei,
    and alGetSourceiv:

        AL_STEREO_MODE_SOFT                      0x19B0

    Accepted by the <param> parameter of alSourcef, alSourcefv, alGetSourcef,
    and alGetSourcefv:

        AL_SUPER_STEREO_WIDTH_SOFT               0x19B1

    Accepted by the <value> parameter of alSourcei and alSourceiv for
    AL_STEREO_MODE_SOFT:

        AL_NORMAL_SOFT                           0x0000
        AL_SUPER_STEREO_SOFT                     0x0001

Additions to Specification

    UHJ Buffer Formats

    The formats AL_FORMAT_UHJ2CHN8_SOFT, AL_FORMAT_UHJ2CHN16_SOFT,
    AL_FORMAT_UHJ2CHN_FLOAT32_SOFT, AL_FORMAT_UHJ3CHN8_SOFT,
    AL_FORMAT_UHJ3CHN16_SOFT, AL_FORMAT_UHJ3CHN_FLOAT32_SOFT,
    AL_FORMAT_UHJ4CHN8_SOFT, AL_FORMAT_UHJ4CHN16_SOFT, and
    AL_FORMAT_UHJ4CHN_FLOAT32_SOFT may be used when buffering data with
    functions like alBufferData.

    8-bit data is expressed as an unsigned value over the range 0 to 255, 128
    being an audio output level of zero.

    16-bit data is expressed as a signed value over the range -32768 to 32767,
    0 being an audio output level of zero. Byte order for 16-bit values is
    determined by the native format of the CPU.

    32-bit float data is expressed as a signed value with the normalized range
    -1.0 to +1.0, 0.0 being an audio output level of zero. Byte order for 32-
    bit values is determined by the native format of the CPU.
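
    As an illustration of the sample ranges above, 8-bit and 16-bit samples
    could be converted to normalized float as sketched below. The function
    names are hypothetical, and the scale factors (128 and 32768) are an
    assumption; the text above only fixes the value ranges and the zero
    level.

```c
#include <stdint.h>

/* Convert an unsigned 8-bit sample (128 = zero level) to float.
 * The 1/128 scale is an assumed normalization. */
static float sample_from_u8(uint8_t v)
{
    return (float)((int)v - 128) / 128.0f;
}

/* Convert a signed 16-bit sample (0 = zero level) to float.
 * The 1/32768 scale is an assumed normalization. */
static float sample_from_s16(int16_t v)
{
    return (float)v / 32768.0f;
}
```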

    These formats are interleaved, with UHJ2 having the left and right samples
    in order, UHJ3 having the left, right, and T samples in order, and UHJ4
    having left, right, T, and Q samples in order.
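
    Since the formats are interleaved, the size of one sample frame follows
    from the channel count and the sample type. A minimal sketch, with
    hypothetical helper names and the token values taken from the New Tokens
    section:

```c
#include <stddef.h>

/* Token values from the New Tokens section, defined here only if the
 * headers in use don't already provide them. */
#ifndef AL_FORMAT_UHJ2CHN8_SOFT
#define AL_FORMAT_UHJ2CHN8_SOFT        0x19A2
#define AL_FORMAT_UHJ2CHN16_SOFT       0x19A3
#define AL_FORMAT_UHJ2CHN_FLOAT32_SOFT 0x19A4
#define AL_FORMAT_UHJ3CHN8_SOFT        0x19A5
#define AL_FORMAT_UHJ3CHN16_SOFT       0x19A6
#define AL_FORMAT_UHJ3CHN_FLOAT32_SOFT 0x19A7
#define AL_FORMAT_UHJ4CHN8_SOFT        0x19A8
#define AL_FORMAT_UHJ4CHN16_SOFT       0x19A9
#define AL_FORMAT_UHJ4CHN_FLOAT32_SOFT 0x19AA
#endif

/* Number of interleaved channels, or 0 if not a UHJ format. */
static size_t uhj_channel_count(int format)
{
    switch (format) {
    case AL_FORMAT_UHJ2CHN8_SOFT:
    case AL_FORMAT_UHJ2CHN16_SOFT:
    case AL_FORMAT_UHJ2CHN_FLOAT32_SOFT: return 2; /* left, right */
    case AL_FORMAT_UHJ3CHN8_SOFT:
    case AL_FORMAT_UHJ3CHN16_SOFT:
    case AL_FORMAT_UHJ3CHN_FLOAT32_SOFT: return 3; /* left, right, T */
    case AL_FORMAT_UHJ4CHN8_SOFT:
    case AL_FORMAT_UHJ4CHN16_SOFT:
    case AL_FORMAT_UHJ4CHN_FLOAT32_SOFT: return 4; /* left, right, T, Q */
    default: return 0;
    }
}

/* Bytes per individual sample, or 0 if not a UHJ format. */
static size_t uhj_bytes_per_sample(int format)
{
    switch (format) {
    case AL_FORMAT_UHJ2CHN8_SOFT:
    case AL_FORMAT_UHJ3CHN8_SOFT:
    case AL_FORMAT_UHJ4CHN8_SOFT: return 1;
    case AL_FORMAT_UHJ2CHN16_SOFT:
    case AL_FORMAT_UHJ3CHN16_SOFT:
    case AL_FORMAT_UHJ4CHN16_SOFT: return 2;
    case AL_FORMAT_UHJ2CHN_FLOAT32_SOFT:
    case AL_FORMAT_UHJ3CHN_FLOAT32_SOFT:
    case AL_FORMAT_UHJ4CHN_FLOAT32_SOFT: return 4;
    default: return 0;
    }
}

/* Bytes per interleaved sample frame, or 0 if not a UHJ format. */
static size_t uhj_frame_size(int format)
{
    return uhj_channel_count(format) * uhj_bytes_per_sample(format);
}
```

    An application could use such a helper to check that the size passed to
    alBufferData is a whole number of sample frames.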

    UHJ formats are decoded and played according to the rules of BFORMAT
    buffer formats. When played, such formats may be oriented according to the
    source's AL_ORIENTATION and AL_SOURCE_RELATIVE properties.

    Super Stereo Processing

    When playing Stereo formats, a source may opt to enable Super Stereo
    processing with the AL_STEREO_MODE_SOFT attribute.

    Name                 Signature  Values                Default
    -------------------  ---------  --------------------  --------------
    AL_STEREO_MODE_SOFT  i,iv       AL_NORMAL_SOFT,       AL_NORMAL_SOFT
                                    AL_SUPER_STEREO_SOFT

    Description: When AL_STEREO_MODE_SOFT is set to AL_NORMAL_SOFT, Stereo
    formats are processed and mixed as normal for multi-channel formats. When
    set to AL_SUPER_STEREO_SOFT, Stereo formats are processed with a Super
    Stereo (sometimes called Stereo Enhance) algorithm. In this mode, the
    stereo sound is converted to B-Format using the width factor specified by
    AL_SUPER_STEREO_WIDTH_SOFT, and is treated as a B-Format source which may
    be oriented with this source's AL_ORIENTATION and AL_SOURCE_RELATIVE
    properties.

    This attribute cannot be changed while the source is in an AL_PLAYING or
    AL_PAUSED state, and it has no effect when the source is not playing a
    STEREO format.

    Name                        Signature  Values        Default
    --------------------------  ---------  ------------  -------
    AL_SUPER_STEREO_WIDTH_SOFT  f,fv       [0.0f, 1.0f]  I/D

    Description: The width factor for the resulting soundfield using Super
    Stereo processing. The default value is implementation-defined, with a
    suggested value that provides good quality for a wide range of stereo
    content. An implementation may internally clamp the maximum value
    depending on the limits imposed by the selected algorithm.

    Has no effect when AL_STEREO_MODE_SOFT is not AL_SUPER_STEREO_SOFT or the
    source is not playing a STEREO format.

Errors

    An AL_INVALID_OPERATION error is generated if an attempt is made to set
    AL_STEREO_MODE_SOFT on a source while it's in an AL_PLAYING or AL_PAUSED
    state.
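
    The rule above can be modeled with a small stand-alone sketch. This is
    not OpenAL code: ModelSource and model_set_stereo_mode are hypothetical,
    and the function returns the error code directly instead of recording it
    for alGetError. The state and error values are those from the OpenAL 1.1
    headers, and the mode values are from the New Tokens section. How invalid
    mode values are reported is not specified above, so the sketch does not
    check for them.

```c
/* OpenAL 1.1 values, defined here only if al.h isn't included. */
#ifndef AL_PLAYING
#define AL_NO_ERROR          0
#define AL_INITIAL           0x1011
#define AL_PLAYING           0x1012
#define AL_PAUSED            0x1013
#define AL_STOPPED           0x1014
#define AL_INVALID_OPERATION 0xA004
#endif

/* Values from the New Tokens section. */
#ifndef AL_STEREO_MODE_SOFT
#define AL_STEREO_MODE_SOFT  0x19B0
#define AL_NORMAL_SOFT       0x0000
#define AL_SUPER_STEREO_SOFT 0x0001
#endif

/* Minimal stand-in for a source; hypothetical, for illustration only. */
typedef struct {
    int state;       /* AL_INITIAL, AL_PLAYING, AL_PAUSED, or AL_STOPPED */
    int stereo_mode; /* AL_NORMAL_SOFT or AL_SUPER_STEREO_SOFT */
} ModelSource;

/* Try to set the stereo mode. The change is rejected with
 * AL_INVALID_OPERATION while the source is playing or paused, and the
 * current mode is left untouched in that case. */
static int model_set_stereo_mode(ModelSource *src, int mode)
{
    if (src->state == AL_PLAYING || src->state == AL_PAUSED)
        return AL_INVALID_OPERATION;
    src->stereo_mode = mode;
    return AL_NO_ERROR;
}
```

    For an application, this means the mode should be set before the source
    is played, or after it has been stopped or rewound.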