[openal] [PATCH] Add SSE version of Resample_lerp32

Timothy Arceri t_arceri at yahoo.com.au
Sun Jun 1 08:40:36 EDT 2014


On Sun, 2014-06-01 at 04:56 -0700, Chris Robinson wrote:
> On 06/01/2014 03:19 AM, Timothy Arceri wrote:
> > I'm open to suggestions to improve this further. This is my first time
> > using SSE so it's very possible I haven't done this the best way. Also
> > one thing I was worried about is using _mm_cvtepi32_ps() to convert
> > 'frac' from an integer to a float as it's meant to be used on signed
> > integers. Is it likely that this value will ever be so large that this
> > will actually matter?
> 
> Shouldn't be a problem at all. 'frac' is a normalized value in 18.14 
> fixed point, and isn't more than FRACTIONONE (16384) since it's the 
> fractional component of the current sample offset. It can temporarily be 
> larger when the increment is added to it, but when that happens the 
> whole-number 'overflow' is added to the sample offset before getting 
> masked out, which brings it back under FRACTIONONE.
> 
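
To make the range argument above concrete, here is a minimal scalar
sketch. It assumes FRACTIONBITS is 14 (which matches the FRACTIONONE value
of 16384 quoted above); frac_whole(), frac_part() and lerp_frac() are
hypothetical helper names used only for illustration, not functions from
the OpenAL Soft source.

```c
#include <stdint.h>

#define FRACTIONBITS 14                 /* assumed: 18.14 fixed point */
#define FRACTIONONE  (1<<FRACTIONBITS)  /* 16384 */
#define FRACTIONMASK (FRACTIONONE-1)

/* Whole-sample part of a fixed-point offset. */
static uint32_t frac_whole(uint32_t offset)
{ return offset >> FRACTIONBITS; }

/* Fractional part; after masking it is always in [0, FRACTIONMASK]. */
static uint32_t frac_part(uint32_t offset)
{ return offset & FRACTIONMASK; }

/* Linear interpolation between two samples. Because frac needs only
 * 14 bits, the cast to a signed 32-bit integer cannot change its value,
 * so a signed int-to-float conversion gives the same result. */
static float lerp_frac(float val1, float val2, uint32_t frac)
{
    float mu = (float)(int32_t)frac * (1.0f/FRACTIONONE);
    return val1 + (val2 - val1)*mu;
}
```

With a value that small, the sign bit is never set, which is why the
signed conversion should be harmless here.
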
> There is, however, a general problem with the code. The _mm_*_epi32 
> intrinsics are for SSE2. I'm actually surprised it compiles without 
> including emmintrin.h, which GCC doesn't allow without also adding the 
> -msse2 switch*, and that puts a hard SSE2 requirement on code that's 
> compiled with the switch, even for functions that don't explicitly use 
> it (GCC will automatically use the available registers and opcodes 
> provided as it sees fit). Which defeats the purpose of run-time CPU 
> detection.
> 
> So basically, using SSE2 or SSE4.1 intrinsics has to go into their own 
> source files, and that will require additional cmake checks and 
> configuration. It's unfortunate, really, because it could be kept in a 
> single source... if GCC would simply allow including the intrinsic 
> headers regardless, and only error if it ends up generating function 
> bodies with those opcodes where it can't use them (you can use 
> __attribute__((target(...))) to enable specific extensions on a 
> per-function basis).
> 
> 
> * -msse2 is implied for x86_64 targets, but not x86.
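
For reference, a rough sketch of the per-function approach mentioned
above. It assumes an x86 target and a compiler that accepts emmintrin.h
without -msse2 when the function carries the target attribute (GCC 4.9 or
later, as far as I can tell). frac4_to_float() is a hypothetical name for
illustration and is not part of my patch.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

#define FRACTIONBITS 14                 /* assumed: 18.14 fixed point */
#define FRACTIONONE  (1<<FRACTIONBITS)

/* Convert four 'frac' values to floats in [0, 1). Only this function is
 * compiled with SSE2 enabled; the rest of the file keeps the baseline
 * instruction set, so run-time CPU detection still makes sense. */
__attribute__((target("sse2")))
static void frac4_to_float(const int frac[4], float mu[4])
{
    const __m128 scale = _mm_set1_ps(1.0f/FRACTIONONE);
    __m128i frac4 = _mm_loadu_si128((const __m128i*)frac);
    /* _mm_cvtepi32_ps treats its input as signed, which is fine for
     * values that never exceed FRACTIONONE. */
    _mm_storeu_ps(mu, _mm_mul_ps(_mm_cvtepi32_ps(frac4), scale));
}
```

The caller would still need to check for SSE2 support at run time before
calling a function like this on 32-bit x86.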

Thanks for the quick feedback.

For some reason I thought you were already targeting SSE2, not just
SSE. I guess that was because gcc wasn't failing. For the record, yes, I
am building for x86_64. I guess I will take a look at making some cmake
changes.

Also, I've just noticed that the following code in mixer.c is heavy on
CPU (about 2/3 of the time in my test case), so it might be a good place
for some code sharing with my current patch.

        /* Update positions */
        for(j = 0;j < DstBufferSize;j++)
        {
            DataPosFrac += increment;
            DataPosInt  += DataPosFrac>>FRACTIONBITS;
            DataPosFrac &= FRACTIONMASK;
        }
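
One possible way to avoid the loop entirely, sketched under a few
assumptions: FRACTIONBITS is 14, the positions are unsigned 32-bit
values, and increment is small enough that frac plus increment never
overflows 32 bits. advance_loop() and advance_once() are hypothetical
names for illustration. The sketch relies on the fact that the total
advance is just increment times the sample count, so it can be computed
once in 64 bits.

```c
#include <stdint.h>

#define FRACTIONBITS 14                 /* assumed: 18.14 fixed point */
#define FRACTIONONE  (1<<FRACTIONBITS)
#define FRACTIONMASK (FRACTIONONE-1)

/* Reference: the per-sample update loop quoted above. */
static void advance_loop(uint32_t *pos_int, uint32_t *pos_frac,
                         uint32_t increment, uint32_t count)
{
    uint32_t j;
    for(j = 0;j < count;j++)
    {
        *pos_frac += increment;
        *pos_int  += *pos_frac>>FRACTIONBITS;
        *pos_frac &= FRACTIONMASK;
    }
}

/* Closed form: add the whole advance at once. The 64-bit intermediate
 * keeps increment*count from overflowing. */
static void advance_once(uint32_t *pos_int, uint32_t *pos_frac,
                         uint32_t increment, uint32_t count)
{
    uint64_t total = (uint64_t)*pos_frac + (uint64_t)increment*count;
    *pos_int  += (uint32_t)(total>>FRACTIONBITS);
    *pos_frac  = (uint32_t)(total&FRACTIONMASK);
}
```

For example, starting from 0 with an increment of 8192 (half a sample)
and 3 output samples, both versions end at an integer position of 1 and
a fraction of 8192. I have not measured whether this helps in the real
mixer.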





