[openal] [PATCH] Add SSE version of Resample_lerp32
t_arceri at yahoo.com.au
Sun Jun 1 08:40:36 EDT 2014
On Sun, 2014-06-01 at 04:56 -0700, Chris Robinson wrote:
> On 06/01/2014 03:19 AM, Timothy Arceri wrote:
> > I'm open to suggestions to improve this further. This is my first time
> > using SSE so it's very possible I haven't done this the best way. Also
> > one thing I was worried about is using _mm_cvtepi32_ps() to convert
> > 'frac' from an integer to a float as it's meant to be used on signed
> > integers. Is it likely that this value will ever be so large that this
> > will actually matter?
> Shouldn't be a problem at all. 'frac' is a normalized value in 18.14
> fixed point, and isn't more than FRACTIONONE (16384) since it's the
> fractional component of the current sample offset. It can temporarily be
> larger when the increment is added to it, but when that happens the
> whole-number 'overflow' is added to the sample offset before getting
> masked out, which brings it back under FRACTIONONE.
> There is, however, a general problem with the code. The _mm_*_epi32
> intrinsics are for SSE2. I'm actually surprised it compiles without
> including emmintrin.h, which GCC doesn't allow without also adding the
> -msse2 switch*, and that puts a hard SSE2 requirement on code that's
> compiled with the switch, even for functions that don't explicitly use
> it (GCC will automatically use the available registers and opcodes
> provided as it sees fit). Which defeats the purpose of run-time CPU
> detection.
> So basically, using SSE2 or SSE4.1 intrinsics has to go into their own
> source files, and that will require additional cmake checks and
> configuration. It's unfortunate, really, because it could be kept in a
> single source... if GCC would simply allow including the intrinsic
> headers regardless, and only error if it ends up generating function
> bodies with those opcodes where it can't use them (you can use
> __attribute__((target(...))) to enable specific extensions on a
> per-function basis).
> * -msse2 is implied for x86_64 targets, but not x86.
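To make the point about the range of 'frac' concrete, here is a minimal sketch, not taken from the patch. The helper name frac_to_float4 is hypothetical, and the FRACTION* macros are redefined locally on the assumption that they match the 14 fractional bits described above. Because every masked 'frac' value is at most FRACTIONONE, the signed conversion done by _mm_cvtepi32_ps() gives the same result an unsigned one would. The SSE2 path is only compiled when the compiler defines __SSE2__ (a build-time check, not run-time CPU detection); otherwise a scalar fallback is used.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 integer intrinsics, e.g. _mm_cvtepi32_ps */
#endif

/* Assumed to mirror the 18.14 fixed-point layout discussed above. */
#define FRACTIONBITS 14
#define FRACTIONONE  (1u << FRACTIONBITS) /* 16384 */
#define FRACTIONMASK (FRACTIONONE - 1u)

/* Hypothetical helper: convert four 'frac' values to floats in [0,1]. */
static void frac_to_float4(const uint32_t frac[4], float out[4])
{
    int i;

    /* Values this small are far below INT32_MAX, so treating them as
     * signed integers cannot change the result. */
    for(i = 0;i < 4;i++)
        assert(frac[i] <= FRACTIONONE);

#if defined(__SSE2__)
    {
        __m128i f = _mm_loadu_si128((const __m128i*)frac);
        __m128 r = _mm_mul_ps(_mm_cvtepi32_ps(f),
                              _mm_set1_ps(1.0f/FRACTIONONE));
        _mm_storeu_ps(out, r);
    }
#else
    /* Scalar fallback for builds without SSE2 enabled. */
    for(i = 0;i < 4;i++)
        out[i] = (float)(int32_t)frac[i] * (1.0f/FRACTIONONE);
#endif
}
```

Calling it with frac values such as 4096 and 8192 should yield 0.25 and 0.5 on either path, since both paths perform the same signed conversion and scale by 1/FRACTIONONE.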
Thanks for the quick feedback.
For some reason I was thinking you were already targeting SSE2, not just
SSE; I guess it was because gcc wasn't failing. For the record, yes, I am
building for x86_64. I guess I will take a look at making some cmake
Also, I've just noticed that the following code in mixer.c is heavy on CPU
(about 2/3 of the time in my test case), so it might be a good place for
some code sharing with my current patch.
    /* Update positions */
    for(j = 0;j < DstBufferSize;j++)
    {
        DataPosFrac += increment;
        DataPosInt  += DataPosFrac>>FRACTIONBITS;
        DataPosFrac &= FRACTIONMASK;
    }
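As a rough sketch of what that loop computes (not code from mixer.c; the function names and the local FRACTION* definitions are assumptions for illustration), the per-sample update can be compared against a single closed-form step. The two should agree as long as the fractional part starts below FRACTIONONE and adding the increment to it does not overflow 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the 14 fractional bits used by the mixer. */
#define FRACTIONBITS 14
#define FRACTIONONE  (1u << FRACTIONBITS)
#define FRACTIONMASK (FRACTIONONE - 1u)

/* Reference: the same arithmetic as the loop above, one step per sample. */
static void advance_loop(uint32_t *pos_int, uint32_t *pos_frac,
                         uint32_t increment, uint32_t count)
{
    uint32_t j;
    for(j = 0;j < count;j++)
    {
        *pos_frac += increment;
        *pos_int  += *pos_frac >> FRACTIONBITS;
        *pos_frac &= FRACTIONMASK;
    }
}

/* Hypothetical alternative: accumulate all steps at once in 64 bits, then
 * split the total into its whole and fractional parts. */
static void advance_closed(uint32_t *pos_int, uint32_t *pos_frac,
                           uint32_t increment, uint32_t count)
{
    uint64_t total = (uint64_t)*pos_frac + (uint64_t)increment * count;
    *pos_int  += (uint32_t)(total >> FRACTIONBITS);
    *pos_frac  = (uint32_t)(total & FRACTIONMASK);
}
```

For example, starting from position 0 with an increment of FRACTIONONE/2 and three samples, both versions end at integer position 1 with a fraction of 8192. Whether this kind of rewrite fits the surrounding mixer code is a separate question; the sketch only shows that the arithmetic is equivalent under the stated assumptions.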