[openal] [PATCH] Add SSE version of Resample_lerp32

Sun Jun 1 07:56:00 EDT 2014

On 06/01/2014 03:19 AM, Timothy Arceri wrote:
> I'm open to suggestions to improve this further. This is my first time
> using SSE so it's very possible I haven't done this the best way. Also
> one thing I was worried about is using _mm_cvtepi32_ps() to convert
> 'frac' from an integer to a float as it's meant to be used on signed
> integers. Is it likely that this value will ever be so large that this
> will actually matter?

Shouldn't be a problem at all. 'frac' is a normalized value in 18.14
fixed point, and isn't more than FRACTIONONE (16384) since it's the
fractional component of the current sample offset. It can temporarily be
larger when the increment is added to it, but when that happens the
whole-number 'overflow' is added to the sample offset before getting
masked out, which brings it back under FRACTIONONE. Values in that range
are nowhere near 2^31, so the signed conversion gives the same result an
unsigned one would.
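
To illustrate the carry-and-mask step described above, here is a minimal
sketch in plain C. Only FRACTIONONE (16384) comes from this message;
FRACTIONBITS, FRACTIONMASK and the function name are assumptions made up
for the example and may not match what the source actually uses.

```c
#include <stdint.h>

/* FRACTIONONE (16384) is from the message; the other two names are
 * assumed for illustration and simply derived from it. */
#define FRACTIONBITS 14
#define FRACTIONONE  (1u << FRACTIONBITS)   /* 16384 */
#define FRACTIONMASK (FRACTIONONE - 1u)

/* Hypothetical helper: advance the sample position by one fixed-point
 * increment, keeping the fractional part below FRACTIONONE. */
static void advance_position(uint32_t *pos, uint32_t *frac, uint32_t increment)
{
    *frac += increment;              /* may temporarily reach or exceed FRACTIONONE */
    *pos  += *frac >> FRACTIONBITS;  /* whole-number overflow goes to the offset */
    *frac &= FRACTIONMASK;           /* fraction is back under FRACTIONONE */
}
```

After the mask the fraction always fits in 14 bits, so treating it as a
signed 32-bit integer for the float conversion loses nothing.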

There is, however, a general problem with the code. The _mm_*_epi32
intrinsics are SSE2 intrinsics. I'm actually surprised it compiles
without including emmintrin.h. GCC doesn't allow including that header
without also adding the -msse2 switch*, and the switch puts a hard SSE2
requirement on all code compiled with it, even functions that don't
explicitly use SSE2 (GCC will automatically use the available registers
and opcodes as it sees fit). That defeats the purpose of run-time CPU
detection.

So basically, code using SSE2 or SSE4.1 intrinsics has to go into its
own source files, and that will require additional cmake checks and
configuration. It's unfortunate, really, because it could be kept in a
single source file if GCC would simply allow including the intrinsic
headers regardless, and only error out if it ends up generating function
bodies with those opcodes where it can't use them (you can use
__attribute__((target(...))) to enable specific extensions on a
per-function basis).
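
For reference, the per-function approach mentioned above might look
roughly like the sketch below. It assumes a toolchain that accepts
including emmintrin.h here (for example an x86_64 target, where SSE2 is
implied), so it does not get around the header restriction described
above. The function names are hypothetical, the 1/16384 scale assumes
the 14-bit fraction, and __builtin_cpu_supports() is a GCC builtin
standing in for whatever run-time CPU detection the library really uses.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (pulls in the SSE ones too) */

/* Hypothetical SSE2 path: convert four non-negative fixed-point
 * fractions to floats in [0,1). Only this function is compiled with
 * SSE2 enabled, via the target attribute. */
__attribute__((target("sse2")))
static void frac_to_float_sse2(const int *frac, float *out)
{
    __m128i fi = _mm_loadu_si128((const __m128i *)frac);
    /* _mm_cvtepi32_ps treats the lanes as signed, which is fine for
     * values this small. */
    __m128 ff = _mm_mul_ps(_mm_cvtepi32_ps(fi), _mm_set1_ps(1.0f / 16384.0f));
    _mm_storeu_ps(out, ff);
}

/* Plain C fallback with the same result. */
static void frac_to_float_c(const int *frac, float *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = (float)frac[i] * (1.0f / 16384.0f);
}

/* Pick an implementation at run time. */
void frac_to_float(const int *frac, float *out)
{
    if (__builtin_cpu_supports("sse2"))
        frac_to_float_sse2(frac, out);
    else
        frac_to_float_c(frac, out);
}
```

The rest of the file is still built for the baseline target, so the
compiler has no license to use SSE2 opcodes outside the marked function.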

* -msse2 is implied for x86_64 targets, but not x86.