[openal] [PATCH] Add SSE version of Resample_lerp32
Chris Robinson
chris.kcat at gmail.com
Sun Jun 1 07:56:00 EDT 2014
On 06/01/2014 03:19 AM, Timothy Arceri wrote:
> I'm open to suggestions to improve this further. This is my first time
> using SSE so it's very possible I haven't done this the best way. Also
> one thing I was worried about is using _mm_cvtepi32_ps() to convert
> 'frac' from an integer to a float as it's meant to be used on signed
> integers. Is it likely that this value will ever be so large that this
> will actually matter?
Shouldn't be a problem at all. 'frac' is a normalized value in 18.14
fixed point, and isn't more than FRACTIONONE (16384) since it's the
fractional component of the current sample offset. That is nowhere near
the point where the sign bit of a 32-bit integer comes into play. It can
temporarily be larger when the increment is added to it, but when that
happens the whole-number 'overflow' is added to the sample offset before
getting masked out, which brings it back under FRACTIONONE.
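As a rough illustration of the stepping described above, here is a minimal C sketch. Only FRACTIONONE and its value come from this message; FRACTIONBITS, FRACTIONMASK and the function names advance() and frac_to_float() are assumptions made for the example and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed names; only FRACTIONONE (16384) is stated in the message. */
#define FRACTIONBITS 14
#define FRACTIONONE  (1 << FRACTIONBITS)
#define FRACTIONMASK (FRACTIONONE - 1)

/* Hypothetical helper: step the sample position by one output sample. */
static void advance(uint32_t *pos, uint32_t *frac, uint32_t increment)
{
    assert(*frac < FRACTIONONE);     /* frac starts out as a pure fraction */
    *frac += increment;              /* may temporarily exceed FRACTIONONE */
    *pos  += *frac >> FRACTIONBITS;  /* carry the whole-sample overflow */
    *frac &= FRACTIONMASK;           /* back under FRACTIONONE */
}

/* Hypothetical helper: frac as an interpolation weight in [0, 1).
 * The value always fits in 14 bits, so the signed conversion is exact. */
static float frac_to_float(uint32_t frac)
{
    return (float)(int32_t)frac * (1.0f / FRACTIONONE);
}
```

With frac at 16000 and an increment of 24576 (1.5 in this format), the sum is 40576, so two whole samples are carried into the position and frac ends up at 7808.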
There is, however, a general problem with the code. The _mm_*_epi32
intrinsics are SSE2. I'm actually surprised it compiles without
including emmintrin.h, and GCC doesn't allow including that header
without also adding the -msse2 switch*. That switch puts a hard SSE2
requirement on all code that's compiled with it, even functions that
don't explicitly use SSE2 (GCC will automatically use the available
registers and opcodes as it sees fit), which defeats the purpose of
run-time CPU detection.
So basically, code using SSE2 or SSE4.1 intrinsics has to go into its
own source files, and that will require additional cmake checks and
configuration. It's unfortunate, really, because it could all be kept in
a single source file if GCC would simply allow including the intrinsic
headers regardless, and only error out if it ends up generating function
bodies with those opcodes where it can't use them (you can use
__attribute__((target(...))) to enable specific extensions on a
per-function basis).
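For reference, the single-source arrangement would look something like the sketch below. It assumes a compiler that accepts emmintrin.h without -msse2 (newer GCC and Clang releases do; the GCC discussed in this message does not), and it uses GCC's __builtin_cpu_supports() as a stand-in for the library's own run-time CPU detection. All function names and the HAVE_SSE2_PATH macro are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <emmintrin.h>
#define HAVE_SSE2_PATH 1

/* Only this function may use SSE2 opcodes; the rest of the file is
 * compiled for the baseline target. */
__attribute__((target("sse2")))
static void ints_to_floats_sse2(const int32_t *in, float *out)
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_ps(out, _mm_cvtepi32_ps(v)); /* signed int32 -> float */
}
#endif

/* Plain C fallback for CPUs (or compilers) without the SSE2 path. */
static void ints_to_floats_c(const int32_t *in, float *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = (float)in[i];
}

/* Converts exactly four values, picking the implementation at run time. */
static void ints_to_floats(const int32_t *in, float *out)
{
    assert(in && out);
#ifdef HAVE_SSE2_PATH
    if (__builtin_cpu_supports("sse2")) {
        ints_to_floats_sse2(in, out);
        return;
    }
#endif
    ints_to_floats_c(in, out);
}
```

The point of the attribute is that nothing outside ints_to_floats_sse2() is allowed to contain SSE2 opcodes, so the run-time check still protects older CPUs.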
* -msse2 is implied for x86_64 targets, but not x86.