[openal] [PATCH V2] Add some mixer SSE2/4.1 optimisations
Timothy Arceri
t_arceri at yahoo.com.au
Tue Jun 3 17:23:55 EDT 2014
On Tue, 2014-06-03 at 09:45 -0700, Chris Robinson wrote:
> On 06/03/2014 06:28 AM, Timothy Arceri wrote:
> > Yes that does seem to work (at least in my test) and also seems to
> > perform much better. My SSE2 resample code was taking around 4.45% of
> > CPU; with this change it's down to 2.22%. For reference, the C code is at
> > 6.23% and SSE4.1 at 1.5%.
>
> Weird that the SSE4.1 linear resampler is performing that much better
> than the SSE2 version. With the _mm_store_ps/_mm_castsi128_ps trick, the
> code for the two becomes exactly the same.
I only applied the _mm_store_ps/_mm_castsi128_ps trick to the SSE2 code.
Using four calls to _mm_extract_epi32 still seems to be faster for
SSE4.1.
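
For illustration, the two ways of pulling four 32-bit values out of an
__m128i discussed here might look roughly like the following. This is only
a minimal sketch, not the code from the patch: the function names
extract4_store and extract4_epi32 are made up for the example, and the
SSE4.1 variant is only compiled when the compiler defines __SSE4_1__
(e.g. when building with -msse4.1).

```c
#include <emmintrin.h>   /* SSE2: __m128i, _mm_castsi128_ps, _mm_store_ps */
#ifdef __SSE4_1__
#include <smmintrin.h>   /* SSE4.1: _mm_extract_epi32 */
#endif

/* SSE2 variant (hypothetical name): reinterpret the integer vector as
 * floats and write all four lanes to aligned memory with one store, then
 * read them back as ints through a union. */
static void extract4_store(__m128i v, int out[4])
{
    union { _Alignas(16) int i[4]; float f[4]; } tmp;

    /* _mm_castsi128_ps only changes the type, the bits are untouched. */
    _mm_store_ps(tmp.f, _mm_castsi128_ps(v));

    out[0] = tmp.i[0];
    out[1] = tmp.i[1];
    out[2] = tmp.i[2];
    out[3] = tmp.i[3];
}

#ifdef __SSE4_1__
/* SSE4.1 variant (hypothetical name): four separate lane extractions.
 * The lane index must be a compile-time constant. */
static void extract4_epi32(__m128i v, int out[4])
{
    out[0] = _mm_extract_epi32(v, 0);
    out[1] = _mm_extract_epi32(v, 1);
    out[2] = _mm_extract_epi32(v, 2);
    out[3] = _mm_extract_epi32(v, 3);
}
#endif
```

Both functions produce the same four values; which one is faster will
depend on the compiler and CPU, so the numbers above should be re-measured
rather than assumed.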