[openal] [PATCH V2] Add some mixer SSE2/4.1 optimisations

Timothy Arceri t_arceri at yahoo.com.au
Tue Jun 3 20:59:23 EDT 2014

On Tue, 2014-06-03 at 09:45 -0700, Chris Robinson wrote:
> On 06/03/2014 06:28 AM, Timothy Arceri wrote:
> > Yes that does seem to work (at least in my test) and also seems to
> > perform much better. My SSE2 resample code was taking around 4.45% of
> > cpu with this change its down to 2.22%. For reference the C code is at
> > 6.23% and SSE4.1 1.5%.
> Weird that the SSE4.1 linear resampler is performing that much better 
> than the SSE2 version. With the _mm_store_ps/_mm_castsi128_ps trick, the 
> code for the two becomes exactly the same.
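
For anyone following along, the _mm_store_ps/_mm_castsi128_ps trick can be sketched roughly as below. This is only an illustration, not the code from the patch: the helper name extract_lanes_sse2 and the temporary union are made up for the example. The idea is that the cast just reinterprets the integer register as floats, so an aligned float store writes the four 32-bit integer lanes to memory unchanged, using nothing beyond SSE2.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not from the patch): copies the four 32-bit
 * integer lanes of v into out[0..3] using only SSE2 instructions. */
static void extract_lanes_sse2(__m128i v, int32_t out[4])
{
    /* 16-byte aligned scratch space; the union lets us read back as
     * integers what was written through the float store. */
    _Alignas(16) union { float f[4]; int32_t i[4]; } tmp;

    /* _mm_castsi128_ps only reinterprets the bits (no conversion), so
     * the aligned store writes the integer lanes as they are. */
    _mm_store_ps(tmp.f, _mm_castsi128_ps(v));

    for (int k = 0; k < 4; k++)
        out[k] = tmp.i[k];
}
```

Lane 0 is the lowest element of the vector, so a value built with _mm_set_epi32(40, 30, 20, 10) comes back as 10, 20, 30, 40.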

After a second look it seems the latest profiling results I used for
comparison were inconsistent. On some occasions Resample_lerp32 is
called around 88,000 times, on others 200,000+ times. I've rerun
the profiling and have better results for comparison. It seems the
improvements are not quite as impressive as they first appeared (but still

Resample_lerp32_C - 6.43% called 203,970 times
Resample_lerp32_SSE2 - 4.87% called 206,627 times
(with _mm_store_ps/_mm_castsi128_ps trick)
Resample_lerp32_SSE41 - 4.08% called 206,407 times
(with _mm_store_ps/_mm_castsi128_ps trick)
Resample_lerp32_SSE41 - 3.75% called 207,708 times (with

So the improvement is more like 25%-42%, which is closer to the initial
results I put in my commit message. I'm not sure how the profiling runs
with the lower call counts came about.
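
As a rough check of that range, the relative reduction can be worked out from the profile shares listed above. The sketch below assumes the percentages are shares of the same total runtime and that the call counts are close enough to compare directly. The function name reduction_pct is made up for the example.

```c
/* Relative reduction of a profile share against a baseline, in percent.
 * Assumes both shares come from comparable profiling runs. */
static double reduction_pct(double baseline, double optimised)
{
    return (baseline - optimised) / baseline * 100.0;
}
```

With the C figure of 6.43% as the baseline, 4.87% works out to a reduction of a little over 24% and 3.75% to a little under 42%, which matches the 25%-42% estimate after rounding.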
