[openal] [PATCH V2] Add some mixer SSE2/4.1 optimisations
Timothy Arceri
t_arceri at yahoo.com.au
Tue Jun 3 20:59:23 EDT 2014
On Tue, 2014-06-03 at 09:45 -0700, Chris Robinson wrote:
> On 06/03/2014 06:28 AM, Timothy Arceri wrote:
> > Yes, that does seem to work (at least in my test) and also seems to
> > perform much better. My SSE2 resample code was taking around 4.45% of
> > CPU; with this change it's down to 2.22%. For reference, the C code is at
> > 6.23% and SSE4.1 at 1.5%.
>
> Weird that the SSE4.1 linear resampler is performing that much better
> than the SSE2 version. With the _mm_store_ps/_mm_castsi128_ps trick, the
> code for the two becomes exactly the same.
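For context, the "trick" concerns how the four integer source positions get out of an __m128i so they can index the sample buffer. Below is a minimal sketch, assuming GCC or Clang on x86 (it uses the target attribute). The names extract_lanes_sse2, extract_lanes_sse41, lerp4_sse2, lanes4 and the FRAC_BITS layout are hypothetical, chosen for illustration; this is not OpenAL Soft's actual Resample_lerp32 code.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <smmintrin.h>  /* SSE4.1: _mm_extract_epi32 */
#include <stdint.h>

/* Hypothetical fixed-point layout for the resampler position (illustration only). */
#define FRAC_BITS 14
#define FRAC_ONE  (1 << FRAC_BITS)

/* Lets the same 16 bytes be written as floats and read back as ints. */
typedef union {
    float   f[4];
    int32_t i[4];
} lanes4;

/* SSE2: _mm_castsi128_ps only relabels the register type (no conversion),
 * so _mm_store_ps writes the integer bit patterns unchanged. */
static void extract_lanes_sse2(__m128i v, int32_t out[4])
{
    _Alignas(16) lanes4 u;  /* _mm_store_ps needs a 16-byte aligned address */
    _mm_store_ps(u.f, _mm_castsi128_ps(v));
    for (int k = 0; k < 4; k++)
        out[k] = u.i[k];
}

/* SSE4.1: read each lane directly; the lane index must be a constant. */
__attribute__((target("sse4.1")))
static void extract_lanes_sse41(__m128i v, int32_t out[4])
{
    out[0] = _mm_extract_epi32(v, 0);
    out[1] = _mm_extract_epi32(v, 1);
    out[2] = _mm_extract_epi32(v, 2);
    out[3] = _mm_extract_epi32(v, 3);
}

/* One 4-sample step of linear interpolation:
 *   out = src[pos] + (frac / FRAC_ONE) * (src[pos + 1] - src[pos])
 * The integer positions have to leave the vector register to index src. */
static __m128 lerp4_sse2(const float *src, __m128i pos4, __m128i frac4)
{
    int32_t p[4];
    extract_lanes_sse2(pos4, p);
    const __m128 v1 = _mm_setr_ps(src[p[0]], src[p[1]], src[p[2]], src[p[3]]);
    const __m128 v2 = _mm_setr_ps(src[p[0] + 1], src[p[1] + 1],
                                  src[p[2] + 1], src[p[3] + 1]);
    const __m128 mu = _mm_mul_ps(_mm_cvtepi32_ps(frac4),
                                 _mm_set1_ps(1.0f / FRAC_ONE));
    return _mm_add_ps(v1, _mm_mul_ps(mu, _mm_sub_ps(v2, v1)));
}
```

Both extraction helpers yield the same four integers. The SSE2 path round-trips the lanes through a small aligned buffer, while the SSE4.1 path reads each lane with _mm_extract_epi32 (pextrd). Calling extract_lanes_sse41 should be guarded at runtime, for example with __builtin_cpu_supports("sse4.1").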
After a second look, it seems the latest profiling results I used for
comparisons were inconsistent: on some runs Resample_lerp32 is called
around 88,000 times, on others around 200,000+ times. I've rerun the
profiling and have better results for comparison. The improvements are
not quite as impressive as they first appeared (but still noteworthy).
Resample_lerp32_C     - 6.43%, called 203,970 times
Resample_lerp32_SSE2  - 4.87%, called 206,627 times (with the _mm_store_ps/_mm_castsi128_ps trick)
Resample_lerp32_SSE41 - 4.08%, called 206,407 times (with the _mm_store_ps/_mm_castsi128_ps trick)
Resample_lerp32_SSE41 - 3.75%, called 207,708 times (with _mm_extract_epi32)
So the improvement is more like 25%-42%, which is closer to the initial
results I put in my commit message. I'm not sure how the profiling runs
with the lower call counts came about.
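The percentage range is the relative drop in CPU share against the C version, (C - SIMD) / C. The small sketch below recomputes it from the figures above; the helper names are hypothetical. It gives about 24.3% for SSE2, 36.5% for SSE4.1 with the store/cast trick, and 41.7% for SSE4.1 with _mm_extract_epi32. The comparison is only meaningful because the four runs now make a similar number of calls (roughly 204,000 to 208,000).

```c
#include <stddef.h>
#include <stdio.h>

/* One line of the profile: share of total CPU time and number of calls. */
struct profile_row {
    const char *name;
    double cpu_pct;
    long calls;
};

/* Figures from the profiling run quoted above. */
static const struct profile_row rows[] = {
    { "Resample_lerp32_C",                          6.43, 203970 },
    { "Resample_lerp32_SSE2 (store/cast trick)",    4.87, 206627 },
    { "Resample_lerp32_SSE41 (store/cast trick)",   4.08, 206407 },
    { "Resample_lerp32_SSE41 (_mm_extract_epi32)",  3.75, 207708 },
};

/* Relative reduction in CPU share versus a baseline, in percent. */
static double relative_reduction_pct(double baseline_pct, double optimised_pct)
{
    return (baseline_pct - optimised_pct) / baseline_pct * 100.0;
}

/* Print each SIMD variant against the C baseline (rows[0]). */
static void print_reductions(void)
{
    const double base = rows[0].cpu_pct;
    for (size_t k = 1; k < sizeof rows / sizeof rows[0]; k++)
        printf("%-42s %5.1f%% less CPU than C (%ld calls)\n",
               rows[k].name, relative_reduction_pct(base, rows[k].cpu_pct),
               rows[k].calls);
}
```

If the call counts differed substantially between runs, as in the earlier 88,000-call profiles, the raw percentages would not be directly comparable.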