[openal] [PATCH V2] Add some mixer SSE2/4.1 optimisations
t_arceri at yahoo.com.au
Fri Jun 6 04:01:32 EDT 2014
On Thu, 2014-06-05 at 22:07 -0700, Chris Robinson wrote:
> On 06/05/2014 03:55 PM, Timothy Arceri wrote:
> > After playing around with an AVX version of the optimisation I'm
> > starting to think maybe my logic is wrong. Is using const __m128i
> > increment4 = _mm_set1_epi32(increment*4); to jump the value of frac4
> > forward correct? Or does the mask need to be applied between each
> > iteration, meaning I can't just multiply by 4?
> I can't see any reason why it would be wrong. It works when I do
> DataPosFrac += increment*DstBufferSize;
> DataPosInt += DataPosFrac>>FRACTIONBITS;
> DataPosFrac &= FRACTIONMASK;
> which can be up to 1024x. And it appears to work with the SSE linear
> resamplers which do 4x. So I don't see why 8x wouldn't also work with
> AVX if you're doing 8 samples at a time.
Ok, thanks, that's what I thought. I've been staring at this code for too
long, so I think I'll send in a WIP patch and see if someone else can
spot the issue.
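
For anyone following along, the equivalence discussed above can be checked with a small scalar sketch. It is not code from the patch: the ResamplePos struct and the two helper functions are hypothetical names made up for illustration, and FRACTIONBITS is assumed to be 14 here. The point is only that adding increment*count once and then carrying the overflow into the integer position gives the same result as stepping and masking count times, as long as the 32-bit addition does not wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration; the real constant lives in the mixer headers. */
#define FRACTIONBITS 14
#define FRACTIONONE  (1u << FRACTIONBITS)
#define FRACTIONMASK (FRACTIONONE - 1u)

/* Hypothetical container for a fixed-point sample position. */
typedef struct {
    uint32_t pos;   /* integer sample index */
    uint32_t frac;  /* fractional part, always < FRACTIONONE */
} ResamplePos;

/* Advance one sample at a time, carrying and masking after every step. */
static ResamplePos step_one_by_one(ResamplePos p, uint32_t increment, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++) {
        p.frac += increment;
        p.pos  += p.frac >> FRACTIONBITS;
        p.frac &= FRACTIONMASK;
    }
    return p;
}

/* Advance count samples in a single jump, carrying and masking once. */
static ResamplePos step_jump(ResamplePos p, uint32_t increment, uint32_t count)
{
    /* The jump is only valid while frac + increment*count fits in 32 bits. */
    assert(count == 0 || increment <= (UINT32_MAX - p.frac) / count);

    p.frac += increment * count;
    p.pos  += p.frac >> FRACTIONBITS;
    p.frac &= FRACTIONMASK;
    return p;
}
```

Calling both helpers with the same start position, the same increment and a count of 4, 8 or 1024 should return identical pos and frac values. That suggests the increment*4 (or *8) jump itself is fine, so the problem is more likely somewhere else, for example in how the per-lane starting offsets are set up.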