[openal] [PATCH] Add AVX linear resampler

Sun Jun 8 01:01:42 EDT 2014

On Sat, 2014-06-07 at 21:36 -0700, Chris Robinson wrote:
> On 06/07/2014 02:59 AM, Timothy Arceri wrote:
> > +    float const fracOne = 1.0f/FRACTIONONE;
> > +    ...
> > +    const __m256 fracOne8 = _mm256_broadcast_ss(&fracOne);
> 
> I'm a bit confused by this. According to Intel's docs for 
> _mm256_broadcast_ss and _mm_broadcast_ss:
> 
> <https://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_avx_broadcast_ss.htm>
> 
> the given parameter is a "pointer to a memory location that can hold 
> constant 256-bit or 128-bit float32 values". In that case, isn't a raw 
> float too small? It may also not be aligned correctly. Though if it's 
> only taking a single float value, I don't see why it would have to be 
> 256/128 bits large. Maybe it's an error in their docs.
> 

I think it must be an error in the docs. This example from the Intel
website just passes the address of a plain float:

https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions

> Perhaps it would be better to do:
> 
> const __m256 fracOne8 = _mm256_set1_ps(1.0f/FRACTIONONE);
> 
> GCC should actually turn that call into a compile-time vector constant 
> and simply move it into a register as needed (it does for _mm_set1_ps 
> and _mm_set1_epi32 when given a compile-time constant).

Yeah, that's fine. I only used _mm256_broadcast_ss because I read that
it's faster, but it was never going to matter much anyway as it's not
inside the critical loop.
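
For anyone who wants to check this themselves, here is a minimal sketch
comparing the two ways of filling the vector discussed above. It is not
code from the patch. FRACTIONBITS is set to 14 purely for illustration
(the real value lives in the OpenAL Soft headers), and the function names
and the AVX_TARGET macro are hypothetical. It also assumes GCC or Clang on
x86, since it relies on the target attribute and __builtin_cpu_supports.

```c
#include <immintrin.h>
#include <string.h>

/* Assumed value, for illustration only; see the OpenAL Soft headers. */
#define FRACTIONBITS 14
#define FRACTIONONE  (1 << FRACTIONBITS)

/* Hypothetical helper macro: lets this file build without -mavx
 * on GCC/Clang by enabling AVX for one function only. */
#define AVX_TARGET __attribute__((target("avx")))

/* Fill one vector from a pointer to a single plain float and another
 * from a compile-time constant, then compare all eight lanes. */
AVX_TARGET
static int lanes_equal(const float *scalar)
{
    float a[8], b[8];

    /* Loads one 32-bit float from memory and copies it to every lane. */
    const __m256 from_broadcast = _mm256_broadcast_ss(scalar);

    /* Same result from a constant; the compiler is free to fold this. */
    const __m256 from_set1 = _mm256_set1_ps(1.0f / FRACTIONONE);

    /* Unaligned stores, so the arrays need no special alignment. */
    _mm256_storeu_ps(a, from_broadcast);
    _mm256_storeu_ps(b, from_set1);

    return memcmp(a, b, sizeof(a)) == 0;
}

/* Returns 1 if both vectors match, 0 if they differ,
 * and -1 if the CPU running this does not support AVX. */
int broadcast_matches_set1(void)
{
    const float fracOne = 1.0f / FRACTIONONE; /* ordinary 4-byte float */

    if (!__builtin_cpu_supports("avx"))
        return -1;

    return lanes_equal(&fracOne);
}
```

On an AVX-capable machine broadcast_matches_set1() should return 1, which
fits with the broadcast only reading a single float through the pointer.
Since both forms give the same vector and this sits outside the loop,
_mm256_set1_ps is the simpler choice here.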