[openal] [PATCH] Add AVX linear resampler
Timothy Arceri
t_arceri at yahoo.com.au
Sun Jun 8 01:01:42 EDT 2014
On Sat, 2014-06-07 at 21:36 -0700, Chris Robinson wrote:
> On 06/07/2014 02:59 AM, Timothy Arceri wrote:
> > + float const fracOne = 1.0f/FRACTIONONE;
> > + ...
> > + const __m256 fracOne8 = _mm256_broadcast_ss(&fracOne);
>
> I'm a bit confused by this. According to Intel's docs for
> _mm256_broadcast_ss and _mm_broadcast_ss:
>
> <https://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_avx_broadcast_ss.htm>
>
> the given parameter is a "pointer to a memory location that can hold
> constant 256-bit or 128-bit float32 values". In that case, isn't a raw
> float too small? It may also not be aligned correctly. Though if it's
> only taking a single float value, I don't see why it would have to be
> 256/128 bits large. Maybe it's an error in their docs.
>
I think it must be an error. This example from the Intel website just
uses a plain float:
https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions

> Perhaps it would be better to do:
>
> const __m256 fracOne8 = _mm256_set1_ps(1.0f/FRACTIONONE);
>
> GCC should actually turn that call into a compile-time vector constant
> and simply move it into a register as needed (it does for _mm_set1_ps
> and _mm_set1_epi32 when given a compile-time constant).
Yeah, that's fine. I just used _mm256_broadcast_ss because I had read
that it's faster, but it was never going to matter much anyway, as it's
not inside the critical loop.
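
For reference, here is a minimal sketch of the two forms side by side. It is not the code from the patch. The helper names fill_broadcast and fill_set1 are made up for illustration, FRACTIONONE is given a placeholder value (the real one comes from the OpenAL Soft headers), and the target("avx") attribute assumes GCC or Clang so that no extra compiler flag is needed. _mm256_broadcast_ss takes a `float const *` and reads a single 32-bit float, so a plain unaligned float is enough.

```c
#include <immintrin.h>

/* Placeholder value for illustration only; the real FRACTIONONE is
 * defined in the OpenAL Soft sources. */
#define FRACTIONONE (1<<14)

/* Hypothetical helper: broadcast one float from memory into all eight
 * lanes, then store the lanes to out[0..7]. */
__attribute__((target("avx")))
void fill_broadcast(float out[8])
{
    const float fracOne = 1.0f/FRACTIONONE;
    const __m256 fracOne8 = _mm256_broadcast_ss(&fracOne);
    _mm256_storeu_ps(out, fracOne8); /* unaligned store, no alignment needed */
}

/* Hypothetical helper: same result via _mm256_set1_ps, which lets the
 * compiler treat the value as a compile-time vector constant. */
__attribute__((target("avx")))
void fill_set1(float out[8])
{
    const __m256 fracOne8 = _mm256_set1_ps(1.0f/FRACTIONONE);
    _mm256_storeu_ps(out, fracOne8);
}
```

Both helpers should leave the same eight values in out, so switching the patch to _mm256_set1_ps should not change the resampler's output.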