This is the companion thread to:
http://forum.nasm.us/index.php?topic=1605.0. The functionality is the same, and the timings are similar because both runs were done on the same machine.
Supported by Processor and installed Operating System:
------------------------------------------------------
MMX, CMOV and FCOMI, SSE, SSE2, SSE3, SSSE3, SSE4.1,
POPCNT, SSE4.2, AVX, PCLMUL and AES
Calculating the sum of a float array with different methods.
That'll take a little while. Please be patient ...
Simple C implementation:
------------------------
sum1 = 8390656.00
Elapsed Time = 15.74 Seconds
FPU code with 4 accumulators:
-----------------------------
sum2 = 8390656.00
Elapsed Time = 7.02 Seconds
Performance Boost = 224%
C implementation with 4 accumulators:
-------------------------------------
sum3 = 8390656.00
Elapsed Time = 5.34 Seconds
Performance Boost = 295%
SSE2 code with 4 accumulators:
------------------------------
sum4 = 8390656.00
Elapsed Time = 1.34 Seconds
Performance Boost = 1175%
AVX code with 4 accumulators:
-----------------------------
sum5 = 8390656.00
Elapsed Time = 0.69 Seconds
Performance Boost = 2281%
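
For readers who want to see what "4 accumulators" means in the output above, here is a minimal sketch in plain C. It is not the code that produced these timings. The function names sum_simple, sum_4acc and boost_percent are made up for illustration. The idea is that four independent partial sums break the single dependency chain of the simple loop, so the CPU can overlap the additions. The boost_percent helper is only my reading of the "Performance Boost" lines: the printed values match baseline time divided by new time, times 100 (for example 15.74 / 7.02 gives about 224).

```c
#include <stddef.h>

/* Simple version: one accumulator, so every add waits for the previous one. */
float sum_simple(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators: four independent dependency chains that can overlap. */
float sum_4acc(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    /* Main loop handles four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Combine the partial sums, then add the 0..3 leftover elements. */
    float s = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)
        s += a[i];
    return s;
}

/* Assumed formula behind "Performance Boost": baseline / elapsed * 100. */
double boost_percent(double baseline_seconds, double elapsed_seconds)
{
    return baseline_seconds / elapsed_seconds * 100.0;
}
```

Keep in mind that float addition is not associative, so the two functions can differ in the last bits for general input. In the run above all five sums agree. The SSE2 and AVX variants follow the same pattern, except that each accumulator is a vector register holding 4 or 8 floats.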
Gerhard