I've attached an archive to this message: float.zip. Please read the readme.txt file first (it's included in the archive). The applications should run under 64-bit Windows 7 with SP1 (native or in a VM). I couldn't test them under Windows 8, but they should work there too.
The program checks at run time which instruction sets the underlying machine supports. If your CPU doesn't support AVX, the application won't crash; in that case only the last procedure is skipped and the program terminates correctly.
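For readers who want to try such a run-time check themselves, here is a minimal sketch in C. It is not taken from the archive; it assumes GCC (or a compatible compiler) with the <cpuid.h> header, and the function name avx_usable is made up for illustration. AVX is only usable if the CPU reports it and the operating system saves the YMM registers, so both conditions are tested. That matches the "Supported by Processor and installed Operating System" heading in the output.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>              /* GCC helper: __get_cpuid() */
#define HAVE_X86_CPUID 1
#endif

/* Hypothetical helper (not from float.zip):
   returns 1 if AVX code may be executed, 0 otherwise. */
int avx_usable(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    unsigned int xcr0_lo, xcr0_hi;

    /* CPUID leaf 1: feature flags are returned in ECX and EDX. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    /* ECX bit 27 = OSXSAVE (XGETBV is enabled), ECX bit 28 = AVX. */
    if (!(ecx & (1u << 27)) || !(ecx & (1u << 28)))
        return 0;

    /* Read XCR0: bits 1 and 2 must both be set, meaning the OS
       saves and restores the XMM and YMM register state. */
    __asm__ volatile ("xgetbv"
                      : "=a"(xcr0_lo), "=d"(xcr0_hi)
                      : "c"(0));
    return (xcr0_lo & 0x6u) == 0x6u;
#else
    return 0;                   /* not an x86 build: no AVX */
#endif
}
```

A caller would simply skip the AVX procedure when avx_usable() returns 0, which gives the behaviour described above: no crash, just one test less.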
The program floatsum.exe sums up an array of float (REAL4) numbers in C and in assembly language (with SSE2 instructions and the new AVX instructions). The differences are tremendous: in the run below, the AVX version is about 23 times faster than the simple C implementation. Here is the application's output on my machine, an Intel Core i7-3770, 3.4 GHz, with Win7 (64 bit) and SP1:
Supported by Processor and installed Operating System:
------------------------------------------------------
MMX, CMOV and FCOMI, SSE, SSE2, SSE3, SSSE3, SSE4.1,
POPCNT, SSE4.2, AVX, PCLMUL and AES
Calculating the sum of a float array with different methods.
That'll take a little while. Please be patient ...
Simple C implementation:
------------------------
sum1 = 8390656.00
Elapsed Time = 15.96 Seconds
FPU code with 4 accumulators:
-----------------------------
sum2 = 8390656.00
Elapsed Time = 7.10 Seconds
Performance Boost = 225%
C implementation with 4 accumulators:
-------------------------------------
sum3 = 8390656.00
Elapsed Time = 5.38 Seconds
Performance Boost = 297%
SSE2 code with 4 accumulators:
------------------------------
sum4 = 8390656.00
Elapsed Time = 1.36 Seconds
Performance Boost = 1175%
AVX code with 4 accumulators:
-----------------------------
sum5 = 8390656.00
Elapsed Time = 0.69 Seconds
Performance Boost = 2326%
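To illustrate what "4 accumulators" means in the output above, here is a minimal sketch in plain C. It is not the code from the archive; the function names sum_simple and sum_4acc are made up for illustration. The idea is that four independent partial sums break the single dependency chain of the simple loop, so the CPU can overlap the additions.

```c
#include <stddef.h>

/* Simple loop: one accumulator, so every addition has to wait
   for the result of the previous one. */
float sum_simple(const float *a, size_t n)
{
    float s = 0.0f;
    size_t i;

    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent accumulators: the four additions in the loop
   body do not depend on each other and can overlap in the pipeline. */
float sum_4acc(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    /* Main loop: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Remainder loop for the last 0..3 elements. */
    for (; i < n; i++)
        s0 += a[i];

    /* Combine the partial sums. */
    return (s0 + s1) + (s2 + s3);
}
```

Keep in mind that this changes the order of the additions. For general float data the result can therefore differ slightly from the simple loop because of rounding. For small integer values, where every partial sum is exactly representable, both functions return the same result.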
For the C sources I used gcc 4.7.2 for Windows. With some minimal changes (especially to the data alignment) it should also work with VC or Pelles C, but that's not tested. The assembly language sources are assembled with nasm 2.10.07 for Windows.
The software isn't in its final stage yet. Hints and suggestions for improvement are welcome, as is any other feedback. A Linux version is coming soon.
Gerhard