18/12/2010 x86 instruction sets

I have been working on SIMD optimizations for GLM since 2008, so most of the GLSL functions implemented by GLM now have an equivalent based on SIMD instructions. Unfortunately, none of these optimizations are available in GLM 0.9.0, as I struggled to find a good API to expose them.

Finally, a few weeks ago, I merged the SIMD branch into the GLM 0.9.1 branch. I defined an API based on experimental extensions so that I can try this solution, see how it goes and steer it in the right direction.

One goal was to let users select the instruction sets they want to use, so that they can define which platforms a given build will support.
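As a rough sketch of what such a selection could look like, the snippet below derives a bit mask from the macros that compilers predefine (`__SSE2__`, `__SSE3__`, `__AVX__` and so on with GCC, `_M_X64` and `_M_IX86_FP` with Visual Studio). The `MYSIMD_*` names and the `mysimd_has` helper are hypothetical and only serve the illustration; they are not GLM's actual API.

```cpp
// Hypothetical flags, one bit per instruction set (not GLM's real macros).
#define MYSIMD_SSE2  0x01
#define MYSIMD_SSE3  0x02
#define MYSIMD_SSSE3 0x04
#define MYSIMD_SSE41 0x08
#define MYSIMD_SSE42 0x10
#define MYSIMD_AVX   0x20

// Cumulative masks: each newer set in this chain implies the older ones.
#define MYSIMD_UPTO_SSE2  (MYSIMD_SSE2)
#define MYSIMD_UPTO_SSE3  (MYSIMD_UPTO_SSE2  | MYSIMD_SSE3)
#define MYSIMD_UPTO_SSSE3 (MYSIMD_UPTO_SSE3  | MYSIMD_SSSE3)
#define MYSIMD_UPTO_SSE41 (MYSIMD_UPTO_SSSE3 | MYSIMD_SSE41)
#define MYSIMD_UPTO_SSE42 (MYSIMD_UPTO_SSE41 | MYSIMD_SSE42)
#define MYSIMD_UPTO_AVX   (MYSIMD_UPTO_SSE42 | MYSIMD_AVX)

// The user may define MYSIMD_ARCH before this point to force a target.
// Otherwise, fall back to whatever the compiler options enable.
#ifndef MYSIMD_ARCH
#  if defined(__AVX__)
#    define MYSIMD_ARCH MYSIMD_UPTO_AVX
#  elif defined(__SSE4_2__)
#    define MYSIMD_ARCH MYSIMD_UPTO_SSE42
#  elif defined(__SSE4_1__)
#    define MYSIMD_ARCH MYSIMD_UPTO_SSE41
#  elif defined(__SSSE3__)
#    define MYSIMD_ARCH MYSIMD_UPTO_SSSE3
#  elif defined(__SSE3__)
#    define MYSIMD_ARCH MYSIMD_UPTO_SSE3
#  elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#    define MYSIMD_ARCH MYSIMD_UPTO_SSE2
#  else
#    define MYSIMD_ARCH 0
#  endif
#endif

// True when every bit of 'mask' is enabled for this build.
constexpr bool mysimd_has(int mask)
{
    return (MYSIMD_ARCH & mask) == mask;
}
```

With GCC, building with `-msse4.2` defines `__SSE4_2__`, so `mysimd_has(MYSIMD_SSE42)` would be true for that build. Bit flags rather than a single ordered level leave room for sets such as SSE4A that do not fit in a linear chain.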

Unfortunately, the world of x86 processors is quite complex in that respect. These days both AMD processors (Athlon 64 San Diego) and Intel processors (Pentium 4 Prescott) support SSE3 instructions, but beyond that it gets more complicated. The SSSE3 (Core 2), SSE4.1 and SSE4.2 instruction sets are only supported by Intel (Core i7), and SSE4A is only supported by AMD (Phenom)... A subset of instructions is actually shared between AMD and Intel CPUs... Fortunately, with the upcoming release of AMD Bulldozer and Intel Sandy Bridge, the situation should improve thanks to the AVX instruction set, even if a few incompatibilities might remain.

With the recent release of Visual Studio 2010 SP1 beta, Microsoft has caught up with GCC in terms of AVX support. Support for SSSE3, SSE4A, SSE4.1 and SSE4.2 is available in Visual Studio 2008, and for SSE, SSE2 and SSE3 in Visual Studio 2005. GCC 4.4 already supports the AVX and AES instruction sets, GCC 4.3 brought support for the POPCNT, SSSE3, SSE4A, SSE4.1 and SSE4.2 instruction sets, and GCC 4.0 supports the SSE, SSE2 and SSE3 instruction sets.
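The compiler versions listed above could be turned into a compile-time check along these lines. This is only a sketch: the `CC_SIMD_*` names are hypothetical, the thresholds simply restate the versions given in this post, and `_MSC_VER` alone cannot tell Visual Studio 2010 from its SP1, so the AVX case is optimistic there.

```cpp
// Hypothetical levels describing what the *compiler* can emit,
// following the version list in the text (not what the CPU supports).
#define CC_SIMD_NONE  0
#define CC_SIMD_SSE3  1   // SSE, SSE2, SSE3
#define CC_SIMD_SSE42 2   // adds SSSE3, SSE4A, SSE4.1, SSE4.2
#define CC_SIMD_AVX   3   // adds AVX

#if defined(_MSC_VER)
#  if _MSC_VER >= 1600            // Visual Studio 2010 (SP1 assumed for AVX)
#    define CC_SIMD_MAX CC_SIMD_AVX
#  elif _MSC_VER >= 1500          // Visual Studio 2008
#    define CC_SIMD_MAX CC_SIMD_SSE42
#  elif _MSC_VER >= 1400          // Visual Studio 2005
#    define CC_SIMD_MAX CC_SIMD_SSE3
#  else
#    define CC_SIMD_MAX CC_SIMD_NONE
#  endif
#elif defined(__GNUC__)
#  define CC_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__)
#  if CC_GCC_VERSION >= 404       // GCC 4.4
#    define CC_SIMD_MAX CC_SIMD_AVX
#  elif CC_GCC_VERSION >= 403     // GCC 4.3
#    define CC_SIMD_MAX CC_SIMD_SSE42
#  elif CC_GCC_VERSION >= 400     // GCC 4.0
#    define CC_SIMD_MAX CC_SIMD_SSE3
#  else
#    define CC_SIMD_MAX CC_SIMD_NONE
#  endif
#else
#  define CC_SIMD_MAX CC_SIMD_NONE  // unknown compiler: assume nothing
#endif
```

Other compilers that define `__GNUC__` for compatibility report their own version numbers through it, so a real implementation would probably need dedicated branches for them.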

Finally, I made a list of all the instruction set headers that need to be included to use intrinsic functions with GCC and Visual Studio (and the Intel compiler, for the instruction sets it supports).

  • x86intrin.h: x86 instructions
  • mmintrin.h: MMX (Pentium MMX!)
  • mm3dnow.h: 3dnow! (K6-2) (deprecated)
  • xmmintrin.h: SSE + MMX (Pentium 3, Athlon XP)
  • emmintrin.h: SSE2 + SSE + MMX (Pentium 4, Athlon 64)
  • pmmintrin.h: SSE3 + SSE2 + SSE + MMX (Pentium 4 Prescott, Athlon 64 San Diego)
  • tmmintrin.h: SSSE3 + SSE3 + SSE2 + SSE + MMX (Core 2, Bulldozer)
  • popcntintrin.h: POPCNT (Core i7, Phenom; subset of SSE4.2 and SSE4A)
  • ammintrin.h: SSE4A + SSE3 + SSE2 + SSE + MMX (Phenom)
  • smmintrin.h: SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Core i7, Bulldozer)
  • nmmintrin.h: SSE4_2 + SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Core i7, Bulldozer)
  • wmmintrin.h: AES (Core i7 Westmere, Bulldozer)
  • immintrin.h: AVX, SSE4_2 + SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Core i7 Sandy Bridge, Bulldozer)
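To illustrate how the list above might be used, here is a small sketch that includes the smallest header covering what the build enables and falls back to scalar code otherwise. The `horizontal_sum` function is a hypothetical example, not part of GLM. Note that Visual Studio does not predefine `__SSE3__`, so with that compiler this sketch takes the SSE2 path.

```cpp
#include <cassert>

#if defined(__SSE3__)
#  include <pmmintrin.h>   // SSE3 + SSE2 + SSE + MMX
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>   // SSE2 + SSE + MMX
#endif

// Hypothetical helper: returns v[0] + v[1] + v[2] + v[3].
inline float horizontal_sum(const float* v)
{
    assert(v != 0);
#if defined(__SSE3__)
    __m128 x = _mm_loadu_ps(v);       // (a, b, c, d)
    x = _mm_hadd_ps(x, x);            // (a+b, c+d, a+b, c+d)
    x = _mm_hadd_ps(x, x);            // every lane holds a+b+c+d
    return _mm_cvtss_f32(x);
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    __m128 x  = _mm_loadu_ps(v);      // (a, b, c, d)
    __m128 hi = _mm_movehl_ps(x, x);  // (c, d, c, d)
    __m128 s  = _mm_add_ps(x, hi);    // (a+c, b+d, ...)
    __m128 s1 = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1)); // (b+d, ...)
    return _mm_cvtss_f32(_mm_add_ss(s, s1));
#else
    return v[0] + v[1] + v[2] + v[3]; // scalar fallback
#endif
}
```

Because each header in the list pulls in the older sets, including `pmmintrin.h` alone is enough to also get the SSE and SSE2 intrinsics used here.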

Such a mess... but besides OpenGL, what else can bring that much fun?

Copyright © Christophe Riccio 2002-2016 all rights reserved