05/11/2010 A bit of perspective on OpenGL 4.1 separate programs

After my post dedicated to the OpenGL 4.1 drivers status, I received quite a lot of feedback from AMD. My tests are based on my OpenGL Samples Pack 4.1, developed on nVidia's OpenGL 4.1 drivers, which have been available since the OpenGL BOF at the end of July. A consequence is that my OpenGL 4.1 samples are built upon nVidia's implementation, which led to some quite bad results when running on AMD because of differences in implementation philosophy.

Obviously, before publishing my post, I had a look at the samples trying to figure out what went wrong, but when you are facing "unexpected error" messages, it's pretty hard to make progress. This is how it begins with early drivers, whether from AMD, nVidia or probably anyone else. Hence, Graham Sellers from AMD pointed me towards understanding AMD's implementation through specification quotes so that I could make my samples work on AMD... and this is where the separate programs drama began.

Drivers implementation philosophy

This is something I have figured out over the years. I believe that AMD and nVidia have two different approaches to OpenGL. AMD tries to follow the specification to the letter, in a quite pedantic manner, even when the specification doesn't make sense. For nVidia the approach is quite different. Some developers speak of "nVidia's OpenGL" when referring to nVidia's implementation. nVidia's approach is less strict and more pragmatic, with an implementation that doesn't hesitate to relax some restrictions and even to provide extra features, and not only through extensions. Explicit varying locations, for example, have been implemented since the nVidia OpenGL 3.3 beta drivers.

Separate programs issues

Regarding GL_ARB_separate_shader_objects, I assumed some details that are actually not valid according to the specification. These assumptions came from common sense and OpenGL habits, but also from a long-standing interest in nVidia's separate programs.

GL_ARB_separate_shader_objects is the version promoted to core of GL_EXT_separate_shader_objects, a pretty badly designed extension relying on deprecated mechanisms and the fixed-function legacy. It became quite interesting once promoted to ARB, despite a name which is total nonsense according to the OpenGL token dictionary. "Separate shader objects"? What does it mean? Shader objects have been per shader stage since the beginning... GL_ARB_separate_program_objects or GL_ARB_program_pipeline_object would have been better to me, but well.

GL_ARB_separate_shader_objects allows using several different program objects to set up all the GPU stages.

  • One program for the vertex stage, one for the fragment stage.
  • One for all the stages before rasterisation and one for the fragment stage.
  • One program per stage, up to 5 with OpenGL 4 class hardware.
  • Etc.

Separated programs issues

1. Different matching rules between separate and non-separate programs

So far with OpenGL, the GLSL linker has ensured that the communication between stages goes well, and it even performs some interesting optimisations, for example removing varying variables that are unused across stages.

With separate programs, the compiler has to make some assumptions about the inputs provided by the previous stage, whatever this stage actually is. For this purpose, a new section called "shader interface matching" has been written in the specification. Unfortunately, following this section to the letter implies different shader matching rules for separate and non-separate programs regarding explicit varying locations, which can force OpenGL programmers to write different shaders for each program type... for no good technical reason. Let's take a problematic example:

Vertex shader:
  • layout(location = 1) out vec2 Texcoord0;
  • layout(location = 0) out vec2 Texcoord1;
Fragment shader:
  • layout(location = 0) in vec2 Texcoord0;
  • layout(location = 1) in vec2 Texcoord1;

With separate programs, the location is used for shader interface matching. However, with non-separate programs, the matching is performed by name, which implies that the location qualifier is ignored. It doesn't make any sense to do this, but this is what the specification says...

Concretely, explicit varying locations override name matching with separate programs but are silently ignored with non-separate programs.
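
The difference can be made concrete with a small model. The sketch below is not OpenGL code: `Varying`, `matchByLocation` and `matchByName` are hypothetical names that only mimic the two matching rules described above, applied to the declarations of the example.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a varying declaration: an explicit location and a name.
struct Varying {
    int location;
    std::string name;
};

using Interface = std::vector<Varying>;
// Maps each fragment input name to the vertex output name that feeds it.
using Matching = std::map<std::string, std::string>;

// Rule described for separate programs: an input is fed by the output
// declared with the same explicit location; names are not considered.
Matching matchByLocation(const Interface& outputs, const Interface& inputs) {
    Matching result;
    for (const Varying& in : inputs)
        for (const Varying& out : outputs)
            if (in.location == out.location)
                result[in.name] = out.name;
    return result;
}

// Rule described for non-separate programs: an input is fed by the output
// declared with the same name; the location qualifier is ignored.
Matching matchByName(const Interface& outputs, const Interface& inputs) {
    Matching result;
    for (const Varying& in : inputs)
        for (const Varying& out : outputs)
            if (in.name == out.name)
                result[in.name] = out.name;
    return result;
}

// The declarations of the problematic example.
const Interface vertexOutputs  = {{1, "Texcoord0"}, {0, "Texcoord1"}};
const Interface fragmentInputs = {{0, "Texcoord0"}, {1, "Texcoord1"}};
```

In this model, `matchByLocation(vertexOutputs, fragmentInputs)` feeds the fragment input Texcoord0 from the vertex output Texcoord1, whereas `matchByName` feeds it from Texcoord0: same shaders, different data reaching the fragment stage.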

2. Required verbose separate programs

Finally, separate programs require redeclaring the gl_PerVertex blocks... hum... why?
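
For reference, this is roughly what the redeclaration looks like in a vertex shader meant for a separable program. It is a minimal sketch, assuming the shader source is embedded in C++ as a raw string; `vertexShaderSource` and the `Position` attribute are illustrative names, not taken from the samples pack.

```cpp
#include <string>

// Hypothetical vertex shader source for a separable program.
const std::string vertexShaderSource = R"GLSL(#version 410 core

layout(location = 0) in vec4 Position;

// The built-in block is redeclared before gl_Position is used,
// listing only the members this shader actually writes.
out gl_PerVertex
{
    vec4 gl_Position;
};

void main()
{
    gl_Position = Position;
}
)GLSL";
```

The same shader without the `out gl_PerVertex` block is what a non-separate program would normally contain, which is where the extra verbosity comes from.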

Separate programs and non-separate programs evolve under different sets of rules, which keeps them apart even though technically they are connected. There are good reasons to use non-separate programs for compiler optimization purposes, but there are also good reasons to use separate programs for software design purposes, which leaves OpenGL programmers stuck in the middle.

3. Not 100% direct state access

Since OpenGL 3.1, but especially OpenGL 3.3, the specifications have moved towards direct state access (DSA), and the new OpenGL program pipeline object is no exception with a pretty DSA API... with one exception! The specification clearly says that a program pipeline object is actually created by binding the object...

A program pipeline object is created by binding a name returned by GenProgramPipelines with the command void BindProgramPipeline(uint pipeline);
OpenGL 4.1 core specification, section 2.11.4
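
To see what this rule means in practice, here is a toy model of it. It is not a real OpenGL binding: the `PipelineNames` class and its methods are hypothetical and only mimic the distinction the quote makes between a name that has been generated and an object that has been created.

```cpp
#include <set>

// Toy model of the quoted rule; no real OpenGL call is made here.
class PipelineNames {
public:
    // Models GenProgramPipelines: reserves a name but creates no object.
    unsigned gen() {
        unsigned name = next_++;
        reserved_.insert(name);
        return name;
    }

    // Models BindProgramPipeline: the first bind of a reserved name
    // is what creates the program pipeline object.
    void bind(unsigned name) {
        if (reserved_.count(name) != 0)
            created_.insert(name);
        bound_ = name;
    }

    // True only once the name has been bound at least once.
    bool isObject(unsigned name) const { return created_.count(name) != 0; }

    // Name currently bound, 0 when nothing is bound.
    unsigned bound() const { return bound_; }

private:
    unsigned next_ = 1;
    unsigned bound_ = 0;
    std::set<unsigned> reserved_;
    std::set<unsigned> created_;
};
```

Read strictly, code that wants to be portable would therefore call glGenProgramPipelines, then glBindProgramPipeline, and only then commands such as glUseProgramStages on that name, even though the rest of the pipeline API takes the pipeline name as a parameter.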

Adding verbose declarations, using different matching rules for separate and non-separate programs, and having to use glBindProgramPipeline to actually create the pipeline object don't make sense, but this is what is written in the specification, what the ARB has agreed on. AMD and nVidia have logically implemented OpenGL 4.1 following their own philosophies: AMD has interpreted the OpenGL specification to the letter, implementing some fairly stupid ideas, and nVidia has interpreted the OpenGL specification in its own way, a clever way but a non-conformant way... Well, all in all, we are pretty much doomed when it comes to using the full capabilities of separate programs.

How could everything have been better? I really believe that if the ARB had paid more attention when reviewing the specification, which probably means taking more time, these issues would have been fixed, as these problems may be "details" but they are quite obvious.

My recommendations for using separate programs

Let's start with the DSA issue. As my OpenGL 4.1 samples demonstrated, this gross specification mistake has been implemented by both AMD and nVidia in a way that lets the program pipeline object be used as a pure DSA object. The AMD and nVidia OpenGL teams are particularly talented, and it makes sense to have the implementations written this way as it doesn't make any difference when they are used strictly following the specification. Can we really rely on this workaround? What is going to happen when Intel and Apple provide implementations of OpenGL 4.1? (within 10 years from now...) Relying on it could become a software bug, so I think the specification has to be followed to the letter. Anyway, OpenGL 4.1 is far from being completely DSA, which makes it impossible to design a fully DSA renderer.

Regarding the verbose and useless gl_PerVertex redeclarations, they cause a compilation error on nVidia, but this is something that will eventually be fixed, so unfortunately they have to be used as the specification requires.

Finally, the shader matching rules: as much as I love explicit varying locations, since they aren't supported with non-separate programs, I think they should not be used. Fortunately, name matching works the same way for separate and non-separate programs. Using varying structures allows defining a clear protocol between stages. It's less flexible than explicit varying locations but really robust.

Updated OpenGL 4.1 samples

Following this discussion, I updated the OpenGL 4.1 samples pack to report the drivers status. I really wish that nVidia's implementation was what OpenGL specifies, but it's not. The goal of a specification is to be followed, and whether or not the specification is good is another problem. Hence, for my samples I decided to follow the specification to the letter. However, I decided to add some extended samples, using the postfix "gtc", to illustrate the changes I would enjoy for OpenGL 4.2, some of which I wish were already supported.

  • White: Unsupported.
  • Blue: The sample works but it doesn't follow the OpenGL specification.
  • Green: The sample works following the OpenGL specification.
  • Orange: The sample doesn't work correctly but a workaround is possible.
  • Red: The sample doesn't work and I haven't found any workaround.
  • Black: Really disturbing problem!
Drivers: AMD Catalyst 10.10c (beta), nVidia Forceware 260.93 (beta)
410-debug-output-arb: AMD_debug_output support only
410-program-varying: gl_PerVertex redeclaration involves compiler errors...
410-program-separate: gl_PerVertex redeclaration involves compiler errors...
410-program-binary: GL_PROGRAM_BINARY_RETRIEVABLE_HINT must be set to GL_TRUE or it can't be retrieved on some platforms
410-program-64: glVertexAttribLPointer is null
410-primitive-tessellation-5: gl_PerVertex redeclaration involves compiler errors...
410-primitive-tessellation-2: gl_PerVertex redeclaration involves compiler errors...
410-primitive-instanced: Using explicit location silently ignore throw a parsing error. | Unexpected warning
410-fbo-layered: Unexpected warning
400-transform-feedback-object
400-texture-compression-arb
400-texture-buffer-rgb
400-sampler-gather
400-sampler-fetch
400-sampler-array
400-program-varying-structs: Doesn't support varying struct and offensive error message
400-program-varying-blocks: Unexpected warning / gl_in.length() not fully supported
400-program-subroutine
400-program-64
400-primitive-tessellation: Unexpected warning
400-primitive-smooth-shading: Unexpected warning
400-primitive-instanced: Unexpected warning
400-fbo-rtt-texture-array
400-fbo-rtt
400-fbo-multisample
400-fbo-layered
400-draw-indirect
400-buffer-uniform: Unsupported uniform block array
400-blend-rtt
330-texture-array: Requires glTexParameteri to set up filtering, sampler unsupported
330-sampler-object: Sampler object doesn't always override texture parameters

Below are some samples, taking the "gtc" postfix, that illustrate some of the OpenGL 4.2+ feature requests I made. I wrote these samples because I think they show either specification bugs, design mistakes, lack of accuracy or lack of perspective, like the issues discussed in this post.

Sample | AMD Catalyst 10.10c (beta) | nVidia Forceware 260.93 (beta)
410-program-varying-gtc | Not supported as OpenGL specifies... | A GLSL compiler warning would be nice
410-program-separate-dsa-gtc | A debug output warning would be nice | A debug output warning would be nice
400-sampler-array-gtc | Not supported as OpenGL specifies... | A GLSL compiler warning would be nice
400-buffer-uniform-shared-gtc | Not supported as OpenGL specifies... | A GLSL compiler warning would be nice

  • Download: OpenGL Samples Pack 4.1.3.2 (ZIP, 14.23 MB) (7Z, 7.42 MB)
  • Link: Report a bug or submit a request
Copyright Christophe Riccio 2002-2013 all rights reserved