SSBOs and other resources are limited per pipeline on Intel and AMD.
Heuristically reserve resources per stage having in mind the reported
OpenGL limits.
The current shared memory size seems to be smaller than what the game
actually uses. This makes Nvidia's driver consistently blow up; in the
case of FE3H it made it explode on Qt's SwapBuffers while SDL2 worked
just fine. For now keep this hack since it's still progress over the
previous hardcoded shared memory size.
Drop the usage of ARB_compute_variable_group_size and specialize compute
shaders instead. This permits compute to run on AMD and Intel
proprietary drivers.
Nvidia has sane default output values for varyings, but the other
vendors don't apply these. To properly emulate this we would have to
analyze the shader header. For the time being, apply the same default
Nvidia applies so we get the same behaviour on non-Nvidia drivers.
Abstracted ComponentType was not being used in a meaningful way.
This commit drops its usage.
There is one place where it was being used to test compatibility between
two cached surfaces, but this one is implied in the pixel format.
Removing the component type test doesn't change the behaviour.
Emulates negative y viewports with ARB_clip_control. This allows us to
more easily emulated pipelines with tessellation and/or geometry shader
stages. It also avoids corrupting games with transform feedbacks and
negative viewports (gl_Position.y was being modified).
Nvidia's OpenGL driver maps gl(Named)BufferSubData with some requirements
to a fast. This path has an extra memcpy but updates the buffer without
orphaning or waiting for previous calls. It can be seen as a better
model for "push constants" that can upload a whole UBO instead of 256
bytes.
This path has some requirements established here:
http://on-demand.gputechconf.com/gtc/2014/presentations/S4379-opengl-44-scene-rendering-techniques.pdf#page=24
Instead of using the stream buffer, this commits moves constant buffers
uploads to calls of glNamedBufferSubData and from my testing it brings a
performance improvement. This is disabled when the vendor is not Nvidia
since it brings performance regressions.
Bindless textures were using u64 to pack the buffer and offset from
where they come from. Drop this in favor of separated entries in the
struct.
Remove the usage of std::set in favor of std::list (it's not std::vector
to avoid reference invalidations) for samplers and images.