Metal GPU Programming 02 - Shading a Triangle
Welcome back to an exploration of Metal GPU programming for iOS and macOS. Having built OS-level UI in Part 1, we now delve deeper into the Metal Shading Language. In this entry, we'll rasterize a triangle using Metal C++14 shaders. Our starting shaders aren't too complex, so let's dive right into the code.
Triangle Shader
Here's the shader code to render a triangle on the screen without requiring any input from the host (CPU). That is, any input other than a call to drawPrimitives. The following shader code is included in the C++ source and assigned to g_shaderCode:
#include <metal_stdlib>
using namespace metal;

struct VertexOutput
{
    float4 position [[position]];
    float4 color;
};

vertex VertexOutput render_vertex(uint vid [[vertex_id]])
{
    VertexOutput vertexOut;
    // Clockwise winding order
    if (vid == 0)
    {
        // Middle top of screen.
        vertexOut.position = float4(0.0, 1.0, 0.0, 1.0);
        vertexOut.color = float4(1.0, 0.3, 0.3, 1.0);
    }
    else if (vid == 1)
    {
        // Bottom right
        vertexOut.position = float4(1.0, -1.0, 0.0, 1.0);
        vertexOut.color = float4(0.3, 1.0, 0.3, 1.0);
    }
    else if (vid == 2)
    {
        // Bottom left
        vertexOut.position = float4(-1.0, -1.0, 0.0, 1.0);
        vertexOut.color = float4(0.3, 0.3, 1.0, 1.0);
    }
    return vertexOut;
}

fragment float4 render_fragment(VertexOutput vertexIn [[stage_in]])
{
    return vertexIn.color;
}
This shader is handy: all that is required from the host is a single drawPrimitives call and a render target. No need to upload vertex buffers or index buffers to get a triangle on the screen. We pull this off by using the vertex_id attribute (see Section 4.3.4, Attributes for Built-in Variables, in the Metal Shading Language Specification).
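How g_shaderCode is declared isn't shown in this post. One way to embed the shader source above, assuming a C++11 raw string literal, is sketched here. Only the name g_shaderCode comes from the tutorial; the shaderMentions helper is hypothetical, and the vertex function is left out to keep the sketch short.

```cpp
#include <cassert>
#include <cstring>

// Assumption: the shader lives in a raw string literal so it can be handed to
// the Metal runtime compiler as plain text. The declaration form is a guess.
static const char* g_shaderCode = R"METAL(
#include <metal_stdlib>
using namespace metal;

struct VertexOutput
{
    float4 position [[position]];
    float4 color;
};

// Vertex function omitted in this sketch.

fragment float4 render_fragment(VertexOutput vertexIn [[stage_in]])
{
    return vertexIn.color;
}
)METAL";

// Hypothetical helper: checks that an entry point name appears in the embedded
// source before we ask the library for a function of that name.
static bool shaderMentions(const char* name)
{
    assert(name != nullptr);
    return std::strstr(g_shaderCode, name) != nullptr;
}
```

A raw string keeps the shader readable in the C++ file: no escaped quotes and no trailing \n on every line.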
Starting from the top, we include the Metal standard library metal_stdlib and import the metal namespace. Both are typical in Metal shaders. Next we specify the VertexOutput structure which defines the output from the vertex shader. This structure binds together the vertex shader (render_vertex) and fragment shader (render_fragment). Through VertexOutput, the vertex shader transmits the position attribute to the fixed-function rasterizer. The rasterizer uses the position data to determine which fragments the triangle covers, and interpolates color across them before passing the result to the fragment shader.
In order to generate geometry without requiring MTLBuffers from the host, we position vertices in the vertex shader by switching on the input vid parameter. vid is declared with the vertex_id attribute, implying that the absolute index of the vertex will be passed in vid. We use these absolute indices to programmatically determine the position of each vertex. vid = 0 targets middle top, vid = 1 bottom right, and vid = 2 bottom left. We set VertexOutput's position member and apply red, green, blue clockwise over the triangle.
Keep in mind, vertex_id is useful only for simple geometry. In later tutorials we'll use MTLBuffers as vertex and index buffers to transmit geometry.
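To see what the GPU does with those vertex IDs, here is a minimal CPU-side sketch in plain C++. It mirrors render_vertex with a lookup table instead of an if/else chain, which is a substitution made for brevity. Float4, renderVertex, and drawTriangle are stand-in names invented for this illustration; they are not part of Metal.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Float4 { float x, y, z, w; };

struct VertexOutput
{
    Float4 position;
    Float4 color;
};

// CPU-side mirror of render_vertex: same positions and colors, picked by index.
inline VertexOutput renderVertex(std::uint32_t vid)
{
    static const std::array<VertexOutput, 3> kVertices = {{
        {{ 0.0f,  1.0f, 0.0f, 1.0f}, {1.0f, 0.3f, 0.3f, 1.0f}}, // middle top, red
        {{ 1.0f, -1.0f, 0.0f, 1.0f}, {0.3f, 1.0f, 0.3f, 1.0f}}, // bottom right, green
        {{-1.0f, -1.0f, 0.0f, 1.0f}, {0.3f, 0.3f, 1.0f, 1.0f}}, // bottom left, blue
    }};
    assert(vid < kVertices.size());
    return kVertices[vid];
}

// Stand-in for a three-vertex, non-indexed draw starting at vertex 0: the
// vertex function runs once per vertex and receives vid = 0, 1, 2.
inline std::array<VertexOutput, 3> drawTriangle()
{
    std::array<VertexOutput, 3> out{};
    for (std::uint32_t vid = 0; vid < 3; ++vid)
        out[vid] = renderVertex(vid);
    return out;
}
```

The table form also hints at why this trick stops scaling: every vertex has to be spelled out in shader code, which is exactly what vertex buffers avoid.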
For every relevant fragment (a pixel in many cases), the fragment shader is run and the interpolated result from the vertex shader is used to generate results. Per-fragment lighting calculations and other effects are generally applied to interpolated vertex data at this stage. Fragment processing is usually the most expensive stage in the pipeline and often the most versatile. For example, sites such as Shader Toy only need to execute the fragment stage.
In our case, the fragment shader (render_fragment) is simple. The shader's first parameter is declared with the stage_in attribute. This attribute indicates the parameter is the per-fragment input built from the vertex shader's output. The fragment shader takes this parameter and returns its interpolated color as the result of the shader. This return value is written into the first colorAttachment specified in MTLRenderPassDescriptor.
Compiling Shaders and Initialization
Now let's compile our shader code. We do so by modifying the renderInit function from our previous tutorial. Let's load our shader source into an NSString and create a MTLLibrary:
NSString* source = [[NSString alloc] initWithUTF8String:g_shaderCode];
MTLCompileOptions* compileOpts = [[MTLCompileOptions alloc] init];
compileOpts.languageVersion = MTLLanguageVersion2_0;
NSError* err = nil;
id<MTLLibrary> library =
    [g_mtlDevice newLibraryWithSource:source
                              options:compileOpts
                                error:&err];
On the first line, we create an NSString from our shader source. On the next two lines, we initialize MTLCompileOptions and set our desired shader version (we can also set preprocessor macros and toggle fast math through the compile options). The last thing to do is construct a new MTLLibrary from the source and compile options.
Before we forget, let's clean up since we don't need the source or compile options anymore:
[compileOpts release];
[source release];
Next in renderInit, we construct the rasterization pipeline state. The pipeline state, represented by MTLRenderPipelineState, contains the fully compiled state. This state is passed to the MTLRenderCommandEncoder created in our rendering function. MTLRenderPipelineDescriptor completely defines MTLRenderPipelineState:
MTLRenderPipelineDescriptor* pipelineDescriptor =
    [MTLRenderPipelineDescriptor new];
pipelineDescriptor.vertexFunction =
    [library newFunctionWithName:@"render_vertex"];
pipelineDescriptor.fragmentFunction =
    [library newFunctionWithName:@"render_fragment"];
[library release];
pipelineDescriptor.colorAttachments[0].pixelFormat = MTLPixelFormatBGRA8Unorm;
pipelineDescriptor.depthAttachmentPixelFormat = MTLPixelFormatInvalid;
NSError* error = nil;
g_mtlPipelineState =
    [g_mtlDevice newRenderPipelineStateWithDescriptor:pipelineDescriptor
                                                error:&error];
if (!g_mtlPipelineState)
{
    NSLog(@"Failed to create render pipeline state: %@", error);
}
After creating the MTLRenderPipelineDescriptor object we set its vertexFunction and fragmentFunction properties. To bind functions in the shader to the descriptor, we call newFunctionWithName with the same names used in our shader code above. In this case "render_vertex" for the vertex shader and "render_fragment" for the fragment shader. After creating our shaders we release our shader library as it is no longer needed.
Afterwards it's a matter of setting the expected pixel format for the pipeline descriptor and creating MTLRenderPipelineState from the descriptor. With that finished, our pipeline is set up and we are ready to render our triangle.
The changes to renderInit are pretty small. Likewise, the changes to our render function are minimal. We keep the MTLRenderPassDescriptor from the last tutorial and expand only on MTLRenderCommandEncoder. In the previous tutorial we created a MTLRenderCommandEncoder then immediately ended encoding. Here, we must set appropriate state and call drawPrimitives while the encoder is active. This call will render our triangle in screen space by passing 3 vertices to our shader. Don't sweat the term screen space; we'll dedicate an upcoming tutorial to coordinate systems.
Here are the relevant changes to the doRender function from the last tutorial:
void doRender()
{
    // passDescriptor is the MTLRenderPassDescriptor kept from the last tutorial.
    id<MTLCommandBuffer> commandBuffer = [g_mtlCommandQueue commandBuffer];
    id<MTLRenderCommandEncoder> commandEncoder =
        [commandBuffer renderCommandEncoderWithDescriptor:passDescriptor];
    [commandEncoder setFrontFacingWinding:MTLWindingClockwise];
    [commandEncoder setCullMode:MTLCullModeNone];
    [commandEncoder setRenderPipelineState:g_mtlPipelineState];
    [commandEncoder drawPrimitives:MTLPrimitiveTypeTriangle
                       vertexStart:0
                       vertexCount:3];
    [commandEncoder endEncoding];
    // Presenting the drawable and committing the command buffer are unchanged.
}
As in tutorial 1, we create a command buffer and start a render command encoder. Afterwards, we set a couple of rasterization properties.
setFrontFacingWinding declares which vertex winding order marks a triangle as front-facing. This informs the next property, setCullMode, which determines visibility based on triangle winding order. Since we are winding the vertices clockwise in the vertex shader, we choose clockwise.
setCullMode will perform back or front-face culling of geometry. In our case there is nothing to cull, so we pass MTLCullModeNone. A cull mode that doesn't match the winding order is a common cause of the infamous 'black screen', leaving the developer asking "why isn't my geometry rendering?" To avoid that, we do away with culling until we have more complex geometry.
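If you're ever unsure which way a triangle winds, the sign of its area gives it away. This is a small sketch of that check in plain C++, assuming a coordinate system with y pointing up as in the positions our vertex shader emits. Point2, signedArea2, and isClockwise are names invented for the sketch; the GPU performs its own facing test in hardware.

```cpp
#include <cassert>

struct Point2 { float x, y; };

// Twice the signed area of triangle (a, b, c), via the 2D cross product of
// the edges a->b and a->c.
inline float signedArea2(Point2 a, Point2 b, Point2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// With y pointing up, a negative signed area means the vertices run clockwise.
inline bool isClockwise(Point2 a, Point2 b, Point2 c)
{
    const float area2 = signedArea2(a, b, c);
    assert(area2 != 0.0f); // a degenerate triangle has no winding
    return area2 < 0.0f;
}

// The x/y positions from render_vertex, in vertex_id order.
const Point2 kTop         = { 0.0f,  1.0f};
const Point2 kBottomRight = { 1.0f, -1.0f};
const Point2 kBottomLeft  = {-1.0f, -1.0f};
```

For our triangle the value is -4, so the order top, bottom right, bottom left is clockwise, matching MTLWindingClockwise. Swapping the last two vertices flips the sign, and with back-face culling enabled that swapped triangle would disappear.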
setRenderPipelineState assigns the MTLRenderPipelineState we compiled in renderInit to this encoder. Most importantly, this sets the vertex and fragment shader that will be used to rasterize our triangle.
drawPrimitives signals the beginning of triangle rasterization. Since all positioning is performed in the vertex shader we simply send 3 vertices and nothing else. The vertex shader will properly position our triangle.
Finally we end encoding and commit the command buffer to the GPU.
Whew! Now that we've learned a basic shader we're done with most of our Metal setup. We laid the groundwork for a simple rasterizer, and in upcoming tutorials we'll dive deeper into graphics programming, with topics such as 3D transforms, rendering complex scenes, and raytracing.
Metal API Usage
Below is a list of the Metal functions used in this tutorial organized by usage in a C++ function. The links in this section take you to my blog; the blog allows you to explore the Metal API using an interactive dependency tree.
- Metal 02 - Triangle
  - renderInit
    - MTLCreateSystemDefaultDevice
    - MTLDevice, newCommandQueue
    - MTLCompileOptions
    - MTLDevice, newLibraryWithSource:options:error:
    - MTLRenderPipelineDescriptor
    - MTLLibrary, newFunctionWithName:
    - MTLDevice, newRenderPipelineStateWithDescriptor:error:
  - renderDestroy
    - MTLRenderPipelineDescriptor, release
    - MTLCommandQueue, release
    - MTLDevice, release
  - doRender
    - MTLRenderPassDescriptor, renderPassDescriptor
    - MTLCommandQueue, commandBuffer
    - MTLCommandBuffer, renderCommandEncoderWithDescriptor:
    - MTLRenderCommandEncoder, setFrontFacingWinding:
    - MTLRenderCommandEncoder, setCullMode:
    - MTLRenderCommandEncoder, setRenderPipelineState:
    - MTLRenderCommandEncoder, drawPrimitives:vertexStart:vertexCount:
    - MTLRenderCommandEncoder, endEncoding
    - MTLCommandBuffer, presentDrawable
    - MTLCommandBuffer, commit
  - shaders
    - position, attribute
    - vertex_id, attribute
    - stage_in, attribute
Blog Entry
This article is exclusive to Steem during the monetization period. Afterwards, it's posted on my blog.
Questions, comments and feedback remain on Steem. The blog post will permanently link to this Steem post to drive interaction back to Steem.
Here's a video tutorial from Apple covering triangle rasterization. Apple's example uses a vertex buffer whereas our example does away with that by using vertex IDs. We also compile our own shaders instead of relying on Xcode, which we don't use on macOS.
A couple good tidbits from the video:
- Shaders are in 'fast mode' by default. This implies NaN handling is undefined and trigonometric functions have a limited range. Either disable fast math with a compiler option or use the metal::precise namespace. Covered at 49:00.
- packed_float types. Turns out host alignment is the primary consideration, not GPU alignment. The GPU likely still expects alignment between array elements. Covered at 31:47.
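To make the packed_float point concrete, here is a host-side sketch of the layout difference. It assumes the sizes the Metal Shading Language gives these types: float3 occupies 16 bytes with 16-byte alignment, while packed_float3 is 12 bytes with 4-byte alignment. The struct names are stand-ins invented for this sketch, not Metal types.

```cpp
#include <cstddef>

// Host-side stand-in for Metal's float3: padded out to 16 bytes.
struct alignas(16) Float3 { float x, y, z; };

// Host-side stand-in for Metal's packed_float3: three tightly packed floats.
struct PackedFloat3 { float x, y, z; };

static_assert(sizeof(Float3) == 16 && alignof(Float3) == 16,
              "float3 stand-in should match the 16-byte Metal layout");
static_assert(sizeof(PackedFloat3) == 12 && alignof(PackedFloat3) == 4,
              "packed_float3 stand-in should be 12 bytes");

// Hypothetical vertex layouts built from each flavor.
struct VertexAligned { Float3 position; Float3 normal; };             // 32 bytes
struct VertexPacked  { PackedFloat3 position; PackedFloat3 normal; }; // 24 bytes
```

If the host writes tightly packed floats while the shader declares float3, every element after the first lands at the wrong offset. Matching the shader's declaration to the host's layout is the part to get right.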