This tutorial is part of a Collection: 04. DirectX 12 - Braynzar Soft Tutorials
08. Constant Buffers (Using Root Descriptor Tables)

In this tutorial we will see how to send data to the shaders using a descriptor table containing a constant buffer view.

BzTuts08.rar 72.41 kb
####Root Signature####

I'll start this tutorial off with an explanation of root signatures. A **Root Signature** is basically a function signature (a.k.a. parameter list) for the shaders in the pipeline. This part of the Root Signature is called the **Root Parameters**. A function signature describes what data a function expects.

    void somefunction(int arg1, int arg2); // (int arg1, int arg2) is the function signature, or parameter list

The data our application supplies for the Root Parameters at run time is called the **Root Arguments**.

##Root Parameters##

When we write shaders, we write the shader as a function. We include a function signature, which describes the data our shader expects from the *previous* stage. Let's take a look at a simple vertex shader as an example:

    float4 main(float3 pos : POSITION) : SV_POSITION
    {
        return float4(pos, 1.0);
    }

In the above function, the vertex shader expects a float3 as input. This is the input it expects from the previous pipeline stage, which is the Input Assembler (IA) stage.

Since we already have a parameter list for our shaders, what are Root Parameters? Look at the pipeline itself as being a function, called by our application running on the CPU. The pipeline itself has a parameter list, and this parameter list is described by the Root Signature. These parameters describe the data we want from our application. They can be in the form of Constant Buffer Views (CBV), Shader Resource Views (SRV), or Unordered Access Views (UAV).

It might get a little confusing at first, but resource **views** are another (the old) word for resource **descriptors**. CBV, SRV, and UAV just kept their names from previous iterations of DirectX. Calling them Constant Buffer Descriptor or Shader Resource Descriptor would mean the same thing.
Let's visualize the pipeline as a function for a second. (This of course is not how the pipeline really works; it's just to give a visual idea of where the root signature fits in.)

    // this is our gpu memory where resource heaps are actually at
    ResourceHeap resourceHeaps[];

    // this is the descriptor heap
    DescriptorHeap descriptorHeaps[];

    // this is our register list
    register b[]; // constant buffer register list
    register t[]; // shader resource register list
    register u[]; // uav register list

    // our root signature is the parameter list to the pipeline
    RenderTargetList RunPipeline(RootSignature rootSignature)
    {
        // loop through each descriptor table
        for(int i = 0; i < rootSignature.DescriptorTables.length; i++)
        {
            int startRegister = rootSignature.DescriptorTables[i].Range.BaseShaderRegister;
            for(int k = 0; k < rootSignature.DescriptorTables[i].Range.length; k++)
            {
                // if it's a constant buffer descriptor table use b registers
                if(rootSignature.DescriptorTables[i].Range[k].RangeType == D3D12_DESCRIPTOR_RANGE_TYPE_CBV)
                {
                    // there are two indirections for descriptor tables
                    b[startRegister + k] = GetResourcePointer(GetDescriptorFromTable(rootSignature.DescriptorTables[i].Range[k].descriptorIndex));
                }
                // use t registers for srv's
                else if(rootSignature.DescriptorTables[i].Range[k].RangeType == D3D12_DESCRIPTOR_RANGE_TYPE_SRV)
                {
                    // there are two indirections for descriptor tables
                    t[startRegister + k] = GetResourcePointer(GetDescriptorFromTable(rootSignature.DescriptorTables[i].Range[k].descriptorIndex));
                }
                // ... then uav's and samplers
            }
        }

        // loop through each root descriptor
        for(int i = 0; i < rootSignature.RootDescriptors.length; i++)
        {
            // set registers for root descriptors. There is only one indirection here
        }

        // loop through each root constant
        for(int i = 0; i < rootSignature.RootConstants.length; i++)
        {
            // set registers to root constants. root constants have no indirections, making them the fastest
            // to access, but the number of them is limited by the root signature parameter limit.
        }

        VertexInput vi = null;
        if(rootSignature.D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT)
        {
            // If we specify to use the input assembler in the root signature, the IA will run and assemble
            // all the geometry we have bound to it, then pass the vertices to the vertex shader.
            // It is possible to not use the input assembler at all, but instead draw a certain number of vertices
            // and use their index to differentiate them, then create more geometry in the geometry shader.
            vi = RunInputAssembler();
        }

        // here we run the bound vertex shader
        VertexOutput vo = RunVertexShader(vi);

        // ... run other stages
    }

    // here's an example of a vertex shader now
    VertexOutput RunVertexShader(VertexInput vi)
    {
        // this constant buffer is bound to register b0. We must
        // make sure that the bound root signature has a parameter that
        // sets the b0 register
        cbuffer ConstantBuffer : register(b0)
        {
            float4 positionOffset;
        };

        // here is our vertex shader function. We use positionOffset, which is defined in a constant buffer.
        // This constant buffer is updated by the root signature. We must make sure that the root signature contains
        // a parameter for register b0, since that is what the constant buffer is bound to.
        float4 main(float3 pos : POSITION) : SV_POSITION
        {
            return float4(pos.x + positionOffset.x, pos.y + positionOffset.y, pos.z + positionOffset.z, 1.0);
        }
    }

The above may be an excessive amount of code to visualize the root signature's position in all this, but hopefully it gives visual learners a better idea.

So from above, you can see that the root signature is the parameter list of the pipeline. Any registers being used must be set by the root signature, which means that the root signature must have a parameter for every register the shaders use.
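To make that last rule concrete, here is a minimal sketch in plain C++ (no Direct3D headers) that models a root signature as a list of register ranges and checks whether a register used by a shader is covered. The names `RegisterKind`, `RegisterRange`, and `coversRegister` are made up for this illustration and are not part of the D3D12 API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model types for illustration only; these are not D3D12 API types.
enum class RegisterKind { ConstantBuffer, ShaderResource, UnorderedAccess }; // b, t, u

struct RegisterRange {
    RegisterKind kind;      // which register list the range feeds
    unsigned baseRegister;  // first register in the range, e.g. 0 for b0
    unsigned count;         // number of consecutive registers
};

// Returns true when some range in the modelled root signature sets the given register.
bool coversRegister(const std::vector<RegisterRange>& rootSignature,
                    RegisterKind kind, unsigned reg)
{
    for (const RegisterRange& range : rootSignature) {
        if (range.kind == kind &&
            reg >= range.baseRegister &&
            reg - range.baseRegister < range.count) {
            return true;
        }
    }
    return false;
}
```

With a single one-descriptor CBV range starting at register 0, `coversRegister` accepts b0 and rejects b1 or t0, which mirrors the situation in this tutorial: the vertex shader only reads b0, so one range is enough.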
There are three types of Root Parameters: Descriptor Tables, Root Descriptors, and Root Constants.

There is a limit to the size of a root signature. Root parameters can only add up to a total of 64 DWORDs. On some hardware, using the input assembler takes one of those DWORDs, leaving 63 for root parameters. Some hardware also has less than 64 DWORDs of space available to the Root Signature. Any root parameters at the end of the root signature that overflow the available memory add 1 indirection.

You can look at an indirection as a pointer to memory. A Root Constant for example has 0 indirections, meaning the data is accessed immediately in the shaders. Root Descriptors have 1 indirection, meaning that the shaders must follow the pointer to get the actual location of the data. Finally, Descriptor Tables have 2 indirections, meaning the shader follows the descriptor table pointer to the descriptor heap, then follows the descriptor inside the descriptor heap to the actual resource data. So again, any parameters that do not fit into the available memory for the root signature add 1 indirection, causing root constants to have 1 indirection, root descriptors to have 2 indirections, and descriptor tables to have 3 indirections.

When creating a root parameter, you want to deny access to any shader stages that do not need that parameter. This allows the GPU to optimize accesses to the data.

##Root Constants##

Root constants are 32 bit values (1 DWORD) stored directly in the root signature. Each one takes up 1 DWORD of space in the root signature. These should be the variables that are accessed most often, since they will be accessed faster than constant buffers pointed to by descriptors. A ProjectionView matrix might be a good candidate for root constants, since it is usually accessed by every vertex in the visible scene (keep in mind that a 4x4 float matrix takes 16 DWORDs).

##Root Descriptors##

Root descriptors are inline descriptors. They cost 2 DWORDs of space in the root signature.
These should be descriptors to resources that are accessed often, since there is limited space in the root signature to store parameters. Root descriptors take 1 indirection to get to the resource data. This indirection comes from the descriptor being a pointer to the resource data. When you access a root descriptor from a shader, the resource that the descriptor points to must be looked up.

##Descriptor Tables##

Finally we have descriptor tables. These cost 1 DWORD each, and each one describes one or more ranges of descriptors. A range specifies the start and number of descriptors in a descriptor heap. Using descriptor tables, you are able to use as many descriptors as you want. The disadvantage is the extra indirection. Descriptor tables take 2 indirections to get to the resource data. The descriptor table points to a descriptor inside a descriptor heap, which points to the actual resource data.

In this tutorial we will work with descriptor tables, since they will be the most used type of parameter: most scenes have more textures and data that shaders need than can fit in the 64 DWORDs root parameters can add up to.

##Frame Buffering##

We will need a descriptor heap and resource heap for each frame for the constant buffer. This is so that while one frame is using the constant buffer, another frame can be updating it. We do not want to update a constant buffer that is currently being used, so we will create a constant buffer resource heap and descriptor heap for each frame.

####Tutorial Code####

In this tutorial, we will change the color of our quad every frame. We do this by updating a variable called colorMultiplier inside a constant buffer each frame. Then we multiply the color of each vertex by this value.

We will create a descriptor heap to store our constant buffer view (CBV), and a descriptor table which is a range into that descriptor heap (the range is only one descriptor long since we only have one constant buffer).
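Before moving on, the root signature size rules from the Root Constants, Root Descriptors, and Descriptor Tables sections can be checked with a few lines of arithmetic. This is only a sketch in plain C++: `ParamKind`, `Param`, `dwordCost`, `totalDwords`, and `fitsInRootSignature` are names invented here, and the costs are the ones stated above (1 DWORD per root constant, 2 per root descriptor, 1 per descriptor table, 64 in total).

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of root parameters; these are not the D3D12 structures.
enum class ParamKind { RootConstants, RootDescriptor, DescriptorTable };

struct Param {
    ParamKind kind;
    unsigned num32BitValues; // only meaningful for RootConstants
};

constexpr unsigned kMaxRootSignatureDwords = 64;

// Cost of a single parameter in DWORDs, using the costs stated in the text.
unsigned dwordCost(const Param& p)
{
    switch (p.kind) {
    case ParamKind::RootConstants:   return p.num32BitValues; // 1 DWORD per 32-bit value
    case ParamKind::RootDescriptor:  return 2;
    case ParamKind::DescriptorTable: return 1;
    }
    return 0;
}

// Sum of the costs of all parameters.
unsigned totalDwords(const std::vector<Param>& params)
{
    unsigned total = 0;
    for (const Param& p : params) {
        total += dwordCost(p);
    }
    return total;
}

// True when the parameters fit within the 64 DWORD limit.
bool fitsInRootSignature(const std::vector<Param>& params)
{
    return totalDwords(params) <= kMaxRootSignatureDwords;
}
```

For example, a 4x4 matrix passed as 16 root constants, plus one root descriptor and one descriptor table, adds up to 16 + 2 + 1 = 19 DWORDs. The root signature in this tutorial has a single descriptor table, so it costs just 1 DWORD.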
Alright, I think we're ready for the code.

##A ConstantBuffer structure##

When we update constant buffer data on the GPU using Map, we need to make sure we are updating the correct part of the memory. To make this easier we can create a constant buffer structure. We will create an instance of this structure on the CPU. After we update the instance with the data we want in the constant buffer, we basically copy the data from this instance to the mapped constant buffer data on the GPU. We don't strictly need a structure, but it makes it a lot easier to update the constant buffer on the GPU.

Our constant buffer contains only a colorMultiplier variable right now. This is a vector of 4 floating point values: x, y, z, and w. These are the red, green, blue, and alpha channels of the color. In the vertex shader, we will multiply the color passed in for each vertex by this color multiplier to get the new color.

    // this is the structure of our constant buffer.
    struct ConstantBuffer {
        XMFLOAT4 colorMultiplier;
    };

##New Globals##

The first variable is a descriptor heap. This is a descriptor heap to store the descriptors for our constant buffers (we only have one, but you would have more in a larger app).

The second variable is a resource which will be the actual memory on the GPU where our constant buffer data is stored. This is called a resource heap, which we will create as an upload heap.

The third is an instance of our ConstantBuffer structure.

Finally we have a memory address we get from the Map method of our resource heap. We can use this address to copy our constant buffer data to. We must finish copying the data to the upload heap before we execute the command list that uses that memory.

    ID3D12DescriptorHeap* mainDescriptorHeap[frameBufferCount]; // this heap will store the descriptor to our constant buffer
    ID3D12Resource* constantBufferUploadHeap[frameBufferCount]; // this is the memory on the gpu where our constant buffer will be placed.
    ConstantBuffer cbColorMultiplierData; // this is the constant buffer data we will send to the gpu
                                          // (which will be placed in the resource we created above)

    UINT8* cbColorMultiplierGPUAddress[frameBufferCount]; // this is a pointer to the memory location we get when we map our constant buffer

##Adding Descriptor Table Parameter to Root Signature##

The root signature that we create for a PSO must be compatible with the shaders of the PSO. Our Vertex Shader uses a Constant Buffer, bound to register b0, which means we must create a root signature with a parameter that is bound to register b0.

We will create a descriptor table, which will describe a range of descriptors inside our constant buffer descriptor heap.

Let's first create the descriptor range. We do this by filling out a D3D12_DESCRIPTOR_RANGE structure.

    typedef struct .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn859380%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][D3D12_DESCRIPTOR_RANGE] {
        D3D12_DESCRIPTOR_RANGE_TYPE RangeType;
        UINT NumDescriptors;
        UINT BaseShaderRegister;
        UINT RegisterSpace;
        UINT OffsetInDescriptorsFromTableStart;
    } D3D12_DESCRIPTOR_RANGE;

- **RangeType** - *This is a .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn859381(v=vs.85).aspx][D3D12_DESCRIPTOR_RANGE_TYPE] enumeration. This describes whether this is a range of srv's, uav's, cbv's, or samplers*
- **NumDescriptors** - *This is the number of descriptors in the range. In this tutorial we only have one constant buffer, so there is only one descriptor in the range.*
- **BaseShaderRegister** - *This is the first register this range is bound to. Each descriptor should map to one register. We specified the RangeType as D3D12_DESCRIPTOR_RANGE_TYPE_CBV, which means it's the b registers. We only have one descriptor, so we say this range starts at the 0 register, which is register **b0***
- **RegisterSpace** - *This is the register space. We will set this to 0.*
- **OffsetInDescriptorsFromTableStart** - *This is the offset, in descriptors, from the start of the descriptor table to where this range begins. We can specify D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND here to say that this range immediately follows the previous range in the table (or starts at the beginning of the table if it is the first range).*

    // create a descriptor range (descriptor table) and fill it out
    // this is a range of descriptors inside a descriptor heap
    D3D12_DESCRIPTOR_RANGE descriptorTableRanges[1]; // only one range right now
    descriptorTableRanges[0].RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_CBV; // this is a range of constant buffer views (descriptors)
    descriptorTableRanges[0].NumDescriptors = 1; // we only have one constant buffer, so the range is only 1
    descriptorTableRanges[0].BaseShaderRegister = 0; // start index of the shader registers in the range
    descriptorTableRanges[0].RegisterSpace = 0; // space 0. can usually be zero
    descriptorTableRanges[0].OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND; // this places the range right after the previous range in the descriptor table

Now we create a descriptor table by filling out a D3D12_ROOT_DESCRIPTOR_TABLE structure.

    typedef struct .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn859382%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][D3D12_ROOT_DESCRIPTOR_TABLE] {
        UINT NumDescriptorRanges;
        const D3D12_DESCRIPTOR_RANGE *pDescriptorRanges;
    } D3D12_ROOT_DESCRIPTOR_TABLE;

- **NumDescriptorRanges** - *The number of ranges this descriptor table will contain. We can put combinations of srv, cbv, and uav ranges together here. Samplers cannot be combined with the others.*
- **pDescriptorRanges** - *A pointer to an array of ranges.*

Although a range can only contain resource descriptors of the same type (i.e. cbv, uav, srv), a descriptor table can contain an array of ranges of different types.
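As an aside on how several ranges share one table, the following plain C++ sketch models how OffsetInDescriptorsFromTableStart could be resolved when some ranges use the append value. It assumes the append rule described above (an appended range starts right after the previous one). `Range`, `kOffsetAppend`, and `resolveOffsets` are hypothetical names for this illustration; `kOffsetAppend` stands in for D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND, which d3d12.h defines as 0xffffffff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND (0xffffffff in d3d12.h).
constexpr std::uint32_t kOffsetAppend = 0xffffffffu;

// Hypothetical, trimmed-down range: only the fields needed to compute offsets.
struct Range {
    std::uint32_t numDescriptors;
    std::uint32_t offsetFromTableStart; // explicit offset, or kOffsetAppend
};

// Returns the resolved start offset (in descriptors, from the table start) of each range.
std::vector<std::uint32_t> resolveOffsets(const std::vector<Range>& ranges)
{
    std::vector<std::uint32_t> starts;
    std::uint32_t next = 0; // first slot after the previous range
    for (const Range& r : ranges) {
        const std::uint32_t start =
            (r.offsetFromTableStart == kOffsetAppend) ? next : r.offsetFromTableStart;
        starts.push_back(start);
        next = start + r.numDescriptors;
    }
    return starts;
}
```

In this tutorial there is only one range with one descriptor, so it resolves to offset 0, the first descriptor in the heap the table points at.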
    // create a descriptor table
    D3D12_ROOT_DESCRIPTOR_TABLE descriptorTable;
    descriptorTable.NumDescriptorRanges = _countof(descriptorTableRanges); // we only have one range
    descriptorTable.pDescriptorRanges = &descriptorTableRanges[0]; // the pointer to the beginning of our ranges array

Now we create a root parameter. We do this by filling out a D3D12_ROOT_PARAMETER structure.

    typedef struct .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn879477%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][D3D12_ROOT_PARAMETER] {
        D3D12_ROOT_PARAMETER_TYPE ParameterType;
        union {
            D3D12_ROOT_DESCRIPTOR_TABLE DescriptorTable;
            D3D12_ROOT_CONSTANTS Constants;
            D3D12_ROOT_DESCRIPTOR Descriptor;
        };
        D3D12_SHADER_VISIBILITY ShaderVisibility;
    } D3D12_ROOT_PARAMETER;

- **ParameterType** - *This is the type of the parameter, defined by the enumeration .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn879478(v=vs.85).aspx][D3D12_ROOT_PARAMETER_TYPE]. We are creating a descriptor table root parameter, so we use the D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE enumeration*
- **DescriptorTable** - *Only fill this out if D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE is specified as the ParameterType. This is a .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn859382(v=vs.85).aspx][D3D12_ROOT_DESCRIPTOR_TABLE] structure.*
- **Constants** - *Only fill this out if D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS is specified as the ParameterType. This is a .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn879475%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][D3D12_ROOT_CONSTANTS] structure*
- **Descriptor** - *Only fill this out if one of the remaining D3D12_ROOT_PARAMETER_TYPE values (the root CBV, SRV, or UAV types) is specified as the ParameterType. This is a .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn879476(v=vs.85).aspx][D3D12_ROOT_DESCRIPTOR]*
- **ShaderVisibility** - *This is a .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn879482(v=vs.85).aspx][D3D12_SHADER_VISIBILITY] enumeration. This parameter describes which shaders can access this parameter. You can specify D3D12_SHADER_VISIBILITY_ALL to allow all shaders access, or name one single stage. These values are not bit flags, so they cannot be combined with "OR" (|); if more than one stage (but not all of them) needs the parameter, use D3D12_SHADER_VISIBILITY_ALL or add a separate root parameter for each stage. Only give access to the shaders that use the parameter, which allows the GPU to optimize it. Only our vertex shader has access to this constant buffer at the moment, so we specify D3D12_SHADER_VISIBILITY_VERTEX*

    // create a root parameter and fill it out
    D3D12_ROOT_PARAMETER rootParameters[1]; // only one parameter right now
    rootParameters[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE; // this is a descriptor table
    rootParameters[0].DescriptorTable = descriptorTable; // this is our descriptor table for this root parameter
    rootParameters[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX; // our vertex shader will be the only shader accessing this parameter for now

Finally we fill out our root signature structure. We must provide a pointer to our array of D3D12_ROOT_PARAMETER. Notice how I've also added flags that deny root signature access to the shader stages that don't need it. Only the vertex shader needs the root signature at the moment.
    CD3DX12_ROOT_SIGNATURE_DESC rootSignatureDesc;
    rootSignatureDesc.Init(_countof(rootParameters), // we have 1 root parameter
        rootParameters, // a pointer to the beginning of our root parameters array
        0,
        nullptr,
        D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT | // we can deny shader stages here for better performance
        D3D12_ROOT_SIGNATURE_FLAG_DENY_HULL_SHADER_ROOT_ACCESS |
        D3D12_ROOT_SIGNATURE_FLAG_DENY_DOMAIN_SHADER_ROOT_ACCESS |
        D3D12_ROOT_SIGNATURE_FLAG_DENY_GEOMETRY_SHADER_ROOT_ACCESS |
        D3D12_ROOT_SIGNATURE_FLAG_DENY_PIXEL_SHADER_ROOT_ACCESS);

##New Vertex List##

We have removed the second quad from the last tutorial.

    // a quad
    Vertex vList[] = {
        { -0.5f,  0.5f, 0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
        {  0.5f, -0.5f, 0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
        { -0.5f, -0.5f, 0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
        {  0.5f,  0.5f, 0.5f, 0.0f, 1.0f, 0.0f, 1.0f }
    };

##Constant Buffer Descriptor Heap##

We must create a descriptor heap to store our constant buffer descriptor. We can actually create one descriptor heap for all cbv, uav, and srv descriptors, so in future tutorials we will add descriptors for srv's to this descriptor heap.

We start by filling out a D3D12_DESCRIPTOR_HEAP_DESC. We have discussed this structure in a previous tutorial, so I will not get into the details here. You will notice we have set the type of this descriptor heap to D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV, so that we can use it to store our constant buffer descriptor.

    for (int i = 0; i < frameBufferCount; ++i)
    {
        D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
        heapDesc.NumDescriptors = 1;
        heapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
        heapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
        hr = device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&mainDescriptorHeap[i]));
        if (FAILED(hr))
        {
            Running = false;
        }
    }

##Create the Constant Buffer Resource Heap##

We will create an upload heap to hold our constant buffer.
Since we will be updating this constant buffer frequently (at least once every frame), there is no reason to create a default heap to copy the upload heap to. The constant buffer will be uploaded to the GPU every frame anyway, so we just keep it in an upload heap.

We create a buffer of size 64KB. This has to do with alignment requirements. Single-texture and buffer resources are aligned to 64KB, so resource heaps are allocated in multiples of 64KB. Even though our constant buffer is only 16 bytes (an array of 4 floats), we allocate 64KB. If our constant buffer were 65KB, it would need 128KB. Multi-sampled texture resources must be 4MB aligned.

This resource heap will be read by the shaders, so we set the starting state to D3D12_RESOURCE_STATE_GENERIC_READ.

    // create the constant buffer resource heap
    // We will update the constant buffer one or more times per frame, so we will use only an upload heap
    // unlike previously we used an upload heap to upload the vertex and index data, and then copied over
    // to a default heap. If you plan to use a resource for more than a couple frames, it is usually more
    // efficient to copy to a default heap where it stays on the gpu. In this case, our constant buffer
    // will be modified and uploaded at least once per frame, so we only use an upload heap

    // create a resource heap, descriptor heap, and pointer to cbv for each frame
    for (int i = 0; i < frameBufferCount; ++i)
    {
        // named locals instead of taking the address of temporaries, which not every compiler accepts
        CD3DX12_HEAP_PROPERTIES uploadHeapProperties(D3D12_HEAP_TYPE_UPLOAD);
        CD3DX12_RESOURCE_DESC bufferDesc = CD3DX12_RESOURCE_DESC::Buffer(1024 * 64);

        hr = device->CreateCommittedResource(
            &uploadHeapProperties, // this heap will be used to upload the constant buffer data
            D3D12_HEAP_FLAG_NONE, // no flags
            &bufferDesc, // size of the resource heap. We request 64KB, the alignment of single-textures and buffers
            D3D12_RESOURCE_STATE_GENERIC_READ, // will be data that is read from so we keep it in the generic read state
            nullptr, // we do not use an optimized clear value for constant buffers
            IID_PPV_ARGS(&constantBufferUploadHeap[i]));
        constantBufferUploadHeap[i]->SetName(L"Constant Buffer Upload Resource Heap");

##Creating the Constant Buffer View##

We will create a constant buffer view, which describes the constant buffer and contains a pointer to the memory where the constant buffer data resides. We do this by filling out a D3D12_CONSTANT_BUFFER_VIEW_DESC structure.

We can get a pointer to the GPU memory by calling the GetGPUVirtualAddress method of our constant buffer resource heap.

Constant buffers must be 256 byte aligned, which is different from the alignment requirement for a resource heap. Constant buffer reads must be 256 byte aligned from the start of a resource heap.

For the SizeInBytes field, we take the size of our constant buffer and round it up to the next multiple of 256 bytes.

        D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc = {};
        cbvDesc.BufferLocation = constantBufferUploadHeap[i]->GetGPUVirtualAddress();
        cbvDesc.SizeInBytes = (sizeof(ConstantBuffer) + 255) & ~255; // CB size is required to be 256-byte aligned.
        device->CreateConstantBufferView(&cbvDesc, mainDescriptorHeap[i]->GetCPUDescriptorHandleForHeapStart());

##Clear the Constant Buffer Data##

Here we simply want to zero out all the memory inside the constant buffer data to start off with.

        ZeroMemory(&cbColorMultiplierData, sizeof(cbColorMultiplierData));

##Mapping the Constant Buffer##

The last thing we need to do with our constant buffer during initialization is map it. First we create a range. This range is the area of memory within the constant buffer the CPU can access. We can set begin to be equal to or larger than end, which means the CPU will not read from the constant buffer.

Next we map our constant buffer resource. We will get a pointer to a chunk of memory that the GPU will access and upload when a command list uses it. We have to make sure that we are finished modifying this mapped address on the CPU by the time we execute a command list that will use this resource.

It is ok to keep a resource mapped for as long as you need it. We just need to make sure we do not access the mapped area after we have released the resource. We will keep the constant buffer resource mapped throughout our entire application.

Once we have our resource mapped, we can copy data to it using memcpy. Every time we update the value, we will memcpy our entire ConstantBuffer instance to the address we got from Map, so that when we execute our command list, that chunk of data is what the b0 register refers to and our vertex shader has access to the new data.

        CD3DX12_RANGE readRange(0, 0); // We do not intend to read from this resource on the CPU. (End is less than or equal to begin)
        hr = constantBufferUploadHeap[i]->Map(0, &readRange, reinterpret_cast<void**>(&cbColorMultiplierGPUAddress[i]));
        memcpy(cbColorMultiplierGPUAddress[i], &cbColorMultiplierData, sizeof(cbColorMultiplierData));
    }

##The Update Function##

Here we have our update function. We finally have something in it!
This is where we update our game logic, which in this tutorial involves updating the color multiplier and copying our ConstantBuffer instance data to the mapped constant buffer resource.

    void Update()
    {
        // update app logic, such as moving the camera or figuring out what objects are in view
        static float rIncrement = 0.00002f;
        static float gIncrement = 0.00006f;
        static float bIncrement = 0.00009f;

        cbColorMultiplierData.colorMultiplier.x += rIncrement;
        cbColorMultiplierData.colorMultiplier.y += gIncrement;
        cbColorMultiplierData.colorMultiplier.z += bIncrement;

        if (cbColorMultiplierData.colorMultiplier.x >= 1.0 || cbColorMultiplierData.colorMultiplier.x <= 0.0)
        {
            cbColorMultiplierData.colorMultiplier.x = cbColorMultiplierData.colorMultiplier.x >= 1.0 ? 1.0 : 0.0;
            rIncrement = -rIncrement;
        }
        if (cbColorMultiplierData.colorMultiplier.y >= 1.0 || cbColorMultiplierData.colorMultiplier.y <= 0.0)
        {
            cbColorMultiplierData.colorMultiplier.y = cbColorMultiplierData.colorMultiplier.y >= 1.0 ? 1.0 : 0.0;
            gIncrement = -gIncrement;
        }
        if (cbColorMultiplierData.colorMultiplier.z >= 1.0 || cbColorMultiplierData.colorMultiplier.z <= 0.0)
        {
            cbColorMultiplierData.colorMultiplier.z = cbColorMultiplierData.colorMultiplier.z >= 1.0 ? 1.0 : 0.0;
            bIncrement = -bIncrement;
        }

        // copy our ConstantBuffer instance to the mapped constant buffer resource
        memcpy(cbColorMultiplierGPUAddress[frameIndex], &cbColorMultiplierData, sizeof(cbColorMultiplierData));
    }

##Setting the Descriptor Heap and Root Descriptor Table##

First we create an array of descriptor heaps. Then we set the pipeline's descriptor heaps to our mainDescriptorHeap for the current frame.

Once we have our descriptor heap set, we need to set root parameter 0 (the descriptor table) to the start of that descriptor heap.
    // set constant buffer descriptor heap
    ID3D12DescriptorHeap* descriptorHeaps[] = { mainDescriptorHeap[frameIndex] };
    commandList->SetDescriptorHeaps(_countof(descriptorHeaps), descriptorHeaps);

    // set the root descriptor table 0 to the constant buffer descriptor heap
    commandList->SetGraphicsRootDescriptorTable(0, mainDescriptorHeap[frameIndex]->GetGPUDescriptorHandleForHeapStart());

##Clean Up##

Finally we release our resources.

    for (int i = 0; i < frameBufferCount; ++i)
    {
        SAFE_RELEASE(mainDescriptorHeap[i]);
        SAFE_RELEASE(constantBufferUploadHeap[i]);
    }

##New Vertex Shader##

We have added a constant buffer to our vertex shader. You will see we have bound this constant buffer to register b0. We have to make sure that the root signature we set is compatible with the shaders we have set in our PSO.

We will use the variable colorMultiplier from the ConstantBuffer to get our new vertex color, by multiplying it with the original vertex color.

    struct VS_INPUT
    {
        float3 pos : POSITION;
        float4 color: COLOR;
    };

    struct VS_OUTPUT
    {
        float4 pos: SV_POSITION;
        float4 color: COLOR;
    };

    cbuffer ConstantBuffer : register(b0)
    {
        float4 colorMultiplier;
    };

    VS_OUTPUT main(VS_INPUT input)
    {
        VS_OUTPUT output;
        output.pos = float4(input.pos, 1.0f);
        output.color = input.color * colorMultiplier;
        return output;
    }

####Source Code####

##VertexShader.hlsl##

    struct VS_INPUT
    {
        float3 pos : POSITION;
        float4 color: COLOR;
    };

    struct VS_OUTPUT
    {
        float4 pos: SV_POSITION;
        float4 color: COLOR;
    };

    cbuffer ConstantBuffer : register(b0)
    {
        float4 colorMultiplier;
    };

    VS_OUTPUT main(VS_INPUT input)
    {
        VS_OUTPUT output;
        output.pos = float4(input.pos, 1.0f);
        output.color = input.color * colorMultiplier;
        return output;
    }

##PixelShader.hlsl##

    struct VS_OUTPUT
    {
        float4 pos: SV_POSITION;
        float4 color: COLOR;
    };

    float4 main(VS_OUTPUT input) : SV_TARGET
    {
        // return interpolated color
        return input.color;
    }

##stdafx.h##

    #pragma once

    #ifndef WIN32_LEAN_AND_MEAN
    #define WIN32_LEAN_AND_MEAN // Exclude rarely-used stuff from
    // Windows headers.
    #endif

    #include <windows.h>
    #include <d3d12.h>
    #include <dxgi1_4.h>
    #include <D3Dcompiler.h>
    #include <DirectXMath.h>
    #include "d3dx12.h"
    #include <string>

    // this will only call release if an object exists (prevents exceptions calling release on nonexistent objects)
    #define SAFE_RELEASE(p) { if ( (p) ) { (p)->Release(); (p) = 0; } }

    using namespace DirectX; // we will be using the directxmath library

    // Handle to the window
    HWND hwnd = NULL;

    // name of the window (not the title)
    LPCTSTR WindowName = L"BzTutsApp";

    // title of the window
    LPCTSTR WindowTitle = L"Bz Window";

    // width and height of the window
    int Width = 800;
    int Height = 600;

    // is window full screen?
    bool FullScreen = false;

    // we will exit the program when this becomes false
    bool Running = true;

    // create a window
    bool InitializeWindow(HINSTANCE hInstance, int ShowWnd, bool fullscreen);

    // main application loop
    void mainloop();

    // callback function for windows messages
    LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);

    // direct3d stuff
    const int frameBufferCount = 3; // number of buffers we want, 2 for double buffering, 3 for triple buffering

    ID3D12Device* device; // direct3d device

    IDXGISwapChain3* swapChain; // swapchain used to switch between render targets

    ID3D12CommandQueue* commandQueue; // container for command lists

    ID3D12DescriptorHeap* rtvDescriptorHeap; // a descriptor heap to hold resources like the render targets

    ID3D12Resource* renderTargets[frameBufferCount]; // number of render targets equal to buffer count

    ID3D12CommandAllocator* commandAllocator[frameBufferCount]; // we want enough allocators for each buffer * number of threads (we only have one thread)

    ID3D12GraphicsCommandList* commandList; // a command list we can record commands into, then execute them to render the frame

    ID3D12Fence* fence[frameBufferCount]; // an object that is locked while our command list is being executed by the gpu. We need as many
                                          // as we have allocators (more if we want to know when the gpu is finished with an asset)

    HANDLE fenceEvent; // a handle to an event when our fence is unlocked by the gpu

    UINT64 fenceValue[frameBufferCount]; // this value is incremented each frame. each fence will have its own value

    int frameIndex; // current rtv we are on

    int rtvDescriptorSize; // size of the rtv descriptor on the device (all front and back buffers will be the same size)

    // function declarations
    bool InitD3D(); // initializes direct3d 12

    void Update(); // update the game logic

    void UpdatePipeline(); // update the direct3d pipeline (update command lists)

    void Render(); // execute the command list

    void Cleanup(); // release com objects and clean up memory

    void WaitForPreviousFrame(); // wait until gpu is finished with command list

    ID3D12PipelineState* pipelineStateObject; // pso containing a pipeline state

    ID3D12RootSignature* rootSignature; // root signature defines data shaders will access

    D3D12_VIEWPORT viewport; // area that output from rasterizer will be stretched to.

    D3D12_RECT scissorRect; // the area to draw in. pixels outside that area will not be drawn onto

    ID3D12Resource* vertexBuffer; // a default buffer in GPU memory that we will load vertex data for our triangle into
    ID3D12Resource* indexBuffer; // a default buffer in GPU memory that we will load index data for our triangle into

    D3D12_VERTEX_BUFFER_VIEW vertexBufferView; // a structure containing a pointer to the vertex data in gpu memory
                                               // the total size of the buffer, and the size of each element (vertex)

    D3D12_INDEX_BUFFER_VIEW indexBufferView; // a structure holding information about the index buffer

    ID3D12Resource* depthStencilBuffer; // This is the memory for our depth buffer. it will also be used for a stencil buffer in a later tutorial
    ID3D12DescriptorHeap* dsDescriptorHeap; // This is a heap for our depth/stencil buffer descriptor

    // this is the structure of our constant buffer.
struct ConstantBuffer {
    XMFLOAT4 colorMultiplier;
};

ID3D12DescriptorHeap* mainDescriptorHeap[frameBufferCount]; // this heap will store the descriptor to our constant buffer
ID3D12Resource* constantBufferUploadHeap[frameBufferCount]; // this is the memory on the gpu where our constant buffer will be placed.

ConstantBuffer cbColorMultiplierData; // this is the constant buffer data we will send to the gpu
                                      // (which will be placed in the resource we created above)

UINT8* cbColorMultiplierGPUAddress[frameBufferCount]; // this is a pointer to the memory location we get when we map our constant buffer

##main.cpp##

#include "stdafx.h"

struct Vertex {
    // the last color component is alpha, so it must come from "a" (not "z")
    Vertex(float x, float y, float z, float r, float g, float b, float a) : pos(x, y, z), color(r, g, b, a) {}
    XMFLOAT3 pos;
    XMFLOAT4 color;
};

int WINAPI WinMain(HINSTANCE hInstance, //Main windows function
    HINSTANCE hPrevInstance,
    LPSTR lpCmdLine,
    int nShowCmd)
{
    // create the window
    if (!InitializeWindow(hInstance, nShowCmd, FullScreen))
    {
        MessageBox(0, L"Window Initialization - Failed", L"Error", MB_OK);
        return 1;
    }

    // initialize direct3d
    if (!InitD3D())
    {
        MessageBox(0, L"Failed to initialize direct3d 12", L"Error", MB_OK);
        Cleanup();
        return 1;
    }

    // start the main loop
    mainloop();

    // we want to wait for the gpu to finish executing the command list before we start releasing everything
    WaitForPreviousFrame();

    // close the fence event
    CloseHandle(fenceEvent);

    // clean up everything
    Cleanup();

    return 0;
}

// create and show the window
bool InitializeWindow(HINSTANCE hInstance, int ShowWnd, bool fullscreen)
{
    if (fullscreen)
    {
        HMONITOR hmon = MonitorFromWindow(hwnd, MONITOR_DEFAULTTONEAREST);
        MONITORINFO mi = { sizeof(mi) };
        GetMonitorInfo(hmon, &mi);

        Width = mi.rcMonitor.right - mi.rcMonitor.left;
        Height = mi.rcMonitor.bottom - mi.rcMonitor.top;
    }

    WNDCLASSEX wc;

    wc.cbSize = sizeof(WNDCLASSEX);
    wc.style = CS_HREDRAW | CS_VREDRAW;
    wc.lpfnWndProc = WndProc;
    wc.cbClsExtra = NULL;
    wc.cbWndExtra = NULL;
    wc.hInstance = hInstance;
    wc.hIcon = LoadIcon(NULL, IDI_APPLICATION);
    wc.hCursor = LoadCursor(NULL, IDC_ARROW);
    wc.hbrBackground = (HBRUSH)(COLOR_WINDOW + 2);
    wc.lpszMenuName = NULL;
    wc.lpszClassName = WindowName;
    wc.hIconSm = LoadIcon(NULL, IDI_APPLICATION);

    if (!RegisterClassEx(&wc))
    {
        MessageBox(NULL, L"Error registering class", L"Error", MB_OK | MB_ICONERROR);
        return false;
    }

    hwnd = CreateWindowEx(NULL,
        WindowName,
        WindowTitle,
        WS_OVERLAPPEDWINDOW,
        CW_USEDEFAULT, CW_USEDEFAULT,
        Width, Height,
        NULL,
        NULL,
        hInstance,
        NULL);

    if (!hwnd)
    {
        MessageBox(NULL, L"Error creating window", L"Error", MB_OK | MB_ICONERROR);
        return false;
    }

    if (fullscreen)
    {
        SetWindowLong(hwnd, GWL_STYLE, 0);
    }

    ShowWindow(hwnd, ShowWnd);
    UpdateWindow(hwnd);

    return true;
}

void mainloop() {
    MSG msg;
    ZeroMemory(&msg, sizeof(MSG));

    while (Running)
    {
        if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
        {
            if (msg.message == WM_QUIT)
                break;

            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
        else {
            // run game code
            Update(); // update the game logic
            Render(); // execute the command queue (rendering the scene is the result of the gpu executing the command lists)
        }
    }
}

LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg)
    {
    case WM_KEYDOWN:
        if (wParam == VK_ESCAPE) {
            if (MessageBox(0, L"Are you sure you want to exit?", L"Really?", MB_YESNO | MB_ICONQUESTION) == IDYES)
            {
                Running = false;
                DestroyWindow(hwnd);
            }
        }
        return 0;

    case WM_DESTROY: // x button on top right corner of window was pressed
        Running = false;
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProc(hwnd, msg, wParam, lParam);
}

bool InitD3D()
{
    HRESULT hr;

    // -- Create the Device -- //

    IDXGIFactory4* dxgiFactory;
    hr = CreateDXGIFactory1(IID_PPV_ARGS(&dxgiFactory));
    if (FAILED(hr))
    {
        return false;
    }

    IDXGIAdapter1* adapter; // adapters are the graphics card (this includes the embedded graphics on the motherboard)

    int adapterIndex = 0; // we'll start looking for directx 12 compatible graphics devices starting at index 0

    bool adapterFound = false; // set this to true when a good one was found

    // find first hardware gpu that supports d3d 12
    while (dxgiFactory->EnumAdapters1(adapterIndex, &adapter) != DXGI_ERROR_NOT_FOUND)
    {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);

        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
        {
            // we dont want a software device
            adapterIndex++; // move on to the next adapter, otherwise we would loop on this one forever
            continue;
        }

        // we want a device that is compatible with direct3d 12 (feature level 11 or higher)
        hr = D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_11_0, __uuidof(ID3D12Device), nullptr);
        if (SUCCEEDED(hr))
        {
            adapterFound = true;
            break;
        }

        adapterIndex++;
    }

    if (!adapterFound)
    {
        return false;
    }

    // Create the device
    hr = D3D12CreateDevice(
        adapter,
        D3D_FEATURE_LEVEL_11_0,
        IID_PPV_ARGS(&device)
        );
    if (FAILED(hr))
    {
        return false;
    }

    // -- Create a direct command queue -- //

    D3D12_COMMAND_QUEUE_DESC cqDesc = {};
    cqDesc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
    cqDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT; // direct means the gpu can directly execute this command queue

    hr = device->CreateCommandQueue(&cqDesc, IID_PPV_ARGS(&commandQueue)); // create the command queue
    if (FAILED(hr))
    {
        return false;
    }

    // -- Create the Swap Chain (double/triple buffering) -- //

    DXGI_MODE_DESC backBufferDesc = {}; // this is to describe our display mode
    backBufferDesc.Width = Width; // buffer width
    backBufferDesc.Height = Height; // buffer height
    backBufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM; // format of the buffer (rgba 32 bits, 8 bits for each channel)

    // describe our multi-sampling. We are not multi-sampling, so we set the count to 1 (we need at least one sample of course)
    DXGI_SAMPLE_DESC sampleDesc = {};
    sampleDesc.Count = 1; // multisample count (no multisampling, so we just put 1, since we still need 1 sample)

    // Describe and create the swap chain.
    DXGI_SWAP_CHAIN_DESC swapChainDesc = {};
    swapChainDesc.BufferCount = frameBufferCount; // number of buffers we have
    swapChainDesc.BufferDesc = backBufferDesc; // our back buffer description
    swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT; // this says the pipeline will render to this swap chain
    swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD; // dxgi will discard the buffer (data) after we call present
    swapChainDesc.OutputWindow = hwnd; // handle to our window
    swapChainDesc.SampleDesc = sampleDesc; // our multi-sampling description
    swapChainDesc.Windowed = !FullScreen; // set to true, then if in fullscreen must call SetFullScreenState with true for full screen to get uncapped fps

    IDXGISwapChain* tempSwapChain;

    dxgiFactory->CreateSwapChain(
        commandQueue, // the queue will be flushed once the swap chain is created
        &swapChainDesc, // give it the swap chain description we created above
        &tempSwapChain // store the created swap chain in a temp IDXGISwapChain interface
        );

    swapChain = static_cast<IDXGISwapChain3*>(tempSwapChain);

    frameIndex = swapChain->GetCurrentBackBufferIndex();

    // -- Create the Back Buffers (render target views) Descriptor Heap -- //

    // describe an rtv descriptor heap and create
    D3D12_DESCRIPTOR_HEAP_DESC rtvHeapDesc = {};
    rtvHeapDesc.NumDescriptors = frameBufferCount; // number of descriptors for this heap.
    rtvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV; // this heap is a render target view heap

    // This heap will not be directly referenced by the shaders (not shader visible), as this will store the output from the pipeline
    // otherwise we would set the heap's flag to D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE
    rtvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
    hr = device->CreateDescriptorHeap(&rtvHeapDesc, IID_PPV_ARGS(&rtvDescriptorHeap));
    if (FAILED(hr))
    {
        return false;
    }

    // get the size of a descriptor in this heap (this is a rtv heap, so only rtv descriptors should be stored in it.
    // descriptor sizes may vary from device to device, which is why there is no set size and we must ask the
    // device to give us the size. we will use this size to increment a descriptor handle offset
    rtvDescriptorSize = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_RTV);

    // get a handle to the first descriptor in the descriptor heap. a handle is basically a pointer,
    // but we cannot literally use it like a c++ pointer.
    CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(rtvDescriptorHeap->GetCPUDescriptorHandleForHeapStart());

    // Create a RTV for each buffer (double buffering is two buffers, triple buffering is 3).
    for (int i = 0; i < frameBufferCount; i++)
    {
        // first we get the n'th buffer in the swap chain and store it in the n'th
        // position of our ID3D12Resource array
        hr = swapChain->GetBuffer(i, IID_PPV_ARGS(&renderTargets[i]));
        if (FAILED(hr))
        {
            return false;
        }

        // then we "create" a render target view which binds the swap chain buffer (ID3D12Resource[n]) to the rtv handle
        device->CreateRenderTargetView(renderTargets[i], nullptr, rtvHandle);

        // we increment the rtv handle by the rtv descriptor size we got above
        rtvHandle.Offset(1, rtvDescriptorSize);
    }

    // -- Create the Command Allocators -- //

    for (int i = 0; i < frameBufferCount; i++)
    {
        hr = device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&commandAllocator[i]));
        if (FAILED(hr))
        {
            return false;
        }
    }

    // -- Create a Command List -- //

    // create the command list with the first allocator
    hr = device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, commandAllocator[frameIndex], NULL, IID_PPV_ARGS(&commandList));
    if (FAILED(hr))
    {
        return false;
    }

    // -- Create a Fence & Fence Event -- //

    // create the fences
    for (int i = 0; i < frameBufferCount; i++)
    {
        hr = device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence[i]));
        if (FAILED(hr))
        {
            return false;
        }
        fenceValue[i] = 0; // set the initial fence value to 0
    }

    // create a handle to a fence event
    fenceEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);
    if (fenceEvent == nullptr)
    {
        return false;
    }

    // create root signature

    // create a descriptor range (descriptor table) and fill it out
    // this is a range of descriptors inside a descriptor heap
    D3D12_DESCRIPTOR_RANGE descriptorTableRanges[1]; // only one range right now
    descriptorTableRanges[0].RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_CBV; // this is a range of constant buffer views (descriptors)
    descriptorTableRanges[0].NumDescriptors = 1; // we only have one constant buffer, so the range is only 1
    descriptorTableRanges[0].BaseShaderRegister = 0; // start index of the shader registers in the range
    descriptorTableRanges[0].RegisterSpace = 0; // space 0. can usually be zero
    descriptorTableRanges[0].OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND; // this appends the range to the end of the root signature descriptor tables

    // create a descriptor table
    D3D12_ROOT_DESCRIPTOR_TABLE descriptorTable;
    descriptorTable.NumDescriptorRanges = _countof(descriptorTableRanges); // we only have one range
    descriptorTable.pDescriptorRanges = &descriptorTableRanges[0]; // the pointer to the beginning of our ranges array

    // create a root parameter and fill it out
    D3D12_ROOT_PARAMETER rootParameters[1]; // only one parameter right now
    rootParameters[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE; // this is a descriptor table
    rootParameters[0].DescriptorTable = descriptorTable; // this is our descriptor table for this root parameter
    rootParameters[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX; // our vertex shader will be the only shader accessing this parameter for now

    CD3DX12_ROOT_SIGNATURE_DESC rootSignatureDesc;
    rootSignatureDesc.Init(_countof(rootParameters), // we have 1 root parameter
        rootParameters, // a pointer to the beginning of our root parameters array
        0,
        nullptr,
        D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT | // we can deny shader stages here for better performance
        D3D12_ROOT_SIGNATURE_FLAG_DENY_HULL_SHADER_ROOT_ACCESS |
        D3D12_ROOT_SIGNATURE_FLAG_DENY_DOMAIN_SHADER_ROOT_ACCESS |
        D3D12_ROOT_SIGNATURE_FLAG_DENY_GEOMETRY_SHADER_ROOT_ACCESS |
        D3D12_ROOT_SIGNATURE_FLAG_DENY_PIXEL_SHADER_ROOT_ACCESS);

    ID3DBlob* signature;
    hr = D3D12SerializeRootSignature(&rootSignatureDesc, D3D_ROOT_SIGNATURE_VERSION_1, &signature, nullptr);
    if (FAILED(hr))
    {
        return false;
    }

    hr = device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&rootSignature));
    if (FAILED(hr))
    {
        return false;
    }

    // create vertex and pixel shaders

    // when debugging, we can compile the shader files at runtime.
    // but for release versions, we can compile the hlsl shaders
    // with fxc.exe to create .cso files, which contain the shader
    // bytecode. We can load the .cso files at runtime to get the
    // shader bytecode, which of course is faster than compiling
    // them at runtime

    // compile vertex shader
    ID3DBlob* vertexShader; // d3d blob for holding vertex shader bytecode
    ID3DBlob* errorBuff = nullptr; // a buffer holding the error data if any
    hr = D3DCompileFromFile(L"VertexShader.hlsl",
        nullptr,
        nullptr,
        "main",
        "vs_5_0",
        D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION,
        0,
        &vertexShader,
        &errorBuff);
    if (FAILED(hr))
    {
        // the error blob can be null (for example when the file was not found), so check it first
        if (errorBuff) OutputDebugStringA((char*)errorBuff->GetBufferPointer());
        return false;
    }

    // fill out a shader bytecode structure, which is basically just a pointer
    // to the shader bytecode and the size of the shader bytecode
    D3D12_SHADER_BYTECODE vertexShaderBytecode = {};
    vertexShaderBytecode.BytecodeLength = vertexShader->GetBufferSize();
    vertexShaderBytecode.pShaderBytecode = vertexShader->GetBufferPointer();

    // compile pixel shader
    ID3DBlob* pixelShader;
    hr = D3DCompileFromFile(L"PixelShader.hlsl",
        nullptr,
        nullptr,
        "main",
        "ps_5_0",
        D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION,
        0,
        &pixelShader,
        &errorBuff);
    if (FAILED(hr))
    {
        if (errorBuff) OutputDebugStringA((char*)errorBuff->GetBufferPointer());
        return false;
    }

    // fill out shader bytecode structure for pixel shader
    D3D12_SHADER_BYTECODE pixelShaderBytecode = {};
    pixelShaderBytecode.BytecodeLength = pixelShader->GetBufferSize();
    pixelShaderBytecode.pShaderBytecode = pixelShader->GetBufferPointer();

    // create input layout

    // The input layout is used by the Input Assembler so that it knows
    // how to read the vertex data bound to it.

    D3D12_INPUT_ELEMENT_DESC inputLayout[] =
    {
        { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
        { "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }
    };

    // fill out an input layout description structure
    D3D12_INPUT_LAYOUT_DESC inputLayoutDesc = {};

    // we can get the number of elements in an array by "sizeof(array) / sizeof(arrayElementType)"
    inputLayoutDesc.NumElements = sizeof(inputLayout) / sizeof(D3D12_INPUT_ELEMENT_DESC);
    inputLayoutDesc.pInputElementDescs = inputLayout;

    // create a pipeline state object (PSO)

    // In a real application, you will have many pso's. for each different shader
    // or different combinations of shaders, different blend states or different rasterizer states,
    // different topology types (point, line, triangle, patch), or a different number
    // of render targets you will need a pso

    // VS is the only required shader for a pso. You might be wondering when a case would be where
    // you only set the VS. It's possible that you have a pso that only outputs data with the stream
    // output, and not on a render target, which means you would not need anything after the stream
    // output.

    D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc = {}; // a structure to define a pso
    psoDesc.InputLayout = inputLayoutDesc; // the structure describing our input layout
    psoDesc.pRootSignature = rootSignature; // the root signature that describes the input data this pso needs
    psoDesc.VS = vertexShaderBytecode; // structure describing where to find the vertex shader bytecode and how large it is
    psoDesc.PS = pixelShaderBytecode; // same as VS but for pixel shader
    psoDesc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE; // type of topology we are drawing
    psoDesc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM; // format of the render target
    psoDesc.DSVFormat = DXGI_FORMAT_D32_FLOAT; // format of the depth/stencil buffer (must match the buffer we create below)
    psoDesc.SampleDesc = sampleDesc; // must be the same sample description as the swapchain and depth/stencil buffer
    psoDesc.SampleMask = 0xffffffff; // sample mask has to do with multi-sampling. 0xffffffff means point sampling is done
    psoDesc.RasterizerState = CD3DX12_RASTERIZER_DESC(D3D12_DEFAULT); // a default rasterizer state.
    psoDesc.BlendState = CD3DX12_BLEND_DESC(D3D12_DEFAULT); // a default blend state.
    psoDesc.NumRenderTargets = 1; // we are only binding one render target
    psoDesc.DepthStencilState = CD3DX12_DEPTH_STENCIL_DESC(D3D12_DEFAULT); // a default depth stencil state

    // create the pso
    hr = device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pipelineStateObject));
    if (FAILED(hr))
    {
        return false;
    }

    // Create vertex buffer

    // a quad
    Vertex vList[] = {
        // first quad (closer to camera, blue)
        { -0.5f,  0.5f, 0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
        {  0.5f, -0.5f, 0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
        { -0.5f, -0.5f, 0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
        {  0.5f,  0.5f, 0.5f, 0.0f, 1.0f, 0.0f, 1.0f }
    };

    int vBufferSize = sizeof(vList);

    // create default heap
    // default heap is memory on the GPU. Only the GPU has access to this memory
    // To get data into this heap, we will have to upload the data using
    // an upload heap
    device->CreateCommittedResource(
        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT), // a default heap
        D3D12_HEAP_FLAG_NONE, // no flags
        &CD3DX12_RESOURCE_DESC::Buffer(vBufferSize), // resource description for a buffer
        D3D12_RESOURCE_STATE_COPY_DEST, // we will start this heap in the copy destination state since we will copy data
                                        // from the upload heap to this heap
        nullptr, // optimized clear value must be null for this type of resource. used for render targets and depth/stencil buffers
        IID_PPV_ARGS(&vertexBuffer));

    // we can give resource heaps a name so when we debug with the graphics debugger we know what resource we are looking at
    vertexBuffer->SetName(L"Vertex Buffer Resource Heap");

    // create upload heap
    // upload heaps are used to upload data to the GPU. CPU can write to it, GPU can read from it
    // We will upload the vertex buffer using this heap to the default heap
    ID3D12Resource* vBufferUploadHeap;
    device->CreateCommittedResource(
        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // upload heap
        D3D12_HEAP_FLAG_NONE, // no flags
        &CD3DX12_RESOURCE_DESC::Buffer(vBufferSize), // resource description for a buffer
        D3D12_RESOURCE_STATE_GENERIC_READ, // GPU will read from this buffer and copy its contents to the default heap
        nullptr,
        IID_PPV_ARGS(&vBufferUploadHeap));
    vBufferUploadHeap->SetName(L"Vertex Buffer Upload Resource Heap");

    // store vertex buffer in upload heap
    D3D12_SUBRESOURCE_DATA vertexData = {};
    vertexData.pData = reinterpret_cast<BYTE*>(vList); // pointer to our vertex array
    vertexData.RowPitch = vBufferSize; // size of all our triangle vertex data
    vertexData.SlicePitch = vBufferSize; // also the size of our triangle vertex data

    // we are now creating a command with the command list to copy the data from
    // the upload heap to the default heap
    UpdateSubresources(commandList, vertexBuffer, vBufferUploadHeap, 0, 0, 1, &vertexData);

    // transition the vertex buffer data from copy destination state to vertex buffer state
    commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(vertexBuffer, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER));

    // Create index buffer

    // a quad (2 triangles)
    DWORD iList[] = {
        // first quad (blue)
        0, 1, 2, // first triangle
        0, 3, 1, // second triangle
    };

    int iBufferSize = sizeof(iList);

    // create default heap to hold index buffer
    device->CreateCommittedResource(
        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT), // a default heap
        D3D12_HEAP_FLAG_NONE, // no flags
        &CD3DX12_RESOURCE_DESC::Buffer(iBufferSize), // resource description for a buffer
        D3D12_RESOURCE_STATE_COPY_DEST, // start in the copy destination state
        nullptr, // optimized clear value must be null for this type of resource
        IID_PPV_ARGS(&indexBuffer));

    // we can give resource heaps a name so when we debug with the graphics debugger we know what resource we are looking at
    indexBuffer->SetName(L"Index Buffer Resource Heap");

    // create upload heap to upload index buffer
    ID3D12Resource* iBufferUploadHeap;
    device->CreateCommittedResource(
        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // upload heap
        D3D12_HEAP_FLAG_NONE, // no flags
        &CD3DX12_RESOURCE_DESC::Buffer(iBufferSize), // resource description for a buffer (sized for the index data)
        D3D12_RESOURCE_STATE_GENERIC_READ, // GPU will read from this buffer and copy its contents to the default heap
        nullptr,
        IID_PPV_ARGS(&iBufferUploadHeap));
    iBufferUploadHeap->SetName(L"Index Buffer Upload Resource Heap");

    // store index buffer in upload heap
    D3D12_SUBRESOURCE_DATA indexData = {};
    indexData.pData = reinterpret_cast<BYTE*>(iList); // pointer to our index array
    indexData.RowPitch = iBufferSize; // size of all our index buffer
    indexData.SlicePitch = iBufferSize; // also the size of our index buffer

    // we are now creating a command with the command list to copy the data from
    // the upload heap to the default heap
    UpdateSubresources(commandList, indexBuffer, iBufferUploadHeap, 0, 0, 1, &indexData);

    // transition the index buffer data from copy destination state to index buffer state
    commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(indexBuffer, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_INDEX_BUFFER));

    // Create the depth/stencil buffer

    // create a depth stencil descriptor heap so we can get a pointer to the depth stencil buffer
    D3D12_DESCRIPTOR_HEAP_DESC dsvHeapDesc = {};
    dsvHeapDesc.NumDescriptors = 1;
    dsvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_DSV;
    dsvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
    hr = device->CreateDescriptorHeap(&dsvHeapDesc, IID_PPV_ARGS(&dsDescriptorHeap));
    if (FAILED(hr))
    {
        Running = false;
        return false; // we cannot continue without the heap, it is used right below
    }

    D3D12_DEPTH_STENCIL_VIEW_DESC depthStencilDesc = {};
    depthStencilDesc.Format = DXGI_FORMAT_D32_FLOAT;
    depthStencilDesc.ViewDimension = D3D12_DSV_DIMENSION_TEXTURE2D;
    depthStencilDesc.Flags = D3D12_DSV_FLAG_NONE;

    D3D12_CLEAR_VALUE depthOptimizedClearValue = {};
    depthOptimizedClearValue.Format = DXGI_FORMAT_D32_FLOAT;
    depthOptimizedClearValue.DepthStencil.Depth = 1.0f;
    depthOptimizedClearValue.DepthStencil.Stencil = 0;

    device->CreateCommittedResource(
        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT),
        D3D12_HEAP_FLAG_NONE,
        &CD3DX12_RESOURCE_DESC::Tex2D(DXGI_FORMAT_D32_FLOAT, Width, Height, 1, 0, 1, 0, D3D12_RESOURCE_FLAG_ALLOW_DEPTH_STENCIL),
        D3D12_RESOURCE_STATE_DEPTH_WRITE,
        &depthOptimizedClearValue,
        IID_PPV_ARGS(&depthStencilBuffer)
        );
    dsDescriptorHeap->SetName(L"Depth/Stencil Resource Heap");

    device->CreateDepthStencilView(depthStencilBuffer, &depthStencilDesc, dsDescriptorHeap->GetCPUDescriptorHandleForHeapStart());

    // Create a constant buffer descriptor heap for each frame
    // this is the descriptor heap that will store our constant buffer descriptor
    for (int i = 0; i < frameBufferCount; ++i)
    {
        D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
        heapDesc.NumDescriptors = 1;
        heapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
        heapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
        hr = device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&mainDescriptorHeap[i]));
        if (FAILED(hr))
        {
            Running = false;
            return false; // the heap is used below when we create the constant buffer view
        }
    }

    // create the constant buffer resource heap
    // We will update the constant buffer one or more times per frame, so we will use only an upload heap
    // unlike previously we used an upload heap to upload the vertex and index data, and then copied over
    // to a default heap. If you plan to use a resource for more than a couple frames, it is usually more
    // efficient to copy to a default heap where it stays on the gpu. In this case, our constant buffer
    // will be modified and uploaded at least once per frame, so we only use an upload heap

    // create a resource heap, descriptor heap, and pointer to cbv for each frame
    for (int i = 0; i < frameBufferCount; ++i)
    {
        hr = device->CreateCommittedResource(
            &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // this heap will be used to upload the constant buffer data
            D3D12_HEAP_FLAG_NONE, // no flags
            &CD3DX12_RESOURCE_DESC::Buffer(1024 * 64), // size of the resource heap. Must be a multiple of 64KB for single-textures and constant buffers
            D3D12_RESOURCE_STATE_GENERIC_READ, // will be data that is read from so we keep it in the generic read state
            nullptr, // we do not use an optimized clear value for constant buffers
            IID_PPV_ARGS(&constantBufferUploadHeap[i]));
        constantBufferUploadHeap[i]->SetName(L"Constant Buffer Upload Resource Heap");

        D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc = {};
        cbvDesc.BufferLocation = constantBufferUploadHeap[i]->GetGPUVirtualAddress();
        cbvDesc.SizeInBytes = (sizeof(ConstantBuffer) + 255) & ~255; // CB size is required to be 256-byte aligned.
        device->CreateConstantBufferView(&cbvDesc, mainDescriptorHeap[i]->GetCPUDescriptorHandleForHeapStart());

        ZeroMemory(&cbColorMultiplierData, sizeof(cbColorMultiplierData));

        CD3DX12_RANGE readRange(0, 0); // We do not intend to read from this resource on the CPU. (End is less than or equal to begin)
        hr = constantBufferUploadHeap[i]->Map(0, &readRange, reinterpret_cast<void**>(&cbColorMultiplierGPUAddress[i]));
        memcpy(cbColorMultiplierGPUAddress[i], &cbColorMultiplierData, sizeof(cbColorMultiplierData));
    }

    // Now we execute the command list to upload the initial assets (triangle data)
    commandList->Close();
    ID3D12CommandList* ppCommandLists[] = { commandList };
    commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);

    // increment the fence value now, otherwise the buffer might not be uploaded by the time we start drawing
    fenceValue[frameIndex]++;
    hr = commandQueue->Signal(fence[frameIndex], fenceValue[frameIndex]);
    if (FAILED(hr))
    {
        Running = false;
    }

    // create a vertex buffer view for the triangle. We get the GPU memory address to the vertex pointer using the GetGPUVirtualAddress() method
    vertexBufferView.BufferLocation = vertexBuffer->GetGPUVirtualAddress();
    vertexBufferView.StrideInBytes = sizeof(Vertex);
    vertexBufferView.SizeInBytes = vBufferSize;

    // create an index buffer view for the triangle. We get the GPU memory address to the index data using the GetGPUVirtualAddress() method
    indexBufferView.BufferLocation = indexBuffer->GetGPUVirtualAddress();
    indexBufferView.Format = DXGI_FORMAT_R32_UINT; // 32-bit unsigned integer (this is what a dword is, double word, a word is 2 bytes)
    indexBufferView.SizeInBytes = iBufferSize;

    // Fill out the Viewport
    viewport.TopLeftX = 0;
    viewport.TopLeftY = 0;
    viewport.Width = Width;
    viewport.Height = Height;
    viewport.MinDepth = 0.0f;
    viewport.MaxDepth = 1.0f;

    // Fill out a scissor rect
    scissorRect.left = 0;
    scissorRect.top = 0;
    scissorRect.right = Width;
    scissorRect.bottom = Height;

    return true;
}

void Update()
{
    // update app logic, such as moving the camera or figuring out what objects are in view
    static float rIncrement = 0.00002f;
    static float gIncrement = 0.00006f;
    static float bIncrement = 0.00009f;

    cbColorMultiplierData.colorMultiplier.x += rIncrement;
    cbColorMultiplierData.colorMultiplier.y += gIncrement;
    cbColorMultiplierData.colorMultiplier.z += bIncrement;

    // when a channel reaches either end of the [0, 1] range, clamp it and reverse its direction
    if (cbColorMultiplierData.colorMultiplier.x >= 1.0f || cbColorMultiplierData.colorMultiplier.x <= 0.0f)
    {
        cbColorMultiplierData.colorMultiplier.x = cbColorMultiplierData.colorMultiplier.x >= 1.0f ? 1.0f : 0.0f;
        rIncrement = -rIncrement;
    }
    if (cbColorMultiplierData.colorMultiplier.y >= 1.0f || cbColorMultiplierData.colorMultiplier.y <= 0.0f)
    {
        cbColorMultiplierData.colorMultiplier.y = cbColorMultiplierData.colorMultiplier.y >= 1.0f ? 1.0f : 0.0f;
        gIncrement = -gIncrement;
    }
    if (cbColorMultiplierData.colorMultiplier.z >= 1.0f || cbColorMultiplierData.colorMultiplier.z <= 0.0f)
    {
        cbColorMultiplierData.colorMultiplier.z = cbColorMultiplierData.colorMultiplier.z >= 1.0f ? 1.0f : 0.0f;
        bIncrement = -bIncrement;
    }

    // copy our ConstantBuffer instance to the mapped constant buffer resource
    memcpy(cbColorMultiplierGPUAddress[frameIndex], &cbColorMultiplierData, sizeof(cbColorMultiplierData));
}

void UpdatePipeline()
{
    HRESULT hr;

    // We have to wait for the gpu to finish with the command allocator before we reset it
    WaitForPreviousFrame();

    // we can only reset an allocator once the gpu is done with it
    // resetting an allocator frees the memory that the command list was stored in
    hr = commandAllocator[frameIndex]->Reset();
    if (FAILED(hr))
    {
        Running = false;
    }

    // reset the command list. by resetting the command list we are putting it into
    // a recording state so we can start recording commands into the command allocator.
    // the command allocator that we reference here may have multiple command lists
    // associated with it, but only one can be recording at any time. Make sure
    // that any other command lists associated to this command allocator are in
    // the closed state (not recording).
    // Here we pass an initial pipeline state object as the second parameter.
    // We are drawing geometry in this tutorial, so we pass the pso we created in InitD3D
    // (passing NULL instead would give us an initial default pipeline state)
    hr = commandList->Reset(commandAllocator[frameIndex], pipelineStateObject);
    if (FAILED(hr))
    {
        Running = false;
    }

    // here we start recording commands into the commandList (which all the commands will be stored in the commandAllocator)

    // transition the "frameIndex" render target from the present state to the render target state so the command list draws to it starting from here
    commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(renderTargets[frameIndex], D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_RENDER_TARGET));

    // here we again get the handle to our current render target view so we can set it as the render target in the output merger stage of the pipeline
    CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(rtvDescriptorHeap->GetCPUDescriptorHandleForHeapStart(), frameIndex, rtvDescriptorSize);

    // get a handle to the depth/stencil buffer
    CD3DX12_CPU_DESCRIPTOR_HANDLE dsvHandle(dsDescriptorHeap->GetCPUDescriptorHandleForHeapStart());

    // set the render target for the output merger stage (the output of the pipeline)
    commandList->OMSetRenderTargets(1, &rtvHandle, FALSE, &dsvHandle);

    // Clear the render target by using the ClearRenderTargetView command
    const float clearColor[] = { 0.0f, 0.2f, 0.4f, 1.0f };
    commandList->ClearRenderTargetView(rtvHandle, clearColor, 0, nullptr);

    // clear the depth/stencil buffer
    commandList->ClearDepthStencilView(dsDescriptorHeap->GetCPUDescriptorHandleForHeapStart(), D3D12_CLEAR_FLAG_DEPTH, 1.0f, 0, 0, nullptr);

    // set root signature
    commandList->SetGraphicsRootSignature(rootSignature); // set the root signature

    // set constant buffer descriptor heap
    ID3D12DescriptorHeap* descriptorHeaps[] = { mainDescriptorHeap[frameIndex] };
    commandList->SetDescriptorHeaps(_countof(descriptorHeaps), descriptorHeaps);

    // set the root descriptor table 0 to the constant buffer descriptor heap
    commandList->SetGraphicsRootDescriptorTable(0, mainDescriptorHeap[frameIndex]->GetGPUDescriptorHandleForHeapStart());

    // draw triangle
    commandList->RSSetViewports(1, &viewport); // set the viewports
    commandList->RSSetScissorRects(1, &scissorRect); // set the scissor rects
    commandList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST); // set the primitive topology
    commandList->IASetVertexBuffers(0, 1, &vertexBufferView); // set the vertex buffer (using the vertex buffer view)
    commandList->IASetIndexBuffer(&indexBufferView);
    commandList->DrawIndexedInstanced(6, 1, 0, 0, 0); // draw first quad

    // transition the "frameIndex" render target from the render target state to the present state. If the debug layer is enabled, you will receive a
    // warning if present is called on the render target when it's not in the present state
    commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(renderTargets[frameIndex], D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_PRESENT));

    hr = commandList->Close();
    if (FAILED(hr))
    {
        Running = false;
    }
}

void Render()
{
    HRESULT hr;

    UpdatePipeline(); // update the pipeline by sending commands to the commandqueue

    // create an array of command lists (only one command list here)
    ID3D12CommandList* ppCommandLists[] = { commandList };

    // execute the array of command lists
    commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);

    // this command goes in at the end of our command queue. we will know when our command queue
    // has finished because the fence value will be set to "fenceValue" from the GPU since the command
    // queue is being executed on the GPU
    hr = commandQueue->Signal(fence[frameIndex], fenceValue[frameIndex]);
    if (FAILED(hr))
    {
        Running = false;
    }

    // present the current backbuffer
    hr = swapChain->Present(0, 0);
    if (FAILED(hr))
    {
        Running = false;
    }
}

void Cleanup()
{
    // wait for the gpu to finish all frames
    for (int i = 0; i < frameBufferCount; ++i)
    {
        frameIndex = i;
        WaitForPreviousFrame();
    }

    // get swapchain out of full screen before exiting
    // (GetFullscreenState returns an HRESULT, so we check it with SUCCEEDED and then look at the state it gave us)
    BOOL fs = false;
    if (SUCCEEDED(swapChain->GetFullscreenState(&fs, NULL)) && fs)
        swapChain->SetFullscreenState(false, NULL);

    SAFE_RELEASE(device);
    SAFE_RELEASE(swapChain);
    SAFE_RELEASE(commandQueue);
    SAFE_RELEASE(rtvDescriptorHeap);
    SAFE_RELEASE(commandList);

    for (int i = 0; i < frameBufferCount; ++i)
    {
        SAFE_RELEASE(renderTargets[i]);
        SAFE_RELEASE(commandAllocator[i]);
        SAFE_RELEASE(fence[i]);

        SAFE_RELEASE(mainDescriptorHeap[i]);
        SAFE_RELEASE(constantBufferUploadHeap[i]);
    }

    SAFE_RELEASE(pipelineStateObject);
    SAFE_RELEASE(rootSignature);
    SAFE_RELEASE(vertexBuffer);
    SAFE_RELEASE(indexBuffer);

    SAFE_RELEASE(depthStencilBuffer);
    SAFE_RELEASE(dsDescriptorHeap);
}

void WaitForPreviousFrame()
{
    HRESULT hr;

    // swap the current rtv buffer index so we draw on the correct buffer
    frameIndex = swapChain->GetCurrentBackBufferIndex();

    // if the current fence value is still less than "fenceValue", then we know the GPU has not finished executing
    // the command queue since it has not reached the "commandQueue->Signal(fence, fenceValue)" command
    if (fence[frameIndex]->GetCompletedValue() < fenceValue[frameIndex])
    {
        // we have the fence create an event which is signaled once the fence's current value is "fenceValue"
        hr = fence[frameIndex]->SetEventOnCompletion(fenceValue[frameIndex], fenceEvent);
        if (FAILED(hr))
        {
            Running = false;
        }

        // We will wait until the fence has triggered the event that its current value has reached "fenceValue". once its value
        // has reached "fenceValue", we know the command queue has finished executing
        WaitForSingleObject(fenceEvent, INFINITE);
    }

    // increment fenceValue for next frame
    fenceValue[frameIndex]++;
}
Comments
Excellent work! Please keep them coming
on Apr 16 `16
AllanF
will do~ working on the next one right now, transformations and world view and projection space matrices
on Apr 16 `16
iedoc
So, why do we need Map?
on May 06 `16
Fortis
We need to use Map to get a CPU pointer to the resource's memory so we can write to it
on May 06 `16
iedoc
I believe it is possible to make a constant buffer heap smaller than 64 KB (https://msdn.microsoft.com/en-us/library/windows/desktop/dn903813(v=vs.85).aspx). CD3DX12_RESOURCE_DESC::Buffer(UINT64 width) automatically sets the alignment to 0, which is equivalent to setting it to 64 KB, so the resource is still 64 KB aligned. By specifying width < 64 KB we are just saying we won't be using the remaining space. So a width of k*constantBufferSizeAligned should be fine, where k is the number of constant buffers we want to keep in the heap.
on May 15 `16
lightxbulb
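The sizing arithmetic in the comment above can be sketched in plain C++. This is only an illustration under stated assumptions: `ExampleConstants`, `BufferWidthFor`, and `OffsetOf` are hypothetical names, and the two constants are written out by hand to mirror the values of `D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT` (256) and `D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT` (64 KB) so the snippet compiles without the Windows SDK.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT (256 bytes).
constexpr std::uint64_t kCbvAlignment = 256;
// Mirrors D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT (64 KB).
constexpr std::uint64_t kResourceAlignment = 64 * 1024;

// Round size up to the next multiple of alignment (alignment must be a power of two).
inline std::uint64_t AlignUp(std::uint64_t size, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical constant buffer layout for illustration: one float4 (16 bytes).
struct ExampleConstants {
    float colorMultiplier[4];
};

// Width to request for a buffer that holds `count` constant buffers back to back
// (the k*constantBufferSizeAligned from the comment above).
inline std::uint64_t BufferWidthFor(std::uint64_t count) {
    return count * AlignUp(sizeof(ExampleConstants), kCbvAlignment);
}

// Byte offset of constant buffer `index` from the start of the resource.
inline std::uint64_t OffsetOf(std::uint64_t index) {
    return index * AlignUp(sizeof(ExampleConstants), kCbvAlignment);
}
```

With this layout, three constant buffers need a width of only 768 bytes, far below `kResourceAlignment`, and each one starts on a 256-byte boundary.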
A thing I don't get is why do we cast iList to BYTE* when pData is expecting void*.
on May 15 `16
lightxbulb
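On the cast question above: as far as I can tell, the cast is not required, because `D3D12_SUBRESOURCE_DATA::pData` is declared as `const void*` and any object pointer converts to that implicitly. The sketch below uses a hypothetical stand-in struct (`SubresourceDataModel`) instead of the real D3D12 type so it compiles anywhere, and it assumes the line in question assigns the index list through a byte-pointer cast as the comment describes.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for D3D12_SUBRESOURCE_DATA (hypothetical model type).
// Only the pData member is modeled here.
struct SubresourceDataModel {
    const void* pData;
};

// Assigns the index list through a byte-pointer cast.
inline SubresourceDataModel WithCast(std::uint32_t* indices) {
    SubresourceDataModel data{};
    data.pData = reinterpret_cast<unsigned char*>(indices);
    return data;
}

// Assigns the index list directly, relying on the implicit conversion to const void*.
inline SubresourceDataModel WithoutCast(std::uint32_t* indices) {
    SubresourceDataModel data{};
    data.pData = indices;
    return data;
}

// Both forms end up storing the same address.
inline void CheckCastIsRedundant() {
    std::uint32_t indices[] = { 0, 1, 2, 0, 3, 1 };
    assert(WithCast(indices).pData == WithoutCast(indices).pData);
}
```

So the cast looks like a style choice rather than a requirement.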
What I don't get is why we can specify:
    D3D12_DESCRIPTOR_RANGE ranges[2];
    rootParameter[0].DescriptorTable.NumDescriptorRanges = 2;
    rootParameter[0].DescriptorTable.pDescriptorRanges = ranges;
when commandList->SetGraphicsRootDescriptorTable(1, ...) throws an error because it works based on PARAMETER index, not RANGE index. There is no way to bind the handle for "range[1]" unless you go:
    D3D12_DESCRIPTOR_RANGE ranges[2];
    rootParameter[0].DescriptorTable.NumDescriptorRanges = 1;
    rootParameter[0].DescriptorTable.pDescriptorRanges = &ranges[0];
    rootParameter[1].DescriptorTable.NumDescriptorRanges = 1;
    rootParameter[1].DescriptorTable.pDescriptorRanges = &ranges[1];
i.e. each range MUST be in its own discrete parameter. This seems pointless. Why have ranges at all? Why not just put the range data at the root parameter scope? Every example I've seen specifies a single range entry in the params, so I don't understand what NumDescriptorRanges is for. What am I missing? Any ideas?
on Oct 27 `16
Simon
hey simon, could you ask that in the questions section? it seems like a really good question but i'm having a hard time reading through that code in the comment. also others might benefit from the question as well
on Oct 27 `16
iedoc
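On the descriptor range question above: as I understand the D3D12 API, all ranges of one descriptor table are bound by a single SetGraphicsRootDescriptorTable call. The handle passed in marks the start of the table in the descriptor heap, and each range then covers a run of consecutive descriptors relative to that start, given either by an explicit offset or by the append value that places it right after the previous range. The sketch below models only that offset arithmetic in plain C++. `RangeModel`, `ResolveRangeOffsets`, and `SlotFor` are hypothetical names, and `kOffsetAppend` is written out by hand to mirror `D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND (0xffffffff).
constexpr std::uint32_t kOffsetAppend = 0xffffffffu;

// Simplified stand-in for D3D12_DESCRIPTOR_RANGE (hypothetical model type).
struct RangeModel {
    std::uint32_t numDescriptors;
    std::uint32_t offsetFromTableStart; // explicit offset, or kOffsetAppend
};

// For each range, compute the index of its first descriptor relative to the
// table start (the handle passed to SetGraphicsRootDescriptorTable).
inline std::vector<std::uint32_t> ResolveRangeOffsets(const std::vector<RangeModel>& ranges) {
    std::vector<std::uint32_t> offsets;
    std::uint32_t next = 0; // slot right after the previous range
    for (const RangeModel& range : ranges) {
        const std::uint32_t start =
            (range.offsetFromTableStart == kOffsetAppend) ? next : range.offsetFromTableStart;
        offsets.push_back(start);
        next = start + range.numDescriptors;
    }
    return offsets;
}

// Heap slot, relative to the table start, of descriptor `indexInRange` in range `rangeIndex`.
inline std::uint32_t SlotFor(const std::vector<RangeModel>& ranges,
                             std::size_t rangeIndex, std::uint32_t indexInRange) {
    assert(rangeIndex < ranges.size());
    assert(indexInRange < ranges[rangeIndex].numDescriptors);
    return ResolveRangeOffsets(ranges)[rangeIndex] + indexInRange;
}
```

For a table with one CBV range followed by a range of two SRVs, both appended, the CBV sits in slot 0 and the SRVs in slots 1 and 2 of the same heap. Under that reading, NumDescriptorRanges lets one root parameter cover several descriptor types or register runs without spending an extra root parameter on each.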
Hi, I'm trying to use 2 constant buffers (so registers b0 and b1, if I understood correctly). For that I modified the definition of the root signature like this: https://pastebin.com/heGLWimd But after that I'm lost. I tried multiple things when creating the constant buffer upload heaps, but couldn't find how to handle that. Does anyone know how to do that? Thanks
on Dec 21 `17
Zeldarck
These 2 are mentioned: "Range.BaseShaderRegister" and "Range[k]" 1. It would be more clear using "Ranges". 2. Actually, BaseShaderRegister is on the individual Range[k].
on Jan 09 `19
macrod