09. Transformations and World View Projection Space Matrices
In this tutorial, we will learn about matrices, transformations, world/view/projection space matrices, and constant buffers per draw.
BzTuts09.rar 70.54 kb
####Overview####

In this tutorial, we will learn about matrices, transformations, world/view/projection space matrices, and constant buffers per draw.

- *Matrices*
- *Transformations*
- *World/View/Projection Spaces*
- *Constant Buffers Per Draw*

Alright, let's start with matrices.

####Matrices####

A matrix is a two dimensional array. We can use matrices to represent transformations, which include translation, rotation, and scaling, as well as spaces, which include world (all transformations combined), view, and projection. In 3D graphics we use 4x4 matrices, but 4x3 matrices can be used for skinning meshes, which saves some bandwidth when sending them to the GPU.

##Vector/Matrix Multiplication##

We are able to transform a vertex from object space, to world space, to view space, and finally to projection space by multiplying it with each of the space matrices. The vertex position starts out in object space, which is the space it was created in inside a 3D modeling program. Generally, objects in object space are centered around the point (0,0,0).

The number of elements in the vector must be equal to the number of rows in the matrix for the multiplication to work. The result is another vector with as many elements as the matrix has columns. To multiply, we take the dot product of the vector with each column in the matrix. The dot product gives us a scalar. This is how to calculate the dot product:

[x1, y1, z1, w1] . [x2, y2, z2, w2] = x1*x2 + y1*y2 + z1*z2 + w1*w2 = a scalar

The dot product of the vector with a column gives us one element of the final vector. For example, the dot product of the vector with the first column gives us "x", and the dot product of the vector with the second column gives us "y". Let's look at an example:

vector x matrix

                [ 5,  6,  7,  8]
[1, 2, 3, 4] x  [ 9, 10, 11, 12]  = [x, y, z, w]
                [13, 14, 15, 16]
                [17, 18, 19, 20]

dot product of the vector and the first column:

               [ 5]
[1, 2, 3, 4] . [ 9] = 1*5 + 2*9 + 3*13 + 4*17 = 5 + 18 + 39 + 68 = 130 = x
               [13]
               [17]

dot product of the vector and the second column:

               [ 6]
[1, 2, 3, 4] . [10] = 1*6 + 2*10 + 3*14 + 4*18 = 6 + 20 + 42 + 72 = 140 = y
               [14]
               [18]

dot product of the vector and the third column:

               [ 7]
[1, 2, 3, 4] . [11] = 1*7 + 2*11 + 3*15 + 4*19 = 7 + 22 + 45 + 76 = 150 = z
               [15]
               [19]

dot product of the vector and the fourth column:

               [ 8]
[1, 2, 3, 4] . [12] = 1*8 + 2*12 + 3*16 + 4*20 = 8 + 24 + 48 + 80 = 160 = w
               [16]
               [20]

the resulting vector is then: [130, 140, 150, 160]

##Matrix/Matrix Multiplication##

A matrix/matrix multiplication results in another matrix with x rows and y columns, where x is the number of rows in the first matrix and y is the number of columns in the second matrix (the number of columns in the first matrix must equal the number of rows in the second). We basically do the same thing as we did with the vector/matrix multiplication: each row of the first matrix is treated as a vector and dotted with each column of the second matrix. Let's look at an example:

matrix x matrix

[21, 22, 23, 24]   [ 5,  6,  7,  8]   [_00, _01, _02, _03]
[25, 26, 27, 28] x [ 9, 10, 11, 12] = [_10, _11, _12, _13]
[29, 30, 31, 32]   [13, 14, 15, 16]   [_20, _21, _22, _23]
[33, 34, 35, 36]   [17, 18, 19, 20]   [_30, _31, _32, _33]

Get the first row of the final matrix.

Get the first element of this row in the final matrix:

                   [ 5]
[21, 22, 23, 24] . [ 9] = 21*5 + 22*9 + 23*13 + 24*17 = 105 + 198 + 299 + 408 = 1010 = _00
                   [13]
                   [17]

Get the second element of this row in the final matrix:

                   [ 6]
[21, 22, 23, 24] . [10] = 21*6 + 22*10 + 23*14 + 24*18 = 126 + 220 + 322 + 432 = 1100 = _01
                   [14]
                   [18]

Get the third element of this row in the final matrix:

                   [ 7]
[21, 22, 23, 24] . [11] = 21*7 + 22*11 + 23*15 + 24*19 = 147 + 242 + 345 + 456 = 1190 = _02
                   [15]
                   [19]

Get the fourth element of this row in the final matrix:

                   [ 8]
[21, 22, 23, 24] . [12] = 21*8 + 22*12 + 23*16 + 24*20 = 168 + 264 + 368 + 480 = 1280 = _03
                   [16]
                   [20]

The first row of the final matrix is: [1010, 1100, 1190, 1280]

Now let's get the second row of the final matrix.

Get the first element of this row in the final matrix:

                   [ 5]
[25, 26, 27, 28] . [ 9] = 25*5 + 26*9 + 27*13 + 28*17 = 125 + 234 + 351 + 476 = 1186 = _10
                   [13]
                   [17]

Get the second element of this row in the final matrix:

                   [ 6]
[25, 26, 27, 28] . [10] = 25*6 + 26*10 + 27*14 + 28*18 = 150 + 260 + 378 + 504 = 1292 = _11
                   [14]
                   [18]

Get the third element of this row in the final matrix:

                   [ 7]
[25, 26, 27, 28] . [11] = 25*7 + 26*11 + 27*15 + 28*19 = 175 + 286 + 405 + 532 = 1398 = _12
                   [15]
                   [19]

Get the fourth element of this row in the final matrix:

                   [ 8]
[25, 26, 27, 28] . [12] = 25*8 + 26*12 + 27*16 + 28*20 = 200 + 312 + 432 + 560 = 1504 = _13
                   [16]
                   [20]

The second row of the final matrix is: [1186, 1292, 1398, 1504]

I won't do the last two rows here, but I'll give you the final result, so if you want to try it yourself you can compare:

[1010, 1100, 1190, 1280]
[1186, 1292, 1398, 1504]
[1362, 1484, 1606, 1728]
[1538, 1676, 1814, 1952]

You can see that each row of the first matrix is multiplied by each column of the second matrix. The order in which you multiply matrices DOES MATTER, which we will show in a bit.

##Row Major/Column Major Ordering##

Matrices can be stored in either row major or column major order.

Row Major Matrix         Column Major Matrix
[_00, _01, _02, _03]     [_00, _10, _20, _30]
[_10, _11, _12, _13]     [_01, _11, _21, _31]
[_20, _21, _22, _23]     [_02, _12, _22, _32]
[_30, _31, _32, _33]     [_03, _13, _23, _33]

As you can see above, the row major matrix has a vector in each row ([_00, _01, _02, _03] would be a vector), which means the values in a vector are next to each other in memory. The column major matrix has a vector in each column, so the values of each vector are separated in memory.

The DirectX Math library packs matrices in row major order. HLSL packs matrices in column major order so that it can easily do the vector/matrix multiplication by multiplying the vector with each row of the packed matrix, rather than each column. This is convenient because HLSL is able to load an entire column (now a row in HLSL) into GPU registers for the calculation in a single instruction. It also takes advantage of SIMD (Single Instruction, Multiple Data) hardware, such as the SSE extensions on the CPU side, so the dot product can be done in a single instruction rather than four. Although HLSL packs matrices in column major order, it reads them out register by register, which is how it is able to grab a "row" of the packed matrix and store it in a register for the calculation.
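Before looking at the HLSL assembly, here is the vector/matrix bookkeeping from the example above as a minimal C++ sketch. The function name and the plain float arrays are just for illustration, assuming a row major matrix and a row vector:

#include <cstdio>

// multiply a 1x4 row vector by a 4x4 row major matrix
// out[j] is the dot product of the vector with column j of the matrix
void MulVecMat(const float v[4], const float m[4][4], float out[4])
{
    for (int j = 0; j < 4; ++j)
    {
        out[j] = 0.0f;
        for (int i = 0; i < 4; ++i)
            out[j] += v[i] * m[i][j];
    }
}

int main()
{
    float v[4] = { 1, 2, 3, 4 };
    float m[4][4] = {
        {  5,  6,  7,  8 },
        {  9, 10, 11, 12 },
        { 13, 14, 15, 16 },
        { 17, 18, 19, 20 },
    };
    float r[4];
    MulVecMat(v, m, r);
    printf("%g %g %g %g\n", r[0], r[1], r[2], r[3]); // prints 130 140 150 160
    return 0;
}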
Let's take a quick look at the HLSL assembly for a vector multiplied by a matrix.

This is the matrix we are passing from our app, which is in row major ordering:

[ 1,  2,  3,  4]
[ 5,  6,  7,  8]
[ 9, 10, 11, 12]
[13, 14, 15, 16]

This is how HLSL stores that matrix (column major):

[ 1,  5,  9, 13]
[ 2,  6, 10, 14]
[ 3,  7, 11, 15]
[ 4,  8, 12, 16]

HLSL code:

output.pos = mul(input.pos, wvpMat);

HLSL assembly:

0: dp4 r0.x, v0.xyzw, cb0[0].xyzw  // r0.x <- output.pos.x
1: dp4 r0.y, v0.xyzw, cb0[1].xyzw  // r0.y <- output.pos.y
2: dp4 r0.z, v0.xyzw, cb0[2].xyzw  // r0.z <- output.pos.z
3: dp4 r0.w, v0.xyzw, cb0[3].xyzw  // r0.w <- output.pos.w

cb0[0] is now this in HLSL (this was a column in our app, but is now a row in HLSL, which makes the multiplication easier):

[ 1, 5, 9, 13]

cb0[0] is the first row of the packed matrix, and xyzw are the values it stores in the register from that row. Since HLSL has packed the matrix in column major order, it can quickly load each "column" into a register with a single instruction. dp4 is the preferred instruction for the dot product, as it performs the best.

Here is the HLSL assembly from when I multiplied matrix x vector (backwards, but I did not have to transpose the matrix before sending it to HLSL):

HLSL code:

output.pos = mul(wvpMat, input.pos);

HLSL assembly:

0: mul r0.xyzw, v0.xxxx, cb0[0].xyzw
1: mul r1.xyzw, v0.yyyy, cb0[1].xyzw
2: add r0.xyzw, r0.xyzw, r1.xyzw
3: mul r1.xyzw, v0.zzzz, cb0[2].xyzw
4: add r0.xyzw, r0.xyzw, r1.xyzw
5: mul r1.xyzw, v0.wwww, cb0[3].xyzw

There are more instructions here to get the same thing done, and dp4 is not used at all. This is why HLSL packs matrices in column major order, and it means that if we are using the DirectX Math library, we will need to transpose matrices before we send them to the shaders, which brings us to transposing matrices.

##Matrix Transposition##

Matrix transposition is actually really simple. It swaps the rows and columns of a matrix, which turns a row major matrix into a column major one, and a column major matrix into a row major one.

####Matrix Types####

##Matrix Identity##

An identity matrix is a matrix that, when multiplied by another matrix, results in that other matrix. You will usually want to initialize your matrices to an identity matrix, which looks like this:

Identity Matrix:
[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 1]

Notice how the diagonal elements are 1's while the rest are zeroes? Like I mentioned above, anything you multiply this matrix by will result in whatever you multiplied it by. It does not matter whether you are working with row major or column major layouts, an identity matrix always looks the same. You can use the .[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixidentity%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][XMMatrixIdentity()] method of the DirectX Math library to create an identity matrix.

##Transformations##

Transformations are matrices that describe translation, rotation, and scaling. To get the world matrix, you multiply these matrices together, which brings an object out of object space and into "world" space. The order in which you multiply these matrices (and all matrices) matters very much. For example, if you want to spin an object 90 degrees and then move it 10 units, you would place the rotation matrix before the translation matrix (rotMat * transMat). If you were to do the multiplication in the opposite order (transMat * rotMat), it would first move the object 10 units, then rotate the object around the point 0,0,0.
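Here is that example as a minimal DirectXMath sketch (the variable names are mine, not from the tutorial code); with DirectXMath's row-vector convention, the left-most matrix is applied to the vertex first:

#include <DirectXMath.h>
using namespace DirectX;

// rotate 90 degrees around the y axis, and translate 10 units along the x axis
XMMATRIX rotMat   = XMMatrixRotationY(XM_PIDIV2);           // 90 degrees in radians
XMMATRIX transMat = XMMatrixTranslation(10.0f, 0.0f, 0.0f);

// rotate first, then translate: the object spins in place, then moves 10 units
XMMATRIX spinThenMove = rotMat * transMat;

// translate first, then rotate: the object moves 10 units, then swings around (0,0,0)
XMMATRIX moveThenSpin = transMat * rotMat;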
##Translation Matrix##

The translation matrix "moves" things around by changing their position in 3D space. Objects start in object space, where the object is generally centered around the point 0,0,0. To move the object away from that 0,0,0 position, you multiply each vertex position in the object by a translation matrix. The translation matrix looks like this:

[1, 0, 0, x]
[0, 1, 0, y]
[0, 0, 1, z]
[0, 0, 0, 1]

Where x, y, and z are the position you want the object to move to. You can use the .[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixtranslation(v=vs.85).aspx][XMMatrixTranslation()] method of the DirectX Math library to create a translation matrix.

##Rotation Matrices##

There are three different rotation matrices, one for the x axis, one for the y axis, and one for the z axis. Rotations always rotate around the point 0,0,0, which means if you just want to spin an object, you must rotate first, then move the object using a translation matrix. If you wanted an object to "orbit" around another object, you would first translate it by the distance you want it to orbit at, then rotate it (which swings it around 0,0,0), and finally translate it to the position of the object it should orbit around.

**X Axis Rotation Matrix**

[1,      0,       0, 0]
[0, cos(A), -sin(A), 0]
[0, sin(A),  cos(A), 0]
[0,      0,       0, 1]

**Y Axis Rotation Matrix**

[ cos(A), 0, sin(A), 0]
[      0, 1,      0, 0]
[-sin(A), 0, cos(A), 0]
[      0, 0,      0, 1]

**Z Axis Rotation Matrix**

[cos(A), -sin(A), 0, 0]
[sin(A),  cos(A), 0, 0]
[     0,       0, 1, 0]
[     0,       0, 0, 1]

Where A is the angle in radians you would like to rotate by.

There are a couple of different rotation matrix methods in the DirectX Math library:
.[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixrotationx(v=vs.85).aspx][XMMatrixRotationX()]
.[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixrotationy(v=vs.85).aspx][XMMatrixRotationY()]
.[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixrotationz(v=vs.85).aspx][XMMatrixRotationZ()]
.[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixrotationrollpitchyawfromvector(v=vs.85).aspx][XMMatrixRotationRollPitchYawFromVector()]
.[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixrotationrollpitchyaw(v=vs.85).aspx][XMMatrixRotationRollPitchYaw()]
.[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixrotationquaternion(v=vs.85).aspx][XMMatrixRotationQuaternion()]
.[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixrotationaxis(v=vs.85).aspx][XMMatrixRotationAxis()]

##Scaling Matrix##

The scaling matrix scales the object relative to the point 0,0,0. This means that you will almost always want to scale before any of the other transformations. If, for example, you translated an object first and then scaled it, the translation would be scaled as well, so the object would end up in the wrong place instead of simply scaling in place and keeping its original shape.

[x, 0, 0, 0]
[0, y, 0, 0]
[0, 0, z, 0]
[0, 0, 0, 1]

Where x, y, and z are the multiples you would like to scale each axis by. Setting x, y, and z to 1 would keep the original size of the object, and in fact would make this matrix an identity matrix. Again, remember that you will usually want to scale first, before any other transformation. You can use the .[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixscaling(v=vs.85).aspx][XMMatrixScaling()] method of the DirectX Math library to create a scaling matrix.
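Putting the three transformations together, a typical world matrix is built scale first, then rotation, then translation. Here is a minimal DirectXMath sketch of that order (the variable names are just for illustration):

#include <DirectXMath.h>
using namespace DirectX;

// scale -> rotate -> translate, in that order (DirectXMath's row-vector convention,
// so the left-most matrix is applied to the vertex first)
XMMATRIX scaleMat = XMMatrixScaling(2.0f, 2.0f, 2.0f);             // double the size
XMMATRIX rotMat   = XMMatrixRotationY(XMConvertToRadians(45.0f));  // spin 45 degrees
XMMATRIX transMat = XMMatrixTranslation(0.0f, 0.0f, 5.0f);         // move 5 units along z

XMMATRIX worldMat = scaleMat * rotMat * transMat;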
####World/View/Projection Space Matrices####

When you create an object in a 3D modeling program such as 3DS Max, you are creating the object in **Object Space**. A 3D model loaded into your program starts in object space, and you are able to move it between spaces by multiplying each of its vertices by the matrix of the space you want it to move into, which is ultimately projection space when passing it through the graphics pipeline.

To get a 3D model from object space to projection space, you must first multiply all vertices by a world matrix, which contains the transformations that position, scale, and orient it in your virtual world. After that you multiply the vertices by a view matrix, which represents the camera. Once they are in view space, you multiply all vertices by the projection matrix, which moves them into projection space.

##World Matrix##

The world matrix is a combination of transformation matrices multiplied together, which moves a 3D model from object space to your virtual world's space. We talked about transformations above. So to get the world matrix, multiply all the transformation matrices in the correct order, and the resulting matrix will be the world space matrix. Every object in your scene will have its own world matrix, which means the constant buffer containing this matrix will need to be updated PER 3D model.

##View Space##

View space is the camera's space. In 3D graphics, the math is much simpler when the entire world moves around the camera. We get this to happen by multiplying everything by a view space matrix, which is built from the camera's position, the direction it is looking (forward vector), and its right and up vectors. When moving around your virtual world, you are not actually moving the camera, but moving everything the opposite way from how you "would" have moved the camera. Conceptually, the camera's placement looks like this (the view matrix is the inverse of this camera matrix, which is what XMMatrixLookAtLH() gives you):

[right.x, up.x, forward.x, position.x]
[right.y, up.y, forward.y, position.y]
[right.z, up.z, forward.z, position.z]
[      0,    0,         0,          1]

The right, up, and forward vectors are normalized vectors (a.k.a. unit vectors), which means they are 1.0 unit in length. They describe the camera's right direction in the virtual world, its up direction, and its forward direction (the direction the camera is facing). The position vector is an x,y,z coordinate describing the position of the camera in the virtual world. By multiplying the world space vertices by the view space matrix, you move the vertices from world space into camera space. You can use the .[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixlookatlh%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][XMMatrixLookAtLH()] method of the DirectX Math library to get a view matrix.

##Projection Space##

There are two types of projection, perspective projection and orthogonal (a.k.a. orthographic) projection.

**Orthographic Projection**

With orthographic projection, an object appears the same size no matter how far it is from the camera. If you've ever used a 3D modeling program before, you will have seen it in the viewports for top, left, right, front, etc.
There are different ways to build an orthographic projection matrix; I'll show you how Microsoft does it in the DirectX Math library with the XMMatrixOrthographicLH() function:

w = 2 / width
h = 2 / height
a = 1.0f / (zfar - znear)
b = -a * znear

orthographic projection matrix:
[w, 0, 0, 0]
[0, h, 0, 0]
[0, 0, a, 0]
[0, 0, b, 1]

width and height are the width and height of your viewing window (viewport). zfar is the furthest an object can be from the camera before you can't see it anymore; anything further away than zfar will not be rendered. znear is the closest an object can be to the camera before it is not rendered; anything closer to the camera (including behind it) than znear will not be rendered. You can use the .[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixorthographiclh(v=vs.85).aspx][XMMatrixOrthographicLH()] method of the DirectX Math library to get an orthographic projection matrix.

**Perspective Projection**

Perspective projection is how we humans view the world. Objects that are further away appear smaller than objects closer to our eyes. In our virtual world we want this effect, so we multiply our vertices (once in view space) by a perspective projection matrix. There are a lot of different ways to construct a perspective projection matrix, so I'm just going to show how the DirectX Math library does it when you use the XMMatrixPerspectiveFovLH() function:

aspectRatio = width / height
h = 1 / tan(fovy * 0.5)
w = h / aspectRatio
a = zfar / (zfar - znear)
b = (-znear * zfar) / (zfar - znear)

perspective projection matrix:
[w, 0, 0, 0]
[0, h, 0, 0]
[0, 0, a, 1]
[0, 0, b, 0]

You can use the .[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixperspectivefovlh(v=vs.85).aspx][XMMatrixPerspectiveFovLH()] or .[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixperspectivelh(v=vs.85).aspx][XMMatrixPerspectiveLH()] methods of the DirectX Math library to get a perspective projection matrix.

##Moving From Space to Space##

You can multiply multiple matrices together ahead of time and store the resulting matrix, to save space and compute cycles. For example, since the projection matrix does not usually change often, and the view matrix usually only changes once per frame, you can calculate the view/projection matrix one time per frame, then multiply each object's world matrix by that matrix, instead of multiplying world * view * projection for every object.

So just to recap, to get a 3D model from object (a.k.a. local) space to projection space, we multiply each vertex by world, then view, then projection. It will look like this:

finalvertex.pos = vertex.pos * worldMatrix * viewMatrix * projectionMatrix;

In our vertex shader, we will actually be using a world/view/projection matrix (a single matrix containing all three spaces) to move each vertex into projection space, like this:

output.pos = mul(input.pos, wvpMat);
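Here is a minimal DirectXMath sketch of the per-frame optimization described above. The function and the wvp variable names are mine; the globals mimic the XMFLOAT4X4 storage variables declared later in this tutorial:

#include <DirectXMath.h>
using namespace DirectX;

// storage-type globals like the ones declared later in this tutorial
XMFLOAT4X4 cameraViewMat;
XMFLOAT4X4 cameraProjMat;
XMFLOAT4X4 cube1WorldMat;
XMFLOAT4X4 cube1WvpMat; // the combined matrix we would upload for this object

void BuildWvpForFrame()
{
    // once per frame: combine the view and projection matrices
    XMMATRIX viewProjMat = XMLoadFloat4x4(&cameraViewMat) * XMLoadFloat4x4(&cameraProjMat);

    // once per object: only one extra multiply per draw call
    XMMATRIX wvpMat = XMLoadFloat4x4(&cube1WorldMat) * viewProjMat;

    // store it back (the tutorial later transposes this before copying it to the GPU)
    XMStoreFloat4x4(&cube1WvpMat, wvpMat);
}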
####Constant Buffers Per Draw (or object)####

I think this needs some explaining, mostly because when I started I ended up crashing my computer a couple of times before I remembered the alignment requirements. I'm going to start with a quick review of the root signature.

##Root Signature##

The "root" in root signature comes from the idea that the roots of a plant provide nutrients (data) to the rest of the plant (pipeline). The root signature tells the rest of the pipeline what data to expect and where to find it.

When creating a root signature, you have three types of root parameters: root constants, root descriptors, and descriptor tables.

**Root Constants**

Root constants each take 1 DWORD of space in the root signature. They have no indirections when shaders access them, meaning shaders can access them faster than data behind root descriptors or descriptor tables. A float constant is 4 bytes, or 1 DWORD, so a float4 would be 16 bytes, or 4 DWORDS. You will want to use root constants for data that changes very frequently or needs to be accessed as fast as possible. There is a limit to how much space a root signature can take up, which we will talk about later. You can set a root constant with the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903909(v=vs.85).aspx][SetGraphicsRoot32BitConstant()] function of a command list.

**Root Descriptors**

A root descriptor is a 64 bit GPU virtual address. It points to the chunk of memory on the GPU where the resource is stored, which means each root descriptor takes up 2 DWORDS of space. Since a root descriptor is only a virtual address, it does not get out of bounds checking like descriptors in a descriptor heap do, which means you need to make sure your shaders do not access uninitialized memory when reading data through a root descriptor. Root descriptors have one indirection when shaders access data through them: the root descriptor contains a memory address, so the shader must first read that address, then get the data at that address.

Only Constant Buffer Views (CBV) and SRV/UAV buffers containing 32 bit FLOAT/UINT/SINT data can be used as root descriptors. Any kind of resource that requires format conversion, like a Texture2D, cannot be used, so textures can only be accessed through descriptor tables. Buffers that change frequently, such as per draw or per object, are good candidates for root descriptors. An example is a constant buffer containing the world/view/projection matrix, which gets changed per object/draw call. In this tutorial we will be using a root descriptor to bind our constant buffers containing the world/view/projection matrix. You can change the root descriptor with the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903911%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][SetGraphicsRootConstantBufferView()] function of a command list.

**Descriptor Tables**

A descriptor table contains an offset into the currently bound descriptor heap (only one heap of each type can be bound at any time) and a number representing the range of descriptors to use. Descriptor tables take only 1 DWORD of space in the root signature, but have 2 indirections when shaders access data through them: the shader must first get the offset into the descriptor heap from the descriptor table, then get the memory location from the descriptor in the heap, and finally get the data that memory location points to. You must make sure that a descriptor heap is bound when using a descriptor table. Resources like textures will be accessed through descriptor tables.

**Static Samplers**

Finally we have static samplers. These are samplers that do not change, and they are described directly in the root signature, so they do not count towards the size limit of a root signature the way root constants, root descriptors, and descriptor tables do.
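To make the root descriptor idea concrete before we talk about versioning, here is a rough sketch (not the tutorial's verbatim drawing code; it uses the globals defined later in this tutorial) of how a CBV root descriptor gets pointed at a different constant buffer for each draw call:

// inside the command list recording for the current frame:
D3D12_GPU_VIRTUAL_ADDRESS heapStart = constantBufferUploadHeaps[frameIndex]->GetGPUVirtualAddress();

// cube 1: its constant buffer sits at the start of this frame's resource heap
commandList->SetGraphicsRootConstantBufferView(0, heapStart);
commandList->DrawIndexedInstanced(numCubeIndices, 1, 0, 0, 0);

// cube 2: its constant buffer sits a 256-byte-aligned offset further into the same heap
commandList->SetGraphicsRootConstantBufferView(0, heapStart + ConstantBufferPerObjectAlignedSize);
commandList->DrawIndexedInstanced(numCubeIndices, 1, 0, 0, 0);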
**Versioning**

Versioning is the term for when data bound to the pipeline needs to change multiple times within a single command list execution. For example, we have a constant buffer that contains a world/view/projection matrix, and this matrix must be changed for each draw call. We need a way to version this constant buffer so that when we change it for the second draw call, the first draw call still uses the matrix it needs.

We get versioning for free with root constants and root descriptors. This is why you want data that changes often to be either a root constant or a root descriptor. Any time a root argument such as a root descriptor changes, DirectX makes a complete copy of the root arguments and changes only the one you set, so the previous draw call still uses the arguments that were previously bound.

We must do our own versioning when data is accessed through descriptor tables. There are a few different ways to do it, but the easiest is: any time you need to change data for a draw call that is accessed through a descriptor table, you add a new descriptor to the end of a descriptor heap, or you reuse a stale descriptor (a descriptor that has already been used but is for sure not being used now).

**Root Signature Limits**

The DirectX 12 spec limits root signatures to 64 DWORDS. There is more to this: some hardware has 16 DWORDS of dedicated root signature memory, and when a root signature goes over those 16 DWORDS, one of them is used as an address to a chunk of memory where the rest of the root signature is stored. Because of this you will want to put the parameters that change most frequently at the beginning of the root signature; when a root argument in the first 16 DWORDS is changed on those devices, only the first 16 DWORDS of the root signature need to be versioned. There are also three resource binding tiers of hardware, each with its own binding limits (lower tiers having smaller limits), which you can find here: .[https://en.wikipedia.org/wiki/Feature_levels_in_Direct3D][feature levels in d3d]

**Creating Root Signatures**

You can create root signatures either in code at runtime (which is what we do in these tutorials), or you can define them directly in the shader code (a small sketch of a shader-defined root signature follows after the next paragraph). When root signatures are defined in the shaders, either only one shader may contain the root signature code, or every shader that defines one must define the exact same root signature. Root signatures are used when creating PSOs, because the hardware needs to know what data the shaders will expect, and where and how it is stored, so it can optimize the pipeline state. Shaders will by default use the root signature defined in the shader code (if it exists), but that root signature can be overridden when you define a root signature in code and use that one when creating the PSO. The PSO will fail to create if more than one shader contains a different root signature.

**Changing a Root Signature**

Although there is actually very little cost to changing the root signature itself, you will usually not want to change it often. The reason is that when you change the root signature, all root arguments that were bound become unbound, and you will have to set them all over again for the new root signature. If you bind the same root signature that is currently bound, this does not happen, and all root arguments that were bound stay bound.
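Here is the shader-defined root signature sketch mentioned above. This is not the tutorial's shader; it is a hypothetical example of the HLSL root signature syntax describing the same layout we build in code below (one CBV root descriptor at register b0, visible only to the vertex shader):

// hypothetical shader-defined root signature matching this tutorial's layout
#define MyRootSignature \
    "RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT | " \
              "DENY_HULL_SHADER_ROOT_ACCESS | " \
              "DENY_DOMAIN_SHADER_ROOT_ACCESS | " \
              "DENY_GEOMETRY_SHADER_ROOT_ACCESS | " \
              "DENY_PIXEL_SHADER_ROOT_ACCESS), " \
    "CBV(b0, visibility = SHADER_VISIBILITY_VERTEX)"

[RootSignature(MyRootSignature)]
float4 main(float4 pos : POSITION) : SV_POSITION
{
    // a stand-in vertex shader body; the real tutorial shader also passes a color through
    return pos;
}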
##Constant Buffer Versioning##

I know I just talked a little about versioning for root signatures above, but now I'm going to explain it in a little more detail for constant buffers and how we will update the constant buffer for each draw call, for each frame.

We will not use a descriptor heap in this tutorial like we did in the previous tutorial. Instead, we will store the descriptor directly in the root signature by using a root descriptor parameter. Root descriptors do versioning for us, which makes things easier.

Constant buffer data is stored in a resource heap. We need to make sure that we do not update data in a resource heap that might currently be accessed by a shader from a previous frame. We get around this by creating 3 resource heaps, one for each frame. We have two objects in our scene, and they each have their own constant buffer data. We could create a resource heap for each frame, for each object, but this is a waste, so what we do instead is store both objects' constant buffers in each of the three resource heaps, then set the root descriptor to the memory address of the constant buffer we need for the next draw call.

**Constant Buffer Alignment**

This is where things will get messy if you do not understand the alignment requirements for buffers.

The size of **resource heaps** must be a multiple of **64KB** for single-textures or buffers. That means, for example, if we store a constant buffer containing a single float in the resource, which is only 4 bytes, we must still allocate 1024*64 bytes, or 65,536 bytes. If we have a constant buffer of 16,385 floats, which is 65,540 bytes, making it larger than 64KB, we need to allocate (1024 * 64 * 2) bytes, which is 128KB. Multi-sampled texture resources must be **4MB** aligned.

**Constant buffers** themselves must be stored at **256 byte** aligned offsets from the beginning of a resource heap. This is the one that might get you when you first start using constant buffers. When you set a root descriptor, you give it the memory location of the data you want to use, and that address must be the address of the resource heap plus a multiple of 256 bytes. One way to deal with this is to simply pad your constant buffer structure out to 256 bytes, for example:

struct ConstantBuffer
{
    float4x4 wvpMat; // 64 bytes

    // now pad the constant buffer structure out to 256 bytes
    float padding[48]; // 48 * 4 = 192 bytes of padding
};

This way, we can set the offset of the next constant buffer in the heap to resourceHeapAddress + sizeof(ConstantBuffer). This will work; however, when you use memcpy you might want to know how much of the constant buffer to actually copy, which is only 64 bytes (the matrix) in the case above. If you memcpy sizeof(ConstantBuffer) like you normally would, you end up copying 192 extra bytes of padding that mean nothing.

In this tutorial, what we do instead is create a variable called ConstantBufferPerObjectAlignedSize, which is the size of our structure rounded up to a multiple of 256, and we store and access the next constant buffer by adding this value to the resource heap's starting virtual address. If you try to set the root descriptor to an address that is not a multiple of 256 bytes from the beginning of the resource heap, you may experience an operating system crash, as I mistakenly did when I started.

So to review, in this tutorial we create 3 resource heaps, allocating 64KB for each of them. In each resource heap we store two constant buffers, one for each of the objects in our scene.
The first object's constant buffer is stored at the beginning of each resource heap, while the second object's constant buffer data is stored at the beginning of the resource heap PLUS 256 bytes. If our constant buffers were 260 bytes large, the second constant buffer would be stored at the beginning of the resource heap PLUS 512 bytes.

##Right Handed vs. Left Handed Coordinate Systems##

There are two types of coordinate systems. Direct3D has traditionally used a left handed coordinate system, but some software such as 3DS Max works with a right handed coordinate system. The only difference is that the z-axis is flipped from one to the other. We will be working with the left handed coordinate system in these tutorials.

**Left Handed Coordinate System**

In the left handed coordinate system, the positive y axis points up, the positive x axis points right, and the positive z axis points forward (away from you).

**Right Handed Coordinate System**

In the right handed coordinate system, the positive y axis points up, the positive x axis points right, and the positive z axis points towards you.

##XMMATRIX/XMVECTOR vs. XMFLOAT4X4/XMFLOAT4 (a.k.a. DirectX Math Library)##

Please read the .[https://msdn.microsoft.com/en-us/library/ee415571.aspx?f=255&MSPPError=-2147217396][DirectX Math Programming Guide], specifically the getting started section. A lot of the problems people have come from storing and passing XMMATRIX and XMVECTOR around. These are SIMD types and have a lot of restrictions, so it is usually easier to use the storage types XMFLOAT4X4 and XMFLOAT4 for storing and passing data around, and to load them into an XMMATRIX or XMVECTOR when doing operations on them. If you insist on storing and passing around XMMATRIX and XMVECTOR types, then please read .[https://msdn.microsoft.com/en-us/library/ee418728.aspx#Call_Conventions][this]. Using XMMATRIX and XMVECTOR as locals or globals is fine and there are no issues there; it's storing them as members of classes or passing them between functions where it gets messy.

In these tutorials, we will be storing the data and passing it around in XMFLOAT4X4 and XMFLOAT4 types. When we want to do any work on these variables, we load them into an XMMATRIX or XMVECTOR variable, do the work, then store the results back into an XMFLOAT4X4 or XMFLOAT4.

You can load an XMFLOAT4X4 into an XMMATRIX with this function: .[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.loading.xmloadfloat4x4(v=vs.85).aspx][XMLoadFloat4x4()]

You can store an XMMATRIX into an XMFLOAT4X4 with this function: .[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.storing.xmstorefloat4x4%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][XMStoreFloat4x4()]

You can load an XMFLOAT4 into an XMVECTOR with this function: .[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.loading.xmloadfloat4(v=vs.85).aspx][XMLoadFloat4()]

You can store an XMVECTOR into an XMFLOAT4 with this function: .[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.storing.xmstorefloat4(v=vs.85).aspx][XMStoreFloat4()]

XMFLOAT4X4 and XMFLOAT4 are only used for storage and passing data around. You cannot do operations on them directly, like adding or multiplying them together; there are no math operations in the DirectX Math library that work with them. XMVECTOR and XMMATRIX are where the power of the DirectX Math library comes from. You can add them together, multiply them together, and all the math functions in the library act on them. They are SIMD types, which means that 4 values at a time can be stored in registers and operated on in a single instruction. SIMD types can be multiplied with each other up to 4 times faster than regular types, since all 4 values in a vector are operated on at once. Look up SIMD to get a better understanding.
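Here is a minimal sketch of the load/work/store pattern described above (the variable and function names are mine, not from the tutorial code):

#include <DirectXMath.h>
using namespace DirectX;

// storage types: safe to keep as class members or globals and to pass around
XMFLOAT4X4 worldFloat;
XMFLOAT4X4 viewFloat;
XMFLOAT4X4 worldViewFloat;

void CombineWorldAndView()
{
    // load the storage types into the SIMD types...
    XMMATRIX world = XMLoadFloat4x4(&worldFloat);
    XMMATRIX view  = XMLoadFloat4x4(&viewFloat);

    // ...do the math on the SIMD types...
    XMMATRIX worldView = world * view;

    // ...then store the result back into a storage type
    XMStoreFloat4x4(&worldViewFloat, worldView);
}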
I think that's all I wanted to go over for this tutorial, so we can now get into the code~

####The Code####

In this tutorial, we are going to make one cube rotate around another cube. This involves creating the cubes, creating the resource heaps, creating the root signature with one root descriptor parameter, mapping the resource heaps, updating the cubes' world matrices, setting the root descriptor to point to the correct location in the resource heap for the current cube, and finally drawing the cubes. Alignment requirements must be strictly followed, otherwise you will experience headaches when your computer freezes up every time you run your program (this happened to me a few times before I realized I had not aligned my constant buffers correctly in the resource heap).

##New Globals##

We have removed everything that has to do with descriptor heaps, which we were working with in the last tutorial.

You will see we have updated our constant buffer structure and have now named it *ConstantBufferPerObject*, since this constant buffer contains a world/view/projection matrix and will be updated per object. That means this cb will be updated multiple times per command list. Because of this, we need a way to version the constant buffer, so that when we change it, it does not affect the previous object's constant buffer. Root descriptors are versioned automatically by the drivers, so we can take advantage of that: when we update the root descriptor, the previous draw call will still use the previously set root descriptor. If we had decided to access the constant buffer through a descriptor table, we would need to do the versioning ourselves, which is not necessarily bad, but it does make the code a little more complex, and you will usually end up using more descriptor space.

I've talked about this a lot, so hopefully it's sinking in by now (it's also commented in the code): constant buffers need to be aligned at 256 byte offsets in a resource heap. This has to do with constant reads. Don't try to set a root descriptor to anything but a 256 byte aligned offset from the beginning of a resource heap; your computer may experience spontaneous combustion. This is where the variable *ConstantBufferPerObjectAlignedSize* comes in. We set this variable to the next multiple of 256 at or above the size of our constant buffer structure. Our constant buffer is only 64 bytes right now (one 4x4 matrix of floats), so this variable is set to 256. We will use this variable when updating the constant buffers in the resource heap and when setting the root descriptor.

Next we create an instance of our constant buffer structure, cbPerObject. This is just a CPU-side buffer to store the data, which we then memcpy to the correct constant buffer in the resource heap.

After that we have our resource heaps, called constantBufferUploadHeaps. We are using upload heaps because we will be updating them often (twice per frame). You will see we have 3 resource heaps, one for each frame. This is so that when we are updating the next frame's constant buffers, we do not mess with the previous frame's constant buffer data, which might currently be in use.
We could create one resource heap and store everything in there, but it's honestly much easier to keep each frame separated.

Next we have 3 pointers (UINT8*), named cbvGPUAddress. These point to the start of each mapped resource heap. We will add *ConstantBufferPerObjectAlignedSize* to one of them to get the address of the second constant buffer (the first constant buffer is stored at the beginning of each resource heap).

We store the projection matrix and view matrix, along with the camera position, target, and up vectors, in XMFLOAT4 and XMFLOAT4X4 variables. These types are used for storing and passing data around, while XMMATRIX and XMVECTOR are used for the actual mathematical operations on the data. We also store the cubes' positions, rotation matrices, and world matrices in these types.

And then we have numCubeIndices. This is just a variable which will contain the number of indices we want to draw per cube.

// this is the structure of our constant buffer.
struct ConstantBufferPerObject {
    XMFLOAT4X4 wvpMat;
};

// Constant buffers must be 256-byte aligned, which has to do with constant reads on the GPU.
// We are only able to read at 256 byte intervals from the start of a resource heap, so we will
// make sure that we add padding between the two constant buffers in the heap (one for cube1 and one for cube2).
// Another way to do this would be to add a float array in the constant buffer structure for padding. In this case
// we would need to add a float padding[48]; after the wvpMat variable. This would pad our structure out to 256 bytes
// (the matrix is 64 bytes, and 48 floats at 4 bytes each adds 192 bytes of padding).
// The reason I didn't go with this way was because there would actually be wasted cpu cycles when we memcpy our constant
// buffer data to the gpu virtual address. Currently we memcpy the size of our structure, which is 64 bytes here, but if we
// were to add the padding array and memcpy the size of our structure, we would memcpy 256 bytes, which is 192 wasted bytes
// being copied.
int ConstantBufferPerObjectAlignedSize = (sizeof(ConstantBufferPerObject) + 255) & ~255;

ConstantBufferPerObject cbPerObject; // this is the constant buffer data we will send to the gpu
                                     // (which will be placed in the resource we created above)

ID3D12Resource* constantBufferUploadHeaps[frameBufferCount]; // this is the memory on the gpu where constant buffers for each frame will be placed

UINT8* cbvGPUAddress[frameBufferCount]; // this is a pointer to each of the constant buffer resource heaps

XMFLOAT4X4 cameraProjMat; // this will store our projection matrix
XMFLOAT4X4 cameraViewMat; // this will store our view matrix

XMFLOAT4 cameraPosition; // this is our camera's position vector
XMFLOAT4 cameraTarget; // a vector describing the point in space our camera is looking at
XMFLOAT4 cameraUp; // the world's up vector

XMFLOAT4X4 cube1WorldMat; // our first cube's world matrix (transformation matrix)
XMFLOAT4X4 cube1RotMat; // this will keep track of our rotation for the first cube
XMFLOAT4 cube1Position; // our first cube's position in space

XMFLOAT4X4 cube2WorldMat; // our second cube's world matrix (transformation matrix)
XMFLOAT4X4 cube2RotMat; // this will keep track of our rotation for the second cube
XMFLOAT4 cube2PositionOffset; // our second cube will rotate around the first cube, so this is the position offset from the first cube

int numCubeIndices; // the number of indices to draw the cube

##The New Root Signature##

We will now use a root descriptor to access our constant buffer in the shaders.
We do this by defining our root signature with a root descriptor parameter. We start by filling out a *D3D12_ROOT_DESCRIPTOR* structure:

typedef struct .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn879476%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][D3D12_ROOT_DESCRIPTOR] {
    UINT ShaderRegister;
    UINT RegisterSpace;
} D3D12_ROOT_DESCRIPTOR;

- **ShaderRegister** - *This is the shader register we want to bind the root descriptor to. We are using a CBV, so it goes in a **b** register. We have declared our constant buffer in our vertex shader in the **b0** register, so we set this parameter to 0.*
- **RegisterSpace** - *By default, registers are in register space **0**. Spaces are mostly used for convenience, such as when we use the same register in two different shaders. We have not defined a space in the shader, which means it is in the default register space 0, so we set this parameter to 0.*

Next we create a root parameter by filling out a *D3D12_ROOT_PARAMETER* structure. This structure was explained in the previous tutorial, so I will not explain it here. In this tutorial we are creating a root descriptor, rather than a descriptor table parameter, so we set the *Descriptor* member to the root descriptor we just filled out.

Only the vertex shader will use this constant buffer, so we set the visibility of this parameter to the vertex shader only. You will usually get better performance by only allowing the shaders that need the parameter to see it, since the GPU and DirectX drivers can optimize more when a parameter is broadcast to only select shader stages. On some hardware, allowing visibility to all shaders can actually perform better, since the hardware might send a single broadcast to all shaders rather than multiple broadcasts to multiple shaders. There is currently no way of knowing whether visibility to all shaders will perform better than visibility to some shaders, so generally you will only want to make the parameter visible to the shaders that need it.

Then we create a root signature description by filling out a CD3DX12_ROOT_SIGNATURE_DESC structure (provided by the d3dx12.h header). This was also explained in the last tutorial, so it will not be explained here.

After that we serialize the root signature, which converts the root signature description into a blob of bytecode that the driver and GPU can read and process. Once we have created the root signature description and serialized it, we create the root signature.

// create root signature

// create a root descriptor, which explains where to find the data for this root parameter
D3D12_ROOT_DESCRIPTOR rootCBVDescriptor;
rootCBVDescriptor.RegisterSpace = 0;
rootCBVDescriptor.ShaderRegister = 0;

// create a root parameter and fill it out
D3D12_ROOT_PARAMETER rootParameters[1]; // only one parameter right now
rootParameters[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_CBV; // this is a constant buffer view root descriptor
rootParameters[0].Descriptor = rootCBVDescriptor; // this is the root descriptor for this root parameter
rootParameters[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX; // our vertex shader will be the only shader accessing this parameter for now

CD3DX12_ROOT_SIGNATURE_DESC rootSignatureDesc;
rootSignatureDesc.Init(_countof(rootParameters), // we have 1 root parameter
    rootParameters, // a pointer to the beginning of our root parameters array
    0,
    nullptr,
    D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT | // we can deny shader stages here for better performance
    D3D12_ROOT_SIGNATURE_FLAG_DENY_HULL_SHADER_ROOT_ACCESS |
    D3D12_ROOT_SIGNATURE_FLAG_DENY_DOMAIN_SHADER_ROOT_ACCESS |
    D3D12_ROOT_SIGNATURE_FLAG_DENY_GEOMETRY_SHADER_ROOT_ACCESS |
    D3D12_ROOT_SIGNATURE_FLAG_DENY_PIXEL_SHADER_ROOT_ACCESS);

ID3DBlob* signature;
hr = D3D12SerializeRootSignature(&rootSignatureDesc, D3D_ROOT_SIGNATURE_VERSION_1, &signature, nullptr);
if (FAILED(hr))
{
    return false;
}

hr = device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&rootSignature));
if (FAILED(hr))
{
    return false;
}
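For reference, this is roughly what the matching declaration looks like on the shader side. This is a sketch rather than the tutorial's actual vertex shader (which also passes a color through); the point is just that the cbuffer sits in register b0 to match the root descriptor above:

// sketch of the vertex shader side of the root descriptor (names assumed, not copied from the tutorial's shader)
cbuffer ConstantBufferPerObject : register(b0)
{
    float4x4 wvpMat;
};

float4 main(float4 pos : POSITION) : SV_POSITION
{
    // move the vertex from object space straight to projection space
    return mul(pos, wvpMat);
}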
##Creating the Cube Geometry##

We are now going to start working with 3D objects, so I have created a cube here, which should be more interesting than the quad we have been using. All I've done is add some vertices and indices to the code we already had from the last tutorial, and also set the new variable *numCubeIndices* to the number of indices we want to draw.

// Create vertex buffer

// a cube (6 faces, 4 unique vertices per face)
Vertex vList[] = {
    // front face
    { -0.5f,  0.5f, -0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
    {  0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
    { -0.5f, -0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
    {  0.5f,  0.5f, -0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

    // right side face
    {  0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
    {  0.5f,  0.5f,  0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
    {  0.5f, -0.5f,  0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
    {  0.5f,  0.5f, -0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

    // left side face
    { -0.5f,  0.5f,  0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
    { -0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
    { -0.5f, -0.5f,  0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
    { -0.5f,  0.5f, -0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

    // back face
    {  0.5f,  0.5f,  0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
    { -0.5f, -0.5f,  0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
    {  0.5f, -0.5f,  0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
    { -0.5f,  0.5f,  0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

    // top face
    { -0.5f,  0.5f, -0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
    {  0.5f,  0.5f,  0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
    {  0.5f,  0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
    { -0.5f,  0.5f,  0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

    // bottom face
    {  0.5f, -0.5f,  0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
    { -0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
    {  0.5f, -0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
    { -0.5f, -0.5f,  0.5f, 0.0f, 1.0f, 0.0f, 1.0f },
};

int vBufferSize = sizeof(vList);

// create default heap
// default heap is memory on the GPU. Only the GPU has access to this memory
// To get data into this heap, we will have to upload the data using
// an upload heap
device->CreateCommittedResource(
    &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT), // a default heap
    D3D12_HEAP_FLAG_NONE, // no flags
    &CD3DX12_RESOURCE_DESC::Buffer(vBufferSize), // resource description for a buffer
    D3D12_RESOURCE_STATE_COPY_DEST, // we will start this heap in the copy destination state since we will copy data
                                    // from the upload heap to this heap
    nullptr, // optimized clear value must be null for this type of resource. used for render targets and depth/stencil buffers
    IID_PPV_ARGS(&vertexBuffer));

// we can give resource heaps a name so when we debug with the graphics debugger we know what resource we are looking at
vertexBuffer->SetName(L"Vertex Buffer Resource Heap");

// create upload heap
// upload heaps are used to upload data to the GPU. CPU can write to it, GPU can read from it
// We will upload the vertex buffer to the default heap using this upload heap
ID3D12Resource* vBufferUploadHeap;
device->CreateCommittedResource(
    &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // upload heap
    D3D12_HEAP_FLAG_NONE, // no flags
    &CD3DX12_RESOURCE_DESC::Buffer(vBufferSize), // resource description for a buffer
    D3D12_RESOURCE_STATE_GENERIC_READ, // GPU will read from this buffer and copy its contents to the default heap
    nullptr,
    IID_PPV_ARGS(&vBufferUploadHeap));
vBufferUploadHeap->SetName(L"Vertex Buffer Upload Resource Heap");

// store vertex buffer in upload heap
D3D12_SUBRESOURCE_DATA vertexData = {};
vertexData.pData = reinterpret_cast<BYTE*>(vList); // pointer to our vertex array
vertexData.RowPitch = vBufferSize; // size of all our triangle vertex data
vertexData.SlicePitch = vBufferSize; // also the size of our triangle vertex data

// we are now creating a command with the command list to copy the data from
// the upload heap to the default heap
UpdateSubresources(commandList, vertexBuffer, vBufferUploadHeap, 0, 0, 1, &vertexData);

// transition the vertex buffer data from copy destination state to vertex buffer state
commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(vertexBuffer, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER));

// Create index buffer

// a cube (12 triangles)
DWORD iList[] = {
    // front face
    0, 1, 2, // first triangle
    0, 3, 1, // second triangle

    // right side face
    4, 5, 6, // first triangle
    4, 7, 5, // second triangle

    // left side face
    8, 9, 10, // first triangle
    8, 11, 9, // second triangle

    // back face
    12, 13, 14, // first triangle
    12, 15, 13, // second triangle

    // top face
    16, 17, 18, // first triangle
    16, 19, 17, // second triangle

    // bottom face
    20, 21, 22, // first triangle
    20, 23, 21, // second triangle
};

int iBufferSize = sizeof(iList);

numCubeIndices = sizeof(iList) / sizeof(DWORD);

// create default heap to hold index buffer
device->CreateCommittedResource(
    &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT), // a default heap
    D3D12_HEAP_FLAG_NONE, // no flags
    &CD3DX12_RESOURCE_DESC::Buffer(iBufferSize), // resource description for a buffer
    D3D12_RESOURCE_STATE_COPY_DEST, // start in the copy destination state
    nullptr, // optimized clear value must be null for this type of resource
    IID_PPV_ARGS(&indexBuffer));

// we can give resource heaps a name so when we debug with the graphics debugger we know what resource we are looking at
indexBuffer->SetName(L"Index Buffer Resource Heap");

// create upload heap to upload index buffer
ID3D12Resource* iBufferUploadHeap;
device->CreateCommittedResource(
    &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // upload heap
    D3D12_HEAP_FLAG_NONE, // no flags
    &CD3DX12_RESOURCE_DESC::Buffer(iBufferSize), // resource description for a buffer
    D3D12_RESOURCE_STATE_GENERIC_READ, // GPU will read from this buffer and copy its contents to the default heap
    nullptr,
    IID_PPV_ARGS(&iBufferUploadHeap));
iBufferUploadHeap->SetName(L"Index Buffer Upload Resource Heap");

// store index buffer in upload heap
D3D12_SUBRESOURCE_DATA indexData = {};
indexData.pData = reinterpret_cast<BYTE*>(iList); // pointer to our index array
indexData.RowPitch = iBufferSize; // size of all our index data
indexData.SlicePitch = iBufferSize; // also the size of our index data

// we are now creating a command with the command list to copy the data from
// the upload heap to the default heap
UpdateSubresources(commandList, indexBuffer, iBufferUploadHeap, 0, 0, 1, &indexData);

// transition the index buffer data from copy destination state to index buffer state
commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(indexBuffer, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_INDEX_BUFFER));

##Creating the Constant Buffer Resource Heaps##

Now we create the constant buffer resource heaps. We will be using upload heaps, since we will be updating the data frequently (twice per frame). You can see we are creating 3 resource heaps, one for each frame buffer. This is so that when we are updating the next frame's constant buffers, we do not interfere with a previous frame that may still be accessing its constant buffers. Buffer resource heaps must be 64KB aligned in size, which is why we allocate 1024*64 bytes when creating them. We already talked about creating committed resources in the last tutorial, so I will not go into it here.

Once we have created the resource heaps, we map them to get a pointer to the beginning of each heap, which is what we use to copy constant buffer data into them (the root descriptor itself is later set to the heap's GPU virtual address plus the same offsets).

Notice how we have two memcpy calls. We have two constant buffers, one for each cube. We are storing both cubes' constant buffers in the same heap, making sure the second constant buffer is at a 256 byte aligned offset from the beginning of the resource heap. We do this by adding *ConstantBufferPerObjectAlignedSize* to the address that points to the beginning of the resource heap. We will also be setting this data in the update function, once for each cube.

If you had multiple variables in your constant buffer, and only one or a couple of them were updated each frame, you could memcpy just the variables that were updated. But to make things easier, we just memcpy the entire constant buffer structure (our cb only has one variable anyway).

// create the constant buffer resource heap
// We will update the constant buffer one or more times per frame, so we will use only an upload heap.
// Previously we used an upload heap to upload the vertex and index data, and then copied it over
// to a default heap. If you plan to use a resource for more than a couple of frames, it is usually more
// efficient to copy it to a default heap where it stays on the gpu. In this case, our constant buffer
// will be modified and uploaded at least once per frame, so we only use an upload heap.

// first we will create a resource heap (upload heap) for each frame for the cubes' constant buffers
// As you can see, we are allocating 64KB for each resource we create.
// Buffer resource heaps must be a multiple of 64KB in size. We are creating 3 resources, one for each frame. Each constant buffer is
// only a 4x4 matrix of floats in this tutorial. So with a float being 4 bytes, we have
// 16 floats (64 bytes) in one constant buffer, and we will store 2 constant buffers in each
// heap, one for each cube. That's only 64x2 bytes, or 128 bytes of actual data per
// resource, but each resource must still be at least 64KB (65,536 bytes)
for (int i = 0; i < frameBufferCount; ++i)
{
    // create resource for cube 1
    hr = device->CreateCommittedResource(
        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // this heap will be used to upload the constant buffer data
        D3D12_HEAP_FLAG_NONE, // no flags
        &CD3DX12_RESOURCE_DESC::Buffer(1024 * 64), // size of the resource heap. Must be a multiple of 64KB for single-textures and constant buffers
        D3D12_RESOURCE_STATE_GENERIC_READ, // will be data that is read from so we keep it in the generic read state
        nullptr, // we do not use an optimized clear value for constant buffers
        IID_PPV_ARGS(&constantBufferUploadHeaps[i]));
    constantBufferUploadHeaps[i]->SetName(L"Constant Buffer Upload Resource Heap");

    ZeroMemory(&cbPerObject, sizeof(cbPerObject));

    CD3DX12_RANGE readRange(0, 0); // We do not intend to read from this resource on the CPU. (so end is less than or equal to begin)

    // map the resource heap to get a pointer to the beginning of the heap
    hr = constantBufferUploadHeaps[i]->Map(0, &readRange, reinterpret_cast<void**>(&cbvGPUAddress[i]));

    // Because of the constant read alignment requirements, constant buffer views must be 256 byte aligned. Our buffers are smaller than 256 bytes,
    // so we need to add spacing between the two buffers, so that the second buffer starts 256 bytes from the beginning of the resource heap.
    memcpy(cbvGPUAddress[i], &cbPerObject, sizeof(cbPerObject)); // cube1's constant buffer data
    memcpy(cbvGPUAddress[i] + ConstantBufferPerObjectAlignedSize, &cbPerObject, sizeof(cbPerObject)); // cube2's constant buffer data
}

##Building the World/View/Projection Matrices##

We start by creating a projection matrix using the XMMatrixPerspectiveFovLH() function:

XMMATRIX .[https://msdn.microsoft.com/en-us/library/windows/desktop/microsoft.directx_sdk.matrix.xmmatrixperspectivefovlh%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][XMMatrixPerspectiveFovLH](
    [in] float FovAngleY,
    [in] float AspectRatio,
    [in] float NearZ,
    [in] float FarZ
);

- **FovAngleY** - *This is the field of view for the y-axis, in radians.*
- **AspectRatio** - *This is the aspect ratio of your viewport, usually Width/Height.*
- **NearZ** - *This is the closest an object can be to the camera and still be rendered. Anything closer to the camera, or behind it, will not be rendered.*
- **FarZ** - *This is the furthest away from the camera an object can be. If it is further away, it will not be rendered.*

Notice how we are creating an XMMATRIX when we create the projection matrix. This is because all DirectX Math operations work with XMVECTOR and XMMATRIX, so the library can store multiple pieces of data in registers and operate on them at one time. Once we have our projection matrix, we store it in our XMFLOAT4X4 variable, cameraProjMat, using the XMStoreFloat4x4() function.

Next we position our camera. We set the position, which is 2 units up and 4 units back. We tell the camera to look at the point (0,0,0), which is where our first cube is and what the second cube rotates around, and then we set the world's up vector to be the y-axis.
Now see how we load those XMFLOAT4s describing our camera into XMVECTORs? We do that because, as I've mentioned a few times, DirectX Math works with the XMVECTOR and XMMATRIX types, and we will be passing these XMVECTORs to the function which creates a view matrix. When we have finished setting our camera position, target, and up vectors, and storing them in XMVECTORs, we create the view matrix using the XMMatrixLookAtLH() function. Next is setting up the original world matrix for our cubes. We start with the first cube, setting its position. We then store it in an XMVECTOR and create a translation matrix. That's all we really need for the first cube; in fact, since it hasn't moved yet, we could have just initialized its world matrix to an identity matrix. Notice how we are initializing the rotation matrices to an identity matrix. What we will do is update this matrix with itself each frame, so we need to make sure it's set to something at the start. Cube2's position is actually an offset from cube1's position. We will be rotating cube2 around cube1, so the way we do that is translate cube2 to its offset from cube1, do the rotation, then translate it to cube1's actual position. This will cause it to rotate around cube1. // build projection and view matrix XMMATRIX tmpMat = XMMatrixPerspectiveFovLH(45.0f*(3.14f/180.0f), (float)Width / (float)Height, 0.1f, 1000.0f); XMStoreFloat4x4(&cameraProjMat, tmpMat); // set starting camera state cameraPosition = XMFLOAT4(0.0f, 2.0f, -4.0f, 0.0f); cameraTarget = XMFLOAT4(0.0f, 0.0f, 0.0f, 0.0f); cameraUp = XMFLOAT4(0.0f, 1.0f, 0.0f, 0.0f); // build view matrix XMVECTOR cPos = XMLoadFloat4(&cameraPosition); XMVECTOR cTarg = XMLoadFloat4(&cameraTarget); XMVECTOR cUp = XMLoadFloat4(&cameraUp); tmpMat = XMMatrixLookAtLH(cPos, cTarg, cUp); XMStoreFloat4x4(&cameraViewMat, tmpMat); // set starting cubes position // first cube cube1Position = XMFLOAT4(0.0f, 0.0f, 0.0f, 0.0f); // set cube 1's position XMVECTOR posVec = XMLoadFloat4(&cube1Position); // create xmvector for cube1's position tmpMat = XMMatrixTranslationFromVector(posVec); // create translation matrix from cube1's position vector XMStoreFloat4x4(&cube1RotMat, XMMatrixIdentity()); // initialize cube1's rotation matrix to identity matrix XMStoreFloat4x4(&cube1WorldMat, tmpMat); // store cube1's world matrix // second cube cube2PositionOffset = XMFLOAT4(1.5f, 0.0f, 0.0f, 0.0f); posVec = XMLoadFloat4(&cube2PositionOffset) + XMLoadFloat4(&cube1Position); // create xmvector for cube2's position // we are rotating around cube1 here, so add cube2's position to cube1 tmpMat = XMMatrixTranslationFromVector(posVec); // create translation matrix from cube2's position offset vector XMStoreFloat4x4(&cube2RotMat, XMMatrixIdentity()); // initialize cube2's rotation matrix to identity matrix XMStoreFloat4x4(&cube2WorldMat, tmpMat); // store cube2's world matrix ##Update() Function## All right, now comes our "game logic". This is where we will be updating our scene each frame. We start by creating 3 rotation matrices, one for each of the three Cartesian axes: x, y, and z. We create the rotation matrices with the respective XMMatrixRotationX/Y/Z() functions, then multiply them together to get one rotation matrix. Notice how we multiply them with the cube's current rotation matrix. This will cause the rotation we just created to be added to the cube's accumulated rotation matrix. Then we translate cube1 to its position.
We are not actually moving cube1 in this tutorial, so its position never really changes. Next we create its world matrix. Remember that the order of matrix multiplication does matter. Next we load in the view and projection matrices, and then create the wvp matrix for cube1. Again, order matters, since we first need to move the cube to world space, then to view space, and finally to projection space. Once we have the wvp matrix, we store it in the constant buffer object, then we copy the contents of that constant buffer object to the constant buffer on the GPU in the current frame's resource heap. Cube1's constant buffer is at the very beginning of the resource heap, so the address we provide to memcpy is the GPU virtual address we got when we mapped the resource heap. Next we do the same for cube2. First notice that we are reusing a lot of things, including the constant buffer object. We have already copied the constant buffer object which contained cube1's wvp matrix to the resource heap, so now we can reuse it to store cube2's wvp matrix. Finally we copy the constant buffer object to cube2's location in the current frame's resource heap. Notice how we are offsetting the memory address from the beginning of the heap. Because of the 256-byte constant read alignment requirement, cube2's constant buffer is stored 256 bytes from the beginning of the resource heap. We add *ConstantBufferPerObjectAlignedSize* to the GPU virtual address of the resource heap, which gives us the location 256 bytes into the resource heap, where cube2's constant buffer is located. void Update() { // update app logic, such as moving the camera or figuring out what objects are in view // create rotation matrices XMMATRIX rotXMat = XMMatrixRotationX(0.0001f); XMMATRIX rotYMat = XMMatrixRotationY(0.0002f); XMMATRIX rotZMat = XMMatrixRotationZ(0.0003f); // add rotation to cube1's rotation matrix and store it XMMATRIX rotMat = XMLoadFloat4x4(&cube1RotMat) * rotXMat * rotYMat * rotZMat; XMStoreFloat4x4(&cube1RotMat, rotMat); // create translation matrix for cube 1 from cube 1's position vector XMMATRIX translationMat = XMMatrixTranslationFromVector(XMLoadFloat4(&cube1Position)); // create cube1's world matrix by first rotating the cube, then positioning the rotated cube XMMATRIX worldMat = rotMat * translationMat; // store cube1's world matrix XMStoreFloat4x4(&cube1WorldMat, worldMat); // update constant buffer for cube1 // create the wvp matrix and store in constant buffer XMMATRIX viewMat = XMLoadFloat4x4(&cameraViewMat); // load view matrix XMMATRIX projMat = XMLoadFloat4x4(&cameraProjMat); // load projection matrix XMMATRIX wvpMat = XMLoadFloat4x4(&cube1WorldMat) * viewMat * projMat; // create wvp matrix XMMATRIX transposed = XMMatrixTranspose(wvpMat); // must transpose wvp matrix for the gpu XMStoreFloat4x4(&cbPerObject.wvpMat, transposed); // store transposed wvp matrix in constant buffer // copy our ConstantBuffer instance to the mapped constant buffer resource memcpy(cbvGPUAddress[frameIndex], &cbPerObject, sizeof(cbPerObject)); // now do cube2's world matrix // create rotation matrices for cube2 rotXMat = XMMatrixRotationX(0.0003f); rotYMat = XMMatrixRotationY(0.0002f); rotZMat = XMMatrixRotationZ(0.0001f); // add rotation to cube2's rotation matrix and store it rotMat = rotZMat * (XMLoadFloat4x4(&cube2RotMat) * (rotXMat * rotYMat)); XMStoreFloat4x4(&cube2RotMat, rotMat); // create translation matrix for cube 2 to offset it from cube 1 (its position relative to cube1) XMMATRIX translationOffsetMat =
XMMatrixTranslationFromVector(XMLoadFloat4(&cube2PositionOffset)); // we want cube 2 to be half the size of cube 1, so we scale it by .5 in all dimensions XMMATRIX scaleMat = XMMatrixScaling(0.5f, 0.5f, 0.5f); // reuse worldMat. // first we scale cube2. scaling happens relative to point 0,0,0, so you will almost always want to scale first // then we translate it to its offset from cube1. // then we rotate it. rotation always rotates around point 0,0,0 // finally we move it to cube 1's position, which will cause it to rotate around cube 1 worldMat = scaleMat * translationOffsetMat * rotMat * translationMat; wvpMat = XMLoadFloat4x4(&cube2WorldMat) * viewMat * projMat; // create wvp matrix transposed = XMMatrixTranspose(wvpMat); // must transpose wvp matrix for the gpu XMStoreFloat4x4(&cbPerObject.wvpMat, transposed); // store transposed wvp matrix in constant buffer // copy our ConstantBuffer instance to the mapped constant buffer resource memcpy(cbvGPUAddress[frameIndex] + ConstantBufferPerObjectAlignedSize, &cbPerObject, sizeof(cbPerObject)); // store cube2's world matrix XMStoreFloat4x4(&cube2WorldMat, worldMat); } ##Updating the Root Descriptor## Now we get to the part where we draw the cubes. The first thing we do, as before, is set the root signature. You might be wondering, as many people starting DX12 do, why you need to set the root signature on the command list when you already created the PSO with that root signature. There are a couple of reasons for this. One is that the root signature passed when creating the PSO is only used to create the PSO; it is not stored in the PSO. The PSO just assumes that the correct root signature will be bound when you use it. Another is that it is possible the program may not know yet which PSO needs to be set, but needs to start updating and setting root argument data. The application can set the root arguments before a PSO is set. Another reason for this separation is that when you change the root signature, it unbinds the root arguments. The DirectX team did not want changing a PSO to cause the bound data to be silently unbound. This makes it so that when you change from one PSO to another PSO with the same root signature, all the data stays bound to the pipeline. Alright, on to the tutorial stuff. We need to tell the shaders where to find the constant buffer for cube1. We do this by setting the root descriptor argument, which we created a root parameter for in the initialization code, to the constant buffer location in the current frame's resource heap. Basically we just set the root descriptor to the GPU virtual address of the current resource heap, since cube1's constant buffer is stored at the beginning of it. We then draw cube1, telling it the number of indices to draw. Now we need to update the root descriptor to point to the second cube's constant buffer data. This is where the convenience of root descriptor (and root constant, although not used here) automatic versioning comes in very handy. We can reuse the root descriptor we had previously set for cube1's constant buffer. By changing a root argument, the GPU will actually create a new copy of the root signature for the next draw call, updating the copy as needed, which in this case means updating that copy's root descriptor.
When it comes time to execute the command list, the first draw call will use the original copy of the root signature, which had a root descriptor pointing to cube1's constant buffer, and the second draw call will use the second copy, which contains a root descriptor pointing to the second cube's constant buffer data. You can see when we set the root descriptor for the second cube, we are again pointing it to a 256-byte aligned offset from the beginning of the current frame's resource heap. We again use the variable *ConstantBufferPerObjectAlignedSize* to do this, since we have made that variable the next multiple of 256 after the size of the constant buffer structure (the constant buffer structure in this tutorial is 64 bytes, so this variable is 256 bytes). This is where your computer could possibly freeze up if you try to set the root descriptor to an address that is not a 256-byte aligned offset from the beginning of a resource heap. Then we draw the second cube. // set root signature commandList->SetGraphicsRootSignature(rootSignature); // set the root signature // draw triangle commandList->RSSetViewports(1, &viewport); // set the viewports commandList->RSSetScissorRects(1, &scissorRect); // set the scissor rects commandList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST); // set the primitive topology commandList->IASetVertexBuffers(0, 1, &vertexBufferView); // set the vertex buffer (using the vertex buffer view) commandList->IASetIndexBuffer(&indexBufferView); // first cube // set cube1's constant buffer commandList->SetGraphicsRootConstantBufferView(0, constantBufferUploadHeaps[frameIndex]->GetGPUVirtualAddress()); // draw first cube commandList->DrawIndexedInstanced(numCubeIndices, 1, 0, 0, 0); // second cube // set cube2's constant buffer. You can see we are adding ConstantBufferPerObjectAlignedSize to the constant buffer // resource heap's address. This is because cube1's constant buffer is stored at the beginning of the resource heap, while // cube2's constant buffer data is stored after it (256 bytes from the start of the heap). commandList->SetGraphicsRootConstantBufferView(0, constantBufferUploadHeaps[frameIndex]->GetGPUVirtualAddress() + ConstantBufferPerObjectAlignedSize); // draw second cube commandList->DrawIndexedInstanced(numCubeIndices, 1, 0, 0, 0); ##Cleanup## The last thing we do is release the 3 resource heaps when the program exits: for (int i = 0; i < frameBufferCount; ++i) { SAFE_RELEASE(constantBufferUploadHeaps[i]); }; ##Moving an object from object space to projection space in HLSL## I almost forgot the most important part of this tutorial's code, actually getting an object from object space to projection space! Let's start by taking a quick look at the constant buffer in VertexShader.hlsl. We declare a constant buffer structure using the **cbuffer** keyword and bind this buffer to register **b0**. Constant buffers are bound to the **b** registers, textures (SRVs) are bound to the **t** registers, and UAVs are bound to the **u** registers. We have a constant buffer here, so we bind it to a "b" register. You can bind it to any b register you want, but it makes sense to start binding at register 0 and increment the register as you add more buffers. In HLSL, a matrix is represented by float4x4. So our constant buffer contains a single matrix, which is the world/view/projection matrix for the bound mesh.
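The cbuffer below has to match the layout of the C++ structure we memcpy into the upload heap; in this tutorial that is a single 4x4 matrix on both sides. As a quick recap of the C++ side (these are the same names used in stdafx.h further down), the structure is 64 bytes, and the 256-byte aligned spacing between the two cubes' buffers works out like this:

// C++ side of the constant buffer. its layout must match the HLSL cbuffer below
struct ConstantBufferPerObject {
    XMFLOAT4X4 wvpMat; // 16 floats * 4 bytes = 64 bytes
};
// round the structure size up to the next multiple of 256: (64 + 255) & ~255 = 256
int ConstantBufferPerObjectAlignedSize = (sizeof(ConstantBufferPerObject) + 255) & ~255;

And here is the HLSL side: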
cbuffer ConstantBuffer : register(b0) { float4x4 wvpMat; }; Now we do the actual transformation from object space to projection space. To multiply matrices and vectors together, you use the **mul()** function in HLSL. Again, the order that you provide the parameters in does matter. You will end up with different assembly code if you switch the parameters around. HLSL creates the most efficient assembly when the vector is provided in the first parameter and the matrix in the second parameter. The result of this multiplication is a new position in projection space for the vertex. This is the vertex shader, so this will happen for every vertex you have told the pipeline to draw. VS_OUTPUT main(VS_INPUT input) { VS_OUTPUT output; output.pos = mul(input.pos, wvpMat); output.color = input.color; return output; } Hopefully you enjoyed the tutorial and it answered some questions. The next tutorial should be a pretty helpful one, using textures in DirectX 12! As always, if you find anything unclear or just wrong in the tutorial, don't hesitate to comment or PM me (I'd prefer if you commented so others can see what you have noticed in case I do not get time to update the tutorial). ####Code#### ##VertexShader.hlsl## struct VS_INPUT { float4 pos : POSITION; float4 color: COLOR; }; struct VS_OUTPUT { float4 pos: SV_POSITION; float4 color: COLOR; }; cbuffer ConstantBuffer : register(b0) { float4x4 wvpMat; }; VS_OUTPUT main(VS_INPUT input) { VS_OUTPUT output; output.pos = mul(input.pos, wvpMat); output.color = input.color; return output; } ##PixelShader.hlsl## struct VS_OUTPUT { float4 pos: SV_POSITION; float4 color: COLOR; }; float4 main(VS_OUTPUT input) : SV_TARGET { // return interpolated color return input.color; } ##stdafx.h## #pragma once #ifndef WIN32_LEAN_AND_MEAN #define WIN32_LEAN_AND_MEAN // Exclude rarely-used stuff from Windows headers. #endif #include <windows.h> #include <d3d12.h> #include <dxgi1_4.h> #include <D3Dcompiler.h> #include <DirectXMath.h> #include "d3dx12.h" #include <string> // this will only call release if an object exists (prevents exceptions from calling release on non-existent objects) #define SAFE_RELEASE(p) { if ( (p) ) { (p)->Release(); (p) = 0; } } using namespace DirectX; // we will be using the directxmath library // Handle to the window HWND hwnd = NULL; // name of the window (not the title) LPCTSTR WindowName = L"BzTutsApp"; // title of the window LPCTSTR WindowTitle = L"Bz Window"; // width and height of the window int Width = 800; int Height = 600; // is window full screen?
bool FullScreen = false; // we will exit the program when this becomes false bool Running = true; // create a window bool InitializeWindow(HINSTANCE hInstance, int ShowWnd, bool fullscreen); // main application loop void mainloop(); // callback function for windows messages LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam); // direct3d stuff const int frameBufferCount = 3; // number of buffers we want, 2 for double buffering, 3 for tripple buffering ID3D12Device* device; // direct3d device IDXGISwapChain3* swapChain; // swapchain used to switch between render targets ID3D12CommandQueue* commandQueue; // container for command lists ID3D12DescriptorHeap* rtvDescriptorHeap; // a descriptor heap to hold resources like the render targets ID3D12Resource* renderTargets[frameBufferCount]; // number of render targets equal to buffer count ID3D12CommandAllocator* commandAllocator[frameBufferCount]; // we want enough allocators for each buffer * number of threads (we only have one thread) ID3D12GraphicsCommandList* commandList; // a command list we can record commands into, then execute them to render the frame ID3D12Fence* fence[frameBufferCount]; // an object that is locked while our command list is being executed by the gpu. We need as many //as we have allocators (more if we want to know when the gpu is finished with an asset) HANDLE fenceEvent; // a handle to an event when our fence is unlocked by the gpu UINT64 fenceValue[frameBufferCount]; // this value is incremented each frame. each fence will have its own value int frameIndex; // current rtv we are on int rtvDescriptorSize; // size of the rtv descriptor on the device (all front and back buffers will be the same size) // function declarations bool InitD3D(); // initializes direct3d 12 void Update(); // update the game logic void UpdatePipeline(); // update the direct3d pipeline (update command lists) void Render(); // execute the command list void Cleanup(); // release com ojects and clean up memory void WaitForPreviousFrame(); // wait until gpu is finished with command list ID3D12PipelineState* pipelineStateObject; // pso containing a pipeline state ID3D12RootSignature* rootSignature; // root signature defines data shaders will access D3D12_VIEWPORT viewport; // area that output from rasterizer will be stretched to. D3D12_RECT scissorRect; // the area to draw in. pixels outside that area will not be drawn onto ID3D12Resource* vertexBuffer; // a default buffer in GPU memory that we will load vertex data for our triangle into ID3D12Resource* indexBuffer; // a default buffer in GPU memory that we will load index data for our triangle into D3D12_VERTEX_BUFFER_VIEW vertexBufferView; // a structure containing a pointer to the vertex data in gpu memory // the total size of the buffer, and the size of each element (vertex) D3D12_INDEX_BUFFER_VIEW indexBufferView; // a structure holding information about the index buffer ID3D12Resource* depthStencilBuffer; // This is the memory for our depth buffer. it will also be used for a stencil buffer in a later tutorial ID3D12DescriptorHeap* dsDescriptorHeap; // This is a heap for our depth/stencil buffer descriptor // this is the structure of our constant buffer. struct ConstantBufferPerObject { XMFLOAT4X4 wvpMat; }; // Constant buffers must be 256-byte aligned which has to do with constant reads on the GPU. 
// We are only able to read at 256 byte intervals from the start of a resource heap, so we will // make sure that we add padding between the two constant buffers in the heap (one for cube1 and one for cube2) // Another way to do this would be to add a float array in the constant buffer structure for padding. In this case // we would need to add a float padding[48]; after the wvpMat variable. This would pad our structure out to 256 bytes (the matrix is 64 bytes and the 48 floats are another 192 bytes) // The reason I didn't go with this way was because there would actually be wasted cpu cycles when we memcpy our constant // buffer data to the gpu virtual address. currently we memcpy the size of our structure, which is 64 bytes here, but if we // were to add the padding array, we would memcpy 256 bytes if we memcpy the size of our structure, which is 192 wasted bytes // being copied. int ConstantBufferPerObjectAlignedSize = (sizeof(ConstantBufferPerObject) + 255) & ~255; ConstantBufferPerObject cbPerObject; // this is the constant buffer data we will send to the gpu // (which will be placed in the resource we created above) ID3D12Resource* constantBufferUploadHeaps[frameBufferCount]; // this is the memory on the gpu where constant buffers for each frame will be placed UINT8* cbvGPUAddress[frameBufferCount]; // this is a pointer to each of the constant buffer resource heaps XMFLOAT4X4 cameraProjMat; // this will store our projection matrix XMFLOAT4X4 cameraViewMat; // this will store our view matrix XMFLOAT4 cameraPosition; // this is our cameras position vector XMFLOAT4 cameraTarget; // a vector describing the point in space our camera is looking at XMFLOAT4 cameraUp; // the worlds up vector XMFLOAT4X4 cube1WorldMat; // our first cube's world matrix (transformation matrix) XMFLOAT4X4 cube1RotMat; // this will keep track of our rotation for the first cube XMFLOAT4 cube1Position; // our first cube's position in space XMFLOAT4X4 cube2WorldMat; // our second cube's world matrix (transformation matrix) XMFLOAT4X4 cube2RotMat; // this will keep track of our rotation for the second cube XMFLOAT4 cube2PositionOffset; // our second cube will rotate around the first cube, so this is the position offset from the first cube int numCubeIndices; // the number of indices to draw the cube ##main.cpp## #include "stdafx.h" struct Vertex { Vertex(float x, float y, float z, float r, float g, float b, float a) : pos(x, y, z), color(r, g, b, a) {} XMFLOAT3 pos; XMFLOAT4 color; }; int WINAPI WinMain(HINSTANCE hInstance, //Main windows function HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nShowCmd) { // create the window if (!InitializeWindow(hInstance, nShowCmd, FullScreen)) { MessageBox(0, L"Window Initialization - Failed", L"Error", MB_OK); return 1; } // initialize direct3d if (!InitD3D()) { MessageBox(0, L"Failed to initialize direct3d 12", L"Error", MB_OK); Cleanup(); return 1; } // start the main loop mainloop(); // we want to wait for the gpu to finish executing the command list before we start releasing everything WaitForPreviousFrame(); // close the fence event CloseHandle(fenceEvent); // clean up everything Cleanup(); return 0; } // create and show the window bool InitializeWindow(HINSTANCE hInstance, int ShowWnd, bool fullscreen) { if (fullscreen) { HMONITOR hmon = MonitorFromWindow(hwnd, MONITOR_DEFAULTTONEAREST); MONITORINFO mi = { sizeof(mi) }; GetMonitorInfo(hmon, &mi); Width = mi.rcMonitor.right - mi.rcMonitor.left; Height = mi.rcMonitor.bottom - mi.rcMonitor.top; } WNDCLASSEX wc; wc.cbSize = sizeof(WNDCLASSEX); wc.style = CS_HREDRAW | CS_VREDRAW;
wc.lpfnWndProc = WndProc; wc.cbClsExtra = NULL; wc.cbWndExtra = NULL; wc.hInstance = hInstance; wc.hIcon = LoadIcon(NULL, IDI_APPLICATION); wc.hCursor = LoadCursor(NULL, IDC_ARROW); wc.hbrBackground = (HBRUSH)(COLOR_WINDOW + 2); wc.lpszMenuName = NULL; wc.lpszClassName = WindowName; wc.hIconSm = LoadIcon(NULL, IDI_APPLICATION); if (!RegisterClassEx(&wc)) { MessageBox(NULL, L"Error registering class", L"Error", MB_OK | MB_ICONERROR); return false; } hwnd = CreateWindowEx(NULL, WindowName, WindowTitle, WS_OVERLAPPEDWINDOW, CW_USEDEFAULT, CW_USEDEFAULT, Width, Height, NULL, NULL, hInstance, NULL); if (!hwnd) { MessageBox(NULL, L"Error creating window", L"Error", MB_OK | MB_ICONERROR); return false; } if (fullscreen) { SetWindowLong(hwnd, GWL_STYLE, 0); } ShowWindow(hwnd, ShowWnd); UpdateWindow(hwnd); return true; } void mainloop() { MSG msg; ZeroMemory(&msg, sizeof(MSG)); while (Running) { if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) { if (msg.message == WM_QUIT) break; TranslateMessage(&msg); DispatchMessage(&msg); } else { // run game code Update(); // update the game logic Render(); // execute the command queue (rendering the scene is the result of the gpu executing the command lists) } } } LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam) { switch (msg) { case WM_KEYDOWN: if (wParam == VK_ESCAPE) { if (MessageBox(0, L"Are you sure you want to exit?", L"Really?", MB_YESNO | MB_ICONQUESTION) == IDYES) { Running = false; DestroyWindow(hwnd); } } return 0; case WM_DESTROY: // x button on top right corner of window was pressed Running = false; PostQuitMessage(0); return 0; } return DefWindowProc(hwnd, msg, wParam, lParam); } bool InitD3D() { HRESULT hr; // -- Create the Device -- // IDXGIFactory4* dxgiFactory; hr = CreateDXGIFactory1(IID_PPV_ARGS(&dxgiFactory)); if (FAILED(hr)) { return false; } IDXGIAdapter1* adapter; // adapters are the graphics card (this includes the embedded graphics on the motherboard) int adapterIndex = 0; // we'll start looking for directx 12 compatible graphics devices starting at index 0 bool adapterFound = false; // set this to true when a good one was found // find first hardware gpu that supports d3d 12 while (dxgiFactory->EnumAdapters1(adapterIndex, &adapter) != DXGI_ERROR_NOT_FOUND) { DXGI_ADAPTER_DESC1 desc; adapter->GetDesc1(&desc); if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) { // we dont want a software device continue; } // we want a device that is compatible with direct3d 12 (feature level 11 or higher) hr = D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_11_0, _uuidof(ID3D12Device), nullptr); if (SUCCEEDED(hr)) { adapterFound = true; break; } adapterIndex++; } if (!adapterFound) { return false; } // Create the device hr = D3D12CreateDevice( adapter, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device) ); if (FAILED(hr)) { return false; } // -- Create a direct command queue -- // D3D12_COMMAND_QUEUE_DESC cqDesc = {}; cqDesc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE; cqDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT; // direct means the gpu can directly execute this command queue hr = device->CreateCommandQueue(&cqDesc, IID_PPV_ARGS(&commandQueue)); // create the command queue if (FAILED(hr)) { return false; } // -- Create the Swap Chain (double/tripple buffering) -- // DXGI_MODE_DESC backBufferDesc = {}; // this is to describe our display mode backBufferDesc.Width = Width; // buffer width backBufferDesc.Height = Height; // buffer height backBufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM; // format of the buffer (rgba 32 bits, 8 bits for each 
chanel) // describe our multi-sampling. We are not multi-sampling, so we set the count to 1 (we need at least one sample of course) DXGI_SAMPLE_DESC sampleDesc = {}; sampleDesc.Count = 1; // multisample count (no multisampling, so we just put 1, since we still need 1 sample) // Describe and create the swap chain. DXGI_SWAP_CHAIN_DESC swapChainDesc = {}; swapChainDesc.BufferCount = frameBufferCount; // number of buffers we have swapChainDesc.BufferDesc = backBufferDesc; // our back buffer description swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT; // this says the pipeline will render to this swap chain swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD; // dxgi will discard the buffer (data) after we call present swapChainDesc.OutputWindow = hwnd; // handle to our window swapChainDesc.SampleDesc = sampleDesc; // our multi-sampling description swapChainDesc.Windowed = !FullScreen; // set to true, then if in fullscreen must call SetFullScreenState with true for full screen to get uncapped fps IDXGISwapChain* tempSwapChain; dxgiFactory->CreateSwapChain( commandQueue, // the queue will be flushed once the swap chain is created &swapChainDesc, // give it the swap chain description we created above &tempSwapChain // store the created swap chain in a temp IDXGISwapChain interface ); swapChain = static_cast<IDXGISwapChain3*>(tempSwapChain); frameIndex = swapChain->GetCurrentBackBufferIndex(); // -- Create the Back Buffers (render target views) Descriptor Heap -- // // describe an rtv descriptor heap and create D3D12_DESCRIPTOR_HEAP_DESC rtvHeapDesc = {}; rtvHeapDesc.NumDescriptors = frameBufferCount; // number of descriptors for this heap. rtvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV; // this heap is a render target view heap // This heap will not be directly referenced by the shaders (not shader visible), as this will store the output from the pipeline // otherwise we would set the heap's flag to D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE rtvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE; hr = device->CreateDescriptorHeap(&rtvHeapDesc, IID_PPV_ARGS(&rtvDescriptorHeap)); if (FAILED(hr)) { return false; } // get the size of a descriptor in this heap (this is a rtv heap, so only rtv descriptors should be stored in it. // descriptor sizes may vary from device to device, which is why there is no set size and we must ask the // device to give us the size. we will use this size to increment a descriptor handle offset rtvDescriptorSize = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_RTV); // get a handle to the first descriptor in the descriptor heap. a handle is basically a pointer, // but we cannot literally use it like a c++ pointer. CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(rtvDescriptorHeap->GetCPUDescriptorHandleForHeapStart()); // Create a RTV for each buffer (double buffering is two buffers, tripple buffering is 3). 
for (int i = 0; i < frameBufferCount; i++) { // first we get the n'th buffer in the swap chain and store it in the n'th // position of our ID3D12Resource array hr = swapChain->GetBuffer(i, IID_PPV_ARGS(&renderTargets[i])); if (FAILED(hr)) { return false; } // then we "create" a render target view which binds the swap chain buffer (ID3D12Resource[n]) to the rtv handle device->CreateRenderTargetView(renderTargets[i], nullptr, rtvHandle); // we increment the rtv handle by the rtv descriptor size we got above rtvHandle.Offset(1, rtvDescriptorSize); } // -- Create the Command Allocators -- // for (int i = 0; i < frameBufferCount; i++) { hr = device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&commandAllocator[i])); if (FAILED(hr)) { return false; } } // -- Create a Command List -- // // create the command list with the first allocator hr = device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, commandAllocator[frameIndex], NULL, IID_PPV_ARGS(&commandList)); if (FAILED(hr)) { return false; } // -- Create a Fence & Fence Event -- // // create the fences for (int i = 0; i < frameBufferCount; i++) { hr = device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence[i])); if (FAILED(hr)) { return false; } fenceValue[i] = 0; // set the initial fence value to 0 } // create a handle to a fence event fenceEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr); if (fenceEvent == nullptr) { return false; } // create root signature // create a root descriptor, which explains where to find the data for this root parameter D3D12_ROOT_DESCRIPTOR rootCBVDescriptor; rootCBVDescriptor.RegisterSpace = 0; rootCBVDescriptor.ShaderRegister = 0; // create a root parameter and fill it out D3D12_ROOT_PARAMETER rootParameters[1]; // only one parameter right now rootParameters[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_CBV; // this is a constant buffer view root descriptor rootParameters[0].Descriptor = rootCBVDescriptor; // this is the root descriptor for this root parameter rootParameters[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX; // our vertex shader will be the only shader accessing this parameter for now CD3DX12_ROOT_SIGNATURE_DESC rootSignatureDesc; rootSignatureDesc.Init(_countof(rootParameters), // we have 1 root parameter rootParameters, // a pointer to the beginning of our root parameters array 0, nullptr, D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT | // we can deny shader stages here for better performance D3D12_ROOT_SIGNATURE_FLAG_DENY_HULL_SHADER_ROOT_ACCESS | D3D12_ROOT_SIGNATURE_FLAG_DENY_DOMAIN_SHADER_ROOT_ACCESS | D3D12_ROOT_SIGNATURE_FLAG_DENY_GEOMETRY_SHADER_ROOT_ACCESS | D3D12_ROOT_SIGNATURE_FLAG_DENY_PIXEL_SHADER_ROOT_ACCESS); ID3DBlob* signature; hr = D3D12SerializeRootSignature(&rootSignatureDesc, D3D_ROOT_SIGNATURE_VERSION_1, &signature, nullptr); if (FAILED(hr)) { return false; } hr = device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&rootSignature)); if (FAILED(hr)) { return false; } // create vertex and pixel shaders // when debugging, we can compile the shader files at runtime. // but for release versions, we can compile the hlsl shaders // with fxc.exe to create .cso files, which contain the shader // bytecode.
We can load the .cso files at runtime to get the // shader bytecode, which of course is faster than compiling // them at runtime // compile vertex shader ID3DBlob* vertexShader; // d3d blob for holding vertex shader bytecode ID3DBlob* errorBuff; // a buffer holding the error data if any hr = D3DCompileFromFile(L"VertexShader.hlsl", nullptr, nullptr, "main", "vs_5_0", D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION, 0, &vertexShader, &errorBuff); if (FAILED(hr)) { OutputDebugStringA((char*)errorBuff->GetBufferPointer()); return false; } // fill out a shader bytecode structure, which is basically just a pointer // to the shader bytecode and the size of the shader bytecode D3D12_SHADER_BYTECODE vertexShaderBytecode = {}; vertexShaderBytecode.BytecodeLength = vertexShader->GetBufferSize(); vertexShaderBytecode.pShaderBytecode = vertexShader->GetBufferPointer(); // compile pixel shader ID3DBlob* pixelShader; hr = D3DCompileFromFile(L"PixelShader.hlsl", nullptr, nullptr, "main", "ps_5_0", D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION, 0, &pixelShader, &errorBuff); if (FAILED(hr)) { OutputDebugStringA((char*)errorBuff->GetBufferPointer()); return false; } // fill out shader bytecode structure for pixel shader D3D12_SHADER_BYTECODE pixelShaderBytecode = {}; pixelShaderBytecode.BytecodeLength = pixelShader->GetBufferSize(); pixelShaderBytecode.pShaderBytecode = pixelShader->GetBufferPointer(); // create input layout // The input layout is used by the Input Assembler so that it knows // how to read the vertex data bound to it. D3D12_INPUT_ELEMENT_DESC inputLayout[] = { { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }, { "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 } }; // fill out an input layout description structure D3D12_INPUT_LAYOUT_DESC inputLayoutDesc = {}; // we can get the number of elements in an array by "sizeof(array) / sizeof(arrayElementType)" inputLayoutDesc.NumElements = sizeof(inputLayout) / sizeof(D3D12_INPUT_ELEMENT_DESC); inputLayoutDesc.pInputElementDescs = inputLayout; // create a pipeline state object (PSO) // In a real application, you will have many pso's. for each different shader // or different combinations of shaders, different blend states or different rasterizer states, // different topology types (point, line, triangle, patch), or a different number // of render targets you will need a pso // VS is the only required shader for a pso. You might be wondering when a case would be where // you only set the VS. It's possible that you have a pso that only outputs data with the stream // output, and not on a render target, which means you would not need anything after the stream // output. 
D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc = {}; // a structure to define a pso psoDesc.InputLayout = inputLayoutDesc; // the structure describing our input layout psoDesc.pRootSignature = rootSignature; // the root signature that describes the input data this pso needs psoDesc.VS = vertexShaderBytecode; // structure describing where to find the vertex shader bytecode and how large it is psoDesc.PS = pixelShaderBytecode; // same as VS but for pixel shader psoDesc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE; // type of topology we are drawing psoDesc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM; // format of the render target psoDesc.SampleDesc = sampleDesc; // must be the same sample description as the swapchain and depth/stencil buffer psoDesc.SampleMask = 0xffffffff; // sample mask has to do with multi-sampling. 0xffffffff means point sampling is done psoDesc.RasterizerState = CD3DX12_RASTERIZER_DESC(D3D12_DEFAULT); // a default rasterizer state. psoDesc.BlendState = CD3DX12_BLEND_DESC(D3D12_DEFAULT); // a default blent state. psoDesc.NumRenderTargets = 1; // we are only binding one render target psoDesc.DepthStencilState = CD3DX12_DEPTH_STENCIL_DESC(D3D12_DEFAULT); // a default depth stencil state // create the pso hr = device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pipelineStateObject)); if (FAILED(hr)) { return false; } // Create vertex buffer // a quad Vertex vList[] = { // front face { -0.5f, 0.5f, -0.5f, 1.0f, 0.0f, 0.0f, 1.0f }, { 0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 1.0f, 1.0f }, { -0.5f, -0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 1.0f }, { 0.5f, 0.5f, -0.5f, 0.0f, 1.0f, 0.0f, 1.0f }, // right side face { 0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 0.0f, 1.0f }, { 0.5f, 0.5f, 0.5f, 1.0f, 0.0f, 1.0f, 1.0f }, { 0.5f, -0.5f, 0.5f, 0.0f, 0.0f, 1.0f, 1.0f }, { 0.5f, 0.5f, -0.5f, 0.0f, 1.0f, 0.0f, 1.0f }, // left side face { -0.5f, 0.5f, 0.5f, 1.0f, 0.0f, 0.0f, 1.0f }, { -0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 1.0f, 1.0f }, { -0.5f, -0.5f, 0.5f, 0.0f, 0.0f, 1.0f, 1.0f }, { -0.5f, 0.5f, -0.5f, 0.0f, 1.0f, 0.0f, 1.0f }, // back face { 0.5f, 0.5f, 0.5f, 1.0f, 0.0f, 0.0f, 1.0f }, { -0.5f, -0.5f, 0.5f, 1.0f, 0.0f, 1.0f, 1.0f }, { 0.5f, -0.5f, 0.5f, 0.0f, 0.0f, 1.0f, 1.0f }, { -0.5f, 0.5f, 0.5f, 0.0f, 1.0f, 0.0f, 1.0f }, // top face { -0.5f, 0.5f, -0.5f, 1.0f, 0.0f, 0.0f, 1.0f }, { 0.5f, 0.5f, 0.5f, 1.0f, 0.0f, 1.0f, 1.0f }, { 0.5f, 0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 1.0f }, { -0.5f, 0.5f, 0.5f, 0.0f, 1.0f, 0.0f, 1.0f }, // bottom face { 0.5f, -0.5f, 0.5f, 1.0f, 0.0f, 0.0f, 1.0f }, { -0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 1.0f, 1.0f }, { 0.5f, -0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 1.0f }, { -0.5f, -0.5f, 0.5f, 0.0f, 1.0f, 0.0f, 1.0f }, }; int vBufferSize = sizeof(vList); // create default heap // default heap is memory on the GPU. Only the GPU has access to this memory // To get data into this heap, we will have to upload the data using // an upload heap device->CreateCommittedResource( &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT), // a default heap D3D12_HEAP_FLAG_NONE, // no flags &CD3DX12_RESOURCE_DESC::Buffer(vBufferSize), // resource description for a buffer D3D12_RESOURCE_STATE_COPY_DEST, // we will start this heap in the copy destination state since we will copy data // from the upload heap to this heap nullptr, // optimized clear value must be null for this type of resource. 
used for render targets and depth/stencil buffers IID_PPV_ARGS(&vertexBuffer)); // we can give resource heaps a name so when we debug with the graphics debugger we know what resource we are looking at vertexBuffer->SetName(L"Vertex Buffer Resource Heap"); // create upload heap // upload heaps are used to upload data to the GPU. CPU can write to it, GPU can read from it // We will upload the vertex buffer using this heap to the default heap ID3D12Resource* vBufferUploadHeap; device->CreateCommittedResource( &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // upload heap D3D12_HEAP_FLAG_NONE, // no flags &CD3DX12_RESOURCE_DESC::Buffer(vBufferSize), // resource description for a buffer D3D12_RESOURCE_STATE_GENERIC_READ, // GPU will read from this buffer and copy its contents to the default heap nullptr, IID_PPV_ARGS(&vBufferUploadHeap)); vBufferUploadHeap->SetName(L"Vertex Buffer Upload Resource Heap"); // store vertex buffer in upload heap D3D12_SUBRESOURCE_DATA vertexData = {}; vertexData.pData = reinterpret_cast<BYTE*>(vList); // pointer to our vertex array vertexData.RowPitch = vBufferSize; // size of all our triangle vertex data vertexData.SlicePitch = vBufferSize; // also the size of our triangle vertex data // we are now creating a command with the command list to copy the data from // the upload heap to the default heap UpdateSubresources(commandList, vertexBuffer, vBufferUploadHeap, 0, 0, 1, &vertexData); // transition the vertex buffer data from copy destination state to vertex buffer state commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(vertexBuffer, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER)); // Create index buffer // a cube (2 triangles per face) DWORD iList[] = { // front face 0, 1, 2, // first triangle 0, 3, 1, // second triangle // right side face 4, 5, 6, // first triangle 4, 7, 5, // second triangle // left side face 8, 9, 10, // first triangle 8, 11, 9, // second triangle // back face 12, 13, 14, // first triangle 12, 15, 13, // second triangle // top face 16, 17, 18, // first triangle 16, 19, 17, // second triangle // bottom face 20, 21, 22, // first triangle 20, 23, 21, // second triangle }; int iBufferSize = sizeof(iList); numCubeIndices = sizeof(iList) / sizeof(DWORD); // create default heap to hold index buffer device->CreateCommittedResource( &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT), // a default heap D3D12_HEAP_FLAG_NONE, // no flags &CD3DX12_RESOURCE_DESC::Buffer(iBufferSize), // resource description for a buffer D3D12_RESOURCE_STATE_COPY_DEST, // start in the copy destination state nullptr, // optimized clear value must be null for this type of resource IID_PPV_ARGS(&indexBuffer)); // we can give resource heaps a name so when we debug with the graphics debugger we know what resource we are looking at indexBuffer->SetName(L"Index Buffer Resource Heap"); // create upload heap to upload index buffer ID3D12Resource* iBufferUploadHeap; device->CreateCommittedResource( &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // upload heap D3D12_HEAP_FLAG_NONE, // no flags &CD3DX12_RESOURCE_DESC::Buffer(iBufferSize), // resource description for a buffer D3D12_RESOURCE_STATE_GENERIC_READ, // GPU will read from this buffer and copy its contents to the default heap nullptr, IID_PPV_ARGS(&iBufferUploadHeap)); iBufferUploadHeap->SetName(L"Index Buffer Upload Resource Heap"); // store index buffer in upload heap D3D12_SUBRESOURCE_DATA indexData = {}; indexData.pData = reinterpret_cast<BYTE*>(iList); // pointer to our index array
indexData.RowPitch = iBufferSize; // size of all our index buffer indexData.SlicePitch = iBufferSize; // also the size of our index buffer // we are now creating a command with the command list to copy the data from // the upload heap to the default heap UpdateSubresources(commandList, indexBuffer, iBufferUploadHeap, 0, 0, 1, &indexData); // transition the index buffer data from copy destination state to index buffer state commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(indexBuffer, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_INDEX_BUFFER)); // Create the depth/stencil buffer // create a depth stencil descriptor heap so we can get a pointer to the depth stencil buffer D3D12_DESCRIPTOR_HEAP_DESC dsvHeapDesc = {}; dsvHeapDesc.NumDescriptors = 1; dsvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_DSV; dsvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE; hr = device->CreateDescriptorHeap(&dsvHeapDesc, IID_PPV_ARGS(&dsDescriptorHeap)); if (FAILED(hr)) { Running = false; } D3D12_DEPTH_STENCIL_VIEW_DESC depthStencilDesc = {}; depthStencilDesc.Format = DXGI_FORMAT_D32_FLOAT; depthStencilDesc.ViewDimension = D3D12_DSV_DIMENSION_TEXTURE2D; depthStencilDesc.Flags = D3D12_DSV_FLAG_NONE; D3D12_CLEAR_VALUE depthOptimizedClearValue = {}; depthOptimizedClearValue.Format = DXGI_FORMAT_D32_FLOAT; depthOptimizedClearValue.DepthStencil.Depth = 1.0f; depthOptimizedClearValue.DepthStencil.Stencil = 0; device->CreateCommittedResource( &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT), D3D12_HEAP_FLAG_NONE, &CD3DX12_RESOURCE_DESC::Tex2D(DXGI_FORMAT_D32_FLOAT, Width, Height, 1, 0, 1, 0, D3D12_RESOURCE_FLAG_ALLOW_DEPTH_STENCIL), D3D12_RESOURCE_STATE_DEPTH_WRITE, &depthOptimizedClearValue, IID_PPV_ARGS(&depthStencilBuffer) ); dsDescriptorHeap->SetName(L"Depth/Stencil Resource Heap"); device->CreateDepthStencilView(depthStencilBuffer, &depthStencilDesc, dsDescriptorHeap->GetCPUDescriptorHandleForHeapStart()); // create the constant buffer resource heap // We will update the constant buffer one or more times per frame, so we will use only an upload heap, // unlike previously where we used an upload heap to upload the vertex and index data, and then copied over // to a default heap. If you plan to use a resource for more than a couple frames, it is usually more // efficient to copy to a default heap where it stays on the gpu. In this case, our constant buffer // will be modified and uploaded at least once per frame, so we only use an upload heap // first we will create a resource heap (upload heap) for each frame for the cubes' constant buffers // As you can see, we are allocating 64KB for each resource we create. Buffer resource heaps must be // aligned to 64KB. We are creating 3 resources, one for each frame. Each constant buffer is // only a 4x4 matrix of floats in this tutorial. So with a float being 4 bytes, we have // 16 floats in one constant buffer, and we will store 2 constant buffers in each // heap, one for each cube. That's only 64x2 bytes, or 128 bytes we are using in each // resource, and each resource must be at least 64KB (65536 bytes) for (int i = 0; i < frameBufferCount; ++i) { // create resource for cube 1 hr = device->CreateCommittedResource( &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // this heap will be used to upload the constant buffer data D3D12_HEAP_FLAG_NONE, // no flags &CD3DX12_RESOURCE_DESC::Buffer(1024 * 64), // size of the resource heap.
Must be a multiple of 64KB for single-textures and constant buffers D3D12_RESOURCE_STATE_GENERIC_READ, // will be data that is read from so we keep it in the generic read state nullptr, // we do not need an optimized clear value for constant buffers IID_PPV_ARGS(&constantBufferUploadHeaps[i])); constantBufferUploadHeaps[i]->SetName(L"Constant Buffer Upload Resource Heap"); ZeroMemory(&cbPerObject, sizeof(cbPerObject)); CD3DX12_RANGE readRange(0, 0); // We do not intend to read from this resource on the CPU. (so end is less than or equal to begin) // map the resource heap to get a gpu virtual address to the beginning of the heap hr = constantBufferUploadHeaps[i]->Map(0, &readRange, reinterpret_cast<void**>(&cbvGPUAddress[i])); // Because of the constant read alignment requirements, constant buffer views must be 256 byte aligned. Our buffers are smaller than 256 bytes, // so we need to add spacing between the two buffers, so that the second buffer starts 256 bytes from the beginning of the resource heap. memcpy(cbvGPUAddress[i], &cbPerObject, sizeof(cbPerObject)); // cube1's constant buffer data memcpy(cbvGPUAddress[i] + ConstantBufferPerObjectAlignedSize, &cbPerObject, sizeof(cbPerObject)); // cube2's constant buffer data } // Now we execute the command list to upload the initial assets (triangle data) commandList->Close(); ID3D12CommandList* ppCommandLists[] = { commandList }; commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists); // increment the fence value now, otherwise the buffer might not be uploaded by the time we start drawing fenceValue[frameIndex]++; hr = commandQueue->Signal(fence[frameIndex], fenceValue[frameIndex]); if (FAILED(hr)) { Running = false; } // create a vertex buffer view for the triangle. We get the GPU memory address to the vertex pointer using the GetGPUVirtualAddress() method vertexBufferView.BufferLocation = vertexBuffer->GetGPUVirtualAddress(); vertexBufferView.StrideInBytes = sizeof(Vertex); vertexBufferView.SizeInBytes = vBufferSize; // create a vertex buffer view for the triangle.
We get the GPU memory address to the vertex pointer using the GetGPUVirtualAddress() method indexBufferView.BufferLocation = indexBuffer->GetGPUVirtualAddress(); indexBufferView.Format = DXGI_FORMAT_R32_UINT; // 32-bit unsigned integer (this is what a dword is, double word, a word is 2 bytes) indexBufferView.SizeInBytes = iBufferSize; // Fill out the Viewport viewport.TopLeftX = 0; viewport.TopLeftY = 0; viewport.Width = Width; viewport.Height = Height; viewport.MinDepth = 0.0f; viewport.MaxDepth = 1.0f; // Fill out a scissor rect scissorRect.left = 0; scissorRect.top = 0; scissorRect.right = Width; scissorRect.bottom = Height; // build projection and view matrix XMMATRIX tmpMat = XMMatrixPerspectiveFovLH(45.0f*(3.14f/180.0f), (float)Width / (float)Height, 0.1f, 1000.0f); XMStoreFloat4x4(&cameraProjMat, tmpMat); // set starting camera state cameraPosition = XMFLOAT4(0.0f, 2.0f, -4.0f, 0.0f); cameraTarget = XMFLOAT4(0.0f, 0.0f, 0.0f, 0.0f); cameraUp = XMFLOAT4(0.0f, 1.0f, 0.0f, 0.0f); // build view matrix XMVECTOR cPos = XMLoadFloat4(&cameraPosition); XMVECTOR cTarg = XMLoadFloat4(&cameraTarget); XMVECTOR cUp = XMLoadFloat4(&cameraUp); tmpMat = XMMatrixLookAtLH(cPos, cTarg, cUp); XMStoreFloat4x4(&cameraViewMat, tmpMat); // set starting cubes position // first cube cube1Position = XMFLOAT4(0.0f, 0.0f, 0.0f, 0.0f); // set cube 1's position XMVECTOR posVec = XMLoadFloat4(&cube1Position); // create xmvector for cube1's position tmpMat = XMMatrixTranslationFromVector(posVec); // create translation matrix from cube1's position vector XMStoreFloat4x4(&cube1RotMat, XMMatrixIdentity()); // initialize cube1's rotation matrix to identity matrix XMStoreFloat4x4(&cube1WorldMat, tmpMat); // store cube1's world matrix // second cube cube2PositionOffset = XMFLOAT4(1.5f, 0.0f, 0.0f, 0.0f); posVec = XMLoadFloat4(&cube2PositionOffset) + XMLoadFloat4(&cube1Position); // create xmvector for cube2's position // we are rotating around cube1 here, so add cube2's position to cube1 tmpMat = XMMatrixTranslationFromVector(posVec); // create translation matrix from cube2's position offset vector XMStoreFloat4x4(&cube2RotMat, XMMatrixIdentity()); // initialize cube2's rotation matrix to identity matrix XMStoreFloat4x4(&cube2WorldMat, tmpMat); // store cube2's world matrix return true; } void Update() { // update app logic, such as moving the camera or figuring out what objects are in view // create rotation matrices XMMATRIX rotXMat = XMMatrixRotationX(0.0001f); XMMATRIX rotYMat = XMMatrixRotationY(0.0002f); XMMATRIX rotZMat = XMMatrixRotationZ(0.0003f); // add rotation to cube1's rotation matrix and store it XMMATRIX rotMat = XMLoadFloat4x4(&cube1RotMat) * rotXMat * rotYMat * rotZMat; XMStoreFloat4x4(&cube1RotMat, rotMat); // create translation matrix for cube 1 from cube 1's position vector XMMATRIX translationMat = XMMatrixTranslationFromVector(XMLoadFloat4(&cube1Position)); // create cube1's world matrix by first rotating the cube, then positioning the rotated cube XMMATRIX worldMat = rotMat * translationMat; // store cube1's world matrix XMStoreFloat4x4(&cube1WorldMat, worldMat); // update constant buffer for cube1 // create the wvp matrix and store in constant buffer XMMATRIX viewMat = XMLoadFloat4x4(&cameraViewMat); // load view matrix XMMATRIX projMat = XMLoadFloat4x4(&cameraProjMat); // load projection matrix XMMATRIX wvpMat = XMLoadFloat4x4(&cube1WorldMat) * viewMat * projMat; // create wvp matrix XMMATRIX transposed = XMMatrixTranspose(wvpMat); // must transpose wvp matrix for the gpu 
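// (a short note on the transpose above: DirectXMath builds row-major matrices, while HLSL constant buffers
// default to column-major packing, so the matrix is flipped here so the shader reads it correctly)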
XMStoreFloat4x4(&cbPerObject.wvpMat, transposed); // store transposed wvp matrix in constant buffer // copy our ConstantBuffer instance to the mapped constant buffer resource memcpy(cbvGPUAddress[frameIndex], &cbPerObject, sizeof(cbPerObject)); // now do cube2's world matrix // create rotation matrices for cube2 rotXMat = XMMatrixRotationX(0.0003f); rotYMat = XMMatrixRotationY(0.0002f); rotZMat = XMMatrixRotationZ(0.0001f); // add rotation to cube2's rotation matrix and store it rotMat = rotZMat * (XMLoadFloat4x4(&cube2RotMat) * (rotXMat * rotYMat)); XMStoreFloat4x4(&cube2RotMat, rotMat); // create translation matrix for cube 2 to offset it from cube 1 (its position relative to cube1 XMMATRIX translationOffsetMat = XMMatrixTranslationFromVector(XMLoadFloat4(&cube2PositionOffset)); // we want cube 2 to be half the size of cube 1, so we scale it by .5 in all dimensions XMMATRIX scaleMat = XMMatrixScaling(0.5f, 0.5f, 0.5f); // reuse worldMat. // first we scale cube2. scaling happens relative to point 0,0,0, so you will almost always want to scale first // then we translate it. // then we rotate it. rotation always rotates around point 0,0,0 // finally we move it to cube 1's position, which will cause it to rotate around cube 1 worldMat = scaleMat * translationOffsetMat * rotMat * translationMat; wvpMat = XMLoadFloat4x4(&cube2WorldMat) * viewMat * projMat; // create wvp matrix transposed = XMMatrixTranspose(wvpMat); // must transpose wvp matrix for the gpu XMStoreFloat4x4(&cbPerObject.wvpMat, transposed); // store transposed wvp matrix in constant buffer // copy our ConstantBuffer instance to the mapped constant buffer resource memcpy(cbvGPUAddress[frameIndex] + ConstantBufferPerObjectAlignedSize, &cbPerObject, sizeof(cbPerObject)); // store cube2's world matrix XMStoreFloat4x4(&cube2WorldMat, worldMat); } void UpdatePipeline() { HRESULT hr; // We have to wait for the gpu to finish with the command allocator before we reset it WaitForPreviousFrame(); // we can only reset an allocator once the gpu is done with it // resetting an allocator frees the memory that the command list was stored in hr = commandAllocator[frameIndex]->Reset(); if (FAILED(hr)) { Running = false; } // reset the command list. by resetting the command list we are putting it into // a recording state so we can start recording commands into the command allocator. // the command allocator that we reference here may have multiple command lists // associated with it, but only one can be recording at any time. Make sure // that any other command lists associated to this command allocator are in // the closed state (not recording). 
// Here you will pass an initial pipeline state object as the second parameter, // but in this tutorial we are only clearing the rtv, and do not actually need // anything but an initial default pipeline, which is what we get by setting // the second parameter to NULL hr = commandList->Reset(commandAllocator[frameIndex], pipelineStateObject); if (FAILED(hr)) { Running = false; } // here we start recording commands into the commandList (which all the commands will be stored in the commandAllocator) // transition the "frameIndex" render target from the present state to the render target state so the command list draws to it starting from here commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(renderTargets[frameIndex], D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_RENDER_TARGET)); // here we again get the handle to our current render target view so we can set it as the render target in the output merger stage of the pipeline CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(rtvDescriptorHeap->GetCPUDescriptorHandleForHeapStart(), frameIndex, rtvDescriptorSize); // get a handle to the depth/stencil buffer CD3DX12_CPU_DESCRIPTOR_HANDLE dsvHandle(dsDescriptorHeap->GetCPUDescriptorHandleForHeapStart()); // set the render target for the output merger stage (the output of the pipeline) commandList->OMSetRenderTargets(1, &rtvHandle, FALSE, &dsvHandle); // Clear the render target by using the ClearRenderTargetView command const float clearColor[] = { 0.0f, 0.2f, 0.4f, 1.0f }; commandList->ClearRenderTargetView(rtvHandle, clearColor, 0, nullptr); // clear the depth/stencil buffer commandList->ClearDepthStencilView(dsDescriptorHeap->GetCPUDescriptorHandleForHeapStart(), D3D12_CLEAR_FLAG_DEPTH, 1.0f, 0, 0, nullptr); // set root signature commandList->SetGraphicsRootSignature(rootSignature); // set the root signature // draw triangle commandList->RSSetViewports(1, &viewport); // set the viewports commandList->RSSetScissorRects(1, &scissorRect); // set the scissor rects commandList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST); // set the primitive topology commandList->IASetVertexBuffers(0, 1, &vertexBufferView); // set the vertex buffer (using the vertex buffer view) commandList->IASetIndexBuffer(&indexBufferView); // first cube // set cube1's constant buffer commandList->SetGraphicsRootConstantBufferView(0, constantBufferUploadHeaps[frameIndex]->GetGPUVirtualAddress()); // draw first cube commandList->DrawIndexedInstanced(numCubeIndices, 1, 0, 0, 0); // second cube // set cube2's constant buffer. You can see we are adding the size of ConstantBufferPerObject to the constant buffer // resource heaps address. This is because cube1's constant buffer is stored at the beginning of the resource heap, while // cube2's constant buffer data is stored after (256 bits from the start of the heap). commandList->SetGraphicsRootConstantBufferView(0, constantBufferUploadHeaps[frameIndex]->GetGPUVirtualAddress() + ConstantBufferPerObjectAlignedSize); // draw second cube commandList->DrawIndexedInstanced(numCubeIndices, 1, 0, 0, 0); // transition the "frameIndex" render target from the render target state to the present state. 
    // If the debug layer is enabled, you will receive a warning if present is called on the render target when it's not in the present state
    commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(renderTargets[frameIndex], D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_PRESENT));

    hr = commandList->Close();
    if (FAILED(hr))
    {
        Running = false;
    }
}

void Render()
{
    HRESULT hr;

    UpdatePipeline(); // update the pipeline by sending commands to the commandqueue

    // create an array of command lists (only one command list here)
    ID3D12CommandList* ppCommandLists[] = { commandList };

    // execute the array of command lists
    commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);

    // this command goes in at the end of our command queue. we will know when our command queue
    // has finished because the fence value will be set to "fenceValue" from the GPU since the command
    // queue is being executed on the GPU
    hr = commandQueue->Signal(fence[frameIndex], fenceValue[frameIndex]);
    if (FAILED(hr))
    {
        Running = false;
    }

    // present the current backbuffer
    hr = swapChain->Present(0, 0);
    if (FAILED(hr))
    {
        Running = false;
    }
}

void Cleanup()
{
    // wait for the gpu to finish all frames
    for (int i = 0; i < frameBufferCount; ++i)
    {
        frameIndex = i;
        WaitForPreviousFrame();
    }

    // get swapchain out of full screen before exiting
    BOOL fs = false;
    if (SUCCEEDED(swapChain->GetFullscreenState(&fs, NULL)) && fs)
        swapChain->SetFullscreenState(false, NULL);

    SAFE_RELEASE(device);
    SAFE_RELEASE(swapChain);
    SAFE_RELEASE(commandQueue);
    SAFE_RELEASE(rtvDescriptorHeap);
    SAFE_RELEASE(commandList);

    for (int i = 0; i < frameBufferCount; ++i)
    {
        SAFE_RELEASE(renderTargets[i]);
        SAFE_RELEASE(commandAllocator[i]);
        SAFE_RELEASE(fence[i]);
    };

    SAFE_RELEASE(pipelineStateObject);
    SAFE_RELEASE(rootSignature);
    SAFE_RELEASE(vertexBuffer);
    SAFE_RELEASE(indexBuffer);

    SAFE_RELEASE(depthStencilBuffer);
    SAFE_RELEASE(dsDescriptorHeap);

    for (int i = 0; i < frameBufferCount; ++i)
    {
        SAFE_RELEASE(constantBufferUploadHeaps[i]);
    };
}

void WaitForPreviousFrame()
{
    HRESULT hr;

    // swap the current rtv buffer index so we draw on the correct buffer
    frameIndex = swapChain->GetCurrentBackBufferIndex();

    // if the current fence value is still less than "fenceValue", then we know the GPU has not finished executing
    // the command queue since it has not reached the "commandQueue->Signal(fence, fenceValue)" command
    if (fence[frameIndex]->GetCompletedValue() < fenceValue[frameIndex])
    {
        // we have the fence create an event which is signaled once the fence's current value is "fenceValue"
        hr = fence[frameIndex]->SetEventOnCompletion(fenceValue[frameIndex], fenceEvent);
        if (FAILED(hr))
        {
            Running = false;
        }

        // We will wait until the fence has triggered the event that its current value has reached "fenceValue". once its value
        // has reached "fenceValue", we know the command queue has finished executing
        WaitForSingleObject(fenceEvent, INFINITE);
    }

    // increment fenceValue for next frame
    fenceValue[frameIndex]++;
}
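That is everything for the per-object constant buffer updates and the draw calls. If you later render more than two objects, the same pattern simply repeats at 256-byte strides into the upload heap. Below is a minimal sketch of that generalization; it is not part of this tutorial's code, and the names numObjects and objectWvpMats are placeholders for whatever your own scene uses.

    // In Update(): write each object's transposed wvp matrix at its aligned offset in the upload heap
    for (int i = 0; i < numObjects; ++i)
    {
        XMStoreFloat4x4(&cbPerObject.wvpMat, XMMatrixTranspose(objectWvpMats[i]));
        memcpy(cbvGPUAddress[frameIndex] + i * ConstantBufferPerObjectAlignedSize, &cbPerObject, sizeof(cbPerObject));
    }

    // In UpdatePipeline(): point the root CBV at each object's offset before its draw call
    for (int i = 0; i < numObjects; ++i)
    {
        commandList->SetGraphicsRootConstantBufferView(0,
            constantBufferUploadHeaps[frameIndex]->GetGPUVirtualAddress() + i * ConstantBufferPerObjectAlignedSize);
        commandList->DrawIndexedInstanced(numCubeIndices, 1, 0, 0, 0);
    }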
I think that you have some typos in your ConstantBuffer 256 byte alignment explanations.
Here:
struct ConstantBuffer
{
float4x4 wvpMat;
// now pad the constant buffer to be 256 byte aligned
float4 padding[48];
}
"float4 padding[48]" should be "float padding[48]" (192 bytes) to correctly offset the "float4x4 wvpMat" (64 bytes) to 256 byte alignment.
And here:
// Constant buffers must be 256-byte aligned which has to do with constant reads on the GPU.
// We are only able to read at 256 byte intervals from the start of a resource heap, so we will
// make sure that we add padding between the two constant buffers in the heap (one for cube1 and one for cube2)
// Another way to do this would be to add a float array in the constant buffer structure for padding. In this case
// we would need to add a float padding[50]; after the wvpMat variable. This would align our structure to 256 bytes (4 bytes per float)
"float padding[50]" should be "float padding[48]"
on Apr 23 `16
AllanF
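To make the alignment arithmetic above concrete: the wvpMat member is an XMFLOAT4X4, which is 64 bytes, so 48 extra floats (192 bytes) pad the structure out to exactly 256 bytes. Another option is to skip the manual padding and instead round the structure size up when computing the per-object stride. A minimal sketch of that rounding, assuming the structure only holds the wvp matrix (the tutorial's own definition of ConstantBufferPerObjectAlignedSize may differ in detail):

    struct ConstantBufferPerObject
    {
        DirectX::XMFLOAT4X4 wvpMat; // 64 bytes
    };

    // round sizeof(ConstantBufferPerObject) up to the next multiple of 256 bytes (64 -> 256)
    static const int ConstantBufferPerObjectAlignedSize = (sizeof(ConstantBufferPerObject) + 255) & ~255;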
My original formatting seems to have got jumbled...but you should be able to get the gist :)
on Apr 23 `16
AllanF
Thank you for catching that AllanF, I will get those fixed
on Apr 24 `16
iedoc
I just finished adding this to my sources. My difficulty was assigning this to every object I load, since the objects are loaded at later stages, but thanks to your tutorials it is working as it should, with heavy modifications :) thanks iedoc :) hope for more tutorials soon :)
on May 03 `16
maxiorek82
@iedoc, Nice tutorial. Thanks for posting. I did see one typo. In the "Row Major/Column Major Ordering" section within the first pre-formatted snippet, you labeled both matrices as Row Major Order, where the right matrix should be Column Major Order.
on May 30 `16
DustinB
@iedoc, In the "Translation Matrix" and "Rotation Matrices" sections, consider transposing your matrices so they are consistent with the DirectXMath row-major ordering.
on May 30 `16
DustinB
@iedoc, In the sentence, "You can use the XMMatrixScaling() method of the DirectX Math library to create a translation matrix." change "translation matrix" to "scaling matrix".
on May 30 `16
DustinB
Thank you DustinB~ I will fix those later today
on May 30 `16
iedoc
@iedoc, Under the "View Matrix" section, the matrix supplied is not quite right. The view matrix is a rotation matrix times a negative translation matrix V = R * T(-position). But there is a simplified form that looks like your matrix if you simply replace the column [position.x, position.y, position.z, 1] with [dot(right, -position), dot(up, -position), dot(forward, -position), 1 ]. Then it will be correct. As with my previous comment, this too would be better if transposed to keep everything consistent with row-major ordering.
on May 30 `16
DustinB
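To make that concrete, here is a minimal sketch (the function name and layout are mine, not the tutorial's) that builds a left-handed view matrix by hand as V = R * T(-eye) in DirectXMath's row-major, row-vector convention; the camera basis vectors fill the columns of the upper 3x3 and the last row holds the dot(axis, -eye) terms, which should agree with XMMatrixLookAtLH:

    #include <DirectXMath.h>
    using namespace DirectX;

    XMMATRIX BuildViewLH(FXMVECTOR eye, FXMVECTOR focus, FXMVECTOR worldUp)
    {
        // camera basis: forward toward the focus point, right and up orthogonal to it
        XMVECTOR forward = XMVector3Normalize(XMVectorSubtract(focus, eye));
        XMVECTOR right   = XMVector3Normalize(XMVector3Cross(worldUp, forward));
        XMVECTOR up      = XMVector3Cross(forward, right);

        XMMATRIX view;
        view.r[0] = XMVectorSet(XMVectorGetX(right), XMVectorGetX(up), XMVectorGetX(forward), 0.0f);
        view.r[1] = XMVectorSet(XMVectorGetY(right), XMVectorGetY(up), XMVectorGetY(forward), 0.0f);
        view.r[2] = XMVectorSet(XMVectorGetZ(right), XMVectorGetZ(up), XMVectorGetZ(forward), 0.0f);
        view.r[3] = XMVectorSet(-XMVectorGetX(XMVector3Dot(right, eye)),   // dot(right, -eye)
                                -XMVectorGetX(XMVector3Dot(up, eye)),      // dot(up, -eye)
                                -XMVectorGetX(XMVector3Dot(forward, eye)), // dot(forward, -eye)
                                1.0f);
        return view;
    }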
If it looks like your model is flipped on the x axis, replace the "XMMatrixPerspectiveFovLH" and "XMMatrixLookAtLH" functions with the right-handed ones, "XMMatrixPerspectiveFovRH" and "XMMatrixLookAtRH"
on May 15 `17
CLoyz
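For anyone trying that, the swap is just the two matrix-builder calls; a minimal sketch, where the field of view, aspect ratio, and eye/focus/up vectors are placeholders rather than the tutorial's exact values:

    // right-handed equivalents of the projection and view matrix builders
    XMMATRIX projMat = XMMatrixPerspectiveFovRH(XMConvertToRadians(45.0f), aspectRatio, 0.1f, 1000.0f);
    XMMATRIX viewMat = XMMatrixLookAtRH(eyePosition, focusPosition, upDirection);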
Hi, I am trying to have 2 constant buffers (so registers b0 and b1, if I understood correctly).
For that I modified the definition of the root signature like this: https://pastebin.com/heGLWimd
But after that I am lost. I tried multiple things when creating the constant buffer upload heaps, but didn't figure out how to handle that. Does someone know how to do it?
Thanks
on Dec 21 `17
Zeldarck
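One way to expose two constant buffers as root descriptors in registers b0 and b1 is to give each its own root parameter and then bind a separate GPU virtual address to each slot before drawing. A minimal sketch of that idea (this is not the tutorial's code, and the two addresses at the end are placeholders for wherever your buffers live):

    // two root parameters, one root CBV per shader register
    CD3DX12_ROOT_PARAMETER rootParameters[2];
    rootParameters[0].InitAsConstantBufferView(0, 0, D3D12_SHADER_VISIBILITY_VERTEX); // register(b0)
    rootParameters[1].InitAsConstantBufferView(1, 0, D3D12_SHADER_VISIBILITY_VERTEX); // register(b1)

    CD3DX12_ROOT_SIGNATURE_DESC rootSignatureDesc;
    rootSignatureDesc.Init(_countof(rootParameters), rootParameters, 0, nullptr,
        D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);

    // serialize and create the root signature as in the tutorial, then at draw time:
    // commandList->SetGraphicsRootConstantBufferView(0, addressOfFirstBuffer);  // feeds b0
    // commandList->SetGraphicsRootConstantBufferView(1, addressOfSecondBuffer); // feeds b1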
- fail of tuto chapter -
on Dec 21 `17
Zeldarck
@Zeldarck, head over to the Discord channel. There, someone will help you
on Dec 22 `17
aman2218
It seems that the application is updating data for the frame currently being rendered, and rendering with data that was updated 3 frames before. Is this behavior on purpose, or did I just miss something in the code? Thanks.
on Aug 13 `22
decayfang