
API without Secrets: Introduction to Vulkan* Part 3: First Triangle


Download [PDF 885 KB]

Link to Github Sample Code


Go to: API without Secrets: Introduction to Vulkan* Part 2: Swap Chain


Tutorial 3: First Triangle – Graphics Pipeline and Drawing

In this tutorial we will finally draw something on the screen. One single triangle should be just fine for our first Vulkan-generated “image.”

The graphics pipeline and drawing in general require lots of preparations in Vulkan (in the form of filling many structures with even more different fields). There are potentially many places where we can make mistakes, and in Vulkan, even simple mistakes may lead to the application not working as expected, displaying just a blank screen, and leaving us wondering what went wrong. In such situations validation layers can help us a lot. But I didn’t want to dive into too many different aspects and the specifics of the Vulkan API. So I prepared the code to be as small and as simple as possible.

This led me to create an application that is working properly and displays a simple triangle the way I expected, but it also uses mechanics that are not recommended, not flexible, and also probably not too efficient (though correct). I don’t want to teach solutions that aren’t recommended, but here it simplifies the tutorial quite considerably and allows us to focus only on the minimal required set of API usage. I will point out the “disputable” functionality as soon as we get to it. And in the next tutorial, I will show the recommended way of drawing triangles.

To draw our first simple triangle, we need to create a render pass, a framebuffer, and a graphics pipeline. Command buffers are of course also needed, but we already know something about them. We will create simple GLSL shaders and compile them into Khronos’s SPIR*-V language—the only (at this time) form of shaders that Vulkan (officially) understands.

If nothing displays on your computer’s screen, try to simplify the code as much as possible or even go back to the second tutorial. Check whether the command buffer that just clears the image behaves as expected and whether the color the image was cleared to is properly displayed on the screen. If it is, modify the code and add the parts from this tutorial. Check that every return value is VK_SUCCESS. If these ideas don’t help, wait for the tutorial about validation layers.
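
A typical check follows the same pattern as the rest of the tutorial’s error handling. Here is a sketch; vkEndCommandBuffer() is just an example call and command_buffer is a placeholder, not a variable from the sample code:

if( vkEndCommandBuffer( command_buffer ) != VK_SUCCESS ) {
  printf( "Could not record command buffer!\n" );
  return false;
}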

About the Source Code Example

For this and succeeding tutorials, I’ve changed the sample project. Vulkan preparation phases that were described in the previous tutorials were placed in a “VulkanCommon” class found in separate files (header and source). The class for a given tutorial, which is responsible for presenting the topics described in that tutorial, inherits from the “VulkanCommon” class and has access to some (required) Vulkan variables like the device or the swap chain. This way I can reuse the Vulkan creation code and prepare smaller classes focusing only on the presented topics. The code from the earlier chapters works properly, so it should also be easier to find potential mistakes.

I’ve also added a separate set of files for some utility functions. Here we will be reading SPIR-V shaders from binary files, so I’ve added a function for loading the contents of a binary file. It can be found in the Tools.cpp and Tools.h files.

Creating a Render Pass

To draw anything on the screen, we need a graphics pipeline. But creating it now will require pointers to other structures, which will probably also need pointers to yet other structures. So we’ll start with a render pass.

What is a render pass? A general picture can be drawn from a “logical” render pass found in many well-known rendering techniques, like deferred shading. This technique consists of many subpasses. The first subpass draws the geometry with shaders that fill the G-Buffer: it stores diffuse color in one texture, normal vectors in another, shininess in another, and depth (position) in yet another. Next, for each light source, drawing is performed that reads some of this data (normal vectors, shininess, depth/position), calculates lighting, and stores it in another texture. A final pass aggregates the lighting data with the diffuse color. This is a (very rough) explanation of deferred shading, but it describes the render pass—a set of data required to perform some drawing operations: storing data in textures and reading data from other textures.

In Vulkan, a render pass represents (or describes) a set of framebuffer attachments (images) required for drawing operations and a collection of subpasses that the drawing operations will be ordered into. It is a construct that collects all color, depth, and stencil attachments and the operations modifying them in such a way that the driver does not have to deduce this information by itself, which may give substantial optimization opportunities on some GPUs. A subpass consists of drawing operations that use (more or less) the same attachments. Each of these drawing operations may read from some input attachments and render data into some other (color, depth, stencil) attachments. A render pass also describes the dependencies between these attachments: in one subpass we perform rendering into a texture, but in another this texture will be used as a source of data (that is, it will be sampled from). All this data helps the graphics hardware optimize drawing operations.

To create a render pass in Vulkan, we call the vkCreateRenderPass() function, which requires a pointer to a structure describing all the attachments involved in rendering and all the subpasses forming the render pass. As usual, the more attachments and subpasses we use, the more array elements containing properly filled structures we need. In our simple example, we will be drawing only into a single texture (color attachment) with just a single subpass.

Render Pass Attachment Description

VkAttachmentDescription attachment_descriptions[] = {
  {
    0,                                          // VkAttachmentDescriptionFlags   flags
    GetSwapChain().Format,                      // VkFormat                       format
    VK_SAMPLE_COUNT_1_BIT,                      // VkSampleCountFlagBits          samples
    VK_ATTACHMENT_LOAD_OP_CLEAR,                // VkAttachmentLoadOp             loadOp
    VK_ATTACHMENT_STORE_OP_STORE,               // VkAttachmentStoreOp            storeOp
    VK_ATTACHMENT_LOAD_OP_DONT_CARE,            // VkAttachmentLoadOp             stencilLoadOp
    VK_ATTACHMENT_STORE_OP_DONT_CARE,           // VkAttachmentStoreOp            stencilStoreOp
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                  initialLayout;
    VK_IMAGE_LAYOUT_PRESENT_SRC_KHR             // VkImageLayout                  finalLayout
  }
};

1.Tutorial03.cpp, function CreateRenderPass()

To create a render pass, first we prepare an array with elements describing each attachment, regardless of the type of attachment and how it will be used inside a render pass. Each array element is of type VkAttachmentDescription, which contains the following fields:

  • flags – Describes additional properties of an attachment. Currently, only an aliasing flag is available, which informs the driver that the attachment shares the same physical memory with another attachment; this is not the case here, so we set this parameter to zero.
  • format – Format of an image used for the attachment; here we are rendering directly into a swap chain so we need to take its format.
  • samples – Number of samples of the image; we are not using any multisampling here so we just use one sample.
  • loadOp – Specifies what to do with the image’s contents at the beginning of a render pass: whether we want them to be cleared, preserved, or we don’t care about them (as we will overwrite them all). Here we want to clear the image to the specified value. This parameter also refers to the depth part of depth/stencil images.
  • storeOp – Informs the driver what to do with the image’s contents after the render pass (after a subpass in which the image was used for the last time). Here we want the contents of the image to be preserved after the render pass as we intend to display them on screen. This parameter also refers to the depth part of depth/stencil images.
  • stencilLoadOp – The same as loadOp but for the stencil part of depth/stencil images; for color attachments it is ignored.
  • stencilStoreOp – The same as storeOp but for the stencil part of depth/stencil images; for color attachments this parameter is ignored.
  • initialLayout – The layout the given attachment will have when the render pass starts (the layout the application provides the image in).
  • finalLayout – The layout the driver will automatically transition the given image into at the end of a render pass.

Some additional information is required for load and store operations and initial and final layouts.

Load op refers to the attachment’s contents at the beginning of a render pass. This operation describes what the graphics hardware should do with the attachment: clear it, operate on its existing contents (leave them untouched), or not care about them because the application intends to overwrite them anyway. This gives the hardware an opportunity to optimize memory operations. For example, if we intend to overwrite all of the contents, the hardware won’t bother with them and, if it is faster, may allocate totally new memory for the attachment.

Store op, as the name suggests, is used at the end of a render pass and informs the hardware whether we want to use the contents of the attachment after the render pass or whether we don’t care about them and they may be discarded. In some scenarios (when contents are discarded) this gives the hardware the ability to create the image in temporary, fast memory, as the image will “live” only during the render pass, and the implementation may save some memory bandwidth by avoiding writing back data that is not needed anymore.

When an attachment has a depth format (and potentially also a stencil component) load and store ops refer only to the depth component. If a stencil is present, stencil values are treated the way stencil load and store ops describe. For color attachments, stencil ops are not relevant.

Layout, as I described in the swap chain tutorial, is an internal memory arrangement of an image. Image data may be organized in such a way that neighboring “image pixels” are also neighbors in memory, which can increase cache hits (faster memory reading) when the image is used as a source of data (that is, during texture sampling). But this caching is not necessary when the image is used as a target for drawing operations, and the memory for such an image may be organized in a totally different way. An image may have a linear layout (which gives the CPU the ability to read or populate the image’s memory contents) or an optimal layout (which is optimized for performance but is also hardware/vendor dependent). So some hardware may have a special memory organization for some types of operations; other hardware may be operations-agnostic. Some memory layouts may be better suited for some intended image “usages”; or, looking from the other side, some usages may require specific memory layouts. There is also a general layout that is compatible with all types of operations. But from the performance point of view, it is always best to set the layout appropriate for the intended image usage, and it is the application’s responsibility to inform the driver about these transitions.

Image layouts may be changed using image memory barriers. We did this in the swap chain tutorial when we first changed the layout from the presentation source (image was used by the presentation engine) to transfer destination (we wanted to clear the image with a given color). But layouts, apart from image memory barriers, may also be changed automatically by the hardware inside a render pass. If we specify a different initial layout, subpass layouts (described later), and final layout, the hardware does the transition automatically at the appropriate time.
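
As a reminder, an explicit transition performed “by hand” with an image memory barrier looks roughly like the sketch below. The access masks, pipeline stages, and the swap_chain_image and command_buffer variables are placeholders chosen for this illustration, not code taken from the tutorial’s sample:

VkImageMemoryBarrier barrier_from_present_to_clear = {
  VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,     // VkStructureType                sType
  nullptr,                                    // const void                    *pNext
  VK_ACCESS_MEMORY_READ_BIT,                  // VkAccessFlags                  srcAccessMask
  VK_ACCESS_TRANSFER_WRITE_BIT,               // VkAccessFlags                  dstAccessMask
  VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                  oldLayout
  VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,       // VkImageLayout                  newLayout
  VK_QUEUE_FAMILY_IGNORED,                    // uint32_t                       srcQueueFamilyIndex
  VK_QUEUE_FAMILY_IGNORED,                    // uint32_t                       dstQueueFamilyIndex
  swap_chain_image,                           // VkImage                        image
  {                                           // VkImageSubresourceRange        subresourceRange
    VK_IMAGE_ASPECT_COLOR_BIT,                  // VkImageAspectFlags             aspectMask
    0,                                          // uint32_t                       baseMipLevel
    1,                                          // uint32_t                       levelCount
    0,                                          // uint32_t                       baseArrayLayer
    1                                           // uint32_t                       layerCount
  }
};

vkCmdPipelineBarrier( command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0,
                      0, nullptr, 0, nullptr, 1, &barrier_from_present_to_clear );

Inside a render pass no such code is needed; the hardware performs equivalent transitions for us based on the initial, subpass, and final layouts.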

Initial layout informs the hardware about the layout the application “provides” (or “leaves”) the given attachment with. This is the layout the image starts with at the beginning of a render pass (in our example we acquire the image from the presentation engine so the image has a “presentation source” layout set). Each subpass of a render pass may use a different layout, and the transition will be done automatically by the hardware between subpasses. The final layout is the layout the given attachment will be transitioned into (automatically) at the end of a render pass (after a render pass is finished).

This information must be prepared for each attachment that will be used in a render pass. When graphics hardware receives this information a priori, it may optimize operations and memory during the render pass to achieve the best possible performance.

Subpass Description

VkAttachmentReference color_attachment_references[] = {
  {
    0,                                          // uint32_t                       attachment
    VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL    // VkImageLayout                  layout
  }
};

VkSubpassDescription subpass_descriptions[] = {
  {
    0,                                          // VkSubpassDescriptionFlags      flags
    VK_PIPELINE_BIND_POINT_GRAPHICS,            // VkPipelineBindPoint            pipelineBindPoint
    0,                                          // uint32_t                       inputAttachmentCount
    nullptr,                                    // const VkAttachmentReference   *pInputAttachments
    1,                                          // uint32_t                       colorAttachmentCount
    color_attachment_references,                // const VkAttachmentReference   *pColorAttachments
    nullptr,                                    // const VkAttachmentReference   *pResolveAttachments
    nullptr,                                    // const VkAttachmentReference   *pDepthStencilAttachment
    0,                                          // uint32_t                       preserveAttachmentCount
    nullptr                                     // const uint32_t*                pPreserveAttachments
  }
};

2.Tutorial03.cpp, function CreateRenderPass()

Next we specify the description of each subpass our render pass will include. This is done using VkSubpassDescription structure, which contains the following fields:

  • flags – Parameter reserved for future use.
  • pipelineBindPoint – Type of pipeline in which this subpass will be used (graphics or compute). Our example, of course, uses a graphics pipeline.
  • inputAttachmentCount – Number of elements in the pInputAttachments array.
  • pInputAttachments – Array with elements describing which attachments are used as an input and can be read from inside shaders. We are not using any input attachments here, so we set this parameter to null (and the count to zero).
  • colorAttachmentCount – Number of elements in pColorAttachments and pResolveAttachments arrays.
  • pColorAttachments – Array describing (pointing to) attachments which will be used as color render targets (that the image will be rendered into).
  • pResolveAttachments – Array closely connected with color attachments. Each element of this array corresponds to the color attachment at the same index; that color attachment is resolved into the given resolve attachment (if the whole pointer is not null and the resolve attachment at the same index is not marked as unused). This is optional and can be set to null.
  • pDepthStencilAttachment – Description of an attachment that will be used for depth (and/or stencil) data. We don’t use depth information here so we can set it to null.
  • preserveAttachmentCount – Number of elements in pPreserveAttachments array.
  • pPreserveAttachments – Array describing attachments that should be preserved. When we have multiple subpasses not all of them will use all attachments. If a subpass doesn’t use some of the attachments but we need their contents in the later subpasses, we must specify these attachments here.

The pInputAttachments, pColorAttachments, pResolveAttachments, and pDepthStencilAttachment parameters are all of type VkAttachmentReference (pPreserveAttachments is just an array of indices). This structure contains only these two fields:

  • attachment – Index into the attachment_descriptions array provided through the pAttachments member of VkRenderPassCreateInfo.
  • layout – Requested (required) layout the attachment will use during a given subpass. The hardware will perform an automatic transition into a provided layout just before a given subpass.

This structure contains references (indices) into the attachment_descriptions array of VkRenderPassCreateInfo. When we create a render pass we must provide a description of all attachments used during a render pass. We’ve prepared this description earlier in “Render pass attachment description” when we created the attachment_descriptions array. Right now it contains only one element, but in more advanced scenarios there will be multiple attachments. So this “general” collection of all render pass attachments is used as a reference point. In the subpass description, when we fill pColorAttachments or pDepthStencilAttachment members, we provide indices into this very “general” collection, like this: take the first attachment from all render pass attachments and use it as a color attachment. The second attachment from that array will be used for depth data.
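
Purely as an illustration (our render pass has no depth attachment), a reference to such a hypothetical second attachment, stored at index 1 of the attachment_descriptions array, could look like this:

VkAttachmentReference depth_attachment_reference = {
  1,                                                // uint32_t                       attachment
  VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL  // VkImageLayout                  layout
};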

There is a separation between a whole render pass and its subpasses because each subpass may use multiple attachments in a different way, that is, in one subpass we are rendering into one color attachment but in the next subpass we are reading from this attachment. In this way, we can prepare a list of all attachments used in the whole render pass, and at the same time we can specify how each attachment will be used in each subpass. And as each subpass may use a given attachment in its own way, we must also specify each image’s layout for each subpass.

So before we can specify a description of all subpasses (an array with elements of type VkSubpassDescription) we must create references for each attachment used in each subpass. And this is what the color_attachment_references variable was created for. When I write a tutorial for rendering into a texture, this usage will be more apparent.

Render Pass Creation

We now have all the data we need to create a render pass.

VkRenderPassCreateInfo render_pass_create_info = {
  VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,    // VkStructureType                sType
  nullptr,                                      // const void                    *pNext
  0,                                            // VkRenderPassCreateFlags        flags
  1,                                            // uint32_t                       attachmentCount
  attachment_descriptions,                      // const VkAttachmentDescription *pAttachments
  1,                                            // uint32_t                       subpassCount
  subpass_descriptions,                         // const VkSubpassDescription    *pSubpasses
  0,                                            // uint32_t                       dependencyCount
  nullptr                                       // const VkSubpassDependency     *pDependencies
};

if( vkCreateRenderPass( GetDevice(), &render_pass_create_info, nullptr, &Vulkan.RenderPass ) != VK_SUCCESS ) {
  printf( "Could not create render pass!\n" );
  return false;
}

return true;

3.Tutorial03.cpp, function CreateRenderPass()

We start by filling the VkRenderPassCreateInfo structure, which contains the following fields:

  • sType – Type of structure (VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO here).
  • pNext – Parameter not currently used.
  • flags – Parameter reserved for future use.
  • attachmentCount – Number of all different attachments (elements in pAttachments array) used during whole render pass (here just one).
  • pAttachments – Array specifying all attachments used in a render pass.
  • subpassCount – Number of subpasses a render pass consists of (and number of elements in pSubpasses array – just one in our simple example).
  • pSubpasses – Array with descriptions of all subpasses.
  • dependencyCount – Number of elements in pDependencies array (zero here).
  • pDependencies – Array describing dependencies between pairs of subpasses. We have only one subpass, so there are no dependencies to describe (we set this parameter to null).

Dependencies describe how different parts of the graphics pipeline use a given memory resource. Each subpass may use resources in a different way, and the layout of a resource does not, by itself, fully define how it is used. Some subpasses may render into images or store data through shader images. Others may not use images at all or may read from them at different pipeline stages (that is, vertex or fragment).

This information helps the driver optimize automatic layout transitions and, more generally, the barriers between subpasses. If we write into images only in a vertex shader, there is no point in waiting until the fragment shader executes (of course, only with respect to the images used). After all the vertex operations are done, the images may immediately change their layouts and memory access type, and some parts of the graphics hardware may even start executing the next operations (those that reference or read the given images) without waiting for the rest of the commands from the given subpass to finish. For now, just remember that dependencies are important from a performance point of view.
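
To give an idea of what such a description looks like, here is a sketch of a single VkSubpassDependency element for two hypothetical subpasses; the stage and access masks are only an example and are not used anywhere in this tutorial:

VkSubpassDependency dependency = {
  0,                                              // uint32_t                       srcSubpass
  1,                                              // uint32_t                       dstSubpass
  VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // VkPipelineStageFlags           srcStageMask
  VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,          // VkPipelineStageFlags           dstStageMask
  VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,           // VkAccessFlags                  srcAccessMask
  VK_ACCESS_INPUT_ATTACHMENT_READ_BIT,            // VkAccessFlags                  dstAccessMask
  VK_DEPENDENCY_BY_REGION_BIT                     // VkDependencyFlags              dependencyFlags
};

Such elements would be provided through the pDependencies member of VkRenderPassCreateInfo.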

So now that we have prepared all the information required to create a render pass, we can safely call the vkCreateRenderPass() function.

Creating a Framebuffer

We have created a render pass. It describes all attachments and all subpasses used during the render pass. But this description is quite abstract. We have specified formats of all attachments (just one image in this example) and described how attachments will be used by each subpass (also just one here). But we didn’t specify WHAT attachments we will be using or, in other words, what images will be used as these attachments. This is done through a framebuffer.

A framebuffer describes specific images that the render pass operates on. In OpenGL*, a framebuffer is a set of textures (attachments) we are rendering into. In Vulkan, this term is much broader. It describes all the textures (attachments) used during the render pass, not only the images we are rendering into (color and depth/stencil attachments) but also images used as a source of data (input attachments).

This separation of render pass and framebuffer gives us some additional flexibility. We can use the given render pass with different framebuffers and a given framebuffer with different render passes, if they are compatible, meaning that they operate in a similar fashion on images of similar types and usages.

Before we can create a framebuffer, we must create image views for each image used as a framebuffer and render pass attachment. In Vulkan, not only in the case of framebuffers but in general, we don’t operate on images themselves; images are not accessed directly. For this purpose, image views are used. Image views represent images; they “wrap” images and provide additional (meta)data for them.

Creating Image Views

In this simple application, we want to render directly into swap chain images. We have created a swap chain with multiple images, so we must create an image view for each of them.

const std::vector<VkImage> &swap_chain_images = GetSwapChain().Images;
Vulkan.FramebufferObjects.resize( swap_chain_images.size() );

for( size_t i = 0; i < swap_chain_images.size(); ++i ) {
  VkImageViewCreateInfo image_view_create_info = {
    VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,   // VkStructureType                sType
    nullptr,                                    // const void                    *pNext
    0,                                          // VkImageViewCreateFlags         flags
    swap_chain_images[i],                       // VkImage                        image
    VK_IMAGE_VIEW_TYPE_2D,                      // VkImageViewType                viewType
    GetSwapChain().Format,                      // VkFormat                       format
    {                                           // VkComponentMapping             components
      VK_COMPONENT_SWIZZLE_IDENTITY,              // VkComponentSwizzle             r
      VK_COMPONENT_SWIZZLE_IDENTITY,              // VkComponentSwizzle             g
      VK_COMPONENT_SWIZZLE_IDENTITY,              // VkComponentSwizzle             b
      VK_COMPONENT_SWIZZLE_IDENTITY               // VkComponentSwizzle             a
    },
    {                                           // VkImageSubresourceRange        subresourceRange
      VK_IMAGE_ASPECT_COLOR_BIT,                  // VkImageAspectFlags             aspectMask
      0,                                          // uint32_t                       baseMipLevel
      1,                                          // uint32_t                       levelCount
      0,                                          // uint32_t                       baseArrayLayer
      1                                           // uint32_t                       layerCount
    }
  };

  if( vkCreateImageView( GetDevice(), &image_view_create_info, nullptr, &Vulkan.FramebufferObjects[i].ImageView ) != VK_SUCCESS ) {
    printf( "Could not create image view for framebuffer!\n" );
    return false;
  }

4.Tutorial03.cpp, function CreateFramebuffers()

To create an image view, we must first create a variable of type VkImageViewCreateInfo. It contains the following fields:

  • sType – Structure type, in this case it should be set to VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO.
  • pNext – Parameter typically set to null.
  • flags – Parameter reserved for future use.
  • image – Handle to an image for which view will be created.
  • viewType – Type of view we want to create. The view type must be compatible with the image it is created for (that is, we can create a 2D view for an image that has multiple array layers, or we can create a CUBE view for a 2D image with six layers).
  • format – Format of an image view; it must be compatible with the image’s format but doesn’t have to be the same format (that is, it may be a different format but with the same number of bits per pixel).
  • components – Mapping of the image’s components into the vector returned in the shader by texturing operations. This applies only to read operations (sampling), but since we are using the image as a color attachment (we are rendering into the image) we must set the so-called identity mapping (R component into R, G -> G, and so on) or just use the “identity” value (VK_COMPONENT_SWIZZLE_IDENTITY).
  • subresourceRange – Describes the set of mipmap levels and array layers that will be accessible to a view. If our image is mipmapped, we may specify the specific mipmap level we want to render to (and in case of render targets we must specify exactly one mipmap level of one array layer).

As you can see here, we acquire handles to all swap chain images, and we are referencing them inside a loop. This way we fill the structure required for image view creation, which we pass to a vkCreateImageView() function. We do this for each image that was created along with a swap chain.

Specifying Framebuffer Parameters

Now we can create a framebuffer. To do this we call the vkCreateFramebuffer() function.

VkFramebufferCreateInfo framebuffer_create_info = {
    VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,  // VkStructureType                sType
    nullptr,                                    // const void                    *pNext
    0,                                          // VkFramebufferCreateFlags       flags
    Vulkan.RenderPass,                          // VkRenderPass                   renderPass
    1,                                          // uint32_t                       attachmentCount
    &Vulkan.FramebufferObjects[i].ImageView,    // const VkImageView             *pAttachments
    300,                                        // uint32_t                       width
    300,                                        // uint32_t                       height
    1                                           // uint32_t                       layers
  };

  if( vkCreateFramebuffer( GetDevice(), &framebuffer_create_info, nullptr, &Vulkan.FramebufferObjects[i].Handle ) != VK_SUCCESS ) {
    printf( "Could not create a framebuffer!\n" );
    return false;
  }
}
return true;

5.Tutorial03.cpp, function CreateFramebuffers()

The vkCreateFramebuffer() function requires us to provide a pointer to a variable of type VkFramebufferCreateInfo, so we must first prepare it. It contains the following fields:

  • sType – Structure type set to VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO in this situation.
  • pNext – Parameter most of the time set to null.
  • flags – Parameter reserved for future use.
  • renderPass – Render pass this framebuffer will be compatible with.
  • attachmentCount – Number of attachments in a framebuffer (elements in pAttachments array).
  • pAttachments – Array of image views representing all attachments used in a framebuffer and render pass. Each element in this array (each image view) corresponds to each attachment in a render pass.
  • width – Width of a framebuffer.
  • height – Height of a framebuffer.
  • layers – Number of layers in a framebuffer (this resembles OpenGL’s layered rendering with geometry shaders, where the geometry shader could specify the layer into which fragments rasterized from a given polygon would be rendered).

The framebuffer specifies what images are used as attachments on which the render pass operates. We can say that it translates an image (image view) into a given attachment. The number of images specified for a framebuffer must be the same as the number of attachments in the render pass for which we are creating the framebuffer. Also, each element of the pAttachments array corresponds directly to an attachment in the render pass description structure. Render pass and framebuffer are closely connected, and that’s why we must also specify a render pass during framebuffer creation. But we may use a framebuffer not only with the specified render pass but also with all render passes that are compatible with it. Compatible render passes, in general, must have the same number of attachments, and corresponding attachments must have the same format and number of samples. But image layouts (initial, final, and for each subpass) may differ and do not affect render pass compatibility.

After we have finished creating and filling the VkFramebufferCreateInfo structure, we call the vkCreateFramebuffer() function.

The above code executes in a loop. A framebuffer references image views. Here the image view is created for each swap chain image. So for each swap chain image and its view, we are creating a framebuffer. We are doing this in order to simplify the code called in a rendering loop. In a normal, real-life scenario we wouldn’t (probably) create a framebuffer for each swap chain image. I assume that a better solution would be to render into a single image (texture) and after that use command buffers that would copy rendering results from that image into a given swap chain image. This way we will have only three simple command buffers that are connected with a swap chain. All other rendering commands would be independent of a swap chain, making it easier to maintain.
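
A sketch of such a copy, recorded into a command buffer after both images have been transitioned into the proper transfer layouts, might look like this; rendering_target_image, swap_chain_image, and command_buffer are hypothetical handles, not variables from the sample code:

VkImageCopy copy_region = {
  { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 },       // VkImageSubresourceLayers       srcSubresource
  { 0, 0, 0 },                                  // VkOffset3D                     srcOffset
  { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 },       // VkImageSubresourceLayers       dstSubresource
  { 0, 0, 0 },                                  // VkOffset3D                     dstOffset
  { 300, 300, 1 }                               // VkExtent3D                     extent
};

vkCmdCopyImage( command_buffer, rendering_target_image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                swap_chain_image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &copy_region );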

Creating a Graphics Pipeline

Now we are ready to create a graphics pipeline. A pipeline is a collection of stages that process data one stage after another. In Vulkan there is currently a compute pipeline and a graphics pipeline. The compute pipeline allows us to perform some computational work, such as performing physics calculations for objects in games. The graphics pipeline is used for drawing operations.

In OpenGL there are multiple programmable stages (vertex, tessellation, fragment shaders, and so on) and some fixed function stages (rasterizer, depth test, blending, and so on). In Vulkan, the situation is similar. There are similar (if not identical) stages. But the whole pipeline’s state is gathered in one monolithic object. OpenGL allows us to change the state that influences rendering operations anytime we want; we can change parameters for each stage (mostly) independently. We can set up shader programs, depth test, blending, and whatever state we want, and then we can render some objects. Next we can change just some small part of the state and render another object. In Vulkan, such operations can’t be done (we say that pipelines are “immutable”). We must prepare the whole state, set up parameters for the pipeline stages, and group them in a pipeline object. At the beginning this was one of the most startling pieces of information for me. I’m not able to change a shader program anytime I want? Why?

The easiest and most valid explanation is the performance implications of such state changes. Changing just one single state of the whole pipeline may cause the graphics hardware to perform many background operations like state and error checking. Different hardware vendors may implement (and usually do implement) such functionality differently. This may cause applications to perform differently (meaning unpredictably, performance-wise) when executed on different graphics hardware. The ability to change anything at any time is convenient for developers, but, unfortunately, it is not so convenient for the hardware.

That’s why in Vulkan the state of the whole pipeline is gathered into one single object. All the relevant state and error checking is performed when the pipeline object is created. When there are problems (like different parts of the pipeline being set up in an incompatible way), pipeline object creation fails. But we know that upfront. The driver doesn’t have to worry on our behalf and do whatever it can to properly use such a broken pipeline; it can immediately tell us about the problem. Then, during real usage, in performance-critical parts of the application, everything is already set up correctly and can be used as is.

The downside of this methodology is that we have to create multiple pipeline objects, multiple variations of pipeline objects, when we are drawing many objects in different ways (some opaque, some semi-transparent, some with depth test enabled, others without). Unfortunately, even different shaders require us to create different pipeline objects. If we want to draw objects using different shaders, we also have to create multiple pipeline objects, one for each combination of shader programs. Shaders are also connected with the whole pipeline state. They use different resources (like textures and buffers), render into different color attachments, and read from different attachments (possibly ones that were rendered into before). These connections must also be initialized, prepared, and set up correctly. We know what we want to do; the driver does not. So it is better and far more logical that we do it, not the driver. In general this approach makes sense.

To begin the pipeline creation process, let’s start with shaders.

Creating a Shader Module

Creating a graphics pipeline requires us to prepare lots of data in the form of structures or even arrays of structures. The first such data is a collection of all shader stages and shader programs that will be used during rendering with a given graphics pipeline bound.

In OpenGL, we write shaders in GLSL. They are compiled and then linked into shader programs directly in our application. We can use or stop using a shader program anytime we want in our application.

Vulkan, on the other hand, accepts only a binary representation of shaders, an intermediate language called SPIR-V. We can’t provide GLSL code like we did in OpenGL. But there is an official, separate compiler that can transform shaders written in GLSL into binary SPIR-V; we have to use it offline. After we prepare the SPIR-V assembly we can create a shader module from it. Such modules are then composed into an array of VkPipelineShaderStageCreateInfo structures, which are used, among other parameters, to create the graphics pipeline.

Here’s the code that creates a shader module from a specified file that contains a binary SPIR-V.

const std::vector<char> code = Tools::GetBinaryFileContents( filename );
if( code.size() == 0 ) {
  return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>();
}

VkShaderModuleCreateInfo shader_module_create_info = {
  VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,    // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  0,                                              // VkShaderModuleCreateFlags      flags
  code.size(),                                    // size_t                         codeSize
  reinterpret_cast<const uint32_t*>(&code[0])     // const uint32_t                *pCode
};

VkShaderModule shader_module;
if( vkCreateShaderModule( GetDevice(), &shader_module_create_info, nullptr, &shader_module ) != VK_SUCCESS ) {
  printf( "Could not create shader module from a %s file!\n", filename );
  return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>();
}

return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>( shader_module, vkDestroyShaderModule, GetDevice() );

6.Tutorial03.cpp, function CreateShaderModule()

First we prepare a VkShaderModuleCreateInfo structure that contains the following fields:

  • sType – Type of structure, in this example set to VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO.
  • pNext – Pointer not yet used.
  • flags – Parameter reserved for future use.
  • codeSize – Size in bytes of the code passed in pCode parameter.
  • pCode – Pointer to an array with source code (binary SPIR-V assembly).

To acquire the contents of the file, I have prepared a simple utility function GetBinaryFileContents() that reads the entire contents of a specified file. It returns the content in a vector of chars.
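
The exact implementation is not important for this tutorial, but a minimal sketch of such a helper (following the convention visible above, where an empty vector signals failure) could look like this:

#include <fstream>
#include <string>
#include <vector>

std::vector<char> GetBinaryFileContents( std::string const &filename ) {
  // Open the file in binary mode with the read position at the end, so tellg() gives the file size
  std::ifstream file( filename, std::ios::binary | std::ios::ate );
  if( file.fail() ) {
    return std::vector<char>();
  }
  std::streampos size = file.tellg();
  file.seekg( 0, std::ios::beg );

  std::vector<char> contents( static_cast<size_t>( size ) );
  file.read( contents.data(), size );
  return file.fail() ? std::vector<char>() : contents;
}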

After we prepare a structure, we can call the vkCreateShaderModule() function and check whether everything went fine.

The AutoDeleter<> class from the Tools namespace is a helper class that wraps a given Vulkan object handle and takes a function that is called to delete that object. This class is similar to smart pointers, which delete the allocated memory when the object (the smart pointer) goes out of scope. The AutoDeleter<> class takes the handle of a given object and deletes it with the provided function when an object of this class goes out of scope.

template<class T, class F>
class AutoDeleter {
public:
  AutoDeleter() :
    Object( VK_NULL_HANDLE ),
    Deleter( nullptr ),
    Device( VK_NULL_HANDLE ) {
  }

  AutoDeleter( T object, F deleter, VkDevice device ) :
    Object( object ),
    Deleter( deleter ),
    Device( device ) {
  }

  AutoDeleter( AutoDeleter&& other ) {
    *this = std::move( other );
  }

  ~AutoDeleter() {
    if( (Object != VK_NULL_HANDLE) && (Deleter != nullptr) && (Device != VK_NULL_HANDLE) ) {
      Deleter( Device, Object, nullptr );
    }
  }

  AutoDeleter& operator=( AutoDeleter&& other ) {
    if( this != &other ) {
      Object = other.Object;
      Deleter = other.Deleter;
      Device = other.Device;
      other.Object = VK_NULL_HANDLE;
    }
    return *this;
  }

  T Get() {
    return Object;
  }

  bool operator !() const {
    return Object == VK_NULL_HANDLE;
  }

private:
  AutoDeleter( const AutoDeleter& );
  AutoDeleter& operator=( const AutoDeleter& );
  T         Object;
  F         Deleter;
  VkDevice  Device;
};

7.Tools.h

Why so much effort for one simple object? Shader modules are one of the objects required to create the graphics pipeline. But after the pipeline is created, we don’t need these shader modules anymore. Sometimes it is convenient to keep them as we may need to create additional, similar pipelines. But in this example they may be safely destroyed after we create a graphics pipeline. Shader modules are destroyed by calling the vkDestroyShaderModule() function. But in the example, we would need to call this function in many places: inside multiple “ifs” and at the end of the whole function. Because I don’t want to remember where I need to call this function and, at the same time, I don’t want any memory leaks to occur, I have prepared this simple class just for convenience. Now, I don’t have to remember to delete the created shader module because it will be deleted automatically.

Preparing a Description of the Shader Stages

Now that we know how to create and destroy shader modules, we can create the data for the shader stages composing our graphics pipeline. As I have written, the data that describes which shader stages should be active when a given graphics pipeline is bound has the form of an array with elements of type VkPipelineShaderStageCreateInfo. Here is the code that creates shader modules and prepares such an array:

Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule> vertex_shader_module = CreateShaderModule( "Data03/vert.spv" );
Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule> fragment_shader_module = CreateShaderModule( "Data03/frag.spv" );

if( !vertex_shader_module || !fragment_shader_module ) {
  return false;
}

std::vector<VkPipelineShaderStageCreateInfo> shader_stage_create_infos = {
  // Vertex shader
  {
    VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,        // VkStructureType                                sType
    nullptr,                                                    // const void                                    *pNext
    0,                                                          // VkPipelineShaderStageCreateFlags               flags
    VK_SHADER_STAGE_VERTEX_BIT,                                 // VkShaderStageFlagBits                          stage
    vertex_shader_module.Get(),                                 // VkShaderModule                                 module
    "main",                                                     // const char                                    *pName
    nullptr                                                     // const VkSpecializationInfo                    *pSpecializationInfo
  },
  // Fragment shader
  {
    VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,        // VkStructureType                                sType
    nullptr,                                                    // const void                                    *pNext
    0,                                                          // VkPipelineShaderStageCreateFlags               flags
    VK_SHADER_STAGE_FRAGMENT_BIT,                               // VkShaderStageFlagBits                          stage
    fragment_shader_module.Get(),                               // VkShaderModule                                 module
    "main",                                                     // const char                                    *pName
    nullptr                                                     // const VkSpecializationInfo                    *pSpecializationInfo
  }
};

8.Tutorial03.cpp, function CreatePipeline()

At the beginning we are creating two shader modules for vertex and fragment stages. They are created with the function presented earlier. When any error occurs and we return from the CreatePipeline() function, any created module is deleted automatically by a wrapper class with a provided deleter function.

The code for the shader modules is read from files that contain the binary SPIR-V assembly. These files are generated with an application called “glslangValidator”. This is a tool distributed officially with the Vulkan SDK and is designed to validate GLSL shaders. But “glslangValidator” also has the capability to compile, or rather transform, GLSL shaders into SPIR-V binary files. A full explanation of its command line usage can be found at the official SDK site. I’ve used the following commands to generate SPIR-V shaders for this tutorial:

glslangValidator.exe -V -H shader.vert > vert.spv.txt

glslangValidator.exe -V -H shader.frag > frag.spv.txt

“glslangValidator” takes a specified file and generates a SPIR-V file from it. The type of shader stage is automatically detected from the input file’s extension (“.vert” for vertex shaders, “.geom” for geometry shaders, and so on). The name of the generated file can be specified, but by default it takes the form “<stage>.spv”. So in our example the “vert.spv” and “frag.spv” files will be generated.

SPIR-V files have a binary format so it may be hard to read and analyze them—but not impossible. When the “-H” option is used, “glslangValidator” outputs SPIR-V in a form that can be read more easily. This form is printed on standard output, which is why I’m using the “> *.spv.txt” redirection operator.

Here are the contents of a “shader.vert” file from which SPIR-V assembly was generated for the vertex stage:

#version 400

void main() {
    vec2 pos[3] = vec2[3]( vec2(-0.7, 0.7), vec2(0.7, 0.7), vec2(0.0, -0.7) );
    gl_Position = vec4( pos[gl_VertexIndex], 0.0, 1.0 );
}

9.shader.vert

As you can see, I have hardcoded the positions of all the vertices used to render the triangle. They are indexed using the Vulkan-specific “gl_VertexIndex” built-in variable. In the simplest scenario, when using non-indexed drawing commands (which is the case here), this value starts from the value of the “firstVertex” parameter of a drawing command (zero in the provided example).
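
For reference, a non-indexed drawing call that produces exactly these three vertices is a one-liner like the sketch below; command_buffer stands for whatever command buffer we are recording, and this is only an illustration of the parameters, not a specific line taken from the sample:

// 3 vertices, 1 instance, firstVertex = 0 (so gl_VertexIndex takes the values 0, 1, 2), firstInstance = 0
vkCmdDraw( command_buffer, 3, 1, 0, 0 );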

This is the disputable part I wrote about earlier—this approach is acceptable and valid but not quite convenient to maintain and also allows us to skip some of the “structure filling” needed to create the graphics pipeline. I’ve chosen it in order to shorten and simplify this tutorial as much as possible. In the next tutorial, I will present a more typical way of drawing any number of vertices, similar to using vertex arrays and indices in OpenGL.

Below is the source code of the fragment shader from the “shader.frag” file that was used to generate the SPIR-V assembly for the fragment stage:

#version 400

layout(location = 0) out vec4 out_Color;

void main() {
  out_Color = vec4( 0.0, 0.4, 1.0, 1.0 );
}

10.shader.frag

In Vulkan’s shaders (when transforming from GLSL to SPIR-V) layout qualifiers are required. Here we specify to what output (color) attachment we want to store the color values generated by the fragment shader. Because we are using only one attachment, we must specify the first available location (zero).

Now that you know how to prepare shaders for applications using Vulkan, we can move on to the next step. After we have created two shader modules, we check whether these operations succeeded. If they did we can start preparing a description of all shader stages that will constitute our graphics pipeline.

For each enabled shader stage we need to prepare an instance of the VkPipelineShaderStageCreateInfo structure. An array of these structures, along with the number of its elements, is used in the graphics pipeline create info structure (provided to the function that creates the graphics pipeline). The VkPipelineShaderStageCreateInfo structure has the following fields:

  • sType – Type of structure that we are preparing, which in this case must be equal to VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO.
  • pNext – Pointer reserved for extensions.
  • flags – Parameter reserved for future use.
  • stage – Type of shader stage we are describing (like vertex, tessellation control, and so on).
  • module – Handle to a shader module that contains the shader for a given stage.
  • pName – Name of the entry point of the provided shader.
  • pSpecializationInfo – Pointer to a VkSpecializationInfo structure, which we will leave for now and set to null.

When we are creating a graphics pipeline we don’t create too many (Vulkan) objects. Most of the data is presented in the form of just such structures.

Preparing Description of a Vertex Input

Now we must provide a description of the input data used for drawing. This is similar to OpenGL’s vertex data: attributes, number of components, buffers from which to take the data, the data’s stride, or the step rate. In Vulkan this data is of course prepared in a different way, but in general the meaning is the same. Fortunately, because the vertex data is hardcoded into the vertex shader in this tutorial, we can almost entirely skip this step and fill the VkPipelineVertexInputStateCreateInfo structure with mostly nulls and zeros:

VkPipelineVertexInputStateCreateInfo vertex_input_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,    // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineVertexInputStateCreateFlags          flags;
  0,                                                            // uint32_t                                       vertexBindingDescriptionCount
  nullptr,                                                      // const VkVertexInputBindingDescription         *pVertexBindingDescriptions
  0,                                                            // uint32_t                                       vertexAttributeDescriptionCount
  nullptr                                                       // const VkVertexInputAttributeDescription       *pVertexAttributeDescriptions
};

11.Tutorial03.cpp, function CreatePipeline()

But for clarity here is a description of the members of the VkPipelineVertexInputStateCreateInfo structure:

  • sType – Type of structure, VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO here.
  • pNext – Pointer to an extension-specific structure.
  • flags – Parameter reserved for future use.
  • vertexBindingDescriptionCount – Number of elements in the pVertexBindingDescriptions array.
  • pVertexBindingDescriptions – Array with elements describing input vertex data (stride and stepping rate).
  • vertexAttributeDescriptionCount – Number of elements in the pVertexAttributeDescriptions array.
  • pVertexAttributeDescriptions – Array with elements describing vertex attributes (location, format, offset).

Preparing the Description of an Input Assembly

The next step requires us to describe how vertices should be assembled into primitives. As with OpenGL, we must specify what topology we want to use: points, lines, triangles, triangle fan, and so on.

VkPipelineInputAssemblyStateCreateInfo input_assembly_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,  // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineInputAssemblyStateCreateFlags        flags
  VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST,                          // VkPrimitiveTopology                            topology
  VK_FALSE                                                      // VkBool32                                       primitiveRestartEnable
};

12.Tutorial03.cpp, function CreatePipeline()

We do that through the VkPipelineInputAssemblyStateCreateInfo structure, which contains the following members:

  • sType – Structure type set here to VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO.
  • pNext – Pointer not yet used.
  • flags – Parameter reserved for future use.
  • topology – Parameter describing how vertices will be organized to form a primitive.
  • primitiveRestartEnable – Parameter that tells whether a special index value (when indexed drawing is performed) restarts assembly of a given primitive.
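
Primitive restart applies only to indexed drawing with “strip” or “fan” topologies (we use a plain triangle list and non-indexed drawing, so it is disabled here). As an illustration only, with 16-bit indices the special restart value is 0xFFFF:

// Hypothetical index data for a triangle strip; when primitiveRestartEnable is VK_TRUE,
// 0xFFFF ends the current strip and starts a new one.
const uint16_t strip_indices[] = { 0, 1, 2, 3, 0xFFFF, 4, 5, 6, 7 };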

Preparing the Viewport’s Description

We have finished dealing with input data. Now we must specify the form of the output data, that is, all the parts of the graphics pipeline that are connected with fragments, like rasterization, window (viewport), depth tests, and so on. The first set of data we must prepare here is the state of the viewport, which specifies to what part of the image (or texture, or window) we want to draw.

VkViewport viewport = {
  0.0f,                                                         // float                                          x
  0.0f,                                                         // float                                          y
  300.0f,                                                       // float                                          width
  300.0f,                                                       // float                                          height
  0.0f,                                                         // float                                          minDepth
  1.0f                                                          // float                                          maxDepth
};

VkRect2D scissor = {
  {                                                             // VkOffset2D                                     offset
    0,                                                            // int32_t                                        x
    0                                                             // int32_t                                        y
  },
  {                                                             // VkExtent2D                                     extent
    300,                                                          // int32_t                                        width
    300                                                           // int32_t                                        height
  }
};

VkPipelineViewportStateCreateInfo viewport_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO,        // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineViewportStateCreateFlags             flags
  1,                                                            // uint32_t                                       viewportCount
  &viewport,                                                    // const VkViewport                              *pViewports
  1,                                                            // uint32_t                                       scissorCount
  &scissor                                                      // const VkRect2D                                *pScissors
};

13.Tutorial03.cpp, function CreatePipeline()

In this example, the usage is simple: we just set the viewport coordinates to some predefined values. I don’t check the size of the swap chain image we are rendering into. But remember that in real-life production applications this has to be done, because the specification states that the dimensions of the viewport cannot exceed the dimensions of the attachments that we are rendering into.

To specify the viewport’s parameters, we fill the VkViewport structure that contains these fields:

  • x – Left side of the viewport.
  • y – Upper side of the viewport.
  • width – Width of the viewport.
  • height – Height of the viewport.
  • minDepth – Minimal depth value used for depth calculations.
  • maxDepth – Maximal depth value used for depth calculations.

When specifying viewport coordinates, remember that the origin is different than in OpenGL. Here we specify the upper-left corner of the viewport (not the lower left).

Also worth noting is that the minDepth and maxDepth values must be between 0.0 and 1.0 (inclusive) but maxDepth can be lower than minDepth. This will cause the depth to be calculated in “reverse.”
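
For example, a viewport set up for such “reversed” depth could be declared like this; it is purely an illustration, as our example keeps the standard 0.0 to 1.0 range:

VkViewport reversed_depth_viewport = {
  0.0f,                                                         // float                                          x
  0.0f,                                                         // float                                          y
  300.0f,                                                       // float                                          width
  300.0f,                                                       // float                                          height
  1.0f,                                                         // float                                          minDepth
  0.0f                                                          // float                                          maxDepth
};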

Next we must specify the parameters for the scissor test. The scissor test, similarly to OpenGL, restricts generation of fragments only to the specified rectangular area. But in Vulkan, the scissor test is always enabled and can’t be turned off. We can just provide the values identical to the ones provided for viewport. Try changing these values and see how it influences the generated image.

The scissor test doesn’t have a dedicated structure. To provide data for it we fill the VkRect2D structure which contains two similar structure members. First is VkOffset2D with the following members:

  • x – Left side of the rectangular area used for scissor test
  • y – Upper side of the scissor area

The second member is of type VkExtent2D, which contains the following fields:

  • width – Width of the scissor rectangular area
  • height – Height of the scissor area

In general, the meaning of the data we provide for the scissor test through the VkRect2D structure is similar to the data prepared for viewport.

After we have finished preparing data for viewport and the scissor test, we can finally fill the structure that is used during pipeline creation. The structure is called VkPipelineViewportStateCreateInfo, and it contains the following fields:

  • sType – Type of the structure, VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO here.
  • pNext – Pointer reserved for extensions.
  • flags – Parameter reserved for future use.
  • viewportCount – Number of elements in the pViewports array.
  • pViewports – Array with elements describing parameters of viewports used when the given pipeline is bound.
  • scissorCount – Number of elements in the pScissors array.
  • pScissors – Array with elements describing parameters of the scissor test for each viewport.

Remember that the viewportCount and scissorCount parameters must be equal. We are also allowed to specify more viewports, but then the multiViewport feature must also be enabled.
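If you ever want to use more than one viewport, the multiViewport feature has to be requested when the logical device is created. Here is a hedged sketch of such a check; GetPhysicalDevice() is an assumed accessor for the VkPhysicalDevice handle obtained in the earlier tutorials:

// A sketch only: check for multi-viewport support before asking for more than one viewport.
// GetPhysicalDevice() is an assumed accessor for the VkPhysicalDevice handle.
VkPhysicalDeviceFeatures supported_features;
vkGetPhysicalDeviceFeatures( GetPhysicalDevice(), &supported_features );

VkPhysicalDeviceFeatures enabled_features = {};   // all features disabled by default
if( supported_features.multiViewport ) {
  enabled_features.multiViewport = VK_TRUE;       // then pass &enabled_features as pEnabledFeatures
}                                                 // in VkDeviceCreateInfo when creating the logical device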

Preparing the Rasterization State’s Description

The next part of the graphics pipeline creation applies to the rasterization state. We must specify how polygons are going to be rasterized (changed into fragments), which means whether we want fragments to be generated for whole polygons or just their edges (polygon mode) or whether we want to see the front or back side or maybe both sides of the polygon (face culling). We can also provide depth bias parameters or indicate whether we want to enable depth clamp. This whole state is encapsulated into VkPipelineRasterizationStateCreateInfo. It contains the following members:

  • sType – Structure type, VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO in this example.
  • pNext – Pointer reserved for extensions.
  • flags – Parameter reserved for future use.
  • depthClampEnable – Parameter describing whether we want to clamp depth values of the rasterized primitive to the frustum (when true) or if we want normal clipping to occur (false).
  • rasterizerDiscardEnable – Deactivates fragment generation (primitives are discarded right before rasterization, so no fragments are produced and the fragment shader never runs).
  • polygonMode – Controls how the fragments are generated for a given primitive (triangle mode): whether they are generated for the whole triangle, only its edges, or just its vertices.
  • cullMode – Chooses the triangle’s face used for culling (if enabled).
  • frontFace – Chooses which side of a triangle should be considered the front (depending on the winding order).
  • depthBiasEnable – Enables or disables biasing of fragments’ depth values.
  • depthBiasConstantFactor – Constant factor added to each fragment’s depth value when biasing is enabled.
  • depthBiasClamp – Maximum (or minimum) value of bias that can be applied to fragment’s depth.
  • depthBiasSlopeFactor – Factor applied for fragment’s slope during depth calculations when biasing is enabled.
  • lineWidth – Width of rasterized lines.

Here is the source code responsible for setting rasterization state in our example:

VkPipelineRasterizationStateCreateInfo rasterization_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO,   // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineRasterizationStateCreateFlags        flags
  VK_FALSE,                                                     // VkBool32                                       depthClampEnable
  VK_FALSE,                                                     // VkBool32                                       rasterizerDiscardEnable
  VK_POLYGON_MODE_FILL,                                         // VkPolygonMode                                  polygonMode
  VK_CULL_MODE_BACK_BIT,                                        // VkCullModeFlags                                cullMode
  VK_FRONT_FACE_COUNTER_CLOCKWISE,                              // VkFrontFace                                    frontFace
  VK_FALSE,                                                     // VkBool32                                       depthBiasEnable
  0.0f,                                                         // float                                          depthBiasConstantFactor
  0.0f,                                                         // float                                          depthBiasClamp
  0.0f,                                                         // float                                          depthBiasSlopeFactor
  1.0f                                                          // float                                          lineWidth
};

14.Tutorial03.cpp, function CreatePipeline()

In the tutorial we are disabling as many parameters as possible to simplify the process, the code itself, and the rendering operations. The parameters that matter here set up the (typical) fill mode for polygon rasterization, back-face culling, and, as in OpenGL, counterclockwise front faces. Depth biasing and clamping are also disabled (to enable depth clamping, we first need to enable a dedicated feature during logical device creation; the same applies to polygon modes other than “fill”).
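Just to illustrate what a different setup could look like, here is a sketch of a wireframe variant of the same state. It is not used in this tutorial, and remember that polygon modes other than “fill” require the fillModeNonSolid feature to be enabled at device creation:

// A sketch only: a wireframe variant of the rasterization state; requires the fillModeNonSolid
// device feature (and line widths other than 1.0f would additionally require wideLines).
VkPipelineRasterizationStateCreateInfo wireframe_rasterization_state_create_info = rasterization_state_create_info;
wireframe_rasterization_state_create_info.polygonMode = VK_POLYGON_MODE_LINE;   // rasterize only triangle edges
wireframe_rasterization_state_create_info.cullMode    = VK_CULL_MODE_NONE;      // show both sides of the triangle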

Setting the Multisampling State’s Description

In Vulkan, when we are creating a graphics pipeline, we must also specify the state relevant to multisampling. This is done using the VkPipelineMultisampleStateCreateInfo structure. Here are its members:

  • sType – Type of structure, VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO here.
  • pNext – Pointer reserved for extensions.
  • flags – Parameter reserved for future use.
  • rasterizationSamples – Number of per pixel samples used in rasterization.
  • sampleShadingEnable – Parameter specifying that shading should occur per sample (when enabled) instead of per fragment (when disabled).
  • minSampleShading – Specifies the minimum number of unique sample locations that should be used during the given fragment’s shading.
  • pSampleMask – Pointer to an array of static coverage sample masks; this can be null.
  • alphaToCoverageEnable – Controls whether the fragment’s alpha value should be used for coverage calculations.
  • alphaToOneEnable – Controls whether the fragment’s alpha value should be replaced with one.

In this example, I wanted to minimize possible problems so I’ve set parameters to values that generally disable multisampling—just one sample per given pixel with the other parameters turned off. Remember that if we want to enable sample shading or alpha to one, we also need to enable two respective features. Here is a source code that prepares the VkPipelineMultisampleStateCreateInfo structure:

VkPipelineMultisampleStateCreateInfo multisample_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,     // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineMultisampleStateCreateFlags          flags
  VK_SAMPLE_COUNT_1_BIT,                                        // VkSampleCountFlagBits                          rasterizationSamples
  VK_FALSE,                                                     // VkBool32                                       sampleShadingEnable
  1.0f,                                                         // float                                          minSampleShading
  nullptr,                                                      // const VkSampleMask                            *pSampleMask
  VK_FALSE,                                                     // VkBool32                                       alphaToCoverageEnable
  VK_FALSE                                                      // VkBool32                                       alphaToOneEnable
};

15.Tutorial03.cpp, function CreatePipeline()

Setting the Blending State’s Description

Another thing we need to prepare when creating a graphics pipeline is a blending state (which also includes logical operations).

VkPipelineColorBlendAttachmentState color_blend_attachment_state = {
  VK_FALSE,                                                     // VkBool32                                       blendEnable
  VK_BLEND_FACTOR_ONE,                                          // VkBlendFactor                                  srcColorBlendFactor
  VK_BLEND_FACTOR_ZERO,                                         // VkBlendFactor                                  dstColorBlendFactor
  VK_BLEND_OP_ADD,                                              // VkBlendOp                                      colorBlendOp
  VK_BLEND_FACTOR_ONE,                                          // VkBlendFactor                                  srcAlphaBlendFactor
  VK_BLEND_FACTOR_ZERO,                                         // VkBlendFactor                                  dstAlphaBlendFactor
  VK_BLEND_OP_ADD,                                              // VkBlendOp                                      alphaBlendOp
  VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |         // VkColorComponentFlags                          colorWriteMask
  VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT
};

VkPipelineColorBlendStateCreateInfo color_blend_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO,     // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineColorBlendStateCreateFlags           flags
  VK_FALSE,                                                     // VkBool32                                       logicOpEnable
  VK_LOGIC_OP_COPY,                                             // VkLogicOp                                      logicOp
  1,                                                            // uint32_t                                       attachmentCount
  &color_blend_attachment_state,                                // const VkPipelineColorBlendAttachmentState     *pAttachments
  { 0.0f, 0.0f, 0.0f, 0.0f }                                    // float                                          blendConstants[4]
};

16.Tutorial03.cpp, function CreatePipeline()

Final color operations are set up through the VkPipelineColorBlendStateCreateInfo structure. It contains the following fields:

  • sType – Type of the structure, set to VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO in this example.
  • pNext – Pointer reserved for future, extension-specific use.
  • flags – Parameter also reserved for future use.
  • logicOpEnable – Indicates whether we want to enable logical operations on pixels.
  • logicOp – Type of the logical operation we want to perform (like copy, clear, and so on)
  • attachmentCount – Number of elements in the pAttachments array.
  • pAttachments – Array containing state parameters for each color attachment used in a subpass for which the given graphics pipeline is bound.
  • blendConstants – Four-element array with color value used in blending operation (when a dedicated blend factor is used).

The attachmentCount and pAttachments parameters need some additional explanation. When we want to perform drawing operations, we set up several parameters, the most important of which are the graphics pipeline, render pass, and framebuffer. The graphics card needs to know how to draw (the graphics pipeline, which describes rendering state, shaders, tests, and so on) and where to draw (the render pass gives the general setup; the framebuffer specifies exactly which images are used). As I have already mentioned, the render pass specifies how operations are ordered, what the dependencies are, when we are rendering into a given attachment, and when we are reading from the same attachment. These stages take the form of subpasses. For each drawing operation we can (but don’t have to) use a different pipeline. But when we are drawing, we must remember that we are drawing into a set of attachments. This set is defined in the render pass, which describes all color, input, and depth attachments (the framebuffer just specifies which images are used for each of them). For the blending state, we specify whether we want to enable blending at all, and this is done through the pAttachments array. Each of its elements must correspond to one color attachment defined in the render pass, so attachmentCount (the number of elements in the pAttachments array) must equal the number of color attachments defined in the render pass.

There is one more restriction: by default, all elements of the pAttachments array must be identical, meaning that blending (and color write masks) is performed in the same way for all attachments. So why is it an array? Why can’t we just specify one value? Because there is a feature that allows us to perform independent, distinct blending for each active color attachment. When we enable the independent blending feature during device creation, we can provide different values for each color attachment.

Each pAttachments array’s element is of type VkPipelineColorBlendAttachmentState. It is a structure with the following members:

  • blendEnable – Indicates whether we want to enable blending at all.
  • srcColorBlendFactor – Blending factor for color of the source (incoming) fragment.
  • dstColorBlendFactor – Blending factor for the destination color (stored already in the framebuffer at the same location as the incoming fragment).
  • colorBlendOp – Type of operation to perform (multiplication, addition, and so on).
  • srcAlphaBlendFactor – Blending factor for the alpha value of the source (incoming) fragment.
  • dstAlphaBlendFactor – Blending factor for the destination alpha value (already stored in the framebuffer).
  • alphaBlendOp – Type of operation to perform for alpha blending.
  • colorWriteMask – Bitmask selecting which of the RGBA components are selected (enabled) for writing.

In this example, we disable blending, which makes most of the other parameters irrelevant. The exception is colorWriteMask: here we select all components for writing, but you can freely check what happens when this parameter is changed to some other combination of the R, G, B, and A bits.
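For reference only, a typical “source over” alpha-blending setup for a single attachment could look like the sketch below. It is not used in this tutorial:

// A sketch only: standard "source over" alpha blending for one color attachment.
VkPipelineColorBlendAttachmentState alpha_blend_attachment_state = {
  VK_TRUE,                                                      // VkBool32                                       blendEnable
  VK_BLEND_FACTOR_SRC_ALPHA,                                    // VkBlendFactor                                  srcColorBlendFactor
  VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA,                          // VkBlendFactor                                  dstColorBlendFactor
  VK_BLEND_OP_ADD,                                              // VkBlendOp                                      colorBlendOp
  VK_BLEND_FACTOR_ONE,                                          // VkBlendFactor                                  srcAlphaBlendFactor
  VK_BLEND_FACTOR_ZERO,                                         // VkBlendFactor                                  dstAlphaBlendFactor
  VK_BLEND_OP_ADD,                                              // VkBlendOp                                      alphaBlendOp
  VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |         // VkColorComponentFlags                          colorWriteMask
  VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT
};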

Creating a Pipeline Layout

The final thing we must do before pipeline creation is create a proper pipeline layout. A pipeline layout describes all the resources that can be accessed by the pipeline. In this example we must specify how many textures can be used by shaders and which shader stages will have access to them. There are of course other resources involved. Apart from shader stages, we must also describe the types of resources (textures, buffers), their total numbers, and layout. This layout can be compared to OpenGL’s active textures and shader uniforms. In OpenGL we bind textures to the desired texture image units and for shader uniforms we don’t provide texture handles but IDs of the texture image units to which actual textures are bound (we provide the number of the unit which the given texture was associated with).

With Vulkan, the situation is similar. We create some form of a memory layout: first there are two buffers, next we have three textures and an image. This memory “structure” is called a set and a collection of these sets is provided for the pipeline. In shaders, we access specified resources using specific memory “locations” from within these sets (layouts). This is done through a layout (set = X, binding = Y) specifier, which can be translated to: take the resource from the Y memory location from the X set.

A pipeline layout can be thought of as an interface between shader stages and shader resources, as it takes these groups of resources, describes how they are gathered, and provides them to the pipeline.
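Just to give you a rough idea (descriptor sets are not used in this tutorial and will be covered separately), a set layout with a single texture visible to the fragment shader could be sketched like this:

// A sketch only: a descriptor set layout with one combined image sampler (a texture)
// accessible from the fragment shader; in GLSL this would match layout(set = 0, binding = 0).
VkDescriptorSetLayoutBinding sampler_binding = {
  0,                                              // uint32_t                       binding
  VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,      // VkDescriptorType               descriptorType
  1,                                              // uint32_t                       descriptorCount
  VK_SHADER_STAGE_FRAGMENT_BIT,                   // VkShaderStageFlags             stageFlags
  nullptr                                         // const VkSampler               *pImmutableSamplers
};

VkDescriptorSetLayoutCreateInfo set_layout_create_info = {
  VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO, // VkStructureType                        sType
  nullptr,                                             // const void                            *pNext
  0,                                                   // VkDescriptorSetLayoutCreateFlags       flags
  1,                                                   // uint32_t                               bindingCount
  &sampler_binding                                     // const VkDescriptorSetLayoutBinding    *pBindings
};

VkDescriptorSetLayout set_layout;
vkCreateDescriptorSetLayout( GetDevice(), &set_layout_create_info, nullptr, &set_layout );
// The resulting handle would then be provided through pSetLayouts of VkPipelineLayoutCreateInfo.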

This process is complex and I plan to devote a tutorial to it. Here we are not using any additional resources so I present an example for creating an “empty” pipeline layout:

VkPipelineLayoutCreateInfo layout_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,  // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  0,                                              // VkPipelineLayoutCreateFlags    flags
  0,                                              // uint32_t                       setLayoutCount
  nullptr,                                        // const VkDescriptorSetLayout   *pSetLayouts
  0,                                              // uint32_t                       pushConstantRangeCount
  nullptr                                         // const VkPushConstantRange     *pPushConstantRanges
};

VkPipelineLayout pipeline_layout;
if( vkCreatePipelineLayout( GetDevice(), &layout_create_info, nullptr, &pipeline_layout ) != VK_SUCCESS ) {
  printf( "Could not create pipeline layout!\n" );
  return Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout>();
}

return Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout>( pipeline_layout, vkDestroyPipelineLayout, GetDevice() );

17.Tutorial03.cpp, function CreatePipelineLayout()

To create a pipeline layout we must first prepare a variable of type VkPipelineLayoutCreateInfo. It contains the following fields:

  • sType – Type of structure, VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO in this example.
  • pNext – Parameter reserved for extensions.
  • flags – Parameter reserved for future use.
  • setLayoutCount – Number of descriptor sets included in this layout.
  • pSetLayouts – Pointer to an array containing descriptions of descriptor layouts.
  • pushConstantRangeCount – Number of push constant ranges (I will describe it in a later tutorial).
  • pPushConstantRanges – Array describing all push constant ranges used inside shaders (in a given pipeline).

In this example we create “empty” layout so almost all the fields are set to null or zero.

We are not using push constants here, but they deserve some explanation. Push constants in Vulkan allow us to modify the data of constant variables used in shaders. There is a special, small amount of memory reserved for push constants. We update their values through Vulkan commands, not through memory updates, and it is expected that updates of push constants’ values are faster than normal memory writes.
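For illustration only, a pipeline layout exposing a small push constant range to the vertex shader, and the command that updates it, could be sketched as follows; the command_buffer and pipeline_layout variables are assumed to already exist:

// A sketch only: expose 16 floats (for example a 4 x 4 matrix) as push constants for the vertex shader.
VkPushConstantRange push_constant_range = {
  VK_SHADER_STAGE_VERTEX_BIT,                     // VkShaderStageFlags             stageFlags
  0,                                              // uint32_t                       offset
  16 * sizeof( float )                            // uint32_t                       size
};
// The range would be provided through pushConstantRangeCount/pPushConstantRanges of
// VkPipelineLayoutCreateInfo; the values are then updated while recording a command buffer:
float matrix_data[16] = {};                       // column-major matrix data would go here
vkCmdPushConstants( command_buffer, pipeline_layout, VK_SHADER_STAGE_VERTEX_BIT, 0, sizeof( matrix_data ), matrix_data );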

In the pipeline layout creation code above, I’m also wrapping the pipeline layout in an “AutoDeleter” object. Pipeline layouts are required during pipeline creation, descriptor set binding (enabling/activating this interface between shaders and shader resources), and setting of push constants. None of these operations, except for pipeline creation, take place in this tutorial. So here, after we create a pipeline, we don’t need the layout anymore. To avoid memory leaks, I have used this helper class to destroy the layout as soon as we leave the function in which the graphics pipeline is created.

Creating a Graphics Pipeline

Now we have all the resources required to properly create graphics pipeline. Here is the code that does that:

Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout> pipeline_layout = CreatePipelineLayout();
if( !pipeline_layout ) {
  return false;
}

VkGraphicsPipelineCreateInfo pipeline_create_info = {
  VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,              // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineCreateFlags                          flags
  static_cast<uint32_t>(shader_stage_create_infos.size()),      // uint32_t                                       stageCount
  &shader_stage_create_infos[0],                                // const VkPipelineShaderStageCreateInfo         *pStages
  &vertex_input_state_create_info,                              // const VkPipelineVertexInputStateCreateInfo    *pVertexInputState
  &input_assembly_state_create_info,                            // const VkPipelineInputAssemblyStateCreateInfo  *pInputAssemblyState
  nullptr,                                                      // const VkPipelineTessellationStateCreateInfo   *pTessellationState
  &viewport_state_create_info,                                  // const VkPipelineViewportStateCreateInfo       *pViewportState
  &rasterization_state_create_info,                             // const VkPipelineRasterizationStateCreateInfo  *pRasterizationState
  &multisample_state_create_info,                               // const VkPipelineMultisampleStateCreateInfo    *pMultisampleState
  nullptr,                                                      // const VkPipelineDepthStencilStateCreateInfo   *pDepthStencilState
  &color_blend_state_create_info,                               // const VkPipelineColorBlendStateCreateInfo     *pColorBlendState
  nullptr,                                                      // const VkPipelineDynamicStateCreateInfo        *pDynamicState
  pipeline_layout.Get(),                                        // VkPipelineLayout                               layout
  Vulkan.RenderPass,                                            // VkRenderPass                                   renderPass
  0,                                                            // uint32_t                                       subpass
  VK_NULL_HANDLE,                                               // VkPipeline                                     basePipelineHandle
  -1                                                            // int32_t                                        basePipelineIndex
};

if( vkCreateGraphicsPipelines( GetDevice(), VK_NULL_HANDLE, 1, &pipeline_create_info, nullptr, &Vulkan.GraphicsPipeline ) != VK_SUCCESS ) {
  printf( "Could not create graphics pipeline!\n" );
  return false;
}
return true;

18.Tutorial03.cpp, function CreatePipeline()

First we create a pipeline layout wrapped in an object of type “AutoDeleter”. Next we fill the structure of type VkGraphicsPipelineCreateInfo. It contains many fields. Here is a brief description of them:

  • sType – Type of structure, VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO here.
  • pNext – Parameter reserved for future, extension-related use.
  • flags – This time this parameter is not reserved for future use but controls how the pipeline should be created: if we are creating a derivative pipeline (if we are inheriting from another pipeline) or if we allow creating derivative pipelines from this one. We can also disable optimizations, which should shorten the time needed to create a pipeline.
  • stageCount – Number of stages described in the pStages parameter; must be greater than zero.
  • pStages – Array with descriptions of active shader stages (the ones created using shader modules); each stage must be unique (we can’t specify a given stage more than once). There also must be a vertex stage present.
  • pVertexInputState – Pointer to a variable containing the description of the vertex input’s state.
  • pInputAssemblyState – Pointer to a variable with input assembly description.
  • pTessellationState – Pointer to a description of the tessellation stages; can be null if tessellation is disabled.
  • pViewportState – Pointer to a variable specifying viewport parameters; can be null if rasterization is disabled.
  • pRasterizationState – Pointer to a variable specifying rasterization behavior.
  • pMultisampleState – Pointer to a variable defining multisampling; can be null if rasterization is disabled.
  • pDepthStencilState – Pointer to a description of depth/stencil parameters; this can be null in two situations: when rasterization is disabled or we’re not using depth/stencil attachments in a render pass.
  • pColorBlendState – Pointer to a variable with color blending/write masks state; can be null also in two situations: when rasterization is disabled or when we’re not using any color attachments inside the render pass.
  • pDynamicState – Pointer to a variable specifying which parts of the graphics pipeline can be set dynamically; can be null if the whole state is considered static (defined only through this create info structure).
  • layout – Handle to a pipeline layout object that describes resources accessed inside shaders.
  • renderPass – Handle to a render pass object; pipeline can be used with any render pass compatible with the provided one.
  • subpass – Number (index) of a subpass in which the pipeline will be used.
  • basePipelineHandle – Handle to a pipeline this one should derive from.
  • basePipelineIndex – Index of a pipeline this one should derive from.

When we are creating a new pipeline, we can inherit some of the parameters from another one. This means that both pipelines should have much in common. A good example is shader code. We don’t specify what fields are the same, but the general message that the pipeline inherits from another one may substantially accelerate pipeline creation. But why are there two fields to indicate a “parent” pipeline? We can’t use them both—only one of them at a time. When we are using a handle, this means that the “parent” pipeline is already created and we are deriving from the one we have provided the handle of. But the pipeline creation function allows us to create many pipelines at once. Using the second parameter, “parent” pipeline index, we can create both “parent” and “child” pipelines in the same call. We just specify an array of graphics pipeline creation info structures and this array is provided to pipeline creation function. So the “basePipelineIndex” is the index of pipeline creation info in this very array. We just have to remember that the “parent” pipeline must be earlier (must have a smaller index) in this array and it must be created with the “allow derivatives” flag set.
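Here is a rough sketch of how both “parent” fields could be used in a single call; the two create info variables are assumed to be otherwise fully prepared, and this code is not part of the tutorial:

// A sketch only: create a "parent" and a "child" pipeline with one vkCreateGraphicsPipelines() call.
// base_pipeline_create_info and derived_pipeline_create_info are assumed to be fully prepared.
VkGraphicsPipelineCreateInfo create_infos[2] = { base_pipeline_create_info, derived_pipeline_create_info };

create_infos[0].flags              = VK_PIPELINE_CREATE_ALLOW_DERIVATIVES_BIT;  // parent allows derivatives
create_infos[1].flags              = VK_PIPELINE_CREATE_DERIVATIVE_BIT;         // child is a derivative
create_infos[1].basePipelineHandle = VK_NULL_HANDLE;                            // the handle is not used here...
create_infos[1].basePipelineIndex  = 0;                                         // ...the parent is element 0 of this array

VkPipeline pipelines[2];
vkCreateGraphicsPipelines( GetDevice(), VK_NULL_HANDLE, 2, create_infos, nullptr, pipelines );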

In this example we are creating a pipeline with the state being entirely static (null for the “pDynamicState” parameter). But what is a dynamic state? To allow for some flexibility and to lower the number of created pipeline objects, the dynamic state was introduced. Through the “pDynamicState” parameter we can define which parts of the graphics pipeline can be set dynamically through additional Vulkan commands and which parts stay static, set once during pipeline creation. The dynamic state includes parameters such as viewports, line widths, blend constants, or some stencil parameters. If we specify that a given state is dynamic, the parameters in the pipeline creation info structure that are related to that state are ignored. We must set the given state using the proper Vulkan commands during rendering because the initial values of such state may be undefined.
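For example, making the viewport and scissor dynamic could be sketched like this (again, not something this tutorial does); the actual values then have to be set with dedicated commands while recording the command buffer:

// A sketch only: declare the viewport and scissor as dynamic state during pipeline creation.
VkDynamicState dynamic_states[] = {
  VK_DYNAMIC_STATE_VIEWPORT,
  VK_DYNAMIC_STATE_SCISSOR
};

VkPipelineDynamicStateCreateInfo dynamic_state_create_info = {
  VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,         // VkStructureType                                sType
  nullptr,                                                      // const void                                    *pNext
  0,                                                            // VkPipelineDynamicStateCreateFlags              flags
  2,                                                            // uint32_t                                       dynamicStateCount
  dynamic_states                                                // const VkDynamicState                          *pDynamicStates
};
// Pass &dynamic_state_create_info as pDynamicState; then, inside the render pass, the values
// must be set explicitly, for example:
//   vkCmdSetViewport( command_buffer, 0, 1, &viewport );
//   vkCmdSetScissor( command_buffer, 0, 1, &scissor );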

So after these quite overwhelming preparations we can create a graphics pipeline. This is done by calling the vkCreateGraphicsPipelines() function which, among others, takes an array of pointers to the pipeline create info structures. When everything goes well, VK_SUCCESS should be returned by this function and a handle of a graphics pipeline should be stored in a variable we’ve provided the address of. Now we are ready to start drawing.

Preparing Drawing Commands

I introduced you to the concept of command buffers in the previous tutorial. Here I will briefly explain what they are and how to use them.

Command buffers are containers for GPU commands. If we want to execute some job on a device, we do it through command buffers. This means that we must prepare a set of commands that process data (that is, draw something on the screen) and record these commands in command buffers. Then we can submit whole buffers to the device’s queues. This submit operation tells the device: here is a bunch of things I want you to do for me; do them now.

To record commands, we must first allocate command buffers. These are allocated from command pools, which can be thought of as memory chunks. If a command buffer needs to be larger (as we record many complicated commands in it), it can grow and use additional memory from the pool it was allocated from. So first we must create a command pool.

Creating a Command Pool

Command pool creation is simple and looks like this:

VkCommandPoolCreateInfo cmd_pool_create_info = {
  VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,     // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  0,                                              // VkCommandPoolCreateFlags       flags
  queue_family_index                              // uint32_t                       queueFamilyIndex
};

if( vkCreateCommandPool( GetDevice(), &cmd_pool_create_info, nullptr, pool ) != VK_SUCCESS ) {
  return false;
}
return true;

19.Tutorial03.cpp, function CreateCommandPool()

First we prepare a variable of type VkCommandPoolCreateInfo. It contains the following fields:

  • sType – Standard type of structure, set to VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO here.
  • pNext – Pointer reserved for extensions.
  • flags – Indicates usage scenarios for command pool and command buffers allocated from it; that is, we can tell the driver that command buffers allocated from this pool will live for a short time; for no specific usage we can set it to zero.
  • queueFamilyIndex – Index of a queue family for which we are creating a command pool.

Remember that command buffers allocated from a given pool can only be submitted to a queue from a queue family specified during pool creation.

To create a command pool, we just call the vkCreateCommandPool() function.
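If we knew that the command buffers allocated from this pool would be short-lived or rerecorded often, we could hint this to the driver through the flags field. Here is a sketch of such a pool, not what this tutorial does:

// A sketch only: a pool for short-lived command buffers that can also be reset individually.
VkCommandPoolCreateInfo transient_cmd_pool_create_info = {
  VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,     // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  VK_COMMAND_POOL_CREATE_TRANSIENT_BIT |          // VkCommandPoolCreateFlags       flags
  VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT,
  queue_family_index                              // uint32_t                       queueFamilyIndex
};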

Allocating Command Buffers

Now that we have the command pool ready, we can allocate command buffers from it.

VkCommandBufferAllocateInfo command_buffer_allocate_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO, // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  pool,                                           // VkCommandPool                  commandPool
  VK_COMMAND_BUFFER_LEVEL_PRIMARY,                // VkCommandBufferLevel           level
  count                                           // uint32_t                       commandBufferCount
};

if( vkAllocateCommandBuffers( GetDevice(), &command_buffer_allocate_info, command_buffers ) != VK_SUCCESS ) {
  return false;
}
return true;

20.Tutorial03.cpp, function AllocateCommandBuffers()

To allocate command buffers we once again fill in a structure, this time of type VkCommandBufferAllocateInfo, which contains these members:

  • sType – Type of the structure; VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO for this purpose.
  • pNext – Pointer reserved for extensions.
  • commandPool – Pool from which we want our command buffers to take their memory.
  • level – Command buffer level; there are two levels: primary and secondary; right now we are only interested in primary command buffers.
  • commandBufferCount – Number of command buffers we want to allocate.

To allocate command buffers, call the vkAllocateCommandBuffers() function and check whether it succeeded. We can allocate many buffers at once with one function call.

I’ve prepared a simple buffer allocating function to show you how some Vulkan functions can be wrapped for easier use. Here is a usage of two such wrapper functions that create command pools and allocate command buffers from them.

if( !CreateCommandPool( GetGraphicsQueue().FamilyIndex, &Vulkan.GraphicsCommandPool ) ) {
  printf( "Could not create command pool!\n" );
  return false;
}

uint32_t image_count = static_cast<uint32_t>(GetSwapChain().Images.size());
Vulkan.GraphicsCommandBuffers.resize( image_count, VK_NULL_HANDLE );

if( !AllocateCommandBuffers( Vulkan.GraphicsCommandPool, image_count, &Vulkan.GraphicsCommandBuffers[0] ) ) {
  printf( "Could not allocate command buffers!\n" );
  return false;
}
return true;

21.Tutorial03.cpp, function CreateCommandBuffers()

As you can see, we are creating a command pool for a graphics queue family index. All image state transitions and drawing operations will be performed on a graphics queue. Presentation is done on another queue (if the presentation queue is different from the graphics queue) but we don’t need a command buffer for this operation.

And we are also allocating command buffers for each swap chain image. Here we take the number of images and provide it to this simple “wrapper” function for command buffer allocation.

Recording Command Buffers

Now that we have command buffers allocated from the command pool, we can finally record operations that will draw something on the screen. First we must prepare a set of data needed for the recording operation. Some of this data is identical for all command buffers, but some references a specific swap chain image. Here is the code that is independent of the swap chain images:

VkCommandBufferBeginInfo graphics_command_buffer_begin_info = {
  VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,    // VkStructureType                        sType
  nullptr,                                        // const void                            *pNext
  VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT,   // VkCommandBufferUsageFlags              flags
  nullptr                                         // const VkCommandBufferInheritanceInfo  *pInheritanceInfo
};

VkImageSubresourceRange image_subresource_range = {
  VK_IMAGE_ASPECT_COLOR_BIT,                      // VkImageAspectFlags             aspectMask
  0,                                              // uint32_t                       baseMipLevel
  1,                                              // uint32_t                       levelCount
  0,                                              // uint32_t                       baseArrayLayer
  1                                               // uint32_t                       layerCount
};

VkClearValue clear_value = {
  { 1.0f, 0.8f, 0.4f, 0.0f },                     // VkClearColorValue              color
};

const std::vector<VkImage>& swap_chain_images = GetSwapChain().Images;

22.Tutorial03.cpp, function RecordCommandBuffers()

Performing command buffer recording is similar to OpenGL’s display lists, where we start recording a list by calling the glNewList() function, prepare a set of drawing commands, and then close the list or stop recording it (glEndList()). So the first thing we need to do is to prepare a variable of type VkCommandBufferBeginInfo. It is used when we start recording a command buffer, and it tells the driver about the type, contents, and desired usage of the command buffer. Variables of this type contain the following members:

  • sType – Standard structure type, here set to VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO.
  • pNext – Pointer reserved for extensions.
  • flags – Parameters describing the desired usage (that is, whether we want to submit this command buffer only once and destroy/reset it, or whether it is possible that the buffer will be submitted again before the processing of its previous submission has finished).
  • pInheritanceInfo – Parameter used only when we want to record a secondary command buffer.

Next we describe the areas or parts of our images that we will set up image memory barriers for. Here we set up barriers to specify that queues from different families will reference a given image. This is done through a variable of type VkImageSubresourceRange with the following members:

  • aspectMask – Describes a “type” of image, whether it is for color, depth, or stencil data.
  • baseMipLevel – Number of the first mipmap level our operations will be performed on.
  • levelCount – Number of mipmap levels (including the base level) we will be operating on.
  • baseArrayLayer – Number of the first array layer of an image that will take part in the operations.
  • layerCount – Number of layers (including the base layer) that will be modified.

Next we set up a clear value for our images. Before drawing we need to clear images. In previous tutorials, we performed this operation explicitly by ourselves. Here images are cleared as part of a render pass attachment load operation, which we set to “clear,” so now we must specify the color to which an image must be cleared. This is done using a variable of type VkClearValue in which we provide R, G, B, A values.

Variables we have created thus far are independent of an image itself, and that’s why we have specified them before a loop. Now we can start recording command buffers:

for( size_t i = 0; i < Vulkan.GraphicsCommandBuffers.size(); ++i ) {
  vkBeginCommandBuffer( Vulkan.GraphicsCommandBuffers[i], &graphics_command_buffer_begin_info );

  if( GetPresentQueue().Handle != GetGraphicsQueue().Handle ) {
    VkImageMemoryBarrier barrier_from_present_to_draw = {
      VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,     // VkStructureType                sType
      nullptr,                                    // const void                    *pNext
      VK_ACCESS_MEMORY_READ_BIT,                  // VkAccessFlags                  srcAccessMask
      VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,       // VkAccessFlags                  dstAccessMask
      VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                  oldLayout
      VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                  newLayout
      GetPresentQueue().FamilyIndex,              // uint32_t                       srcQueueFamilyIndex
      GetGraphicsQueue().FamilyIndex,             // uint32_t                       dstQueueFamilyIndex
      swap_chain_images[i],                       // VkImage                        image
      image_subresource_range                     // VkImageSubresourceRange        subresourceRange
    };
    vkCmdPipelineBarrier( Vulkan.GraphicsCommandBuffers[i], VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, 0, 0, nullptr, 0, nullptr, 1, &barrier_from_present_to_draw );
  }

  VkRenderPassBeginInfo render_pass_begin_info = {
    VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,     // VkStructureType                sType
    nullptr,                                      // const void                    *pNext
    Vulkan.RenderPass,                            // VkRenderPass                   renderPass
    Vulkan.FramebufferObjects[i].Handle,          // VkFramebuffer                  framebuffer
    {                                             // VkRect2D                       renderArea
      {                                           // VkOffset2D                     offset
        0,                                          // int32_t                        x
        0                                           // int32_t                        y
      },
      {                                           // VkExtent2D                     extent
        300,                                        // uint32_t                       width
        300                                         // uint32_t                       height
      }
    },
    1,                                            // uint32_t                       clearValueCount
    &clear_value                                  // const VkClearValue            *pClearValues
  };

  vkCmdBeginRenderPass( Vulkan.GraphicsCommandBuffers[i], &render_pass_begin_info, VK_SUBPASS_CONTENTS_INLINE );

  vkCmdBindPipeline( Vulkan.GraphicsCommandBuffers[i], VK_PIPELINE_BIND_POINT_GRAPHICS, Vulkan.GraphicsPipeline );

  vkCmdDraw( Vulkan.GraphicsCommandBuffers[i], 3, 1, 0, 0 );

  vkCmdEndRenderPass( Vulkan.GraphicsCommandBuffers[i] );

  if( GetGraphicsQueue().Handle != GetPresentQueue().Handle ) {
    VkImageMemoryBarrier barrier_from_draw_to_present = {
      VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,       // VkStructureType              sType
      nullptr,                                      // const void                  *pNext
      VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,         // VkAccessFlags                srcAccessMask
      VK_ACCESS_MEMORY_READ_BIT,                    // VkAccessFlags                dstAccessMask
      VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,              // VkImageLayout                oldLayout
      VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,              // VkImageLayout                newLayout
      GetGraphicsQueue().FamilyIndex,               // uint32_t                     srcQueueFamilyIndex
      GetPresentQueue( ).FamilyIndex,               // uint32_t                     dstQueueFamilyIndex
      swap_chain_images[i],                         // VkImage                      image
      image_subresource_range                       // VkImageSubresourceRange      subresourceRange
    };
    vkCmdPipelineBarrier( Vulkan.GraphicsCommandBuffers[i], VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, 0, 0, nullptr, 0, nullptr, 1, &barrier_from_draw_to_present );
  }

  if( vkEndCommandBuffer( Vulkan.GraphicsCommandBuffers[i] ) != VK_SUCCESS ) {
    printf( "Could not record command buffer!\n" );
    return false;
  }
}
return true;

23.Tutorial03.cpp, function RecordCommandBuffers()

Recording a command buffer is started by calling the vkBeginCommandBuffer() function. At the beginning we set up a barrier that tells the driver that previously queues from one family referenced a given image but now queues from a different family will be referencing it (we need to do this because during swap chain creation we specified exclusive sharing mode). The barrier is set only when the graphics queue is different than the present queue. This is done by calling the vkCmdPipelineBarrier() function. We must specify when in the pipeline the barrier should be placed (VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT) and how the barrier should be set up. Barrier parameters are prepared through the VkImageMemoryBarrier structure:

  • sType – Type of the structure, here set to VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER.
  • pNext – Pointer reserved for extensions.
  • srcAccessMask – Type of memory operations that took place in regard to a given image before the barrier.
  • dstAccessMask – Type of memory operations connected with a given image that will take place after the barrier.
  • oldLayout – Current image memory layout.
  • newLayout – Memory layout the image should have after the barrier.
  • srcQueueFamilyIndex – Index of the queue family whose queues were referencing the image before the barrier.
  • dstQueueFamilyIndex – Index of the queue family whose queues will be referencing the image after the barrier.
  • image – Handle to the image itself.
  • subresourceRange – Parts of an image for which we want the transition to occur.

In this example we don’t change the layout of an image, for two reasons: (1) The barrier may not be set at all (if the graphics and present queues are the same), and (2) the layout transition will be performed automatically as a render pass operation (at the beginning of the first—and only—subpass).

Next we start a render pass. We call the vkCmdBeginRenderPass() function for which we must provide a pointer to a variable of VkRenderPassBeginInfo type. It contains the following members:

  • sType – Standard type of structure. In this case we must set it to a value of VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO.
  • pNext – Pointer reserved for future use.
  • renderPass – Handle of a render pass we want to start.
  • framebuffer – Handle of a framebuffer, which specifies images used as attachments for this render pass.
  • renderArea – Area of all images that will be affected by the operations that take place in this render pass. It specifies the upper-left corner (through the x and y parameters of the offset member) and the width and height (through the extent member) of the render area.
  • clearValueCount – Number of elements in pClearValues array.
  • pClearValues – Array with clear values for each attachment.

When we specify a render area for the render pass, we must make sure that the rendering operations won’t modify pixels outside this area. This is just a hint for the driver so it can optimize its behavior. If we don’t confine operations to the provided area by using a proper scissor test, pixels outside this area may become undefined (we can’t rely on their contents). We also can’t specify a render area that is greater than the framebuffer’s dimensions (that falls outside the framebuffer).

The pClearValues array must contain an element for each render pass attachment. Each element specifies the color to which the given attachment must be cleared when its loadOp is set to clear. For attachments whose loadOp is not clear, the provided values are ignored, but we still can’t provide an array with fewer elements than there are attachments.

We have begun a command buffer, set a barrier (if necessary), and started a render pass. When we start a render pass we are also starting its first subpass. We can switch to the next subpass by calling the vkCmdNextSubpass() function. During these operations, layout transitions and clear operations may occur. Clears are done in a subpass in which the image is first used (referenced). Layout transitions occur each time a subpass layout is different than the layout in a previous subpass or (in the case of a first subpass or when the image is first referenced) different than the initial layout (layout before the render pass). So in our example when we start a render pass, the swap chain image’s layout is changed automatically from “presentation source” to a “color attachment optimal” layout.

Now we bind a graphics pipeline. This is done by calling the vkCmdBindPipeline() function. This “activates” all shader programs (similar to the glUseProgram() function) and sets desired tests, blending operations, and so on.

After the pipeline is bound, we can finally draw something by calling the vkCmdDraw() function. In this function we specify the number of vertices we want to draw (three), the number of instances that should be drawn (just one), and the indices of the first vertex and first instance (both zero).

Next the vkCmdEndRenderPass() function is called which, as the name suggests, ends the given render pass. Here all final layout transitions occur if the final layout specified for a render pass is different from the layout used in the last subpass the given image was referenced in.

After that, the barrier may be set in which we tell the driver that the graphics queue finished using a given image and from now on the present queue will be using it. This is done, once again, only when the graphics and present queues are different. And after the barrier, we stop recording a command buffer for a given image. All these operations are repeated for each swap chain image.

Drawing

The drawing function is the same as the Draw() function presented in Tutorial 2. We acquire the image’s index, submit a proper command buffer, and present the image. We are using semaphores the same way they were used previously: one semaphore is used for acquiring an image, and it tells the graphics queue to wait when the image is not yet available for use. The second semaphore is signaled when drawing on the graphics queue is finished; the present queue waits on this semaphore before it can present the image. Here is the source code of the Draw() function:

VkSemaphore image_available_semaphore = GetImageAvailableSemaphore();
VkSemaphore rendering_finished_semaphore = GetRenderingFinishedSemaphore();
VkSwapchainKHR swap_chain = GetSwapChain().Handle;
uint32_t image_index;

VkResult result = vkAcquireNextImageKHR( GetDevice(), swap_chain, UINT64_MAX, image_available_semaphore, VK_NULL_HANDLE, &image_index );
switch( result ) {
  case VK_SUCCESS:
  case VK_SUBOPTIMAL_KHR:
    break;
  case VK_ERROR_OUT_OF_DATE_KHR:
    return OnWindowSizeChanged();
  default:
    printf( "Problem occurred during swap chain image acquisition!\n" );
    return false;
}

VkPipelineStageFlags wait_dst_stage_mask = VK_PIPELINE_STAGE_TRANSFER_BIT;
VkSubmitInfo submit_info = {
  VK_STRUCTURE_TYPE_SUBMIT_INFO,                // VkStructureType              sType
  nullptr,                                      // const void                  *pNext
  1,                                            // uint32_t                     waitSemaphoreCount
  &image_available_semaphore,                   // const VkSemaphore           *pWaitSemaphores
  &wait_dst_stage_mask,                         // const VkPipelineStageFlags  *pWaitDstStageMask
  1,                                            // uint32_t                     commandBufferCount
  &Vulkan.GraphicsCommandBuffers[image_index],  // const VkCommandBuffer       *pCommandBuffers
  1,                                            // uint32_t                     signalSemaphoreCount
  &rendering_finished_semaphore                 // const VkSemaphore           *pSignalSemaphores
};

if( vkQueueSubmit( GetGraphicsQueue().Handle, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
  return false;
}

VkPresentInfoKHR present_info = {
  VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,           // VkStructureType              sType
  nullptr,                                      // const void                  *pNext
  1,                                            // uint32_t                     waitSemaphoreCount
  &rendering_finished_semaphore,                // const VkSemaphore           *pWaitSemaphores
  1,                                            // uint32_t                     swapchainCount
  &swap_chain,                                  // const VkSwapchainKHR        *pSwapchains
  &image_index,                                 // const uint32_t              *pImageIndices
  nullptr                                       // VkResult                    *pResults
};
result = vkQueuePresentKHR( GetPresentQueue().Handle, &present_info );

switch( result ) {
  case VK_SUCCESS:
    break;
  case VK_ERROR_OUT_OF_DATE_KHR:
  case VK_SUBOPTIMAL_KHR:
    return OnWindowSizeChanged();
  default:
    printf( "Problem occurred during image presentation!\n" );
    return false;
}

return true;

24.Tutorial03.cpp, function Draw()

Tutorial 3 Execution

In this tutorial we performed “real” drawing operations. A simple triangle may not sound too convincing, but it is a good starting point for a first Vulkan-created image. Here is what the triangle should look like:

If you’re wondering why there are black parts in the image, here is an explanation: To simplify the whole code, we created a framebuffer with a fixed size (width and height of 300 pixels). But the window’s size (and the size of the swap chain images) may be greater than these 300 x 300 pixels. The parts of an image that lie outside of the framebuffer’s dimensions are uncleared and unmodified by our application. They may even contain some “artifacts,” because the memory from which the driver allocates the swap chain images may have been previously used for other purposes and could contain some data. The correct behavior is to create a framebuffer with the same size as the swap chain images and to recreate it when the window’s size changes. But as long as the blue triangle is rendered on an orange/gold background, it means that the code works correctly.
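For completeness, here is a sketch of how the framebuffer could be created with the swap chain dimensions instead; GetSwapChain().Extent is an assumed accessor and image_view stands for the view of the given swap chain image created earlier:

// A sketch only: create the framebuffer with the swap chain image dimensions instead of fixed values.
// GetSwapChain().Extent is an assumed accessor; image_view is the view of the given swap chain image.
VkFramebufferCreateInfo framebuffer_create_info = {
  VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,      // VkStructureType                sType
  nullptr,                                        // const void                    *pNext
  0,                                              // VkFramebufferCreateFlags       flags
  Vulkan.RenderPass,                              // VkRenderPass                   renderPass
  1,                                              // uint32_t                       attachmentCount
  &image_view,                                    // const VkImageView             *pAttachments
  GetSwapChain().Extent.width,                    // uint32_t                       width
  GetSwapChain().Extent.height,                   // uint32_t                       height
  1                                               // uint32_t                       layers
};
// The same extent should also be used for the viewport, scissor, and render area, and the
// framebuffer should be recreated whenever the window (and thus the swap chain) size changes.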

Cleaning Up

One last thing to learn before this tutorial ends is how to release resources created during this lesson. I won’t repeat the code needed to release resources created in the previous chapter. Just look at the VulkanCommon.cpp file. Here is the code needed to destroy resources specific to this chapter:

if( GetDevice() != VK_NULL_HANDLE ) {
  vkDeviceWaitIdle( GetDevice() );

  if( (Vulkan.GraphicsCommandBuffers.size() > 0) && (Vulkan.GraphicsCommandBuffers[0] != VK_NULL_HANDLE) ) {
    vkFreeCommandBuffers( GetDevice(), Vulkan.GraphicsCommandPool, static_cast<uint32_t>(Vulkan.GraphicsCommandBuffers.size()), &Vulkan.GraphicsCommandBuffers[0] );
    Vulkan.GraphicsCommandBuffers.clear();
  }

  if( Vulkan.GraphicsCommandPool != VK_NULL_HANDLE ) {
    vkDestroyCommandPool( GetDevice(), Vulkan.GraphicsCommandPool, nullptr );
    Vulkan.GraphicsCommandPool = VK_NULL_HANDLE;
  }

  if( Vulkan.GraphicsPipeline != VK_NULL_HANDLE ) {
    vkDestroyPipeline( GetDevice(), Vulkan.GraphicsPipeline, nullptr );
    Vulkan.GraphicsPipeline = VK_NULL_HANDLE;
  }

  if( Vulkan.RenderPass != VK_NULL_HANDLE ) {
    vkDestroyRenderPass( GetDevice(), Vulkan.RenderPass, nullptr );
    Vulkan.RenderPass = VK_NULL_HANDLE;
  }

  for( size_t i = 0; i < Vulkan.FramebufferObjects.size(); ++i ) {
    if( Vulkan.FramebufferObjects[i].Handle != VK_NULL_HANDLE ) {
      vkDestroyFramebuffer( GetDevice(), Vulkan.FramebufferObjects[i].Handle, nullptr );
      Vulkan.FramebufferObjects[i].Handle = VK_NULL_HANDLE;
    }

    if( Vulkan.FramebufferObjects[i].ImageView != VK_NULL_HANDLE ) {
      vkDestroyImageView( GetDevice(), Vulkan.FramebufferObjects[i].ImageView, nullptr );
      Vulkan.FramebufferObjects[i].ImageView = VK_NULL_HANDLE;
    }
  }
  Vulkan.FramebufferObjects.clear();
}

25.Tutorial03.cpp, function ChildClear()

As usual we first check whether there is any device. If we don’t have a device, we don’t have any resources. Next we wait until the device is free and then delete all the created resources. We start by deleting the command buffers with a vkFreeCommandBuffers() call. Next we destroy the command pool through the vkDestroyCommandPool() function, and after that the graphics pipeline is destroyed with a vkDestroyPipeline() call. Next we call the vkDestroyRenderPass() function, which releases the handle to the render pass. Finally, all framebuffers and image views associated with each swap chain image are deleted.

Each object’s destruction is preceded by a check of whether the given resource was properly created; if it wasn’t, we skip its destruction.

Conclusion

In this tutorial, we created a render pass with one subpass. Next we created image views and framebuffers for each swap chain image. One of the most difficult parts was to create a graphics pipeline, because it required us to prepare lots of data. We had to create shader modules and describe all the shader stages that should be active when a given graphics pipeline is bound. We had to prepare information about input vertices, their layout, and assembling them into polygons. Viewport, rasterization, multisampling, and color blending information was also necessary. Then we created a simple pipeline layout and after that we could create the pipeline itself. Next we created a command pool and allocated command buffers for each swap chain image. Operations recorded in each command buffer involved setting up an image memory barrier, beginning a render pass, binding a graphics pipeline, and drawing. Next we ended a render pass and set up another image memory barrier. The drawing itself was performed the same way as in the previous tutorial (2).

In the next tutorial, we will learn about vertex attributes, images, and buffers.


Go to: API without Secrets: Introduction to Vulkan* Part 4: Vertex Attributes (To Be Continued)


Notices

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800- 548-4725 or by visiting www.intel.com/design/literature.htm.

This sample source code is released under the Intel Sample Source Code License Agreement.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.


Intel® Collaboration Suite for WebRTC Simplifies Adding Real-Time Communication to Your Applications


Download PDF [PDF 569 KB]

Overview

Web-based real-time communication (WebRTC) is an open standard proposed by both the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF) that allows browser-to-browser applications to support voice calling, video chat, and peer-to-peer (P2P) data transmission. End users can use their browsers for real-time communication without the need for any additional clients or plugins.

The WebRTC standard is gaining significant momentum and is currently fully supported by open standard browsers such as Google Chrome*, Mozilla Firefox*, and Opera*. Microsoft also announced its Edge* browser support for Object RTC (ORTC), which will be interoperable with WebRTC.

To ease adoption of this WebRTC technology and make it widely available to expand or create new applications, Intel has developed the end-to-end WebRTC solution, Intel® Collaboration Suite for WebRTC (Intel® CS for WebRTC). Intel CS for WebRTC is highly optimized for Intel® platforms, including Intel® Xeon® processor-based products such as the Intel® Visual Compute Accelerator card, Intel® Core™ processor-based desktop products, and Intel® Atom™ processor-based mobile products.

You can download Intel CS for WebRTC from http://webrtc.intel.com at no charge. It includes the following main components:

  • Intel CS for WebRTC Conference Server – enables not only P2P-style communication, but also efficient WebRTC-based video conferencing.
  • Intel CS for WebRTC Gateway Server for SIP – provides WebRTC connectivity into session initiation protocol (SIP) conferences.
  • Intel CS for WebRTC Client SDK – allows you to develop WebRTC apps using JavaScript* APIs, an Internet Explorer* plugin for WebRTC, Android* native apps using Java* APIs, iOS* native apps using Objective-C* APIs, or Windows* native apps using C++ APIs.
  • Intel CS for WebRTC User Documentation – includes complete online documentation available on the WebRTC website http://webrtc.intel.com, with sample code, installation instructions, and API descriptions.

Problems with Existing WebRTC-Based RTC Solutions

WebRTC-based RTC solutions change the way people communicate, bringing real-time communication to the browser. However, as a new technology, WebRTC-based solutions require improvements in the following areas to be as complete as traditional RTC solutions.

  • Mostly P2P communication based. The WebRTC standard itself, as well as Google’s WebRTC open source reference implementation, focuses only on peer-to-peer (P2P) communication, limiting most WebRTC-based solutions to two-party communication. Although some WebRTC solutions support multi-party chat, they use a mesh network topology, which is less efficient and can support only a few attendees on common client devices.
  • Not fully accounting for client usage preferences. Although browsers are available for multiple platforms, not all users like browsers. Many mobile platform end users prefer native apps, such as Android apps or iOS apps. Additionally, some commonly used browsers, such as Internet Explorer, still do not natively support WebRTC.
  • Lack of flexibility in the MCU server. Some WebRTC-based solutions support multipoint control unit (MCU) servers for multi-party communication. However, most of those MCU servers use a router/forward solution, which simply forwards the publishers’ streams to the subscribers. Although this method covers scenarios where clients have equivalent capabilities or SVC/simulcast is supported, it places demands that many clients cannot easily meet. To work with a wide variety of devices, MCU servers must do some media-specific processing, such as transcoding and mixing.
  • Limited deployment mode choices for customers. Most existing WebRTC-based RTC solutions work as a service model hosted by service providers. This style provides all the benefits of a cloud service, but it does not serve those who want to host the service themselves for data-sensitivity reasons.

Key Differentiation of Intel® CS for WebRTC

Fully Functional WebRTC-Based Audio/Video Communication

Intel CS for WebRTC not only offers peer-to-peer WebRTC communication, but it also supports WebRTC-based multi-party video conferencing and provides the WebRTC client connectivity to other traditional video conferences, like SIP. For video conferencing, it provides router and mixer solutions simultaneously to handle complex customer scenarios. Additionally, it supports:

  • H.264 and VP8 video codecs for input and output streams
  • MCU multi-streaming
  • Real-time streaming protocol (RTSP) stream input
  • Customized video layout definition plus runtime control
  • Voice activity detection (VAD) controlled video switching
  • Flexible media recording

Easy to Deploy, Scale, and Integrate

Intel CS for WebRTC Conference and Gateway Servers provide pluggable integration modules as well as open APIs to work with existing enterprise systems. They scale easily to cluster mode and serve a larger number of users as cluster nodes are added. In addition, the Intel solution provides comprehensive client SDKs including a JavaScript SDK, Android native SDK, iOS native SDK, and Windows native SDK to help customers quickly expand their client applications with video communication capabilities.

High-Performance Media Processing Capability

Intel CS for WebRTC MCU and Gateway servers are built on top of Intel® Media Server Studio, optimized for Intel® Core™ processors and Intel® Xeon® processor E3 family with Intel® Iris™ graphics, Intel® Iris™ Pro graphics, and Intel® HD graphics technology.

The client SDKs, including the Android native SDK and Windows C++ SDK, use the mobile and desktop platforms’ hardware media processing capabilities to improve the user experience. That is, the Android native SDK is optimized for Intel® Atom™ platforms (all Intel® Atom™ x3, x5, and x7 processor series) focusing on video power and performance, as well as end-to-end latency. The Windows C++ SDK also uses the media processing acceleration of the Intel® Core™ processor-based platforms (i3, i5, i7) for consistent HD video communication.

Secure, Intelligent, Reliable QoS Control Support

The Intel CS for WebRTC solution ensures video communication data security through HTTPS, secure WebSocket, SRTP/DTLS, and so on. Its intelligent quality of service (QoS) control (e.g., NACK, FEC, and dynamic bitrate control) also guarantees the communication quality between clients and servers against high packet loss and network bandwidth variance. Experiments summarized in Figure 1 have shown that the Intel video engine handles up to 20% packet loss and 200 ms delay.

Figure 1. Packet Loss Protection Results with QoS Control

Full Functional Video Communication with Intel CS for WebRTC Conference Servers

Flexible Communication Modes

Intel CS for WebRTC offers both peer-to-peer video call and MCU-based multi-party video conference communication modes.

A typical WebRTC usage scenario is direct peer-to-peer video call. After connecting to the signaling server, users can invite other parties for P2P video communication. All video, audio, and data streams are transported directly between each other. Meanwhile, the signaling messages for discovery and control go through the signaling server. As Figure 2 shows, Intel provides a reference signaling server implementation called Peer Server with source code included. Customers can construct their own signaling server based on this Peer Server or replace the whole Peer Server with an existing channel. The client SDK also provides the customization mechanism to let users implement their own signaling channel adapter.

Figure 2. P2P Video Communication with Peer Server

The Intel CS for WebRTC solution further offers MCU-based multi-party video conference chat. All streams go through the MCU server, just as the signaling messages do, as Figure 3 shows. This reduces the stream traffic and computing overhead on client devices compared to a mesh network solution.

Figure 3. Multi-party Video Conference Chat through MCU Server

Unlike most existing WebRTC MCUs, which usually work as routers that forward media streams for clients, the Intel CS for WebRTC MCU server also handles media processing and allows a wide range of devices to be used in the conference. Users can subscribe to either the forward streams or the mixed streams from the MCU server. Based on Intel Iris Pro graphics or Intel HD graphics technology, media processing on the MCU server can achieve an excellent cost-performance ratio.

The Intel MCU provides more flexibility on mixed streams. You can generate multiple video resolution mixed streams to adapt to various client devices with different media processing capability and network bandwidth.

External Input for RTSP Streams

Intel CS for WebRTC allows bridging a wider range of devices into the conference by supporting external inputs from RTSP streams. This means almost all RTSP compatible devices, including IP cameras, can join the video conference. The IP camera support opens up usage scenarios and applications in security, remote education, remote healthcare, etc.

Mixed-Stream Layout Definition and Runtime Region Control

Through the Intel CS for WebRTC video layout definition interface, which is an expanded version of RFC-5707 (MSML), you can define any rectangle-style video layout for a conference according to the number of participants at runtime. Figure 4 shows the video layouts for one conference. The meeting contains 5 different layouts for 1, 2, 3, 4, or 5-6 participants.

Figure 4. Example Video Layouts

Figure 5 describes the detailed layout regions for a maximum of 2 participants. The region with id number 1 is always the primary region of this layout.

Figure 5. Example Video Layout Definition and Effect

The Intel CS for WebRTC MCU also supports automatic voice-activated video switching through voice activity detection (VAD). The participant who is most active on voice is switched to the primary region, shown as the yellow area in Figure 6.

Figure 6. Example Video Layouts with Primary Region

You can also assign any stream to any region as needed during runtime for flexible video layout design of the conference.

Flexible Conference Recording

When recording in Intel CS for WebRTC, you can select any video feed and any audio feed. You can not only record while switching across the different streams that the conference room offers (such as mixed and forward streams), but also select the video and audio tracks separately from different streams. For example, you can take the audio track from the participants’ mixed stream and the video track from the screen-sharing stream.

Scaling the Peer Server Reference Implementation

Although the Peer Server that Intel provides is a signaling server reference implementation for a single node, you can extend it to a distributed, large-scale platform by refactoring the implementation. See Figure 7 for a scaling proposal.

Figure 7. Peer Server Cluster Scaling Proposal

Scaling the MCU Conference Server

The Intel CS for WebRTC MCU server was designed to be a distributed framework with separate components, including manager node, signaling nodes, accessing nodes, media processing nodes, etc. Those components are easy to scale and suitable for cloud deployment.

Figure 8 shows an example from the conference server user guide for deploying an MCU server cluster.

Figure 8. MCU Conference Server Cluster Deployment Example

Interoperability with Intel CS for WebRTC Gateway

For legacy video conference solutions to adopt the WebRTC advantage on the client side, Intel CS for WebRTC provides the WebRTC gateway.

Key Functionality Offering

The Intel CS for WebRTC gateway for SIP not only provides basic signaling and protocol translation between WebRTC and SIP, it also provides real-time media transcoding between VP8 and H.264 to address the difference in video codec preferences between them. In addition, the gateway keeps the session mapping between WebRTC and SIP to support bi-directional video calls. Figure 9 briefly shows how SIP devices can connect with WebRTC terminals through the gateway Intel provides.

Figure 9. Connect WebRTC with SIP Terminals through the Gateway

Validated SIP Environments

Note: See Intel CS for WebRTC Release Notes for current validated environments

Cloud Deployment

The Intel CS for WebRTC gateway instances are generally session-based. Each session is independent, so sessions are easily scalable to multiple instances for cloud deployment. You can make the gateway instance management a component of your existing conference system scheduling policy and achieve load balancing for the gateway.

Comprehensive Intel CS for WebRTC Client SDKs

The Intel CS for WebRTC also provides comprehensive client SDKs to help you easily implement all the functionality that the server provides. The client SDKs allow client apps to communicate with remote clients or join conference meetings. Basic features include audio/video communication, data transmission, and screen sharing. P2P mode also supports a customized signaling channel that can be easily integrated into existing IT infrastructures.

Client SDKs include JavaScript SDK, Android SDK, iOS SDK, and Windows SDK. Current features are listed in Table 1.

Table 1. Client SDK Features

#Partial support: for the JavaScript SDK, H.264 video codec support is only available when the browser WebRTC engine supports it.

Customized Signaling Channel

In addition to the default Peer Server, the Intel CS for WebRTC client SDK for P2P chat provides simple customizable interfaces that allow you to implement and integrate your own signaling channel, such as an extensible messaging and presence protocol (XMPP) server channel. Figure 10 shows the separate signaling channel model in the client SDK for P2P chat, which users can customize.

Figure 10. Customized Signaling Channel in Client SDK for P2P Chat

Hardware Media Processing Acceleration

On Android platforms, VP8/H.264 decoding/encoding hardware acceleration is enabled if the underlying platform includes corresponding MediaCodec plugins. For Windows, H.264 decoding/encoding and VP8 decoding hardware acceleration is enabled with DXVA-based HMFT or Intel Media SDK. For iOS, H.264 encoding/decoding is hardware-accelerated through Video Toolbox framework. Table 2 below shows hardware acceleration for WebRTC on different platforms.

Table 2. Hardware Media Acceleration Status for Client SDKs

#Conditional support: only enabled if the platform level enables VP8 hardware codec

NAT Traversal

Interactive Connectivity Establishment (ICE) helps devices connect to each other in various complicated Network Address Translation (NAT) conditions. The client SDKs support Session Traversal Utilities for NAT (STUN) and Traversal Using Relay NAT (TURN) servers. Figure 11 and Figure 12 show how client SDKs perform NAT traversal through STUN or TURN servers.

Figure 11. NAT Traversal with STUN Server

Figure 12. NAT Traversal with TURN Server

Fine-Grained Media & Network Parameter Control

Client SDKs further allow you to choose the video or audio source and its resolution and frame rate, the preferred video codec, and maximum bandwidth for video/audio streams.

Real-Time Connection Status Retrieval

Client SDKs provide APIs to retrieve real-time network and audio/video quality conditions. You can reduce the resolution or switch to an audio-only stream if the network quality is not good, or adjust audio levels if the audio quality is poor. Table 3 lists the connection status information supported by the client SDKs.

Table 3. Connection Status Information supported by Client SDKs

Conclusion

Based on WebRTC technology, Intel® Collaboration Suite for WebRTC builds an end-to-end solution, allowing you to enhance your applications with Internet video communication capabilities. The acceleration from Intel’s media processing platforms on the client and server sides, such as the Intel® Visual Compute Accelerator, improves the client user experience as well as the server side cost-effectiveness.

Additional Information

For more information, please visit the following web pages:
 

Intel Visual Compute Accelerator:
http://www.intel.com/content/www/us/en/servers/media-and-graphics/visual-compute-accelerator.html
http://www.intel.com/visualcloud

Intel Collaboration Suite for WebRTC:
http://webrtc.intel.com
https://software.intel.com/en-us/forums/webrtc
https://software.intel.com/zh-cn/forums/webrtc

The Internet Engineering Task Force (IETF) Working Group:
http://tools.ietf.org/wg/rtcweb/

W3C WebRTC Working Group:
http://www.w3.org/2011/04/webrtc/

WebRTC Open Project:
http://www.webrtc.org

Acknowledgements (alphabetical)

Elmer Amaya, Jianjun Zhu, Jianlin Qiu, Kreig DuBose, Qi Zhang, Shala Arshi, Shantanu Gupta, Yuqiang Xian

About the Author

Lei Zhai is the engineering manager in the Intel Software and Solutions Group (SSG), Systems Technologies & Optimizations (STO), Client Software Optimization (CSO). His engineering team focuses on Intel® Collaboration Suite of WebRTC product development and its optimization on IA platforms.

Obtaining a High Confidence in Streaming Data Using Standard Deviation of the Streaming Data


Download PDF (490.39 KB)

When measuring boxes with the world-facing Intel® RealSense™ camera DS4, I discovered that I needed the ability to automate the capturing of the box image and size data. This would allow a camera mounted over a scale to auto-capture the image and then send a known accurate value back to the system. This type of automation enables the design of a kiosk where placing a box on the scale triggers the image capture and automates the image process, so the clerk doesn’t have to press a button to facilitate the transaction. With the weight and size data calculated, the data can be entered as measured into a mail system programmatically.

Thinking about how to automate the image capture when the images are composed of streaming data presents new problems. The idea presented in this code is to determine when the image has stabilized to a point where we have a high confidence in the data that is being seen. The basis used in this class is that we can use statistics to determine when we have a stable image and thus when to automatically capture the image. To do this, we use the standard deviation model.

The standard deviation model or bell curve is normally represented using this type of graph.


https://en.wikipedia.org/wiki/Standard_deviation

Another way to look at this data is to represent how much of the data is within a range of standard deviations from the mean.


https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule

As shown in the graph, when you are within 1 standard deviation of the mean, you have a 68-percent confidence in the data. By using 2 standard deviations as the cutoff, you know you have a 95-percent confidence level in the accuracy of the data that is coming across the stream from the camera.
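In symbols, this is the standard 68–95–99.7 rule for a normally distributed measurement X with mean \mu and standard deviation \sigma:

P(\mu - \sigma \le X \le \mu + \sigma) \approx 68.3\%
P(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 95.4\%
P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 99.7\%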

Now that I’ve proposed the idea of using the standard deviation as the method for auto-capturing an image, determining the standard deviation for a set of streaming data becomes the next issue. Searching for solutions, I came upon a refresher from my college days: Knuth. On page 232 of Donald Knuth's The Art of Computer Programming (Volume 2, Third Edition) is a formula for determining the variance, and thus the standard deviation, of streaming data. The code in this class implements this formula; the comments in the source code header document exactly which variables are used for which parts of the formula.

This family of formulas is described in this article:
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
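Restated compactly (using the same notation as the code comments below, where Mc is the running mean and Sc is the running sum of squared differences after the c-th sample Xc):

M_1 = X_1, \qquad S_1 = 0
M_c = M_{c-1} + \frac{X_c - M_{c-1}}{c}
S_c = S_{c-1} + (X_c - M_{c-1})(X_c - M_c)
s^2 = \frac{S_c}{c-1} \quad (2 \le c \le n), \qquad \text{standard deviation} = \sqrt{s^2}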

With this class, I now have the ability to determine how many standard deviations away from the mean a data stream is. Here’s how I exposed this knowledge in the UI. In my sample, the Intel RealSense camera was used to measure a 3D box and then draw a bounding box around the item in the picture. When the standard deviation was less than 1, the box was drawn with a red border; when it was less than 2, it was drawn in yellow; and anything greater than 2 was drawn in green. This UI implementation gives immediate feedback to the application’s user and helps them visualize the consistency of the data. The stability is also visible in the way the box lines drawn over the image move about less and become more stable, until the image freezes when captured.

With that preface, here’s the class just described, implemented in C++ code.

For any questions or discussions, please email me
Dale Taylor      Intel Corp
dale.t.taylor@intel.com


This is the source code from the StreamingStats.h file.

/******************************************************************************
Copyright 2015, Intel Corporation All Rights Reserved.

The source code, information and material("Material") contained herein is owned
by Intel Corporation or its suppliers or licensors, and title to such Material
remains with Intel Corporation or its suppliers or licensors. The Material
contains proprietary information of Intel or its suppliers and licensors.The
Material is protected by worldwide copyright laws and treaty provisions. No
part of the Material may be used, copied, reproduced, modified, published,
uploaded, posted, transmitted, distributed or disclosed in any way without
Intel's prior express written permission. No license under any patent,
copyright or other intellectual property rights in the Material is granted to
or conferred upon you, either expressly, by implication, inducement, estoppel
or otherwise. Any license under such intellectual property rights must be
express and approved by Intel in writing.

Unless otherwise agreed by Intel in writing, you may not remove or alter this
notice or any other notice embedded in Materials by Intel or Intel's suppliers
or licensors in any way.
******************************************************************************/

//
// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
// This code implements a method for determining Std Dev from streaming data.
// Based on Donald Knuth's Art of Computer Programming vol 2, 3rd edition,
// page 232
//
// The basic algorithm follows (to show what the variable names represent)
// X is the current data item
// Mc is the running Mean (Mean current), Mc-1 is the mean for the previous
// item, Sc is the running Sum (Sum current), Sc-1 is the sum for the previous
// item, c is the count or current item
// init M1 = X1 and S1 = 0 during the first pass and on reset
// Data is added to the cumulating values using this formula
// Mc = Mc-1 + (Xc - (Mc-1))/c
// Sc = Sc-1 + (Xc - (Mc-1))*(Xc - Mc)
// for 2 <= c <= n the c-th estimate of the variance is s^2 = Sc/(c-1)
//


#include "math.h"

class StreamingStats {

private:
	unsigned int count = 0;
	unsigned int index = 0;
	double ss_Mean, ss_PrevMean, ss_Sum;
	double* ss_Data;
	unsigned int ss_Size = 1;
	// Internal functions defined here

public:
	StreamingStats(unsigned int windowSize);     // Constructor, defines window size
	~StreamingStats(void) { delete [] ss_Data; };	// destructor for data
	int		DataCount();			// return the # of items in this data set
	int		DataReset();			// reset the data to empty state
	int		NewData(double x);		// add a data item
	double	Mean();				// return Mean of the current data
	double	Variance();				// return Variance of the current data
	double	StandardDeviation();		// return Std Deviation of the current data

};

Comments on the class and code defined in the H file.

Because the variable-sized data array (ss_Data) is allocated with new in the constructor, a destructor is defined to ensure that delete[] is called.



This is the source code from the StreamingStats.cpp file.

/******************************************************************************
Copyright 2015, Intel Corporation All Rights Reserved.

The source code, information and material("Material") contained herein is owned
by Intel Corporation or its suppliers or licensors, and title to such Material
remains with Intel Corporation or its suppliers or licensors. The Material
contains proprietary information of Intel or its suppliers and licensors.The
Material is protected by worldwide copyright laws and treaty provisions. No
part of the Material may be used, copied, reproduced, modified, published,
uploaded, posted, transmitted, distributed or disclosed in any way without
Intel's prior express written permission. No license under any patent,
copyright or other intellectual property rights in the Material is granted to
or conferred upon you, either expressly, by implication, inducement, estoppel
or otherwise. Any license under such intellectual property rights must be
express and approved by Intel in writing.

Unless otherwise agreed by Intel in writing, you may not remove or alter this
notice or any other notice embedded in Materials by Intel or Intel's suppliers
or licensors in any way.
******************************************************************************/

//
// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
// This code implements a method for determining Std Dev from streaming data.
// Based on Donald Knuth's Art of Computer Programming vol 2, 3rd edition,
// page 232
//
// The basic algorithm follows (to show what the variable names represent)
// X is the current data item
// Mc is the running Mean (Mean current), Mc-1 is the mean for the previous
// item, Sc is the running Sum of Squares of Differences (Sum current),
// Sc-1 is the sum for the previous item, c is the count or current item.
// Init M1 = X1 and S1 = 0 during the first pass and on reset
// Data is added to the cumulating values using this formula
// Mc = Mc-1 + (Xc - (Mc-1))/c
// Sc = Sc-1 + (Xc - (Mc-1))*(Xc - Mc)
// for 2 <= c <= n the c-th estimate of the variance is s^2 = Sc/(c-1)
//

#include "math.h"
#include "StreamingStats.h"

StreamingStats::StreamingStats(unsigned int windowSize)
{
	if (windowSize > 0)
		ss_Size = windowSize;
	ss_Data = new double[ss_Size];

	return;
}

// Returns the count of the # of items used to determine the current values
// in the object.
//
int StreamingStats::DataCount()
{
	return count;			// return the number of accumulated data items
}

int StreamingStats::DataReset()
{
//	ss_PrevMean = ss_Mean = 0.0;	// clear all data
//	ss_PrevSum = 0.0;
	count = 0;
	index = 0;
	return 0;				// start empty, no elements
}

// this routine adds new data to the streaming stats totals
// returns the # of items added to the data set
int StreamingStats::NewData(double x)
{
	ss_PrevMean = ss_Mean;
	if (count >= ss_Size) { // We're rolling the window
		// The oldest data point is the next point in a circular array
		index++;
		if (index >= ss_Size)
			index = 0;
		// Remove oldest data point from mean
		ss_Mean = ss_Mean - (ss_Data[index] - ss_Mean) / ss_Size;
		// Add new data point to mean
		ss_Mean = ss_Mean + (x - ss_PrevMean) / ss_Size;
		// Remove oldest data point from sum
		ss_Sum = ss_Sum - (ss_Data[index] - ss_PrevMean) *
			(ss_Data[index] - ss_Mean);
		// Add new data point to sum
		ss_Sum = ss_Sum + (x - ss_PrevMean) * (x - ss_Mean);
	}
	else { // We're still filling the window
		count++;

		if (count == 1) // initialize with the first data item only
		{
			ss_PrevMean = ss_Mean = x;
			ss_Sum = 0.0;
		}
		else // we are adding a data item, follow the formula
		{
			ss_Mean = ss_PrevMean + (x - ss_PrevMean) / count;
			ss_Sum = ss_Sum + (x - ss_PrevMean)*(x - ss_Mean);
		}
		index = count - 1;
	}

	// Store new data point - overwriting oldest in circular array
	ss_Data[index] = x;

	return count;
}

// if the count is positive, return the new mean
double StreamingStats::Mean()
{
	return (count > 0) ? ss_Mean : 0.0;
}

// if the count is 2 or more, return a variance, otherwise zero
double StreamingStats::Variance()
{
	return ((count > 1) ? ss_Sum / (count - 1) : 0.0);
}

// calc the StdDev based using sqrt of the variance (standard method)
double StreamingStats::StandardDeviation()
{
	return sqrt(Variance());
}
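As a usage illustration (not part of the original sample), here is a minimal, self-contained driver that feeds synthetic noisy measurements into the class and prints the running statistics once the window is full. In a real application the values would come from the camera’s box measurements instead of the simulated ones below.

#include <cstdio>
#include <cstdlib>
#include "StreamingStats.h"

int main()
{
	StreamingStats stats(30);                 // rolling window of 30 samples

	for (int i = 0; i < 300; ++i) {
		// Simulated measurement: a constant value plus a little noise.
		double x = 250.0 + (std::rand() % 100 - 50) / 100.0;
		stats.NewData(x);                     // add the sample to the window

		if (stats.DataCount() >= 30) {        // the window is full
			printf("mean = %.3f  std dev = %.3f\n",
				stats.Mean(), stats.StandardDeviation());
		}
	}
	return 0;
}

Build this together with StreamingStats.cpp; the window size passed to the constructor controls how quickly the statistics react to changes in the stream.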

About the Author

Dale Taylor has worked for Intel since 1999 as a software engineer. He is currently in Arizona and focused on Atom enabling, helping our software partners use Intel’s latest chips and hardware. He has been a programmer for 25 years. Dale has a BS in Computer Science and a Business Management minor, and has always been a gadget freak. Dale is a pilot and spends time in the summer flying gliders over the Rocky Mountains. When not soaring, he enjoys hiking, cycling, boating, and photography.

Notices

Intel, the Intel logo, and Intel RealSense are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.


This sample source code is released under the Intel Sample Source Code License Agreement.

An API Journey for Velocity and Lower Cost


Download Document

Introduction

With digital business and market competition intensifying, many companies are shifting to APIs to expose business processes and data for easier and better integration, enabling business expansion and driving lower cost and faster time to market (TTM) through reusable services and APIs. APIs are being deployed on an unprecedented scale in broad ecosystems for social, mobile, cloud, and analytical applications as well as for the Internet of Things (IoT). This article shares what we have learned in adopting service-oriented architecture (SOA) and APIs.

Our Approach to APIs

APIs are now used by most global businesses to interconnect and integrate with each other’s businesses, forming a trend called the API Economy. Since 2013, we have been promoting the adoption of an API-based application development practice along with our SOA effort. We standardized APIs, design patterns, and code libraries. We developed an API-based application framework to help us deliver solutions and integrations faster and more flexibly.

Before we started our API journey, most of our applications were built as monolithic solutions with tightly integrated architectures. A typical application was built as a closed system, with some modules compiled and linked together to run as a single process. With little ability for reuse, every new capability or integration needed new design and development effort. As we started our API journey, we created our API strategy and developed standards, technical guidance, training, and an application framework based on the API and service concept, as illustrated in Figure 1.

We used the standards, guidance, and the API-based application framework to guide our application architecture design and development. The framework helped teams shift their application development approach from building tightly coupled applications to building API- and SOA-service-based applications.


Figure 1. API-based application framework.

We also learned that we needed to define the API taxonomy and metadata for APIs up front (see Figure 2). Having a clearly defined taxonomy and metadata allowed us to communicate among business stakeholders and development teams, and helped us analyze the business processes and data and properly model the services and APIs. With a defined taxonomy and related metadata, we were able to break large business processes and data into smaller functional pieces and model them as APIs. We then implemented them as services and exposed them as APIs.


Figure 2. The API taxonomy.

Next, we knew it was important to model core APIs that could be used by many enterprise applications. To do this, we developed three categories of core APIs: Master Data APIs, Security APIs, and Utility APIs. These core APIs provided immediate value through broad reuse and shortened application development time. We have built more than 80 core APIs since developing the categories. As a result of building applications on these reusable APIs, TTM has been greatly reduced. Figure 3 shows some samples of our core API usage.


Figure 3. Sample Core API usage.

Finally, we realized the importance of having an API management capability for registering, finding, and managing APIs. With the number of APIs increasing, we needed a better ability to control API calls, throttle the API call volume, track the call traffic, and measure and report the API calls. Of course, we also needed to properly secure the APIs. A good API management product would provide all these key API management capabilities. Figure 4 illustrates a conceptual diagram of API management and usage.

We developed the Enterprise API Registration Application (EARA) and the IT Developer Zone (ITDZ) Service Portal to provide some of the API management capabilities. API owners use the EARA to register their APIs. Application developers use the ITDZ Service Portal to search for, and request to use the APIs. As we continue on our API journey, we are deploying a vendor API Management tool to meet our growing needs.


Figure 4. API management and usage concept.

Lessons Learned

To embark on the API Economy journey, and start API based application development, we had to change our mindset from the traditional way of building applications to the API approach. Instead of thinking of an application as one big thing, we had to think of it as an assembly of smaller APIs and services. From there, we designed the capabilities as reusable APIs integrated together using a loosely coupled framework.

Once we built the component APIs, we designed applications as containers, and orchestrated the actions and responses of APIs to deliver the desired results for the business and customers. We also exposed selected APIs to external partners for business expansion and integration when necessary.

Changing the mindset or our development culture was much more difficult than simply planning the development process. We had chartered a dedicated program to drive our API adoption. We not only drove the architecture change, API management capability, and technology toolsets, but more importantly we also developed standards, guidance, and training courses, providing guidance and training to our development community to help anyone be successful on their API journey.

Summary

The API-based application development and integration approach is useful for any company. APIs enable easier integration with customers’ and partners’ business applications and better support for mobile applications. The API approach helps reduce the cost of application development and integration through reuse and gains the TTM advantage, which enables a business to get to market quicker. The approach is especially important for companies that want to expand business onto the Internet and participate in the growing API Economy. We are actively using APIs in our business application integrations with internal and external partners. As our journey continues, we are beginning work on our Micro-service and Container strategy. Our API architecture framework, experience, and learnings can help other companies start their own API Economy journey.

References

Welcome to The API Economy:
http://www.forbes.com/sites/ciocentral/2012/08/29/welcome-to-the-api-economy/

API management definition:
http://searchcloudapplications.techtarget.com/definition/API-management

Top API management vendors:
http://www.informationweek.com/cloud/platform-as-a-service/forrester-names-top-api-management-vendors/d/d-id/1316520

About the Authors

Jian Wu has over 30 years of industry experience. His career spans factory automation, process control, supply chain, product engineering, and IT enterprise application architecture design, development, and integration. Currently, Jian is responsible for setting and governing Intel IT's enterprise application integration capability strategy, standards, and technical guidance.

Mark Boucher is a 30-year veteran in large-scale enterprise software. Currently, Mark is a principal engineer setting the software architecture direction for sales and marketing organizations.

Don Meyers has over 30 years of industry experience leading the architecture, design, and deployment of enterprise applications, IT infrastructure, and collaboration systems. He has patent filings for various computer systems, including: Network, Data management, Collaboration, eMail, Search security, and Perceptual computing.

MAGIX takes Video Editing to a New Level by Providing HEVC to Broad Users


MAGIX's Video Pro X delivers Intel HEVC encoding through Intel® Media Server Studio

 

While elite video pros have access to high-powered video production applications with bells and whistles traditionally available only to enterprise, MAGIX has taken a broader approach, unveiling its latest version of Video Pro X (Figure 1), a video editing application that sets new standards for semi-professional video production for a broader set of users. Optimized with Intel® Media Server Studio, MAGIX Video Pro X delivers Intel HEVC encoding to prosumers and semi-pros to help alleviate a bandwidth-constrained internet where millions of videos are shared and distributed.

Figure 1: MAGIX Video Pro X for semi-professional video production

 

Video takes up a massive―and growing―share of Internet traffic. And meeting consumers’ demands for higher online video quality pushes the envelope even more. 

One solution to make those demands more manageable is the HEVC standard (also known as H.265), which delivers huge gains in compression efficiency over H.264/AVC, currently the most commonly used standard. Depending on the testing scenarios, Intel's GPU-accelerated HEVC encoder can deliver 43% better compression for the same quality as 2-pass x264-medium.1, 2 While video is the largest and fastest growing category of internet traffic, it also consumes much more internet bandwidth than other content formats. With HEVC, these massive gains in compression efficiency benefit not only major online video streaming providers, but also general internet users and the growing audiences that create and share videos online every day.

With the launch of 6th generation Intel® Core™ processors in September last year, Intel's platforms support both hardware-based decoding and encoding of HEVC. Since then, MAGIX has worked closely with Intel to optimize its video production software with the Intel Media Server Studio Professional Edition for access to hardware acceleration and graphics processor capabilities and Intel's HEVC codec, and to use its expert-grade performance and advanced visual quality analyzers.

MAGIX technical experts elaborated that, thanks to the high quality of Intel's media software product and its easy integration, MAGIX was able to incorporate this extremely efficient compression technology in Video Pro X, making it a premier semi-pro video editing application with the competitive advantage of providing hardware-accelerated HEVC encoding. The production software is also a great tool for the import, editing, and export of 4K/UHD videos.

"Through integrating Intel’s HEVC decoder and encoder that is a part of Intel® Media Server Studio Professional Edition, we put the power in our customer's hands to use the benefits of better compression rate of next gen codec that allows to deliver high quality video with less bandwidth," Sven Kardelke, MAGIX Chief Product Officer Video/Photo.

"We’re working with many industry leaders to help bring their solutions to the marketplace, and MAGIX Video Pro X is an innovative example of a new video editing software solution that supports HEVC. Optimized with Media Server Studio, it’s one of the newest, prosumer software products that’s available enabling individuals to create, edit, and share their own broadcast-ready 4K content online in compressed formats. This promises to have a huge effect on improving video viewer experiences via the internetwhere it is so bandwidth constrained today," said Jeff McVeigh, Intel Software and Services Group vice president and Visual Computing Products general manager.

All in all, it’s a great step forward in taking video editing to a new level.  

 


1Intel Media Server Studio HEVC Codec Scores Fast Transcoding Title

Innovative Media Solutions Showcase


New, Inventive Media Products Made Possible with Intel Media Software Tools

With Intel media software tools, media/video solutions providers can create inspiring, innovative new products that capitalize on next gen capabilities like HEVC, high-dynamic range (HDR) content delivery, video security solutions with smart analytics, and more. Check these out. Envision how your company can use Intel's advanced media tools to re-invent new solutions for the media and broadcasting industry.


    Mobile Viewpoint Delivers HEVC HDR Live Broadcasting

    Mobile Viewpoint recently announced a new bonding transmitter that delivers HEVC (H.265) HDR video running on the latest 6th generation Intel® processors, using the Intel® Media Server Studio Professional Edition to optimize HEVC compression and quality. For broadcast-quality video, Intel’s graphics-accelerated codec enabled Mobile Viewpoint to develop a hardware platform that combines low-power, hardware-accelerated encoding and transmission. The new HEVC-enabled software will be used in Mobile Viewpoint's Wireless Multiplex Terminal (WMT) AGILE high-dynamic range (HDR) back-of-the-camera solutions and in its 19-inch FLEX IO encoding and O2 decoding products. The results: fast, high-quality video broadcasting on the go, so the world can stay better informed of fast-changing news and events. Read more.

     


    Sharp's New Innovative Security Camera is built with Intel® Architecture & Media Software

    With digital surveillance and security concerns now an everyday part of life, SHARP unveiled a new omnidirectional, wireless, intelligent digital security surveillance camera to better meet these needs. Built with an Intel® Celeron® processor (N3160) and SHARP 12-megapixel image sensors, and utilizing the Intel® Media SDK for hardware-accelerated encoding, the QG-B20C camera can capture video in 4Kx3K resolution, provide all-around views, and is armed with many intelligent automatic detection functions. Read more.

     

    MAGIX takes Video Editing to a New Level by Providing HEVC to Broad Users

    While elite video pros have access to high-powered video production applications with bells and whistles available traditionally only to enterprise, MAGIX has taken a broader approach unveiling its latest version of Video Pro X, a video editing software that sets new standards for semi-professional video production to widespread users. Optimized with Intel Media Server Studio, MAGIX Video Pro X provides Intel HEVC encoding to prosumers and semi-pros to help alleviate a bandwidth-constrained internet where millions of videos are shared and distributed. Read more.

     


    New JPEG2000 Codec Now Native for Intel Media Server Studio

    Comprimato recently worked with Intel to provide the best video encoding technology as part of Intel Media Server Studio through a plug-in for the software, which delivers high quality, low latency JPEG2000 encoding. The result is a powerful encoding option available to Media Server Studio users so that they can transcode JPEG2000 contained in IMF, AS02, or MXF OP1a files to distribution formats like AVC/H.264 and HEVC/H.265, and enable software-defined processing of IP video streams in broadcast applications. By using Intel Media Server Studio to access hardware acceleration and programmable graphics in Intel GPUs, encoding can run super fast. This is a vital benefit because fast media processing significantly reduces latency in the connection, which is particularly important in live broadcasting. Read more.

     

    SPB TV AG Showcases Innovative Mobile TV/On-demand Transcoder enabled by Intel

    Unveiled at Mobile World Congress (MWC) 2016, SPB TV AG showed its innovative single-platform product line at the event, which included the new SPB TV Astra transcoder powered by Intel. SPB TV Astra is a professional solution for fast, high-quality processing of linear TV broadcast and on-demand video streams from a single head-end to any mobile, desktop, or home device. The transcoder uses Intel® Core™ i7 processors with media accelerators and delivers high-density transcoding via Intel Media Server Studio. “We are delighted that our collaboration with Intel ensures faster and high quality transcoding, making our new product performance remarkable,” said CEO of SPB TV AG Kirill Filippov. Read more.

     

    SURF Communications collaborates with Intel for NFV & WebRTC all-inclusive platforms

    Also at MWC 2016, SURF Communication Solutions announced SURF ORION-HMP* and SURF MOTION-HMP*, the next building blocks of the SURF-HMP™ family. The new SURF-HMP architecture delivers fast, high-quality media acceleration - facilitating up to 4K video resolutions and ultra-high capacity HD voice and video processing - running on Intel® processors with integrated graphics, and optimized by Media Server Studio. SURF-HMP is flexibly architected to meet the requirements of evolving and large-scale deployments, is driven by a powerful processing engine that supports all major video and voice codecs and protocols in use, and delivers a multitude of applications such as transcoding, conferencing/mixing, MRF, playout, recording, messaging, video surveillance, encryption, and more. Read more.

     


    More about Intel Media Software Tools

    Intel Media Server Studio - Provides an Intel® Media SDK, runtimes, graphics drivers, media/audio codecs, and advanced performance and quality analysis tools to help video solution providers deliver fast, high-density media transcoding.

    Intel Media SDK - A cross-platform API for developing client and media applications for Windows*. Achieve fast video playback, encode, processing, media format conversion, and video conferencing. Accelerate RAW video and image processing. Get audio decode/encode support.

    Accelerating Media Processing: Which Media Software Tool do I use? English | Chinese

    Intel® Advisor XE 2016 Update 4 - What’s new



     

    We’re pleased to announce a new version of the Vectorization Assistant tool, Intel® Advisor XE 2016 Update 4.

    Below are highlights of the new functionality in Intel Advisor 2016 Update 4.

    Full support for all analysis types on the second generation Intel® Xeon Phi processor (code named Knights Landing)

    FLOPS and mask utilization

    Tech Preview feature! An accurate, hardware-independent FLOPS measurement tool (AVX512 only) that is mask aware, with a unique capability to correlate FLOPS with performance data.

    Workflow

    Batch mode, which lets you automate collecting multiple analysis types at once. You can collect Survey and Trip Counts in a single run – Advisor will run the application twice, but automatically, without user actions. For Memory Access Patterns (MAP) and Dependencies analyses, there are pre-defined auto-selection criteria, e.g., check Dependencies only for loops with an “Assumed dependencies” issue.

    The improved MPI workflow allows you to create snapshots for MPI results, so you can collect data with the CLI and transfer a self-contained packed result to a workstation with the GUI for analysis. We also fixed some GUI and CLI interoperability issues.

     

    Memory Access Patterns

    MAP analysis now detects Gather instruction usage, unveiling more complex access patterns. A SIMD loop with Gather instructions will work faster than a scalar one, but slower than a SIMD loop without Gather operations. If a loop has the “Gather stride” category, check the new “Details” tab in the Refinement report for information about strides and the mask shape for the gather operation. One possible solution is to inform the compiler about your data access patterns via OpenMP 4.x options, for cases when gather instructions are not actually necessary.

    For AVX512, MAP analysis also detects Gather/Scatter instruction usage; these instructions allow more code to vectorize, but you can obtain greater performance by avoiding them.

    The MAP report is enriched with a Memory Footprint metric – the distance between address ranges touched by a given instruction. The value represents the maximal footprint across all loop instances.

     

    Variable names are now reported for memory accesses, in addition to source lines and assembly instructions. Therefore, you have more accuracy in determining the data structure of interest. Advisor can detect global, static, stack, and heap-allocated variables.

    We added a new recommendation to use SDLT for loops with an “Ineffective memory access” issue.

     

    Survey and Loop Analytics

    The Loop Analytics tab now includes trip counts and an extended instruction mix, so you can see the compute vs. memory instruction distribution, scalar vs. vector instructions, ISA details, and more.

    We have improved the usability of the non-executed code paths analysis, so you can see the ISA and traits in the virtual loops and sort and find AVX512 code paths more easily.

     

    Loops with vector intrinsics are now shown as vectorized in the Survey grid.

     

    Get Intel Advisor and more information

    Visit the product site, where you can find videos and tutorials.

    Remote Power Management of Intel® AMT Devices with InstantGo


    Download Document

    Introduction

    InstantGo, also known as Connected Standby, creates a low OS power state that must be handled differently from how remote power management was handled in the past. This article provides information on how to support the InstantGo feature.

    How to support Remote Power Management of Intel® Active Management Technology (Intel® AMT) enabled devices with InstantGo

    InstantGo, formerly known as Windows* Connected Standby, is a Microsoft power-connectivity standard for Windows* 8 and 10. This hardware and software specification defines low power levels while maintaining network connectivity and allows for a quick startup (500 milliseconds). InstantGo replaces the s3 power state.

    To verify whether a system supports InstantGo, type “powercfg /a” at a command prompt. If the system supports InstantGo, you’ll see Standby (Connected) listed as an option.

    Intel AMT and InstantGo

    Intel AMT added support for InstantGo in version 10.0, but the manufacturer must enable the feature.

    How are Intel AMT and “InstantGo” related? Intel AMT has to properly handle the various power states by communicating with the firmware; however, in this case the OS, not the hardware, controls the low power state.

    Intel AMT and InstantGo prerequisites

    The only platforms fully compatible with InstantGo run Windows* 8.1 or newer with Intel AMT 10.0 or later. To remotely determine if a device OS is in a low power state, use the OSPowerSavingState method.

    One way of determining the Intel AMT version is to inspect the CIM_SoftwareIdentity.VersionString property as shown in the Get Core Version Use Case.

    Remote Verification of Device Power State

    In the past, to verify the power state we looked at the hardware power state using the CIM_AssociatedPowerManagementService.PowerState method. Now, when a system is in the InstantGo low OS power state, the hardware power state returned will be s0 (OS powered on). You now need to make an additional query for the OSPowerSavingState in order to determine whether the OS is in full or low power mode.

    The Power Management Work Flow in Intel AMT

    Previous work flow for Power-On operations

    1. Query for Intel AMT Power State 
    2. If system is in s0 (power on), do nothing
    3. If system is in s3, s4 or s5, then issue a power on command using the Change Power State API

    Current recommendation to properly handle Intel AMT Devices with InstantGo

    1. Query for Intel AMT Power State 
    2. If system is in s3, s4 or s5 then issue a power on command using the Change Power State API
    3. If a system is in s0 (power on) then:
      • If Intel AMT version is 9.0 and below, do nothing
      • If Intel AMT version is 10.0 and above, query the OSPowerSavingState method
        1. If OSPowerSavingState is full power, do nothing
        2. If OSPowerSavingState is in a low power state, wake up the system to full power using RequestOSPowerSavingState method.

    There is also a sample PowerShell Script demonstrating this available for download. The script has 4 basic sections:

    1. Establishes the connection and identifies the Intel AMT Version
    2. Queries the Intel AMT device’s current power state (hardware) – Note: the script assumes Intel AMT 10 and that the device is in InstantGo low power mode
    3. Queries for the OS Power State
    4. Wakes up the Device

    For information on running PowerShell scripts with the Intel® vPro™ module please refer to the Intel AMT SDK and related Intel AMT Implementation and Reference Guide.
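    To make this flow concrete, below is a minimal C++-style sketch of the decision logic. The helper functions are hypothetical placeholders for the corresponding Intel AMT SDK / WS-Management operations named in the steps above (they are not actual SDK calls), so treat this only as an outline of the control flow.

    // Hypothetical helpers: in a real tool these would wrap the
    // CIM_AssociatedPowerManagementService.PowerState query, the
    // CIM_SoftwareIdentity version check, and the OSPowerSavingState /
    // RequestOSPowerSavingState methods described in this article.
    int  QueryHardwarePowerState();   // returns 0 for s0, or 3/4/5 for s3/s4/s5
    int  QueryAmtMajorVersion();      // for example, 9, 10, 11
    bool IsOSInLowPowerState();       // wraps the OSPowerSavingState query
    void ChangePowerStateToOn();      // wraps the Change Power State API
    void WakeOSToFullPower();         // wraps RequestOSPowerSavingState

    void EnsureDeviceIsFullyOn()
    {
        int hwState = QueryHardwarePowerState();

        if (hwState != 0) {
            // s3, s4, or s5: power the platform on through Intel AMT.
            ChangePowerStateToOn();
        } else if (QueryAmtMajorVersion() >= 10) {
            // s0 on Intel AMT 10 or later: the OS may still be in InstantGo low power.
            if (IsOSInLowPowerState()) {
                WakeOSToFullPower();
            }
        }
        // s0 on Intel AMT 9.x or below: do nothing.
    }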

    Additional Resources

    Summary

    As more devices support InstantGo, integration of this technology with remote power management methodology will become critical. You want to avoid cases where a device is detected in the powered-on (s0) state when the system is actually running in a lower power state. Fortunately, supporting InstantGo technology isn’t a difficult task; it only takes a few additional steps to determine the actual power state.

    About the Author

    Joe Oster has been working with Intel® vPro™ technology and Intel AMT technology since 2006. He is passionate about technology and is an MSP/SMB technology advocate. When not working, he is a Dad and spends time working on his family farm or flying drones and RC aircraft.


    Mapping an Intel® RealSense™ SDK Face Scan to a 3D Head Model


    Download Code Samples

    Download PDF [1.08 MB]

    This code sample allows the user to scan their face using a front-facing Intel® RealSense™ camera, project it onto a customizable head mesh, and apply post-processing effects to it. It’s an extension of a previous code sample titled Applying Intel® RealSense™ SDK Face Scans to a 3D Mesh. This sample adds features and techniques to improve the quality of the final head result, as well as providing a series of post-mapping effects.

    The following features have been added in this code sample:

    • Parameterized head. The user can shape the head using a series of sliders. For example, the user can change the width of the head, ears, jaw, and so on.
    • Standardized mesh topology. The mapping algorithm can be applied to a head mesh with a standardized topology and can retain the context of each vertex after the mapping is complete. This paves the way for animating the head, improved blending, and post-mapping effects.
    • Color and shape post processing. Morph targets and additional color blending can be applied after the mapping stages are complete to customize the final result.
    • Hair. There are new hair models created to fit the base head model. A custom algorithm is used to adjust the hair geometry to conform to the user’s chosen head shape.

    The final results of this sample could be compelling to various applications. Developers could use these tools in their applications to allow their users to scan their face and then customize their character. It could also be used to generate character content for games. Additional morph targets could be added to increase the range of characters that can be created.


    Figure 1: From left to right: (a) the face mesh returned from the Intel® RealSense™ SDK’s scan module, (b) the scanned face mapped onto a head model, (c) the head-model geometry morphed with an ogre morph target, and (d) the morphed head colorized.


    Figure 2: Examples of effects created using the sample’s morphing and blending techniques.

    Using the Sample

    The sample includes two executables: ReleaseDX.exe, which supports face scanning, and ReleaseDX_NORS.exe, which only supports mapping of a previously scanned face. Both executables require the 64-bit Visual Studio* 2013 Runtime which can be downloaded here. ReleaseDX.exe requires the installation of Intel® RealSense™ SDK Runtime 2016 R1 (8.0.24.6528), which can be downloaded here or here.

    Running ReleaseDX.exe will begin the face-scanning process. Once the head-positioning hints are satisfied, press the Begin Scan button, turn your head slowly from side to side, and then click End Scan. For best results, remove hats or glasses, pull back your hair, and scan in a well-lit area.

    Once the scan is complete, or an existing scan has been loaded, the following categories in the UI can be used to customize the result:

    • Face scan. Adjust the yaw, pitch, roll, and z displacement to adjust the face scan’s orientation and position.
    • Head shaping. Use the provided sliders to change the shape of the head model. The idea is to build a head that matches the head of the person scanned. Adjustments to this shape can be made in the post-processing stage.
    • Blending. Use the color controls to choose two different colors that best match your skin color. The first color is the base color and the second color is for tone.
    • Post head shaping. Make any shape adjustments to the head that you want performed after the mapping process. In this stage you can do things like change your body mass index, turn yourself into an ogre, make your ears big, and so on.
    • Post blending. Select any color effects to apply on the entire head after the mapping is complete. These color effects won’t affect the lips or eyes. These effects will let you adjust or colorize the hue/saturation/luminance of the head.

    The debug category contains many options for visualizing different parts of the face-mapping pipeline.

    The sample allows exporting the resulting head and hair to an .OBJ file so that it can be loaded into other applications.


    Figure 3: A screenshot from the code sample showing a small subset of the options available for customizing the head.

    Art Assets

    The art assets used in this code sample are briefly described in this section and will be referenced throughout the article. All texture assets are authored to the UV coordinates of the base head model. All texture assets are used during the blending stage of the pipeline with the exception of the displacement control map, which is used in the geometry stage.

    • Base head mesh. Base head mesh on which the scanned face mesh will be applied.
    • Head landmark data. Landmarks on the base head mesh that coincide with the landmarks the Intel RealSense SDK provides with each face scan.
    • Displacement control map. Controls which vertices of the base head mesh are displaced by the generated face displacement map.
    • Color control map. Controls blending between the face color and the head color.
    • Feature map. Grayscale map that gives the head texture for the final generated diffuse map.
    • Skin map. Used in the post blending stage to prevent color effects from affecting the eyes and lips.
    • Color transfer map. Controls the blending between the two user-selected colors.
    • Landmark mesh. Used to shift head vertices to their corresponding locations on the face map projection.
    • Head morph targets. Collection of morph targets that can be applied to the base head shape both before and after the face is projected onto the head.
    • Hair models. Collection of hair models that the user can select between.

    Face-Mapping Pipeline

    The face-mapping pipeline can be separated into four stages.

    1. Displacement and color map. We render the face scan geometry to create a displacement and color map that we project onto the head in a later stage.
    2. Geometry. This stage morphs the positions and normals of the head mesh. The values from the displacement map are used to protrude the face scan’s shape from the head model.
    3. Blending. We blend the head and face color channels together and output a single color map that maps to the UVs of the head model.
    4. Hair geometry. We remap the hair vertex positions to account for changes in the head shape made during the geometry stage.

    Face Displacement and Color Map Stage

    During this stage, the scanned face mesh is rendered using an orthographic projection matrix, generating a depth (displacement) map and a color map. The face scan landmark positions are projected onto these maps and will be used in the geometry stage to project the face onto the head mesh.


    Figure 4: Color and displacement maps created in the displacement map stage. The yellow dots are the landmarks provided by the Intel® RealSense™ SDK projected into the 2D map space.

    Geometry Stage

    This stage morphs the base head mesh shape and imprints the displacement map onto the head. This must be done while maintaining the context of each face vertex; a vertex on the tip of the nose of the head mesh will be moved so that when the face displaces the vertices, that vertex is still on the tip of the nose. Maintaining this context allows for rigging information on the base head mesh to persist on the resulting head mesh.

    Details on this process are outlined in the sections below. The high-level steps include the following:

    1. Project the base head mesh vertices onto the landmark mesh. This will associate each head vertex with a single triangle on the landmark mesh, barycentric coordinates, and an offset along that triangle’s normal.
    2. Morph the head vertices using morph targets.
    3. Compute a projection matrix that projects the 2D displacement/color maps onto the head mesh and use it to calculate the texture coordinates of each vertex.
    4. Morph the landmark mesh using the face landmark data.
    5. Use the projection data from step one to shift the head vertices based on the morphed landmark mesh.
    6. Displace the head vertices along the z-axis using the displacement map value.
    7. Apply post-processing morph targets.

    Building a Parametric Head

    A wide range of head shapes is available to the morphing system. Each target shape sculpts a subset of the head (for example, one of the targets simply controls the width of the chin, while another, body mass index (BMI), changes almost the entire head shape). Each artist-authored head shape contains the same number of vertices as the base head shape, and its vertices must correspond one-to-one with the base head’s vertices.


    Figure 5: Parametric head implemented with morph targets.

    The artist-authored head shapes are turned into morph targets by compiling a list of delta positions for each vertex. The delta position is the difference, or change, between a vertex of the base head mesh and its associated target shape vertex. Applying a morph target is done by adding the delta position for each vertex, multiplied by some scalar value. A scalar of zero has no effect, while a scalar of one applies the exact target shape. Scalars above one can exaggerate the target shape, and negative scalars can invert it, producing interesting effects.
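    The following minimal C++ sketch illustrates the delta-and-scalar idea described above. It is not code from the sample; the types and names are illustrative only.

    #include <vector>

    struct Vec3 { float x, y, z; };

    // positions: starts as a copy of the base head vertex positions.
    // deltas[t][i]: precomputed (target vertex - base vertex) for morph target t, vertex i.
    // weights[t]: slider value chosen by the user for morph target t.
    void ApplyMorphTargets(std::vector<Vec3> &positions,
                           const std::vector<std::vector<Vec3>> &deltas,
                           const std::vector<float> &weights)
    {
        for (size_t t = 0; t < deltas.size(); ++t)
        {
            const float w = weights[t];
            if (w == 0.0f) continue;               // a weight of zero leaves the shape untouched
            for (size_t i = 0; i < positions.size(); ++i)
            {
                positions[i].x += w * deltas[t][i].x;
                positions[i].y += w * deltas[t][i].y;
                positions[i].z += w * deltas[t][i].z;
            }
        }
    }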

    The sample exposes some compound morph targets that allow a single slider to apply multiple morph targets. Some of the sliders apply a weight of zero to one, while others might allow values outside this range.

    These morphing techniques can be applied both before and after the face is mapped to the head mesh.

    Creating the Displacement and Color Map Projection Matrix

    The displacement/color map projection orthographically projects head model vertex positions into UV coordinates of the Displacement and Color maps that were previously created. For more details on this process see the documentation of the previous sample.
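    As a rough illustration of the idea (not the sample’s actual code), an orthographic projection into a texture reduces to a linear remap of x and y. The extents below are assumed to be the same ones used when the maps were rendered.

    struct Vec2 { float x, y; };
    struct Vec3 { float x, y, z; };

    // Map a head-space position to UV coordinates of the displacement/color maps.
    Vec2 PositionToMapUV(const Vec3 &pos, float left, float right, float bottom, float top)
    {
        Vec2 uv;
        uv.x = (pos.x - left) / (right - left);
        uv.y = (top - pos.y) / (top - bottom);   // flip y: texture v grows downward
        return uv;
    }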

    Fitting Face Geometry

    The previous sample required the base head mesh to have a relatively dense vertex grid for the facial area. It displaced these vertices to match the scanned mesh’s shape. However, it didn’t discern vertices by their function (for example, vertex in the corner of the mouth). In this version of the sample the base head mesh is less dense and the vertices are fitted to the face scan, preserving vertex functionality. For example, the vertices around the eyes move to where the scan’s eyes are.

    The Intel RealSense SDK provides a known set of face landmarks with the face scan. The authored base head mesh supplies matching, authored landmarks. A landmark mesh is used to map between faces. It has one vertex for each important landmark, forming relatively large triangles that subdivide the face. We identify where each base head mesh vertex projects onto the landmark mesh to compute its corresponding position on the scanned face.

    During this process, the head vertices are projected onto the landmark mesh, the landmark mesh is morphed based on the base head mesh and face landmark data, and the vertex positions are reprojected onto the head. Finally, the displacement map is applied to the z component of each facial area vertex to extrude the scanned face shape. The displacement control map ensures only the face vertices are shifted and that there is a smooth gradient between vertices that are and are not affected.

    The projection of vertices onto the landmark mesh is similar to the projection done in the hair-fitting stage.


    Figure 6: Face color map with landmarks visible (left) and authored base head mesh with head landmarks visible (right).


    Figure 7: The landmark mesh overlayed on top of the head mesh (left). Note that the inner vertices all line up over head landmarks. The landmark mesh morphed based on the face landmark information (right).


    Figure 8: The head with landmark mesh overlay after reprojecting vertex positions and displacing them (left). Notice how the lips have been shifted upward. The head after reprojecting, displacing, and applying the face color map (right).

    Blending Stage

    The blending stage blends together the face scan color data and the artist-authored head textures producing a final diffuse texture built for the head UV coordinates.

    The color transfer map is used to lerp between the two user-selected head colors. The resulting color is then multiplied by the feature (detail) map to create a final color for the head. The color control map is then used to blend between that head color and the face color, creating a smooth transition between the two.

    After the color is determined, we can optionally apply some post-processing blending effects. The sample supports a colorize and a color adjust effect. The colorize effect extracts the luminosity of the final blended color and then applies a user-specified hue, saturation, and additional luminance to it. The color adjust effect is similar, except that it adjusts the existing hue, saturation, and luminance instead of overriding them. Both of these effects support two colors/adjustments that are controlled by the color control map. These effects use the skin map as a mask, allowing the color of the eyes and lips to remain unchanged.

    All of this blending is done on the GPU. The shader can be found in Media/MyAssets/Shader/SculptFace.fx.
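    A minimal CPU-side sketch of the blend described above is shown below for clarity; the sample actually performs it in the HLSL shader mentioned above, and the parameter names here are illustrative only.

    struct Color { float r, g, b; };

    static Color Lerp(const Color &a, const Color &b, float t)
    {
        return Color{ a.r + t * (b.r - a.r), a.g + t * (b.g - a.g), a.b + t * (b.b - a.b) };
    }

    // baseColor/toneColor: the two user-selected colors.
    // colorTransfer: sample from the color transfer map.
    // feature: grayscale sample from the feature map.
    // faceColor: sample from the color map generated from the scan.
    // colorControl: 0 = use the head color, 1 = use the face color.
    Color BlendHeadTexel(Color baseColor, Color toneColor, float colorTransfer,
                         float feature, Color faceColor, float colorControl)
    {
        Color head = Lerp(baseColor, toneColor, colorTransfer);
        head = Color{ head.r * feature, head.g * feature, head.b * feature };
        return Lerp(head, faceColor, colorControl);
    }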


    Figure 9: The Color Transfer Map, which is used to blend between the two user selected colors (left), and the Feature Map, which adds texture to the head (right).


    Figure 10: The Color Control Map (left) controls blending between the head color and the face color. The blending process creates a color texture that maps to the head mesh’s UV coordinates (right).

    Hair Geometry Stage

    Hair information isn’t available from a face-only scan. Producing a complete head requires the application to provide hair. This sample includes only a few choices, with the intent being to demonstrate the technical capability and not to provide a complete solution. A full-featured application could include many choices and variations.

    The sample supports changing the shape of the head, so it also supports changing the shape of the hair to match. One possibility might be to create hair morph targets to match each head morph target. However, this would create an unreasonable number of required art assets, so the sample instead programmatically adjusts the hair shape as the head shape changes.


    Figure 11: The base hair rendered on the base head mesh (left). Fitted morphed hair rendered on morphed head shapes (center and right).

    The hair fitting is accomplished by locating each of the hair vertices relative to the base head model, then moving the vertices to the same head relative positions on the final head shape. Specifically, each hair vertex is associated with a triangle on the base head mesh, barycentric coordinates, and an offset along the normal.

    An initial approach to mapping a hair vertex to a triangle on the base head mesh would be to iterate over each head mesh triangle, project the vertex onto the triangle’s plane using the triangle’s normal, and check whether the vertex lies within the triangle. This approach yields situations where a hair vertex might not map to any of the triangles. A better approach is to instead project each base head mesh triangle forward along its vertices’ normals until it intersects the hair vertex. In the event that a hair vertex can map to multiple head mesh triangles (since the head mesh is non-convex), the triangle closest to the hair vertex along that triangle’s normal is chosen.


    Figure 12: A simplified head mesh and hair vertex.


    Figure 13: A single triangle is extruded along the vertex normals until it contains the hair vertex.


    Figure 14: The new hair vertex position is calculated for the new head mesh shape. It’s located by the same barycentric coordinates, but relative to the triangle’s new vertex positions and normals.


    Figure 15: Projecting the triangle onto the vertex - the math.

    Figure 15 shows the vectors used to project the triangle onto the vertex, as seen from four different views.

    • Yellow triangle is the original head triangle.
    • Gray triangle is the yellow triangle projected onto the hair vertex.
    • Blue lines represent the vertex normals (not normalized).
    • Pink line is from the hair vertex to a point on the yellow triangle (note that one of the vertices serves as a point on the plane).
    • Green line shows the shortest distance from the vertex to the plane containing the triangle, along the triangle’s normal.

    The projected triangle’s vertex positions are computed by first computing the closest distance, d, from the hair vertex to the plane containing the triangle.

    Nt = Triangle normal
    Ns = Surface normal (that is, interpolated from vertex normals)
    Vh = Hair vertex
    P = Position on the plane (that is, one of the yellow triangle’s vertices)
    Px = Intersection position: the position on the triangle intersected by the line from the hair vertex along the surface normal
    at = 2x the projected triangle area
    d = Closest distance from the hair vertex to the plane containing the head triangle
    l = Distance from the hair vertex to the intersection point

    The closest distance from the hair vertex to the plane containing the triangle gives the distance to project the triangle.

    d = Nt · (Vh - P)

    The position of each projected vertex V’i, where Ni is the normal for Vi, is

    V’i = Vi + d·Ni / (Nt · Ni)

    The barycentric coordinates of the hair vertex, relative to the projected triangle, are a function of the total triangle area. Twice the projected triangle’s area is computed with a cross product.

    at = |(V’1 - V’0) x (V’2 - V’0)|

    The three barycentric coordinates a, b, and c are then

    a = |(V’2 - V’1) x (Vh - V’1)| / at

    b = |(V’0 - V’2) x (Vh - V’2)| / at

    c = |(V’1 - V’0) x (Vh - V’0)| / at

    The vertex lies inside the triangle if a, b, and c are all non-negative and their sum is no greater than one. Note that this vertex lies on the surface-normal line. The barycentric coordinates give the surface normal, interpolated from the three vertex normals. They similarly give the point on the original triangle that also lies along that line (that is, the intersection point).

    Ns = aN0 + bN1 + cN2

    The intersection point is

    Px = aV0 + bV1 + cV2

    The distance l from the hair vertex to Px is stored. After the head is deformed, the hair vertex is moved this distance away from the triangle on the new head shape.

    l = |(Px - Vh)|

    The process is reversed to determine the hair vertex’s position relative to the head’s new shape. The intersection’s barycentric coordinates (previously computed relative to the base head mesh) are used to compute the hair vertex’s position and normal on the new head.

    N’s = aN0 + bN1 + cN2

    P’x = aV0 + bV1 + cV2

    where V0, V1, V2 and N0, N1, N2 are now the vertex positions and normals of the corresponding triangle on the deformed head. The new hair vertex position is then

    V’h = P’x + l·N’s
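    The following self-contained C++ sketch pulls the two halves of this computation together. It is an illustration of the math above rather than the sample’s actual code, and all names are illustrative; it normalizes the interpolated normal and keeps the offset signed so hair may sit on either side of the surface, as noted below.

    #include <cmath>

    struct Vec3 { float x, y, z; };
    static Vec3  operator+(Vec3 a, Vec3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
    static Vec3  operator-(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
    static Vec3  operator*(Vec3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }
    static float Dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
    static Vec3  Cross(Vec3 a, Vec3 b) { return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x }; }
    static float Length(Vec3 a)        { return std::sqrt(Dot(a, a)); }
    static Vec3  Normalize(Vec3 a)     { return a * (1.0f / Length(a)); }

    struct HairBinding { float a, b, c, l; };      // barycentric weights + offset along the surface normal

    // Record where a hair vertex Vh sits relative to a base head triangle (V[3], N[3]).
    HairBinding BindHairVertex(const Vec3 V[3], const Vec3 N[3], Vec3 Vh)
    {
        Vec3  Nt = Normalize(Cross(V[1] - V[0], V[2] - V[0]));     // triangle normal
        float d  = Dot(Nt, Vh - V[0]);                             // closest distance to the triangle's plane

        Vec3 P[3];                                                 // triangle projected onto the hair vertex
        for (int i = 0; i < 3; ++i)
            P[i] = V[i] + N[i] * (d / Dot(Nt, N[i]));

        float at = Length(Cross(P[1] - P[0], P[2] - P[0]));        // 2x projected-triangle area
        HairBinding bind;
        bind.a = Length(Cross(P[2] - P[1], Vh - P[1])) / at;
        bind.b = Length(Cross(P[0] - P[2], Vh - P[2])) / at;
        bind.c = Length(Cross(P[1] - P[0], Vh - P[0])) / at;

        Vec3 Px = V[0] * bind.a + V[1] * bind.b + V[2] * bind.c;   // intersection point on the base triangle
        Vec3 Ns = Normalize(N[0] * bind.a + N[1] * bind.b + N[2] * bind.c);
        bind.l  = Dot(Ns, Vh - Px);                                // signed, so hair may sit on either side
        return bind;
    }

    // After the head is deformed, re-create the hair vertex on the corresponding new triangle.
    Vec3 RelocateHairVertex(const HairBinding &bind, const Vec3 V[3], const Vec3 N[3])
    {
        Vec3 Px = V[0] * bind.a + V[1] * bind.b + V[2] * bind.c;
        Vec3 Ns = Normalize(N[0] * bind.a + N[1] * bind.b + N[2] * bind.c);
        return Px + Ns * bind.l;
    }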

    Note that this approach just moves the vertices. It doesn’t check for other intersections, and so on. It produces excellent results in practice, but it does have limits. For example, an extreme head shape can poke a head vertex/triangle(s) through the hair.

    Also note that the hair vertices are allowed to be on either side of the head triangle. This supports the case where an artist pushes some hair vertices inside the head. This distance is clamped to minimize the chances of associating a vertex with a triangle on the other side of the head.

    Acknowledgements

    Assets for the base head mesh, morph targets, textures, and some of the hair models were created by 3DModelForge (http://3dmodelforge.com).

    Additional hair was created by Liquid Development (http://www.liquiddevelopment.com/).

    Doug McNabb, Jeff Williams, Dave Bookout, and Chris Kirkpatrick provided additional help for the sample.

    How to Analyze Intel® Media SDK-optimized applications with Intel® Graphics Performance Analyzers


    When developing a media application, you often wonder, “Am I getting the performance I should be? Am I using fixed function logic or my EU array?” This article will show how to set up Intel® Graphics Performance Analyzers (Intel® GPA) to analyze the real time performance of your Intel® Media SDK-optimized application. 

    First, let’s start with Intel® GPA. Intel GPA is a very useful tool for identifying media pipeline inefficiencies and targeting application optimization opportunities. Second, the Intel® Media SDK is a software development library that exposes the hardware media acceleration capabilities of Intel® platforms for decoding, encoding, and video processing (see the hardware requirements for applicable processors). To get started, we’ll use Intel GPA to analyze some of the Intel® Media SDK sample use cases as examples. Please refer to each sample description for details.

    For this article, you will need both Intel GPA and the Intel Media SDK. You can get the free downloads of Intel® GPA and either Intel Media SDK (for clients) or Intel® Media Server Studio Community Edition (where Intel Media SDK is a component).

    Throughout this article, we use Intel® GPA 2016 R1 with the latest Intel Media SDK 2016. Note that future revisions for Intel GPA and Intel Media SDK will introduce new features that may deviate from some parts of this article.

    Setting up Intel GPA

    Run the Intel GPA installer to install tools. With Intel GPA you can get in-depth traces of media workloads, capturing for instance operations such as fixed function hardware processing (denoted as MFX from now on) or execution unit (EU) processing. Intel GPA also has the capability to display real-time Intel GPU load via the Media Performance dialog.

    The Intel GPA tool is started by launching the “Intel® Graphics Monitor” application from the Start window or from the task bar. Right-clicking the task bar icon brings up a menu with options.

    For the purpose of media workload analysis, you will create a media analysis profile in the Intel GPA profiles menu:

    1. Select the “Profiles” menu item and click the “HUD Metrics” tab. Clear the existing metrics in “Metrics to display in the HUD” by selecting them all and clicking “Remove”. Then select the following Media metrics from “Available metrics” and click “Add” to add them to “Metrics to display in the HUD”:

      • MFX Decode Usage
      • GPU Overall Usage
      • MFX Engine Usage
      • MFX Encode Usage

      Choose a group name and click “Add Group” to save your settings. The profile should look just like the one below. Click Apply to save the changes.

    2. Select the “Preferences” menu item. Make sure the following check boxes are de-selected:

      • Auto-detect launched applications
      • Disable Tracing

    3. Press “OK” to exit the “Preferences” dialog window.

    Analyzing the Application

    Intel GPA uses an injection method to collect metrics during the runtime of the application. Injection takes place at application start time; for this tutorial, the application will be launched from the Analyze Application menu within Intel GPA.

    1. To capture detailed workload metrics, select “Analyze Application” from the Intel GPA task bar menu.
    2. Enter the executable path, name, and arguments in the “Command line” dialog item. Make sure that “Working Folder” is set correctly and that the path contains no spaces.

    3. To start capturing metrics for the specified workload, press the “Run” button.

    Real-time graphs can also be enabled by pressing “Ctrl+F1” during rendering to view real-time metrics. Note: Metrics can be changed during runtime from within the profiles menu described in the setup step.

    Before analyzing any of the other Intel Media SDK samples workloads, let’s examine what information is presented in metrics captured by Intel® GPA, why they are important, and what are possible next steps for improvement. Each metric is displayed in real time as a percentage value over time. The red number is the minimum value, the green number is the maximum value, and the white number is the current value. 

    Below is a table of metric descriptions for each media metric offered within Intel GPA. For a full list of media metrics supported in Intel GPA, please refer to our documentation.

    • MFX Engine Usage: Represents the percentage of time the Multi-Format Codec (MFX) engine is active
    • MFX Decode Usage: Represents the percentage of time that MFX-based decode operations are active
    • GPU Overall Usage: Represents the percentage of time that either the execution units (EUs) or the MFX (media fixed function) hardware is active
    • EU Engine Usage: Represents the percentage of time the Execution Unit (EU) engine is active
    • MFX Encode Usage: Represents the percentage of time that MFX-based encode operations are active

     

    Interpreting the Data

    If you observe that GPU overall usage is high, check the IOPattern in your application; a mismatched IOPattern in a Media SDK implementation consumes large numbers of extra buffers and internal copies, and it is recommended to avoid such scenarios. Refer to the technical articles here, which explain common Media SDK use-case scenario settings and how to further optimize your media pipelines to achieve the best performance.
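    As a hedged illustration (not taken from the SDK samples), the fragment below shows where IOPattern is chosen when initializing a Media SDK decoder; keeping surfaces in video memory end to end avoids the extra copies described above. Error handling is omitted for brevity, and the function name is illustrative.

    #include "mfxvideo.h"

    mfxStatus InitDecoder(mfxSession session, mfxVideoParam &par)
    {
        // Keep decoded surfaces in video memory so downstream VPP/encode stages
        // can consume them directly instead of copying through system memory.
        par.IOPattern = MFX_IOPATTERN_OUT_VIDEO_MEMORY;
        return MFXVideoDECODE_Init(session, &par);
    }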

    Questions, Comments, Feature requests regarding analyzing your media application? Connect with other developers and Intel Technical Experts on the Intel Media SDK forum and Intel GPA forum!

    Signaling the Future with Intel® RealSense™ and GestureWorks Fusion*


    Signs of the Times: Gesture Control Evolves

    The not-so-humble mouse has been around commercially for over 30 years. While it may seem hard to imagine a world without it and the trusty keyboard, our style and models for interacting with computer systems are evolving. We no longer want to be tethered to our devices when collaborating in a shared workspace, for example, or simply when sitting on the couch watching a movie. We want the freedom to control our systems and applications using a more accommodating, intuitive mode of expression. Fortunately, consumer-grade personal computers with the necessary resources and capabilities are now widely available to realize this vision.

    Gesture control has found a natural home in gaming, with Intel® RealSense™ technology at the forefront of these innovations. It was only a matter of time before developers looked for a way to integrate gesture control with a desktop metaphor, complementing the familiar keyboard and mouse with an advanced system of gesture and voice commands. Imagine the possibilities. You could start or stop a movie just by saying so, and pause and rewind with a simple set of gestures. Or you could manipulate a complex 3D computer aided design (CAD) object on a wall-mounted screen directly using your hands, passing the item to a colleague for their input.

    That’s the vision of Ideum, a Corrales, New Mexico-based company that creates state-of-the-art user interaction systems. The company got its start over 15 years ago designing and implementing multi-touch tables, kiosks, and touch wall products. Its installations can be found in leading institutions such as Chicago’s Field Museum of Natural History, the Smithsonian National Museum of the American Indian, and the San Francisco Museum of Modern Art. To develop its latest initiative, GestureWorks Fusion*, Ideum turned to Intel RealSense technology.

    With GestureWorks Fusion, Ideum aims to bring the convenience and simplicity of voice- and gesture-control to a range of desktop applications, beginning with streaming media. The challenges and opportunities Ideum encountered highlight issues that are likely to be common to developers looking to blaze a new trail in Human Computer Interaction (HCI).

    This case study introduces GestureWorks Fusion and describes how the application uses advanced multi-modal input to create a powerful and intuitive system capable of interpreting voice and gesture commands. The study illustrates how the Ideum team used the Intel® RealSense™ SDK and highlights the innovative Cursor Mode capability that allows developers to quickly and easily interact with legacy applications designed for the keyboard and mouse. The article also outlines some of the challenges the designers and developers faced and provides an overview of how Ideum addressed the issues using a combination of Intel- and Ideum-developed technologies.

    Introducing GestureWorks Fusion*

    GestureWorks Fusion is an application that works with an Intel® RealSense™ camera (SR300) to capture multi-modal input, such as gestures and voice controls. The initial version of the software allows users to intuitively and naturally interact with streaming media web sites such as YouTube*. Using familiar graphical user interface (GUI) controls, users can play, pause, rewind, and scrub through media—all without touching a mouse, keyboard, or screen. Direct user feedback makes the system easy to use and understand.

    GestureWorks Fusion* makes it fun and easy to enjoy streaming video web sites, such as YouTube*, using intuitive voice and gesture commands on systems equipped with an Intel® RealSense™ camera (SR300).

    The Intel RealSense camera SR300 follows on from the Intel RealSense camera (F200), which was one of the world’s first and smallest integrated 3D depth and 2D camera modules. Like the Intel RealSense camera (F200), the Intel RealSense camera (SR300) features a 1080p HD camera with enhanced 3D- and 2D-imaging, and improvements in the effective usable range. Combined with a microphone, the camera is ideal for both head- and hand-tracking, as well as for facial recognition. “What’s really compelling is that the Intel RealSense camera (SR300) can do all this simultaneously, very quickly, and extremely reliably,” explained Paul Lacey, chief technical officer at Ideum and director of the team responsible for the development of GestureWorks.

    GestureWorks Fusion builds on the technology and experience of two existing Ideum products: GestureWorks Core and GestureWorks Gameplay 3. GestureWorks Gameplay 3 is a Microsoft Windows* application that provides touch controls for popular PC games. Gamers can create their own touch controls, share them with others, or download controls created by the community.

    GestureWorks Core, meanwhile, is a multi-modal interaction engine that performs full 3D head- and hand-motion gesture analysis, and offers multi-touch and voice interaction. The GestureWorks Core SDK features over 300 prebuilt gestures and supports the most common programming languages, including C++, C#, Java*, and Python*.

    GestureWorks Fusion was initially designed to work with Google Chrome* and Microsoft Internet Explorer* browsers, running on Microsoft Windows 10. However, Ideum envisions GestureWorks Fusion working with any system equipped with an Intel RealSense camera. The company also plans to expand the system to work with a range of additional applications, such as games, productivity tools, and presentation software.

    Facing the Challenges

    Ideum faced a number of challenges in making GestureWorks Fusion intuitive and easy-to-use, especially for new users receiving minimal guidance. Based on its experiences developing multi-touch tables and touch wall systems for public institutions, the company knew that users can become frustrated when things don’t work as expected. This knowledge persuaded the designers to keep the set of possible input gestures as simple as possible, focusing on the most familiar behaviors.

    GestureWorks* Fusion features a simple set of gestures that map directly to the application user interface, offering touchless access to popular existing applications.

    Operating system and browser limitations presented the next set of challenges. Current web browsers, in particular, are not optimized for multi-modal input. This can make it difficult to identify the user’s focus, for instance, which is the location on the screen where the user intends to act. It also disrupts fluidity of movement between different segments of the interface, and even from one web site to another. At the same time, Ideum realized that it couldn’t simply abandon scrolling and clicking, which are deeply ingrained in the desktop metaphor and are at the core of practically all modern applications.

    Further, an intuitive ability to engage and disengage gesture modality is critical for this type of interface. Unlike a person’s deeply-intuitive sense of when a gesture is relevant, an application needs context and guidance. In GestureWorks Fusion, raising a hand into the camera’s view enables the gesture interface. Similarly, dropping a hand from view causes the gesture interface to disappear, much like a mouse hover presents additional information to users.

    The nature of multi-modal input itself presented its own set of programming issues that influenced the way Ideum architected and implemented the software. For example, Ideum offers a voice command for every gesture, which can present potential conflicts. “Multi-modal input has to be carefully crafted to ensure success,” explained Lacey.

    A factor that proved equally important was response time, which needed to be in line with standards already defined for mice and keyboards (otherwise, a huge burden is placed on the user to constantly correct interactions). This means that response times need to be less than 30 milliseconds, ideally approaching something closer to 6 milliseconds—a number that Lacey described as the “Holy Grail of Human Computer Interaction.”

    Finally, Ideum faced the question of customization. For GestureWorks Fusion, the company chose to perform much of this implicitly, behind the scenes. “The system automatically adapts and makes changes, subtly improving the user experience as people use the product,” explained Lacey.

    Using the Intel® RealSense™ SDK

    Developers can access the Intel RealSense camera (SR300) features using the Intel RealSense SDK, which offers a standardized interface to a rich library of pattern detection and recognition algorithms. These cover several helpful functions, including face recognition, gesture and speech recognition, and text-to-speech processing.

    The system is divided into a set of modules to help developers focus on different aspects of the interaction. Certain components, such as the SenseManager interface, coordinate common functions including hand- and face-tracking and operate by orchestrating a multi-modal pipeline controlling I/O and processing. Other elements, such as the Capture and Image interfaces, enable developers to keep track of camera operations and to access captured images. Similarly, interfaces such as HandModule, FaceModule, and AudioSource offer access to hand- and face-tracking, and to audio input, respectively.

    The Intel RealSense SDK encourages seamless integration by supporting multiple coding styles and methodologies. It does this by providing wrappers for several popular languages, frameworks, and game engines—such as C++, C#, Unity*, Processing, and Java. The Intel RealSense SDK also offers limited support for browser applications using JavaScript*. The Intel RealSense SDK aims to lower the barrier to performing advanced HCI, allowing developers to shift their attention from coding pattern recognition algorithms to using the library to develop leading-edge experiences.
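    As a rough, hedged sketch (not code from GestureWorks Fusion), a typical Intel RealSense SDK C++ processing loop looks like the following: the SenseManager orchestrates the pipeline while the hand module delivers per-frame tracking data. Error handling is trimmed for brevity.

    #include "pxcsensemanager.h"
    #include "pxchandmodule.h"
    #include "pxchanddata.h"

    void RunHandTracking()
    {
        PXCSenseManager *sm = PXCSenseManager::CreateInstance();
        sm->EnableHand();                                  // turn on the hand-tracking module
        PXCHandModule *hand = sm->QueryHand();
        sm->Init();                                        // start the multi-modal pipeline
        PXCHandData *data = hand->CreateOutput();

        while (sm->AcquireFrame(true) >= PXC_STATUS_NO_ERROR)
        {
            data->Update();                                // refresh hand-tracking results for this frame
            // ... feed positions/gestures into the application's input layer ...
            sm->ReleaseFrame();                            // let the pipeline move on to the next frame
        }

        data->Release();
        sm->Release();
    }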

    “Intel has done a great job in lowering the cost of development,” noted Lacey. “By shouldering much of the burden of guaranteeing inputs and performing gesture recognition, they have made the job a lot easier for developers, allowing them to take on new HCI projects with confidence.”

    Crafting the Solution

    Ideum adopted a number of innovative tactics when developing GestureWorks Fusion. Consider the issue of determining the user’s focus. Ideum approached the issue using an ingenious new feature called Cursor Mode, introduced in the Intel RealSense SDK 2016 R1 for Windows. Cursor Mode provides a fast and accurate way to track a single point that represents the general position of a hand. This enables the system to effortlessly support a small set of gestures such as clicking, opening and closing a hand, and circling in either direction. In effect, Cursor Mode solves the user-focus issue by having the system interpret gesture input much as it would the input from a mouse.

    Using the ingenious Cursor Mode available in the Intel® RealSense™ SDK, developers can easily simulate common desktop actions such as clicking a mouse.

    Using these gestures, users can then accurately navigate or control an application “in-air” without having to touch a keyboard, mouse, or screen, while providing the same degree of confidence and precision. Cursor Mode helps in other ways as well. “One of the things we discovered is that not everyone gestures in exactly the same way,” said Lacey. Cursor Mode helps by mapping similar gestures to the same context, improving overall reliability.

    Lacey also highlighted the ease with which Ideum was able to integrate Cursor Mode into existing prototypes, permitting developers to get new versions of GestureWorks Fusion up and running in a matter of hours, with just a few lines of code. For instance, GestureWorks uses Cursor Mode to get the cursor image coordinates and then synthesize mouse events, as shown in the following:

    // Get the cursor image coordinates
    PXCMPoint3DF32 position = HandModule.cursor.QueryCursorPointImage();
    
    // Synthesize a relative mouse movement
    mouse_event(
       0x0001,                                         // MOUSEEVENTF_MOVE
       (uint)(int)(position.x - previousPosition.x),   // dx
       (uint)(int)(position.y - previousPosition.y),   // dy
       0,                                              // dwData, unused here
       0                                               // dwExtraInfo, unused here
    );
    
    ...

    
    // Import for calls to unmanaged WIN32 API
    [DllImport("user32.dll", CharSet = CharSet.Auto,
       CallingConvention = CallingConvention.StdCall)]
    
    public static extern void mouse_event(uint dwFlags, uint dx, uint dy,
       uint cButtons, int dwExtraInfo);

    Following this, GestureWorks is able to quickly determine which window has focus using the standard Windows API.

    // Get the handle of the window with focus
    IntPtr activeWindow = GetForegroundWindow();
    
    // Fill a WINDOWINFO structure for the focused window
    WINDOWINFO info = new WINDOWINFO();
    GetWindowInfo(activeWindow, ref info);
    
    // Get the active window text to compare with pre-configured controllers
    StringBuilder builder = new StringBuilder(256);
    GetWindowText(activeWindow, builder, 256);

    ...
    
    // Imports for calls to the unmanaged WIN32 API
    [DllImport("user32.dll")]
    static extern IntPtr GetForegroundWindow();
    
    [DllImport("user32.dll")]
    static extern bool GetWindowInfo(IntPtr hWnd, ref WINDOWINFO pwi);
    
    [DllImport("user32.dll")]
    static extern int GetWindowText(IntPtr hWnd, StringBuilder builder,
       int count);

    Cursor Mode tracks twice as fast as full hand-tracking, while using about half the power. “A great user experience is about generating expected results in a very predictable way,” explained Lacey. “When you have a very high level of gesture confidence, it enables you to focus and fine-tune other areas of the experience, lowering development costs and letting you do more with less resources.”

    To support multi-modal input, GestureWorks leverages the Microsoft Speech Application Programming Interface (SAPI), using features such as partial hypotheses that are unavailable in the Intel RealSense SDK. This allows a voice command to accompany every gesture, as shown in the following code segment:

    ISpRecognizer* recognizer;
    ISpRecoContext* context;
    
    // Initialize SAPI and set the grammar

    ...

    
    // Create the recognition context
    recognizer->CreateRecoContext(&context);
    
    // Create flags for the hypothesis and recognition events
    ULONGLONG recognition_event = SPFEI(SPEI_RECOGNITION) |
       SPFEI(SPEI_HYPOTHESIS);
    
    // Inform SAPI about the events to which we want to subscribe
    context->SetInterest(recognition_event, recognition_event);
    
    // Begin voice recognition
    <recognition code …>

    Ideum also found itself turning to parallelization to help determine a user’s intent, allowing interactions and feedback to occur near-simultaneously at rates of 60 frames per second. “The linchpin for keeping response times low has been our ability to effectively use multi-threading capabilities,” said Lacey. “That has given us the confidence to really push the envelope, to do things that we weren’t entirely sure were even possible while maintaining low levels of latency.”

    Ideum also strove to more completely describe and formalize gesture-based interactions by developing an advanced XML configuration script called Gesture Markup Language (GML). Using GML, the company has created a comprehensive library of gestures that developers can use to solve HCI problems. This has helped Ideum manage and control the inherent complexity of gesture recognition, since the range of inputs from motion tracking and multi-touch can potentially result in thousands of variations.

    “The impact of multi-modal interactions together with the Intel RealSense camera can be summed up in a single word: context,” noted Lacey. “It allows us to discern a new level of context that dramatically opens new realms for HCI.”

    Moving Forward

    Ideum plans to extend GestureWorks Fusion, adding support for additional applications—including productivity software, graphic packages, and computer-aided design using 3D motion gestures to manipulate virtual objects. Lacey can also imagine GestureWorks appearing in Intel RealSense technology-equipped tablets, home media systems, and possibly even in automobiles, as well as in conjunction with other technologies—applications that are far beyond traditional desktop and laptop devices.

    More expansive and immersive environments are similarly on the horizon, including virtual, augmented, and mixed-reality systems. This also applies to Internet of Things (IoT) technology, where new models of interaction will encourage users to create their own unique spaces and blended experiences.

    “Our work on GestureWorks Fusion has begun to uncover new ways to interact in novel environments,” Lacey explained. “But whatever the setting, you should simply be able to gesture or talk to a gadget, and make very deliberate selections, without having to operate the device like a traditional computer.”

    Resources

    Visit the Intel Developer Zone to get started with Intel RealSense technology.

    Learn more about Ideum, developer of GestureWorks.

    Download the Intel® RealSense™ SDK at https://software.intel.com/en-us/intel-realsense-sdk .

    Tips & Tricks to Heterogenous Programming with OpenCL* SDK & Intel® Media SDK - June 16 Webinar


    Register Now    10 a.m., Pacific time

    Intel® Processor Graphics contain two types of media accelerators: fixed function codec/frame processing and execution units (EUs), used for general purpose compute. In this 1-hour webinar on June 16, learn how to more fully utilize these media accelerators by combining the Intel® Media SDK and Intel® SDK for OpenCL™ Applications for many tasks, including:

    • Applying video effects and filters
    • Accelerating computer vision pipelines
    • Improving encode/transcode quality

    These two tools, both part of Intel® Media Server Studio, are better when used together. With just a few tips, tricks, and sharing APIs you can unlock the full heterogeneous potential of your hardware to create high performance custom pipelines. Then differentiate your media applications and solutions by combining fixed function operations with your own algorithms, to achieve disruptive performance beyond the standard Media SDK capabilities with the secret element that makes your products competitive and unique.

    In this session you will learn:

    • Big performance boosts are possible with Intel graphics processors (GPUs)
    • How to build media/graphics processing pipelines containing standard components, and customize with your algorithms and solutions
    • A short list of steps to share video surfaces efficiently between the Media SDK and OpenCL
    • How to combine Intel Media SDK and OpenCL to do many useful things utilizing Gen Graphics' rapidly increasing capabilities
    • And more

    Sign up today

    Webinar Speakers

    • Jeff McAllister– Media Software Technical Consulting Engineer
    • Robert Ioffe - Technical Consulting Engineer & OpenCL* Expert

     

     

     

    Using ACUWizard for Self-discovery of Configuration Paths


    What is the ACUWizard Tool

    The ACUWizard is a recognized tool that is used to enable and configure an Intel® Active Management Technology (Intel® AMT) capable device. The tool is included as part of the Intel® Setup and Configuration Software (Intel® SCS) download. While the tool comes with documentation, it may not be clear to IT professionals when specific options should be used or what benefits or drawbacks are associated with those options.

    There are three main reasons to use the ACUWizard:

    • You need to configure an Intel AMT device and do not have a management console that supports any type of configuration.
    • Your console does not support remote configuration into Admin Control Mode, meaning that you will need to use the USB configuration option.
    • You need to perform self-discovery of the configuration process.

    The next sections describe the following:

    • OS-based configuration versus USB key-based configuration
    • Steps for using ACU Wizard to configure an Intel AMT Client via the OS-based method
    • Steps for using ACU Wizard to configure an Intel AMT Client via the USB key-based method

    Configuration Methods Using the ACUWizard: OS-based method versus USB key-based method

    Configuration can be performed from within the OS or via a USB key. The OS-based configuration requires Microsoft Windows* 7 or higher and the LMS service, and it provisions the system into Client Control Mode (CCM).

    • Single system configuration. This method is easy to do and can range from a simple configuration to more advanced configurations. This is easy to replicate but time consuming if you need to configure many Intel AMT Clients.
    • Multiple system configuration. This method is scriptable via the command line and is a popular option in environments containing many Intel AMT Clients.

    The USB key-based configuration method is designed to use a USB key to push the configuration profile into the Intel® MEBX during a reboot. It is potentially much quicker than an OS-based configuration and has the added capability of configuring the device into Admin Control Mode (ACM). The USB configuration is not supported on Intel AMT 10+ LAN-less devices.

    The USB configuration requires a setup.bin file. There are two tools for creating setup.bin. The first tool uses the acuwizard.exe, and the second tool uses acuconfig.exe. ACUConfig is a command-line tool and is somewhat cumbersome, so I won't be going into detail about it in this article.

    • Single-use system configuration key. A key is generated specifically for a client and can be used only once. This type of profile is necessary only if the OS has a static IP, but DHCP-enabled systems can be supported as well.
    • Multi-use system configuration key. A single configuration file is created to configure multiple devices. But the systems will have the same password, and the key assumes the device is DHCP-enabled. If a static OS client is configured in this manner, the system will in effect have two IP addresses.

    A quick note on passwords: There are three basic passwords used with configurable Intel AMT devices:

    • MEBx password. This is your physical access password into the Intel® Management Engine BIOS Extension (Intel® MEBX). By default the USB configuration will set this to be the same as the Intel AMT password. The password rule for this is max 32 characters and complex. The default password is admin.
    • Intel AMT password. This is the remote management password and is set using all versions of the configuration discussed in this blog. The password rule for this is max 32 characters and complex.
    • RFB5900. This is not required; however, it is important to note that if you plan to use a standard VNC viewer to make a local connection with Intel AMT KVM, the RFB password must be set. The password rule is exactly eight characters and complex.

    Steps for using the ACU Wizard to configure an Intel® Active Management Technology client via the OS-based method

    Single-System Configuration

    Perform an OS-based configuration by launching ACUWizard as Admin. Once it’s launched follow these steps:

    1. Create the profile by opening the ACUWizard, and then selecting Configure/Unconfigure this System.
      Configuration Methods
      Figure 1.Configuration Methods
       
    2. Select Configure via Windows.
    3. Select Next.
    4. In the Intel® AMT Configuration Utility – select Configure via Windows and do the following:
      • In Current Password, type a password. This is the password for the Intel® MEBX; if the password has not been changed, the default password is admin.
      • Fill in New Password and Confirm Password.
        Example of Configure via Windows
        Figure 2.Example of Configure via Windows
         
      • Select Override Default Settings, and then click Network Settings.
        • If the OS is set as DHCP-enabled, verify the settings. Typical settings are:
          • Use the Following as FQDN – Select Host Name.
          • Select the Shared FQDN option.
          • Select Get IP from DHCP Server.
          • Update the DNS directly or via DHCP option 81.
          • Select OK.
        • If the OS IP is static, select the Change the IP section radio button and then select Use the same IP as the host.
        • Select Next.
          Example of Network Settings
          Figure 3.Example of Network Settings
    5. The software saves the profile for potential future use. Enter and confirm the Encryption Password.
    6. Select Configure.
    7. The Configuring Your System dialog box launches. Wait until it closes, which can take a few minutes.
    8. The screen should now show Configuration Complete; select Finish.

    Multiple System Configuration

    Configuring Intel AMT devices using this method requires the use of two tools: ACUWizard.exe and ACUConfig.exe. The first step is to create a profile with the ACUWizard and then push the profile to the client with the ACUConfig tool. The following is an example of a basic profile; advanced profiles are beyond the scope of this blog. See Figures 1-3 for examples of what options are available in the ACUWizard’s GUI.

    Note: This is a scriptable solution.

    1. Create the profile by opening the ACUWizard, and then selecting Create Settings to configure Multiple Systems (See Figure 1.)
    2. In the AMT Configuration Utility: Profile Designer window, select the green plus sign New.
      Example of Green Plus sign
      Figure 4.Example of Green Plus sign
       
    3. In the Configuration Profile Wizard, select Next.
    4. In the Configuration Profile Wizard Optional Settings window, select Next.
    5. In the Configuration Profile Wizard System Settings window:
      • Enter the RFB password if it is being used.
      • Enter the password in the Use the following password for all systems data field:
      • Select the Set button for Edit and FQDN.
      • There will be no changes, but note the changes required if a device has a static OS IP address.
      • Select Cancel.
      • Select Next.
        Example of Available Feature Settings
        Figure 5.Example of Available Feature Settings
    6. In the Configuration Profile Wizard - "Finished" window:
      • Enter the Profile Name you want to use.
      • Encrypt the xml file by adding and confirming the password.
      • Select Finish.
        Profile Naming and Encryption Example
        Figure 6.Profile Naming and Encryption Example
    7. In the Intel AMT Configuration Utility: Profile Designer window:
      • Take note of the Profile Path shown on your screen. It should be something like <userName>\documents\SCS_Profile.
      • Close ACU Wizard.

    At this point, steps 1 through 7 above are a one-time process for each custom profile needed. The following steps are to be repeated on each client.

    1. Copy the previously created profile and paste it in the configurator folder of the Intel SCS download.
    2. Copy the configurator folder to a location accessible to the Intel AMT Client (Local, Network share, USB thumb drive, and so on).
    3. Open a command prompt as admin, and run the following string: acuconfig.exe configamt <profile.xml>
    4. You should exit with code 0 for a successful configuration.

    Steps for using ACUWizard to configure an Intel AMT Client via the USB Configuration

    Creating a USB key for configuration is a three-step process: create a configuration profile, format a USB key (FAT32), and save the profile to the USB key as setup.bin.

    The profile can be created in two ways: as a single use key or a multiple use key.

    Single-Use Key

    This method creates a single-use key that can’t be reused without creating a new setup.bin file. You can keep the Intel AMT IP address the same as the OS IP address if it is statically configured. This key should only be created on the device that the finished USB key is going to configure. Figure 7 provides an example of what options are available for the Single-Use Key method.

    To create the USB file setup.bin:

    1. Create the profile by opening the ACUWizard, and then selecting Configure/Unconfigure this System. (See Figure 1.)
    2. In the Intel AMT Configuration Utility - Configuration Options window:
      • Select Configure via USB Key.
      • Select Next.
    3. In the Intel AMT Configuration Utility - Configure via USB Key window:
      • Fill in Current Password. This is the password for the Intel® MEBX. If the password has not been changed, the default password is admin.
      • Fill in New Password and Confirm Password.
    4. Select Display advanced settings.
      • If the OS IP address is DHCP-enabled, verify that the DHCP Enabled checkbox is checked.
      • If the OS IP address is static, uncheck the DHCP Enabled checkbox and provide the network address information.
    5. Select Next.
      Example of USB Key Configuration GUI
      Figure 7.Example of USB Key Configuration GUI
       
    6. In the Intel AMT Configuration Utility – Then Create Configuration USB Key window:
      • Specify the appropriate USB Drive in the selection window.
      • Select OK.
      • In the Formatting USB Drive window:
        • Select Yes to format the drive. In the Configuration USB Key Created Successfully dialog box, click OK.
    7. The USB key is now successfully configured.

    Multi-Use Key

    This method creates a single multi-use key that can be reused without creating a new setup.bin file. This method allows for quick configuration over multiple devices. However, the configuration file is made specifically for DHCP-enabled or Static IP-assigned operating systems. Using the wrong key causes a mismatch between the OS (static) and Intel AMT (DHCP-enabled) IP addresses. This is not necessarily wrong, but it requires tracking multiple IPs for the same physical device, causing more management requirements. Figure 9, below provides an example of what the GUI looks like for performing the Multi-Use Key method.

    To create the USB file - setup.bin:

    1. Open the ACU Wizard, and then select Create Settings to configure Multiple Systems. (See Figure 1.)
    2. In Intel AMT Configuration Utility: Profile Designer window:
      • Select the Tools button in the upper-right corner.
        Example of tools button
        Figure 8.Example of tools button
         
      • Select Prepare a USB for Manual Configuration.
    3. In the Settings for Manual Configuration of Multiple Systems window:
      • Select Mobile Systems or Desktop Systems.
        Note: Choosing the wrong device setting will trigger an error about applying the power policy. The configuration will be successful; however, the firmware defaults to “Intel® AMT Always On (s0-s5)” and DHCP-enabled.
      • Select Intel AMT Version level 6+ or 7+.
      • Enter passwords:
        • Old MEBx Password: If the password has not been changed, the default password will be admin.
        • New Password and confirm: The password must be complex and up to 32 characters.
      • Specify the system Power State – select Always On (s0-s5)
      • User Consent Required - Leave unchecked
        Note: With Intel AMT 11, a change was made that defaults User Consent to KVM only. You can modify this post-configuration via a WS-Management command or through an existing tool such as Mesh Commander.
      • Specify the appropriate USB drive in the selection window.
      • Select OK.
        Example of USB Key Configurable Options
        Figure 9.Example of USB Key Configurable Options
    4. In the Formatting USB Drive window:
      • Select Yes to format the drive. In the Configuration USB Key Created Successfully dialog box, select OK to finish the configuration.
    5. The USB key is now successfully configured.

    How to use the Configuration USB Key

    Now that the key has been created, we need to use it to configure the Intel AMT device. Just insert the USB key into the Intel AMT device and reboot the system. During reboot, the device will detect the setup.bin file, and a message should display asking whether you want to configure the device. Select “Y” for yes, and a few seconds later hit Enter at the success screen.

    A few things to note regarding the USB key: don’t use drives over 32 GB, format the drive as FAT32, USB configuration is occasionally disabled in the BIOS (in which case it must be enabled first), and if a USB key fails to work, try a different model or brand.

    Additional Resources

    Summary

    There are many options and reasons for using the ACU Wizard tool, and the right choice depends on your specific environmental requirements. The ACU Wizard is designed to exercise the full range of features regardless of which method is used. There is no single “correct” way to do configuration, as all options are valid; determining the method that will work in your environment is the essential element.

    About the Author

    Joe Oster has worked at Intel on Intel® vPro™ technology and Intel AMT since 2006. He is passionate about technology and is an advocate for the Managed Service Provider and small/medium business channels. When not working, he enjoys being a dad and spends time working on his family farm or flying drones and RC aircraft.

    API without Secrets: Introduction to Vulkan* Part 3: First Triangle


    Download [PDF 885 KB]

    Link to Github Sample Code


    Go to: API without Secrets: Introduction to Vulkan* Part 2: Swap Chain


    Table of Contents

    Tutorial 3: First Triangle – Graphics Pipeline and Drawing

    In this tutorial we will finally draw something on the screen. One single triangle should be just fine for our first Vulkan-generated “image.”

    The graphics pipeline and drawing in general require lots of preparations in Vulkan (in the form of filling many structures with even more different fields). There are potentially many places where we can make mistakes, and in Vulkan, even simple mistakes may lead to the application not working as expected, displaying just a blank screen, and leaving us wondering what went wrong. In such situations validation layers can help us a lot. But I didn’t want to dive into too many different aspects and the specifics of the Vulkan API. So I prepared the code to be as small and as simple as possible.

    This led me to create an application that is working properly and displays a simple triangle the way I expected, but it also uses mechanics that are not recommended, not flexible, and also probably not too efficient (though correct). I don’t want to teach solutions that aren’t recommended, but here it simplifies the tutorial quite considerably and allows us to focus only on the minimal required set of API usage. I will point out the “disputable” functionality as soon as we get to it. And in the next tutorial, I will show the recommended way of drawing triangles.

    To draw our first simple triangle, we need to create a render pass, a framebuffer, and a graphics pipeline. Command buffers are of course also needed, but we already know something about them. We will create simple GLSL shaders and compile them into Khronos’s SPIR*-V language—the only (at this time) form of shaders that Vulkan (officially) understands.

    If nothing displays on your computer’s screen, try to simplify the code as much as possible or even go back to the second tutorial. Check whether command buffer that just clears image behaves as expected, and that the color the image was cleared to is properly displayed on the screen. If yes, modify the code and add the parts from this tutorial. Check every return value if it is not VK_SUCCESS. If these ideas don’t help, wait for the tutorial about validation layers.

    About the Source Code Example

    For this and succeeding tutorials, I’ve changed the sample project. Vulkan preparation phases that were described in the previous tutorials were placed in a “VulkanCommon” class found in separate files (header and source). The class for a given tutorial that is responsible for presenting topics described in a given tutorial, inherits from the “VulkanCommon” class and has access to some (required) Vulkan variables like device or swap chain. This way I can reuse Vulkan creation code and prepare smaller classes focusing only on the presented topics. The code from the earlier chapters works properly so it should also be easier to find potential mistakes.

    I’ve also added a separate set of files for some utility functions. Here we will be reading SPIR-V shaders from binary files, so I’ve added a function for checking loading contents of a binary file. It can be found in Tools.cpp and Tools.h files.

    Creating a Render Pass

    To draw anything on the screen, we need a graphics pipeline. But creating it now will require pointers to other structures, which will probably also need pointers to yet other structures. So we’ll start with a render pass.

    What is a render pass? A general picture can give us a “logical” render pass that may be found in many known rendering techniques like deferred shading. This technique consists of many subpasses. The first subpass draws the geometry with shaders that fill the G-Buffer: store diffuse color in one texture, normal vectors in another, shininess in another, depth (position) in yet another. Next for each light source, drawing is performed that reads some of the data (normal vectors, shininess, depth/position), calculates lighting and stores it in another texture. Final pass aggregates lighting data with diffuse color. This is a (very rough) explanation of deferred shading but describes the render pass—a set of data required to perform some drawing operations: storing data in textures and reading data from other textures.

    In Vulkan, a render pass represents (or describes) a set of framebuffer attachments (images) required for drawing operations and a collection of subpasses that drawing operations will be ordered into. It is a construct that collects all color, depth, and stencil attachments and the operations modifying them, so that the driver does not have to deduce this information by itself, which may give substantial optimization opportunities on some GPUs. A subpass consists of drawing operations that use (more or less) the same attachments. Each of these drawing operations may read from some input attachments and render data into some other (color, depth, stencil) attachments. A render pass also describes the dependencies between these attachments: in one subpass we perform rendering into a texture, but in another this texture will be used as a source of data (that is, it will be sampled from). All this data helps the graphics hardware optimize drawing operations.

    To create a render pass in Vulkan, we call the vkCreateRenderPass() function, which requires a pointer to a structure describing all the attachments involved in rendering and all the subpasses forming the render pass. As usual, the more attachments and subpasses we use, the more array elements containing properly filled structures we need. In our simple example, we will be drawing only into a single texture (color attachment) with just a single subpass.

    Render Pass Attachment Description

    VkAttachmentDescription attachment_descriptions[] = {
      {
        0,                                   // VkAttachmentDescriptionFlags   flags
        GetSwapChain().Format,               // VkFormat                       format
        VK_SAMPLE_COUNT_1_BIT,               // VkSampleCountFlagBits          samples
        VK_ATTACHMENT_LOAD_OP_CLEAR,         // VkAttachmentLoadOp             loadOp
        VK_ATTACHMENT_STORE_OP_STORE,        // VkAttachmentStoreOp            storeOp
        VK_ATTACHMENT_LOAD_OP_DONT_CARE,     // VkAttachmentLoadOp             stencilLoadOp
        VK_ATTACHMENT_STORE_OP_DONT_CARE,    // VkAttachmentStoreOp            stencilStoreOp
        VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,     // VkImageLayout                  initialLayout;
        VK_IMAGE_LAYOUT_PRESENT_SRC_KHR      // VkImageLayout                  finalLayout
      }
    };

    1. Tutorial03.cpp, function CreateRenderPass()

    To create a render pass, first we prepare an array with elements describing each attachment, regardless of the type of attachment and how it will be used inside a render pass. Each array element is of type VkAttachmentDescription, which contains the following fields:

    • flags – Describes additional properties of an attachment. Currently, only an aliasing flag is available, which informs the driver that the attachment shares the same physical memory with another attachment; it is not the case here so we set this parameter to zero.
    • format – Format of an image used for the attachment; here we are rendering directly into a swap chain so we need to take its format.
    • samples – Number of samples of the image; we are not using any multisampling here so we just use one sample.
    • loadOp – Specifies what to do with the image’s contents at the beginning of a render pass, whether we want them to be cleared, preserved, or we don’t care about them (as we will overwrite them all). Here we want to clear the image to the specified value. This parameter also refers to the depth part of depth/stencil images.
    • storeOp – Informs the driver what to do with the image’s contents after the render pass (after a subpass in which the image was used for the last time). Here we want the contents of the image to be preserved after the render pass as we intend to display them on screen. This parameter also refers to the depth part of depth/stencil images.
    • stencilLoadOp – The same as loadOp but for the stencil part of depth/stencil images; for color attachments it is ignored.
    • stencilStoreOp – The same as storeOp but for the stencil part of depth/stencil images; for color attachments this parameter is ignored.
    • initialLayout – The layout the given attachment will have when the render pass starts (that is, the layout the image is provided in by the application).
    • finalLayout – The layout the driver will automatically transition the given image into at the end of a render pass.

    Some additional information is required for load and store operations and initial and final layouts.

    Load op refers to the attachment’s contents at the beginning of a render pass. This operation describes what the graphics hardware should do with the attachment: clear it, preserve its existing contents (leave them untouched), or ignore them because the application intends to overwrite them anyway. This gives the hardware an opportunity to optimize memory operations. For example, if we intend to overwrite all of the contents, the hardware won’t bother with them and, if it is faster, may allocate totally new memory for the attachment.

    Store op, as the name suggests, is used at the end of a render pass and informs the hardware whether we want to use the contents of the attachment after the render pass or whether we don’t care about them and they may be discarded. In some scenarios (when contents are discarded) this allows the hardware to keep the image in temporary, fast memory, as the image “lives” only during the render pass, and implementations may save some memory bandwidth by avoiding writing back data that is not needed anymore.

    When an attachment has a depth format (and potentially also a stencil component) load and store ops refer only to the depth component. If a stencil is present, stencil values are treated the way stencil load and store ops describe. For color attachments, stencil ops are not relevant.
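
    As an illustration only (this tutorial uses no depth attachment), a depth attachment whose contents are not needed after the render pass might be described as in the sketch below. The VK_FORMAT_D16_UNORM format and the chosen layouts are assumptions made for this example, not something used in the tutorial’s code.

    VkAttachmentDescription hypothetical_depth_attachment = {
      0,                                                 // VkAttachmentDescriptionFlags   flags
      VK_FORMAT_D16_UNORM,                               // VkFormat                       format
      VK_SAMPLE_COUNT_1_BIT,                             // VkSampleCountFlagBits          samples
      VK_ATTACHMENT_LOAD_OP_CLEAR,                       // VkAttachmentLoadOp             loadOp
      VK_ATTACHMENT_STORE_OP_DONT_CARE,                  // VkAttachmentStoreOp            storeOp
      VK_ATTACHMENT_LOAD_OP_DONT_CARE,                   // VkAttachmentLoadOp             stencilLoadOp
      VK_ATTACHMENT_STORE_OP_DONT_CARE,                  // VkAttachmentStoreOp            stencilStoreOp
      VK_IMAGE_LAYOUT_UNDEFINED,                         // VkImageLayout                  initialLayout
      VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL   // VkImageLayout                  finalLayout
    };

    Because the store op is “don’t care,” the driver is free to keep such an attachment in fast, transient memory and never write its contents back.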

    Layout, as I described in the swap chain tutorial, is an internal memory arrangement of an image. Image data may be organized in such a way that neighboring “image pixels” are also neighbors in memory, which can increase cache hits (faster memory reading) when the image is used as a source of data (that is, during texture sampling). But caching is not necessary when the image is used as a target for drawing operations, and the memory for that image may be organized in a totally different way. An image may have a linear layout (which gives the CPU the ability to read or populate the image’s memory contents) or an optimal layout (which is optimized for performance but is also hardware/vendor dependent). So some hardware may have special memory organization for some types of operations; other hardware may be operations-agnostic. Some of the memory layouts may be better suited for some intended image “usages”; or, from the other side, some usages may require specific memory layouts. There is also a general layout that is compatible with all types of operations. But from the performance point of view, it is always best to set the layout appropriate for the intended image usage, and it is the application’s responsibility to inform the driver about transitions.

    Image layouts may be changed using image memory barriers. We did this in the swap chain tutorial when we first changed the layout from the presentation source (image was used by the presentation engine) to transfer destination (we wanted to clear the image with a given color). But layouts, apart from image memory barriers, may also be changed automatically by the hardware inside a render pass. If we specify a different initial layout, subpass layouts (described later), and final layout, the hardware does the transition automatically at the appropriate time.
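
    As a reminder of what such a manual transition looks like, here is a sketch of a barrier that changes a swap chain image from the presentation layout to the color attachment layout. This is only an illustration: the command_buffer and swap_chain_image variables are assumed to exist, and the chosen access and stage masks are just one possible combination.

    VkImageMemoryBarrier barrier_from_present_to_draw = {
      VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,       // VkStructureType                sType
      nullptr,                                      // const void                    *pNext
      VK_ACCESS_MEMORY_READ_BIT,                    // VkAccessFlags                  srcAccessMask
      VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,         // VkAccessFlags                  dstAccessMask
      VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,              // VkImageLayout                  oldLayout
      VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,     // VkImageLayout                  newLayout
      VK_QUEUE_FAMILY_IGNORED,                      // uint32_t                       srcQueueFamilyIndex
      VK_QUEUE_FAMILY_IGNORED,                      // uint32_t                       dstQueueFamilyIndex
      swap_chain_image,                             // VkImage                        image
      { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 }     // VkImageSubresourceRange        subresourceRange
    };

    vkCmdPipelineBarrier( command_buffer, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                          VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, 0, 0, nullptr,
                          0, nullptr, 1, &barrier_from_present_to_draw );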

    Initial layout informs the hardware about the layout the application “provides” (or “leaves”) the given attachment with. This is the layout the image starts with at the beginning of a render pass (in our example we acquire the image from the presentation engine so the image has a “presentation source” layout set). Each subpass of a render pass may use a different layout, and the transition will be done automatically by the hardware between subpasses. The final layout is the layout the given attachment will be transitioned into (automatically) at the end of a render pass (after a render pass is finished).

    This information must be prepared for each attachment that will be used in a render pass. When graphics hardware receives this information a priori, it may optimize operations and memory during the render pass to achieve the best possible performance.

    Subpass Description

    VkAttachmentReference color_attachment_references[] = {
      {
        0,                                          // uint32_t                       attachment
        VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL    // VkImageLayout                  layout
      }
    };
    
    VkSubpassDescription subpass_descriptions[] = {
      {
        0,                                          // VkSubpassDescriptionFlags      flags
        VK_PIPELINE_BIND_POINT_GRAPHICS,            // VkPipelineBindPoint            pipelineBindPoint
        0,                                          // uint32_t                       inputAttachmentCount
        nullptr,                                    // const VkAttachmentReference   *pInputAttachments
        1,                                          // uint32_t                       colorAttachmentCount
        color_attachment_references,                // const VkAttachmentReference   *pColorAttachments
        nullptr,                                    // const VkAttachmentReference   *pResolveAttachments
        nullptr,                                    // const VkAttachmentReference   *pDepthStencilAttachment
        0,                                          // uint32_t                       preserveAttachmentCount
        nullptr                                     // const uint32_t*                pPreserveAttachments
      }
    };

    2. Tutorial03.cpp, function CreateRenderPass()

    Next we specify the description of each subpass our render pass will include. This is done using the VkSubpassDescription structure, which contains the following fields:

    • flags – Parameter reserved for future use.
    • pipelineBindPoint – Type of pipeline in which this subpass will be used (graphics or compute). Our example, of course, uses a graphics pipeline.
    • inputAttachmentCount – Number of elements in the pInputAttachments array.
    • pInputAttachments – Array with elements describing which attachments are used as an input and can be read from inside shaders. We are not using any input attachments here, so we set the count to zero and this pointer to null.
    • colorAttachmentCount – Number of elements in pColorAttachments and pResolveAttachments arrays.
    • pColorAttachments – Array describing (pointing to) attachments which will be used as color render targets (that image will be rendered into).
    • pResolveAttachments – Array closely connected with color attachments. Each element from this array corresponds to an element from a color attachments array; any such color attachment will be resolved to a given resolve attachment (if a resolve attachment at the same index is not null or if the whole pointer is not null). This is optional and can be set to null.
    • pDepthStencilAttachment – Description of an attachment that will be used for depth (and/or stencil) data. We don’t use depth information here so we can set it to null.
    • preserveAttachmentCount – Number of elements in pPreserveAttachments array.
    • pPreserveAttachments – Array describing attachments that should be preserved. When we have multiple subpasses not all of them will use all attachments. If a subpass doesn’t use some of the attachments but we need their contents in the later subpasses, we must specify these attachments here.

    The pInputAttachments, pColorAttachments, pResolveAttachments, and pDepthStencilAttachment parameters all point to elements of type VkAttachmentReference (pPreserveAttachments, in contrast, is just an array of indices of type const uint32_t*). The VkAttachmentReference structure contains only these two fields:

    • attachment – Index into an attachment_descriptions array of VkRenderPassCreateInfo.
    • layout – Requested (required) layout the attachment will use during a given subpass. The hardware will perform an automatic transition into a provided layout just before a given subpass.

    This structure contains references (indices) into the attachment_descriptions array of VkRenderPassCreateInfo. When we create a render pass we must provide a description of all attachments used during a render pass. We’ve prepared this description earlier in “Render pass attachment description” when we created the attachment_descriptions array. Right now it contains only one element, but in more advanced scenarios there will be multiple attachments. So this “general” collection of all render pass attachments is used as a reference point. In the subpass description, when we fill pColorAttachments or pDepthStencilAttachment members, we provide indices into this very “general” collection, like this: take the first attachment from all render pass attachments and use it as a color attachment. The second attachment from that array will be used for depth data.

    There is a separation between a whole render pass and its subpasses because each subpass may use multiple attachments in a different way, that is, in one subpass we are rendering into one color attachment but in the next subpass we are reading from this attachment. In this way, we can prepare a list of all attachments used in the whole render pass, and at the same time we can specify how each attachment will be used in each subpass. And as each subpass may use a given attachment in its own way, we must also specify each image’s layout for each subpass.

    So before we can specify a description of all subpasses (an array with elements of type VkSubpassDescription) we must create references for each attachment used in each subpass. And this is what the color_attachment_references variable was created for. When I write a tutorial for rendering into a texture, this usage will be more apparent.
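
    To make this more concrete, here is a hypothetical sketch (not part of this tutorial’s code) for a render pass with two attachments, color at index 0 and depth at index 1, showing how the references would be prepared and then plugged into a subpass description:

    VkAttachmentReference hypothetical_color_reference = {
      0,                                                  // uint32_t        attachment
      VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL            // VkImageLayout   layout
    };

    VkAttachmentReference hypothetical_depth_reference = {
      1,                                                  // uint32_t        attachment
      VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL    // VkImageLayout   layout
    };

    // In the subpass description we would then set:
    //   pColorAttachments       = &hypothetical_color_reference
    //   pDepthStencilAttachment = &hypothetical_depth_reference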

    Render Pass Creation

    We now have all the data we need to create a render pass.

    VkRenderPassCreateInfo render_pass_create_info = {
      VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,    // VkStructureType                sType
      nullptr,                                      // const void                    *pNext
      0,                                            // VkRenderPassCreateFlags        flags
      1,                                            // uint32_t                       attachmentCount
      attachment_descriptions,                      // const VkAttachmentDescription *pAttachments
      1,                                            // uint32_t                       subpassCount
      subpass_descriptions,                         // const VkSubpassDescription    *pSubpasses
      0,                                            // uint32_t                       dependencyCount
      nullptr                                       // const VkSubpassDependency     *pDependencies
    };
    
    if( vkCreateRenderPass( GetDevice(), &render_pass_create_info, nullptr, &Vulkan.RenderPass ) != VK_SUCCESS ) {
      printf( "Could not create render pass!\n" );
      return false;
    }
    
    return true;

    3. Tutorial03.cpp, function CreateRenderPass()

    We start by filling the VkRenderPassCreateInfo structure, which contains the following fields:

    • sType – Type of structure (VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO here).
    • pNext – Parameter not currently used.
    • flags – Parameter reserved for future use.
    • attachmentCount – Number of all different attachments (elements in pAttachments array) used during whole render pass (here just one).
    • pAttachments – Array specifying all attachments used in a render pass.
    • subpassCount – Number of subpasses a render pass consists of (and number of elements in pSubpasses array – just one in our simple example).
    • pSubpasses – Array with descriptions of all subpasses.
    • dependencyCount – Number of elements in pDependencies array (zero here).
    • pDependencies – Array describing dependencies between pairs of subpasses. We have only one subpass, so there are no dependencies between subpasses and we set this parameter to null.

    Dependencies describe how specific parts of the graphics pipeline use a given memory resource. Each subpass may use resources in a different way, and the layout of each resource alone may not fully define how the resource is used. Some subpasses may render into images or store data through shader images. Others may not use images at all, or may read from them at different pipeline stages (that is, vertex or fragment).

    This information helps the driver optimize automatic layout transitions and, more generally, optimize barriers between subpasses. When we are writing into images only in a vertex shader, there is no point waiting until the fragment shader executes (in terms of the used images, of course). After all the vertex operations are done, images may immediately change their layouts and memory access type, and some parts of the graphics hardware may even start executing the next operations (that reference or read the given images) without waiting for the rest of the commands from the given subpass to finish. For now, just remember that dependencies are important from a performance point of view.
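
    To give a flavor of what a dependency looks like, below is a hypothetical sketch (not used in this tutorial) describing a dependency between two subpasses, in which subpass 0 writes a color attachment and subpass 1 reads it as an input attachment in a fragment shader:

    VkSubpassDependency hypothetical_dependency = {
      0,                                              // uint32_t                       srcSubpass
      1,                                              // uint32_t                       dstSubpass
      VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // VkPipelineStageFlags           srcStageMask
      VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,          // VkPipelineStageFlags           dstStageMask
      VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,           // VkAccessFlags                  srcAccessMask
      VK_ACCESS_INPUT_ATTACHMENT_READ_BIT,            // VkAccessFlags                  dstAccessMask
      VK_DEPENDENCY_BY_REGION_BIT                     // VkDependencyFlags              dependencyFlags
    };

    Such an element would be provided through the pDependencies member (with dependencyCount set accordingly) of VkRenderPassCreateInfo.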

    So now that we have prepared all the information required to create a render pass, we can safely call the vkCreateRenderPass() function.

    Creating a Framebuffer

    We have created a render pass. It describes all attachments and all subpasses used during the render pass. But this description is quite abstract. We have specified formats of all attachments (just one image in this example) and described how attachments will be used by each subpass (also just one here). But we didn’t specify WHAT attachments we will be using or, in other words, what images will be used as these attachments. This is done through a framebuffer.

    A framebuffer describes specific images that the render pass operates on. In OpenGL*, a framebuffer is a set of textures (attachments) we are rendering into. In Vulkan, this term is much broader. It describes all the textures (attachments) used during the render pass, not only the images we are rendering into (color and depth/stencil attachments) but also images used as a source of data (input attachments).

    This separation of render pass and framebuffer gives us some additional flexibility. We can use the given render pass with different framebuffers and a given framebuffer with different render passes, if they are compatible, meaning that they operate in a similar fashion on images of similar types and usages.

    Before we can create a framebuffer, we must create image views for each image used as a framebuffer and render pass attachment. In Vulkan, not only in the case of framebuffers, but in general, we don’t operate on images themselves. Images are not accessed directly. For this purpose, image views are used. Image views represent images, they “wrap” images and provide additional (meta)data for them.

    Creating Image Views

    In this simple application, we want to render directly into swap chain images. We have created a swap chain with multiple images, so we must create an image view for each of them.

    const std::vector<VkImage> &swap_chain_images = GetSwapChain().Images;
    Vulkan.FramebufferObjects.resize( swap_chain_images.size() );
    
    for( size_t i = 0; i < swap_chain_images.size(); ++i ) {
      VkImageViewCreateInfo image_view_create_info = {
        VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,   // VkStructureType                sType
        nullptr,                                    // const void                    *pNext
        0,                                          // VkImageViewCreateFlags         flags
        swap_chain_images[i],                       // VkImage                        image
        VK_IMAGE_VIEW_TYPE_2D,                      // VkImageViewType                viewType
        GetSwapChain().Format,                      // VkFormat                       format
        {                                           // VkComponentMapping             components
          VK_COMPONENT_SWIZZLE_IDENTITY,              // VkComponentSwizzle             r
          VK_COMPONENT_SWIZZLE_IDENTITY,              // VkComponentSwizzle             g
          VK_COMPONENT_SWIZZLE_IDENTITY,              // VkComponentSwizzle             b
          VK_COMPONENT_SWIZZLE_IDENTITY               // VkComponentSwizzle             a
        },
        {                                           // VkImageSubresourceRange        subresourceRange
          VK_IMAGE_ASPECT_COLOR_BIT,                  // VkImageAspectFlags             aspectMask
          0,                                          // uint32_t                       baseMipLevel
          1,                                          // uint32_t                       levelCount
          0,                                          // uint32_t                       baseArrayLayer
          1                                           // uint32_t                       layerCount
        }
      };
    
      if( vkCreateImageView( GetDevice(), &image_view_create_info, nullptr, &Vulkan.FramebufferObjects[i].ImageView ) != VK_SUCCESS ) {
        printf( "Could not create image view for framebuffer!\n" );
        return false;
      }

    4. Tutorial03.cpp, function CreateFramebuffers()

    To create an image view, we must first create a variable of type VkImageViewCreateInfo. It contains the following fields:

    • sType – Structure type, in this case it should be set to VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO.
    • pNext – Parameter typically set to null.
    • flags – Parameter reserved for future use.
    • image – Handle to an image for which view will be created.
    • viewType – Type of view we want to create. The view type must be compatible with the image it is created for (that is, we can create a 2D view for an image that has multiple array layers, or we can create a CUBE view for a 2D image with six layers).
    • format – Format of an image view; it must be compatible with the image’s format but doesn’t have to be identical (that is, it may be a different format but with the same number of bits per pixel).
    • components – Mapping of the image’s components into a vector returned in the shader by texturing operations. This applies only to read operations (sampling), but since we are using the image as a color attachment (we are rendering into the image) we must set the so-called identity mapping (R component into R, G -> G, and so on) or just use the “identity” value (VK_COMPONENT_SWIZZLE_IDENTITY).
    • subresourceRange – Describes the set of mipmap levels and array layers that will be accessible to a view. If our image is mipmapped, we may specify the specific mipmap level we want to render to (and in case of render targets we must specify exactly one mipmap level of one array layer).

    As you can see here, we acquire handles to all swap chain images, and we are referencing them inside a loop. This way we fill the structure required for image view creation, which we pass to a vkCreateImageView() function. We do this for each image that was created along with a swap chain.

    Specifying Framebuffer Parameters

    Now we can create a framebuffer. To do this we call the vkCreateFramebuffer() function.

    VkFramebufferCreateInfo framebuffer_create_info = {
        VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,  // VkStructureType                sType
        nullptr,                                    // const void                    *pNext
        0,                                          // VkFramebufferCreateFlags       flags
        Vulkan.RenderPass,                          // VkRenderPass                   renderPass
        1,                                          // uint32_t                       attachmentCount
        &Vulkan.FramebufferObjects[i].ImageView,    // const VkImageView             *pAttachments
        300,                                        // uint32_t                       width
        300,                                        // uint32_t                       height
        1                                           // uint32_t                       layers
      };
    
      if( vkCreateFramebuffer( GetDevice(), &framebuffer_create_info, nullptr, &Vulkan.FramebufferObjects[i].Handle ) != VK_SUCCESS ) {
        printf( "Could not create a framebuffer!\n" );
        return false;
      }
    }
    return true;

    5. Tutorial03.cpp, function CreateFramebuffers()

    The vkCreateFramebuffer() function requires us to provide a pointer to a variable of type VkFramebufferCreateInfo, so we must first prepare it. It contains the following fields:

    • sType – Structure type set to VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO in this situation.
    • pNext – Parameter most of the time set to null.
    • flags – Parameter reserved for future use.
    • renderPass – Render pass this framebuffer will be compatible with.
    • attachmentCount – Number of attachments in a framebuffer (elements in pAttachments array).
    • pAttachments – Array of image views representing all attachments used in a framebuffer and render pass. Each element in this array (each image view) corresponds to each attachment in a render pass.
    • width – Width of a framebuffer.
    • height – Height of a framebuffer.
    • layers – Number of layers in a framebuffer (as in OpenGL’s layered rendering with geometry shaders, where the shader can specify the layer into which fragments rasterized from a given polygon will be rendered).

    The framebuffer specifies what images are used as attachments on which the render pass operates. We can say that it translates an image (image view) into a given attachment. The number of images specified for a framebuffer must be the same as the number of attachments in the render pass for which we are creating the framebuffer. Also, each element of the pAttachments array corresponds directly to an attachment in the render pass description structure. Render pass and framebuffer are closely connected, and that’s why we also must specify a render pass during framebuffer creation. But we may use a framebuffer not only with the specified render pass but also with all render passes that are compatible with it. Compatible render passes, in general, must have the same number of attachments, and corresponding attachments must have the same format and number of samples. Image layouts (initial, final, and for each subpass) may differ and do not affect render pass compatibility.

    After we have finished creating and filling the VkFramebufferCreateInfo structure, we call the vkCreateFramebuffer() function.

    The above code executes in a loop. A framebuffer references image views. Here the image view is created for each swap chain image. So for each swap chain image and its view, we are creating a framebuffer. We are doing this in order to simplify the code called in a rendering loop. In a normal, real-life scenario we wouldn’t (probably) create a framebuffer for each swap chain image. I assume that a better solution would be to render into a single image (texture) and after that use command buffers that would copy rendering results from that image into a given swap chain image. This way we will have only three simple command buffers that are connected with a swap chain. All other rendering commands would be independent of a swap chain, making it easier to maintain.
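
    For the record, copying the rendering results from such an intermediate image into a swap chain image could be done with a command similar to the sketch below. This is only an illustration: the command_buffer, render_target_image, and swap_chain_image variables are assumptions, both images would first have to be transitioned into the transfer source/destination layouts, and the 300 x 300 extent simply matches the dimensions used in this tutorial.

    VkImageCopy copy_region = {
      { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 },       // VkImageSubresourceLayers       srcSubresource
      { 0, 0, 0 },                                  // VkOffset3D                     srcOffset
      { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 },       // VkImageSubresourceLayers       dstSubresource
      { 0, 0, 0 },                                  // VkOffset3D                     dstOffset
      { 300, 300, 1 }                               // VkExtent3D                     extent
    };

    vkCmdCopyImage( command_buffer,
                    render_target_image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                    swap_chain_image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
                    1, &copy_region );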

    Creating a Graphics Pipeline

    Now we are ready to create a graphics pipeline. A pipeline is a collection of stages that process data one stage after another. In Vulkan there is currently a compute pipeline and a graphics pipeline. The compute pipeline allows us to perform some computational work, such as performing physics calculations for objects in games. The graphics pipeline is used for drawing operations.

    In OpenGL there are multiple programmable stages (vertex, tessellation, fragment shaders, and so on) and some fixed function stages (rasterizer, depth test, blending, and so on). In Vulkan, the situation is similar. There are similar (if not identical) stages. But the whole pipeline’s state is gathered in one monolithic object. OpenGL allows us to change the state that influences rendering operations anytime we want; we can change parameters for each stage (mostly) independently. We can set up shader programs, depth test, blending, and whatever state we want, and then we can render some objects. Next we can change just some small part of the state and render another object. In Vulkan, such operations can’t be done (we say that pipelines are “immutable”). We must prepare the whole state and set up parameters for pipeline stages and group them in a pipeline object. At the beginning this was one of the most startling pieces of information for me. I’m not able to change a shader program anytime I want? Why?

    The easiest and most convincing explanation is the performance implications of such state changes. Changing just one single state of the whole pipeline may cause the graphics hardware to perform many background operations like state and error checking. Different hardware vendors may implement (and usually do implement) such functionality differently. This may cause applications to perform differently (meaning unpredictably, performance-wise) when executed on different graphics hardware. So the ability to change anything at any time is convenient for developers. But, unfortunately, it is not so convenient for the hardware.

    That’s why in Vulkan the state of the whole pipeline is gathered in one single object. All the relevant state and error checking is performed when the pipeline object is created. When there are problems (like different parts of the pipeline being set up in an incompatible way), pipeline object creation fails. But we know that upfront. The driver doesn’t have to work around the problem for us and do whatever it can to use such a broken pipeline. It can immediately tell us about the problem. During real usage, in performance-critical parts of the application, everything is already set up correctly and can be used as is.

    The downside of this methodology is that we have to create multiple pipeline objects, multiple variations of pipeline objects when we are drawing many objects in a different way (some opaque, some semi-transparent, some with depth test enabled, others without). Unfortunately, even different shaders make us create different pipeline objects. If we want to draw objects using different shaders, we also have to create multiple pipeline objects, one for each combination of shader programs. Shaders are also connected with the whole pipeline state. They use different resources (like textures and buffers), render into different color attachments, and read from different attachments (possibly that were rendered into before). These connections must also be initialized, prepared, and set up correctly. We know what we want to do, the driver does not. So it is better and far more logical that we do it, not the driver. In general this approach makes sense.

    To begin the pipeline creation process, let’s start with shaders.

    Creating a Shader Module

    Creating a graphics pipeline requires us to prepare lots of data in the form of structures or even arrays of structures. The first such data is a collection of all shader stages and shader programs that will be used during rendering with a given graphics pipeline bound.

    In OpenGL, we write shaders in GLSL. They are compiled and then linked into shader programs directly in our application. We can use or stop using a shader program anytime we want in our application.

    Vulkan, on the other hand, accepts only a binary representation of shaders, an intermediate language called SPIR-V. We can’t provide GLSL code like we did in OpenGL. But there is an official, separate compiler that can transform shaders written in GLSL into the binary SPIR-V language. This has to be done offline, before the application runs. After we prepare the SPIR-V assembly we can create a shader module from it. Such modules are then composed into an array of VkPipelineShaderStageCreateInfo structures, which are used, among other parameters, to create a graphics pipeline.

    Here’s the code that creates a shader module from a specified file that contains a binary SPIR-V.

    const std::vector<char> code = Tools::GetBinaryFileContents( filename );
    if( code.size() == 0 ) {
      return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>();
    }
    
    VkShaderModuleCreateInfo shader_module_create_info = {
      VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,    // VkStructureType                sType
      nullptr,                                        // const void                    *pNext
      0,                                              // VkShaderModuleCreateFlags      flags
      code.size(),                                    // size_t                         codeSize
      reinterpret_cast<const uint32_t*>(&code[0])     // const uint32_t                *pCode
    };
    
    VkShaderModule shader_module;
    if( vkCreateShaderModule( GetDevice(), &shader_module_create_info, nullptr, &shader_module ) != VK_SUCCESS ) {
      printf( "Could not create shader module from a %s file!\n", filename );
      return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>();
    }
    
    return Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule>( shader_module, vkDestroyShaderModule, GetDevice() );

    6. Tutorial03.cpp, function CreateShaderModule()

    First we prepare a VkShaderModuleCreateInfo structure that contains the following fields:

    • sType – Type of structure, in this example set to VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO.
    • pNext – Pointer not yet used.
    • flags – Parameter reserved for future use.
    • codeSize – Size in bytes of the code passed in pCode parameter.
    • pCode – Pointer to an array with source code (binary SPIR-V assembly).

    To acquire the contents of the file, I have prepared a simple utility function GetBinaryFileContents() that reads the entire contents of a specified file. It returns the content in a vector of chars.
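
    The exact implementation lives in Tools.cpp and may differ, but a minimal sketch of such a helper, assuming nothing more than the standard library, could look like this:

    #include <cstdio>
    #include <fstream>
    #include <string>
    #include <vector>

    std::vector<char> GetBinaryFileContents( std::string const &filename ) {
      // Open in binary mode at the end of the file so tellg() reports the size
      std::ifstream file( filename, std::ios::binary | std::ios::ate );
      if( file.fail() ) {
        printf( "Could not open the \"%s\" file!\n", filename.c_str() );
        return std::vector<char>();
      }

      size_t size = static_cast<size_t>( file.tellg() );
      file.seekg( 0, std::ios::beg );

      std::vector<char> contents( size );
      file.read( contents.data(), static_cast<std::streamsize>( size ) );
      if( file.fail() ) {
        printf( "Could not read the \"%s\" file!\n", filename.c_str() );
        return std::vector<char>();
      }
      return contents;
    }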

    After we prepare a structure, we can call the vkCreateShaderModule() function and check whether everything went fine.

    The AutoDeleter<> class from Tools namespace is a helper class that wraps a given Vulkan object handle and takes a function that is called to delete that object. This class is similar to smart pointers, which delete the allocated memory when the object (the smart pointer) goes out of scope. AutoDeleter<> class takes the handle of a given object and deletes it with a provided function when the object of this class’s type goes out of scope.

    template<class T, class F>
    class AutoDeleter {
    public:
      AutoDeleter() :
        Object( VK_NULL_HANDLE ),
        Deleter( nullptr ),
        Device( VK_NULL_HANDLE ) {
      }
    
      AutoDeleter( T object, F deleter, VkDevice device ) :
        Object( object ),
        Deleter( deleter ),
        Device( device ) {
      }
    
      AutoDeleter( AutoDeleter&& other ) {
        *this = std::move( other );
      }
    
      ~AutoDeleter() {
        if( (Object != VK_NULL_HANDLE) && (Deleter != nullptr) && (Device != VK_NULL_HANDLE) ) {
          Deleter( Device, Object, nullptr );
        }
      }
    
      AutoDeleter& operator=( AutoDeleter&& other ) {
        if( this != &other ) {
          Object = other.Object;
          Deleter = other.Deleter;
          Device = other.Device;
          other.Object = VK_NULL_HANDLE;
        }
        return *this;
      }
    
      T Get() {
        return Object;
      }
    
      bool operator !() const {
        return Object == VK_NULL_HANDLE;
      }
    
    private:
      AutoDeleter( const AutoDeleter& );
      AutoDeleter& operator=( const AutoDeleter& );
      T         Object;
      F         Deleter;
      VkDevice  Device;
    };

    7. Tools.h

    Why so much effort for one simple object? Shader modules are one of the objects required to create the graphics pipeline. But after the pipeline is created, we don’t need these shader modules anymore. Sometimes it is convenient to keep them as we may need to create additional, similar pipelines. But in this example they may be safely destroyed after we create a graphics pipeline. Shader modules are destroyed by calling the vkDestroyShaderModule() function. But in the example, we would need to call this function in many places: inside multiple “ifs” and at the end of the whole function. Because I don’t want to remember where I need to call this function and, at the same time, I don’t want any memory leaks to occur, I have prepared this simple class just for convenience. Now, I don’t have to remember to delete the created shader module because it will be deleted automatically.

    Preparing a Description of the Shader Stages

    Now that we know how to create and destroy shader modules, we can create the data for the shader stages composing our graphics pipeline. As I have written, the data that describes which shader stages should be active when a given graphics pipeline is bound takes the form of an array with elements of type VkPipelineShaderStageCreateInfo. Here is the code that creates shader modules and prepares such an array:

    Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule> vertex_shader_module = CreateShaderModule( "Data03/vert.spv" );
    Tools::AutoDeleter<VkShaderModule, PFN_vkDestroyShaderModule> fragment_shader_module = CreateShaderModule( "Data03/frag.spv" );
    
    if( !vertex_shader_module || !fragment_shader_module ) {
      return false;
    }
    
    std::vector<VkPipelineShaderStageCreateInfo> shader_stage_create_infos = {
      // Vertex shader
      {
        VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,        // VkStructureType                                sType
        nullptr,                                                    // const void                                    *pNext
        0,                                                          // VkPipelineShaderStageCreateFlags               flags
        VK_SHADER_STAGE_VERTEX_BIT,                                 // VkShaderStageFlagBits                          stage
        vertex_shader_module.Get(),                                 // VkShaderModule                                 module
        "main",                                                     // const char                                    *pName
        nullptr                                                     // const VkSpecializationInfo                    *pSpecializationInfo
      },
      // Fragment shader
      {
        VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,        // VkStructureType                                sType
        nullptr,                                                    // const void                                    *pNext
        0,                                                          // VkPipelineShaderStageCreateFlags               flags
        VK_SHADER_STAGE_FRAGMENT_BIT,                               // VkShaderStageFlagBits                          stage
        fragment_shader_module.Get(),                               // VkShaderModule                                 module
        "main",                                                     // const char                                    *pName
        nullptr                                                     // const VkSpecializationInfo                    *pSpecializationInfo
      }
    };

    8. Tutorial03.cpp, function CreatePipeline()

    At the beginning we are creating two shader modules for vertex and fragment stages. They are created with the function presented earlier. When any error occurs and we return from the CreatePipeline() function, any created module is deleted automatically by a wrapper class with a provided deleter function.

    The code for the shader modules is read from files that contain the binary SPIR-V assembly. These files are generated with an application called “glslangValidator”. This is a tool distributed officially with the Vulkan SDK and is designed to validate GLSL shaders. But “glslangValidator” also has the capability to compile or rather transform GLSL shaders into SPIR-V binary files. A full explanation of the command line for its usage can be found at the official SDK site. I’ve used the following commands to generate SPIR-V shaders for this tutorial:

    glslangValidator.exe -V -H shader.vert > vert.spv.txt

    glslangValidator.exe -V -H shader.frag > frag.spv.txt

    “glslangValidator” takes a specified file and generates a SPIR-V file from it. The type of shader stage is automatically detected from the input file’s extension (“.vert” for vertex shaders, “.geom” for geometry shaders, and so on). The name of the generated file can be specified, but by default it takes the form “<stage>.spv”. So in our example “vert.spv” and “frag.spv” files will be generated.

    SPIR-V files have a binary format, so they may be hard to read and analyze—but not impossible. When the “-H” option is used, “glslangValidator” outputs SPIR-V in a form that can be read more easily. This form is printed on standard output, which is why I’m using the “> *.spv.txt” redirection operator.

    Here are the contents of a “shader.vert” file from which SPIR-V assembly was generated for the vertex stage:

    #version 400
    
    void main() {
        vec2 pos[3] = vec2[3]( vec2(-0.7, 0.7), vec2(0.7, 0.7), vec2(0.0, -0.7) );
        gl_Position = vec4( pos[gl_VertexIndex], 0.0, 1.0 );
    }

    9. shader.vert

    As you can see I have hardcoded the positions of all vertices used to render the triangle. They are indexed using the Vulkan-specific “gl_VertexIndex” built-in variable. In the simplest scenario, when using non-indexed drawing commands (which takes place here) this value starts from the value of the “firstVertex” parameter of a drawing command (zero in the provided example).
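
    For reference, a non-indexed drawing command that produces these three vertex indices is a single vkCmdDraw() call. The sketch below assumes a command buffer in the recording state with the graphics pipeline already bound; the command_buffer variable name is just an assumption:

    // 3 vertices, 1 instance, firstVertex = 0, firstInstance = 0,
    // so gl_VertexIndex takes the values 0, 1, and 2
    vkCmdDraw( command_buffer, 3, 1, 0, 0 );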

    This is the disputable part I wrote about earlier—this approach is acceptable and valid but not quite convenient to maintain and also allows us to skip some of the “structure filling” needed to create the graphics pipeline. I’ve chosen it in order to shorten and simplify this tutorial as much as possible. In the next tutorial, I will present a more typical way of drawing any number of vertices, similar to using vertex arrays and indices in OpenGL.

    Below is the source code of the fragment shader from the “shader.frag” file that was used to generate the SPIR-V assembly for the fragment stage:

    #version 400
    
    layout(location = 0) out vec4 out_Color;
    
    void main() {
      out_Color = vec4( 0.0, 0.4, 1.0, 1.0 );
    }

    10. shader.frag

    In Vulkan shaders (when transforming from GLSL to SPIR-V) layout qualifiers are required. Here we specify into which output (color) attachment we want to store the color values generated by the fragment shader. Because we are using only one attachment, we must specify the first available location (zero).

    Now that you know how to prepare shaders for applications using Vulkan, we can move on to the next step. After we have created two shader modules, we check whether these operations succeeded. If they did we can start preparing a description of all shader stages that will constitute our graphics pipeline.

    For each enabled shader stage we need to prepare an instance of the VkPipelineShaderStageCreateInfo structure. An array of these structures, along with the number of its elements, is used in the graphics pipeline create info structure (provided to the function that creates the graphics pipeline). The VkPipelineShaderStageCreateInfo structure has the following fields:

    • sType – Type of structure that we are preparing, which in this case must be equal to VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO.
    • pNext – Pointer reserved for extensions.
    • flags – Parameter reserved for future use.
    • stage – Type of shader stage we are describing (like vertex, tessellation control, and so on).
    • module – Handle to a shader module that contains the shader for a given stage.
    • pName – Name of the entry point of the provided shader.
    • pSpecializationInfo – Pointer to a VkSpecializationInfo structure, which we will leave for now and set to null.

    When we are creating a graphics pipeline, we don’t create too many (Vulkan) objects. Most of the data is presented in the form of just such structures.

    Preparing Description of a Vertex Input

    Now we must provide a description of the input data used for drawing. This is similar to OpenGL’s vertex data: attributes, number of components, buffers from which to take data, data’s stride, or step rate. In Vulkan this data is of course prepared in a different way, but in general the meaning is the same. Fortunately, because the vertex data is hardcoded into the vertex shader in this tutorial, we can almost entirely skip this step and fill the VkPipelineVertexInputStateCreateInfo structure mostly with nulls and zeros:

    VkPipelineVertexInputStateCreateInfo vertex_input_state_create_info = {
      VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO,    // VkStructureType                                sType
      nullptr,                                                      // const void                                    *pNext
      0,                                                            // VkPipelineVertexInputStateCreateFlags          flags;
      0,                                                            // uint32_t                                       vertexBindingDescriptionCount
      nullptr,                                                      // const VkVertexInputBindingDescription         *pVertexBindingDescriptions
      0,                                                            // uint32_t                                       vertexAttributeDescriptionCount
      nullptr                                                       // const VkVertexInputAttributeDescription       *pVertexAttributeDescriptions
    };

    11. Tutorial03.cpp, function CreatePipeline()

    But for clarity here is a description of the members of the VkPipelineVertexInputStateCreateInfo structure:

    • sType – Type of structure, VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO here.
    • pNext – Pointer to an extension-specific structure.
    • flags – Parameter reserved for future use.
    • vertexBindingDescriptionCount – Number of elements in the pVertexBindingDescriptions array.
    • pVertexBindingDescriptions – Array with elements describing input vertex data (stride and stepping rate).
    • vertexAttributeDescriptionCount – Number of elements in the pVertexAttributeDescriptions array.
    • pVertexAttributeDescriptions – Array with elements describing vertex attributes (location, format, offset).

    Preparing the Description of an Input Assembly

    The next step requires us to describe how vertices should be assembled into primitives. As with OpenGL, we must specify what topology we want to use: points, lines, triangles, triangle fan, and so on.

    VkPipelineInputAssemblyStateCreateInfo input_assembly_state_create_info = {
      VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO,  // VkStructureType                                sType
      nullptr,                                                      // const void                                    *pNext
      0,                                                            // VkPipelineInputAssemblyStateCreateFlags        flags
      VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST,                          // VkPrimitiveTopology                            topology
      VK_FALSE                                                      // VkBool32                                       primitiveRestartEnable
    };

    12. Tutorial03.cpp, function CreatePipeline()

    We do that through the VkPipelineInputAssemblyStateCreateInfo structure, which contains the following members:

    • sType – Structure type set here to VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO.
    • pNext – Pointer not yet used.
    • flags – Parameter reserved for future use.
    • topology – Parameter describing how vertices will be organized to form a primitive.
    • primitiveRestartEnable – Parameter that tells whether a special index value (when indexed drawing is performed) restarts assembly of a given primitive.

    Preparing the Viewport’s Description

    We have finished dealing with input data. Now we must specify the form of the output data, that is, all the parts of the graphics pipeline that are connected with fragments, like rasterization, window (viewport), depth tests, and so on. The first set of data we must prepare here is the state of the viewport, which specifies to what part of the image (or texture, or window) we want to draw.

    VkViewport viewport = {
      0.0f,                                                         // float                                          x
      0.0f,                                                         // float                                          y
      300.0f,                                                       // float                                          width
      300.0f,                                                       // float                                          height
      0.0f,                                                         // float                                          minDepth
      1.0f                                                          // float                                          maxDepth
    };
    
    VkRect2D scissor = {
      {                                                             // VkOffset2D                                     offset
        0,                                                            // int32_t                                        x
        0                                                             // int32_t                                        y
      },
      {                                                             // VkExtent2D                                     extent
        300,                                                          // int32_t                                        width
        300                                                           // int32_t                                        height
      }
    };
    
    VkPipelineViewportStateCreateInfo viewport_state_create_info = {
      VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO,        // VkStructureType                                sType
      nullptr,                                                      // const void                                    *pNext
      0,                                                            // VkPipelineViewportStateCreateFlags             flags
      1,                                                            // uint32_t                                       viewportCount
      &viewport,                                                    // const VkViewport                              *pViewports
      1,                                                            // uint32_t                                       scissorCount
      &scissor                                                      // const VkRect2D                                *pScissors
    };

    13.Tutorial03.cpp, function CreatePipeline()

    In this example, the usage is simple: we just set the viewport coordinates to some predefined values. I don’t check the size of the swap chain image we are rendering into. But remember that in real-life production applications this has to be done because the specification states that dimensions of the viewport cannot exceed the dimensions of the attachments that we are rendering into.
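    As a hedged illustration only (the accessor name GetSwapChain().Extent is assumed here and is not part of the tutorial's code), a production-style version would derive the viewport and scissor from the swap chain extent instead of hard-coding 300 x 300:

    // Sketch only: size the viewport and scissor to the swap chain extent.
    VkExtent2D swap_chain_extent = GetSwapChain().Extent;            // assumed accessor

    VkViewport viewport = {
      0.0f,                                           // float      x
      0.0f,                                           // float      y
      static_cast<float>(swap_chain_extent.width),    // float      width
      static_cast<float>(swap_chain_extent.height),   // float      height
      0.0f,                                           // float      minDepth
      1.0f                                            // float      maxDepth
    };

    VkRect2D scissor = {
      { 0, 0 },                                       // VkOffset2D offset
      swap_chain_extent                               // VkExtent2D extent
    };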

    To specify the viewport’s parameters, we fill the VkViewport structure that contains these fields:

    • x – Left side of the viewport.
    • y – Upper side of the viewport.
    • width – Width of the viewport.
    • height – Height of the viewport.
    • minDepth – Minimal depth value used for depth calculations.
    • maxDepth – Maximal depth value used for depth calculations.

    When specifying viewport coordinates, remember that the origin is different than in OpenGL. Here we specify the upper-left corner of the viewport (not the lower left).

    Also worth noting is that the minDepth and maxDepth values must be between 0.0 and 1.0 (inclusive) but maxDepth can be lower than minDepth. This will cause the depth to be calculated in “reverse.”

    Next we must specify the parameters for the scissor test. The scissor test, similarly to OpenGL, restricts generation of fragments only to the specified rectangular area. But in Vulkan, the scissor test is always enabled and can’t be turned off. We can just provide the values identical to the ones provided for viewport. Try changing these values and see how it influences the generated image.

    The scissor test doesn’t have a dedicated structure. To provide data for it we fill the VkRect2D structure which contains two similar structure members. First is VkOffset2D with the following members:

    • x – Left side of the rectangular area used for scissor test
    • y – Upper side of the scissor area

    The second member is of type VkExtent2D, which contains the following fields:

    • width – Width of the scissor rectangular area
    • height – Height of the scissor area

    In general, the meaning of the data we provide for the scissor test through the VkRect2D structure is similar to the data prepared for viewport.

    After we have finished preparing data for viewport and the scissor test, we can finally fill the structure that is used during pipeline creation. The structure is called VkPipelineViewportStateCreateInfo, and it contains the following fields:

    • sType – Type of the structure, VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO here.
    • pNext – Pointer reserved for extensions.
    • flags – Parameter reserved for future use.
    • viewportCount – Number of elements in the pViewports array.
    • pViewports – Array with elements describing parameters of viewports used when the given pipeline is bound.
    • scissorCount – Number of elements in the pScissors array.
    • pScissors – Array with elements describing parameters of the scissor test for each viewport.

    Remember that the viewportCount and scissorCount parameters must be equal. We are also allowed to specify more viewports, but then the multiViewport feature must also be enabled.

    Preparing the Rasterization State’s Description

    The next part of the graphics pipeline creation applies to the rasterization state. We must specify how polygons are going to be rasterized (changed into fragments), which means whether we want fragments to be generated for whole polygons or just their edges (polygon mode) or whether we want to see the front or back side or maybe both sides of the polygon (face culling). We can also provide depth bias parameters or indicate whether we want to enable depth clamp. This whole state is encapsulated into VkPipelineRasterizationStateCreateInfo. It contains the following members:

    • sType – Structure type, VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO in this example.
    • pNext – Pointer reserved for extensions.
    • flags – Parameter reserved for future use.
    • depthClampEnable – Parameter describing whether we want to clamp depth values of the rasterized primitive to the frustum (when true) or if we want normal clipping to occur (false).
    • rasterizerDiscardEnable – Deactivates fragment generation (primitives are discarded right before rasterization, which effectively turns off the fragment shader).
    • polygonMode – Controls how the fragments are generated for a given primitive (triangle mode): whether they are generated for the whole triangle, only its edges, or just its vertices.
    • cullMode – Chooses the triangle’s face used for culling (if enabled).
    • frontFace – Chooses which side of a triangle should be considered the front (depending on the winding order).
    • depthBiasEnable – Enables or disables biasing of fragments’ depth values.
    • depthBiasConstantFactor – Constant factor added to each fragment’s depth value when biasing is enabled.
    • depthBiasClamp – Maximum (or minimum) bias value that can be applied to a fragment’s depth.
    • depthBiasSlopeFactor – Factor applied to a fragment’s slope in depth bias calculations when biasing is enabled.
    • lineWidth – Width of rasterized lines.

    Here is the source code responsible for setting rasterization state in our example:

    VkPipelineRasterizationStateCreateInfo rasterization_state_create_info = {
      VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO,   // VkStructureType                                sType
      nullptr,                                                      // const void                                    *pNext
      0,                                                            // VkPipelineRasterizationStateCreateFlags        flags
      VK_FALSE,                                                     // VkBool32                                       depthClampEnable
      VK_FALSE,                                                     // VkBool32                                       rasterizerDiscardEnable
      VK_POLYGON_MODE_FILL,                                         // VkPolygonMode                                  polygonMode
      VK_CULL_MODE_BACK_BIT,                                        // VkCullModeFlags                                cullMode
      VK_FRONT_FACE_COUNTER_CLOCKWISE,                              // VkFrontFace                                    frontFace
      VK_FALSE,                                                     // VkBool32                                       depthBiasEnable
      0.0f,                                                         // float                                          depthBiasConstantFactor
      0.0f,                                                         // float                                          depthBiasClamp
      0.0f,                                                         // float                                          depthBiasSlopeFactor
      1.0f                                                          // float                                          lineWidth
    };

    14.Tutorial03.cpp, function CreatePipeline()

    In the tutorial we are disabling as many parameters as possible to simplify the process, the code itself, and the rendering operations. The parameters that matter here set up the (typical) fill mode for polygon rasterization, back-face culling, and counterclockwise front faces, similar to OpenGL’s defaults. Depth biasing and clamping are also disabled. To enable depth clamping, we first need to enable a dedicated feature during logical device creation; the same goes for polygon modes other than “fill” (a sketch of enabling these features follows).
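    To show what enabling those features might look like, here is a minimal, hedged sketch of requesting them during logical device creation. Only the feature-related part is shown; the rest of the VkDeviceCreateInfo setup is assumed to be the one described in the earlier tutorials.

    // Sketch only: request features needed for depth clamping and non-fill polygon modes.
    // Availability should first be checked with vkGetPhysicalDeviceFeatures().
    VkPhysicalDeviceFeatures device_features = {};   // everything disabled by default
    device_features.depthClamp       = VK_TRUE;      // allows depthClampEnable = VK_TRUE
    device_features.fillModeNonSolid = VK_TRUE;      // allows VK_POLYGON_MODE_LINE / _POINT

    VkDeviceCreateInfo device_create_info = {};
    device_create_info.sType            = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
    // ... queue and extension setup omitted ...
    device_create_info.pEnabledFeatures = &device_features;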

    Setting the Multisampling State’s Description

    In Vulkan, when we are creating a graphics pipeline, we must also specify the state relevant to multisampling. This is done using the VkPipelineMultisampleStateCreateInfo structure. Here are its members:

    • sType – Type of structure, VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO here.
    • pNext – Pointer reserved for extensions.
    • flags – Parameter reserved for future use.
    • rasterizationSamples – Number of per pixel samples used in rasterization.
    • sampleShadingEnable – Parameter specifying that shading should occur per sample (when enabled) instead of per fragment (when disabled).
    • minSampleShading – Specifies the minimum fraction of samples for which shading must be performed independently when sample shading is enabled.
    • pSampleMask – Pointer to an array of static coverage sample masks; this can be null.
    • alphaToCoverageEnable – Controls whether the fragment’s alpha value should be used for coverage calculations.
    • alphaToOneEnable – Controls whether the fragment’s alpha value should be replaced with one.

    In this example, I wanted to minimize possible problems, so I’ve set parameters to values that generally disable multisampling: just one sample per pixel with the other options turned off. Remember that if we want to enable sample shading or alpha to one, we also need to enable the two respective device features. Here is the source code that prepares the VkPipelineMultisampleStateCreateInfo structure:

    VkPipelineMultisampleStateCreateInfo multisample_state_create_info = {
      VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,     // VkStructureType                                sType
      nullptr,                                                      // const void                                    *pNext
      0,                                                            // VkPipelineMultisampleStateCreateFlags          flags
      VK_SAMPLE_COUNT_1_BIT,                                        // VkSampleCountFlagBits                          rasterizationSamples
      VK_FALSE,                                                     // VkBool32                                       sampleShadingEnable
      1.0f,                                                         // float                                          minSampleShading
      nullptr,                                                      // const VkSampleMask                            *pSampleMask
      VK_FALSE,                                                     // VkBool32                                       alphaToCoverageEnable
      VK_FALSE                                                      // VkBool32                                       alphaToOneEnable
    };

    15.Tutorial03.cpp, function CreatePipeline()

    Setting the Blending State’s Description

    Another thing we need to prepare when creating a graphics pipeline is a blending state (which also includes logical operations).

    VkPipelineColorBlendAttachmentState color_blend_attachment_state = {
      VK_FALSE,                                                     // VkBool32                                       blendEnable
      VK_BLEND_FACTOR_ONE,                                          // VkBlendFactor                                  srcColorBlendFactor
      VK_BLEND_FACTOR_ZERO,                                         // VkBlendFactor                                  dstColorBlendFactor
      VK_BLEND_OP_ADD,                                              // VkBlendOp                                      colorBlendOp
      VK_BLEND_FACTOR_ONE,                                          // VkBlendFactor                                  srcAlphaBlendFactor
      VK_BLEND_FACTOR_ZERO,                                         // VkBlendFactor                                  dstAlphaBlendFactor
      VK_BLEND_OP_ADD,                                              // VkBlendOp                                      alphaBlendOp
      VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT |         // VkColorComponentFlags                          colorWriteMask
      VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT
    };
    
    VkPipelineColorBlendStateCreateInfo color_blend_state_create_info = {
      VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO,     // VkStructureType                                sType
      nullptr,                                                      // const void                                    *pNext
      0,                                                            // VkPipelineColorBlendStateCreateFlags           flags
      VK_FALSE,                                                     // VkBool32                                       logicOpEnable
      VK_LOGIC_OP_COPY,                                             // VkLogicOp                                      logicOp
      1,                                                            // uint32_t                                       attachmentCount
      &color_blend_attachment_state,                                // const VkPipelineColorBlendAttachmentState     *pAttachments
      { 0.0f, 0.0f, 0.0f, 0.0f }                                    // float                                          blendConstants[4]
    };

    16.Tutorial03.cpp, function CreatePipeline()

    Final color operations are set up through the VkPipelineColorBlendStateCreateInfo structure. It contains the following fields:

    • sType – Type of the structure, set to VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO in this example.
    • pNext – Pointer reserved for future, extension-specific use.
    • flags – Parameter also reserved for future use.
    • logicOpEnable – Indicates whether we want to enable logical operations on pixels.
    • logicOp – Type of the logical operation we want to perform (like copy, clear, and so on).
    • attachmentCount – Number of elements in the pAttachments array.
    • pAttachments – Array containing state parameters for each color attachment used in a subpass for which the given graphics pipeline is bound.
    • blendConstants – Four-element array with color value used in blending operation (when a dedicated blend factor is used).

    More information is needed for the attachmentCount and pAttachments parameters. When we perform drawing operations, the most important objects involved are the graphics pipeline, the render pass, and the framebuffer. The graphics card needs to know how to draw (the graphics pipeline, which describes the rendering state, shaders, tests, and so on) and where to draw (the render pass gives the general setup; the framebuffer specifies exactly which images are used). As I have already mentioned, the render pass specifies how operations are ordered, what the dependencies are, when we are rendering into a given attachment, and when we are reading from the same attachment. These stages take the form of subpasses. For each drawing operation we can (but don’t have to) use a different pipeline. But when we are drawing, we are always drawing into a set of attachments. This set is defined in the render pass, which describes all color, input, and depth attachments (the framebuffer just specifies which images are used for each of them). For the blending state, we specify whether blending is enabled for each color attachment. This is done through the pAttachments array: each of its elements must correspond to one color attachment defined in the render pass, so the attachmentCount (the number of elements in the pAttachments array) must equal the number of color attachments used in the subpass for which the given pipeline is bound.

    There is one more restriction. By default, all elements in the pAttachments array must be identical: blending (and color write masks) is performed in the same way for all attachments. So why is it an array? Why can’t we just specify one value? Because independent, per-attachment blending is an optional feature. Only when we enable the independent blending feature during device creation can we provide different values for each color attachment.

    Each pAttachments array’s element is of type VkPipelineColorBlendAttachmentState. It is a structure with the following members:

    • blendEnable – Indicates whether we want to enable blending at all.
    • srcColorBlendFactor – Blending factor for color of the source (incoming) fragment.
    • dstColorBlendFactor – Blending factor for the destination color (stored already in the framebuffer at the same location as the incoming fragment).
    • colorBlendOp – Type of operation to perform (multiplication, addition, and so on).
    • srcAlphaBlendFactor – Blending factor for the alpha value of the source (incoming) fragment.
    • dstAlphaBlendFactor – Blending factor for the destination alpha value (already stored in the framebuffer).
    • alphaBlendOp – Type of operation to perform for alpha blending.
    • colorWriteMask – Bitmask selecting which of the RGBA components are selected (enabled) for writing.

    In this example, we disable blending, which makes most of the other parameters irrelevant. The exception is colorWriteMask: here we select all components for writing, but you can freely check what happens when this parameter is changed to some other combination of R, G, B, and A.

    Creating a Pipeline Layout

    The final thing we must do before pipeline creation is create a proper pipeline layout. A pipeline layout describes all the resources that can be accessed by the pipeline. In this example we must specify how many textures can be used by shaders and which shader stages will have access to them. There are of course other resources involved. Apart from shader stages, we must also describe the types of resources (textures, buffers), their total numbers, and layout. This layout can be compared to OpenGL’s active textures and shader uniforms. In OpenGL we bind textures to the desired texture image units and for shader uniforms we don’t provide texture handles but IDs of the texture image units to which actual textures are bound (we provide the number of the unit which the given texture was associated with).

    With Vulkan, the situation is similar. We create some form of a memory layout: first there are two buffers, next we have three textures and an image. This memory “structure” is called a set and a collection of these sets is provided for the pipeline. In shaders, we access specified resources using specific memory “locations” from within these sets (layouts). This is done through a layout (set = X, binding = Y) specifier, which can be translated to: take the resource from the Y memory location from the X set.

    A pipeline layout can be thought of as an interface between shader stages and shader resources: it takes these groups of resources, describes how they are gathered, and provides them to the pipeline.
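    Descriptor sets and their layouts will get their own tutorial, but as a rough, hedged sketch, a single combined image sampler accessed in a fragment shader through layout(set = 0, binding = 0) could be described on the API side like this (none of this code is used in the current example):

    // Sketch only: a descriptor set layout with one combined image sampler at binding 0.
    VkDescriptorSetLayoutBinding layout_binding = {
      0,                                          // uint32_t             binding
      VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,  // VkDescriptorType     descriptorType
      1,                                          // uint32_t             descriptorCount
      VK_SHADER_STAGE_FRAGMENT_BIT,               // VkShaderStageFlags   stageFlags
      nullptr                                     // const VkSampler     *pImmutableSamplers
    };

    VkDescriptorSetLayoutCreateInfo descriptor_set_layout_create_info = {
      VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,  // VkStructureType                     sType
      nullptr,                                              // const void                         *pNext
      0,                                                    // VkDescriptorSetLayoutCreateFlags    flags
      1,                                                    // uint32_t                            bindingCount
      &layout_binding                                       // const VkDescriptorSetLayoutBinding *pBindings
    };

    VkDescriptorSetLayout descriptor_set_layout;
    if( vkCreateDescriptorSetLayout( GetDevice(), &descriptor_set_layout_create_info, nullptr, &descriptor_set_layout ) != VK_SUCCESS ) {
      // handle the error
    }

    Such a layout would then be referenced through the pSetLayouts member (with setLayoutCount set to 1) of the pipeline layout described below.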

    This process is complex and I plan to devote a tutorial to it. Here we are not using any additional resources so I present an example for creating an “empty” pipeline layout:

    VkPipelineLayoutCreateInfo layout_create_info = {
      VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO,  // VkStructureType                sType
      nullptr,                                        // const void                    *pNext
      0,                                              // VkPipelineLayoutCreateFlags    flags
      0,                                              // uint32_t                       setLayoutCount
      nullptr,                                        // const VkDescriptorSetLayout   *pSetLayouts
      0,                                              // uint32_t                       pushConstantRangeCount
      nullptr                                         // const VkPushConstantRange     *pPushConstantRanges
    };
    
    VkPipelineLayout pipeline_layout;
    if( vkCreatePipelineLayout( GetDevice(), &layout_create_info, nullptr, &pipeline_layout ) != VK_SUCCESS ) {
      printf( "Could not create pipeline layout!\n" );
      return Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout>();
    }
    
    return Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout>( pipeline_layout, vkDestroyPipelineLayout, GetDevice() );

    17.Tutorial03.cpp, function CreatePipelineLayout()

    To create a pipeline layout we must first prepare a variable of type VkPipelineLayoutCreateInfo. It contains the following fields:

    • sType – Type of structure, VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO in this example.
    • pNext – Parameter reserved for extensions.
    • flags – Parameter reserved for future use.
    • setLayoutCount – Number of descriptor sets included in this layout.
    • pSetLayouts – Pointer to an array containing descriptions of descriptor layouts.
    • pushConstantRangeCount – Number of push constant ranges (I will describe it in a later tutorial).
    • pPushConstantRanges – Array describing all push constant ranges used inside shaders (in a given pipeline).

    In this example we create an “empty” layout, so almost all the fields are set to null or zero.

    We are not using push constants here, but they deserve some explanation. Push constants in Vulkan allow us to modify the data of constant variables used in shaders. A special, small amount of memory is reserved for push constants. We update their values through Vulkan commands, not through memory updates, and updates of push constants’ values are expected to be faster than normal memory writes.
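    Although this tutorial doesn’t use them, a hedged sketch of how a push constant range could be declared and later updated looks roughly like this (the command_buffer and pipeline_layout variables are placeholders, and the values are illustrative):

    // Sketch only: a 16-byte push constant block visible to the vertex shader.
    VkPushConstantRange push_constant_range = {
      VK_SHADER_STAGE_VERTEX_BIT,   // VkShaderStageFlags   stageFlags
      0,                            // uint32_t             offset
      16                            // uint32_t             size
    };
    // In VkPipelineLayoutCreateInfo: pushConstantRangeCount = 1, pPushConstantRanges = &push_constant_range.

    // Later, while recording a command buffer:
    float color[4] = { 1.0f, 0.0f, 0.0f, 1.0f };
    vkCmdPushConstants( command_buffer, pipeline_layout, VK_SHADER_STAGE_VERTEX_BIT, 0, sizeof( color ), color );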

    In the CreatePipelineLayout() code shown above, I’m also wrapping the pipeline layout in an “AutoDeleter” object. Pipeline layouts are required during pipeline creation, descriptor set binding (enabling/activating this interface between shaders and shader resources), and push constant updates. None of these operations, except for pipeline creation, take place in this tutorial. So here, after we create the pipeline, we don’t need the layout anymore. To avoid memory leaks, I have used this helper class to destroy the layout as soon as we leave the function in which the graphics pipeline is created.

    Creating a Graphics Pipeline

    Now we have all the resources required to properly create a graphics pipeline. Here is the code that does that:

    Tools::AutoDeleter<VkPipelineLayout, PFN_vkDestroyPipelineLayout> pipeline_layout = CreatePipelineLayout();
    if( !pipeline_layout ) {
      return false;
    }
    
    VkGraphicsPipelineCreateInfo pipeline_create_info = {
      VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO,              // VkStructureType                                sType
      nullptr,                                                      // const void                                    *pNext
      0,                                                            // VkPipelineCreateFlags                          flags
      static_cast<uint32_t>(shader_stage_create_infos.size()),      // uint32_t                                       stageCount
      &shader_stage_create_infos[0],                                // const VkPipelineShaderStageCreateInfo         *pStages
      &vertex_input_state_create_info,                              // const VkPipelineVertexInputStateCreateInfo    *pVertexInputState
      &input_assembly_state_create_info,                            // const VkPipelineInputAssemblyStateCreateInfo  *pInputAssemblyState
      nullptr,                                                      // const VkPipelineTessellationStateCreateInfo   *pTessellationState
      &viewport_state_create_info,                                  // const VkPipelineViewportStateCreateInfo       *pViewportState
      &rasterization_state_create_info,                             // const VkPipelineRasterizationStateCreateInfo  *pRasterizationState
      &multisample_state_create_info,                               // const VkPipelineMultisampleStateCreateInfo    *pMultisampleState
      nullptr,                                                      // const VkPipelineDepthStencilStateCreateInfo   *pDepthStencilState
      &color_blend_state_create_info,                               // const VkPipelineColorBlendStateCreateInfo     *pColorBlendState
      nullptr,                                                      // const VkPipelineDynamicStateCreateInfo        *pDynamicState
      pipeline_layout.Get(),                                        // VkPipelineLayout                               layout
      Vulkan.RenderPass,                                            // VkRenderPass                                   renderPass
      0,                                                            // uint32_t                                       subpass
      VK_NULL_HANDLE,                                               // VkPipeline                                     basePipelineHandle
      -1                                                            // int32_t                                        basePipelineIndex
    };
    
    if( vkCreateGraphicsPipelines( GetDevice(), VK_NULL_HANDLE, 1, &pipeline_create_info, nullptr, &Vulkan.GraphicsPipeline ) != VK_SUCCESS ) {
      printf( "Could not create graphics pipeline!\n" );
      return false;
    }
    return true;

    18.Tutorial03.cpp, function CreatePipeline()

    First we create a pipeline layout wrapped in an object of type “AutoDeleter”. Next we fill the structure of type VkGraphicsPipelineCreateInfo. It contains many fields. Here is a brief description of them:

    • sType – Type of structure, VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO here.
    • pNext – Parameter reserved for future, extension-related use.
    • flags – This time this parameter is not reserved for future use but controls how the pipeline should be created: if we are creating a derivative pipeline (if we are inheriting from another pipeline) or if we allow creating derivative pipelines from this one. We can also disable optimizations, which should shorten the time needed to create a pipeline.
    • stageCount – Number of stages described in the pStages parameter; must be greater than zero.
    • pStages – Array with descriptions of active shader stages (the ones created using shader modules); each stage must be unique (we can’t specify a given stage more than once), and a vertex shader stage must be present.
    • pVertexInputState – Pointer to a variable containing the description of the vertex input’s state.
    • pInputAssemblyState – Pointer to a variable with input assembly description.
    • pTessellationState – Pointer to a description of the tessellation stages; can be null if tessellation is disabled.
    • pViewportState – Pointer to a variable specifying viewport parameters; can be null if rasterization is disabled.
    • pRasterizationState – Pointer to a variable specifying rasterization behavior.
    • pMultisampleState – Pointer to a variable defining multisampling; can be null if rasterization is disabled.
    • pDepthStencilState – Pointer to a description of depth/stencil parameters; this can be null in two situations: when rasterization is disabled or we’re not using depth/stencil attachments in a render pass.
    • pColorBlendState – Pointer to a variable with color blending/write masks state; can be null also in two situations: when rasterization is disabled or when we’re not using any color attachments inside the render pass.
    • pDynamicState – Pointer to a variable specifying which parts of the graphics pipeline can be set dynamically; can be null if the whole state is considered static (defined only through this create info structure).
    • layout – Handle to a pipeline layout object that describes resources accessed inside shaders.
    • renderPass – Handle to a render pass object; pipeline can be used with any render pass compatible with the provided one.
    • subpass – Number (index) of a subpass in which the pipeline will be used.
    • basePipelineHandle – Handle to a pipeline this one should derive from.
    • basePipelineIndex – Index of a pipeline this one should derive from.

    When we are creating a new pipeline, we can inherit some of the parameters from another one. This implies that both pipelines should have much in common; a good example is shader code. We don’t specify which fields are the same, but the hint that one pipeline derives from another may substantially accelerate pipeline creation. But why are there two fields for indicating a “parent” pipeline? We can only use one of them at a time. When we use a handle, the “parent” pipeline is already created and we derive from the pipeline whose handle we provide. But the pipeline creation function allows us to create many pipelines at once. Using the second parameter, the “parent” pipeline index, we can create both the “parent” and the “child” pipelines in the same call: we provide an array of graphics pipeline create info structures to the pipeline creation function, and “basePipelineIndex” is simply the index of the “parent” create info in this very array. We just have to remember that the “parent” pipeline must appear earlier in the array (must have a smaller index) and must be created with the “allow derivatives” flag set, as shown in the sketch below.
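    Here is a hedged sketch of that pattern; parent_create_info and child_create_info are assumed to be two already-filled structures like the one above:

    // Sketch only: create a "parent" pipeline and a "child" derived from it in one call.
    VkGraphicsPipelineCreateInfo pipeline_create_infos[2] = { parent_create_info, child_create_info };

    pipeline_create_infos[0].flags              = VK_PIPELINE_CREATE_ALLOW_DERIVATIVES_BIT;  // parent allows derivatives
    pipeline_create_infos[0].basePipelineIndex  = -1;                                        // parent derives from nothing
    pipeline_create_infos[1].flags              = VK_PIPELINE_CREATE_DERIVATIVE_BIT;         // child is a derivative
    pipeline_create_infos[1].basePipelineHandle = VK_NULL_HANDLE;                            // not deriving from an existing handle
    pipeline_create_infos[1].basePipelineIndex  = 0;                                         // derive from element 0 of this array

    VkPipeline pipelines[2];
    if( vkCreateGraphicsPipelines( GetDevice(), VK_NULL_HANDLE, 2, pipeline_create_infos, nullptr, pipelines ) != VK_SUCCESS ) {
      // handle the error
    }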

    In this example we are creating a pipeline with the state being entirely static (null for the “pDynamicState” parameter). But what is a dynamic state? To allow for some flexibility and to lower the number of created pipeline objects, the dynamic state was introduced. Through the “pDynamicState” parameter we can define which parts of the graphics pipeline can be set dynamically with additional Vulkan commands and which parts remain static, set once during pipeline creation. The dynamic state includes parameters such as viewports, line widths, blend constants, or some stencil parameters. If we specify that a given state is dynamic, the parameters in the pipeline create info structure that are related to that state are ignored, and we must set that state with the proper Vulkan commands during rendering, because the initial values of such state may be undefined. A short sketch of this mechanism is shown below.
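    Here is a hedged sketch of that mechanism (not used in this tutorial’s code): the viewport and scissor are marked dynamic at pipeline creation and then provided explicitly while recording each command buffer (command_buffer is a placeholder; viewport and scissor are the variables prepared earlier).

    // Sketch only: mark viewport and scissor as dynamic states.
    VkDynamicState dynamic_states[] = {
      VK_DYNAMIC_STATE_VIEWPORT,
      VK_DYNAMIC_STATE_SCISSOR
    };

    VkPipelineDynamicStateCreateInfo dynamic_state_create_info = {
      VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,  // VkStructureType                      sType
      nullptr,                                               // const void                          *pNext
      0,                                                     // VkPipelineDynamicStateCreateFlags    flags
      2,                                                     // uint32_t                             dynamicStateCount
      dynamic_states                                         // const VkDynamicState                *pDynamicStates
    };
    // Passed through the pDynamicState member of VkGraphicsPipelineCreateInfo.

    // While recording a command buffer, the values must then be set explicitly:
    vkCmdSetViewport( command_buffer, 0, 1, &viewport );
    vkCmdSetScissor( command_buffer, 0, 1, &scissor );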

    So after these quite overwhelming preparations we can create a graphics pipeline. This is done by calling the vkCreateGraphicsPipelines() function which, among other parameters, takes an array of pipeline create info structures. When everything goes well, VK_SUCCESS should be returned by this function and the handle of a graphics pipeline should be stored in the variable whose address we provided. Now we are ready to start drawing.

    Preparing Drawing Commands

    I introduced you to the concept of command buffers in the previous tutorial. Here I will briefly explain what they are and how to use them.

    Command buffers are containers for GPU commands. If we want to execute some job on a device, we do it through command buffers. This means that we must prepare a set of commands that process data (that is, draw something on the screen) and record these commands in command buffers. Then we can submit whole buffers to device’s queues. This submit operation tells the device: here is a bunch of things I want you to do for me and do them now.

    To record commands, we must first allocate command buffers. These are allocated from command pools, which can be thought of as memory chunks. If a command buffer needs to be larger (because we record many complicated commands in it), it can grow and use additional memory from the pool it was allocated from. So first we must create a command pool.

    Creating a Command Pool

    Command pool creation is simple and looks like this:

    VkCommandPoolCreateInfo cmd_pool_create_info = {
      VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,     // VkStructureType                sType
      nullptr,                                        // const void                    *pNext
      0,                                              // VkCommandPoolCreateFlags       flags
      queue_family_index                              // uint32_t                       queueFamilyIndex
    };
    
    if( vkCreateCommandPool( GetDevice(), &cmd_pool_create_info, nullptr, pool ) != VK_SUCCESS ) {
      return false;
    }
    return true;

    19.Tutorial03.cpp, function CreateCommandPool()

    First we prepare a variable of type VkCommandPoolCreateInfo. It contains the following fields:

    • sType – Standard type of structure, set to VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO here.
    • pNext – Pointer reserved for extensions.
    • flags – Indicates usage scenarios for command pool and command buffers allocated from it; that is, we can tell the driver that command buffers allocated from this pool will live for a short time; for no specific usage we can set it to zero.
    • queueFamilyIndex – Index of a queue family for which we are creating a command pool.

    Remember that command buffers allocated from a given pool can only be submitted to a queue from a queue family specified during pool creation.

    To create a command pool, we just call the vkCreateCommandPool() function.
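    If we did want different behavior, for example command buffers that are short lived or individually resettable, the flags member could be filled like in this hedged sketch (this tutorial keeps it at zero; queue_family_index is the same variable as in the listing above):

    // Sketch only: a pool whose command buffers are transient and individually resettable.
    VkCommandPoolCreateInfo cmd_pool_create_info = {
      VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,       // VkStructureType                sType
      nullptr,                                          // const void                    *pNext
      VK_COMMAND_POOL_CREATE_TRANSIENT_BIT |            // VkCommandPoolCreateFlags       flags
      VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT,
      queue_family_index                                // uint32_t                       queueFamilyIndex
    };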

    Allocating Command Buffers

    Now that we have the command pool ready, we can allocate command buffers from it.

    VkCommandBufferAllocateInfo command_buffer_allocate_info = {
      VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO, // VkStructureType                sType
      nullptr,                                        // const void                    *pNext
      pool,                                           // VkCommandPool                  commandPool
      VK_COMMAND_BUFFER_LEVEL_PRIMARY,                // VkCommandBufferLevel           level
      count                                           // uint32_t                       commandBufferCount
    };
    
    if( vkAllocateCommandBuffers( GetDevice(), &command_buffer_allocate_info, command_buffers ) != VK_SUCCESS ) {
      return false;
    }
    return true;

    20.Tutorial03.cpp, function AllocateCommandBuffers()

    To allocate command buffers, we once again fill a structure, this time of type VkCommandBufferAllocateInfo, which contains these members:

    • sType – Type of the structure; VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO for this purpose.
    • pNext – Pointer reserved for extensions.
    • commandPool – Pool from which we want our command buffers to take their memory.
    • level – Command buffer level; there are two levels: primary and secondary; right now we are only interested in primary command buffers.
    • commandBufferCount – Number of command buffers we want to allocate.

    To allocate command buffers, call the vkAllocateCommandBuffers() function and check whether it succeeded. We can allocate many buffers at once with one function call.

    I’ve prepared a simple buffer allocating function to show you how some Vulkan functions can be wrapped for easier use. Here is a usage of two such wrapper functions that create command pools and allocate command buffers from them.

    if( !CreateCommandPool( GetGraphicsQueue().FamilyIndex, &Vulkan.GraphicsCommandPool ) ) {
      printf( "Could not create command pool!\n" );
      return false;
    }
    
    uint32_t image_count = static_cast<uint32_t>(GetSwapChain().Images.size());
    Vulkan.GraphicsCommandBuffers.resize( image_count, VK_NULL_HANDLE );
    
    if( !AllocateCommandBuffers( Vulkan.GraphicsCommandPool, image_count, &Vulkan.GraphicsCommandBuffers[0] ) ) {
      printf( "Could not allocate command buffers!\n" );
      return false;
    }
    return true;

    21.Tutorial03.cpp, function CreateCommandBuffers()

    As you can see, we are creating a command pool for a graphics queue family index. All image state transitions and drawing operations will be performed on a graphics queue. Presentation is done on another queue (if the presentation queue is different from the graphics queue) but we don’t need a command buffer for this operation.

    We are also allocating command buffers for each swap chain image. Here we take the number of images and provide it to this simple “wrapper” function for command buffer allocation.

    Recording Command Buffers

    Now that we have command buffers allocated from the command pool, we can finally record operations that will draw something on the screen. First we must prepare a set of data needed for the recording operation. Some of this data is identical for all command buffers, but some references a specific swap chain image. Here is the code that is independent of the swap chain images:

    VkCommandBufferBeginInfo graphics_command_buffer_begin_info = {
      VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,    // VkStructureType                        sType
      nullptr,                                        // const void                            *pNext
      VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT,   // VkCommandBufferUsageFlags              flags
      nullptr                                         // const VkCommandBufferInheritanceInfo  *pInheritanceInfo
    };
    
    VkImageSubresourceRange image_subresource_range = {
      VK_IMAGE_ASPECT_COLOR_BIT,                      // VkImageAspectFlags             aspectMask
      0,                                              // uint32_t                       baseMipLevel
      1,                                              // uint32_t                       levelCount
      0,                                              // uint32_t                       baseArrayLayer
      1                                               // uint32_t                       layerCount
    };
    
    VkClearValue clear_value = {
      { 1.0f, 0.8f, 0.4f, 0.0f },                     // VkClearColorValue              color
    };
    
    const std::vector<VkImage>& swap_chain_images = GetSwapChain().Images;

    22.Tutorial03.cpp, function RecordCommandBuffers()

    Performing command buffer recording is similar to OpenGL’s drawing lists where we start recording a list by calling the glNewList() function. Next we prepare a set of drawing commands and then we close the list or stop recording it (glEndList()). So the first thing we need to do is to prepare a variable of type VkCommandBufferBeginInfo. It is used when we start recording a command buffer and it tells the driver about the type, contents, and desired usage of a command buffer. Variables of this type contain the following members:

    • sType – Standard structure type, here set to VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO.
    • pNext – Pointer reserved for extensions.
    • flags – Parameters describing the desired usage (that is, whether we want to submit this command buffer only once and then destroy/reset it, or whether it is possible that the buffer will be submitted again before the processing of its previous submission has finished).
    • pInheritanceInfo – Parameter used only when we want to record a secondary command buffer.

    Next we describe the areas or parts of our images that we will set up image memory barriers for. Here we set up barriers to specify that queues from different families will reference a given image. This is done through a variable of type VkImageSubresourceRange with the following members:

    • aspectMask – Describes a “type” of image, whether it is for color, depth, or stencil data.
    • baseMipLevel – Number of the first mipmap level on which our operations will be performed.
    • levelCount – Number of mipmap levels (including the base level) we will be operating on.
    • baseArrayLayer – Number of the first array layer of an image that will take part in the operations.
    • layerCount – Number of layers (including base layer) that will be modified.

    Next we set up a clear value for our images. Before drawing we need to clear the images. In previous tutorials, we performed this operation explicitly by ourselves. Here the images are cleared as a part of a render pass attachment load operation. We set the load operation to “clear,” so now we must specify the color to which an image must be cleared. This is done using a variable of type VkClearValue in which we provide R, G, B, A values.

    Variables we have created thus far are independent of an image itself, and that’s why we have specified them before a loop. Now we can start recording command buffers:

    for( size_t i = 0; i < Vulkan.GraphicsCommandBuffers.size(); ++i ) {
      vkBeginCommandBuffer( Vulkan.GraphicsCommandBuffers[i], &graphics_command_buffer_begin_info );
    
      if( GetPresentQueue().Handle != GetGraphicsQueue().Handle ) {
        VkImageMemoryBarrier barrier_from_present_to_draw = {
          VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,     // VkStructureType                sType
          nullptr,                                    // const void                    *pNext
          VK_ACCESS_MEMORY_READ_BIT,                  // VkAccessFlags                  srcAccessMask
          VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,       // VkAccessFlags                  dstAccessMask
          VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                  oldLayout
          VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,            // VkImageLayout                  newLayout
          GetPresentQueue().FamilyIndex,              // uint32_t                       srcQueueFamilyIndex
          GetGraphicsQueue().FamilyIndex,             // uint32_t                       dstQueueFamilyIndex
          swap_chain_images[i],                       // VkImage                        image
          image_subresource_range                     // VkImageSubresourceRange        subresourceRange
        };
        vkCmdPipelineBarrier( Vulkan.GraphicsCommandBuffers[i], VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, 0, 0, nullptr, 0, nullptr, 1, &barrier_from_present_to_draw );
      }
    
      VkRenderPassBeginInfo render_pass_begin_info = {
        VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,     // VkStructureType                sType
        nullptr,                                      // const void                    *pNext
        Vulkan.RenderPass,                            // VkRenderPass                   renderPass
        Vulkan.FramebufferObjects[i].Handle,          // VkFramebuffer                  framebuffer
        {                                             // VkRect2D                       renderArea
          {                                           // VkOffset2D                     offset
            0,                                          // int32_t                        x
            0                                           // int32_t                        y
          },
          {                                           // VkExtent2D                     extent
            300,                                        // uint32_t                       width
            300                                         // uint32_t                       height
          }
        },
        1,                                            // uint32_t                       clearValueCount
        &clear_value                                  // const VkClearValue            *pClearValues
      };
    
      vkCmdBeginRenderPass( Vulkan.GraphicsCommandBuffers[i], &render_pass_begin_info, VK_SUBPASS_CONTENTS_INLINE );
    
      vkCmdBindPipeline( Vulkan.GraphicsCommandBuffers[i], VK_PIPELINE_BIND_POINT_GRAPHICS, Vulkan.GraphicsPipeline );
    
      vkCmdDraw( Vulkan.GraphicsCommandBuffers[i], 3, 1, 0, 0 );
    
      vkCmdEndRenderPass( Vulkan.GraphicsCommandBuffers[i] );
    
      if( GetGraphicsQueue().Handle != GetPresentQueue().Handle ) {
        VkImageMemoryBarrier barrier_from_draw_to_present = {
          VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,       // VkStructureType              sType
          nullptr,                                      // const void                  *pNext
          VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,         // VkAccessFlags                srcAccessMask
          VK_ACCESS_MEMORY_READ_BIT,                    // VkAccessFlags                dstAccessMask
          VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,              // VkImageLayout                oldLayout
          VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,              // VkImageLayout                newLayout
          GetGraphicsQueue().FamilyIndex,               // uint32_t                     srcQueueFamilyIndex
          GetPresentQueue( ).FamilyIndex,               // uint32_t                     dstQueueFamilyIndex
          swap_chain_images[i],                         // VkImage                      image
          image_subresource_range                       // VkImageSubresourceRange      subresourceRange
        };
        vkCmdPipelineBarrier( Vulkan.GraphicsCommandBuffers[i], VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, 0, 0, nullptr, 0, nullptr, 1, &barrier_from_draw_to_present );
      }
    
      if( vkEndCommandBuffer( Vulkan.GraphicsCommandBuffers[i] ) != VK_SUCCESS ) {
        printf( "Could not record command buffer!\n" );
        return false;
      }
    }
    return true;

    23.Tutorial03.cpp, function RecordCommandBuffers()

    Recording a command buffer is started by calling the vkBeginCommandBuffer() function. At the beginning we set up a barrier that tells the driver that previously queues from one family referenced a given image but now queues from a different family will be referencing it (we need to do this because during swap chain creation we specified exclusive sharing mode). The barrier is set only when the graphics queue is different than the present queue. This is done by calling the vkCmdPipelineBarrier() function. We must specify when in the pipeline the barrier should be placed (VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT) and how the barrier should be set up. Barrier parameters are prepared through the VkImageMemoryBarrier structure:

    • sType – Type of the structure, here set to VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER.
    • pNext – Pointer reserved for extensions.
    • srcAccessMask – Type of memory operations that took place in regard to a given image before the barrier.
    • dstAccessMask – Type of memory operations connected with a given image that will take place after the barrier.
    • oldLayout – Current image memory layout.
    • newLayout – Memory layout the image should have after the barrier.
    • srcQueueFamilyIndex – Index of the queue family whose queues were referencing the image before the barrier.
    • dstQueueFamilyIndex – Index of the queue family whose queues will be referencing the image after the barrier.
    • image – Handle to the image itself.
    • subresourceRange – Parts of an image for which we want the transition to occur.

    In this example we don’t change the layout of an image, for two reasons: (1) The barrier may not be set at all (if the graphics and present queues are the same), and (2) the layout transition will be performed automatically as a render pass operation (at the beginning of the first—and only—subpass).

    Next we start a render pass. We call the vkCmdBeginRenderPass() function for which we must provide a pointer to a variable of VkRenderPassBeginInfo type. It contains the following members:

    • sType – Standard type of structure. In this case we must set it to a value of VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO.
    • pNext – Pointer reserved for future use.
    • renderPass – Handle of a render pass we want to start.
    • framebuffer – Handle of a framebuffer, which specifies images used as attachments for this render pass.
    • renderArea – Area of all images that will be affected by the operations that take place in this render pass. It specifies the upper-left corner (through the x and y parameters of the offset member) and the width and height (through the extent member) of the render area.
    • clearValueCount – Number of elements in pClearValues array.
    • pClearValues – Array with clear values for each attachment.

    When we specify a render area for the render pass, we must make sure that the rendering operations won’t modify pixels outside this area. This is just a hint for the driver so it can optimize its behavior. If we don’t confine operations to the provided area by using a proper scissor test, pixels outside this area may become undefined (we can’t rely on their contents). We also can’t specify a render area that is greater than the framebuffer’s dimensions (one that falls outside the framebuffer).

    The pClearValues array must contain an element for each render pass attachment. Each of its members specifies the color to which the given attachment is cleared when its loadOp is set to clear. For attachments whose loadOp is not clear, the provided values are ignored, but we still can’t provide an array with fewer elements than there are attachments.

    We have begun a command buffer, set a barrier (if necessary), and started a render pass. When we start a render pass we are also starting its first subpass. We can switch to the next subpass by calling the vkCmdNextSubpass() function. During these operations, layout transitions and clear operations may occur. Clears are done in a subpass in which the image is first used (referenced). Layout transitions occur each time a subpass layout is different than the layout in a previous subpass or (in the case of a first subpass or when the image is first referenced) different than the initial layout (layout before the render pass). So in our example when we start a render pass, the swap chain image’s layout is changed automatically from “presentation source” to a “color attachment optimal” layout.
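    To give a feel for how multiple subpasses would be recorded (our render pass has only one, so this is just a hedged sketch with placeholder variable names), the calls follow this pattern:

    // Sketch only: a render pass with two subpasses recorded inline.
    vkCmdBeginRenderPass( command_buffer, &render_pass_begin_info, VK_SUBPASS_CONTENTS_INLINE );
    // ... draw commands of subpass 0 ...
    vkCmdNextSubpass( command_buffer, VK_SUBPASS_CONTENTS_INLINE );
    // ... draw commands of subpass 1 ...
    vkCmdEndRenderPass( command_buffer );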

    Now we bind a graphics pipeline. This is done by calling the vkCmdBindPipeline() function. This “activates” all shader programs (similar to the glUseProgram() function) and sets desired tests, blending operations, and so on.

    After the pipeline is bound, we can finally draw something by calling the vkCmdDraw() function. In this function we specify the number of vertices we want to draw (three), the number of instances that should be drawn (just one), and the indices of the first vertex and first instance (both zero).

    Next the vkCmdEndRenderPass() function is called which, as the name suggests, ends the given render pass. Here all final layout transitions occur if the final layout specified for a render pass is different from the layout used in the last subpass the given image was referenced in.

    After that, the barrier may be set in which we tell the driver that the graphics queue finished using a given image and from now on the present queue will be using it. This is done, once again, only when the graphics and present queues are different. And after the barrier, we stop recording a command buffer for a given image. All these operations are repeated for each swap chain image.

    Drawing

    The drawing function is the same as the Draw() function presented in Tutorial 2. We acquire the image’s index, submit a proper command buffer, and present an image. We are using semaphores the same way they were used previously: one semaphore is used for acquiring an image, and it tells the graphics queue to wait when the image is not yet available for use. The second semaphore is used to signal that drawing on the graphics queue has finished; the present queue waits on this semaphore before it can present the image. Here is the source code of the Draw() function:

    VkSemaphore image_available_semaphore = GetImageAvailableSemaphore();
    VkSemaphore rendering_finished_semaphore = GetRenderingFinishedSemaphore();
    VkSwapchainKHR swap_chain = GetSwapChain().Handle;
    uint32_t image_index;
    
    VkResult result = vkAcquireNextImageKHR( GetDevice(), swap_chain, UINT64_MAX, image_available_semaphore, VK_NULL_HANDLE, &image_index );
    switch( result ) {
      case VK_SUCCESS:
      case VK_SUBOPTIMAL_KHR:
        break;
      case VK_ERROR_OUT_OF_DATE_KHR:
        return OnWindowSizeChanged();
      default:
        printf( "Problem occurred during swap chain image acquisition!\n" );
        return false;
    }
    
    VkPipelineStageFlags wait_dst_stage_mask = VK_PIPELINE_STAGE_TRANSFER_BIT;
    VkSubmitInfo submit_info = {
      VK_STRUCTURE_TYPE_SUBMIT_INFO,                // VkStructureType              sType
      nullptr,                                      // const void                  *pNext
      1,                                            // uint32_t                     waitSemaphoreCount
      &image_available_semaphore,                   // const VkSemaphore           *pWaitSemaphores
      &wait_dst_stage_mask,                         // const VkPipelineStageFlags  *pWaitDstStageMask
      1,                                            // uint32_t                     commandBufferCount
      &Vulkan.GraphicsCommandBuffers[image_index],  // const VkCommandBuffer       *pCommandBuffers
      1,                                            // uint32_t                     signalSemaphoreCount
      &rendering_finished_semaphore                 // const VkSemaphore           *pSignalSemaphores
    };
    
    if( vkQueueSubmit( GetGraphicsQueue().Handle, 1, &submit_info, VK_NULL_HANDLE ) != VK_SUCCESS ) {
      return false;
    }
    
    VkPresentInfoKHR present_info = {
      VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,           // VkStructureType              sType
      nullptr,                                      // const void                  *pNext
      1,                                            // uint32_t                     waitSemaphoreCount
      &rendering_finished_semaphore,                // const VkSemaphore           *pWaitSemaphores
      1,                                            // uint32_t                     swapchainCount
      &swap_chain,                                  // const VkSwapchainKHR        *pSwapchains
      &image_index,                                 // const uint32_t              *pImageIndices
      nullptr                                       // VkResult                    *pResults
    };
    result = vkQueuePresentKHR( GetPresentQueue().Handle, &present_info );
    
    switch( result ) {
      case VK_SUCCESS:
        break;
      case VK_ERROR_OUT_OF_DATE_KHR:
      case VK_SUBOPTIMAL_KHR:
        return OnWindowSizeChanged();
      default:
        printf( "Problem occurred during image presentation!\n" );
        return false;
    }
    
    return true;

    24.Tutorial03.cpp, function Draw()

    Tutorial 3 Execution

    In this tutorial we performed “real” drawing operations. A simple triangle may not sound too convincing, but it is a good starting point for a first Vulkan-created image. Here is what the triangle should look like:

    If you’re wondering why there are black parts in the image, here is an explanation: To simplify the whole code, we created a framebuffer with a fixed size (width and height of 300 pixels). But the window’s size (and the size of the swap chain images) may be greater than these 300 x 300 pixels. The parts of an image that lie outside of the framebuffer’s dimensions are uncleared and unmodified by our application. They may even contain some “artifacts,” because the memory from which the driver allocates the swap chain images may have been previously used for other purposes and could contain some data. The correct behavior is to create a framebuffer with the same size as the swap chain images and to recreate it when the window’s size changes. But as long as the blue triangle is rendered on an orange/gold background, the code works correctly.
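    Here is a hedged sketch of that correct approach, assuming the render pass and a swap chain image view are already available; the names GetSwapChain().Extent and swap_chain_image_view are illustrative and not part of this tutorial’s code.

    // Sketch only: a framebuffer sized to match the swap chain images.
    VkExtent2D swap_chain_extent = GetSwapChain().Extent;   // assumed accessor

    VkFramebufferCreateInfo framebuffer_create_info = {
      VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO,  // VkStructureType                sType
      nullptr,                                    // const void                    *pNext
      0,                                          // VkFramebufferCreateFlags       flags
      Vulkan.RenderPass,                          // VkRenderPass                   renderPass
      1,                                          // uint32_t                       attachmentCount
      &swap_chain_image_view,                     // const VkImageView             *pAttachments
      swap_chain_extent.width,                    // uint32_t                       width
      swap_chain_extent.height,                   // uint32_t                       height
      1                                           // uint32_t                       layers
    };

    VkFramebuffer framebuffer;
    if( vkCreateFramebuffer( GetDevice(), &framebuffer_create_info, nullptr, &framebuffer ) != VK_SUCCESS ) {
      // handle the error; the framebuffer would also need to be recreated whenever the window is resized
    }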

    Cleaning Up

    One last thing to learn before this tutorial ends is how to release resources created during this lesson. I won’t repeat the code needed to release resources created in the previous chapter. Just look at the VulkanCommon.cpp file. Here is the code needed to destroy resources specific to this chapter:

    if( GetDevice() != VK_NULL_HANDLE ) {
      vkDeviceWaitIdle( GetDevice() );
    
      if( (Vulkan.GraphicsCommandBuffers.size() > 0) && (Vulkan.GraphicsCommandBuffers[0] != VK_NULL_HANDLE) ) {
        vkFreeCommandBuffers( GetDevice(), Vulkan.GraphicsCommandPool, static_cast<uint32_t>(Vulkan.GraphicsCommandBuffers.size()), &Vulkan.GraphicsCommandBuffers[0] );
        Vulkan.GraphicsCommandBuffers.clear();
      }
    
      if( Vulkan.GraphicsCommandPool != VK_NULL_HANDLE ) {
        vkDestroyCommandPool( GetDevice(), Vulkan.GraphicsCommandPool, nullptr );
        Vulkan.GraphicsCommandPool = VK_NULL_HANDLE;
      }
    
      if( Vulkan.GraphicsPipeline != VK_NULL_HANDLE ) {
        vkDestroyPipeline( GetDevice(), Vulkan.GraphicsPipeline, nullptr );
        Vulkan.GraphicsPipeline = VK_NULL_HANDLE;
      }
    
      if( Vulkan.RenderPass != VK_NULL_HANDLE ) {
        vkDestroyRenderPass( GetDevice(), Vulkan.RenderPass, nullptr );
        Vulkan.RenderPass = VK_NULL_HANDLE;
      }
    
      for( size_t i = 0; i < Vulkan.FramebufferObjects.size(); ++i ) {
        if( Vulkan.FramebufferObjects[i].Handle != VK_NULL_HANDLE ) {
          vkDestroyFramebuffer( GetDevice(), Vulkan.FramebufferObjects[i].Handle, nullptr );
          Vulkan.FramebufferObjects[i].Handle = VK_NULL_HANDLE;
        }
    
        if( Vulkan.FramebufferObjects[i].ImageView != VK_NULL_HANDLE ) {
          vkDestroyImageView( GetDevice(), Vulkan.FramebufferObjects[i].ImageView, nullptr );
          Vulkan.FramebufferObjects[i].ImageView = VK_NULL_HANDLE;
        }
      }
      Vulkan.FramebufferObjects.clear();
    }

    25.Tutorial03.cpp, function ChildClear()

    As usual, we first check whether there is any device; if we don’t have a device, we couldn’t have created any of these resources. Next we wait until the device finishes its work and then delete all the created resources. We start by freeing the command buffers with the vkFreeCommandBuffers() function. Next we destroy the command pool through the vkDestroyCommandPool() function, and after that the graphics pipeline is destroyed with a vkDestroyPipeline() function call. Next we call the vkDestroyRenderPass() function, which releases the handle to the render pass. Finally, all framebuffers and image views associated with each swap chain image are deleted.

    Each object’s destruction is preceded by a check that the given resource was properly created; if it wasn’t, we skip destroying it.

    Conclusion

    In this tutorial, we created a render pass with one subpass. Next we created image views and framebuffers for each swap chain image. One of the most difficult parts was to create a graphics pipeline, because it required us to prepare lots of data. We had to create shader modules and describe all the shader stages that should be active when a given graphics pipeline is bound. We had to prepare information about input vertices, their layout, and assembling them into polygons. Viewport, rasterization, multisampling, and color blending information was also necessary. Then we created a simple pipeline layout and after that we could create the pipeline itself. Next we created a command pool and allocated command buffers for each swap chain image. Operations recorded in each command buffer involved setting up an image memory barrier, beginning a render pass, binding a graphics pipeline, and drawing. Next we ended a render pass and set up another image memory barrier. The drawing itself was performed the same way as in the previous tutorial (2).

    In the next tutorial, we will learn about vertex attributes, images, and buffers.


    Go to: API without Secrets: Introduction to Vulkan* Part 4: Vertex Attributes (To Be Continued)


    Notices

    No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

    Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

    This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

    The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

    Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.

    This sample source code is released under the Intel Sample Source Code License Agreement.

    Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

    *Other names and brands may be claimed as the property of others.

    © 2016 Intel Corporation.

    Enhance Media Performance & Quality with Intel's Media & OpenCL* SDKs - June 16 Webinar


    REGISTER NOW: 10 a.m., Pacific time

    Intel® Processor Graphics contains two types of media accelerators: fixed function codec/frame processing and execution units (EUs), used for general purpose compute. In this 1-hour webinar on June 16 at 10 a.m. (Pacific), learn how to more fully utilize these media accelerators by combining the Intel® Media SDK and Intel® SDK for OpenCL™ Applications for many tasks, including:

    • Applying video effects and filters
    • Accelerating computer vision pipelines
    • Improving encode/transcode quality

    June 16 Webinar Sign up NOW

    These two tools, both part of Intel® Media Server Studio, are better when used together. With just a few tips, tricks, and sharing APIs you can unlock the full heterogeneous potential of your hardware to create high performance custom pipelines. Then differentiate your media applications and solutions by combining fixed function operations with your own algorithms, to achieve disruptive performance beyond the standard Media SDK capabilities with the secret element that makes your products competitive and unique.

    In this session you will learn:

    • Big performance boosts are possible with Intel graphics processors (GPUs)
    • How to build media/graphics processing pipelines containing standard components, and customize with your algorithms and solutions
    • A short list of steps to share video surfaces efficiently between the Media SDK and OpenCL
    • How to combine Intel Media SDK and OpenCL to do many useful things utilizing Gen Graphics' rapidly increasing capabilities
    • And more

    Sign up today


    Webinar Speakers

    • Jeff McAllister - Media Software Technical Consulting Engineer
    • Robert Ioffe - Technical Consulting Engineer & OpenCL* Expert

    OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

     

    Intel® Advisor 2017 Beta Update 1 - What’s new



     

    We’re pleased to announce a new version of the Vectorization Assistant tool, Intel® Advisor 2017 Beta Update 1.

    Below are highlights of the new functionality in Intel Advisor 2017 Update 1.

    Full support for all analysis types on the second generation Intel® Xeon Phi processor (code named Knights Landing)

    FLOPS and mask utilization

    Tech Preview feature: an accurate, hardware-independent FLOPS measurement tool (AVX512 only). It is mask aware and offers the unique capability to correlate FLOPS with performance data.

     

    The improved MPI workflow allows you to create snapshots of MPI results, so you can collect data from the command line and transfer a self-contained, packed result to a workstation with a GUI for analysis. We also fixed some GUI and CLI interoperability issues.

    The project properties dialog was extended with command-line configuration for the MPI launcher, and it now also allows setting up FLOPS analysis.

     

    Memory Access Patterns

    MAP analysis now detects Gather instruction usage, unveiling more complex access patterns. A SIMD loop with Gather instructions will usually work faster than a scalar one, but slower than a SIMD loop without Gather operations. If a loop falls into the “Gather stride” category, check the new “Details” tab in the Refinement report for information about the strides and the mask shape of the gather operation. One possible solution, for cases where gather instructions are not actually necessary, is to inform the compiler about your data access patterns via OpenMP 4.x options.

    In addition to Gather profiling, for AVX512 the MAP analysis also detects Gather/Scatter instruction usage. These instructions allow more code to vectorize, but you can often obtain greater performance by avoiding them.
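    As a rough illustration of the OpenMP 4.x hint mentioned above (a generic sketch, not code from Intel Advisor; the function and variable names are made up), declaring a SIMD function with uniform and linear clauses tells the compiler that the index advances with unit stride, so it can emit contiguous vector loads instead of gathers:

    // Sketch: without the clauses the compiler must assume an arbitrary index
    // per SIMD lane and may emit a gather for a[i].
    #pragma omp declare simd uniform(a) linear(i:1)
    float read_element( const float *a, int i ) {
      return a[i];                        // with linear(i:1) this becomes a unit-stride load
    }

    void scale( const float *a, float *b, int n ) {
      #pragma omp simd
      for( int i = 0; i < n; ++i ) {
        b[i] = 2.0f * read_element( a, i );
      }
    }

    A compiler that honors these clauses (for example, with the Intel compiler’s -qopenmp-simd option) can then vectorize the call without gather instructions, which is exactly the situation the “Gather stride” category points at.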

     

    SMART MODE

    The new “Smart mode” for Survey results is an effective way to simplify data representation and to automatically select the loops that are likely to be most impactful and/or suitable from a SIMD vector performance perspective.

     

    NEW RECOMMENDATIONS

    1. Consider outer loop vectorization in appropriate cases (a minimal sketch follows this list)
    2. Consider refactoring the code to avoid using single Gather or Scatter instructions if the pattern is regular
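    For the first recommendation, here is a generic sketch of outer loop vectorization (not taken from Intel Advisor documentation; names are made up). The inner loop has a short, fixed trip count, so vectorizing the outer loop, whose iterations access memory with unit stride, can be more profitable:

    void sum_columns( const float *a, float *out, int n ) {
      #pragma omp simd
      for( int i = 0; i < n; ++i ) {      // outer loop is vectorized
        float sum = 0.0f;                 // private to each outer iteration
        for( int k = 0; k < 8; ++k ) {    // short inner loop runs inside each lane
          sum += a[k * n + i];            // unit stride across outer iterations
        }
        out[i] = sum;
      }
    }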

     

    Get Intel Advisor and more information

    Visit the product site, where you can find videos and tutorials.

    Installing Intel Distribution Python for Windows in the non-default directory


    The Intel® Distribution for Python* installer for Windows defaults to c:\IntelPython27 and c:\intelpython35, but you can customize the directory. When you see the screen below, select Customize and then Next; the installer will let you specify the directory.

    Customize Installation

    Integrating Intel® AMT Configuration into a Management Console


    The complexity of Intel® Active Management Technology (Intel® AMT) configuration profiles varies depending on the enabled features. The first step in integrating Intel AMT into a management console is to determine which features the console should support.

    Begin by examining the configuration options within the ACU Wizard tool, which is part of the Intel® Setup and Configuration Software (Intel® SCS) download. You can find more information about the options in the Intel SCS documentation included in that download.

    Console Integration of Host-Based Configuration

    The most common console integration uses the host-based configuration methodology. This method uses the host's OS (Windows* 7 or later) and a scripted process to perform the configuration.

    This article shows how the ACU Wizard tool creates a sample configuration profile. The profile illustrates the expected XML so that the console can create and encrypt its own profiles for deployment to Intel AMT devices.

    Note: If the console creates the profile XML, you should encrypt the file by using the SCSEncryption.exe tool prior to deployment to the Intel AMT device. Without encryption, the file will be sent to the client in clear text, exposing passwords within the profile.xml file.

    Automating the configuration process involves creating the profile.xml file and creating a script to perform the configuration. The basic steps are listed below, followed by a minimal sketch of how a console might automate the call:

    1. Copy the “configurator” folder and the profile.xml file from the SCS download folder to a location accessible by the Intel AMT client (Local, Network share, USB thumb drive, and so on).
    2. Open a command prompt with “Run as Administrator”  privileges, and then navigate to the acuconfig folder and run the following command: "acuconfig.exe configamt <profile.xml> /decryptionpassword <PASSWORD>"
    3. The configuration is successful if the program exits with code 0.
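    As a minimal sketch of automating steps 2 and 3 from a console agent (purely illustrative; the profile name, the password, and the use of std::system are placeholders and not part of Intel SCS), the snippet below launches the configurator and treats exit code 0 as success:

    // Hypothetical sketch only: launch acuconfig.exe and check its exit code.
    #include <cstdio>
    #include <cstdlib>

    int main() {
      // Placeholder profile name and decryption password; a real console would
      // supply these from its deployment job.
      const char *command =
        "acuconfig.exe configamt profile.xml /decryptionpassword P@ssw0rd";
      int exit_code = std::system( command );   // typically the tool's exit code on Windows
      if( exit_code == 0 ) {
        std::printf( "Intel AMT configuration succeeded.\n" );
        return 0;
      }
      std::printf( "Intel AMT configuration failed with exit code %d.\n", exit_code );
      return exit_code;
    }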

    Host-based configuration, as described above, has one significant disadvantage. It does not allow an Intel AMT device to be configured into Admin Control Mode. With a slight change to the configuration profile, we can point the firmware to a Setup and Configuration Server to access a Provisioning Certificate. For more detail on Admin Control Mode/Client Control Mode, see Intel vPro Setup and Configuration Integration.

    How to Use the ACU Wizard Tool

    The ACU Wizard tool has several methods for configuring an Intel AMT device. However, for our purposes, we need only one of the options to get our sample XML file. To create the profile.xml file while using ACU Wizard, do the following:

    1. Create the profile by opening the ACU Wizard and selecting the Create Settings to configure Multiple Systems option.
    2. The Intel® AMT Configuration Utility: Profile Designer Window opens.
    3. Click the green plus sign.
    4. When the Configuration Profile Wizard opens, click Next.
    5. When the Configuration Profile Wizard Optional Settings Window opens, click Next.
    6. The Configuration Profile Wizard System Settings Window opens.
      1. Enter the RFB password if being enabled (not required).
        1. RFB refers to the Remote Frame Buffer protocol, also known as RFB5900. Enabling the RFB password allows for the use of a standard VNC viewer using port 5900, as opposed to a VNC viewer enabled for Intel AMT, which also uses Port 16994 or 16995.
      2. Enter the password in the Use the following password for all systems field.
      3. To edit the network settings, click the Set... button.
        1. There are no changes to make if the host OS is DHCP Enabled. Note the changes required if the OS has a static IP address.
        2. Select Cancel.
      4. Click Next.
    7. The Configuration Profile Wizard - Finished window opens.
      1. Enter the profile name you want to use, for example: profile.
      2. Encrypt the xml file by adding and confirming the password.
      3. Click Finish.
    8. The Intel® AMT Configuration Utility: Profile Designer Window opens.
      1. Note the Profile Path shown on your screen.
        1. It should look like this: <userName>\documents\SCS_Profile
      2. Close the ACU Wizard.

    Note: For detailed instructions on using the ACU Wizard, please refer to the documentation contained within the Intel® SCS download.

     

    Using the Profile.xml file

    Now we have an encrypted profile.xml. We next need to decrypt the file to expose the configuration parameters by using the SCSEncryption.exe program, contained in the Intel SCS download. Once decrypted, you can open the file in an XML viewer and see the exposed XML tags.

    Decryption syntax:

    >SCSEncryption.exe Decrypt <input_filename> <password> /Output <output_filename>

    Note: If you wish to enable additional features within your profile or explore other Intel AMT capabilities, you can enable them in step 5 above. For example, one of the most popular and highly recommended features is wireless configuration.

    Control Mode Choices

    The configuration process will place the Intel AMT device into one of two modes: Client Control Mode or Admin Control Mode. The main difference is that Client Control Mode requires User Consent for redirection operations and Admin Control Mode does not.

    User Consent

    The User Consent feature adds another level of security for remote users. A User Consent code must be submitted when a redirection operation is requested on the remote client. For example, accessing the system via Remote KVM or executing an IDE-R command is considered a redirection operation, but getting the power state or rebooting is not.

    Summary

    One of the most important integration tasks for managing Intel AMT-enabled devices is configuration. The configuration process is straightforward when using ACUconfig.exe; however, the profile creation process is the portion that needs to be addressed in depth.

    Using ACUWizard.exe, we can create a sample profile.xml that shows how to build dynamic, console-based profiles, so we are not tied to a specific static profile. This gives us the ability to manage a wider range of Intel AMT features, such as User Consent configuration, wireless profiles, Active Directory Access Control Lists (AD ACLs), and so on.

    About the Author

    Joe Oster has been at Intel working with Intel® vPro™ technology and Intel AMT since 2006. When not working, he spends time working on his family farm or flying drones and RC aircraft.

    Developing a Touch-Friendly Option for Your Website and/or Web-Based App


    By Jonathan Rodriguez

    Are you finding it hard to provide your users with a touch-friendly option on 2-in-1 devices? This is a common problem for Web-based UIs, because most browsers do not easily reveal the current state of the OS (desktop versus tablet mode). So you are left with the agonizing questions: How do I find out? And after I find out, what do I do to make my UI touch-friendly? Read on for some suggestions.

    One of the things you will encounter while making your Web-based UI touch-friendly is how hard this can be to execute once you take into account the many Web browsers through which your UI may be rendered (I apologize for the irony there; it was intended). But why is it so hard to implement what should be such a simple option? And if it is so hard, what is the simplest option?

    One of the reasons why you may be struggling to deploy a Web-based, touch-friendly option has its roots in the past. If you are familiar with the original Nintendo* gaming console and Michael Jackson’s “Billie Jean,” you are probably old enough to remember how chaotic technology was in the mid-’90s. Back then, Web browsers allowed many questionable things to happen, “…thereby, making the world a [worse] place” (a reference to the opposite of that statement, which many startups claim to be doing. I watch Silicon Valley. L.O.L.)

    But seriously, the Internet was a dangerous place back then. Computer systems could get hacked due to the most illogical Web browser technicalities.

    But we learned and evolved. As a result of that dark past, Web browsers these days are much more limiting, for security purposes. Additionally, some code is simply not supported by some of the rendering engines (which execute the actual rendering of the pages within Web browsers), also with security in mind. Have you ever tried a piece of code that worked perfectly fine in one browser, but not in another? Now you get the gist.

    Now, here comes the good stuff...

    Due to the aforementioned Web browser limitations, when it comes to Web-based, touch-friendly development, almost every developer tends to resort to an approach that I call “Mode-Whispering.” Mode-Whispering refers to any method used to "fool" certain Web browsers into giving away the current state (Desktop or Tablet mode) of the OS, indirectly.

    While the browser is not technically “whispering” anything to anybody, the code does the “whispering” to another section within itself. Most Mode-Whispering implementations involve JavaScript* code that counts the available pixels (within the screen) in order to determine whether a scroll bar is present within the Web browser. If it is, the code then uses an algorithm to determine the size of that scroll bar. This approach can distinguish between the Desktop and Tablet modes of the OS (because Desktop and Tablet modes generate different sizes of scroll bars).

    While Mode-Whispering can work (mostly in Microsoft Edge*), I strongly recommend against using this approach. First, Mode-Whispering will not work unless your page is long enough to require a scroll bar from the Web browser (Web browsers will only show a scroll bar if it is necessary). But the main issue with Mode-Whispering is that your UI will have a piece of code (and icon/link) that will work in some browsers but not in many others, which could make your UI and user experience seem sub-par.

    Since choosing the experience for your users, automatically, is not currently a viable or reliable option from a Web-based UI, why not tap into the human element and use it as your best ally? Simply let your users decide whether they want a touch-friendly experience in the first place. I call this approach “Manual-Moding.”

    If you think about it, Manual-Moding makes sense. Unlike Mode-Whispering (which is done secretly in the background—deceitfully, at least from the Web browser’s perspective), Manual-Moding consists of a simple (straightforward) user-friendly option, preferably a button that your users can click, based on their preferences. This is the way the BitMar* premium TV platform deploys its touch-friendly option. Visit http://BitMar.com to see the Web-based app in action. (Full disclosure: I am the designer and developer behind BitMar.)

    In order to deploy a Manual-Moding option, you will need to frame the top portion of your UI (using either an iFrame or Frameset) and have two, distinct top-section pages: A Desktop and a Touch version—each with different-size images and/or links. Obviously, the larger image/font version is recommended for Touch mode. And the smaller-size version is for Desktop mode. Both pages will have to inter-link (link to each other) within the same frame, preferably using the same or similar icon/s (just as you can see happening on http://BitMar.com.)  

    Are there any disadvantages to implementing Manual-Moding? You bet—it’s not a perfect world. Although supported by every major Web browser (including mobile devices), Framesets are officially not part of the HTML5 standard (which is, arguably, the future of websites and Web-based apps.) However, unlike Framesets, iFrames are included in the standard. But HTML5-friendly iFrames may disable scripts and other page functionality. Therefore, it is recommended that you use Manual-Moding only within the top and side portions of your UI (that is, for the navigation menu and/or any other static menus and options). Otherwise, you risk having a UI that may not work properly in some Web browsers (because in the HTML5 standard, most Web browsers do not allow scripts to execute from any page that is rendered through an iFrame.) In the case of BitMar, the execution consists of two PHP pages, inter-linking, within the same frame. But you can do this with any type of page (.htm, .html, .asp, .php, and so on).

    So, just make the "correct" choice (based on your situation), or be open to updating your UI, should the Frameset officially die out. But, whatever the case, at least now you have two options. And one of them provides your users a universal touch-switch option that will work in any environment.


    About the Author and Company

    Jonathan Rodriguez is the founder and CEO of BitMar Networks (the firm behind the BitMar premium TV platform.) BitMar is an Internet TV portal that provides access to millions of free, premium TV shows and movies (mostly in HD), over 200,000 channels, live radio and millions of songs (all in one single place, from safe, legal sources.) BitMar could be your free and legal alternative to expensive TV service, Netflix, Hulu, and so on. You may learn more at: http://BitMar.com

    The BitMar name is a combination of "bit" (the basic unit of information in computing and digital communications) and "mar" (a synonym for "break", "impair" and "disfigure"). The adaptation of "mar" in the BitMar context reflects disruption. 

    Innovative Media Solutions Showcase


    New, Inventive Media Products Made Possible with Intel Media Software Tools

    With Intel media software tools, media/video solutions providers can create inspiring, innovative new products that capitalize on next gen capabilities like HEVC, high-dynamic range (HDR) content delivery, video security solutions with smart analytics, and more. Check these out. Envision how your company can use Intel's advanced media tools to re-invent new solutions for the media and broadcasting industry.

      Mobile Viewpoint Live Reporting Ronde of Norg

      Mobile Viewpoint Delivers HEVC HDR Live Broadcasting

      Mobile Viewpoint recently announced a new bonding transmitter that delivers HEVC (H.265) HDR video running on the latest 6th generation Intel® processors, and that uses the Intel® Media Server Studio Professional Edition to optimize HEVC compression and quality. For broadcast-quality video, Intel’s graphics-accelerated codec enabled Mobile Viewpoint to develop a hardware platform that combines low-power, hardware-accelerated encoding and transmission. The new HEVC-enabled software will be used in Mobile Viewpoint's Wireless Multiplex Terminal (WMT) AGILE high-dynamic range (HDR) back-of-the-camera solutions and in its 19-inch FLEX IO encoding and O2 decoding products. The results: fast, high-quality video broadcasting on the go so the world can stay better informed of fast-changing news and events. Read more.

       

      Sharp all-around security camera

      Sharp's New Innovative Security Camera is built with Intel® Architecture & Media Software

      With digital surveillance and security concerns now an everyday part of life, SHARP unveiled a new omnidirectional, wireless, intelligent digital security surveillance camera to better meet these needs. Built with an Intel® Celeron® processor (N3160) and SHARP 12-megapixel image sensors, and utilizing the Intel® Media SDK for hardware-accelerated encoding, the QG-B20C camera can capture video in 4Kx3K resolution, provide all-around views, and offer many intelligent automatic detection functions. Read more.

       

      Magix Video Pro X

      MAGIX Takes Video Editing to a New Level by Providing HEVC to Broad Users

      While elite video pros have access to high-powered video production applications with bells and whistles traditionally available only to the enterprise, MAGIX has taken a broader approach, unveiling its latest version of Video Pro X, video editing software that sets new standards for semi-professional video production for a wide range of users. Optimized with Intel Media Server Studio, MAGIX Video Pro X provides Intel HEVC encoding to prosumers and semi-pros to help alleviate a bandwidth-constrained internet where millions of videos are shared and distributed. Read more.

       


      New JPEG2000 Codec Now Native for Intel Media Server Studio

      Comprimato recently worked with Intel to provide high-quality, low-latency JPEG2000 encoding as a plug-in for Intel Media Server Studio. The result is a powerful encoding option that lets Media Server Studio users transcode JPEG2000 contained in IMF, AS02, or MXF OP1a files to distribution formats like AVC/H.264 and HEVC/H.265, and enables software-defined processing of IP video streams in broadcast applications. By using Intel Media Server Studio to access hardware acceleration and programmable graphics in Intel GPUs, encoding can run extremely fast. This is a vital benefit because fast media processing significantly reduces latency in the connection, which is particularly important in live broadcasting. Read more.

       

      SPB TV AG Showcases Innovative Mobile TV/On-demand Transcoder enabled by Intel

      At Mobile World Congress (MWC) 2016, SPB TV AG showed its innovative single-platform product line, which included the new SPB TV Astra transcoder powered by Intel. SPB TV Astra is a professional solution for fast, high-quality processing of linear TV broadcast and on-demand video streams from a single head-end to any mobile, desktop, or home device. The transcoder uses Intel® Core™ i7 processors with media accelerators and delivers high-density transcoding via Intel Media Server Studio. “We are delighted that our collaboration with Intel ensures faster and high quality transcoding, making our new product performance remarkable,” said Kirill Filippov, CEO of SPB TV AG. Read more.

       

      SURF Communications collaborates with Intel for NFV & WebRTC all-inclusive platforms

      Also at MWC 2016, SURF Communication Solutions announced SURF ORION-HMP* and SURF MOTION-HMP*, the next building blocks of the SURF-HMP™ family. The new SURF-HMP architecture delivers fast, high-quality media acceleration - facilitating up to 4K video resolutions and ultra-high-capacity HD voice and video processing - running on Intel® processors with integrated graphics and optimized by Media Server Studio. SURF-HMP is flexibly architected to meet the requirements of evolving and large-scale deployments, is driven by a powerful processing engine that supports all major video and voice codecs and protocols in use, and delivers a multitude of applications such as transcoding, conferencing/mixing, MRF, playout, recording, messaging, video surveillance, encryption, and more. Read more.

       


      More about Intel Media Software Tools

      Intel Media Server Studio - Provides an Intel® Media SDK, runtimes, graphics drivers, media/audio codecs, and advanced performance and quality analysis tools to help video solution providers deliver fast, high-density media transcoding.

      Intel Media SDK - A cross-platform API for developing client and media applications for Windows*. Achieve fast video playback, encode, processing, media format conversion, and video conferencing. Accelerate RAW video and image processing. Get audio decode/encode support.

      Accelerating Media Processing: Which Media Software Tool do I use? English | Chinese
