Channel: Intel Developer Zone Articles

Media Client Getting Started Guide


 

Introduction

Media Client is a feature of Intel® Integrated Native Developer Experience (Intel® INDE). Media Client includes the Media SDK for Windows* foundation, the Media RAW Accelerator for Windows* plugin, and the Audio for Windows* plugin.

Media SDK for Windows* 

Media SDK (Software Development Kit) is a software development library that exposes the media acceleration capabilities of Intel platforms for decoding, encoding, and video preprocessing. The API library covers a wide range of Intel platforms. The Intel® Media SDK includes simple console samples, media framework components, and a transcoding application for hands-on experience.

Download the Media SDK samples and the client and server tutorial pack.

Download the reference manuals for the Media SDK, Media SDK for Multi-View Video Coding, Media SDK for JPEG*/Motion JPEG, Media SDK extensions for User-Defined Functions, and the Intel® Media SDK Library Distribution and Dispatching Process.

Download the Media SDK Samples Guide.

Media RAW Accelerator for Windows*

The Media RAW Accelerator for Windows* is a development library that exposes the media acceleration capabilities of Intel® platforms for processing RAW data in Bayer format from a camera and converting it to monitor-compatible formats; it also provides filters for additional color adjustment. The Intel® Media RAW Accelerator package includes a hardware-accelerated plug-in library exposing RAW media acceleration capabilities, implemented as an Intel® Media SDK VPP plug-in. It also includes a simple console sample showing how to use this library.

Download Media RAW Accelerator for Windows* with Intel® INDE.

Download the reference manual for Media RAW Accelerator for Windows*.

Audio for Windows*

Audio for Windows* is a development library that provides decoding of compressed audio streams into raw samples, encoding of raw audio samples into compressed bitstreams, auxiliary functions for synchronization, and global auxiliary functions.

Download Audio for Windows* with the Intel® INDE Professional and Ultimate editions.

Download the Intel® Audio for Windows* reference manual.


Media Client Release notes and support


Intended Audience

Software developers interested in a cross-platform API foundation and plugins to develop and optimize video coding and processing, to design products that deliver visually stunning media by enabling RAW photo and 4K RAW video processing, or to deliver quality sound with audio encode and decode.

Customer Support

For technical support on Media Client products, a feature of Intel® Integrated Native Developer Experience (Intel® INDE), including answers to questions not addressed in this guide, the latest online getting-started help, FAQs, and other support information, visit the technical support forum by clicking here.

To seek help with an issue in Media Client products, go to the user forum by clicking here.

To submit an issue for the Ultimate or Professional Edition of Media Client products, go to Intel® Premier Support by clicking here.

Intel® Premier Support is not available for the Starter Edition of the product. For more information on registering for Intel® Premier Support, click here.

Release notes for Media SDK for Windows*

Release notes for Media RAW Accelerator for Windows*

Release notes for Audio for Windows*

Media SDK Tutorials for Client and Server


The Media Software Development Kit (Media SDK) Tutorials show you how to use the Media SDK by walking you step by step through use-case examples, from simple to increasingly complex usages.

The Tutorials are divided into several parts (sections):

1. Introduces the Media SDK session concept via a very simple sample.
2-4. Illustrate how to utilize the three core SDK components: Encode, Decode, and VPP (video pre/post processing).
5. Showcases transcode workloads, utilizing the components described in earlier sections.
6. Showcases more advanced and compound usages of the SDK.

For simplicity and uniformity the Tutorials focus on the H.264 (AVC) video codec. Other codecs are supported by Intel® Media SDK and can be utilized in a similar way.

Additional information on the tutorials can be found at https://software.intel.com/en-us/articles/media-sdk-tutorial-tutorial-samples-index. The Media SDK is available for free through the Intel® INDE Starter Edition for client and mobile development, or the Intel® Media Server Studio for datacenter and embedded usages. 

 Download the Media SDK tutorial in the following available formats:

Quick installation instructions:

  • On Linux, set the MFX_HOME environment variable:
    export MFX_HOME=/opt/intel/mediasdk
  • On Windows, set the INTELMEDIASDKROOT environment variable and build with Microsoft* Visual Studio* 2012.

Previous versions of the Tutorials package:

Intel® Identity Protection Technology-based Token Provider for RSA SecurID* Software Token for Microsoft Windows*


Download Document

Intel® Identity Protection Technology (Intel® IPT) provides a more secure environment for RSA SecurID* software token.

Introduction

This paper presents an overview of the token provider for EMC’s RSA SecurID* software token implemented using Intel® Identity Protection Technology (Intel® IPT) with public key infrastructure (PKI). Intel IPT with PKI provides hardware-enhanced protection of RSA cryptographic keys in specific Intel® Core™ vPro™ processor-powered systems. The token provider for EMC’s RSA SecurID based on Intel IPT provides hardware-enhanced protection of the RSA token seed by using Intel IPT with PKI cryptographic functions to encrypt and sign the token seed. This signed and encrypted token seed is used by the RSA SecurID software token to generate the OTP token. The token provider based on Intel IPT provides an additional layer of protection to the RSA OTP solution. This whitepaper explains how the Intel IPT with PKI hardware-enhanced cryptographic functions are used to provide a more secure environment for RSA SecurID software token.


Intel Core vPro Processor Platforms and Features

Intel Core vPro processor technology addresses many IT security and platform management needs through its broad set of security, manageability, and productivity-enhancing capabilities. This technology is built into the new Intel Core vPro processor family, some smaller form-factor devices based on the Intel® Atom™ processor, and some Intel® Xeon® processors.

Among the notable security features included in Intel Core vPro processor platforms is the Intel Identity Protection Technology described in the next chapter. Additional features found on Intel Core vPro processor platforms and platforms based on the 4th generation Intel Atom processor for business include:

  • Improved device manageability with Intel® Active Management Technology
    • Out of band system access
    • Hardware-based host agent status checking
    • Remote diagnostics and repair tools such as hardware-based KVM, IDE redirection, power control and more
  • Hardware-assisted secure boot coupled with platform trust technology
    • Hardware-assisted secure boot, along with early launch anti-malware drivers, enable a boot in a known trusted environment.
    • Credential storage and key management capability to meet Windows 8 CSB requirements, optimized for low power consumption in S0ix environment.
  • Improved data encryption performance with Intel® AES New Instructions (Intel® AES-NI)
    • Intel AES-NI provides a faster, more secure AES engine for a variety of encryption apps, including whole disk encryption, file storage encryption, conditional access of HD content, Internet security, and VoIP. Consumers benefit from increased protection for Internet and email content, plus faster, more responsive disk encryption.
  • Improved operating system security with Intel® Secure Key
    • A hardware-based random number generator that can be used for generating high-quality keys for cryptographic (encryption and decryption) protocols. Provides quality entropy that is important in the cryptography world for added security.
  • Improved operating system security with Intel® OS Guard
    • An enhanced hardware-based security feature that better protects the OS kernel. Intel OS Guard protects areas of memory marked as user mode pages and helps prevent attack code in a user mode page or a code page, from taking over the OS kernel. Intel OS Guard is not application-specific and can protect the kernel from any application.

To find out more about the features included in Intel Core vPro processor platforms, visit http://intel.com/vpro.

Intel Identity Protection Technology with Public Key Infrastructure

Intel IPT with PKI uses the Intel® Management Engine (Intel® ME) in specific Intel Core vPro processor-powered systems to provide a hardware-based security capability. Intel IPT with PKI provides hardware-enhanced protection of RSA 1024 and 2048 asymmetric cryptographic keys. The Intel IPT with PKI capability is exposed as a crypto service provider (CSP) via the Microsoft CryptoAPI* software layer. Software that supports the use of cryptographic features through CryptoAPI can use Intel IPT with PKI to:

  • Securely generate tamper resistant, persistent RSA key pairs in hardware
  • Generate PKI certificates from hardware-protected RSA key pairs
  • Perform RSA private key operations within a protected hardware environment
  • Protect key usage via PINs that use the Intel IPT with PKI protected transaction display (PTD)

Both the RSA key-pair and the PKI certificates generated by Intel IPT with PKI are stored on the hard drive. The RSA keys are first wrapped within the hardware with something called the platform binding key (PBK) before being stored on the hard drive. The PBK is unique for each platform using Intel IPT with PKI and cannot be exported from the Intel ME. When the RSA key is needed, it must be brought back into the Intel ME to be unwrapped.
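The wrap-and-store pattern described above can be sketched in a few lines. This is an illustrative toy only: the real platform binding key never leaves the Intel ME, the actual wrapping algorithm is not public, and the function names below are invented for the example.

```python
# Illustrative sketch of the wrap/unwrap pattern: a key blob is wrapped
# under a (simulated) platform binding key (PBK) before touching disk, and
# must be unwrapped with the same PBK to be used again. A SHA-256-based
# keystream stands in for the real (non-public) cipher; HMAC adds integrity.
import hashlib
import hmac
import os

def _keystream(pbk: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(pbk + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def wrap_key(rsa_key_blob: bytes, pbk: bytes) -> bytes:
    """Wrap a key blob under the (simulated) platform binding key."""
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(rsa_key_blob, _keystream(pbk, nonce, len(rsa_key_blob))))
    tag = hmac.new(pbk, nonce + ct, hashlib.sha256).digest()
    return nonce + tag + ct  # this blob is what gets stored on the hard drive

def unwrap_key(wrapped: bytes, pbk: bytes) -> bytes:
    """Unwrap a stored blob; raises ValueError if it was tampered with."""
    nonce, tag, ct = wrapped[:16], wrapped[16:48], wrapped[48:]
    if not hmac.compare_digest(tag, hmac.new(pbk, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("wrapped key failed integrity check")
    return bytes(a ^ b for a, b in zip(ct, _keystream(pbk, nonce, len(ct))))

pbk = os.urandom(32)            # stand-in for the unique per-platform PBK
key_blob = b"example RSA private key material"
stored = wrap_key(key_blob, pbk)
assert unwrap_key(stored, pbk) == key_blob
```

The essential property mirrored here is that the stored blob is useless without the per-platform key, which in the real design cannot be exported from the Intel ME.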

The hardware enhancements of Intel IPT with PKI focus on enhanced RSA private key protection; but it should be noted that the installed CSP can be used for any algorithms typically supported by software-based CSPs. Non-RSA operations are performed in software and provide the same level of protection as existing software-based CSPs shipped with Microsoft Windows 7 and above. Applications based on CryptoAPI should be able to transparently use Intel IPT with PKI and derive the benefits of enhanced private key protection with little, if any, modification.

The RSA keys and certificates created by Intel IPT with PKI support existing PKI usage models. Some typical usage scenarios include:

  • VPN authentication
  • Email and document signing
  • SSL web site authentication

Intel IPT with PKI provides a PC-embedded 2nd factor of authentication to validate legitimate users in an enterprise. Compared to a hardware security module, external reader, or a TPM, Intel IPT with PKI can be less expensive and easier to deploy. Compared to a software-based cryptographic product, Intel IPT with PKI is generally more secure. Intel IPT with PKI provides a good balance between security, ease of deployment, and cost.

Overview of RSA SecurID Software Token

RSA SecurID software tokens use the same algorithm (AES-128) as RSA SecurID hardware tokens while eliminating the need for users to carry dedicated hardware key fob devices. Instead of being stored in hardware, the symmetric key is securely safeguarded utilizing Intel IPT with PKI. RSA SecurID software authenticators reduce the number of items a user has to manage for safer and more secure access to corporate assets. Software tokens can help the enterprise cost-effectively manage secure access to information and streamline the workflow for distributing and managing two-factor authentication for a global work force. Additionally, software tokens can be revoked and recovered when someone leaves the company or loses a device, eliminating the need to replace tokens.
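RSA SecurID's exact algorithm is proprietary, but the general shape of deriving a short numeric code from a secret seed and a moving factor can be illustrated with an HOTP-style sketch (RFC 4226 dynamic truncation), a different, public algorithm used here purely for illustration:

```python
# Illustration only: RSA SecurID uses a proprietary AES-128-based algorithm.
# This sketch shows the general idea of deriving a short numeric OTP from a
# secret seed and a moving factor, using HOTP-style (RFC 4226) truncation.
import hashlib
import hmac
import struct
import time

def otp_from_seed(seed: bytes, moving_factor: int, digits: int = 6) -> str:
    mac = hmac.new(seed, struct.pack(">Q", moving_factor), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                      # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def time_based_otp(seed, period=60, now=None):
    """Time-synchronized variant: the moving factor is the current interval."""
    t = int((time.time() if now is None else now) // period)
    return otp_from_seed(seed, t)

print(otp_from_seed(b"12345678901234567890", 0))
```

For the RFC 4226 test seed `12345678901234567890`, counter 0 yields 755224, matching the published test vectors.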

RSA SecurID Software Token for Microsoft Windows

Features

  • Strong two-factor authentication to protected network resources
  • Software token automation for integration with available RSA SecurID partner applications
  • Silent, secure installation
  • Multiple token provisioning options including dynamic seed provisioning (CT-KIP)
  • Web plug-in for faster access to protected web sites with Microsoft Internet Explorer*
  • Interoperability with Windows screen readers for visually impaired users

Overview of the Intel Identity Protection Technology based token provider for RSA SecurID software token

The Intel IPT-based token provider provides two functions: 1) the initial encryption, signing, and storage of the token seed using a platform binding key when it is provisioned to the system, and 2) the signature validation, decryption, and calculation of the OTP token.

Provisioning the RSA SecurID software token Seed


Provisioning the RSA SecurID software token involves the following functions:

  • Import the token seed from a file or from the web.
  • Use the Intel® Hardware Cryptographic Service Provider (Intel® CSP) that is included in the Intel IPT with PKI binaries to generate a platform binding key, which is unique per platform.
  • Encrypt the token seed using the platform binding key.
  • Use the Intel CSP to sign the encrypted token seed using the platform binding key.
  • Store the signed and encrypted token seed in the Intel® Persistent Storage Manager device.


Figure 1 – Token Seed Provisioning Architecture


Figure 2 – The Token Storage Devices screen

Using the Hardware-Protected RSA SecurID software Token Seed to Generate the OTP Token


RSA SecurID software OTP token generation involves the following functions:

  • Read the signed and encrypted token seed from the Intel Persistent Storage Manager device.
  • Use the CSP-based platform binding key from Intel IPT with PKI to validate the signature on the signed and encrypted token seed.
  • Use the CSP from Intel IPT with PKI to decrypt the token seed.
  • Call the RSA token library to generate the next OTP token.
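Mirroring the provisioning flow, the OTP-generation steps can be modeled like this. Again a toy: the real flow validates and decrypts through the Intel CSP and then calls RSA's proprietary token library; the HMAC-based code derivation below is only a stand-in for that library.

```python
# Toy model of the OTP-generation flow: read the stored blob, validate the
# signature, decrypt the seed, then derive the next code. The hash-based
# cipher and HMAC "token library" are illustrative stand-ins only.
import hashlib
import hmac
import os
import struct
import time

def _xor_stream(key, nonce, data):
    stream = b""
    i = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest()
        i += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def generate_otp(stored_blob: bytes, pbk: bytes, period: int = 60) -> str:
    nonce, signature, encrypted = stored_blob[:16], stored_blob[16:48], stored_blob[48:]
    expected = hmac.new(pbk, nonce + encrypted, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):      # validate the signature
        raise ValueError("token seed blob failed signature validation")
    seed = _xor_stream(pbk, nonce, encrypted)             # decrypt the seed
    interval = int(time.time() // period)                 # generate the next OTP
    mac = hmac.new(seed, struct.pack(">Q", interval), hashlib.sha256).digest()
    return str(struct.unpack(">I", mac[:4])[0] % 10 ** 6).zfill(6)

# Build a blob the same way the provisioning flow would have stored it.
pbk = os.urandom(32)
seed = os.urandom(16)
nonce = os.urandom(16)
enc = _xor_stream(pbk, nonce, seed)
blob = nonce + hmac.new(pbk, nonce + enc, hashlib.sha256).digest() + enc
print(generate_otp(blob, pbk))    # a six-digit code for the current interval
```

Note that any tampering with the stored blob is caught by the signature check before decryption is attempted, matching the validate-then-decrypt ordering in the list above.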


Figure 3 – Using the hardware-protected token seed to generate the OTP token

Summary

The token provider for EMC’s RSA SecurID software token based on Intel IPT provides hardware-enhanced protection of the RSA token seed by using Intel IPT with PKI cryptographic functions to encrypt and sign the RSA SecurID software token seed and bind it to the specific Intel platform.

Related Links for Intel Identity Protection Technology with PKI


For more information on Intel IPT with PKI and protected transaction display visit:

Brick by Brick: Building a Better Game with LEGO* Minifigures Online


Download Lego Minifigures Optimization.pdf

Game makers now enjoy unprecedented market opportunity by offering titles that deliver advanced gaming experiences on both PCs that run Microsoft Windows* and on mobile devices that run Android*. Optimizing graphics for Intel® Core™ processors as well as Intel® Atom™ processors is rapidly becoming a strategic imperative.

With the evolution of mobile gaming beyond its roots in casual games, revenue projections in this segment are growing dramatically. In fact, market research firm Newzoo projects that mobile games will replace consoles as the largest game segment by revenue in 2015, reaching USD 30.0 billion that year and USD 40.9 billion by 2017.[1]

Helping cement its more than 20 years of providing well-regarded games, Funcom developed LEGO* Minifigures Online (LMO) with both Intel® architecture-based 2 in 1 PCs and Android tablets as primary target devices. The company’s optimizations provide exceptional graphical experiences on both platforms, building on recognized successes by Funcom that include The Longest Journey (ranked number 59 on the MetaCritic list of the top 100 PC games of all time),[2] as well as Anarchy Online*, Age of Conan*, and The Secret World*.

Advanced Pixel Synchronization Effects for Intel® Graphics Technology

The current generation of Intel® graphics hardware extends Intel’s leadership in enabling innovation across the industry, including being fully ready for DirectX* 12 and driving the adoption of advanced features by next-generation games. An excellent example is Intel’s pixel synchronization extension for DirectX 11, which enables programmable blending operations.

This set of capabilities is being widely adopted, becoming a part of the DirectX 12 standard (under the name Raster Ordered Views), being supported by graphics hardware from other manufacturers (such as Nvidia Maxwell*), and being enabled in OpenGL* with the GL_INTEL_fragment_shader_ordering extension.

Intel’s pixel synchronization extension gives developers control over the ordering of pixel shader operations. It can be used to implement functions such as custom blending, advanced volumetric shadows, and order-independent transparency. It provides a way to serialize and synchronize access to a pixel from multiple pixel shaders and to guarantee deterministic pixel changes. On Intel® hardware, the serialization is limited to directly overlapping pixels, so performance remains unchanged for the rest of the code.

Examples of algorithms enabled by this set of features include Adaptive Volumetric Shadow Maps (AVSM), order-independent transparency, and custom programmable blending.

LEGO Minifigures Online uses AVSM to achieve advanced smoke and cloud effects on both Windows and Android. Comparisons of game scenes on Intel processor-based 2 in 1 PCs with AVSM disabled versus the same scenes with AVSM enabled are shown in Figures 1 through 4. The enhanced graphics quality using AVSM in these scenes provides a more realistic and immersive gaming experience that will also be made available for Android tablets based on Intel Atom x5 and x7 processors.


Figure 1. “Actually Hopping Antelope – Level 2” scene with AVSM disabled.


Figure 2. “Actually Hopping Antelope – Level 2” scene with AVSM enabled.


Figure 3. “Scarlet Serrated Brainiac – Level 5” scene with AVSM disabled.


Figure 4. “Scarlet Serrated Brainiac – Level 5” scene with AVSM enabled.

Cross-Platform Playability and Scaling

LEGO Minifigures Online has been optimized for 4th generation Intel Core processors. It also provides support for both laptop and tablet modes on 2 in 1 PCs as shown in Figures 5 and 6, giving users the raw horsepower of the laptops they love in a more casual environment by converting the device to tablet mode. This flexibility allows gamers to play LMO when they want, where they want, in the mode they want – giving them more opportunity to play.


Figure 5. “Scarlet Serrated Brainiac – Level 5” scene in Laptop Mode.

Notice the larger, more conveniently located touch icons for gamers.


Figure 6. “Scarlet Serrated Brainiac – Level 5” scene in Tablet Mode.

The enhanced graphics capabilities across Intel® platforms make it possible for users on high-end Windows desktops, Windows laptops, 2 in 1 devices, and Intel Atom processor-based tablets running both Windows and Android, to all play together in the same immersive game world.

Improved Battery Life on Intel® Core™ Processors

Optimizing games to reduce power consumption is not only an important aspect of the user experience, but it can also be a critical component to getting favorable reviews. The releases of many otherwise well-received games have been marred by the dreaded one-star reviews dominated by the phrase “kills the battery.”

Intel and Funcom worked together to add Battery Saving Mode as a user-controlled option in LEGO Minifigures Online, as illustrated in Figure 7. This capability can extend battery life by nearly 80 percent on 4th generation Intel Core processors and by more than 100 percent on 5th generation Intel Core processors.[3]


Figure 7. Battery Saving Mode in LEGO* Minifigures Online.

The fundamental approach to improving battery life is to reduce the amount of work for the processor and GPU. Battery Saving Mode in LEGO Minifigures Online achieves that goal by capping the frame rate at 30 frames per second and by disabling anisotropic filtering, post-processing FX, and anti-aliasing.

The overall effect of these measures is to reduce frame draw time, allowing the processor and GPU to enter deeper sleep states during periods of inactivity, thus improving battery life. Details of these battery-life optimizations are available in the Game Developer Conference 2015 presentation, “Power Efficient Programming: How Funcom increased play time in Lego Minifigures by 80%.”
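The frame-cap portion of this approach can be sketched as a simple limiter. This is an illustrative model, not Funcom's implementation; a shipping engine would typically pace frames via vsync or engine-level timing.

```python
# Minimal frame-rate limiter sketch: sleep away the remainder of each frame
# interval so the processor and GPU can drop into deeper sleep states
# instead of rendering frames the display never needed.
import time

def run_capped(frames: int, fps: float = 30.0, render=lambda: None) -> float:
    """Run `frames` iterations capped at `fps`; return elapsed seconds."""
    interval = 1.0 / fps
    start = time.perf_counter()
    for _ in range(frames):
        frame_start = time.perf_counter()
        render()                                    # do this frame's work
        remaining = interval - (time.perf_counter() - frame_start)
        if remaining > 0:
            time.sleep(remaining)                   # idle instead of spinning
    return time.perf_counter() - start

elapsed = run_capped(frames=6, fps=30.0)
print(f"6 frames at 30 fps took {elapsed:.3f}s")
```

Sleeping out the idle remainder of each 33 ms interval, rather than rendering as fast as possible, is what allows the power-management hardware to enter lower-power states between frames.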

Optimization for Android Devices Based on Intel® Atom™ Processors

Successfully shipping more than its goal of 40 million processors for tablets in 2014,[4] Intel has become one of the largest silicon providers for tablets and a growing force in the Android market segment. Intel is extending this drive into 2015 with the introduction of the Intel Atom x5 and x7 processors, based on industry-leading 14 nm manufacturing process technology and compact, low-power system-on-chip (SoC) designs.

  • Performance improvements for gaming include Gen 8 graphics, as well as support for 64-bit processing and multi-tasking.
  • Enhanced battery life is provided by capabilities that include Intel® Display Power Saving Technology and Intel® Display Refresh Rate Switching Technology to help reduce panel backlight and refresh rate opportunistically.

An initial focus for performance improvement of LEGO Minifigures Online on Android devices was native compilation for Intel platforms. Non-native binaries, such as those compiled for ARM*, must be run by the Intel Atom processor using just-in-time compilation, which incurs additional processing overhead, interferes with advanced offline compilation optimizations, and increases loading times.

Intel worked with Funcom to ensure that Android installation packages include native binaries for Intel architecture, overcoming those previous limitations. In fact, providing this support for Android games using the Unity* game engine is straightforward, as discussed in the Intel® Developer Zone article, “Adding x86 Support to Android* Apps Using the Unity* Game Engine.” Further information is available in the articles, “Google Play* Store Submission Process: Android* APK” and “How to Publish Your Apps on Google Play* For x86-based Android* Devices Using Multiple APK Support.”

Conclusion

Intel architecture provides a compelling set of opportunities for game developers to expand their potential market segment share. Optimized games can deliver excellent graphical user experiences across the full range of target systems—from high-end desktop systems, to laptop PCs, 2 in 1s, and Intel Atom processor-based tablets. Enabling gameplay that responds to the needs of each platform supports broader usability and prepares game companies to benefit from ongoing expansion of mobile gaming in the years to come.

About the Authors

Filip Strugar is a former game developer, now working for Intel as a Software Graphics Engineer. He enjoys working on various algorithms, inventing things like CMAA and helping game developers in making their games run best on Intel graphics hardware.

Landyn Pethrus is an engineer at Intel, avid gamer, and hardware enthusiast.  When Landyn is not fountain sniping with Ancient Apparition in Dota2, slaying bosses, or pursuing higher level education, he can be found on the rivers of Oregon fishing.

For more information, visit the Intel Game Developer Community at https://software.intel.com/en-us/gamedev/tools

[1] Newzoo BV, “Global Mobile Games Revenues to Reach $25 Billion in 2014.” October 29, 2014. www.newzoo.com/insights/global-mobile-games-revenues-top-25-billion-2014/.

[2] CBS Interactive as of April 25, 2015. www.metacritic.com/browse/games/score/metascore/all/pc.

[3] Source: Internal Intel® battery rundown tests. See details at https://software.intel.com/sites/default/files/managed/4a/38/Power_Efficient_Programming_GDC_2015_Final.pdf.

[4] Brian M. Krzanich, Intel CEO Letter to Shareholders, Intel 2014 Annual Report. http://www.intc.com/common/download/download.cfm?companyid=INTC&fileid=819111&filekey=43FE7343-2D01-42E3-A09C-99A3BDEAEEE9&filename=Intel_2014_Annual_Report.pdf.

 

Notices

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.

Intel, the Intel logo, Intel Atom, and Intel Core are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2015 Intel Corporation.

Quick Installation Guide for Media SDK on Windows with Intel® INDE


Intel® INDE provides a comprehensive toolset for developing media applications targeting both CPUs and GPUs, enriching the development experience of a game or media developer. However, if you are used to working with the legacy Intel® Media SDK, or if you just want to get started with those tools quickly, you can follow these steps to install only the Media SDK components of Intel® INDE.

Go to the Intel® INDE Web page, select the edition you want to download, and click the Download link:

On the Intel INDE downloads page, select the Online Installer (9 MB):

On the screen where you select which IDE to integrate the Getting Started tools for Android* development with, click Skip IDE Integration and uncheck the Install Intel® HAXM check box:

At the component selection screen, select only Media SDK for Windows, Media RAW Accelerator for Windows, Audio for Windows, and Media for Mobile in the Build category (you are welcome to select any additional components you need as well), and click Next. The installer will then install all the selected Media SDK components.

Complete the installation and restart your computer. Now you are ready to start building media applications with Intel® Media SDK components!

If you later decide that you need to install additional components of the Intel® INDE suite, rerun the installer and select the Modify option to change the installed features:

and then you can select additional components that you need:

Complete the installation and restart your computer. Now you are ready to start using additional components of the Intel® INDE suite!

 

Calligra* Gemini - Stage & Words Transforming 2 in 1 Interface


Download PDF [PDF 2.06 MB]


Figure 1: Calligra* Gemini Stage in tablet mode


Figure 2: Calligra* Gemini Words in tablet mode

Introduction


The release of Calligra* Gemini, an open source office suite updated with a transforming interface for 2 in 1 devices, is a cool step in office application design. Imagine making last minute changes to your presentation, then handing your customer the 2 in 1 device in tablet mode so your presentation plays before them with touchscreen interaction. Or imagine being able to easily edit a text document when you’re using your 2 in 1 device in tablet mode. The transformable user interface (UI) added to these applications—Calligra* Stage for presentations and Calligra* Words for word processing—will make this kind of usage a reality.

Why Calligra

KO GmbH has been producing quality open source applications and tools for some time. Last year, with Intel’s help, they released their first Gemini product, Krita* Gemini. This product is a painting and illustration application with a transformable UI that was downloaded 100,000 times in 2014. The transformable UI in Krita Gemini makes the app particularly useful for users with 2 in 1 devices, who can now easily create or edit images while on the go. Figure 3 shows a 2 in 1 device transforming from one mode to the other.

Figure 3: Example 2 in 1 Device, Dell XPS* 12, mid-transformation

KO GmbH and the Intel team again joined forces to produce Calligra office suite applications with transformable UIs for Intel 2 in 1 devices. After recognizing the growing success of 2 in 1 devices, development lead Dan Leinir Turthra Jensen was joined by Intel’s Björn Taubert in creating the code for 2 in 1 transformation in Calligra Gemini. Calligra is a suite of applications including whiteboard, flowchart, database, spreadsheet, and drawing applications in addition to the word processing and presentation apps we are discussing in this article. These two applications were chosen primarily because of their usefulness on touch-only devices, thus allowing for a single application that can work on the desktop, on tablets, and, for our purposes, on 2 in 1 devices.

Change the UI for “Touch-ability”


Because of the experience the team had in developing Krita Gemini, they already had the technical expertise to add 2 in 1 support.  The hardest part was determining how much to change the UI for the word processing and presentation apps in Calligra. The team approached their users and experts to develop a list of “must keep on the screen” controls.  They set out to create a great experience for their users in new ways not available before the 2 in 1 device. They had to take into account more than the traditional mouse and keyboard inputs that have been the foundation of office applications since the first graphical user interfaces. They added touch support with a focus on improving a select number of usage models.  They watched users operating the Calligra office apps differently depending on the circumstance, and they saw an opportunity to create apps that adapted to the users’ intention.

These scenarios drove the developers’ decisions. For the user wanting to have easy access to all the options the app provides, the team first updated their desktop UI. This was the case for both applications in this article. Single users operating Calligra Stage or Words in an office environment had all they needed and expected in the usability of the office suite. However, once touch was added (and therefore traditional inputs removed), the user experience needed to change. Mobility, collaboration, and consumption became much more important. Instead of simply being an either/or experience of creation or consumption, the simultaneous experiences of both become more commonplace. Adding the 2 in 1 device to the mix allows for the best traditional experience while supporting these new experiences.

The developers made carefully considered changes for each application when determining the tablet mode versus the desktop mode UI. They focused the desktop UI on access to full editing tools, but designed the tablet mode UI with fewer on-screen options for review, comment, and consumption.

Figures 4 and 5 show the Calligra Gemini desktop UI for Stage and Words. The team decided to provide many status indicators and shortcuts on the desktop UI screen for the user creating new documents or performing edits. In Figure 4, notice there are multiple options for navigating slides and a number of shortcut buttons that provide access to a myriad of features. Each app has over 70 clickable controls and actions when running in the desktop UI.

Figure 4: Stage desktop mode for creation and editing

Figure 5: Words desktop mode for traditional word processing

This versatility had to be reined in to adapt the apps for the tablet UI. In contrast to desktop mode, Figures 6 and 7 show the tablet UI with the number of actions available on the screen reduced to as few as five. These controls were chosen with the intent of supporting review, commentary, and simplified editing by a group of users in a setting more public than an individual’s office, such as a conference room full of people. For example, the tablet UI for Stage allows the user to add an image, but does not provide controls for charts and other more complex visuals.

Figure 6: Stage tablet mode for quick editing

Figure 7: Words tablet mode for editing and annotating

The KO GmbH team went further to support more usage scenarios that met their users’ needs for collaboration features. For Stage they chose to make a presentation “play” screen with whiteboard-like (highlighter) and projection screen-like (pointer) functionality in tablet mode. The red pointer allows for laser pointer-like, on-the-fly focus that does not stay on the screen for more than a second or two. The yellow highlighter produces a screen focal point that stays until the slide is changed. Neither action is an actual edit of the document. Figures 8 through 10 show the Stage play screen in tablet mode.

Figure 8: Stage tablet mode for review and play

Figure 9: Stage tablet mode red pointer in action

Figure 10: Stage tablet mode yellow highlighter in action

When the team designed the tablet UI for word processing, other methods were considered, again focusing on the collaboration features users wanted. The team added edit options to the tablet mode with colorful messages for instant feedback as shown in Figure 11.

Figure 11: Words tablet mode example visual annotation

The final element the team approached was making sure there was a pure document consumption view available in both apps. These modes are shown in Figures 12 and 13. A nice feature of the Words “Distraction-free” view is that it is very convenient for reading documents in natural portrait orientation.

Figure 12: Words distraction-free mode view in portrait orientation

Figure 13: Stage desktop mode for individual consuming

Add a Manual Transform UI Element

Since some users want to use the app in ways developers don’t always consider, the Gemini team made it easy for users to transform the UI manually. All the team did was add a single UI element to get from desktop to tablet mode and back as shown in Figure 14. Advanced users will be grateful for this button.

Figure 14: Manual transform button in desktop (left) and tablet (right) modes

The UI transformation code is provided for your reference in Figure 15.

// Snip from Gemini - Perform 2 in 1 Mode Transition via Button:

#ifdef Q_OS_WIN
bool MainWindow::winEvent( MSG * message, long * result ) {
     if (message && message->message == WM_SETTINGCHANGE && message->lParam)
     {
         if (wcscmp(TEXT("ConvertibleSlateMode"), (TCHAR *) message->lParam) == 0)
             d->notifySlateModeChange();
         else if (wcscmp(TEXT("SystemDockMode"), (TCHAR *) message->lParam) == 0)
             d->notifyDockingModeChange();
         *result = 0;
         return true;
     }
     return false;
}
#endif

void MainWindow::Private::notifySlateModeChange()
{
#ifdef Q_OS_WIN
     bool bSlateMode = (GetSystemMetrics(SM_CONVERTIBLESLATEMODE) == 0);

     if (slateMode != bSlateMode)
     {
         slateMode = bSlateMode;
         emit q->slateModeChanged();
         if (forceSketch || (slateMode && !forceDesktop))
         {
             if (!toSketch || (toSketch && toSketch->isEnabled()))
                 q->switchToSketch();
         }
         else
         {
                 q->switchToDesktop();
         }
         //qDebug() << "Slate mode is now"<< slateMode;
     }
#endif
}

void MainWindow::Private::notifyDockingModeChange()
{
#ifdef Q_OS_WIN
     bool bDocked = (GetSystemMetrics(SM_SYSTEMDOCKED) != 0);

     if (docked != bDocked)
     {
         docked = bDocked;
         //qDebug() << "Docking mode is now"<< docked;
     }
#endif
}

Figure 15: Snip from Gemini - Perform 2 in 1 Mode Transition via Button

Conclusion

App developers already know to consider user needs when designing features. What the 2 in 1 does is allow designers more choices in providing the best experience for their users by making apps that tailor the UI to the users’ intentions. It is very exciting to see something as “everyday” as productivity apps made better with some carefully thought through user interface changes available only on a 2 in 1 device. Productivity never felt so good.

References & Links

Download Calligra Gemini from their web site under the Windows* section at http://userbase.kde.org/Calligra/Download, and learn more about Calligra Gemini at http://heap.kogmbh.net/leinir/.

Find information about touch development and 2 in 1 devices:
Introducing the Intel Developer Zone
2 in 1 Information
Touch Developer Guide for Ultra Mobile Devices
Ultrabook and Tablet Windows* 8 Sensors Development Guide
Designing for Ultrabook Devices and Touch-enabled Desktop Applications
How to Write a 2 in 1 Aware Application
Mixing Stylus and Touch Input on Windows* 8
Krita* Gemini* - Twice as Nice on a 2 in 1
Krita Gemini 2 in 1 UI Change

About the Author

Tim Duncan is a technology enthusiast and an Intel engineer described by friends as “Mr. Gidget-Gadget.” Currently helping developers integrate technology into solutions, Tim has decades of industry experience, from chip manufacturing to systems integration. Find him on the Intel® Developer Zone: Tim Duncan (Intel), on Twitter: @IntelTim.

Using Intel MKL and Intel TBB in the same application


Intel MKL 11.3 Beta has introduced Intel TBB support.

Intel MKL 11.3 can increase performance of applications threaded using Intel TBB. Applications using Intel TBB can benefit from the following Intel MKL functions:

  • BLAS: dot, gemm, gemv, gels
  • LAPACK: getrf, getrs, syev, gels, gelsy, gesv, pstrf, potrs
  • Sparse BLAS: csrmm, bsrmm
  • Intel MKL Poisson Solver
  • Intel MKL PARDISO

If such applications call functions not listed above, Intel MKL 11.3 executes sequential code. Depending on feedback from customers, future versions of Intel MKL may support Intel TBB in more functions.

Linking applications to Intel TBB and Intel MKL

The simplest way to link applications to Intel TBB and Intel MKL is to use the Intel C/C++ Compiler. While Intel MKL supports both static and dynamic linking, only a dynamic Intel TBB library is available.

Under Linux, use the following commands to compile your application app.c and link it to Intel TBB and Intel MKL.

Dynamic Intel TBB, dynamic Intel MKL                    icc app.c -mkl -tbb

Dynamic Intel TBB, static Intel MKL                         icc app.c -static -mkl -tbb

Under Windows, use the following commands to compile your application app.c and link it to dynamic Intel TBB and Intel MKL.

Dynamic Intel TBB, dynamic Intel MKL                    icl.exe app.c -mkl -tbb

Improving Intel MKL performance with Intel TBB

Performance of Intel MKL can be improved by telling Intel TBB to ensure thread affinity to processor cores. Use the tbb::affinity_partitioner class to this end.

To improve performance of Intel MKL for small input data, you may limit the number of threads allocated by Intel TBB for Intel MKL. Use the tbb::task_scheduler_init class to do so.

For more information on controlling behavior of Intel TBB, see the Intel TBB documentation at https://www.threadingbuildingblocks.org/documentation.

LAPACK performance in applications using Intel TBB and Intel MKL 11.3

* Each call is a single run of a single size, over the range from 1000 to 10000 in steps of 1000. Performance (GFlops) is computed as the cumulative number of floating-point operations for all 10 calls divided by the wall-clock time from the start of the very first call to the end of the very last call.
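As a concrete sketch of that metric: a dense n×n matrix multiply (gemm-style) performs roughly 2n³ floating-point operations, so the cumulative flop count for the 10 calls and the resulting GFlops figure can be computed as below. This is an illustrative calculation under that 2n³ assumption, not the actual benchmark harness.

```cpp
#include <cassert>

// Cumulative flops for 10 gemm-style calls of sizes 1000..10000 (step 1000).
// A dense n x n matrix multiply performs roughly 2*n^3 floating-point operations.
double cumulative_flops() {
    double total = 0.0;
    for (int n = 1000; n <= 10000; n += 1000)
        total += 2.0 * double(n) * double(n) * double(n);
    return total;
}

// GFlops = total flops / wall-clock seconds / 1e9.
double gflops(double flops, double wall_seconds) {
    return flops / wall_seconds / 1e9;
}
```

For these sizes the cumulative count works out to 6.05×10¹² flops, so a hypothetical 10-second wall-clock time would report 605 GFlops.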

 


Optimizing Power for Interactions between Virus Scanners and Pre-bundled Software



Introduction

Original equipment manufacturers (OEMs) like Lenovo, Toshiba, Dell, etc. ship their desktop PCs, laptops, and Ultrabook™ devices with software already installed on them (bundled applications) to help make their products more appealing to their customers. Most likely, you have bought a PC that came with several applications already installed, one of which is usually an anti-virus application such as Norton, McAfee (which Intel now owns), or Avast. This article discusses ways to analyze and optimize power consumption by looking at the interaction an anti-virus scanner might have with another bundled application.

Users often blame the anti-virus scanner when their new PC gets a battery life of only ~4 hours even though the box claimed ~8 hours. Anti-virus scanners show higher CPU usage when one of these bundled system utilities performs certain operations, like writing to a directory or performing an update.

Anti-virus software can start a chain reaction due to another process making changes to the system such as write operations. If one of these bundled applications (let’s call them system utilities from now on) is performing an operation more than it should, the anti-virus scanner will in turn consume more system resources. This chain reaction is why your PC does not have the battery life labeled on the box.

Tools are available that provide insight as to how these bundled applications interact and impact the overall system. The case study below shows how a system utility was writing to a directory more than it should, which in turn caused an anti-virus scanner to execute to ensure the operation was safe. The interaction between the two caused a very high power consumption resulting in less battery life.

Case Study

This case study will first describe the problem that required investigation, the tool used for that investigation, and finally the methods used to solve the problem.

The Problem

The problem is that a particular machine claims to have 8 hours of battery life yet the actual shutdown time is 6 hours. What is occurring on the system to cause such a massive drain on battery power? Many tools are available that can help debug this type of situation. Below are brief descriptions of these tools.

Windows* Tools for Power Analysis

Of the many Windows-based tools available to help you optimize your system, the first step is understanding how each can help you determine how one bundled application affects another.

The second step is to see what can be done to improve the overall power consumption of the system.

Measurement Tools

  1. Intel® Power Gadget
    • Gathers power/energy metrics such as package (CPU and GPU) power consumption. This is a must-have tool for analyzing power consumption while the app is running.
  2. Intel® SoC Watch
    • Reports platform power metrics such as processor sleep-state (C-state) residencies.
  3. Windows Assessment Console (WAC)
    • Packaged with Windows 8/8.1 ADK.
    • Pick the “Battery Life during idle periods” job
    • WAC’s user interface allows you to perform traces showing specific system metrics like CPU utilization, virtual memory commits, power consumption, etc. It provides a summary of projected system shutdown times.

Analysis Tool

  1. Windows Performance Analyzer (WPA)
    • Packaged with Windows 8/8.1 ADK.
    • WPA is used to load the .etl file generated by WPR/WAC so that in-depth analysis can be performed.

Suggested checklist when using these tools

  1. Does Intel Power Gadget report a package (CPU and GPU) power consumption that is much larger than when the pre-bundled application is uninstalled?
  2. Does Intel SoC Watch on the latest Intel® processors report C7 state residency of at least 95%?
  3. Does the Windows Performance Analyzer show high CPU usage or many writes to a specific directory under the IO section?
    • Which process is writing to this directory excessively and what is the full path for these writes?
    • If there are spikes in CPU usage, what is occurring on the system? Maybe an activity occurs every three seconds causing the CPU usage to increase every three seconds.
  4. Does an application change the system’s timer tick resolution from 15.6 ms (the Windows default) to a smaller value?
    • If an application changes the system’s timer tick resolution to a smaller value, e.g., 1 ms, the application or system utility could perform certain activities more frequently, causing higher power consumption on the overall system.
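The effect of a smaller tick is simple arithmetic: the number of timer interrupts per second is the reciprocal of the tick period, so moving from 15.6 ms to 1 ms raises the wakeup rate from roughly 64 to 1000 per second. A back-of-the-envelope sketch:

```cpp
#include <cassert>

// Timer wakeups per second for a given tick period in milliseconds.
// At 15.6 ms (Windows default) the CPU is interrupted ~64 times per second;
// at 1 ms it is interrupted 1000 times per second, preventing deep sleep states.
int wakeups_per_second(double period_ms) {
    return static_cast<int>(1000.0 / period_ms);
}
```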

Note: This is not a full list. For more information, refer to the Intel Optimization Guide.

The Investigation

The tool used to investigate power consumption was the Windows Assessment Console (WAC), which can be found at the following paths:

The 64-bit “wac.exe” application — C:\Program Files (x86)\Windows Kits\8.x\Assessment and Deployment Kit\Windows Assessment Toolkit\amd64

The 32-bit “wac.exe” application — C:\Program Files\Windows Kits\8.x\Assessment and Deployment Kit\Windows Assessment Toolkit\x86.

The Windows Assessment Console (WAC) provides a summary view in which the statistics of each run can be viewed in the same window for easy comparisons. Note the projected shutdown times below.


Comparing projected shutdown times under different conditions

Comparing average projected shutdown times of the best case (no Internet/no anti-virus) to worst case (Internet and anti-virus) shows roughly a one hour reduction in battery life between these runs.

Using the summary view provided by the Windows Assessment Console (WAC), it is evident that in some runs a particular application lowers the system’s timer tick resolution to 1 ms. This change can significantly affect overall system behavior, since other system utilities rely on the timer for purposes such as deciding when to update. The next figure shows how WAC reports an issue of this nature.


WAC Summary view showing issues reported during Idle Battery Life job

By further investigating the results provided by the Idle Battery Life assessment, the interaction between a system utility and a bundled anti-virus application becomes better understood. The next snapshot shows that the CPU usage of the anti-virus application increases (purple line) shortly after the CPU usage of a particular system utility increases (red line).


Anti-Virus software executing during 3rd-party write operations

The system utility updates its cache, which the anti-virus then cross-checks to ensure the operation is safe. This behavior was seen with and without Internet connection.


System utility writing to its log file, which in turn causes the anti-virus software to execute to ensure the operation is safe.

An increase in CPU usage causes an increase in the power consumption of the system, which reduces the battery life of the system. The goal is to reduce behavior that causes increased and unnecessary CPU usage.

End Result

Both the system utility and the anti-virus software came bundled on the machine. One way to stop the anti-virus software from running every time the system utility performs an IO operation is to put the system utility on a pre-approved list. That way the anti-virus software knows the system utility is safe and doesn’t execute unnecessarily.
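Conceptually, the pre-approved list works like the sketch below: the scanner consults an allow-list before reacting to a write, and approved processes no longer trigger a scan. The class and names here are hypothetical illustrations, not any vendor’s actual API.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical scan policy: writes by pre-approved processes skip the scan.
class ScanPolicy {
public:
    // Add a trusted process to the allow-list.
    void approve(const std::string& process) { approved_.insert(process); }

    // Returns true if a write by `process` should trigger an anti-virus scan.
    bool should_scan(const std::string& process) const {
        return approved_.count(process) == 0;
    }

private:
    std::set<std::string> approved_;
};
```

Before the fix, every write by the system utility triggers a scan; after adding it to the list, `should_scan` returns false and the chain reaction stops.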

See the difference in behavior on the system after this change was made.


No CPU usage increase caused by the anti-virus software when the system utility writes to its log

Now that CPU usage has decreased, the overall power consumption of the system will decrease as well. The following figure shows the new estimated shutdown times.


New projected shutdown times show significantly longer battery life.

Even though there are two runs with issues reported due to writes to storage, the projected shutdown times in both these runs are unaffected.

Conclusion

In this case study, the write operations performed by a system utility were causing the bundled anti-virus scanner to execute to ensure the operations were safe. This chain reaction in turn caused CPU usage to increase every time this particular system utility performed a write operation. Every time the CPU usage increased, the overall power consumption of the system would increase thus reducing battery life and showing lower projected shutdown times. By adding the system utility to a safe list, the projected shutdown times increased even when writes to storage still occurred.

References

[1] "Windows Assessment and Deployment Kit (Windows ADK)." Microsoft Corporation, 2 April. 2014. Web. 3 April. 2014. http://msdn.microsoft.com/en-us/library/windows/hardware/dn247001.aspx

[2] Pantels, Tom, Sheng Guo, and Rajshree Chabukswar. Touch Response Measurement, Analysis, and Optimization for Windows* Applications Intel Corporation, 10 April. 2014. Web. 22 May. 2015. https://software.intel.com/en-us/articles/touch-response-measurement-analysis-and-optimization-for-windows-applications

[3] Chabukswar, Rajshree, Mike Chynoweth, and Erik Niemeyer. Intel® Performance Bottleneck Analyzer. Intel Corporation, 4 Aug. 2011. Web. 12 Feb. 2014. http://software.intel.com/en-us/articles/intel-performance-bottleneck-analyzer

[4] Kim, Seung-Woo, Joseph Jin-Sung Lee, Vardhan Dugar, Jun De Vega. Intel® Power Gadget. Intel Corporation, 7 January. 2014. Web. 25 March 2014. http://software.intel.com/en-us/articles/intel-power-gadget-20

How to check and read the postcode in the early boot-up stage with the Intel® System Debugger


1. Background 

   While developing a system on an Intel platform, you may face a situation where the system fails to boot even at a very early stage. To identify the issue, you connect a JTAG probe and use the Intel® System Debugger for source-line debugging. If, however, you have no source code or symbol files for the BIOS or firmware, source-line debugging is not possible; in that case you can capture the postcode at the point where the system is stuck and send it to the BIOS/firmware team. Second, if the issue happens only at run time (that is, it disappears when you single-step, or occurs only sporadically), you can use the postcode as a software checkpoint before the issue happens and narrow down the problematic point. Finally, if you know the postcode number but have no BIOS/firmware sources, the number itself can suggest where the issue originates.

    The postcode is very useful when an issue occurs in the early boot-up stage of the system’s BIOS or firmware. However, many embedded systems and closed-chassis designs have no postcode LED on the board. Don’t worry: you can still read the postcode over JTAG with the Intel® System Debugger. In this article, I will explain how.

 

2. Postcode

   The postcode is a legacy debug mechanism. During the Power-On Self-Test (POST), the BIOS/firmware indicates its progress by writing codes to I/O port 80h.

(1) What you can get from the postcode

  • The postcode shows which major features are enabled and have passed in the BIOS/firmware.
  • The postcode can be used to indicate device or feature errors during system initialization.

(2) Where to find the postcode on hardware (LED), if available

  • POST card (PCI add-in card): The POST card decodes the port and displays the contents on an LED display. The POST card must be installed in PCI bus connector 1.
  • Onboard POST code LED display: Some Intel® Desktop Boards include an onboard LED to show POST codes.

(3) Example postcode descriptions (typical usage)

Range     Category/Subsystem
00 – 0F   Debug codes: can be used by any PEIM/driver for debug
10 – 1F   Host processors
20 – 2F   Memory/chipset
30 – 3F   Recovery
40 – 4F   Reserved for future use
50 – 5F   I/O buses: PCI, USB, ISA, ATA, etc.
60 – 6F   Not currently used
70 – 7F   Output devices: all output consoles
80 – 8F   Reserved for future use (new output console codes)
90 – 9F   Input devices: keyboard/mouse
A0 – AF   Reserved for future use (new input console codes)
B0 – BF   Boot devices: includes fixed media and removable media
C0 – CF   Reserved for future use
D0 – DF   Boot device selection
E0 – EE   Miscellaneous codes
F0 – FF   FF: processor exception

 

3. Using the Intel® System Debugger to check the postcode

(1) Download and install Intel® System Studio Ultimate Edition or the Intel® System Debugger NDA version (you may need an NDA with Intel).

https://software.intel.com/en-us/intel-system-studio

(2) Go to the installation directory and run the batch file that matches your Intel platform.

e.g., if your product uses an Intel Atom® processor, select the Atom products batch file.

This batch file list can change with product updates.

(3) Menu: File → Connect..., then select your probe and core and connect.

(4) Menu: View → Breakpoints, open the Breakpoints dialog, right-click, and select Create....

(5) In the Create Breakpoint window, select the Data tab, enter 0x80 in Location, and select IO Read&Write.

The system will stop when any data is read from or written to port 0x80.

(6) Menu: Options → Options..., and set Run After Restart to Off to enable a reset break.

The system will stop after reset, so you can check the IO RW breakpoint and then run.

(7) Enter the restart command in the command-line console.

The system will restart.

xdb> restart

(8) After the restart, the system stops at the reset break; verify the breakpoint you set, then run the system again.

xdb> run

(9) When the system stops at the breakpoint where it writes the postcode to port 0x80, you can read the postcode.

You can check other port numbers as well by changing the port number.

xdb> show port 0x80

e.g., we can see that the current postcode is 0xAB.

Luckily, my test platform also has an onboard postcode display, and it shows 0xAB as well.

 

     In short, you can easily read the postcode using the Intel® System Debugger on Intel platforms that lack a postcode LED display. By checking the postcode you get several pieces of debug information: the progress of the BIOS/firmware code, which devices or software features are enabled, and checkpoints in your own BIOS/firmware if you add user-defined postcodes. It is easy to obtain, yet provides critical debug information for very early boot-up failures.

<References>

Postcode information from the Intel Desktop Board web page: http://www.intel.com/support/motherboards/desktop/sb/CS-025434.htm

 

Simple optimization methodology with Intel System Studio (VTune, C++ Compiler, Cilk Plus)


Introduction:

 In this article, we introduce an easy optimization methodology that uses Intel® Cilk™ Plus and the Intel® C++ Compiler, based on performance analysis with Intel® VTune™ Amplifier. Intel® System Studio 2015, which contains the mentioned components, was used for this article.

  • Intel® VTune™ Amplifier is an integrated performance analyzer that helps developers analyze complex code and identify bottlenecks quickly.
  • The Intel® C++ Compiler generates optimized code that runs on IA-32 and Intel 64 architectures. It also provides a number of features to help developers easily improve performance.
  • Intel® Cilk™ Plus, a C/C++ language extension included in the Intel® C++ Compiler, allows you to improve performance by adding parallelism to new or existing C or C++ programs.

Strategy:

 We will use one of the code examples from the VTune tutorials, tachyon_amp_xe, as our target for performance optimization. This example renders a picture of complicated objects.

 


 

 

The performance optimization methodology applicable to this sample is described below.

  1. Run a Basic Hotspots analysis or General Exploration analysis on the example project in an integrated IDE, for instance Visual Studio* 2013.
  2. Identify hotspots and other optimization opportunities.
  3. Apply code modifications to the detected hotspot.
  4. Examine the compiler’s optimization options.
  5. Apply parallelism to parallelization candidates.

Optimization :

< Test Environment >

 OS : Windows 8.1

 Tool Suite : Intel® System Studio for Windows Update 3 

 IDE : Microsoft Visual Studio 2013

 

< Step 1 : Interpret & Analyze the result data >

  • Run a General Exploration analysis (if that is not possible, use a Basic Hotspots analysis) and find the hotspots. Since this example code is made for practicing hotspot finding and performance improvement, it is helpful to follow the tachyon_amp_xe example page for this particular hotspot hunt. After running the example with VTune, we can see the result as the following.
  • We can observe that the elapsed time this application took was 44.834s, and this is the performance baseline we will concentrate on reducing.
  • Also, for this sample application, the 'initialize_2D_buffer' function, which took 18.945s to execute, shows up at the top of the list as the hottest function. We will try to optimize this most time-consuming function.

 

  • The CPU Usage Histogram above shows that this sample does not make use of parallelism. Therefore, we may be able to use multiple threads to handle heavy tasks more quickly.

 

< Step 2 : Algorithmic approach for 'initialize_2D_buffer' >

 

 

 

  • As we saw earlier, the 'initialize_2D_buffer' function took the longest time to execute, and the largest number of instructions were retired by it, which means this function is where optimization can yield the largest benefit.

  • By double-clicking the function name, VTune Amplifier opens the source file positioned at the most time-consuming code line of this function. For 'initialize_2D_buffer', this is the line used to initialize a memory array using non-sequential memory locations. This sample code already contains an alternative, faster 'for' loop.

  • The code listed below is the actual code of the function 'initialize_2D_buffer'. The first 'for' loop does not fill the target array consecutively, while the second 'for' loop is designed to do the same task consecutively. By using the second 'for' loop, we gain a performance benefit.

  • After replacing the first 'for' loop with the second one, we can observe some performance improvement. Let’s look at the new VTune profiling results.

  • Compared to the previous results, the total elapsed time has been reduced from 44.834s to 35.742s, which is about x1.25 faster than before, and for the target function alone, from 18.945s to 11.318s, which is about x1.67 faster.
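The replacement described above can be sketched as follows: both loops produce an identical buffer, but the first writes with a large stride between consecutive stores (non-sequential in row-major storage), while the second writes consecutive memory locations and is therefore cache-friendly. The function names are illustrative, not the sample’s actual code.

```cpp
#include <cassert>
#include <vector>

// Fill a width x height row-major buffer column-by-column: non-sequential access,
// with a stride of `width` elements between consecutive writes.
std::vector<int> fill_nonsequential(int width, int height) {
    std::vector<int> buf(width * height);
    for (int x = 0; x < width; ++x)
        for (int y = 0; y < height; ++y)
            buf[y * width + x] = x + y;
    return buf;
}

// Same result, but row-by-row: consecutive memory writes, cache-friendly.
std::vector<int> fill_sequential(int width, int height) {
    std::vector<int> buf(width * height);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            buf[y * width + x] = x + y;
    return buf;
}
```

The two functions compute the same contents; only the access order, and hence the cache behavior and run time, differs.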

 

< Step 3: Compiler Optimization Options >

  • We often overlook the automatic optimization capabilities that compilers have. In this case, we simply enable the Intel C++ Compiler's optimization option '/O3' at compile time. We can also use the GUI to enable this. First, setting the Intel C++ Compiler as the project’s compiler is required in order to use '/O3'.

 

  • Just changing the above option sometimes brings great performance benefits. For a detailed explanation of the optimization option '/O[n]', please click here. The new results below show 24.979s to finish the task. It was 35.742s, which gives us a x1.43 speedup.

< Step 4 : Adding Parallelism by Cilk Plus >

  • Parallel programming is a very broad area in itself, and there are many ways to implement parallelism on multi-core platforms. This time, we introduce Intel Cilk Plus, a language extension that is fairly easy to apply and works smartly.

  • By investigating the code together with VTune's results, we can find the point where the heaviest routine is called repeatedly; that can be a successful parallelization candidate. Usually this is done by looking at the caller/callee tree and following back from the root hotspot until you find a parallelizable spot to test.

  • In this case, it was the 'draw_trace' function in find_hotspots.cpp. Adding a simple 'cilk_for' to parallelize the target task dynamically distributes the lines to draw across multiple threads instead of a single thread. Therefore, you can visually observe 4 threads (the test machine is dual-core with Intel® Hyper-Threading Technology) drawing different lines simultaneously.

  • The painting job now finishes in 11.656s, a big improvement over the original time. Let's take a look at the VTune results.

  • We can see 13.117s as the total elapsed time, which is x1.9 faster than the previous result. We can also observe that the multiple cores are being utilized efficiently.
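With Cilk Plus, the change is essentially replacing `for` with `cilk_for` on the scanline loop. Since Cilk Plus requires the Intel compiler, the sketch below illustrates the same "one scanline per task" idea using standard C++ `std::async` instead; all names are illustrative, not the sample's code.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Stand-in for "render one scanline": a cheap per-line computation.
std::vector<int> draw_line(int y, int width) {
    std::vector<int> line(width);
    for (int x = 0; x < width; ++x) line[x] = x * y;
    return line;
}

// Sequential version: one thread draws every line.
std::vector<std::vector<int>> render_seq(int width, int height) {
    std::vector<std::vector<int>> img;
    for (int y = 0; y < height; ++y) img.push_back(draw_line(y, width));
    return img;
}

// Parallel version: each scanline becomes an independent task, analogous to
//   cilk_for (int y = 0; y < height; ++y) draw_line(y, width);
std::vector<std::vector<int>> render_par(int width, int height) {
    std::vector<std::future<std::vector<int>>> tasks;
    for (int y = 0; y < height; ++y)
        tasks.push_back(std::async(std::launch::async, draw_line, y, width));
    std::vector<std::vector<int>> img;
    for (auto& t : tasks) img.push_back(t.get());
    return img;
}
```

Since the scanlines are independent, the parallel version produces exactly the same image as the sequential one, which is what makes this loop a safe parallelization candidate.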

Summary:

  • The total elapsed time decreased from 44.834s to 13.117s: a x3.41 speedup.
  • This optimization was achieved with simple VTune analysis, one Intel C++ Compiler option, and one Cilk Plus feature.
  • Intel System Studio's components are designed as a solution to help developers easily improve their products.

Diagnostic 13379: loop was not vectorized with "simd"


Product Version:  Intel® Fortran Compiler 15.0 and above

Cause:

This diagnostic occurs when a loop contains a conditional statement that controls the assignment of a scalar value AND the scalar value is referenced AFTER the loop exits. The vectorization report generated using the Intel® Fortran Compiler's optimization and vectorization report options includes the non-vectorized loop instance:

Windows* OS:  /O2  /Qopt-report:2  /Qopt-report-phase:vec    

Linux OS or OS X:  -O2 -qopt-report2  -qopt-report-phase=vec

Example:

The example below generates the following remark in the optimization report:

subroutine f13379( a, b, n )
implicit none
integer :: a(n), b(n), n

integer :: i, x=10

!dir$ simd
do i=1,n
  if( a(i) > 0 ) then
     x = i  !...here is the conditional assignment
  end if
  b(i) = x
end do
!... reference the scalar outside of the loop
write(*,*) "last value of x: ", x
end subroutine f13379

ifort -c /O2 /Qopt-report:2 /Qopt-report-phase:vec /Qopt-report-file:stdout f13379.f90

Begin optimization report for: F13379

    Report from: Vector optimizations [vec]

LOOP BEGIN at f13379.f90(8,1)
    ....
   remark #13379: loop was not vectorized with "simd"
LOOP END

Resolution:

The reference to the scalar after the loop requires that the value coming out of the loop be "correct", meaning that the loop iterations were executed strictly in order and sequentially. If the scalar is NOT referenced outside of the loop, the compiler can vectorize this loop, since the order in which the iterations are evaluated does not matter: without a reference outside the loop, the final value of the scalar no longer matters.

Example

subroutine f13379( a, b, n )
implicit none
integer :: a(n), b(n), n

integer :: i, x=10

!dir$ simd
do i=1,n
  if( a(i) > 0 ) then
     x = i  !...here is the conditional assignment
  end if
  b(i) = x
end do
!... no reference to scalar X outside of the loop
!... removed the WRITE statement for X
end subroutine f13379

Begin optimization report for: F13379
    Report from: Vector optimizations [vec]

LOOP BEGIN at f13379.f90(8,1)
f13379.f90(8,1):remark #15301: SIMD LOOP WAS VECTORIZED
LOOP END

See also:

Requirements for Vectorizable Loops

Vectorization Essentials

Vectorization and Optimization Reports

Back to the list of vectorization diagnostics for Intel® Fortran

Debug SPI BIOS after Power Up Sequence


After PCB assembly and the board power up, the next phase will be SPI BIOS debugging. A lot of system engineers and firmware engineers who had been interested in Intel System Studio (ISS), questioned whether available to use ISS to do SPI BIOS debugging once after CPU reset de-assertion. Answer is YES, and explain below how to make it happen with Intel System Debugger of ISS.

Debugging the SPI BIOS immediately after CPU reset de-assertion is a difficult task, because the connection time from host to target is much longer than the platform power-up sequence, even including the BIOS module boot time. To accommodate this demand, Intel System Debugger provides a feature set to halt the target right after CPU reset de-assertion. The steps described below are required for this use case.

 

1. Launch the Intel System Debugger of ISS 2015 (former name: Intel JTAG Debugger of ISS 2014).

2. Connect to the target platform.

3. Reset the target using the "restart" console command, or by clicking the restart button as shown below.

After the target resets, you can debug the SPI BIOS.

 

5 Reasons to go Pro


Intel® Media Server Studio - Professional (Pro) Edition

Download Article                  Get a FREE Trial

According to a new report, Intel’s new server hardware is gaining market share in video encoding and transcoding1. And many developers are activating hardware acceleration and graphics features on Intel® processors2 to get fast performance for their media solutions by using the Media Server Studio Community or Essentials edition.

However, with online video traffic increasing and coding standards growing more complex, developers are challenged to respond quickly to market changes, such as the transition to the High Efficiency Video Coding (HEVC) format, Ultra High Definition (UHD), and 4K display devices. The trick for enterprises is to understand how to get the most bandwidth utilization out of their media infrastructures while balancing video quality. And doing so faster than competitors.

In Top 5 Reasons to go Pro, developers are given the rationale for why now is the time to prepare for next-gen formats, and how to conduct video analysis and amp up quality with the Media Server Studio Pro edition's tools and techniques.


1 2015 Global Video Encoding and Transcoding Technology Innovation Leadership Award, Frost & Sullivan, 2015

2 See product details for hardware requirements.

Media Client Release notes and support


Intended Audience

Software developers interested in a cross-platform API foundation and plugins to develop and optimize video coding and processing, design products that deliver visually stunning media by enabling RAW photo and 4K RAW video processing, or deliver quality sound with audio encode and decode.

Customer Support

For technical support on Media Client products, a feature of Intel® Integrated Native Developer Experience (Intel® INDE), including answers to questions not addressed in this documentation, the latest online getting-started help, the technical support forum, FAQs, and other support information, click here.

To seek help with an issue in Media Client products, go to the user forum: click here.

To submit an issue for Ultimate or Professional Edition of  Media client products, go to Intel® Premier Support, click here.

Intel® Premier Support is not available for the Starter Edition of the product. For more information on registering for Intel® Premier Support, click here.

Release notes for Media SDK for Windows*

Release notes for Media RAW Accelerator for Windows*

Release notes for Audio for Windows*


Intel® INDE 2015 Release Notes and Installation Guide


This page provides the Release Notes for Intel® INDE 2015, with the most recent listed at the top of the page. For more details, please take a look at the documents in the table below.

Intended Audience

Software developers interested in a cross-platform productivity suite that enables them to quickly and easily create native apps from an OS X* host for Android* or OS X* targets, or from a Windows* host for Android* or Windows* targets.

Customer Support

For technical support of Intel® Integrated Native Developer Experience 2015, including answers to questions not addressed in this product, latest online getting started help, visit the technical support forum, FAQs, and other support information at: https://software.intel.com/en-us/intel-inde-support

To seek help with an issue in Intel® INDE (any edition), go to the user forum (https://software.intel.com/en-us/forums/intel-integrated-native-developer-experience-intel-inde)

To submit an issue for Ultimate of Intel® INDE, go to Intel® Premier Support: (https://premier.intel.com/)

Intel® Premier Support is not available for the Starter Edition of the product. New Professional Edition users also do not receive Intel® Premier Support, since that edition is now free, but users who previously had access to this support will continue to have the privilege.

For more information on registering to Intel Premier Support, go to: http://software.intel.com/en-us/articles/performance-tools-for-software-developers-intel-premier-support

What is new in Update 2?

Windows* Host:

  • Support for 14nm SoC code named  Cherry Trail
  • Support for Windows* 10 Desktop (Preview) Host and Target
  • Limited Support for Windows* 10 Mobile Target
  • Limited Support for Visual Studio* 2015 (all editions)
  • Graphics Frame Analyzer for OpenGL*
  • New beta or preview features
    • Tamper Protection [Beta]
    • OpenCV Library [Beta]
    • OpenCL™ Code Analyzer Tool for Microsoft* Visual Studio* [Preview]

OS X* Host:

  • Support for 14nm SoC code named Cherry Trail on Android* targets
  • Support for OS X* targets on Intel® C++ Compiler, Intel® IPP and Intel® TBB
  • Graphics Frame Analyzer for OpenGL*
Release Notes for Windows* Host | Release Notes for OS X* Host | End User License Agreements
Update 2 | Update 2 | EULA for Intel® INDE on Windows* Host; EULA for Intel® INDE on OS X* Host
Update 1 | Update 1 | Embedded in Release Notes

Older Updates:

Update 1.1 is available now!

INDE 2015 Update 1.1 is a license label change only, on top of Update 1. If you already installed Intel INDE 2015 Update 1, this installation is optional.

If you are an existing user of INDE 2015 Update 1, you will receive a notification in your Intel® Software Manager tool on how to update your installation. If you are a new user, please visit  https://software.intel.com/en-us/intel-inde to see various packages available and download INDE.

What is new in Update 1?

  • Support for Android* Lollipop 32 bit/64 bit apps
  • Support for Nexus Player
  • Visual Studio 2013 Community Edition is supported

 

Removing CPU-GPU sync stalls in Galactic Civilizations* 3


Download Document

Galactic Civilizations* 3 (GC3) is a turn-based 4X strategy game developed and published by Stardock Entertainment that was released on May 14, 2015. During the early access and beta periods, we profiled and analyzed the rendering performance of the game. One of the big performance improvements made was the removal of several CPU-GPU sync stalls that were responsible for losing some parallelism between the CPU and GPU. This article describes the issue and the fix, and emphasizes the importance of using performance analysis tools during development while keeping their strengths and limitations in mind.

Spotting the issue

We started the rendering performance analysis with Intel® INDE Graphics Performance Analyzers (GPA) Platform Analyzer. The screenshot below is a trace capture from the game (without v-sync) before improvements were made. The GPU queue has several gaps within and between frames, with less than one frame's worth of work queued up at any time. If the GPU queue isn't fed well by the CPU and has gaps, the application will never leverage that idle time to improve performance or visual fidelity.


Before: Frame time = ~21 ms – Less than 1 frame queued – Gaps in the GPU queue – Very long Map call

GPA Platform Analyzer also shows the time spent processing each Direct3D* 11 API call (i.e., application -> runtime -> driver and back). In the screenshot above, you can see an ID3D11DeviceContext::Map call that takes ~15 ms to return, during which the application's main thread does nothing.

The image below shows a zoom into one frame’s timeline, from CPU start to GPU end. The gaps are shown in pink boxes, amounting to ~3.5 ms per frame. Platform Analyzer also tells us the cumulative duration of various API calls for the trace, with Map taking 4.015 seconds out of the total 4.306 seconds!

It’s important to note that Frame Analyzer cannot spot the long Map call with a frame capture. Frame Analyzer uses GPU timer queries to measure the time for an erg, which consists of state changes, binding resources, and the draw. The Map however happens on the CPU, with the GPU unaware of it.

Debugging the issue

(See the Direct3D resources section at the end for a primer on using and updating resources.)

Driver debug revealed the long Map call to be using D3D11_MAP_WRITE_DISCARD (Platform Analyzer doesn't show you the arguments of the Map call) to update a large vertex buffer that was created with the D3D11_USAGE_DYNAMIC flag.

This is a very common scenario in games to optimize the data flow to frequently updated resources. When mapping a dynamic resource with D3D11_MAP_WRITE_DISCARD, an alias is allocated from the resource's alias-heap and returned. An alias refers to the memory allocation backing the resource for a given Map call. When there is no room for aliases on the resource's current alias-heap, a new shadow alias-heap is allocated. This continues to happen until the resource's heap limit is reached.

This was precisely the issue in GC3. Each time this happened (which was multiple times a frame for a few large resources that were mapped several times), the driver waited on a draw call using an alias of the resource (which was allocated earlier) to finish, so it could reuse it for the current request. This wasn’t an Intel-specific issue. It occurred on NVIDIA's driver too and was verified with GPUView to confirm what we found with Platform Analyzer.

The vertex buffer was ~560 KB (size was found via the driver) and was mapped ~50 times with discard in a frame. The Intel driver allocates multiple heaps on demand (each being 1 MB) per resource to store its aliases. Aliases are allocated from a heap until they no longer can be, after which another 1 MB shadow alias-heap is assigned to the resource and so on. In the long Map call's case, only one alias could fit in a heap; thus, each time Map was called on the resource, a new shadow heap was created for that alias until the resource's heap limit was reached. This happened every frame (which is why you see the same pattern repeat), wherein the driver was waiting for an earlier draw call (from the same frame) to be done using its alias, in order to reuse it.

We looked at the API log in Frame Analyzer to filter resources that were mapped several times. We found several such cases, with the UI system being the lead culprit, mapping a vertex buffer 50+ times. Driver debug showed that each map updated only a small chunk of the buffer.


Same resource (handle 2322) being mapped several times in a frame

Fixing the issue

Working with Stardock, we instrumented all of their rendering systems to get additional markers into the Platform Analyzer’s timeline view, in part to verify that the UI system was behind the large Map call and for future profiling.

We had several options for fixing the issue:

  • Set the Map flag to D3D11_MAP_WRITE_NO_OVERWRITE instead of D3D11_MAP_WRITE_DISCARD:
    The large vertex buffer was being shared by several like-entities. For example, most of the UI elements on the screen shared a large buffer. Each Map call updated only a small independent portion of the buffer. The ships and asteroids that used instancing also shared a large vertex/instance data buffer. D3D11_MAP_WRITE_NO_OVERWRITE would be the ideal choice here since the application guarantees that it won't overwrite regions of the buffer that could be in use by the GPU.
  • Split the large vertex buffer into several smaller ones:
    Since alias allocation was the reason behind the stall, considerably reducing the vertex buffer size allows several aliases to fit in a heap. GC3 doesn't submit too many draw calls, and hence, reducing the size by a factor of 10 or 100 (560 KB to 5-50 KB) would fix it.
  • Use the D3D11_MAP_FLAG_DO_NOT_WAIT flag:
    You can use this flag to detect when the GPU is busy using the resource and do other work before remapping the resource. While this lets the CPU do actual work, it'd make for a really bad fix in this case.

We went with the second option and changed the constant used in the buffer creation logic. The vertex buffer sizes were hardcoded for each subsystem and just needed to be lowered. Several aliases could now fit into each 1 MB heap, and with the comparatively low number of draw calls in GC3, the issue wouldn’t crop up.

Each rendering subsystem fix magnified the issue in another one, so we fixed it for all the rendering subsystems. A trace capture with the fixes and instrumentation, followed by a zoomed-in look at one frame, is shown below:


After: Frame time = ~16 ms – 3 frames queued – No gaps in GPU queue – No large Map calls

 

The total time taken by Map went down from 4 seconds to 157 milliseconds! The gaps in the GPU queue disappeared. The game had 3 frames queued up at all times and was waiting on the GPU to finish frames to submit the next one! The GPU was always busy after a few simple changes. Performance went up by ~24% with each frame taking ~16 ms instead of ~21 ms.

Importance of GPU profiling tools during game development

Here’s what Stardock had to say:

Without tools like GPA Platform Analyzer or GPUView, we wouldn't have known what was happening on the GPU because the information we get back from DirectX is only whether the call succeeded or not. Traditionally, we would have disabled systems, or parts of systems, to try to isolate where the performance costs are coming from. This is a very time-consuming process that can consume hours or days without any practical benefit, especially if the bottlenecks aren’t in the systems you expect.

Also, measuring isolated systems can often miss issues that require multiple systems to interact to cause the problem. For example, if you have a bottleneck in the animation system, you may not be able to identify it if you have enough other systems disabled that the animation system (which is your performance problem) now has enough resources to run smoothly. Then you spend time troubleshooting the wrong system, the one you removed, instead of the source of the actual problem.

We have also tried to build profiling tools into our games. Although this works, we only get measurement data on the systems we explicitly measure, again making us unable to see issues from systems we wouldn’t expect. It is also a lot of work to implement and has to be maintained through the game's development to be usable. And we need to do it over again with each game we make. So we get partial information at a high development cost. Because of this, issues can be hard to detect just by looking over the code, or even stepping through it, because it may appear correct and render properly, but, in reality, it is causing the GPU to wait or perform extra work.

This is why it is important to understand what is happening on the GPU. GPU profiling tools are critical for quickly showing developers where their code is causing the GPU to stall or where the frame is spending the most time. Developers can then identify which areas of the code would benefit the most from optimization, so they can focus on making improvements that make the most noticeable changes to performance.

Conclusion

Optimizing the rendering performance of a game is a complex beast. Frame and Trace capture-replay tools provide different and important views into a game’s performance. This article focused on CPU-GPU synchronization stalls that required a trace tool like GPA Platform Analyzer or GPUView to locate.

Credits

Thanks to Derek Paxton (Vice President) and Jesse Brindle (Lead Graphics Developer) at Stardock Entertainment for the great partnership and incorporating these changes into Galactic Civilizations 3.

Special thanks to Robert Blake Taylor for driver debug, Roman Borisov and Jeffrey Freeman for GPA guidance, and Axel Mamode and Jeff Laflam at Intel for reviewing this article.

About the author

Raja Bala is an application engineer in the game developer relations group at Intel. He enjoys dissecting the rendering process in games and finding ways to make it faster and is a huge Dota2* and Valve fanboy.

Direct3D* resources primer

The Direct3D API can be broken down into resource creation/destruction, setting render pipeline state, binding resources to the pipeline, and updating certain resources. Most of the resource creation happens during the level/scene load.

A typical game frame consists of binding various resources to the pipeline, setting the pipeline state, updating resources on the CPU (constant buffers, vertex/index buffers,…) based on simulation state, and updating resources on the GPU (render targets, uavs,…) via draws, dispatches, and clears.

During resource creation, the D3D11_USAGE enum is used to mark the resource as requiring:

(a) GPU read-write access (DEFAULT - for render targets, uavs, infrequently updated constant buffers)
(b) GPU read-only access (IMMUTABLE - for textures)
(c) CPU write + GPU read (DYNAMIC - for buffers that need to be updated frequently)
(d) CPU access but allowing the GPU to copy data to it (STAGING)

Note that the resource's D3D11_CPU_ACCESS_FLAG needs to also be set correctly to comply with the usage for c & d.

In terms of actually updating a resource's data, the Direct3D 11 API provides three options, each of which is used for a specific usage (as described earlier):

(i) Map/Unmap
(ii) UpdateSubresource
(iii) CopyResource / CopySubresourceRegion

One interesting scenario, where implicit synchronization is required, is when the CPU has write access and the GPU has read access to the resource. This scenario often comes up during a frame. Updating the view/model/projection matrix (stored in a constant buffer) and the (animated) bone transforms of a model are examples. Waiting for the GPU to finish using the resource would be too expensive. Creating several independent resources (resource copies) to handle it would be tedious for the application programmer. As a result, Direct3D (9 to 11) pushes this onto the driver via the D3D11_MAP_WRITE_DISCARD Map flag. Each time the resource is mapped with this flag, the driver creates a new memory region for the resource and lets the CPU update that instead. Thus, multiple draw calls that update the resource end up working on separate aliases of the resource, which, of course, eats up GPU memory.

For more info on resource management in Direct3D, check:

John McDonald's "Efficient Buffer Management" presentation at GDC
Direct3D 11 Introduction to resources
Direct3D 10 Choosing a resource
UpdateSubresource v/s Map

R6 Release: What's New in Intel® Media Server Studio


Top 5 Advancements from Intel Media Server Studio 2015 R6 Release

  1. 5th gen support (formerly codenamed Broadwell): Get even faster media performance with Media Server Studio support for new 5th gen Intel® Core™ processors and the Intel® Xeon® Processor E3-128x v4 product family with integrated Intel® Iris™ Pro graphics. The flagship desktop processor, Intel® Core™ i7-5775C, delivers up to 35 percent better media performance1, and the Xeon family delivers up to 1.4x more performance1 for video transcoding, perfect for graphics-hungry solutions.

  2. FREE Community Edition: Get the free community edition, which includes all the performance features of the Essentials Edition: Media SDK, runtimes, media and graphics drivers, OpenCL™ Code Builder and more. This edition, however, does not include Intel® Premier Support (direct access to Intel technical experts).
       — Review Details page to learn more
       — Download the Free Version
       — See Documentation for Release Notes and other guides; see forums and frequently asked questions for support

  3. HEVC: Achieve real-time 4K performance on the Xeon E5 processor through the HEVC software implementation in the Media Server Studio Professional edition, with significant performance improvements (~15% on average) over the previous release2. HEVC GPU-accelerated and software implementations now support both 4:2:2 and 10-bit. Major performance improvements were made to bit rate control methods such as CBR, VBR and AVBR. Look-ahead bit rate control and 1:N LA optimization are available for the HEVC software and GPU-accelerated encoders. More target usage modes are available for the HEVC GPU-accelerated implementation. (See Enabling HEVC Whitepaper.)

  4. Metrics Monitor & Intel® VTune™ Amplifier on Linux*: Access a number of metrics from the GPU kernel mode driver through the Metrics Monitor, available on Linux, to help you understand GPU usage for media workloads. It lets you monitor the following GPU hardware units:
       — Render engine (execution units)
       — Multi-Format CODEC (MFX)
       — Video Quality Engine
       — Blitter engine

    Now, Intel VTune Amplifier is available on Linux OS. Use VTune Amplifier to fine-tune or complete advanced level optimizations; the tool provides performance insight into hotspots, threading, locks and waits, OpenCL, bandwidth and more.

  5. Virtualization Support on E5: Use the Media Server Studio software implementation in a virtualized environment on E5 (KVM+Xen, XenServer on Linux). You can turn an operating system into a virtual image, run several operating systems in parallel, and execute several simultaneous transcoding operations.

Other features and improvements were also completed in Media Server Studio 2015 R6 release; please review the various edition Release notes for details.

Media Server Studio Essential/Community Edition Release Notes for Windows* and Linux

Media Server Studio Professional Edition Release Notes for Windows and Linux

1 http://www.intel.com/newsroom/kits/computex/2015/pdfs/Computex2015-5th_Gen_Intel_Core-Xeon_FactSheet.pdf
2 https://software.intel.com/en-us/intel-media-server-studio/details

Elevating Head of the Order* Gameplay with Gesture Control


By Edward J. Correia

Intel is aiming to revolutionize the way users interface with traditional PCs, and Jacob Pennock is among the movement's primary champions. Back in 2013, Pennock won the Intel Perceptual Computing Challenge with Head of the Order*, a game that cast a spell on perceptual computing. Originally built with the Intel® Perceptual Computing SDK and Creative Senz3D* camera, the game has since evolved with the implementation of the new Intel® RealSense™ SDK and the Intel® RealSense™ 3D (front facing) camera. Pennock's experiences—and those of his coworkers—are plotting a course through the new APIs and creating a navigational aid that other developers can use to steer their own perceptual apps.

Armed with the new Intel RealSense SDK and a new company, Livid Interactive, Pennock and his team set out to transform the user experience and enhance Head of the Order (Figure 1) by implementing improved gesture controls and 3D hand-tracking points in the Intel RealSense SDK that were not possible with the previous Intel Perceptual Computing SDK.

Figure 1: Livid Interactive’s Head of the Order* trailer.

 

From Perceptual Computing to the Intel® RealSense™ SDK

Gesture Control Improvements 

The Head of the Order team was particularly interested in the new hand- and finger-tracking capabilities of the Intel RealSense SDK. These capabilities provide 3D positioning of each joint of the user's hand, with 22 points of hand and joint tracking (Figure 2) for greater precision.  Control of the hands is everything to this game; hands are used to craft and cast off spells and to combine multiple spells to form more powerful ones. 

Hand tracked in 3-D using 22 landmark data points
Figure 2: The hand can be tracked in 3-D using 22 landmark data points. [Image source]

3D Hand Tracking

With the original SDK, a user’s hands could only be represented as flat, 2D images superimposed on the screen (Figure 3—left). To achieve this, Pennock had to create a hand-rendering system that resampled the low-resolution 2D hand images, and then add them to the game’s rendering stack at multiple depths through custom processing with his own code.

According to Pennock, the implementation of the fine-grained hand tracking in the new Intel RealSense SDK allowed the gameplay experience to become far more life-like and engaging, with much better 3D positioning.  Hands are now seen as 3D models (Figures 3 and 4—right) that interact within the game space.

Original 2D spell crafting and the improved 3D hand rendering
Figure 3: Original 2D spell crafting (left) and the improved 3D hand rendering (right).

This capability enhances the immersive nature of the game and allows Head of the Order to run on virtual-reality headsets. The Intel RealSense SDK also greatly improves the position tracking of all the finger joints, providing much more depth and accuracy when it comes to casting spells and navigating the virtual game space.

Original 2D hand spell casting off and the improved 3D hand rendering
Figure 4: Original 2D hand spell casting off (left) and the improved 3D hand rendering (right).

Switching from the Intel Perceptual Computing SDK to the Intel RealSense SDK wasn't an entirely smooth ride; it took time for the functionality to ramp up in the new SDK. But by the time the Intel RealSense SDK Gold R2 version release was ready, Pennock and his team had replicated and extended what they had achieved with the predecessor SDK.

Challenges

Head of the Order is controlled entirely with hand gestures, and spells are created by drawing simple shapes in the air. Over time, players learn how to master the art of combining gestures to craft the most powerful spells.  The learning curve for gesture-based input in general can be steep, particularly for players accustomed to traditional mouse and keyboard interfaces or game hand controllers.  Some of the biggest challenges Pennock and his team faced were communicating to the players what they wanted them to do.

Because many user movements can be picked up by the Intel RealSense 3D camera's gesture-recognition capabilities, Pennock and his team noticed that if a player makes random or unrecognizable movements—or is too far from or too close to the camera—the camera can’t process what the player is attempting to do. For those players who are accustomed to traditional interfaces, this situation can cause them to become frustrated. "Even if it's working perfectly well,” said Pennock, “they may be interacting in an unexpected way so it appears that the system isn’t functioning. This can be difficult to amend on the development end.”

To resolve this issue, Pennock and his team created a 5-minute guided tutorial with narration and video examples to demonstrate proper input techniques (Figure 5) and step the players through gameplay scenarios. This idea came about during the first contest, as early testers had trouble realizing that three steps were required to create a spell.

The tutorial demonstrates proper input technique
Figure 5: The tutorial demonstrates proper input technique.

Because visual cues also play a key role during gameplay, Pennock addressed this issue by factoring gesture speed into the visualization. Now, a trail is drawn on the screen only when the hand movement speed is within acceptable limits. "Other than that, when we're actually tracking for a particular gesture, your hands glow," he said. 

For users who want to play an “easy” game, Head of the Order offers characters that perform very simple gestures.

Advice for Developers

Testing

With gesture-based apps, perhaps more than for those using traditional input types, Pennock stresses the importance of letting new users try the app and observing how they interact with it. He also emphasizes the use of outsiders to test the app as opposed to those involved with its development because it's easy to make something work correctly if you already know how to use the app. The advantage to having new users test the app is being able to see what doesn’t feel right to them and then considering ways to address the problem.

Not surprisingly, tests with younger audiences are generally more successful. "For the kids who grew up with motion controls such as Microsoft Kinect*, it's intuitive to them and they don't usually have a big issue with gesture control," said Pennock, adding that testing the system on the more mature crowds at trade shows is when most problems arise.

Performance Over Implementation

Pennock and his team acknowledge that performance can be an issue: the enormous data streams coming from the cameras create latency. This is particularly true if using more than one tracking module at a time or opting for a large number of tracking points.

It initially took some time for functionality to ramp up in the new RealSense SDK, but Pennock said that, “The Gold R2 release was a great update.”  In this latest RealSense SDK, the level of noise in tracking finger joints is improved, and smoothing functions are better.  

Looking Forward

The target systems on which Head of the Order can actually be played are continuing to emerge. The software is designed only for systems equipped with a natural user interface such as that provided by the Intel RealSense 3D camera. Intel offers versions tailored to the desired application. Today, this includes tablets, conventional and two-in-one laptops, and all-in-one PCs equipped with Intel RealSense technology. What's more, an increasing number of technology companies such as Acer, Asus, Dell, Fujitsu, Hewlett-Packard, Lenovo, and NEC currently offer or have announced systems that feature Intel RealSense technology.

The Intel RealSense SDK and the technologies it's tied to provide facial detection and tracking, emotion detection, depth-sensing photography, 3D scanning, background removal, and the tracking of 22 joints in each hand for accurate touch-free gesture recognition. In the future, Pennock believes that Intel RealSense technology could find applications in automotive control, robotics, home automation systems, and on the industrial side, "I see a wealth of opportunity for measurement devices."

About the Developer

The original version of Head of the Order was built and submitted to the Intel Perceptual Computing contest under Pennock's development company—Unicorn Forest Games. That company has since combined forces with Helios Interactive, where Pennock was employed as a developer. The result was a new entity: San Francisco-based Livid Interactive.

According to Michael Schaiman, managing partner at Helios, his company was asked to develop concepts for the Intel® Experience, a set of "hands-on experience zones" being set up at 50 Best Buy flagship stores across the United States. The zones are designed to showcase cutting-edge Intel® technologies for people of all ages and technical abilities. Schaiman was tasked with demonstrating Intel RealSense technology. "One concept we came up with was to build a special version of Head of the Order that consumers could play with," said Schaiman. "They really loved that idea." The Head of the Order demo is set to hit the stores sometime in June 2015, with the game to follow later in the summer.

Intel Resources

Throughout the development process, the Head of the Order team held monthly "innovator calls" with engineers at Intel. These calls allowed Livid developers to stay abreast of Intel RealSense SDK features, sample code, and documentation that were about to be released and to have a structure for providing feedback on what had come before.

For more information, check out the Intel RealSense SDK and Support pages.

Get more information on Livid Interactive here.

Analyze and Optimize Windows* Game Applications Using Intel® INDE Graphics Performance Analyzers (GPA)


Download Intel INDE Graphics Performance Analyzer.pdf

Intel® INDE Graphics Performance Analyzers (GPA) are powerful, agile tools enabling game developers to utilize the full performance potential of their gaming platform. GPA visualizes performance data from your application, enabling you to understand system-level and individual frame performance issues, as well as allowing you to perform “what-if” experiments to estimate potential performance gains from optimizations. GPA tools are available as part of the Intel® INDE tool suite or as standalone from here.

This article describes the GPA tools and walks through a sample game application for Windows*, showing individual frame performance issues and optimizing with the Graphics Frame Analyzer for DirectX*.

Graphics Monitor

Graphics Monitor is used to view, graph, and configure metrics in-game.  You can also take trace and frame captures as well as enable graphics pipeline overrides and experiments in real-time.

The sample game application used in this article (CitiRacer.exe) comes as part of the installation and is used as the example throughout this article.

Once you download and install GPA (see link above), click Analyze Application as shown below, and the Analyze Application window opens.


1. Graphics monitor


2. Analyze Application window to launch the game

Click the Run button, and you can start analyzing the application. The application automatically loads and displays the FPS (frames per second) as shown below. Press CTRL + F1 three times to see the screenshot shown below with different settings and metrics displayed.


3. Game running with all the metrics shown

Now we will capture one particular frame and analyze it using the Graphics Frame Analyzer for DirectX tool that is installed with the Intel GPA toolkit. You can take the frame capture by pressing CTRL + SHIFT + C or by using the System Analyzer tool described below.

System Analyzer

System Analyzer provides access to system-wide metrics for your game, including CPU, GPU, API, and the graphics driver. The metrics available vary depending on your platform, but you will find a large collection of useful metrics to help quantify key aspects of your application's use of system resources. In the System Analyzer you can also perform various "what-if" experiments to diagnose at a high level where your game's performance bottlenecks are concentrated.

If the System Analyzer finds that your game is CPU-bound, perform additional fine-tuning of your application using Platform Analyzer.

If the System Analyzer finds that your game is GPU-bound, use the Graphics Frame Analyzer for DirectX*/OpenGL* to drill down within a single graphics frame and pinpoint specific rendering problems, such as texture bandwidth, pixel shader performance, level-of-detail issues, or other bottlenecks within the rendering pipeline.

Open System Analyzer, installed as part of Intel INDE.


4. Connecting using the System Analyzer

If the application you are analyzing is running on the same machine where Intel INDE is installed, click Connect. If it is running on a remote machine, enter the IP address of that machine and then click Connect.

You will see the application in the System Analyzer as shown below.


5. Click the application to open the System Analyzer

Once the next screen opens, you can drag and drop the metrics you are interested in. In this example, we are monitoring the Aggregated CPU Load, GPU duration, GPU Busy, and GPU frequency metrics. Hold the CTRL key to drag multiple metrics simultaneously. Click the Camera button to capture a frame with a long GPU duration and a low FPS; we will analyze this frame using the Graphics Frame Analyzer for DirectX.


6. Capturing a frame using the System Analyzer

Analyzing a frame using the Intel® INDE Graphics Frame Analyzer for DirectX*

Once you open the Frame Analyzer, the captured frames are loaded automatically. Select the frame you want to analyze and click Open.


7. Opening the captured frame with the Graphics Frame Analyzer for DirectX*

Now let’s start analyzing this particular frame that we captured.


8. Captured frame when opened with the Graphics Frame Analyzer for DirectX*

On the left-hand side, RT0, RT1, RT2, and RT3 are the render targets generated during this frame. Different games use different numbers of render targets to build the whole frame; this frame uses four.

The graph below shows the draw calls within that frame. GPA calls them “ergs,” after the physical unit of work, because each one represents a unit of rendering work submitted to the GPU.


9. Graphical view of the ergs with GPU duration on X and Y Axes

You can filter the metrics that are shown. The X and Y axes both show GPU duration by default; you can change the metric for each axis using the dropdowns. This gives a quick view of how long each erg takes on the GPU and shows at a glance which ergs might need optimization.

Right-click on RT1 and choose “Select ergs in this render target,” which highlights all the ergs used to generate this render target. You can analyze metrics on how long it took to generate the render target. An example of a render target is shown below.


10. Selecting all the ergs in the render target

Let’s dive further into this render target. Click the erg with the longest GPU duration to see the details of just that erg. The Geometry tab shows what geometry is rendered, as shown below; the Shaders tab shows the vertex and fragment shaders for this erg.


11. Geometry rendered for the selected erg

Let’s explore the tabs at the bottom of the screen. “Selected” refers to the erg you have selected. “Highlighted” outlines the selected erg so it corresponds to the Geometry view on the right-hand side.

“Other” controls the remaining ergs of the render target; selecting “Hidden” removes them from view entirely. “Draw only to last selected” renders only the ergs of this render target up to the one you have selected; unselect it to show all the ergs for this render target.


12. How to highlight the selected erg


13. Highlighted erg shown in blue color

If you click the Texture tab, you can see which textures are bound to this erg. Not all of them are necessarily used by this erg; some may have been bound by a previous erg. In general, though, the Texture tab shows which textures are used and how big they are. It’s a good way to find uncompressed textures that may increase GPU duration, so you can go back and compress them.


14. Textures bound with this erg

Experiments

Now let’s talk about the Experiments tab. It allows you to override what the GPU does and look at your net results. In this example, the entire frame runs at 27 ms or 37 FPS as shown in the top right corner box (indicated by the arrow). You can toggle between FPS and GPU duration by clicking that box.
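The toggle simply converts between the two views of the same measurement: FPS is the reciprocal of the frame time in milliseconds. A minimal sketch of the arithmetic (Python, for illustration only; GPA does this conversion for you):

```python
def ms_to_fps(frame_time_ms: float) -> float:
    """Convert a frame's GPU duration in milliseconds to frames per second."""
    return 1000.0 / frame_time_ms

def fps_to_ms(fps: float) -> float:
    """Convert a target frame rate to the per-frame time budget in milliseconds."""
    return 1000.0 / fps

print(round(ms_to_fps(27)))     # a 27 ms frame corresponds to ~37 FPS
print(round(fps_to_ms(60), 1))  # hitting 60 FPS leaves a budget of ~16.7 ms per frame
```

This is why the box above can show either 27 ms or 37 FPS for the same frame.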


15. Click the top-right toggle button to switch between GPU duration and FPS readings

Now, if you click the Frame Overview tab, you’ll see stats for the entire frame. The Details tab provides the stats for only the erg you selected. In this example in the Frame Overview tab you can see the different metrics you can experiment with as shown below.


16. Frame Overview tab that gives the stats for the entire frame

Let’s click the Experiments tab and try completely disabling this erg so that this erg does not even render.


17. Experiment tab: Before disabling the erg


18. Experiment tab: After disabling the erg

If you go to the Frame Overview, you can see the difference in the GPU duration, execution units, etc. We can look at the general overall performance to see how much difference there is between the old and new values. In the example shown below, the delta value for GPU duration is -8 ms, the new value of the GPU duration is around 18 ms, and the percentage decrease in the GPU duration is around 35%.
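The delta and percentage columns that the Frame Overview reports are related by simple arithmetic, which you can use to sanity-check an experiment. A quick sketch with illustrative numbers (not the exact figures from the capture above):

```python
def gpu_duration_change(old_ms: float, new_ms: float) -> tuple[float, float]:
    """Return (delta_ms, percent_change) between two GPU duration readings.

    A negative delta and percentage mean the frame got faster.
    """
    delta = new_ms - old_ms
    return delta, 100.0 * delta / old_ms

# Illustrative example: a frame that drops from 27 ms to 18 ms after an experiment.
delta, pct = gpu_duration_change(27.0, 18.0)
print(f"delta: {delta:.0f} ms, change: {pct:.0f}%")  # delta: -9 ms, change: -33%
```

Reading the percentage alongside the absolute delta helps you judge whether an expensive-looking erg is actually worth optimizing.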


19. Frame Overview and difference in GPU duration after disabling the selected erg

Anything significantly bad is marked in red. Most of these ergs are draw calls. If nothing is highlighted when you select an erg, it is likely a clear call. Clear calls are sometimes unnecessary: if everything in the render target renders correctly without one, try disabling it and see whether GPU duration improves.

The API Log tab shows the draw calls being used for the ergs you have selected or if the erg is a clear call.

You can also filter by primitive count to see how many primitives and triangles are being rendered. Set the X-axis to GPU duration and the Y-axis to primitive count as shown below, then use the Geometry tab to inspect the ergs with the most primitives.


20. Selecting primitive count on Y-Axis


21. Primitive count for the selected render target

You can also sort by render targets to see how long each one takes. It’s worth experimenting to see what the hardware is doing: disable and change things, then check the Frame Overview to see whether the performance delta goes up or down.

About Author

Praveen Kundurthy works in the Software & Services Group at Intel Corporation. He has a Master’s degree in Computer Engineering. His main interests are mobile technologies, Windows, and game development.

Notices

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2015 Intel Corporation.
