
Intel Delivers AVS 2.0 Bitstreams for Efficient Decoder Validation



New Release!


Intel announces new Audio Video Standard (AVS) 2.0 support in its enterprise-grade video conformance product, Intel® Stress Bitstreams and Encoder (Intel® SBE).

The first versions of AVS (AVS and AVS+) have seen widespread adoption in the People's Republic of China over the past decade; work on a successor, AVS 2.0, began in 2013 to deliver significant improvements in video quality. By expanding Intel® SBE support to AVS 2.0, Intel helps video solution providers be first to market with high-quality, well-tested products. Intel® SBE also supports HEVC and VP9.

In this release (2016 R3), Intel® SBE provides comprehensive AVS 2.0 decoder conformance validation bitstreams, covering the Main (4:2:0 8-bit) and Main 10 (4:2:0 10-bit) profiles and compliant with the latest software reference model (RD12). Intel provides very high coverage of AVS 2.0 at both the syntax and value levels, using technology first introduced in the Intel® SBE HEVC and VP9 bitstream products. Intel® SBE will be updated as AVS 2.0 completes its standardization process.

Learn more at Intel® SBE.

Get a Free Trial Now: AVS 2.0 | HEVC | VP9

 

Intel SBE is part of the Intel® Media Server Studio family of products.

Simple RGB Streaming with the Intel® RealSense™ SDK


Download Code Sample [Zip: 19 KB]

Introduction

Are you thinking about creating a simple application with RGB streaming that uses an Intel® RealSense™ camera and the Intel® RealSense™ SDK, or about adding RGB streaming to one of your existing applications? Do you want an easy-to-follow, easy-to-understand example that is direct and to the point, without a lot of extra code clouding what you are trying to learn? Then you're in luck, because that's exactly what I've tried to do here: create a simple yet effective sample application and document that describe how to use the Intel RealSense camera and SDK.

This sample was written in C# using the Intel RealSense SDK R4 and Microsoft Visual Studio*, and it was tested with SDK R5. It requires an Intel RealSense camera F200.

Project structure

In this sample application, I have tried to separate out the Intel RealSense SDK functionality from the Windows* Form GUI layer code to make it easier for a developer to focus on the SDK’s streaming functionality.  I’ve done this by creating a C# wrapper class (RSStreaming) around some of the Intel RealSense SDK classes. 

The Windows Form app contains only a few buttons and a PictureBox control to display the RGB stream.

Note that I'm not trying to make a bulletproof application. I have added some degree of exception handling; however, it's up to you to ensure that proper engineering practices are in place for a stable, user-friendly application.

This project structure also relies on events to pass data around, which eliminates the need for tight coupling. A helper event class was created: RSNewImageArg, which inherits from EventArgs. It's used to post the current frame from the camera back to the client form application.

Getting Started

To get started, you'll need an Intel RealSense camera F200. You also need the Intel RealSense SDK version R4 or higher and the appropriate Depth Camera Manager (DCM) installed on your computer. The SDK and the F200 DCM can be downloaded from https://software.intel.com/en-us/intel-realsense-sdk.

Requirements

Hardware requirements:

  • 4th generation Intel® Core™ processors based on the Intel® microarchitecture code-name Haswell
  • 8 GB free hard disk space
  • Intel RealSense camera F200 (requires a USB 3 port)

Software requirements:

  • Microsoft Windows 8.1/Win10 OS 64-bit
  • Microsoft Visual Studio 2010–2015 with the latest service pack
  • Microsoft .NET* 4.0 Framework for C# development
  • Unity* 5.x or higher for Unity game development

RSNewImageArg.CS

RSNewImageArg derives from the C# EventArgs class. As you can see, it's a small wrapper with one private data member. The private Bitmap _bitMap holds the current bitmap that was extracted from the camera stream.

This class is used as an event argument when the RSStreaming class dispatches an event back to the Form class indicating that a new bitmap image is ready to display.
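
For reference, here is a minimal sketch of what such a class might look like; the constructor and the NewImage property name are illustrative additions and may differ from the downloadable sample:

using System;
using System.Drawing;

public class RSNewImageArg : EventArgs
{
    // Holds the bitmap extracted from the current camera frame
    private Bitmap _bitMap;

    public RSNewImageArg(Bitmap image)
    {
        _bitMap = image;
    }

    // Read-only access for the client form
    public Bitmap NewImage
    {
        get { return _bitMap; }
    }
}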

RSStreaming.CS

RSStreaming is a wrapper class, an engine so to speak around streaming RGB data from the Intel RealSense camera. I wrote the class with the following intentions:

  • Cleanly and clearly isolate as much of the Intel RealSense SDK functionality as possible away from the client application.
  • Try to provide comments in the code to help the reader understand what the code is doing.

The following describes each function that comprises the RSStreaming class.

public event EventHandler<RSNewImageArg> OnStreamingImage;

The OnStreamingImage Event is used to trigger a message back to the client application letting it know that a new RGB bitmap image is ready to display. The client creates an event handler to handle the RSNewImageArg object.

public bool Initialized

Getter property used as a flag to indicate that the RSStreaming class has been initialized.

public bool IsStreaming

Getter property used as a flag to indicate that the RSStreaming class is currently streaming RGB data.

public void StartStreaming()

Checks whether the class has been initialized and, if not, calls InitCamera() to ensure the class is up and running properly. Once this has been done, the function calls _senseManager.StreamFrames( … ).

If you have done much reading about developing Intel RealSense applications, you have probably noticed that pulling data from the camera is often done in a while loop. For example, something like the following:

while (!Stop)
{
   /* Wait until a frame is ready: Synchronized or Asynchronous */
   if (sm.AcquireFrame(Synced).IsError())
      break;

  /* Display images */
   PXCMCapture.Sample sample = sm.QuerySample();

   /* Render streams */
   EventHandler<RenderFrameEventArgs> render = RenderFrame;
   PXCMImage image = null;
   if (MainPanel != PXCMCapture.StreamType.STREAM_TYPE_ANY && render != null)
   {
      image = sample[MainPanel];
      render(this, new RenderFrameEventArgs(0, image));
   }

   if (PIPPanel != PXCMCapture.StreamType.STREAM_TYPE_ANY && render != null)
      render(this, new RenderFrameEventArgs(1, sample[PIPPanel]));

   /* Optional: Set Mirror State */
   mirror = Mirror ? PXCMCapture.Device.MirrorMode.MIRROR_MODE_HORIZONTAL :
                     PXCMCapture.Device.MirrorMode.MIRROR_MODE_DISABLED;
   if (mirror != sm.captureManager.device.QueryMirrorMode())
      sm.captureManager.device.SetMirrorMode(mirror);

   sm.ReleaseFrame();

   /* Optional: Show performance tick */
   if (image!=null)
      timer.Tick(PXCMImage.PixelFormatToString(image.info.format) + " " + image.info.width + "x" + image.info.height);
}

This is a lot of code to wade through. Granted, this sample may be doing more than my application does, but the point is that my application does not run a while loop like this. My application uses the StreamFrames(…) function, which handles the while loop internally and, for every frame, triggers an event that RSStreamingRGB subscribes to. Essentially it works like this:

  1. Kick off the stream PXCMSenseManager.StreamFrames(…).
  2. Trap the event in an event handler.
  3. When you're done streaming, call PXCMSenseManager.Close().

I like this approach because I don't want to manually manage a while loop and keep track of when and how to stop it; I would rather rely on the SDK to take care of that for me. You will see how this is configured when I discuss the InitCamera() function, so I won't cover it here. Just note that we can stream data and let the SDK handle looping over the raw data coming from the camera.

Once StreamFrames() has been called, I set the Boolean flag _isStreaming to true, letting the class and the client app know that streaming has started.
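
Put together, a condensed sketch of StartStreaming() might look like the following, using the field names assumed throughout this article (_senseManager, _initialized, _isStreaming); the downloadable sample may differ in its details:

public void StartStreaming()
{
    if (!_initialized)
        InitCamera();

    // StreamFrames(false) runs the acquire/release loop on an SDK worker
    // thread and fires the handler's onNewSample callback for every frame.
    _senseManager.StreamFrames(false);
    _isStreaming = true;
}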

public void StopStreaming ( )

StopStreaming() does the opposite of StartStreaming(). It instructs the SDK to stop streaming data from the camera and calls Dispose() to destroy the objects holding the data.
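
Under the same assumptions, a sketch of StopStreaming() can be as small as:

public void StopStreaming()
{
    // Close() stops the streaming loop started by StreamFrames()
    _senseManager.Close();
    Dispose();
    _isStreaming = false;
}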

private void InitCamera ( )

InitCamera() creates the PXCMSenseManager instance and enables the type of stream we want. As you can see, I'm specifying a 320x240 color stream at 30 fps.

Recall what I said about being able to use an event from the PXCMSenseManager to let the class know when a new frame of RGB data is available. This is done using the PXCMSenseManager.Handler event class. It's a simple process: create an instance of the Handler class, assign it an event handler via its onNewSample member, then initialize the PXCMSenseManager object, _senseManager, with the handler class.

Once this is completed, I set the _initialized flag to true. As previously mentioned, this flag lets either this class internally or the client app know that RSStreaming has been initialized.
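
A minimal sketch of InitCamera(), again using the assumed field names, might be:

private void InitCamera()
{
    _senseManager = PXCMSenseManager.CreateInstance();

    // Request a 320x240 color stream at 30 fps
    _senseManager.EnableStream(PXCMCapture.StreamType.STREAM_TYPE_COLOR, 320, 240, 30);

    // Route every new frame to the OnNewSample event handler
    PXCMSenseManager.Handler handler = new PXCMSenseManager.Handler();
    handler.onNewSample = OnNewSample;

    _senseManager.Init(handler);
    _initialized = true;
}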

private pxcmStatus OnNewSample( )

This is the event handler for the PXCMSenseManager.Handler object. Recall that in the InitCamera() function I set the handler object's event handler to this function.

The event handler must adhere to a given function signature. The function must return a pxcmStatus value and take two parameters:

  • Mid. The stream identifier. If multiple streams are requested through the EnableVideoStreams function, this is PXCMCapture.CUID+0, PXCMCapture.CUID+1, and so on.
  • Sample. The available image sample.

We need to convert the PXCMCapture.Sample object into a usable bitmap that the client application can use to display.

First I check to ensure that the sample.color object is not null and that the class's internal _colorImageData object is not null as well. We need to ensure that our internal _colorImageData is not holding any data, and to release it if it is.

Next we need to use the sample.color object to populate _colorImageData. This is basically a metadata object describing the PXCMCapture.Sample color image. Once we have it, we can tell it to create a bitmap for us at a specified size.

Once we have the bitmap and we know it’s not null, I trigger the OnStreamingImage event specifying the source of the event and a new RSNewImageArg object.

Finally, we MUST release the current frame from the PXCMSenseManager object and, as required by the function signature, return a pxcmStatus. I could have added exception handling here, but I chose not to in order to keep things as simple as possible. If I had, I could have trapped the error and returned a different pxcmStatus; instead, I simply return success.
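
To tie these steps together, here is a sketch of the handler; the pixel format, the RSNewImageArg property, and the exact release bookkeeping are assumptions and may differ from the downloadable sample:

private pxcmStatus OnNewSample(int mid, PXCMCapture.Sample sample)
{
    if (sample.color != null)
    {
        PXCMImage.ImageData colorImageData;

        // Map the color plane so it can be converted to a System.Drawing.Bitmap
        if (sample.color.AcquireAccess(PXCMImage.Access.ACCESS_READ,
                PXCMImage.PixelFormat.PIXEL_FORMAT_RGB32,
                out colorImageData).IsSuccessful())
        {
            Bitmap bitmap = colorImageData.ToBitmap(0, sample.color.info.width,
                                                    sample.color.info.height);
            sample.color.ReleaseAccess(colorImageData);

            if (bitmap != null && OnStreamingImage != null)
                OnStreamingImage(this, new RSNewImageArg(bitmap));
        }
    }

    // Required so the SDK can deliver the next frame
    _senseManager.ReleaseFrame();
    return pxcmStatus.PXCM_STATUS_NO_ERROR;
}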

private void Dispose ( )

Dispose() cleans up. I check to ensure that the manager is not null and that it was initialized, and if so, I call its Dispose() method. I check to ensure that RSStreaming's bitmap is not null and dispose of it. Then I set everything to null.

MainForm.CS

The main form is the GUI that displays the RGB stream and allows you to control the RSStreaming object. It has two global variables: an instance of RSStreamingRGB and a bitmap. The bitmap will contain the current image from the current frame that’s sent by the RSStreamingRGB class.

public MainForm( )

The form's constructor. It creates a new RSStreamingRGB object and gives the OnStreamingImage event an event handler.
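
In code, that wiring amounts to roughly the following (field names assumed):

public MainForm()
{
    InitializeComponent();

    _rsStreaming = new RSStreamingRGB();

    // Receive a bitmap every time the camera produces a new frame
    _rsStreaming.OnStreamingImage += UpdateColorImageBox;
}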

private void btnStream_Click( )

The event handler when the Start Streaming button is clicked. Instructs the _rsStreaming object to start streaming by calling its StartStreaming() function.

private void btnStopStream_Click( )

The event handler when the Stop Streaming button is clicked. Instructs the _rsStreaming object to stop streaming by calling its StopStreaming() function.

private void UpdateColorImageBox( object source, RSNewImageArg e )

UpdateColorImageBox is the event handler for the _rsStreaming.OnStreamingImage event. It ensures that the newImage argument is not null and, if not, assigns _currentBitMap to a new bitmap using newImage as the source bitmap.

If I don't create a new bitmap, the form's _currentBitMap will still point back to the original bitmap that the SDK created. This can be problematic when calling the RSStreaming.Dispose method: the client has a picture box, the picture box has an image, and that image is coming from the SDK. If the form and picture box are still active when I call RSStreaming.Dispose, which releases SDK resources, the application would crash because the picture box's source image is being disposed of.

After _currentBitMap has been assigned a new image, I call pictureBox.Invalidate() which forces the picture box’s Paint event to be triggered.
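
A sketch of the handler, assuming RSNewImageArg exposes the bitmap through a property such as NewImage:

private void UpdateColorImageBox(object source, RSNewImageArg e)
{
    if (e.NewImage == null)
        return;

    // Copy the bitmap so the form no longer references SDK-owned memory
    _currentBitMap = new Bitmap(e.NewImage);
    pictureBox.Invalidate();
}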

private void pictureBox_Paint( object sender, PaintEventArgs e )

This is the picture box’s paint event handler, which is triggered by the call to pictureBox.Invalidate(). It forces the picture box to redraw itself with the current source image.

First I check to ensure that _currentBitMap is not null, and if it isn't, I draw the most recent bitmap, which is stored in _currentBitMap, into the picture box.
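
One plausible implementation of the paint handler, assuming the control is named pictureBox:

private void pictureBox_Paint(object sender, PaintEventArgs e)
{
    if (_currentBitMap != null)
    {
        // Draw the most recent frame into the picture box's client area
        e.Graphics.DrawImage(_currentBitMap, pictureBox.ClientRectangle);
    }
}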

private void btnExit_Click( )

Easy enough. Simply calls Close(). No need to handle any cleanup here, because I ensure that happens in the MainForm_FormClosing method.

private void MainForm_FormClosing( )

This is the form's closing event handler. When the Close() method is called in any given function, the FormClosing event fires. I didn't want to duplicate code, so I simply put all the cleanup code here. I check to ensure that _rsStreaming is not null and that it's streaming. If these conditions are met, I call _rsStreaming.StopStreaming(). There is no need to call a dispose method on _rsStreaming because that's taken care of inside StopStreaming().

Conclusion

I hope this article and sample code have helped you gain a better understanding of how to use the Intel RealSense SDK to create a simple RGB streaming application. My intent was to show how this can be done in an easy-to-understand application that covers everything you need to implement your own RGB streaming application.

If you think I left out an explanation or wasn't clear in a particular area, or if you think I could have accomplished something in a better way, please send me an email at rick.blacker@intel.com or leave a comment below.

About the Author

Rick Blacker is a seasoned software engineer who has spent many years authoring solutions for database-driven applications. Rick recently moved to the Intel RealSense technology team and helps users understand the technology.

Intel® RealSense™ Technology: The Backbone of Posture Monitor’s Innovation


The ingenious Posture Monitor app, from husband-and-wife team Jaka and Jasmine Jaksic, is a great example of using technology to change lives for the better. An award winner in the Open Innovation category of the 2015 Intel® RealSense™ App Challenge, it addresses a problem plaguing an estimated 50 percent of US workers: back pain. Using an Intel® RealSense™ camera (F200) and several third-party software tools, the team turned the camera's data stream into a handsome package of graphs and statistics, all while balancing power consumption and frame rate. In this article, you'll learn how the Jaksics relied on a strong software engineering process to keep advancing toward their goal, and we'll show you their frame-rate processing code sample, which illustrates how to minimize power consumption while still providing a smooth user experience.

Jaka Jaksic was co-founder and lead engineer of the San Francisco-based startup Plumfare, a social gifting mobile app. Plumfare was acquired by Groupon in 2013, leaving Jaka on the lookout for another great opportunity. Jasmine is a long-time product and project manager and currently works at Google. Combining their expertise, they started the company JTechLab. The couple decided to pursue a product related to posture problems, something they have both battled. The app has moved from prototype to production, during which time Jaka encountered and overcame a few interesting hurdles. The lessons learned include data conversion, power conservation, and integrating Intel RealSense technology with commercial software tools.

Posture Monitor continually monitors your posture and alerts you to potential problems using advanced graphics and statistics, packaged in an attractive interface.

Contest Deadline Spurs Rapid Advances

Like many who suffer back pain, Jaka had tried several products for posture correction—without success. “It’s not that difficult to sit straight,” he explained. “But it's difficult to do it all the time, because it requires constant attention. While you are focused on work, your posture and taking breaks are usually the last things on your mind.”

One day, Jaka got the revolutionary idea of using a 3D camera as a posture-tracking device, and, after just a little research, he landed on a solution with Intel® RealSense™ technology. "I also noticed that there was a contest going on," he said. "I got busy right away, and just made the deadline." Successful applicants received an Intel RealSense camera (F200) and the complete Intel® RealSense™ SDK as encouragement to create the next great app.

After the Intel RealSense camera arrived, Jaka built his first working prototype in about two days, taking half that time to learn about Intel RealSense technology and set up the camera’s data pipeline, and the other half to build the first posture detection algorithm. Once the proof of concept was complete, he proceeded with further development and built the more polished app.

At this point they began some informal usability testing. “Usually, in software projects, it’s good to get as much feedback as possible early on, from all sorts of people,” he said. In this case, the amount of time was very limited by the project deadline and by the fact that both he and Jasmine had separate, full-time jobs. “As a general rule, the user interface (UI) is crucial,” Jaka explained. “Right after the technological proof of concept that verifies that the thing can be built, I would recommend focusing on the user experience.” That may mean a prototype tied directly to the UI, and some targeted questions:

  • Do you understand what this UI does?
  • Can you tell how to use the product?
  • Do you find it attractive?

Only after positive responses to these questions, Jaka says, should the development process turn to creating functionality. “Never wait to create and test the UI until after you have a complete functionality, with considerable time and effort already sunk into development,” Jaka said. He pointed out that once your project is too far along, it’s very costly to go back and fix basic mistakes. “User feedback is the kind of thing that can always surprise you. We’re fairly experienced with application design, but still we’re just now finding things that, when we sit the users in front of the app, they say, ‘Whoa, what’s this?’”

It can be expensive—and time consuming—to get your app prototype into a UI lab, but the benefit for doing so is big savings down the road. In addition to asking friends and colleagues, one cheap and easy method of user testing that Jaka employed in the past is to go to a local coffee shop and give people $5 gift cards in exchange for their feedback. “People are usually happy to help, and you will learn a lot.”

Advice from an Expert App Designer

Jaka said that the demos provided by Intel are extremely useful—but he had a few words of caution. “Intel’s examples are technology demonstrations rather than starting points for building your own application, so you have to strip away all the unnecessary demo functionality,” he said.

For Posture Monitor, the camera’s data pipeline was the essence, and Jaka drilled down to exclusively focus there. Since the SDK didn’t stay centered on the user’s face at all times, Jaka used the raw data stream and processed it himself. He used the Intel RealSense camera to locate the user’s face, which he then converted to a rectangle. He could next approximate where the spine was by calculating the center of the subject’s torso. By noting where the pixels were and where the head was, he could continually calculate the true center of gravity. He noted that much of his approach will change when he adopts the Intel® RealSense™ SDK R5 version, which will support body-tracking using a new camera, the user-facing SR300. It will be available in Q2 2016.

Jaka also overcame limitations concerning infrared 3D camera physics. While skin reflects infrared light easily, the reflection is occasionally muddied by certain hairstyles and clothing choices. (Busy prints, dark colors, and shiny fabrics may pose a problem, as could long, dark hair that obscures the torso.) From the depth-camera standpoint, certain combinations report that the user has no torso. "It's like they're invisible and just a floating head," he said. There isn't much you can do in such cases other than detect when it happens and suggest the user wear something else.

In order to work for everybody, Posture Monitor requires each user to complete a calibration sequence: they sit straight, demonstrating perfect posture, and then click “calibrate.” The application compares their posture at a given time to their ideal posture, and that’s how it assesses what’s good or bad.

The calibration sequence of Posture Monitor ensures that the system can identify key aspects of your body and thus track your posture.

The team has yet to use specialized medical knowledge or chiropractic experts, but Jaka says that day is coming. “We wanted the application to be able to detect when the user is slouching, and the current version does that really well. After we launch, we’re going to reach out to medical professionals and add some more specialized functionality.”

Minimizing Power Consumption

At full frame-rate, Intel RealSense applications are too CPU-intensive to be used in a background application. The obvious solutions are to only process every N-th frame or to have a fixed delay between processed frames. This is typically a good tradeoff when the user interface is not shown and responsiveness does not matter. But what if we want the best of both worlds: minimize power consumption and still provide a smooth user experience when required?

Jaka developed a frame-processing pipeline with a dynamic frame rate, where the baseline frame rate is low (for example, one frame every two seconds) and is elevated only when a visible control requires it. Using this technique, Posture Monitor uses less than two percent of the CPU when minimized, or when no real-time controls are shown, without any degradation of the overall user experience. It's a relatively simple and completely generic code pattern that's easily applicable to almost any app.

Here is the sample code:

using System;
using System.Drawing;
using System.Threading;
using System.Windows;
using System.Windows.Controls;

namespace DynamicFramerateDemo
{
    class CameraPipeline
    {
        public static readonly CameraPipeline Instance = new CameraPipeline();

        // Baseline/longest frame delay (this is used
        // unless a shorter delay is explicitly requested)
        private const int BASELINE_FRAME_DELAY_MILLIS = 2000;
        // Timer step / shortest frame delay
        private const int TIMER_STEP_MILLIS = 100;

        private PXCMSenseManager senseManager = null;
        private Thread processingThread;
        private int nextFrameDelayMillis = TIMER_STEP_MILLIS;

        public int CapNextFrameDelay(int frameDelayMillis) {
            // Make sure that processing of the next frame happens
            // at least within the specified delay
            nextFrameDelayMillis = Math.Min(nextFrameDelayMillis, frameDelayMillis);
            return nextFrameDelayMillis;
        }

        public void Start() {
            // Initialize SenseManager with streams and modules
            this.senseManager = PXCMSenseManager.CreateInstance();
            senseManager.EnableStream(PXCMCapture.StreamType.STREAM_TYPE_COLOR, 640, 480, 30);
            senseManager.EnableStream(PXCMCapture.StreamType.STREAM_TYPE_DEPTH, 640, 480, 30);
            senseManager.EnableFace();
            senseManager.Init();

            // Frame processing thread with dynamic frame rate
            this.processingThread = new Thread(new ThreadStart(delegate {
                while (processingThread != null) {
                    // Sleep in small increments until next frame is due
                    Thread.Sleep(TIMER_STEP_MILLIS);
                    nextFrameDelayMillis -= TIMER_STEP_MILLIS;
                    if (nextFrameDelayMillis > 0)
                        continue;

                    // Reset next frame delay to baseline long delay
                    nextFrameDelayMillis = BASELINE_FRAME_DELAY_MILLIS;
                    try {
                        if (senseManager.AcquireFrame(true, TIMER_STEP_MILLIS).IsSuccessful()) {
                            ProcessFrame(senseManager.QuerySample());
                        }
                    } finally {
                        senseManager.ReleaseFrame();
                    }
                }
            }));
            processingThread.Start();
        }

        private void ProcessFrame(PXCMCapture.Sample sample) {
            // [Do your frame processing and fire camera frame event]
        }
    }

    // Sample control that sets its own required frame rate
    class CameraViewControl : UserControl
    {
        // This event handler should get called by CameraPipeline.ProcessFrame
        protected void HandleCameraFrameEvent(Bitmap depthBitmap) {
            if (this.IsVisible && Application.Current.MainWindow.WindowState != WindowState.Minimized) {
                // While the control is visible, cap the frame delay to
                // 100ms to provide a smooth experience. When it is not
                // visible, the frame rate automatically drops to baseline.
                CameraPipeline.Instance.CapNextFrameDelay(100);

                // [Update your control]
            }
        }
    }
}

The Start() method initializes the Intel RealSense technology and starts a processing loop with a fixed delay (TIMER_STEP_MILLIS). This delay should be the lowest frame delay that your application will ever use (for example, 100 ms). In each loop iteration, this interval is subtracted from a countdown counter (nextFrameDelayMillis), and a frame is only acquired and processed when this counter reaches zero (0).

Initially and after every processed frame, the countdown timer is set to the baseline (longest) delay (BASELINE_FRAME_DELAY_MILLIS), (for example, 2000 ms). The next frame is processed only after this time, unless during this time any agent requests a lower value by calling CapNextFrameDelay. A lower delay / higher frame-rate is typically requested by visible user-interface controls (such as the CameraViewControl example), or by internal states that demand a higher frame rate. Each such agent can set the maximum acceptable frame delay and the lowest value will win; this way the frame rate always meets the most demanding agent. The beauty of this solution is that it is extremely efficient, very simple to implement, and only requires one additional line of code for each agent to set its required frame rate.
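
For example, a hypothetical internal state that temporarily needs a higher frame rate could register itself with a single call (isCalibrating is an illustrative flag, not part of the sample):

// Hypothetical agent: while calibration runs, process a frame at least every 200 ms
if (isCalibrating)
    CameraPipeline.Instance.CapNextFrameDelay(200);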

The Right Tools

The list of tools and technology that Jaka integrated with the Intel RealSense SDK to create Posture Monitor is impressive.

Jaka said he used Microsoft Visual Studio and C# because they are industry standards for building Microsoft Windows* applications. He wanted to use tools that had the largest community behind them, with lots of third-party libraries available. He picked additional libraries for each particular need, trying many different products and picking the best ones.

Jaka didn’t write or use any plug-ins to get the Intel RealSense technology working with the application. He said that the SDK itself provides solid data structures that are standard and easy to use. “Sometimes you might have to use raw data,” he said. “We used the raw depth format with 16 bits per pixel, which is the most precise way of reading raw data.” He then wrote an algorithm to convert the data into a bitmap that had a higher contrast where it mattered. His converted bitmap focuses on the range where the person’s body is, and enhances contrast and accuracy around that range.
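
The article doesn't include Jaka's conversion code, but the general idea (window the 16-bit depth values around the range where the body sits, then spread that window across the 8-bit grayscale range) can be sketched as follows; the range constants and method name are illustrative only:

// Illustrative sketch: map 16-bit depth values (in millimeters) to an 8-bit
// grayscale bitmap, concentrating contrast in an assumed body range of 500-1500 mm.
// SetPixel is used for clarity; a production version would use LockBits.
private static Bitmap DepthToBitmap(ushort[] depthPixels, int width, int height)
{
    const int nearMm = 500, farMm = 1500;
    Bitmap bitmap = new Bitmap(width, height);

    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int depth = depthPixels[y * width + x];

            // Clamp to the window of interest, then scale to 0-255
            int clamped = Math.Min(Math.Max(depth, nearMm), farMm);
            int gray = 255 - (clamped - nearMm) * 255 / (farMm - nearMm);

            bitmap.SetPixel(x, y, Color.FromArgb(gray, gray, gray));
        }
    }
    return bitmap;
}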

Posture Monitor integrates leading statistics and data-charting programs with the Intel RealSense SDK to produce an intriguing user interface full of helpful information.

To process the camera data, Jaka used Accord, a very extensive library for math and statistics. He liked that Accord also has some machine-learning algorithms and some image processing. The data had to be converted into a compatible format, but, once achieved, it was a great step forward. “Once you get the data into the right form, Accord can really help you,” Jaka said. “You don’t have to reinvent the wheel for statistical processing, object recognition, detecting things like shapes and curves—that type of stuff. It really makes things easier.”

Another tool, OxyPlot, is an open-source charting library that Jaka found to be very extensive and very flexible.

Avoid Technical Debt—No Sloppy Code!

Jaka has a general philosophy for development that has consistently brought him success. “Paying attention to code quality from the start pays dividends later on,” he said. So he starts by learning everything he needs before he starts coding. He’ll make a ‘throwaway’ prototype to serve as a proof-of-concept, which allows him to play with all the technologies and figure them out. At that point, he’s not focused on the quality of the code, because his goal is simply to learn. Then, he’ll discard that prototype and start over with a strong architecture, based on what he’s figured out.

At this point, Jaka is then ready to build high-quality components in a systematic fashion. The complexity of the code base always grows with time, so early decisions are critical. “You can’t afford to have any sloppiness in your code,” Jaka warned. “You want to clean up every bit as soon as you can, just to make sure that in time, it doesn’t become an unmanageable nightmare as technical debt piles on.”

Posture Monitor only works with a desktop or laptop PC, not mobile devices, because it needs a stationary camera. And right now it only works with Windows, because that’s what the Intel RealSense SDK currently supports. When the Intel RealSense SDK integrates with Apple MacBooks*, Jaka is ready to pursue that. And his ambitions don’t stop there—he’s interested in learning more about Intel RealSense technology working in some fashion with Linux*, too.

Jaka has also been thinking about building a specialized hardware device for posture monitoring, perhaps a product that would include an Intel® Atom™ processor. “I’m sure this work is going to take us in a very interesting direction, once we start interacting with the users and the medical community. We are looking forward to where the future takes us.”

A Revolution is Coming

From his perspective as a successful entrepreneur who has already struck gold once, Jaka believes that the Intel RealSense camera and SDK are reaching developers at a crucial time. He sees the global software market for desktop, mobile, and web apps as oversupplied. “Almost anything you can think of has already been built,” he believes. Now, with Intel RealSense technology, Jaka says it is much easier to innovate.

“This market is still fresh and unsaturated, so it’s a lot easier to come up with new and exciting ideas that aren’t already implemented,” he affirmed. “I think this is a really good time to start working with Intel RealSense technology. It’s a great time to dig in and get a head start, while the market is still growing.”

As one example of this, Jaka can envision combining Posture Monitor with IoT devices. “There are so many potential ideas right now that are just waiting to be built. Most of them, I’m sure, no one has even thought of yet, because the technology is so new. I think we’re in front of some really exciting times, both for developers and for consumers.”

Resources

Learn more about Posture Monitor at https://posturemonitor.org/

Download the Intel® RealSense™ SDK at https://software.intel.com/en-us/intel-realsense-sdk

Learn more about the 2015 Intel® RealSense™ App Challenge at https://software.intel.com/sites/campaigns/realsense-winners/details.html?id=22

Robotic Hand Control Using Intel® RealSense™ Cameras


Abstract

The Roemotion* Roy robotic arm is the result of a successfully funded Kickstarter project launched in 2012, which was described as a “project to create a human sized animatronic character from only laser cut mechanics and off the shelf hobby servos.” In this experiment, software has been developed using the Intel® RealSense™ SDK for Windows* to control the Roy hand using the SDK’s hand tracking APIs (Figure 1).

Figure 1. Robotic hand control software.

The code for this project was developed in C#/XAML using Microsoft Visual Studio* Community 2015, and works with both the Intel RealSense F200 and SR300 (coming soon) cameras. To see the software-controlled robotic hand in action, check out the YouTube* video: https://youtu.be/VQ93jw4Aocg

About the Roy Arm

The Roy arm assembly is currently available for purchase from the Roemotion, Inc. website in kit form, which includes:

  • Laser cut pieces
  • All necessary hardware
  • 8 hobby-grade servos
  • 6 servo extension cables

As stated on the Roemotion website, the kit does not include any control electronics. This is because the initial concept of the project was to supply cool mechanical systems for people to use with whatever controller they want. As such, this experiment incorporates a third-party servo controller for driving the motors in the robotic hand (Figure 2).

Figure 2. Roy robotic arm.

The hand incorporates six servo motors: one for each of the fingers (index, middle, ring, and pinky) and two for the thumb. (Note: there are two additional servos located in the base of the arm for controlling wrist movements, but these are not controlled in this experiment.)

Intel® RealSense™ SDK Hand Tracking APIs

As stated in the Intel RealSense SDK online documentation, the hand tracking module provides real-time 3D hand motion tracking and can track one or two hands, providing precise joint-level locations and positions. Of particular interest to this real-time device control experiment is the finger’s “foldedness” value acquired through calls to the QueryFingerData() method.

Control Electronics

This experiment incorporated a Pololu Micro Maestro* 6-channel USB servo controller (Figure 3) to control the six motors located in the Roy hand. This device includes a fairly comprehensive SDK for developing control applications targeting different platforms and programming languages.

Figure 3. Pololu Micro Maestro* servo controller.

Servo Controller Settings

Before custom software could be developed to directly control the robotic hand in this experiment, it was essential to understand each of the full-scale finger ranges in terms of servo control parameters. Unlike high-end robotic servos with integrated controllers, whose position encoders can be queried prior to applying torque, the low-cost servos used in the Roy hand needed to be energized cautiously to avoid rapid motor movements that could lead to binding the fingers and potentially stripping the motor gears.

Fortunately, the Pololu Micro Maestro SDK includes a Control Center app that allows a user to configure firmware-level parameters and save them to flash memory on the control board. The settings that were determined experimentally for this application are shown in Figure 4.

Figure 4. Pololu Maestro Control Center app.

Once the Min and Max position settings are fixed, the servo controller firmware will not allow the servos to be accidentally software-driven to a position that exceeds the desired range of motion. This is critical for this type of application, which has mechanical hard-stops (that is, fingers fully open or closed) that could cause a motor to burn out or strip gears if over-driven.

Another important setting for an application such as this is the “On startup or error” parameter, which in this case ensures the default starting (and error) position for all of the fingers is “open” to prevent binding of the index finger and thumb if they were allowed to close indiscriminately.

The two final settings that are noteworthy are the Speed and Acceleration parameters. These settings allow for motion smoothing at the firmware level, which is often preferable to higher-level filtering algorithms that can add latency and overhead to the main software application.

Note: In more advanced robotic servos that include integrated controllers, a proportional–integral–derivative controller (PID) algorithm is often implemented that allows each term to be flashed in firmware for low-level (that is, closer to the metal) feedback tuning to facilitate smooth motor translations without burdening the higher-level software.

Custom Control Software

In this experiment, custom software (Figure 5) was developed that leverages many of the hand tracking features that are currently present in the SDK samples.

Figure 5. Custom Control Software.

Although real-time fingertip tracking data is presented in the user interface, this particular experiment ultimately relied on the following three parameters for controlling the Roy hand:

  • Alert data
  • Foldedness data
  • Scaled data

Alert Data

Alerts are the most important information to monitor in a real-time device control application such as this. It is paramount to understand (and control) how a device will behave when its set-point values become unreliable or unavailable.

In this experiment the following alert information is being monitored:

  • Hand detected
  • Hand calibrated
  • Hand inside borders

The design of this software app precludes control of the robotic hand servos in the event of any alert condition. In order for the software to control the robotic hand, the user’s hand must be successfully calibrated and within the operating range of the camera.

As shown in the code snippet below, the custom app loops over the total number of fired alerts and sets three Boolean member variables -- detectionStatusOk, calibrationStatusOk, and borderStatusOk (note that handOutput is an instance of PXCMHandData):

for (int i = 0; i < handOutput.QueryFiredAlertsNumber(); i++)
{
  PXCMHandData.AlertData alertData;
  if (handOutput.QueryFiredAlertData(i, out alertData) !=
	 pxcmStatus.PXCM_STATUS_NO_ERROR) { continue; }

  switch (alertData.label)
  {
	 case PXCMHandData.AlertType.ALERT_HAND_DETECTED:
		detectionAlert = "Hand Detected";
		detectionStatusOk = true;
		break;
	 case PXCMHandData.AlertType.ALERT_HAND_NOT_DETECTED:
		detectionAlert = "Hand Not Detected";
		detectionStatusOk = false;
		break;
	 case PXCMHandData.AlertType.ALERT_HAND_CALIBRATED:
		calibrationAlert = "Hand Calibrated";
		calibrationStatusOk = true;
		break;
	 case PXCMHandData.AlertType.ALERT_HAND_NOT_CALIBRATED:
		calibrationAlert = "Hand Not Calibrated";
		calibrationStatusOk = false;
		break;
	 case PXCMHandData.AlertType.ALERT_HAND_INSIDE_BORDERS:
		bordersAlert = "Hand Inside Borders";
		borderStatusOk = true;
		break;
	 case PXCMHandData.AlertType.ALERT_HAND_OUT_OF_BORDERS:
		bordersAlert = "Hand Out Of Borders";
		borderStatusOk = false;
		break;
  }
}

A test to determine if detectionStatusOk, calibrationStatusOk, and borderStatusOk are all true is performed before any attempt is made in the software to control the hand servos. If at any time one of these flags is set to false, the fingers will be driven to their default Open positions for safety.
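
As a sketch, that gating logic might look like the following; UpdateFingerPositions() is a hypothetical helper, while Hand.MoveFinger and the Servo constants are the ones shown later in this article:

if (detectionStatusOk && calibrationStatusOk && borderStatusOk)
{
    // Tracking is reliable: apply the scaled finger positions to the servos
    UpdateFingerPositions();   // hypothetical helper
}
else
{
    // Tracking is unreliable: drive every finger to its default (open) position
    Hand.MoveFinger(Servo.HandJoint.Index, Convert.ToUInt16(Servo.INDEX_DEFAULT));
    // ...repeat for the remaining fingers and the thumb
}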

Foldedness Data

The custom software developed in this experiment makes calls to the QueryFingerData() method, which returns the finger’s “foldedness” value and fingertip radius. The foldedness value is in the range of 0 (finger folded) to 100 (finger extended).

The foldedness data for each finger is retrieved within the acquire/release frame loop as shown in the following code snippet (where handData is an instance of PXCMHandData.IHand):

PXCMHandData.FingerData fingerData;

handData.QueryFingerData(PXCMHandData.FingerType.FINGER_THUMB, out fingerData);
thumbFoldeness = fingerData.foldedness;
lblThumbFold.Content = string.Format("Thumb Fold: {0}", thumbFoldeness);

handData.QueryFingerData(PXCMHandData.FingerType.FINGER_INDEX, out fingerData);
indexFoldeness = fingerData.foldedness;
lblIndexFold.Content = string.Format("Index Fold: {0}", indexFoldeness);

handData.QueryFingerData(PXCMHandData.FingerType.FINGER_MIDDLE, out fingerData);
middleFoldeness = fingerData.foldedness;
lblMiddleFold.Content = string.Format("Middle Fold: {0}", middleFoldeness);

handData.QueryFingerData(PXCMHandData.FingerType.FINGER_RING, out fingerData);
ringFoldeness = fingerData.foldedness;
lblRingFold.Content = string.Format("Ring Fold: {0}", ringFoldeness);

handData.QueryFingerData(PXCMHandData.FingerType.FINGER_PINKY, out fingerData);
pinkyFoldeness = fingerData.foldedness;
lblPinkyFold.Content = string.Format("Pinky Fold: {0}", pinkyFoldeness);

Scaled Data

After acquiring the foldedness data for each of the user's fingers, scaling equations are processed to map these values to the full-scale ranges of each robotic finger. Each full-scale value (that is, the control pulse width, in microseconds, required to move the finger either fully opened or fully closed) is defined as a constant in the Servo.cs class:

// Index finger
public const int INDEX_OPEN = 1808;
public const int INDEX_CLOSED = 800;
public const int INDEX_DEFAULT = 1750;
.
.
.

Individual constants are defined for each finger on the robotic hand, which match the Min and Max servo parameters that were flashed in the Micro Maestro controller board (see Figure 4). Similarly, the full-scale range of the finger foldedness data is defined in the software:

int fingerMin = 0;
int fingerMax = 100;

Since the finger foldedness range is the same for all fingers (that is, 0 to 100), the range only needs to be defined once and can be used for the data scaling operation performed for each finger as shown below:

// Index finger
int indexScaled = Convert.ToInt32((Servo.INDEX_OPEN - Servo.INDEX_CLOSED) *
   (index - fingerMin) / (fingerMax - fingerMin) + Servo.INDEX_CLOSED);

lblIndexScaled.Content = string.Format("Index Scaled: {0}", indexScaled);
Hand.MoveFinger(Servo.HandJoint.Index, Convert.ToUInt16(indexScaled));
.
.
.

Check Out the Video

To see the robotic hand in action, check out the YouTube video here: https://youtu.be/VQ93jw4Aocg

Summary

This software experiment took only a few hours to implement, once the basic control constraints of the servo motors were tested and understood. The Windows 10 desktop app was developed in C#/XAML, and it leveraged many of the features present in the Intel RealSense SDK APIs and code samples.

About Intel® RealSense™ Technology

To learn more about the Intel RealSense SDK for Windows, go to https://software.intel.com/en-us/intel-realsense-sdk.

About the Author

Bryan Brown is a software applications engineer in the Developer Relations Division at Intel.

Playing at Ghosts: Face Tracking in Mystery Mansion*


Mystery Mansion* is a spooky, hidden-object adventure game from veteran game developer Cyrus Lum. The game took first place in the Pioneer track of the 2014 Intel® RealSense™ App Challenge, with its innovative, experimental approach drawing the attention of the judges.

As a self-described storyteller, Lum aims to “enhance that suspension of disbelief—to get [the] audience lost in the story and the world I’ve created.” Mystery Mansion is remarkable for its exclusive use of face tracking for the user interface (UI), which results in a highly immersive experience. However, the face-only approach posed a number of development challenges during Lum's quest to implement intuitive controls and create a satisfying user experience with a bare-bones UI. In this article we'll discuss the challenges encountered and the code Lum used to address them, including how to accurately calibrate the game's UI for different players and how to manage the movement and manipulation of objects in the environment with the intentionally limited control scheme.


The Mystery Mansion* splash screen, presenting its central, haunted-house theme.

Optimizations and Challenges

Face Tracking

The inspiration for Mystery Mansion came from Lum’s search for a way to use the capabilities of the Intel® RealSense™ SDK other than for hand gesture-control. This led to his decision to work on a game that would be controlled exclusively with face tracking.

After searching for a game theme and mechanics that would correspond with the necessarily simplified UI, Lum decided on a first-person, hidden-object game in the style of a point-and-click adventure. Controlling the game with the head alone, as if it were somehow detached from the body, inspired Lum’s idea of a ‘disembodied’ ghost as the game’s central playable character. In Mystery Mansion, the player takes the role of a spirit trapped in the mansion, looking to solve the mystery of its own death.

Nose and Eyes

The game requires that players visually explore the environment, locating and collecting a series of items. For lateral and vertical movement, the Intel® RealSense™ Camera F200 tracks the movement of the face—specifically the nose—allowing the player to intuitively look around the environment. The directional movement of the player’s face is reproduced by the reticule on-screen. 


The red reticule is controlled by the movement of the player’s face, which is tracked by the Intel® RealSense™ camera.

Lum wanted players to be able to explore the environment in 360 degrees—to turn left and right, and even look behind them. To make this possible, he implemented a system whereby once the player crosses a certain lateral movement threshold, the field of view begins to shift in that direction around the space, with the movement accelerating as the player’s view moves toward the edge of the screen.

To zoom in and out of the environment, the Intel RealSense camera tracks the distance between the player's eyes. The closer together the eyes appear to the camera, the farther away the user is, and hence the more zoomed out the view of the environment (with the opposite true for zooming in). The camera zoom is calibrated to ensure that the player doesn't need to move too close to the screen to effectively zoom in on objects.

/* get face data and calculate camera FOV based on eye distance.  Use face data to rotate camera. */
	void OnFaceData(PXCMFaceData.LandmarksData data) {
		if (!visualizeFace) return;
		if (colorSize.width == 0 || colorSize.height == 0) return;

		PXCMFaceData.LandmarkPoint[] points;
		if (!data.QueryPoints(out points)) return;

		/* Use nose tip as the control point */
		PXCMPointF32 xy=points[data.QueryPointIndex(PXCMFaceData.LandmarkType.LANDMARK_NOSE_TIP)].image;

		/* Get Left and Right Eye to calculate distance */
		PXCMPointF32 xyEyeLeft=points[data.QueryPointIndex(PXCMFaceData.LandmarkType.LANDMARK_EYE_LEFT_CENTER)].image;
		PXCMPointF32 xyEyeRight=points[data.QueryPointIndex(PXCMFaceData.LandmarkType.LANDMARK_EYE_RIGHT_CENTER)].image;

		float tmpEye = Mathf.Abs ((xyEyeLeft.x - xyEyeRight.x) * eyeDistScale);
		if (tmpEye < eyeDistNear) tmpEye = eyeDistNear;
		if (tmpEye > eyeDistFar) tmpEye = eyeDistFar;

		/* Use eyes apart distance to change FOV */
		Camera.current.fieldOfView = eyeFOVcenter - tmpEye;

Code Sample 1: This code uses the distance between the user’s eyes to calculate his or her distance from the screen and then adjusts the field of view accordingly.

Optimizing the UI

Calibration

Lum observed that every player has a slightly different way of playing the game, as well as different physical attributes in terms of the size, dimensions, and configuration of their facial features. This meant that calibrating the face tracking at the start of each play session was key to making the directional controls work correctly for each individual player. Lum inserted a calibration stage at the start of each play session to establish the "zero position" of the player and to ensure that they can be tracked within the range of the camera.

To establish the “zero position” during the calibration stage, the player moves his or her head to position the reticule within a bordered area at the center of the screen. This ensures that the player is within the range of the camera (the tracking volume) when turning his or her head, or moving in and out. The process ensures a consistent experience for every player regardless of differences in face shape, size, and position in relation to the camera.


The calibration stage at the beginning of each play session helps ensure a consistent and accurate experience for each different player.

//check to see if target graphic is within the box
if (!calibrated && calibratedNose && gameStart && (295 * screenScaleWidth) < targetX && targetX < (345 * screenScaleWidth) && (235 * screenScaleHeight) < targetY && targetY < (285 * screenScaleHeight)) {
			calibrated = true;
			tutorialImg = tutorial1Img;
			LeanTween.alpha(tutorialRect, 1f, 2f) .setEase(LeanTweenType.easeInCirc).setDelay(1.5f);
			LeanTween.alpha(tutorialRect, 0f, 2f) .setEase(LeanTweenType.easeInCirc).setDelay(8f).setOnComplete (showTutorialTurn);
		}

Code Sample 2: This code calibrates the game by ensuring the target reticule is within the red box shown in the calibration screenshot.

Lateral Movement

The full freedom of lateral camera movement is necessary to create the 360-degree field of view that Lum wanted to offer the player. Fundamental to ensuring an optimized user experience with the in-game camera was the implementation of a safety zone and rotational acceleration.

//— rotate camera based on face data ———
		/* Mirror the facedata input, normalize */
		xy.x=(1-(xy.x/colorSize.width));
		xy.y=(xy.y/colorSize.height);

		/* exponentially accelerate the rate of rotation when looking farther away from the center of the screen, use rotateAccelerationScale to adjust */
		newX = (0.5f-xy.x)*(rotateAccelerationScale*Mathf.Abs((0.5f-xy.x)*(0.5f-xy.x)));
		newY = (0.5f-xy.y)*(rotateAccelerationScale*Mathf.Abs((0.5f-xy.y)*(0.5f-xy.y)));

		/* Camera is a hierarchy  mainCamBase  with Main Camera as a child of mainCamBase.  We will horizontally rotate the parent mainCamBase  */
		mainCamBase.transform.Rotate(0, (newX * (lookTimeScale*Time.deltaTime)), 0);

		/* angleY is a rotation accumulator */
		angleY += newY;
		if (angleY > lookUpMin && angleY < lookUpMax) {

			mainCam.transform.Rotate ((newY * (lookTimeScale * Time.deltaTime)), 0, 0);
			if(angleY < lookUpMin) angleY = lookUpMin;
			if(angleY > lookUpMax) angleY = lookUpMax;
		}
		else angleY -= newY;

Code Sample 3: This code controls the rotation and lateral acceleration of the camera as the user turns his or her head further from the center of the screen.

If the player keeps the reticule within a specific zone at the center of the screen—approximately 50 percent of the horizontal volume—the camera does not begin to rotate, which ensures that the player can explore that area without the camera moving unintentionally. Once the reticule is moved outside that zone, the lateral movement begins in the direction the player is looking, and accelerates as he or she moves the reticule toward the edge of the screen. This gives the player accurate and intuitive 360-degree control.

Vertical Movement

Lum’s experiments showed that complete freedom of camera-movement was less practical on the vertical axis, because if both axes are in play, the player can become disoriented. Furthermore, little is to be gained by allowing players to look at the ceiling and floor with the interactive elements of the game occupying the lateral band of vision. However, players needed some vertical movement in order to inspect elements near the floor or on raised surfaces. To facilitate this, Lum allowed 30 degrees of movement in both up and down directions, a compromise that lets the player look around without becoming disoriented.

Gameplay Optimizations

After exploring a variety of gameplay mechanics, Lum decided to add simple puzzle solving, in addition to the core, object-collection gameplay. Puzzle solving fit with the overall game design and could be effectively and intuitively implemented using only the face-tracking UI.  

The player picks up items in the game by moving the reticule over them, like a mouse cursor hovering over an object. At the rollover point, the object moves out of the environment toward the player—as if being levitated supernaturally—and enters the on-screen item inventory.


Here, the player has selected an object in the environment by moving his or her face to position the reticule over it, causing the object—in this case a pillbox—to move toward the player and into the inventory.

Ray ray = Camera.current.ScreenPointToRay(new Vector3((float)targetX, (float)Screen.height-(float)targetY, (float)0.1f));
RaycastHit hit;
if (Physics.Raycast(ray, out hit,100 )) {
	if(hit.collider.name.Contains("MysItem")){
		hit.collider.transform.Rotate(new Vector3(0,-3,0));
	}
}

Code Sample 4: This is the code that lets the player pick up objects using the face-tracking UI.

Lum also needed the player to move from room to room in a way that would fit with the simplified UI and the logic of the gameplay. To this end, each room has an exit door that is only activated once the player has collected all the available objects. At this point, the player moves his or her face to position the reticule over the door, causing the door to open and moving the game scene to the next space.
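
The article doesn't show the door-handling code, but it could plausibly extend the raycast logic from Code Sample 4; the names itemsCollected, itemsInRoom, and OpenDoorAndLoadNextRoom() below are hypothetical:

// Hypothetical sketch: gazing at the exit door advances to the next room
// once every item in the current room has been collected.
if (Physics.Raycast(ray, out hit, 100))
{
    if (hit.collider.name.Contains("ExitDoor") && itemsCollected >= itemsInRoom)
    {
        // Open the door and load the next room's scene
        OpenDoorAndLoadNextRoom();
    }
}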

To add variety to the gameplay, Lum explored and added different puzzle mechanics that could be manipulated in similarly simple and logical ways. One of these is the block puzzle, where the player uses the directional control to move blocks within a frame. Lum implemented this action in multiple ways, including moving blocks within a picture frame to reconstruct an image and moving pipes on a door in order to unlock it.

else if(hit.collider.name.Contains("puzzleGame")){
	room currentRoom = (room)rooms[roomID];
	puzzleItem tmpPuzzleItem = (puzzleItem)currentRoom.puzzleItems[hit.collider.name];
	if(!LeanTween.isTweening(hit.collider.gameObject) && !tmpPuzzleItem.solved){
		if(hit.collider.name.Contains("_Rot")){
			LeanTween.rotateAroundLocal ( hit.collider.gameObject, Vector3.up, 90f, 2f ).setEase (LeanTweenType.easeInOutCubic).setOnCompleteParam (hit.collider.gameObject).setOnComplete (puzzleAnimFinished);
		}
	}
}

Code Sample 5: This code allows players to move blocks using face-tracking in order to complete some of the game’s puzzles.


The puzzles include this block picture. The player uses the directional control to move the pieces and reconstruct the image, revealing a clue.


In this puzzle, the player uses the face-tracked directional control to move the segments of piping.

Text prompts also appear on the screen to help the player determine the next step, for example, when he or she needs to exit the room.

Testing and Analysis

Mystery Mansion was developed in its entirety by Lum, who also conducted the majority of the testing. During development, however, he called on the services of three friends to test the game.

In the world of video games, no two players play a game exactly the same way—what seems intuitive to one player will not necessarily be so for another. This difference quickly became evident to Lum during the external testing, particularly with the face tracking. Testers had difficulty advancing through the game because they would position themselves differently, or simply because of differences in facial size and composition. Lum’s observations led to the implementation of the calibration stage prior to play and the addition of a tutorial at the beginning of the game to ensure that the basics of the UI are well understood. 


The Tutorial stage helps players understand how to interact with the game using only the face-tracking interface.

Key Lessons

For Lum, when working with Intel RealSense technology and other human interface technology, simplicity is absolutely key, a point that was driven home through his work on Mystery Mansion, where the UI is limited exclusively to face tracking. He’s a firm believer in not trying to do too much in terms of adding mechanics and features, even if an idea initially seems cool. Moving through the environment and manipulating objects using only face tracking required careful iteration of the stripped-down UI, and a degree of tutorial “hand-holding” to ensure that the player was never in a position of not knowing what to do or how to advance through the game.

Testing played a key role in the development of Mystery Mansion. Lum found that developers should not assume that what works for one player will automatically work for another. Every player behaves differently with the game, and in terms of the human interface of the Intel RealSense camera, each player’s face and hands have different size, shape, movement, and positional attributes that must be compensated for in the code.

The Intel RealSense SDK's Unity Toolkit gave Lum a straightforward development path. Unity* is a user-friendly environment with well-tested, thorough compatibility with the Intel RealSense SDK, a wealth of learning resources (including those provided with the SDK), a strong support community, and a ready stock of graphical assets from the Asset Store.

Lum believes that developers should always consider the physical impact of prolonged play times with hand gesture controls, which can sometimes lead to limb fatigue if the UI is not thoughtfully balanced for the player.

Tools and Resources

Lum found the process of developing his game using the Intel RealSense SDK straightforward. He also dedicated time to reviewing the available demos to pick up practical tips, including the Nine Cubes sample provided with the Intel RealSense SDK.

Unity

Lum chose to develop the game using Unity, which is readily compatible with the Intel RealSense SDK and offers a complete development environment. While Lum is an accomplished programmer in C#, the Unity platform made much of the basic programming required unnecessary, allowing him to iterate quickly in terms of developing and testing prototypes.

MonoDevelop*

To develop the C# game scripts, Lum used MonoDevelop, the integrated development environment supplied with Unity. Within MonoDevelop, Lum placed objects, set up properties, added behaviors and logic, and wrote scripts for the integration of the Intel RealSense camera data.

Nine Cubes

One of the fundamental building blocks for building Mystery Mansion was the Nine Cubes sample, which is a Unity software sample provided with the Intel RealSense SDK (it can be found in the frameworks folder of the samples directory in the SDK). This demo allows the user to move a cube using face tracking—specifically nose tracking. This functionality became the foundation of Mystery Mansion’s UI.
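The core idea, mapping a tracked nose position to an object's translation, can be sketched in a few lines of Unity C#. The helper that supplies normalized nose coordinates below is a stand-in for the SDK's face-tracking data, not the actual Nine Cubes code.

// Illustrative only: map a normalized nose position (0..1) onto a cube's X/Y translation.
public Transform cube;
public float range = 2f;                               // World-space range of movement

private void Update() {
	Vector2 nose = GetNormalizedNosePosition();        // hypothetical helper
	float x = (nose.x - 0.5f) * range;                 // center and scale horizontally
	float y = (0.5f - nose.y) * range;                 // flip Y (image space runs top-down)
	cube.position = new Vector3(x, y, cube.position.z);
}

private Vector2 GetNormalizedNosePosition() {
	// Stub: replace with nose landmark data from the Intel RealSense SDK face module.
	return new Vector2(0.5f, 0.5f);
}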

Unity Technologies Asset Store

Having already had experience with the Unity Technologies Asset Store for a previous project, it was Lum’s go-to place for the graphic elements of Mystery Mansion, essentially saving time and making it possible to singlehandedly develop a visually rich game. Serendipitously, he was looking for assets during the Halloween period, so creepy visuals were easy to find.

What’s Next for Mystery Mansion

Since submitting Mystery Mansion for the Intel RealSense App Challenge, Lum has continued to experiment with features that help create an even more immersive experience. For example, recent iterations allow the player to look inside boxes or containers by slowly leaning in to peer inside. This action eventually triggers something to pop out, creating a real moment of visceral fright. Lum’s takeaway is that the more pantomiming of real physical actions in the game, the greater the immersion in the experience and the more opportunities to emotionally involve and engage the player.

To date, Mystery Mansion has been designed principally for laptops and desktop PCs equipped with Intel RealSense user-facing cameras. Lum has already conducted tests with the Google Tango* tablet and is eager to work on tablet and mobile platforms with Intel RealSense technology, particularly in the light of the ongoing collaboration between Intel and Google to bring Project Tango to mobile phone devices with Intel RealSense technology.

Intel RealSense SDK: Looking Forward

In Lum’s experience, context is crucial for the successful implementation of Intel RealSense technology. Lum is particularly excited about the possibilities this technology presents in terms of 3D scanning of objects and linking that to the increasingly accessible world of 3D printing.  

As for Lum’s own work with human interface technology, he is currently pursuing the ideas he began exploring with another of his recent Intel RealSense SDK projects, My Pet Shadow, which won first place in the 2013 Intel Perceptual Computing Challenge. My Pet Shadow is a “projected reality” prototype that uses an LCD projector to cast a shadow that the user can interact with in different ways. It’s this interactive fusion of reality and the digital realm that interests Lum, and it’s a direction he intends to pursue as he continues to push the possibilities of Intel RealSense technology.


Lum’s Intel® RealSense™ project, My Pet Shadow, took first place in the 2013 Intel® Perceptual Computing Challenge.

About the Developer

Cyrus Lum has over 25 years of experience in game production, development, and management roles, with both publishing and independent development companies, including Midway Studios in Austin, Texas, Inevitable Entertainment Inc., Acclaim Entertainment, and Crystal Dynamics. His roles have ranged from art director to co-founder and VP of digital productions. Currently, Lum is an advisor to Phunware Inc. and Vice President of Technology for 21 Pink, a game software development company. He has also served on the Game Developers Conference Advisory Board since 1997.

Additional Resources

Cyrus Lum Web Site

MonoDevelop

Unity

Intel® Developer Zone for Intel® RealSense™ Technology

Intel® RealSense™ SDK

Intel® RealSense™ Developer Kit

Intel RealSense Technology Tutorials

Introduction to Autonomous Navigation for Augmented Reality


Download PDF
Download Code Sample

Ryan Measel and Ashwin Sinha

1. Introduction

Perceptual computing is the next step in human-computer interaction. It encompasses technologies that sense and understand the physical environment, including gestures, voice recognition, facial recognition, motion tracking, and environment reconstruction. The Intel® RealSense™ cameras F200 and R200 are at the forefront of perceptual computing. Their depth-sensing capabilities allow the F200 and R200 to reconstruct the 3D environment and track a device’s motion relative to the environment. The combination of environment reconstruction and motion tracking enables augmented reality experiences where virtual assets are seamlessly intertwined with reality.

While the Intel RealSense cameras can provide the data to power augmented reality applications, it is up to developers to create immersive experiences. One method of bringing an environment to life is through the use of autonomous agents. Autonomous agents are entities that act independently using artificial intelligence. The artificial intelligence defines the operational parameters and rules by which the agent must abide. The agent responds dynamically in real time to its environment, so even a simple design can result in complex behavior.

Autonomous agents can exist in many forms, though for this discussion the focus will be restricted to agents that move and navigate. Examples of such agents include non-player characters (NPCs) in video games and birds flocking in an educational animation. The goals of the agents will vary depending on the application, but the principles of their movement and navigation are common across all.

The intent of this article is to provide an introduction to autonomous navigation and demonstrate how it's used in augmented reality applications. An example is developed that uses the Intel RealSense camera R200 and the Unity* 3D Game Engine. It is best to have some familiarity with the Intel® RealSense™ SDK and Unity. For information on integrating the Intel RealSense SDK with Unity, refer to: “Game Development with Unity* and Intel® RealSense™ 3D Camera” and “First look: Augmented Reality in Unity with Intel® RealSense™ R200.”

2. Autonomous Navigation

Agent-based navigation can be handled in a number of ways ranging from simple to complex, both in terms of implementation and computation. A simple approach is to define a path for the agent to follow. A waypoint is selected, then the agent moves in a straight line towards it. While easy to implement, the approach has several problems. Perhaps the most obvious: what happens if a straight path does not exist between the agent and the waypoint (Figure 1)?

An agent moves along a straight path towards the target

Figure 1. An agent moves along a straight path towards the target, but the path can become blocked by an obstacle. Note: This discussion applies to navigation in both 2D and 3D spaces, but 2D is used for illustrative purposes.

More waypoints need to be added to route around obstacles (Figure 2).

Additional waypoints are added to allow the agent to navigate around obstacles

Figure 2. Additional waypoints are added to allow the agent to navigate around obstacles.

On bigger maps with more obstacles, the number of waypoints and paths will often be much larger. Furthermore, a higher density of waypoints (Figure 3) will allow for more efficient paths (less distance traveled to reach the destination).

the number of waypoints and possible paths increases

Figure 3. As maps grow larger, the number of waypoints and possible paths increases significantly.

A large number of waypoints necessitates a method of finding a path between non-adjacent waypoints. This problem is referred to as pathfinding. Pathfinding is closely related to graph theory and has applications in many fields besides navigation. Accordingly, it is a heavily researched topic, and many algorithms exist that attempt to solve various aspects of it. One of the most prominent pathfinding algorithms is A*. In basic terms, the algorithm traverses along adjacent waypoints towards the desired destination and builds a map of all waypoints it visits and the waypoints connected to them. Once the destination is reached, the algorithm calculates a path using its generated map. An agent can then follow along the path. A* does not search the entire space, so it is computationally efficient, though depending on the heuristic used, the path it finds may not be optimal.

The A* algorithm traverses a map searching for a route to the target

Figure 4. The A* algorithm traverses a map searching for a route to the target. Animation by Subh83 / CC BY 3.0.
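As a rough illustration of the idea (not an excerpt from any particular engine), the following C# sketch runs A* over a simple waypoint graph using a straight-line distance heuristic; it omits the priority queue and closed set a production implementation would normally use.

using System.Collections.Generic;
using UnityEngine;

// Minimal A* sketch over a waypoint graph; illustrative, not production code.
public class Waypoint {
	public Vector3 position;
	public List<Waypoint> neighbors = new List<Waypoint>();
}

public static class AStar {

	public static List<Waypoint> FindPath (Waypoint start, Waypoint goal) {
		var open = new List<Waypoint> { start };
		var cameFrom = new Dictionary<Waypoint, Waypoint>();
		var gScore = new Dictionary<Waypoint, float> { { start, 0f } };

		while (open.Count > 0) {
			// Pick the open node with the lowest estimated total cost (g + heuristic)
			Waypoint current = open[0];
			foreach (var n in open)
				if (gScore[n] + Heuristic(n, goal) < gScore[current] + Heuristic(current, goal))
					current = n;

			if (current == goal)
				return Reconstruct(cameFrom, current);

			open.Remove(current);
			foreach (var neighbor in current.neighbors) {
				float tentative = gScore[current] + Vector3.Distance(current.position, neighbor.position);
				if (!gScore.ContainsKey(neighbor) || tentative < gScore[neighbor]) {
					cameFrom[neighbor] = current;           // Remember the best way to reach this neighbor
					gScore[neighbor] = tentative;
					if (!open.Contains(neighbor)) open.Add(neighbor);
				}
			}
		}
		return null; // No path found
	}

	private static float Heuristic (Waypoint a, Waypoint goal) {
		return Vector3.Distance(a.position, goal.position);
	}

	private static List<Waypoint> Reconstruct (Dictionary<Waypoint, Waypoint> cameFrom, Waypoint current) {
		var path = new List<Waypoint> { current };
		while (cameFrom.ContainsKey(current)) {
			current = cameFrom[current];
			path.Insert(0, current);
		}
		return path;
	}
}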

A* is not able to adapt to dynamic changes in the environment such as added/removed obstacles and moving boundaries. Environments for augmented reality are dynamic by nature, since they build and change in response to the user’s movement and physical space.

For dynamic environments, it is preferable to let agents make decisions in real time, so that all current knowledge of the environment can be incorporated into the decision. Thus, a behavior framework must be defined so the agent can make decisions and act in real time. With respect to navigation, it is convenient and common to separate the behavior framework into three layers:

  1. Action Selection is comprised of setting goals and determining how to achieve those goals. For example, a bunny will wander around looking for food, unless there is a predator nearby, in which case, the bunny will flee. State machines are useful for representing such behavior as they define the states of the agent and the conditions under which states change.
  2. Steering is the calculation of the movement based on the current state of the agent. If the bunny is being chased by the predator, it should flee away from the predator. Steering calculates both the magnitude and direction of the movement force.
  3. Locomotion is the mechanics through which the agent moves. A bunny, a human, a car, and a spaceship all move in different ways. Locomotion defines both how the agent moves (e.g., legs, wheels, thrusters, etc.) and the parameters of that motion (e.g., mass, maximum speed, maximum force, etc.).

Together these layers form the artificial intelligence of the agent. In Section 3, we'll show a Unity example to demonstrate the implementation of these layers. Section 4 will integrate the autonomous navigation into an augmented reality application using the R200.
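To make the action-selection layer concrete, the bunny example above could be expressed as a small state machine. The following Unity C# sketch is illustrative only; the states, threshold, and field names are assumptions rather than code from the example project.

using UnityEngine;

// Illustrative action-selection state machine for the bunny example.
public class BunnyBrain : MonoBehaviour {

	public enum BunnyState { Wander, Flee }

	public Transform predator;
	public float fleeDistance = 5f;               // Assumed threat radius (m)
	private BunnyState state = BunnyState.Wander;

	private void Update () {
		// Transition rule: flee when a predator is near, otherwise wander
		float distance = Vector3.Distance (transform.position, predator.position);
		state = (distance < fleeDistance) ? BunnyState.Flee : BunnyState.Wander;

		// The steering layer would then choose a behavior based on the current state,
		// e.g., a Flee force away from the predator, or a Wander force when no threat is near.
	}
}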

3. Implementing Autonomous Navigation

This section walks through the behavior framework described above from the ground up in a Unity scene, starting with locomotion.

Locomotion

The locomotion of the agent is based on Newton’s laws of motion, where force applied to mass results in acceleration. We will use a simplistic model with uniformly distributed mass that can have force applied to the body in any direction. To constrain the movement, the maximum force and the maximum speed must be defined (Listing 1).

public float mass = 1f;            // Mass (kg)
public float maxSpeed = 0.5f;      // Maximum speed (m/s)
public float maxForce = 1f;        // Maximum force (N)

Listing 1. The locomotion model for the agent.

The agent must have a rigidbody component and a collider component that are initialized on start (Listing 2). Gravity is removed from the rigidbody for simplicity of the model, but it could be incorporated.

private void Start () {

	// Initialize the rigidbody
	this.rb = GetComponent<Rigidbody> ();
	this.rb.mass = this.mass;
	this.rb.useGravity = false;

	// Initialize the collider
	this.col = GetComponent<Collider> ();
}

Listing 2. The rigidbody and collider components are initialized on Start().

The agent is moved by applying force to the rigidbody in the FixedUpdate() step (Listing 3). FixedUpdate() is similar to Update(), but it is guaranteed to execute at a consistent interval (which Update() is not). The Unity engine performs the physics calculations (operations on rigidbodies) at the completion of the FixedUpdate() step.

private void FixedUpdate () {

	Vector3 force = Vector3.forward;

	// Upper bound on force
	if (force.magnitude > this.maxForce) {
		force = force.normalized * this.maxForce;
	}

	// Apply the force
	rb.AddForce (force, ForceMode.Force);

	// Upper bound on speed
	if (rb.velocity.magnitude > this.maxSpeed) {
		rb.velocity = rb.velocity.normalized * this.maxSpeed;
	}
}

Listing 3. Force is applied to rigidbody in the FixedUpdate() step. This example moves the agent forward along the Z axis.

If the magnitude of the force exceeds the maximum force of the agent, it is scaled such that its magnitude is equivalent to the maximum force (direction is preserved). The AddForce () function applies the force via numerical integration:

v' = v + (F / m) * Δt

Equation 1. Numerical integration of velocity. The AddForce() function performs this calculation.

where v' is the new velocity, v is the previous velocity, F is the force, m is the mass, and Δt is the time step between updates (the default fixed time step in Unity is 0.02 s). If the magnitude of the velocity exceeds the maximum speed of the agent, it is scaled such that its magnitude is equivalent to the maximum speed.
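Written out in code, the same update amounts to a single Euler integration step each FixedUpdate(); the snippet below is only meant to illustrate what AddForce(force, ForceMode.Force) does internally, using the fields already defined in the listings above.

// Manual equivalent of the velocity integration performed by AddForce(force, ForceMode.Force).
Vector3 acceleration = force / this.mass;                 // a = F / m
rb.velocity += acceleration * Time.fixedDeltaTime;        // v' = v + a * Δt (Δt defaults to 0.02 s)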

Steering

Steering calculates the force that will be supplied to the locomotion model. Three steering behaviors will be implemented: seek, arrive, and obstacle avoidance.

Seek

The Seek behavior attempts to move towards a target as fast as possible. The desired velocity of the behavior points directly at the target at maximum speed. The steering force is calculated as the difference between the desired and current velocity of the agent (Figure 5).

The Seek behavior applies a steering force

Figure 5. The Seek behavior applies a steering force from the current velocity to the desired velocity.

The implementation (Listing 4) first computes the desired vector by normalizing the offset between the agent and the target and multiplying it by the maximum speed. The steering force returned is the desired velocity minus the current velocity, which is the velocity of the rigidbody.

private Vector3 Seek () {

	Vector3 desiredVelocity = (this.seekTarget.position - this.transform.position).normalized * this.maxSpeed;
	return desiredVelocity - this.rb.velocity;
}

Listing 4. Seek steering behavior.

The agent uses the Seek behavior by invoking Seek() when it computes the force in FixedUpdate() (Listing 5).

private void FixedUpdate () {

	Vector3 force = Seek ();
	...

Listing 5. Invoking Seek () in FixedUpdate ().

An example of the Seek behavior in action is shown in Video 1. The agent has a blue arrow that indicates the current velocity of the rigidbody and a red arrow that indicates the steering force being applied in that time step.

Video 1. The agent initially has a velocity orthogonal to the direction of the target, so its motion follows a curve.

Arrive

The Seek behavior overshoots and oscillates around the target because it travels as fast as possible to reach it. The Arrive behavior is similar to the Seek behavior except that it attempts to come to a complete stop at the target. The “deceleration radius” parameter defines the distance from the target at which the agent will begin to decelerate. When the agent is within the deceleration radius, the desired speed is scaled in proportion to the distance between the agent and the target, so the agent slows as it gets closer. Depending on the maximum force, maximum speed, and deceleration radius, the agent may not be able to come to a complete stop.

The Arrive behavior (Listing 6) first calculates the distance between the agent and the target. A scaled speed is calculated as the maximum speed scaled by the distance divided by the deceleration radius. The desired speed is taken as the minimum of the scaled speed and maximum speed. Thus, if the distance to the target is less than the deceleration radius, the desired speed is the scaled speed. Otherwise, the desired speed is the maximum speed. The remainder of the function performs exactly like Seek using the desired speed.

// Arrive deceleration radius (m)
public float decelerationRadius = 1f;

private Vector3 Arrive () {

	// Calculate the desired speed
	Vector3 targetOffset = this.seekTarget.position - this.transform.position;
	float distance = targetOffset.magnitude;
	float scaledSpeed = (distance / this.decelerationRadius) * this.maxSpeed;
	float desiredSpeed = Mathf.Min (scaledSpeed, this.maxSpeed);

	// Compute the steering force
	Vector3 desiredVelocity = targetOffset.normalized * desiredSpeed;
	return desiredVelocity - this.rb.velocity;
}

Listing 6. Arrive steering behavior.

Video 2. The Arrive behavior decelerates as it reaches the target.

Obstacle Avoidance

The Arrive and Seek behaviors are great for getting places, but they are not suited for handling obstacles. In dynamic environments, the agent will need to be able to avoid new obstacles that appear. The Obstacle Avoidance behavior looks ahead of the agent along the intended path and determines if there are any obstacles to avoid. If obstacles are found, the behavior calculates a force that alters the path of the agent to avoid the obstacle (Figure 6).

Figure 6. When an obstacle is detected along the current trajectory, a force is returned that prevents the collision.

The implementation of Obstacle Avoidance (Listing 7) uses a spherecast to detect collisions. The spherecast casts a sphere along the current velocity vector of the rigidbody and returns a RaycastHit for every collision. The sphere originates from the center of the agent and has a radius equal to the radius of the agent’s collider plus an “avoidance radius” parameter. The avoidance radius allows the user to define the clearance around the agent. The cast is limited to traveling the distance specified by the “forward detection” parameter.

// Avoidance radius (m). The desired amount of space between the agent and obstacles.
public float avoidanceRadius = 0.03f;
// Forward detection radius (m). The distance in front of the agent that is checked for obstacles.
public float forwardDetection = 0.5f;

private Vector3 ObstacleAvoidance () {

	Vector3 steeringForce = Vector3.zero;

	// Cast a sphere, that bounds the avoidance zone of the agent, to detect obstacles
	RaycastHit[] hits = Physics.SphereCastAll(this.transform.position, this.col.bounds.extents.x + this.avoidanceRadius, this.rb.velocity, this.forwardDetection);

	// Compute and sum the forces across all hits
	for(int i = 0; i < hits.Length; i++)    {

		// Ensure that the collider is on a different object
		if (hits[i].collider.gameObject.GetInstanceID () != this.gameObject.GetInstanceID ()) {

			if (hits[i].distance > 0) {

				// Scale the force inversely proportional to the distance to the target
				float scaledForce = ((this.forwardDetection - hits[i].distance) / this.forwardDetection) * this.maxForce;
				float desiredForce = Mathf.Min (scaledForce, this.maxForce);

				// Compute the steering force
				steeringForce += hits[i].normal * desiredForce;
			}
		}
	}

	return steeringForce;
}

Listing 7. Obstacle Avoidance steering behavior.

The spherecast returns an array of RaycastHit objects. A RaycastHit contains information about a collision including the distance to the collision and the normal of the surface that was hit. The normal is a vector that is orthogonal to the surface. Accordingly, it can be used to direct the agent away from the collision point. The magnitude of the force is determined by scaling the maximum force inversely proportional to the distance from the collision. The forces for each collision are summed, and the result produced is the total steering force for a single time step.

Separate behaviors can be combined together to create more complex behaviors (Listing 8). Obstacle Avoidance is only useful when it works in tandem with other behaviors. In this example (Video 3), Obstacle Avoidance and Arrive are combined together. The implementation combines the behaviors simply by summing their forces. More complex schemes are possible that incorporate heuristics to determine priority weighting on forces.

private void FixedUpdate () {

	// Calculate the total steering force by summing the active steering behaviors
	Vector3 force = Arrive () + ObstacleAvoidance();
	...

Listing 8. Arrive and Obstacle Avoidance are combined by summing their forces.

Video 3. The agent combines two behaviors, Arrive and Obstacle Avoidance.
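One simple extension, sketched below, is to weight each behavior before summing so that, for example, avoiding obstacles takes priority over arriving; the weights shown are illustrative assumptions, not values from the example project.

// Assumed per-behavior weights; tune to taste.
public float arriveWeight = 1f;
public float avoidanceWeight = 2f;     // Prioritize not hitting things

private void FixedUpdate () {

	// Weighted sum of the active steering behaviors
	Vector3 force = Arrive () * arriveWeight + ObstacleAvoidance () * avoidanceWeight;
	...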

Action Selection

Action selection is the high level goal setting and decision making of the agent. Our agent implementation already incorporates a simple action selection model by combining the Arrive and Obstacle Avoidance behaviors. The agent attempts to arrive at the target, but it will adjust its trajectory when obstacles are detected. The “Avoidance Radius” and “Forward Detection” parameters of Obstacle Avoidance define when action will be taken.

4. Integrating the R200

Now that the agent is capable of navigating on its own, it is ready to be incorporated into an augmented reality application.

The following example is built on top of the “Scene Perception” example that comes with the Intel RealSense SDK. The application will build a mesh using Scene Perception, and the user will be able to set and move the target on the mesh. The agent will then navigate around the generated mesh to reach the target.

Scene Manager

A scene manager script initializes the scene and handles the user input. Touch up (or mouse click release, if the device does not support touch) is the only input. A raycast from the point of the touch determines if the touch is on the generated mesh. The first touch spawns the target on the mesh; the second touch spawns the agent; and every subsequent touch moves the position of the target. A state machine handles the control logic (Listing 9).

// State machine that controls the scene:
//         Start => SceneInitialized -> TargetInitialized -> AgentInitialized
private enum SceneState {SceneInitialized, TargetInitialized, AgentInitialized};
private SceneState state = SceneState.SceneInitialized;    // Initial scene state.

private void Update () {

	// Trigger when the user "clicks" with either the mouse or a touch up gesture.
	if(Input.GetMouseButtonUp (0)) {
		TouchHandler ();
	}
}

private void TouchHandler () {

	RaycastHit hit;

	// Raycast from the point touched on the screen
	if (Physics.Raycast (Camera.main.ScreenPointToRay (Input.mousePosition), out hit)) {

		// Only register if the touch was on the generated mesh
		if (hit.collider.gameObject.name == "meshPrefab(Clone)") {

			switch (this.state) {
			case SceneState.SceneInitialized:
				SpawnTarget (hit);
				this.state = SceneState.TargetInitialized;
				break;
			case SceneState.TargetInitialized:
				SpawnAgent (hit);
				this.state = SceneState.AgentInitialized;
				break;
			case SceneState.AgentInitialized:
				MoveTarget (hit);
				break;
			default:
				Debug.LogError("Invalid scene state.");
				break;
			}
		}
	}
}

Listing 9. The touch handler and state machine for the example application.

The Scene Perception feature generates lots of small meshes, typically with fewer than 30 vertices each. The positioning of the vertices is susceptible to variance, which results in some meshes being angled differently than the surfaces they reside on. If an object is placed on top of such a mesh (e.g., a target or an agent), the object will be oriented incorrectly. To circumvent this issue, the average normal of the mesh is used instead (Listing 10).

private Vector3 AverageMeshNormal(Mesh mesh) {

	Vector3 sum = Vector3.zero;

	// Sum all the normals in the mesh
	for (int i = 0; i < mesh.normals.Length; i++){
		sum += mesh.normals[i];
	}

	// Return the average
	return sum / mesh.normals.Length;
}

Listing 10. Calculate the average normal of a mesh.
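For example, the averaged normal can be used when placing the target or the agent so that it sits flush with the surface it was spawned on. The sketch below assumes a targetPrefab field and fills in a possible body for the SpawnTarget() method referenced in Listing 9; it is illustrative, not the sample’s actual code.

public GameObject targetPrefab;       // Assumed prefab reference

private void SpawnTarget (RaycastHit hit) {

	// Use the averaged normal rather than the individual triangle normal
	Mesh mesh = hit.collider.GetComponent<MeshFilter> ().mesh;
	Vector3 normal = AverageMeshNormal (mesh);

	// Align the object's up axis with the averaged surface normal and place it at the hit point
	Quaternion rotation = Quaternion.FromToRotation (Vector3.up, normal);
	Instantiate (targetPrefab, hit.point, rotation);
}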

Building the Application

All code developed for this example is available on GitHub.

The following instructions integrate the scene manager and agent implementation into an Intel® RealSense™ application.

  1. Open the “RF_ScenePerception” example in the Intel RealSense SDK folder “RSSDK\framework\Unity”.
  2. Download and import the AutoNavAR Unity package.
  3. Open the “RealSenseExampleScene” in the “Assets/AutoNavAR/Scenes/” folder.
  4. Build and run on any device compatible with an Intel RealSense camera R200.

Video 4. The completed integration with the Intel® RealSense™ camera R200.

5. Going Further with Autonomous Navigation

We developed an example that demonstrates an autonomous agent in an augmented reality application using the R200. There are several ways in which this work could be extended to improve the intelligence and realism of the agent.

The agent used a simplified mechanical model with uniform mass and no directional movement restrictions. A more advanced locomotion model could distribute mass non-uniformly and constrain the forces applied to the body (e.g., a car with differing acceleration and braking forces, or a spaceship with main and side thrusters). More accurate mechanical models result in more realistic movement.
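As a hedged sketch of what such a constraint might look like, the steering force could be clamped differently along the body’s local axes before it is applied; the field names and values below are illustrative assumptions, not part of the example project.

// Illustrative: constrain the steering force differently along the body's local axes,
// e.g., a vehicle that accelerates harder than it brakes and resists sideways forces.
public float maxForwardForce = 2f;     // Assumed values (N)
public float maxBrakingForce = 1f;
public float maxLateralForce = 0.5f;

private Vector3 ConstrainForce (Vector3 force) {

	// Express the force in the agent's local frame
	Vector3 local = transform.InverseTransformDirection (force);

	local.z = Mathf.Clamp (local.z, -maxBrakingForce, maxForwardForce);   // forward/backward
	local.x = Mathf.Clamp (local.x, -maxLateralForce, maxLateralForce);   // sideways
	local.y = 0f;                                                         // no vertical thrust

	return transform.TransformDirection (local);
}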

Craig Reynolds was the first to extensively discuss steering behaviors in the context of animation and games. The Seek, Arrive, and Obstacle Avoidance behaviors that were demonstrated in the example find their origins in his work. Reynolds described other behaviors including Flee, Pursuit, Wander, Explore, Obstacle Avoidance, and Path Following. Group behaviors are also discussed including Separation, Cohesion, and Alignment. “Programming Game AI by Example” by Mat Buckland is another useful resource that discusses the implementation of these behaviors as well as a number of other related concepts including state machines and pathfinding.

In the example, both the Arrive and Obstacle Avoidance steering behaviors are applied to the agent simultaneously. Any number of behaviors can be combined in this way to create more complex behaviors. For instance, a flocking behavior is built from the combination of Separation, Cohesion, and Alignment. Combining behaviors can sometimes produce unintuitive results. It is worth experimenting with types of behaviors and their parameters to discover new possibilities.

Additionally, some pathfinding techniques are intended for use in dynamic environments. The D* algorithm is similar to A*, but it can update the path based on new observations (e.g., added/removed obstacles). D* Lite operates in the same fashion as D* and is simpler to implement. Pathfinding can also be used in conjunction with steering behaviors by setting the waypoints and allowing steering to navigate to those points.

While action selection has not been discussed in depth in this work, it is widely studied in game theory. Game theory investigates the mathematics behind strategy and decision making. It has applications in many fields, including economics, political science, and psychology. With respect to autonomous agents, game theory can inform how and when decisions are made. “Game Theory 101: The Complete Textbook” by William Spaniel is a great starting point and has a companion YouTube series.

6. Conclusion

An arsenal of tools exists that you can use to customize the movement, behavior, and actions of agents. Autonomous navigation is particularly well suited for dynamic environments, such as those generated by Intel RealSense cameras in augmented reality applications. Even simple locomotion models and steering behaviors can produce complex behavior without prior knowledge of the environment. The multitude of available models and algorithms provides the flexibility to implement an autonomous solution for nearly any application.

About the Authors

Ryan Measel is a Co-Founder and CTO of Fantasmo Studios. Ashwin Sinha is a founding team member and developer. Founded in 2014, Fantasmo Studios is a technology-enabled entertainment company focused on content and services for mixed reality applications.

Intel® Memory Protection Extensions Enabling Guide


Abstract: This document describes Intel® Memory Protection Extensions (Intel® MPX), its motivation, and its programming model. It also describes the enabling requirements and the current status of enabling in the supported operating systems (Linux* and Windows*) and compilers (Intel® C++ Compiler, GCC, and Visual C++*). Finally, the paper describes how ISVs can incrementally enable Intel MPX bounds checking in their applications.

Introduction

C/C++ pointer arithmetic is a convenient language construct often used to step through an array of data structures. If an iterative write operation does not take into consideration the bounds of the destination, then adjacent memory locations may get corrupted. Such modification of adjacent data not intended by the developer is referred to as a buffer overflow. Similarly, uncontrolled reads could reveal cryptographic keys and passwords. Buffer overflows have been known to be exploited, causing denial-of-service (DoS) attacks and system crashes. More sinister attacks, which do not immediately draw the attention of the user or system administrator, alter the code execution path, such as modifying the return address in the stack frame, in order to execute malicious code or script.

Intel’s Execute Disable Bit and similar hardware features from other vendors have blocked all buffer overflow attacks that redirected the execution to malicious code stored as data. Various other techniques adopted by compiler vendors to mitigate buffer overflow problems can be found in the references.

Intel® MPX technology consists of new Intel® architecture instructions and registers that C/C++ compilers can use to check the bounds of a pointer before it is used. This new hardware technology will be enabled in future Intel® processors. The supported compilers are the Intel® C/C++ compiler, GCC (the GNU C/C++ compiler), and Microsoft* Visual C++*.

Sample Application for Direct3D 12 Flip Model Swap Chains


Download Code Sample

Using swap chains in D3D12 has additional complexity compared to D3D11. Only flip model [1] swap chains may be used with D3D12. There are many parameters that must be selected, such as the number of buffers, the number of in-flight frames, the present SyncInterval, and whether or not a waitable object is used. We developed this application internally to help understand the interaction between the different parameters, and to aid in the discovery of the most useful parameter combinations.

The application consists of an interactive visualization of the rendered frames as they progress from CPU to GPU to display and through the present queue. All of the parameters can be modified in real time. The effects on framerate and latency can be observed via the on-screen statistics display.

Sample App Direct3D

Figure 1:An annotated screenshot of the sample application

Swap Chain Parameters

These are the parameters used to investigate D3D12 swap chains.

Fullscreen: True if the window covers the screen (i.e., borderless windowed mode). NOTE: Different than SetFullscreenState, which is for exclusive mode.

Vsync: Controls the SyncInterval parameter of the Present() function.

Use Waitable Object: Whether or not the swap chain is created with DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT.

Maximum Frame Latency: The value passed to SetMaximumFrameLatency. Ignored if “Use Waitable Object” is not enabled. Without the waitable object, the effective Maximum Frame Latency is 3.

BufferCount: The value specified in DXGI_SWAP_CHAIN_DESC1::BufferCount.

FrameCount: The maximum number of “game frames” that will be generated on the CPU before waiting for the earliest one to complete. A game frame is a user data structure and its completion on the GPU is tracked with D3D12 fences. Multiple game frames can point to the same swap chain buffer.

Additional Parameters

These parameters were included in the swap chain investigation. However, these parameters had fixed values. As their value was fixed, we additionally list why each value was fixed and not variable.

Exclusive mode: SetFullscreenState is never called in the sample because the present statistics mechanism does not work in exclusive mode.

SwapEffect: The value specified in DXGI_SWAP_CHAIN_DESC1::SwapEffect. Always set to DXGI_SWAP_EFFECT_FLIP_DISCARD. DISCARD is the least specified behavior, which affords the OS the most flexibility to optimize presentation. The only other choice, DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL, is only useful for operations that involve reusing image regions from previous presents (e.g., scroll rectangles).

Understanding BufferCount and FrameCount

BufferCount is the number of buffers in the swap chain. With flip model swap chains, the operating system may lock one buffer for an entire vsync interval while it is displayed, so the number of buffers actually available for the application to write to is BufferCount - 1. If BufferCount = 2, then there is only one buffer to write to until the OS releases the second one at the next vsync. A consequence of this is that the frame rate cannot exceed the refresh rate.

When BufferCount >= 3, there are at least 2 buffers available to the application which it can cycle between (assuming SyncInterval=0), which allows the frame rate to be unlimited.

FrameCount is the maximum number of in-flight “render frames,” where a render frame is the set of resources and buffers the GPU needs to perform the rendering. If FrameCount = 1, then the CPU will not build the next render frame until the previous one is completely processed. This means that FrameCount must be at least 2 for the CPU and GPU to be able to work in parallel.

Maximum Frame Latency, and how “Waitable Object” Reduces Latency

Latency is the time between when a frame is generated, and when it appears on screen. Therefore, to minimize latency in a display system with fixed intervals (vsyncs), frame generation must be delayed as long as possible.

The maximum number of queued present operations is called the Maximum Frame Latency. When an application tries to queue an additional present after reaching this limit, Present() will block until one of the previous frames has been displayed.

Any time that the render thread spends blocked on the Present function occurs between frame generation and frame display, so it directly increases the latency of the frame being presented. This is the latency which is eliminated by the use of the “waitable object.”

Conceptually, the waitable object can be thought of as a semaphore that is initialized to the Maximum Frame Latency and signaled whenever a present is removed from the Present Queue. If an application waits for the semaphore to be signaled before rendering, then the present queue is not full (so Present will not block), and the latency is eliminated.
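The following C# sketch models that analogy with a SemaphoreSlim purely for illustration; it is a conceptual model of the behavior, not the actual DXGI interface, which exposes the waitable object as a handle that the render thread waits on before building a frame.

using System.Threading;

// Conceptual model only: the waitable object behaves like a semaphore whose count
// starts at the Maximum Frame Latency and is replenished as presents leave the queue.
class FrameLatencyModel {
	private readonly SemaphoreSlim slots;

	public FrameLatencyModel(int maximumFrameLatency) {
		slots = new SemaphoreSlim(maximumFrameLatency, maximumFrameLatency);
	}

	// Render thread: block only while the present queue is full, then build the frame
	// as late as possible and present it.
	public void RenderFrame() {
		slots.Wait();
		// ... generate frame, then Present() ...
	}

	// Display side: a queued present has been consumed (e.g., shown at a vsync).
	public void OnPresentRetired() {
		slots.Release();
	}
}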

Top parameter presets

Our investigation produced three different “best” parameter combinations, depending on your requirements. These are the combinations we think are best suited to games.


Direct3D Game Mode

Game Mode

  • Vsync On
  • 3 Buffers, 2 Frames
  • Waitable Object, Maximum Frame Latency 2

Game mode is a balanced tradeoff between latency and throughput.


Direct3D Classic Mode

Classic Game mode

  • Vsync On
  • 3 Buffers, 3 Frames
  • Not using waitable object

This implicitly happens under D3D11 with triple buffering, hence “classic.” Classic game mode prioritizes throughput. The extra frame queueing can absorb spikes better but at the expense of latency.


Direct3D Minimum Latency

Minimum Latency

  • Vsync On
  • 2 Buffers, 1 Frame
  • Waitable Object, Maximum Frame Latency 1

The absolute minimum amount of latency without using VR-style vsync racing tricks. If the application misses vsync, the frame rate will immediately drop to ½ refresh. CPU and GPU operate serially rather than in parallel.


App version

The source code includes a project file for building the sample as a Windows 10 Universal App. The only difference in the Direct3D code is calling CreateSwapChainForCoreWindow instead of CreateSwapChainForHWND.

If you wish to try the app version without compiling it yourself, here is a link to the Windows Store page: https://www.microsoft.com/store/apps/9NBLGGH6F7TT

References

1 - “DXGI Flip Model.” https://msdn.microsoft.com/en-us/library/windows/desktop/hh706346%28v=vs.85%29.aspx

2 - “Reduce Latency with DXGI 1.3 Swap Chains.” https://msdn.microsoft.com/en-us/library/windows/apps/dn448914.aspx

3 - DirectX 12: Presentation modes in Windows 10. https://www.youtube.com/watch?v=E3wTajGZOsA

4 - DirectX 12:  Unthrottled Framerate. https://www.youtube.com/watch?v=wn02zCXa9IU


Using the Intel® RealSense™ Camera with TouchDesigner*: Part 2


Download Demo Files ZIP 14KB

The Intel® RealSense™ camera is a vital tool for creating VR and AR projects. Part 2 of this article lays out how to use the Intel RealSense camera nodes in TouchDesigner to set up renders or real-time projections for multiple screens, a single screen, 180-degree (FullDome) renders, and 360-degree VR renders. In addition, the Intel RealSense camera information can be sent out to an Oculus Rift* through the TouchDesigner Oculus Rift TOP node.

Part 2 will focus on the RealSense CHOP node in TouchDesigner.

The RealSense CHOP node in TouchDesigner is where the powerful tracking features of the RealSense F200 and R200 cameras, such as eye tracking, finger tracking, and face tracking, can be accessed. These tracking features are especially exciting for setting up real-time animations and/or tracking these animations to the body and gesture movements of performers. I find this particularly useful for live performances of dancers or musicians where I want a high level of interactivity between live video, animations, graphics, and sound as well as the performers.

To get the TouchDesigner (.toe) files that go with this article, click the button at the top of the page. A free non-commercial copy of TouchDesigner is available too and is fully functional except that the highest resolution is limited to 1280 x 1280.  

Once again it is worth noting that the support of the Intel RealSense camera in TouchDesigner makes it an even more versatile and powerful tool.

Note: Like Part 1 of this article, Part 2 is aimed at those familiar with using TouchDesigner and its interface. If you are unfamiliar with TouchDesigner and plan to follow along with this article step-by-step, I recommend that you first review some of the documentation and videos available here: Learning TouchDesigner.

Note: When using the Intel RealSense camera, it is important to pay attention to its range for best results. On this Intel web page you will find the range of the camera and best operating practices for using it.

A Bit of Historical Background

All of the data the Intel RealSense cameras can provide is extremely useful for creating VR and AR. Some early attempts at what the Intel RealSense camera can now do took place in the 1980s. Hand-position tracking technology arrived in the 1980s in the form of the data glove developed by Jaron Lanier and Thomas G. Zimmerman, and in 1987 Nintendo came out with the first wired glove available to consumers for gaming on the Nintendo Entertainment System.

Historically, the Intel RealSense camera also has roots in performance animation, which uses motion capture technologies to translate a live motion event into usable math, turning a live performance into digital information. Motion capture was used as early as the 1970s in research projects at various universities and in the military for training. One of the first animations to use motion capture data for an animated performance was “Sexy Robot” (https://www.youtube.com/watch?v=eedXpclrKCc) in 1985 by Robert Abel and Associates. Sexy Robot used several techniques to get the information needed to create the digital robot model and then to animate it. First, a practical model of the robot was made. It was measured in all dimensions and the information describing it was input as numbers, something the RealSense camera can now obtain by scanning the object. Then, for the motion in Sexy Robot, dots were painted on a real person and used to make skeleton drawings on the computer, creating a vector animation that was then used to animate the digital model. The RealSense camera is a big improvement on this: its infrared camera and infrared laser projector provide the data from which digital models can be made as well as the data for tracking motion. The tracking capabilities of the Intel RealSense camera are very refined, making even eye tracking possible.

About the Intel RealSense Cameras

There are currently two types of Intel RealSense cameras that perform many of the same functions with slight variations: The Intel RealSense camera F200, for which the exercises in this article are designed, and the Intel RealSense camera R200.

The Intel RealSense camera R200, with its tiny size, has many advantages: it is designed to mount on a tripod or be placed on the back of a tablet. The camera is therefore focused not on the user but on the world, and with its increased scanning capabilities it can scan over a larger area. It also has advanced depth-measuring capabilities. The camera will be exciting for augmented reality (AR) because it has a feature called Scene Perception, which enables you to add virtual objects into a captured world scene. Virtual information can also be laid over a live image feed. Unlike the F200 model, the R200 does not have finger and hand tracking and doesn’t support face tracking. TouchDesigner supports both the F200 and the R200 Intel RealSense cameras.

About the Intel RealSense Cameras In TouchDesigner

TouchDesigner is a perfect match with the Intel RealSense camera, which allows a direct interface between the gestures of the user’s face and hands and the software interface. TouchDesigner can directly use this position/tracking data. TouchDesigner can also use the depth, color, and infrared data that the Intel RealSense camera supplies. The Intel RealSense cameras are very small and light, especially the R200 model, which can easily be placed near performers and not be noticed by audience members.

Adam Berg, a research engineer for Leviathan who is working on a project using the Intel RealSense camera in conjunction with TouchDesigner to create interactive installations says: “The small size and uncomplicated design of the camera is well-suited to interactive installations. The lack of an external power supply simplifies the infrastructure requirements, and the small camera is discreet. We've been pleased with the fairly low latency of the depth image as well. TouchDesigner is a great platform to work with, from first prototype to final development. Its built-in support for live cameras, high-performance media playback, and easy shader development made it especially well-suited for this project. And of course the support is fantastic.”

Using the Intel® RealSense™ Camera in TouchDesigner

In Part 2 we focus on the CHOP node in TouchDesigner for the Intel RealSense camera.

RealSense CHOP Node

The RealSense CHOP node controls the 3D tracking/position data. The CHOP node carries two types of information: (1) The real-world position, expressed in meters but potentially accurate down to the millimeter, is used for the x, y, and z translations. The x, y, and z rotations in the RealSense CHOP are output as x, y, and z Euler angles in degrees. (2) The RealSense CHOP also takes pixels from image inputs and converts them to normalized UV coordinates. This is useful for image tracking.

The RealSense CHOP node has two setup settings: finger/face tracking and marker tracking.

  • Finger/Face Tracking gives you a list of selections to track. You can narrow down the list of what is trackable to one aspect, then by connecting a Select CHOP node to the RealSense CHOP node you can narrow down the selection even further so that you may only be tracking the movement of an eyebrow or an eye.
  • Marker tracking enables you to load an image and track that item wherever it is.

Using the RealSense CHOP node in TouchDesigner

Demo #1 Using Tracking

This is a simple first demo of the RealSense CHOP node to show you how it can be wired/connected to other nodes and used to track and create movement. Once again, please note these demos require a very basic knowledge of TouchDesigner.  If you are unfamiliar with TouchDesigner and plan to follow along with this article step-by-step, I recommend that you first review some of the documentation and videos available here:  Learning TouchDesigner

  1. Create the nodes you will need and arrange them in a horizontal row in this order: Geo COMP node, the RealSense CHOP node, the Select CHOP node, the Math CHOP node, the Lag CHOP node, the Out CHOP node, and the Trail CHOP node.
  2. Wire the RealSense CHOP node to the Select CHOP node, the Select CHOP node to the Math CHOP node, the Math CHOP node to the Lag CHOP node, the Lag CHOP node to the Out CHOP Node, and the Out CHOP node to the Trail CHOP node.
  3. Open the Setup parameters page of the RealSense CHOP node, and make sure the Hands World Position parameter is On. This outputs positions of the tracked hand joints in world space. Values are given in meters relative to the camera.
  4. In the Select parameters page of the Select CHOP Node, set the Channel Names parameter to hand_r/wrist:tx by selecting it from the tracking selections available using the drop-down arrow on the right of the parameter.
  5. In the Rename From parameter, enter: hand_r/wrist:tx, and then in the Rename To parameter, enter: x.
    Figure 1. The Select CHOP node is where the channels are chosen from the RealSense CHOP node.
  6. In the Range/To Range parameter of the Math CHOP node, enter: 0, 100. For a smaller range of movement range, enter a number less than 100.
  7. Select the Geometry COMP and make sure it is on its Xform parameters page. Press the + button on the bottom right of the Out CHOP node to activate its viewer. Drag the X channel onto the Translate X parameter of the Geometry COMP node and select Export CHOP from the drop-down menu that will appear.
    Figure 2. This is where you are adding animation as gotten from the RealSense CHOP.

    To render geometry, you need a Camera COMP node, a Material (MAT) node (I used the Wireframe MAT), a Light COMP node, and a Render TOP node. Add these to render this project.

  8. In the Camera COMP, on the Xform parameter page set the Translate Z to 10. This gives you a better view of the movement in the geometry you have created as the camera is further back on the z-axis.
  9. Wave your right wrist back and forth in front of the camera and watch the geometry move in the Render TOP node.
Figure 3. How the nodes are wired together. The Trail CHOP at the end gives you a way of seeing the animation in graph form.

Figure 4. The setting for the x translate of the Geometry COMP was exported from the x channel in the Out CHOP which has been carried forward down the chain from the Select CHOP Node.

Demo #2: RealSense CHOP Marker Tracking

In this demo, we use the marker tracking feature in the RealSense CHOP to show how to use an image for tracking. You will create an image and have two copies of it: a printed copy and a digital copy. They should match exactly. You can either start from a digital file and print a hard copy, or scan a printed image to create the digital version.

  1. Add a RealSense CHOP node to your scene.
  2. On the Setup parameters page for the RealSense CHOP node, for Mode select Marker Tracking.
  3. Create a Movie File in TOP.
  4. In the Play parameters page of the TOP node, under File, choose and load in the digital image that you also have a printed version of.
  5. Drag the Movie File in TOP to the RealSense CHOP node Setup parameters page and into the Marker Image TOP slot at the bottom of the page.
  6. Create a Geometry COMP, a Camera COMP, a Light COMP and a Render TOP.
  7. Like we did in step 7 of Demo #1, export the tx channel from the RealSense CHOP and drag it to the Translate X parameter of the Geometry COMP.
  8. Create a Reorder TOP and connect it to the Render TOP. In the Reorder parameters page in the Output Alpha change the drop-down to One.
  9. Position your printed image of the digital file in front of the Intel RealSense Camera and move it. The camera should track the movement and reflect it in the Render TOP. The numbers in the RealSense CHOP will also change.
    Figure 5. This is the complete layout for the simple marker tracking demo.
     
    Figure 6. On the parameters page of the Geo COMP the tx channel from the RealSense CHOP has been dragged into the Translate x parameter.

Eye Tracking in TouchDesigner Using the RealSense CHOP Node

In the TouchDesigner Program Palette, under RealSense there is a template called eyeTracking that can be used to track a person’s eye movements. This template uses the RealSense CHOP node finger/face tracking and the RealSense TOP node set to Color. In the template, green WireFrame rectangles track to the person’s eyes and are then composited over the RealSense TOP color image of the person. Any other geometry or particles etc. could be used instead of the green open rectangles. It is a great template to use. Here is an image using the template.

Figure 7. Note that the eyes were tracked even through the glasses.

Demo #3, Part 1: Simple ways to set up a FullDome render or a VR render

In this demo we take a file and show how to render it as a 180-degree FullDome render and as a 360-degree VR render. I have already made the file, which you can download to see how it is done in detail: chopRealSense_FullDome_VR_render.toe

A brief description of how this file was created:

In this file I wanted to place geometries (sphere, torus, tubes, and rectangles) in the scene. So I made a number of SOP nodes of these different geometrical shapes. Each SOP node was attached to a Transform SOP node to move (translate) the geometries to different places in the scene. All the SOP nodes were wired to one Merge SOP node. The Merge SOP node was fed into the Geometry COMP.

Figure 8. This is the first step in the layout for creating the geometries placed around the scene in the downloadable file.

Next I created a Grid SOP node and a SOP To DAT node. The SOP To DAT node was used to instance the Geometry COMP so that I had more geometries in the scene. I also created a Constant MAT node, made the color green, and turned on the WireFrame parameter on the Common page.

Figure 9. The SOP To DAT Node was created using the Grid SOP Node.

Next I created a RealSense CHOP node and wired it to the Select CHOP node where I selected the hand_r/wrist:tx channel to track and renamed it to x. I wired the Select CHOP to the Math CHOP so I could change the range and wired the Math CHOP to the Null CHOP. It is always good practice to end a chain with a Null or Out node so you can more easily insert new filters inside the chain. Next I exported the x Channel from the Null CHOP into the Scale X parameter of the Geometry COMP. This controls all of the x scaling of the geometry in my scene when I moved my right wrist in front of the Intel RealSense Camera.

Figure 10. The tracking from the RealSense CHOP node is used to create real-time animation, scaling of the geometries along the x-axis.

To create a FullDome 180-degree render from the file:

  1. Create a Render TOP, a Camera COMP, and a Light COMP.
  2. In the Render TOPs Render parameters page, select Cube Map in the Render Mode drop-down menu.
  3. In the Render TOP Common parameters page, set the Resolution to a 1:1 aspect ratio such as 4096 by 4096 for a 4k render.
  4. Create a Projection TOP node and connect the Render TOP node to it.
  5. In the Projection TOP Projection parameters page, select Fish-Eye from the Output drop-down menu.
  6. (This is optional to give your file a black background.) Create a Reorder TOP and in the Reorder parameters page in the right drop-down menu for Output Alpha, select One.
  7. You are now ready to either perform the animation live or export a movie file. Refer to Part 1 of this article for instructions. You are creating a circular fish-eye dome master animation. It will be a circle within a square.

For an alternative method, go back to Step 2 and instead of selecting Cube Map in the Render Mode drop-down menu, select Fish-Eye(180). Continue with Step 3 and optionally Step 6, and you are now ready to perform live or export a dome Master animation.

To create a 360-degree VR render from this file:

  1. Create a Render TOP, a Camera COMP, and a Light COMP.
  2. In the Render TOP’s Render parameters page, select Cube Map in the Render Mode drop-down menu.
  3. In the Render TOP Common parameters page, set the Resolution to a 1:1 aspect ratio such as 4096 by 4096 for a 4k render. 
  4. Create a Projection TOP node, and connect the Render TOP node to it.
  5. In the Projection TOP Projection Parameters page, select Equirectangular from the Output drop-down menu. It will automatically make the aspect ratio 2:1.
  6. (This is optional to give your file a black background.) Create a Reorder TOP, and in the Reorder parameters page in the right drop-down menu for Output Alpha, select One.
  7. You are now ready to either perform the animation live or export out a movie file. Refer to Part 1 of this article for instructions. If you export a movie render, you are creating a 2:1 aspect ratio equirectangular animation for viewing in VR headsets.
Figure 11. Long, orange Tube SOPs have been added to the file. You can add your own geometries to the file.

To Output to an Oculus Rift* from TouchDesigner While Using the Intel RealSense Camera

TouchDesigner provides several templates for download that show you how to set up the Oculus Rift in TouchDesigner; one of them, OculusRiftSimple.toe, can be downloaded using the button on the top right of this article. You do need to have your computer connected to an Oculus Rift to see the result in the headset. Without an Oculus Rift you can still create the file, see the images in the LeftEye Render TOP and the RightEye Render TOP, and display them in the background of your scene. I added the Oculus Rift capabilities to the file I used in Demo 3. In this way the Intel RealSense camera animates what I am seeing in the Oculus Rift.

Figure 12. Here displayed in the background of the file are the left eye and the right eye. Most of the animation in the scene is being controlled by the tracking from the Intel® RealSense™ camera CHOP node. The file that created this image can be downloaded from the button on the top right of this article, chopRealSense_FullDome_VRRender_FinalArticle2_OculusRiftSetUp.toe

About the Author

Audri Phillips is a visualist/3D animator based in Los Angeles, with a wide range of experience that includes over 25 years working in the visual effects/entertainment industry in studios such as Sony*, Rhythm and Hues*, Digital Domain*, Disney*, and Dreamworks* feature animation. Starting out as a painter, she was quickly drawn to time-based art. Always interested in using new tools, she has been a pioneer in using computer animation and art in experimental film work, including immersive performances. Now she has taken her talents into the creation of VR. Samsung* recently curated her work into their new Gear Indie Milk VR channel.

Her latest immersive work/animations include: Multi Media Animations for "Implosion a Dance Festival" 2015 at the Los Angeles Theater Center, 3 Full dome Concerts in the Vortex Immersion dome, one with the well-known composer/musician Steve Roach. She has a fourth upcoming fulldome concert, "Relentless Universe", on November 7th, 2015. She also created animated content for the dome show for the TV series, "Constantine*" shown at the 2014 Comic-Con convention. Several of her Fulldome pieces, "Migrations" and "Relentless Beauty", have been juried into "Currents", The Santa Fe International New Media Festival, and Jena FullDome Festival in Germany. She exhibits in the Young Projects gallery in Los Angeles.

She writes online content and a blog for Intel®. Audri is an adjunct professor at Woodbury University, a founding member and leader of the Los Angeles Abstract Film Group, founder of the Hybrid Reality Studio (dedicated to creating VR content), a board member of the Iota Center, and an exhibiting member of the LA Art Lab. In 2011 Audri became a resident artist of Vortex Immersion Media and the c3: CreateLAB. Works of hers can be found on Vimeo, on creativeVJ, and on Vrideo.

Intel® RealSense™ Technology Casts Light on the Gesture-Controlled Shadow Play of Ombre Fabula*


By John Tyrrell

Fusing the traditions of Western and Southeast Asian shadow-theater, Ombre Fabula is a prototype app that uses the Intel® RealSense™ SDK to create a gesture-controlled interactive shadow play. During the collaborative development process, the Germany-based team made up of Thi Binh Minh Nguyen and members of Prefrontal Cortex overcame a number of challenges. These included ensuring that the Intel® RealSense™ camera accurately sensed different hand shapes by using custom blob-detection algorithms and by extensive testing with a broad user base.

Ombre Fabula
The title screen of Ombre Fabula showing the opening scene in grandma’s house (blurred in the background) before the journey to restore her eyesight begins.

Originally the bachelor’s degree project of designer Thi Binh Minh Nguyen—and brought to life with the help of visual design and development team Prefrontal Cortex—Ombre Fabula was intended to be experienced as a projected interactive installation running on a PC or laptop equipped with a user-facing camera. Minh explained that the desire was to “bring this art form to a new interactive level, blurring the boundaries between audience and player”.

Players are drawn into an enchanting, two-dimensional world populated by intricate “cut-out” shadow puppets. The user makes different animals appear on screen by forming familiar shapes—such as a rabbit—with their hands. The user then moves the hand shadow to guide the child protagonist as he collects fragments of colored light on his quest to restore his grandmother’s eyesight.

Ombre Fabula was entered in the 2014 Intel RealSense App Challenge, where it was awarded second place in the Ambassador track.

Decisions and Challenges

With previous experience working with more specialized human interface devices, the team was attracted to Intel RealSense technology by the breadth of possibilities it offers in terms of gesture, face-tracking and voice recognition, although they ultimately used only gesture for the Ombre Fabula user interface (UI).

Creating the UI

Rather than being adapted from another UI, the app was designed from the ground up with hand- and gesture-tracking in mind. Ombre Fabula was also designed primarily as a wall-projected interactive installation, in keeping with the traditions of shadow-theater and in order to deliver room-scale immersion.

Ombre Fabula Virtual Phoenix
Ombre Fabula was designed to be used as a room-scale interactive installation to maximize the player’s immersion in the experience. Here the user is making the bird form.

For the designers, the connection between the real world and the virtual one that users experience when their hands cast simulated shadows on the wall, is crucial to the immersive experience. This experience is further enhanced by the use of a candle, evoking traditional shadow plays.

The UI of Ombre Fabula is deliberately minimal—there are no buttons, cursors, on-screen displays, or any controls whatsoever beyond the three hand-forms that the app recognizes. The app was designed to be used in a controlled installation environment, where there is always someone on hand to guide the user. The designers often produce this type of large-scale interactive installation, and with Ombre Fabula they specifically wanted to create a short, immersive experience that users might encounter in a gallery or similar space. The intentionally short interaction time made direct instructions from a host in situ more practical than an in-game tutorial in order to quickly bring users into the game’s world. The shadow-only UI augments the immersion and brings the experience closer to the shadow theater that inspired it.

The UI of Ombre Fabula
Here, the user moves the hand to the right to lead the protagonist out of his grandma’s house.

Implementing Hand Tracking

In the initial stage of development, the team focused on using the palm- and finger-tracking capabilities of the Intel RealSense SDK to track the shape of the hands as they formed the simple gestures for rabbit, bird, and dragon. They chose these three shapes because they are the most common in the style of shadow theater that inspired the app, and because they are the simplest and most intuitive for users to produce. The rabbit is produced by pinching an O with the thumb and finger of a single hand and raising two additional fingers for the ears; the bird is formed by linking two hands at the thumbs with the outstretched fingers as the wings; and the dragon is made by positioning two hands together in the form of a snapping jaw.

Ombre Fabula Animal Shapes
The basic hand gestures used to produce the rabbit, bird, and dragon.

However, it was discovered that the shapes presented problems for the Intel RealSense SDK algorithms because of the inconsistent visibility and the overlapping of fingers on a single plane. Essentially, the basic gestures recognized by the Intel RealSense camera—such as five fingers, a peace sign, or a thumbs-up, for example—were not enough to detect the more complex animal shapes required.

As a result, the team moved away from the Intel RealSense SDK’s native hand-tracking and instead used the blob detection algorithm, which tracks the contours of the hands. This delivers labeled images—of the left hand, for example—and then the Intel RealSense SDK provides the contours of that image.

Ombre Fabula Recognize Gestures
Here, the hand gestures for bird, dragon and rabbit are shown with the labeled hand contours used to allow Ombre Fabula to recognize the different gestures.

At first, extracting the necessary contour data from the Intel RealSense SDK was a challenge. While the Unity* integration is excellent for the hand-tracking of palms and fingers, this wasn’t what was required for effective contour tracking. However, after spending time with the documentation and working with the Intel RealSense SDK, the team was able to pull the detailed contour data required to form the basis for the custom shape-detection.

Ombre Fabula Virtual Rabbit
The user forms a rabbit-like shape with the hand in order to make the rabbit shadow-puppet appear on-screen.

Ombre Fabula Collect the Dots
The user then moves the hand to advance the rabbit through the game environment and collect the dots of yellow light.

Using the Custom Blob Detection Algorithm

Once the blob data was pulled from the Intel RealSense SDK, it needed to be simplified in order for the blob detection to be effective for each of the three forms—rabbit, bird, and dragon. This process proved more complex than anticipated, requiring a great deal of testing and iteration of the different ways to simplify the shapes in order to maximize the probability of them being consistently and accurately detected by the app.

// Code snippet from official Intel "HandContour.cs" script for blob contour extraction

int numOfBlobs = m_blob.QueryNumberOfBlobs();
PXCMImage[] blobImages = new PXCMImage[numOfBlobs];

for (int j = 0; j < numOfBlobs; j++)
{
	blobImages[j] = m_session.CreateImage(info);

	// Query the data for this blob and keep only blobs large enough to be a hand.
	results = m_blob.QueryBlobData(j, blobImages[j], out blobData[j]);
	if (results == pxcmStatus.PXCM_STATUS_NO_ERROR && blobData[j].pixelCount > 5000)
	{
		results = blobImages[j].AcquireAccess(PXCMImage.Access.ACCESS_WRITE, out new_bdata);
		blobImages[j].ReleaseAccess(new_bdata);
		BlobCenter = blobData[j].centerPoint;

		// Smooth the blob image, then extract its outer and inner contours.
		float contourSmooth = ContourSmoothing;
		m_contour.SetSmoothing(contourSmooth);
		results = m_contour.ProcessImage(blobImages[j]);
		if (results == pxcmStatus.PXCM_STATUS_NO_ERROR && m_contour.QueryNumberOfContours() > 0)
		{
			m_contour.QueryContourData(0, out pointOuter[j]);
			m_contour.QueryContourData(1, out pointInner[j]);
		}
	}
}

The contour extraction code used to pull the contour data from the Intel RealSense SDK.

The simplified data was then run through a freely available software algorithm called $P Point-Cloud Recognizer*, which is commonly used for character recognition of pen strokes and similar tasks. After making minor modifications and ensuring that it worked well in Unity, the developers used the algorithm to detect the shape of the hand in Ombre Fabula. The algorithm decides, with a probability of around 90 percent, which animal the user's hand-form represents, and the detected shape is then displayed on screen.

// every few frames, we test if and which animal is currently found
void DetectAnimalContour () {
	// is there actually a contour in the image right now?
	if (handContour.points.Count > 0) {
		// ok, find the most probable animal gesture class
		string gesture = DetectGesture();
		// are we confident enough that this is one of the predefined animals?
		if (PointCloudRecognizer.Distance < 0.5) {
			// yes, we are: activate the correct animal
			ActivateAnimalByGesture(gesture);
		}
	}
}

// detect gesture on our contour
string DetectGesture() {
	// collect the contour points from the PCSDK
	Point[] contourPoints = handContour.points.Select (x => new Point (x.x, x.y, x.z, 0)).ToArray ();

	// create a new gesture to be detected, we don't know what it is yet
	var gesture = new Gesture(contourPoints, "yet unknown");

	// the classifier returns the gesture class name with the highest probability
	return PointCloudRecognizer.Classify(gesture, trainingSet.ToArray());
}

// This is from the $P algorithm
// match a gesture against a predefined training set
public static string Classify(Gesture candidate, Gesture[] trainingSet) {
	float minDistance = float.MaxValue;
	string gestureClass = "";
	foreach (Gesture template in trainingSet) {
		float dist = GreedyCloudMatch(candidate.Points, template.Points);
		if (dist < minDistance) {
			minDistance = dist;
			gestureClass = template.Name;
			Distance = dist;
		}
	}
	return gestureClass;
}

This code uses the $P algorithm to detect which specific animal form is represented by the user’s gesture.

Getting Player Feedback Early Through Testing and Observation

Early in the development process, the team realized that no two people would form the shapes of the different animals in exactly the same way—not to mention that every individual’s hands are different in size and shape. This meant that a large pool of testers was needed to fine-tune the contour detection.

Conveniently, Minh’s situation at the university gave her access to just such a pool, and approximately 50 fellow students were invited to test the app. For the small number of gestures involved, this number of testers was found to be sufficient to optimize the app’s contour-detection algorithm and to maximize the probability that it would display the correct animal in response to a given hand-form.

Ombre Fabula Pink Lights
Moving the hands left or right moves the camera and causes the protagonist to follow the animal character through the shadow world.

In addition to creating something of a magical moment of connection for the user, the simulated shadow of the user’s hands on screen delivered useful visual feedback. During testing, the developers observed that if the animal displayed was other than the one the user intended, the user would respond to the visual feedback of the shadow on screen to adjust the form of their hands until the right animal appeared. This was entirely intuitive for the users, requiring no prompting from the team.

A common problem with gesture detection is that users might make a rapid gesture—a thumbs up, for example—which can give rise to issues of responsiveness. In Ombre Fabula, however, the gesture is continuous in order to maintain the presence of the animal on the screen. Testing showed that this sustained hand-form made the app feel responsive and immediate to users, with no optimization required in terms of the hand-tracking response time.

Ombre Fabula has been optimized for a short play-time of 6–10 minutes. Combined with a format in which users naturally expect to keep their hands raised for a certain period of time, this meant that the testers didn't mention any hand or arm fatigue.

Under the Design Hood: Tools and Resources

The previous experience that Minh and Prefrontal Cortex have of creating interactive installations helped them make efficient decisions regarding the tools and software required to bring their vision to life.

Intel RealSense SDK

The Intel RealSense SDK was used to map hand contours, for which the contour tracking documentation provided with the SDK proved invaluable. The developers also made use of the Unity samples provided by Intel, trying hand-tracking first. When the hand-tracking alone was found to be insufficient, they moved on to images and implemented the custom blob detection.

Unity software

Both Minh and Prefrontal Cortex consider themselves designers first and foremost, and rather than invest their time in developing frameworks and coding, their interest lies in quickly being able to turn their ideas into working prototypes. To this end, the Unity platform allowed for fast prototyping and iteration. Additionally, they found the Intel RealSense Unity toolkit within the Intel RealSense SDK a great starting point and easy to implement.

$P Point-Cloud Recognizer

The $P Point-Cloud Recognizer is a 2D gesture-recognition software algorithm which detects, to a level of probability, which line-shape is being formed by pen strokes and other similar inputs. It’s commonly used as a tool to support rapid prototyping of gesture-based UIs. The developers lightly modified the algorithm and used it in Unity to detect the shape the user’s hands are making in Ombre Fabula. Based on probability, the algorithm decides which animal the shape represents and the app then displays the relevant visual.

Ombre Fabula Dragon
The dragon is made to appear by forming a mouth-like shape with two hands, as seen in the background of this screenshot.

What’s Next for Ombre Fabula

Ombre Fabula has obvious potential for further story development and adding more hand-shadow animals for users to make and control, although the team currently has no plans to implement this. Ultimately, their ideal scenario would be to travel internationally with Ombre Fabula and present it to the public as an interactive installation—its original and primary purpose.

Intel RealSense SDK: Looking Forward

Felix Herbst from Prefrontal Cortex is adamant that gesture-control experiences need to be built from the ground up for the human interface, and that simply adapting existing apps for gesture will, more often than not, result in an unsatisfactory user experience. He emphasizes the importance of considering the relative strengths of gestural interfaces—and of each individual technology—and developing accordingly.

Those types of appropriate, useful, and meaningful interactions are critical to the long-term adoption of human interface technologies. Herbst’s view is that if enough developers create these interactions using Intel RealSense technology, then this type of human interface has the potential to make a great impact in the future.

About the Developers

Born in Vietnam and raised in Germany, Minh Nguyen is an experienced designer who uses cutting-edge technologies to create innovative, interactive multimedia installations and games. She is currently completing her master's in multimedia and VR design at Burg Giebichenstein University of Art and Design Halle in Germany. In addition to her studies, Minh freelances under the name Tibimi on such projects as Die besseren Wälder from game studio The Good Evil. Based on an award-winning play, the game is designed to encourage children and teens to consider the meaning of being ‘different’.

Prefrontal Cortex is a team of three designers and developers―Felix Herbst, Paul Kirsten and Christian Freitag―who use experience and design to astound and delight users with uncharted possibilities. Their award-winning projects have included the world-creation installation Metaworlds*, the [l]ichtbar interactive light projection at the Farbfest conference, the multi-touch image creation tool Iterazer*, and the award-winning, eye-tracking survival shooter game Fight with Light*. Their large-screen, multiplayer game Weaver* was a finalist in the Intel App Innovation Contest 2013. In addition to all these experimental projects, they create interactive applications for various industry clients using virtual and augmented reality.

Both Minh and the Prefrontal Cortex team intend to continue exploring the possibilities of human interface technologies, including the Intel RealSense solution.

Additional Resources

The video demonstrating Ombre Fabula can be found here.

For more information about the work of the Ombre Fabula creators, visit the Web sites of Minh Nguyen (Tibimi) and Prefrontal Cortex.

The Intel® Developer Zone for Intel® RealSense™ Technology provides a detailed resource for developers who want to learn more about Intel RealSense solutions. Developers can also download the Intel RealSense SDK and Intel RealSense Developer Kit, along with a number of useful Intel RealSense Technology Tutorials.

Introduction to GEN Assembly



Contents

Introduction

When optimizing and debugging OpenCL kernels, it is sometimes very helpful to look at the underlying assembly. This article shows you the tools available in the Intel® SDK for OpenCL™ Applications that let you view the assembly generated by the offline compiler for individual kernels and highlight the regions of the assembly code that correspond to OpenCL C code, and it attempts to explain, at a high level, different portions of the generated assembly. We also give you a brief overview of the register region syntax and semantics, show the different types of registers, and summarize the available assembly instructions and the data types these instructions can operate on. We hope to give you enough ammunition to get started. In upcoming articles we will cover assembly debugging as well as assembly profiling with Intel® VTune™ Amplifier.

Assembly for Simple OpenCL Kernels

Let us start with a simple kernel:

kernel void empty() {
}

This is as simple as kernels get. We are going to build this kernel in a Code Builder Session Explorer. Go ahead and create a new session by going to CODE-BUILDER/OpenCL Kernel Development/New Session, copying the kernel above to an empty program.cl file, and then building it. If you have a 5th generation Intel processor (Broadwell) or a 6th generation Intel processor (Skylake), you will notice that one of the artifacts being generated is a program_empty.gen file. Go ahead and double-click on it. What you will see is something like this:

The assembly for the kernel is on the right: let me annotate it for you:

// Start of Thread
LABEL0
(W)      and      (1|M0)        r2.6<1>:ud    r0.5<0;1,0>:ud    0x1FF:ud         // id:

// End of thread
(W)      mov      (8|M0)        r127.0<1>:ud  r0.0<8;8,1>:ud   {Compacted}                 // id:
         send     (8|M0)        null          r127              0x27      0x2000010 {EOT}  // id:

Not much, but it is a start.

Now, let’s complicate life a little. Copy the following into program.cl:

kernel void meaning_of_life(global uchar* out)
{
 out[31] = 42;
}

After rebuilding the file you will notice a program_meaning_of_life.gen file. After double-clicking on it you will see something more complex:

What you can do now is click on different parts of the kernel on the left and see the corresponding parts of the assembly highlighted:

Here are instructions corresponding to the beginning of the kernel:

The body of the kernel:

And the end of the kernel:

We are going to rearrange the assembly to make it a little bit more understandable:

// Start of Thread
LABEL0
(W)      and      (1|M0)        r2.6<1>:ud    r0.5<0;1,0>:ud    0x1FF:ud         // id:
// r3 and r4 contain the address of out variable (8 unsigned quadwords – uq)
// we are going to place them in r1 and r2
(W)      mov      (8|M0)        r1.0<1>:uq    r3.0<0;1,0>:uq                   // id:


// Move 42 (0x2A:ud – ud is unsigned dword) into 32 slots (our kernel is compiled SIMD32)
// We are going to use registers r7, r10, r13 and r16, each register fitting 8 values
         mov      (8|M0)        r7.0<1>:ud    0x2A:ud          {Compacted}                 // id:
         mov      (8|M8)        r10.0<1>:ud   0x2A:ud          {Compacted}                 // id:
         mov      (8|M16)       r13.0<1>:ud   0x2A:ud                          // id:
         mov      (8|M24)       r16.0<1>:ud   0x2A:ud                          // id:

// Add 31 (0x1F:ud) to eight quadwords in r1 and r2 and place the results in r3 and r4
// Essentially, we get &out[31]
 (W)      add      (8|M0)        r3.0<1>:q     r1.0<0;1,0>:q     0x1F:ud          // id:

// Now we spread &out[31] into r5,r6, r8,r9, r11, r10, and r14, r15 – 32 values in all.
         mov      (8|M0)        r5.0<1>:uq    r3.0<0;1,0>:uq                   // id:
         mov      (8|M8)        r8.0<1>:uq    r3.0<0;1,0>:uq                   // id:1
         mov      (8|M16)       r11.0<1>:uq   r3.0<0;1,0>:uq                   // id:1
         mov      (8|M24)       r14.0<1>:uq   r3.0<0;1,0>:uq                   // id:1

// Write to values in r7 into addresses in r5, r6, etc.
         send     (8|M0)        null          r5                0xC       0x60680FF                 // id:1
         send     (8|M8)        null          r8                0xC       0x60680FF                 // id:1
         send     (8|M16)       null          r11               0xC       0x60680FF                 // id:1
         send     (8|M24)       null          r14               0xC       0x60680FF                 // id:1

// End of thread
(W)      mov      (8|M0)        r127.0<1>:ud  r0.0<8;8,1>:ud   {Compacted}                 // id:
         send     (8|M0)        null          r127              0x27      0x2000010 {EOT}                 // id:1

Now, we are going to complicate life ever so slightly, by using get_global_id(0) instead of a fixed index to write things out:

kernel void meaning_of_life2(global uchar* out)
{
 int i = get_global_id(0);
 out[i] = 42;
}

Note that the addition of get_global_id(0) increases the size of our kernel by 9 assembly instructions. This mainly has to do with the fact that we need to calculate an increasing address for each subsequent work item in the thread (there are 32 work items in a thread):

// Start of Thread
LABEL0
(W)      and      (1|M0)        r7.6<1>:ud    r0.5<0;1,0>:ud    0x1FF:ud         // id:

// Move 42 (0x2A:ud – ud is unsigned dword) into 32 slots (our kernel is compiled SIMD32)
// We are going to use registers r17, r20, r23 and r26, each register fitting 8 values
         mov      (8|M0)        r17.0<1>:ud   0x2A:ud          {Compacted}                 // id:
         mov      (8|M8)        r20.0<1>:ud   0x2A:ud          {Compacted}                 // id:
         mov      (8|M16)       r23.0<1>:ud   0x2A:ud                          // id:
         mov      (8|M24)       r26.0<1>:ud   0x2A:ud                          // id:
// get_global_id(0) calculation, r0.1, r7.0 and r7.3 will contain the necessary starting values
(W)      mul      (1|M0)        r3.0<1>:ud    r0.1<0;1,0>:ud    r7.3<0;1,0>:ud   // id:
(W)      mul      (1|M0)        r5.0<1>:ud    r0.1<0;1,0>:ud    r7.3<0;1,0>:ud   // id:
(W)      add      (1|M0)        r3.0<1>:ud    r3.0<0;1,0>:ud    r7.0<0;1,0>:ud   {Compacted} // id:
(W)      add      (1|M0)        r5.0<1>:ud    r5.0<0;1,0>:ud    r7.0<0;1,0>:ud   {Compacted} // id:1
// r3 thru r6 will contain the get_global_id(0) offsets; r1 and r2 contain 32 increasing values
         add      (16|M0)       r3.0<1>:ud    r3.0<0;1,0>:ud    r1.0<8;8,1>:uw   // id:1
         add      (16|M16)      r5.0<1>:ud    r5.0<0;1,0>:ud    r2.0<8;8,1>:uw   // id:1
// r8 and r9 contain the address of out variable (8 unsigned quadwords – uq)
// we are going to place these addresses in r1 and r2
 (W)      mov      (8|M0)        r1.0<1>:uq    r8.0<0;1,0>:uq                   // id:1

// Move the offsets in r3 thru r6 to r7, r8, r9, r10, r11, r12, r13, r14
         mov      (8|M0)        r7.0<1>:q     r3.0<8;8,1>:d                    // id:1
         mov      (8|M8)        r9.0<1>:q     r4.0<8;8,1>:d                    // id:1
         mov      (8|M16)       r11.0<1>:q    r5.0<8;8,1>:d                    // id:1
         mov      (8|M24)       r13.0<1>:q    r6.0<8;8,1>:d                    // id:1

// Add the offsets to address of out in r1 and place them in r15, r16, r18, r19, r21, r22, r24, r25
         add      (8|M0)        r15.0<1>:q    r1.0<0;1,0>:q     r7.0<4;4,1>:q    // id:1
         add      (8|M8)        r18.0<1>:q    r1.0<0;1,0>:q     r9.0<4;4,1>:q    // id:1
         add      (8|M16)       r21.0<1>:q    r1.0<0;1,0>:q     r11.0<4;4,1>:q   // id:2
         add      (8|M24)       r24.0<1>:q    r1.0<0;1,0>:q     r13.0<4;4,1>:q   // id:2

// write into addresses in r15, r16, values in r17, etc.
         send     (8|M0)        null          r15               0xC       0x60680FF                 // id:2
         send     (8|M8)        null          r18               0xC       0x60680FF                 // id:2
         send     (8|M16)       null          r21               0xC       0x60680FF                 // id:2
         send     (8|M24)       null          r24               0xC       0x60680FF                 // id:2

// End of thread
(W)      mov      (8|M0)        r127.0<1>:ud  r0.0<8;8,1>:ud   {Compacted}                 // id:
         send     (8|M0)        null          r127              0x27      0x2000010 {EOT}                 // id:2

And finally, let’s look at a kernel that does reading, writing, and some math:

kernel void foo(global float* in, global float* out) {
 int i = get_global_id(0);

 float f = in[i];
 float temp = 0.5f * f;
 out[i] = temp;
}

It will be translated to the following (note that I rearranged some assembly instructions for better understanding):

// Start of Thread
LABEL0
(W)      and      (1|M0)        r7.6<1>:ud    r0.5<0;1,0>:ud    0x1FF:ud         // id:

// r3 and r4 will contain the address of out buffer
(W)      mov      (8|M0)        r3.0<1>:uq    r8.1<0;1,0>:uq                     // id:
// int i = get_global_id(0);
(W)      mul      (1|M0)        r5.0<1>:ud    r0.1<0;1,0>:ud    r7.3<0;1,0>:ud   // id:
(W)      mul      (1|M0)        r9.0<1>:ud    r0.1<0;1,0>:ud    r7.3<0;1,0>:ud   // id:
(W)      add      (1|M0)        r5.0<1>:ud    r5.0<0;1,0>:ud    r7.0<0;1,0>:ud   {Compacted} // id:
(W)      add      (1|M0)        r9.0<1>:ud    r9.0<0;1,0>:ud    r7.0<0;1,0>:ud   {Compacted} // id:
         add      (16|M0)       r5.0<1>:ud    r5.0<0;1,0>:ud    r1.0<8;8,1>:uw   // id:
         add      (16|M16)      r9.0<1>:ud    r9.0<0;1,0>:ud    r2.0<8;8,1>:uw   // id:

// r1 and r2 will contain the address of in buffer
(W)      mov      (8|M0)        r1.0<1>:uq    r8.0<0;1,0>:uq                   // id:1
// r11, r12, r13, r14, r15, r16, r17 and r18 will contain 32 qword offsets
         mov      (8|M0)        r11.0<1>:q    r5.0<8;8,1>:d                    // id:1
         mov      (8|M8)        r13.0<1>:q    r6.0<8;8,1>:d                    // id:1
         mov      (8|M16)       r15.0<1>:q    r9.0<8;8,1>:d                    // id:1
         mov      (8|M24)       r17.0<1>:q    r10.0<8;8,1>:d                   // id:1

//  float f = in[i];
         shl      (8|M0)        r31.0<1>:uq   r11.0<4;4,1>:uq   0x2:ud           // id:1
         shl      (8|M8)        r33.0<1>:uq   r13.0<4;4,1>:uq   0x2:ud           // id:1
         shl      (8|M16)       r35.0<1>:uq   r15.0<4;4,1>:uq   0x2:ud           // id:1
         shl      (8|M24)       r37.0<1>:uq   r17.0<4;4,1>:uq   0x2:ud           // id:1
         add      (8|M0)        r19.0<1>:q    r1.0<0;1,0>:q     r31.0<4;4,1>:q   // id:1
         add      (8|M8)        r21.0<1>:q    r1.0<0;1,0>:q     r33.0<4;4,1>:q   // id:2
         add      (8|M16)       r23.0<1>:q    r1.0<0;1,0>:q     r35.0<4;4,1>:q   // id:2
         add      (8|M24)       r25.0<1>:q    r1.0<0;1,0>:q     r37.0<4;4,1>:q   // id:2
// read in f values at addresses in r19, r20, r21, r22, r23, r24, r25, r26 into r27, r28, r29, r30
         send     (8|M0)        r27           r19               0xC       0x4146EFF                 // id:2
         send     (8|M8)        r28           r21               0xC       0x4146EFF                 // id:2
         send     (8|M16)       r29           r23               0xC       0x4146EFF                 // id:2
         send     (8|M24)       r30           r25               0xC       0x4146EFF                 // id:2

// float temp = 0.5f * f; - 0.5f is 0x3F000000:f
//     We multiply 16 values in r27, r28 by 0.5f and place them in r39, r40
//     We multiply 16 values in r29, r30 by 0.5f and place them in r47, r48
         mul      (16|M0)       r39.0<1>:f    r27.0<8;8,1>:f    0x3F000000:f     // id:3
         mul      (16|M16)      r47.0<1>:f    r29.0<8;8,1>:f    0x3F000000:f     // id:3

//     out[i] = temp;
         add      (8|M0)        r41.0<1>:q    r3.0<0;1,0>:q     r31.0<4;4,1>:q   // id:2
         add      (8|M8)        r44.0<1>:q    r3.0<0;1,0>:q     r33.0<4;4,1>:q   // id:2
         add      (8|M16)       r49.0<1>:q    r3.0<0;1,0>:q     r35.0<4;4,1>:q   // id:2
         add      (8|M24)       r52.0<1>:q    r3.0<0;1,0>:q     r37.0<4;4,1>:q   // id:3

         mov      (8|M0)        r43.0<1>:ud   r39.0<8;8,1>:ud  {Compacted}                 // id:3
         mov      (8|M8)        r46.0<1>:ud   r40.0<8;8,1>:ud  {Compacted}                 // id:3
         mov      (8|M16)       r51.0<1>:ud   r47.0<8;8,1>:ud                  // id:3
         mov      (8|M24)       r54.0<1>:ud   r48.0<8;8,1>:ud                  // id:3

// write into addresses r41, r42 the values in r43, etc.
         send     (8|M0)        null          r41               0xC       0x6066EFF                 // id:3
         send     (8|M8)        null          r44               0xC       0x6066EFF                 // id:3
         send     (8|M16)       null          r49               0xC       0x6066EFF                 // id:3
         send     (8|M24)       null          r52               0xC       0x6066EFF                 // id:4

// End of thread
(W)      mov      (8|M0)        r127.0<1>:ud  r0.0<8;8,1>:ud   {Compacted}                 // id:
         send     (8|M0)        null          r127              0x27      0x2000010 {EOT}                 // id:4

How to Read an Assembly Instruction

Typically, all instructions have the following form:

[(pred)] opcode (exec-size|exec-offset) dst src0 [src1] [src2]

(pred) is the optional predicate. We are going to skip it for now.

opcode is the symbol of the instruction, like add or mov (we have a full table of opcodes below).

exec-size is the SIMD width of the instruction, which for our architecture can be 1, 2, 4, 8, or 16. In SIMD32 compilation, typically two instructions of execution size 8 or 16 are grouped into one.

exec-offset is the part that tells the EU which part of the ARF registers to read or write from; e.g., (8|M24) consults bits 24-31 of the execution mask. When emitting SIMD16 or SIMD32 code like the following:

         mov  (8|M0)   r11.0<1>:q   r5.0<8;8,1>:d   // id:1
         mov  (8|M8)   r13.0<1>:q   r6.0<8;8,1>:d   // id:1
         mov  (8|M16)  r15.0<1>:q   r9.0<8;8,1>:d   // id:1
         mov  (8|M24)  r17.0<1>:q   r10.0<8;8,1>:d  // id:1

the compiler has to emit four 8-wide operations due to a limitation of how many bytes can be accessed per operand in the GRF.

dst is the destination register.

src0 is a source register.

src1 is an optional source register. Note that it could also be an immediate value, like 0x3F000000:f (0.5) or 0x2A:ud (42).

src2 is an optional source register.

General Register File (GRF) Registers

Each thread has a dedicated space of 128 registers, r0 through r127. Each register is 256 bits or 32 bytes.

Architecture Register File (ARF) Registers

In the assembly code above, we only saw one of these special registers, the null register, which is typically used as a destination for send instructions used for writing and indicating end of thread. Here is a full table of other architecture registers:

Since our registers are 32 bytes wide and byte addressable, our assembly has a register region syntax for accessing the values stored in these registers.

Below, we have a series of diagrams explaining how register region syntax works.

Here we have a register region r4.1<16;8,2>:w. The w at the end of the region indicates that we are talking about word (two-byte) values. The full table of allowable integer and floating-point data types is below. The origin is at r4.1, which means that we are starting with the second word of register r4. The vertical stride is 16, which means that we need to skip 16 elements to get to the start of the second row. The width parameter is 8 and refers to the number of elements in a row; a horizontal stride of 2 means that we are taking every second element. Note that we refer here to the content of both r4 and r5. The picture below summarizes the result:

In this example, let’s consider the register region r5.0<1;8,2>:w. The region starts at the first element of r5. We have 8 elements in a row, with the row containing every second element, so the first row is {0, 1, 2, 3, 4, 5, 6, 7}. The second row starts at an offset of 1 word, that is at r5.1, and so it contains {8, 9, 10, 11, 12, 13, 14, 15}. The picture below summarizes the result:

Consider the following assembly instruction

add(16) r6.0<1>:w r1.7<16;8,1>:b r2.1<16;8,1>:b

The src0 starts at r1.7 and has 8 consecutive bytes in the first row, followed by the second row of 8 bytes, which starts at r1.23.

The src1 starts at r2.1 and has 8 consecutive bytes in the first row, followed by the second row of 8 bytes, which starts at r2.17.

The dst starts at r6.0, stores the values as words, and since the add(16) instruction operates on 16 values, it stores 16 consecutive words into r6.
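To make the region arithmetic concrete, here is a small, purely illustrative C# helper (not part of any Intel tool) that enumerates the register positions a region covers; it reproduces the r4.1<16;8,2>:w example above and the src0 region of this add instruction.

using System;

class RegionDemo
{
    // Enumerate the register positions covered by a region written as
    // r<baseReg>.<origin><vertStride;width,horzStride>:<type>.
    // This is an illustrative helper only, not part of any Intel tool.
    static void PrintRegion(int baseReg, int origin, int vertStride, int width,
                            int horzStride, int elemBytes, int rows)
    {
        const int RegBytes = 32;                          // each GRF register is 32 bytes wide
        for (int row = 0; row < rows; row++)
        {
            for (int col = 0; col < width; col++)
            {
                int elem = origin + row * vertStride + col * horzStride;
                int byteOffset = elem * elemBytes;        // byte offset from r<baseReg>.0
                Console.Write("r{0}.{1} ",
                              baseReg + byteOffset / RegBytes,
                              (byteOffset % RegBytes) / elemBytes);
            }
            Console.WriteLine();
        }
    }

    static void Main()
    {
        // r4.1<16;8,2>:w - two rows of 8 words, every second word, rows 16 words apart;
        // the second row spills into r5, as described above.
        PrintRegion(4, 1, 16, 8, 2, 2, 2);

        // r1.7<16;8,1>:b from the add(16) example - 8 consecutive bytes starting at r1.7,
        // then 8 consecutive bytes starting at r1.23.
        PrintRegion(1, 7, 16, 8, 1, 1, 2);
    }
}

Running it prints r4.1 r4.3 … r4.15 followed by r5.1 r5.3 … r5.15 for the first region, and r1.7 … r1.14 followed by r1.23 … r1.30 for the second, matching the descriptions above.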

Let’s consider the following assembly instruction:

add(16) r6.0<1>:w r1.14<16;8,0>:b r2.17<16;8,1>:b

Src0 is r1.14<16;8,0>:b, which means that the first byte-sized value is at r1.14. A horizontal stride of 0 means that we repeat that value for the width of the region, which is 8. The region then continues at r1.30, and we repeat the value stored there 8 times as well, so we are talking about the following values: {1,1,1,1,1,1,1,1, 8, 8, 8, 8, 8, 8, 8, 8}.

Src1 is r2.17<16;8,1>:b, so we actually start with 8 bytes starting from r2.17 and end up with the second row of 8 bytes starting from r3.1.

The letter after : in the register region signifies the data type stored there. Here are two tables summarizing the available integer and floating-point types:

The following tables summarize available assembly instructions:

References:

Volume 7 of Intel Graphics documentation is available here:

Full set of Intel Graphics Documentation is available here:

 https://01.org/linuxgraphics/documentation/hardware-specification-prms

About the Author

Robert Ioffe is a Technical Consulting Engineer at Intel’s Software and Solutions Group. He is an expert in OpenCL programming and OpenCL workload optimization on Intel Iris and Intel Iris Pro Graphics with deep knowledge of Intel Graphics Hardware. He was heavily involved in Khronos standards work, focusing on prototyping the latest features and making sure they can run well on Intel architecture. Most recently he has been working on prototyping Nested Parallelism (enqueue_kernel functions) feature of OpenCL 2.0 and wrote a number of samples that demonstrate Nested Parallelism functionality, including GPU-Quicksort for OpenCL 2.0. He also recorded and released two Optimizing Simple OpenCL Kernels videos and is in the process of recording a third video on Nested Parallelism.

You might also be interested in the following:

GPU-Quicksort in OpenCL 2.0: Nested Parallelism and Work-Group Scan Functions

Sierpiński Carpet in OpenCL 2.0

Optimizing Simple OpenCL Kernels: Modulate Kernel Optimization

Optimizing Simple OpenCL Kernels: Sobel Kernel Optimization

Intel® RealSense™ SDK Voice Command Sample Application


Download Code Sample [Zip: 23 KB]

Contents

Introduction

Thinking about exploring speech recognition in your code? Do you want more detailed information on the inner workings of the Intel® RealSense™ SDK and voice commands? In this article, we’ll show you a sample application that uses the speech recognition feature of the Intel RealSense SDK, using C# and Visual Studio* 2015, the Intel RealSense SDK R4 or above, and an Intel® RealSense™ camera F200.

Project Structure

In this sample application, I separated out the Intel RealSense SDK functionality from the GUI layer code to make it easier for a developer to focus on the SDK’s speech functionality. I’ve done this by creating a C# wrapper class (RSSpeechEngine) around the Intel RealSense SDK Speech module. Additionally, this sample app is using the “command” mode from the Intel RealSense speech engine.

The Windows* application uses a standard Windows Form class for the GUI controls and interaction with the RSSpeechEngine class. The form class makes use of delegates as well as multithreaded technology to ensure a responsive application.

I am not trying to make a bullet-proof application. I have added some degree of exception handling, but it’s up to you to ensure that proper engineering practices are in place to ensure a stable, user friendly application.

Requirements

Hardware requirements:

  • 4th generation Intel® Core™ processors based on the Intel microarchitecture code name Haswell
  • 8 GB free hard disk space
  • Intel RealSense camera F200 (required to connect to a USB 3 port)

Software requirements:

  • Microsoft Windows* 8.1/Win10 OS 64-bit
  • Microsoft Visual Studio 2010–2015 with the latest service pack
  • Microsoft .NET* 4.0 (or higher) Framework for C# development
  • Unity* 5.x or higher for Unity game development

WordEventArg.CS

WordEventArg derives from the C# EventArgs class. It's a small wrapper that adds one private data member. The private string _detectedWord holds the word that was detected by the speech engine.

This class is used as an event argument when the RSSpeechEngine class dispatches an event back to the Form class indicating the word that was detected.
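Based on that description, a minimal sketch of the class might look like the following; everything beyond the _detectedWord field and the single-string constructor is an assumption rather than a copy of the sample code.

using System;

// Minimal sketch of WordEventArg as described above.
public class WordEventArg : EventArgs
{
    private string _detectedWord;   // the word the speech engine detected

    public WordEventArg( string detectedWord )
    {
        _detectedWord = detectedWord;
    }

    // Read-only access for the subscriber (the form).
    public string DetectedWord
    {
        get { return _detectedWord; }
    }
}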

RSSpeechEngine.CS

RSSpeechEngine is a wrapper class, an engine so to speak, around the speech module's command mode. I wrote the class with the following goals in mind:

  • Cleanly and clearly isolate as much of the Intel RealSense SDK functionality away from the client application.
  • Isolate each of the steps needed to get the command mode up and running in easy-to-understand function blocks.
  • Try to provide comments in the code to help the reader understand what the code is doing.

Below, I describe functions that comprise the RSSpeechEngine class.

public event EventHandler<WordEventArg>     OnWordDetected;

The OnWordDetected event sends a message back to the client application letting it know that a given word was detected. The client creates an event handler to handle the WordEventArg object.

public RSSpeechEngine( )

RSSpeechEngine is the constructor for the class, and it takes no parameters. The constructor creates the global session object, which is needed in many different areas of the class for initialization.

Next the constructor creates the speech recognition module itself. If that succeeds, it creates the speech implementation object, followed by the grammar module, the audio source, and the speech event handler. If none of those functions fail, _initialized is set to true, and the client application has the green light to use the class.

You might wonder whether the private _initialized variable is worth having, given that each function returns a Boolean value and, if one returns false, I manually throw an error. In this example, the only real benefit of this variable is in the StartSpeechRecognition() function, where _initialized acts as a gate, allowing the recognition to start or not.

public void Dispose( )

This is a cleanup function that the client application calls to ensure memory is properly cleaned up.

public bool Initialized

This property exposes the private _initialized variable for the client application to use if they so choose. This example uses it as a gate in the StartSpeechRecognition() function.

private bool CreateSpeechRecognitionModule( )

This sets up the module itself by calling the session object's CreateImpl function, specifying the type of the module to be created. This function has one "out" parameter, which is a PXCMSpeechRecognition object.
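A minimal sketch of what this function does is shown below; the generic CreateImpl<T> overload is the one used in the SDK samples, so treat the exact signature as an assumption.

private bool CreateSpeechRecognitionModule( )
{
    // Ask the session to create the speech recognition module.
    pxcmStatus status = _session.CreateImpl<PXCMSpeechRecognition>( out _speechRecognition );
    return status >= pxcmStatus.PXCM_STATUS_NO_ERROR && _speechRecognition != null;
}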

private bool CreateSpeechModuleGrammar( )

This function ensures the speech module has a grammar to work with. A grammar for command mode is a set of words that speech recognition will recognize. In this example I’ve used the words “Fire,” “Bomb,” and “Lazer” as a grammar.

It should be noted that it's possible to have more than one grammar built in the speech recognition engine at one time. As an example, you could do the following, which would build three different grammars in the speech recognition module, all waiting to be used.

_speechRecognition.BuildGrammarFromStringList( 1, _commandWordsList1, null );
_speechRecognition.BuildGrammarFromStringList( 2, _commandWordsList2, null );
_speechRecognition.BuildGrammarFromStringList( 3, _commandWordsList3, null );

Then, when you want to use one of them, you would activate it with the corresponding call:

_speechRecognition.SetGrammar( 1 );
_speechRecognition.SetGrammar( 2 );
_speechRecognition.SetGrammar( 3 );

While you can build multiple grammars for speech recognition, you can only have one grammar active at a time.

When might you want to use multiple grammars? Maybe you have a game where each level has a different grammar. You can load one list for one level, and set its grammar. Then on a different level, you can use a different word list that you’ve already loaded. To use it, you simply set that level’s grammar.

As the sample code shows, I created one array that contains three words. I use the BuildGrammarFromStringList function to load that grammar, then use the SetGrammar function to ensure it’s active.

private bool CreateAudioSource( )

The CreateAudioSource function finds and connects to the Intel RealSense camera’s internal microphone. Even though I am specifically targeting this microphone, you can use any microphone attached to your computer. As an example, I’ve even used my Plantronics headset, and it works fine.

The first thing the function does is initialize the _audioSource PXCMAudioSource object by calling the session's CreateAudioSource() function, and then check that it was successfully created by comparing it against null.

If I have a valid audio source, the next step is to create the device information for the audio source, which is covered in the next function description, CreateDeviceInfo(). For now, let's assume that valid audio device information was created. I set the volume of the microphone, which controls the level at which the microphone records the audio signal. Then I set the audio source's device information that was created in the CreateDeviceInfo() function.

private bool CreateDeviceInfo( )

This function queries all the audio devices on a computer. First it instructs the audio source created in the previous function to scan for all devices on the computer by calling ScanDevices(). After the scan, the next step, an important one, is to iterate over all the devices found. This step is important because ALL audio devices connected to your computer will be detected by the Intel RealSense SDK. For example, I have a Roland OctaCapture* connected to my computer at home via USB. When I run this function on my computer, I get eight different audio devices listed for just this one Roland unit.

There are several ways to do this, but I've chosen what seems to be the standard in all the examples I've seen in the SDK: I loop with an index i, querying the system and populating the _audioDeviceInfo object with the i-th audio device detected on the system. If the current audio device's name matches the name of the audio device I want, I set the created variable to true and break out of the loop.

DEVICE_NAME was created and initialized at the top of the source code file as

string DEVICE_NAME = "Microphone Array (2- Creative VF0800)";

How did I know this name? I had to run the for loop, set a break point, and look at all the different devices as I iterated through them. It was obvious on my computer which one was the Intel RealSense camera in comparison to the other devices.

Once we have a match, we can stop looking. The global PXCMAudioSource.DeviceInfo object _audioDeviceInfo will now contain the proper device information to be used back in the calling function CreateAudioSource().

NOTE: I have seen situations where the device name is different. On one computer the Intel RealSense camera’s name will be "Microphone Array (2- Creative VF0800)" and on my other computer with the same Intel RealSense camera but a different physical device, the name will be "Microphone Array (4- Creative VF0800)". I’m not sure why this is but it’s something to keep in mind.
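Putting the pieces of this section together, a sketch of the scan loop might look like the following; the QueryDeviceInfo overload and the DeviceInfo.name field are taken from the SDK samples and should be treated as assumptions.

private bool CreateDeviceInfo( )
{
    // Ask the audio source to enumerate every audio device on the system.
    _audioSource.ScanDevices( );

    bool created = false;
    for( int i = 0; ; i++ )
    {
        // QueryDeviceInfo returns an error status once we run past the last device.
        PXCMAudioSource.DeviceInfo deviceInfo;
        if( _audioSource.QueryDeviceInfo( i, out deviceInfo ) < pxcmStatus.PXCM_STATUS_NO_ERROR )
            break;

        // Keep the device whose name matches the Intel RealSense camera's microphone.
        if( deviceInfo.name == DEVICE_NAME )
        {
            _audioDeviceInfo = deviceInfo;
            created = true;
            break;
        }
    }
    return created;
}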

private bool CreateSpeechHandler( )

The CreateSpeechHandler function tells the speech recognition engine what to do after a word has been detected. I create a new PXCMSpeechRecognition.Handler object, check that it's not null, and if it was created successfully, assign the OnSpeechRecognition function to its onRecognition delegate.

private void OnSpeechRecognition( PXCMSpeechRecognition.RecognitionData data )

OnSpeechRecognition is the event handler for the _handler.onRecognition delegate when a word has been detected. It accepts a single RecognitionData parameter. This parameter contains things like a list of scores, which is used when we need to maintain a certain confidence level in the word that was detected, as can be seen in the if(..) statement. I want to be sure that the confidence level of the currently detected word is at least 50, as defined here:

int CONFIDENCE_LEVEL = 50;

If the confidence level is at least 50, I raise the OnWordDetected event, passing in a new instance of the WordEventArg object. WordEventArg takes a single parameter, which is the word that was detected.

At this point, OnWordDetected sends a message to all subscribers, in this case the form, informing it that the speech module detected one of the words in the list.
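A sketch of the handler wiring and of this recognition callback is shown below; the RecognitionData field names (scores, confidence, sentence) come from the SDK samples and are assumptions here, not guarantees.

private bool CreateSpeechHandler( )
{
    // Tell the module which method to call when a word from the grammar is heard.
    _handler = new PXCMSpeechRecognition.Handler( );
    if( _handler == null )
        return false;

    _handler.onRecognition = OnSpeechRecognition;
    return true;
}

private void OnSpeechRecognition( PXCMSpeechRecognition.RecognitionData data )
{
    // Only report the word if the engine is confident enough about it.
    if( data.scores[0].confidence >= CONFIDENCE_LEVEL && OnWordDetected != null )
    {
        OnWordDetected( this, new WordEventArg( data.scores[0].sentence ) );
    }
}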

public void StartSpeechRecognition( )

StartSpeechRecognition tells the speech recognition functionality to start listening to speech. First it checks to see whether everything was initialized properly. If not, it returns out of the function.

Next, I tell it to start recording, passing in the audio source I want to listen to and the handler object that has the function to call when a word is detected. And for good measure, I check to see whether any error occurred.
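A sketch of the start call, assuming the StartRec(audio source, handler) signature used in the SDK samples:

public void StartSpeechRecognition( )
{
    // Nothing to do if any of the setup steps failed.
    if( !_initialized )
        return;

    // Begin listening on the chosen audio source and route detected words
    // to the handler created in CreateSpeechHandler().
    pxcmStatus status = _speechRecognition.StartRec( _audioSource, _handler );
    if( status < pxcmStatus.PXCM_STATUS_NO_ERROR )
        throw new Exception( "Unable to start speech recognition." );
}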

public void StopSpeechRecognition( )

The function calls the module's .StopRec() function to stop the processing and kill the internal thread. This process can take a few milliseconds. As such, if you immediately try to call the Dispose function on this class, there is a strong chance the Dispose() code will cause an exception. So if you try to set _speechRecognition to null and call its dispose method before StopRec() has completed, the application will crash. This is why I added the Thread.Sleep() function: I want the execution to halt just long enough to give StopRec() time to complete before moving on.
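The corresponding stop sequence is sketched below; the sleep length is an arbitrary illustration rather than a value mandated by the SDK.

public void StopSpeechRecognition( )
{
    // StopRec() shuts down the module's internal worker thread asynchronously.
    _speechRecognition.StopRec( );

    // Give that thread a moment to finish before Dispose() is called,
    // otherwise disposing _speechRecognition can throw.
    System.Threading.Thread.Sleep( 100 );
}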

MainForm.CS

MainForm is the Windows GUI client to RSSpeechEngine. As mentioned previously, I designed this entire application so that the Form class only handles GUI operations and kicks off the RSSpeechEngine engine.

The RSSpeechEngineSample application itself is not a multithreaded application per se. However, because the PXCMSpeechRecognition module has a function that runs its own internal thread and we need data from that thread, we have to use some multithreaded constructs in the main form. This can be seen in the functions that update the label and the list box.

To start with, I create a global RSSpeechEngine _RSSpeechEngine object that gets initialized in the constructor. After that I declare two delegates. These delegates do the following:

  • SetStatusLabelDelegate. Sets the applications status label from stopped to running.
  • AddWordToListDelegate. Adds a detected word to the list on the form.

public MainForm( )

This is the form's constructor. It initializes the _RSSpeechEngine object and subscribes the AddWordToListEventHandler function to the OnWordDetected event.

private void btnStart_Click( object sender, EventArgs e )

This is the start button click event handler, which updates the label to “Running” and calls the RSSpeechEngine's StartSpeechRecognition() function.

private void btnStop_Click( object sender, EventArgs e )

This is the stop button click event handler, which tells the RSSpeechEngine to stop processing and sets the label to “Not Running.”

private void AddWordToListEventHandler( object source, WordEventArg e )

This is the definition of the AddWordToListEventHandler that gets called when _RSSpeechEngine has detected a word. It calls the AddWordToList function, which knows how to deal with the multithreaded functionality.

private void AddWordToList( string s )

AddWordToList takes one parameter, the word that was detected by the RSSpeechEngine engine. Due to the nature of multithreaded applications in Windows Forms, this function looks a little strange.

When dealing with multithreaded applications in Windows where form elements/controls need to be updated, you must check the control's InvokeRequired property. If it is true, a delegate has to be used, and the delegate turns around and calls the exact same function. A new instance of AddWordToListDelegate is created, specifying the name of the function it is to call, which is a call back into the same function.

Once the delegate has been initialized, I tell the form object to invoke the delegate with the original “s” parameter that came in.
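A sketch of that pattern is shown below; the list box name lstWords and the delegate signature are assumptions used for illustration only.

private void AddWordToList( string s )
{
    // If we are on the speech engine's thread, marshal the call back onto
    // the UI thread through the delegate; otherwise update the control directly.
    if( lstWords.InvokeRequired )
    {
        AddWordToListDelegate del = new AddWordToListDelegate( AddWordToList );
        this.Invoke( del, new object[] { s } );
    }
    else
    {
        lstWords.Items.Add( s );   // 'lstWords' is an assumed name for the form's list box
    }
}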

private void SetLabel( string s )

This function works exactly like AddWordToList in the multithreaded area.

private void FormIsClosing( object sender, FormClosingEventArgs e )

This is the event that gets triggered when the application is closing. I check to ensure that _RSSpeechEngine is not null, and if _RSSpeechEngine has been successfully initialized, I call StopSpeechRecognition() to force the processing to stop and then call the engine's Dispose() function so it can clean up after itself.

Conclusion

I hope this article and sample code have helped you gain a better understanding of how to use Intel RealSense SDK speech recognition. These same principles apply if you are using Unity. The intent was to show how to use Intel RealSense SDK speech recognition in an easy-to-understand, simple application, covering everything needed to successfully implement a new solution.

If you think I have left out any explanation or haven’t been clear in a particular area, shoot me an email at rick.blacker@intel.com or make a comment below.

About Author

Rick Blacker is a seasoned software engineer who spent many of his years authoring solutions for database-driven applications. Rick has recently moved to the Intel RealSense technology team and helps users understand the technology.

Putting Your Data and Code in Order: Data and layout - Part 2


This pair of articles on performance and memory covers basic concepts to provide guidance to developers seeking to improve software performance. The articles specifically address memory and data layout considerations. Part 1 addressed register use and tiling or blocking algorithms to improve data reuse. This paper begins by considering data layout for general parallelism, that is, shared memory programming with threads, and then considers distributed computing via MPI as well. It expands the concepts from Part 1 to cover parallelism, both vectorization (single instruction multiple data, SIMD) and shared memory parallelism (threading), as well as distributed memory computing. Lastly, this article considers array of structures (AOS) versus structure of arrays (SOA) data layouts.

The basic performance principle emphasized in Part 1 is: reuse data in register or cache before it is evicted.  The performance principles emphasized in this paper are: place data close to where it is most commonly used, place data in contiguous access mode, and avoid data conflicts.
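One concrete way to place data in contiguous access mode is the structure of arrays layout mentioned above. The following minimal C# sketch contrasts the two layouts for a set of 3D points; the type and field names are illustrative only.

// Array of structures (AOS): the x, y, and z of one point sit next to each other,
// so a loop that reads only X touches a float every 12 bytes.
struct PointAos
{
    public float X, Y, Z;
}

// Structure of arrays (SOA): all X values are contiguous, which keeps
// vector (SIMD) loads of a single component dense and cache friendly.
class PointsSoa
{
    public float[] X;
    public float[] Y;
    public float[] Z;

    public PointsSoa( int n )
    {
        X = new float[n];
        Y = new float[n];
        Z = new float[n];
    }
}

class LayoutDemo
{
    static void Main( )
    {
        PointAos[] aos = new PointAos[1024];    // 1024 interleaved x,y,z triples
        PointsSoa soa = new PointsSoa( 1024 );  // three dense arrays of 1024 floats

        float sum = 0.0f;
        for( int i = 0; i < 1024; i++ ) sum += aos[i].X;   // strided access
        for( int i = 0; i < 1024; i++ ) sum += soa.X[i];   // contiguous access
    }
}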

 

Shared Memory Programming with Threads

Let's begin by considering shared memory programming with threads. Threads all share the same memory in a process. There are many popular threading models. The most well-known are Posix* threads and Windows* threads. The work involved in properly creating and managing threads is error-prone. Modern software with numerous modules and large development teams makes it easy to make errors in parallel programming with threads. Several packages have been developed to ease thread creation, management, and best use of parallel threads. The two most popular are OpenMP* and Intel® Threading Building Blocks. A third threading model, Intel® Cilk™ Plus, has not gained the adoption levels of OpenMP and Threading Building Blocks. All of these threading models create a thread pool which is reused for each of the parallel operations or parallel regions. OpenMP has the advantage of incremental parallelism through the use of directives. Often OpenMP directives can be added to existing software with minimal code changes in a step-wise process. Allowing a thread runtime library to manage much of the thread maintenance eases development of threaded software. It also provides a consistent threading model for all code developers to follow, reduces the likelihood of some common threading errors, and provides an optimized threaded runtime library produced by developers dedicated to thread optimization.

The basic parallel principles mentioned in the introductory paragraphs are to place data close to where it will be used and to avoid moving the data. In threaded programming the default model is that data is shared globally in the process and may be accessed by all threads. Introductory articles on threading emphasize how easy it is to begin threading by applying OpenMP to do loops (Fortran*) or for loops (C). These methods typically show good speedup when run on two to four cores, and they frequently scale well to 64 threads or more. Just as frequently, though, they do not, and in some of those cases the difference comes down to following a good data decomposition plan; that is, designing the architecture of good parallel code.

It is important to explore parallelism at a higher level in the call stack than where a parallel opportunity is initially identified by the developer or by software tools. When a developer recognizes that tasks or data can be operated on in parallel, consider these questions in light of Amdahl's law: "Can I begin the parallel operations higher up in the call stack, before I get to this point? If I do, do I increase the parallel region of my code and thereby provide better scalability?"

The placement of data, and which data must be shared through messages, should be carefully considered. Data is laid out so that it is placed where it is used most and then sent to other systems as needed. For applications represented on a grid, or a physical domain with specific partitions, it is common practice in MPI software to add a row of "ghost" cells around the subgrid or sub-domain. The ghost cells store the values of data sent by the MPI process that updates those cells. Typically ghost cells are not used in threaded software, but just as you minimize the length of the edge along the partition for message passing, it is desirable to minimize the edge along partitions for threads using shared memory. This minimizes the need for thread locks (or critical sections) and for the cache usage penalties associated with cache ownership.

Large multi-socketed systems, although they share a global memory address space, typically have non-uniform memory access (NUMA) times. Data in a memory bank closest to another socket takes more time to retrieve, or has longer latency, than data located in the bank closest to the socket where the code is running. Access to close memory has shorter latency.


Figure 1. Latency memory access, showing relative time to access data.

If one thread allocates and initializes data, that data is usually placed in the bank closest to the socket on which the allocating and initializing thread is running (Figure 1). You can improve performance by having each thread allocate and first reference the memory it will predominantly use. This is usually sufficient to ensure that the memory is closest to the socket the thread is running on. Once a thread is created and active, the OS typically leaves the thread on the same socket. Sometimes it is beneficial to explicitly bind a thread to a specific core to prevent thread migration. When data has a certain pattern it is beneficial to assign, bind, or set the affinity of the threads to specific cores to match this pattern. The Intel OpenMP runtime library (part of Intel® Parallel Studio XE 2016) provides explicit mapping attributes which have proven useful for the Intel® Xeon Phi™ coprocessor.

These types are compact, scatter, and balanced; a minimal example of selecting them appears after the list below.

  • The compact attribute allocates consecutive or adjacent threads to the symmetric multithreading hardware threads (SMTs) on a single core before beginning to assign threads to other cores. This is ideal where threads share data with consecutively numbered (adjacent) threads.
  • The scatter affinity assigns a thread to each core, before going back to the initial cores to schedule more threads on the SMTs.
  • Balanced affinity assigns threads with consecutive or neighboring IDs to the same core in a balanced fashion. Balanced is the recommended starting affinity for those seeking to optimize thread affinity according to the Intel 16.0 C++ compiler documentation. The balanced affinity setting is only available for the Intel® Xeon Phi™ product family; it is not a valid option for general CPUs. When all the SMTs on an Intel Xeon Phi platform are utilized, balanced and compact behave the same. When only some of the SMTs are utilized, the compact method fills up all the SMTs on the first cores and leaves some cores idle at the end.
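
The affinity type is usually selected at launch time through runtime environment variables rather than source changes. The snippet below is a minimal sketch assuming the Intel OpenMP runtime and a hypothetical executable name; the comments show how the three affinity types above (or the standard OMP_PROC_BIND control) might be applied.

// A minimal sketch (assuming the Intel OpenMP runtime): the parallel region
// below is unchanged; only the environment controls where the threads land.
//
//   KMP_AFFINITY=compact  ./affinity_demo
//   KMP_AFFINITY=scatter  ./affinity_demo
//   KMP_AFFINITY=balanced ./affinity_demo    (Intel Xeon Phi only)
//
// The standard OpenMP controls OMP_PROC_BIND=close|spread behave similarly.
#include <omp.h>
#include <cstdio>

int main() {
    #pragma omp parallel
    {
        printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}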

Taking the time to place thread data close to where it is used is important when working with dozens of threads. Just as data layout is important for MPI programs it can be important for threaded software as well.  
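
The first-touch behavior described above (each thread allocating and first referencing the memory it will predominantly use) can be sketched as follows. This is a minimal illustration, assuming an OpenMP build (for example, with -qopenmp or -fopenmp) and a static loop schedule so that the thread that first writes a range of the array is also the thread that later works on it; the array size and the final reduction are illustrative only.

// Sketch of "first touch" placement: pages are physically allocated when they
// are first written, so letting each thread initialize the range it will later
// use tends to place that memory in the bank closest to the thread's socket.
#include <cstdio>

int main() {
    const long n = 1L << 26;
    double *data = new double[n];    // address space only; pages not yet touched

    // First touch: same static schedule as the work loop below.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
        data[i] = 1.0;

    // Work loop: each thread mostly accesses memory local to its socket.
    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += data[i];

    printf("sum = %f\n", sum);
    delete[] data;
    return 0;
}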

There are two short items to consider regarding memory and data layout. Both are relatively easy to address, but can have significant impact. The first is false sharing and the second is data alignment. False sharing is one of the interesting performance issues with threaded software. The data each thread operates on is independent; there is no sharing, but the cache line containing both data items is shared. This is why it is called false sharing or false data sharing: the data isn't shared, but the performance behavior is as though it were.

Consider a case where each thread increments its own counter, but the counters are elements of a one-dimensional array. To increment its counter, a core must own the cache line. For example, thread A on socket 0 takes ownership of the cacheline and increments iCount[A]. Meanwhile thread A+1 on socket 1 increments iCount[A+1]; to do this, the core on socket 1 takes ownership of the cacheline and thread A+1 updates its value. Since a value in the cacheline is altered, the cacheline for the processor on socket 0 is invalidated. At the next iteration, the processor in socket 0 takes ownership of the cacheline back from socket 1 and alters the value in iCount[A], which in turn invalidates the cacheline in socket 1. When the thread on socket 1 is ready to write, the cycle repeats. A significant number of cycles are spent maintaining cache coherency (invalidating cachelines, regaining ownership, and synchronizing to memory), and performance suffers as a result.

The best solution is not to invalidate the cacheline in the first place. For example, at the entrance to the loop, each thread can read its count and store it in a local variable on its stack (reading does not invalidate the cache). When the work is completed, the thread can copy the local value back into the permanent location (see Figure 2). Another alternative is to pad the data so that the data used predominantly by a specific thread sits in its own cacheline.

int iCount[nThreads] ;
      .
      .
      .
      for (some interval){
       //some work . . .
       iCount[myThreadId]++ // may result in false sharing
     }

Memory padding so each thread's counter is in its own cache line:

int iCount[nThreads*16] ;// memory padding to avoid false sharing
      .
      .
      .
      for (some interval){
       //some work . . .
       iCount[myThreadId*16]++ //no false sharing, unused memory
     }

Using a local copy so increments do not repeatedly invalidate the shared cache line:

int iCount[nThreads] ; // make temporary local copy

      .
      .
      .
      // every thread creates its own local variable local_count
      int local_Count = iCount[myThreadID] ;
      for (some interval){
       //some work . . .
       local_Count++ ; //no false sharing
     }
     iCount[myThreadId] = local_Count ; //preserve values
     // potential false sharing at the end,
     // but outside of inner work loop much improved
     // better just preserve local_Count for each thread

Figure 2. Counter increments that cause false sharing, and two alternatives that avoid it: padding and a thread-local copy.

The same false sharing can happen to scalars assigned to adjacent memory locations. This case is shown in the code snippet below:

int data1, data2 ;                 // data1 and data2 may be placed in memory
                                   // such that false sharing could occur
__declspec(align(64)) int data3;   // data3 and data4 will be
__declspec(align(64)) int data4;   // on separate cache lines,
                                   // no false sharing

When a developer designs parallelism from the beginning and minimizes shared data usage, false sharing is typically avoided.   If your threaded software is not scaling well, even though there is plenty of independent work going on and there are few barriers (mutexes, critical sections), it may make sense to check for false sharing.

 

Data Alignment

Software performance is optimal when the data being operated on in a SIMD fashion (AVX-512, AVX, SSE4, . . .) is aligned on cacheline boundaries. The penalty for unaligned data access varies according to processor family; the Intel® Xeon Phi™ coprocessors are particularly sensitive to data alignment, so on Intel Xeon Phi platforms data alignment is very important. The difference is not as pronounced on other Intel® Xeon® platforms, but performance still improves measurably when data is aligned to cache line boundaries. For this reason it is recommended that the software developer always align data on 64-byte boundaries. On Linux* and Mac OS X* this can be done with the Intel compiler option – no source code changes – just use the command-line option: /align:rec64byte.

For dynamically allocated memory in C, malloc() can be replaced by _mm_malloc(datasize, 64). When _mm_malloc() is used, _mm_free() should be used in place of free(). A complete article specifically on data alignment is found here: https://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization
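
A short sketch of both approaches follows. The exact header that declares _mm_malloc varies by compiler (xmmintrin.h with GCC and Clang, malloc.h with the Microsoft and Intel compilers on Windows), so treat the includes as an assumption for your toolchain.

// Sketch: 64-byte (cache line) aligned data for SIMD loops.
#include <xmmintrin.h>   // _mm_malloc / _mm_free (GCC, Clang)
#include <malloc.h>      // _mm_malloc / _mm_free (Microsoft, Intel on Windows)

void fill_aligned(int n) {
    // Heap allocation aligned to a cache line boundary.
    double *a = (double *)_mm_malloc(n * sizeof(double), 64);
    if (a == 0)
        return;
    for (int i = 0; i < n; ++i)
        a[i] = 0.0;
    _mm_free(a);                 // always pair _mm_malloc with _mm_free
}

// Statically declared data can be aligned with a compiler keyword instead,
// for example on Windows:
//     __declspec(align(64)) double b[1024];
// or with GCC/Clang:
//     double b[1024] __attribute__((aligned(64)));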

Please check the compiler documentation as well. To show the effect of data alignment, two matrices of the same size were created and both ran the blocked matrix multiply code used in Part 1 of this series. For the first case matrix A was aligned; for the second case matrix A was intentionally offset by 24 bytes (3 doubles). Performance decreased by 56 to 63% using the Intel 16.0 compiler for matrices ranging in size from 1200x1200 to 4000x4000. Part 1 of this series showed a table of loop-ordering performance for different compilers; when one matrix was offset, there was no longer any performance benefit from using the Intel compiler. It is recommended that developers check their compiler documentation about data alignment and the options available, so that when data is aligned the compiler makes the best use of that information. The code for evaluating performance for a matrix offset from the cacheline is embedded in the code for Part 1 – the code for this experiment is at: https://github.com/drmackay/samplematrixcode


 

Array of Structure vs. Structure of Array

Processors do well when memory is streamed in contiguously. It is very efficient when every element of a cacheline is moved into the SIMD registers, and if contiguous cachelines are also loaded the processor prefetches them in an orderly fashion. In an array of structures, data may be laid out something like this:

struct {
   uint r, g, b, w ; // a possible 2D color rgb pixel layout
} MyAoS[N] ;

In this layout the rgb values are laid out contiguously. If the software is working on data across a color plane, then the whole structure is likely to be pulled into cache, but only one value, g (for example), will be used each time. If data is stored in a structure of arrays, the layout might be something like:

struct {
   uint r[N] ;
   uint g[N] ;
   uint b[N] ;
   uint w[N] ;
} MySoA ;

When data is organized as a structure of arrays and the software operates on all of the g values (or r, or b), the entire cache line is likely to be used in the operations when it is brought into cache. Data is loaded into the SIMD registers more efficiently, which improves efficiency and performance. In many cases software developers temporarily move data into a structure of arrays to operate on it and then copy it back as needed. When possible it is best to avoid this extra copying, as it takes execution time.
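
The short sketch below contrasts the two layouts for one operation on the green channel; the function names and the scaling operation are illustrative only, not taken from a real code base.

// Sketch: the same green-channel update with both layouts.
#include <cstddef>

struct PixelAoS { unsigned r, g, b, w; };        // array of structures element
struct PixelsSoA { unsigned *r, *g, *b, *w; };   // structure of arrays

// AoS: strided access; the r, b, and w values in every cache line are loaded
// but never used.
void scale_green(PixelAoS *pixels, std::size_t n, unsigned s) {
    for (std::size_t i = 0; i < n; ++i)
        pixels[i].g *= s;
}

// SoA: unit-stride access; every element of every cache line loaded from g[]
// is used, and the loop vectorizes cleanly.
void scale_green(const PixelsSoA &pixels, std::size_t n, unsigned s) {
    for (std::size_t i = 0; i < n; ++i)
        pixels.g[i] *= s;
}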

Intel (Vectorization) Advisor 2016 “Memory Access Pattern” (MAP) analysis identifies loops with contiguous (“unit-stride”), non-contiguous and “irregular” access patterns:

The “Strides Distribution” column provides aggregated statistics about how frequently each pattern occurred in a given source loop. In the example Advisor displays, the left two-thirds of the bar is colored blue, indicating a contiguous access pattern, while the right one-third is colored red, which means non-contiguous memory access. For codes with a pure AoS pattern, Advisor can also automatically provide a specific “Recommendation” to perform the AoS -> SoA transformation.

The Access Pattern and more generally Memory Locality Analysis is simplified in Advisor MAP by additionally providing memory “footprint” metrics and by mapping each “stride” (i.e. access pattern) diagnostic to particular C++ or Fortran* objects/array names. Learn more about Intel Advisor at

https://software.intel.com/en-us/get-started-with-advisor and https://software.intel.com/en-us/intel-advisor-xe

Structure of arrays and array of structures data layouts are relevant to many graphics programs, to n-body codes (e.g., molecular dynamics), and generally any time data or properties (e.g., mass, position, velocity, charge) are associated with a point or a specific body. Generally, the structure of arrays is more efficient and yields better performance.

Starting with Intel Compiler 2016 Update 1, the AoS -> SoA transformation is made simpler with the introduction of Intel® SIMD Data Layout Templates (Intel® SDLT). Using SDLT, the AoS container can simply be redefined in this style:

SDLT_PRIMITIVE(Point3s, x, y, z)
sdlt::soa1d_container<Point3s> inputDataSet(count);  

making it possible to access the Point3s instances in SoA fashion. Read more about SDLT here.

There are several articles written specifically to address the topic of AoS vs SoA. The reader is directed to read one of these specific articles:

https://software.intel.com/en-us/articles/a-case-study-comparing-aos-arrays-of-structures-and-soa-structures-of-arrays-data-layouts

and

https://software.intel.com/en-us/articles/how-to-manipulate-data-structure-to-optimize-memory-use-on-32-bit-intel-architecture
http://stackoverflow.com/questions/17924705/structure-of-arrays-vs-array-of-structures-in-cuda

While in most cases a structure of arrays matches the access pattern and provides the best performance, there are a few cases where the data reference and usage more closely match an array of structures layout, and in that case the array of structures provides better performance.

 

Summary

In summary, here are the basic principles to observe regarding data layout and performance. Structure your code to minimize data movement. Reuse data while it is in the register or in cache; this also helps minimize data movement. Loop blocking can help minimize data movement, especially for software with a 2D or 3D layout. Consider layout for parallelism – how are tasks and data distributed for parallel computation? Good domain decomposition practices benefit both message passing (MPI) and shared memory programming. A structure of arrays usually moves less data than an array of structures and performs better. Avoid false sharing: create truly local variables, or provide padding so each thread references a value in a different cache line. Lastly, set data alignment to begin on a cacheline.

The complete code is available for download here: https://github.com/drmackay/samplematrixcode

In case you missed Part 1 it is located here.

Apply these techniques and see how your code performance improves.

Intel® Tamper Protection Toolkit Helps Protect the Scrypt Encryption Utility against Reverse Engineering



Introduction

This article describes how the Intel® Tamper Protection Toolkit can help protect critical code and valuable data in a password-based encryption utility (the Scrypt Encryption Utility) [3] against static and dynamic reverse engineering and tampering. Scrypt [4] is a modern, secure, password-based key derivation function that is widely used in security-conscious software. There is a potential threat to scrypt, described in [2], where an attacker can force generation of weak keys by forcing the use of specific parameters. The Intel Tamper Protection Toolkit can be used to help mitigate this threat. We explain how to refactor the relevant code and apply tamper protection to the utility.

In this article we discuss the following components of the Intel Tamper Protection Toolkit:

  • Iprot. An obfuscation tool that creates self-modifying and self-encrypted code
  • crypto library. A library that provides iprot-compatible implementations of basic crypto operations: cryptographic hash function, keyed-hash message authentication code (HMAC), and symmetric ciphers.

You can download the Intel Tamper Protection Toolkit at https://software.intel.com/en-us/tamper-protection.

Scrypt Encryption Utility Migration to Windows

Since the Scrypt Encryption Utility is targeted at Linux* and we want to show how to use the Intel Tamper Protection Toolkit on Windows*, our first task is to port the Scrypt Encryption Utility to Windows. Platform-dependent code will be framed with the following conditional directive:

#if defined(WIN_TP)
// Windows-specific code
#else
// Linux-specific code
#endif  // defined(WIN_TP)

Example 1: Basic structure of a conditional directive

The WIN_TP preprocessing symbol localizes Windows-specific code. WIN_TP should be defined for a Windows build, otherwise reference code is chosen for the build.

We use Microsoft Visual Studio* 2013 for building and debugging the utility. There are differences between Windows and Linux in various categories, such as process, thread, memory, file management, infrastructure services, and user interfaces. We had to address these differences for the migration, described in detail below.

  1. The utility uses getopt() to handle command-line arguments. See a list of the program arguments in the Scrypt Encryption Utility section in [2]. The function getopt() is accessed from the unitstd.h POSIX OS header file. We used the get_opt() implementation from an open source project getopt_port [1]. Two new files, getopt.h and getopt.c, taken from this project were added into our source code tree.
  2. Another function, gettimeofday(), present in the POSIX API, helps the utility measure salsa opps, the number of salsa20/8 operations per second performed on the user’s platform. The utility needs the salsa opps metric to pick a secure configuration of the input parameters N, r, and p so that the Scrypt algorithm executes at least the desired minimal number of salsa20/8 operations, to resist brute-force attacks. We added the gettimeofday() implementation [5] to the scryptenc_cpuperf.c file.
  3. Before the utility starts configuring the algorithm, it asks the OS for the amount of available RAM that may be used for the derivation by calling the POSIX system function getrlimit(RLIMIT_DATA, …). For Windows, both the soft and hard limits for the maximum size of the process’s data segment (initialized data, uninitialized data, and heap) are set to 4 GB:
    /* ... RLIMIT_DATA... */
    #if defined(WIN_TP)
    rl.rlim_cur = 0xFFFFFFFF;
    rl.rlim_max = 0xFFFFFFFF;
    if((uint64_t)rl.rlim_cur < memrlimit) {
    	memrlimit = rl.rlim_cur;
    }
    #else
    if (getrlimit(RLIMIT_DATA, &rl))
    	return (1);
    if ((rl.rlim_cur != RLIM_INFINITY) &&
         ((uint64_t)rl.rlim_cur < memrlimit))
    	memrlimit = rl.rlim_cur;
    #endif  // defined(WIN_TP)

    Example 2: RLIMIT data limiting the process to 4 GB.

  4. Additionally, the MSVS compiler directive to inline functions in sysendian.h is added:
    #if defined(WIN_TP)
    static __inline uint32_t
    #else
    static inline uint32_t
    #endif  // WIN_TP
    be32dec(const void *pp);

    Example 3: Adding sysendian.h inline functions

  5. We migrated the tarsnap_readpass(…) function, which handles and masks retrieving passwords through a terminal. The function turns off echoing and masks the password with blanks in the terminal. The password is stored in memory buffer and sent to the next functions:
    /* If we're reading from a terminal, try to disable echo. */
    #if defined(WIN_TP)
    if ((usingtty = _isatty(_fileno(readfrom))) != 0) {
    	GetConsoleMode(hStdin, &mode);
    	if (usingtty)
    		mode &= ~ENABLE_ECHO_INPUT;
    	else
    		mode |= ENABLE_ECHO_INPUT;
    	SetConsoleMode(hStdin, mode);
    }
    #else
    if ((usingtty = isatty(fileno(readfrom))) != 0) {
    	if (tcgetattr(fileno(readfrom), &term_old)) {
    		warn("Cannot read terminal settings");
    		goto err1;
    	}
    	memcpy(&term, &term_old, sizeof(struct termios));
    	term.c_lflag = (term.c_lflag & ~ECHO) | ECHONL;
    	if (tcsetattr(fileno(readfrom), TCSANOW, &term)) {
    		warn("Cannot set terminal settings");
    		goto err1;
    	}
    }
    #endif  // defined(WIN_TP)

    Example 4: Password control via terminal

  6. In the original getsalt() a salt is built from pseudorandom numbers read from the Linux special file /dev/urandom. On Windows we suggest using the RDRAND instruction to read from the hardware random number generator available in the Intel® Xeon® and Intel® Core™ processor families starting with the Ivy Bridge microarchitecture. The C standard pseudorandom generator is not used because calling it would make getsalt() incompatible with the Intel Tamper Protection Toolkit obfuscation tool. The function getsalt() should be protected with the obfuscator against static and dynamic tampering and reverse engineering, since the salt produced by this function is categorized as sensitive in the Scrypt Encryption Utility section in [2]. The example below shows both the original and the ported random number generation code used to fill a salt:
    #if defined(WIN_TP)
    	uint8_t i = 0;
    
    	for (i = 0; i < buflen; i++, buf++)
    	{
    		_rdrand32_step(buf);
    	}
    #else
    	/* Open /dev/urandom. */
    	if ((fd = open("/dev/urandom", O_RDONLY)) == -1)
    		goto err0;
    	/* Read bytes until we have filled the buffer. */
    	while (buflen > 0) {
    		if ((lenread = read(fd, buf, buflen)) == -1)
    			goto err1;
    		/* The random device should never EOF. */
    		if (lenread == 0)
    			goto err1;
    		/* We're partly done. */
    		buf += lenread;
    		buflen -= lenread;
    	}
    	/* Close the device. */
    	while (close(fd) == -1) {
    		if (errno != EINTR)
    			goto err0;
    	}
    #endif  // defined(WIN_TP)

    Example 5: Original and ported random number generation code

Utility Protection with the Intel® Tamper Protection Toolkit

Now we will make changes in the utility design and code to help protect sensitive data identified in the threat model in the Password-Based Key Derivation section in [2]. The protection of the sensitive data is achieved by code obfuscation using iprot, the obfuscating compiler included in the Intel Tamper Protection Toolkit. It is reasonable to obfuscate only those functions that create, handle, and use sensitive data.

From the Code Obfuscation section in [2] we know that iprot takes as input a dynamic library (.dll) and produces a binary with only obfuscated export functions specified in the command line. So we put all functions working with sensitive data into a dynamic library to be obfuscated, leaving others, like command-line parsing and password reading, in the main executable.

Figure 1 shows the new design for the protected utility. The utility is split into two parts: the main executable and a dynamic library to be obfuscated. The main executable is responsible for parsing a command line, and reading a passphrase and input file into a memory buffer. The dynamic library includes export functions such as scryptenc_file and scryptdec_file that work with sensitive data (N, r, p, salt).

The key data structure used by the dynamic library is the Scrypt context, which stores HMAC-digested information about the Scrypt parameters N, r, p, and salt. The HMAC digest in the context is used to determine whether the latest changes in the context were made by trusted functions such as scrypt_ctx_enc_init, scrypt_ctx_dec_init, scryptenc_file, and scryptdec_file, which hold the HMAC key to re-sign and to verify the context. These trusted functions are resistant to modification because we obfuscate them with the obfuscation tool. Two new functions, scrypt_ctx_enc_init and scrypt_ctx_dec_init, are introduced to initialize the Scrypt context for the encryption and decryption modes, respectively.


Figure 1: Design for protected Scrypt Encryption Utility.
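
The article does not spell out the layout of the Scrypt context, so the following is a hypothetical sketch only, pieced together from the fields referenced later in the text (N, r, p, the salt, buffer addresses such as addrs.B0, and the trailing HMAC digest); the real structure may differ.

// Hypothetical sketch of the Scrypt context; names and sizes are illustrative.
// The property the design relies on is that the HMAC digest is computed over
// every field that precedes it, so any change made outside the trusted export
// functions is detected when the digest is verified.
#include <stdint.h>

typedef struct scrypt_ctx_sketch {
    uint64_t N;              // scrypt CPU/memory cost parameter
    uint32_t r, p;           // block size and parallelization parameters
    uint8_t  salt[32];       // salt produced by getsalt()
    struct {
        uint8_t *B0;         // pointers into the caller-allocated work buffer
        uint8_t *V0;
        uint8_t *XY0;
    } addrs;
    uint8_t  hmac[32];       // HMAC-SHA256 over all of the fields above
} scrypt_ctx_sketch;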

Encryption Flow

  1. The utility uses getopt() to handle command-line arguments. See a list of the program arguments in the Password-Based Key Derivation Function section in [2].
  2. Input file for encryption and a passphrase are read into the memory buffer.
  3. The main executable calls scrypt_ctx_enc_init to initialize the Scrypt context and compute secure Scrypt parameters (N, r, p, and salt) for the CPU time and RAM size allowed for the key derivation, specified through command-line options like maxmem, maxmemfrac, and maxtime. At the end of this call the initialization function creates an HMAC digest over the newly updated state to prevent tampering after the function returns. The initialization function also returns the amount of memory the application must allocate to proceed with encryption.
  4. The utility in the main executable dynamically allocates memory based on the size returned by the initialization function.
  5. The executable calls scrypt_ctx_enc_init a second time. The function verifies the integrity of the Scrypt context using the HMAC digest. If integrity verification passes, the function sets the buffer locations in the context to the allocated memory and updates the HMAC. File reading and dynamic memory allocation are done in the executable to avoid iprot-incompatible code in the dynamic library; code containing system calls and C standard library functions generates indirect jumps and relocations that are not supported by the obfuscator.
  6. The executable calls scryptenc_file to encrypt the file using the user-supplied passphrase. The function verifies the integrity of the Scrypt context with the parameters (N, r, p, and salt) used for the key derivation. If verification passes, it calls the Scrypt algorithm to derive a key, and the derived key is then used for encryption. The export function produces the same output format as the original scrypt utility; that is, the output contains the same hash values used to verify the integrity of the encrypted data and the correctness of the passphrase during decryption.

Decryption Flow

  1. The utility uses getopt() to handle command-line arguments. See a list of the program arguments in the Password-Based Key Derivation section in [2].
  2. Input file for decryption and a passphrase are read into memory buffer.
  3. The main executable calls scrypt_ctx_dec_init to check whether the provided parameters in the encrypted file data are valid and whether the key derivation function can be computed within the allowed memory and CPU time.
  4. The utility in the main executable dynamically allocates memory based on the size returned by the initialization function.
  5. The executable calls scrypt_ctx_dec_init a second time. The function does the same as in the encryption case.
  6. The executable calls scryptdec_file to decrypt the file using the passphrase. The function verifies the integrity of the Scrypt context with the parameters (N, r, p, and salt) used for the key derivation. If verification passes, it calls the Scrypt algorithm to derive a key. Using the hash values in the encrypted data, the function verifies the correctness of the passphrase and the integrity of the encrypted data.

In the protected utility we replace the OpenSSL* implementation of the Advanced Encryption Standard in CTR mode and the keyed hash function with the Intel Tamper Protection Toolkit crypto library implementations. Unlike OpenSSL, the crypto library satisfies all the code restrictions required for obfuscation by iprot and can be used from within obfuscated code without further modification. The AES cipher is called inside scryptenc_file and scryptdec_file to encrypt/decrypt the input file using a key derived from the password. The keyed hash function is called by the export functions (scrypt_ctx_enc_init, scrypt_ctx_dec_init, scryptenc_file, and scryptdec_file) to verify the data integrity of a Scrypt context before using it. In the protected utility all the exported functions of the dynamic library are obfuscated with iprot. The Intel Tamper Protection Toolkit thus helps us achieve the goal of mitigating the threats defined in the Password-Based Key Derivation section in [2].

Our solution is a redesigned utility with an iprot-obfuscated dynamic library. It is resistant to the attacks identified above: the Scrypt context can be updated only by the export functions, because only they hold the HMAC private key needed to recalculate the HMAC digest in the context. These functions and the HMAC key are themselves protected against tampering and reverse engineering by the obfuscator. In addition, other sensitive data, such as the key produced by Scrypt, is protected because it is derived inside the obfuscated exported functions scryptenc_file and scryptdec_file. The obfuscation compiler produces code that is self-encrypted at runtime and protected against tampering and debugging.

Let us consider how scrypt_ctx_enc_init protects the Scrypt context. The main executable passes buf_p as a pointer argument when scrypt_ctx_enc_init is called. If the pointer is null, the function is being called for the first time; otherwise it is the second call. During the first call the function picks the Scrypt parameters, calculates the HMAC digest, and returns the amount of memory required for the Scrypt computation, as shown below:

// Execute for the first call when it returns memory size required by scrypt
	if (buf_p == NULL) {
		// Pick parameters for scrypt and initialize the scrypt context
		// <...>

		// Compute HMAC
		itp_res = itpHMACSHA256Message((unsigned char *)ctx_p,
				sizeof(scrypt_ctx) - sizeof(ctx_p->hmac),
				hmac_key, sizeof(hmac_key),
				ctx_p->hmac, sizeof(ctx_p->hmac));

		*buf_size_p = (r << 7) * (p + (uint32_t)N) + (r << 8) + 253;
	}

Example 6: The first call of code protecting the Scrypt context

During the second call, buf_p points to the allocated memory passed to the scrypt_ctx_enc_init function. Using the HMAC digest in the context, the function verifies the integrity of the context and makes sure that no one has changed it between the first and the second calls. After that it initializes the buffer addresses inside the context from buf_p and recomputes the HMAC digest, since the context has changed, as shown below:

// Execute for the second call when memory for scrypt is allocated
	if (buf_p != NULL) {
		// Verify HMAC
		itp_res = itpHMACSHA256Message((unsigned char *)ctx_p,
				sizeof(scrypt_ctx) - sizeof(ctx_p->hmac),
				hmac_key, sizeof(hmac_key),
				hmac_value, sizeof(hmac_value));
		if (memcmp(hmac_value, ctx_p->hmac, sizeof(hmac_value)) != 0) {
			return -1;
		}

		// Initialize pointers to buffers for scrypt computation:
		// ctx_p->addrs.B0 = …

		// Recompute HMAC
		itp_res = itpHMACSHA256Message((unsigned char *)ctx_p,
				sizeof(scrypt_ctx) - sizeof(ctx_p->hmac),
				hmac_key, sizeof(hmac_key),
				ctx_p->hmac, sizeof(ctx_p->hmac));
	}

Example 7: The second call of code protecting the Scrypt context

From [2] we know that iprot imposes some restrictions on input code for it to be obfuscatable: it requires no relocations and no indirect jumps. C coding constructs involving global variables, system functions, and C standard library calls can generate relocations and indirect jumps. The code in Example 7 calls one C standard function, memcmp, which would make the code incompatible with iprot. For this reason we implement our own versions of the C standard functions the utility uses, such as memcmp, memset, and memmove (see the sketch below). Also, all global variables in the dynamic library are transformed into local variables, and we take care that such data is initialized on the stack.
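
As an illustration of what such a replacement might look like, here is a minimal memcmp sketch; it is not the toolkit's or the utility's actual code, just a plain byte loop that makes no calls into the C runtime and therefore adds no relocations or indirect jumps.

// Minimal self-contained memcmp replacement: no external calls, no globals,
// so it stays compatible with the obfuscator's restrictions.
static int my_memcmp(const void *s1, const void *s2, unsigned int n) {
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    for (unsigned int i = 0; i < n; ++i) {
        if (a[i] != b[i])
            return (a[i] < b[i]) ? -1 : 1;
    }
    return 0;
}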

In addition, we encountered a problem with obfuscation of code containing double values that is not covered by the tutorials and is not documented in the Intel Tamper Protection Toolkit user guide. As shown below, in the pickparams function the salsa20/8 core operation limit has double type and equals 32768. This value is not initialized on the stack, and the compiler puts it into a data segment of the binary, which generates a relocation in the code.

	double opslimit;
#if defined(WIN_TP)
	// unsigned char d_32768[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xE0, 0x40};
	unsigned char d_32768[sizeof(double)];
	d_32768[0] = 0x00;
	d_32768[1] = 0x00;
	d_32768[2] = 0x00;
	d_32768[3] = 0x00;
	d_32768[4] = 0x00;
	d_32768[5] = 0x00;
	d_32768[6] = 0xE0;
	d_32768[7] = 0x40;
	double *var_32768_p = (double *) d_32768;
#endif

	/* Allow a minimum of 2^15 salsa20/8 cores. */
#if defined(WIN_TP)
	if (opslimit < *var_32768_p)
		opslimit = *var_32768_p;
#else
	if (opslimit < 32768)
		opslimit = 32768;
#endif

Example 8: Code for an iprot-compatible double variable

We solved this problem by initializing a byte sequence on the stack with a hex dump that matches the in-memory representation of this double value, and creating a double pointer to this sequence.

To obfuscate the dynamic library with iprot, we use the following command:

iprot scrypt-dll.dll scryptenc_file scryptdec_file scrypt_ctx_enc_init scrypt_ctx_dec_init -c 512 -d 2600 -o scrypt_obf.dll

The interface of the protected utility is unchanged. Let us compare the unobfuscated code with the obfuscated version. The following shows disassembled code with significant differences between the two versions.

# non-obfuscated code
scrypt_ctx_enc_init PROC NEAR
        push    ebp                              ; 10030350 _ 55
        mov     ebp, esp                         ; 10030351 _ 8B. EC
        sub     esp, 100                         ; 10030353 _ 83. EC, 64
        mov     dword ptr [ebp-4H], 0  ; 10030356 _ C7. 45, FC, 00000000
        mov     eax, 1                           ; 1003035D _ B8, 00000001
        imul    ecx, eax, 0                      ; 10030362 _ 6B. C8, 00
        mov     byte ptr [ebp+ecx-1CH], 1 ; 10030365 _ C6. 44 0D, E4, 01
        mov     edx, 1                           ; 1003036A _ BA, 00000001
        shl     edx, 0                           ; 1003036F _ C1. E2, 00
        mov     byte ptr [ebp+edx-1CH], 2 ; 10030372 _ C6. 44 15, E4, 02
        mov     eax, 1                           ; 10030377 _ B8, 00000001
        shl     eax, 1                           ; 1003037C _ D1. E0
        mov     byte ptr [ebp+eax-1CH], 3 ; 1003037E _ C6. 44 05, E4, 03
        mov     ecx, 1                           ; 10030383 _ B9, 00000001<…>
# obfuscated code with default parameters
scrypt_ctx_enc_init PROC NEAR
        mov     ebp, esp                     ; 1000100E _ 8B. EC
        sub     esp, 100                     ; 10001010 _ 83. EC, 64
        mov     dword ptr [ebp-4H], 0        ; 10001013 _ C7. 45, FC, 00000000
        mov     eax, 1                       ; 1000101A _ B8, 00000001
        imul    ecx, eax, 0                  ; 1000101F _ 6B. C8, 00
        mov     byte ptr [ebp+ecx-1CH], 1    ; 10001022 _ C6. 44 0D, E4, 01
        push    eax                          ; 10001027 _ 50
        pop     eax                          ; 1000102D _ 58
        lea     eax, [eax+3FFFD3H]           ; 1000102E _ 8D. 80, 003FFFD3
        mov     dword ptr [eax], 608469404   ; 10001034 _ C7. 00, 2444819C
        mov     dword ptr [eax+4H], -124000508 ; 1000103A _ C7. 40, 04, F89BE704
        mov     dword ptr [eax+8H], -443981569 ; 10001041 _ C7. 40, 08, E58960FF
        mov     dword ptr [eax+0CH], 1633409 ; 10001048 _ C7. 40, 0C, 0018EC81
        mov     dword ptr [eax+10H], -477560832 ; 1000104F _ C7. 40, 10, E3890000<…>

Example 9: Disassembled code for the non-obfuscated and obfuscated versions

Obfuscation degrades performance, and the dynamic library size is significantly increased. The obfuscator lets developers balance security against performance using the cell size and mutation distance parameters. The current obfuscation uses a 512-byte cell size and a 2600-byte mutation distance. A cell is an instruction subsequence from the original binary. A cell in obfuscated code remains encrypted until the instruction pointer is about to enter it, and a decrypted cell is re-encrypted once it has been fully executed.

The source code for the utility that the Intel Tamper Protection Toolkit helps protect will soon be available at GitHub.

Acknowledgments

We thank Raghudeep Kannavara for originating the idea of applying the Intel Tamper Protection Toolkit to the Scrypt Encryption Utility and Andrey Somsikov for many helpful discussions.

References

  1. K. Grasman. getopt_port on GitHub https://github.com/kimgr/getopt_port/
  2. R. Kazantsev, D. Katerinskiy, and L. Thaddeus. Understanding Intel® Tamper Protection Toolkit and Scrypt Encryption Utility, Intel Developer Zone, 2016.
  3. C. Percival. The Scrypt Encryption Utility. http://www.tarsnap.com/scrypt/scrypt-1.1.6.tgz
  4. C. Percival and S. Josefsson (2012-09-17). The Scrypt Password-Based Key Derivation Function. IETF.
  5. W. Shawn. Freebsd sources on GitHub https://github.com/lattera/freebsd

About the Authors

Roman Kazantsev works in the Software & Services Group at Intel Corporation. Roman has 7+ years of professional experience in software engineering. His professional interests are focused on cryptography, software security, and computer science. He currently works as a Software Engineer, and his ongoing mission is to deliver cryptographic solutions and expertise for content protection across all Intel platforms. He received his bachelor's and master's degrees in Computer Science with honors at Nizhny Novgorod State University, Russia.

Denis Katerinskiy works in the Software & Services Group at Intel Corporation. He has 2 years of experience in software development. His main interests are programming, performance optimization, algorithm development, mathematics, and cryptography. In his current role as a Software Development Engineer, Denis develops software simulators for Intel architecture. Denis is currently pursuing a bachelor's degree in Computer Science at Tomsk State University.

Thaddeus Letnes works in the Software & Services Group at Intel Corporation. He has 15+ years of professional experience in software development. His main interests are low-level systems, languages, and engineering practices. In his current role as a Software Engineer developing software development tools, Thaddeus works closely with software developers, architects, and project managers to produce high-quality development tools. Thaddeus holds a bachelor's degree in Computer Science from Knox College.

RealPerspective: Head Tracking with Intel® RealSense™ Technology


Code Sample

Introduction

RealPerspective utilizes Intel® RealSense™ technology to create a unique experience. This code sample utilizes head tracking to perform a monoscopic technique for better 3D fidelity.

Using a system equipped with an Intel® RealSense™ camera, the user can move their head around and have the game's perspective correctly computed. The effect can best be described as looking into a window of another world. Traditionally this has been done with an RGB camera or IR trackers [3], but the Intel RealSense camera's depth information gives the developer accurate face tracking without any additional hardware on the user.

The sample accomplishes the effect by implementing an off-axis perspective projection described by Kooima [1]. The inputs are the face’s spatial X, Y position and the face’s average depth.

Build and Deploy

The Intel® RealSense™ SDK and the Intel RealSense Depth Camera Manager are required for development.

For deploying the project to end users, the matching SDK Runtime Redistributable must be installed.

To download the SDK, SDK Runtime, and Depth Camera Manager, go to: https://software.intel.com/en-us/intel-realsense-sdk/download

Unity

For the Unity project, to ensure compatibility with the SDK installed on the system, please replace libpxcclr.unity.dll and libpxccpp2c.dll in Libraries\x64 and Libraries\x86 of the project with the DLLs in bin\x64 and bin\x86 of the Intel RealSense SDK respectively.

Method

Initialize Intel RealSense Camera

During start up, the Sense Manager initializes and configures the face module for face detection (bounding rectangle and depth). Once completed the Sense Manager pipeline is ready for data.

Process Input

The process input function returns a Vector3 normalized to 0 to 1 containing the face’s 3D spatial position from the Intel RealSense camera. If the Intel RealSense camera is not available, mouse coordinates are used.

The face’s x, y, z location comes from the Intel RealSense SDK’s face module. The face’s XY planar position comes from the center of the face’s bounding rectangle, detected in pixel units. The face’s z comes from the face’s average depth in millimeters. The function is non-blocking, so if data is not available the Update function is not delayed; in that case the previous perspective and view matrices are left unchanged.
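
The normalization itself is simple arithmetic. The following sketch shows one way it could be done (the sample itself is a Unity C# project; this is a language-neutral C++ illustration). The color stream resolution and the usable depth range are assumptions for illustration; the real code would take these from the camera configuration.

// Sketch: map the bounding-rectangle center (pixels) and average depth (mm)
// to a 0..1 range. The resolution and depth range below are assumed values.
struct Vec3 { float x, y, z; };

Vec3 NormalizeFacePosition(float centerX, float centerY, float depthMm) {
    const float width  = 640.0f;    // assumed color stream width
    const float height = 480.0f;    // assumed color stream height
    const float nearMm = 200.0f;    // assumed closest usable face depth
    const float farMm  = 1200.0f;   // assumed farthest usable face depth

    Vec3 v;
    v.x = centerX / width;
    v.y = centerY / height;
    v.z = (depthMm - nearMm) / (farMm - nearMm);
    if (v.z < 0.0f) v.z = 0.0f;     // clamp to the 0..1 range
    if (v.z > 1.0f) v.z = 1.0f;
    return v;
}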

Calculate Off-Axis Parameters

pa, pb, and pc are points that define the screen extents and determine the screen size, aspect ratio, position, and orientation in space. These screen extents are scaled based on the screen and come from the application’s window. Finally n and f determine the near and far planes, and in Unity, the values come from the Camera class.

For example, if the room is 16 by 9 units with an aspect ratio of 16:9, then pa, pb, and pc can be set so the room covers the screen. The distance from pa to pb will be the width of the room, 16 units, and the distance from pa to pc will be the height of the room, 9 units. For additional examples, see Kooima [1].

Off-Axis Parameters

Calculate Off-Axis Matrices

The goal of this function is to return the off-axis matrix. The projection matrix is essentially based on the OpenGL* standard glFrustum. The final step is aligning the eye with the XY plane and translating to the origin. This is similar to what the camera or view matrix does for the graphics pipeline.

Projection matrix

First, the orthonormal basis vectors (vr, vu, vn) are computed based on the screen extents. The orthonormal basis vectors will later help project the screen space onto the near plane and create the matrix to align the tracker space with the XY plane.

Off-Axis Matrices

Next, screen extents vectors, va, vb, and vc, are created from the screen plane.

Screen extents vectors

Next, the frustum extents l, r, b, and t are computed from the screen extents by projecting the basis vectors onto the screen extent vectors to get their location on the screen plane, then scaling the result by the ratio of the near plane distance to the distance from the eye to the screen plane. This is done because the frustum extents define the frustum on the near plane.

frustum created on the near plane

Finally, once the frustum extents are computed, the values are plugged into the glFrustum function to produce the perspective projection matrix. The field of view can be computed from frustum extents [2].
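
The following sketch pulls the steps above together for the frustum extents, following the generalized perspective projection in Kooima [1]. The Vec3 type and its helpers are small local utilities written for this illustration; they are not SDK or Unity types.

// Sketch: frustum extents for the off-axis projection, following Kooima [1].
#include <cmath>

struct Vec3 { float x, y, z; };
static Vec3  sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  cross(Vec3 a, Vec3 b) { return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x }; }
static Vec3  normalize(Vec3 a)     { float l = std::sqrt(dot(a, a)); return { a.x / l, a.y / l, a.z / l }; }

// pa, pb, pc: lower-left, lower-right, and upper-left screen corners.
// pe: tracked eye position. n: near plane distance.
void OffAxisExtents(Vec3 pa, Vec3 pb, Vec3 pc, Vec3 pe, float n,
                    float &l, float &r, float &b, float &t)
{
    // Orthonormal screen basis: right, up, normal.
    Vec3 vr = normalize(sub(pb, pa));
    Vec3 vu = normalize(sub(pc, pa));
    Vec3 vn = normalize(cross(vr, vu));

    // Vectors from the eye to the screen corners.
    Vec3 va = sub(pa, pe);
    Vec3 vb = sub(pb, pe);
    Vec3 vc = sub(pc, pe);

    // Distance from the eye to the screen plane.
    float d = -dot(va, vn);

    // Frustum extents on the near plane; these feed a glFrustum-style matrix.
    l = dot(vr, va) * n / d;
    r = dot(vr, vb) * n / d;
    b = dot(vu, va) * n / d;
    t = dot(vu, vc) * n / d;
}

For the 16 by 9 room example above, pa, pb, and pc might be (-8, -4.5, 0), (8, -4.5, 0), and (-8, 4.5, 0), with pe updated from the head tracker each frame; the remaining steps (the screen alignment matrix and the eye translation) are described next.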

Projection plane orientation

The foreshortening effect of the perspective projection works only when the view position is at the origin. Thus the first step is to align the screen with the XY plane. The matrix M is constructed from the basis vectors (vr, vu, and vn) to take the Cartesian coordinate system to screen-local coordinates. However, it is the screen space that needs to be aligned with the XY plane, so the transpose of matrix M is used.

View point offset

Similarly, the tracker eye position, pe, must be translated to the frustum origin. This is done with a translation matrix T.

Composition

The computed matrices are fed back into Unity’s Camera data structure.

Performance

The test system was a GIGABYTE Technology BRIX* Pro with an Intel® Core™ i7-4770R processor (65W TDP).

In general, the performance overhead is very low. The entire Update() function completes in less than 1 ms: about 0.50 ms for consecutive frames with a detected face and about 0.20 ms for frames with no detected face. New data is available about every 33 ms.

Use Case and Future work

The technique discussed in the sample can be used seamlessly in games when RealSense hardware is available on an Intel® processor-based system. The provided auxiliary input system adds an extra level of detail that improves the game’s immersion and 3D fidelity.

A few possible use cases are RTS (real-time strategy), MOBA (multiplayer online battle arena), and tabletop games, which let the user see the action as if they are playing a game of chess. In simulation and sandbox games the user sees the action and can get the perfect view on his or her virtual minions and lean in to see what they’re up to.

The technique is not limited to retrofitting current and previous games, or even to gaming. For gaming, new uses can include dodging, lean-in techniques, and full-screen HUD movement (e.g., crisis helmet HUD). Non-gaming uses include digital displays such as picture frames or multiple-monitor support. This technique can also be considered a point on the spectrum of virtual reality, without using a bulky and expensive head-mounted display.

References

[1]: Kooima, Robert. Generalized Perspective Projection. 2009.

[2]: OpenGL.org. Transformations.

[3]: Johnny Lee. Head Tracking for Desktop VR Displays using the WiiRemote. 2007.

Appendix

Requirements

  • Intel RealSense enabled system or SR300 developer camera
  • Intel RealSense SDK version 6.0+
  • Intel RealSense Depth Camera Manager SR300 version 3.0+
  • Microsoft Windows 8.1* or newer
  • Unity 5.3+ 

Intel® Math Kernel Library (Intel® MKL) 11.3 Update 2 for Windows*


Intel® Math Kernel Library (Intel® MKL) is a highly optimized, extensively threaded, and thread-safe library of mathematical functions for engineering, scientific, and financial applications that require maximum performance. Intel MKL 11.3 Update 2 packages are now ready for download. Intel MKL is available as part of the Intel® Parallel Studio XE and Intel® System Studio . Please visit the Intel® Math Kernel Library Product Page.

Intel® MKL 11.3 Update 2 Bug fixes

New Features in MKL 11.3 Update 2

  • Introduced the mkl_finalize function to facilitate usage models in which Intel MKL dynamic libraries, or third-party dynamic libraries statically linked with Intel MKL, are loaded and unloaded explicitly
  • Compiler offload mode now allows using Intel MKL dynamic libraries
  • Added Intel TBB threading for all BLAS level-1 functions
  • Intel MKL PARDISO:
    • Added support for block compressed sparse row (BSR) matrix storage format
    • Added optimization for matrices with variable block structure
    • Added support for mkl_progress in Parallel Direct Sparse Solver for Clusters
    • Added cluster_sparse_solver_64 interface
  • Introduced sorting algorithm in Summary Statistics

Check out the latest Release Notes for more updates

Contents

  • File: w_mkl_11.3.2.180_online.exe

    Online Installer for Windows

  • File: w_mkl_11.3.2.180.exe

    A File containing the complete product installation for Windows* (32-bit/x86-64bit development)

A Developer’s Guide To Intel® RealSense™ Camera Detection Methods


Abstract

The arrival of a new and improved front-facing camera, the SR300, has necessitated changes to the Intel® RealSense™ SDK and the Intel® RealSense™ Depth Camera Manager that may prevent legacy applications from functioning. This paper provides an overview of some key aspects in developing camera-independent applications that are portable across the different front-facing cameras: Intel® RealSense™ cameras F200 and SR300. It also details several methods for detecting the set of front- and rear-facing camera devices featured in the Intel RealSense SDK. These methods include how to use the installer scripts to detect the local capture device as well as how to use the Intel RealSense SDK to detect the camera model and its configuration at runtime. This paper is intended for novice and intermediate developers who have either previously developed F200 applications and want to ensure compatibility on SR300-equipped systems or want to develop new Intel® RealSense™ applications targeting SR300’s specific features.

Introduction

The arrival of the new and improved front-facing Intel RealSense camera SR300 has introduced a number of changes to the Intel RealSense SDK as well as new considerations to maintain application compatibility across multiple SDK versions. As of the R5 2015 SDK release for Windows*, three different camera models are supported including the rear-facing Intel RealSense camera R200 and two front-facing cameras: the Intel RealSense camera F200 and the newer SR300. The SR300 brings a number of technical improvements over the legacy F200 camera, including improved tracking range, motion detection, color stream and IR sensors, and lower system resource utilization. Developers are encouraged to create new and exciting applications that take advantage of these capabilities.

However, the presence of systems with different front-facing camera models presents several unique challenges for developers. There are certain steps that should be taken to verify the presence and configuration of the Intel RealSense camera to ensure compatibility. This paper outlines the best-known methods to develop a native SR300 application and successfully migrate an existing F200 application to a SR300 platform while maintaining compatibility across both cameras models.

Detecting The Intel® RealSense™ camera During Installation

In order to ensure support, first verify which camera model is present on the host system during application install time. The Intel RealSense SDK installer script provides options to check for the presence of any of the camera models using command-line options. Unless a specific camera model is required, we recommend that you use the installer to detect orientation (front or rear facing) to maintain portability across platforms with different camera models. If targeting specific features, you can check for specific camera models (Intel RealSense cameras F200, SR300, and R200) by specifying the appropriate options. If the queried camera model is not detected, the installer will abort with an error code. The full SDK installer command list can be found on the SDK documentation website under the topic Installer Options. For reference, you can find the options related to detecting the camera as well as sample commands below.

Installer Command Options

--f200
--sr300
--r200

Force a camera model check such that the runtime is installed only when the requested camera model is detected. If the camera model is not detected, the installer aborts with status code 1633.

--front
--rear

The --front option checks for any front-facing camera and the --rear option checks for any rear-facing camera.

Examples

Detect presence of any rear-facing camera and install the 3D scan runtime silently via web download:

intel_rs_sdk_runtime_websetup_YYYY.exe --rear --silent --no-progress --acceptlicense=yes --finstall=core,3ds --fnone=all

Detect presence of an F200 camera and Install the face runtime silently:

intel_rs_sdk_runtime_YYYY.exe --f200 --silent --no-progress --acceptlicense=yes --finstall=core,face3d --fnone=all

Detecting The Intel RealSense Camera Configuration at Runtime

After verifying proper camera setup at install time, verify the capture device and driver version (that is, Intel RealSense Depth Camera Manager (DCM) version) during the initialization of your application. To do this, use the provided mechanisms in the Intel RealSense SDK such as DeviceInfo and the ImplDesc structures. Note that the device information is only valid after the Init function of the SenseManager interface.

Checking the Camera Model

To check the camera model at startup, use the QueryDeviceInfo function, which returns a DeviceInfo structure. The DeviceInfo structure includes a DeviceModel member variable that includes all supported camera models available. Note that the values enumerated by the DeviceModel include predefined camera models that will change as the SDK evolves. You will want to verify that the SDK version on which you are compiling your application is recent enough to include the appropriate camera model that your application requires.

Code sample 1 illustrates how to use the QueryDeviceInfo function to retrieve the currently connected camera model in C++. Note that the device information is only valid after the Init function of the SenseManager interface.

Code Sample 1: Using DeviceInfo to check the camera model at runtime.

// Create a SenseManager instance
PXCSenseManager *sm=PXCSenseManager::CreateInstance();
// Other SenseManager configuration (say, enable streams or modules)
...
// Initialize for starting streaming.
sm->Init();
// Get the camera info
PXCCapture::DeviceInfo dinfo={};
sm->QueryCaptureManager()->QueryDevice()->QueryDeviceInfo(&dinfo);
printf_s("camera model = %d\n", dinfo.model);
// Clean up
sm->Release();

Checking The Intel RealSense Depth Camera Manager Version At Runtime

The Intel RealSense SDK also allows you to check the DCM version at runtime (in addition to the SDK runtime and individual algorithm versions). This is useful to ensure that the required Intel® RealSense™ technologies are installed. An outdated DCM may result in unexpected camera behavior, non-functional SDK features (that is, detection, tracking, and so on), or reduced performance. In addition, having the latest Gold DCM for the Intel RealSense camera SR300 is necessary to provide backward compatibility for apps designed on the F200 camera (the latest SR300 DCM must be downloaded on Windows 10 machines using Windows Update). An application developed on an SDK earlier than R5 2015 for the F200 camera should verify both the camera model and DCM at startup to ensure compatibility on an SR300 machine.

In order to verify the camera driver version at runtime, use the QueryModuleDesc function, which returns the specified module’s descriptor in the ImplDesc structure. To retrieve the camera driver version, specify the capture device as the input argument to the QueryModuleDesc and retrieve the version member of the ImplDesc structure. Code sample 2 illustrates how to retrieve the camera driver version in the R5 version of the SDK using C++ code. Note that if the DCM is not installed on the host system, the QueryModuleDesc call returns a STATUS_ITEM_UNAVAILABLE error. In the event of a missing DCM or version mismatch, the recommendation is to instruct the user to download the latest version using Windows Update. For full details on how to check the SDK, camera, and algorithm versions, please reference the topic titled Checking SDK, Camera Driver, and Algorithm Versions on the SDK documentation website.

Code Sample 2: Using ImplDesc to get the algorithm and camera driver versions at runtime.

PXCSession::ImplVersion GetVersion(PXCSession *session, PXCBase *module) {
    PXCSession::ImplDesc mdesc={};
    session->QueryModuleDesc(module, &mdesc);
    return mdesc.version;
}
// sm is the PXCSenseManager instance
PXCSession::ImplVersion driver_version=GetVersion(sm->QuerySession(), sm->QueryCaptureManager()->QueryCapture());
PXCSession::ImplVersion face_version=GetVersion(sm->QuerySession(), sm->QueryFace());

Developing For Multiple Front-Facing Camera Models

Starting with the R5 2015 SDK release for Windows, a new front-facing camera model, named the Intel RealSense camera SR300, has been added to the list of supported cameras. The SR300 improves upon the Intel RealSense camera model F200 in several key ways, including increased tracking range, lower power consumption, better color quality in low light, increased SNR for the IR sensor, and more. Applications that take advantage of the SR300 capabilities can result in improved tracking quality, speed, and enhanced responsiveness over F200 applications. However, with the addition of a new camera in the marketplace comes increased development complexity in ensuring compatibility and targeting specific features in the various camera models.

This section summarizes the key aspects developers must know in order to write applications that take advantage of the unique properties of the SR300 camera or run in backward compatibility mode with only F200 features. For a more complete description of how to migrate F200 applications to SR300 applications, please read the section titled Working with Camera SR300 on the SDK documentation website.

Intel RealSense camera F200 Compatibility Mode

In order to allow older applications designed for the F200 camera to function on systems equipped with an SR300 camera, the SR300 DCM (gold or later) implements an F200 compatibility mode. It is automatically activated when a streaming request is sent by a pre-R5 application, and it allows the DCM to emulate F200 behavior. In this mode if the application calls QueryDeviceInfo, the value returned will be “F200” for the device name and model. Streaming requests from an application built on the R5 2015 or later SDK are processed natively and are able to take advantage of all SR300 features as hardware compatibility mode is disabled.

It is important to note that only one mode (native or compatibility) can be run at a time. This means that if two applications are run, one after the other, the first application determines the state of the F200 compatibility mode. If the first application was compiled on an SDK version earlier than R5, the F200 compatibility mode will automatically be enabled regardless of the SDK version of the second application. Similarly, if the first application is compiled on R5 or later, the F200 compatibility mode will automatically be deactivated and any subsequent applications will see the camera as an SR300. Thus, if the first application is R5 or later (F200 compatibility mode disabled) but a subsequent application is pre-R5, the second application will not see a valid Intel RealSense camera on the system and will not function. This is because the pre-R5 application requires an F200 camera, but the DCM is running in native SR300 mode due to the earlier application. There is currently no way to override the F200 compatibility state for the later application, nor is it possible for the DCM to emulate both F200 and SR300 simultaneously.

Table 1 summarizes the resulting state of the compatibility mode when multiple Intel RealSense applications are running on the same system featuring an SR300 camera (application 1 is started before application 2 on the system):

Table 1: Intel RealSense camera F200 Compatibility Mode State Summary with Multiple Applications Running

| Application 1 | Application 2 | F200 Compatibility Mode State | Comments |
|---|---|---|---|
| Pre-R5 Compilation | Pre-R5 Compilation | ACTIVE | App1 is run first, DCM sees pre-R5 app and enables F200 compatibility mode. |
| Pre-R5 Compilation | R5 or later Compilation | ACTIVE | App1 is run first, DCM sees pre-R5 app and enables F200 compatibility mode. |
| R5 or later Compilation | Pre-R5 Compilation | NOT ACTIVE | App1 is run first, DCM sees SR300 native app and disables F200 compatibility mode. App2 will not see a valid camera and will not run. |
| R5 or later Compilation | R5 or later Compilation | NOT ACTIVE | App1 is run first, DCM sees R5 or later app and disables F200 compatibility mode. Both apps will use native SR300 requests. |

Developing Device-Independent Applications

To accommodate the arrival of the Intel RealSense camera SR300, many of the 2015 R5 Intel RealSense SDK components have been modified to maintain compatibility and to make the most efficient use of the SR300’s capabilities. In most cases, developers should strive to write camera-agnostic applications that run on any front-facing camera, ensuring maximum portability across platforms. The SDK modules and stream interfaces can handle all of the platform differentiation if used properly. However, if an application uses unique features of either the F200 or the SR300, the code must identify the camera model and handle cases where the camera is not capable of those functions. This section outlines the key details to keep in mind when developing front-facing Intel RealSense applications to ensure maximum compatibility.

SDK Interface Compatibility

To maintain maximum compatibility between the F200 and SR300 cameras, use the built-in algorithm modules (face, 3DS, BGS, and so on) and the SenseManager interface to read raw streams without specifying any stream resolutions or pixel formats. This approach allows the SDK to handle the conversion automatically and minimizes the code changes needed. Keep in mind that the maturity levels of the algorithms designed for the SR300 may be lower than those of the algorithms designed for the F200, given that the SR300 was not supported until the 2015 R5 release. Be sure to read the SDK release notes thoroughly to understand the maturity of the various algorithms needed for your application.

In summary, the following best practices are recommended to specify a stream and read image data (a consolidated sketch follows the list):

  • Avoid enabling streams using specific configuration (resolution, frame rate):

    sm->EnableStream(PXCCapture::STREAM_TYPE_COLOR, 640, 480, 60);

    Instead, let the SenseManager select the appropriate configuration based on the available camera model:

    sm->EnableStream(PXCCapture::STREAM_TYPE_COLOR);

  • Use the Image functions (such as AcquireAccess and ExportData) to force pixel format conversion.

    PXCImage::ImageData data;

    image->AcquireAccess(PXCImage::ACCESS_READ, PXCImage::PIXEL_FORMAT_RGB32, &data);

  • If a native pixel format is desired, be sure to handle all cases so that the code will work independent of the camera model (see SoftwareBitmapToWriteableBitmap sample in the appendix of this document).

  • When accessing the camera device properties, use the device-neutral device properties as listed in Device Neutral Device Properties.
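
Putting these practices together, a minimal camera-agnostic capture loop might look like the sketch below. It assumes sm is a created PXCSenseManager instance, omits error handling and cleanup on failure for brevity, and processes only the color stream.

// Sketch: camera-agnostic color capture that runs unchanged on F200 and SR300.
sm->EnableStream(PXCCapture::STREAM_TYPE_COLOR);   // let the SDK pick resolution/format
sm->Init();
while (sm->AcquireFrame(true) >= PXC_STATUS_NO_ERROR) {
    PXCCapture::Sample *sample = sm->QuerySample();
    PXCImage::ImageData data;
    // Force a device-neutral pixel format rather than relying on the native one.
    sample->color->AcquireAccess(PXCImage::ACCESS_READ,
                                 PXCImage::PIXEL_FORMAT_RGB32, &data);
    // ... process data.planes[0] (RGB32, pitch data.pitches[0]) here ...
    sample->color->ReleaseAccess(&data);
    sm->ReleaseFrame();
}
sm->Release();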

Intel RealSense SDK Incompatibilities

As of the R5 2015 Intel RealSense SDK release, there remain several APIs that will exhibit some incompatibilities between the F200 and SR300 cameras. Follow the mitigation steps outlined in Table 2 to write camera-independent code that works for any camera:

Table 2: Mitigation Steps for Front-Facing Camera Incompatibilities

| Feature | Compatibility Issue | Recommendations |
|---|---|---|
| Camera name | Friendly name and device model ID differ between F200 and SR300. | Do not use the friendly name string as a unique ID; use it only to display the device name to the user. Use the device model to perform camera-specific operations, or use the front-/rear-facing orientation value from DeviceInfo if sufficient. |
| SNR | The IR sensor in the SR300 has a much higher SNR and a native 10-bit data type (up from 8-bit on the F200). As a result, the IR_RELATIVE pixel format is no longer exposed. | Use the AcquireAccess function to force a pixel format of Y16 when accessing SR300 IR stream data. |
| Depth stream scaling factor | Native depth stream data representation changed from 1/32 mm on the F200 to 1/8 mm on the SR300. If accessing native depth data with pixel format DEPTH_RAW, a proper scaling factor must be used (does not affect apps using pixel format DEPTH). | Retrieve the proper scaling factor using QueryDepthUnit, or force a pixel format conversion from DEPTH_RAW to DEPTH using the AcquireAccess function. |
| Device properties | Several of the device properties outlined in the F200 & SR300 Member Functions document differ between the two cameras: the filter option definition table differs based on the cameras' different range capabilities, and the SR300 supports only the FINEST option for the SetIVCAMAccuracy function. | Avoid camera-specific properties; use the Intel RealSense SDK algorithm modules so the SDK automatically applies the best settings for the given algorithm. |
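
To illustrate the depth-scaling recommendation in Table 2, the sketch below reads DEPTH_RAW data and converts it to millimeters using the device-reported unit instead of a hard-coded, camera-specific constant. It assumes device is the active PXCCapture::Device, depthImage is a depth PXCImage from the current sample, and that QueryDepthUnit reports the raw depth unit in micrometers as documented for the device interface.

// Sketch: camera-independent handling of DEPTH_RAW data. QueryDepthUnit reports
// the size of one raw depth unit in micrometers, so the same code works whether
// the camera uses 1/32 mm (F200) or 1/8 mm (SR300) units.
pxcF32 unit_um = device->QueryDepthUnit();            // micrometers per raw unit
PXCImage::ImageData ddata;
depthImage->AcquireAccess(PXCImage::ACCESS_READ,
                          PXCImage::PIXEL_FORMAT_DEPTH_RAW, &ddata);
unsigned short *raw = (unsigned short *)ddata.planes[0];
float first_pixel_mm = raw[0] * unit_um / 1000.0f;    // convert the first pixel to millimeters
depthImage->ReleaseAccess(&ddata);

// Alternative: avoid the scaling entirely by forcing conversion to DEPTH,
// which is always expressed in millimeters:
// depthImage->AcquireAccess(PXCImage::ACCESS_READ, PXCImage::PIXEL_FORMAT_DEPTH, &ddata);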

Conclusion

This paper outlined several best-known practices to ensure high compatibility across multiple Intel RealSense camera models. The R5 2015 SDK release for Windows features built-in functions to mitigate compatibility issues. It is generally good practice to design applications to use only features common to all cameras, which shortens development time and ensures portability. If an application uses features unique to a particular camera, be sure to verify the system configuration both at install time and during runtime initialization. To ease migration of applications developed for the F200 camera to SR300 cameras, the SR300 DCM includes an F200 compatibility mode that allows legacy applications to run seamlessly on the later-model camera. However, be aware that legacy (pre-R5) apps that are not updated may fail to run on SR300 systems while other R5 or later applications are running. Finally, it is important to read all supporting SDK documentation thoroughly to understand the varying behavior of certain SDK functions with different camera models.

Resources

Intel RealSense SDK Documentation

https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?doc_devguide_introduction.html

SR300 Migration Guide

https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?doc_mgsr300_working_with_sr300.html

Appendix

SoftwareBitmapToWriteableBitmap Code Sample

// SoftwareBitmap is the UWP data type for images.
public SoftwareBitmapToWriteableBitmap(SoftwareBitmap bitmap, WriteableBitmap bitmap2)
{
    switch (bitmap.BitmapPixelFormat)
    {
        default:
            using (var converted = SoftwareBitmap.Convert(bitmap, BitmapPixelFormat.Rgba8))
                converted.CopyToBuffer(bitmap2.PixelBuffer);
            break;
        case BitmapPixelFormat.Bgra8:
            bitmap.CopyToBuffer(bitmap2.PixelBuffer);
            break;
        case BitmapPixelFormat.Gray16:
        {
            // See the UWP StreamViewer sample for all the code.
            ....
            break;
        }
    }
}

About the Author

Tion Thomas is a software engineer in the Developer Relations Division at Intel. He helps deliver leading-edge user experiences with optimal performance and power for all types of consumer applications with a focus on perceptual computing. Tion has a passion for delivering positive user experiences with technology. He also enjoys studying gaming and immersive technologies.

2016 Release: What's New in Intel® Media Server Studio


Achieve Real-Time 4K HEVC Encode, Ensure AVC & MPEG-2 Decode Robustness

Intel® Media Server Studio 2016 is now available! With a 1.1x performance and 10% quality improvement in its HEVC encoder, Intel® Media Server Studio helps transcoding solution providers achieve real-time 4K HEVC encode with broadcast quality on the Intel® Xeon® E3-based Intel® Visual Compute Accelerator and select Xeon® E5 processors.1 Robustness enhancements give extra confidence for AVC and MPEG-2 decode scenarios through seamless handling of broken content. See below for more details about new features that accelerate media transcoding.

As a leader in media processing acceleration and cloud-based technologies, Intel, through the power of Intel® processors and Intel® Media Server Studio, helps media solution providers, broadcasting companies, and media/infrastructure developers innovate and deliver advanced performance, efficiency, and quality for media applications and OTT/live video broadcasting.

Download Media Server Studio 2016 Now

Current Users (login required)  New Users: Get Free Community version, Pro Trial or Buy Now


 

Improve HEVC (H.265) Performance & Quality by 10%, Use Advanced GPU Analysis, Reduce Bandwidth

Professional Edition

  • With a 1.1x performance and 10% quality increase (compared to the previous release), media solution developers can achieve real-time 4K HEVC encode with broadcast quality on select Intel Xeon E5 platforms1 using the Intel HEVC software solution, and on the Intel® Visual Compute Accelerator (Intel® VCA)1 by leveraging the GPU-accelerated HEVC encoder.

  • Improve HEVC GPU-accelerated performance by offloading in-loop filters such as the deblocking filter (DBF) and sample adaptive offset (SAO) to the GPU (in prior releases these filters executed on the CPU).

Figure 1. The 2016 edition continues the rapid cadence of innovation with up to 10% improved video coding efficiency over the 2015 R7 version. In addition to delivering real-time 4K30 encode on select Intel® Xeon® E5 processors, this edition now provides real-time 1080p50 encode on previous generation Intel® Core™ i7 and Xeon E3 platforms.** HEVC Software/GPU Accelerated Encode Quality vs. Performance on 4:2:0, 8-bit 1080p content. Quality data is baseline to ISO HM14 (“0 %”) and computed using Y-PSNR BDRATE curves. Performance is an average across 4 bitrates ranging from low bitrate (avg 3.8Mbps) to high bitrate (avg 25 Mbps). For more information, please refer to Deliver High Quality and Performance HEVC whitepaper.

  • With Intel® VTune™ Amplifier advancements, developers can more easily collect and interpret graphics processor usage and performance data for OpenCL* and Intel® Media SDK-optimized applications. Includes CPU and GPU concurrency analysis, GPU usage analysis using hardware metrics, a GPU architecture diagram, and much more.

  • Reduce bandwidth when using the HEVC codec by running Region of Interest (ROI) based encoding, where the ROI is compressed less than its surroundings to preserve detail. This feature benefits video conferencing applications. It is enabled by setting the mfxExtEncoderROI structure in the application to specify ROIs during encoding, either at initialization or at runtime (see the sketch after this list).

  • Video Conferencing - Connect business meetings and people together more quickly via video conferencing with specially tuned low-delay HEVC mode.

  • Innovate for 8K - Don't limit your application to encoding streams at 4K resolution; Intel's HEVC codec in Media Server Studio 2016 now supports 8K in both the software and GPU-accelerated encoders.
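
As a rough illustration of the ROI feature mentioned above, the sketch below attaches an mfxExtEncoderROI buffer to the encoder's mfxVideoParam (here called videoParam). The region coordinates and priority value are placeholders, and the field names should be checked against the mfxstructures.h header shipped with your Media Server Studio version.

// Sketch: mark one region of interest so the encoder compresses it less than
// the surrounding picture. Coordinates and priority are placeholder values.
mfxExtEncoderROI roi = {};
roi.Header.BufferId = MFX_EXTBUFF_ENCODER_ROI;
roi.Header.BufferSz = sizeof(roi);
roi.NumROI = 1;
roi.ROI[0].Left     = 0;      // example region: top-left quadrant of a 1080p frame
roi.ROI[0].Top      = 0;
roi.ROI[0].Right    = 960;
roi.ROI[0].Bottom   = 540;
roi.ROI[0].Priority = 3;      // higher priority -> region is compressed less

mfxExtBuffer *extBuffers[] = { &roi.Header };
videoParam.ExtParam    = extBuffers;     // mfxVideoParam passed to Init() or Reset()
videoParam.NumExtParam = 1;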

Advance AVC (H.264) & MPEG-2 Decode & Transcode

Community, Essentials, Pro Editions

  • Advanced 5th generation graphics and media accelerators, plus custom drivers, unlock real-time, high-quality transcoding of up to 16 HD AVC streams per socket on Intel Xeon E3 v4 processors (or via Intel VCA) by taking advantage of hardware acceleration.

  • Achieve up to 12 HD AVC streams on Intel® Core™ 4th generation processors with Intel® Iris graphics**. 

  • Utilize improved AVC encode quality for BRefType MFX_B_REF_PYRAMID.

  • The AVC and MPEG-2 decoders are more robust than ever in handling corrupted streams and returning failure errors. Get extra confidence for AVC and MPEG-2 decode scenarios with increased robustness, recovery from corrupted output, and seamless handling of broken content. Advanced error reporting allows developers to better find and analyze decode errors (a minimal decode-loop sketch follows Figure 2).

Figure 2: In the 2016 version 40% performance gains are achieved in H.264 scenarios from improved hardware scheduling algorithms compared to the 2015 version.** This figure illustrates results of multiple H.264 encodes from a single H.264 source file accelerated using Intel® Quick Sync Video using sample multi_transcode (avail. in code samples). Each point is an average of 4 streams and 6 bitrates with error bars showing performance variation across streams and bitrates. Target Usage 7 (“TU7”) is the highest speed (and lowest quality) operating point. [1080p 50 content was obtained from media.xiph.org/video/derf/: crowd_run, park_joy (30mbps input; 5, 7.1, 10.2, 14.6, 20.9, 30 mbps output; in_to_tree, old_town_cross 15 mbps input, 2.5, 3.5, 5.1, 7.3, 10.4, 15 mbps output]. Configuration: AVC1→N Multi-Bitrate concurrent transcodes, 1080p, TU7 preset, Intel® Core™ i7-4770K CPU @ 3.50GHz ** Number of 1080p Multi-bitrate channels.
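
To show what reacting to these status codes can look like, the following is a minimal decode-loop sketch built on the public Media SDK decode API. The session, bitstream, and workSurface variables, the 60-second sync timeout, and the handling policy in each branch are illustrative assumptions; session setup, surface allocation, and stream draining are omitted.

// Sketch: core of a decode loop that inspects Media SDK status codes instead
// of assuming every call succeeds.
mfxSyncPoint syncp = NULL;
mfxFrameSurface1 *out = NULL;
mfxStatus sts = MFXVideoDECODE_DecodeFrameAsync(session, &bitstream,
                                                workSurface, &out, &syncp);
switch (sts) {
case MFX_ERR_MORE_DATA:           /* feed more bitstream data */            break;
case MFX_ERR_MORE_SURFACE:        /* supply another working surface */      break;
case MFX_WRN_DEVICE_BUSY:         /* wait briefly, then retry the call */   break;
case MFX_WRN_VIDEO_PARAM_CHANGED: /* stream parameters changed; re-query */ break;
default:
    if (sts < MFX_ERR_NONE) {
        // Hard decode error (for example, a corrupted stream): log the status,
        // then decide whether to skip ahead or stop decoding.
    } else if (syncp) {
        MFXVideoCORE_SyncOperation(session, syncp, 60000);  // wait for the decoded frame
    }
    break;
}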

Other New and Improved Features

  • Improvements in the Intel® SDK for OpenCL™ Applications for Windows include new features for kernel development.

  • Added support for CTB-level delta QP for all quality presets (Target Usage 1 through 7), all rate control modes (CBR, VBR, AVBR, ConstQP), and all profiles (MAIN, MAIN10, REXT).

  • Support for encoding an IPPP...P stream (that is, no B frames) by using the Generalized P and B control, for applications where B frames are dropped to meet bandwidth limitations (see the sketch after this list).

  • H.264 encode natively consumes ARGB surfaces (captured from screen/game) and YUY2 surfaces, which reduces preprocessing overhead (i.e. color conversion from RGB4 to NV12 for the Intel® Media SDK to process), and increases screen capture performance.
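
As a rough sketch of the IPPP configuration referenced above, the parameters below disable B frames via GopRefDist; the GPB flag shown is the mfxExtCodingOption3 control associated with generalized P/B handling and should be verified against the API level of your SDK headers.

// Sketch: configure an encode with no B frames (IPPP...P GOP structure).
mfxVideoParam par = {};
par.mfx.CodecId    = MFX_CODEC_HEVC;
par.mfx.GopRefDist = 1;                   // distance of 1 between anchors => no B frames

// Optional: explicitly control generalized P/B behavior (confirm the GPB field
// exists in your mfxExtCodingOption3 definition before using it).
mfxExtCodingOption3 co3 = {};
co3.Header.BufferId = MFX_EXTBUFF_CODING_OPTION3;
co3.Header.BufferSz = sizeof(co3);
co3.GPB = MFX_CODINGOPTION_OFF;           // encode plain P frames rather than generalized B

mfxExtBuffer *ext[] = { &co3.Header };
par.ExtParam    = ext;
par.NumExtParam = 1;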
     

Save Time by Using Updated Code Samples 

  • Major features have been added to sample_multi_transcode by extending the pipeline with multiple VPP filters such as composition, denoise, detail (edge detection), frame rate control (FRC), deinterlace, and color space conversion (CSC).

  • sample_decode in the Linux sample package has DRM-based rendering, which can be enabled with the input argument "-rdrm". sample_decode and sample_decvpp are now merged into the decode sample, with new VPP filters such as deinterlace and color space conversion added.
     

For More Information

The above notes are just the top level features and enhancements in Media Server Studio 2016. Access the product site and review the various edition Release Notes for more details.


1 See Technical Specifications for more details.

**Baseline configuration: Intel® Media Server Studio 2016 Essentials vs. 2015 R7, R4 running on Microsoft Windows* 2012 R2. Intel Customer Reference Platform with Intel® Core-i7 4770k (84W, 4C,3.5GHz, Intel® HD Graphics 4600). Intel Z87KL Desktop board with Intel Z87LPC, 16 GB (4x4GB DDR3-1600MHz UDIMM), 1.0TB 7200 SATA HDD, Turbo Boost Enabled, and HT Enabled. Source: Intel internal measurements as of January 2016.


Read Intel® RealSense™ Camera Streams with New MATLAB® Adaptor Code Sample


Download Code Sample 

Introduction

The downloadable code sample demonstrates the basics of acquiring raw camera streams from Intel® RealSense™ cameras (R200 and F200) in the MATLAB® workspace using the Intel® RealSense™ SDK and MATLAB’s Image Acquisition Toolbox™ Adaptor Kit. This code sample enables MATLAB developers to build Intel® RealSense™ applications for Intel® platforms and has the following features:

  • Multi-stream synchronization. Color stream and depth stream can be acquired simultaneously (see Figure 1).
  • Multi-camera support. Raw streams can be acquired from multiple cameras simultaneously.
  • User adjustable properties. This adaptor supports video input with different camera-specific properties.
Figure 1. Raw Intel® RealSense™ camera (F200) color and depth streams in the MATLAB* figure.


Software Development Environment

The code sample was created on Windows 8* using Microsoft Visual Studio* 2013. The MATLAB version used in this project was MATLAB R2015a.

The SDK and Depth Camera Manager (DCM) version used in this project were:

  • Intel RealSense SDK V7.0.23.8048
  • Intel RealSense Depth Camera Manager F200 V1.4.27.41944
  • Intel RealSense Depth Camera Manager R200 V2.0.3.53109

Hardware Overview

We used the Intel® RealSense™ Developer Kit (F200) and Intel RealSense Developer Kit (R200).

About the Code

This code sample can be built into a dynamic link library (DLL) that implements the connection between the MATLAB Image Acquisition Toolbox™ and Intel RealSense cameras via the Intel RealSense SDK. Figure 2 shows the relationship of this adaptor to the MATLAB and Intel RealSense cameras. The Image Acquisition Toolbox™ is a standard interface provided by MATLAB to acquire images and video from imaging devices.

Figure 2. The relationship of the adaptor to the MATLAB* and Intel® RealSense™ cameras.


The MATLAB installation path I used was C:\MATLAB and the SDK installation path was C:\Program Files (x86)\Intel\RSSDK. Note that the include directories and library directories will need to be changed if your SDK and MATLAB installation paths are different. You will also need to set an environment variable MATLAB in system variables that contains the name of your MATLAB installation folder.

The file location I used for the entire code sample, RealSenseImaq, was C:\My_Adaptor\RealSenseImaq. The RealSenseImaq solution can also be found under this directory. The RealSenseImaq solution consists of two projects:

  • The imaqadaptorkit is an adaptor kit project provided by MATLAB to make it easier to refer to some adaptor kit files in MATLAB. The file location of this project is: <your_matlab_installation_directory>\R2015a\toolbox\imaq\imaqadaptors\kit
  • The RealSenseImaq is an adaptor project that acquires the raw camera streams. The color and depth data from multiple cameras can be acquired simultaneously. It also contains functions to support video input with different camera-specific properties.

How to Run the Code

To build the DLL from this code sample:

  • First run Microsoft Visual Studio as administrator and open the RealSenseImaq solution. You must ensure that “x64” is specified under the platform setting in the project properties.
  • To build this code sample, right-click the project name RealSenseImaq in the solution explorer, then select it as the startup project from the menu option and build it.
  • For users who are MATLAB developers and not interested in the source code, a prebuilt DLL can be found in the C:\My_Adaptor\RealSenseImaq\x64\Debug\ folder. Note that the DLL directory will need to be changed if you put the code sample in a different location.

To register the DLL in the MATLAB:

  • You must inform the Image Acquisition Toolbox software of the DLL’s existence by registering it with the imaqregister function. The DLL can be registered by using the following MATLAB code:

imaqregister('<your_directory>\RealSenseImaq.dll');

  • Start MATLAB and call the imaqhwinfo function. You should be able to see the RealSenseImaq adaptor included in the adaptors listed in the InstalledAdaptors field.

To run the DLL in the MATLAB:

Three MATLAB scripts that I created are located under the code sample directory C:\My_Adaptor\RealSenseImaq\matlab.

To start to run the DLL in MATLAB, use the scripts as follows:

  • MATLAB script “test1” can be used to acquire raw F200 color streams in MATLAB.
  • Raw color and depth streams from the Intel RealSense camera (F200) can be acquired simultaneously by using the MATLAB script “test2” (see Figure 1).
  • You can also use this adaptor to adjust the camera-specific property and retrieve the current value of the property. For example, the MATLAB script “test3” in the code sample file can be used to retrieve the current value of color brightness and adjust its value.

Check It Out

Follow the download link to get the code.

About Intel® RealSense™ Technology

To get started and learn more about the Intel RealSense SDK for Windows, go to https://software.intel.com/en-us/intel-realsense-sdk.

About MATLAB®

MATLAB is the high-level language and interactive environment to let you explore and visualize ideas and collaborate across disciplines. To learn more about the MATLAB, go to http://www.mathworks.com/products/matlab/.

About the Author

Jing Huang is a software application engineer in the Developer Relations Division at Intel. She is currently focused on the performance of Intel RealSense SDK applications on Intel platforms and has an extensive background in video and image processing and computer vision, mostly applied to medical imaging and multi-camera applications such as video tracking and video classification.

Get Amazing Intel GPU Acceleration for Media Pipelines


Online webinar: March 19, 9 a.m. (Pacific time)

Register NOW 

Media application developers unite! Accessing the heterogeneous capabilities of Intel® Core™ and Intel® Xeon® processors1 unlocks amazing opportunities for faster performance utilizing some of the most disruptive and rapidly improving aspects of Intel processor design.

Ensure that your media applications and solutions aren't leaving performance options untapped. Learn tips and tricks for adding hardware acceleration to your media code with advanced Intel media software tools in this webinar:

Get Amazing Intel GPU Acceleration for Media Pipelines
March 30, 9 a.m. (Pacific time) - Sign up today

And what’s even better than that? Many of these options and tools are FREE - and already integrated into popular open source frameworks like FFmpeg and OpenCV (more details are below). 

Intel’s amazing GPU capabilities are easy to use, with an awesome set of tools to help you capture the best performance, quality, and efficiency from your media workloads. This overview includes:

  • Intel GPU capabilities and architecture
  • Details on Intel's hardware accelerated codecs 
  • How to get started with rapid application development using FFmpeg and OpenCV (it can be easy!)
  • How to get even better performance by programming directly to Intel® Media SDK and Intel® SDK for OpenCL™ Applications
  • H.264 (AVC) and H.265 (HEVC) capabilities
  • Brief tools introduction, and more!

Register Now

Figure 1.  CPU/GPU Evolution

Figure 1 shows how Intel's graphics processor (GPU) has taken on increasing importance and prominence with each generation of Intel Core processor. With potential video performance measured by the number of execution units (EUs), you can see how quickly the processor graphics have grown from only 12 EUs to 72 EUs.

 

Advanced Media Software Tools & Free Downloads  

 

Webinar Speakers



Future Webinars, Connect with Intel at Upcoming Events

More webinar topics are planned later this year; watch our site for updates. See Intel media acceleration tools and technologies in action, meet with Intel technical experts at:

 

1 See hardware requirements for technical specifications.
