
Tutorial for Intel® DAAL : Using simple C++ examples


System Environment

Intel® DAAL version : 2016 Gold Initial Release (w_daal_2016.0.110.exe)

OS : Windows 8.1

IDE : Visual Studio 2013

 

Overview

 Intel® DAAL is a part of Intel® Parallel Studio XE 2016, a developer toolkit for HPC and technical computing applications. Intel® DAAL is a powerful library for big data developers that turns large data clusters into meaningful information with advanced analytics algorithms.

 In this tutorial, we will see how to build and run Intel® DAAL C++ examples included in the package.
 

Finding the examples

 By default, the examples are located at

c:\Program Files (x86)\intelSWTools\compilers_and_libraries\windows\daal\examples

In the 'examples' directory, there are example projects in C++ and Java. The example data for the projects, organized according to the three processing modes (batch, distributed, and online), is located in the 'data' directory.

It is recommended to copy the examples to a location that does not require administrator access, so that they are easy to modify, build, and run.

 

Building the examples

  First, open the 'DAALExamples.sln' solution file located in

 <DAAL Examples>\cpp

Select all projects and open 'Properties' -> click 'Configuration Manager' at the upper right corner -> change the active solution platform to match your platform (e.g. Win32 -> x64).

As the next step, we need to enable DAAL for the projects: Properties -> Intel Performance Libraries -> Use Intel DAAL -> choose 'Default Linking Method'.

Click the Build menu -> Build Solution (or Ctrl + Shift + B).

 

Running the examples

There are a couple more steps to take before running DAAL applications.

First, to set up the DAAL environment variables, run 'daalvars.bat <arch>'
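For example, on a 64-bit system (daalvars.bat is typically found in the 'bin' directory of the DAAL installation; the architecture argument is either ia32 or intel64):

daalvars.bat intel64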

Second, add the redistributable runtime DLLs to PATH:

PATH=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016.0.110\windows\redist\intel64_win\daal;
C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016.0.110\windows\redist\intel64_win\compiler;
C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016.0.110\windows\redist\intel64_win\tbb\vc_mt;%PATH%

 

 Now go to <examples>\cpp and run one of the examples


Tutorial for Intel® DAAL : Using simple Java examples


System Environment

Intel® DAAL version : 2016 Gold Initial Release (w_daal_2016.0.110.exe)

OS : Windows 8.1

 

 

Overview

 Intel® DAAL is a part of Intel® Parallel Studio XE 2016, a developer toolkit for HPC and technical computing applications. Intel® DAAL is a powerful library for big data developers that turns large data clusters into meaningful information with advanced analytics algorithms.

 In this tutorial, we will see how to build and run Intel® DAAL Java examples included in the package.

For the C++ examples, please refer to: Tutorial for Intel® DAAL : Using simple C++ examples

Finding the examples

 By default, the examples are located at

c:\Program Files (x86)\intelSWTools\compilers_and_libraries\windows\daal\examples

In the 'examples' directory, there are example projects in C++ and Java. The example data for the projects, organized according to the three processing modes (batch, distributed, and online), is located in the 'data' directory.

It is recommended to copy the examples to a location that does not require administrator access, so that they are easy to modify, build, and run.

 

Building the examples

First, to set up the DAAL environment variables, run 'daalvars.bat <arch>'

Second, we need to set up the PATH for the location of 'javac.exe'. Type

set JAVA_HOME=<your JDK location>

set PATH=%JAVA_HOME%\bin;%PATH%

Now go to <examples>\java and type

launcher.bat <arch> build
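For example, a complete 64-bit build session might look like the following (the JDK path is only an illustration; use your own installation path):

set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_60
set PATH=%JAVA_HOME%\bin;%PATH%
cd <examples>\java
launcher.bat intel64 build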

 

Then .class files will be generated at

<examples>\java\com\intel\daal\examples\association\rules

 

Running the examples

 

To run the examples, add the redistributable runtime DLLs to PATH:

PATH=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016.0.110\windows\redist\intel64_win\daal;
C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016.0.110\windows\redist\intel64_win\compiler;
C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016.0.110\windows\redist\intel64_win\tbb\vc_mt;%PATH%

 Now go to <examples>\java and run one of the examples by typing

launcher.bat <arch> run
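For example, for the Intel® 64 architecture:

launcher.bat intel64 run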

The results from the examples will be saved at

<examples>\java\_results

Each example generates its own result file, and you can see the result by opening the corresponding .res file.


 

 

Intel System Studio Matrix Multiplication Sample


This is a "matrix multiplication" example that illustrates different features of Intel® System Studio on Microsoft* Visual Studio* IDE, Eclipse* IDE and on Yacto* Target System

By downloading or copying all or any part of the sample source code, you agree to the terms of the Intel® Sample Source Code License Agreement.

Windows* System : system_studio_sample_matrix_multiply.zip(829 KB)

Linux* System : system_studio_sample_matrix_multiply.tar.gz (1380 KB)

This package contains four samples that demonstrate the use of the Intel® C++ Compiler, Intel® VTune™ Amplifier for Systems, Intel® Cilk™ Plus, and Intel® MKL:

  • Using Intel® C++ Compiler for Systems to get better performance
  • Using Intel® VTune™ Amplifier for Systems to identify performance bottleneck
  • Using Intel® Cilk™ Plus to parallelize the application
  • Using optimized functions from Intel® Math Kernel Library

Intel® System Studio Samples and Tutorials


Intel® System Studio is a comprehensive and integrated tool suite that provides developers with advanced system tools and technologies to help accelerate the delivery of the next-generation, energy-efficient, high-performance, and reliable embedded and mobile devices.

We have created a list of samples demonstrating different features of Intel System Studio, along with tutorials that show how to use these features in your applications.

By downloading or copying all or any part of the sample source code, you agree to the terms of the Intel® Sample Source Code License Agreement.

Samples

Sample Code Name

Description

Hello World

This is a simple "Hello World" example that illustrates how to set up the environment to build embedded application with Intel Compiler (ICC) for Windows*, Linux* Host and Yocto* Linux* Target , in various usage models like command line, IDEs.

Matrix Multiplication

This is a "matrix multiplication" example that illustrates different features of Intel® System Studio like Intel® C/C++ Compiler, Intel® MKL, Intel® VTune Amplifier and Intel® Cilk Plus.

System Trace – a sample trace

A sample trace file (sampleTrace.tracecpt) is included in this Intel System Debugger NDA package. The sample trace was collected from a real Intel® Skylake machine and includes multiple trace packet types, such as BIOS, CSME, TSCU, and global error packets. Before you start using the System Trace tool (an Eclipse plugin) to debug your own system issues, this sample trace can help you become familiar with the UI operations and functionality that the tool provides, such as searching for keywords, opening a new field, and exporting partial logs.

Processor Trace Sample 

Intel® Processor Trace is hardware-based, low-overhead logging of code execution at the instruction level. It provides powerful, deep insight into past instruction flow, combined with interactive debug.

Image Blurring and Rotation

This tutorial demonstrates how to:

  •  Implement box blurring of an image with the Intel IPP filtering functions
  •  Rotate an image with the Intel IPP functions for affine warping
  •  Set up environment to build the Intel IPP application
  •  Compile and link your image processing application

 

Averaging Filter(Image Processing)

An averaging filter is commonly used in the field of image processing, mainly to remove noise from a given image. This sample demonstrates how to increase the performance of an averaging filter using Intel® Cilk™ Plus. Both threading and SIMD solutions are explored in the performance tuning, and their corresponding contributions to the speedup are evaluated.

Discrete Cosine Transforms(DCT)

Discrete Cosine Transform (DCT) and quantization are the first two steps in the JPEG compression standard. This sample demonstrates how the DCT and quantization stages can be implemented to run faster using Intel® Cilk™ Plus.

Image Processing: Sepia Filter

A sepia-tone image is a monochromatic image with a distinctive brown-gray color, the tone that black-and-white film gave to photographs of its era. The program works by converting each pixel in a bitmap file to a sepia tone. This sample demonstrates how to improve the performance of the sepia filter using Intel® Cilk™ Plus. To demonstrate the performance increase, you will use a program that converts a color bitmap image into a sepia-tone image.

Tutorials

Title/Link to Tutorial demo

Description 

Using Intel® C++ Compiler for Embedded Linux Systems

The Intel® C++ Compiler, also known as icc, is a high-performance compiler that lets you build and optimize your C/C++ applications for Linux*-based operating systems. Embedded system development is, in most cases, cross-platform development: application development normally requires cross-compilation, which involves a host compilation system and a target embedded system. The Intel® C++ Compiler fully supports cross-platform compilation as well.

Intel® VTune™ Amplifier for Systems Usage Models

Intel® VTune™ Amplifier for Systems is a software performance analysis tool for developers of serial and multithreaded applications on embedded and mobile systems. VTune Amplifier supports multiple usage modes for various target systems, depending on your development environment and target environment. In this article, we describe the VTune Amplifier usage modes and the recommended modes for different target systems.

Signal Processing Usage for Intel® System Studio – Intel® MKL vs. Intel® IPP

Employing performance libraries can be a great way to streamline and unify the computational execution flow for data intensive tasks, thus minimizing the risk of data stream timing issues and heisenbugs. Here we will describe the two libraries that can be used for signal processing within Intel® System Studio.

Debugging an Intel® Quark SoC based target platform using OpenOCD*

This tutorial will help you understand how to set up an OpenOCD*-based connection to Intel Quark based target systems and how to use Intel System Studio to debug system software.

 

Intel® System Studio 2016 - What's New

$
0
0

What's New in Intel® System Studio 2016

  • Support for new platforms based on Airmont, Intel® Quark™, Edison, and SoFIA across various components.
  • Intel® C++ Compiler:
    • Enhanced C++11 feature support
    • Enhanced C++14 feature support
    • FreeBSD* support
  • Intel® VTune Amplifier for Systems:
    • Basic Hotspots, Locks and Waits and hardware event-based stack sampling collection supported for RT kernel and RT applications for Linux* targets
    • Hardware event-based stack sampling collection supported for kernel-mode threads
    • Support for Intel® Atom™ x7 Z8700 & x5 Z8500/X8400 processor series (Cherry Trail) including GPU analysis
    • KVM guest OS profiling based on the Linux* Perf tool
    • Analysis of applications in a virtualization environment (KVM) for Linux* kernels  (version 3.2 and higher) and QEMU (version 1.4 and higher)
    • Remote event-based sampling analysis on SoFIA  leveraging an existing sampling driver on the target
  • Intel® Threading Building Blocks (Intel® TBB):
    • Several C++11 improvements
    • Added 64-bit Android* support
  • Intel® Integrated Performance Primitives (Intel® IPP):
    • Extended optimization for Intel® Atom™ processors in the Computer Vision and Image Processing functions
    • Added optimization for Intel® Quark™ processors to the Cryptography functions
  • Intel® Math Kernel Library (Intel® MKL):
    • New ?GEMM_BATCH and (C/Z)GEMM3M_BATCH functions for performing multiple independent matrix-matrix multiply operations
    • New C-language version of the Intel® MKL reference manual
  • Intel® System Debugger:
    • Support for new platforms based on Airmont microarchitecture: Moorefield (Z35XX), Cherrytrail (Z8700), Braswell (N3700)
    • New supported targets: 6th Generation Intel® Core™ Processor Family, Intel® 100 Series Chipset.
  • For 6th Generation Intel® Core™ Processor Family :
    • Intel® Debug Extensions for WinDbg* with Intel® Processor Trace support and JTAG debug support
    • System Trace support for Intel® Trace Hub
    • Intel® Debugger for Heterogeneous Compute
    • The debugger supports 64-bit host OS systems only and requires a 64-bit Java* Runtime Environment (JRE) to operate. See System Debugger release notes for more details.
  • The installation directories structure has changed. Several components link to common directories which are shared with other Intel® Software Development Products. 

Get Help or Advice

Getting Started?
Click the Learn tab for guides and links that will quickly get you started.
Support Articles and White Papers – Solutions, Tips and Tricks

Resources
Documentation
Training Material

Support

We are looking forward to your questions and feedback. Please don't hesitate to escalate any questions you have or issues you run into. We thank you for helping us to continuously improve Intel® System Studio

Intel® Premier Support – (registration is required) - For secure, web-based, engineer-to-engineer support, visit our Intel® Premier Support web site. Intel Premier Support registration is required. Once logged in search for the product name Intel® System Studio.

Please provide feedback at any time:

Intel RealSense SDK - First Contact


In this article we will take a first look at the Intel® RealSense™ SDK, trying to understand what it is, what its characteristics are, and how it can be useful to us.
The article refers to version 6.0.x of the Intel® RealSense™ SDK, currently downloadable at https://software.intel.com/en-us/intel-realsense-sdk/download

The 3D Cameras

For anyone completely new to Intel® RealSense™, it is, in short, a hardware and software platform that enables the creation of next-generation immersive applications that can exploit the concept of the Natural User Interface (gestures, voice, posture, and so on) to provide the user with an advanced UX.
We talk about hardware because the core of the whole platform is a pair of 3D cameras, namely the Intel® RealSense™ Camera F200 and R200 models.
The F200, shown in the following photo, is a front-facing camera, that is, a camera that can be used directly by the user because it is positioned facing them (hence the F prefix in the name).

The R200, on the other hand, is a world-facing camera, or rear camera (hence the R at the beginning of the name), that is, a camera placed on the back of a device (typically a tablet) that can therefore frame the surrounding world.

The F200 uses a technology based on infrared emission to obtain a 3D image of the environment, while the R200 uses stereoscopic vision (that is, it has two ordinary cameras) and reconstructs the surrounding environment from the differences between the two off-axis views.
The SDK can work with both, even though the two cameras have different characteristics, provide different functionality, and target different scenarios.
More information about the cameras (and the option to purchase them) is available at https://software.intel.com/en-us/intel-realsense-sdk/download.
There are already products on sale that include miniaturized versions of these cameras; a complete list can be found at http://www.intel.com/content/www/us/en/architecture-and-technology/realsense-devices.html
Finally, the F200 enables Windows Hello on the new Windows 10 operating system.

Installing the SDK

Using the URL given above, we can download the package containing the SDK installer, or we can use a web installer instead.
I recommend downloading the full package so that you can use it several times without tying up your internet connection to download it again each time.
The offline package takes up more than a gigabyte of disk space (about 1.3 GB), but it contains a great deal of material, so all that space is justified.
Once the package is run, the installer extracts everything it needs before showing the screen that lets us select what we want to install.


The installation package lets us manually select the features we are interested in, or provides profiles for developing against one of the two cameras in particular.

Once we have selected what we need, we can proceed with the actual installation of the SDK and all of its features.
When the installation is complete, we have a folder on the desktop in which we can find:

  • A Documentation folder with PDF, CHM, and HTML files that help with development;
  • A Samples folder containing example applications (both as executables and as source code) in the languages and development platforms supported by the Intel® RealSense™ SDK;
  • A Tools folder containing executables that let us immediately check whether a camera attached to our PC is working correctly.

Among these, I would mention Camera Explorer, which lets you verify the camera's video stream and depth stream:

and SDK Information Viewer, which lets us check information such as the installed SDK version, the characteristics of the system and of the cameras, and much more.

Remember to also download and install the Depth Camera Manager (DCM) for the camera you are using. The DCM is a Windows service that allows several applications developed with the SDK, plus one application that does not use the SDK, to access the camera's data sources simultaneously without interfering with each other.
The DCM is separate from the SDK because it is specific to the camera in use (a sort of driver), whereas the SDK is camera independent.
Among other things, the DCM also allows the camera firmware to be updated if needed.

Hardware Requirements

To conclude this article, let's look at the minimum hardware requirements for the PC on which we intend to use the SDK.

  • 4th generation Intel® Core™ processor (or later)
  • 8 GB of free hard disk space
  • Microsoft Windows* 8.1-10, 64-bit, in desktop mode
  • USB 3.0 port for the camera

Cordova Whitelisting with Intel XDK for AJAX and launching external apps


Cordova CLI 5.1.1

Starting with Cordova CLI 5.1, the security model that uses domain whitelisting to restrict an app's access to other domains has changed. By default, Cordova apps are now configured to allow access to any site, but it is recommended that, before you move your app to production, you provide a whitelist of the domains that your app should have access to.

Starting from Cordova Android 4.0 and Cordova iOS 4.0, the security policy is extended through the Whitelist Plugin. For other platforms, Cordova uses the W3C Widget Access specification for domain whitelisting.

The Whitelist Plugin uses three different whitelists plus a Content Security Policy (CSP).

Navigation Whitelist :

The navigation whitelist controls which URLs the WebView can be navigated to. (Only top-level navigations are allowed; the exception is Android, where it also applies to iframes for non-http(s) schemes.) By default, you can only navigate to file:// URLs. To allow other URLs, the <allow-navigation> tag is used in the config.xml file. With the Intel XDK you need not specify this in config.xml; the Intel XDK automatically generates config.xml from the Build Settings.

In the Intel XDK you specify the URL that you would like the Webview to be navigated to under Build Settings > Android > Cordova CLI 5.1.1 > Whitelist > Cordova Whitelist > Navigation. For example : http://google.com

CLI5.1.1AndroidNavigation.png

Intent Whitelist:

The intent whitelist controls which URLs the app is allowed to ask the system to open. By default, no external URLs are allowed. This applies only to hyperlinks and calls to window.open(). The app can open a browser (for http:// and https:// URLs) or other apps such as phone, SMS, email, and maps. To allow the app to launch external apps through a URL, or to launch the inAppBrowser through window.open(), the <allow-intent> tag is used in config.xml; but again, you need not specify this in config.xml, as the Intel XDK takes care of it through the Build Settings.

In the Intel XDK, specify the URL you want to whitelist for external applications under Build Settings > Android > Cordova CLI 5.1.1 > Whitelist > Cordova Whitelist > Intent. For example: http://example.com, tel:*, or sms:*

CLI5.1.1AndroidIntent.png

Network Request Whitelist:

The network request whitelist controls which network requests, such as content fetching or AJAX (XHR), are allowed to be made from within the app. For WebViews that support CSP, it is recommended that you use CSP; this whitelist is for older WebViews that do not support it. The whitelist is defined in config.xml using the <access origin> tag, but once again, in the Intel XDK you provide the URL under Build Settings > Android > Cordova CLI 5.1.1 > Whitelist > Cordova Whitelist > Network Request. For example: http://mywebsite.com

By default, only requests to file:// URLs are allowed, but Cordova applications by default include access to all websites. It is recommended that you provide your whitelist before publishing your app.

CLI5.1.1AndroidNetwork.png
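For reference, the corresponding entries in the generated config.xml look roughly like the following (using the example URLs from above; this is an illustrative sketch, not the exact file the Intel XDK produces):

<allow-navigation href="http://google.com/*" />
<allow-intent href="http://example.com/*" />
<allow-intent href="tel:*" />
<allow-intent href="sms:*" />
<access origin="http://mywebsite.com" />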

Content Security Policy:

The Content Security Policy controls which network requests, such as images and AJAX requests (XHR), are allowed to be made directly from the WebView. It is specified through meta tags in your HTML files, and it is recommended that you use the CSP <meta> tag on all of your pages. Android supports CSP from KitKat onwards, but the Crosswalk WebView supports CSP on all Android versions.

For example, include meta tags like these in your index.html file:

<meta http-equiv="Content-Security-Policy" content="default-src 'self' data: gap: https://ssl.gstatic.com; style-src 'self' 'unsafe-inline'; media-src *">
<meta http-equiv="Content-Security-Policy" content="default-src 'self' https:">

Important Note:

As of Intel XDK release 2496, Cordova iOS 4.0 has not been released yet, so for iOS the W3C Widget Access policy is used. The settings in the Intel XDK for whitelisting URLs are as follows.

iOS W3CWidgetAcess CLI 5.1.1

 

For the Windows platforms, the W3C Widget Access standard is also used, and the build settings for whitelisting are as follows.

 

iOS W3CWidgetAcess CLI 5.1.1

Cordova CLI 4.1.2

To use whitelisting with Cordova CLI 4.1.2, please follow this article.

Asteroids and DirectX* 12: Performance and Power Savings


Download Code Sample

The asteroids sample that Intel developed is an example of how to use the Microsoft DirectX* 12 graphics API to achieve performance and power benefits over previous APIs and was initially shown at SIGGRAPH 2014. Now that DirectX 12 is public, we are releasing the source code for the sample. In it, we render a scene of 50,000 fully dynamic and unique asteroids in two modes: maximum performance and maximum power saving. The application can switch between using the DirectX 11 and DirectX 12 APIs at the tap of a button.

All of the results here were captured on a Microsoft Surface* Pro 3 when the tablet was running in a steady, thermally constrained state. This state represented the experience of playing a demanding game for more than 10–15 minutes.

Performance

In the performance mode the application is allowed to run as fast as possible within the thermal and power constraints of the platform. Using DirectX 11, we see the following:

frame rate and the distribution of power between the CPU and GPU

The image shows the frame rate (top left) and the distribution of power between the CPU and GPU. Toggling the demo to run on DirectX 12 shows a significant improvement.

Performance with DirectX 12 increases ~70 percent (from 19 FPS to 33 FPS).

Performance with DirectX 12 increases ~70 percent (from 19 FPS to 33 FPS). The power graph explains why this is happening. DirectX 12 is designed for low-overhead, multithreaded rendering. Using the new API we reduced the CPU power requirement, thus freeing that power for the GPU.

Power

To directly compare the power savings of DirectX 12 to another API, we also support a mode that locks the frame rate so that the demo does the same amount of work in each API. Toggling from DirectX 11 (on the left half of the power graph) to DirectX 12 (on the right) while keeping the workload fixed, we see the following:

DirectX 12 uses less than half the CPU power when compared to DirectX 11

Rendering the same scene with DirectX 12 uses less than half the CPU power when compared to DirectX 11, resulting in a cooler device with longer battery life.

These increases in power efficiency in DirectX 12 are due both to reduced graphics submission overhead and an increase in multithreaded efficiency. Spreading work across more CPU cores at a lower frequency is significantly more power efficient than running a single thread at high frequency.

Summary

With this demo, we have shown that DirectX 12 enables significant improvements in both power and performance. This demo was created to show the two extremes: fixed power and fixed workload. In reality developers can choose any desired blend of power and performance for their applications.

The main takeaway is that power and performance are inseparably linked. Conventional notions of "CPU versus GPU bound" are misleading on modern devices like the Surface Pro 3. An increase in CPU power efficiency can be used for more performance even if an application is not "CPU bound."

More Information

GitHub: https://github.com/GameTechDev/asteroids_d3d12

DirectX Developer Blog: http://blogs.msdn.com/b/directx/
DirectX 12 Twitter feed: https://twitter.com/DirectX12
Intel Software Twitter feed: https://twitter.com/IntelSoftware

Intel technologies may require enabled hardware, specific software, or services activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

For more information go to http://www.intel.com/performance.


MagLens*: A New Perspective on Information Discovery


By Benjamin A. Lieberman, PhD

One of the biggest challenges in a data-rich world is finding information of relevance, particularly when the information you’re seeking is visual in nature. Imagine looking for a specific image file but being limited to a text search of descriptions of the image or laboriously scanning thumbnail pictures one by one. How can you know if the image file was properly categorized? Or categorized at all? What if you need to pick out a single image from tens of thousands of other, similar images?

The engineers at Intel have developed the Intel® Magnifying Lens Tool, an innovative and exciting approach to solving this problem, and led the effort to develop the first web app based on this approach, MagLens. The MagLens technology shows great promise for changing the way individuals approach mass storage of personal information, including images, text, and video. The technology is part of an ongoing effort at Intel to change the way information is captured, explored, and used.

Images and Files Everywhere

Today, most people carry a camera multiple hours a day in the form of a smartphone. More and more of our personal content is online—text, video, pictures, books, movies, social media, and more. The list seems to grow with each passing day.

Users are increasingly storing their files and content in the cloud. Service providers like Amazon, Apple, and Google are all making this migration easier, safer, and less expensive. Personal content is available 24 hours a day, 7 days a week, 365 days a year from practically any mobile device.

Unfortunately, old-style storage structures based on hierarchical folders present a serious barrier to the optimized use of data. Specifically, the classical file storage structures are prone to poor design, mismanagement, neglect, and misuse. Unless users are well organized, these file shares rapidly become a dumping ground, much like books stacked in a pile (Figure 1). The books in the image are clearly organized but not terribly usable. As a result, it becomes increasingly difficult to locate relevant information in these “stacks,” particularly when the current techniques of textual search are applied to visual content.

I know it must be in here somewhere

Figure 1. I know it must be in here somewhere...

In the exploding cloud-storage space, it's now possible to store tens of thousands of images and related files online. These files typically accumulate over time and are often as haphazardly stored as they are created. Unlike a well-regulated research library, little or no metadata (for example, date, location, subject matter, or topic) is created with the images, making a textual search all but impossible. How can you find the file you want when it could be anywhere, surrounded by virtually anything?

Rapidly scanning this vast forest of files for a single image is a daunting task. Consider the steps involved. First is the desire to locate information of interest: What differentiates that information from other information like it? Next, you attempt to remember where that information was stored. Was it on a local drive or a cloud server; if it’s in the cloud, which service? What was the folder structure? Or is there just a collection of multiple files all stored in one place? Now how do you recognize the file of interest? It’s doubtful that the file name will be helpful (which may be something automatically generated and cryptic, such as 73940-a-200.jpg), and thumbnail images are difficult to see clearly, even on a high-definition display. What’s required is some method to rapidly scan the stored images for specific patterns of shape and color by leveraging our highly evolved visual sense.

MagLens Expands the Options for Discovery

Many years of research in neurobiology and cognitive science have shown that human thinking and pattern recognition are geared toward the visual space. Our brains have evolved to handle the complex (and necessary-to-survival) problem of determining whether the fuzzy shape behind the next clump of bushes is a large, carnivorous animal that would like to invite us over for a quick bite. These days, we’re faced with choices that have somewhat less terminal outcomes, such as finding a set of photos from our most recent vacation. Nevertheless, we would be aided in our task if we could use our well-honed pattern-discovery and matching efficiently.

The Intel® Magnifying Lens Tool is both simple and profound. Like many successful metaphors in computing, the idea is that you should be able to rapidly scan across a visual field, with the focus of attention (that is, the focal point) matching the greatest detailed magnification of an image (Figure 2). Around the focus image, other images are magnified, as well, but in a decreasing amount as you move away from the focal point, similar to the way icon magnification on Mac OS X* works as you pass the cursor over the application Dock. With MagLens, the visual field is the full screen space (rather than a linear bar), allowing a rapid scan across thousands of images in a few seconds. All the images remain in view at all times, varying by size as the focus of attention scans across the field of vision.

MagLens* technology allows dynamic exploration of a visual space

Figure 2. MagLens* technology allows dynamic exploration of a visual space.

Contrast this technology with previous attempts at magnification, where the magnified portion of the image blocks out other vital information in the view (Figure 3). Even if the magnification moves around the screen, only the area directly under the cursor is visible: the remainder of the image is blocked from view, hampering the ability of your visual systems to recognize patterns. You have to engage your short-term memory to temporarily store one screen view, and then mentally compare it with the next. In contrast, the MagLens approach makes the entire view available for pattern matching.

Zooming a section of the image obscures large areas of the non-highlighted image

Figure 3. Zooming a section of the image obscures large areas of the non-highlighted image.

This “elastic presentation space” approach differs from previous attempts at rapid scanning, such as by “flipping” pages of thumbnails, or dragging a scroll bar, in that it simultaneously gives you a natural scan of the information field (much like how your eyes normally scan a complex visual field), dynamically increasing the level of detail at the point of visual attention. Combined with the natural gesture recognition that 3D recognition technology (such as the Intel® RealSense™ technology) provides, this technique opens the visual computation space to a wide range of applications.  To explore this option, the development team integrated Intel RealSense technology into the prototype to optimize the application for a wide range of content.

Where We Have Been

The research on what would become MagLens began as four years of Intel-sponsored research (1997–2001) by Sheelagh Carpendale, who was working on her doctoral dissertation at the University of Calgary (see “For More Information”). Although the approach she devised has been discussed and written about extensively, there has to date been no successful adoption of a widespread technological approach.  John Light at Intel began to pursue a prototype using Dr. Carpendale’s “elastic presentation space” idea.

Light’s team created a prototype that took advantage of modern computing power to allow users to view hundreds of images at the same time.

Design Thinking Overview

Shardul Golwalkar, an Intel intern at the time, expanded this prototype into a more usable proof of concept. The expansion project began with text-heavy content such as textbooks and was later expanded to more visual content exploration through magazines and news publications. Employing a user-centric development technique called Design Thinking (see the sidebar “Design Thinking Overview”), Shardul broadened the prototype into a functional web-enabled space, where it would be possible to perform the visual scan through a standard web browser interface.

At the conclusion of his internship, Shardul continued with the idea as an undergraduate at Arizona State University. During this time, he continued to explore the optimization of the technology and supported project development. Together, Shardul and Light demonstrated that it was possible to model 40,000 simultaneous images to the “discovery space” and enable a multimodal understanding of the data space using both gestures and vision. At the end of this effort the team had succeeded in creating an interface that was intuitive, powerful, and empowering for user-driven self-discovery of new capabilities—a delight for the user.

Where We Are

When the initial development was complete, there was interest at Intel in moving to a more commercial product. Intel sponsored the company Empirical to develop the Intel® Magnifying Lens Tool and move it toward a 2015 product release. Developers at Empirical reworked the original development, building new workflows and making the overall experience more polished and performant. See “For More Information” for a link to the current product, or click here.

A major goal of the initial development was to allow users connected to Internet file shares (such as Google Drive*) to view cloud-based files and ultimately enjoy integration across multiple cloud file stores. The product was optimized for web use, especially for touch screen display devices such as All-in-One desktops, 2 in 1s, notebooks, and tablets. Using MagLens, users no longer need to know the file storage hierarchy to find materials. The MagLens site collects all the identified file stores and “flattens” the visualization to a single, scalable 2D space. Now, it’s possible to locate a file of interest regardless of where it resides.

Imagine the Possibilities

Intel selected Empirical to develop the MagLens concept into a viable product based on its years of experience with product design and development. The Intel collaboration with Empirical has discovered many possible applications for MagLens. Indeed, Intel is open to licensing the MagLens code to software vendors, original equipment manufacturers, cloud services providers, and operating system developers—to expand the concept beyond photos and images to applications involving magazines, photography, films, and visualization of complex multimedia information. The Intel contact for licensing inquiries is Mike.Premi@intel.com.

Research is also continuing on the core concept of browsing through additional metadata to enable exploration and sorting for likely conceptual matches, such as a filter or clustering algorithm that gathers similar images (i.e., a digital photo library). Other techniques include using algorithms for facial recognition and integration as a utility of core operating systems.

MagLens shows great promise for changing the ideas around information discovery, organization, and integration. The future of this technology is limited only by our ability to see the possibilities.


About the Author

Ben Lieberman holds a PhD in biophysics and genetics from the University of Colorado, Health Sciences Center. Dr. Lieberman serves as principal architect for BioLogic Software Consulting, bringing more than 20 years of software architecture and IT experience in various fields, including telecommunications, rocket engineering, airline travel, e-commerce, government, financial services, and the life sciences. Dr. Lieberman bases his consulting services on the best practices of software development, with specialization in object-oriented architectures and distributed computing—in particular, Java*-based systems and distributed website development, XML/XSLT, Perl, and C++-based client–server systems. He is also an accomplished professional writer with a book (The Art of Software Modeling, Auerbach Publications, 2007), numerous software-related articles, and a series of IBM corporate technology newsletters to his credit.

Parallel Noise and Random Functions for OpenCL™ Kernels


Download Now  Noise.zip

About the Sample

The Noise sample code associated with this paper includes an implementation of Perlin noise, which is useful for generating natural-looking textures, such as marble and clouds, for 3D graphics. A test that uses Perlin noise to generate a “cloud” image is included. (See the References section for more information on Perlin noise.) 2D and 3D versions are included, meaning that the functions take two or three inputs to generate a single Perlin noise output value.

The Noise sample also includes pseudo-random number generator (RNG) functions that yield fairly good results—sufficient that a generated image visually appears random. 1D, 2D, and 3D versions are included, again referring to the number of inputs to generate a single pseudo-random value.

Introduction and Motivation

Many applications require a degree of “randomness” — or actually, “pseudo-randomness.” That is, a series of values that would appear random or “noisy” to a human. However, for repeatability, applications also commonly require that the RNG be able to reliably generate exactly the same sequence of values, given the same input “seed” value or values.

Most RNG algorithms meet these requirements by making each generated value depend on the previous generated value, with the first value in the sequence generated directly from the seed value. That approach to RNG is problematic for highly parallel processing languages such as OpenCL. Forcing each of the many processing threads to wait on a single sequential RNG source would reduce or eliminate the parallelism of algorithms using it.

One approach to dealing with this issue is to pre-generate a large table of random values, with each of the parallel threads generating unique but deterministic indices into that table. For example, an OpenCL kernel processing an image might select an entry from the pre-generated table by calculating an index based upon the pixel coordinates that kernel is processing or generating.

However, that approach requires a potentially time-consuming serial RNG process before the parallel algorithm can begin—limiting performance improvements due to parallelism. It also requires that the number of random numbers to be used be known at least approximately, in advance of running the parallel algorithm. That could be problematic for parallel algorithms that need to dynamically determine how many random values will be used by each thread.

The OpenCL kernel-level functions in the Noise sample code associated with this paper take an approach more suitable for the OpenCL way of dividing work into parallel operations.

Noise and Random Number Generation for OpenCL

OpenCL defines a global workspace (array of work items) with one, two, or three dimensions. Each work item in that global space has a unique set of identifying integer values corresponding to the x, y, and z coordinates in the global space.

The Perlin noise and RNG functions in the Noise sample generate a random number or noise sequence based on up to three input values, which can be the global IDs for each work item. Alternatively, one or more of the values might be generated by a combination of the global IDs and some data value obtained or generated by the kernel.

For example, the following OpenCL kernel code fragment shows generation of random numbers based on the 2D global ID of the work item.

kernel void	genRand()
{
	uint	x = get_global_id(0);
	uint	y = get_global_id(1);

	uint	rand_num = ParallelRNG2( x, y );

	...

Figure 1. Example of random number use - two dimensions.

 

This approach allows for random or noise functions to run in parallel between work items, yet generate results that have a repeatable sequence of values that are “noisy” both between work items and sequentially within a work item. If multiple 2D sets of values need to be generated, the 3D generation functions can be used, with the first two inputs generated based upon the work item’s global ID, and the 3rd dimension generated by sequentially increasing some starting value for each additional value required. This could be extended to provide multiple sets of 3D random or noise values, as in the following example for Perlin noise:

kernel void multi2dNoise( float fScale, float offset )
{
float	fX = fScale * get_global_id(0);
float	fY = fScale * get_global_id(1);
float	fZ = offset;

float	randResult = Noise_3d(  fX,  fY,  fZ );
...

Figure 2. Example of Perlin noise use - three dimensions.

 

Limitations

The Noise_2d and Noise_3d functions follow the same basic Perlin noise algorithm but differ in implementation based on Perlin’s recommendations. (See reference 1.) In the Noise sample, only Noise_3d is exercised to implement the noise example, but a test kernel for Noise_2d is included in Noise.cl for the reader who wants to modify the sample to test that variation.

The Noise_2d and Noise_3d functions should be called with floating point input values. Values should span a range, such as (0.0, 128.0), to set the size of the “grid” (see Figure 3) of randomized values. Readers should look at the clouds example to understand how Perlin noise can be transformed into various “natural looking” images.

The default ParallelRNG function used in the random test provides visually random results but is not the fastest RNG algorithm. This function is based on the “Wang hash,” which was not designed for use as an RNG. However, some commonly used RNG functions (a commented out example is included in the Noise.cl file) showed visible regularities when filling a 2D image, particularly in the lower order bits of results. The reader may want to experiment with other, faster RNG functions.

The default ParallelRNG function generates only unsigned 32 bit integer results—if floating point values on a range such as (0.0, 1.0) are needed, the application must apply a mapping to that range. The random example maps the random unsigned integer result to the range (0, 255) to generate gray scale pixel values, simply using an AND binary operation to select 8 bits.
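A minimal sketch of that mapping in OpenCL C (the variable names are illustrative and not taken verbatim from the sample):

uint rand_num = ParallelRNG2( get_global_id(0), get_global_id(1) );
uchar gray = (uchar)( rand_num & 0xFF );   // keep the low 8 bits -> a 0..255 grayscale value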

The default ParallelRNG function will not generate all 4,294,967,296 (2^32) unsigned integer values for sequential calls using the previously generated value. For any single starting seed value the pseudo-random sequences/cycles range from at least as small as 7,000 unique values to about 2 billion values long. There are around 20 different cycles generated by the default ParallelRNG function. The author believes it will be uncommon that any work item of an OpenCL kernel will require more sequentially generated random numbers than the smallest cycle can provide.

The 2D and 3D versions of the function—ParallelRNG2 and ParallelRNG3—use a “mixing” of cycles by applying an XOR binary operation between the result of a previous call to ParallelRNG and the next input value, which will change the cycle lengths. However, that altered behavior has not been characterized in detail, so it is recommended that the reader carefully validate that the ParallelRNG functions meet the needs of their application.

Project Structure

This section lists only the key elements of the sample application source code.

NoiseMain.cpp:

main()
Main entry point function. After parsing command-line options, it initializes OpenCL, builds the OpenCL kernel program from the Noise.cl file, prepares one of the kernels to be run, and calls ExecuteNoiseKernel(), then ExecuteNoiseReference(). After validating that the two implementations produce the same results, main() prints out the timing information each returned and stores the resulting images from each.

ExecuteNoiseKernel()
Set up and run the selected Noise kernel with OpenCL.

ExecuteNoiseReference()
Set up and run the selected Noise reference C code.

Noise.cl:

defaut_perm[256]
Table of random values 0—255 for 3D Perlin noise kernel. Note that this could be generated and passed to the Perlin noise kernel, for an added degree of randomness.

grads2d[16]
16 uniformly spaced unit vectors, gradients for 2D Perlin noise kernel.

grads3d[16]
16 vector gradients for 3D Perlin noise kernel.

ParallelRNG()
Pseudo-Random Number Generator, one pass over 1 input. An alternative RNG function is commented out, in case the reader wants to test a faster function that yields poorer results.

ParallelRNG2()
RNG doing 2 passes for 2 inputs

ParallelRNG3()
RNG doing 3 passes for 3 inputs

weight_poly3() and weight_poly5() and WEIGHT()
These are alternative weight functions used by Perlin noise to ensure continuous gradients everywhere. The second (preferred) function provides a continuous 2nd derivative everywhere as well. The WEIGHT macro selects which one is used.

NORM256()
Macro converting range (0, 255) to (-1.0, 1.0)

interp()
Bilinear interpolation using an OpenCL built-in function.

hash_grad_dot2()
Selects a gradient and does dot product with input xy, part of Perlin Noise_2d function.

Noise_2d()
Perlin noise generator with 2 inputs.

hash_grad_dot3()
Selects a gradient and does dot product with input xyz, part of Perlin Noise_3d function.

Noise_3d()
Perlin noise generator with 3 inputs.

cloud()
Generates one pixel of a “cloud” output image for CloudTest using Noise_3d.

map256()
Converts from the Perlin noise output range (-1.0, 1.0) to the range (0, 255) needed for grayscale pixels.

CloudTest()
The cloud image generation test. The slice parameter is passed to cloud, to allow the host code to generate alternative cloud images.

Noise2dTest()
Test of Noise_2d – not used by default.

Noise3dTest()
Test of Noise_3d – the default Perlin noise function. Uses map256 to generate pixel values for a grayscale image.

RandomTest()
Test of ParallelRNG3, currently uses the low order byte of unsigned integer result to output a grayscale image.

Two Microsoft Visual Studio solution files are provided, for Visual Studio versions 2012 and 2013.  These are “Noise_2012.sln” and “Noise_2013.sln”.   If the reader has a newer version of Visual Studio, it should be possible to use the Visual Studio solution/project update to create a new solution derived from these.

Note that the solutions both assume that the Intel® OpenCL™ Code Builder has been installed.

Controlling the Sample

This sample can be run from a Microsoft Windows* command-line console, from a folder that contains the EXE file:

Noise.exe < Options >

Options:

-h or --help
Show command-line help. Does not run any of the demos.

-t or --type [ all | cpu | gpu | acc | default | <OpenCL constant for device type> ]
Select the device to run the OpenCL kernel upon by type of device. Default value: all

<OpenCL constant for device type>

CL_DEVICE_TYPE_ALL | CL_DEVICE_TYPE_CPU | CL_DEVICE_TYPE_GPU |
CL_DEVICE_TYPE_ACCELERATOR | CL_DEVICE_TYPE_DEFAULT

-p or --platform < number-or-string >
Selects platform to use. A list of all platform numbers and names is printed when a demo is run. The platform being used will have “[Selected]” printed to the right of it. If using string, provide enough letters of the platform name to uniquely identify it. Default value: Intel

-d or --device < number-or-string >
Select the device to run the OpenCL kernels upon by device number or name. Device numbers and names on the platform being used are printed when a demo is run. The current device will have “[Selected]” printed to the right of it. Default value: 0

-r or --run [ random | perlin | clouds ]
Select the function demonstration to run. Random number, perlin noise, or cloud image generators each have demo kernels. Default value: random

-s or --seed < integer >
Provide an integer value to vary the algorithm output. Default value: 1
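For example, to run the cloud-image demo on a GPU device with a non-default seed:

Noise.exe --run clouds --type gpu --seed 7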

Noise.exe prints the time the OpenCL kernel and reference C-coded equivalent each take to run, as well as the names of the respective output files for each. When the program has finished printing information, it waits for the user to press ENTER before exiting. Please note that no attempt was made to optimize performance of the C-coded reference code functions; they are intended only to validate correctness of the OpenCL kernel code.

Examining Results

After a Noise.exe run is complete, examine the generated BMP format image files OutputOpenCL.bmp and OutputReference.bmp in the working folder, to compare the OpenCL and C++ code results, respectively. The two images should be identical, though it is possible that there might be very small differences between the two Perlin noise or cloud images.

The (Perlin) noise output should appear similar to Figure 3:


Figure 3. Perlin noise output.

The random output should look similar to Figure 4:


Figure 4. Random noise output.

The clouds function output should look similar to Figure 5 :


Figure 5. Generated cloud output.

References

  1. Perlin, K., “Improving Noise,” http://mrl.nyu.edu/~perlin/paper445.pdf
  2. “4-byte Integer Hashing,” http://burtleburtle.net/bob/hash/integer.html
  3. Overton, M. A., “Fast, High-Quality, Parallel Random Number Generators,” Dr. Dobb’s website (2011). http://www.drdobbs.com/tools/fast-high-quality-parallel-random-number/229625477
  4. Intel® Digital Random Number Generator (DRNG) Library Implementation and Uses, https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-library-implementation-and-uses?utm_source=Email&utm_medium=IDZ
  5. Intel Sample Source Code License Agreement, https://software.intel.com/en-us/articles/intel-sample-source-code-license-agreement/
  6. Intel® OpenCL™ Code Builder, https://software.intel.com/en-us/opencl-code-builder

 

Perceptual Drone Speech Recognition


Download Code Sample

Controlling Drones with Speech-Recognition Applications Using the Intel® RealSense™ SDK

Every day we hear about drones in the news. With applications ranging from surveillance and combat operations to photography, video, and simply having fun, drone technology is still on the ground floor and worth looking into.

As developers, we have the ability to create applications that can control them. A drone is ultimately just a programmable device, so connecting to one and sending commands to perform the desired actions can be done from a regular PC or smartphone application. For this article, I have chosen to use one of the most “hackable” drones available on the market: Parrot’s AR.Drone* 2.0.

We will see how to interact with and control this drone with a library written in C#. Using this as our basis we will add speech commands to control the drone using the Intel® RealSense™ SDK.

PARROT AR.DRONE 2.0

Among the drones currently marketed to hobbyists, one of the most interesting is the AR.Drone 2.0 model from Parrot. It includes many features and incorporates a built-in help system that provides a stabilization and calibration interface. The drone’s sturdy Styrofoam protection helps avoid damage to the propellers or moving parts in case of falls or collisions with fixed obstacles.

The AR.Drone* 2.0 from Parrot

The drone exposes its own Wi-Fi* network, over which it connects to an external device (smartphone, tablet, or PC). The communication protocol is based on AT-like messages (similar to those used to program and control telephone modems years ago).

Using this simple protocol, it is possible to send the drone all the commands needed to get it off the ground, raise or lower in altitude, and fly in different directions. It is also possible to read a stream of images taken from cameras (in HD) placed onboard the drone (one front and one facing down) to save pictures during flights or capture video.

The company provides several applications to fly the drone manually; however, it’s much more interesting to study how to autonomously control the flight. For this reason, I decided (with the help of my colleague Marco Minerva) to create an interface that would allow us to control it through different devices.

Controlling the Drone Programmatically

We said that the drone has its own Wi-Fi network, so we’ll connect to it to send control commands. The AR.Drone 2.0 developer guide gave us all the information we needed. For example, the guide says to send commands via UDP to the 192.168.1.1 address, on port 5556. These are simple strings in the AT format:

AT*REF for takeoff and landing control

AT*PCMD to move the drone (direction, speed, altitude)
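For illustration, each command is a single text line carrying a progressive sequence number followed by its arguments. A takeoff command, for example, looks roughly like this (290718208 is the AT*REF flag value with the takeoff bit set as documented in the developer guide; treat the exact value as something to verify against the guide):

AT*REF=1,290718208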

Once we connect to the drone, we’ll create a sort of ‘game’ where we send commands to the drone based on the inputs of our application. Let's see how to create a Class Library.

First, we must connect to the device:

public static async Task ConnectAsync(string hostName = HOST_NAME, string port = REMOTE_PORT)
        {
             // Set up the UDP connection.
             var droneIP = new HostName(hostName);

             udpSocket = new DatagramSocket();
             await udpSocket.BindServiceNameAsync(port);
             await udpSocket.ConnectAsync(droneIP, port);
             udpWriter = new DataWriter(udpSocket.OutputStream);

             udpWriter.WriteByte(1);
             await udpWriter.StoreAsync();

             var loop = Task.Run(() => DroneLoop());
        }

As mentioned, we must use the UDP protocol, so we need a DatagramSocket object. After connecting with the ConnectAsync method, we create a DataWriter on the output stream to send the commands themselves. Finally, we send the first byte via Wi-Fi. It will be discarded by the drone and is only meant to initialize the system.

Let's check the command sent to the drone:

        private static async Task DroneLoop()
        {
            while (true)
            {

                var commandToSend = DroneState.GetNextCommand(sequenceNumber);
                await SendCommandAsync(commandToSend);

                sequenceNumber++;
                await Task.Delay(30);
            }
        }

The DroneState.GetNextCommand method formats the AT command string that must be sent to the device. To do this, we need a sequence number: the drone expects every command to be accompanied by a progressive number, and it ignores any command whose number is equal to or less than one it has already received.

Then we use WriteString to write the command to the UDP socket's output stream and StoreAsync to flush the buffer and actually send it. Finally, we increment the sequence number and use Task.Delay to wait 30 milliseconds before the next iteration.
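SendCommandAsync itself is not shown above; a minimal sketch, assuming the udpWriter created in ConnectAsync, is:

private static async Task SendCommandAsync(string command)
{
    if (!string.IsNullOrEmpty(command))
    {
        // Write the AT command into the UDP output stream and flush it to the drone.
        udpWriter.WriteString(command);
        await udpWriter.StoreAsync();
    }
}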

The DroneState class is the one that deals with determining which command to send:

    public static class DroneState
    {
       public static double StrafeX { get; set; }
       public static double StrafeY { get; set; }
       public static double AscendY { get; set; }
       public static double RollX { get; set; }
       public static bool Flying { get; set; }
       public static bool isFlying { get; set; }

        internal static string GetNextCommand(uint sequenceNumber)
        {
            // Determine if the drone needs to take off or land
            if (Flying && !isFlying)
            {
                isFlying = true;
                return DroneMovement.GetDroneTakeoff(sequenceNumber);
            }
            else if (!Flying && isFlying)
            {
                isFlying = false;
                return DroneMovement.GetDroneLand(sequenceNumber);
            }

            // If the drone is flying, sends movement commands to it.
            if (isFlying && (StrafeX != 0 || StrafeY != 0 || AscendY != 0 || RollX != 0))
                return DroneMovement.GetDroneMove(sequenceNumber, StrafeX, StrafeY, AscendY, RollX);

            return DroneMovement.GetHoveringCommand(sequenceNumber);
        }
    }

The properties StrafeX, StrafeY, AscendY, and RollX define the speed of navigation left/right, forward/backward, the altitude change, and the rotation of the drone, respectively. These properties are doubles and accept values between -1 and 1. For example, setting StrafeX to -0.5 moves the drone to the left at half of its maximum speed; specifying 1 will move it to the right at full speed.
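For example, a short strafe to the left at half speed can be performed by setting the property, waiting, and then resetting it, which is the same pattern the speech handler shown later uses:

// Strafe left at half speed for roughly half a second, then stop.
DroneState.StrafeX = -0.5;
await Task.Delay(500);
DroneState.StrafeX = 0;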

Flying is a variable that determines the takeoff or landing. In the GetNextCommand method we check the values of these fields to decide which command to send to the drone. These commands are in turn managed by the DroneMovement class.

Note that, if no command is specified, the last statement creates the so-called Hovering command, an empty command that keeps the communication channel open between the drone and the device. The drone needs to be constantly receiving messages from the controlling application, even when there’s no action to do and no status has changed.
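The hovering command is simply an AT*PCMD packet whose flag and velocity arguments are all zero. A plausible sketch of GetHoveringCommand is therefore:

public static string GetHoveringCommand(uint sequenceNumber)
{
    // Flag 0 and zero velocities tell the drone to hold its position
    // while keeping the command channel alive.
    return string.Format("AT*PCMD={0},0,0,0,0,0{1}", sequenceNumber, Environment.NewLine);
}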

The most interesting method of the DroneMovement class is definitely GetDroneMove, which effectively composes and sends the command to the drone.  For other methods related to movement, please refer to this sample.

public static string GetDroneMove(uint sequenceNumber, double velocityX, double velocityY, double velocityAscend, double velocityRoll)
    {
        var valueX = FloatConversion(velocityX);
        var valueY = FloatConversion(velocityY);
        var valueAscend = FloatConversion(velocityAscend);
        var valueRoll = FloatConversion(velocityRoll);

        var command = string.Format("{0},{1},{2},{3}", valueX, valueY, valueAscend, valueRoll);
        return CreateATPCMDCommand(sequenceNumber, command);
    }
private static string CreateATPCMDCommand(uint sequenceNumber, string command, int mode = 1)
    {
        return string.Format("AT*PCMD={0},{1},{2}{3}", sequenceNumber, mode, command, Environment.NewLine);
    }

The FloatConversion method is not listed here, but it converts a double value between -1 and 1 into a signed integer that can be used in the AT commands, like the PCMD string that controls the movements.
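The AR.Drone protocol expects these arguments as the raw IEEE 754 bit pattern of a 32-bit float reinterpreted as a signed integer, so a plausible sketch of FloatConversion is:

private static int FloatConversion(double value)
{
    // Reinterpret the bits of the 32-bit float as a signed integer,
    // which is the representation the AT*PCMD arguments expect.
    return BitConverter.ToInt32(BitConverter.GetBytes((float)value), 0);
}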

The code shown here is available as a free library on NuGet, called AR.Drone 2.0 Interaction Library, which provides everything you need to control the device from takeoff to landing.

AR.Drone UI on NuGet

Thanks to this sample application, we can forget the implementation details and focus instead on delivering apps that, through different modes of interaction, allow us to pilot the drone.

Intel® RealSense™ SDK

Now let’s look at one of the greatest and easiest-to-use features (for me) of the Intel RealSense SDK: speech recognition.

The SDK offers two different approaches to speech:

  • Command recognition (from a given dictionary)
  • Free text recognition (dictation)

The first is essentially a list of commands, defined by the application, in a specified language for instructing the ‘recognizer’. Words not on the list are ignored.

The second is a sort of a recorder that “understands” any vocabulary in a free-form stream. It is ideal for transcriptions, automatic subtitling, etc.

For our project we will use the first option because we want to implement only a finite number of commands to send to the drone.

First, we need to define some variables to use:

        private PXCMSession Session;
        private PXCMSpeechRecognition SpeechRecognition;
        private PXCMAudioSource AudioSource;
        private PXCMSpeechRecognition.Handler RecognitionHandler;

Session is the object required to access the SDK’s I/O and algorithm modules; all subsequent instances are created from it.

SpeechRecognition is the instance of the recognition module created with a CreateImpl function inside the Session environment.

AudioSource is the device interface to establish and select an input audio device (in our sample code we select the first audio device available to keep it simple).

RecognitionHandler is the handler object to which we assign the event handler for the OnRecognition event.

Let’s now initialize the session, the AudioSource, and the SpeechRecognition instance.

            Session = PXCMSession.CreateInstance();
            if (Session != null)
            {
                // session is a PXCMSession instance.
                AudioSource = Session.CreateAudioSource();
                // Scan and Enumerate audio devices
                AudioSource.ScanDevices();

                PXCMAudioSource.DeviceInfo dinfo = null;

                for (int d = AudioSource.QueryDeviceNum() - 1; d >= 0; d--)
                {
                    AudioSource.QueryDeviceInfo(d, out dinfo);
                }
                AudioSource.SetDevice(dinfo);

                Session.CreateImpl<PXCMSpeechRecognition>(out SpeechRecognition);

As noted before, to keep the code simple we select the first Audio device available.

PXCMSpeechRecognition.ProfileInfo pinfo;
              SpeechRecognition.QueryProfile(0, out pinfo);
              SpeechRecognition.SetProfile(pinfo);

Then we query the module for the current configuration profile and assign it to a variable (pinfo).

We can also set some parameters in the profile info to change the recognized language, the recognition confidence level (a higher value requires stronger confidence before a match is reported), the end-of-recognition timeout, and so on.

In our case we keep the default parameters of profile 0 (the first one returned by QueryProfile).
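If a different language were needed, the profile could be adjusted before it is applied. A minimal sketch, assuming the SDK's LanguageType enumeration and that the corresponding language pack is installed, is:

PXCMSpeechRecognition.ProfileInfo pinfo;
SpeechRecognition.QueryProfile(0, out pinfo);
// Request a specific language pack before applying the profile.
pinfo.language = PXCMSpeechRecognition.LanguageType.LANGUAGE_US_ENGLISH;
SpeechRecognition.SetProfile(pinfo);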

                String[] cmds = new String[] { "Takeoff", "Land", "Rotate Left", "Rotate Right", "Advance","Back", "Up", "Down", "Left", "Right", "Stop" , "Dance"};
                int[] labels = new int[] { 1, 2, 4, 5, 8, 16, 32, 64, 128, 256, 512, 1024 };
                // Build the grammar.
                SpeechRecognition.BuildGrammarFromStringList(1, cmds, labels);
                // Set the active grammar.
                SpeechRecognition.SetGrammar(1);

Next, we define the grammar dictionary that instructs the recognition system. Using BuildGrammarFromStringList, we create a simple list of verbs and corresponding return values, defining grammar number 1.

It is possible to define multiple grammars to use in our application and activate one at a time when needed, so we could create all the different command dictionaries for all the supported languages and provide a way for the user to switch between the different languages recognized by the SDK. In this case, you must install all the corresponding DLL files for the specific language support (the default SDK setup installs only the US English support assemblies). In this sample, we use only one grammar set with the default installation of US English.

We then select which grammar to make active in the SpeechRecognition instance.

                RecognitionHandler = new PXCMSpeechRecognition.Handler();

                RecognitionHandler.onRecognition = OnRecognition;

Those instructions define a new event handler for the OnRecognition event and assign it to a method defined below:

        public void OnRecognition(PXCMSpeechRecognition.RecognitionData data)
        {
            var RecognizedValue = data.scores[0].label;
            double movement = 0.3;
            TimeSpan duration = TimeSpan.FromMilliseconds(500);
            switch (RecognizedValue)
            {
                case 1:
                    DroneState.TakeOff();
                    WriteInList("Takeoff");
                    break;
                case 2:
                    DroneState.Land();
                    WriteInList("Land");
                    break;
                case 4:
                    DroneState.RotateLeftForAsync(movement, duration);
                    WriteInList("Rotate Left");
                    break;
                case 5:
                    DroneState.RotateRightForAsync(movement, duration);
                    WriteInList("Rotate Right");
                    break;
                case 8:
                    DroneState.GoForward(movement);
                    Thread.Sleep(500);
                    DroneState.Stop();
                    WriteInList("Advance");
                    break;
                case 16:
                    DroneState.GoBackward(movement);
                    Thread.Sleep(500);
                    DroneState.Stop();
                    WriteInList("Back");
                    break;
                case 32:
                    DroneState.GoUp(movement);
                    Thread.Sleep(500);
                    DroneState.Stop();
                    WriteInList("Up");
                    break;
                case 64:
                    DroneState.GoDown(movement);
                    Thread.Sleep(500);
                    DroneState.Stop();
                    WriteInList("Down");
                    break;
                case 128:
                    DroneState.StrafeX = .5;
                    Thread.Sleep(500);
                    DroneState.StrafeX = 0;
                    WriteInList("Left");
                    break;
                case 256:
                    DroneState.StrafeX = -.5;
                    Thread.Sleep(500);
                    DroneState.StrafeX = 0;
                    WriteInList("Right");
                    break;
                case 512:
                    DroneState.Stop();
                    WriteInList("Stop");
                    break;
                case 1024:
                    WriteInList("Dance");
                    DroneState.RotateLeft(movement);
                    Thread.Sleep(500);
                    DroneState.RotateRight(movement);
                    Thread.Sleep(500);
                    DroneState.RotateRight(movement);
                    Thread.Sleep(500);
                    DroneState.RotateLeft(movement);
                    Thread.Sleep(500);
                    DroneState.GoForward(movement);
                    Thread.Sleep(500);
                    DroneState.GoBackward(movement);
                    Thread.Sleep(500);
                    DroneState.Stop();
                    break;
                default:
                    break;

            }
            Debug.WriteLine(data.grammar.ToString());
            Debug.WriteLine(data.scores[0].label.ToString());
            Debug.WriteLine(data.scores[0].sentence);
            // Process Recognition Data
        }

This method takes the label value returned in the recognition data and executes the corresponding command (in our case, the corresponding flight instruction for the drone).

Every drone command calls the corresponding DroneState method (TakeOff, GoUp, GoDown, etc.) with a movement or duration parameter that specifies, in each case, the amount of movement or how long it should last.

Some commands need an explicit call to the Stop method to interrupt the current action; otherwise the drone continues to move as instructed (refer to the previous code for those commands).

In some cases it is necessary to insert a Thread.Sleep between two commands so that the previous operation can complete before the new command is sent.

To test the recognition even when a drone isn’t available, I added a variable (controlled by a checkbox in the main window) that enables a drone stub mode: the commands are created but not sent.

To close the application, call the OnClosing method to close and destroy all the instances and handlers and to basically clean up the system.
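A minimal sketch of that start/stop lifecycle, assuming the instances created above and the StartRec/StopRec/Dispose calls used by the SDK speech samples, might look like this:

private void StartRecognition()
{
    // Attach the audio source and the handler to start listening.
    SpeechRecognition.StartRec(AudioSource, RecognitionHandler);
}

private void OnClosing(object sender, System.ComponentModel.CancelEventArgs e)
{
    // Stop recognition and release all SDK instances.
    SpeechRecognition.StopRec();
    SpeechRecognition.Dispose();
    AudioSource.Dispose();
    Session.Dispose();
}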

In the code you can find some debug commands that print some helpful information in the Visual Studio* debug windows when testing the system.

Conclusion

In this article, we have shown how we can interact with a device as complex as a drone through a natural language interface. We have seen how to define a simple dictionary of verbs, teach the system to understand it, and consequently control a complex device like a drone in flight. What I show in this article is only a small fraction of what is possible when operating the drone.

Photo of the flying demonstration at the .NET Campus event session in 2014

About the Author

Marco Dal Pino has worked in IT for more than 20 years and is a freelance consultant working on the .NET platform. He’s part of the staff of DotNetToscana, a community focused on Microsoft technologies, and he is a Microsoft MVP for Windows Platform Development. He develops mobile and embedded applications for the retail and enterprise sectors, and is also involved in developing Windows Phone and Windows 8 applications for a third-party company.

Marco has been Nokia Developer Champion for Windows Phone since 2013, and that same year Intel recognized him as an Intel Developer Zone Green Belt for the activity of developer support and evangelization about Perceptual and Intel RealSense technology. He’s also an Intel Software Innovator for Intel RealSense and IoT technologies.

He is a Trainer and speaks at major technical conferences.

Marco Minerva has been working on the .NET platform since its first introduction. He is mainly focused on designing and developing Windows Store and Windows Phone apps using Windows Azure as the back end. He is co-founder and president of DotNetToscana, the Tuscany .NET User Group. He is a speaker at technical conferences and writes for magazines.

Platform Analyzer - Analyzing Healthy and not-so Healthy Applications

$
0
0

Recently my wife purchased a thick and expensive book. As an ultrasonic diagnostician for children, she purchases many books, but this one had me puzzled.  The book was titled Ultrasound Anatomy of the Healthy Child.  Why would she need a book that showed only healthy children?  I asked her and her answer was simple: to diagnose any disease, even one not yet discovered, you need to know what a healthy child looks like. 

In this article we will act like doctors, analyzing and comparing a healthy and a not-so-healthy application.

Knock – knock – knock.

The doctor says: “It’s open, please enter.”

In walks our patient,  Warrior Wave*, an awesome game in which your hand acts as the road for the warriors to cross. It’s extremely fun to play, innovative, and uses Intel® RealSense™ technology. 

While playing the game, though, something felt a little off.  Something that I hadn’t felt before in other games based on Intel® RealSense™ technology.  The problem could be caused by so many things, but what is it in this case?  

Like any good doctor who is equipped with the latest and greatest analysis tools to diagnose the problem, we have the perfect tools to analyze our patient.

Using Intel® Graphics Performance Analyzer (Intel® GPA) Platform Analyzer, we receive a time-line view of our application’s CPU load, frame time, frames per second (FPS), and draw calls:

Let’s take a look.

Hmm… the first things that catch our eye are the regular FPS surges that occur periodically. All is relatively smooth for ~200 milliseconds and then jumps up and down severely.

For comparison, let’s look at a healthy FPS trace below. The game in this trace felt smooth and played well.

No pattern was evident within the frame time, just normal random deviations.

But in our case we see regular surges. These surges happen around four times a second. Let’s investigate the problem more deeply by zooming in on one of the surges and seeing what is happening in the threads:

We can see that working thread 2780 spends most of the time in synchronization. The thread does almost nothing but wait for the next frame from the Intel® RealSense™ SDK:

At the same time, we see that rendering goes in another worker thread. If we scroll down, we find thread 2372.

Instead of “actively” waiting for the next frame from the Intel RealSense SDK, the game could be doing valuable work. Drawing and Intel® RealSense™ SDK work could be done in one worker thread instead of two, simplifying thread communication.

Excessive inter-thread communication can drastically slow down the execution and cause many problems.

Here is the example of a “healthy” game, where the Intel® RealSense™ SDK work and the DirectX* calls are in one thread. 

RealSense™ experts say: there is no point in waiting for the frames from the Intel® RealSense™ SDK. They won’t be ready any faster. 

But we can see that the main problem is at the top of the timeline.

On average, five out of six CPU frames did not result in a GPU frame. This is the cause of the slow and uneven GPU frame rate, which on average is less than 16 FPS.

Now let’s look at the pipeline to try and understand how the code is executing.  Looking at the amount of packets on “Engine 0,” the pipeline is filled to the brim, but the execution is almost empty.

The brain can process 10 to 12 separate images per second, perceiving them individually. This explains why the first movies were cut at a rate of 16 FPS: this is the average threshold at which the majority of people stop seeing a slide show and start seeing a movie.

Once again, let’s see the profile of the nice-looking game: 

Notice that the GPU frames follow the CPU frames with little shift. For every CPU frame, there is a corresponding GPU that starts execution after a small delay.

Let’s try to understand why our game doesn’t have this pattern.

First, let’s examine our DirectX* calls. The highlighted one with the tooltip is our “Present” call that sends the finished frame to the GPU. In the screenshot above, we see that it creates a “Present” packet on the GPU pipeline (marked with X’s). At around the 2215 ms mark, it has moved closer to execution, jumping over three positions, but at 2231 ms it just disappears without completing execution.

And if we look at each present call within the trace, not one call successfully makes it to execution.

Question: How does the game draw itself if all our DirectX* Present calls are ignored?! Good thing we have good tools so we can figure this out. Let’s take a look.

Can you see something curious inside the gray oval? We can see that this packet, not caused by any DirectX* call of our code, still gets to the execution, fast and out of order. Hey, wait a minute!!!

Let's look closely at our packet. 

And now to the packet that got executed. 

Wow! It came from an EXTERNAL thread. What could this mean? External threads are threads that don’t belong to the game.

Our own packets get ignored, but an external thread draws our game? What? Hey, this tool went nuts!

No, the image is quite right. The explanation is that on the Windows* system (starting with Windows Vista*), there is a program called Desktop Window Manager (DWM), which does the actual composition on the screen. Its packets are the ones we see executing at a fast rate with high priority.  And no, our packets aren’t lost—they are intercepted by DWM to create the final picture.

But why would DWM get involved in a full-screen game? After thinking a while, I realized that the answer is simple: I have a multi-monitor desktop configuration. Removing my second monitor from the desktop configuration made Warrior Wave* behave like other games: normal GPU FPS, no glitches, and no DWM packets.

The patient will live! What a relief!

But other games still worked well even with a multi-monitor configuration, right (says the evil voice in the back of my head)?

To dig deeper, we need another tool. Intel® GPA Platform Analyzer allows you to see CPU and GPU execution over time, but it doesn’t give you lower-level details of each frame.

We would need to look more closely at the Direct3D* Device creation code. For this we could use Intel® GPA Frame Analyzer for DirectX*, but this is a topic for another article.

So let’s summarize what we have learned:

During this investigation we were able to detect poor usage of threads that led to FPS surges, and a nasty DWM problem that was easily fixed by removing the second monitor from the desktop configuration.

Conclusion: Intel® GPA Platform Analyzer is a must-have tool for initial investigation of the problem. Get familiar with it and add it to your toolbox.

About the Author:

Alexander Raud works in the Intel® Graphics Performance Analyzers team in Russia and previously worked on the VTune Amplifier. Alex has dual citizenship in Russia and the EU, speaks Russian, English, some French, and is learning Spanish.  Alex has a wife and two children and still manages to play Progressive Metal professionally and head the International Ministry at Jesus Embassy Church.

Performance Considerations for Resource Binding in Microsoft DirectX* 12

$
0
0

By Wolfgang Engel, CEO of Confetti

With the release of Windows* 10 on July 29 and the release of the 6th generation Intel® Core™ processor family (code-name Skylake), we can now look closer into resource binding specifically for Intel® platforms.

The previous article “Introduction to Resource Binding in Microsoft DirectX* 12” introduced the new resource binding methods in DirectX 12 and concluded that with all these choices, the challenge is to pick the most desirable binding mechanism for the target GPU, types of resources, and their frequency of update.

This article describes how to pick different resource binding mechanisms to run an application efficiently on specific Intel® GPUs.

Tools of the Trade

To develop games with DirectX 12, you need the following tools:

  • Windows 10
  • Visual Studio* 2013 or higher
  • DirectX 12 SDK comes with Visual Studio
  • DirectX 12-capable GPU and drivers

Overview

A descriptor is a block of data that describes an object to the GPU, in a GPU-specific opaque format. DirectX 12 offers the following descriptors, previously named “resource views” in DirectX 11:

  • Constant buffer view (CBV)
  • Shader resource view (SRV)
  • Unordered access view (UAV)
  • Sampler view (SV)
  • Render target view (RTV)
  • Depth stencil view (DSV)
  • and others

These descriptors or resource views can be considered a structure (also called a block) that is consumed by the GPU front end. The descriptors are roughly 32–64 bytes in size and hold information like texture dimensions, format, and layout.

Descriptors are stored in a descriptor heap, which represents a sequence of structures in memory.

A descriptor table holds offsets into this descriptor heap. It maps a continuous range of descriptors to shader slots by making them available through a root signature. This root signature can also hold root constants, root descriptors, and static samplers.


Figure 1. Descriptors, descriptor heap, descriptor tables, root signature.

Figure 1 shows the relationship between descriptors, a descriptor heap, descriptor tables, and the root signature.

The code that Figure 1 describes looks like this:

// the init function sets the shader registers
// parameters: type of descriptor, num of descriptors, base shader register
// the first descriptor table entry in the root signature in
// image 1 sets shader registers t1, b1, t4, t5
// performance: order from most frequent to least frequent used
D3D12_DESCRIPTOR_RANGE Param0Ranges[3];
Param0Ranges[0].Init(D3D12_DESCRIPTOR_RANGE_SRV, 1, 1); // t1
Param0Ranges[1].Init(D3D12_DESCRIPTOR_RANGE_CBV, 1, 1); // b1
Param0Ranges[2].Init(D3D12_DESCRIPTOR_RANGE_SRV, 2, 4); // t4-t5

// the second descriptor table entry in the root signature
// in image 1 sets shader registers u0 and b2
D3D12_DESCRIPTOR_RANGE Param1Ranges[2];
Param1Ranges[0].Init(D3D12_DESCRIPTOR_RANGE_UAV, 1, 0); // u0
Param1Ranges[1].Init(D3D12_DESCRIPTOR_RANGE_CBV, 1, 2); // b2

// set the descriptor tables in the root signature
// parameters: number of descriptor ranges, descriptor ranges, visibility
// visibility to all stages allows sharing binding tables
// with all types of shaders
D3D12_ROOT_PARAMETER Param[4];
Param[0].InitAsDescriptorTable(3, Param0Ranges, D3D12_SHADER_VISIBILITY_ALL);
Param[1].InitAsDescriptorTable(2, Param1Ranges, D3D12_SHADER_VISIBILITY_ALL);
// root descriptor
Param[2].InitAsShaderResourceView(1, 0); // t0
// root constants
Param[3].InitAsConstants(4, 0); // b0 (4x32-bit constants)

// writing into the command list
cmdList->SetGraphicsRootDescriptorTable(0, [srvGPUHandle]);
cmdList->SetGraphicsRootDescriptorTable(1, [uavGPUHandle]);
cmdList->SetGraphicsRootConstantBufferView(2, [srvCPUHandle]);
cmdList->SetGraphicsRoot32BitConstants(3, {1,3,3,7}, 0, 4);

The source code above sets up a root signature that has two descriptor tables, one root descriptor, and one root constant. The code also shows that root constants have no indirection and are provided directly with the SetGraphicsRoot32BitConstants call. They are routed directly into the shader registers; there is no actual constant buffer, constant buffer descriptor, or binding happening. Root descriptors have only one level of indirection, because they store a pointer to memory (descriptor -> memory), and descriptor tables have two levels of indirection (descriptor table -> descriptor -> memory).

Descriptors live in different heaps depending on their types, such as SV and CBV/SRV/UAV. This is due to wildly inconsistent sizes of descriptor types on different hardware platforms. For each type of descriptor heap, there should be only one heap allocated because changing heaps could be expensive.

In general DirectX 12 offers an allocation of more than one million descriptors upfront, enough for a whole game level. While previous DirectX versions dealt with allocations in the driver on their own terms, with DirectX 12 it is possible to avoid any allocations during runtime. That means any initial allocation of a descriptor can be taken out of the performance “equation.”
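As a reference point, a single large shader-visible heap for CBV/SRV/UAV descriptors (plus a separate one for samplers) can be allocated once at startup; the sizes below are illustrative, not prescriptive:

// One shader-visible heap for CBV/SRV/UAV descriptors, sized for a whole level.
D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
heapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
heapDesc.NumDescriptors = 1000000;
heapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;

ID3D12DescriptorHeap* descriptorHeap = nullptr;
device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&descriptorHeap));

// Sampler descriptors have different hardware sizes, so they get their own heap.
D3D12_DESCRIPTOR_HEAP_DESC samplerHeapDesc = {};
samplerHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER;
samplerHeapDesc.NumDescriptors = 2048;
samplerHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;

ID3D12DescriptorHeap* samplerHeap = nullptr;
device->CreateDescriptorHeap(&samplerHeapDesc, IID_PPV_ARGS(&samplerHeap));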

Note: With 3rd generation Intel® Core™ processors (code-name Ivy Bridge)/4th generation Intel® Core™ processor family (code-name Haswell) and DirectX 11 and the Windows Display Driver Model (WDDM) version 1.x, resources were dynamically mapped into memory based on the resources referenced in the command buffer with a page table mapping operation. This way copying data was avoided. The dynamic mapping was important because those architectures only offer 2 GB of memory to the GPU (Intel® Xeon® processor E3-1200 v4 product family (code-name Broadwell) offers more).
With DirectX 12 and WDDM version 2.x, it is no longer possible to remap resources into the GPU virtual address space as necessary, because resources have to be assigned a static virtual address when created and therefore the virtual address of resources cannot change after creation. Even if a resource is “evicted” from GPU memory, it maintains its virtual address for later when it is made resident again.
Therefore the overall available memory of 2 GB in Ivy Bridge/Haswell can become a limiting factor.

As stated in the previous article, a perfectly reasonable outcome for an application might be a combination of all types of bindings: root constants, root descriptors, descriptor tables for descriptors gathered on-the-fly as draw calls are issued, and dynamic indexing of large descriptor tables.

Different hardware architectures will show different performance trade-offs between using sets of root constants and root descriptors versus using descriptor tables. Therefore it might be necessary to tune the ratio between root parameters and descriptor tables depending on the hardware target platforms.

Expected Patterns of Change

To understand which kinds of change incur an additional cost, we have to analyze first how game engines typically change data, descriptors, descriptor tables, and root signatures.

Let’s start with what is called constant data. Most game engines store usually all constant data in “system memory.” The game engine will change data in CPU accessible memory and then later on during the frame, a whole block of constant data is copied/mapped into GPU memory and then read by the GPU through a constant buffer view or through the root descriptor.

If the constant data is provided through SetGraphicsRoot32BitConstants() as a root constant, the entry in the root descriptor does not change but the data might change. If it is provided through a CBV == descriptor and then a descriptor table, the descriptor doesn’t change but the data might change.

In case we need several constant buffer views—for example, for double or triple buffered rendering— the CBV or descriptor might change for each frame in the root signature.

For texture data, it is expected that the texture is allocated in GPU memory during startup. Then an SV == descriptor will be created, stored in a descriptor table or a static sampler, and then referenced in the root descriptor. The data and the descriptor or static sample do not change after that.

For dynamic data like changing texture or buffer data (for example, textures with rendered localized text, buffers of animated vertices or procedurally generated meshes), we allocate a render target or buffer, provide an RTV or UAV, which are descriptors, and then these descriptors might not change from there on. The data in the render target or buffer might change.

In case we need several render targets or buffers—for example, for double or triple buffered rendering—the descriptors might change for each frame in the root signature.

For the following discussion, a change is considered important for binding resources if it does the following:

  • Changes/replaces a descriptor in a descriptor table, for example, the CBVs, RTVs, or UAVs described above
  • Changes any entry in the root signature

Descriptors in Descriptor Tables with Haswell/Broadwell

On platforms based on Haswell/Broadwell, the cost of changing one descriptor table in the root signature is equivalent to changing all descriptor tables. Changing one argument means that the hardware has to make a copy (version) of all the current arguments. The number of root parameters in a root signature is the amount of data that the hardware has to version when any subset changes.

Note: All the other types of memory in DirectX 12, like descriptor heaps, buffer resources, and so on, are not versioned by hardware.

In other words, changing all of the parameters is roughly the same cost as just changing one (see [Lauritzen] and [MSDN]). Changing none is still the cheapest, but not that useful.

Note: Other hardware that has, for example, a split between fast and slow (spill) root argument storage only has to version the region of memory where the argument changed: either the fast area or the spill area.

On Haswell/Broadwell, an additional cost of changing descriptor tables can come from the limited size of the binding table in hardware.

Descriptor tables on those hardware platforms use “binding table” hardware. Each binding table entry is a single DWORD that can be considered an offset into the descriptor heap. The 64 KB ring can store 16,384 binding table entries.

In other words the amount of memory consumed per draw call is dependent on the total number of descriptors that are indexed in a descriptor table and then referenced through a root signature.

In case we run out of the 64 KB memory for the binding table entries, the driver will allocate another 64 KB binding table. The switch between those tables leads to a pipeline stall as shown in Figure 2.


Figure 2. Pipeline stall (courtesy of Andrew Lauritzen).

For example, if a root signature references 64 descriptors in a descriptor table, the stall will happen every 16,384 / 64 = 256 draw calls.

Because changing a root signature is considered cheap, having multiple root signatures with a low number of descriptors in the descriptor table is favorable over having root signatures with a larger amount of descriptors in the descriptor table.

Therefore it is favorable on Haswell/Broadwell to keep the number of descriptors referenced in descriptor tables as low as possible.

What does this mean for renderer designs? Using more descriptor tables with fewer descriptors, and therefore more root signatures, should increase the number of pipeline state objects (PSOs), because of the one-to-one relationship between root signatures and PSOs.

Having more pipeline state objects might lead to a larger number of shaders that, in this case, might be more specialized, instead of longer shaders that offer a wider range of features, which is the common recommendation.
 

Root Constants/Descriptors on Haswell/Broadwell

Just as changing one descriptor table costs the same as changing all of them, changing one root constant or root descriptor is equivalent to changing all of them (see [Lauritzen]).

Root constants are implemented with “push constants,” a buffer that the hardware uses to prepopulate Execution Unit (EU) registers. Because the values are immediately available when the EU thread launches, it can be a performance win to store constant data as root constants instead of storing it in descriptor tables.

Root descriptors are implemented as “push constants” as well. They are just pointers passed as constants to the shader, reading data through the general memory path.

Descriptor Tables versus Root Constants/Descriptors on Haswell/Broadwell

Now that we looked at the way descriptor tables, root constants, and descriptors are implemented, we can answer the main question of this article: is one favorable over the other? Because of the limited size of binding table hardware and the potential stalls resulting from crossing this limit, changing root constants and root descriptors is expected to be cheaper on Haswell/Broadwell hardware because they do not use the binding table hardware. For root descriptors and root constants, this is especially recommended in case the data changes every draw call.

Static Samplers on Haswell/Broadwell

As described in the previous article, it is possible to define samplers in the root signature or right in the shader with HLSL root signature language. These are called static samplers.

On Haswell/Broadwell hardware, the driver will place static samplers in the regular sampler heap. This is equivalent to putting them into descriptors manually. Other hardware implements samplers in shader registers, so static samplers can be compiled directly into the shader.

In general static samplers should be a win on many platforms, so there is no downside to using them. On Haswell/Broadwell hardware there is still the chance that by increasing the number of descriptors in a descriptor table, we end up more often with a pipeline stall, because descriptor table hardware has only 16,384 slots to offer.

Here is the syntax for a static sampler in HLSL:

StaticSampler( sReg,
               [ filter = FILTER_ANISOTROPIC,
               addressU = TEXTURE_ADDRESS_WRAP,
               addressV = TEXTURE_ADDRESS_WRAP,
               addressW = TEXTURE_ADDRESS_WRAP,
               mipLODBias = 0.f,     maxAnisotropy = 16,
               comparisonFunc = COMPARISON_LESS_EQUAL,
               borderColor = STATIC_BORDER_COLOR_OPAQUE_WHITE,
               minLOD = 0.f, maxLOD = 3.402823466e+38f,
               space = 0, visibility = SHADER_VISIBILITY_ALL ])

Most of the parameters are self-explanatory because they are similar to their C++ level usage. The main difference is the border color: at the C++ level it offers a full color range, while the HLSL level is restricted to opaque white/black and transparent black. An example of a static sampler is:

StaticSampler(s4, filter=FILTER_MIN_MAG_MIP_LINEAR)

Skylake

Skylake allows dynamic indexing of the entire descriptor heap (~1 million resources) in one descriptor table. That means one descriptor table could be enough to index all the available descriptor heap memory.
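In practice this means a single descriptor range can be declared as unbounded (NumDescriptors set to UINT_MAX) and indexed dynamically from the shader. A sketch of such a root parameter, using the plain Direct3D 12 structures, might be:

// One descriptor table that spans (almost) the entire descriptor heap.
D3D12_DESCRIPTOR_RANGE bigRange = {};
bigRange.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
bigRange.NumDescriptors = UINT_MAX;      // i.e. -1: unbounded, sized by the bound heap
bigRange.BaseShaderRegister = 0;         // t0 and up
bigRange.RegisterSpace = 0;
bigRange.OffsetInDescriptorsFromTableStart = 0;

D3D12_ROOT_PARAMETER bigTable = {};
bigTable.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
bigTable.DescriptorTable.NumDescriptorRanges = 1;
bigTable.DescriptorTable.pDescriptorRanges = &bigRange;
bigTable.ShaderVisibility = D3D12_SHADER_VISIBILITY_ALL;

// On the HLSL side (shader model 5.1) the matching resource is declared as an
// unbounded array and indexed with a dynamic value:
//   Texture2D textures[] : register(t0);
//   float4 c = textures[materialId].Sample(samp, uv);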

Compared to previous architectures, it is not necessary to change descriptor table entries in the root signature as often. That also means that the number of root signatures can be reduced. Obviously different materials will require different shaders and therefore different PSOs. But those PSOs can reference the same root signatures.

With modern rendering engines utilizing fewer shaders than their DirectX 9 and 11 ancestors, so that they can avoid the cost of changing shaders and the attached states, reducing the number of root signatures and therefore the number of PSOs is favorable and should result in a performance gain on any hardware platform.

Conclusion

Focusing on Haswell/Broadwell and Skylake, the recommendations for developing performant DirectX 12 applications depend on the underlying platform. For Haswell/Broadwell, the number of descriptors in a descriptor table should be kept low; for Skylake, it is recommended to keep this number high and decrease the number of descriptor tables.

To achieve optimal performance, the application programmer can check during startup for the type of hardware and then pick the most efficient resource binding pattern. (There is a GPU detect example that shows how to detect different Intel® hardware architectures at https://software.intel.com/en-us/articles/gpu-detect-sample/) The choice of resource binding pattern will influence how shaders for the system are written.
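A rough sketch of such a startup check with DXGI follows; vendor ID 0x8086 identifies an Intel GPU, and mapping the returned device ID to a specific architecture is left to a lookup table such as the one in the GPU detect sample:

#include <dxgi.h>

// Returns true if the default adapter is an Intel GPU and reports its device ID,
// which can then be compared against a table of known Haswell/Broadwell/Skylake IDs.
bool IsIntelAdapter(UINT* deviceIdOut)
{
    IDXGIFactory1* factory = nullptr;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory))))
        return false;

    IDXGIAdapter1* adapter = nullptr;
    bool isIntel = false;
    if (factory->EnumAdapters1(0, &adapter) != DXGI_ERROR_NOT_FOUND)
    {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        isIntel = (desc.VendorId == 0x8086);
        if (deviceIdOut != nullptr)
            *deviceIdOut = desc.DeviceId;
        adapter->Release();
    }
    factory->Release();
    return isIntel;
}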

About the Author

Wolfgang is the CEO of Confetti. Confetti is a think-tank for advanced real-time graphics research and a service provider for the video game and movie industry. Before cofounding Confetti, Wolfgang worked as the lead graphics programmer in Rockstar's core technology group RAGE for more than four years. He is the founder and editor of the ShaderX and GPU Pro book series, a Microsoft MVP, the author of several books and articles on real-time rendering, and a regular contributor to websites and conferences worldwide. One of the books he edited, ShaderX4, won the Game Developer Front Line Award in 2006. Wolfgang is on many advisory boards throughout the industry; one of them is Microsoft’s Graphics Advisory Board for DirectX 12. He is an active contributor to several future standards that drive the game industry. You can find him on Twitter at wolfgangengel. Confetti's website is www.conffx.com.

Acknowledgement

I would like to thank the reviewers of this article:

  • Andrew Lauritzen
  • Robin Green
  • Michal Valient
  • Dean Calver
  • Juul Joosten
  • Michal Drobot

References and Related Links

** Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

How Intugine Integrated the Nimble* Gesture Recognition Platform with Intel® RealSense™ Technology

$
0
0

Shwetha Doss, Senior Application Engineer, Intel Corporation

Harshit Shrivastava, Founder and CEO, Intugine Technologies

Abstract

Intel® RealSense™ technology helps developers enable a natural user interface (NUI) for their gesture recognition platforms. The gesture recognition platform seamlessly integrates with Intel RealSense technology for NUI across segments of applications on Microsoft Windows* platforms. The gesture recognition platform handles all interactions with the user and the Intel® RealSense™ SDK, ensuring that no code changes are required for individual applications.

This paper highlights how Intugine (http://www.intugine.com/) enabled its gesture recognition platforms for Intel® RealSense™ technology. It also discusses how the same methodology can be applied to other applications related to games and productivity applications.

Introduction

Intel® RealSense™ technology adds “human-like” senses to computing devices. Intel® is working with OEMs to create future computing devices that will be able to hear, see, and feel the environment, as well as understand human emotion and a human’s sensitivity to context. These devices will interact with humans in immersive, natural, and intuitive ways.

Intel® RealSense™ technology understands four important modes of communication: hands, the face, speech, and the environment around you. This multi-modal processing will enable the devices to behave more like humans.

The Intel® RealSense™ Camera

The Intel® RealSense™ camera uses depth-sensing technology so that computing devices see more like you do. To harness the possibilities of Intel® RealSense™ technology, developers need to use the Intel® RealSense™ SDK along with the Intel® RealSense™ camera. There are two camera options: the F200 and the R200. These Intel-developed depth cameras support full VGA depth resolution and full 1080p RGB resolution, and require USB 3.0. Both cameras support depth and IR processing at 640×480 resolution at 60 frames per second (FPS).

There are many OEM devices with integrated Intel® RealSense™ cameras available, including Ultrabooks*, tablets, notebooks, 2 in 1s, and all-in-one form factors.

Gesture Recognition Platform

Figure 1. Intel® RealSense™ cameras.

The Intel® RealSense™ camera (F200)

Figure 2. The Intel® RealSense™ camera (F200).

The infrared (IR) laser projector on the Intel RealSense camera (F200) sends non-visible patterns (coded light) onto the object. The IR camera captures the reflected patterns. These patterns are processed by the ASIC, which assigns depth values to each pixel to create a depth video frame.

Applications see both depth and color video streams. The ASIC syncs depth with the color stream (texture mapping) using a UVC time stamp and generates data flags for each depth value (valid, invalid, or motion detected). The range of the F200 camera is about 120 cm.

The Intel® RealSense™ camera (R200)

Figure 3. The Intel® RealSense™ camera (R200).

The R200 camera actually has three cameras providing RGB (color) and stereoscopic IR to produce depth. With the help of a laser projector, the camera does 3D scanning for scene perception and enhanced photography. The inside range is approximately 0.5–3.5 meters, and the outside range is up to 10 meters.

Intel® RealSense™ SDK

The Intel® RealSense™ SDK includes a set of pattern detection and recognition algorithm implementations exposed through standardized interfaces. These algorithm implementations let application developers move their focus from coding algorithm details to innovating on how those algorithms are used.

Intel® RealSense™ SDK Architecture

The SDK library architecture consists of several components. The essence of the SDK functionality lies in the I/O modules and the algorithm modules. The I/O modules retrieve input from the input device or send output to an output device.

The algorithm module includes various pattern detection and recognition algorithms related to face recognition, gesture recognition, and speech recognition.


Figure 4. The Intel® RealSense™ SDK architecture.


Figure 5. The Intel® RealSense™ SDK provides 78-point face landmarks.


Figure 6. The Intel® RealSense™ SDK provides skeletal tracking.

Intugine Nimble*

Intugine Nimble* is a high-accuracy, motion-sensing wearable device. The setup consists of a USB sensor and two wearable devices: a ring and a finger clip. The sensor tracks the movement of the rings in 3D space with sub-millimeter accuracy and low latency. The device is based on computer vision: the rings emit a specific pattern within a narrow wavelength band, and the sensor is filtered to see only that wavelength. The software algorithm on the host device recognizes the emitted pattern and tracks each ring individually, generating coordinates at a high frame rate of over 60 coordinates per second per ring.


Figure 7. The Intugine Nimble* effectively replaces the mouse and keyboard.


Applications With Nimble

Some of the applications that Nimble can control are games such as Fruit Ninja*, Angry Birds*, and Counter-Strike* and utility applications such as Microsoft PowerPoint* and media players. These applications are normally controlled by mouse and keyboard inputs, so to control them with Nimble we need to generate the keyboard and mouse events programmatically.

The software module that takes care of the keyboard and mouse events is called the interaction layer. Nimble uses a proprietary software interaction layer to interact with existing games and applications. The interaction layer maps the user’s fingertip coordinates to the application/OS recognizable mouse and keyboard events.

Nimble with the Intel® RealSense™ SDK

The Intel® RealSense™ SDK can detect IR emissions at 860 nm, and the patterned emission of the Nimble rings can be customized to a certain wavelength range. By replacing the emission source in the ring with an 860 nm emitter, the ring emits similar patterns in the 860 nm range. The Intel® RealSense™ SDK can sense these emissions, which can be captured as an image stream and then tracked using the SDK. By implementing Nimble pattern recognition and tracking algorithms on top of the Intel® RealSense™ SDK, we get the coordinates of the individual rings at 60 FPS.

The Intel® RealSense™ SDK’s design avoids most lens and curvature defects, which allows better-scaled motion tracking of the Nimble rings. The IR resolution of 640×480 generates refined spatial coordinate information. The Intel® RealSense™ SDK supports up to 300 FPS in the IR stream, which gives Nimble’s tracking almost zero latency and provides an extremely responsive experience.

Nimble technology is designed to track only the emissions of rings and thus misses the details of skeletal tracking that might be required for a few applications.


Figure 8. The Intugine Nimble* along with Intel® RealSense™ technology.

Value proposition for Intel® RealSense™ Technology

Nimble along with Intel® RealSense™ technology can support a wide range of existing applications. Currently over 100 applications work seamlessly without needing any source-code modifications, and potentially most Microsoft Windows* and Android* applications can work with this solution.

Currently the Intel® RealSense™ camera (F200) supports a range of 120 cm. With the addition of Nimble, this range can extend to over 15 feet.

Nimble allows sub-millimeter accurate finger tracking within a range of 3 feet and sub-centimeter accurate tracking within a range of 15 feet. This enables many high-accuracy games and applications to be used with better control.

Nimble along with Intel® RealSense™ technology reduces the application latency to less than 5 milliseconds.

Nimble along with Intel® RealSense™ technology can support multiple rings together; we have tested up to eight rings with Intel® RealSense™ technology.

Summary

Nimble’s interaction layer along with Intel® RealSense™ technology can help add gesture support to any application without any changes to the source code. Using this technology, applications on Windows* and Android* platforms can add gesture support with minimal effort.

For More Information

  1. Intel® RealSense™ technology: http://www.intel.in/content/www/in/en/architecture-and-technology/realsense-overview.html
  2. Intugine: http://www.intugine.com/
  3. https://software.intel.com/en-us/articles/realsense-r200-camera

Intel® C++ Composer XE 2013 SP1 for Windows*, Update 6

$
0
0

Intel® C++ Composer XE 2013 SP1 Update 6 includes the latest Intel C/C++ compilers and performance libraries for IA-32 and Intel® 64 architecture systems. This new product release now includes: Intel® C++ Compiler XE Version 14.0.6, Intel® Math Kernel Library (Intel® MKL) Version 11.1 Update 4, Intel® Integrated Performance Primitives (Intel® IPP) Version 8.1 Update 1, Intel® Threading Building Blocks (Intel® TBB) Version 4.2 Update 5, and Intel® Debugger Extension 7.5-1.0 for Intel® Many Integrated Core Architecture.

New in this release:

Note:  For more information on the changes listed above, please read the individual component release notes. See the previous release's ReadMe to see what was new in that release.

Resources

Contents
File: w_ccompxe_online_2013_sp1.6.241.exe
Online installer

File: w_ccompxe_2013_sp1.6.241.exe
Product for developing 32-bit and 64-bit applications

File:  w_ccompxe_redist_msi_2013_sp1.6.241.zip
Redistributable Libraries for 32-bit and 64-bit msi files

File:  get-ipp-8.1-crypto-library.htm
Cryptography Library


Intel® Visual Fortran Composer XE 2013 SP1 for Windows* with Microsoft Visual Studio 2010 Shell & Libraries*, Update 6

$
0
0

Intel® Visual Fortran Composer XE 2013 SP1 Update 6 includes the latest Intel Fortran compilers and performance libraries for IA-32 and Intel® 64 architecture systems. This new product release now includes: Intel® Visual Fortran Compiler XE Version 14.0.6, Intel® Math Kernel Library (Intel® MKL) Version 11.1 Update 4, Intel® Debugger Extension 7.5-1.0 for Intel® Many Integrated Core Architecture (Intel® MIC Architecture)

New in this release:

Note:  For more information on the changes listed above, please read the individual component release notes. See the previous release's ReadMe to see what was new in that release.

Resources

Contents
File:  w_fcompxe_online_2013_SP1.6.241.exe
Online installer

File:  w_fcompxe_2013_sp1.6.241.exe
Product for developing 32-bit and 64-bit applications (with Microsoft Visual Studio 2010 Shell & Libraries*, English version)

File:  w_fcompxe_all_jp_2013_sp1.6.241.exe
Product for developing 32-bit and 64-bit applications (with Microsoft Visual Studio 2010 Shell & Libraries*, Japanese version)

File:  w_fcompxe_redist_msi_2013_sp1.6.241.zip 
Redistributable Libraries for 32-bit and 64-bit msi files

Intel® Visual Fortran Composer XE 2013 SP1 for Windows* with IMSL*, Update 6

$
0
0

Intel® Visual Fortran Composer XE 2013 SP1 Update 6 includes the latest Intel Fortran compilers and performance libraries for IA-32 and Intel® 64 architecture systems. This new product release now includes: Intel® Visual Fortran Compiler XE Version 14.0.6, Intel® Math Kernel Library (Intel® MKL) Version 11.1 Update 4, Intel® Debugger Extension 7.5-1.0 for Intel® Many Integrated Core Architecture (Intel® MIC Architecture), IMSL* Fortran Numerical Library Version 7.0.1

New in this release:

Note:  For more information on the changes listed above, please read the individual component release notes. See the previous release's ReadMe to see what was new in that release.

Resources

Contents
File:  w_fcompxe_online_2013_sp1.6.241.exe
Online installer

File:  w_fcompxe_2013_sp1.6.241.exe
Product for developing 32-bit and 64-bit applications (with Microsoft Visual Studio 2010 Shell & Libraries*, English version)

File:  w_fcompxe_all_jp_2013_sp1.6.241.exe
Product for developing 32-bit and 64-bit applications (with Microsoft Visual Studio 2010 Shell & Libraries*, Japanese version)

File:  w_fcompxe_redist_msi_2013_sp1.6.241.zip 
Redistributable Libraries for 32-bit and 64-bit msi files

File:  w_fcompxe_imsl_2013_sp1.0.024.exe 
IMSL* Library for developing 32-bit and 64-bit applications

3D People Full-Body Scanning System With Intel® RealSense™ 3D Cameras and Intel® Edison: How We Did It

$
0
0

By Konstantin Popov of Cappasity

Cappasity has been developing 3D scanning technologies for two years. This year we are going to release a scanning software product for Ultrabook™ devices and tablets with Intel® RealSense™ cameras: Cappasity Easy 3D Scan*. Next year we plan to create hardware and software solutions to scan people and objects. 
 
As an Intel® Software Innovator and with the help of the Intel® team, we were invited to show the prototype of the people scanning system much earlier than planned. We had limited time for preparations, but still we decided to take on the challenge. In this article I'll explain how we created our demo for the Intel® Developer Forum 2015 held August 18– 20 in San Francisco.

Cappasity instant 3D body scan

Our demo is based upon previously developed technology that combines the multiple depth cameras and the RGB cameras into a single scanning system (U.S. Patent Pending). The general concept is as follows: we calibrate the positions, angles, and optical properties of the cameras. This calibration allows us to merge the data for subsequent reconstruction of the 3D model. To capture the scene in 3D we can place the cameras around the scene, rotate the camera system around the scene, or rotate the scene itself in front of the cameras.
 
We selected the Intel® RealSense™ camera because we believe that it's an optimum value-for-money solution for our B2B projects. At present we are developing two prototype systems using several Intel® RealSense™ cameras: a scanning box with several 3D cameras for instant scanning and a system for full-body people scanning.
 
We demonstrated both prototypes at IDF 2015. The people scanning prototype operated with great success for the three days of the conference, scanning many visitors who came to our booth.

A system for full-body people scanning

Now let's see how it works. We attached three Intel® RealSense™ cameras to a vertical bar so that the bottom camera is aimed at the feet and lower legs, the middle camera captures the legs and the body, and the top-most camera films the head and the shoulders.

Three Intel RealSense cameras attached to a vertical bar

Each camera is connected to a separate Intel® NUC computer, and all the computers are connected to the local area network.
 
Since the cameras are mounted onto a fixed bar, we used a rotating table to rotate the person being filmed. The table construction is quite basic: a PLEXIGLAS* pad, roller bearings, and a step motor. The table is connected to the PC via an Intel® Edison board; it receives commands through the USB port.

The table is connected to the PC via an Intel® Edison board

a simple lighting system to steadily illuminate the front

We also used a simple lighting system to steadily illuminate the front of the person being filmed. In the future, all these components will be built into a single box, but at present we were just demonstrating an early prototype of the scanning system, so we had to assemble everything using commercially available components.

Cappasity fullbody scan

Our software operates based on the client-server architecture, but the server part can be run on almost any modern PC. That is, any computer that performs our calculations is a "server" in our system. We often use an ordinary Ultrabook® with Intel® HD Graphics as a server. The server sends the recording command to the Intel® NUC computers, gets the data from them, then analyzes and rebuilds the 3D model. 
 
Now, let's look at some particular aspects of the task we are trying to solve. The 3D rebuilding technology that we use in the Cappasity products is based upon our implementation of the Kinect* Fusion algorithm. But in this case our challenge was much more complex: we had only one month to create an algorithm to reconstruct the data from several sources. We called it "Multi-Fusion." In its present state the algorithm can merge the data from an unlimited number of sources into a single voxel volume. For scanning people three data sources were enough.
 
Calibration is the first stage. The Cappasity software allows the devices to be calibrated pairwise. The studies from our year of R&D came in pretty handy in preparation for IDF 2015. In just a couple of weeks we reworked the calibration procedure and implemented support for voxel volumes after Fusion; previously, the calibration process relied more heavily on point-cloud processing. The system needs to be calibrated just once, after the cameras are installed, and calibration takes no more than five minutes.
 
Then we had to come up with a data-processing approach, and after some research we chose post-processing. That is, we first record the data from all cameras, then upload the data to the server via the network, and then begin the reconstruction process. All cameras record color and depth streams, so we end up with a complete data set for further processing. This is convenient because the post-processing algorithms are constantly being improved; the ones we used were written just a couple of days before IDF.
 
Compared to the Intel® RealSense™ camera (F200), the Intel® RealSense™ camera (long-range R200) handles black colors and complex materials better, and we experienced only a few tracking glitches. Most important, the cameras allow us to capture images at the required range. We have optimized the Fusion reconstruction algorithm for OpenCL* to achieve good performance even on Intel® HD Graphics 5500 and later. To remove noise we used Fusion plus additional data segmentation after a single mesh was composed.

Fusion plus additional data segmentation after a single mesh was composed

High resolution texture mapping algorithm

In addition, we have refined the high-resolution texture mapping algorithm. We use the following approach: we capture the image at the full resolution of the color camera, and then we project the image onto the mesh. We are not using voxel color since it causes the texture quality to degrade. The projection method is quite complex to implement, but it allows us to use both built-in and external cameras as color sources. For example, the scanning box we are developing operates using DSLR cameras to get high-resolution textures, which is important for our e-commerce customers.
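As a sketch of the projection step (the symbols below are our own notation, not Cappasity's code): given the color camera's intrinsic matrix K and its calibrated pose (R, t), a mesh vertex X is projected to

(u, v, w) = K * (R * X + t)

and the texture coordinate for X is (u/w, v/w), i.e., the pixel of the full-resolution color image that the vertex falls onto. The quality of this lookup is what makes accurate calibration so important.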
 
However, even the built-in Intel® RealSense™ cameras with RGB provide perfect colors. Here is a sample after mapping the textures:

Sample after mapping the textures

We are developing a new algorithm to eradicate the texture shifting. We plan to have it ready by the release of our Easy 3D Scan software product. 
 
Our seemingly simple demo is built on complex code that allows us to compete with scanning systems in the USD 100K+ price range. The Intel® RealSense™ cameras are budget-friendly, which will help them revolutionize the B2B market.
 
Here are the advantages of our people scanning system:

  • It is an affordable solution, and it's easy to set up and operate; only the press of a button is needed.
  • Small size: the scanning system can be placed in retail areas, recreational centers, medical institutions, casinos, and so on.
  • The quality of the 3D models is suitable for 3D printing and for developing content for AR/VR applications.
  • The precision of the resulting 3D mesh is suitable for taking measurements.

 
We understand that the full potential of the Intel® RealSense™ cameras is yet to be uncovered. We are confident that at CES 2016 we'll be able to demonstrate significantly improved products.

Blend the Intel® RealSense™ Camera and the Intel® Edison Board with JavaScript*


Introduction

Smart devices can now connect to things we never before thought possible. This is being enabled by the Internet of Things (IoT), which allows these devices to collect and exchange data.

Intel has created Intel® RealSense™ technology, which includes the Intel® RealSense™ camera and the Intel® RealSense™ SDK. Using this technology, you can create applications that detect gestures and head movement, analyze facial data, perform background segmentation, read depth levels, recognize and synthesize voice, and more. Imagine that you are developing a super sensor that can detect many things. Combined with the versatile uses of the Intel® Edison kit and its outputs, you can build creative projects that are both useful and entertaining.

The Intel® RealSense™ SDK provides support for popular programming languages and frameworks such as C++, C#, Java*, JavaScript*, Processing, and Unity*. This means that developers can get started quickly in a programming environment they are already familiar with.

Peter Ma’s article, Using an Intel® RealSense™ 3D Camera with the Intel® Edison Development Platform, presents two examples of applications using C#. The first uses the Intel® RealSense™ camera as input and the Intel® Edison board as output. The result is that if you spread your fingers in front of Intel® RealSense™ camera, it sends a signal to the Intel® Edison board to turn on the light.

In the second example, Ma reverses the flow, with the Intel® Edison board as input and the Intel® RealSense™ camera as output. The Intel® Edison board provides data that comes from a sensor to be processed and presents it to us through the Intel® RealSense™ camera as voice synthesis to provide more humanized data.

Ma’s project inspired me to build something similar, but using JavaScript* instead of C#. I used the Intel® RealSense™ SDK to read and send hand gesture data to a node.js server, which then sends the data to the Intel® Edison board to trigger a buzzer and LED that are connected to it.

About the Project

This project is written in JavaScript*. If you are interested in implementing only a basic gesture, the algorithm module already included in the Intel® RealSense™ SDK gives you everything you need.

Hardware

Requirements:

Intel® Edison board with the Arduino breakout board

The Intel® Edison board is a low-cost, general-purpose computing platform. It uses a 22nm dual-core Intel® Atom™ SoC running at 500 MHz, supports 40 GPIOs, and includes 1 GB LPDDR3 RAM, 4 GB eMMC storage, dual-band Wi-Fi, and Bluetooth, all in a small form factor.

The board runs the Linux* kernel and is compatible with Arduino, so it can run an Arduino implementation as a Linux* program.

 


Figure 1. Intel® Edison breakout board kit.

Grove Starter Kit Plus - Intel® XDK IoT Edition

Grove Starter Kit Plus - Intel® XDK IoT Edition is designed for the Intel® Galileo board Gen 2, but it is fully compatible with the Intel® Edison board via the breakout board kit.

The kit contains sensors, actuators, and shields, such as a touch sensor, light sensor, and sound sensor, and also contains an LCD display as shown in Figure 2. This kit is an affordable solution for developing an IoT project.

You can purchase the Grove Starter Kit Plus here: 


Figure 2. Grove* Starter Kit Plus - Intel® XDK IoT Edition

Intel® RealSense™ Camera

The Intel® RealSense™ camera is built for game interactions, entertainment, photography, and content creation with a system-integrated or a peripheral version. The camera’s minimum requirements are a USB 3.0 port, a 4th gen Intel Core processor, and 8 GB of hard drive space.

The camera (shown in Figure 3) features full 1080p color and a depth sensor, giving the PC a 3D visual and immersive experience.


Figure 3. Intel® RealSense™ camera

You can purchase the complete developer kit, which includes the camera here.

GNU/Linux* server

A GNU/Linux* server is easy to set up. You can use an old computer or laptop, or you can host the server in the cloud. I used a cloud server running Ubuntu*. If you use a different Linux* flavor for the server, just adapt the commands to your distribution.

Software

Before we start to develop the project, make sure you have the following software installed on your system. You can use the links to download the software.

Set Up the Intel® RealSense™ Camera

To set up the Intel® RealSense™ camera, connect the Intel® RealSense™ camera (F200) to a USB 3.0 port, and then install the driver once the camera is connected to your computer. Navigate to the Intel® RealSense™ SDK location, and open the JavaScript* sample in your browser:

Install_Location\RSSDK\framework\JavaScript\FF_HandsViewer\FF_HandsViewer.html

After the file opens, the script checks what platform you have. While the script is checking your platform, click the link in your web browser to install the Intel® RealSense™ SDK WebApp Runtime.

When the installation is finished, restart your web browser, and then open the file again. You can check to see that the installation was a success by raising your hand in front of the camera. It should show your hand gesture data visualized on your web browser.

Gesture Set Up

The first key element is the fired gesture data, which looks like the following:

{"timeStamp":130840014702794340, "handId":4, "state":0, "frameNumber":1986, "name":"spreadfingers"}

This sends "name":"spreadfingers" to the server to be processed.

Next, we will write some JavaScript* code to stream gesture data from the Intel® RealSense™ camera to the Intel® Edison board through the node.js server.

Working with JavaScript*

Finally, we get to do some programming. I suggest that you first copy the whole folder to another location, because the default installation doesn't allow the original folder to be modified.

Copy the FF_HandsViewer folder from this location and paste it somewhere else. The folder’s location is:

\install_Location\RSSDK\framework\JavaScript\FF_HandsViewer\

Later, you can create your own project folder to keep things organized.

Next, copy the realsense.js file from the location below and paste it inside the FF_HandsViewer folder:

Install_Location\RSSDK\framework\common\JavaScript

To make everything easier, let's create a file named edisonconnect.js. This file will receive gesture data from the Intel® RealSense™ camera and send it to the node.js server. Remember to change the IP address in the socket variable so that it points to your node.js server:

// var socket = io('change this to the IP address of your node.js server');
var socket = io('http://192.168.1.9:1337');

// Called with each fired gesture: log the gesture name and
// forward the data to the node.js server.
function edisonconnect(data){
  console.log(data.name);
  socket.emit('realsense_signal', data);
}

Now for the most important step: hooking into sample.js, where the gesture data is produced, so that each gesture is intercepted and passed to edisonconnect.js. You don't need to worry about CPU activity; the extra call costs very little frame rate or RAM.

// retrieve the fired gestures
for (g = 0; g < data.firedGestureData.length; g++){
  $('#gestures_status').text('Gesture: ' + JSON.stringify(data.firedGestureData[g]));

  // add script start - passing gesture data to edisonconnect.js
	edisonconnect(data.firedGestureData[g]);
  // add script end
}

Once the loop above is running and calling edisonconnect() with gesture data, the markup below completes the main task of the JavaScript* program. After that, you only have to fix the realsense.js file path to match where you copied it.

It is critical to do the following: link both the socket.io client library and edisonconnect.js, as shown here:

<!DOCTYPE html>
<html>
<head>
  <title>Intel&reg; RealSense&trade; SDK JavaScript* Sample</title>
  <script src="https://aubahn.s3.amazonaws.com/autobahnjs/latest/autobahn.min.jgz"></script>
  <script src="https://promisejs.org/polyfills/promise-6.1.0.js"></script>
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
  <script src="https://common/JavaScript/realsense.js"></script>
  <script src="sample.js"></script>
  <script src="three.js"></script>
  <!-- add script start -->
  <script src="https://cdn.socket.io/socket.io-1.3.5.js"></script>
  <script src="edisonconnect.js"></script>
  <!-- add script end -->
  <link rel="stylesheet" type="text/css" href="style.css">
</head>
<body>

This code is taken from the SDK sample and has been trimmed to keep it simple. Its purpose is to send the gesture data to the server: at this point the Intel® RealSense™ SDK has recognized the gesture and is ready to pass it on.

Set Up the Server

We will use a GNU/Linux*-based server. I use an Ubuntu* server as the OS, but you can use any GNU/Linux* distribution that you are familiar with. We will skip the server installation itself, because related tutorials are readily found on the Internet.

Log in as a root user through SSH to configure the server.

Since the server has just been installed, we need to update the repository list and upgrade the packages. The commands below are the usual ones for an Ubuntu* distribution; use the equivalents for the GNU/Linux* distribution you are running.

# apt-get update && apt-get upgrade

Once the repository list is updated, the next step is to install node.js.

# apt-get install nodejs

We also need to install npm Package Manager.

# apt-get install npm

Finally, install socket.io and express with the npm package manager.

# npm install socket.io express
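Depending on your npm version, 'npm install' may not record these packages for you. A minimal package.json along the following lines keeps the server's dependencies reproducible with a plain 'npm install'. This is only a sketch: the name, version, and description fields are placeholders, not part of the original project.

{
  "name": "realsense-gesture-server",
  "version": "0.0.1",
  "description": "Relays Intel RealSense gesture data to the Intel Edison board",
  "main": "server.js",
  "dependencies": {
    "express": "*",
    "socket.io": "*"
  }
}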

Remember to create the files server.js and index.html.

# touch server.js index.html

Edit the server.js file using your favorite text editor, such as vim or nano:

# vim server.js

Write down this code:

var express = require("express");
var app     = express();
var port    = 1337;

app.use(express.static(__dirname + '/'));
var io = require('socket.io').listen(app.listen(port));
console.log("Listening on port " + port);

io.on('connection', function(socket){'use strict';
  console.log('a user connected from ' + socket.request.connection.remoteAddress);

  // Check realsense signal
  socket.on('realsense_signal', function(data){
    socket.broadcast.emit('realsense_signal', data);
    console.log('Hand Signal: ' + data.name);
  });

  socket.on('disconnect', function(){
    console.log('user disconnected');
  });
});

var port = 1337; assigns the port the server listens on. console.log("Listening on port " + port); confirms that the server is up, which helps you tell whether the data from the JavaScript* client is being received. The key line is socket.broadcast.emit('realsense_signal', data);: when gesture data arrives, it is rebroadcast to all other listening clients.

The last thing we need to do is run the server.js file with node. If 'Listening on port 1337' is displayed, as shown below, the server has started successfully.
# node server.js

root@edison:~# node server.js
Listening on port 1337
events.js:85
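Before moving on to the Intel® Edison board, you can optionally verify the relay without the camera. The following is only a sketch and is not part of the original sample: the file name test-client.js, the fake payload, and the use of socket.io-client on the development machine are assumptions. Install the client library with 'npm install socket.io-client', save the code as test-client.js, and run 'node test-client.js' while server.js is running.

// test-client.js - minimal sketch; assumes server.js is reachable at localhost:1337
var io = require('socket.io-client');
var socket = io('http://localhost:1337');

socket.on('connect', function(){
  console.log('connected, emitting a fake spreadfingers gesture');
  // Mimic the payload that edisonconnect.js forwards from the SDK
  socket.emit('realsense_signal', { name: 'spreadfingers' });
});

socket.on('realsense_signal', function(data){
  // server.js uses socket.broadcast.emit, so the sender never receives its own
  // message; run a second client (or the Edison main.js) to see this fire.
  console.log('broadcast received: ' + data.name);
});

If the server console prints 'Hand Signal: spreadfingers', the relay is working.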

Set up the Intel® Edison Board

The Intel® Edison SDK is easy to deploy. Refer to the following documentation:

Now it's time to put the code onto the Intel® Edison board. This code connects to the server and listens for any broadcast coming from it, much like the listening logic on the server side. Whenever gesture data is received, the Intel® Edison board switches its digital pins on or off.

Open the Intel® XDK IoT Edition and create a new project from Templates, using the DigitalWrite template, as shown in the screenshot below.

Edit line 9 in package.json to add socket.io-client as a dependency, as shown below. If the dependencies section is empty, add the whole entry. Declaring the dependency makes the Intel® XDK install the socket.io client on the Intel® Edison board if it is not already there.

"dependencies": {"socket.io-client":"latest" // add this script
}

Find the file named main.js. It first connects to the server to make sure the server is ready and listening. Then it checks whether the received gesture data is named "spreadfingers"; if so, it sets digital pins 2 and 8 to 1 (on), and otherwise back to 0 (off).
Change the server IP address to match yours. If you want to use different pins, make sure you also change them in mraa.Gpio(selectedpins).

var mraa = require("mraa");

var pins2 = new mraa.Gpio(2);
pins2.dir(mraa.DIR_OUT);

var pins8 = new mraa.Gpio(8);
pins8.dir(mraa.DIR_OUT);

var socket = require('socket.io-client')('http://192.168.1.9:1337');

socket.on('connect', function(){
  console.log('i am connected');
});

socket.on('realsense_signal', function(data){
  console.log('Hand Signal: ' + data.name);
  if(data.name == 'spreadfingers'){
    pins2.write(1);
    pins8.write(1);
  } else {
    pins2.write(0);
    pins8.write(0);
  }
});

socket.on('disconnect', function(){
  console.log('i am not connected');
});

Select Install/Build, and then select Run after making sure the Intel® Edison board is connected to your computer.

Now make sure the server is up and running, and the Intel® RealSense camera and Intel® Edison board are connected to the Internet.

Conclusion

Using Intel® RealSense™ technology, this project modified the SDK's JavaScript* sample script to send captured gesture data to the node.js server. But this project is only the beginning of what is possible.

The code is easy to follow: the server broadcasts the gesture data to every socket client that is listening, and the Intel® Edison board, with socket.io-client installed, listens for that broadcast. As a result, gesture data named spreadfingers toggles the digital pins between 1 and 0.

The possibilities are endless. The Intel® RealSense™ camera is lightweight and easy to carry and use, and the Intel® Edison board is a powerful embedded PC. If we blend and connect the Intel® Edison board and the Intel® RealSense™ camera with JavaScript*, it is easy to package, code, and build an IoT device. You can create something great and useful.

About the Author

Aulia Faqih - Intel® Software Innovator

Intel® RealSense™ Technology Innovator based in Yogyakarta, Indonesia, currently lecturing at UIN Sunan Kalijaga Yogyakarta. He loves playing with the Galileo and Edison boards, the web, and all things geek.

Enabling IPP on OpenCV ( Windows* and Linux* Ubuntu* )


To set up the environment (Windows* systems):

  • Configuration of OpenCV 3.0.0 – Enabling IPP
    • Download OpenCV 3.0.0( http://www.opencv.org/ ) and CMake-3.2.3 (http://www.cmake.org/download/ )
    • Extract OpenCV to a location of your choice, then install and run CMake.
    • Set OpenCV's location as the source directory and choose the directory where you want the build to be created.
    • To enable IPP you have two options. You can use 'ICV', a special IPP build for OpenCV that is free of charge, or you can use the IPP from one of the Intel® software tool suites (Intel® System Studio or Intel® Parallel Studio XE) if you have one.
    • To go with ICV, simply turn WITH_IPP on. The ICV package is downloaded automatically and the CMake configuration will pick it up.
    • To enable IPP from an Intel® software suite instead, you need to manually add an entry for IPP in addition to setting WITH_IPP. Click 'Add Entry', name it 'IPPROOT', choose PATH as its type, and enter the location of your IPP installation (see the command-line sketch after this list for an equivalent non-GUI configuration).
    • If the configuration completes without problems, you are ready to generate the project and build it.
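If you prefer the command line to the CMake GUI on Windows*, the following is a rough equivalent. It is only a sketch: the generator name assumes Visual Studio 2013 on a 64-bit target, and the IPPROOT path is an example from an Intel® Parallel Studio XE 2016 installation, so adjust both to your setup.

> cmake -G "Visual Studio 12 2013 Win64" -D WITH_IPP=ON -D IPPROOT="C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\ipp" <path-to-OpenCV-source>

Leave out the IPPROOT entry to fall back to the automatically downloaded ICV package.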

 

To set up the environment (Linux* Ubuntu* systems):

  • Configuration of OpenCV 3.0.0 – Enabling IPP
    • Download OpenCV 3.0.0( http://www.opencv.org/ )
    • Extract OpenCV to a location of your choice
    • Open a terminal and go to where you extracted OpenCV
    • As in the Windows case, you can go with either ICV or IPP
    • For ICV, type 'cmake -D WITH_IPP=ON .'
    • Example configuration result for ICV
    • For IPP, type 'cmake -D WITH_IPP=ON -D IPPROOT=<Your IPP Location> .'
    • Example configuration result for IPP
    • If the configuration completed without problems, proceed by typing 'make -j4'
    • When the build is done, type 'make install' to install the library; to double-check that IPP was enabled, see the note after this list
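To double-check that IPP was picked up, you can inspect the CMake cache in your build directory (assuming you configured in the current directory, as above); this is a quick sanity check rather than part of the official procedure.

# grep -i ipp CMakeCache.txt

The output should show WITH_IPP set to ON along with any IPP- or ICV-related path entries.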

 
