Channel: Intel Developer Zone Articles

Intel® IPP ZLIB Coding Functions


1. Overview

ZLIB is a lossless data compression method and software library by Jean-loup Gailly and Mark Adler. Initially released in 1995, it became the de facto standard for lossless data compression. ZLIB is an integral part of almost all Linux*-based operating systems, including Android*, as well as OS X* and versions for embedded and mobile platforms. Many applications, including software packages such as HTTP servers, use ZLIB as one (and sometimes the only) data compression method.

The Intel® Integrated Performance Primitives (Intel® IPP) library has provided functionality supporting and optimizing the ZLIB library since Intel® IPP version 5.2. Unlike other ZLIB implementations, the Intel® IPP functions for ZLIB optimize not only the data compression part, but the decompression operations as well.

This article describes how Intel® IPP supports ZLIB, explains the Intel® IPP for ZLIB distribution model, covers recent changes in Intel® IPP ZLIB functionality in version 2017, and provides performance data obtained on different Intel® platforms.

2. ZLIB and Intel® IPP Implementation

The distribution model of Intel® IPP for ZLIB is as follows:

  • The Intel® IPP library package provides source code patch files for all major ZLIB versions – from 1.2.5.3 to 1.2.8. These patches should be applied to the ZLIB source code files downloaded from the ZLIB repositories at zlib.net (latest ZLIB version) or zlib.net/fossils (previous versions of ZLIB);
  • After the patch file is applied, the source code contains a set of conditional compilation constructs guarded by the WITH_IPP definition. For example (from file deflate.c):
send_bits(s, (STATIC_TREES<<1)+last, 3);
#if !defined(WITH_IPP)
compress_block(s, (const ct_data *)static_ltree,(const ct_data *)static_dtree);
#else
{
  IppStatus status;
  status = ippsDeflateHuff_8u( (const Ipp8u*)s->l_buf, (const Ipp16u*)s->d_buf,
                     (Ipp32u)s->last_lit, (Ipp16u*)&s->bi_buf, (Ipp32u*)&s->bi_valid,
                     (IppDeflateHuffCode*)static_ltree, (IppDeflateHuffCode*)static_dtree,
                     (Ipp8u*)s->pending_buf, (Ipp32u*)&s->pending );
 Assert( ippStsNoErr == status, "ippsDeflateHuff_8u returned a bad status" );
}
send_code(s, END_BLOCK, static_ltree);
#endif

So, when the source code is compiled without the WITH_IPP definition, the original ZLIB library is built. If the “-DWITH_IPP” compiler option is used, the Intel® IPP-enabled ZLIB library is produced. Note that several other compiler/linker options are also required to build ZLIB with Intel® IPP (see below).

The Intel® IPP library provides the following functions to support ZLIB functionality:
Common functions:

  • ippsAdler32_8u,
  • ippsCRC32_8u

For compression (deflate):

  • ippsDeflateLZ77Fast_8u,
  • ippsDeflateLZ77Fastest_8u,
  • ippsDeflateLZ77Slow_8u,
  • ippsDeflateHuff_8u,
  • ippsDeflateDictionarySet_8u,
  • ippsDeflateUpdateHash_8u

For decompression (inflate):

  • ippsInflateBuildHuffTable,
  • ippsInflate_8u.

Six source code files in the ZLIB source tree are patched to use the optimized Intel® IPP functions:

  • adler32.c,
  • crc32.c,
  • deflate.c,
  • inflate.c,
  • inftrees.h,
  • trees.c.

In general, the most compute-intensive parts of the ZLIB code are substituted with Intel® IPP function calls; all common/service parts of ZLIB remain intact.

3. What’s New in Intel® IPP 2017 Implementation of ZLIB

Intel® IPP 2017 adds some significant enhancements to the ZLIB optimization code, including faster CPU-specific optimization code, a new “fastest” compression level with the best compression performance, support for deflate parameter tuning, and support for additional compression levels:

3.1 CPU-Specific Optimizations

Intel® IPP 2017 functions provide additional optimizations for new Intel® platforms. For ZLIB needs in particular, the Intel® IPP 2017 library contains the following optimizations:

  • Checksum computing using modern Intel® CPU instructions;
  • Hash table operation using modern Intel® CPU instructions;
  • Huffman tables generation functionality;
  • Huffman tables decomposition during inflating;
  • Additional optimization of pattern matching algorithms (new in Intel® IPP 2017)

3.2 New Fastest Compression Level

The Intel® IPP 2017 implementation of ZLIB introduces a brand-new compression level with the best compression performance. This is achieved by simplifying pattern matching, at the cost of a slightly lower compression ratio.
The new compression level – called “fastest” – is assigned the numeric code “-2” to distinguish it from the ZLIB “default” compression level (Z_DEFAULT_COMPRESSION = -1).
The decrease in compression ratio can be seen in the following table:

| Data Compression Corpus | Ratio (level “fast”, 1) / Performance* (MB/s) | Ratio (level “fastest”, -2) / Performance* (MB/s) |
|---|---|---|
| Large Calgary | 2.80 / 86 | 2.10 (-0.7) / 197 (+111) |
| Canterbury | 3.09 / 107 | 2.26 (-0.83) / 294 (+187) |
| Large (3 files) | 3.10 / 97 | 2.01 (-1.09) / 209 (+112) |
| Silesia | 2.80 / 89 | 2.16 (-0.64) / 194 (+105) |

Note: “compression ratio” in the table above is the geometric mean of the ratios of uncompressed file sizes to compressed file sizes; “performance” is the number of input data megabytes compressed per second, measured on an Intel® Xeon® processor E5-2680 v3, 2.5 GHz, single thread.

3.3 Deflate Parameters Tuning

To give additional freedom in tuning data compression parameters, Intel® IPP 2017 for ZLIB activates the original deflateTune function:

        ZEXTERN int ZEXPORT deflateTune OF((z_streamp strm, int good_length, int max_lazy,
                                        int nice_length, int max_chain));

The purpose and usage of the function parameters are the same as in the original ZLIB deflate algorithm. The modified deflate function itself loads the pattern matching parameters from the configuration_table array in deflate.c, which holds pre-defined sets for each compression level.

3.4 Additional Compression Levels

The deflateTune function parameters give you the freedom to modify the compression search algorithm to obtain the best compression-ratio/compression-performance trade-off for particular customer needs. Nevertheless, the process of finding an optimal parameter set is not straightforward, because the actual behavior of the compression functionality depends highly on input data specifics.

The Intel® IPP team has performed several experiments with different data and fixed some parameter sets as additional compression levels. The level values and input data characteristics are shown in the table below.

| Additional compression levels | Input data |
|---|---|
| 11-19 | General data (text documents, binary files) of large size (greater than 1 MB) |
| 21-29 | Highly compressible data (database tables, text documents with repeating phrases, large uncompressed pictures like BMPs, PPMs) |

These sets are stored in the configuration_table array in the file deflate.c. The effect on compression ratio across levels 11 to 19 is the same as across the original levels 1 to 9; that is, a higher level provides better compression. You may use these sets, or discover your own.

4. Getting Started With Intel® IPP 2017 ZLIB

The process of preparing the Intel® IPP-boosted ZLIB library is described in the readme.html file provided with the Intel® IPP “components” package. It explains how to download the ZLIB source code from its site, how to un-archive and patch the source code files, and how to build Intel® IPP-enabled ZLIB for different needs (static or dynamic ZLIB libraries, statically or dynamically linked to Intel® IPP).

5. Usage Notes for Intel® IPP ZLIB Functions

5.1 Using the "Fastest" Compression Level

To obtain better compression performance while keeping ZLIB (deflate) compatibility, the new “fastest” compression method is implemented. It is a lightweight compression method, which:

  • Doesn’t look back in the dictionary to find a better match;
  • Doesn’t collect input stream statistics for better Huffman-based coding.

This method corresponds to compression level “-2” and can be used as follows:

       z_stream str_deflate;
       str_deflate.zalloc = NULL;
       str_deflate.zfree = NULL;
       deflateInit(&str_deflate, -2);

The output (compressed) stream generated with “fastest” compression is fully compatible with the “deflate” standard and can be decompressed using regular ZLIB.

5.2 Tuning Compression Level

In the Intel® IPP 2017 product, the ZLIB-related functions use a table of substring matching parameters to control compression ratio and performance. This table, defined as configuration_table in the deflate.c file, contains sets of four values: max_chain, good_length, nice_length, and max_lazy. These values are described in the table below:

| Value | Description |
|---|---|
| max_chain | Maximum number of searches in the dictionary for a better (longer) substring match. Reasonable value range is 1-8192. |
| good_length | If a substring of this length or greater is matched in the dictionary, the maximum number of searches for this particular input string is reduced fourfold. Reasonable value range is 4-258. |
| nice_length | If a substring of this length or greater is matched in the dictionary, the search is stopped. Reasonable value range is 4-258. |
| max_lazy | If a substring of this length or greater is found in the dictionary: for the fast compression method (compression levels 1 to 4), the hash table is not updated; for the slow compression method (levels 5 to 9), the search algorithm doesn’t check nearby input data for a better match. |

Note: the final compression ratio and performance depend highly on input data specifics.

The actual parameter values are shown in the table below:

| Compression level | Deflate function | max_chain | good_length | nice_length | max_lazy |
|---|---|---|---|---|---|
| 1 | Fast | 4 | 8 | 8 | 8 |
| 2 | Fast | 4 | 16 | 16 | 9 |
| 3 | Fast | 4 | 16 | 16 | 12 |
| 4 | Fast | 48 | 32 | 32 | 16 |
| 5 | Slow | 32 | 8 | 32 | 16 |
| 6 | Slow | 128 | 8 | 256 | 16 |
| 7 | Slow | 144 | 8 | 256 | 16 |
| 8 | Slow | 192 | 32 | 258 | 128 |
| 9 | Slow | 256 | 32 | 258 | 258 |

These values were chosen to produce compression ratios similar to the original open-source ZLIB on standard data compression collections. You can try your own combinations of matching values using the deflateTune ZLIB function. For example, to change the max_chain value from 128 to 64, and thus speed up compression at the cost of some compression ratio degradation, you need to do the following:

    z_stream str_deflate;
    str_deflate.zalloc = NULL;
    str_deflate.zfree = NULL;
    deflateInit(&str_deflate, Z_DEFAULT_COMPRESSION);
    deflateTune(&str_deflate, 8, 26, 256, 64);
    …
    deflateEnd(&str_deflate);

Note that the changed string matching parameters remain in effect for all subsequent compression operations (ZLIB deflate calls) with the str_deflate object, until it is destroyed or re-initialized with a deflateReset function call.

5.3 Using additional Compression Levels

Some input data sets have specific characteristics: for example, the input data can be long, or it can be highly compressible.
For such data we introduced additional compression levels, which are in fact calls to the same “fast” or “slow” compression functions, but with different sets of string matching values. The new compression levels are the following:

  • From 11 to 19 – compression levels for big input data buffers (1 MB and longer);
  • From 21 to 29 – compression levels for highly compressible data (compression ratio of 30x and more).

For example, for levels 6 and 16 on the “Large” data compression corpus, on an Intel® Xeon® processor E5-2680 v3, the geometric mean results are:

| Level | Ratio | Compression Performance (MB/s) |
|---|---|---|
| 6 | 3.47 | 17.7 |
| 16 | 3.46 | 19.9 |

For levels 6 and 26 on some synthetic highly compressible data, on an Intel® Xeon® processor E5-2680 v3, the geometric mean results are:

| Level | Ratio | Compression Performance (MB/s) |
|---|---|---|
| 6 | 218 | 768 |
| 26 | 218 | 782 |

Note: these levels are “experimental” and do not guarantee improvements on all input data.


Intel® Software Guard Extensions Tutorial Series: Part 4, Enclave Design


In Part 4 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series we’ll be designing our enclave and its interface. We’ll take a look at the enclave boundary that was defined in Part 3 and identify the necessary bridge functions, examine the impact the bridge functions have on the object model, and create the project infrastructure necessary to integrate the enclave into our application. We’ll only be stubbing the enclave ECALLS at this point; full enclave integration will come in Part 5 of the series.

You can find the list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

There is source code provided with this installment of the series: the enclave stub and interface functions are provided for you to download.

Application Architecture

Before we jump into designing the enclave interface, we need to take a moment and consider the overall application architecture. As discussed in Part 1, enclaves are implemented as dynamically loaded libraries (DLLs under Windows* and shared libraries under Linux*) and they can only link against 100-percent native C code.

The Tutorial Password Manager, however, will have a GUI written in C#. It uses a mixed-mode assembly written in C++/CLI to get us from managed to unmanaged code, but while that assembly contains native code it is not a 100-percent native module and it cannot interface directly with an Intel SGX enclave. Attempts to incorporate the untrusted enclave bridge functions in C++/CLI assemblies will result in a fatal error:

	Command line error D8045: cannot compile C file 'Enclave_u.c' with the /clr option

That means we need to place the untrusted bridge functions in a separate DLL that is all native code. As a result, our application will need to have, at minimum, three DLLs: the C++/CLI core, the enclave bridge, and the enclave itself. This structure is shown in Figure 1.


Figure 1. Component makeup for a mixed-mode application with enclaves.

Further Refinements

Since the enclave bridge functions must reside in a separate DLL, we’ll go a step further and place all the functions that deal directly with the enclave in that same DLL. This compartmentalization of the application layers will not only make it easier to manage (and debug) the program, but also to ease integration by lessening the impact to the other modules. When a class or module has a specific task with a clearly defined boundary, changes to other modules are less likely to impact it.

In this case, the PasswordManagerCoreNative class should not be burdened with the additional task of instantiating enclaves. It just needs to know whether or not Intel SGX is supported on the platform so that it can execute the appropriate function.

As an example, the following code block shows the unlock() method:

int PasswordManagerCoreNative::vault_unlock(const LPWSTR wpassphrase)
{
	int rv;
	UINT16 size;

	char *mbpassphrase = tombs(wpassphrase, -1, &size);
	if (mbpassphrase == NULL) return NL_STATUS_ALLOC;

	rv= vault.unlock(mbpassphrase);

	SecureZeroMemory(mbpassphrase, size);
	delete[] mbpassphrase;

	return rv;
}

This is a pretty simple method that takes the user’s passphrase as a wchar_t, converts it to a variable-length encoding (UTF-8), and then calls the unlock() method in the vault object. Rather than clutter up this class, and this method, with enclave-handling functions and logic, it would be best to add enclave support to this method through a one-line addition:

int PasswordManagerCoreNative::vault_unlock(const LPWSTR wpassphrase)
{
	int rv;
	UINT16 size;

	char *mbpassphrase = tombs(wpassphrase, -1, &size);
	if (mbpassphrase == NULL) return NL_STATUS_ALLOC;

	// Call the enclave bridge function if we support Intel SGX
	if (supports_sgx()) rv = ew_unlock(mbpassphrase);
	else rv= vault.unlock(mbpassphrase);

	SecureZeroMemory(mbpassphrase, size);
	delete[] mbpassphrase;

	return rv;
}

Our goal will be to put as little enclave awareness into this class as is feasible. The only other additions the PasswordManagerCoreNative class needs is a flag for Intel SGX support and methods to both set and get it.

class PASSWORDMANAGERCORE_API PasswordManagerCoreNative
{
	int _supports_sgx;

	// Other class members omitted for clarity

protected:
	void set_sgx_support(void) { _supports_sgx = 1; }
	int supports_sgx(void) { return _supports_sgx; }
};

Designing the Enclave

Now that we have an overall application plan in place, it’s time to start designing the enclave and its interface. To do that, we return to the class diagram for the application core in Figure 2, which was first introduced in Part 3. The objects that will reside in the enclave are shaded in green while the untrusted components are shaded in blue.


Figure 2. Class diagram for the Tutorial Password Manager with Intel® Software Guard Extensions.

The enclave boundary only crosses one connection: the link between the PasswordManagerCoreNative object and the Vault object. That suggests that the majority of our ECALLs will simply be wrappers around the class methods in Vault. We’ll also need to add some additional ECALLs to manage the enclave infrastructure. One of the complications of enclave development is that the ECALLs, OCALLs, and bridge functions must be native C code, and we are making extensive use of C++ features. Once the enclave has been launched, we’ll also need functions that span the gap between C and C++ (objects, constructors, overloads, and others).

The wrapper and bridge functions will go in their own DLL, which we’ll name EnclaveBridge.dll. For clarity, we’ll prefix the wrapper functions with ew_ (for “enclave wrapper”), and the bridge functions that make the ECALLs with ve_ (for “vault enclave”).

Calls from PasswordManagerCoreNative to the corresponding method in Vault will follow the basic flow shown in Figure 3.


Figure 3. Execution flow for bridge functions and ECALLs.

The method in PasswordManagerCoreNative will call into the wrapper function in EnclaveBridge.dll. That wrapper will, in turn, invoke one or more ECALLs, which enter the enclave and invoke the corresponding class method in the Vault object. Once all ECALLs have completed, the wrapper function returns to the calling method in PasswordManagerCoreNative and provides it with a return value.

Enclave Logistics

The first step in designing the enclave is working out a system for managing the enclave itself. The enclave must be launched and the resulting enclave ID must be provided to the ECALLs. Ideally, this should be transparent to the upper layers of the application.

The easiest solution for the Tutorial Password Manager is to use global variables in the EnclaveBridge DLL to hold the enclave information. This design decision comes with a restriction: only one thread can be active in the enclave at a time. This is a reasonable solution because the password manager application would not benefit from having multiple threads operating on the vault. Most of its actions are driven by the user interface and do not consume a significant amount of CPU time.

To solve the transparency problem, each wrapper function will first call a function to check to see if the enclave has been launched, and launch it if it hasn’t. This logic is fairly simple:

#define ENCLAVE_FILE _T("Enclave.signed.dll")

static sgx_enclave_id_t enclaveId = 0;
static sgx_launch_token_t launch_token = { 0 };
static int updated= 0;
static int launched = 0;
static sgx_status_t sgx_status= SGX_SUCCESS;

// Ensure the enclave has been created/launched.

static int get_enclave(sgx_enclave_id_t *eid)
{
	if (launched) return 1;
	else return create_enclave(eid);
}

static int create_enclave(sgx_enclave_id_t *eid)
{
	sgx_status = sgx_create_enclave(ENCLAVE_FILE, SGX_DEBUG_FLAG, &launch_token, &updated, &enclaveId, NULL);
	if (sgx_status == SGX_SUCCESS) {
		if ( eid != NULL ) *eid = enclaveId;
		launched = 1;
		return 1;
	}

	return 0;
}

Each wrapper function will start by calling get_enclave(), which checks whether the enclave has been launched by examining a static variable. If it has, then it (optionally) populates the eid pointer with the enclave ID. This step is optional because the enclave ID is also stored as a global variable, enclaveId, which can of course just be used directly.

What happens if an enclave is lost due to a power event or a bug that causes it to crash? For that, we check the return value of the ECALL: it indicates the success or failure of the ECALL operation itself, not of the function being called in the enclave.

sgx_status = ve_initialize(enclaveId, &vault_rv);

The return value of the function being called in the enclave, if any, is transferred via the pointer which is provided as the second argument to the ECALL (these function prototypes are generated for you automatically by the Edger8r tool). You must always check the return value of the ECALL itself. Any result other than SGX_SUCCESS indicates that the program did not successfully enter the enclave and the requested function did not run. (Note that we’ve defined sgx_status as a global variable as well. This is another simplification stemming from our single-threaded design.)

We’ll add a function that examines the error returned by the ECALL and checks for a lost or crashed enclave:

static int lost_enclave()
{
	if (sgx_status == SGX_ERROR_ENCLAVE_LOST || sgx_status == SGX_ERROR_ENCLAVE_CRASHED) {
		launched = 0;
		return 1;
	}

	return 0;
}

These are recoverable errors. The upper layers don’t currently have logic to deal with these specific conditions, but we provide it in the EnclaveBridge DLL in order to support future enhancements.

Also notice that there is no function provided to destroy the enclave. As long as the user has the password manager application open, the enclave is in place even if they choose to lock their vault. This is not good enclave etiquette. Enclaves draw from a finite pool of resources, even when idle. We’ll address this problem in a future segment of the series when we talk about data sealing.

The Enclave Definition Language

Before moving on to the actual enclave design, we’ll take a few moments to discuss the Enclave Definition Language (EDL) syntax. An enclave’s bridge functions, both its ECALLs and OCALLs, are prototyped in its EDL file and its general structure is as follows:

enclave {
	// Include files

	// Import other edl files

	// Data structure declarations to be used as parameters of the function prototypes in edl

	trusted {
	// Include file if any. It will be inserted in the trusted header file (enclave_t.h)

	// Trusted function prototypes (ECALLs)

	};

	untrusted {
	// Include file if any. It will be inserted in the untrusted header file (enclave_u.h)

	// Untrusted function prototypes (OCALLs)

	};
};

ECALLs are prototyped in the trusted section, and OCALLs are prototyped in the untrusted section.

The EDL syntax is C-like and function prototypes very closely resemble C function prototypes, but it’s not identical. In particular, bridge function parameters and return values are limited to some fundamental data types and the EDL includes some additional keywords and syntax that defines some enclave behavior. The Intel® Software Guard Extensions (Intel® SGX) SDK User’s Guide explains the EDL syntax in great detail and includes a tutorial for creating a sample enclave. Rather than repeat all of that here, we’ll just discuss those elements of the language that are specific to our application.

When parameters are passed to enclave functions, they are marshaled into the protected memory space of the enclave. For parameters passed as values, no special action is required as the values are placed on the protected stack in the enclave just as they would be for any other function call. The situation is quite different for pointers, however.

For parameters passed as pointers, the data referenced by the pointer must be marshaled into and out of the enclave. The edge routines that perform this data marshalling need to know two things:

  1. Which direction should the data be copied: into the bridge function, out of the bridge function, or both directions?
  2. What is the size of the data buffer referenced by the pointer?

Pointer Direction

When providing a pointer parameter to a function, you must specify the direction with one of the bracketed keywords [in], [out], or [in, out]. Their meanings are given in Table 1.

| Direction | ECALL | OCALL |
|---|---|---|
| in | The buffer is copied from the application into the enclave. Changes will only affect the buffer inside the enclave. | The buffer is copied from the enclave to the application. Changes will only affect the buffer outside the enclave. |
| out | A buffer will be allocated inside the enclave and initialized with zeros. It will be copied to the original buffer when the ECALL exits. | A buffer will be allocated outside the enclave and initialized with zeros. This untrusted buffer will be copied to the original buffer in the enclave when the OCALL exits. |
| in, out | Data is copied back and forth. | Same as ECALLs. |

Table 1. Pointer direction parameters and their meanings in ECALLs and OCALLs.

Note from the table that the direction is relative to the bridge function being called. For an ECALL, [in] means “copy the buffer to the enclave,” but for an OCALL it’s “copy the buffer to the untrusted function.”

(There is also the option called user_check that can be used in place of these, but it’s not relevant to our discussion. See the SDK documentation for information on its use and purpose.)

Buffer Size

The edge routines calculate the total buffer size, in bytes, as:

bytes = element_size * element_count

By default, the edge routines assume element_count = 1, and calculate element_size from the element referenced by the pointer parameter, e.g., for an integer pointer it assumes element_size is:

sizeof(int)

For a single element of a fixed data type, such as an int or a float, no additional information needs to be provided in the EDL prototype for the function. For a void pointer, you must specify an element size or you’ll get an error at compile time. For arrays, char and wchar_t strings, and other types where the length of the data buffer is more than one element you must specify the number of elements in the buffer or only one element will be copied.

Add either the count or size parameter (or both) to the bracketed keywords for the pointer as appropriate. They can be set to a constant value or one of the parameters to the function. For most cases, count and size are functionally the same, but it’s good practice to use them in their correct contexts. Strictly speaking, you would only specify size when passing a void pointer. Everything else would use count.

If you are passing a C string or wstring (a NULL-terminated char or wchar_t array), then you can use the string or wstring parameter in place of count or size. In this case, the edge routines will determine the size of the buffer by getting the length of the string directly.

function([in, size=12] void *param);
function([in, count=len] char *buffer, uint32_t len);
function([in, string] char *cstr);

Note that you can only use string or wstring if the direction is set to [in] or [in, out]. When the direction is set only to [out], the string has not yet been created so the edge routine can’t know the size of the buffer. Specifying [out, string] will generate an error at compile time.

Wrapper and Bridge Functions

We are now ready to define our wrapper and bridge functions. As we pointed out above, the majority of our ECALLs will be wrappers around the class methods in Vault. The class definition for the public member functions is shown below:

class PASSWORDMANAGERCORE_API Vault
{
	// Non-public methods and members omitted for brevity

public:
	Vault();
	~Vault();

	int initialize();
	int initialize(const char *header, UINT16 size);
	int load_vault(const char *edata);

	int get_header(unsigned char *header, UINT16 *size);
	int get_vault(unsigned char *edata, UINT32 *size);

	UINT32 get_db_size();

	void lock();
	int unlock(const char *password);

	int set_master_password(const char *password);
	int change_master_password(const char *oldpass, const char *newpass);

	int accounts_get_count(UINT32 *count);
	int accounts_get_info(UINT32 idx, char *mbname, UINT16 *mbname_len, char *mblogin, UINT16 *mblogin_len, char *mburl, UINT16 *mburl_len);

	int accounts_get_password(UINT32 idx, char **mbpass, UINT16 *mbpass_len);

	int accounts_set_info(UINT32 idx, const char *mbname, UINT16 mbname_len, const char *mblogin, UINT16 mblogin_len, const char *mburl, UINT16 mburl_len);
	int accounts_set_password(UINT32 idx, const char *mbpass, UINT16 mbpass_len);

	int accounts_generate_password(UINT16 length, UINT16 pwflags, char *cpass);

	int is_valid() { return _VST_IS_VALID(state); }
	int is_locked() { return ((state&_VST_LOCKED) == _VST_LOCKED) ? 1 : 0; }
};

There are several problem functions in this class. Some of them are immediately obvious, such as the constructor, destructor, and the overloads for initialize(). These are C++ features that we must invoke using C functions. Some of the problems, though, are not immediately obvious because they stem from the function’s inherent design. (Some of these problem methods were poorly designed on purpose so that we could cover specific issues in this tutorial, but some were just poorly designed, period!) We’ll tackle each problem, one by one, presenting both the prototypes for the wrapper functions and the EDL prototypes for the proxy/bridge routines.

The Constructor and Destructor

In the non-Intel SGX code path, the Vault class is a member of PasswordManagerCoreNative. We can’t do this for the Intel SGX code path; however, the enclave can include C++ code so long as the bridge functions themselves are pure C functions.

Since we have already limited the enclave to a single thread, we can make the Vault class a static, global object in the enclave. This greatly simplifies our code and eliminates the need for creating bridge functions and logic to instantiate it.

The Overload on initialize()

There are two prototypes for the initialize() method:

  1. The method with no arguments initializes the Vault object for a new password vault with no contents. This is a password vault that the user is creating for the first time.
  2. The method with two arguments initializes the Vault object from the header of the vault file. This represents an existing password vault that the user is opening (and, later on, attempting to unlock).

This will be broken up into two wrapper functions:

ENCLAVEBRIDGE_API int ew_initialize();
ENCLAVEBRIDGE_API int ew_initialize_from_header(const char *header, uint16_t hsize);

And the corresponding ECALLs will be defined as:

public int ve_initialize ();
public int ve_initialize_from_header ([in, count=len] unsigned char *header, uint16_t len);

get_header()

This method has a fundamental design issue. Here’s the prototype:

int get_header(unsigned char *header, uint16_t *size);

This function accomplishes one of two tasks:

  1. It gets the header block for the vault file and places it in the buffer pointed to by header. The caller must allocate enough memory to store this data.
  2. If you pass a NULL pointer in the header parameter, the uint16_t pointed to by size is set to the size of the header block, so that the caller knows how much memory to allocate.

This is a fairly common compaction technique in some programming circles, but it presents a problem for enclaves: when you pass a pointer to an ECALL or an OCALL, the edge functions copy the data referenced by the pointer into or out of the enclave (or both). Those edge functions need to know the size of the data buffer so they know how many bytes to copy. The first usage involves a valid pointer with a variable size which is not a problem, but the second usage has a NULL pointer and a size of zero.

We could probably come up with an EDL prototype for the ECALL that could make this work, but clarity should generally trump brevity. It’s better to split this into two ECALLs:

public int ve_get_header_size ([out] uint16_t *sz);
public int ve_get_header ([out, count=len] unsigned char *header, uint16_t len);

The enclave wrapper function will take care of the necessary logic so that we don’t have to make changes to other classes:

ENCLAVEBRIDGE_API int ew_get_header(unsigned char *header, uint16_t *size)
{
	int vault_rv;

	if (!get_enclave(NULL)) return NL_STATUS_SGXERROR;

	if ( header == NULL ) sgx_status = ve_get_header_size(enclaveId, &vault_rv, size);
	else sgx_status = ve_get_header(enclaveId, &vault_rv, header, *size);

	RETURN_SGXERROR_OR(vault_rv);
}
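For illustration, here is how a caller drives the two-step size-then-fetch protocol that ew_get_header() preserves. The stub below is purely hypothetical; it stands in for the real bridge DLL export (and the header bytes are made up), but it follows the same NULL-pointer convention described above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the real ew_get_header() bridge export. With a
// NULL header pointer it reports the header size; otherwise it copies the
// header into the caller's buffer. The header contents are invented here.
static const unsigned char g_header[] = { 'V', 'L', 'T', 1, 0, 0, 0, 9 };

int ew_get_header(unsigned char *header, uint16_t *size)
{
    if (header == NULL) {
        *size = (uint16_t) sizeof(g_header);     // first call: report the size
        return 0;
    }
    if (*size < sizeof(g_header)) return -1;     // caller's buffer is too small
    memcpy(header, g_header, sizeof(g_header));  // second call: copy the data
    return 0;
}

// Caller-side pattern: query the size, allocate, then fetch.
std::vector<unsigned char> fetch_header()
{
    uint16_t size = 0;
    if (ew_get_header(NULL, &size) != 0) return std::vector<unsigned char>();
    std::vector<unsigned char> header(size);
    if (ew_get_header(header.data(), &size) != 0) header.clear();
    return header;
}
```

The caller allocates the buffer in both the old and new designs; the wrapper merely hides the fact that the enclave now exposes the size query and the data copy as two distinct ECALLs.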

accounts_get_info()

This method operates similarly to get_header(): pass a NULL pointer and it returns the size of the object in the corresponding parameter. However, it is uglier and sloppier because of its multiple buffer parameters. It is better off being broken up into two wrapper functions:

ENCLAVEBRIDGE_API int ew_accounts_get_info_sizes(uint32_t idx, uint16_t *mbname_sz, uint16_t *mblogin_sz, uint16_t *mburl_sz);
ENCLAVEBRIDGE_API int ew_accounts_get_info(uint32_t idx, char *mbname, uint16_t mbname_sz, char *mblogin, uint16_t mblogin_sz, char *mburl, uint16_t mburl_sz);

And two corresponding ECALLs:

public int ve_accounts_get_info_sizes (uint32_t idx, [out] uint16_t *mbname_sz, [out] uint16_t *mblogin_sz, [out] uint16_t *mburl_sz);
public int ve_accounts_get_info (uint32_t idx,
	[out, count=mbname_sz] char *mbname, uint16_t mbname_sz,
	[out, count=mblogin_sz] char *mblogin, uint16_t mblogin_sz,
	[out, count=mburl_sz] char *mburl, uint16_t mburl_sz
);

accounts_get_password()

This is the worst offender of the lot. Here’s the prototype:

int accounts_get_password(UINT32 idx, char **mbpass, UINT16 *mbpass_len);

The first thing you’ll notice is that it passes a pointer to a pointer in mbpass. This method is allocating memory.

In general, this is not a good design. No other method in the Vault class allocates memory so it is internally inconsistent, and the API violates convention by not providing a method to free this memory on the caller’s behalf. It also poses a unique problem for enclaves: an enclave cannot allocate memory in untrusted space.

This could be handled in the wrapper function. It could allocate the memory and then make the ECALL and it would all be transparent to the caller, but we have to modify the method in the Vault class, regardless, so we should just fix this the correct way and make the corresponding changes to PasswordManagerCoreNative. The caller should be given two functions: one to get the password length and one to fetch the password, just as with the previous two examples. PasswordManagerCoreNative should be responsible for allocating the memory, not any of these functions (the non-Intel SGX code path should be changed, too).

ENCLAVEBRIDGE_API int ew_accounts_get_password_size(uint32_t idx, uint16_t *len);
ENCLAVEBRIDGE_API int ew_accounts_get_password(uint32_t idx, char *mbpass, uint16_t len);

The EDL definition should look familiar by now:

public int ve_accounts_get_password_size (uint32_t idx, [out] uint16_t *mbpass_sz);
public int ve_accounts_get_password (uint32_t idx, [out, count=mbpass_sz] char *mbpass, uint16_t mbpass_sz);

load_vault()

The problem with load_vault() is subtle. The prototype is fairly simple, and at first glance it may look completely innocuous:

int load_vault(const char *edata);

What this method does is load the encrypted, serialized password database into the Vault object. Because the Vault object has already read the header, it knows how large the incoming buffer will be.

The issue here is that the enclave’s edge functions don’t have this information. A length has to be explicitly given to the ECALL so that the edge function knows how many bytes to copy from the incoming buffer into the enclave’s internal buffer, but the size is stored inside the enclave. It’s not available to the edge function.

The wrapper function’s prototype can mirror the class method’s prototype, as follows:

ENCLAVEBRIDGE_API int ew_load_vault(const unsigned char *edata);

The ECALL, however, needs to pass the header size as a parameter so that it can be used to define the size of the incoming data buffer in the EDL file:

public int ve_load_vault ([in, count=len] unsigned char *edata, uint32_t len);

To keep this transparent to the caller, the wrapper function will be given extra logic. It will be responsible for fetching the vault size from the enclave and then passing it through as a parameter to this ECALL.

ENCLAVEBRIDGE_API int ew_load_vault(const unsigned char *edata)
{
	int vault_rv;
	uint32_t dbsize;

	if (!get_enclave(NULL)) return NL_STATUS_SGXERROR;

	// We need to get the size of the password database before entering the enclave
	// to send the encrypted blob.

	sgx_status = ve_get_db_size(enclaveId, &dbsize);
	if (sgx_status == SGX_SUCCESS) {
		// Now we can send the encrypted vault data across.

		sgx_status = ve_load_vault(enclaveId, &vault_rv, (unsigned char *) edata, dbsize);
	}

	RETURN_SGXERROR_OR(vault_rv);
}

A Few Words on Unicode

In Part 3, we mentioned that the PasswordManagerCoreNative class is also tasked with converting between wchar_t and char strings. Given that enclaves support the wchar_t data type, why do this at all?

This is a design decision intended to minimize our footprint. In Windows, the wchar_t data type is the native encoding for Win32 APIs and it stores UTF-16 encoded characters. UTF-16 uses 16-bit code units in order to support non-ASCII characters, particularly for languages that aren't based on the Latin alphabet or have a large number of characters. The problem with UTF-16 is that every character occupies at least 16 bits (characters outside the Basic Multilingual Plane take two 16-bit code units), even when encoding plain ASCII text.

Rather than store twice as much data both on disk and inside the enclave for the common case where the user’s account information is in plain ASCII and incur the performance penalty of having to copy and encrypt those extra bytes, the Tutorial Password Manager converts all of the strings coming from .NET to the UTF-8 encoding. UTF-8 is a variable-length encoding, where each character is represented by one to four 8-bit bytes. It is backwards-compatible with ASCII and it results in a much more compact encoding than UTF-16 for plain ASCII text. There are cases where UTF-8 will result in longer strings than UTF-16, but for our tutorial password manager we’ll accept that tradeoff.

A commercial application would choose the best encoding for the user’s native language, and then record that encoding in the vault (so that it would know which encoding was used to create it in case the vault is opened on a system using a different native language).
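To make the size tradeoff concrete, here is a minimal, self-contained UTF-16 to UTF-8 encoder. This is an illustration only, not the Tutorial Password Manager's conversion code; on Windows this conversion would typically be done with the WideCharToMultiByte() API rather than by hand.

```cpp
#include <cstdint>
#include <string>

// Minimal UTF-16 to UTF-8 encoder, for illustration only. Handles surrogate
// pairs; validation of malformed input is omitted for brevity.
std::string utf16_to_utf8(const std::u16string &in)
{
    std::string out;
    for (size_t i = 0; i < in.size(); ++i) {
        uint32_t cp = in[i];
        // Combine a surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {                        // 1 byte: plain ASCII
            out += (char) cp;
        } else if (cp < 0x800) {                // 2 bytes
            out += (char) (0xC0 | (cp >> 6));
            out += (char) (0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {              // 3 bytes
            out += (char) (0xE0 | (cp >> 12));
            out += (char) (0x80 | ((cp >> 6) & 0x3F));
            out += (char) (0x80 | (cp & 0x3F));
        } else {                                // 4 bytes
            out += (char) (0xF0 | (cp >> 18));
            out += (char) (0x80 | ((cp >> 12) & 0x3F));
            out += (char) (0x80 | ((cp >> 6) & 0x3F));
            out += (char) (0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For the ASCII string "password", the UTF-16 form occupies 16 bytes while the UTF-8 form occupies 8, which is exactly the savings described above.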

Sample Code

As mentioned in the introduction, there is sample code provided with this part for you to download. The attached archive includes the source code for the Tutorial Password Manager bridge DLL and the enclave DLL. The enclave functions are just stubs at this point, and they will be filled out in Part 5.

Coming Up Next

In Part 5 of the tutorial we’ll complete the enclave by porting the Crypto, DRNG, and Vault classes to the enclave, and connecting them to the ECALLs. Stay tuned!

Happy Together: Ground-Breaking Media Performance with Intel® Processors + Software - Oct. 27 Free Webinar


Now you can get the sweetest, fastest, highest-density and highest-quality results for media workloads and video streaming, with the latest Intel hardware and media software working together. Take advantage of these platforms and learn how to access hardware-accelerated codecs on Intel® Xeon® E3-1500 v5 and 6th generation Intel® Core™ processors (codenamed Skylake) in a free webinar on Oct. 27 at 9 a.m. (Pacific).

  • Optimize media solutions and apps for HEVC, AVC and MPEG-2 using Intel® Media Server Studio or Intel® Media SDK
  • Achieve up to real-time 4K@60fps HEVC, or up to 18 AVC HD@30fps transcoding sessions on one platform**
  • Access the big performance boosts possible with Intel graphics processors (GPUs)
  • Get the skinny on shortcuts to fast-track results

Sign Up Today: Oct. 27 Free Webinar, Happy Together: Ground-Breaking Media Performance with Intel® Processors + Software

Technical specifications apply. See performance benchmarks and the Media Server Studio site for more details. 

 

Webinar Speaker: Jeff McAllister, Media Software Technical Consulting Engineer

 

Advanced Bitrate Control Methods in Intel® Media SDK


Introduction

In the world of media there is a great demand to increase encoder quality, but this comes with tradeoffs between quality and bandwidth consumption. This article addresses some of those concerns by discussing advanced bitrate control methods, which provide the ability to increase quality (relative to legacy rate controls) while holding the bitrate constant, using Intel® Media SDK / Intel® Media Server Studio tools.

The Intel Media SDK encoder offers many bitrate control methods, which can be divided into legacy and advanced/special-purpose algorithms. This article is the second part of a two-part series on bitrate control methods in Intel® Media SDK. The legacy rate control algorithms are detailed in the first part, Bitrate Control Methods (BRC) in Intel® Media SDK; the advanced rate control methods (summarized in the table below) are explained in this article.

| Rate Control | HRD/VBV Compliant | OS Supported   | Usage                                                                            |
|--------------|-------------------|----------------|----------------------------------------------------------------------------------|
| LA           | No                | Windows/Linux  | Storage transcodes                                                               |
| LA_HRD       | Yes               | Windows/Linux  | Storage transcodes; streaming solutions (where low latency is not a requirement) |
| ICQ          | No                | Windows        | Storage transcodes (better quality with smaller file size)                       |
| LA_ICQ       | No                | Windows        | Storage transcodes                                                               |

The following tools were used to explain the concepts and generate the performance data for this article:

  • Intel® Media SDK code samples (sample_encode and sample_multi_transcode)
  • Intel® Video Pro Analyzer
  • Video Quality Caliper

Look Ahead (LA) Rate Control

As the name explains, this bitrate control method looks at successive frames, or the frames to be encoded next, and stores them in a look-ahead buffer. The number of frames or the length of the look ahead buffer can be specified by the LookAheadDepth parameter. This rate control is recommended for transcoding/encoding in a storage solution.

Generally, many parameters can be used to modify the quality/performance of the encoded stream. In this particular rate control, encoding performance can be controlled by changing the size of the look ahead buffer. The LookAheadDepth parameter, which can be set to a value from 10 to 100, specifies the number of frames that the SDK encoder analyzes before encoding. As LookAheadDepth increases, so does the number of frames the encoder looks into; this improves the quality of the encoded stream, but the performance (encoding frames per second) decreases. In our experiments, this performance tradeoff was negligible for small input streams such as Sintel 1080p.

Look Ahead rate control is enabled by default in sample_encode and sample_multi_transcode, part of code samples. The example below describes how to use this rate control method using the sample_encode application.

sample_encode.exe h264 -i sintel_1080p.yuv -o LA_out.264 -w 1920 -h 1080 -b 10000 -f 30 -lad 100 -la

As the value of LookAheadDepth increases, encoding quality improves, because the number of frames stored in the look ahead buffer has also increased, and the encoder will have more visibility to upcoming frames.

It should be noted that LA is not HRD (Hypothetical Reference Decoder) compliant. The following picture, obtained from Intel® Video Pro Analyzer, shows an HRD buffer fullness view with "Buffer" mode enabled, where the sub-mode "HRD" is greyed out. This means no HRD parameters were passed in the stream headers, which indicates that LA rate control is not HRD compliant. The left axis of the plot shows frame sizes and the right axis shows the slice QP (quantization parameter) values.

Figure 1: Snapshot of Intel® Video Pro Analyzer analyzing an H.264 stream (Sintel, 1080p) encoded using the LA rate control method.

 

Sliding Window Condition

Sliding window algorithm is a part of the Look Ahead rate control method. This algorithm is applicable for both LA and LA_HRD rate control methods by defining WinBRCMaxAvgKbps and WinBRCSize through the mfxExtCodingOption3 structure.

Sliding window condition is introduced to strictly constrain the maximum bitrate of the encoder by changing two parameters: WinBRCSize and WinBRCMaxAvgKbps. This helps in limiting the achieved bitrate which makes it a good fit in limited bandwidth scenarios such as live streaming.

  • WinBRCSize parameter specifies the sliding window size in frames. A setting of zero means that sliding window condition is disabled.
  • WinBRCMaxAvgKbps specifies the maximum bitrate averaged over a sliding window specified by WinBRCSize.

In this technique, the average bitrate within any sliding window of WinBRCSize frames must not exceed WinBRCMaxAvgKbps. The condition becomes weaker as the sliding window size increases and stronger as it decreases. Whenever the condition fails, the frame is automatically re-encoded with a higher quantization parameter, and encoder performance drops with each failure. To reduce the number of failures and avoid re-encoding, the encoder analyzes the frames in the look ahead buffer: a peak is predicted when a large frame in the buffer would cause the condition to fail, and the quantization parameter is raised ahead of time to reduce the frame size.

Sliding window can be implemented by adding the following code to the pipeline_encode.cpp program in the sample_encode application.

m_CodingOption3.WinBRCMaxAvgKbps = 1.5*TargetKbps;
m_CodingOption3.WinBRCSize = 90; //3*framerate
m_EncExtParams.push_back((mfxExtBuffer *)&m_CodingOption3);

The above values were chosen when encoding sintel_1080p.yuv of 1253 frames with H.264 codec, TargetKbps = 10000, framerate = 30fps. Sliding window parameter values (WinBRCMaxAvgKbps and WinBRCSize) are subject to change when using different input options.

If WinBRCMaxAvgKbps is close to TargetKbps and WinBRCSize almost equals 1, the sliding window will degenerate into the limitation of the maximum frame size (TargetKbps/framerate).

The sliding window condition can be evaluated by checking that, in any WinBRCSize consecutive frames, the total encoded size does not exceed the budget set by WinBRCMaxAvgKbps. With frame sizes expressed in kilobits, the condition for every window starting at frame k is:

FrameSize(k) + FrameSize(k+1) + ... + FrameSize(k + WinBRCSize - 1) <= WinBRCMaxAvgKbps * WinBRCSize / framerate

The condition of limiting frame size can be checked after the asynchronous encoder run and encoded data is written back to the output file in pipeline_encode.cpp.
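As a sketch of that check, the following self-contained routine (synthetic frame sizes, not the actual Media SDK structures) evaluates the condition over every window of WinBRCSize consecutive frames using a running window sum:

```cpp
#include <cstddef>
#include <vector>

// Verify the sliding window condition over a list of encoded frame sizes.
// frame_kbits: size of each encoded frame, in kilobits.
// Returns true if every window of win_size consecutive frames stays within
// the average-bitrate budget max_avg_kbps at the given frame rate.
bool sliding_window_ok(const std::vector<double> &frame_kbits,
                       size_t win_size, double max_avg_kbps, double fps)
{
    if (frame_kbits.size() < win_size) return true;

    // Kilobits allowed per window: rate times the window's duration in seconds.
    double budget = max_avg_kbps * win_size / fps;

    double window = 0.0;
    for (size_t i = 0; i < frame_kbits.size(); ++i) {
        window += frame_kbits[i];
        if (i >= win_size) window -= frame_kbits[i - win_size]; // slide forward
        if (i >= win_size - 1 && window > budget) return false; // budget blown
    }
    return true;
}
```

With WinBRCSize = 90 and a frame rate of 30 fps, a WinBRCMaxAvgKbps of 15000 allows 45000 kilobits per window; a single oversized frame inside any window is enough to violate the condition.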

Look Ahead with HRD Compliance (LA_HRD) Rate Control

As Look Ahead bitrate control is not HRD compliant, there is a dedicated mode to achieve HRD compliance with the LookAhead algorithm, known as LA_HRD mode (MFX_RATECONTROL_LA_HRD). With HRD compliance, the Coded Picture Buffer should neither overflow nor underflow. This rate control is recommended in storage transcoding solutions and streaming scenarios, where low latency is not a major requirement.

To use this rate control in sample_encode, it will require code changes as illustrated below -

Statements to be added in sample_encode.cpp file within ParseInputString() function

else if (0 == msdk_strcmp(strInput[i], MSDK_STRING("-hrd")))
pParams->nRateControlMethod = MFX_RATECONTROL_LA_HRD;

LookAheadDepth value can be mentioned in the command line when executing the sample_encode binary. The example below describes how to use this rate control method using the sample_encode application.

sample_encode.exe h264 -i sintel_1080p.yuv -o LA_out.264 -w 1920 -h 1080 -b 10000 -f 30 -lad 100 -hrd

In the following graph, the LookAheadDepth (lad) value is 100.

Figure 2: A snapshot of Intel® Video Pro Analyzer (VPA) verifying that LA_HRD rate control is HRD compliant. The buffer fullness view is activated by selecting "Buffer" mode, with "HRD" chosen as the sub-mode.

The above figure shows HRD buffer fullness view with “Buffer” mode enabled in Intel VPA, in which the sub-mode “HRD” is selected. The horizontal red lines show the upper and lower limits of the buffer and green line shows the instantaneous buffer fullness. The buffer fullness didn’t cross the upper and lower limits of the buffer. This means neither overflow nor underflow occurred in this rate control.

Extended Look Ahead (LA_EXT) Rate Control

For 1:N transcoding scenarios (one decode and N encode sessions), there is an optimized look ahead algorithm known as the Extended Look Ahead rate control algorithm (MFX_RATECONTROL_LA_EXT), available only in Intel® Media Server Studio (not part of the Intel® Media SDK). It is recommended for broadcasting solutions.

An application should be able to load the plugin ‘mfxplugin64_h264la_hw.dll’ to support MFX_RATECONTROL_LA_EXT. This plugin can be found in the following location in the local system, where the Intel® Media Server Studio is installed.

  • “\Program Installed\Software Development Kit\bin\x64\588f1185d47b42968dea377bb5d0dcb4”.

The path of this plugin needs to be mentioned explicitly because it is not part of the standard installation directory. This capability can be used in either of two ways:

  1. Preferred Method - Register the plugin with registry and point all necessary attributes such as API version, plugin type, path etc; so the dispatcher, which is a part of the software, can find it through the registry and connect to a decoding/encoding session.
  2. Have all binaries (Media SDK, plugin, and app) in a directory and execute from the same directory.

The LookAheadDepth parameter is specified only once, and the same value applies to all N transcoded streams. LA_EXT rate control can be implemented using sample_multi_transcode; below is an example command line:

sample_multi_transcode.exe -par file_1.par

Contents of the par file are

-lad 40 -i::h264 input.264 -join -la_ext -hw_d3d11 -async 1 -n 300 -o::sink
-h 1088 -w 1920 -o::h264 output_1.0.h264 -b 3000 -join -async 1 -hw_d3d11 -i::source -l 1 -u 1 -n 300
-h 1088 -w 1920 -o::h264 output_2.h264 -b 5000 -join -async 1 -hw_d3d11 -i::source -l 1 -u 1 -n 300
-h 1088 -w 1920 -o::h264 output_3.h264 -b 7000 -join -async 1 -hw_d3d11 -i::source -l 1 -u 1 -n 300
-h 1088 -w 1920 -o::h264 output_4.h264 -b 10000 -join -async 1 -hw_d3d11 -i::source -l 1 -u 1 -n 300

Intelligent Constant Quality (ICQ) Rate Control

The ICQ bitrate control algorithm is designed to improve the subjective video quality of an encoded stream: it may or may not improve video quality objectively, depending on the content. ICQQuality is the control parameter that defines the quality factor for this method; it can be set to a value from 1 to 51, where 1 corresponds to the best quality. The achieved bitrate and encoder quality (PSNR) can be adjusted by increasing or decreasing ICQQuality. This rate control is recommended for storage solutions where high quality is required while maintaining a smaller file size.

To use this rate control in sample_encode, it will require code changes as explained below - 

Statements to be added in sample_encode.cpp within ParseInputString() function

else if (0 == msdk_strcmp(strInput[i], MSDK_STRING("-icq")))
pParams->nRateControlMethod = MFX_RATECONTROL_ICQ;

ICQQuality is available in the mfxInfoMFX structure. The desired value can be entered for this variable in InitMfxEncParams() function, e.g.: 

m_mfxEncParams.mfx.ICQQuality = 12;

The example below describes how to use this rate control method using the sample_encode application.

sample_encode.exe h264 -i sintel_1080p.yuv -o ICQ_out.264 -w 1920 -h 1080 -b 10000 -icq
Figure 3: Using Intel Media SDK samples and the Video Quality Caliper, comparing VBR and ICQ (ICQQuality varied between 13 and 18) with H.264 encoding for 1080p, 30fps sintel.yuv of 1253 frames.

At about the same bitrate, ICQ shows improved Peak Signal-to-Noise Ratio (PSNR) in the above plot. The RD-graph data for the plot was captured using the Video Quality Caliper, which compares two different streams encoded with ICQ and VBR.

Observations from the above performance data:

  • At the same achieved bitrate, ICQ shows much improved quality (PSNR) compared to VBR, while maintaining the same encoding FPS.
  • The encoding bitrate and quality of the stream decrease as the ICQQuality parameter value increases.

The snapshot below shows a subjective comparison between encoded frames using VBR (on the left) and ICQ (on the right). Highlighted sections demonstrate missing details in VBR and improvements in ICQ.

Figure 4: Using the Video Quality Caliper, comparing encoded frames subjectively for VBR vs. ICQ.

 

Look Ahead & Intelligent Constant Quality (LA_ICQ) Rate Control

This method is the combination of ICQ with Look Ahead. This rate control is also recommended for storage solutions. ICQQuality and LookAheadDepth are the two control parameters: the quality factor is specified by mfxInfoMFX::ICQQuality, and the look ahead depth is controlled by the mfxExtCodingOption2::LookAheadDepth parameter.

To use this rate control in sample_encode, it requires code changes as explained below - 

Statements to be added in sample_encode.cpp within ParseInputString() function

else if (0 == msdk_strcmp(strInput[i], MSDK_STRING("-laicq")))
pParams->nRateControlMethod = MFX_RATECONTROL_LA_ICQ;

ICQQuality is available in the mfxInfoMFX structure. Desired values can be entered for this variable in InitMfxEncParams() function

m_mfxEncParams.mfx.ICQQuality = 12;

LookAheadDepth can be mentioned in command line as lad.

sample_encode.exe h264 -i sintel_1080p.yuv -o LAICQ_out.264 -w 1920 -h 1080 -b 10000 -laicq -lad 100
Figure 5: Using Intel Media SDK samples and the Video Quality Caliper, comparing VBR and LA_ICQ (LookAheadDepth 100, ICQQuality varied between 20 and 26) with H.264 encoding for 1080p, 30fps sintel.yuv of 1253 frames.

At a similar bitrate, better PSNR is observed for LA_ICQ compared to VBR, as shown in the above plot. Keeping the LookAheadDepth value at 100, the ICQQuality parameter was varied within its range of 1 to 51. The RD-graph data for this plot was captured using the Video Quality Caliper, which compares two different streams encoded with LA_ICQ and VBR.

Conclusion

There are several advanced bitrate control methods available to experiment with, to see whether higher-quality encoded streams can be achieved while keeping bandwidth requirements constant. Each rate control has its own advantages and suits specific industry use cases. To implement the bitrate control methods, refer also to the Intel® Media SDK Reference Manual, which comes with an installation of the Intel® Media SDK or Intel® Media Server Studio, and the Intel® Media Developer's Guide from the documentation website. Visit Intel's media support forum for further questions.

Driver Support Matrix for Intel® Media SDK and OpenCL™


 

Developers can access Intel's processor graphics GPU capabilities through the Intel® Media SDK and Intel® SDK for OpenCL™ Applications. This article provides more information on how the software, driver, and hardware layers map together.

 

Delivery Models


There are two different packaging/delivery models:

  1. For Windows* Client: all components needed to run applications written with these SDKs are distributed with the Intel graphics driver. These components are intended to be updated on a separate cadence from Media SDK/OpenCL installs. Drivers are released separately, and moving to the latest available driver is usually encouraged. Use the Intel® Driver Update Utility to keep your system up to date with the latest graphics drivers, or manually update from downloadcenter.intel.com. To verify the driver version installed on the machine, use the system analyzer tool.
     
  2. For Linux* and Windows Server*: Intel® Media Server Studio is an integrated software tools suite that includes both SDKs, plus a specific version of the driver validated with each release.

Driver Branches

Driver development uses branches covering specific hardware generations, as described in the table below. The general pattern is that each branch covers only the two latest architectures (N and N-1). This means there are two driver branches for each architecture except the newest one. Intel recommends using the most recent branch. If issues are found it is easier to get fixes for newer branches. The most recent branch has the most resources and gets the most frequent updates. Older branches/architectures get successively fewer resources and updates.

Driver Support Matrix

3rd and 4th Generation Core (Ivybridge, Gen 7 graphics; Haswell, Gen 7.5 graphics): LEGACY ONLY, downloads available but not updated

  • Windows: 15.33. Operating systems: Windows 7, 8, 8.1, 10 (client); Windows Server 2012 R2 (server)
  • Linux: 16.3 (Media Server Studio 2015 R1). Gold operating systems: Ubuntu 12.04, SLES 11.3

4th and 5th Generation Core (Haswell, Gen 7.5 graphics; Broadwell, Gen 8 graphics): LEGACY

  • Windows: 15.36. Operating systems: Windows 7, 8, 8.1, 10 (client); Windows Server 2012 R2 (server)
  • Linux: 16.4 (Media Server Studio 2015/2016). Gold operating systems: CentOS 7.1; generic kernel 3.14.5

5th and 6th Generation Core (Broadwell, Gen 8 graphics; Skylake, Gen 9 graphics): CURRENT RELEASE

  • Windows: 15.40 (Broadwell/Skylake, Media Server Studio 2017); 15.45 (Skylake and forward, client). Operating systems: Windows 7, 8, 8.1, 10 (client); Windows Server 2012 R2 (server)
  • Linux: 16.5 (Media Server Studio 2017). Gold operating systems: CentOS 7.2; generic kernel 4.4.0

Windows client note: Many OEMs have specialized drivers with additional validation. If you see a warning during install please check with your OEM for supported drivers for your machine.

 

Hardware details

 

Ivybridge (IVB): codename for the 3rd generation Intel processor, based on 22nm manufacturing technology and the Gen 7 graphics architecture.

  • GT2: Intel® HD Graphics 2500
  • GT2: Intel® HD Graphics 4000

Haswell (HSW): codename for the 4th generation Intel processor, based on 22nm manufacturing technology and the Gen 7.5 graphics architecture. Available in multiple graphics versions: GT2 (20 execution units), GT3 (40 execution units) and GT3e (40 execution units plus eDRAM to provide a faster secondary cache).

  • GT2: Intel® HD Graphics 4200
  • GT2: Intel® HD Graphics 4400
  • GT2: Intel® HD Graphics 4600
  • GT3: Intel® Iris™ Graphics 5000
  • GT3: Intel® Iris™ Graphics 5100
  • GT3e: Intel® Iris™ Pro Graphics 5200

Broadwell (BDW): codename for the 5th generation Intel processor, based on a 14nm die shrink of the Haswell architecture and the Gen 8 graphics architecture. Available in multiple graphics versions: GT2 (24 execution units), GT3 (48 execution units) and GT3e (48 execution units plus eDRAM to provide a faster secondary cache).

  • GT2: Intel® HD Graphics 5500
  • GT2: Intel® HD Graphics 5600
  • GT2: Intel® HD Graphics 5700
  • GT3: Intel® Iris™ Graphics 6100
  • GT3e: Intel® Iris™ Pro Graphics 6200

Skylake (SKL): codename for the 6th generation Intel processor, based on 14nm manufacturing technology and the Gen 9 graphics architecture. Available in multiple graphics versions: GT1 (12 execution units), GT2 (24 execution units), GT3 (48 execution units), GT3e (48 execution units plus eDRAM) and GT4e (72 execution units plus eDRAM to provide a faster secondary cache).

  • GT1: Intel® HD Graphics 510 (12 EUs)
  • GT2: Intel® HD Graphics 520 (24 EUs, 1050MHz)
  • GT2: Intel® HD Graphics 530 (24 EUs, 1150MHz)
  • GT3e: Intel® Iris™ Graphics 540 (48 EUs, 1050MHz, 64 MB eDRAM)
  • GT3e: Intel® Iris™ Graphics 550 (48 EUs, 1100MHz, 64 MB eDRAM)
  • GT4e: Intel® Iris™ Pro Graphics 580 (72 EUs, 1050MHz, 128 MB eDRAM)
  • GT4e: Intel® Iris™ Pro Graphics P580 (72 EUs, 1100MHz, 128 MB eDRAM)


 

 

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

 

Intel® Software Guard Extensions Tutorial Series: Part 5, Enclave Development


In Part 5 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series, we’ll finish developing the enclave for the Tutorial Password Manager application. In Part 4 of the series, we created a DLL to serve as our interface layer between the enclave bridge functions and the C++/CLI program core, and defined our enclave interface. With those components in place, we can now focus our attention on the enclave itself.

You can find the list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

There is source code provided with this installment of the series: the completed application with its enclave. This version is hardcoded to run the Intel SGX code path.

The Enclave Components

To identify which components need to be implemented within the enclave, we’ll refer to the class diagram for the application core in Figure 1, which was first introduced in Part 3. As before, the objects that will reside in the enclave are shaded in green while the untrusted components are shaded in blue.


Figure 1. Class diagram for the Tutorial Password Manager with Intel® Software Guard Extensions.

From this we can identify four classes that need to be ported:

  • Vault
  • AccountRecord
  • Crypto
  • DRNG

Before we get started, however, we do need to make a design decision. Our application must function on systems both with and without Intel SGX support, and that means we can’t simply convert our existing classes so that they function within the enclave. We must create two versions of each: one intended for use in enclaves, and one for use in untrusted memory. The question is, how should this dual-support be implemented?

Option 1: Conditional Compilation

The first option is to implement both the enclave and untrusted functionality in the same source module and use preprocessor definitions and #ifdef statements to compile the appropriate code based on the context. The advantage of this approach is that we only need one source file for each class, and thus do not have to maintain changes in two places. The disadvantages are that the code can be more difficult to read, particularly if the changes between the two versions are numerous or significant, and the project structure will be more complex. Two of our Visual Studio* projects, Enclave and PasswordManagerCore, will share source files, and each will need to set a preprocessor symbol to ensure that the correct source code is compiled.

Option 2: Separate Classes

The second option is to duplicate each source file that has to go into the enclave. The advantages of this approach are that the enclave has its own copy of the source files which we can modify directly, allowing for a simpler project structure and easier code view. But, these come at a cost: if we need to make changes to the classes, those changes must be made in two places, even if those changes are common to both the enclave and untrusted versions.

Option 3: Inheritance

The third option is to use the C++ feature of class inheritance. The functions common to both versions of the class would be implemented in the base class, and the derived classes would implement the branch-specific methods. The big advantage to this approach is that it is a very natural and elegant solution to the problem, using a feature of the language that is designed to do exactly what we need. The disadvantages are the added complexity required in both the project structure and the code itself.
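As a schematic illustration of Option 3 (these are not the tutorial's actual class definitions, and the method bodies are invented), common logic sits in a base class while each code path supplies its branch-specific behavior in a derived class:

```cpp
#include <string>

// Schematic sketch of Option 3: shared logic lives in the base class, and
// the enclave and untrusted builds each derive from it to implement the
// branch-specific pieces. Names and return values are illustrative only.
class CryptoBase {
public:
    virtual ~CryptoBase() {}

    // Common to both versions; implemented once in the base class.
    std::string algorithm() const { return "AES-128-GCM"; }

    // Branch-specific; each derived class provides its own implementation.
    virtual std::string key_source() const = 0;
};

class CryptoUntrusted : public CryptoBase {
public:
    std::string key_source() const override { return "untrusted memory"; }
};

class CryptoEnclave : public CryptoBase {
public:
    std::string key_source() const override { return "enclave memory"; }
};
```

Only the derived classes differ between the two builds, so a fix to the shared logic is made exactly once, which is the main maintenance advantage this option has over Option 2.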

There is no hard and fast rule here, and the decision does not have to be a global one. A good rule of thumb is that Option 1 is best for modules where the changes are small or easily compartmentalized, and Options 2 and 3 are best when the changes are significant or result in source code that is difficult to read and maintain. Beyond that it comes down to style and preference, and any of these approaches is fine.

For now, we’ll choose Option 2 because it allows for easy side-by-side comparisons of the enclave and untrusted source files. In a future installment of the tutorial series we may switch to Option 3 in order to tighten up the code.

The Enclave Classes

Each class has its own set of issues and challenges when it comes to adapting it to the enclave, but there is one universal truth that will apply to all of them: we no longer have to zero-fill our memory before freeing it. As you recall from Part 3, this was a recommended action when handling secure data in untrusted memory. Because our enclave memory is encrypted by the CPU, using an encryption key that is not available to any hardware layer, the contents of freed memory will contain what appears to be random data to other applications. This means we can remove all calls to SecureZeroMemory that are inside the enclave.

The Vault Class

The Vault class is our interface to the password vault operations. All of our bridge functions act through one or more methods in Vault. Its declaration from Vault.h is shown below.

class PASSWORDMANAGERCORE_API Vault
{
	Crypto crypto;
	char m_pw_salt[8];
	char db_key_nonce[12];
	char db_key_tag[16];
	char db_key_enc[16];
	char db_key_obs[16];
	char db_key_xor[16];
	UINT16 db_version;
	UINT32 db_size; // Use get_db_size() to fetch this value so it gets updated as needed
	char db_data_nonce[12];
	char db_data_tag[16];
	char *db_data;
	UINT32 state;
	// Cache the number of defined accounts so that the GUI doesn't have to fetch
	// "empty" account info unnecessarily.
	UINT32 naccounts;

	AccountRecord accounts[MAX_ACCOUNTS];
	void clear();
	void clear_account_info();
	void update_db_size();

	void get_db_key(char key[16]);
	void set_db_key(const char key[16]);

public:
	Vault();
	~Vault();

	int initialize();
	int initialize(const unsigned char *header, UINT16 size);
	int load_vault(const unsigned char *edata);

	int get_header(unsigned char *header, UINT16 *size);
	int get_vault(unsigned char *edata, UINT32 *size);

	UINT32 get_db_size();

	void lock();
	int unlock(const char *password);

	int set_master_password(const char *password);
	int change_master_password(const char *oldpass, const char *newpass);

	int accounts_get_count(UINT32 *count);
	int accounts_get_info_sizes(UINT32 idx, UINT16 *mbname_sz, UINT16 *mblogin_sz, UINT16 *mburl_sz);
	int accounts_get_info(UINT32 idx, char *mbname, UINT16 mbname_sz, char *mblogin, UINT16 mblogin_sz,
		char *mburl, UINT16 mburl_sz);

	int accounts_get_password_size(UINT32 idx, UINT16 *mbpass_sz);
	int accounts_get_password(UINT32 idx, char *mbpass, UINT16 mbpass_sz);

	int accounts_set_info(UINT32 idx, const char *mbname, UINT16 mbname_len, const char *mblogin, UINT16 mblogin_len,
		const char *mburl, UINT16 mburl_len);
	int accounts_set_password(UINT32 idx, const char *mbpass, UINT16 mbpass_len);

	int accounts_generate_password(UINT16 length, UINT16 pwflags, char *cpass);

	int is_valid() { return _VST_IS_VALID(state); }
	int is_locked() { return ((state&_VST_LOCKED) == _VST_LOCKED) ? 1 : 0; }
};

The declaration for the enclave version of this class, which we’ll call E_Vault for clarity, will be identical except for one crucial change: database key handling.

In the untrusted code path, the Vault object must store the database key, decrypted, in memory. Every time we make a change to our password vault we have to encrypt the updated vault data and write it to disk, and that means the key must be at our disposal. We have four options:

  1. Prompt the user for their master password on every change so that the database key can be derived on demand.
  2. Cache the user’s master password so that the database key can be derived on demand without user intervention.
  3. Encrypt, encode, and/or obscure the database key in memory.
  4. Store the key in the clear.

None of these are good solutions, and they highlight the need for technologies like Intel SGX. The first is arguably the most secure, but no user would want to run an application that behaved in this manner. The second could be achieved using the SecureString class in .NET*, but it is still vulnerable to inspection via a debugger, and the key derivation function carries a performance cost that a user might find unacceptable. The third option is effectively as insecure as the second, only it comes without the performance penalty. The fourth option is the worst of the lot.

Our Tutorial Password Manager uses the third option: the database key is XOR’d with a 128-bit value that is randomly generated when a vault file is opened, and it is stored in memory only in this XOR’d form. This is effectively a one-time pad encryption scheme. It is open to inspection for anyone running a debugger, but it does limit the amount of time in which the database key is present in memory in the clear.

void Vault::set_db_key(const char db_key[16])
{
	UINT i, j;
	for (i = 0; i < 4; ++i)
		for (j = 0; j < 4; ++j) db_key_obs[4 * i + j] = db_key[4 * i + j] ^ db_key_xor[4 * i + j];
}

void Vault::get_db_key(char db_key[16])
{
	UINT i, j;
	for (i = 0; i < 4; ++i)
		for (j = 0; j < 4; ++j) db_key[4 * i + j] = db_key_obs[4 * i + j] ^ db_key_xor[4 * i + j];
}

This is obviously security through obscurity, and since we are publishing the source code, it’s not even particularly obscure. We could choose a better algorithm or go to greater lengths to hide both the database key and the pad’s secret key (including how they are stored in memory); but in the end, the method we choose would still be vulnerable to inspection via a debugger, and the algorithm would still be published for anyone to see.

Inside the enclave, however, this problem goes away. The memory is protected by hardware-backed encryption, so even when the database key is decrypted it is not open to inspection by anyone, even a process running with elevated privileges. As a result, we no longer need these class members or methods:

char db_key_obs[16];
char db_key_xor[16];

	void get_db_key(char key[16]);
	void set_db_key(const char key[16]);

We can replace them with just one class member: a char array to hold the database key.

char db_key[16];

The AccountRecord Class

The account data is stored in a fixed-size array of AccountRecord objects as a member of the Vault object. The declaration for AccountRecord is also found in Vault.h, and it is shown below:

class PASSWORDMANAGERCORE_API AccountRecord
{
	char nonce[12];
	char tag[16];
	// Store these in their multibyte form. There's no sense in translating
	// them back to wchar_t since they have to be passed in and out as
	// char * anyway.
	char *name;
	char *login;
	char *url;
	char *epass;
	UINT16 epass_len; // Can't rely on NULL termination! It's an encrypted string.

	int set_field(char **field, const char *value, UINT16 len);
	void zero_free_field(char *field, UINT16 len);

public:
	AccountRecord();
	~AccountRecord();

	void set_nonce(const char *in) { memcpy(nonce, in, 12); }
	void set_tag(const char *in) { memcpy(tag, in, 16); }

	int set_enc_pass(const char *in, UINT16 len);
	int set_name(const char *in, UINT16 len) { return set_field(&name, in, len); }
	int set_login(const char *in, UINT16 len) { return set_field(&login, in, len); }
	int set_url(const char *in, UINT16 len) { return set_field(&url, in, len); }

	const char *get_epass() { return (epass == NULL)? "" : (const char *)epass; }
	const char *get_name() { return (name == NULL) ? "" : (const char *)name; }
	const char *get_login() { return (login == NULL) ? "" : (const char *)login; }
	const char *get_url() { return (url == NULL) ? "" : (const char *)url; }
	const char *get_nonce() { return (const char *)nonce; }
	const char *get_tag() { return (const char *)tag; }

	UINT16 get_name_len() { return (name == NULL) ? 0 : (UINT16)strlen(name); }
	UINT16 get_login_len() { return (login == NULL) ? 0 : (UINT16)strlen(login); }
	UINT16 get_url_len() { return (url == NULL) ? 0 : (UINT16)strlen(url); }
	UINT16 get_epass_len() { return (epass == NULL) ? 0 : epass_len; }

	void clear();
};

We actually don’t need to do anything to this class for it to work inside the enclave. Other than removing the unnecessary calls to SecureZeroMemory, this class is fine as is. However, we are going to change it anyway in order to illustrate a point: within the enclave, we gain some flexibility that we did not have before.

Returning to Part 3, another of our guidelines for securing data in untrusted memory was to avoid container classes that manage their own memory, specifically the Standard Template Library’s (STL) std::string class. Inside the enclave this problem goes away, too. For the same reason that we don’t need to zero-fill our memory before freeing it, we don’t have to worry about how the STL containers manage their memory. The enclave memory is encrypted, so even if fragments of our secure data remain there as a result of container operations, they can’t be inspected by other processes.

There’s also a good reason to use the std::string class inside the enclave: reliability. The code behind the STL containers has been through significant peer review over the years, and it is arguably safer to use it than to implement our own high-level string functions. For simple code like what’s in the AccountRecord class it’s probably not a significant issue, but in more complex programs this can be a huge benefit. However, it does come at the cost of a larger DLL, due to the added STL code.

The new class declaration, which we’ll call E_AccountRecord, is shown below:

#define TRY_ASSIGN(x) try{x.assign(in,len);} catch(...){return 0;} return 1

class E_AccountRecord
{
	char nonce[12];
	char tag[16];
	// Store these in their multibyte form. There's no sense in translating
	// them back to wchar_t since they have to be passed in and out as
	// char * anyway.
	string name, login, url, epass;

public:
	E_AccountRecord();
	~E_AccountRecord();

	void set_nonce(const char *in) { memcpy(nonce, in, 12); }
	void set_tag(const char *in) { memcpy(tag, in, 16); }

	int set_enc_pass(const char *in, uint16_t len) { TRY_ASSIGN(epass); }
	int set_name(const char *in, uint16_t len) { TRY_ASSIGN(name); }
	int set_login(const char *in, uint16_t len) { TRY_ASSIGN(login); }
	int set_url(const char *in, uint16_t len) { TRY_ASSIGN(url); }

	const char *get_epass() { return epass.c_str(); }
	const char *get_name() { return name.c_str(); }
	const char *get_login() { return login.c_str(); }
	const char *get_url() { return url.c_str(); }

	const char *get_nonce() { return (const char *)nonce; }
	const char *get_tag() { return (const char *)tag; }

	uint16_t get_name_len() { return (uint16_t) name.length(); }
	uint16_t get_login_len() { return (uint16_t) login.length(); }
	uint16_t get_url_len() { return (uint16_t) url.length(); }
	uint16_t get_epass_len() { return (uint16_t) epass.length(); }

	void clear();
};

The tag and nonce members are still stored as char arrays. Our password encryption is done with AES in GCM mode, using a 128-bit key, a 96-bit nonce, and a 128-bit authentication tag. Since the size of the nonce and the tag are fixed there is no reason to store them as anything other than simple char arrays.

Note that this std::string-based approach has allowed us to almost completely define the class in the header file.

The Crypto Class

The Crypto class provides our cryptographic functions. The class declaration is shown below.

class PASSWORDMANAGERCORE_API Crypto
{
	DRNG drng;

	crypto_status_t aes_init (BCRYPT_ALG_HANDLE *halgo, LPCWSTR algo_id, PBYTE chaining_mode, DWORD chaining_mode_len, BCRYPT_KEY_HANDLE *hkey, PBYTE key, ULONG key_len);
	void aes_close (BCRYPT_ALG_HANDLE *halgo, BCRYPT_KEY_HANDLE *hkey);

	crypto_status_t aes_128_gcm_encrypt(PBYTE key, PBYTE nonce, ULONG nonce_len, PBYTE pt, DWORD pt_len, PBYTE ct, DWORD ct_sz, PBYTE tag, DWORD tag_len);
	crypto_status_t aes_128_gcm_decrypt(PBYTE key, PBYTE nonce, ULONG nonce_len, PBYTE ct, DWORD ct_len, PBYTE pt, DWORD pt_sz, PBYTE tag, DWORD tag_len);
	crypto_status_t sha256_multi (PBYTE *messages, ULONG *lengths, BYTE hash[32]);

public:
	Crypto(void);
	~Crypto(void);

	crypto_status_t generate_database_key (BYTE key_out[16], GenerateDatabaseKeyCallback callback);
	crypto_status_t generate_salt (BYTE salt[8]);
	crypto_status_t generate_salt_ex (PBYTE salt, ULONG salt_len);
	crypto_status_t generate_nonce_gcm (BYTE nonce[12]);

	crypto_status_t unlock_vault(PBYTE passphrase, ULONG passphrase_len, BYTE salt[8], BYTE db_key_ct[16], BYTE db_key_iv[12], BYTE db_key_tag[16], BYTE db_key_pt[16]);

	crypto_status_t derive_master_key (PBYTE passphrase, ULONG passphrase_len, BYTE salt[8], BYTE mkey[16]);
	crypto_status_t derive_master_key_ex (PBYTE passphrase, ULONG passphrase_len, PBYTE salt, ULONG salt_len, ULONG iterations, BYTE mkey[16]);

	crypto_status_t validate_passphrase(PBYTE passphrase, ULONG passphrase_len, BYTE salt[8], BYTE db_key[16], BYTE db_iv[12], BYTE db_tag[16]);
	crypto_status_t validate_passphrase_ex(PBYTE passphrase, ULONG passphrase_len, PBYTE salt, ULONG salt_len, ULONG iterations, BYTE db_key[16], BYTE db_iv[12], BYTE db_tag[16]);

	crypto_status_t encrypt_database_key (BYTE master_key[16], BYTE db_key_pt[16], BYTE db_key_ct[16], BYTE iv[12], BYTE tag[16], DWORD flags= 0);
	crypto_status_t decrypt_database_key (BYTE master_key[16], BYTE db_key_ct[16], BYTE iv[12], BYTE tag[16], BYTE db_key_pt[16]);

	crypto_status_t encrypt_account_password (BYTE db_key[16], PBYTE password_pt, ULONG password_len, PBYTE password_ct, BYTE iv[12], BYTE tag[16], DWORD flags= 0);
	crypto_status_t decrypt_account_password (BYTE db_key[16], PBYTE password_ct, ULONG password_len, BYTE iv[12], BYTE tag[16], PBYTE password);

	crypto_status_t encrypt_database (BYTE db_key[16], PBYTE db_serialized, ULONG db_size, PBYTE db_ct, BYTE iv[12], BYTE tag[16], DWORD flags= 0);
	crypto_status_t decrypt_database (BYTE db_key[16], PBYTE db_ct, ULONG db_size, BYTE iv[12], BYTE tag[16], PBYTE db_serialized);

	crypto_status_t generate_password(PBYTE buffer, USHORT buffer_len, USHORT flags);
};

The public methods in this class are modeled to perform various high-level vault operations: unlock_vault, derive_master_key, validate_passphrase, encrypt_database, and so on. Each of these methods invokes one or more cryptographic algorithms in order to complete its task. For example, the unlock_vault method takes the passphrase supplied by the user, runs it through the SHA-256-based key derivation function, and uses the resulting key to decrypt the database key using AES-128 in GCM mode.

These high-level methods do not, however, directly invoke the cryptographic primitives. Instead, they call into a middle layer which implements each cryptographic algorithm as a self-contained function.


Figure 2. Cryptographic library dependencies.

The private methods that make up our middle layer are built on the cryptographic primitives and support functions provided by the underlying cryptographic library, as illustrated in Figure 2. The non-Intel SGX implementation relies on Microsoft’s Cryptography API: Next Generation (CNG) for these, but we can’t use this same library inside the enclave because an enclave cannot have dependencies on external DLLs. To build the Intel SGX version of this class, we need to replace those underlying functions with the ones in the trusted crypto library that is distributed with the Intel SGX SDK. (As you might recall from Part 2, we were careful to choose cryptographic functions that were common to both CNG and the Intel SGX trusted crypto library for this very reason.)

To create our enclave-capable Crypto class, which we’ll call E_Crypto, what we need to do is modify these private methods:

crypto_status_t aes_128_gcm_encrypt(PBYTE key, PBYTE nonce, ULONG nonce_len, PBYTE pt, DWORD pt_len, PBYTE ct, DWORD ct_sz, PBYTE tag, DWORD tag_len);
	crypto_status_t aes_128_gcm_decrypt(PBYTE key, PBYTE nonce, ULONG nonce_len, PBYTE ct, DWORD ct_len, PBYTE pt, DWORD pt_sz, PBYTE tag, DWORD tag_len);
	crypto_status_t sha256_multi (PBYTE *messages, ULONG *lengths, BYTE hash[32]);

A description of each, and the primitives and support functions from CNG upon which they are built, is given in Table 1.

Method

Algorithm

CNG Primitives and Support Functions

aes_128_gcm_encrypt

AES encryption in GCM mode with:

  • A 128-bit key
  • A 128-bit authentication tag
  • No additional authenticated data (AAD)

BCryptOpenAlgorithmProvider
BCryptSetProperty
BCryptGenerateSymmetricKey
BCryptEncrypt
BCryptCloseAlgorithmProvider
BCryptDestroyKey

aes_128_gcm_decrypt

AES encryption in GCM mode with:

  • A 128-bit key
  • A 128-bit authentication tag
  • No AAD

BCryptOpenAlgorithmProvider
BCryptSetProperty
BCryptGenerateSymmetricKey
BCryptDecrypt
BCryptCloseAlgorithmProvider
BCryptDestroyKey

sha256_multi

SHA-256 hash (incremental)

BCryptOpenAlgorithmProvider
BCryptGetProperty
BCryptCreateHash
BCryptHashData
BCryptFinishHash
BCryptDestroyHash
BCryptCloseAlgorithmProvider

Table 1. Mapping Crypto class methods to Cryptography API: Next Generation functions

CNG provides very fine-grained control over its encryption algorithms, as well as several optimizations for performance. Our Crypto class is actually fairly inefficient: each time one of these algorithms is called, it initializes the underlying primitives from scratch and then completely closes them down. This is not a significant issue for a password manager, which is UI-driven and only encrypts a small amount of data at a time. A high-performance server application such as a web or database server would need a more sophisticated approach.

The API for the trusted cryptography library distributed with the Intel SGX SDK more closely resembles our middle layer than CNG. There is less granular control over the underlying primitives, but it does make developing our E_Crypto class much simpler. Table 2 shows the new mapping between our middle layer and the underlying provider.

Method

Algorithm

Intel® SGX Trusted Cryptography Library Primitives and Support Functions

aes_128_gcm_encrypt

AES encryption in GCM mode with:

  • A 128-bit key
  • A 128-bit authentication tag
  • No additional authenticated data (AAD)

sgx_rijndael128GCM_encrypt

aes_128_gcm_decrypt

AES encryption in GCM mode with:

  • A 128-bit key
  • A 128-bit authentication tag
  • No AAD

sgx_rijndael128GCM_decrypt

sha256_multi

SHA-256 hash (incremental)

sgx_sha256_init
sgx_sha256_update
sgx_sha256_get_hash
sgx_sha256_close

Table 2. Mapping Crypto class methods to Intel® SGX Trusted Cryptography Library functions
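Under these mappings, each middle-layer method collapses to little more than a status translation around a single SDK call. The sketch below assumes the SDK’s documented sgx_rijndael128GCM_encrypt signature; the crypto_status_t values and the surrounding error handling are illustrative, not the tutorial’s exact code.

```cpp
#include <sgx_tcrypto.h>

crypto_status_t E_Crypto::aes_128_gcm_encrypt(PBYTE key, PBYTE nonce, ULONG nonce_len,
	PBYTE pt, DWORD pt_len, PBYTE ct, DWORD ct_sz, PBYTE tag, DWORD tag_len)
{
	sgx_status_t status;

	// Illustrative sanity checks and status codes.
	if (ct_sz < pt_len || tag_len != 16) return CRYPTO_ERR_INVALID;

	// One call replaces the CNG open/set/generate-key/encrypt/close sequence.
	status = sgx_rijndael128GCM_encrypt(
		(const sgx_aes_gcm_128bit_key_t *) key,
		pt, pt_len, ct,
		nonce, nonce_len,
		NULL, 0, // no additional authenticated data (AAD)
		(sgx_aes_gcm_128bit_tag_t *) tag);

	return (status == SGX_SUCCESS) ? CRYPTO_OK : CRYPTO_ERR_ENCRYPT;
}
```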

The DRNG Class

The DRNG class is the interface to the on-chip digital random number generator, courtesy of Intel® Secure Key. To stay consistent with our previous actions we’ll name the enclave version of this class E_DRNG.

We’ll be making two changes in this class to prepare it for the enclave, but both of these changes are internal to the class methods. The class declaration will stay the same.

The CPUID Instruction

One of our application requirements is that the CPU support Intel Secure Key. Even though Intel SGX is a newer feature than Intel Secure Key, there is no guarantee that every future CPU supporting Intel SGX will also support Intel Secure Key. While it’s hard to conceive of such a situation today, best practice is not to assume a coupling between features where none exists: if features have independent detection mechanisms, you must assume they are independent of one another and check for them separately. However tempting it may be to assume that a CPU with Intel SGX support also supports Intel Secure Key, we must not do so.

Further complicating the situation is the fact that Intel Secure Key actually consists of two independent features, both of which must also be checked separately. Our application must determine support for both the RDRAND and RDSEED instructions. For more information on Intel Secure Key, see the Intel Digital Random Number Generator (DRNG) Software Implementation Guide.

The constructor in the DRNG class is responsible for the RDRAND and RDSEED feature detection checks. It makes the necessary calls to the CPUID instruction using the compiler intrinsics __cpuid and __cpuidex, and sets a static, global variable with the results.

static int _drng_support= DRNG_SUPPORT_UNKNOWN;

DRNG::DRNG(void)
{
	int info[4];

	if (_drng_support != DRNG_SUPPORT_UNKNOWN) return;

	_drng_support= DRNG_SUPPORT_NONE;

	// Check our feature support

	__cpuid(info, 0);

	if ( memcmp(&(info[1]), "Genu", 4) ||
		memcmp(&(info[3]), "ineI", 4) ||
		memcmp(&(info[2]), "ntel", 4) ) return;

	__cpuidex(info, 1, 0);

	if ( ((UINT) info[2]) & (1<<30) ) _drng_support|= DRNG_SUPPORT_RDRAND;

#ifdef COMPILER_HAS_RDSEED_SUPPORT
	__cpuidex(info, 7, 0);

	if ( ((UINT) info[1]) & (1<<18) ) _drng_support|= DRNG_SUPPORT_RDSEED;
#endif
}

The problem for the E_DRNG class is that CPUID is not a legal instruction inside an enclave. To call CPUID, one must use an OCALL to exit the enclave and then invoke CPUID from untrusted code. Fortunately, the Intel SGX SDK designers have created two convenient functions that greatly simplify this task: sgx_cpuid and sgx_cpuidex. These functions automatically generate and perform the OCALL for us. The only requirement is that the enclave’s EDL file import from sgx_tstdc.edl:

enclave {

	/* Needed for the call to sgx_cpuidex */
	from "sgx_tstdc.edl" import *;

    trusted {
        /* define ECALLs here. */

		public int ve_initialize ();
		public int ve_initialize_from_header ([in, count=len] unsigned char *header, uint16_t len);
		/* Our other ECALLs have been omitted for brevity */
	};

    untrusted {
    };
};

The feature detection code in the E_DRNG constructor becomes:

static int _drng_support= DRNG_SUPPORT_UNKNOWN;

E_DRNG::E_DRNG(void)
{
	int info[4];
	sgx_status_t status;

	if (_drng_support != DRNG_SUPPORT_UNKNOWN) return;

	_drng_support = DRNG_SUPPORT_NONE;

	// Check our feature support

	status= sgx_cpuid(info, 0);
	if (status != SGX_SUCCESS) return;

	if (memcmp(&(info[1]), "Genu", 4) ||
		memcmp(&(info[3]), "ineI", 4) ||
		memcmp(&(info[2]), "ntel", 4)) return;

	status= sgx_cpuidex(info, 1, 0);
	if (status != SGX_SUCCESS) return;

	if (info[2] & (1 << 30)) _drng_support |= DRNG_SUPPORT_RDRAND;

#ifdef COMPILER_HAS_RDSEED_SUPPORT
	status= sgx_cpuidex(info, 7, 0);
	if (status != SGX_SUCCESS) return;

	if (info[1] & (1 << 18)) _drng_support |= DRNG_SUPPORT_RDSEED;
#endif
}

Because calls to the CPUID instruction must take place in untrusted memory, the results of CPUID cannot be trusted! This warning applies whether you run CPUID yourself or rely on the SGX functions to do it for you. The Intel SGX SDK offers this advice: “Code should verify the results and perform a threat evaluation to determine the impact on trusted code if the results were spoofed.”

In our tutorial password manager, there are three possible outcomes:

  1. RDRAND and/or RDSEED are not detected, but a positive result for one or both is spoofed. This will lead to an illegal instruction fault at runtime, at which point the program will crash.
     
  2. RDRAND is detected, but a negative result is spoofed. This will result in an error at runtime, causing the program to exit gracefully since a required feature is not detected.
     
  3. RDSEED is detected, but a negative result is spoofed. This will cause the program to fall back to the seed-from-RDRAND method for generating random seeds, which has a small performance impact. The program will otherwise function normally.

Since our worst-case scenarios are denial-of-service conditions that do not compromise the application’s secrets or data integrity, we will not attempt to detect spoofing attacks.

Generating Seeds from RDRAND

In the event that the underlying CPU does not support the RDSEED instruction, we need to be able to use the RDRAND instruction to generate random seeds that are functionally equivalent to what we would have received from RDSEED if it were available. The Intel Digital Random Number Generator (DRNG) Software Implementation Guide describes the process of obtaining random seeds from RDRAND in detail, but the short version is that one method for doing this is to generate 512 pairs of 128-bit values and mix the intermediate values together using the CBC-MAC mode of AES to produce a single, 128-bit seed. The process is repeated to generate as many seeds as necessary.

In the non-Intel SGX code path, the method seed_from_rdrand uses CNG to build the cryptographic algorithm. Since the Intel SGX code path can’t depend on CNG, we once again need to turn to the trusted cryptographic library that is distributed with the Intel SGX SDK. The changes are summarized in Table 3.

Algorithm

CNG Primitives and Support Functions

Intel® SGX Trusted Cryptography Library Primitives and Support Functions

aes-cmac

BCryptOpenAlgorithmProvider
BCryptGenerateSymmetricKey
BCryptSetProperty
BCryptEncrypt
BCryptDestroyKey
BCryptCloseAlgorithmProvider

sgx_cmac128_init
sgx_cmac128_update
sgx_cmac128_final
sgx_cmac128_close

Table 3. Cryptographic function changes to the E_DRNG class’s seed_from_rdrand method

Why is this algorithm embedded in the DRNG class and not implemented in the Crypto class with the other cryptographic algorithms? This is simply a design decision. The DRNG class only needs this one algorithm, so we chose not to create a co-dependency between DRNG and Crypto (currently, Crypto does depend on DRNG). The Crypto class is also structured to provide the cryptographic services for vault operations rather than function as a general-purpose cryptographic API.

Why Not Use sgx_read_rand?

The Intel SGX SDK provides the function sgx_read_rand as a means of obtaining random numbers inside of an enclave. There are three reasons why we aren’t using it:

  1. As documented in the Intel SGX SDK, this function is “provided to replace the C standard pseudo-random sequence generation functions inside the enclave, since these standard functions are not supported in the enclave, such as rand, srand, etc.” While sgx_read_rand does call the RDRAND instruction if it is supported by the CPU, it falls back to the trusted C library’s implementation of srand and rand if it is not. The random numbers produced by the C library are not suitable for cryptographic use. It is highly unlikely that this situation will ever occur, but as mentioned in the section on CPUID, we must not assume that it will never occur.
  2. There is no Intel SGX SDK function for calling the RDSEED instruction and that means we still have to use compiler intrinsics in our code. While we could replace the RDRAND intrinsics with calls to sgx_read_rand, it would not gain us anything in terms of code management or structure and it would cost us additional time.
  3. The intrinsics will marginally outperform sgx_read_rand since there is one less layer of function calls in the resulting code.

Wrapping Up

With these code changes, we have a fully functioning enclave! However, there are still some inefficiencies in the implementation and some gaps in functionality, and we’ll revisit the enclave design in Parts 7 and 8 in order to address them.

As mentioned in the introduction, there is sample code provided with this part for you to download. The attached archive includes the source code for the Tutorial Password Manager core, including the enclave and its wrapper functions. This source code should be functionally identical to Part 3, except that we have hardcoded Intel SGX support to be on.

Coming Up Next

In Part 6 of the tutorial we’ll add dynamic feature detection to the password manager, allowing it to choose the appropriate code path based on whether or not Intel SGX is supported on the underlying platform. Stay tuned!

Intel® Software Guard Extensions Tutorial Series: Part 6, Dual Code Paths


In Part 6 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series, we set aside the enclave to address an outstanding design requirement that was laid out in Part 2, Application Design: provide support for dual code paths. We want to make sure our Tutorial Password Manager will function on hosts both with and without Intel SGX capability. Much of the content in this part comes from the article, Properly Detecting Intel® Software Guard Extensions in Your Applications.

You can find the list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

There is source code provided with this installment of the series.

All Intel® Software Guard Extensions Applications Need Dual Code Paths

First it’s important to point out that all Intel SGX applications must have dual code paths. Even if an application is written so that it should only execute if Intel SGX is available and enabled, a fallback code path must exist so that you can present a meaningful error message to the user and then exit gracefully.

In short, an application should never crash or fail to launch solely because the platform does not support Intel SGX.

Scoping the Problem

In Part 5 of the series we completed our first version of our application enclave and tested it by hardcoding the enclave support to be on. That was done by setting the _supports_sgx flag in PasswordCoreNative.cpp.

PasswordManagerCoreNative::PasswordManagerCoreNative(void)
{
	_supports_sgx= 1;
	adsize= 0;
	accountdata= NULL;
	timer = NULL;
}

Obviously, we can’t leave this on by default. The convention for feature detection is that features are off by default and turned on if they are detected. So our first step is to undo this change and set the flag back to 0, effectively disabling the Intel SGX code path.

PasswordManagerCoreNative::PasswordManagerCoreNative(void)
{
	_supports_sgx= 0;
	adsize= 0;
	accountdata= NULL;
	timer = NULL;
}

However, before we get into the feature detection procedure, we’ll give the console application that runs our test suite, CLI Test App, a quick functional test by executing it on an older system that does not have the Intel SGX feature. With this flag set to zero, the application will not choose the Intel SGX code path and thus should run normally.

Here’s the output from a 4th generation Intel® Core™ i7 processor-based laptop, running Microsoft Windows* 8.1, 64-bit. This system does not support Intel SGX.

CLI Test App

What Happened?

Clearly we have a problem even when the Intel SGX code path is explicitly disabled in the software. This application, as written, cannot execute on a system without Intel SGX support. It didn’t even start executing. So what’s going on?

The clue in this case comes from the error message in the console window:

System.IO.FileNotFoundException: Could not load file or assembly ‘PasswordManagerCore.dll’ or one of its dependencies. The specified file could not be found.

Let’s take a look at PasswordManagerCore.dll and its dependencies:

Additional Dependencies

In addition to the core OS libraries, we have dependencies on bcrypt.lib and EnclaveBridge.lib, which will require bcrypt.dll and EnclaveBridge.dll at runtime. Since bcrypt.dll comes from Microsoft and is included in the OS, we can reasonably assume its dependencies, if any, are already installed. That leaves EnclaveBridge.dll.
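Dependency listings like these can be generated with the dumpbin utility from a Visual Studio developer command prompt (the DLL name here is whichever module you are inspecting):

```
dumpbin /dependents EnclaveBridge.dll
```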

Examining its dependencies, we see the following:

Additional Dependencies

This is the problem. Even though we have the Intel SGX code path explicitly disabled, EnclaveBridge.dll still has references to the Intel SGX runtime libraries. All symbols in an object module must be resolved as soon as it is loaded. It doesn’t matter if we disable the Intel SGX code path: undefined symbols are still present in the DLL. When PasswordManagerCore.dll loads, it resolves its undefined symbols by loading bcrypt.dll and EnclaveBridge.dll, the latter of which, in turn, attempts to resolve its undefined symbols by loading sgx_urts.dll and sgx_uae_service.dll. The system we tried to run our command-line test application on does not have these libraries, and since the OS can’t resolve all of the symbols it throws an exception and the program crashes before it even starts.

These two DLLs are part of the Intel SGX Platform Software (PSW) package, and without them Intel SGX applications written using the Intel SGX Software Development Kit (SDK) cannot execute. Our application needs to be able to run even if these libraries are not present.

The Platform Software Package

As mentioned above, the runtime libraries are part of the PSW. In addition to these support libraries, the PSW includes:

  • Services that support and maintain the trusted compute block (TCB) on the system
  • Services that perform and manage certain Intel SGX operations such as attestation
  • Interfaces to platform services such as trusted time and the monotonic counters

The PSW must be installed by the application installer when deploying an Intel SGX application, because Intel does not offer the PSW for direct download by end users. Software vendors must not assume that it will already be present and installed on the destination system. In fact, the license agreement for Intel SGX specifically states that licensees must re-distribute the PSW with their applications.

We’ll discuss the PSW installer in more detail in a future installment of the series covering packaging and deployment.

Detecting Intel Software Guard Extensions Support

So far we’ve focused on the problem of just starting our application on systems without Intel SGX support, and more specifically, without the PSW. The next step is to detect whether or not Intel SGX support is present and enabled once the application is running.

Intel SGX feature detection is, unfortunately, a complicated procedure. For a system to be Intel SGX capable, four conditions must be met:

  1. The CPU must support Intel SGX.
  2. The BIOS must support Intel SGX.
  3. In the BIOS, Intel SGX must be explicitly enabled or set to the “software controlled” state.
  4. The PSW must be installed on the platform.

Note that the CPUID instruction, alone, is not sufficient to detect the usability of Intel SGX on a platform. It can tell you whether or not the CPU supports the feature, but it doesn’t know anything about the BIOS configuration or the software that is installed on a system. Relying solely on the CPUID results to make decisions about Intel SGX support can potentially lead to a runtime fault.

To make feature detection even more difficult, examining the state of the BIOS is not a trivial task and is generally not possible from a user process. Fortunately, the Intel SGX SDK provides a simple solution: the function sgx_enable_device will both check for Intel SGX capability and attempt to enable it if the BIOS is set to the software control state. (The purpose of the software control setting is to allow applications to enable Intel SGX without requiring users to reboot their systems and enter their BIOS setup screens, a particularly daunting and intimidating task for non-technical users.)

The problem with sgx_enable_device, though, is that it is part of the Intel SGX runtime, which means the PSW must be installed on the system in order to use it. So before we attempt to call sgx_enable_device, we must first detect whether or not the PSW is present.

Implementation

With our problem scoped out, we can now lay out the steps that must be followed, in order, for our dual-code path application to function properly. Our application must:

  1. Load and begin executing even without the Intel SGX runtime libraries.
  2. Determine whether or not the PSW package is installed.
  3. Determine whether or not Intel SGX is enabled (and attempt to enable it).

Loading and Executing without the Intel Software Guard Extensions Runtime

Our main application depends on PasswordManagerCore.dll, which depends on EnclaveBridge.dll, which in turn depends on the Intel SGX runtime. Since all symbols need to be resolved when an application loads, we need a way to prevent the loader from trying to resolve symbols that come from the Intel SGX runtime libraries. There are two options:

Option #1: Dynamic Loading      

In dynamic loading, you don’t explicitly link the library in the project. Instead you use system calls to load the library at runtime and then look up the names of each function you plan to use in order to get the addresses of where they have been placed in memory. Functions in the library are then invoked indirectly via function pointers.

Dynamic loading is a hassle. Even if you only need a handful of functions, it can be a tedious process to prototype function pointers for every function that is needed and get their load address, one at a time. You also lose some of the benefits provided by the integrated development environment (such as prototype assistance) since you are no longer explicitly calling functions by name.

Dynamic loading is typically used in extensible application architectures (for example, plug-ins).

Option #2: Delayed-Loaded DLLs

In this approach, you dynamically link all your libraries in the project, but instruct Windows to do delayed loading of the problem DLL. When a DLL is delay-loaded, Windows does not attempt to resolve symbols that are defined by that DLL when the application starts. Instead it waits until the program makes its first call to a function that is defined in that DLL, at which point the DLL is loaded and the symbols get resolved (along with any of its dependencies). What this means is that a DLL is not loaded until the application needs it. A beneficial side effect of this approach is that it allows applications to reference a DLL that is not installed, so long as no functions in that DLL are ever called.

When the Intel SGX feature flag is off, that is exactly the situation we are in so we will go with option #2.

You specify the DLL to be delay-loaded in the project configuration for the dependent application or DLL. For the Tutorial Password Manager, the best DLL to mark for delayed loading is EnclaveBridge.dll, as we only call this DLL if the Intel SGX path is enabled. If this DLL doesn’t load, neither will the two Intel SGX runtime DLLs.

We set the option in the Linker -> Input page of the PasswordManagerCore.dll project configuration:
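For reference, the same setting can be expressed on the linker command line. The exact inputs below are illustrative (your project's object files and output name will differ); the key pieces are the /DELAYLOAD option and the delay-load helper library delayimp.lib, both part of the standard Microsoft delay-load mechanism:

```shell
rem Mark EnclaveBridge.dll for delayed loading and pull in the delay-load
rem helper (delayimp.lib) alongside the normal link inputs.
link /DLL /OUT:PasswordManagerCore.dll /DELAYLOAD:EnclaveBridge.dll delayimp.lib EnclaveBridge.lib bcrypt.lib PasswordManagerCore.obj
```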

Password Manager

After the DLL is rebuilt and installed on our 4th generation Intel Core processor system, the console test application works as expected.

CLI Test App

Detecting the Platform Software Package

Before we can call the sgx_enable_device function to check for Intel SGX support on the platform, we first have to make sure that the PSW package is installed because sgx_enable_device is part of the Intel SGX runtime. The best way to do this is to actually try to load the runtime libraries.

We know from the previous step that we can’t just dynamically link them, because that causes an exception when we attempt to run the program on a system that does not support Intel SGX (or does not have the PSW package installed). But we can’t rely on delay-loaded DLLs either: delayed loading can’t tell us whether a library is installed, because if it isn’t, the application will still crash! That means we must use dynamic loading to test for the presence of the runtime libraries.

The PSW runtime libraries should be installed in the Windows system directory so we’ll use GetSystemDirectory to get that path, and limit the DLL search path via a call to SetDllDirectory. Finally, the two libraries will be loaded using LoadLibrary. If either of these calls fail, we know the PSW is not installed and that the main application should not attempt to run the Intel SGX code path.

Detecting and Enabling Intel Software Guard Extensions

Since the previous step dynamically loads the PSW runtime libraries, we can just look up the symbol for sgx_enable_device manually and then invoke it via a function pointer. The result will tell us whether or not Intel SGX is enabled.

Implementation

To implement this in the Tutorial Password Manager we’ll create a new DLL called FeatureSupport.dll. We can safely dynamically link this DLL from the main application since it has no explicit dependencies on other DLLs.

Our feature detection will be rolled into a C++/CLI class called FeatureSupport, which will also include some high-level functions for getting more information about the state of Intel SGX. In rare cases, enabling Intel SGX via software may require a reboot, and in rarer cases the software enable action fails and the user may be forced to enable it explicitly in their BIOS.

The class declaration for FeatureSupport is shown below.

typedef sgx_status_t(SGXAPI *fp_sgx_enable_device_t)(sgx_device_status_t *);


public ref class FeatureSupport {
private:
	UINT sgx_support;
	HINSTANCE h_urts, h_service;

	// Function pointers

	fp_sgx_enable_device_t fp_sgx_enable_device;

	int is_psw_installed(void);
	void check_sgx_support(void);
	void load_functions(void);

public:
	FeatureSupport();
	~FeatureSupport();

	UINT get_sgx_support(void);
	int is_enabled(void);
	int is_supported(void);
	int reboot_required(void);
	int bios_enable_required(void);

	// Wrappers around SGX functions

	sgx_status_t enable_device(sgx_device_status_t *device_status);

};

Here are the low-level routines that check for the PSW package and attempt to detect and enable Intel SGX.

int FeatureSupport::is_psw_installed()
{
	_TCHAR *systemdir;
	UINT rv, sz;

	// Get the system directory path. Start by finding out how much space we need
	// to hold it.

	sz = GetSystemDirectory(NULL, 0);
	if (sz == 0) return 0;

	systemdir = new _TCHAR[sz + 1];
	rv = GetSystemDirectory(systemdir, sz);
	if (rv == 0 || rv > sz) {
		delete[] systemdir;
		return 0;
	}

	// Set our DLL search path to just the System directory so we don't accidentally
	// load the DLLs from an untrusted path.

	if (SetDllDirectory(systemdir) == 0) {
		delete[] systemdir;
		return 0;
	}

	delete[] systemdir; // No longer need this

	// Need to be able to load both of these DLLs from the System directory.

	if ((h_service = LoadLibrary(_T("sgx_uae_service.dll"))) == NULL) {
		return 0;
	}

	if ((h_urts = LoadLibrary(_T("sgx_urts.dll"))) == NULL) {
		FreeLibrary(h_service);
		h_service = NULL;
		return 0;
	}

	load_functions();

	return 1;
}

void FeatureSupport::check_sgx_support()
{
	sgx_device_status_t sgx_device_status;

	if (sgx_support != SGX_SUPPORT_UNKNOWN) return;

	sgx_support = SGX_SUPPORT_NO;

	// Check for the PSW

	if (!is_psw_installed()) return;

	sgx_support = SGX_SUPPORT_YES;

	// Try to enable SGX

	if (this->enable_device(&sgx_device_status) != SGX_SUCCESS) return;

	// If SGX isn't enabled yet, perform the software opt-in/enable.

	if (sgx_device_status != SGX_ENABLED) {
		switch (sgx_device_status) {
		case SGX_DISABLED_REBOOT_REQUIRED:
			// A reboot is required.
			sgx_support |= SGX_SUPPORT_REBOOT_REQUIRED;
			break;
		case SGX_DISABLED_LEGACY_OS:
			// BIOS enabling is required
			sgx_support |= SGX_SUPPORT_ENABLE_REQUIRED;
			break;
		}

		return;
	}

	sgx_support |= SGX_SUPPORT_ENABLED;
}

void FeatureSupport::load_functions()
{
	fp_sgx_enable_device = (fp_sgx_enable_device_t)GetProcAddress(h_service, "sgx_enable_device");
}

// Wrappers around SDK functions so the user doesn't have to mess with dynamic loading by hand.

sgx_status_t FeatureSupport::enable_device(sgx_device_status_t *device_status)
{
	check_sgx_support();

	if (fp_sgx_enable_device == NULL) {
		return SGX_ERROR_UNEXPECTED;
	}

	return fp_sgx_enable_device(device_status);
}

Wrapping Up

With these code changes, we have integrated Intel SGX feature detection into our application! It will execute smoothly on systems both with and without Intel SGX support and choose the appropriate code branch.

As mentioned in the introduction, there is sample code provided with this part for you to download. The attached archive includes the source code for the Tutorial Password Manager core, including the new feature detection DLL. Additionally, we have added a new GUI-based test program that automatically selects the Intel SGX code path, but lets you disable it if desired (this option is only available if Intel SGX is supported on the system).

SGX Code Branch

The console-based test program has also been updated to detect Intel SGX, though it cannot be configured to turn it off without modifying the source code.

Coming Up Next

We’ll revisit the enclave in Part 7 in order to fine-tune the interface. Stay tuned!

Intel® RealSense™ SDK-Based Real-Time Face Tracking and Animation


Download Code Sample [ZIP 12.03 MB]

In some high-quality games, an avatar may have facial expression animation. These animations are usually pre-generated by the game artist and replayed in the game according to a fixed story plot. If players are given the ability to animate that avatar’s face based on their own facial motion in real time, it may enable personalized expression interaction and creative game play. Intel® RealSense™ technology is based on a consumer-grade RGB-D camera, which provides building blocks such as face detection and analysis functions for this new kind of usage. In this article, we introduce a method for an avatar to simulate a user’s facial expressions with the Intel® RealSense™ SDK, and we also provide downloadable sample code.

Figure 1: The sample application of Intel® RealSense™ SDK-based face tracking and animation.

System Overview

Our method is based on the idea of the Facial Action Coding System (FACS), which deconstructs facial expressions into specific Action Units (AU). AUs are a contraction or relaxation of one or more muscles. With the weights of the AUs, nearly any anatomically possible facial expression can be synthesized.

Our method also assumes that the user and avatar have compatible expression space so that the AU weights can be shared between them. Table 1 illustrates the AUs defined in the sample code.

Action Unit         Description
MOUTH_OPEN          Open the mouth
MOUTH_SMILE_L       Raise the left corner of mouth
MOUTH_SMILE_R       Raise the right corner of mouth
MOUTH_LEFT          Shift the mouth to the left
MOUTH_RIGHT         Shift the mouth to the right

EYEBROW_UP_L        Raise the left eyebrow
EYEBROW_UP_R        Raise the right eyebrow
EYEBROW_DOWN_L      Lower the left eyebrow
EYEBROW_DOWN_R      Lower the right eyebrow

EYELID_CLOSE_L      Close left eyelid
EYELID_CLOSE_R      Close right eyelid
EYELID_OPEN_L       Raise left eyelid
EYELID_OPEN_R       Raise right eyelid

EYEBALL_TURN_R      Move both eyeballs to the right
EYEBALL_TURN_L      Move both eyeballs to the left
EYEBALL_TURN_U      Move both eyeballs up
EYEBALL_TURN_D      Move both eyeballs down

Table 1: The Action Units defined in the sample code.

The pipeline of our method includes three stages: (1) tracking the user face by the Intel RealSense SDK, (2) using the tracked facial feature data to calculate the AU weights of the user’s facial expression, and (3) synchronizing the avatar facial expression through normalized AU weights and corresponding avatar AU animation assets.

Prepare Animation Assets

To synthesize the facial expression of the avatar, the game artist needs to prepare the animation assets for each AU of the avatar’s face. If the face is animated by a blend-shape rig, the blend-shape model of the avatar should contain the base shape built for a face of neutral expression and the target shapes, respectively, constructed for the face with the maximum pose of the corresponding AU. If a skeleton rig is used for facial animation, the animation sequence must be respectively prepared for every AU. The key frames of the AU animation sequence transform the avatar face from a neutral pose to the maximum pose of the corresponding AU. The duration of the animation doesn’t matter, but we recommend a duration of 1 second (31 frames, from 0 to 30).

The sample application demonstrates the animation assets and expression synthesis method for avatars with skeleton-based facial animation.

In the rest of the article, we discuss the implementation details in the sample code.

Face Tracking

In our method, the user face is tracked by the Intel RealSense SDK. The SDK face-tracking module provides a suite of the following face algorithms:

  • Face detection: Locates a face (or multiple faces) from an image or a video sequence, and returns the face location in a rectangle.
  • Landmark detection: Further identifies the feature points (eyes, mouth, and so on) for a given face rectangle.
  • Pose detection: Estimates the face’s orientation based on where the user's face is looking.

Our method chooses the user face that is closest to the Intel® RealSense™ camera as the source face for expression retargeting and gets this face’s 3D landmarks and orientation in camera space to use in the next stage.

Facial Expression Parameterization

Once we have the landmarks and orientation of the user’s face, the facial expression can be parameterized as a vector of AU weights. To obtain the AU weights, which can be used to control an avatar’s facial animation, we first measure the AU displacement. The displacement of the k-th AU, Dk, is given by the following formula:

Dk = (Skc - Skn) / Nk

Where Skc is the k-th AU state in the current expression, Skn is the k-th AU state in a neutral expression, and Nk is the normalization factor for the k-th AU state.

We measure AU states Skc and Skn in terms of the distances between the associated 3D landmarks. Using a 3D landmark in camera space instead of a 2D landmark in screen space can prevent the measurement from being affected by the distance between the user face and the Intel RealSense camera.

Different users have different facial geometry and proportions, so normalization is required to ensure that the AU displacements extracted from two users have approximately the same magnitude when both are in the same expression. We calculate Nk in the initial calibration step on the user’s neutral expression, using a method similar to that used to measure the MPEG-4 FAPU (Face Animation Parameter Unit).

In normalized expression space, we can define the scope for each AU displacement. The AU weights are calculated by the following formula:

wk = Dk / Dkmax

Where Dkmax is the maximum of the k-th AU displacement.
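The displacement and weight calculations described above can be sketched in a few lines of C++. The function names are my own, not from the SDK or the sample, and clamping the weight to [0, 1] is an assumption about how out-of-range displacements are handled:

```cpp
#include <algorithm>

// Displacement of the k-th AU: current state minus neutral state,
// normalized by the per-user factor N_k measured at calibration.
double au_displacement(double s_current, double s_neutral, double n_factor) {
    return (s_current - s_neutral) / n_factor;
}

// Weight of the k-th AU: displacement scaled by its maximum, clamped
// to [0, 1] (assumed range for driving the animation assets).
double au_weight(double displacement, double d_max) {
    return std::min(std::max(displacement / d_max, 0.0), 1.0);
}
```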

Because of the accuracy of face tracking, the measured AU weights derived from the above formulas may generate an unnatural expression in some special situations. In the sample application, geometric constraints among AUs are used to adjust the measured weights to ensure that a reconstructed expression is plausible, even if not necessarily close to the input geometrically.

Also because of the input accuracy, the signal of the measured AU weights is noisy, which may make the reconstructed expression animation stutter in some situations. Smoothing the AU weights is therefore necessary. However, smoothing may cause latency, which impacts the responsiveness of expression changes.

We smooth the AU weights by interpolating between the weight of the current frame and that of the previous frame as follows:

wi,k ← αi · wi,k + (1 - αi) · wi-1,k

Where wi,k is the weight of the k-th AU in the i-th frame.

To balance the requirements of both smoothing and agility, the smoothing factor of the i-th frame for the AU weights, αi, is set to the face-tracking confidence of that frame. The face-tracking confidence is evaluated according to the lost-tracking rate and the angle by which the face deviates from a neutral pose: the higher the lost-tracking rate and the bigger the deviation angle, the lower the confidence in the tracking data.
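The interpolation itself is a single blend, sketched below (the function name is mine; alpha is the per-frame confidence in [0, 1], so low confidence means the previous smoothed value dominates):

```cpp
// Blend the current measurement with the previous smoothed value.
// alpha = 1 passes the new measurement through unchanged;
// alpha = 0 freezes the previous value (maximum smoothing).
double smooth(double current, double previous, double alpha) {
    return alpha * current + (1.0 - alpha) * previous;
}
```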

Similarly, the face angle is smoothed by interpolating between the angle of the current frame and that of the previous frame:

θi ← βi · θi + (1 - βi) · θi-1

To balance the requirements of both smoothing and agility, the smoothing factor of the i-th frame for the face angle, βi, is adaptive to the face angle variation and calculated by

βi = min(|θi - θi-1| / T, 1)

Where T is the threshold of noise: smaller variations between face angles are treated as noise to smooth out, while bigger variations are treated as actual head rotation to respond to.

Expression Animation Synthesis

This stage synthesizes the complete avatar expression from the AU weights and their corresponding AU animation assets. If the avatar facial animation is based on a blend-shape rig, the mesh of the final facial expression Bfinal is generated by the conventional blend-shape formula:

Bfinal = B0 + Σi wi · (Bi - B0)

Where B0 is the face mesh of a neutral expression, Bi is the face mesh with the maximum pose of the i-th AU, and wi is the weight of the i-th AU.

If the avatar facial animation is based on a skeleton rig, the bone matrices of the final facial expression Sfinal are obtained by the following formula:

Sfinal = S0 + Σi (Ai(wi) - S0)

Where S0 is the bone matrices of a neutral expression, and Ai(wi) is the bone matrices of the i-th AU extracted from that AU’s key-frame animation sequence Ai at this AU’s weight wi.

The sample application demonstrates the implementation of facial expression synthesis for a skeleton-rigged avatar.

Performance and Multithreading

Real-time facial tracking and animation is a CPU-intensive function. Integrating the function into the main loop of the application may significantly degrade application performance. To solve the issue, we wrap the function in a dedicated work thread. The main thread retrieves the new data from the work thread just when the data are updated. Otherwise, the main thread uses the old data to animate and render the avatar. This asynchronous integration mode minimizes the performance impact of the function to the primary tasks of the application.
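The asynchronous handoff described above can be sketched as a small double-buffered channel: the tracking work thread publishes new AU weights under a lock, and the main thread copies them out only when they have been updated, otherwise reusing the data it already has. This is an illustrative pattern, not code from the sample:

```cpp
#include <mutex>

// Shared channel between the tracking work thread and the render thread.
struct TrackingChannel {
    std::mutex m;
    float weights[16] = {};   // latest AU weights from the worker
    bool updated = false;

    // Called by the work thread whenever a new frame has been processed.
    void publish(const float* w, int n) {
        std::lock_guard<std::mutex> lock(m);
        for (int i = 0; i < n; ++i) weights[i] = w[i];
        updated = true;
    }

    // Called by the main thread each frame; returns false when there is
    // no new data, in which case the caller animates with its old copy.
    bool fetch(float* out, int n) {
        std::lock_guard<std::mutex> lock(m);
        if (!updated) return false;
        for (int i = 0; i < n; ++i) out[i] = weights[i];
        updated = false;
        return true;
    }
};
```

The lock is held only for the copy, so the render loop never blocks on the (much slower) face-tracking computation itself.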

Running the Sample

When the sample application launches (Figure 1), by default it first calibrates the user’s neutral expression and then maps the user’s performed expressions to the avatar face in real time. Pressing the “R” key resets the system, either when the user wants to start over or when a new user takes over control of the avatar expression; this activates a new session including calibration and retargeting.

During the calibration phase—in the first few seconds after the application launches or is reset—the user is advised to hold his or her face in a neutral expression and position his or her head so that it faces the Intel RealSense camera in the frontal-parallel view. The calibration completes when the status bar of face-tracking confidence (in the lower-left corner of the Application window) becomes active.

After calibration, the user is free to move his or her head and perform any expression to animate the avatar face. During this phase, it’s best for the user to keep an eye on the detected Intel RealSense camera landmarks, and make sure they are green and appear in the video overlay.

Summary

Face tracking is an interesting function supported by Intel® RealSense™ technology. In this article, we introduce a reference implementation of user-controlled avatar facial animation based on the Intel® RealSense™ SDK, along with a sample written in C++ that uses DirectX*. The reference implementation covers how to prepare animation assets, parameterize user facial expressions, and synthesize avatar expression animation. Our practice shows that the algorithms of the reference implementation are essential to reproducing plausible facial animation, but high-quality facial animation assets and appropriate user guidance are just as important for a good user experience in a real application environment.

Reference

1. https://en.wikipedia.org/wiki/Facial_Action_Coding_System

2. https://www.visagetechnologies.com/uploads/2012/08/MPEG-4FBAOverview.pdf

3. https://software.intel.com/en-us/intel-realsense-sdk/download

About the Author

Sheng Guo is a senior application engineer in the Intel Developer Relations Division. He has been working with top gaming ISVs on Intel client platform technologies and performance/power optimization. He has 10 years of expertise in 3D graphics rendering, game engines, and computer vision, and has published several papers at academic conferences as well as technical articles and samples on industry websites. He holds a bachelor’s degree in computer software from Nanjing University of Science and Technology and a master’s degree in computer science from Nanjing University.

Wang Kai is a senior application engineer in the Intel Developer Relations Division. He has been in the game industry for many years and has professional expertise in graphics, game engine, and tools development. He holds a bachelor’s degree from Dalian University of Technology.


NetUP Uses Intel® Media SDK to Help Bring the Rio Olympic Games to a Worldwide Audience of Millions


In August of 2016, half a million fans came to Rio de Janeiro to witness 17 days and nights of the Summer Olympics. At the same time, millions more people all over the world were enjoying the competition live in front of their TV screens.

Arranging a live TV broadcast to another continent is a daunting task that demands reliable equipment and agile technical support. That was the challenge for Thomson Reuters, the world’s largest multimedia news agency.

To help it meet the challenge, Thomson Reuters chose NetUP as its technical partner, using NetUP equipment for delivering live broadcasts from Rio de Janeiro to its New York and London offices. In developing the NetUP Transcoder, NetUP worked with Intel, using Intel® Media SDK, a cross-platform API for developing media applications on Windows*.

“This project was very important for us,” explained Abylay Ospan, founder of NetUP. “It demonstrates the quality and reliability of our solutions, which can be used for broadcasting global events such as the Olympics. Intel Media SDK gave us the fast transcoding we needed to help deliver the Olympics to a worldwide audience.”

Get the whole story in our new case study.

Intel® Software Guard Extensions Tutorial Series: Part 7, Refining the Enclave


Part 7 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series revisits the enclave interface and adds a small refinement to make it simpler and more efficient. We’ll discuss how the proxy functions marshal data between unprotected memory space and the enclave, and we’ll also discuss one of the advanced features of the Enclave Definition Language (EDL) syntax.

You can find a list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

Source code is provided with this installment of the series. With this release we have migrated the application to the 1.7 release of the Intel SGX SDK and also moved our development environment to Microsoft Visual Studio* Professional 2015.

The Proxy Functions

When building an enclave using the Intel SGX SDK you define the interface to the enclave in the EDL. The EDL specifies which functions are ECALLs (“enclave calls,” the functions that enter the enclave) and which ones are OCALLs (“outside calls,” the calls to untrusted functions from within the enclave).

When the project is built, the Edger8r tool that is included with the Intel SGX SDK parses the EDL file and generates a series of proxy functions. These proxy functions are essentially wrappers around the real functions that are prototyped in the EDL. Each ECALL and OCALL gets a pair of proxy functions: a trusted half and an untrusted half. The trusted functions go into EnclaveProject_t.h and EnclaveProject_t.c and are included in the Autogenerated Files folder of your enclave project. The untrusted proxies go into EnclaveProject_u.h and EnclaveProject_u.c and are placed in the Autogenerated Files folder of the project that will be interfacing with your enclave.

Your program does not call the ECALL and OCALL functions directly; it calls the proxy functions. When you make an ECALL, you call the untrusted proxy function for the ECALL, which in turn calls the trusted proxy function inside the enclave. That proxy then calls the “real” ECALL and the return value propagates back to the untrusted function. This sequence is shown in Figure 1. When you make an OCALL, the sequence is reversed: you call the trusted proxy function for the OCALL, which calls an untrusted proxy function outside the enclave that, in turn, invokes the “real” OCALL.


Figure 1. Proxy functions for an ECALL.

The proxy functions are responsible for:

  • Marshaling data into and out of the enclave
  • Placing the return value of the real ECALL or OCALL in an address referenced by a pointer parameter
  • Returning the success or failure of the ECALL or OCALL itself as an sgx_status_t value

Note that this means that each ECALL or OCALL potentially has two return values. There’s the success of the ECALL or OCALL itself (that is, whether we were able to successfully enter or exit the enclave), and then the return value of the function being called in the ECALL or OCALL.

The EDL syntax for the ECALL functions ve_lock() and ve_unlock() in our Tutorial Password Manager’s enclave is shown below:

enclave {
   trusted {
      public void ve_lock ();
      public int ve_unlock ([in, string] char *password);
    }
}

And here are the untrusted proxy function prototypes that are generated by the Edger8r tool:

sgx_status_t ve_lock(sgx_enclave_id_t eid);
sgx_status_t ve_unlock(sgx_enclave_id_t eid, int* retval, char* password);

Note the additional arguments that have been added to the parameter list for each function and that the functions now return a type of sgx_status_t.

Both proxy functions need the enclave identifier, which is passed in the first parameter, eid. The ve_lock() function has no parameters and does not return a value so no further changes are necessary. The ve_unlock() function, however, does both. The second argument to the proxy function is a pointer to an address that will store the return value from the real ve_unlock() function in the enclave, in this case a return value of type int. The actual function parameter, char *password, is included after that.
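To make the two-return-value pattern concrete, here is a minimal sketch of how a caller checks both values. The sgx_status_t type and the proxy below are mocks so the snippet stands alone; a real project would include the SDK headers and call the Edger8r-generated proxy instead:

```cpp
// Mock stand-ins for the SDK types (illustration only).
typedef int sgx_status_t;
typedef unsigned long long sgx_enclave_id_t;
const sgx_status_t SGX_SUCCESS = 0;

// Mock untrusted proxy: reports ECALL success and writes the real
// ve_unlock() result (1 = vault unlocked) through retval.
sgx_status_t ve_unlock(sgx_enclave_id_t eid, int* retval, char* password) {
    (void)eid;
    *retval = (password != nullptr && password[0] != '\0') ? 1 : 0;
    return SGX_SUCCESS;
}

// Caller must check both: did the ECALL transition succeed, and what
// did the function inside the enclave actually return?
int check_unlock(sgx_enclave_id_t eid, char* password) {
    int rv = 0;
    sgx_status_t status = ve_unlock(eid, &rv, password);
    if (status != SGX_SUCCESS) return -1; // could not even enter the enclave
    return rv;                            // result of the real ECALL
}
```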

Data Marshaling

The untrusted portion of an application does not have access to enclave memory. It cannot read from or write to these protected memory pages. This presents some difficulties when the function parameters include pointers. OCALLs are especially problematic, because memory allocated inside the enclave is not accessible to the OCALL, but even ECALLs can have issues. Enclave memory is mapped into the application’s memory space, so enclave pages can be adjacent to unprotected memory pages. If you pass a pointer to untrusted memory into an enclave, and then fail to do appropriate bounds checking in your enclave, you may inadvertently cross the enclave boundary when reading or writing to that memory in your ECALL.

The Intel SGX SDK’s solution to this problem is to copy the contents of data buffers into and out of enclaves, and have the ECALLs and OCALLs operate on these copies of the original memory buffer. When you pass a pointer into an enclave, you specify in the EDL whether the buffer referenced by the pointer is being pass into the call, out of the call, or in both directions, and then you specify the size of the buffer. The proxy functions generated by the Edger8r tool use this information to check that the address range does not cross the enclave boundary, copy the data into or out of the enclave as indicated, and then substitute a pointer to the copy of the buffer in place of the original pointer.

This is the slow-and-safe approach to marshaling data and pointers between unprotected memory and enclave memory. However, this approach has drawbacks that may make it undesirable in some cases:

  • It’s slow, since each memory buffer is checked and copied.
  • It requires additional heap space in your enclave to store the copies of the data buffers.
  • The EDL syntax is a little verbose.

There are also cases where you just need to pass a raw pointer into an ECALL and out to an OCALL without it ever being used inside the enclave, such as when passing a function pointer for a callback function straight through to an OCALL. In this case, there is no data buffer per se, just the pointer address itself, and the marshaling functions generated by Edger8r actually get in the way.

The Solution: user_check

Fortunately, the EDL language does support passing a raw pointer address into an ECALL or an OCALL, skipping both the boundary checks and the data buffer copy. The user_check parameter tells the Edger8r tool to pass a pointer as it is and assume that the developer has done the proper bounds checking on the address. When you specify user_check you are essentially trading safety for performance.

A pointer marked with user_check does not have a direction (in or out) associated with it, because there is no buffer copy taking place. Mixing user_check with in or out will result in an error at compile time. Similarly, you don’t supply a count or size parameter, either.

In the Tutorial Password Manager, the most appropriate place to use the user_check parameter is in the ECALLs that load and store the encrypted password vault. While our design constraints put a practical limit on the size of the vault itself, generally speaking these sorts of bulk reads and writes benefit from allowing the enclave to directly operate on untrusted memory.

The original EDL for ve_load_vault() and ve_get_vault() looks like this:

public int ve_load_vault ([in, count=len] unsigned char *edata, uint32_t len);

public int ve_get_vault ([out, count=len] unsigned char *edata, uint32_t len);

Rewriting these to specify user_check results in the following:

public int ve_load_vault ([user_check] unsigned char *edata);

public int ve_get_vault ([user_check] unsigned char *edata, uint32_t len);

Notice that we were able to drop the len parameter from ve_load_vault(). As you might recall from Part 4, the issue we had with this function was that although the length of the vault is stored as a variable in the enclave, the proxy functions don’t have access to it. In order for the ECALL’s proxy functions to copy the incoming data buffer, we had to supply the length in the EDL so that the Edger8r tool would know the size of the buffer. With the user_check option, there is no buffer copy operation, so this problem goes away. The enclave can read directly from untrusted memory, and it can use its internal variable to determine how many bytes to read.

However, we still send the length as a parameter to ve_get_vault(). This is a safety check to ensure that we don’t accidentally overflow a buffer when fetching the encrypted vault from the enclave.
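
That safety check can be sketched as follows, using hypothetical enclave-side state (the real function bodies belong to the Tutorial Password Manager and are not reproduced here): ve_get_vault() writes directly to the untrusted buffer, but refuses to do so when the caller-supplied length is smaller than the vault.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical enclave-side state: the encrypted vault and its length live
// inside the enclave, which is why ve_load_vault() needs no len parameter.
static unsigned char e_vault[32] = "encrypted-vault-bytes";
static uint32_t e_vault_len = 21;

// Sketch of ve_get_vault() with a [user_check] pointer: the enclave writes
// directly to untrusted memory, so it first checks len to avoid overflowing
// the caller's buffer.
int ve_get_vault(unsigned char *edata, uint32_t len) {
    if (edata == nullptr || len < e_vault_len) return -1; // would overflow
    std::memcpy(edata, e_vault, e_vault_len);
    return 0;
}
```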

Summary

The EDL provides three options for passing pointers into an ECALL or an OCALL: in, out, and user_check. These options are summarized in Table 1.

Specifier/Direction | ECALL | OCALL
in | The buffer is copied from the application into the enclave. Changes will only affect the buffer inside the enclave. | The buffer is copied from the enclave to the application. Changes will only affect the buffer outside the enclave.
out | A buffer will be allocated inside the enclave and initialized with zeros. It will be copied to the original buffer when the ECALL exits. | A buffer will be allocated outside the enclave and initialized with zeros. This untrusted buffer will be copied to the original buffer in the enclave when the OCALL exits.
in, out | Data is copied back and forth. | Data is copied back and forth.
user_check | The pointer is not checked. The raw address is passed. | The pointer is not checked. The raw address is passed.

Table 1. Pointer specifiers and their meanings in ECALLs and OCALLs.

If you use the direction indicators, the data buffer referenced by your pointer gets copied and you must supply a count so that the Edger8r can determine how many bytes are in the buffer. If you specify user_check, the raw pointer is passed to the ECALL or OCALL unaltered.

Sample Code

The code sample for this part of the series has been updated to build against the Intel SGX SDK version 1.7 using Microsoft Visual Studio 2015. It should still work with the Intel SGX SDK version 1.6 and Visual Studio 2013, but we encourage you to update to the newer release of the Intel SGX SDK.

Coming Up Next

In Part 8 of the series, we’ll add support for power events. Stay tuned!

Building an Arcade Cabinet with Skull Canyon


Hi I’m Bela Messex, one half of Buddy System, a bedroom studio based in Los Angeles, and makers of the game Little Bug.

Why an Arcade Cabinet?

My co-developer and I come from worlds where DIY wasn’t a marketable aesthetic, but a natural and necessary creative path. Before we met and found ourselves in video game design we made interactive sculpture, zines, and comics. We’ve been interested in ways to blend digital games with physical interaction, and while this can take many forms, a straightforward route was to house our debut game, Little Bug, in a custom arcade cabinet. As it turns out doing so was painless, fun, and easy; and at events like Fantastic Arcade and Indiecade, it provided a unique interaction that really drew attendees.

The Plan

To start off, I rendered a design in Unity complete with Image Effects, Animations and completely unrealistic lighting… If only real life were like video games, but at least I now had a direction.

The Components

This worked for us and could be a good starting point for you, but you might want to tailor it a bit to your game’s unique needs.

  • Intel NUC Skull Canyon.
  • 2 arcade joysticks.
  • 3 arcade buttons.
  • 2 generic PC joystick boards with wires included.
  • 4’ x 8’ MDF panel.
  • 24” monitor.
  • 8” LED accent light.
  • Power Strip.
  • Power Drill.
  • Nail gun and wood glue.
  • Screws of varying sizes and springs.
  • 6” piano hinge.
  • Velcro strips.
  • Zip ties.
  • Black spray paint and multicolored paint markers.
  • Semi opaque plexi.

Building the Cabinet

When I was making sculptures, I mainly welded, so I asked my friend Paul for some help measuring and cutting the MDF panels. We did this by designing our shapes on the spot with a jigsaw, pencil, and basic drafting tools. Here is Paul in his warehouse studio with the soon to be cabinet.

We attached the cut pieces with glue and a nail gun, but you could use screws if you need a little more strength. Notice the hinge in the front - this was Paul’s idea and ended up being a lifesaver later on when I needed to install buttons and joysticks. Next to the paint can is a foot pedal we made specifically for Little Bug’s unique controls: two joysticks and a button used simultaneously. On a gamepad this dual-stick setup is no problem, but with two full-sized arcade joysticks both hands are occupied, so how do you press that button? Solution: use your foot!

After painting the completed frame, it was time for the fun part - installing the electronics. I used a cheap ($15) kit that includes six buttons, a joystick, a USB controller board, and all the wiring. After hundreds of plays, it’s all still working great. Notice the LED above the screen to light up the marquee for a classic arcade feel.

Once the NUC was installed in the back via velcro strips, I synced the buttons and joysticks inside the Unity inspector and created a new build specifically designed for the cabinet. Little Bug features hand drawn sprites, so we drew on all of the exterior designs with paint markers to keep that look coherent. The Marquee was made by stenciling painter’s tape with spray paint.

The Joy of Arcade

There is really nothing like watching players interact with a game you’ve made. Even though Little Bug itself is the same, the interaction is now fundamentally different, and as game designers it has been mesmerizing to watch people play it in this new way. The compact size and performance of the NUC was perfect for creating experiences like this, and it’s worked so well I’m already drawing up plans for more games in the same vein.

SOME METHODOLOGIES TO OPTIMIZE YOUR VR APPLICATIONS POWER ON INTEL® PLATFORM


As VR becomes a popular consumer product, more and more VR content comes out. Recent research shows that many users love VR devices without wires, like all-in-one (AIO) or mobile devices. These devices do not charge while the user is playing, so developers need to take special care with application power.

For details, please see the attachments.

CSharp Application with Intel Software Guard Extension


Although enclaves must be 100 percent native code and the enclave bridge functions must be 100 percent native code with C (and not C++) linkages, it is possible, indirectly, to make an ECALL into an enclave from .NET and to make an OCALL from an enclave into a .NET object.

Mixing Managed Code and Native Code with C++/CLI

Microsoft Visual Studio* 2005 and later offers three options for calling unmanaged code from managed code:

  • Platform Invocation Services, commonly referred to by developers as P/Invoke:
    • P/Invoke is good for calling simple C functions in a DLL, which makes it a reasonable choice for interfacing with enclaves, but writing P/Invoke wrappers and marshaling data can be difficult and error-prone.
  • COM:
    • COM is more flexible than P/Invoke, but it is also more complicated; that additional complexity is unnecessary for interfacing with the C bridge functions required by enclaves.
  • C++/CLI:
    • C++/CLI offers significant convenience by allowing the developer to mix managed and unmanaged code in the same module, creating a mixed-mode assembly which can in turn be linked to modules comprised entirely of either managed or native code.
    • Data marshaling in C++/CLI is also fairly easy: for simple data types it is done automatically through direct assignment, and helper methods are provided for more complex types such as arrays and strings.
    • Data marshaling is, in fact, so painless in C++/CLI that developers often refer to the programming model as IJW (an acronym for “it just works”).
    • The trade-off for this convenience is that there can be a small performance penalty due to the extra layer of functions, and it does require that you produce an additional DLL when interfacing with Intel SGX enclaves.
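
Whichever option you pick, the entry points it calls must look like plain C. Below is an illustrative bridge-function shape with C linkage; the function name, parameters, and return-code convention are assumptions for illustration, not SDK code. A real bridge would call the Edger8r-generated proxy, which issues the actual ECALL.

```cpp
// Bridge functions must have C linkage so a managed wrapper (P/Invoke or
// C++/CLI) can find them by their unmangled names in the native DLL.
// Hypothetical example: unlock the vault with a password, returning the
// enclave function's result through an out-parameter.
extern "C" int esgx_vault_unlock(const char *password, int *vault_rv) {
    if (password == nullptr || vault_rv == nullptr) return -1; // bad arguments
    // Stand-in for the real ECALL: report success for any non-empty password.
    *vault_rv = (password[0] != '\0') ? 0 : 1;
    return 0; // bridge-level status: the call itself succeeded
}
```

Separating the bridge status from the enclave function’s own return value mirrors how sgx_ecall() reports SGX errors separately from the ECALL’s result.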

Please find the detailed information in the attached PDF; sample code is also included.

Overview of Intel Protected File System Library Using Software Guard Extensions


Intel® Protected File System Library

  • The Intel Protected File System Library is a new feature introduced in the Intel SGX 1.7 release. This library is used to create, operate on, and delete files inside the enclave. To make use of the Intel Protected File System Library we need the following:
    • Visual Studio 2015
    • Intel SGX SDK version 1.7
    • Intel SGX PSW version 1.7
  • These requirements are essential for implementing the Intel SGX Protected File System. In this document we will discuss the architecture, APIs, implementation, and limitations of the Intel Protected File System Library.

Overview of Intel® Protected File System Library:

  • The Intel® Protected File System Library provides a protected file API for Intel® SGX enclaves. It supports a basic subset of the regular C file API and enables you to create files and work with them as you would normally do from a regular application.
  • Intel SGX provides 15 file-operation APIs. These APIs work almost the same as the regular C file API.
  • With this API, the files are encrypted and saved on the untrusted disk during a write operation, and they are verified for confidentiality and integrity during a read operation.
  • To encrypt a file, you should provide a file encryption key. This 128-bit key is used as a key derivation key to generate multiple encryption keys.
  • The key derivation key can be generated by an approved cryptographic random bit generator or by an approved automated key establishment process. Another option is to use automatic keys derived from the enclave sealing key.
  • Because the files are encrypted before being written to the untrusted disk, their contents remain secure and safe.

Intel Protected File System API:

The Intel Protected File System Library provides the following functionalities.

  • sgx_fopen
  • sgx_fopen_auto_key
  • sgx_fclose
  • sgx_fread
  • sgx_fwrite
  • sgx_fflush
  • sgx_ftell
  • sgx_fseek
  • sgx_feof
  • sgx_ferror
  • sgx_clearerr
  • sgx_remove
  • sgx_fexport_auto_key
  • sgx_fimport_auto_key
  • sgx_fclear_cache

The above-mentioned APIs are part of the SGX Protected FS trusted library, and they can be called only from trusted enclave code, which keeps our files secure.
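
Because the API mirrors C stdio, a protected-file round trip has the same shape as a regular one. The sketch below uses plain stdio so it can run anywhere; the comments show the corresponding sgx_ calls from the list above, which would transparently encrypt the data on disk when used inside an enclave.

```cpp
#include <cstdio>
#include <cstring>

// Write a message to a file and read it back. The call sequence maps
// one-to-one onto the protected-file API: fopen -> sgx_fopen_auto_key,
// fwrite -> sgx_fwrite, fread -> sgx_fread, fclose -> sgx_fclose.
bool roundtrip(const char *path, const char *msg, char *out, size_t out_size) {
    std::FILE *f = std::fopen(path, "wb");     // sgx_fopen_auto_key(path, "wb")
    if (f == nullptr) return false;
    std::fwrite(msg, 1, std::strlen(msg), f);  // sgx_fwrite(msg, 1, len, f)
    std::fclose(f);                            // sgx_fclose(f)

    f = std::fopen(path, "rb");                // sgx_fopen_auto_key(path, "rb")
    if (f == nullptr) return false;
    size_t got = std::fread(out, 1, out_size - 1, f); // sgx_fread
    out[got] = '\0';
    std::fclose(f);                            // sgx_fclose(f)
    return true;
}
```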

Limitation of Protected File System

  1. Protected files have metadata embedded in them; only one file handle can be opened for writing at a time, though many file handles can be open for reading.
  2. An operating system protection mechanism is used to protect against accidentally opening more than one ‘write’ file handle. If this protection is bypassed, the file will get corrupted.
  3. An open file handle can be used by many threads inside the same enclave; the APIs include internal locks to handle this, and the operations will be executed one at a time.

Please find the detailed information in the attached PDF; sample code for the Intel Protected File System Library using SGX is also included.

How to install Windows 10 IoT Core on Intel Joule


During the last Intel® IDF in San Francisco, the Intel Joule board was presented supporting 3 different OS BSPs: Ostro, Ubuntu, and Windows 10 IoT Core.

For the first two operating systems the images were published at IDF time. For Windows 10 IoT Core the public image and support were published in mid-October.

As with all the other Windows 10 IoT Core images for supported boards, the distribution is located at a unique site:

www.WindowsOnDevices.com

I'll try to graphically describe the step-by-step procedure that Microsoft publishes to prepare and flash the board.

I’m writing this because I’ve heard concern from some people about following and understanding all the steps involved in this procedure; for this reason I’m creating a fully illustrated guide to the installation procedure.

The previous link shows the following page:

Capture001

Clicking on “Get Started” takes you to this page:

Capture002

Select the Joule icon; you’ll see Step 2. Click Next.

Capture003

You’ll now reach the page with the full list of operations to perform specifically for the Joule, which is what I’m explaining here. (This is the page's direct link.)

 

Capture004

The steps will not necessarily be very fast to follow; in some cases we will have to install something before approaching the next step.

 

 

You’ll need a PC with Windows 10 (version 1607 or later) installed.

Open a CMD shell from the Start button and type the command winver, as in the picture:

Capture005

On this PC you have to install the Windows Assessment and Deployment Kit.

In my case I have to install the 1607 version, as the previous winver command told me.

Capture006

When the setup of the WADK is done we can continue with the next steps; first of all, restart the PC.

Note that the previous installation creates some new folders in your Windows installation under “Program Files”, specifically (in the case of a standard installation) “Program Files (x86)\Windows Kits\10\”.

Following the instructions in this article, create a bootable WinPE USB key using the amd64 version (the Joule installs an x64 version of Windows 10).

Then, from the Intel Download Center, get the zip file with a couple of important scripts we’ll need later.

Capture007

Save the JouleInstaller.zip file locally on your PC and unpack it. Then copy the JouleInstaller.cmd file to the root of the WinPE USB key.

Now it’s time to download the image of Windows 10 IoT Core for our Joule from this link. Obviously, the current version could have a release number greater than the one in the figure (Microsoft releases the Insider Preview images for IoT Core as well).

Once you have selected the version, click “Confirm”; in the next menu select that you want to download the Joule distribution and click “Confirm” again.

Capture009

The page will generate a temporary download link for your selection. Click on “Download Now” and save the ISO file locally on your PC.

Capture010

When the download is done, double click on the ISO file in order to mount and open it.

Capture011

When the virtual DVD unit opens, double click on the “Windows 10 IoT Core for Broxton” file. When the setup is done, under the “\Program Files (x86)\Microsoft IoT\FFU\Broxton” folder you’ll find the Flash.FFU file, which is the actual OS image.

Capture012

It’s now time to copy this file to the WinPE USB key; we’re almost ready to get our Joule board and start working on it.

Get your Intel Joule board and the following:

  1. A 12V, 3A power adapter (the power adapter must be Listed LPS or Class 2 output rated, 12Vdc, 3A minimum) with barrel jack for powering the expansion board. The barrel dimensions are 5.5mm outer diameter and 2.1mm inner diameter. The barrel length is ~9-10mm. The plug is positive center.
  2. A micro HDMI cable and monitor.
  3. A POWERED USB 3.0 hub. The development platform has only one Type A USB port. The Type B and C ports should not be used unless directed.
  4. A USB keyboard and mouse.

All these are needed to boot the board, enter the BIOS, change some parameters, and then boot into the Windows CLI interface in order to flash the OS image onto the board.

 

First of all, we have to check which firmware version is installed on our board and eventually update it from the Intel Software page.

Considering the number of boards distributed so far, we can reasonably assume we have an updated version of the firmware and skip to the next step.

 

Connect the USB hub with keyboard and mouse, insert the HDMI cable, plug in the power adapter, and start the board.

When the board boots up, press the F2 button on the keyboard to enter the BIOS (if you miss it, restart the board and retry).

You will probably see a screen like the following:

WP_20161012_22_57_17_Rich

Using the cursor keys scroll to “Device Manager” and press Enter,

WP_20161012_22_58_00_Rich

From the following menu choose System Setup and press Enter.

WP_20161012_22_58_16_Rich

In the next screen select Boot

WP_20161012_22_58_33_Rich

On the OS Selection option press Enter, choose the “Windows” option, and confirm with Enter.

WP_20161012_23_04_56_Rich

Press F4 and you’ll be asked to save the changes and reboot the board.

WP_20161012_23_05_06_Rich

Remember to put the WinPE USB key in the USB hub before rebooting the board (the system will “see” the boot key at the next start).

At the following boot, press F2 again to enter the BIOS, then select, in sequence, “Boot Manager” and, in the next screen, EFI USB Device. Press Enter and the board will reboot.

Let the board boot from the USB Key.

The boot will stop at a command prompt window. There you have to type “C:” to change to the drive where the USB key is mapped (usually C: or D:).

From there type the command “JouleInstaller.cmd” and press Enter. This script flashes the OS onto the internal eMMC (the operation shows a percentage advancing until it completes).

WP_20161012_23_35_58_Rich

At the end of the flashing operation the prompt will return to the original drive.

Type “WPEUTIL reboot” to restart the board.

The next boot sequence provides the first-time configuration of the system.

WP_20161013_00_17_29_Rich

WP_20161013_00_18_01_Rich

WP_20161013_00_18_02_Rich

WP_20161013_00_18_18_Rich

OK, now you are running Windows 10 IoT Core on your Intel Joule!

 


Getting Started with Graphics Performance Analyzer - Platform Analyzer


Prerequisite

We recommend first getting familiar with how to use Graphics Performance Analyzer - System Analyzer. System Analyzer is the central interface to record the trace logs for Platform Analyzer. You can refer to this link to get started with System Analyzer. Graphics Performance Analyzer is part of Intel System Studio.

Introduction

Platform Analyzer provides offline performance analysis capability† by reviewing task timelines across the real hardware engines, the GPU scheduler’s software queue, and background thread activities. With this utility, developers can quickly review which GPU engine resources (Render, Video Enhance - Video Processing, or Video Codec) are involved in the target test application. For example, the trace result can reveal whether Gen graphics HW codec acceleration is used or not. Platform Analyzer can also provide a rough clue about performance issues from reviewing task timelines.

Take a quick look at Platform Analyzer below. The top window shows when graphics-related tasks occur on the hardware engine, GPU software queue, and thread timelines. The bottom window contains the overall time cost for each task type and the HW engine utilization.

†Platform Analyzer is part of GPA (Graphics Performance Analyzer), which focuses more on graphics application optimization, mainly Windows DirectX and Android/Linux OpenGL apps. Check the GPA product page to get a quick overview of GPA.

A quick start to capture the trace logs

Follow the normal steps to start analyzing the target application. On Windows, you can use a hotkey combination to capture the Platform Analyzer trace logs directly.

  1. Right click the “GPA monitors” system icon and choose “Analyze Application” to launch the target application.
  2. Press Shift + Ctrl + T to capture the trace logs
  3. Once you see an overlay indicating the capture completed, as in the figure below, the trace logs have been successfully captured.

 

Find the pattern: what symptom will the performance issues show in Platform Analyzer?

First of all, you might want to understand a little bit about VSYNC and the Present function call in Platform Analyzer. VSYNC is a hardware signal; right after it, the system outputs one frame to the display device. The interval between two VSYNC signals implies the refresh rate of the display device. As for the Present API, it indicates the operation that copies/moves one frame to another memory destination.

With the background knowledge above, you can watch for and investigate these symptoms in Platform Analyzer.

  • Irregular VSYNC pattern. (VSYNC intervals should be consistent; otherwise a screen blinking or flashing symptom may appear during the test.)
  • Long delay (big packet) in a GPU engine. (This packet could be the Present call; if the packet size/length in the timeline crosses multiple VSYNCs, it can cause displayed frames to get stuck.)
  • Overloading - multiple heavy stacked packets in the software queue, as in the figure below. (Stacked packets mean several tasks are scheduled at the same time; packets may be dropped or delayed if the GPU cannot handle several tasks in a timely manner. This also causes displayed frames to get stuck.)
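
The first symptom can be checked mechanically. Given a sequence of VSYNC timestamps read off the timeline (the numbers in the test below are hypothetical), a trace is suspicious if any interval deviates from the display's refresh period by more than some tolerance:

```cpp
#include <cmath>
#include <vector>

// Flags a VSYNC timestamp sequence (in milliseconds) as irregular if any
// interval deviates from the expected refresh period by more than tol_ms.
// For a 60 Hz display the expected period is roughly 16.7 ms.
bool vsync_irregular(const std::vector<double> &ts_ms, double period_ms,
                     double tol_ms) {
    for (size_t i = 1; i < ts_ms.size(); ++i) {
        double interval = ts_ms[i] - ts_ms[i - 1];
        if (std::fabs(interval - period_ms) > tol_ms) return true;
    }
    return false;
}
```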

Further information

Intel System Studio includes three components of GPA: System Analyzer, Platform Analyzer, and Frame Analyzer. System Analyzer provides an interface to record the logs that Platform Analyzer and Frame Analyzer need for offline analysis.

GPA Analyzer Name | Functionalities
System Analyzer | Provides real-time system and app analysis. Central interface to record logs for the other GPA analyzers. More information.
Platform Analyzer | Provides analysis of interactions between GPU engines and thread activities. Presents the captured log by showing all graphics hardware engine workloads (including decoding, video processing, rendering, and computing) and thread activities in a timeline view. More information.
Frame Analyzer | Provides offline single-frame rendering analysis. Reconstructs the frame by replaying the DirectX/OpenGL APIs logged by System Analyzer. More information.

 

See also

Intel® Graphics Performance Analyzers (Intel® GPA) Documentation

Register and Download Intel System Studio Windows Professional Edition.

Intel® Unite™ Solution: Introduction and Using Plugins


Contents

Introduction

When you start a meeting in a conference room, do you ever have the following problems?

  • You cannot find a cable to connect to your laptop and the display equipment.
  • Your laptop does not have an available interface.
  • The current display cable isn’t compatible with your computer.
  • Only one person at a time can present.

The Intel® Unite™ solution is an easy-to-use collaboration solution that can help you solve these problems in a traditional meeting room. Using the Intel Unite solution, you can quickly connect to the display in a conference room through a wireless connection. No cables, connectors, or adapters are needed. Both an in-room connection and a remote connection are supported. Multiple attendees can present the screen at the same time and easily annotate and transfer files.

Features

The Intel Unite solution has the following features:

  • Fast deployment and fast connection
  • No cables needed
  • Ability to share displays
    • In-room and remote viewing
    • Application sharing
    • Extended display
    • No cable needed for the client side
  • Peer-to-peer sharing
    • An Intel vPro technology-based client can share content with other clients; no hub is needed
  • Annotation capabilities for the presenter and other attendees
  • Split-screen view
    • Up to four users can share content per room display
  • File transfer
  • Support for plugin extensions

Deployment Options and Components

There are two deployment options for the Intel Unite solution.

Enterprise Movement

Option A:Enterprise environment (applicable to enterprise environments that have multiple working and network environments)

Components

Server. The enterprise web server that runs the PIN service and provides an admin portal for configuration, as well as download pages for the client application.

Hub. The specified Mini PC equipped with Intel vPro technology. This PC runs the Intel Unite solution hub application and connects to the display in the conference room. Any client can connect to the hub PC, then start their presentations or view content from the hub display.

Client. The user’s device that will connect to the hub. The client can be a Windows* or Mac* device. For client devices, Intel vPro technology is not mandatory. Mobile devices are also supported, and an Android* client application will be provided in the future. The end user controls the hub through the client application.

Plugin. The software components installed on the hub (for example, the Skype for Business plugin or the Guest Access plugin). Plugins enrich the functionality and extend the user experience of the Intel Unite solution.

Option B:Standalone mode (applicable to a single business network environment)

Components

Hub. The specified Mini PC equipped with Intel vPro technology. This PC runs an Intel Unite solution hub application that displays a PIN and hosting plugins. The hub is connected to a display in the conference room.

Client. The user’s device that will connect to the hub; it could be Windows or Mac. Intel vPro technology is not mandatory for the client device. A mobile device is also supported. An Android client application will be provided in the future. The end user controls the hub through the client application.

Plugin. The software components installed on the hub (for example, Skype for Business plugin, Guest Access plugin). The plugin enriches the functionality and extends the user experience of the Intel Unite solution.

Applications for the Intel® Unite™ Solution

A group of applications needs to be installed on each component:

Application That Needs to Be Installed | Component | Remarks
Intel® Unite™ application on the server | Server | Only required for the enterprise environment. Intel provides this application.
Intel® Unite™ application on the hub | Hub | The core component of the Intel Unite solution. Intel provides this application.
Intel® Unite™ client application | Client | Installed on the end-user device. Intel provides this application.
Intel® Unite™ plugin (for example, the Intel Unite plugin for Skype for Business); it can be an installation program or just a .dll file (named xxxPlugin.dll) | Plugin | The plugin is software and must be installed on the hub computer. When the client connects to the hub, the available plugins are displayed on the client side. Intel or a third party provides the plugin.

Hardware and software requirements

You’ll need to meet the following requirements before you start to use the Intel Unite solution.

Server

  • Microsoft Windows Server* 2008 or greater
  • Microsoft Internet Information Services* with SSL enabled - This will require a trusted web server certificate with an internal or public root of trust
  • Microsoft SQL Server* 2008 R2 or greater
  • Microsoft .NET* 4.5 or greater
  • 4 GB RAM
  • 32 GB available storage

Hub

To see the list of supported devices, go to: http://www.intel.com/buy/us/en/audience/unite

  • Microsoft Windows 7, 8, 8.1, or 10
  • Recommended latest patch level
  • Microsoft .NET 4.5 or greater
  • A Mini PC based on the 4th generation Intel® Core™ vPro™ processor or newer (Note: The hub must be an Intel vPro technology-based computer, but not all Intel vPro technology-based computers can be used as the hub. Only an authorized Mini PC can run the hub application.)
  • 4 GB RAM
  • 32 GB available storage

Client

  • Microsoft Windows 7, 8, 8.1, or 10
  • Microsoft .NET 4.5 or greater
  • OS X* 10.10.5 or greater
  • iOS* 9.3 or higher
  • Wired or wireless network connection
  • 1 GB RAM
  • 1 GB available storage

NOTE: A client application for Android will be available in 2017.

Below is the basic hardware combination of the Intel Unite solution.

  • Hub computer attached to a display, audio, microphone, or other devices.
  • Client device (laptop or mobile device)

Intel Unite Solution 

Where to get software components

The Intel® Unite™ client software can be downloaded for free here:

https://downloadcenter.intel.com/download/25280/Intel-Unite-app

For other installations (server, hub, plugins, and so on) and related documents, go to: http://msp.intel.com/unite

Running the Intel Unite solution on an authorized PC does not require a license fee. Trying to run the hub on an unauthorized PC will fail. For software download and installation instructions, please contact an Intel Unite solution integrator or technical support.

Getting started

This section discusses how to set up Intel Unite solution 3.0 in standalone mode and how to install and use the plugins. Some common issues will also be discussed. We will install two plugins: Skype for Business and Guest Access. To set up using enterprise mode, please refer to the Intel Unite Solution Enterprise Deployment Instructions, which can be downloaded from http://msp.intel.com/unite.

First, we need to prepare the installation package.

Intel Unite Plugin

You should install the hub application first, and then install the client application and plugins.

Install the Intel® Unite™ Application on hub

  1. Install the hub application on an authorized Intel vPro technology-based Mini PC. For a list of supported PCs, go to http://www.intel.com/buy/us/en/audience/unite.
  2. Select Standalone mode during the installation process.

    Standalone Option
  3. At the next several screens, click Next until the installation is finished. You will see two exe files on the desktop:

    Intel Unite exe files
  4. Launch IntelUnite.exe (by default it is located in C:\Program Files (x86)\Intel\Intel Unite\Hub).
    • Enter the shared key on this hub. The shared key identifies the hub; the client application must use the same shared key to communicate with it.

      Intel Unite Setup
  5. If your PC is not an authorized machine (even if it’s an Intel vPro technology-based computer), when you try to launch the application, you’ll get an error message like the one below:

    Intel Unite error message
  6. After setting up the shared key, you’ll see the following screen.

    Intel Unite configure screen
     

    Selecting Yes means that once you start this computer, it will log on automatically and start the Intel Unite application in full-screen mode, which makes it inconvenient to use other applications or windows. Selecting No means you must open the Intel Unite application manually. If you are just testing or studying the application, or need access to other applications, it is recommended that you choose No. If your computer is ready to be used as an Intel Unite appliance, choose Yes. Once you choose Yes, the following changes will be made to your system:

    1. A new non-administrative user will be created:
      User name: UniteUser
      Password: P@ssw0rd
    2. The computer will be set to automatically log in as UniteUser and start the Intel Unite application when the system boots.
    3. A Windows firewall exception will be added to allow the Intel Unite application.
    4. The power settings will be set to Always On
      Notice: This can be undone by using the Intel Unite Settings application or uninstalling the Intel® Unite™ software.
  7. After the configuration is finished, launch IntelUnite.exe; it will display in full-screen mode.

    Intel Unite Download Client
  8. Change the hub settings: Open “Intel Unite Setting.exe” and then make changes as prompted for each item.

    Intel Unite hub settings
  9. The hub setup is finished. Next, connect a large display (such as a monitor or TV) to the hub machine and connect the hub to your enterprise network.
  10. Press ALT+F4 to exit full-screen mode and close the application.

Install the Intel® Unite™ Client Application

  1. Install the Intel® Unite™ client application “Intel Unite Client.mui.msi” on any Windows PC (OS X and iOS are also supported; we only discuss the Windows client here).
  2. In standalone mode, the client device must be on the same subnet as the hub.
  3. When launching for the first time, you must enter the shared key that was configured on the hub.

    Intel® Unite™ shared key
  4. After entering the shared key, you’ll launch the main menu of the Intel Unite client.

    Intel® Unite™ Main Menu
  5. To join a session, enter the PIN displayed on the hub screen.

    Note: if your client fails to connect to the hub, please do the following:

    1. Make sure the client uses the same shared key as the hub.
    2. Verify the PIN.
    3. Make sure the hub and client are in the same subnet in standalone mode; otherwise the connection will fail.
    4. Check the Windows firewall policy; please add Intel Unite to your allowed app list. For detailed steps, please refer to “Intel_Unite_Firewall_Help_Guide.pdf”. This document can be downloaded from http://msp.intel.com/unite.
  6. You’re ready to start using the Intel Unite solution on your client.
    • Quick connection for new attendees.
    • Present content or view content from the hub display.
    • Annotation capabilities by both presenters and attendees.
    • Up to four users can present their content on the hub display.
    • File transfer capabilities.

Install the Skype* for Business Plugin

The Skype for Business plugin is installed and deployed on the hub side. It allows people on a Skype online meeting to join an Intel Unite solution session; it also allows attendees in the Intel Unite solution-enabled meeting room to join and control the Skype online meeting. The Skype online meeting controls are operated from the Intel Unite client application, but the meeting itself runs on the hub computer. The Intel Unite client does not need to install this plugin or the Skype for Business application.

Before installing the Skype for Business plugin, please make sure the software below is installed on the hub side:

  • Intel Unite application hub installed and configured
  • Microsoft Exchange 2010 or greater
  • Skype for Business installed and logged in

If you don’t have a Microsoft Exchange and Skype for Business account, it is recommended that you purchase Office 365* series software: https://products.office.com/en-US/business/compare-office-365-for-business-plans?legRedir=true&CorrelationId=a9d3fc01-6514-425e-91d9-3533b2a597f1

Office 365*

Double-click the installation package “Intel Unite Plugin for Skype for Business Installer.mui.msi”, and then configure the exchange server:

Exchange Server Configuration

The address in this screenshot is the Exchange web service address for Office 365, and the username and password are your Office 365 credentials.

If you are not using Office 365, please find out your exchange server information by doing the following:

  1. Launch Outlook. (Note: Outlook is not required to run on the hub; you can run this on any machine.)
  2. Press and hold the Ctrl key and right-click the Outlook system tray icon.
  3. You will see two new options in the context menu: Connection Status and Test E-mail Auto Configuration.
  4. Click Test E-mail Auto Configuration, and then run the test to check the email server configuration.
  5. In the Results tab note the OOF URL to use as the server URL for the plugin (for example, https://exchange.domain.com/EWS/Exchange.aspx).
  6. To check whether the exchange information is correct, click Test Connection. It must pass the test before it can go to the next step.
  7. Click Next until the installation is finished.
  8. Before using this plugin, please set Skype for Business to launch and sign in automatically. Otherwise, once the hub computer is restarted, Skype for Business will not launch and the plugin will not work.
  9. Begin to use Skype for Business plugin.
    The usage scenario is:
    • The Intel Unite Hub is deployed in an Intel meeting room with the Skype for Business plugin installed. The hub computer has an Outlook* email account, and a Skype for Business account is already signed in on this computer. Ten Intel employees will attend the meeting in this room; they have installed the Intel Unite client application but do not have to have a Skype for Business account.
    • User A is an Intel employee but is out of the Intel office and connects to the Intel network through VPN. User A has a Skype for Business account.
    • User B is an external user at a location outside of Intel; user B has a Skype for Business account.
    • User A, user B, and those in the Intel Unite application hub meeting room (10 in-room participants) will have a meeting online. As user B is an external user, he or she cannot connect to the Unite meeting room directly. In this case, the Skype for Business plugin can help the Intel Unite application hub join the Skype for Business online meeting and connect user B to the meeting, too.

    Join Meeting

    Schedule and join a meeting:

    • Anyone who has a Skype for Business account can schedule the meeting in Microsoft Outlook. But the hub must be invited, and then the plugin can accept or decline this meeting request. Please refer to the figure below before sending out the meeting invitation:

      Join Skype* Meeting
    • If the hub’s calendar doesn’t have conflicts, the plugin will accept this invitation and display the meeting information on the bottom status line 10 minutes in advance of the meeting.
    • Before the meeting starts, the participants in the meeting room open their Intel Unite client application. Anyone can click on the Skype for Business plugin icon. The Join button will be available 10 minutes before the meeting start time. User A can also connect to the meeting room and open the Intel Unite client application to join the meeting.

      Ready to Present
    • Clicking the Join button will display a toast message on the upper-right corner of the hub display indicating that the room is joining the Skype online meeting, and within 5 seconds the Skype for Business window should be full screen and in front. The meeting room has now joined the Skype for Business online meeting.
    • User A and user B can join the Skype online meeting by clicking the Join Skype Meeting link in the email body. Although user A can connect to the Intel Unite application hub, after the hub joins the Skype online meeting, the hub’s screen will display Skype’s online meeting content, and the hub’s screen cannot be pushed to the client local screen. Since user A is not in the meeting room, he or she must join Skype online meeting to view the Skype meeting content. User B just follows the Skype online meeting instructions.

    Operations in the meeting

    • Share content
      An Intel Unite client in the meeting room clicks Start Presentation or Present Application. The hub computer displays the shared content (client to hub through the Intel Unite technology), and after several seconds this content appears for users A and B in the Skype online meeting window (hub → users A and B through the Skype collaboration server).

      See below the different screens that display for in-room attendee’s personal screen, user B’s screen, and the hub’s attached display:

      Intel® Unite™ Solution Overview

      Intel® Unite™ Solution Overview

      Intel® Unite™ Solution Overview

      User B’s sharing will be presented on the hub’s screen, but in-room attendees can view it through the display attached to the hub.
    • Video and audio chat
      If a Skype for Business-enabled camera, microphone, and audio devices are attached to the hub computer, the users in the meeting room can chat and talk with users A and B through Skype for Business.

      video and audio chat
    • Leave the meeting.
      Click the Skype for Business plugin icon, and then select Leave or another choice. These operations affect the hub computer.

      Skype* Plugin

Install Protected Guest Access Plugin

You can use the Protected Guest Access plugin to add a guest to the Intel Unite solution session when he or she cannot connect to your enterprise network. This plugin allows the guest client to connect to the hub directly without connecting to the enterprise network. The hub creates an ad hoc access point for the guest, and the guest device can download and install the Intel Unite client application and connect to the hub. This plugin is available starting with Intel Unite solution version 3.0 and installs and deploys on the hub side.

Please make sure the Intel Unite application is deployed on the hub computer before you start to install “Protected Guest Access” plugin.

  1. Double-click the installation package “Intel Unite Plugin for Protected Guest Access.mui.msi” on the hub computer.
  2. Click Next until the installation is finished.
  3. Open UniteSetting.exe on the hub machine. If Verify Certificates on Plugins is set to Yes, please make sure the Guest Access plugin is trusted.

    plugin settings
  4. Start Unite.exe on the hub machine.
  5. Open an Intel Unite client application, connect to the hub, then you will see the Guest Access icon.

    Guest Access Button
  6. Click the Guest Access button, and then Start Guest Access.

    The hub will display the guest access information at the bottom of the status line.

    pin and shared key
  7. On the guest computer, search for the Wi-Fi SSID IntelUnite_82M and connect to it. Then browse to http://192.168.173.1/guest and follow the instructions to download the Intel Unite client application.
  8. Start the Intel Unite client application on the guest computer, and then type the PIN displayed on the hub to join the Intel Unite solution session.

By using Intel® Unite™ plugins, ISVs can extend their reach into the conference room and provide rich functions combined with the meeting room. With plugins integrated, the Intel Unite solution is more powerful and brings a user better experience.

Troubleshooting and Q&As

  1. “ID666666” error when launching on an Intel vPro technology-based PC.
    Solution: The PC you are using is not an authorized PC that can act as an Intel Unite application hub.
  2. The client connects to the hub successfully, but when the client tries to present, the desktop shows a black screen.
    Solution: Please update your display driver to the latest version.
  3. If the hub connects to two monitors, can the client present to both of them? Can the client present to a specified monitor?
    Solution: When the hub is in duplicate mode, the client presents to both monitors and has no display choice. When the hub is in extend mode, the client can present to one of the two monitors, and the user can choose which monitor to present to.
  4. If there is one hub in meeting room 1 and another hub in meeting room 2, can both hubs display the same content?
    Solution: As of Intel Unite solution version 3.0, this function is not supported. This feature will be supported in the Intel Unite solution version 4.0 in 2017.
  5. The Skype for Business plugin icon does not appear.
    Solution:
    1. Open the Hub Setting.exe - Plugins tab and set Verify the certificates in plugins to No.
    2. Ensure Skype for Business is installed and logged in before the Intel Unite application starts on the hub.
    3. Uninstall/Reinstall the plugin, and use the Test Connection button to ensure you have the correct settings.
  6. The Skype for Business plugin icon appears, but the Join icon does not appear even though there is an available meeting at that time.
    Solution:
    1. Verify that the hub is invited in the meeting invitation. If it isn’t, the Join icon will not appear.
    2. Verify that the hub has accepted this invitation. If it didn’t, the Join icon will not appear.
    3. Verify that the Join Skype Meeting link is included in the email message. If it isn’t, the Join icon will not appear.
  7. The Skype for Business plugin Join icon appears, but fails to join the Skype meeting.

    Solution: On the hub computer, manually click the Join Skype Meeting link in the email message to verify whether Internet Explorer* can be opened and connect to the meeting successfully.
  8. After clicking the Guest Access plugin icon, it always says, “There was an error starting Guest Access”.

    Solution: Please do the following:
    1. Disable the firewall to check whether the firewall is preventing the plugin from talking on the local host. If yes, set this program to “Allowed”.
    2. Check whether the Intel Unite Guest Access service is started on the hub. If No, restart this service, and then retry.
  9. Is the Intel Unite solution free?
    There are no software licensing fees from Intel for the Intel Unite solution.
    The software is only available when purchasing authorized Mini PCs from Intel or OEM partners. But there may be a fee if you get the software from a third-party solution integrator.
  10. Does the Intel Unite solution support Linux*?
    No.
  11. How do I check the logs if I encounter an issue?
    1. In a command line, run “regedit”.
    2. Set regkey HKCU\Software\Intel\Unite\LogFile=‘c:\temp\log.txt’
    3. Run “Intel Unite.exe /debug” for popup debug window.
    4. Check logs in ‘c:\temp\log.txt’.
  12. Is Intel vPro technology mandatory to support the Intel Unite solution? How can I add the Intel Unite solution into an OEM product?

    Yes, Intel vPro technology is mandatory. The Intel Unite solution has hardware, software, productivity, and form factor requirements. Please contact Intel support to get further information.
  13. How many Intel Unite solution-enabled meeting rooms are there at Intel? How do I find these rooms?
    Please check for available rooms at: http://fmsitf01eot01.amr.corp.intel.com/uniterooms
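
The log-collection steps in question 11 above can be collected into a short command sequence. This is a sketch: the registry value name and application path are taken from earlier in this article, and the install path may differ on your system.

```shell
rem Enable Intel Unite debug logging (Windows command prompt).
rem Point the LogFile registry value at a writable path:
reg add "HKCU\Software\Intel\Unite" /v LogFile /t REG_SZ /d "c:\temp\log.txt" /f

rem Start the application with the debug window enabled:
"C:\Program Files (x86)\Intel\Intel Unite\Hub\Intel Unite.exe" /debug

rem Inspect the log afterwards:
type c:\temp\log.txt
```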

References

  1. Intel Unite overview: http://intel.com/unite
  2. Intel Unite applications and plugin SDK: http://www.intel.com/content/www/us/en/support/software/software-applications/intel-unite-app.html
  3. Intel Unite solution community: https://soco.intel.com/groups/it-unite-info

About the Author

Qian, Caihong is application engineer in the Client Computing Enabling Team, Developer Relations Division, Software and Solutions Group. She is responsible for client business enabling such as security technology and Intel Unite solution technology.

Intel® Software Guard Extensions Tutorial Series: Part 6, Dual Code Paths


In Part 6 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series, we set aside the enclave to address an outstanding design requirement that was laid out in Part 2, Application Design: provide support for dual code paths. We want to make sure our Tutorial Password Manager will function on hosts both with and without Intel SGX capability. Much of the content in this part comes from the article, Properly Detecting Intel® Software Guard Extensions in Your Applications.

You can find the list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

There is source code provided with this installment of the series.

All Intel® Software Guard Extensions Applications Need Dual Code Paths

First it’s important to point out that all Intel SGX applications must have dual code paths. Even if an application is written so that it should only execute if Intel SGX is available and enabled, a fallback code path must exist so that you can present a meaningful error message to the user and then exit gracefully.

In short, an application should never crash or fail to launch solely because the platform does not support Intel SGX.

Scoping the Problem

In Part 5 of the series we completed our first version of our application enclave and tested it by hardcoding the enclave support to be on. That was done by setting the _supports_sgx flag in PasswordCoreNative.cpp.

PasswordManagerCoreNative::PasswordManagerCoreNative(void)
{
	_supports_sgx= 1;
	adsize= 0;
	accountdata= NULL;
	timer = NULL;
}

Obviously, we can’t leave this on by default. The convention for feature detection is that features are off by default and turned on if they are detected. So our first step is to undo this change and set the flag back to 0, effectively disabling the Intel SGX code path.

PasswordManagerCoreNative::PasswordManagerCoreNative(void)
{
	_supports_sgx= 0;
	adsize= 0;
	accountdata= NULL;
	timer = NULL;
}

However, before we get into the feature detection procedure, we’ll give the console application that runs our test suite, CLI Test App, a quick functional test by executing it on an older system that does not have the Intel SGX feature. With this flag set to zero, the application will not choose the Intel SGX code path and thus should run normally.

Here’s the output from a 4th generation Intel® Core™ i7 processor-based laptop, running Microsoft Windows* 8.1, 64-bit. This system does not support Intel SGX.

CLI Test App

What Happened?

Clearly we have a problem even when the Intel SGX code path is explicitly disabled in the software. This application, as written, cannot execute on a system without Intel SGX support. It didn’t even start executing. So what’s going on?

The clue in this case comes from the error message in the console window:

System.IO.FileNotFoundException: Could not load file or assembly ‘PasswordManagerCore.dll’ or one of its dependencies. The specified file could not be found.

Let’s take a look at PasswordManagerCore.dll and its dependencies:

Additional Dependencies

In addition to the core OS libraries, we have dependencies on bcrypt.lib and EnclaveBridge.lib, which will require bcrypt.dll and EnclaveBridge.dll at runtime. Since bcrypt.dll comes from Microsoft and is included in the OS, we can reasonably assume its dependencies, if any, are already installed. That leaves EnclaveBridge.dll.

Examining its dependencies, we see the following:

Additional Dependencies

This is the problem. Even though we have the Intel SGX code path explicitly disabled, EnclaveBridge.dll still has references to the Intel SGX runtime libraries. All symbols in an object module must be resolved as soon as it is loaded. It doesn’t matter if we disable the Intel SGX code path: undefined symbols are still present in the DLL. When PasswordManagerCore.dll loads, it resolves its undefined symbols by loading bcrypt.dll and EnclaveBridge.dll, the latter of which, in turn, attempts to resolve its undefined symbols by loading sgx_urts.dll and sgx_uae_service.dll. The system we tried to run our command-line test application on does not have these libraries, and since the OS can’t resolve all of the symbols it throws an exception and the program crashes before it even starts.
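
If you want to reproduce this dependency walk yourself, the dumpbin tool that ships with Visual Studio lists a module's load-time imports. A sketch, run from a Developer Command Prompt:

```shell
rem List the DLLs that must be resolvable when each module is loaded:
dumpbin /DEPENDENTS PasswordManagerCore.dll
dumpbin /DEPENDENTS EnclaveBridge.dll
rem EnclaveBridge.dll lists sgx_urts.dll and sgx_uae_service.dll,
rem which is why loading fails on machines without the PSW installed.
```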

These two DLLs are part of the Intel SGX Platform Software (PSW) package, and without them Intel SGX applications written using the Intel SGX Software Development Kit (SDK) cannot execute. Our application needs to be able to run even if these libraries are not present.

The Platform Software Package

As mentioned above, the runtime libraries are part of the PSW. In addition to these support libraries, the PSW includes:

  • Services that support and maintain the trusted compute block (TCB) on the system
  • Services that perform and manage certain Intel SGX operations such as attestation
  • Interfaces to platform services such as trusted time and the monotonic counters

The PSW must be installed by the application installer when deploying an Intel SGX application, because Intel does not offer the PSW for direct download by end users. Software vendors must not assume that it will already be present and installed on the destination system. In fact, the license agreement for Intel SGX specifically states that licensees must re-distribute the PSW with their applications.

We’ll discuss the PSW installer in more detail in a future installment of the series covering packaging and deployment.

Detecting Intel Software Guard Extensions Support

So far we’ve focused on the problem of just starting our application on systems without Intel SGX support, and more specifically, without the PSW. The next step is to detect whether or not Intel SGX support is present and enabled once the application is running.

Intel SGX feature detection is, unfortunately, a complicated procedure. For a system to be Intel SGX capable, four conditions must be met:

  1. The CPU must support Intel SGX.
  2. The BIOS must support Intel SGX.
  3. In the BIOS, Intel SGX must be explicitly enabled or set to the “software controlled” state.
  4. The PSW must be installed on the platform.

Note that the CPUID instruction, alone, is not sufficient to detect the usability of Intel SGX on a platform. It can tell you whether or not the CPU supports the feature, but it doesn’t know anything about the BIOS configuration or the software that is installed on a system. Relying solely on the CPUID results to make decisions about Intel SGX support can potentially lead to a runtime fault.

To make feature detection even more difficult, examining the state of the BIOS is not a trivial task and is generally not possible from a user process. Fortunately the Intel SGX SDK provides a simple solution: the function sgx_enable_device will both check for Intel SGX capability and attempt to enable it if the BIOS is set to the software control state (the purpose of the software control setting is to allow applications to enable Intel SGX without requiring users to reboot their systems and enter their BIOS setup screens, a particularly daunting and intimidating task for non-technical users).

The problem with sgx_enable_device, though, is that it is part of the Intel SGX runtime, which means the PSW must be installed on the system in order to use it. So before we attempt to call sgx_enable_device, we must first detect whether or not the PSW is present.

Implementation

With our problem scoped out, we can now lay out the steps that must be followed, in order, for our dual-code path application to function properly. Our application must:

  1. Load and begin executing even without the Intel SGX runtime libraries.
  2. Determine whether or not the PSW package is installed.
  3. Determine whether or not Intel SGX is enabled (and attempt to enable it).

Loading and Executing without the Intel Software Guard Extensions Runtime

Our main application depends on PasswordManagerCore.dll, which depends on EnclaveBridge.dll, which in turn depends on the Intel SGX runtime. Since all symbols need to be resolved when an application loads, we need a way to prevent the loader from trying to resolve symbols that come from the Intel SGX runtime libraries. There are two options:

Option #1: Dynamic Loading      

In dynamic loading, you don’t explicitly link the library in the project. Instead you use system calls to load the library at runtime and then look up the names of each function you plan to use in order to get the addresses of where they have been placed in memory. Functions in the library are then invoked indirectly via function pointers.

Dynamic loading is a hassle. Even if you only need a handful of functions, it can be a tedious process to prototype function pointers for every function that is needed and get their load address, one at a time. You also lose some of the benefits provided by the integrated development environment (such as prototype assistance) since you are no longer explicitly calling functions by name.

Dynamic loading is typically used in extensible application architectures (for example, plug-ins).

Option #2: Delayed-Loaded DLLs

In this approach, you dynamically link all your libraries in the project, but instruct Windows to do delayed loading of the problem DLL. When a DLL is delay-loaded, Windows does not attempt to resolve symbols that are defined by that DLL when the application starts. Instead it waits until the program makes its first call to a function that is defined in that DLL, at which point the DLL is loaded and the symbols get resolved (along with any of its dependencies). What this means is that a DLL is not loaded until the application needs it. A beneficial side effect of this approach is that it allows applications to reference a DLL that is not installed, so long as no functions in that DLL are ever called.

When the Intel SGX feature flag is off, that is exactly the situation we are in so we will go with option #2.

You specify the DLL to be delay-loaded in the project configuration for the dependent application or DLL. For the Tutorial Password Manager, the best DLL to mark for delayed loading is EnclaveBridge.dll as we only call this DLL if the Intel SGX path is enabled. If this DLL doesn’t load, neither will the two Intel SGX runtime DLLS.

We set the option in the Linker -> Input page of the PasswordManagerCore.dll project configuration:

Password Manager
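
For reference, the same setting can also be expressed directly in source or on the linker command line. This is an MSVC-specific sketch; the project-property page shown above produces the equivalent linker invocation.

```cpp
// MSVC only: ask the linker to delay-load EnclaveBridge.dll and pull in
// the delay-load helper library. Equivalent to listing the DLL under
// "Delay Loaded DLLs" on the Linker -> Input property page, or passing
//   /DELAYLOAD:EnclaveBridge.dll delayimp.lib
// on the linker command line.
#pragma comment(linker, "/DELAYLOAD:EnclaveBridge.dll")
#pragma comment(lib, "delayimp")
```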

After the DLL is rebuilt and installed on our 4th generation Intel Core processor system, the console test application works as expected.

CLI Test App

Detecting the Platform Software Package

Before we can call the sgx_enable_device function to check for Intel SGX support on the platform, we first have to make sure that the PSW package is installed because sgx_enable_device is part of the Intel SGX runtime. The best way to do this is to actually try to load the runtime libraries.

We know from the previous step that we can’t just dynamically link them because that will cause an exception when we attempt to run the program on a system that does not support Intel SGX (or have the PSW package installed). But we also can’t rely on delay-loaded DLLs either: delayed loading can’t tell us if a library is installed because if it isn’t, the application will still crash! That means we must use dynamic loading to test for the presence of the runtime libraries.

The PSW runtime libraries should be installed in the Windows system directory so we’ll use GetSystemDirectory to get that path, and limit the DLL search path via a call to SetDllDirectory. Finally, the two libraries will be loaded using LoadLibrary. If either of these calls fail, we know the PSW is not installed and that the main application should not attempt to run the Intel SGX code path.

Detecting and Enabling Intel Software Guard Extensions

Since the previous step dynamically loads the PSW runtime libraries, we can just look up the symbol for sgx_enable_device manually and then invoke it via a function pointer. The result will tell us whether or not Intel SGX is enabled.

Implementation

To implement this in the Tutorial Password Manager we’ll create a new DLL called FeatureSupport.dll. We can safely dynamically link this DLL from the main application since it has no explicit dependencies on other DLLs.

Our feature detection will be rolled into a C++/CLI class called FeatureSupport, which will also include some high-level functions for getting more information about the state of Intel SGX. In rare cases, enabling Intel SGX via software may require a reboot, and in rarer cases the software enable action fails and the user may be forced to enable it explicitly in their BIOS.

The class declaration for FeatureSupport is shown below.

typedef sgx_status_t(SGXAPI *fp_sgx_enable_device_t)(sgx_device_status_t *);


public ref class FeatureSupport {
private:
	UINT sgx_support;
	HINSTANCE h_urts, h_service;

	// Function pointers

	fp_sgx_enable_device_t fp_sgx_enable_device;

	int is_psw_installed(void);
	void check_sgx_support(void);
	void load_functions(void);

public:
	FeatureSupport();
	~FeatureSupport();

	UINT get_sgx_support(void);
	int is_enabled(void);
	int is_supported(void);
	int reboot_required(void);
	int bios_enable_required(void);

	// Wrappers around SGX functions

	sgx_status_t enable_device(sgx_device_status_t *device_status);

};

Here are the low-level routines that check for the PSW package and attempt to detect and enable Intel SGX.

int FeatureSupport::is_psw_installed()
{
	_TCHAR *systemdir;
	UINT rv, sz;

	// Get the system directory path. Start by finding out how much space we need
	// to hold it.

	sz = GetSystemDirectory(NULL, 0);
	if (sz == 0) return 0;

	systemdir = new _TCHAR[sz + 1];
	rv = GetSystemDirectory(systemdir, sz);
	if (rv == 0 || rv > sz) return 0;

	// Set our DLL search path to just the System directory so we don't accidentally
	// load the DLLs from an untrusted path.

	if (SetDllDirectory(systemdir) == 0) {
		delete[] systemdir;
		return 0;
	}

	delete[] systemdir; // No longer need this

	// Need to be able to load both of these DLLs from the System directory.

	if ((h_service = LoadLibrary(_T("sgx_uae_service.dll"))) == NULL) {
		return 0;
	}

	if ((h_urts = LoadLibrary(_T("sgx_urts.dll"))) == NULL) {
		FreeLibrary(h_service);
		h_service = NULL;
		return 0;
	}

	load_functions();

	return 1;
}

void FeatureSupport::check_sgx_support()
{
	sgx_device_status_t sgx_device_status;

	if (sgx_support != SGX_SUPPORT_UNKNOWN) return;

	sgx_support = SGX_SUPPORT_NO;

	// Check for the PSW

	if (!is_psw_installed()) return;

	sgx_support = SGX_SUPPORT_YES;

	// Try to enable SGX

	if (this->enable_device(&sgx_device_status) != SGX_SUCCESS) return;

	// If SGX isn't enabled yet, perform the software opt-in/enable.

	if (sgx_device_status != SGX_ENABLED) {
		switch (sgx_device_status) {
		case SGX_DISABLED_REBOOT_REQUIRED:
			// A reboot is required.
			sgx_support |= SGX_SUPPORT_REBOOT_REQUIRED;
			break;
		case SGX_DISABLED_LEGACY_OS:
			// BIOS enabling is required
			sgx_support |= SGX_SUPPORT_ENABLE_REQUIRED;
			break;
		}

		return;
	}

	sgx_support |= SGX_SUPPORT_ENABLED;
}
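Because sgx_support is a bit field, the accessor methods above reduce to simple bit tests, and states such as "supported but awaiting a reboot" compose naturally. The following standalone sketch models that logic; the flag values here are illustrative assumptions, not the tutorial's actual header definitions:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag values (assumptions); the tutorial's real header
// defines its own constants for these names.
enum : std::uint32_t {
    SGX_SUPPORT_NO              = 0x0,
    SGX_SUPPORT_YES             = 0x1,
    SGX_SUPPORT_ENABLED         = 0x2,
    SGX_SUPPORT_REBOOT_REQUIRED = 0x4,
    SGX_SUPPORT_ENABLE_REQUIRED = 0x8
};

// Each query is a single bit test on the composed state.
static bool is_supported(std::uint32_t s)      { return (s & SGX_SUPPORT_YES) != 0; }
static bool is_enabled(std::uint32_t s)        { return (s & SGX_SUPPORT_ENABLED) != 0; }
static bool reboot_required(std::uint32_t s)   { return (s & SGX_SUPPORT_REBOOT_REQUIRED) != 0; }
```

For example, after check_sgx_support() performs the software opt-in on a system that still needs a reboot, the state would be SGX_SUPPORT_YES | SGX_SUPPORT_REBOOT_REQUIRED: supported, not yet enabled, reboot pending.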

void FeatureSupport::load_functions()
{
	fp_sgx_enable_device = (fp_sgx_enable_device_t)GetProcAddress(h_service, "sgx_enable_device");
}

// Wrappers around SDK functions so the user doesn't have to mess with dynamic loading by hand.

sgx_status_t FeatureSupport::enable_device(sgx_device_status_t *device_status)
{
	check_sgx_support();

	if (fp_sgx_enable_device == NULL) {
		return SGX_ERROR_UNEXPECTED;
	}

	return fp_sgx_enable_device(device_status);
}

Wrapping Up

With these code changes, we have integrated Intel SGX feature detection into our application! It will execute smoothly on systems both with and without Intel SGX support and choose the appropriate code branch.

As mentioned in the introduction, there is sample code provided with this part for you to download. The attached archive includes the source code for the Tutorial Password Manager core, including the new feature detection DLL. Additionally, we have added a new GUI-based test program that automatically selects the Intel SGX code path, but lets you disable it if desired (this option is only available if Intel SGX is supported on the system).

SGX Code Branch

The console-based test program has also been updated to detect Intel SGX, though it cannot be configured to turn it off without modifying the source code.

Coming Up Next

We’ll revisit the enclave in Part 7 in order to fine-tune the interface. Stay tuned!

Implementing a masked SVML-like function explicitly in user defined way


The Intel Compiler provides SIMD intrinsic APIs for the short vector math library (SVML), and starting with the AVX-512 generation it also exposes masked versions of SVML functions to users. For example, see zmmintrin.h:

extern __m512d __ICL_INTRINCC _mm512_mask_exp_pd(__m512d, __mmask8, __m512d);

Masked SIMD functions are handy, just like masked instructions: one can use the mask as a vector predicate to avoid computation on certain elements of a vector register, for example to avoid unwanted floating-point, memory, or performance side effects. The Intel Compiler autovectorizer has always been able to optimize a conditional loop like this into a masked SVML function call:

   for (int32_t i=0; i<LEN; i++)
      if (x[i] > 0.0)
        y[i] = exp(x[i]);
      else
        y[i] = 0.0;

AVX-512 (-xCORE-AVX512) code generation (disassembly) snippet for the above code:

 ..B1.24:                        # Preds ..B1.59 ..B1.23
                                # Execution count [8.48e-01]
        vpcmpud   $1, %ymm16, %ymm18, %k6                       #54.17
        vmovupd   (%rbx,%r12,8), %zmm2{%k6}{z}                  #55.9
        vcmppd    $6, %zmm17, %zmm2, %k5                        #55.16
        kandw     %k5, %k6, %k4                                 #55.16
        vmovupd   (%rbx,%r12,8), %zmm1{%k4}{z}                  #56.18
        vmovaps   %zmm17, %zmm0                                 #56.14
        kmovw     %k4, %k1                                      #56.14
        call      __svml_exp8_mask                              #56.14
                                # LOE rbx rsi r12 r13 r14 edi r15d ymm16 ymm18 ymm19 zmm0 zmm17 k4 k5 k6
..B1.59:                        # Preds ..B1.24
                                # Execution count [8.48e-01]
        vpaddd    %ymm19, %ymm18, %ymm18                        #54.17
        kandnw    %k6, %k5, %k1                                 #58.7
        vmovupd   %zmm0, (%r13,%r12,8){%k4}                     #56.7
        vmovupd   %zmm17, (%r13,%r12,8){%k1}                    #58.7
        addq      $8, %r12                                      #54.17
        cmpq      %rsi, %r12                                    #54.17
        jb        ..B1.24       # Prob 82%                      #54.17

Before AVX-512, the x86 vector instruction set did not provide architectural support for vector masks, but the desired behavior could easily be emulated. For example, here is the AVX2 (-xCORE-AVX2) disassembly for the same conditional code:

..B1.11:                        # Preds ..B1.14 ..B1.10
                                # Execution count [0.00e+00]
        vmovupd   (%rbx,%r14,8), %ymm0                          #55.9
        vcmpgtpd  %ymm10, %ymm0, %ymm11                         #55.16
        vptest    %ymm8, %ymm11                                 #55.16
        je        ..B1.13       # Prob 20%                      #55.16
                                # LOE rbx r12 r13 r14 r15d ymm0 ymm8 ymm9 ymm10 ymm11
..B1.12:                        # Preds ..B1.11
                                # Execution count [8.48e-01]
        vmovdqa   %ymm11, %ymm1                                 #56.14
        call      __svml_exp4_mask                              #56.14
                                # LOE rbx r12 r13 r14 r15d ymm0 ymm8 ymm9 ymm10 ymm11
..B1.39:                        # Preds ..B1.12
                                # Execution count [8.48e-01]
        vmovdqa   %ymm0, %ymm2                                  #56.14
        vmovupd   (%r12,%r14,8), %ymm0                          #56.7
        vblendvpd %ymm11, %ymm2, %ymm0, %ymm2                   #56.7
        jmp       ..B1.14       # Prob 100%                     #56.7
                                # LOE rbx r12 r13 r14 r15d ymm2 ymm8 ymm9 ymm10 ymm11
..B1.13:                        # Preds ..B1.11
                                # Execution count [0.00e+00]
        vmovupd   (%r12,%r14,8), %ymm2                          #58.7
                                # LOE rbx r12 r13 r14 r15d ymm2 ymm8 ymm9 ymm10 ymm11
..B1.14:                        # Preds ..B1.39 ..B1.13
                                # Execution count [8.48e-01]
        vxorpd    %ymm11, %ymm9, %ymm0                          #55.16
        vandnpd   %ymm2, %ymm0, %ymm1                           #58.7
        vmovupd   %ymm1, (%r12,%r14,8)                          #58.7
        addq      $4, %r14                                      #54.17
        cmpq      $8388608, %r14                                #54.17
        jb        ..B1.11       # Prob 82%                      #54.17

So users benefited from masked functions in SVML even before the architecture added support for vector masks. The recipe below addresses users who do not rely on the autovectorizer and choose to call SVML through intrinsics on pre-AVX-512 platforms. We are not exposing pre-AVX-512 masked APIs through intrinsics at this time; instead, we show how users can implement their own masked vector math functions if needed. Here's an example:

static __forceinline __m256d _mm256_mask_exp_pd(__m256d old_dst, __m256d mask, __m256d src)
{
    // Need to patch masked-off inputs with values that do not cause
    // side effects such as overflow/underflow/NaNs/denormals, etc.
    // 0.5 is a good value for exp and most other functions.
    // (acosh is not defined at 0.5, so use 2.0 for it instead;
    // 0.0 and 1.0 are often bad points, e.g. think of log().)
    __m256d patchValue = _mm256_set1_pd(0.5);
    __m256d patchedSrc = _mm256_blendv_pd(patchValue, src, mask);
    // compute SVML function on a full register
    // NOTE: one may choose to totally skip expensive call to exp
    // if the mask was all-zeros, this is left as an exercise to
    // the reader.
    __m256d res = _mm256_exp_pd(patchedSrc);
    // discard masked off results, restore values from old_dst
    old_dst = _mm256_blendv_pd(old_dst, res, mask);
    return old_dst;
}

One would likely achieve better performance if the masked function were inlined, hence the static __forceinline in the declaration. And here is how one would use this function if the original loop were written with intrinsics:

void vfoo(int n4, double * a, double *r)
{
    int i;
    for (i = 0; i < n4; i+=4)
    {
        __m256d src, dst, mask;
        src = _mm256_load_pd(a + i);


        // fill mask based on desired condition
        mask = _mm256_cmp_pd(src, _mm256_setzero_pd(), _CMP_GT_OQ);
        // do something useful for the else path
        dst = _mm256_setzero_pd();
        // compute masked exp that will preserve above useful values
        dst = _mm256_mask_exp_pd(dst, mask, src);


        _mm256_store_pd(r + i, dst);
    }
}

Here's the assembly listing for the above loop:

..B1.3:                         # Preds ..B1.8 ..B1.2
                                # Execution count [5.00e+00]
        vmovupd   (%rdi,%r12,8), %ymm1                          #25.30
        vcmpgt_oqpd %ymm9, %ymm1, %ymm10                        #28.16
        vblendvpd %ymm10, %ymm1, %ymm8, %ymm0                   #32.15
        call      __svml_exp4                                   #32.15
                                # LOE rbx rsi rdi r12 r13 r14 r15 ymm0 ymm8 ymm9 ymm10
..B1.8:                         # Preds ..B1.3
                                # Execution count [5.00e+00]
        vblendvpd %ymm10, %ymm0, %ymm9, %ymm1                   #32.15
        vmovupd   %ymm1, (%rsi,%r12,8)                          #34.25
        addq      $4, %r12                                      #22.25
        cmpq      %r13, %r12                                    #22.21
        jl        ..B1.3        # Prob 82%                      #22.21

Note: Similarly, we can develop our own masked versions for other functions such as log, sqrt, cos, and sin by simply changing "exp" to the desired function name in the sample code above. Mind the note on the patch value, though.

 

Getting Started with Graphics Performance Analyzer - Platform Analyzer


Prerequisite

We recommend first getting familiar with how to use System Analyzer for Intel® Graphics Performance Analyzer (Intel® GPA). System Analyzer is the central interface used to record the trace logs for Platform Analyzer. Intel GPA is part of Intel® System Studio.

Introduction

Platform Analyzer provides offline performance analysis capability† by reviewing task timelines across the real hardware engines, the software queues for the GPU scheduler, and background thread activities. With this utility, developers can quickly see which GPU engines (render, video enhance, video processing, or video codec) are involved in the target test application. For example, the trace result can reveal whether Intel hardware codec acceleration is being used. Platform Analyzer can also provide a rough clue to performance issues through a review of the task timelines.

This image shows an example of Platform Analyzer data. The top pane shows when graphics-related tasks occur on the hardware engines, GPU software queues, and thread timelines. The bottom pane summarizes the overall time cost for each task type and the hardware engine usage.

†Platform Analyzer is part of Intel GPA, which focuses more on graphics application optimization, mainly for Microsoft* DirectX and for Android and Linux OpenGL apps. Check the Intel GPA product page for a quick overview.

A quick start to capture the trace logs

Follow the normal steps to start analyzing the target application. Windows allows using a hotkey combination to capture the platform analyzer trace logs directly:

  1. Right-click the GPA monitors system icon, and then select "Analyze Application" to launch the target application.
  2. Press Shift + Ctrl + T to capture the trace logs.
  3. Once you see a message that indicates the capture is complete (as shown in the figure below), the trace logs have been successfully captured.

 

Find the pattern: what symptoms do performance issues show in Platform Analyzer?

First of all, a little background on VSync and the present function calls in Platform Analyzer. VSync is a hardware signal raised right after the system outputs one frame to the display device; the interval between two VSync signals implies the refresh rate of the display device. As for the present API, it indicates the operation of copying one frame to another memory destination.
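The relationship between VSync spacing and refresh rate is simple arithmetic: a display refreshing at 60 Hz produces VSync signals roughly every 16.7 ms. A trivial helper makes the conversion explicit:

```c
/* Refresh rate implied by the spacing of two adjacent VSync
 * timestamps, given in milliseconds.
 * E.g. intervals of ~16.7 ms imply a ~60 Hz display. */
static double refresh_rate_hz(double vsync_interval_ms)
{
    return 1000.0 / vsync_interval_ms;
}
```

This is handy when eyeballing a Platform Analyzer timeline: irregular intervals translate directly into an unstable effective refresh rate.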

With this background knowledge, you may watch and investigate these symptoms indicated by Platform Analyzer:

  • Irregular VSync pattern. (VSync intervals should be consistent otherwise the screen will blink or flash during the test.)
  • Long delay (big packet) in GPU engine. (This packet could be the present call. If the packet's size or length in the timeline crosses multiple VSyncs, it can cause display frames to stick.)
  • Overloading, which is multiple heavily-stacked packets in the software queue, as shown in the figure below. (Stacked packets mean several tasks are scheduled at the same time. Packets will be dropped or delayed for processing if the GPU cannot handle several tasks in a timely manner. This also causes display frames to stick.)

Further information

Intel System Studio includes three components from Intel GPA: System Analyzer, Platform Analyzer, and Graphics Frame Analyzer. System Analyzer provides an interface to record the logs, which is what Platform Analyzer and Frame Analyzer need for offline analysis.

  • System Analyzer: Provides real-time system and application analysis, and includes the central interface to record logs for the other Intel GPA analyzers. For more information, see the System Analyzer Controls.

  • Platform Analyzer: Provides an analysis of GPU engine and thread activity interactions. It presents the captured log by showing all graphics hardware engine workloads (including decoding, video processing, rendering, and computing) and thread activities in a timeline view. For more information, see the Platform Analyzer Controls.

  • Graphics Frame Analyzer: Provides an offline analysis of single-frame rendering. It reconstructs the frame by replaying the Microsoft DirectX* or OpenGL* APIs logged by System Analyzer. For more information, see the Graphics Frame Analyzer Controls.

 

See also

Intel® Graphics Performance Analyzers (Intel® GPA) Documentation

Register and Download Intel System Studio 2017 Professional Edition (for Windows)
