
The Ultimate Question of Programming, Refactoring, and Everything


Yes, you've guessed correctly - the answer is "42". In this article you will find 42 recommendations about coding in C++ that can help a programmer avoid a lot of errors and save time and effort. The author is Andrey Karpov - technical director of "Program Verification Systems", a team of developers working on the PVS-Studio static code analyzer. Having checked a large number of open source projects, we have seen a large variety of ways to shoot yourself in the foot; there is definitely much to share with the readers. Every recommendation is given with a practical example, which proves how relevant the issue is. These tips are intended for C/C++ programmers, but usually they are universal and may be of interest for developers using other languages.

Preface

About the author. My name is Andrey Karpov. The scope of my interests is the C/C++ language and the promotion of code analysis methodology. I have been a Microsoft MVP in Visual C++ for 5 years. The main aim of my articles and my work in general is to make the code of programs safer and more secure. I'll be really glad if these recommendations help you write better code and avoid typical errors. Those who write code standards for companies may also find some helpful information here.

A little bit of history. Not so long ago I created a resource, where I shared useful tips and tricks about programming in C++. But this resource didn't get the expected number of subscribers, so I don't see the point in giving a link to it here. It will be on the web for some time, but eventually, it will be deleted. Still, these tips are worth keeping. That's why I've updated them, added several more and combined them in a single text. Enjoy reading!

1. Don't do the compiler's job

Consider the code fragment, taken from MySQL project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V525 The code containing the collection of similar blocks. Check items '0', '1', '2', '3', '4', '1', '6' in lines 680, 682, 684, 689, 691, 693, 695.

static int rr_cmp(uchar *a,uchar *b)
{
  if (a[0] != b[0])
    return (int) a[0] - (int) b[0];
  if (a[1] != b[1])
    return (int) a[1] - (int) b[1];
  if (a[2] != b[2])
    return (int) a[2] - (int) b[2];
  if (a[3] != b[3])
    return (int) a[3] - (int) b[3];
  if (a[4] != b[4])
    return (int) a[4] - (int) b[4];
  if (a[5] != b[5])
    return (int) a[1] - (int) b[5];     <<<<====
  if (a[6] != b[6])
    return (int) a[6] - (int) b[6];
  return (int) a[7] - (int) b[7];
}

Explanation

This is a classic error related to copying fragments of code (Copy-Paste). Apparently, the programmer copied the block of code "if (a[1] != b[1]) return (int) a[1] - (int) b[1];". Then he started changing the indices and forgot to replace "1" with "5". This resulted in the comparison function occasionally returning an incorrect value, and the issue proved really hard to detect: none of the tests had revealed it before we scanned MySQL with PVS-Studio.

Correct code

if (a[5] != b[5])
  return (int) a[5] - (int) b[5];

Recommendation

Although the code is neat and easy to read, that didn't prevent the developers from overlooking the error. You can't stay focused when reading code like this because all you see is just similar-looking blocks, and it's hard to concentrate the whole time.

These similar blocks are most likely a result of the programmer's desire to optimize the code as much as possible. He "unrolled the loop" manually. I don't think it was a good idea in this case.

Firstly, I doubt that the programmer has really achieved anything with it. Modern compilers are pretty smart, and are very good at automatic loop unrolling if it can help improve program performance.

Secondly, the bug appeared in the code because of this attempt to optimize the code. If you write a simpler loop, there will be less chance of making a mistake.

I'd recommend rewriting this function in the following way:

static int rr_cmp(uchar *a,uchar *b)
{
  for (size_t i = 0; i < 7; ++i)
  {
    if (a[i] != b[i])
      return a[i] - b[i];
  }
  return a[7] - b[7];
}

Advantages:

  • The function is easier to read and comprehend.
  • You are much less likely to make a mistake writing it.

I am quite sure that this function will work no slower than its longer version.

So, my advice would be - write simple and understandable code. As a rule, simple code is usually correct code. Don't try to do the compiler's job - unroll loops, for example. The compiler will most definitely do it well without your help. Doing such fine manual optimization work would only make sense in some particularly critical code fragments, and only after the profiler has already estimated those fragments as problematic (slow).

2. Larger than 0 does not mean 1

The following code fragment is taken from CoreCLR project. The code has an error that PVS-Studio analyzer diagnoses in the following way: V698 Expression 'memcmp(....) == -1' is incorrect. This function can return not only the value '-1', but any negative value. Consider using 'memcmp(....) < 0' instead.

bool operator()(const GUID& _Key1, const GUID& _Key2) const
  { return memcmp(&_Key1, &_Key2, sizeof(GUID)) == -1; }

Explanation

Let's have a look at the description of memcmp() function:

int memcmp ( const void * ptr1, const void * ptr2, size_t num );

Compares the first num bytes of the block of memory pointed by ptr1 to the first num bytes pointed by ptr2, returning zero if they all match, or a value different from zero representing which is greater, if they do not.

Return value:

  • < 0 - the first byte that does not match in both memory blocks has a lower value in ptr1 than in ptr2 (if evaluated as unsigned char values).
  • == 0 - the contents of both memory blocks are equal.
  • > 0 - the first byte that does not match in both memory blocks has a greater value in ptr1 than in ptr2 (if evaluated as unsigned char values).

Note that if blocks aren't the same, then the function returns values greater than or less than zero. Greater or less. This is important! You cannot compare the results of such functions as memcmp(), strcmp(), strncmp(), and so on with the constants 1 and -1.

Interestingly, wrong code where the result is compared with 1 or -1 can work as the programmer expects for many years. But this is sheer luck, nothing more. The behavior of the function can unexpectedly change: for example, you may change the compiler, or the developers may optimize memcmp() in a new way, and your code will cease working.

Correct code

bool operator()(const GUID& _Key1, const GUID& _Key2) const
  { return memcmp(&_Key1, &_Key2, sizeof(GUID)) < 0; }

Recommendation

Don't rely on the way the function works now. If the documentation says that a function can return values less than or greater than 0, it does mean it. It means that the function can return -10, 2, or 1024. The fact that you always see it return -1, 0, or 1 doesn't prove anything.

By the way, the fact that the function can return numbers such as 1024 indicates that the result of memcmp() cannot be stored in a variable of the char type. This is one more widespread error whose consequences can be really serious. Such a mistake was the root of a serious vulnerability in MySQL/MariaDB in versions earlier than 5.1.61, 5.2.11, 5.3.5, 5.5.22. The thing is that when a user connects to MySQL/MariaDB, the code evaluates a token (SHA from the password and hash) and compares it with the expected value returned by the memcmp() function. But on some platforms the return value can go beyond the range [-128..127]. As a result, in 1 out of 256 cases the procedure of comparing the hash with the expected value always returns true, regardless of the hash. Therefore, a simple command on bash gives a hacker root access to the vulnerable MySQL server, even if the person doesn't know the password. The reason for this was the following code in the file 'sql/password.c':

typedef char my_bool;
...
my_bool check(...) {
  return memcmp(...);
}
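
Here is a minimal sketch of the failure mode (the value 256 is made up, but legal): memcmp() returns an int, and the implicit conversion to char silently truncates it, so any result that is a multiple of 256 reads as "the blocks are equal".

#include <string.h>

typedef char my_bool;

my_bool check(const char *token, const char *expected, size_t n)
{
  /* int -> char truncation: a legal return value such as 256
     becomes 0 here, i.e. a false "match" */
  return memcmp(token, expected, n);
}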

A more detailed description of this issue can be found here: Security vulnerability in MySQL/MariaDB.

3. Copy once, check twice

The fragment is taken from Audacity project. The error is detected by the following PVS-Studio diagnostic: V501 There are identical sub-expressions to the left and to the right of the '-' operator.

sampleCount VoiceKey::OnBackward (....) {
  ...
  int atrend = sgn(buffer[samplesleft - 2]-
                   buffer[samplesleft - 1]);
  int ztrend = sgn(buffer[samplesleft - WindowSizeInt-2]-
                   buffer[samplesleft - WindowSizeInt-2]);
  ...
}

Explanation

The "buffer[samplesleft - WindowSizeInt-2]" expression is subtracted from itself. This error appeared because of copying a code fragment (Copy-Paste): the programmer copied a code string but forgot to replace 2 with 1.

This a really banal error, but still it is a mistake. Errors like this are a harsh reality for programmers, and that's why there will speak about them several times here. I am declaring war on them.

Correct code

int ztrend = sgn(buffer[samplesleft - WindowSizeInt-2]-
                 buffer[samplesleft - WindowSizeInt-1]);

Recommendation

Be very careful when duplicating code fragments.

It wouldn't make sense to recommend rejecting the copy-paste method altogether. It's too convenient and useful a feature to give up.

Instead, just be careful, and don't hurry - forewarned is forearmed.

Remember that copying code may cause many errors. Here, take a look at some examples of bugs detected with the V501 diagnostic. Half of these errors are caused by using Copy-Paste.

If you copy the code and then edit it - check what you've got! Don't be lazy!

We'll talk more about Copy-Paste later. The problem actually goes deeper than it may seem, and I won't let you forget about it.

4. Beware of the ?: operator and enclose it in parentheses

Fragment taken from the Haiku project (inheritor of BeOS). The error is detected by the following PVS-Studio diagnostic: V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '-' operator.

bool IsVisible(bool ancestorsVisible) const
{
  int16 showLevel = BView::Private(view).ShowLevel();
  return (showLevel - (ancestorsVisible) ? 0 : 1) <= 0;
}

Explanation

Let's check the C/C++ operation precedence. The ternary operator ?: has a very low precedence, lower than that of operations /, +, <, etc; it is also lower than the precedence of the minus operator. As a result, the program doesn't work in the way the programmer expected.

The programmer thinks that the operations will execute in the following order:

(showLevel - (ancestorsVisible ? 0 : 1) ) <= 0

But it will actually be like this:

((showLevel - ancestorsVisible) ? 0 : 1) <= 0

The error is made in very simple code. This illustrates how hazardous the ?: operator is: it's very easy to make a mistake when using it, and in more complex conditions the ternary operator does real damage to the code. It's not only that you are very likely to make and miss a mistake; such expressions are also very difficult to read.

Really, beware of the ?: operator. I've seen a lot of bugs where this operator was used.

Correct code

return showLevel - (ancestorsVisible ? 0 : 1) <= 0;

Recommendation

In previous articles, we've already discussed the problem of a ternary operator, but since then I've become even more paranoid. The example given above shows how easy it is to make an error, even in a short and simple expression, that's why I'll modify my previous tips.

I don't suggest rejecting the ?: operator completely. It may be useful, and even necessary sometimes. Nevertheless, please do not overuse it, and if you have decided to use it, here is my recommendation:

ALWAYS enclose the ternary operator in parentheses.

Suppose you have an expression:

A = B ? 10 : 20;

Then you should write it like this:

A = (B ? 10 : 20);

Yes, the parentheses are excessive here...

But, it will protect your code later when you or your colleagues add an X variable to 10 or 20 while doing code refactoring:

A = X + (B ? 10 : 20);

Without the parentheses, you could forget that the ?: operator has low precedence, and accidentally break the program.

Of course, someone could still write "X +" inside the parentheses and end up with the same error, so the parentheses don't guarantee safety - but they are an additional layer of protection that shouldn't be rejected.

5. Use available tools to analyze your code

The fragment is taken from LibreOffice project. The error is detected by the following PVS-Studio diagnostic: V718 The 'CreateThread' function should not be called from 'DllMain' function.

BOOL WINAPI DllMain( HINSTANCE hinstDLL,
                     DWORD fdwReason, LPVOID lpvReserved )
{
  ....
  CreateThread( NULL, 0, ParentMonitorThreadProc,
                (LPVOID)dwParentProcessId, 0, &dwThreadId );
  ....
}

Explanation

I used to have a side job as a freelancer a long time ago. Once I was given a task I failed to accomplish. The task itself was formulated incorrectly, but I didn't realise that at the time. Moreover, it seemed clear and simple at first.

Under a certain condition in the DllMain I had to do some actions, using Windows API functions; I don't remember which actions exactly, but it wasn't anything difficult.

So I spent loads of time on that, but the code just wouldn't work. More than that, when I made a new standard application, it worked; but it didn't when I tried it in the DllMain function. Some magic, isn't it? I didn't manage to figure out the root of the problem at the time.

It's only now that I work on PVS-Studio development, so many years later, that I have suddenly realized the reason behind that old failure. In the DllMain function, you can perform only a very limited set of actions. The thing is that some DLLs may not be loaded yet, and you cannot call functions from them.

Now we have a diagnostic to warn programmers when dangerous operations are detected in DllMain functions. And this was exactly the case with that old task I was working on.

Details

More details about the usage of DllMain can be found on the MSDN site in this article: Dynamic-Link Library Best Practices. I'll give some excerpts from it here:

DllMain is called while the loader-lock is held. Therefore, significant restrictions are imposed on the functions that can be called within DllMain. As such, DllMain is designed to perform minimal initialization tasks, by using a small subset of the Microsoft Windows API. You cannot call any function in DllMain which directly, or indirectly, tries to acquire the loader lock. Otherwise, you will introduce the possibility that your application deadlocks or crashes. An error in a DllMain implementation can jeopardize the entire process and all of its threads.

The ideal DllMain would be just an empty stub. However, given the complexity of many applications, this is generally too restrictive. A good rule of thumb for DllMain is to postpone the initialization for as long as possible. Slower initialization increases how robust the application is, because this initialization is not performed while the loader lock is held. Also, slower initialization enables you to safely use much more of the Windows API.

Some initialization tasks cannot be postponed. For example, a DLL that depends on a configuration file will fail to load if the file is malformed or contains garbage. For this type of initialization, the DLLs should attempt to perform the action, and in the case of a failure, exit immediately rather than waste resources by doing some other work.

You should never perform the following tasks from within DllMain:

  • Call LoadLibrary or LoadLibraryEx (either directly or indirectly). This can cause a deadlock or a crash.
  • Call GetStringTypeA, GetStringTypeEx, or GetStringTypeW (either directly or indirectly). This can cause a deadlock or a crash.
  • Synchronize with other threads. This can cause a deadlock.
  • Acquire a synchronization object that is owned by code that is waiting to acquire the loader lock. This can cause a deadlock.
  • Initialize COM threads by using CoInitializeEx. Under certain conditions, this function can call LoadLibraryEx.
  • Call the registry functions. These functions are implemented in Advapi32.dll. If Advapi32.dll is not initialized before your DLL, the DLL can access uninitialized memory and cause the process to crash.
  • Call CreateProcess. Creating a process can load another DLL.
  • Call ExitThread. Exiting a thread during DLL detach can cause the loader lock to be acquired again, causing a deadlock or a crash.
  • Call CreateThread. Creating a thread can work if you do not synchronize with other threads, but it is risky.
  • Create a named pipe or other named object (Windows 2000 only). In Windows 2000, named objects are provided by the Terminal Services DLL. If this DLL is not initialized, calls to the DLL can cause the process to crash.
  • Use the memory management functions from the dynamic C Run-Time (CRT). If the CRT DLL is not initialized, calls to these functions can cause the process to crash.
  • Call functions in User32.dll or Gdi32.dll. Some functions load another DLL, which may not be initialized.
  • Use managed code.

Correct code

The code fragment from the LibreOffice project cited above may or may not work - it's all a matter of chance.

It's not easy to fix an error like this. You need to refactor your code in order to make the DllMain function as simple and short as possible.
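
As a hedged sketch (not LibreOffice's actual fix), one common approach is to keep DllMain trivial and move the thread creation into an explicitly exported function that the host calls after LoadLibrary() has returned. The StartParentMonitor name is hypothetical; ParentMonitorThreadProc and dwParentProcessId come from the fragment above:

#include <windows.h>

DWORD WINAPI ParentMonitorThreadProc(LPVOID lpParameter); // defined elsewhere

BOOL WINAPI DllMain( HINSTANCE hinstDLL,
                     DWORD fdwReason, LPVOID lpvReserved )
{
  // Do as little as possible while the loader lock is held.
  if (fdwReason == DLL_PROCESS_ATTACH)
    DisableThreadLibraryCalls(hinstDLL);
  return TRUE;
}

// Hypothetical exported entry point: a safe place for the deferred work.
extern "C" __declspec(dllexport)
BOOL StartParentMonitor(DWORD dwParentProcessId)
{
  DWORD dwThreadId = 0;
  HANDLE hThread = CreateThread( NULL, 0, ParentMonitorThreadProc,
                                 (LPVOID)(ULONG_PTR)dwParentProcessId,
                                 0, &dwThreadId );
  return hThread != NULL;
}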

Recommendation

It's hard to give recommendations. You can't know everything; everyone may encounter a mysterious error like this. A formal recommendation would sound like this: you should carefully read all the documentation for every program entity you work with. But you surely understand that one can't foresee every possible issue. You'd only spend all your time reading documentation then, have no time for programming. And even having read N pages, you couldn't be sure you haven't missed some article that could warn you against some trouble.

I wish I could give you somewhat more practical tips, but there is unfortunately only one thing I can think of: use static analyzers. No, it doesn't guarantee you will have zero bugs. Had there been an analyzer all those years ago, which could have told me that I couldn't call the Foo function in DllMain, I would have saved a lot of time and even more nerves: I really was angry, and going crazy, because of not being able to solve the task.

6. Check all the fragments where a pointer is explicitly cast to integer types

The fragment is taken from IPP Samples project. The error is detected by the following PVS-Studio diagnostic: V205 Explicit conversion of pointer type to 32-bit integer type: (unsigned long)(img)

void write_output_image(...., const Ipp32f *img,
                        ...., const Ipp32s iStep) {
  ...
  img = (Ipp32f*)((unsigned long)(img) + iStep);
  ...
}

Note. Some may say that this code isn't the best example for several reasons. We are not concerned about why a programmer would need to move along a data buffer in such a strange way. What matters to us is the fact that the pointer is explicitly cast to the "unsigned long" type. And only this. I chose this example purely because it is brief.

Explanation

A programmer wants to shift a pointer by a certain number of bytes. This code will execute correctly in Win32 mode because the pointer size there is the same as that of the long type. But if we compile a 64-bit version of the program, the pointer will become 64-bit, and casting it to long will cause the loss of the higher bits.

Note. Linux uses a different data model. In 64-bit Linux programs, the 'long' type is 64-bit too, but it's still a bad idea to use 'long' to store pointers there. First, such code tends to get into Windows applications quite often, where it becomes incorrect. Second, there are special types whose very names suggest that they can store pointers - for example, intptr_t. Using such types makes the program clearer.

In the example above, we can see a classic error which occurs in 64-bit programs. It should be said right off that there are lots of other errors, too, awaiting programmers on their way to 64-bit software development. But it is the writing of a pointer into a 32-bit integer variable that is the most widespread and insidious issue.

This error can be illustrated in the following way:

Figure 1. A) 32-bit program. B) 64-bit pointer refers to an object that is located in the lower addresses. C) 64-bit pointer is damaged.

Speaking of its insidiousness, this error is sometimes very difficult to notice. The program just "almost works". Errors causing the loss of the most significant bits in pointers may only show up after a few hours of intense use of the program. At first, memory is allocated at the lower addresses, so all the objects and arrays are stored within the first 4 GB of memory, and everything works fine.

As the program keeps running, the memory gets fragmented, and even if the program doesn't use much of it, new objects may be created outside those first 4 GB. This is where the troubles start. It's extremely difficult to purposely reproduce such issues.

Correct code

You can use such types as size_t, INT_PTR, DWORD_PTR, intptr_t, etc. to store pointers.

img = (Ipp32f*)((uintptr_t)(img) + iStep);

Actually, we can do without any explicit casting here. Nothing indicates that the data layout differs from the standard one - there is no magic with __declspec(align(#)) and the like - so the pointers are shifted by a number of bytes that is a multiple of sizeof(Ipp32f); otherwise we would get undefined behavior (see EXP36-C).

So, we can write it like this:

img += iStep / sizeof(*img);

Recommendation

Use special types to store pointers - forget about int and long. The most universal types for this purpose are intptr_t and uintptr_t. In Visual C++, the following types are available: INT_PTR, UINT_PTR, LONG_PTR, ULONG_PTR, DWORD_PTR. Their very names indicate that you can safely store pointers in them.

A pointer can fit into the types size_t and ptrdiff_t too, but I still wouldn't recommend using them for that, for they are originally intended for storing sizes and indices.
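
To illustrate why the dedicated types matter, here is a small hypothetical example assuming Windows' LLP64 model, where 'unsigned long' stays 32-bit while pointers are 64-bit:

#include <stdint.h>

float* shift(float *img, int iStep)
{
  // Safe: uintptr_t is guaranteed to be wide enough for an object pointer.
  return (float *)((uintptr_t)img + iStep);

  // Risky variant: the cast below drops the high bits of 'img' once
  // allocations leave the low 4 GB of the address space.
  // return (float *)((unsigned long)img + iStep);
}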

Note that you cannot store a pointer to a class member function in uintptr_t. Member functions are slightly different from ordinary functions: besides the address itself, such a pointer keeps hidden information related to 'this'. However, it doesn't matter much - in a 32-bit program you cannot assign such a pointer to unsigned int either. Such pointers are always handled in a special way, which is why there aren't many problems with them in 64-bit programs. At least I haven't seen such errors.

If you are going to compile your program into a 64-bit version, first, you need to review and fix all the code fragments where pointers are cast into 32-bit integer types. Reminder - there will be more troublesome fragments in the program, but you should start with the pointers.

For those who are creating or planning to create 64-bit applications, I suggest studying the following resource: Lessons on development of 64-bit C/C++ applications.

7. Do not call the alloca() function inside loops

This bug was found in Pixie project. The error is detected by the following PVS-Studio diagnostic: V505 The 'alloca' function is used inside the loop. This can quickly overflow stack.

inline  void  triangulatePolygon(....) {
  ...
  for (i=1;i<nloops;i++) {
    ...
    do {
      ...
      do {
        ...
        CTriVertex *snVertex =
          (CTriVertex *) alloca(2*sizeof(CTriVertex));
        ...
      } while(dVertex != loops[0]);
      ...
    } while(sVertex != loops[i]);
    ...
  }
  ...
}

Explanation

The alloca(size_t) function allocates memory by using the stack. Memory allocated by alloca() is freed when leaving the function.

There's usually not much stack memory allocated for a program. When you create a project in Visual C++, you can see that the default setting is just 1 megabyte for the stack size. This is why the alloca() function can very quickly use up all the available stack memory if called inside a loop.

In the example above, there are 3 nested loops at once. Therefore, triangulating a large polygon will cause a stack overflow.

It is also unsafe to use such macros as A2W in loops as they also contain a call of the alloca() function.

As we have already said, by default, Windows programs use a stack of 1 megabyte. This value can be changed: in the project settings, find and change the parameters 'Stack Reserve Size' and 'Stack Commit Size'. Details: "/STACK (Stack Allocations)". However, we should understand that making the stack bigger isn't a solution to the problem - you just postpone the moment when the program's stack will overflow.

Recommendation

Do not call the alloca() function inside loops. If you have a loop and need to allocate a temporary buffer, use one of the following 3 methods to do so:

  1. Allocate memory in advance, and then use one buffer for all the operations. If you need buffers of different sizes every time, allocate memory for the biggest one. If that's impossible (you don't know exactly how much memory it will require), use method 2.
  2. Make the loop body a separate function. In this case, the buffer will be created and destroyed right off at each iteration. If that's difficult too, there's only method N3 left.
  3. Replace alloca() with the malloc() function or new operator, or use a class such as std::vector. Take into account that memory allocation will take more time in this case. In the case of using malloc/new you will have to think about freeing it. On the other hand, you won't get a stack overflow when demonstrating the program on large data to the customer.
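
For instance, method 3 might look like this minimal sketch, with CTriVertex reduced to a stand-in struct and the surrounding Pixie logic elided:

#include <vector>

struct CTriVertex { float x, y, z; };   // stand-in for the real type

void triangulatePolygon(int nloops)
{
  std::vector<CTriVertex> scratch;      // lives outside all the loops
  for (int i = 1; i < nloops; i++)
  {
    scratch.resize(2);                  // allocates once, then reuses capacity
    CTriVertex *snVertex = scratch.data();
    // ... use snVertex the way the alloca()'d buffer was used ...
    (void)snVertex;
  }
}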

8. Remember that an exception in the destructor is dangerous

This issue was found in LibreOffice project. The error is detected by the following PVS-Studio diagnostic: V509 The 'dynamic_cast<T&>' operator should be located inside the try..catch block, as it could potentially generate an exception. Raising exception inside the destructor is illegal.

virtual ~LazyFieldmarkDeleter()
{
  dynamic_cast<Fieldmark&>
    (*m_pFieldmark.get()).ReleaseDoc(m_pDoc);
}

Explanation

When an exception is thrown in a program, the stack begins to unwind, and objects get destroyed by calling their destructors. If the destructor of an object being destroyed during stack unwinding throws another exception which leaves the destructor, the C++ runtime will immediately terminate the program by calling the terminate() function. What follows from this is the rule that destructors should never let exceptions out: an exception thrown inside a destructor must be handled inside the same destructor.
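
Here is a minimal sketch of that scenario (note that since C++11 destructors are noexcept by default, so the example has to opt out explicitly):

#include <stdexcept>

struct Bomb
{
  ~Bomb() noexcept(false)                // opt out of the C++11 default
  {
    throw std::runtime_error("from destructor");
  }
};

void f()
{
  Bomb b;
  throw std::runtime_error("original");  // unwinding destroys 'b', whose
}                                        // destructor throws a second
                                         // exception -> std::terminate()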

The code cited above is rather dangerous. The dynamic_cast operator will generate a std::bad_cast exception if it fails to cast an object reference to the required type.

Likewise, any other construct that can throw an exception is dangerous. For example, it's not safe to use the new operator to allocate memory in the destructor. If it fails, it will throw a std::bad_alloc exception.

Correct code:

The code can be fixed by using dynamic_cast not with a reference, but with a pointer. In this case, if it's impossible to convert the type of the object, it won't generate an exception but will return nullptr.

virtual ~LazyFieldmarkDeleter()
{
  auto p = dynamic_cast<Fieldmark*>(m_pFieldmark.get());
  if (p)
    p->ReleaseDoc(m_pDoc);
}

Recommendation

Make your destructors as simple as possible. Destructors aren't meant for memory allocation and file reading.

Of course, it's not always possible to make destructors simple, but I believe we should try to achieve that. Besides that, a destructor being complex is generally a sign of poor class design and ill-conceived solutions.

The more code you have in your destructor, the harder it is to foresee all possible issues. It becomes harder to tell which code fragment can or cannot throw an exception.

If there is some chance that an exception may occur, a good solution is usually to suppress it by using the catch(...):

virtual ~LazyFieldmarkDeleter()
{
  try
  {
    dynamic_cast<Fieldmark&>
      (*m_pFieldmark.get()).ReleaseDoc(m_pDoc);
  }
  catch (...)
  {
    assert(false);
  }
}

True, using it may conceal some error in the destructor, but it may also help the application to run more stably in general.

I'm not insisting on designing destructors to never throw exceptions - it all depends on the particular situation. Sometimes it's rather useful to generate an exception in the destructor. I have seen that in specialized classes, but these were rare cases. Those classes are designed in such a way that the objects generate an exception upon destruction. But if it is a usual class like "string", "dot", "brush", "triangle", or "document", then exceptions shouldn't be thrown from the destructor.

Just remember that a second exception thrown during stack unwinding causes program termination, so it's up to you to decide if you want this to happen in your project or not.

9. Use the '\0' literal for the terminal null character

The fragment is taken from Notepad++ project. The error is detected by the following PVS-Studio diagnostic: V528 It is odd that pointer to 'char' type is compared with the '\0' value. Probably meant: *headerM != '\0'.

TCHAR headerM[headerSize] = TEXT("");
...
size_t Printer::doPrint(bool justDoIt)
{
  ...
  if (headerM != '\0')
  ...
}

Explanation

Thanks to this code's author using the '\0' literal to denote the terminal null character, we can easily spot and fix the error. The author did a good job choosing the notation, but not such a good job writing the check.

Imagine this code were written in the following way:

if (headerM != 0)

The array address is verified against 0. This comparison doesn't make sense, as it is always true. What is that - an error or just a redundant check? It's hard to say, especially if it is someone else's code or code written a long time ago.

But since the programmer used the '\0' literal in this code, we can assume that the programmer wanted to check the value of one character. Besides, we know that comparing the headerM pointer with NULL doesn't make sense. All of that taken into account, we figure that the programmer wanted to find out if the string is empty or not but made a mistake when writing the check. To fix the code, we need to add a pointer dereferencing operation.

Correct code

TCHAR headerM[headerSize] = TEXT("");
...
size_t Printer::doPrint(bool justDoIt)
{
  ...
  if (*headerM != _T('\0'))
  ...
}

Recommendation

The number 0 may denote NULL, false, the null character '\0', or simply the value 0. So please don't be lazy - avoid using 0 for shorter notations in every single case. It only makes the code less comprehensible, and errors harder to find.

Use the following notations:

  • 0 - for integer zero;
  • nullptr - for null pointers in C++;
  • NULL - for null pointers in C;
  • '\0', L'\0', _T('\0') - for the terminal null;
  • 0.0, 0.0f - for zero in expressions with floating-point types;
  • false, FALSE - for the value 'false'.

Sticking to this rule will make your code clearer, and make it easier for you and other programmers to spot bugs during code reviews.

10. Avoid using multiple small #ifdef blocks

The fragment is taken from CoreCLR project. The error is detected by the following PVS-Studio diagnostic: V522 Dereferencing of the null pointer 'hp' might take place.

heap_segment* gc_heap::get_segment_for_loh (size_t size
#ifdef MULTIPLE_HEAPS
                                           , gc_heap* hp
#endif //MULTIPLE_HEAPS
                                           )
{
#ifndef MULTIPLE_HEAPS
    gc_heap* hp = 0;
#endif //MULTIPLE_HEAPS
    heap_segment* res = hp->get_segment (size, TRUE);
    if (res != 0)
    {
#ifdef MULTIPLE_HEAPS
        heap_segment_heap (res) = hp;
#endif //MULTIPLE_HEAPS
  ....
}

Explanation

I believe that #ifdef/#endif constructs are evil - an unavoidable evil, unfortunately. They are necessary and we have to use them. So I won't urge you to stop using #ifdef, there's no point in that. But I do want to ask you to be careful to not "overuse" it.

I guess many of you have seen code literally stuffed with #ifdefs. It's especially painful to deal with code where #ifdef is repeated every ten lines, or even more often. Such code is usually system-dependent, and you can't do without using #ifdef in it. That doesn't make you any happier, though.

See how difficult it is to read the code sample above! And code reading is what programmers have to do as their basic activity. Yes, I do mean it. We spend much more time reviewing and studying existing code than writing new code. That's why code which is hard to read reduces our efficiency so much, and leaves more chance for new errors to sneak in.

Getting back to our code fragment, the error is found in the null pointer dereferencing operation, and occurs when the MULTIPLE_HEAPS macro is not declared. To make it easier for you, let's expand the macros:

heap_segment* gc_heap::get_segment_for_loh (size_t size)
{
  gc_heap* hp = 0;
  heap_segment* res = hp->get_segment (size, TRUE);
  ....

The programmer declared the hp variable, initialized it to NULL, and dereferenced it right off. If MULTIPLE_HEAPS hasn't been defined, we'll get into trouble.

Correct code

This error is still living in CoreCLR (12.04.2016) despite a colleague of mine having reported it in the article "25 Suspicious Code Fragments in CoreCLR", and I'm not sure how best to fix it.

As I see it, since hp == nullptr here, the 'res' variable should be initialized to some other value, too - but I don't know what value exactly. So we'll have to do without the fix this time.

Recommendations

Eliminate small #ifdef/#endif blocks from your code - they make it really hard to read and understand! Code with "woods" of #ifdefs is harder to maintain and more prone to mistakes.

There is no recommendation to suit every possible case - it all depends on the particular situation. Anyway, just remember that #ifdef is a source of trouble, so you must always strive to keep your code as clear as possible.

Tip N1. Try to do without #ifdef.

#ifdef can sometimes be replaced with constants and the usual if operator. Compare the following two code fragments. A variant with macros:

#define DO 1

#ifdef DO
static void foo1()
{
  zzz();
}
#endif //DO

void F()
{
#ifdef DO
  foo1();
#endif // DO
  foo2();
}

This code is hard to read; you don't even feel like doing it. Bet you've skipped it, haven't you? Now compare it to the following:

const bool DO = true;

static void foo1()
{
  if (!DO)
    return;
  zzz();
}

void F()
{
  foo1();
  foo2();
}

It's much easier to read now. Some may argue the code has become less efficient since there is now a function call and a check in it. But I don't agree with that. First, modern compilers are pretty smart and you are very likely to get the same code without any extra checks and function calls in the release version. Second, the potential performance losses are too small to be bothered about. Neat and clear code is more important.

Tip N2. Make your #ifdef blocks larger.

If I were to write the get_segment_for_loh() function, I wouldn't use a number of #ifdefs there; I'd make two versions of the function instead. True, there'd be a bit more text then, but the functions would be easier to read, and edit too.

Again, some may argue that it's duplicated code, and since they have lots of lengthy functions with #ifdef in each, having two versions of each function may cause them to forget about one of the versions when fixing something in the other.

Hey, wait! And why are your functions lengthy? Single out the general logic into separate auxiliary functions - then both of your function versions will become shorter, ensuring that you will easily spot any differences between them.

I know this tip is not a cure-all. But do think about it.

Tip N3. Consider using templates - they might help.
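
As a hedged sketch of what that might look like (a made-up example, not CoreCLR code), a template parameter can replace an #ifdef while keeping both variants visible and type-checked:

#include <cstdio>

template <bool Verbose>
void report(const char *msg)
{
  if (Verbose)                           // compile-time constant condition;
    std::printf("[detail] %s\n", msg);   // the other branch is dead code
  else
    std::printf("%s\n", msg);
}

void F()
{
  report<true>("loading");               // "verbose build" behavior
  report<false>("loading");              // "quiet build" behavior
}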

Tip N4. Take your time and think it over before using #ifdef. Maybe you can do without it? Or maybe you can do with fewer #ifdefs, and keep this "evil" in one place?

11. Don't try to squeeze as many operations as possible in one line

The fragment is taken from Godot Engine project. The error is detected by the following PVS-Studio diagnostic: V567 Undefined behavior. The 't' variable is modified while being used twice between sequence points.

static real_t out(real_t t, real_t b, real_t c, real_t d)
{
  return c * ((t = t / d - 1) * t * t + 1) + b;
}

Explanation

Sometimes, you can come across code fragments where the authors try to squeeze as much logic as possible into a small volume of code, by means of complex constructs. This practice hardly helps the compiler, but it does make the code harder to read and understand for other programmers (or even the authors themselves). Moreover, the risk of making mistakes in such code is much higher, too.

It is in such fragments, where programmers try to put lots of code in just a few lines, that errors related to undefined behavior are generally found. They usually have to do with writing in and reading from one and the same variable within one sequence point. For a better understanding of the issue, we need to discuss in more detail the notions of "undefined behavior" and "sequence point".

Undefined behavior is the property of some programming languages to produce a result that depends on the compiler implementation or optimization switches. Some cases of undefined behavior (including the one being discussed here) are closely related to the notion of a "sequence point".

A sequence point is any point in a computer program's execution at which it is guaranteed that all side effects of previous evaluations have been performed, and no side effects from subsequent evaluations have yet been performed. In the C/C++ programming languages there are the following sequence points:

  • sequence points for operators "&&", "||", ",". When not overloaded, these operators guarantee left-to-right execution order;
  • sequence point for ternary operator "?:";
  • sequence point at the end of each full expression (usually marked with ';');
  • sequence point in place of the function call, but after evaluating the arguments;
  • sequence point when returning from the function.

Note. The new C++ standard has discarded the notion of a "sequence point", but we'll be using the above given explanation to let those of you unfamiliar with the subject, grasp the general idea easier and faster. This explanation is simpler than the new one, and is sufficient for us to understand why one shouldn't squeeze lots of operations into one "pile".

In the example we have started with, there is none of the above mentioned sequence points, while the '=' operator, as well as the parentheses, can't be treated as such. Therefore, we cannot know which value of the t variable will be used when evaluating the return value.

In other words, this expression contains none of those sequence points, so it is unknown in what order the t variable will be accessed. For instance, the "t * t" subexpression may be evaluated before or after the write performed by "t = t / d - 1".

Correct code

static real_t out(real_t t, real_t b, real_t c, real_t d)
{
  t = t / d - 1;
  return c * (t * t * t + 1) + b;
}

Recommendation

It obviously wasn't a good idea to try to fit the whole expression in one line. Besides it being difficult to read, it also made it easier for an error to sneak in.

Having fixed the defect and split the expression into two parts, we have solved 2 issues at once - made the code more readable, and gotten rid of undefined behavior by adding a sequence point.

The code discussed above is not the only example, of course. Here's another:

*(mem+addr++) =
   (opcode >= BENCHOPCODES) ? 0x00 : ((addr >> 4)+1) << 4;

Just as in the previous case, the error in this code has been caused by unreasonably complicated code. The programmer's attempt to increment the addr variable within one expression has led to undefined behavior as it is unknown which value the addr variable will have in the right part of the expression - the original or the incremented one.

The best solution to this problem is the same as before - do not complicate matters without reason; arrange operations in several expressions instead of putting them all in one:

*(mem+addr) = (opcode >= BENCHOPCODES) ? 0x00 : ((addr >> 4)+1) << 4;
addr++;

There is a simple yet useful conclusion to draw from all of this - do not try to fit a set of operations into as few lines as possible. It may be more preferable to split the code into several fragments, thus making it more comprehensible and reducing the chance of errors occurring.

Next time you're about to write complex constructs, pause for a while and think what using them will cost you, and if you are ready to pay that price.

12. When using Copy-Paste, be especially careful with the last lines

This bug was found in Source SDK library. The error is detected by the following PVS-Studio diagnostic: V525 The code containing the collection of similar blocks. Check items 'SetX', 'SetY', 'SetZ', 'SetZ'.

inline void SetX( float val );
inline void SetY( float val );
inline void SetZ( float val );
inline void SetW( float val );

inline void Init( float ix=0, float iy=0,
                  float iz=0, float iw = 0 )
{
  SetX( ix );
  SetY( iy );
  SetZ( iz );
  SetZ( iw );
}

Explanation

I'm 100% sure this code was written with the help of Copy-Paste. One of the first lines was copied several times, with certain letters changed in its duplicates. At the very end, this technique failed the programmer: his attention weakened, and he forgot to change letter 'Z' to 'W' in the last line.

In this example, we are not concerned about the fact of a programmer making a mistake; what matters is that it was made at the end of a sequence of monotonous actions.

I do recommend reading the article "The Last Line Effect". Due to public interest a scientific version of it also got published.

Put briefly, when copying code fragments through the Copy-Paste method, it is highly probable that you will make a mistake at the very end of the sequence of copied lines. It's not my guess, it's statistical data.

Correct code

{
  SetX( ix );
  SetY( iy );
  SetZ( iz );
  SetW( iw );
}

Recommendation

I hope you have already read the article I've mentioned above. So, once again, we are dealing with the following phenomenon. When writing similarly looking code blocks, programmers copy and paste code fragments with slight changes. While doing so, they tend to forget to change certain words or characters, and it most often happens at the end of a sequence of monotonous actions because their attention weakens.

To reduce the number of such mistakes, here are a few tips for you:

  1. Arrange your similar looking code blocks in "tables": it should make mistakes more prominent. We will discuss the "table" code layout in the next section. Perhaps in this case the table layout wasn't of much help, but still it's a very useful thing in programming.
  2. Be very careful and attentive when using Copy-Paste. Stay focused, and double-check the code you have written - especially the last few lines.
  3. You have now learned about the last line effect; try to keep this in mind, and tell your colleagues about it. The very fact of you knowing how such errors occur, should help you avoid them.
  4. Share the link to the "The Last Line Effect" article with your colleagues.

13. Table-style formatting

Fragment taken from the ReactOS project (open-source operating system compatible with Windows). The error is detected by the following PVS-Studio diagnostic: V560 A part of conditional expression is always true: 10035L.

void adns__querysend_tcp(adns_query qu, struct timeval now) {
  ...
  if (!(errno == EAGAIN || EWOULDBLOCK ||
        errno == EINTR || errno == ENOSPC ||
        errno == ENOBUFS || errno == ENOMEM)) {
  ...
}

Explanation

The code sample given above is small and you can easily spot the error in it. But when dealing with real-life code, bugs are often very hard to notice. When reading code like that, you tend to unconsciously skip blocks of similar comparisons and go on to the next fragment.

The reason it happens is that the conditions are poorly formatted: paying close attention to them requires a certain effort, so we assume that since the checks are similar, there are hardly any mistakes in the condition and everything should be fine.

One of the ways out is formatting the code as a table.

If you felt too lazy to search for an error in the code above, I'll tell you: "errno ==" is missing in one of the checks. It results in the condition always being true, as EWOULDBLOCK is not equal to zero.

Correct code

if (!(errno == EAGAIN || errno == EWOULDBLOCK ||
      errno == EINTR || errno == ENOSPC ||
      errno == ENOBUFS || errno == ENOMEM)) {

Recommendation

For a start, here's a version of this code formatted in the simplest "table" style. I don't like it actually.

if (!(errno == EAGAIN  || EWOULDBLOCK     ||
      errno == EINTR   || errno == ENOSPC ||
      errno == ENOBUFS || errno == ENOMEM)) {

It's better now, but not quite.

There are two reasons why I don't like this layout. First, the error is still not clearly visible; second, you have to insert too many spaces to align the code.

That's why we need to make two improvements in this formatting style. The first one is we need to use no more than one comparison per line: it makes errors easy to notice. For example:

a == 1 &&
b == 2 &&
c      &&
d == 3 &&

The second improvement is to write operators &&, ||, etc., in a more rational way, i.e. on the left instead of on the right.

See how tedious it is to align code by means of spaces:

x == a          &&
y == bbbbb      &&
z == cccccccccc &&

Writing operators on the left makes it much faster and easier:

   x == a
&& y == bbbbb
&& z == cccccccccc

The code looks a bit odd, but you'll get used to it very soon.

Let's combine these two improvements to write our code sample in the new style:

if (!(   errno == EAGAIN
      || EWOULDBLOCK
      || errno == EINTR
      || errno == ENOSPC
      || errno == ENOBUFS
      || errno == ENOMEM)) {

Yes, it's longer now - yet the error has become clearly visible, too.

I agree that it looks strange, but nevertheless I do recommend this technique. I've been using it myself for half a year now and enjoy it very much, so I'm confident about this recommendation.

I don't find it a problem at all that the code has become longer. I'd even write it in a way like this:

const bool error =    errno == EAGAIN
                   || errno == EWOULDBLOCK
                   || errno == EINTR
                   || errno == ENOSPC
                   || errno == ENOBUFS
                   || errno == ENOMEM;
if (!error) {

Feel disappointed with the code being too lengthy and cluttered? I agree. So let's make it a function!

static bool IsInterestingError(int err)  // 'errno' is a macro, so it cannot
{                                        // be used as a parameter name
  return    err == EAGAIN
         || err == EWOULDBLOCK
         || err == EINTR
         || err == ENOSPC
         || err == ENOBUFS
         || err == ENOMEM;
}
....
if (!IsInterestingError(errno)) {

You may think that I'm dramatizing things, being too much of a perfectionist. But I assure you that errors are very common in complex expressions, and I wouldn't ever bring them up were they not so frequent. They are everywhere. And they are very difficult to notice.

Here's another example from WinDjView project:

inline bool IsValidChar(int c)
{
  return c == 0x9 || 0xA || c == 0xD ||
         c >= 0x20 && c <= 0xD7FF ||
         c >= 0xE000 && c <= 0xFFFD ||
         c >= 0x10000 && c <= 0x10FFFF;
}

The function consists of just a few lines, but it still has an error. The function always returns true. The reason, in the long run, has to do with poor formatting and programmers maintaining the code for many years being unwilling to read it carefully.

Let's refactor this code in the "table" style; I'd also add some parentheses:

inline bool IsValidChar(int c)
{
  return
       c == 0x9
    || 0xA
    || c == 0xD
    || (c >= 0x20    && c <= 0xD7FF)
    || (c >= 0xE000  && c <= 0xFFFD)
    || (c >= 0x10000 && c <= 0x10FFFF);
}

You don't have to format your code exactly the way I suggest. The aim of this post is to draw your attention to typos in "chaotically" written code. By arranging it in the "table" style, you can avoid lots of silly typos, and that's already great. So I hope this post will help you.

Note

Being completely honest, I have to warn you that "table" formatting may sometimes cause harm. Check this example:

inline
void elxLuminocity(const PixelRGBi& iPixel,
                   LuminanceCell< PixelRGBi >& oCell)
{
  oCell._luminance = 2220*iPixel._red +
                     7067*iPixel._blue +
                     0713*iPixel._green;
  oCell._pixel = iPixel;
}

It's taken from the eLynx SDK project. The programmer wanted to align the code, so he added 0 before the value 713. Unfortunately, he forgot that 0 being the first digit in a number means that this number is octal.

An array of strings

I hope the idea of table formatting of code is clear, but I feel like giving a couple more examples. Let's have a look at one more case. By bringing it here, I am showing that table formatting should be used not only with conditions, but also with various other constructs of the language.

The fragment is taken from Asterisk project. The error is detected by the following PVS-Studio diagnostic: V653 A suspicious string consisting of two parts is used for array initialization. It is possible that a comma is missing. Consider inspecting this literal: "KW_INCLUDES""KW_JUMP".

static char *token_equivs1[] =
{
  ....
  "KW_IF",
  "KW_IGNOREPAT",
  "KW_INCLUDES""KW_JUMP",
  "KW_MACRO",
  "KW_PATTERN",
  ....
};

There is a typo here - one comma is forgotten. As a result, two strings with completely different meanings are merged into one, i.e. we actually have:

  ...."KW_INCLUDESKW_JUMP",
  ....

The error could have been avoided if the programmer had used the table formatting. Then an omitted comma would be easy to spot.

static char *token_equivs1[] =
{
  ....
  "KW_IF"        ,
  "KW_IGNOREPAT" ,
  "KW_INCLUDES"  ,
  "KW_JUMP"      ,
  "KW_MACRO"     ,
  "KW_PATTERN"   ,
  ....
};

Just like last time, note that if we put the delimiter on the right (a comma in this case), we have to add a lot of spaces, which is inconvenient. It is especially inconvenient when a new long line/phrase is added: we would have to reformat the entire table.

That's why I would again recommend formatting the table in the following way:

static char *token_equivs1[] =
{
  ....
  , "KW_IF"
  , "KW_IGNOREPAT"
  , "KW_INCLUDES"
  , "KW_JUMP"
  , "KW_MACRO"
  , "KW_PATTERN"
  ....
};

Now it's very easy to spot a missing comma and there is no need to use a lot of spaces - the code is beautiful and intuitive. Perhaps this way of formatting may seem unusual, but you quickly get used to it - try it yourself.

Finally, here is my short motto. As a rule, beautiful code is usually correct code.

14. A good compiler and coding style aren't always enough

We have already spoken about good styles of coding, but this time we'll have a look at an anti-example. It's not enough to write good code: there can be various errors and a good programming style isn't always a cure-all.

The fragment is taken from PostgreSQL. The error is detected by the following PVS-Studio diagnostic: V575 The 'memcmp' function processes '0' elements. Inspect the third argument.

Cppcheck analyzer can also detect such errors. It issues a warning: Invalid memcmp() argument nr 3. A non-boolean value is required.

Datum pg_stat_get_activity(PG_FUNCTION_ARGS)
{
  ....
  if (memcmp(&(beentry->st_clientaddr), &zero_clientaddr,
             sizeof(zero_clientaddr) == 0))
  ....
}

Explanation

A closing parenthesis is put in a wrong place. It's just a typo, but unfortunately it completely alters the meaning of the code.

The sizeof(zero_clientaddr) == 0 expression always evaluates to 'false', as the size of any object is always larger than 0. The false value converts to 0, which results in the memcmp() function comparing 0 bytes. Having done so, the function assumes that the arrays are equal and returns 0. It means that the condition in this code sample can be reduced to if (false).

Correct code

if (memcmp(&(beentry->st_clientaddr), &zero_clientaddr,
           sizeof(zero_clientaddr)) == 0)

Recommendation

It's just the case when I can't suggest any safe coding technique to avoid typos. The only thing I can think of is "Yoda conditions", when constants are written to the left of the comparison operator:

if (0 == memcmp(&(beentry->st_clientaddr), &zero_clientaddr,
                sizeof(zero_clientaddr)))

But I won't recommend this style. I don't like it and don't use it, for two reasons:

First, it makes conditions less readable. I don't know how to put it exactly, but it's not without reason that this style is named after Yoda.

Second, they don't help anyway if we deal with parentheses put in a wrong place. There are lots of ways you can make a mistake. Here's an example of code where using the Yoda conditions didn't prevent the incorrect arrangement of parentheses:

if (0 == LoadStringW(hDllInstance, IDS_UNKNOWN_ERROR,
        UnknownError,
        sizeof(UnknownError) / sizeof(UnknownError[0] -
        20)))

This fragment is taken from the ReactOS project. The error is difficult to notice, so let me point it out for you: sizeof(UnknownError[0] - 20).

So Yoda conditions are useless here.

We could invent some artificial style to ensure that every closing parenthesis stands under the opening one. But it will make the code too bulky and ugly, and no one will be willing to write it that way.

So, again, there is no coding style I could recommend to avoid writing closing parentheses in wrong places.

And here's where the compiler should come in handy and warn us about such a strange construct, shouldn't it? Well, it should but it doesn't. I run Visual Studio 2015, specify the /Wall switch... and don't get any warning. But we can't blame the compiler for that, it has enough work to do as it is.

The most important conclusion to draw from today's post is that good coding style and a good compiler (and I do like the compiler in VS2015) are not always enough. I sometimes hear statements like, "You only need to set the compiler warnings at the highest level and use good style, and everything's going to be OK." No, it's not like that. I don't mean to say some programmers are bad at coding; it's just that every programmer makes mistakes. Everyone, no exceptions. Many of your typos are going to sneak past the compiler and good coding style.

So the combo of good style + compiler warnings is important but not sufficient. That's why we need to use a variety of bug search methods. There's no silver bullet; the high quality of code can be only achieved through a combination of several techniques.

The error we are discussing here can be found by means of the following methods:

  • code review;
  • unit-tests;
  • manual testing;
  • static code analysis;
  • etc.

I suppose you have already guessed that I am personally interested in the static code analysis methodology most of all. By the way, it is most appropriate for solving this particular issue because it can detect errors at the earliest stage, i.e. right after the code has been written.

Indeed, this error can be easily found by such tools as Cppcheck or PVS-Studio.

Conclusion. Some people don't get that having skill isn't enough to avoid mistakes. Everyone makes them - it's inevitable. Even super-gurus make silly typos every now and then. And since it's inevitable, it doesn't make sense to blame programmers, bad compilers, or bad style. It's just not going to help. Instead, we should use a combination of various software quality improvement techniques.

15. Start using enum class in your code, if possible

All the examples of this error I have are large. I've picked one of the smallest, but it's still quite lengthy. Sorry for that.

This bug was found in Source SDK library. The error is detected by the following PVS-Studio diagnostic: V556 The values of different enum types are compared: Reason == PUNTED_BY_CANNON.

enum PhysGunPickup_t
{
  PICKED_UP_BY_CANNON,
  PUNTED_BY_CANNON,
  PICKED_UP_BY_PLAYER,
};

enum PhysGunDrop_t
{
  DROPPED_BY_PLAYER,
  THROWN_BY_PLAYER,
  DROPPED_BY_CANNON,
  LAUNCHED_BY_CANNON,
};

void CBreakableProp::OnPhysGunDrop(...., PhysGunDrop_t Reason)
{
  ....
  if( Reason == PUNTED_BY_CANNON )
  {
    PlayPuntSound();
  }
  ....
}

Explanation

The Reason variable is an enumeration of the PhysGunDrop_t type. This variable is compared to the named constant PUNTED_BY_CANNON belonging to another enumeration, this comparison being obviously a logical error.

This bug pattern is quite widespread. I came across it even in such projects as Clang, TortoiseGit, and Linux Kernel.

The reason why it is so frequent is that enumerations are not type safe in standard C++; you may easily get confused about what should be compared with what.

Correct code

I don't know for sure what the correct version of this code should look like. My guess is that PUNTED_BY_CANNON should be replaced with DROPPED_BY_CANNON or LAUNCHED_BY_CANNON. Let it be LAUNCHED_BY_CANNON.

if( Reason == LAUNCHED_BY_CANNON )
{
  PlayPuntSound();
}

Recommendation

Consider yourself lucky if you write in C++: I recommend that you start using enum class right now, and the compiler won't let you compare values that refer to different enumerations. You won't be comparing pounds with inches anymore.

There are certain innovations in C++ I don't have much confidence in. Take, for instance, the auto keyword. I believe it may be harmful when used too often. Here's how I see it: programmers spend more time reading code than writing it, so we must ensure that the program text is easy to read. In the C language, variables are declared at the beginning of the function, so when you edit the code in the middle or at the end of it, it's not always easy to figure out what some Alice variable actually means. That's why there exists a variety of variable naming notations. For instance, there is the prefix notation, where pfAlice may stand for a "pointer to float".

In C++, you can declare variables whenever you need, and it is considered a good style. Using prefixes and suffixes in variable names is no longer popular. And here the auto keyword emerges, resulting in programmers starting to use multiple mysterious constructs of the "auto Alice = Foo();" kind again. Alice, who the fuck is Alice?!

Sorry for digressing from our subject. I wanted to show you that some of the new features may do both good and bad. But it's not the case with enum class: I do believe it does only good.

When using enum class, you must explicitly specify the enumeration to which a named constant belongs. This protects the code from new errors. That is, the code will look like this:

enum class PhysGunDrop_t
{
  DROPPED_BY_PLAYER,
  THROWN_BY_PLAYER,
  DROPPED_BY_CANNON,
  LAUNCHED_BY_CANNON,
};

void CBreakableProp::OnPhysGunDrop(...., PhysGunDrop_t Reason)
{
  ....
  if( Reason == PhysGunDrop_t::LAUNCHED_BY_CANNON )
  {
    PlayPuntSound();
  }
  ....
}

True, fixing old code may involve certain difficulties. But I do urge you to start using enum class in new code right from this day on. Your project will only benefit from it.
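
As a quick illustration (a minimal sketch with made-up enumerations), the compiler simply refuses to compare constants from different scoped enumerations:

enum class Color { Red, Green };
enum class Fruit { Apple, Orange };

bool isRed(Color c)
{
  // return c == Fruit::Apple;  // compilation error: no operator== for these types
  return c == Color::Red;       // OK: constants of the same enumeration
}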

I don't see much point in describing enum class in more detail here. Here are a few links for you to learn all the details about this new wonderful feature of the C++11 language:

  1. Wikipedia. C++11. Strongly typed enumerations.
  2. Cppreference. Enumeration declaration.
  3. StackOverflow. Why is enum class preferred over plain enum?

16. "Look what I can do!" - Unacceptable in programming

This section will be slightly similar to "Don't try to squeeze as many operations as possible in one line", but this time I want to focus on a different thing. Sometimes it feels like programmers are competing against somebody, trying to write the shortest code possible.

I am not speaking about complicated templates. This is a different topic for discussion, as it is very hard to draw a line between where these templates do harm, and where they do good. Now I am going to touch upon a simpler situation which is relevant for both C and C++ programmers. They tend to make the constructions more complicated, thinking, "I do it because I can".

The fragment is taken from KDE4 project. The error is detected by the following PVS-Studio diagnostic: V593 Consider reviewing the expression of the 'A = B == C' kind. The expression is calculated as following: 'A = (B == C)'.

void LDAPProtocol::del( const KUrl &_url, bool )
{
  ....
  if ( (id = mOp.del( usrc.dn() ) == -1) ) {
    LDAPErr();
    return;
  }
  ret = mOp.waitForResult( id, -1 );
  ....
}

Explanation

After looking at this code, I always have questions such as: What was the point of doing it? Did you want to save a line? Did you want to show that you can combine several actions in one expression?

As a result we have a typical error pattern - using expressions of the if (A = Foo() == Error) kind.

The precedence of the comparison operation is higher than that of the assignment operation. That's why the "mOp.del( usrc.dn() ) == -1" comparison is executed first, and only then the true (1) or false (0) value is assigned to the id variable.

If mOp.del() returns '-1', the function will terminate; otherwise, it will keep running and the 'id' variable will be assigned an incorrect value. So it will always equal 0.

Correct code

I want to emphasize: adding extra parentheses is not a solution to the problem. Yes, the error can be eliminated. But it's the wrong way.

There were additional parentheses in the code - have a closer look. It's difficult to say what they were meant for; perhaps the programmer wanted to get rid of a compiler warning. Perhaps he suspected that the operator precedence might not be what he needed, and tried to fix the issue, but failed to do so. Anyway, those extra parentheses don't help.

There is a deeper problem here. If it is possible not to make the code more complicated, don't. It is better to write:

id = mOp.del(usrc.dn());
if ( id == -1 ) {

Recommendation

Don't be too lazy to write an extra line of code: complex expressions are hard to read, after all. Do the assignment first, and only then the comparison. This will make things easier for the programmers who maintain your code later, and it will also reduce the chances of making a mistake.

So my conclusion is - don't try to show off.

This tip sounds trivial, but I hope it will help you. It's always better to write clear and neat code, instead of in a "see how cool I am" style.

17. Use dedicated functions to clear private data

The fragment is taken from the Apache HTTP Server project. The error is detected by the following PVS-Studio diagnostic: V597 The compiler could delete the 'memset' function call, which is used to flush 'x' buffer. The RtlSecureZeroMemory() function should be used to erase the private data.

static void MD4Transform(
  apr_uint32_t state[4], const unsigned char block[64])
{
  apr_uint32_t a = state[0], b = state[1],
               c = state[2], d = state[3],
               x[APR_MD4_DIGESTSIZE];
  ....
  /* Zeroize sensitive information. */
  memset(x, 0, sizeof(x));
}

Explanation

In this code the programmer uses a call of the memset() function to erase private data. But it's not the best way to do that, because the data may not actually be erased. To be more exact, whether or not they will be erased depends on the compiler, its settings, and the Moon phase.

Try to look at this code from the compiler's viewpoint. It does its best to make your code work as fast as possible, so it carries out a number of optimizations. One of them is to remove the calls of functions which don't affect the program's behavior, and are therefore excessive from the viewpoint of the C/C++ language. This is exactly the case with the memset() function in the code sample above. True, this function changes the 'x' buffer, but this buffer is not used anywhere after that, which means the call of the memset() function can - and ought to - be deleted.

Important! What I'm telling you now is not a theoretical model of the compiler's behavior - it's a real-life one. In such cases, the compiler does remove the calls of the memset() function. You can do a few experiments to check it for yourself. For more details and examples on this issue, please see the following articles:

  1. Security, security! But do you test it?
  2. Safe Clearing of Private Data.
  3. V597. The compiler could delete the 'memset' function call, which is used to flush 'Foo' buffer. The RtlSecureZeroMemory() function should be used to erase the private data
  4. Zero and forget -- caveats of zeroing memory in C (see also the discussion of this article).
  5. MSC06-C. Beware of compiler optimizations.
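
For instance, here is a minimal sketch of the pattern these articles discuss. The helper functions are hypothetical and declared only to make the fragment self-contained:

#include <string.h>

/* Hypothetical helpers, declared only so the sketch is self-contained. */
void get_password(char *buf, size_t size);
int  check_password(const char *buf);

int handle_login(void)
{
  char password[64];
  get_password(password, sizeof(password));
  int ok = check_password(password);
  /* 'password' is never read after this point, so an optimizing
     compiler may treat this memset() as a dead store and remove it. */
  memset(password, 0, sizeof(password));
  return ok;
}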

What makes this error with removed memset() calls especially tricky is that it's very hard to track down. When working in the debugger, you will most likely be dealing with unoptimized code, with the function call still there. You can only find the error when studying the assembler listing generated while building the optimized version of the application.

Some programmers believe that it has to do with a bug in the compiler, and that it has no right to throw away the calls of such an important function as memset(). But this is not the case. This function is by no means more, or less, important than any other, so the compiler has full right to optimize the code where it is called. After all, such code may turn out to be excessive indeed.

Correct code

memset_s(x, sizeof(x), 0, sizeof(x));

or

RtlSecureZeroMemory(x, sizeof(x));

Recommendation

You should use special memory clearing functions that the compiler is not allowed to remove for its optimization purposes.

Visual Studio, for instance, offers the RtlSecureZeroMemory function; and starting with C11, you can use the memset_s function. If necessary, you can even create a safe function of your own - there are lots of examples on the Internet. Here are a couple of them.

Version No.1.

errno_t memset_s(void *v, rsize_t smax, int c, rsize_t n) {
  if (v == NULL) return EINVAL;
  if (smax > RSIZE_MAX) return EINVAL;
  if (n > smax) return EINVAL;
  volatile unsigned char *p = v;
  while (smax-- && n--) {
    *p++ = c;
  }
  return 0;
}

Version No.2.

void secure_zero(void *s, size_t n)
{
    volatile char *p = s;
    while (n--) *p++ = 0;
}

Some programmers even go further, and implement functions to fill the array with pseudo-random values, these functions running at different times to ensure better protection from time-measuring attacks. You can find the implementations of such functions on the internet, too.

18. The knowledge you have from working with one language isn't always applicable to another language

The fragment is taken from Putty project. Ineffective code is detected by the following PVS-Studio diagnostic: V814 Decreased performance. Calls to the 'strlen' function have been made multiple times when a condition for the loop's continuation was calculated.

static void tell_str(FILE * stream, char *str)
{
  unsigned int i;
  for (i = 0; i < strlen(str); ++i)
    tell_char(stream, str[i]);
}

Explanation

There's no actual error here, but such code can be extremely inefficient when we deal with long strings, as the strlen() function is called on every loop iteration. So the error, if there is one here, is one of inefficiency.

This kind of thing is typically found in code written by those who have previously worked with Pascal (or Delphi). In Pascal, the terminating condition of a loop is evaluated just once, so this code is suitable and quite commonly used there.

Let's have a look at an example of code written in Pascal. The word called will be printed only once, because the pstrlen() is called only once.

program test;
var
  i   : integer;
  str : string;

function pstrlen(str : string): integer;
begin
  writeln('called');
  pstrlen := Length(str);
end;

begin
  str := 'a pascal string';
  for i:= 1 to pstrlen(str) do
    writeln(str[i]);
end.

Effective code:

static void tell_str(FILE * stream, char *str)
{
  size_t i;
  const size_t len = strlen(str);
  for (i = 0; i < len; ++i)
    tell_char(stream, str[i]);
}

Recommendation

Don't forget that in C/C++ the loop termination condition is re-evaluated before each and every iteration. Therefore it's not a good idea to call slow, inefficient functions as part of this evaluation, especially if you can compute the value just once, before the loop is entered.

In some cases the compiler might be able to optimize the code with strlen(). For instance, if the pointer always refers to the same string literal, but we shouldn't rely on that in any way.

19. How to properly call one constructor from another

This issue was found in LibreOffice project. The error is detected by the following PVS-Studio diagnostic: V603 The object was created but it is not being used. If you wish to call constructor, 'this->Guess::Guess(....)' should be used.

Guess::Guess()
{
  language_str = DEFAULT_LANGUAGE;
  country_str = DEFAULT_COUNTRY;
  encoding_str = DEFAULT_ENCODING;
}

Guess::Guess(const char * guess_str)
{
  Guess();
  ....
}

Explanation

Good programmers hate writing duplicate code. And that's great. But when dealing with constructors, many shoot themselves in the foot, trying to make their code short and neat.

You see, a constructor can't simply be called like an ordinary function. If we write "A::A(int x) { A(); }", it will lead to the creation of a temporary unnamed object of the A type, instead of a call to the constructor without arguments.

This is exactly what happens in the code sample above: a temporary unnamed object Guess() is created and gets immediately destroyed, while the class member language_str and others remain uninitialized.

Correct code:

There used to be 3 ways to avoid duplicate code in constructors. Let's see what they were.

The first way is to implement a separate initialization function, and call it from both constructors. I'll spare you the examples - it should be obvious as it is.

That's a fine, reliable, clear, and safe technique. However, some bad programmers want to make their code even shorter. So I have to mention two other methods.

They are pretty dangerous, and require you to have a good understanding of how they work, and what consequences you may have to face.

The second way:

Guess::Guess(const char * guess_str)
{
  new (this) Guess();
  ....
}

Third way:

Guess::Guess(const char * guess_str)
{
  this->Guess();
  ....
}

The second and the third variants are rather dangerous, because the base classes are initialized twice. Such code can cause subtle bugs, and do more harm than good. Let's consider examples of where such a constructor call is appropriate, and where it is not.

Here is a case where everything is fine:

class SomeClass
{
  int x, y;
public:
  SomeClass() { new (this) SomeClass(0,0); }
  SomeClass(int xx, int yy) : x(xx), y(yy) {}
};

The code is safe and works well since the class only contains simple data types, and is not derived from other classes. A double constructor call won't pose any danger.

And here's another example where explicitly calling a constructor will cause an error:

class Base
{
public:
 char *ptr;
 std::vector<int> vect;
 Base() { ptr = new char[1000]; }
 ~Base() { delete [] ptr; }
};

class Derived : Base
{
  Derived(Foo foo) { }
  Derived(Bar bar) {
     new (this) Derived(bar.foo);
  }
  Derived(Bar bar, int) {
     this->Derived(bar.foo);
  }
};

So we call the constructor using the expressions "new (this) Derived(bar.foo);" or "this->Derived(bar.foo)".

The Base object is already created, and the fields are initialized. Calling the constructor once again will cause double initialization. As a result, a pointer to the newly allocated memory chunk will be written into ptr, which will result in a memory leak. As for double initialization of an object of the std::vector type, the consequences of it are even harder to predict. One thing is clear: code like that is not permissible.

Do you need all that headache, after all? If you can't utilize C++11's features, then use method No. 1 (create an initialization function). An explicit constructor call may be only needed on very rare occasions.

Recommendation

And now we have a feature to help us with the constructors, at last!

C++11 allows constructors to call other peer constructors (known as delegation). This allows constructors to utilize another constructor's behavior with a minimum of added code.

For example:

Guess::Guess(const char * guess_str) : Guess()
{
  ....
}

To learn more about delegating constructors, see the following links:

  1. Wikipedia. C++11. Object construction improvement.
  2. C++11 FAQ. Delegating constructors.
  3. MSDN. Uniform Initialization and Delegating Constructors.

20. The End-of-file (EOF) check may not be enough

The fragment is taken from SETI@home project. The error is detected by the following PVS-Studio diagnostic: V663 Infinite loop is possible. The 'cin.eof()' condition is insufficient to break from the loop. Consider adding the 'cin.fail()' function call to the conditional expression.

template <typename T>
std::istream &operator >>(std::istream &i, sqlblob<T> &b)
{
  ....
  while (!i.eof())
  {
    i >> tmp;
    buf+=(tmp+' ');
  }
  ....
}

Explanation

The operation of reading data from a stream object is not as trivial as it may seem at first. When reading data from streams, programmers usually call the eof() method to check whether the end of the stream has been reached. This check, however, is insufficient: it doesn't allow you to find out whether any data reading errors or stream integrity failures have occurred, which may cause certain issues.

Note. The information provided in this article concerns both input and output streams. To avoid repetition, we'll only discuss one type of stream here.

This is exactly the mistake the programmer made in the code sample above: if any data reading error occurs, an infinite loop follows, as the eof() method will always return false. On top of that, incorrect data will be processed in the loop, as unknown values will be getting into the tmp variable.

To avoid issues like that, we need to use additional methods to check the stream status: bad(), fail().

Correct code

Let's take advantage of the fact that the stream can be implicitly cast to the bool type. The true value indicates that a value was read successfully. More details about the way this code works can be found on StackOverflow.

template <typename T>
std::istream &operator >>(std::istream &i, sqlblob<T> &b)
{
  ....
  while (i >> tmp)
  {
    buf+=(tmp+' ');
  }
  ....
}

Recommendation

When reading data from a stream, don't use the eof() method only; check for any failures, too.

Use the methods bad() and fail() to check the stream status. The first method is used to check stream integrity failures, while the second is for checking data reading errors.

However, it's much more convenient to use the bool() operator, as shown in the example of the correct code.
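
To sum it up, here is a minimal sketch of robust stream reading; the file name and the output messages are made up:

#include <fstream>
#include <iostream>

int main()
{
  std::ifstream in("data.txt");  // hypothetical input file
  int value;
  while (in >> value)            // the loop stops on EOF or on any failure
    std::cout << value << '\n';

  if (in.bad())
    std::cerr << "stream integrity failure\n";
  else if (in.fail() && !in.eof())
    std::cerr << "data reading error: malformed input\n";
}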

21. Check that the end-of-file character is reached correctly (EOF)

Let's continue the topic of working with files. And again we'll have a look at EOF. But this time we'll speak about a bug of a completely different type. It usually reveals itself in localized versions of software.

The fragment is taken from Computational Network Toolkit. The error is detected by the following PVS-Studio diagnostic: V739 EOF should not be compared with a value of the 'char' type. The 'c' should be of the 'int' type.

string fgetstring(FILE* f)
{
  string res;
  for (;;)
  {
    char c = (char) fgetc(f);
    if (c == EOF)
      RuntimeError("error reading .... 0: %s", strerror(errno));
    if (c == 0)
      break;
    res.push_back(c);
  }
  return res;
}

Explanation

Let's look at the way EOF is declared:

#define EOF (-1)

As you can see, EOF is nothing more than -1 of the int type. The fgetc() function returns a value of the int type: namely, it can return a number from 0 to 255, or -1 (EOF). The values read are placed into a variable of the char type. Because of this, a character with the value 0xFF (255) turns into -1, and is then handled in the same way as the end of file (EOF).

Users that use extended ASCII codes may encounter an error when one of the characters of their alphabet is handled incorrectly by the program.

For example, in the Windows-1251 code page, the last letter of the Russian alphabet has the code 0xFF, and so is interpreted by the program as the end-of-file character.
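
Here is a tiny sketch demonstrating the trap; the behavior assumes a platform where char is signed, as on x86:

#include <stdio.h>

int main(void)
{
  char c = (char)0xFF;  /* e.g. the last letter of the Russian alphabet
                           in the Windows-1251 code page */
  if (c == EOF)         /* where char is signed, c promotes to -1 */
    printf("0xFF is mistaken for EOF\n");
  return 0;
}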

Correct code

for (;;)
{
  int c = fgetc(f);
  if (c == EOF)
    RuntimeError("error reading .... 0: %s", strerror(errno));
  if (c == 0)
    break;
  res.push_back(static_cast<char>(c));
}

Recommendation

There is probably no particular recommendation here, but as we are speaking about EOF, I wanted to show an interesting variant of an error that some people aren't aware of.

Just remember: if a function returns a value of the int type, don't hasten to change it into char. Stop and check that everything is fine. By the way, we have already seen a similar case when discussing the memcmp() function in Chapter N2 - "Larger than 0 does not mean 1" (see the fragment about a vulnerability in MySQL).

22. Do not use #pragma warning(default:X)

The fragment is taken from TortoiseGIT project. The error is detected by the following PVS-Studio diagnostic: V665 Possibly, the usage of '#pragma warning(default: X)' is incorrect in this context. The '#pragma warning(push/pop)' should be used instead.

#pragma warning(disable:4996)
LONG result = regKey.QueryValue(buf, _T(""), &buf_size);
#pragma warning(default:4996)

Explanation

Programmers often assume that warnings disabled with the "pragma warning(disable: X)" directive earlier will start working again after the "pragma warning(default: X)" directive is used. But it is not so. The 'pragma warning(default: X)' directive sets the 'X' warning to its DEFAULT state, which is not quite the same thing.

Suppose that a file is compiled with the /Wall switch used. The C4061 warning must be generated in this case. If you add the "#pragma warning(default : 4061)" directive, this warning will not be displayed, as it is turned off by default.

Correct code

#pragma warning(push)
#pragma warning(disable:4996)
LONG result = regKey.QueryValue(buf, _T(""), &buf_size);
#pragma warning(pop)

Recommendation

The correct way to return the previous state of a warning is to use directives "#pragma warning(push[ ,n ])" and "#pragma warning(pop)". See the Visual C++ documentation for descriptions of these directives: Pragma Directives. Warnings.

Library developers should pay special attention to the V665 warning. Careless warning customization may cause a whole lot of trouble on the library users' side.

A good article on this topic: So, You Want to Suppress This Warning in Visual C++

23. Evaluate the string literal length automatically

The fragment is taken from the OpenSSL library. The error is detected by the following PVS-Studio diagnostic: V666 Consider inspecting the third argument of the function 'strncmp'. It is possible that the value does not correspond with the length of a string which was passed with the second argument.

if (!strncmp(vstart, "ASCII", 5))
  arg->format = ASN1_GEN_FORMAT_ASCII;
else if (!strncmp(vstart, "UTF8", 4))
  arg->format = ASN1_GEN_FORMAT_UTF8;
else if (!strncmp(vstart, "HEX", 3))
  arg->format = ASN1_GEN_FORMAT_HEX;
else if (!strncmp(vstart, "BITLIST", 3))
  arg->format = ASN1_GEN_FORMAT_BITLIST;
else
  ....

Explanation

It's very hard to stop using magic numbers. Also, it would be very unreasonable to get rid of constants such as 0, 1, -1, and 10. It's rather difficult to come up with names for such constants, and often they would make the code harder to read.

However, it's very useful to reduce the number of magic numbers. For example, it would be helpful to get rid of magic numbers which define the length of string literals.

Let's have a look at the code given earlier. The code was most likely written using the Copy-Paste method. A programmer copied the line:

else if (!strncmp(vstart, "HEX", 3))

After that "HEX" was replaced by "BITLIST", but the programmer forgot to change 3 to 7. As a result, the string is not compared with "BITLIST", only with "BIT". This error might not be a crucial one, but still it is an error.

It's really bad that the code was written using Copy-Paste. What's worse is that the string length was defined by a magic constant. From time to time we come across such errors, where the string length does not correspond to the indicated number of characters because of a typo or the carelessness of a programmer. So it's quite a typical error, and we have to do something about it. Let's look closely at the question of how to avoid such errors.

Correct code

At first it may seem that it's enough to replace the strncmp() call with strcmp(). Then the magic constant will disappear.

else if (!strcmp(vstart, "HEX"))

Too bad - we have changed the logic of the code. The strncmp() function checks if the string starts with "HEX", while the strcmp() function checks if the strings are equal. These are different checks.

The easiest way to fix this is to change the constant:

else if (!strncmp(vstart, "BITLIST", 7))
  arg->format = ASN1_GEN_FORMAT_BITLIST;

This code is correct, but it is very bad because the magic 7 is still there. That's why I would recommend a different method.

Recommendation

Such an error can be prevented if we explicitly evaluate the string length in the code. The easiest option is to use the strlen() function.

else if (!strncmp(vstart, "BITLIST", strlen("BITLIST")))

In this case it will be much easier to detect a mismatch if you forget to fix one of the strings:

else if (!strncmp(vstart, "BITLIST", strlen("HEX")))

But the suggested variant has two disadvantages:

  1. There is no guarantee that the compiler will optimize the strlen() call and replace it with a constant.
  2. You have to duplicate the string literal. It does not look graceful, and can be the subject of a possible error.

The first issue can be dealt with by using special structures for literal length evaluation during the compilation phase. For instance, you can use a macro such as:

#define StrLiteralLen(arg) ((sizeof(arg) / sizeof(arg[0])) - 1)
....
else if (!strncmp(vstart, "BITLIST", StrLiteralLen("BITLIST")))

But this macro can be dangerous. The following code can appear during the refactoring process:

const char *StringA = "BITLIST";
if (!strncmp(vstart, StringA, StrLiteralLen(StringA)))

In this case the StrLiteralLen macro will return nonsense: depending on the pointer size (4 or 8 bytes), we will get the value 3 or 7. But in the C++ language we can protect ourselves from this unpleasant case by using a more complicated trick:

template <typename T, size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define StrLiteralLen(str) (sizeof(ArraySizeHelper(str)) - 1)

Now, if the argument of the StrLiteralLen macro is a simple pointer, we won't be able to compile the code.

Let's have a look at the second issue (duplicating of the string literal). I have no idea what to say to C programmers. You can write a special macro for it, but personally I don't like this variant. I am not a fan of macros. That's why I don't know what to suggest.

In C++ everything is fabulously awesome. Moreover, we solve the first problem in a really smart way: a template function will be of great help to us. You can write it in different ways, but in general it will look like this:

template<typename T, size_t N>
int mystrncmp(const T *a, const T (&b)[N])
{
  return _tcsnccmp(a, b, N - 1);
}

Now the string literal is used only once. The string literal length is evaluated during the compilation phase. You cannot accidentally pass a simple pointer to the function and incorrectly evaluate the string length. Presto!
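
A usage sketch, assuming the same vstart buffer and arg structure as in the OpenSSL fragment above:

else if (!mystrncmp(vstart, "BITLIST"))
  arg->format = ASN1_GEN_FORMAT_BITLIST;

The literal's length is deduced from the array type at compile time, so the error-prone third argument simply disappears.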

Summary: try to avoid magic numbers when working with strings. Use macros or template functions; the code will become not only safer, but more beautiful and shorter.

As an example, you can look at the declaration of the strcpy_s() function:

errno_t strcpy_s(
   char *strDestination,
   size_t numberOfElements,
   const char *strSource
);
template <size_t size>
errno_t strcpy_s(
   char (&strDestination)[size],
   const char *strSource
); // C++ only

The first variant is intended for the C language, or for cases when the buffer size is not known in advance. If we work with a buffer created on the stack, we can use the second variant in C++:

char str[BUF_SIZE];
strcpy_s(str, "foo");

There are no magic numbers, there is no evaluation of the buffer size at all. It's short and sweet.

24. Override and final specifiers should become your new friends

The fragment is taken from the MFC library. The error is detected by the following PVS-Studio diagnostic: V301 Unexpected function overloading behavior. See first argument of function 'WinHelpW' in derived class 'CFrameWndEx' and base class 'CWnd'.

class CWnd : public CCmdTarget {
  ....
  virtual void WinHelp(DWORD_PTR dwData,
                       UINT nCmd = HELP_CONTEXT);
  ....
};
class CFrameWnd : public CWnd {
  ....
};
class CFrameWndEx : public CFrameWnd {
  ....
  virtual void WinHelp(DWORD dwData,
                       UINT nCmd = HELP_CONTEXT);
  ....
};

Explanation

When you override a virtual function it's quite easy to make an error in the signature and to define a new function, which won't be in any way connected with the function in the base class. There can be various errors in this case.

  1. Another type is used in the parameter of the overridden function.
  2. The overridden function has a different number of parameters, this can be especially crucial when there are many parameters.
  3. The overridden function differs in const modifier.
  4. The base class function is not a virtual one. It was assumed that the function in the derived class would override it in the base class, but in reality it hides it.

The same error can occur during the change of types or parameter quantity in the existing code, when the programmer changed the virtual function signature in almost the entire hierarchy, but forgot to do it in some derived class.

This error can appear particularly often during porting to the 64-bit platform, when the DWORD type is replaced with DWORD_PTR, LONG with LONG_PTR, and so on. This is exactly our case.

Even with such an error, the 32-bit version will work correctly, as both DWORD and DWORD_PTR are synonyms of unsigned long; but the 64-bit version will contain an error, because there DWORD_PTR is a synonym of unsigned __int64.

Correct code

class CFrameWndEx : public CFrameWnd {
  ....
  virtual void WinHelp(DWORD_PTR dwData,
                       UINT nCmd = HELP_CONTEXT) override;
  ....
};

Recommendation

Now we have a way to protect ourselves from the error we described above. Two new specifiers were added in C++11:

  • override - to indicate that the method overrides a virtual method in a base class;
  • final - to indicate that derived classes are not allowed to override this virtual method.

We are interested in the override specifier. This is an indication for the compiler to check if the virtual function is really overriding the base class function, and to issue an error if it isn't.

If override had been used when declaring the WinHelp function in the CFrameWndEx class, we would have got a compilation error in the 64-bit version of the application. Thus the error could have been prevented at an early stage.
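
Here is a minimal sketch of the same kind of mismatch, with plain integer types standing in for DWORD and DWORD_PTR (the names are made up):

struct Base
{
  virtual void WinHelp(long long dwData, unsigned nCmd = 0);
};

struct DerivedOk : Base
{
  void WinHelp(long long dwData, unsigned nCmd = 0) override;  // OK: overrides
};

struct DerivedBad : Base
{
  // Compilation error: marked 'override', but the first parameter
  // type differs, so it does not override anything in Base.
  void WinHelp(int dwData, unsigned nCmd = 0) override;
};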

Always use the override specifier (or final) when overriding virtual functions. More details about override and final can be seen here:

25. Do not compare 'this' to nullptr anymore

The fragment is taken from CoreCLR project. This dangerous code is detected by the following PVS-Studio diagnostic: V704 'this == nullptr' expression should be avoided - this expression is always false on newer compilers, because 'this' pointer can never be NULL.

bool FieldSeqNode::IsFirstElemFieldSeq()
{
  if (this == nullptr)
    return false;
  return m_fieldHnd == FieldSeqStore::FirstElemPseudoField;
}

Explanation

People used to compare this pointer with 0 / NULL / nullptr. It was a common situation when C++ was only in the beginning of its development. We have found such fragments doing "archaeological" research. I suggest reading about them in an article about checking Cfront. Moreover, in those days the value of this pointer could be changed, but it was so long ago that it was forgotten.

Let's go back to the comparison of this with nullptr.

Now it is illegal. According to modern C++ standards, this can NEVER be equal to nullptr.

Formally the call of the IsFirstElemFieldSeq() method for a null-pointer this according to C++ standard leads to undefined behavior.

It may seem that if this == 0, then there is no access to the fields of the class while the method is executed. But in reality such code can go wrong in at least two ways. According to the C++ standard, the this pointer can never be null, so the compiler can optimize the method call by simplifying it to:

bool FieldSeqNode::IsFirstElemFieldSeq()
{
  return m_fieldHnd == FieldSeqStore::FirstElemPseudoField;
}

There is one more pitfall, by the way. Suppose there is the following inheritance hierarchy.

class X: public Y, public FieldSeqNode { .... };
....
X * nullX = NULL;
nullX->IsFirstElemFieldSeq();

Suppose that the Y class size is 8 bytes. Then the source null pointer (0x00000000) will be adjusted so that it points to the beginning of the FieldSeqNode subobject; that is, it will be offset by sizeof(Y) bytes. So this in the IsFirstElemFieldSeq() function will be 0x00000008, and the "this == 0" check has completely lost its sense.

Correct code

It's really hard to give an example of correct code here. It won't be enough to just remove the condition from the function; you have to refactor the code in such a way that the function is never called through a null pointer.
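
One possible direction (just a sketch of the idea, not the actual CoreCLR fix) is to turn the method into a static member function that receives the pointer explicitly, so the null check no longer relies on a dereferenced this:

// Sketch: a static member function takes the pointer explicitly,
// and the callers pass the possibly-null pointer openly.
bool FieldSeqNode::IsFirstElemFieldSeq(const FieldSeqNode *node)
{
  if (node == nullptr)
    return false;
  return node->m_fieldHnd == FieldSeqStore::FirstElemPseudoField;
}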

Recommendation

So, now the "if (this == nullptr)" is outlawed. However, you can see this code in many applications and libraries quite often (MFC library for instance). That's why Visual C++ is still diligently comparing this to 0. I guess the compiler developers are not so crazy as to remove code that has been working properly for a dozen years.

But the law was enacted. So for a start let's avoid comparing this to null. And once you have some free time, it will be really useful to check out all the illegal comparisons, and rewrite the code.

Most likely the compilers will act in the following way: first they will give us comparison warnings (perhaps they already do; I haven't studied this question), and then at some point they'll fully support the new standard, and your code will cease working altogether. So I strongly recommend that you start obeying the law; it will be helpful later on.

P.S. When refactoring you may need the Null object pattern.

Additional links on the topic:

  1. Still Comparing "this" Pointer to Null?
  2. Diagnostic V704.

26. Insidious VARIANT_BOOL

The fragment is taken from NAME project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V721 The VARIANT_BOOL type is utilized incorrectly. The true value (VARIANT_TRUE) is defined as -1. Inspect the first argument.

virtual HRESULT __stdcall
  put_HandleKeyboard (VARIANT_BOOL pVal) = 0;
....
pController->put_HandleKeyboard(true);

Explanation:

There is quite a witty quote:

We all truck around a kind of original sin from having learned Basic at an impressionable age. (C) P.J. Plauger

And this hint is exactly on the topic of that original sin. The VARIANT_BOOL type came to us from Visual Basic, and some of our present-day programming troubles are connected with it. The thing is that "true" is coded as -1 in this type.

Let's see the declaration of the type and the constants denoting true/false:

typedef short VARIANT_BOOL;

#define VARIANT_TRUE ((VARIANT_BOOL)-1)

#define VARIANT_FALSE ((VARIANT_BOOL)0)

It seems like there is nothing terrible in it: false is 0, and true is anything that is not 0, so -1 is quite a suitable constant. But it's very easy to make an error by using true or TRUE instead of VARIANT_TRUE.
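
Here is a small sketch of how this bites in practice (assuming the Windows SDK headers are available):

#include <wtypes.h>  // VARIANT_BOOL, VARIANT_TRUE (Windows SDK)

void Example()
{
  VARIANT_BOOL b = true;  // 'true' silently converts to 1, not to -1
  if (b == VARIANT_TRUE)  // 1 != -1, so this branch is NOT taken
  {
    // never reached
  }
}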

Correct code

pController->put_HandleKeyboard(VARIANT_TRUE);

Recommendation

If you see an unknown type, it's better not to hurry; look it up in the documentation first. Even if the type name contains the word BOOL, it doesn't mean that you can place 1 into a variable of this type.

In the same way programmers sometimes make mistakes, when they use HRESULT type, trying to compare it with FALSE or TRUE and forgetting that:

#define S_OK     ((HRESULT)0L)
#define S_FALSE  ((HRESULT)1L)

So I really ask you to be very careful with any types which are new to you, and not to hasten when programming.

27. Guileful BSTR strings

Let's talk about one more nasty data type - BSTR (Basic string or binary string).

The fragment is taken from VirtualBox project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V745 A 'wchar_t *' type string is incorrectly converted to 'BSTR' type string. Consider using 'SysAllocString' function.

....
HRESULT EventClassID(BSTR bstrEventClassID);
....
hr = pIEventSubscription->put_EventClassID(
                    L"{d5978630-5b9f-11d1-8dd2-00aa004abd5e}");

Explanation

Here's how a BSTR type is declared:

typedef wchar_t OLECHAR;
typedef OLECHAR * BSTR;

At first glance it seems that "wchar_t *" and BSTR are one and the same thing. But this is not so, and that brings a lot of confusion and errors.

Let's talk about BSTR type to get a better idea of this case.

Here is the information from the MSDN site. Reading MSDN documentation isn't much fun, but we have to do it.

A BSTR (Basic string or binary string) is a string data type that is used by COM, Automation, and Interop functions. Use the BSTR data type in all interfaces that will be accessed from script. BSTR description:

  1. Length prefix. A four-byte integer that contains the number of bytes in the following data string. It appears immediately before the first character of the data string. This value does not include the terminating null character.
  2. Data string. A string of Unicode characters. May contain multiple embedded null characters.
  3. Terminator. Two null characters.

A BSTR is a pointer. The pointer points to the first character of the data string, not to the length prefix. BSTRs are allocated using COM memory allocation functions, so they can be returned from methods without concern for memory allocation. The following code is incorrect:

BSTR MyBstr = L"I am a happy BSTR";

This code builds (compiles and links) correctly, but it will not function properly because the string does not have a length prefix. If you use a debugger to examine the memory location of this variable, you will not see a four-byte length prefix preceding the data string. Instead, use the following code:

BSTR MyBstr = SysAllocString(L"I am a happy BSTR");

A debugger that examines the memory location of this variable will now reveal a length prefix containing the value 34. This is the expected value for a 17-byte single-character string that is converted to a wide-character string through the inclusion of the "L" string modifier. The debugger will also show a two-byte terminating null character (0x0000) that appears after the data string.

If you pass a simple Unicode string as an argument to a COM function that is expecting a BSTR, the COM function will fail.

I hope this is enough to understand why we should separate the BSTR and simple strings of "wchar_t *" type.

Additional links:

  1. MSDN. BSTR.
  2. StackOverfow. Static code analysis for detecting passing a wchar_t* to BSTR.
  3. StackOverfow. BSTR to std::string (std::wstring) and vice versa.
  4. Robert Pittenger. Guide to BSTR and CString Conversions.
  5. Eric Lippert. Eric's Complete Guide To BSTR Semantics.

Correct code

hr = pIEventSubscription->put_EventClassID(
       SysAllocString(L"{d5978630-5b9f-11d1-8dd2-00aa004abd5e}"));

Recommendation

The tip resembles the previous one: if you see an unknown type, it's better not to hurry, and to look it up in the documentation. This is important to remember, so it's no big deal that this tip has been repeated once again.

28. Avoid using a macro if you can use a simple function

The fragment is taken from ReactOS project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V640 The code's operational logic does not correspond with its formatting. The second statement will always be executed. It is possible that curly brackets are missing.

#define stat64_to_stat(buf64, buf)   \
    buf->st_dev   = (buf64)->st_dev;   \
    buf->st_ino   = (buf64)->st_ino;   \
    buf->st_mode  = (buf64)->st_mode;  \
    buf->st_nlink = (buf64)->st_nlink; \
    buf->st_uid   = (buf64)->st_uid;   \
    buf->st_gid   = (buf64)->st_gid;   \
    buf->st_rdev  = (buf64)->st_rdev;  \
    buf->st_size  = (_off_t)(buf64)->st_size;  \
    buf->st_atime = (time_t)(buf64)->st_atime; \
    buf->st_mtime = (time_t)(buf64)->st_mtime; \
    buf->st_ctime = (time_t)(buf64)->st_ctime; \

int CDECL _tstat(const _TCHAR* path, struct _stat * buf)
{
  int ret;
  struct __stat64 buf64;

  ret = _tstat64(path, &buf64);
  if (!ret)
    stat64_to_stat(&buf64, buf);
  return ret;
}

Explanation

This time the code example will be quite lengthy. Fortunately it's rather easy, so it shouldn't be hard to understand.

There was the following idea. If you manage to get file information by means of _tstat64() function, then put these data into the structure of _stat type. We use a stat64_to_stat macro to save data.

The macro is incorrectly implemented. The operations it executes are not grouped in blocks with curly brackets { }. As a result the conditional operator body is only the first string of the macro. If you expand the macro, you'll get the following:

if (!ret)
  buf->st_dev   = (&buf64)->st_dev;
buf->st_ino   = (&buf64)->st_ino;
buf->st_mode  = (&buf64)->st_mode;

Consequently, the majority of the structure members are copied regardless of whether the information was successfully received or not.

This is certainly an error, but in practice it's not a fatal one. The uninitialized memory cells are just copied in vain. We had a bit of luck here. But I've come across more serious errors, connected with such poorly written macros.

Correct code

The easiest variant is just to add curly brackets to the macro. Adding do { .... } while (0) is a slightly better variant: then you can put a semicolon ';' after the macro call, as if it were a function call.

#define stat64_to_stat(buf64, buf)   \
  do { \
    buf->st_dev   = (buf64)->st_dev;   \
    buf->st_ino   = (buf64)->st_ino;   \
    buf->st_mode  = (buf64)->st_mode;  \
    buf->st_nlink = (buf64)->st_nlink; \
    buf->st_uid   = (buf64)->st_uid;   \
    buf->st_gid   = (buf64)->st_gid;   \
    buf->st_rdev  = (buf64)->st_rdev;  \
    buf->st_size  = (_off_t)(buf64)->st_size;  \
    buf->st_atime = (time_t)(buf64)->st_atime; \
    buf->st_mtime = (time_t)(buf64)->st_mtime; \
    buf->st_ctime = (time_t)(buf64)->st_ctime; \
  } while (0)

Recommendation

I cannot say that macros are my favorite. I know there is no way to code without them, especially in C. Nevertheless I try to avoid them if possible, and would like to appeal to you not to overuse them. My macro hostility has three reasons:

  • It's hard to debug the code.
  • It's much easier to make an error.
  • The code gets hard to understand, especially when some macros use other macros.

A lot of other errors are connected with macros. The one I've given as an example shows very clearly that sometimes we don't need macros at all. I really cannot grasp why the authors didn't use a simple function instead. The advantages of a function over a macro:

  • The code is simpler. You don't have to spend additional time writing it and aligning the wacky \ symbols.
  • The code is more reliable (the error given as an example won't be possible in the code at all)

Concerning the disadvantages, I can only think of optimization. Yes, the function is called but it's not that serious at all.

However, let's suppose that it's a crucial thing for us, and meditate on the topic of optimization. First of all, there is the nice keyword inline which you can use. Secondly, it would be appropriate to declare the function as static. I reckon that can be enough for the compiler to inline this function and not make a separate body for it.

In point of fact you don't have to worry about it at all, as compilers have become really smart. Even if you write a function without any inline/static, the compiler will inline it if it considers that worthwhile. But don't bother going into such details. It's much better to write simple and understandable code; it'll bring more benefit.

To my mind, the code should be written like this:

static void stat64_to_stat(const struct __stat64 *buf64,
                           struct _stat *buf)
{
  buf->st_dev   = buf64->st_dev;
  buf->st_ino   = buf64->st_ino;
  buf->st_mode  = buf64->st_mode;
  buf->st_nlink = buf64->st_nlink;
  buf->st_uid   = buf64->st_uid;
  buf->st_gid   = buf64->st_gid;
  buf->st_rdev  = buf64->st_rdev;
  buf->st_size  = (_off_t)buf64->st_size;
  buf->st_atime = (time_t)buf64->st_atime;
  buf->st_mtime = (time_t)buf64->st_mtime;
  buf->st_ctime = (time_t)buf64->st_ctime;
}

Actually, we can make even more improvements here. In C++, for example, it's better to pass not pointers, but references. The usage of pointers without a preliminary check doesn't really look graceful. But that is a different story, which I won't go into in a section on macros.

29. Use a prefix increment operator (++i) in iterators instead of a postfix (i++) operator

The fragment is taken from the Unreal Engine 4 project. Ineffective code is detected by the following PVS-Studio diagnostic: V803 Decreased performance. In case 'itr' is iterator it's more effective to use prefix form of increment. Replace iterator++ with ++iterator.

void FSlateNotificationManager::GetWindows(....) const
{
  for( auto Iter(NotificationLists.CreateConstIterator());
       Iter; Iter++ )
  {
    TSharedPtr<SNotificationList> NotificationList = *Iter;
    ....
  }
}

Explanation

If you hadn't read the title of the article, I think it would've been quite hard to notice an issue in the code. At first sight, it looks like the code is quite correct, but it's not perfect. Yes, I am talking about the postfix increment - 'Iter++'. Instead of a postfix form of the increment iterator, you should rather use a prefix analogue, i.e. to substitute 'Iter++' for '++Iter'. Why should we do it, and what's the practical value of it? Here is the story.

Effective code:

for( auto Iter(NotificationLists.CreateConstIterator());
     Iter; ++Iter)

Recommendation

The difference between the prefix and postfix forms is well known to everybody. I hope that the internal structure distinctions (which show us the operational principles) are not a secret as well. If you have ever overloaded these operators, then you must be aware of it. If not, I'll give a brief explanation. (All the others can skip this paragraph and go to the one that follows the code examples with operator overloading.)

The prefix increment operator changes an object's state, and returns itself in the changed form. No temporary objects required. Then the prefix increment operator may look like this:

MyOwnClass& operator++()
{
  ++meOwnField;
  return (*this);
}

The postfix operator also changes the object's state, but returns the previous state of the object. It does so by creating a temporary object, so the postfix increment operator overloading code will look like this:

MyOwnClass operator++(int)
{
  MyOwnClass tmp = *this;
  ++(*this);
  return tmp;
}

Looking at these code fragments, you can see that the postfix form involves an additional operation: creating a temporary object. How crucial is it in practice?

Today's compilers are smart enough to do the optimization, and to not create temporary objects if they are of no use. That's why in the Release version it's really hard to see the difference between 'it++' and '++it'.

But it is a completely different story when debugging the program in the Debug-mode. In this case the difference in the performance can be really significant.

For example, in this article there are some estimates of the code running time using prefix and postfix forms of increment operators in the Debug version. We see that using the postfix form takes almost 4 times longer.

Those who say, "So what? It's all the same in the Release version!" will be right and wrong at the same time. As a rule, we spend more time working with the Debug version while doing unit tests and debugging the program. So quite a good deal of time is spent working with the Debug version of the software, which means that we don't want to waste time waiting.

In general, I think we've managed to answer the question "Should we use the prefix increment operator (++i) instead of a postfix operator (i++) for iterators?" Yes, you really should. You'll get a nice speed-up in the Debug version. And if the iterators are quite "heavy", the benefit will be even more appreciable.

References (reading recommendation):

30. Visual C++ and wprintf() function

The fragment is taken from Energy Checker SDK. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V576 Incorrect format. Consider checking the second actual argument of the 'wprintf' function. The pointer to string of wchar_t type symbols is expected.

int main(void) {
  ...
  char *p = NULL;
  ...
  wprintf(
    _T("Using power link directory: %s\n"),
    p
  );
  ...
}

Explanation

Note: the first error is the usage of _T for specifying a wide-character string; the L prefix would be the correct variant here. However, this mistake is not a crucial one and is not of big interest to us: if the code is not compiled in a wide-character configuration, _T will expand into nothing and the code simply won't compile.

If you want a wprintf() function to print a char* type string, you should use "%S" in the format string.

Many Linux programmers don't see where the pitfall is. The thing is that Microsoft implemented functions such as wprintf() quite strangely. If we work in Visual C++ with the wprintf function, we should use "%s" to print wide-character strings, while to print char* strings we need "%S". So it's just a weird case. Those who develop cross-platform applications quite often fall into this trap.

Correct code

The code I give here as a way to correct the issue is really not the most graceful one, but I still want to show the main point of the correction.

char *p = NULL;
...
#if defined(_WIN32)
wprintf(L"Using power link directory: %S\n", p);
#else
wprintf(L"Using power link directory: %s\n", p);
#endif

Recommendation

I don't have any particular recommendation here. I just wanted to warn you about some surprises you may get if you use functions such as wprintf().

Starting from Visual Studio 2015, a solution was suggested for writing portable code: for compatibility with ISO C (C99), you should define the _CRT_STDIO_ISO_WIDE_SPECIFIERS macro for the preprocessor.

In this case the code:

const wchar_t *p = L"abcdef";
const char *x = "xyz";
wprintf(L"%S %s", p, x);

is correct.

The analyzer knows about _CRT_STDIO_ISO_WIDE_SPECIFIERS and takes it into account when doing the analysis.

By the way, if you turn on the compatibility mode with ISO C (the _CRT_STDIO_ISO_WIDE_SPECIFIERS macro is declared), you can get the old behavior back by using the "%Ts" format specifier.

In general, the story about wide-character strings is quite intricate, and goes beyond the scope of one short article. To investigate the topic more thoroughly, I recommend doing some reading on the topic:

31. In C and C++ arrays are not passed by value

The fragment is taken from the game 'Wolf'. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V511 The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (src)' expression.

ID_INLINE mat3_t::mat3_t( float src[ 3 ][ 3 ] ) {
  memcpy( mat, src, sizeof( src ) );
}

Explanation

Sometimes programmers forget that in C/C++ you cannot pass an array to a function by value: a pointer to the array is passed as an argument instead. The numbers in square brackets mean nothing; they only serve as a hint to the programmer about the expected array size. In fact, you can pass an array of a completely different size. For example, the following code will compile successfully:

void F(int p[10]) { }
void G()
{
  int p[3];
  F(p);
}

Correspondingly, the sizeof(src) operator evaluates not the size of the array, but the size of a pointer. As a result, memcpy() will only copy part of the array - namely, 4 or 8 bytes, depending on the size of the pointer (exotic architectures don't count).

Correct code

The simplest variant of such code can be like this:

ID_INLINE mat3_t::mat3_t( float src[ 3 ][ 3 ] ) {
  memcpy(mat, src, sizeof(float) * 3 * 3);
}

Recommendation

There are several ways of making your code more secure.

The array size is known. You can make the function take a reference to an array. But not everyone knows that you can do this, and even fewer people are aware of how to write it. So I hope that this example will be interesting and useful:

ID_INLINE mat3_t::mat3_t( float (&src)[3][3] )
{
  memcpy( mat, src, sizeof( src ) );
}

Now it will be possible to pass only an array of the right size to the function. And most importantly, the sizeof() operator will evaluate the size of the array, not of a pointer.

Yet another way of solving this problem is to start using the std::array class.
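
For instance, here is a sketch with a hypothetical Mat3 alias; std::array always knows its own size and copies element-wise:

#include <array>

using Mat3 = std::array<std::array<float, 3>, 3>;

void copyMatrix(Mat3 &dst, const Mat3 &src)
{
  dst = src;  // element-wise copy; no sizeof()-of-a-pointer pitfalls
}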

The array size is not known. Some authors of books on programming advise using the std::vector class and other similar classes, but in practice it's not always convenient.

Sometimes you want to work with a simple pointer. In this case you should pass two arguments to the function: a pointer, and the number of elements. However, in general this is bad practice, and it can lead to a lot of bugs.

In such cases, some thoughts given in "C++ Core Guidelines" can be useful to read. I suggest reading "Do not pass an array as a single pointer". All in all it would be a good thing to read the "C++ Core Guidelines" whenever you have free time. It contains a lot of useful ideas.

32. Dangerous printf

The fragment is taken from TortoiseSVN project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V618 It's dangerous to call the 'printf' function in such a manner, as the line being passed could contain format specification. The example of the safe code: printf("%s", str);

BOOL CPOFile::ParseFile(....)
{
  ....
  printf(File.getloc().name().c_str());
  ....
}

Explanation

When you want to print or, for example, to write a string to the file, many programmers write code that resembles the following:

printf(str);
fprintf(file, str);

A good programmer should always remember that these are extremely unsafe constructions. The thing is that if a format specifier somehow gets inside the string, it will lead to unpredictable consequences.

Let's go back to the original example. If the file name is "file%s%i%s.txt", the program may crash or print rubbish. But that's only half of the trouble. In fact, such a function call is a real vulnerability: one can attack programs with its help. Having prepared strings in a special way, one can print private data stored in memory.

More information about these vulnerabilities can be found in this article. Take some time to look through it; I'm sure it will be interesting. You'll find not only theoretical basis, but practical examples as well.

Correct code

printf("%s", File.getloc().name().c_str());

Recommendation

Printf()-like functions can cause a lot of security-related issues. It is better not to use them at all, but to switch to something more modern. For example, you may find boost::format or std::stringstream quite useful.
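
For illustration, here is a minimal std::stringstream sketch; the file name is a made-up hostile-looking string:

#include <iostream>
#include <sstream>
#include <string>

int main()
{
  std::string name = "file%s%i%s.txt";  // hostile-looking input is harmless here
  std::ostringstream out;
  out << "Parsing " << name << '\n';    // nothing is treated as a format string
  std::cout << out.str();
}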

In general, sloppy usage of functions like printf(), sprintf(), and fprintf() not only can lead to incorrect operation of the program, but can also cause potential vulnerabilities that someone may take advantage of.

33. Never dereference null pointers

This bug was found in GIT's source code. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V595 The 'tree' pointer was utilized before it was verified against nullptr. Check lines: 134, 136.

void mark_tree_uninteresting(struct tree *tree)
{
  struct object *obj = &tree->object;
  if (!tree)
    return;
  ....
}

Explanation

There is no doubt that it's bad practice to dereference a null pointer, because the result of such dereferencing is undefined behavior. We all agree about the theoretical basis behind this.

But when it comes to practice, programmers start debating. There are always people who claim that this particular code will work correctly. They even bet their lives on it - it has always worked for them! And then I have to give more reasons to prove my point. That's why this section is another attempt to change their minds.

I have deliberately chosen an example that will provoke discussion. After the tree pointer is dereferenced, the class member isn't actually read; only the address of this member is evaluated. Then, if tree == nullptr, the address of the member isn't used in any way, and the function exits. Many consider this code to be correct.

But it is not so. You shouldn't code this way. Undefined behavior is not necessarily a program crash when a value is written to a null address, or something like that. Undefined behavior can be anything. As soon as you have dereferenced a pointer that is equal to null, you get undefined behavior. There is no point in further discussion about the way the program will operate; it can do whatever it wants.

One sign of undefined behavior is that the compiler can remove the "if (!tree) return;" check entirely: the compiler sees that the pointer has already been dereferenced, concludes that it cannot be null, and decides the check is redundant. This is just one of a great many scenarios that can cause the program to crash.

I recommend having a look at the article where everything is explained in more detail: http://www.viva64.com/en/b/0306/

Correct code

void mark_tree_uninteresting(struct tree *tree)
{
  if (!tree)
    return;
  struct object *obj = &tree->object;
  ....
}

Recommendation

Beware of undefined behavior, even when it seems as if everything is working fine. There is no need to take such a risk; as I have already written, it is hard to predict how UB will show itself.

One may think that they know exactly how undefined behavior works, and that this knowledge permits them to do something that others can't. It is not so. The next section underlines the fact that undefined behavior is really dangerous.

34. Undefined behavior is closer than you think

This time it's hard to give an example from a real application. Nevertheless, I quite often see suspicious code fragments that can lead to the problems described below. This error is possible when working with large arrays, and since we don't really collect 64-bit errors, I can't say which projects have arrays of this size. So today's example is simply contrived.

Let's have a look at a synthetic code example:

size_t Count = 1024*1024*1024; // 1 Gb
if (is64bit)
  Count *= 5; // 5 Gb
char *array = (char *)malloc(Count);
memset(array, 0, Count);

int index = 0;
for (size_t i = 0; i != Count; i++)
  array[index++] = char(i) | 1;

if (array[Count - 1] == 0)
  printf("The last array element contains 0.\n");

free(array);

Explanation

This code works correctly if you build a 32-bit version of the program; if we compile the 64-bit version, the situation will be more complicated.

A 64-bit program allocates a 5 GB buffer and initially fills it with zeros. The loop then modifies it, filling it with non-zero values: we use "| 1" to ensure this.

Now try to guess how this code will run if it is compiled in x64 mode using Visual Studio 2015. Have you got the answer? If yes, then let's continue.

If you run a debug version of this program, it'll crash because of an out-of-bounds array access: at some point the index variable will overflow, and its value will become -2147483648 (INT_MIN).

Sounds logical, right? Nothing of the kind! This is an undefined behavior, and anything can happen.

An interesting thing - when I or somebody else says that this is an example of undefined behavior, people start grumbling. I don't know why, but it feels like they assume that they know absolutely everything about C++, and how compilers work.

But in fact they aren't really aware of it. If they knew, they wouldn't say something like this (a collective opinion):

This is some theoretical nonsense. Well, yes, formally the 'int' overflow leads to undefined behavior. But it's nothing more than jabbering. In practice, we can always tell what we will get: if you add 1 to INT_MAX, we'll have INT_MIN. Maybe somewhere in the universe there are some exotic architectures, but my Visual C++ / GCC compiler gives the expected result.

And now, without any magic, I will give a demonstration of UB using a simple example - not on some fairy-tale architecture, but in a Win64 program.

It is enough to build the example given above in Release mode and run it. The program stops crashing, and the message "the last array element contains 0" isn't printed.

The undefined behavior reveals itself in the following way: the array is completely filled, in spite of the fact that the index variable of type int isn't wide enough to index all the array elements. Those who still don't believe me should have a look at the assembly code:

  int index = 0;
  for (size_t i = 0; i != Count; i++)
000000013F6D102D  xor         ecx,ecx
000000013F6D102F  nop
    array[index++] = char(i) | 1;
000000013F6D1030  movzx       edx,cl
000000013F6D1033  or          dl,1
000000013F6D1036  mov         byte ptr [rcx+rbx],dl
000000013F6D1039  inc         rcx
000000013F6D103C  cmp         rcx,rdi
000000013F6D103F  jne         main+30h (013F6D1030h)

Here is the UB! And no exotic compilers were used, it's just VS2015.

If you replace int with unsigned, the undefined behavior will disappear. The array will only be partially filled, and at the end we will have a message - "the last array element contains 0".

Assembly code with the unsigned:

  unsigned index = 0;
000000013F07102D  xor         r9d,r9d
  for (size_t i = 0; i != Count; i++)
000000013F071030  mov         ecx,r9d
000000013F071033  nop         dword ptr [rax]
000000013F071037  nop         word ptr [rax+rax]
    array[index++] = char(i) | 1;
000000013F071040  movzx       r8d,cl
000000013F071044  mov         edx,r9d
000000013F071047  or          r8b,1
000000013F07104B  inc         r9d
000000013F07104E  inc         rcx
000000013F071051  mov         byte ptr [rdx+rbx],r8b
000000013F071055  cmp         rcx,rdi
000000013F071058  jne         main+40h (013F071040h)

Correct code

You must use proper data types for your programs to run properly. If you are going to work with large arrays, forget about int and unsigned; the proper types are ptrdiff_t, intptr_t, size_t, DWORD_PTR, std::vector::size_type, and so on. In this case it is size_t:

size_t index = 0;
for (size_t i = 0; i != Count; i++)
  array[index++] = char(i) | 1;

Recommendation

If the C/C++ language rules result in undefined behavior, don't argue with them or try to predict the way they'll behave in the future. Just don't write such dangerous code.

There are a whole lot of stubborn programmers who don't want to see anything suspicious in shifting negative numbers, comparing this with null or signed types overflowing.

Don't be like that. The fact that the program is working now doesn't mean that everything is fine. The way UB will reveal itself is impossible to predict. Expected program behavior is one of the variants of UB.
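To make this concrete, here is a tiny sketch (not taken from any project) of how an optimizer exploits the no-signed-overflow rule:

// GCC and Clang fold this function to 'return true': signed overflow
// is undefined, so the compiler may assume x + 1 never wraps around.
bool IsIncrementGreater(int x)
{
  return x + 1 > x;   // undefined behavior when x == INT_MAX
}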

35. When adding a new constant to an enum, don't forget to correct switch operators

The fragment is taken from the Appleseed project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V719 The switch statement does not cover all values of the 'InputFormat' enum: InputFormatEntity.

enum InputFormat
{
    InputFormatScalar,
    InputFormatSpectralReflectance,
    InputFormatSpectralIlluminance,
    InputFormatSpectralReflectanceWithAlpha,
    InputFormatSpectralIlluminanceWithAlpha,
    InputFormatEntity
};

switch (m_format)
{
  case InputFormatScalar:
    ....
  case InputFormatSpectralReflectance:
  case InputFormatSpectralIlluminance:
    ....
  case InputFormatSpectralReflectanceWithAlpha:
  case InputFormatSpectralIlluminanceWithAlpha:
    ....
}

Explanation

Sometimes we need to add a new item to an existing enumeration (enum), and when we do, we also need to proceed with caution - as we will have to check where we have referenced the enum throughout all of our code, e.g., in every switch statement and if chain. A situation like this can be seen in the code given above.

InputFormatEntity was added to the InputFormat - I'm making that assumption based on the fact that the constant has been added to the end. Often, programmers add new constants to the end of enum, but then forget to check their code to make sure that they've dealt with the new constant properly throughout, and corrected the switch operator.

As a result we have a case when "m_format==InputFormatEntity" isn't handled in any way.

Correct code

switch (m_format)
{
  case InputFormatScalar:
  ....
  case InputFormatSpectralReflectance:
  case InputFormatSpectralIlluminance:
  ....
  case InputFormatSpectralReflectanceWithAlpha:
  case InputFormatSpectralIlluminanceWithAlpha:
  ....
  case InputFormatEntity:
  ....
}

Recommendation

Let's think about how we can reduce such errors through code refactoring. The easiest, though not very effective, solution is to add a "default:" branch that will cause a message to appear, e.g.:

switch (m_format)
{
  case InputFormatScalar:
  ....
  ....
  default:
    assert(false);
    throw "Not all variants are considered"
}

Now if the m_format variable is InputFormatEntity, we'll see an exception. Such an approach has two big faults:

1. As there is the chance that this error won't show up during testing (if during the test runs, m_format is not equal to InputFormatEntity), then this error will make its way into the Release build and would only show up later - during runtime at a customer's site. It's bad if customers have to report such problems!

2. If we consider getting into default as an error, then you have to write a case for all of the enum's possible values. This is very inconvenient, especially if there are a lot of these constants in the enumeration. Sometimes it's very convenient to handle different cases in the default section.
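A side note to the second fault: if you do decide to enumerate every value and drop the default label entirely, the compiler can check exhaustiveness for you. GCC and Clang report unhandled enumerators via the -Wswitch warning (enabled by -Wall). A minimal sketch with a hypothetical enum:

enum Color { Red, Green, Blue };

int ToWavelength(Color c)
{
  switch (c)   // GCC/Clang with -Wall: warning: enumeration value
  {            // 'Blue' not handled in switch [-Wswitch]
    case Red:   return 700;
    case Green: return 530;
  }
  return 0;
}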

I suggest solving this problem in the following way; I can't say that it's perfect, but at least it's something.

When you define an enum, make sure you also add a special comment containing a keyword and the name of the enumeration.

Example:

enum InputFormat
{
  InputFormatScalar,
  ....
  InputFormatEntity
  //If you want to add a new constant, find all ENUM:InputFormat.
};

switch (m_format) //ENUM:InputFormat
{
  ....
}

In the code above, when you change the InputFormat enum, you are directed to look for "ENUM:InputFormat" in the source code of the project.

If you work in a team of developers, make this convention known to everybody, and also add it to your coding standards and style guide. If somebody fails to follow this rule, it will be very sad.

36. If something strange is happening to your PC, check its memory

I think you've gotten pretty tired of looking at numerous error patterns, so this time let's take a break from looking at code.

A typical situation: your program is not working properly, and you have no idea what's going on. In such situations I recommend not rushing to blame someone, but focusing on your code. In 99.99% of cases, the root of the evil is a bug introduced by someone on your development team. Very often this bug is really stupid and banal. So go ahead and spend some time looking for it!

The fact that the bug occurs from time to time means nothing. You may just have a Heisenbug.

Blaming the compiler would be an even worse idea. It may do something wrong, of course, but very rarely. It will be very awkward if you find out that it was an incorrect use of sizeof(), for example. I have a post about that in my blog: The compiler is to blame for everything

But to set the record straight, I should say that there are exceptions. Very occasionally, the bug has nothing to do with the code. Being aware that such a possibility exists will help us stay sane.

I'll demonstrate this with a case that once happened to me. Fortunately, I have the necessary screenshots.

I was making a simple test project that was intended to demonstrate the abilities of the Viva64 analyzer (the predecessor of PVS-Studio), and this project was refusing to work correctly.

After long and tiresome investigation, I saw that one memory cell was causing all this trouble - one bit, to be exact. You can see in the screenshot that I am in debug mode, writing the value "3" to this memory cell.

[Screenshot: writing the value "3" to the memory cell in the debugger]

After the memory was changed, the debugger read the values back to display them in its window and showed the number 2: there it is, 0x02, although I had written "3". The low-order bit is always zero.

[Screenshot: the debugger shows 0x02 in the memory cell; the low-order bit stays zero]

A memory test program confirmed the problem. Strangely, the computer had otherwise been working without any visible problems. Replacing the memory module finally let my program work correctly.

I was very lucky. I had to deal with a simple test program. And still I spent a lot of time trying to understand what was happening. I was reviewing the assembler listing for more than two hours, trying to find the cause of the strange behavior. Yes, I was blaming the compiler for it.

I can't imagine how much more effort it would take, if it were a real program. Thank God I didn't have to debug anything else at that moment.

Recommendation

Always look for the error in your code. Do not try to shift responsibility.

However, if the bug reoccurs only on your computer for more than a week, it may be a sign that it's not because of your code.

Keep looking for the bug. But before going home, run an overnight RAM test. Perhaps, this simple step will save your nerves.

37. Beware of the 'continue' operator inside do {...} while (...)

Fragment taken from the Haiku project (inheritor of BeOS). The code contains an error that PVS-Studio analyzer diagnoses in the following way: V696 The 'continue' operator will terminate 'do { ... } while (FALSE)' loop because the condition is always false.

do {
  ....
  if (appType.InitCheck() == B_OK && appType.GetAppHint(&hintRef) == B_OK && appRef == hintRef)
  {
    appType.SetAppHint(NULL);
    // try again
    continue;
  }
  ....
} while (false);

Explanation

The way continue works inside a do-while loop is not what some programmers expect: when continue is encountered, the loop termination condition is always checked. Let me explain this in more detail. Suppose the programmer writes code like this:

for (int i = 0; i < n; i++)
{
  if (blabla(i))
    continue;
  foo();
}

Or like this:

while (i < n)
{
  if (blabla(i++))
    continue;
  foo();
}

Most programmers by intuition understand that when continue is encountered, the controlling condition (i < n) will be (re)evaluated, and that the next loop iteration will only start if the evaluation is true. But when a programmer writes code:

do
{
  if (blabla(i++))
    continue;
  foo();
} while (i < n);

the intuition often fails, as they don't see a condition above the continue, and it seems to them that the continue will immediately trigger another loop iteration. This is not the case, and continue does as it always does - causes the controlling condition to be re-evaluated.

Whether this misunderstanding of continue leads to an error is a matter of sheer luck. However, the error will definitely occur if the loop condition is always false, as in the code snippet given above, where the programmer planned to carry out certain actions through subsequent iterations. The comment "//try again" clearly shows that intention. There will of course be no "again": the condition is always false, so once continue is encountered, the loop terminates.

In other words, it turns out that in the do {...} while (false) construct, continue is equivalent to break.

Correct code

There are many ways to write the correct code. For example, create an infinite loop and use continue to start the next iteration and break to exit:

for (;;) {
  ....
  if (appType.InitCheck() == B_OK && appType.GetAppHint(&hintRef) == B_OK && appRef == hintRef)
  {
    appType.SetAppHint(NULL);
    // try again
    continue;
  }
  ....
  break;
}

Recommendation

Try to avoid using continue inside do { ... } while (...), even if you really know how it works. You could slip and make this error, and your colleagues might read the code incorrectly and then modify it incorrectly. I will never stop saying it: a good programmer is not one who knows and uses various language tricks, but one who writes clear, understandable code that even a newbie can comprehend.

38. Use nullptr instead of NULL from now on

New C++ standards brought quite a lot of useful changes. There are things I would not rush into using straight away, but there are some changes that should be applied immediately, as they bring significant benefits.

One such modernization is the keyword nullptr, which is intended to replace the NULL macro.

Let me remind you that in C++ the definition of NULL is 0, nothing more.

Of course, it may seem that this is just some syntactic sugar. And what's the difference, if we write nullptr or NULL? But there is a difference! Using nullptr helps to avoid a large variety of errors. I'll show this using examples.

Suppose there are two overloaded functions:

void Foo(int x, int y, const char *name);
void Foo(int x, int y, int ResourceID);

A programmer might write the following call:

Foo(1, 2, NULL);

And that same programmer might be sure that he is in fact calling the first function by doing this. It is not so. As NULL is nothing more than 0, and zero is known to have int type, the second function will be called instead of the first.
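Here is the trap as a compilable sketch (the declarations repeat the hypothetical ones above):

void Foo(int x, int y, const char *name); // (1) - the intended function
void Foo(int x, int y, int ResourceID);   // (2)

void Call()
{
  Foo(1, 2, NULL);    // resolves to (2): NULL is merely the integer 0
  Foo(1, 2, nullptr); // resolves to (1): nullptr converts only to pointers
}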

However, if the programmer had used nullptr, no such error would occur, and the first function would have been called. Another common enough use of NULL is to write code like this:

if (unknownError)
  throw NULL;

To my mind, it is suspicious to throw a pointer as an exception; nevertheless, people sometimes do so. Apparently, the developer needed to write the code in this way, and discussions on whether it is good or bad practice to do so go beyond the scope of this note.

What is important is that the programmer decided to generate an exception in the case of an unknown error and "send" a null pointer into the outer world.

In fact, it is not a pointer but an int. As a result, the exception will be handled in a way the programmer didn't expect.

"throw nullptr;" code saves us from misfortune, but this does not mean that I believe this code to be totally acceptable.

In some cases, if you use nullptr, the incorrect code will not compile.

Suppose that some WinApi function returns a HRESULT type. The HRESULT type has nothing to do with the pointer. However, it is quite possible to write nonsensical code like this:

if (WinApiFoo(a, b, c) != NULL)

This code will compile, because NULL is 0 and of int type, and HRESULT is a long type. It is quite possible to compare values of int and long type. If you use nullptr, then the following code will not compile:

if (WinApiFoo(a, b, c) != nullptr)

Because of the compiler error, the programmer will notice and fix the code.

I think you get the idea; there are plenty of such examples. But those are mostly synthetic examples, and synthetic examples are never very convincing. So are there any real ones? Yes, there are. Here is one of them; the only thing is that it's not very graceful or short.

This code is taken from the MTASA project.

So, there exists RtlFillMemory(). It can be a real function or a macro - it doesn't matter. It is similar to the memset() function, but with the 2nd and 3rd arguments swapped. Here's how the macro can be declared:

#define RtlFillMemory(Destination,Length,Fill) \
  memset((Destination),(Fill),(Length))

There is also FillMemory(), which is nothing more than RtlFillMemory():

#define FillMemory RtlFillMemory

Yes, everything is long and complicated. But at least it is an example of real erroneous code.

And here's the code that uses the FillMemory macro.

LPCTSTR __stdcall GetFaultReason ( EXCEPTION_POINTERS * pExPtrs )
{
  ....
  PIMAGEHLP_SYMBOL pSym = (PIMAGEHLP_SYMBOL)&g_stSymbol ;
  FillMemory ( pSym , NULL , SYM_BUFF_SIZE ) ;
  ....
}

This code fragment has even more bugs. We can clearly see that at least the 2nd and 3rd arguments are mixed up. That's why the analyzer issues two V575 warnings:

  • V575 The 'memset' function processes value '512'. Inspect the second argument. crashhandler.cpp 499
  • V575 The 'memset' function processes '0' elements. Inspect the third argument. crashhandler.cpp 499

The code compiled because NULL is 0. As a result, zero array elements get filled. But the error is not only about that: NULL is simply inappropriate here. The memset() function works with bytes, so there's no point in trying to make it fill memory with NULL values; that is absurd. Correct code should look like this:

FillMemory(pSym, SYM_BUFF_SIZE, 0);

Or like this:

ZeroMemory(pSym, SYM_BUFF_SIZE);

But that's not the main point. The main point is that this meaningless code compiled successfully. However, if the programmer had gotten into the habit of using nullptr instead of NULL and written this instead:

FillMemory(pSym, nullptr, SYM_BUFF_SIZE);

the compiler would have emitted an error message, and the programmer would have realized they did something wrong and paid more attention to the code.

Note. I understand that in this case NULL is not to blame. However, it is because of NULL that the incorrect code compiles without any warnings.

Recommendation

Start using nullptr. Right now. And make the necessary changes to your company's coding standard.

Using nullptr will help to avoid stupid errors, and thus will slightly speed up the development process.

39. Why incorrect code works

This bug was found in Miranda NG's project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V502 Perhaps the '?:' operator works in a different way than was expected. The '?:' operator has a lower priority than the '|' operator.

#define MF_BYCOMMAND 0x00000000L
void CMenuBar::updateState(const HMENU hMenu) const
{
  ....
  ::CheckMenuItem(hMenu, ID_VIEW_SHOWAVATAR,
    MF_BYCOMMAND | dat->bShowAvatar ? MF_CHECKED : MF_UNCHECKED);
  ....
}

Explanation

We have seen a lot of cases that lead to incorrect working of the program; this time I would like to raise a different, thought-provoking topic for discussion. Sometimes, against all odds, totally incorrect code happens to work just fine! For experienced programmers this really comes as no surprise, but for those who have recently started learning C/C++ it might be a little baffling. So today, we'll have a look at just such an example.

In the code shown above, we need to call CheckMenuItem() with certain flags set; at first glance, we see that if bShowAvatar is true, we need to bitwise OR MF_BYCOMMAND with MF_CHECKED - and conversely, with MF_UNCHECKED if it's false. Simple!

In the code above the programmers have chosen the very natural ternary operator to express this (the operator is a convenient short version of if-then-else):

MF_BYCOMMAND | dat->bShowAvatar ? MF_CHECKED : MF_UNCHECKED

The thing is that the priority of the | operator is higher than that of the ?: operator (see Operation priorities in C/C++). As a result, there are two errors at once.

The first error is that the condition has changed. It is no longer - as one might read it - "dat->bShowAvatar", but "MF_BYCOMMAND | dat->bShowAvatar".

The second error - only one flag gets chosen - either MF_CHECKED or MF_UNCHECKED. The flag MF_BYCOMMAND is lost.

But despite these errors, the code works correctly! The reason is a sheer stroke of luck: the programmer was just lucky that the MF_BYCOMMAND flag equals 0x00000000L. Since MF_BYCOMMAND is 0, it doesn't affect the code in any way. Probably some experienced programmers have already gotten the idea, but I'll still give some comments in case there are beginners here.

First, let's have a look at the correct expression, with additional parentheses:

MF_BYCOMMAND | (dat->bShowAvatar ? MF_CHECKED : MF_UNCHECKED)

Replace macros with numeric values:

0x00000000L | (dat->bShowAvatar ? 0x00000008L : 0x00000000L)

If one of the operands of the | operator is 0, then we can simplify the expression:

dat->bShowAvatar ? 0x00000008L : 0x00000000L

Now let's have a closer look at an incorrect code variant:

MF_BYCOMMAND | dat->bShowAvatar ? MF_CHECKED : MF_UNCHECKED

Replace macros with numeric values:

0x00000000L | dat->bShowAvatar ? 0x00000008L : 0x00000000L

In the subexpression "0x00000000L | dat->bShowAvatar", one of the operands of the | operator is 0. Let's simplify the expression:

dat->bShowAvatar ? 0x00000008L : 0x00000000L

As a result we get the same expression; this is why the erroneous code works correctly - another programming miracle has occurred.

Correct code

There are various ways to correct the code. One of them is to add parentheses, another - to add an intermediate variable. A good old if operator could also be of help here:

if (dat->bShowAvatar)
  ::CheckMenuItem(hMenu, ID_VIEW_SHOWAVATAR,
                  MF_BYCOMMAND | MF_CHECKED);
else
  ::CheckMenuItem(hMenu, ID_VIEW_SHOWAVATAR,
                  MF_BYCOMMAND | MF_UNCHECKED);

I really don't insist on using this exact way to correct the code. It might be easier to read it, but it's slightly lengthy, so it's more a matter of preferences.
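For completeness, the parenthesized variant mentioned above is a one-line fix:

::CheckMenuItem(hMenu, ID_VIEW_SHOWAVATAR,
  MF_BYCOMMAND | (dat->bShowAvatar ? MF_CHECKED : MF_UNCHECKED));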

Recommendation

My recommendation is simple - try to avoid complex expressions, especially with ternary operators. Also don't forget about parentheses.

As stated before in chapter N4, the ?: operator is very dangerous. It sometimes just slips your mind that it has a very low priority, and it's easy to write an incorrect expression. People tend to use it to cram everything into one line, so try not to do that.

40. Start using static code analysis

It would be strange to read such a big piece of text written by a developer of a static code analyzer and not find a recommendation about using one. So here it is.

Fragment taken from the Haiku project (inheritor of BeOS). The code contains an error that PVS-Studio analyzer diagnoses in the following way: V501 There are identical sub-expressions to the left and to the right of the '<' operator: lJack->m_jackType < lJack->m_jackType

int compareTypeAndID(....)
{
  ....
  if (lJack && rJack)
  {
    if (lJack->m_jackType < lJack->m_jackType)
    {
      return -1;
    }
    ....
}

Explanation

It's just a usual typo: lJack was accidentally written instead of rJack in the right-hand part of the expression.

This typo is a simple one indeed, but the situation is quite complicated. The thing is that the programming style, or other methods, are of no help here. People just make mistakes while typing and there is nothing you can do about it.

It's important to emphasize that it's not a problem of some particular people or projects. No doubt, all people can be mistaken, and even professionals involved in serious projects can be. Here is the proof of my words. You can see the simplest misprints like A == A, in such projects as: Notepad++, WinMerge, Chromium, Qt, Clang, OpenCV, TortoiseSVN, LibreOffice, CoreCLR, Unreal Engine 4 and so on.

So the problem is really there and it's not about students' lab works. When somebody tells me that experienced programmers don't make such mistakes, I usually send them this link.

Correct code

if (lJack->m_jackType < rJack->m_jackType)

Recommendation

First of all, let's speak about some useless tips.

  • Be careful while programming, and don't let errors sneak into your code (Nice words, but nothing more)
  • Use a good coding style (There isn't a programming style that can help avoid typos in variable names)

What can really be effective?

  • Code review
  • Unit tests (TDD)
  • Static code analysis

I should say right away that every strategy has its strong and weak sides. That's why the best way to get the most efficient and reliable code is to use all of them together.

Code reviews can help us find a great deal of different errors, and on top of this, they help us improve the readability of the code. Unfortunately, collective reading of code is quite expensive and tiresome, and it doesn't give a full guarantee of validity. It's quite hard to remain alert and find a typo looking at this kind of code:

qreal l = (orig->x1 - orig->x2)*(orig->x1 - orig->x2) +
          (orig->y1 - orig->y2)*(orig->y1 - orig->y1) *
          (orig->x3 - orig->x4)*(orig->x3 - orig->x4) +
          (orig->y3 - orig->y4)*(orig->y3 - orig->y4);

Theoretically, unit tests can save us. But only in theory. In practice, it's unrealistic to check all possible execution paths; besides, a test itself can have errors too :)

Static code analyzers are mere programs, not artificial intelligence. An analyzer can skip some errors and, on the contrary, display an error message for code which is actually correct. But despite these faults, it is a really useful tool that can detect a whole lot of errors at an early stage.

A static code analyzer can be used as a cheaper version of Code Review. The program examines the code instead of a programmer doing it, and suggests checking certain code fragments more thoroughly.

Of course I would recommend using PVS-Studio code analyzer, which we are developing. But it's not the only one in the world; there are plenty of other free and paid tools to use. For example you can start with having a look at a free open Cppcheck analyzer. A good number of tools is given on Wikipedia: List of tools for static code analysis.

Attention:

  • A static analyzer can hurt your brain if not used correctly. One of the typical mistakes is to "get the maximum from the check mode options, and drown in the stream of warning messages". That's just one of many recommendations I could give; for a bigger list, it could be useful to go to A, B.
  • A static analyzer should be used on a regular basis, not just from time to time, or when everything gets really bad. Some explanations: C, D.

Really, try using static code analyzers; you'll like them. It's a very nice hygiene tool.

Finally I would recommend reading an article by John Carmack: Static Code Analysis.

41. Avoid adding a new library to the project

Suppose you need to implement X functionality in your project. Theorists of software development will say that you have to take an already existing library Y and use it to implement what you need. Indeed, it is a classic approach in software development: reusing your own or others' previously created libraries (third-party libraries). And most programmers work this way.

However, those theorists in various articles and books, forget to mention what hell it will become to support several dozen third-party libraries in about 10 years.

I strongly recommend avoiding adding new libraries to a project. Please don't get me wrong: I am not saying that you shouldn't use libraries at all and should write everything yourself. That would be inefficient, of course. But sometimes a new library is added to the project at the whim of some developer intending to add a cool little "feature". It's not hard to add a new library to the project, but then the whole team will have to carry the load of its support for many years.

Tracking the evolution of several large projects, I have seen quite a lot of problems caused by a large number of third-party libraries. I will probably enumerate only some of the issues, but this list should already provoke some thoughts:

  1. Adding new libraries quickly increases the project size. In our era of fast Internet and large SSD drives, this is not a big problem, of course. But it's rather unpleasant when the download time from the version control system turns into 10 minutes instead of 1.
  2. Even if you use just 1% of the library capabilities, it is usually included in the project as a whole. As a result, if the libraries are used in the form of compiled modules (for example, DLL), the distribution size grows very fast. If you use the library as source code, then the compile time significantly increases.
  3. Infrastructure connected with the compilation of the project becomes more complicated. Some libraries require additional components. A simple example: we need Python for building. As a result, in some time you'll need a lot of additional programs just to build the project, and the probability that something will fail increases. It's hard to explain - you need to experience it. In big projects something fails all the time, and you have to put a lot of effort into making everything work and compile.
  4. If you care about vulnerabilities, you must regularly update third-party libraries. Attackers are interested in studying library code in search of vulnerabilities: firstly, many libraries are open source, and secondly, having found a weak point in one library, you can get a master exploit for many applications that use it.
  5. One of the libraries may suddenly change its license type. Firstly, you have to keep that in mind and track the changes. Secondly, it's unclear what to do if that happens. For example, once, the very widely used library softfloat moved from a personal agreement to BSD.
  6. You will have trouble upgrading to a new version of the compiler. There will definitely be a few libraries that won't be ready for the new compiler; you'll have to wait, or make your own corrections to the library.
  7. You will have problems when moving to a different compiler. For example, you are using Visual C++, and want to use Intel C++. There will surely be a couple of libraries where something is wrong.
  8. You will have problems moving to a different platform. Not necessarily even a totally different platform. Let's say, you'll decide to port a Win32 application to Win64. You will have the same problems. Most likely, several libraries won't be ready for this, and you'll wonder what to do with them. It is especially unpleasant when the library is lying dormant somewhere, and is no longer developing.
  9. Sooner or later, if you use lots of C libraries, where types aren't put into namespaces, you'll start having name clashes. This causes compilation errors or hidden errors. For example, a wrong enum constant can be used instead of the one you intended.
  10. If your project uses a lot of libraries, adding another one won't seem harmful. We can draw an analogy with the broken windows theory. But consequently, the growth of the project turns into uncontrolled chaos.
  11. And there could be a lot of other downsides in adding new libraries, which I'm probably not aware of. But in any case, additional libraries increase the complexity of project support. Some issues can occur in a fragment where they were least expected to.

Again, I should emphasize: I don't say that we should stop using third-party libraries at all. If we have to work with images in PNG format in the program, we'll take the LibPNG library, and not reinvent the wheel.

But even working with PNG we need to stop and think. Do we really need a library? What do we want to do with the images? If the task is just to save an image in *.png file, you can get by with system functions. For example, if you have a Windows application, you could use WIC. And if you're already using an MFC library, there is no need to make the code more sophisticated, because there's a CImage class (see the discussion on StackOverflow). Minus one library - great!

Let me give you an example from my own practice. In the process of developing the PVS-Studio analyzer, we needed to use simple regular expressions in a couple of diagnostics. In general, I am convinced that static analysis isn't the right place for regular expressions. This is an extremely inefficient approach. I even wrote an article regarding this topic. But sometimes you just need to find something in a string with the help of a regular expression.

It was possible to add existing libraries, but it was clear that all of them would be redundant. At the same time we still needed regular expressions, and we had to come up with something.

Absolutely coincidentally, exactly at that moment I was reading the book "Beautiful Code" (ISBN 9780596510046). This book is about simple and elegant solutions, and there I came across an extremely simple implementation of regular expressions. Just a few dozen lines of code. And that's it!

I decided to use that implementation in PVS-Studio. And you know what? The abilities of this implementation are still enough for us; complex regular expressions are just not necessary for us.

Conclusion: instead of adding a new library, we spent half an hour writing the functionality we needed. We suppressed the desire to use one more library, and it turned out to be a great decision: time showed that we really didn't need that library. And I am not talking about several months; we have happily used it for more than five years.

This case really convinced me that the simpler the solution, the better. By avoiding adding new libraries (when possible), you make your project simpler.

Readers may be interested to know what the code for searching with regular expressions was, so here it is, from the book. See how graceful it is. The code was slightly changed when integrated into PVS-Studio, but its main idea remains unchanged. So, the code from the book:

// Regular expression format:
// c    matches any literal character "c"
// .    matches any single character
// ^    matches the beginning of the input string
// $    matches the end of the input string
// *    matches zero or more occurrences of the preceding character

int matchhere(char *regexp, char *text);
int matchstar(int c, char *regexp, char *text);

// match: search for regular expression anywhere in text
int match(char *regexp, char *text)
{
  if (regexp[0] == '^')
    return matchhere(regexp+1, text);
  do { /* must look even if string is empty */
    if (matchhere(regexp, text))
      return 1;
  } while (*text++ != '\0');
  return 0;
}

// matchhere: search for regexp at beginning of text
int matchhere(char *regexp, char *text)
{
  if (regexp[0] == '\0')
    return 1;
  if (regexp[1] == '*')
    return matchstar(regexp[0], regexp+2, text);
  if (regexp[0] == '$' && regexp[1] == '\0')
    return *text == '\0';
  if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
    return matchhere(regexp+1, text+1);
  return 0;
}

// matchstar: search for c*regexp at beginning of text
int matchstar(int c, char *regexp, char *text)
{
  do { /* a * matches zero or more instances */
    if (matchhere(regexp, text))
      return 1;
  } while (*text != '\0' && (*text++ == c || c == '.'));
  return 0;
}
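A quick usage sketch for the matcher above:

#include <stdio.h>

int main(void)
{
  char regexp[] = "^ab*c$";
  char text[]   = "abbbc";
  printf("%d\n", match(regexp, text)); /* prints 1: the pattern matches */
  return 0;
}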

Yes, this version is extremely simple, but for several years there has been no need to use a more complex solution. It really does have limited functionality, but there has been no need to add anything more complicated, and I don't think there will be. This is a good example of where a simple solution turned out to be better than a complex one.

Recommendation

Don't hurry to add new libraries to the project; add one only when there is no way to manage without it.

Here are the possible workarounds:

  1. Have a look at whether the API of your system, or one of the libraries already in use, has the required functionality. It's a good idea to investigate this question.
  2. If you plan to use a small piece of functionality from the library, then it makes sense to implement it yourself. The argument to add a library "just in case" is no good. Almost certainly, this library won't be used much in the future. Programmers sometimes want to have universality that is actually not needed.
  3. If there are several libraries to resolve your task, choose the simplest one, which meets your needs. As I have stated before, get rid of the idea "it's a cool library - let's take it just in case"
  4. Before adding a new library, sit back and think. Maybe even take a break, get some coffee, and discuss it with your colleagues. Perhaps you'll realise that you can solve the problem in a completely different way, without any third-party libraries.

P.S. The things I speak about here may not be completely acceptable to everyone - for example, my recommending WinAPI instead of a universal portable library. Objections may arise that going this way "binds" the project to one operating system, and that it will then be very difficult to make the program portable. I do not agree. Quite often the idea "and then we'll port it to a different operating system" exists only in the programmer's mind; such a task may even be unnecessary for the managers. Another possibility is that the project will kick the bucket due to its complexity and universality before gaining popularity and needing to be ported. Also don't forget about point (8) in the list of problems given above.

42. Don't use function names with "empty"

The fragment is taken from WinMerge project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V530 The return value of function 'empty' is required to be utilized.

void CDirView::GetItemFileNames(
  int sel, String& strLeft, String& strRight) const
{
  UINT_PTR diffpos = GetItemKey(sel);
  if (diffpos == (UINT_PTR)SPECIAL_ITEM_POS)
  {
    strLeft.empty();
    strRight.empty();
  }
  ....
}

Explanation

A programmer wanted to clear the strLeft and strRight strings. They have the String type, which is nothing other than std::wstring.

For this purpose he called the empty() function, which is not correct. The empty() function doesn't change the object; it returns information about whether the string is empty or not.

Correct code

To correct this error you should replace the empty() function with clear() or erase(). WinMerge developers preferred erase(), and now the code looks like this:

if (diffpos == (UINT_PTR)SPECIAL_ITEM_POS)
{
  strLeft.erase();
  strRight.erase();
}

Recommendation

In this case the name "empty()" is really inappropriate. The thing is that in different libraries, this function can mean two different actions.

In some libraries, the empty() function clears the object; in others, it reports whether the object is empty or not.

I would say that the word "empty" is poor in general, because everybody understands it differently. Some think of it as an "action", others as an "information inquiry". That's the reason for the mess we can see.

There is just one way out: do not use "empty" in function names.

  • Name the function for cleaning as "erase" or "clear". I would rather use "erase", because "clear" can be quite ambiguous.
  • Choose another name for the function which gets information, "isEmpty" for instance.
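Put together, a hypothetical class interface following both rules might look like this:

class MessageQueue
{
public:
  void erase();          // an action: discards all queued messages
  bool isEmpty() const;  // a query: true if there are no messages
};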

If for some reason you think that it's not a big deal, then have a look here. It's quite a widespread error pattern. Of course, it's slightly late to change such classes as std::string, but at least let's try not to spread the evil any further.

Conclusion

I hope you enjoyed this collection of tips. Of course, it is impossible to write about all the ways to write a program incorrectly, and there is probably no point in doing so. My aim was to warn programmers and develop their sense of danger. Perhaps next time, when a programmer encounters something odd, they will remember my tips and won't rush. Sometimes a few minutes spent studying the documentation, or writing simple and clear code, can help you avoid a hidden error that would make the lives of your colleagues and users miserable for several years.

I also invite everybody to follow me on Twitter @Code_Analysis

Bugless coding!

Sincerely, Andrey Karpov.


Intel® Software Guard Extensions Tutorial Series: Part 8, GUI Integration


Download ZIP [573 KB]

Download PDF [919.5 KB]

Part 8 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series integrates the graphical user interface (GUI) with the back end. In this part, we’ll examine the implications of mixing managed code with enclaves and the potential for undermining the security that is gained from Intel SGX. We’ll scope out the risks to our secrets and then develop mitigation strategies to limit their exposure.

You can find a list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

Source code is provided with this installment of the series.

The User Interface

When we developed our application requirements back in Part 2 of the tutorial series, we made a decision to use the Microsoft* .NET Framework* to develop the user interface. That decision was based on the need to limit the complexity of the application development as well as limit dependencies on external software libraries.

The main window for the UI is shown in Figure 1. It is based on the Windows Presentation Foundation* (WPF), and makes use of standard WPF controls. 

Figure 1. The Tutorial Password Manager user interface.

This decision came at a cost, however. .NET applications execute under the supervision of the Common Language Runtime, and the CLR includes a memory manager with garbage collection.

The Problem

In a traditional application—one built on a native code base—the developer is responsible for allocating memory as it’s needed, and must keep track of the addresses of allocated memory so that it can be freed when it’s no longer needed. Failing to free memory that has been allocated leads to memory leaks, which in turn causes the application to consume more and more memory over time. In a high-performance application, the developer may even need to consider how the variables and objects in their application are arranged in memory, because some hardware instructions are sensitive to byte boundaries, and some algorithms depend on efficient use of the CPU cache.

In a managed application, however, the developer is abstracted from these details. The developer doesn’t allocate memory directly when an object is created, and that memory doesn’t have to be explicitly freed by the developer when it’s no longer needed. The runtime keeps track of the number of references to an object, and when the reference count drops to zero it is flagged for cleanup at a later time by a subsystem known as the garbage collector. The garbage collector has several jobs, but its primary ones are freeing up unused memory and compacting allocated memory into contiguous blocks for efficient usage.

Memory managed applications have a number of advantages, and the biggest of these is that they eliminate entire categories of software bugs. Unfortunately, they also have a very big disadvantage when it comes to application security: the developer is no longer in control over the low-level memory management, which means there is no way to guarantee that sensitive areas of memory can be securely erased or overwritten. The garbage collector can, at any time during a program’s execution, manipulate all the memory under its control. It can move objects around, make copies of them, delay writes to memory until they are needed, and even eliminate writes that it deems unnecessary. For a security application that needs to guarantee that a region of memory is erased or overwritten, memory managed environments are bad news.

How Big is the Problem, Really?

We can scope the extent of the problem by looking at the String class in .NET. Nearly every UI widget that displays text output to the screen in .NET does so through a String, and we can use that to assess just how big of a security problem we are creating if we put sensitive information into a UI control. To do that, we’ll write a simple Windows Forms* application that merely updates a text box, empties it, and then resets it.

Figure 2 shows the test application window, and Figure 3 shows the relevant code that modifies the string in the TextBox control. 

Figure 2. The String Demo application window.

private void button2_Click(object sender, EventArgs e)
{
    textBox1.Text = "furniture";
    textBox1.Text = "compound";
    textBox1.Text += "-complex";
    textBox1.Clear();
    textBox1.Text = "Click 'Run' to go again";
}

Figure 3. String operations on the TextBox control.

We’ll compile this application in Release mode, execute it, and then attach to it using the Windows Debugger*, WinDbg, which is part of the Debugging Tools for Windows* from Microsoft. One of the features of WinDbg is that you can load other DLLs to extend its functionality, and the SOS.DLL from the .NET Framework exposes a command to dump classes of a specified type from the memory heap (for more information on this topic, see the article .NET Debugging: Dump All Strings from a Managed Code Process at CodeProject*).

When we dump the String objects in the heap right at the start of the application, before clicking the “Run” button, we see all the Strings currently in use by the application. There are a lot of Strings on the heap, but the most relevant ones are highlighted in Figure 4. 

Figure 4. Strings in the String Demo application. Highlighted strings correspond to the user interface controls.

Here we can see the names of our Windows Form widgets, and the starting text for the TextBox widget, “Click ‘Run’ to start”. If we scroll down toward the bottom, we see “String Demo”, the name of our main form, repeated a few dozen times as shown in Figure 5. 

Figure 5. Duplicates of the same string in the String Demo application.

Now, we resume the program and click the “Run” button. This cycles through the operations shown in Figure 3, which returns almost instantly. Returning to WinDbg, we dump the String objects once again, and get the output shown in Figure 6. 

Figure 6. Strings in the String Demo application, post execution.

Note that this String dump is from a single execution of the button2_Click() event handler, and the debugger wasn’t attached until a full minute after the execution had completed. This means that, one minute after running through a command sequence in which the execution:

  1. overwrote the textbox String with the word “furniture”
  2. immediately overwrote it again, this time with the word “compound”
  3. appended the text “-complex”
  4. cleared the textbox by calling the Clear() method
  5. reset the text to the phrase “Click ‘Run’ to go again”

the memory heap for the application contained multiple Strings with the contents “furniture”, “compound”, and “compound-complex”. Clearly, any controls that depend on the String class are going to be leaking our secrets, and doing so to multiple places in memory!

Mitigation and Solution

Fortunately, most managed code frameworks provide classes that are specifically designed to store sensitive data. These usually work by placing their contents in unmanaged memory, outside of the control of the garbage collector. In the .NET Framework this capability is provided by the SecureString class, but we are hamstrung by having only one WPF control that can work with the SecureString class out of the box, the PasswordBox.

This may not seem like a significant issue since our application’s secrets include passwords, and the PasswordBox control is designed specifically for password entry. For most applications this is sufficient, but our application is a password manager, and it must be able to reveal the password to the user on demand. In the Universal Windows Platform* runtime, the PasswordBox class provides a property named PasswordRevealMode, which can be set to either “Hidden”, “Visible”, or “Peek” (this last value reveals the password while the user is pressing the “reveal” button on the control).

Figure 7. The Universal Windows Platform version of the PasswordBox control supports a “peek” capability that the WPF control in .NET does not support.

The .NET Framework version of the PasswordBox control does not have this property, however, which means that the PasswordBox control is effectively write-only (from the user’s perspective). PasswordBox is fine for entering a password, but it’s not a solution for revealing a password when the user asks to view it. We need to create our own solution.

Requirements and Constraints

In Part 3 of the series, we identified two secrets that exit the enclave in clear text so that they can be displayed via the user interface:

  • The login information for an account (a name for the account, a login or user name, and URL associated with the account).
  • The account password.

While the login information for a user’s account is sensitive data, accidental exposure does not carry tremendous risk. The entire application could be re-written to make use of native windows, but that would be a huge investment for arguably little gain. A future release of the .NET Framework may contain the functionality that we need in the PasswordBox control, so we can accept the risk today and look for the problem to be fixed for us in the future. If later we decide the risk is too great, we could always revisit the decision and implement a native solution in a later release.

For account passwords, however, the day of reckoning has arrived. There is no scenario where it is appropriate to expose a password to the String class in .NET. As a password manager, we need the ability to show passwords to the user, but we must not use the .NET Framework to do it.

The Solution

We must use a native window when revealing a password to the end user. With a native window we are placing the password in unmanaged memory, and because it’s unmanaged memory we can guarantee that the password is securely erased by making a call to SecureZeroMemory(). The problem with a native window, though, is the same one that drove our design to .NET and WPF in the first place: the default tools for building native windows either force you to build the entire window and messaging handlers from the ground up using the Win32 API, or rely on the large and complex Microsoft Foundation Classes*, which is extremely heavy-handed for our simple application.
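As an illustration of the "securely erased" part, here is a minimal sketch (wipe_and_free() is a hypothetical helper, not from the tutorial's sources):

#include <windows.h>

void wipe_and_free(wchar_t *password, size_t length)
{
	// Unlike a plain memset(), SecureZeroMemory() is guaranteed
	// not to be optimized away before the memory is freed.
	SecureZeroMemory(password, length * sizeof(wchar_t));
	delete[] password;
}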

To solve this problem we’re going to take a shortcut. Though the Win32 API does not provide convenience methods for building generic windows, it does provide one function for displaying a very simple, modestly configurable dialog box that meets our needs: MessageBox(). The MessageBox() function lets you specify a window title, some caption text, and a button configuration from a number of preset options. It takes care of the window formatting and the event handlers, and it returns an integer value that tells us which button the user clicked.

Figure 8 shows the dialog box created by our password manager application using this approach. To implement it, we added a method called accounts_view_password() to the PasswordManagerCoreNative class. The code listing is given in Figure 9. 

Figure 8. The "view password" window generated by accounts_view_password().

int PasswordManagerCoreNative::accounts_view_password(UINT32 idx, HWND hWnd)
{
	int rv = NL_STATUS_UNKNOWN;
	LPWCH wpass;
	UINT16 wlen;
	static const wchar_t empty[] = L"(no password stored)";

	// First get the password

	rv = this->accounts_get_password(idx, &wpass, &wlen);
	if (rv != NL_STATUS_OK) return rv;

	// Display our MessageBox

	MessageBox(hWnd, (wlen>1) ? wpass : empty, L"Password", MB_OK);

	this->accounts_release_password(wpass, wlen);

	return rv;
}

Figure 9. The accounts_view_password() method in PasswordManagerCoreNative.

The first argument is the index of the account in our vault whose password we want to view. Internally, it makes a call to accounts_get_password() to fetch that password from the vault, and then it calls MessageBox() to display it (with logic to display a canned message if the password string is empty) with a single OK button. Then we free up the password and return.

The second argument to this function is the handle of the owner window. In native code that handle is of type HWND, but our owner window originates in managed code. Getting this handle is a two-step procedure. PasswordManagerCore can marshal an IntPtr to a native pointer that the native accounts_view_password() method can use, as shown in Figure 10, but the IntPtr that refers to the owning WPF window has to come from the UI layer of the application. 

int PasswordManagerCore::accounts_view_password(UInt32 idx, IntPtr hptr)
{
	int rv;
	UINT32 index = idx;
	HWND hWnd = static_cast<HWND>(hptr.ToPointer());

	rv = _nlink->accounts_view_password(index, hWnd);
	if (rv != NL_STATUS_OK) return rv;

	return rv;
}

Figure 10. The accounts_view_password() method in PasswordManagerCore.

Figure 11 shows the code from EditDialog.xaml.cs. Here, we get a managed handle to the current window and use the WindowInteropHelper class from WPF to produce an IntPtr. 

        private void btnView_Click(object sender, RoutedEventArgs e)
        {
            WindowInteropHelper wih;
            Window wnd;
            IntPtr hptr;
            int rv = 0;

            wnd = Window.GetWindow(this);
            wih = new WindowInteropHelper(wnd);
            hptr = wih.Handle;

            rv = mgr.accounts_view_password(pNewItem.index, hptr);
            if (rv != PasswordManagerStatus.OK)
            {
                MessageBox.Show("Couldn't view password: " + MainWindow.returnErrorcode(rv));
                return;
            }
        }

Figure 11. The event handler for viewing a password.

Viewing the password isn’t the only place where this is needed. Our Tutorial Password Manager application also provides the user with the option to randomly generate a password, as shown in Figure 12. It stands to reason that the user should be able to review that password before accepting it. 


Figure 12. The interface for setting an account's password.

We follow the same approach of using MessageBox() to show the password, only with a slightly different twist. Instead of presenting a dialog box with a single OK button, we prompt the user, asking them if they accept the randomly generated password that is shown. Their options are:

  • Yes. Accepts and saves the password.
  • No. Generates a new password using the same character requirements, and then reprompts.
  • Cancel. Cancels the operation entirely, and does not change the password.

This dialog box is shown in Figure 13. 


Figure 13. Accepting or rejecting a randomly generated password.

The code behind this dialog box, which implements the accounts_generate_and_view_password() method, is shown in Figure 14. The approach is very similar to accounts_view_password(), except there is additional logic to handle the button options in the dialog box. 

int PasswordManagerCoreNative::accounts_generate_and_view_password(UINT16 length, UINT16 flags, LPWSTR *wpass, UINT16 *wlen, HWND hWnd)
{
	int rv;
	char *cpass;
	int dresult;

	cpass = new (std::nothrow) char[length + 1]; // nothrow form so the NULL check below is meaningful
	if (cpass == NULL) return NL_STATUS_ALLOC;

	if (supports_sgx()) rv = ew_accounts_generate_password(length, flags, cpass);
	else rv = vault.accounts_generate_password(length, flags, cpass);
	cpass[length] = '\0';

	if (rv != NL_STATUS_OK) {
		delete[] cpass; // don't leak the buffer on an error return
		return rv;
	}

	*wpass = towchar(cpass, length + 1, wlen);

	SecureZeroMemory(cpass, length);
	delete[] cpass;

	if (*wpass == NULL) return NL_STATUS_ALLOC;

	// Show the message box

	dresult= MessageBox(hWnd, *wpass, L"Accept this password?", MB_YESNOCANCEL);
	if (dresult == IDNO) return NL_STATUS_AGAIN;
	else if (dresult == IDCANCEL) return NL_STATUS_USER_CANCEL;

	return NL_STATUS_OK;
}

Figure 14. Code listing for accounts_generate_and_view_password() in PasswordManagerCoreNative.

The dialog result of IDNO is mapped to NL_STATUS_AGAIN, which is the “retry” option. A result of IDCANCEL is mapped to NL_STATUS_USER_CANCEL.

Because the wpass pointer is passed in from the parent method in PasswordManagerCore, we implement the retry loop there instead of in the native method. This is slightly less efficient, but it keeps the overall program architecture consistent. The code for accounts_generate_and_view_password() in PasswordManagerCore is shown in Figure 15. 

int PasswordManagerCore::generate_and_view_password(UInt16 mlength, UInt16 mflags, SecureString ^%password, IntPtr hptr)
{
	int rv = NL_STATUS_AGAIN;
	LPWSTR wpass;
	UINT16 wlen;
	UINT16 length = mlength;
	UINT16 flags = mflags;
	HWND hWnd = static_cast<HWND>(hptr.ToPointer());

	if (!length) return NL_STATUS_INVALID;

	// Loop until they accept the randomly generated password, cancel, or an error occurs.

	while (rv == NL_STATUS_AGAIN) {
		rv = _nlink->accounts_generate_and_view_password(length, flags, &wpass, &wlen, hWnd);
		// Each loop through here allocates a new pointer.
		if ( rv == NL_STATUS_AGAIN ) _nlink->accounts_release_password(wpass, wlen);
	}

	if (rv != NL_STATUS_OK) {
		// The native method also allocates wpass on a user cancel, so release it here too.
		if (rv == NL_STATUS_USER_CANCEL) _nlink->accounts_release_password(wpass, wlen);
		return rv;
	}

	// They accepted this password, so assign it and return.

	password->Clear();
	for (int i = 0; i < length; ++i) {
		password->AppendChar(wpass[i]);
	}

	_nlink->accounts_release_password(wpass, wlen);

	return rv;
}

Figure 15. Code listing for accounts_generate_and_view_password() in PasswordManagerCore.

Summary

Mixing enclaves with managed code is a tricky, and potentially risky, business. As a general rule, secrets should not cross the enclave boundary unencrypted, but there are applications such as our Tutorial Password Manager where this cannot be avoided. In these cases, you must take proper care not to undermine the security provided by Intel SGX. Exposing secrets to unprotected memory can be a necessary evil, but placing them in managed memory can have disastrous consequences.

The takeaway from this part of the series should be this: Intel SGX does not eliminate the need for secure coding practices. What it gives you is a tool (and a very powerful one) for keeping your application’s secrets away from malicious software, but that tool, alone, does not and cannot build a secure application.

Sample Code

The code sample for this part of the series builds against the Intel SGX SDK version 1.7 using Microsoft Visual Studio* 2015.

Coming Up Next

In Part 9 of the series, we’ll add support for power events. Stay tuned!

[How to fix] MS VS2015 text editor is blank after upgrading Intel® parallel Studio XE

$
0
0

Some customers may encounter an issue where the Visual Studio* (VS) text editor is blank and cannot display the contents of code files after upgrading Intel® Parallel Studio XE (IPS). This article describes the problem and provides solutions.

1. Issue Description

After upgrading Intel® Parallel Studio XE for Windows*, which is integrated into Visual Studio*, the VS text editor is blank and cannot display the contents of any code file (.c, .cpp, .h, etc.). The contents of .xml and .txt files can still be viewed.

2. Finding the Cause

There are several possible reasons for this problem; use the checks below to determine the cause:

a. Open any existing code file, click and drag the mouse in the editor window to select a range of text, press Ctrl+C to copy it, then paste it into Notepad. If the content can be seen, follow solution a.
b. Create a new code file, type into the editor, save the file, and open it in Notepad to see whether your changes were saved. If so, follow solution b.
c. If neither of the above tests applies, follow solution c.

3. Solution

a. The text color is the same as the background color. Change the foreground color so that the content stands out from the background.
b. The VS theme is likely the cause; try switching to another theme.
c. The problem may be caused by the Visual Studio cache. Try deleting all files under this folder (for instance, if you are using VS2015):
 C:\Users\<user name>\AppData\Local\Microsoft\VisualStudio\14.0\ComponentModelCache
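For example, with Visual Studio closed, the cache can be cleared from a Command Prompt (this assumes VS2015 and the standard per-user profile location):

rmdir /s /q "%LOCALAPPDATA%\Microsoft\VisualStudio\14.0\ComponentModelCache"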

Normally, problems of this kind are not caused by Intel Parallel Studio XE. 

Enable billions of colors with 10bit HEVC

$
0
0

Introduction

Human eyes are capable of seeing many more colors than those shown by the video displays currently on the market. Until now, displays have been limited in the number of colors they can produce, and computer systems can represent only a finite number of colors. This article provides an overview of 10-bit color depth compared to 8-bit color depth, using capabilities built into 7th generation Intel® Core™ processors and optimized by Intel® Software Tools. An example of HEVC 10-bit encoding can also be found in the attached code sample.

To understand the differences between 8-bit and 10-bit color, the concept of ‘color depth’ is outlined below.

Color depth

Color depth, also known as bit depth, is the number of bits used to display the color of a single pixel. The same image or video frame at different color depths looks different, because the number of colors in each pixel varies with the color depth.

The number of bits for an image refers to the number of bits per channel, that is, per color component of each pixel. The number of color channels in a pixel depends on the color space used. For example, the color channels of the RGBA color space are Red (R), Green (G), Blue (B), and Alpha (A). Each additional bit doubles the amount of information we can store per channel. In an 8-bit image, each channel can represent 256 tones. Table 1 shows the number of tones available for each respective color depth.

Table 1: Possible number of tones per each color depth

Channel depth | Tones per channel per pixel | Total number of possible tones
8-bit | 256 | 16.78 million
10-bit | 1,024 | 1.07 billion
12-bit | 4,096 | 68.68 billion
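The totals in Table 1 follow from simple arithmetic: each channel offers 2^bits tones, and the three color channels multiply together. A minimal C++ sketch of that calculation (illustrative only):

#include <cstdint>
#include <cstdio>

int main()
{
    const int depths[] = { 8, 10, 12 };
    for (int bits : depths) {
        uint64_t per_channel = 1ull << bits;                       // 2^bits tones per channel
        uint64_t total = per_channel * per_channel * per_channel;  // three channels (R, G, B)
        printf("%2d-bit: %llu tones per channel, %llu total\n",
               bits, (unsigned long long)per_channel, (unsigned long long)total);
    }
    return 0;
}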

Most computer and TV monitors on the market are still capable of showing only up to 8-bit content; on these displays, 10-bit content is shown by lowering the bit depth. However, the actual advantages of 10-bit can be exploited in the following common scenarios:

  • When processing an image or a video after recording
  • In High Dynamic Range (HDR) imaging and display systems

If content is shot in 10-bit, there is a large margin of safety, so information is not lost when the required changes are applied. Otherwise, image processing at lower precision can result in a loss of sharpness, contrast, and other valuable information. If information is lost due to changes applied to 8-bit content, fewer effective bits per pixel remain, which can cause a color banding effect. The color banding concept is explained with an example below.

Color banding

When an image sensor captures an image and is unable to distinguish the minimal differences between adjacent colors, inaccurate color representation occurs: a range of adjacent colors collapses into a single pixel color value. The result is an image with bands of color instead of smooth gradations. Color banding (Figure 1) occurs when an image is captured without enough detail, even though the scene transitions smoothly in the real world.

Possible solutions to avoid color banding are:

  • Increasing the number of bits per channel
  • Color quantization (not covered in this article)

An uncalibrated display can also show banding-like artifacts. In such scenarios, one can try monitor calibration tools or the Intel® Graphics Control Panel applet.

Figure 1: Comparing an 8-bit image (left) vs. a 10-bit image (right) shows the color banding effect

Figure 1 shows the difference between an 8-bit and a 10-bit image with respect to the color banding issue. The image on the left was captured with an 8-bit sensor; the image on the right was captured with a 10-bit sensor. In the left image, the required detail was not captured, and fewer bits mean fewer colors, causing the color banding effect. In the right image, the same frame was captured with enough detail, so the transitions between adjacent colors are smooth. To offer smoother color transitions between adjacent pixels, the current color gamut is not sufficient and needs to be widened. A wider color gamut is introduced in the BT.2020 standard, which is briefly described below.

BT. 2020 standard

7th generation Intel® Xeon® and Core™ processors support the BT.2020 (also known as Rec. 2020) standard in use cases such as 4K Ultra High Definition (UHD) content creation and consumption, and HDR with 10-bit enablement. UHD monitors have 3840 x 2160 pixels across several screen sizes. Displays supporting the BT.2020 standard can provide enhanced viewing experiences at these high resolutions.

Figure 2: BT.2020 vs. BT.709 color space comparison

The International Telecommunication Union (ITU) recommendation BT.2020 represents a much larger range of colors than the previously used BT.709. The comparison between the respective color spaces is shown in Figure 2, which depicts the CIE 1931 color space chromaticity diagram. The X and Y axes show the chromaticity coordinates, with the wavelengths of the corresponding colors shown in blue. The triangle outlined in yellow shows the color space covered by the BT.709 standard, which has limited color information for representing pixels on large displays such as HDTVs. The black triangle shows the BT.2020 color space, in which smoother transitions between adjacent colors are possible because more colors are available. In addition to the color space, BT.2020 also defines other aspects of UHD TV, such as display resolution, frame rate, chroma subsampling, and bit depth.

7th generation Intel processors support the HEVC Main 10 profile, VP9 Profile 2, and High Dynamic Range (HDR) video rendering by exploiting the BT.2020 standard.

HEVC Main 10 profile

High Efficiency Video Coding (HEVC), also known as H.265, is a video compression standard and the successor to the widely successful H.264/AVC standard. HEVC enables more sophisticated compression algorithms than its predecessors. See also Learn about the Significance of HEVC (H.265) Codec for more information. The Main 10 profile allows a color depth of 8 to 10 bits per sample with 4:2:0 chroma subsampling.

HEVC 10-bit decode support is available starting with 6th generation Intel® processors. The command below shows how sample_decode from the Intel Media SDK Code Samples can be used to obtain raw frames from an HEVC elementary stream.

sample_decode.exe h265 -p010 -i input.h265 -o raw_frames.yuv -hw

The input (input.h265) used in the above decode session can be downloaded from Free H.265/HEVC bitstreams (the exact file name is listed at the end of this article). The output (raw_frames.yuv) from the above decode session is produced in P010 format, which can be used as the input to the sample_encode operation explained in the following paragraph.

HEVC 10-bit hardware acceleration for both the decoder and the encoder with the HEVC/H.265 Main 10 profile is supported on 7th generation Intel processors. The HEVC 10-bit encode capability was verified using the attached 'modified_sample_encode' code, which was modified specifically to support this feature. This sample works with Intel® Media SDK 2016 R2. Related build instructions are available in the Media Samples Guide in the Intel® Media SDK Code Samples.

Below is an example of HEVC 10-bit encoding using sample_encode from the attached 'modified_sample_encode':

sample_encode.exe h265 -i raw_frames.yuv -o output.265 -w 3840 -h 2160 -p010 -hw

Figure 3 is a screenshot of the Video Quality Caliper tool, which verifies that the encoded stream has 10 bits per pixel (bpp), meaning each channel can represent 2^10, or 1,024, tones.

Figure 3: Snapshot of Video Quality Caliper showing the encoded file is 10 bpp

 

sample_encode supports classic P010 YUV only, which has the 10 data bits in the least significant bit positions. This is in contrast to the FFmpeg P010 format, which has the 10 data bits in the most significant bit positions.
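Converting between the two alignments is a simple per-sample shift, since each 16-bit word carries one 10-bit sample. A minimal sketch (an illustration, not part of the Media SDK samples):

#include <cstdint>
#include <cstddef>

// Shift MSB-aligned 10-bit samples (data in bits 6..15, FFmpeg-style) down
// to the LSB-aligned layout (data in bits 0..9) described above.
void p010_msb_to_lsb(uint16_t *samples, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        samples[i] >>= 6; // drop the six zero padding bits
}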

VP9 Profile 2

VP9 is a video coding format developed by Google as a successor to VP8. 7th generation Intel® platforms support hardware-accelerated VP9 10-bit decode, whereas the encode solution is software/CPU-based.

HDR

Dynamic range is the ratio between the whitest whites and the blackest blacks in an image. High Dynamic Range (HDR) video delivers a wider dynamic range than conventional Standard Dynamic Range (SDR) video, which uses a non-linear operation to encode and decode luminance values in video systems.

HDR video content is supported using either the HEVC Main 10 or the VP9.2 codec, with full hardware decode support starting with 7th generation Intel processors. To transmit HDR content, the system needs to be equipped with either a DP 1.4 or an HDMI 2.0a port. This feature has so far been tested only with pre-release OS builds and is not yet available in public releases. Enabling HDR and its support will be covered in upcoming articles.

Conclusion

As discussed, developers have the opportunity to deliver amazing, true-to-life video and to enrich their content with more brilliant colors, using 10-bit support for the growing market of UHD/HDR-ready devices. With media applications running on 7th generation Intel® processors and optimized by Intel® Media Server Studio or the Intel® Media SDK, developers can deliver video at the BT.2020 standard for 10-bit 4K UHD, and even higher resolutions and frame rates, with smoother color transitions. Going forward, 10-bit content and seamless viewing experiences will be available in more dimensions than described in this article, as many optimized multimedia applications run on multiple types of Intel® processor-based platforms.

The following tools (along with downloadable links) were used to explain the 10-bit supported features in this article:

  • Software- Intel® Media SDK 2016 R2
  • Input bitstream - MHD_2013_2160p_ShowReel_R_9000f_24fps_RMN_QP23_10b.hevc from Free H.265/HEVC bitstreams
  • Codec - H.265/HEVC
  • Analysis tool - Video Quality Caliper (VQC), a component in Intel® Media Server Studio Professional Edition and Intel® Video Pro Analyzer
  • System used -
    • CPU: Intel® Core™ i7-7500U CPU @ 2.70GHz
    • OS: Microsoft Windows 10 Professional 64-bit
    • Graphics Devices: Intel® HD Graphics 620

See also Deep Color Support of Intel® Graphics for Intel hardware and graphics driver support for 10-bit/12-bit.


Benefits of Intel® Optimized Caffe* in comparison with BVLC Caffe*

$
0
0

Overview

This article introduces Berkeley Vision and Learning Center (BVLC) Caffe* and a customized version of Caffe*, Intel® Optimized Caffe*. We explain why and how Intel® Optimized Caffe* performs efficiently on Intel® architecture, using Intel® VTune™ Amplifier and Caffe*'s own time profiling option.

 

Introduction to BVLC Caffe* and Intel® Optimized Caffe*

Caffe* is a well-known and widely used machine-vision-oriented deep learning framework developed by the Berkeley Vision and Learning Center (BVLC). It is an open-source framework that is still evolving. It allows users to configure a variety of options, such as the BLAS library, CPU- or GPU-focused computation, CUDA, OpenCV, MATLAB, and Python, before building Caffe* through 'Makefile.config'. You can easily change the options in the configuration file, and BVLC provides intuitive instructions for developers on the project web page. 

Intel® Optimized Caffe* is an Intel-distributed, customized Caffe* version for Intel architectures. Intel® Optimized Caffe* offers all the goodness of mainline Caffe* with the addition of Intel-architecture-optimized functionality and multi-node distributed training and scoring. Intel® Optimized Caffe* makes it possible to utilize CPU resources more efficiently.

To see in detail how Intel® Optimized Caffe* has been changed to optimize it for Intel architectures, please refer to this page: https://software.intel.com/en-us/articles/caffe-optimized-for-intel-architecture-applying-modern-code-techniques

In this article, we will first profile the performance of BVLC Caffe* with the Cifar 10 example, and then profile the performance of Intel® Optimized Caffe* with the same example. Performance profiling is conducted using two different methods.

Tested platform: Intel® Xeon Phi™ 7210 (1.3 GHz, 64 cores) with 96 GB RAM, CentOS 7.2

1. Caffe* provides its own timing option, for example: 

./build/tools/caffe time \
    --model=examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt \
    -iterations 1000

2. Intel® VTune™ Amplifier: a powerful profiling tool that provides advanced CPU profiling features with a modern analysis interface. https://software.intel.com/en-us/intel-vtune-amplifier-xe

 

 

How to Install BVLC Caffe*

Please refer to the BVLC Caffe project web page for installation instructions: http://caffe.berkeleyvision.org/installation.html

If you have Intel® MKL installed on your system, it is better to use MKL as the BLAS library. 

In your Makefile.config, choose BLAS := mkl and specify the MKL location. (The default setting is BLAS := atlas.)

In our test, we kept all configuration options at their defaults except for the CPU-only option. 

 

Test example

In this article, we use the 'Cifar 10' example included in the Caffe* package. 

You can refer to the BVLC Caffe project page for detailed information about this example: http://caffe.berkeleyvision.org/gathered/examples/cifar10.html

You can simply run the Cifar 10 training example as follows: 

cd $CAFFE_ROOT
./data/cifar10/get_cifar10.sh
./examples/cifar10/create_cifar10.sh
./examples/cifar10/train_full_sigmoid_bn.sh

First, we will try Caffe*'s own benchmark method to obtain performance results, as follows:

./build/tools/caffe time \
    --model=examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt \
    -iterations 1000

As a result, we get the layer-by-layer forward and backward propagation times. The command above measures the time of each forward and backward pass over a batch of images. At the end, it shows the average execution time per iteration for 1,000 iterations, both per layer and for the entire calculation. 

This test was run on an Intel® Xeon Phi™ 7210 (1.3 GHz, 64 cores) with 96 GB of DDR4 RAM installed, running CentOS 7.2.

The numbers in the above results will be compared later with the results of Intel® Optimized Caffe*. 

Before that, let's also take a look at the VTune™ results to observe the behavior of Caffe* in detail. 

 

VTune Profiling

Intel® VTune™ Amplifier is a modern processor performance profiler that can quickly analyze top hotspots and help you tune your target application. You can find the details of Intel® VTune™ Amplifier at the following link:

Intel® VTune™ Amplifier : https://software.intel.com/en-us/intel-vtune-amplifier-xe

We used Intel® VTune™ Amplifier in this article to find the functions with the highest total CPU utilization time, and to see how the OpenMP threads are working. 

 

VTune result analysis

 

What we can see here is a list of functions on the left side of the screen that take most of the CPU time. These are called 'hotspots' and are candidate targets for performance optimization. 

In this case, we will focus on the 'caffe::im2col_cpu<float>' function as an optimization candidate. 

'im2col_cpu<float>' is one of the steps in performing direct convolution as a GEMM operation so that highly optimized BLAS libraries can be used. This function consumed the largest share of CPU resources in our test of training the Cifar 10 model using BVLC Caffe*. 

Let's take a look at the thread behavior of this function. In VTune™, you can choose a function and filter other workloads out to observe only the workloads of the specified function. 

In the above result, we can see that the CPI (Cycles Per Instruction) of the function is 0.907, and that the function utilizes only a single thread for the entire calculation.

One more intuitive piece of data provided by VTune is shown here. 

This 'CPU Usage Histogram' shows the number of CPUs that were running simultaneously. The number of CPUs the training process utilized appears to be about 25. The platform has 64 physical cores with Intel® Hyper-Threading Technology, so it has 256 logical CPUs. The CPU usage histogram here may imply that the process is not efficiently threaded. 

However, we cannot simply label these results as 'bad', because we have not set any performance standard or desired target against which to classify them. We will compare these results with the results of Intel® Optimized Caffe* later.

 

Let's move on to Intel® Optimized Caffe* now.

 

How to Install Intel® Optimized Caffe*

The basic installation procedure for Intel® Optimized Caffe* is the same as for BVLC Caffe*. 

When cloning Intel® Optimized Caffe* from Git, use this repository: 

git clone https://github.com/intel/caffe

 

Additionally, Intel® MKL must be installed to bring out the best performance of Intel® Optimized Caffe*. 

Please download and install Intel® MKL. Intel offers MKL free of charge without technical support, or for a license fee with one-on-one private support. The default BLAS library of Intel® Optimized Caffe* is set to MKL.

 Intel® MKL : https://software.intel.com/en-us/intel-mkl

After downloading Intel® Optimized Caffe* and installing MKL, make sure that in your Makefile.config you choose MKL as your BLAS library and point BLAS_INCLUDE and BLAS_LIB at MKL's include and lib folders:

BLAS := mkl

BLAS_INCLUDE := /opt/intel/mkl/include
BLAS_LIB := /opt/intel/mkl/lib/intel64

 

If you encounter a 'libstdc++'-related error during the compilation of Intel® Optimized Caffe*, install 'libstdc++-static'. For example:

sudo yum install libstdc++-static

 

 

 

Optimization factors and tunes

Before we run and test the performance of the examples, we need to change or adjust some options to optimize performance.

  • Use 'mkl' as the BLAS library: specify 'BLAS := mkl' in Makefile.config and configure the locations of your MKL include and lib directories as well.
  • Set the CPU to run at maximum performance: 
    echo "100" | sudo tee /sys/devices/system/cpu/intel_pstate/min_perf_pct
    echo "0" | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
  • Put 'engine:"MKL2017"' at the top of your train_val.prototxt or solver.prototxt file, or pass this option to the caffe tool: -engine "MKL2017"
  • The current implementation uses OpenMP threads. By default, the number of OpenMP threads is set to the number of CPU cores, and each thread is bound to a single core to achieve the best performance. You can, however, provide your own configuration through OpenMP environment variables such as KMP_AFFINITY, OMP_NUM_THREADS, or GOMP_CPU_AFFINITY. For the example run below, 'OMP_NUM_THREADS = 64' was used.
  • Intel® Optimized Caffe* has modified many parts of the original BVLC Caffe* code to achieve better code parallelization with OpenMP*. Depending on the other processes running in the background, it is often useful to adjust the number of threads used by OpenMP*. For single-node runs on the Intel Xeon Phi™ product family, we recommend OMP_NUM_THREADS = number_of_cores - 2.
  • Please also refer to: Intel Recommendation to Achieve the best performance 

If you observe too much overhead because of overly frequent thread migration by the OS, you can try adjusting the OpenMP* affinity environment variable: 

KMP_AFFINITY=compact,granularity=fine
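For example, a timing run that applies these settings explicitly might look like this (an illustration; adjust the thread count to your core count):

OMP_NUM_THREADS=64 KMP_AFFINITY=compact,granularity=fine \
./build/tools/caffe time \
    --model=examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt \
    -iterations 1000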

 

Test example

For Intel® Optimized Caffe*, we run the same example and compare the results with the previous ones. 

cd $CAFFE_ROOT
./data/cifar10/get_cifar10.sh
./examples/cifar10/create_cifar10.sh
./build/tools/caffe time \
    --model=examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt \
    -iterations 1000

 

Comparison

The results for the above example are as follows:

Again, the platform used for the test is: Intel® Xeon Phi™ 7210 (1.3 GHz, 64 cores) with 96 GB RAM, CentOS 7.2.

First, let's look at the BVLC Caffe* and Intel® Optimized Caffe* results together. 


To make the comparison easy, see the table below. The duration of each layer is listed in milliseconds, and the 5th column states how many times faster Intel® Optimized Caffe* is than BVLC Caffe* at each layer. You can observe significant performance improvements, except for the bn layers. 'Bn' stands for "batch normalization," which requires fairly simple calculations with small optimization potential. The bn forward layers show better results, while the bn backward layers are 2~3% slower than the original; the slightly worse performance can be attributed to threading overhead. Overall, Intel® Optimized Caffe* achieved about 28 times faster performance in this case. 

Layer | Direction | BVLC (ms) | Intel (ms) | Performance Benefit (x)
conv1 | Forward | 40.2966 | 1.65063 | 24.413
conv1 | Backward | 54.5911 | 2.24787 | 24.286
pool1 | Forward | 162.288 | 1.97146 | 82.319
pool1 | Backward | 21.7133 | 0.459767 | 47.227
bn1 | Forward | 1.60717 | 0.812487 | 1.978
bn1 | Backward | 1.22236 | 1.24449 | 0.982
Sigmoid1 | Forward | 132.515 | 2.24764 | 58.957
Sigmoid1 | Backward | 17.9085 | 0.262797 | 68.146
conv2 | Forward | 125.811 | 3.8915 | 32.330
conv2 | Backward | 239.459 | 8.45695 | 28.315
bn2 | Forward | 1.58582 | 0.854936 | 1.855
bn2 | Backward | 1.2253 | 1.25895 | 0.973
Sigmoid2 | Forward | 132.443 | 2.2247 | 59.533
Sigmoid2 | Backward | 17.9186 | 0.234701 | 76.347
pool2 | Forward | 17.2868 | 0.38456 | 44.952
pool2 | Backward | 27.0168 | 0.661755 | 40.826
conv3 | Forward | 40.6405 | 1.74722 | 23.260
conv3 | Backward | 79.0186 | 4.95822 | 15.937
bn3 | Forward | 0.918853 | 0.779927 | 1.178
bn3 | Backward | 1.18006 | 1.18185 | 0.998
Sigmoid3 | Forward | 66.2918 | 1.1543 | 57.430
Sigmoid3 | Backward | 8.98023 | 0.121766 | 73.750
pool3 | Forward | 12.5598 | 0.220369 | 56.994
pool3 | Backward | 17.3557 | 0.333837 | 51.989
ipl | Forward | 0.301847 | 0.186466 | 1.619
ipl | Backward | 0.301837 | 0.184209 | 1.639
loss | Forward | 0.802242 | 0.641221 | 1.251
loss | Backward | 0.013722 | 0.013825 | 0.993
Ave. | Forward | 735.534 | 21.6799 | 33.927
Ave. | Backward | 488.049 | 21.7214 | 22.469
Ave. | Forward-Backward | 1223.86 | 43.636 | 28.047
Total | | 1223860 | 43636 | 28.047

 

Some of many reasons this optimization was possible are :

  • Code vectorization for SIMD 
  • Finding hotspot functions and reducing function complexity and the amount of calculations
  • CPU / system specific optimizations
  • Reducing thread movements
  • Efficient OpenMP* utilization

 

Additionally, let's compare the VTune results of this example between BVLC Caffe and Intel® Optimized Caffe*. 

We will simply look at how efficiently the im2col_cpu function is utilized. 

BVLC Caffe*'s im2col_cpu function had a CPI of 0.907 and was single-threaded. 

In the case of Intel® Optimized Caffe*, im2col_cpu has a CPI of 2.747 and is multi-threaded by OMP workers. 

The CPI rate increased here because of vectorization, which brings a higher CPI due to the longer latency of each instruction, and because of multi-threading, which can introduce spinning while threads wait for others to finish their jobs. However, in this example, the benefits of vectorization and multi-threading exceed the added latency and overhead and deliver an overall performance improvement.

VTune suggests that a CPI rate close to 2.0 is theoretically ideal here, and in our case we achieved about the right CPI for the function. The training workload for the Cifar 10 example handles 32 x 32 pixel images in each iteration, so when the workload is split across many threads, each thread receives a very small task, which can cause transition overhead for multi-threading. With larger images we would see lower spinning time and a smaller CPI rate.

The CPU Usage Histogram for the whole process also shows better threading results in this case. 

 

 

 

Useful links

BVLC Caffe* Project : http://caffe.berkeleyvision.org/ 
 
Intel® Optimized Caffe* Git : https://github.com/intel/caffe
Intel® Optimized Caffe* Recommendations for the best performance : https://github.com/intel/caffe/wiki/Recommendations-to-achieve-best-performance 
 

 

Summary

Intel® Optimized Caffe* is a customized Caffe* version for Intel architectures, built with modern code techniques.

In Intel® Optimized Caffe*, Intel leverages optimization tools and Intel® performance libraries, performs scalar and serial optimizations, and implements vectorization and parallelization. 

 

 

Use Intel SGX Templates for the GNU* Autoconf* Build System

$
0
0

GNU* Autoconf* is a popular build system that sees extensive use for Linux* source code packages. It produces a consistent, easy-to-use, and well-understood configuration script that allows end users and systems integrators to tailor software packages for their installation environments, almost always without any manual intervention. To create a configure script, the software developer creates a template file consisting of a series of macros that define the software package configuration needs, and then processes it with the Autoconf utility. GNU Autoconf provides convenient automation and standardization for common, and often tedious, tasks such as building Makefiles and configurable header files.

One of the key features of the Autoconf system is that it is extensible. Software developers can create macros that expand its functionality in order to support customized build and configuration needs. In this article, we introduce a set of macros and Makefile templates that do exactly this: Extend the functionality of Autoconf to simplify the process of building software that makes use of Intel® Software Guard Extensions (Intel® SGX). The templates themselves, along with a sample application source tree that makes use of them, are provided as a download.

Overview

The Intel SGX templates for the GNU Autoconf package contain four files:

  • README
  • aclocal.m4
  • sgx-app.mk.in
  • sgx-enclave.mk.in

README

The README file has detailed information on the Autoconf macros and Makefile rules and variables that make up the templates. It is a reference document, while this article functions more as a “how to” guide.

aclocal.m4

This is where the macros for extending Autoconf are defined. This file can be used as-is, appended to an existing aclocal.m4, or renamed for integration with GNU Automake*.

sgx-app.mk.in

This file builds to “sgx-app.mk” and contains Makefile rules and definitions for building Intel SGX applications. It is intended to be included (via an “include” directive) from the Makefile(s) that produce an executable object that includes one or more Intel SGX enclaves.

sgx-enclave.mk.in

This file builds to “sgx-enclave.mk” and contains Makefile rules and definitions for building Intel SGX enclaves. It must be included (via an “include” directive) from Makefiles that produce an Intel SGX enclave object (*.signed.so file in Linux).

Because this file contains build targets, you should place the include directive after the default build target in the enclave’s Makefile.in.

Creating configure.ac

Start by including the macro SGX_INIT in your configure.ac. This macro is required in order to set up the build system for Intel SGX, and it does the following:

  • Adds several options to the final configure script that let the user control aspects of the build.
  • Attempts to discover the location of the Intel SGX SDK.
  • Creates sgx_app.mk from sgx_app.mk.in.

SGX_INIT also defines a number of Makefile substitution variables. The ones most likely to be needed by external Makefiles are:

Variable | Description
enclave_libdir | Installation path for enclave libraries/objects. Defaults to $EPREFIX/lib.
SGX_URTS_LIB | The untrusted runtime library name. When the project is built in simulation mode it automatically includes the _sim suffix.
SGX_UAE_SERVICE_LIB | The untrusted AE service library name. When the project is built in simulation mode it automatically includes the _sim suffix.
SGXSDK | The location of the Intel® SGX SDK.
SGXSDK_BINDIR | The directory containing Intel SGX SDK utilities.
SGXSDK_INCDIR | The location of Intel SGX SDK header files.
SGXSDK_LIBDIR | The directory containing the Intel SGX SDK libraries needed during linking.

The SGX_INIT macro does not take any arguments.

AC_INIT(sgxautosample, 1.0, john.p.mechalas@intel.com)

AC_PROG_CC()
AC_PROG_CXX()
AC_PROG_INSTALL()

AC_CONFIG_HEADERS([config.h])

SGX_INIT()

AC_CONFIG_FILES([Makefile])

AC_OUTPUT()

Next, define the enclaves. Each enclave is expected to have a unique name, and should be located in a subdirectory that is named after it. Specify the enclaves using the SGX_ADD_ENCLAVES macro. It takes one or two arguments:

  1. (required) The list of enclave names.
  2. (optional) The parent directory where the enclave subdirectories can be found. This defaults to “.”, the current working directory, if omitted.

Note that you can invoke this macro multiple times if your project has multiple enclaves and they do not share a common parent directory. Enclave names should not include spaces or slashes.

AC_INIT(sgxautosample, 1.0, john.p.mechalas@intel.com)

AC_PROG_CC()
AC_PROG_CXX()
AC_PROG_INSTALL()

AC_CONFIG_HEADERS([config.h])

SGX_INIT()

# Add enclave named “EnclaveHash” in the EnclaveHash/ directory
SGX_ADD_ENCLAVES([EnclaveHash])

AC_CONFIG_FILES([Makefile])

AC_OUTPUT()

In addition to defining the enclaves, this macro does the following:

  • Builds sgx_enclave.mk from sgx_enclave.mk.in.
  • Builds the Makefiles in each enclave subdirectory from their respective Makefile.in sources.

Enclave Makefiles

Each enclave’s Makefile needs to include the global sgx_enclave.mk rules file in order to inherit the rules, targets, and variables that automate enclave builds. Each Enclave must abide by the following rules:

  • The enclave must be in its own subdirectory.
  • The name of the subdirectory must match the name of the enclave (for example, an enclave named EnclaveCrypto must be placed in a subdirectory named EnclaveCrypto).
  • The EDL file for the enclave must also match the enclave name (for example, EnclaveCrypto.edl).
  • The Makefile must define the name of the enclave in a variable named ENCLAVE (for example, ENCLAVE=EnclaveCrypto).

The sgx_enclave.mk file defines a number of variables for you to use in the enclave’s Makefile:

Variable | Description
ENCLAVE_CLEAN | A list of files that should be removed during 'make clean'.
ENCLAVE_CPPFLAGS | C preprocessor flags.
ENCLAVE_CXXFLAGS | C++ compiler flags necessary for building an enclave.
ENCLAVE_DISTCLEAN | A list of files that should be removed during 'make distclean'.
ENCLAVE_LDFLAGS | Linker flags for generating the enclave .so.
ENCLAVE_TOBJ | The trusted object file $(ENCLAVE)_t.o that is auto-generated by the sgx_edger8r tool. Include this in your enclave link line and the enclave build dependencies.

Here’s the Makefile.in for the enclave in the sample application included with the templates:

CC=@CC@
CFLAGS=@CFLAGS@
CPPFLAGS=@CPPFLAGS@
LDFLAGS=@LDFLAGS@

INSTALL=@INSTALL@
prefix=@prefix@
exec_prefix=@exec_prefix@
bindir=@bindir@
libdir=@libdir@
enclave_libdir=@enclave_libdir@

ENCLAVE=EnclaveHash

OBJS=$(ENCLAVE).o

%.o: %.c
        $(CC) $(CPPFLAGS) $(ENCLAVE_CPPFLAGS) $(CFLAGS) $(ENCLAVE_CFLAGS) -c $<

all: $(ENCLAVE).so

install: all
        $(INSTALL) -d $(enclave_libdir)
        $(INSTALL) -t $(enclave_libdir) $(ENCLAVE_SIGNED)

include ../sgx_enclave.mk

$(ENCLAVE).so: $(ENCLAVE_TOBJ) $(OBJS)
        $(CC) $(CFLAGS) -o $@ $(ENCLAVE_TOBJ) $(OBJS) $(LDFLAGS) $(ENCLAVE_LDFLAGS)

clean:
        rm -f $(OBJS) $(ENCLAVE_CLEAN)

distclean: clean
        rm -f Makefile $(ENCLAVE_DISTCLEAN)

Application Makefiles

Application components that reference enclaves need to include sgx_app.mk in their Makefile. It defines a number of rules, targets, and variables to assist with the build.

To get a list of all the enclaves in the project, the Makefile must define a list variable from the @SGX_ENCLAVES@ substitution variable that is set by Autoconf:

SGX_ENCLAVES:=@SGX_ENCLAVES@

This should be included as a build target as well, to ensure that all enclaves are built along with the application.

all: enclavetest $(SGX_ENCLAVES)

The variables most likely to be needed by the application’s Makefile are:

Variable | Description
ENCLAVE_CLEAN | A list of files that should be removed during 'make clean'.
ENCLAVE_UOBJS | The untrusted object files $(ENCLAVE)_u.o that are auto-generated by the sgx_edger8r tool. Include these in your application link line and the enclave build dependencies.
ENCLAVE_UDEPS | The untrusted source and header files that are auto-generated by the sgx_edger8r tool. Include these in your compilation dependencies when building your application.

Here’s the Makefile for the sample application that is bundled with the templates:

SGX_ENCLAVES:=@SGX_ENCLAVES@

CC=@CC@
CFLAGS=@CFLAGS@ -fno-builtin-memset
CPPFLAGS=@CPPFLAGS@
LDFLAGS=@LDFLAGS@ -L$(SGXSDK_LIBDIR)
LIBS=@LIBS@

INSTALL=@INSTALL@
prefix=@prefix@
exec_prefix=@exec_prefix@
bindir=@bindir@
libdir=@libdir@
enclave_libdir=@enclave_libdir@

APP_OBJS=main.o

%.o: %.c
        $(CC) -c $(CPPFLAGS) $(CFLAGS) -I$(SGXSDK_INCDIR) $<

all: enclavetest $(SGX_ENCLAVES)

install: install-program install-enclaves

install-program: all
        $(INSTALL) -d $(bindir)
        $(INSTALL) -t $(bindir) enclavetest

install-enclaves:
        for dir in $(SGX_ENCLAVES); do \
                $(MAKE) -C $$dir install; \
        done

include sgx_app.mk

enclavetest: $(ENCLAVE_UOBJS) $(APP_OBJS)
        $(CC) -o $@ $(LDFLAGS) $(APP_OBJS) $(ENCLAVE_UOBJS) $(LIBS) -l$(SGX_URTS_LIB)

clean: clean_enclaves
        rm -f enclavetest $(APP_OBJS) $(ENCLAVE_CLEAN)

distclean: clean distclean_enclaves
        rm -rf Makefile config.log config.status config.h autom4te.cache
        rm -rf sgx_app.mk sgx_enclave.mk

Note that the link line for the application references the sgx_urts library via the Makefile variable $(SGX_URTS_LIB). This is to support builds made in simulation mode: The variable will automatically append the _sim suffix to the library names so that the Makefile doesn’t have to define multiple build targets. Always use the variables $(SGX_URTS_LIB) and $(SGX_UAE_SERVICE_LIB) in your Makefile instead of the actual library names.

Running the Configure Script

When the configure.ac file is processed by Autoconf, the resulting configure script will have some additional command-line options. These are added by the SGX_INIT macro:

--enable-sgx-simulation

Build the project in simulation mode. This is for running and testing Intel SGX applications on hardware that does not support Intel SGX instructions.

--with-enclave-libdir-path=path

Specify where enclave libraries should be installed, and set the enclave_libdir substitution variable in Makefiles. The default is $EPREFIX/lib.

--with-sgx-build=debug|prerelease|release

Specify whether to build the Intel SGX application in debug, prerelease, or release mode. The default is to build in debug mode.

See the Intel SGX SDK for information on the various build modes. Note that you cannot mix release or prerelease modes with the --enable-sgx-simulation option.

--with-sgxsdk=path

Specify the Intel SGX SDK installation directory. This overrides the auto-detection procedure.
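For example, a user might configure a simulation-mode debug build with an explicit SDK location (the path shown here is an illustrative assumption):

./configure --with-sgxsdk=/opt/intel/sgxsdk --enable-sgx-simulation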

Summary and Future Work

These templates simplify the process of integrating the GNU build system with Intel SGX projects. They eliminate tedious, redundant coding, relieve the developer of the burden of remembering and entering the numerous libraries and compiler and linker flags needed to build Intel SGX enclaves, and automate the execution of supporting tools such as sgx_edger8r and sgx_sign.

While this automation and integration is valuable, there is still a non-trivial amount of effort required to set up the project environment. Further automation might be possible through the use of GNU Automake, which is designed to generate the Makefile templates that are in turn processed by Autoconf.

The build environment for Intel SGX applications can be complicated. Integration with build systems such as GNU Autoconf, and potentially Automake, can save the developer considerable time and make their projects less prone to errors.

CSharp Application with Intel Software Guard Extension

$
0
0

C# Application with Intel Software Guard Extension

Although enclaves must be 100 percent native code and the enclave bridge functions must be 100 percent native code with C (and not C++) linkage, it is possible, indirectly, to make an ECALL into an enclave from .NET and to make an OCALL from an enclave into a .NET object.

Mixing Managed Code and Native Code with C++/CLI

Microsoft Visual Studio* 2005 and later offers three options for calling unmanaged code from managed code:

  • Platform Invocation Services, commonly referred to by developers as P/Invoke:
    • P/Invoke is good for calling simple C functions in a DLL, which makes it a reasonable choice for interfacing with enclaves, but writing P/Invoke wrappers and marshaling data can be difficult and error-prone.
  • COM:
    • COM is more flexible than P/Invoke, but it is also more complicated; that additional complexity is unnecessary for interfacing with the C bridge functions required by enclaves.
  • C++/CLI:
    • C++/CLI offers significant convenience by allowing the developer to mix managed and unmanaged code in the same module, creating a mixed-mode assembly which can in turn be linked to modules comprised entirely of either managed or native code.
    • Data marshaling in C++/CLI is also fairly easy: for simple data types it is done automatically through direct assignment, and helper methods are provided for more complex types such as arrays and strings (see the sketch after this list).
    • Data marshaling is, in fact, so painless in C++/CLI that developers often refer to the programming model as IJW (an acronym for “it just works”).
    • The trade-off for this convenience is that there can be a small performance penalty due to the extra layer of functions, and it does require that you produce an additional DLL when interfacing with Intel SGX enclaves.
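As a hedged illustration of that marshaling model (hypothetical names; not the tutorial's actual code), a managed C++/CLI wrapper that forwards to a native C bridge function might look like this:

#include <msclr/marshal.h>

// Hypothetical native C bridge function that would make the real ECALL:
extern "C" int en_compute_hash(const char *input, int *result);

public ref class EnclaveLink
{
public:
	int ComputeHash(System::String^ input, int% result)
	{
		msclr::interop::marshal_context ctx;

		// "IJW" marshaling: convert the managed string to a native one.
		const char *native_input = ctx.marshal_as<const char*>(input);

		int native_result = 0;
		int rv = en_compute_hash(native_input, &native_result);

		result = native_result; // simple types marshal back by assignment
		return rv;
	}
};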

Please find detailed information in the attached PDF; sample code is also provided.

Intel® VTune™ Amplifier Disk I/O analysis with Intel® Optane Memory

$
0
0

This article discusses Intel® VTune™ Amplifier disk I/O analysis with Intel® Optane™ memory. Benchmark tools like CrystalDiskMark, Iometer, SYSmark, or PCMark are commonly used to evaluate system I/O efficiency, usually producing a score. Power users and PC-gaming enthusiasts might be satisfied with those numbers for performance validation purposes. But what about deeper technical information, such as identifying slow I/O activities, visualizing detailed I/O queue depth on a timeline, viewing I/O API call stacks, and even correlating I/O with other system metrics for debugging and profiling? Software developers need these clues to understand how efficiently their programs perform I/O. VTune provides such insights with its new feature, the Disk I/O analysis type.
 

A bit about I/O Performance metrics

First of all, there are some basics you need to know. I/O queue depth, read/write latency, and I/O bandwidth are the metrics used to track I/O efficiency. I/O queue depth is the number of I/O commands waiting in a queue to be served. This queue depth (size) depends on the application, driver, and OS implementation, or on the host controller interface specification, such as AHCI or NVMe. Compared to AHCI's single-queue design, NVMe's multi-queue design supports parallel operations.

Imagine that a software program issues multiple I/O requests that pass through frameworks, software libraries, VMs, containers, runtimes, the OS's I/O scheduler, and the driver down to the host controller of the I/O device. These requests can be temporarily delayed in any of these components due to differing queue implementations and other reasons. Observing changes in the system's queue depth can help you understand how busy system I/O utilization is and what the overall I/O access patterns look like. From the OS perspective, a high queue depth represents a state in which the system is working to consume pending I/O requests, while a queue depth of zero means the I/O scheduler is idle. From the storage device perspective, a high-queue-depth design shows that the storage media or controller can serve a bulk of I/O requests at a higher speed than a low-queue-depth design. Read/write latency shows how quickly the storage device completes or responds to an I/O request; its inverse also represents IOPS (I/O operations per second). As for I/O bandwidth, it is bounded by the capabilities of the host controller interface. For example, SATA 3.0 can achieve 600 MB/s of theoretical bandwidth, while NVMe over two PCIe 3.0 lanes can do ~1.87 GB/s.
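As a rough, idealized illustration of the latency/IOPS relationship: at a queue depth of 1, an average latency of 100 µs per request corresponds to about 1 s ÷ 100 µs = 10,000 IOPS; at 4 KB per request, that is roughly 40 MB/s of bandwidth.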

 

Optane+NAND SSD

 

We expect system I/O performance to increase after adopting Intel® Optane™ memory with Intel® Rapid Storage Technology.

Insight from VTune for a workload running on Optane enabled setup

[Figure 1: I/O API time, SSD vs. SSD + Optane]

Figure 1 shows two VTune results based on a benchmark program, PCMark, running on a single SATA NAND SSD vs. a SATA NAND SSD plus an additional 16 GB NVMe Optane module in IRST RAID 0 mode. Beyond the basics in VTune's online help for disk I/O analysis, you can also observe effective I/O API time by applying the "Task Domain" grouping view. As VTune indicates, the I/O APIs' CPU time also improves with Optane's acceleration. This makes sense, since most of the I/O API calls in this case are synchronous, and I/O media with Optane acceleration responds quickly.

[Figure 2: Latency, SSD vs. SSD + Optane]

Figure 2 shows how VTune measures the latency of a single I/O operation. We compare the third FileRead operation of test #3 (importing pictures to Windows Photo Gallery) of the benchmark workload in both cases. It shows that Optane+SSD yields nearly a 5x speedup for this read operation: 60 µs vs. 300 µs.

On Linux targets, VTune also provides a page fault metric. A page fault event usually triggers disk I/O to handle page swapping. To avoid frequent disk I/O caused by page faults, the typical approach is to keep more pages in memory instead of swapping them back to disk. Intel® Memory Drive Technology provides a solution to expand memory capacity, and Optane provides the best proximity to memory speed. Because it is transparent to the application and the OS, it also mitigates the disk I/O penalty and further increases performance. One common misconception is that asynchronous I/O always helps an application's I/O performance. Asynchronous I/O actually adds responsiveness back to the application, because it does not force the CPU to wait, as happens when a synchronous I/O API is used and the I/O operation has not yet finished.

Beyond the software design suggestions above, the remaining performance option is to upgrade your hardware to faster media. Intel® Optane™ is Intel's cutting-edge non-volatile memory technology, enabling memory-like performance at storage-like capacity and cost. VTune can help squeeze out even more software performance by providing insightful analysis.

See also

Intel® Optane™ Technology

Intel® Rapid Storage Technology

Check Intel® VTune™ Amplifier in Intel® System Studio

Intel® VTune™ Amplifier online help - Disk Input and Output Analysis

How to use Disk I/O analysis in Intel® VTune™ Amplifier for systems

Memory Performance in a Nutshell


Intel® Parallel Studio XE 2017 Update 4 integration to Microsoft* Visual Studio 2017 fails

$
0
0

Issue: Installation of Intel® Parallel Studio XE 2017 Update 4 with Microsoft* Visual Studio 2017 integration hangs and fails on some systems. The problem is intermittent and not reproducible on every system. Any attempt to repair it fails with the message "Incomplete installation of Microsoft Visual Studio* 2017 is detected". Note that in some cases the installation may complete successfully with no errors or crashes; however, the integration into VS2017 is not installed.

Environment: Microsoft* Windows, Visual Studio 2017

Root Cause: The root cause was identified and reported to Microsoft*. A workaround is expected to be implemented in Intel Parallel Studio XE 2017 Update 5.

Workaround:

Integrate the Intel Parallel Studio XE 2017 components manually. You need to run all of the files from the corresponding folders:

  • C++/Fortran Compiler IDE: <installdir>/ide_support_2018/VS15/*.vsix
  • Amplifier: <installdir>/VTune Amplifier 2018/amplxe_vs2017-integration.vsix
  • Advisor: <installdir>/Advisor 2018/advi_vs2017-integration.vsix
  • Inspector: <installdir>/Inspector 2018/insp_vs2017-integration.vsix
  • Debugger: <InstallDir>/ide_support_2018/MIC/*.vsix and <InstallDir>/ide_support_2018/CPUSideRDM/*.vsix
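For example, each package can also be installed from a command prompt using the VSIXInstaller utility that ships with Visual Studio (an illustrative command; substitute the actual .vsix file name):

VSIXInstaller.exe "<installdir>\ide_support_2018\VS15\<package>.vsix"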

Intel® Software Guard Extensions (Intel® SGX) Part 9: Power Events and Data Sealing

$
0
0

Download [ZIP 598KB]

In part 9 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series we’ll address some of the complexities surrounding the suspend and resume power cycle. Our application needs to do more than just survive power transitions: it must also provide a smooth user experience without compromising overall security. First, we’ll discuss what happens to enclaves when the system resumes from the sleep state and provide general advice on how to manage power transitions in an Intel SGX application. We’ll examine the data sealing capabilities of Intel SGX and show how they can help smooth the transitions between power states, while also pointing out some of the serious pitfalls that can occur when they are used improperly. Finally, we’ll apply these techniques to the Tutorial Password Manager in order to create a smooth user experience.

You can find a list of all the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

Source code is provided with this installment of the series.

Suspend, Hibernate, and Resume

Applications must be able to survive a sleep and resume cycle. When the system resumes from suspend or hibernation, applications should return to their previous state, or, if necessary, create a new state specifically to handle the wake event. What applications shouldn’t do is become unstable or crash as a direct result of that change in the power state. Call this the “rule zero” of managing power events.

Most applications don’t actually need special handling for these events. When the system suspends, the application state is preserved because RAM is still powered on. When the system hibernates, the RAM is saved to a special hibernation file on disk, which is used to restore the system state when it’s powered back on. You don’t need to add code to enable or take advantage of this core feature of the OS. There are two notable exceptions, however:

  • Applications that rely on physical hardware that isn’t guaranteed to be preserved across power events, such as CPU caches.
  • Scenarios where possible changes to the system context can affect program logic. For example, a location-based application can be moved hundreds of miles while it’s sleeping and would need to re-acquire its location. An application that works with sensitive data may choose to guard against theft by reprompting the user for his or her password.

Our Tutorial Password Manager actually falls into both categories. Certainly, if a laptop running our password manager is stolen, the thief would potentially have access to the victim’s passwords until they explicitly closed the application or locked the vault. The first category, though, may be less obvious: Intel SGX is a hardware feature that is not preserved across power events.

We can demonstrate this by running the Tutorial Password Manager, unlocking the vault, suspending the system, waking it back up, and then trying to read a password or edit one of the accounts. Follow that sequence, and you'll get one of the error dialogs shown in Figure 1 or Figure 2.

Figure 1. Error received when attempting to edit an account after resuming from sleep.

Figure 2. Error received when attempting to view an account password after resuming from sleep.

As currently written, the Tutorial Password Manager violates rule zero: it becomes unstable after resuming from a sleep operation. The application needs special handling for power events.

Enclaves and Power Events

When a processor leaves S0 or S1 for a lower-power state, the enclave page cache (EPC) is destroyed: all EPC pages are erased along with their encryption keys. Since enclaves store their code and data in the EPC, when the EPC goes away the enclaves go with it. This means that enclaves do not survive power events that take the system to state S2 or lower.

Table 1 provides a summary of the power states.

Table 1. CPU power states

State | Description
S0 | Active run state. The CPU is executing instructions, and background tasks are running even if the system appears idle and the display is powered off.
S1 | Processor caches are flushed and the CPU stops executing instructions. Power to the CPU and RAM is maintained. Devices may or may not power off. This is a high-power standby state, sometimes called “power on suspend.”
S2 | The CPU is powered off. CPU context and the contents of the system cache are lost.
S3 | RAM is powered on to preserve its contents. A standby or sleep state.
S4 | RAM is saved to nonvolatile storage in a hibernation file before powering off. When powered on, the hibernation file is read in to restore the system state. A hibernation state.
S5 | “Soft off.” The system is off, but some components are powered to allow a full system power-on via an external event, such as Wake-on-LAN, a system management component, or a connected device.

Power state S1 is not typically seen on modern systems, and state S2 is uncommon in general. Most systems go to power state S3 when put in “sleep” mode and drop to S4 when hibernating to disk.

The Windows* OS provides a mechanism for applications to subscribe to wakeup events, but that won’t help any ECALLs that are in progress when the power transition occurs (and, by extension, any OCALLs either since they are launched from inside of ECALLs). When the enclave is destroyed, the execution context for the ECALL is destroyed with it, any nested OCALLs and ECALLs are destroyed, and the outer-most ECALL immediately returns with a status of SGX_ERROR_ENCLAVE_LOST.

It is important to note that any OCALLs that are in progress are destroyed without warning, which means any changes they are making in unprotected memory will potentially be incomplete. Since unprotected memory is maintained or restored when resuming from the S3 and S4 power states, it is important that developers use reliable and robust procedures to prevent partial write corruptions. Applications must not end up in an indeterminate or invalid state when power resumes.

General Advice for Managing Power Transitions

Planning for power transitions begins before a sleep or hibernation event occurs. Decide how extensive the enclave recovery needs to be. Should the application be able to pick up exactly where it left off without user intervention? Will it resume interrupted tasks, restart them, or just abort? Will the user interface, if any, reflect the change in state? The answers to these questions will drive the rest of the application design. As a general rule, the more autonomous and seamless the recovery is, the more complex the program logic will need to be.

An application may also have different levels of recovery at different points. Some stages of an application may be easier to seamlessly recover from than others, and in some execution contexts it may not make sense or even be good security practice to attempt a seamless recovery at all.

Once the overall enclave recovery strategy has been identified, the process of preparing an enclave for a power event is as follows:

  1. Determine the minimal state information and data that needs to be saved in order to reconstruct the enclave.
  2. Periodically seal the state information and save it to unprotected memory (data sealing is discussed below). The sealed state data can be sent back to the main application as an [out] pointer parameter to an ECALL, or the ECALL can make an OCALL specifically to save state data.
  3. When an SGX_ERROR_ENCLAVE_LOST code is returned by an ECALL, explicitly destroy the enclave and then recreate it. It is strongly recommended that applications explicitly destroy the enclave with a call to sgx_destroy_enclave().
  4. Restore the enclave state using an ECALL that is designed to do so.
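A minimal sketch of steps 3 and 4 follows. The enclave image name, the sealed-state buffer, and the restore ECALL proxy (ve_restore_state) are hypothetical stand-ins; the Tutorial Password Manager wraps the equivalent logic in EnclaveBridge.cpp.

#include "sgx_urts.h"

// Placeholder declaration for the edger8r-generated proxy of a
// hypothetical state-restore ECALL.
extern "C" sgx_status_t ve_restore_state(sgx_enclave_id_t eid, int *retval,
	char *sealed, uint32_t sz);

sgx_status_t recover_enclave(sgx_enclave_id_t &eid,
	uint8_t *sealed_state, uint32_t state_sz)
{
	sgx_launch_token_t token = { 0 };
	int token_updated = 0;
	int ecall_rv = 0;

	// Step 3: explicitly destroy the lost enclave, then recreate it.
	sgx_destroy_enclave(eid);
	sgx_status_t st = sgx_create_enclave(L"Enclave.signed.dll", SGX_DEBUG_FLAG,
		&token, &token_updated, &eid, NULL);
	if (st != SGX_SUCCESS) return st;

	// Step 4: send the sealed state back in through a dedicated ECALL.
	return ve_restore_state(eid, &ecall_rv, (char *)sealed_state, state_sz);
}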

It is important to save the enclave state to untrusted memory before a power transition occurs. Even if the OS is able to send an event to an application when it is about to enter a standby mode, there are no guarantees that the application will have sufficient time to act before the system physically goes to sleep.

Data Sealing

When an enclave needs to preserve data across instantiations, either in preparation for a power event or between executions of the parent application, it needs to send that data out to untrusted memory. The problem with untrusted memory, however, is exactly that: it is untrusted. It is neither encrypted nor integrity checked, so any data sent outside the enclave in the clear is potentially leaking secrets. Furthermore, if that data were to be modified in untrusted memory, future instantiations of the enclave would not be able to detect that the modification occurred.

To address this problem, Intel SGX provides a capability called data sealing. When data is sealed, it is encrypted with advanced encryption standard (AES) in Galois/Counter Mode (GCM) using a 128-bit key that is derived from CPU-specific key material and some additional inputs, guided by one of two key policies. The use of AES-GCM provides both confidentiality of the data being sealed and integrity checking when the data is read back in and unsealed (decrypted).

As mentioned above, the key used in data sealing is derived from several inputs. The two key policies defined by data sealing determine what those inputs are:

  • MRSIGNER. The encryption key is derived from the CPU’s key material, the security version number (SVN), and the enclave signing key used by the developer. Data sealed using MRSIGNER can be unsealed by other enclaves on that same system that originate from the same software vendor (enclaves that share the same signing key). The use of an SVN allows enclaves to unseal data that was sealed by previous versions of an enclave, but prevents older enclaves from unsealing data from newer versions. It allows enclave developers to enforce software version upgrades.
  • MRENCLAVE. The encryption key is derived from the CPU’s key material and the enclave’s measurement (a cryptographic hash of the enclave’s contents). Data sealed using the MRENCLAVE policy can only be unsealed by that exact enclave on that system.

Note that the CPU is a common component in the two key policies. Each processor has some random, hardware-based key material (physical circuitry on the processor) which is built into it as part of the manufacturing process. This ensures that data sealed by an enclave on one CPU cannot be unsealed by enclaves on another CPU. Each CPU will derive a different sealing key, even if all other inputs to the key derivation (enclave measurement, enclave signing key, SVN) are the same.

The data sealing and unsealing API is really a set of convenience functions. They provide a high-level interface to the underlying AES-GCM encryption and 128-bit key derivation functions.
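As an illustration, here is a minimal sketch of a sealing ECALL. The sizing and sealing calls are the SDK’s actual functions; the surrounding ECALL wrapper and its return convention are hypothetical.

#include <stdint.h>
#include "sgx_tseal.h"   // trusted sealing API, linked into the enclave

// Hypothetical ECALL that seals a secret into a caller-supplied buffer.
// sgx_seal_data() applies the MRSIGNER policy by default; use
// sgx_seal_data_ex() with SGX_KEYPOLICY_MRENCLAVE for the stricter policy.
int ve_seal_secret(const uint8_t *secret, uint32_t secret_len,
	uint8_t *sealed, uint32_t sealed_sz)
{
	uint32_t need = sgx_calc_sealed_data_size(0, secret_len);
	if (need == UINT32_MAX || need > sealed_sz) return -1;

	sgx_status_t st = sgx_seal_data(0, NULL, secret_len, secret,
		need, (sgx_sealed_data_t *)sealed);
	return (st == SGX_SUCCESS) ? 0 : -1;
}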

Once data has been sealed in the enclave, it can be sent out to untrusted memory and optionally written to disk.

Caveats

There is a caveat with data sealing, though, and it has significant security implications. Your enclave API needs to include an ECALL that will take sealed data as an input and then unseal it. However, Intel SGX does not authenticate the calling application, so you cannot assume that only your application is loading your enclave. This means that your enclave can be loaded and executed by anyone, even applications you didn’t write. As you might recall from Part 1, enclave applications are divided into two parts: the trusted part, which is made up of the enclaves, and the untrusted part, which is the rest of the application. These terms, “trusted” and “untrusted,” are chosen deliberately.

Intel SGX cannot authenticate the calling application because this would require a trusted execution chain that runs from system power-on all the way through boot, the OS load, and the launch of the application. This is far outside the scope of Intel SGX, which limits the trusted execution environment to just the enclaves themselves. Because there’s no way for the enclave to validate the caller, each enclave must be written defensively. Your enclave cannot make any assumptions about the application that has called into it. An enclave must be written under the assumption that any application can load it and execute its API, and that its ECALLs can be executed in any order.

Normally this is not a significant constraint, but sealing and unsealing data complicates matters significantly because both the sealed data and the means to unseal it are exposed to arbitrary applications. The enclave API must not allow applications to use sealed data to bypass security mechanisms.

Take the following scenario as an example: A file encryption program wants to save end users the hassle of re-entering their password every time the application runs, so it seals their password using the data sealing functions and the MRENCLAVE policy, and then writes the sealed data to disk. When the application starts, it looks for the sealed data file, and if it’s present, reads it in and makes an ECALL to unseal the data and restore the user’s password into the enclave.

The problems with this hypothetical application are two-fold:

  • It assumes that it is the only application that will ever load the enclave.
  • It doesn’t authenticate the end user when the data is unsealed.

A malicious software developer can write their own application that loads the same enclave and follows the same procedure (looks for the sealed data file, and invokes the ECALL to unseal it inside the enclave). While the malicious application can’t expose the user’s password, it can use the enclave’s ECALLs to encrypt and decrypt the user’s files using their stored password, which is nearly as bad. The malicious user has gained the ability to decrypt files without having to know the user’s password at all!

A non-Intel SGX version of this same application that offered this same convenience feature would also be vulnerable, but that’s not the point. If the goal is to use Intel SGX features to harden the application’s security, those same features should not be undermined by poor programming practices!

Managing Power Transitions in the Tutorial Password Manager

Now that we understand how power events affect enclaves and know what tools are available to assist with the recovery process, we can turn our attention to the Tutorial Password Manager. As currently written, it has two problems:

  • It becomes unstable after a power event.
  • It assumes the password vault should remain unlocked after the system resumes.

Before we can solve the first problem we need to address the second one, and that means making some design decisions.

Sleep and Resume Behavior

The big decision that needs to be made for the Tutorial Password Manager is whether or not to lock the password vault when the system resumes from a sleep state.

The primary argument for locking the password vault after a sleep/resume cycle is to protect the password database in case the physical system is stolen while it’s suspended. This would prevent the thief from being able to access the password database after waking up the device. However, locking the password vault immediately can also be a source of user interface friction: aggressive power management settings sometimes cause a running system to sleep while the user is still in front of the device. If the user wakes the system back up immediately, they might be irritated to find that their password vault has been locked.

This issue really comes down to balancing user convenience against security, so the right approach is to give the user control over the application’s behavior. The default will be for the password vault to lock immediately upon suspend/resume, but the user can configure the application to wait up to 10 minutes after the sleep event before the vault is forcibly locked.

Intel® Software Guard Extensions and Non-Intel Software Guard Extensions Code Paths

Interestingly, the default behavior of the Intel SGX code path differs from that of the non-Intel SGX code path. Enclaves are destroyed during the sleep/resume cycle, which means that we effectively lock the password vault as a result. To give the user the illusion that the password vault never locked at all, we have to not only reload the vault file from disk, but also explicitly unlock it again without forcing the user to re-enter their password (this has some security implications, which we discuss below).

For the non-Intel SGX code path, the vault is just stored in regular memory. When the system resumes, system memory is unchanged and the application continues as normal. Thus, the default behavior is that an unlocked password vault remains unlocked when the system resumes.

Application Design

With the behavior of the application decided, we turn to the application design. Both code paths need to handle the sleep/resume cycle and place the vault in the correct state: locked or unlocked.

The Non-Intel Software Guard Extensions Code Path

This is the simpler of the two code paths. As mentioned above, the non-Intel SGX code path will, by default, leave the password vault unlocked if it was unlocked when the system went to sleep. When the system resumes, the application only needs to determine how long the system slept: if the sleep time exceeds the maximum configured by the user, the password vault should be explicitly locked.

To keep track of the sleep duration, we’ll need a periodic heartbeat that records the current time. This time will serve as the “sleep start” time when the system resumes. For security, the heartbeat time will be encrypted using the database key.
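A minimal sketch of the resume check follows, with hypothetical names for the decrypted heartbeat timestamp and the configured timeout:

#include <time.h>

// Non-SGX path: decide on resume whether the vault must be locked.
// 'last_heartbeat' is the decrypted timestamp from the latest heartbeat;
// 'lock_delay' is the user-configured timeout in seconds (0 = always lock).
bool must_lock_on_resume(time_t last_heartbeat, time_t lock_delay)
{
	time_t now = time(NULL);

	// Treat a missing heartbeat or a clock that moved backward as
	// suspicious and lock the vault.
	if (last_heartbeat == 0 || now < last_heartbeat) return true;

	return (now - last_heartbeat) > lock_delay;
}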

The Intel Software Guard Extensions Code Path

No matter how the application is configured, it will need code to recreate the enclave and reopen the password vault. This puts the vault in the locked state.

The application will then need to see how long it has been sleeping. If the sleep time was less than the maximum configured by the user, the password vault needs to be explicitly unlocked without prompting the user for his or her master passphrase. In order to do that the application needs the passphrase, and that means the passphrase must be saved to untrusted memory so that it can be read back in when the system is restored.

The only safe way to save a secret to untrusted memory is to use data sealing, but this presents a significant security issue: As mentioned previously, our enclave can be loaded by any application, and the same ECALL that is used to unseal the master password will be available for anyone to use. Our password manager application exposes secrets to the end user (their passwords), and the master password is the only means of authenticating the user. The point of keeping the password vault unlocked after the sleep/resume cycle is to prevent the user from having to authenticate. That means we are creating a logic flow where a malicious user could potentially use our enclave’s API to unseal the user’s master password and then extract their account and password data.

In order to mitigate this risk, we’ll do the following:

  • Data will be sealed using the MRENCLAVE policy.
  • Sealed data will be kept in memory only. Writing it to disk would increase the attack surface.
  • In addition to sealing the password, we’ll also include the process ID. When unsealing the data, the enclave will require that the process ID of the calling process match the one that was saved. If they don’t match, the vault will be left in the locked state.
  • The current system time will be sealed periodically using a heartbeat function. This will serve as the “sleep start” time.
  • The sleep duration will be checked in the enclave.

Note that verification logic must be in the enclave where it cannot be modified or manipulated.

This is not a perfect solution, but it helps. A malicious application would need to scrape the sealed data from memory, crash the user’s existing process, and then create new processes over and over until it gets one with the same process ID. It will have to do all of this before the lock timeout is reached (or take control of the system clock).

Common Needs

Both code paths will need some common infrastructure:

  • A timer to provide the heartbeat. We’ll use a timer interval of 15 seconds.
  • An event handler that is called when the system resumes from a sleep state.
  • Safe handling for any potential race conditions, since wakeup events are asynchronous.
  • Code that updates the UI to reflect the “locked” state of the password vault.

Implementation

We won’t go over every change in the code base, but we’ll look at the major components and how they work.

User Options

The lock timeout value is set in the new Tools -> Options configuration dialog, shown in Figure 3.

Figure 3. Configuration options.

This parameter is saved immediately to the Windows registry under HKEY_CURRENT_USER and is loaded by the application on startup. If the registry value is not present, the lock timeout defaults to zero (lock the vault immediately after going to sleep).

The Intel SGX code path also saves this value in the enclave.
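A minimal sketch of the startup load in C# follows; the registry key path and value name are illustrative assumptions, not the sample’s actual names.

using Microsoft.Win32;

// Load the lock timeout from HKEY_CURRENT_USER at application startup.
static int LoadLockTimeout()
{
    object v = Registry.GetValue(
        @"HKEY_CURRENT_USER\Software\TutorialPasswordManager",
        "LockTimeout", null);

    // Missing or malformed value: default to zero, which locks the
    // vault immediately after a suspend/resume cycle.
    return (v is int) ? (int)v : 0;
}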

The Heartbeat

Figure 4 shows the declaration for the Heartbeat class, which is ultimately responsible for recording the vault’s state information. The heartbeat is only run if state information is needed, however. If the user has set the lock timeout to zero, we don’t need to maintain state because we know to lock the vault immediately when the system resumes.

class PASSWORDMANAGERCORE_API Heartbeat {
	class PasswordManagerCoreNative *nmgr;
	HANDLE timer;
	void start_timer();
public:
	Heartbeat();
	~Heartbeat();
	void set_manager(PasswordManagerCoreNative *nmgr_in);
	void heartbeat();

	void start();
	void stop();
};

Figure 4. The Heartbeat class.

The PasswordManagerCoreNative class gains a Heartbeat object as a class member, and the Heartbeat object is initialized with a reference back to the containing PasswordManagerCoreNative object.

The Heartbeat class obtains a timer from CreateTimerQueueTimer and executes the callback function heartbeat_proc when the timer expires, as shown in Figure 5. The callback is passed a pointer to the Heartbeat object and invokes its heartbeat method, which in turn calls the heartbeat method in PasswordManagerCoreNative and restarts the timer.

static void CALLBACK heartbeat_proc(PVOID param, BOOLEAN fired)
{
   // Call the heartbeat method in the Heartbeat object
	Heartbeat *hb = (Heartbeat *)param;
	hb->heartbeat();
}

Heartbeat::Heartbeat()
{
	timer = NULL;
}

Heartbeat::~Heartbeat()
{
	// Clean up the timer if one is still active.
	if (timer != NULL) DeleteTimerQueueTimer(NULL, timer, NULL);
}

void Heartbeat::set_manager(PasswordManagerCoreNative *nmgr_in)
{
	nmgr = nmgr_in;

}

void Heartbeat::heartbeat ()
{
	// Call the heartbeat method in the native password manager
	// object. Restart the timer unless there was an error.

	if (nmgr->heartbeat()) start_timer();
}

void Heartbeat::start()
{
	stop();

	// Perform our first heartbeat right away.

	if (nmgr->heartbeat()) start_timer();
}

void Heartbeat::start_timer()
{
	// Set our heartbeat timer. Use the default Timer Queue

	CreateTimerQueueTimer(&timer, NULL, (WAITORTIMERCALLBACK)heartbeat_proc,
		(void *)this, HEARTBEAT_INTERVAL_SECS * 1000, 0, 0);
}

void Heartbeat::stop()
{
	// Stop the timer (if it exists)

	if (timer != NULL) {
		DeleteTimerQueueTimer(NULL, timer, NULL);
		timer = NULL;
	}
}

Figure 5. The Heartbeat class methods and timer callback function.

The heartbeat method in the PasswordManagerCoreNative object maintains the state information. To prevent partial write corruption, it keeps a two-element array of state data and an index variable identifying the current element (0 or 1). The new state information is obtained from:

  • The new ECALL ve_heartbeat in the Intel SGX code path (by way of ew_heartbeat in EnclaveBridge.cpp).
  • The Vault method heartbeat in the non-Intel SGX code path.

After the new state has been received, it updates the next element (alternating between elements 0 and 1) of the array, and then updates the index pointer. The last operation is our atomic update, ensuring that the state information is complete before we officially mark it as the “current” state.
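A minimal sketch of this journaled update follows. The state_blob_t type and SEALED_STATE_MAX size are illustrative stand-ins for the sample’s actual types.

#include <stdint.h>

#define SEALED_STATE_MAX 1024

struct state_blob_t { uint8_t bytes[SEALED_STATE_MAX]; };

struct StateJournal {
	state_blob_t slot[2];   // two copies of the sealed state data
	volatile int current;   // index of the last fully written slot

	void update(const state_blob_t &fresh)
	{
		int next = 1 - current;  // write into the inactive slot
		slot[next] = fresh;      // a torn write here never touches 'current'
		current = next;          // the single store of the index publishes
		                         // the new state as "current"
	}
};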

Intel Software Guard Extensions code path

The ve_heartbeat ECALL simply calls the heartbeat method in the E_Vault object, as shown in Figure 6.

int E_Vault::heartbeat(char *state_data, uint32_t sz)
{
	sgx_status_t status;
	vault_state_t vault_state;
	uint64_t ts;

	// Copy the db key

	memcpy(vault_state.db_key, db_key, 16);

	// To get the system time and PID we need to make an OCALL

	status = ve_o_process_info(&ts, &vault_state.pid);
	if (status != SGX_SUCCESS) return NL_STATUS_SGXERROR;

	vault_state.lastheartbeat = (sgx_time_t)ts;

	// Storing both the start and end times provides some
	// protection against clock manipulation. It's not perfect,
	// but it's better than nothing.

	vault_state.lockafter = vault_state.lastheartbeat + lock_delay;

	// Saving this here spares us an extra ECALL to reset it when the vault is restored.

	vault_state.lock_delay = lock_delay;

	// Seal our data with the MRENCLAVE policy. We defined our
	// struct as packed to support working on the address
	// directly like this.

	status = sgx_seal_data(0, NULL, sizeof(vault_state_t), (uint8_t *)&vault_state, sz, (sgx_sealed_data_t *) state_data);
	if (status != SGX_SUCCESS) return NL_STATUS_SGXERROR;

	return NL_STATUS_OK;
}

Figure 6. The heartbeat in the enclave.

It has to obtain the current system time and the process ID, and to do this we have added our first OCALL to the enclave, ve_o_process_info. When the OCALL returns, we update our state information and then call sgx_seal_data to seal it into the state_data buffer.

One restriction of the Intel SGX seal and unseal functions is that they can only operate on enclave memory. That means the state_data parameter must be a marshaled data buffer when used in this manner. If you need to write sealed data to a raw pointer that references untrusted memory (one that is declared with the user_check attribute in the EDL), you must first seal the data to an enclave-local buffer and then copy it over.
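A minimal sketch of that seal-then-copy pattern follows; the [user_check] destination pointer and the NL_STATUS_ALLOC code are illustrative assumptions.

#include <stdlib.h>
#include <string.h>
#include "sgx_tseal.h"

// 'sealed_out' is assumed to be a [user_check] pointer into untrusted
// memory with capacity 'cap'.
int seal_to_untrusted(uint8_t *sealed_out, uint32_t cap,
	const uint8_t *secret, uint32_t secret_len)
{
	uint32_t need = sgx_calc_sealed_data_size(0, secret_len);
	if (need == UINT32_MAX || need > cap) return NL_STATUS_SGXERROR;

	// Seal into an enclave-local buffer first: the sealing functions
	// will not operate directly on untrusted memory.
	uint8_t *local = (uint8_t *)malloc(need);
	if (local == NULL) return NL_STATUS_ALLOC;   // hypothetical status code

	sgx_status_t st = sgx_seal_data(0, NULL, secret_len, secret, need,
		(sgx_sealed_data_t *)local);
	if (st == SGX_SUCCESS)
		memcpy(sealed_out, local, need);         // copy the sealed blob out

	free(local);
	return (st == SGX_SUCCESS) ? NL_STATUS_OK : NL_STATUS_SGXERROR;
}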

The OCALL is defined in EnclaveBridge.cpp:

// OCALL to retrieve the current process ID and
// local system time.

void SGX_CDECL ve_o_process_info(uint64_t *ts, uint64_t *pid)
{
	DWORD dwpid= GetCurrentProcessId();
	time_t ltime;

	time(&ltime);

	*ts = (uint64_t)ltime;
	*pid = (uint64_t)dwpid;
}
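For reference, the matching declaration belongs in the untrusted section of the enclave’s EDL file. A sketch of what it might look like follows; the attributes in the sample’s actual EDL may differ.

enclave {
    untrusted {
        /* OCALL: fetch the untrusted process ID and local system time
           for the heartbeat and state-restore logic. */
        void ve_o_process_info([out] uint64_t *ts, [out] uint64_t *pid);
    };
};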

Because the heartbeat runs asynchronously, two threads can enter the enclave at the same time. This means the number of Thread Control Structures (TCSs) allocated to the enclave must be increased from the default of 1 to 2. This can be done in one of two ways:

  1. Right-click the Enclave project, select Intel SGX Configuration -> Enclave Settings to bring up the configuration window, and then set Thread Number to 2 (see Figure 7).
  2. Edit the Enclave.config.xml file in the Enclave project directly, and then change the <TCSNum> parameter to 2.

Figure 7. Enclave settings dialog.
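For the second option, the change amounts to a single element. A sketch of the relevant fragment of Enclave.config.xml follows, with the remaining settings elided:

<EnclaveConfiguration>
  <!-- Two simultaneous threads can now enter the enclave:
       the UI/worker thread and the asynchronous heartbeat. -->
  <TCSNum>2</TCSNum>
  <!-- ...other settings unchanged... -->
</EnclaveConfiguration>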

Detecting Suspend and Resume Events

A suspend and resume cycle will destroy the enclave, and that will be detected by the next ECALL. However, we shouldn’t rely on this mechanism to perform enclave recovery, because we need to act as soon as the system wakes up from the sleep state. That means we need an event listener to receive the power state change messages that are generated by Windows.

The best place to capture these is in the user interface layer. In addition to performing the enclave recovery, we must be able to lock the password vault if the system was in the sleep state longer than the maximum sleep time set in the user options. When the vault is locked, the user interface also needs to be updated to reflect the new vault state.

One limitation of the Windows Presentation Foundation* is that it does not provide event hooks for power-related messages. The workaround is to hook into the message handler for the underlying window handle. Our main application window and all of our dialog windows need a listener so that we can gracefully close each one.

The hook procedure for the main window is shown in Figure 8.

private IntPtr Main_Power_Hook(IntPtr hwnd, int msg, IntPtr wParam, IntPtr lParam, ref bool handled)
{
    UInt16 pmsg;

    // C# doesn't have definitions for power messages, so we'll get them via C++/CLI. It returns a
    // simple UInt16 that defines only the things we care about.
    pmsg= PowerManagement.message(msg, wParam, lParam);

    if ( pmsg == PowerManagementMessage.Suspend )
    {
        mgr.suspend();
        handled = true;
    } else if (pmsg == PowerManagementMessage.Resume)
    {
        int vstate = mgr.resume();

        if (vstate == ResumeVaultState.Locked) lockVault();
        handled = true;
    }

    return IntPtr.Zero;
}

Figure 8. Message hook for the main window.
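Before this hook can receive messages, it must be attached to the window’s underlying handle. A minimal sketch of that registration follows, assuming the hook above is a method in the main window’s code-behind:

// Attach Main_Power_Hook once the window handle exists (requires
// using System.Windows.Interop).
protected override void OnSourceInitialized(EventArgs e)
{
    base.OnSourceInitialized(e);

    // The HWND is valid by the time SourceInitialized fires, so the
    // hook can be attached to the window's message loop here.
    HwndSource source = (HwndSource)PresentationSource.FromVisual(this);
    source.AddHook(new HwndSourceHook(Main_Power_Hook));
}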

To get at the messages, the handler must dip down to native code. This is done using the new PowerManagement class, which defines a static function called message, shown in Figure 9. It returns one of four values:

PWR_MSG_NONE: The message was not a power event.

PWR_MSG_OTHER: The message was power-related, but not a suspend or resume message.

PWR_MSG_RESUME: The system has woken up from a low-power or sleep state.

PWR_MSG_SUSPEND: The system is suspending to a low-power state.

UINT16 PowerManagement::message(int msg, IntPtr wParam, IntPtr lParam)
{
	INT32 subcode;

	// We only care about power-related messages

	if (msg != WM_POWERBROADCAST) return PWR_MSG_NONE;

	subcode = wParam.ToInt32();

	if ( subcode == PBT_APMRESUMEAUTOMATIC ) return PWR_MSG_RESUME;
	else if (subcode == PBT_APMSUSPEND ) return PWR_MSG_SUSPEND;

	// Don't care about other power events.

	return PWR_MSG_OTHER;
}

Figure 9. The message listener.

We actually listen for both suspend and resume messages here, but the suspend handler does very little work. When a system is transitioning to a sleep state, an application has less than 2 seconds to act on the power message. All we do with the sleep message is stop the heartbeat. This isn’t strictly necessary, and is just a precaution against having a heartbeat execute while the system is suspending.

The resume message is handled by calling the resume method in PasswordManagerCore. Its job is to figure out whether the vault should be locked or unlocked. It does this by checking the current system time against the saved vault state (if any). If there’s no state, or if the system has slept longer than the maximum allowed, it returns ResumeVaultState.Locked.

Restoring the Enclave

In the Intel SGX code path, the enclave has to be recreated before the enclave state information can be checked. The code for this is shown in Figure 10.

bool PasswordManagerCore::restore_vault(bool flag_async)
{
	bool got_lock= false;
	int rv;

	// Only let one thread do the restore if both come in at the
	// same time. A spinlock approach is inefficient but simple.
	// This is OK for our application, but a high-performance
	// application (or one with a long-running work loop)
	// would want something else.

	try {
		slock.Enter(got_lock);

		if (_nlink->supports_sgx()) {
			bool do_restore = true;

			// This part is only needed for enclave-based vaults.

			if (flag_async) {
				// If we are entering as a result of a power event,
				// make sure the vault has not already been restored
				// by the synchronous/UI thread (ie, a failed ECALL).

				rv = _nlink->ping_vault();
				if (rv != NL_STATUS_LOST_ENCLAVE) do_restore = false;
				// If do_restore is false, then we'll also use the
				// last value of rv_restore as our return value.
				// This will tell us whether or not we should lock the
				// vault.
			}

			if (do_restore) {
				// If the vaultfile isn't open then we are locked or hadn't
					// been opened to begin with.

				if (!vaultfile->is_open()) {
					// Have we opened a vault yet?
					if (vaultfile->get_vault_path()->Length == 0) goto restore_error;

					// We were explicitly locked, so reopen.
					rv = vaultfile->open_read(vaultfile->get_vault_path());
					if (rv != NL_STATUS_OK) goto restore_error;
				}

				// Reinitialize the vault from the header.

				rv = _vault_reinitialize();
				if (rv != NL_STATUS_OK) goto restore_error;

				// Now, call to the native object to restore the vault state.
				rv = _nlink->restore_vault_state();
				if (rv != NL_STATUS_OK) goto restore_error;

				// The database password was restored to the vault. Now restore
				// the vault, itself.

				rv = send_vault_data();
			restore_error:
				restore_rv = (rv == NL_STATUS_OK);
			}
		}
		else {
			rv = _nlink->check_vault_state();
			restore_rv = (rv == NL_STATUS_OK);
		}

		slock.Exit(false);
	}
	catch (...) {
		// We don't need to do anything here.
	}

	return restore_rv;
}

Figure 10. The restore_vault() method.

The enclave and vault are reinitialized from the vault data file, and the vault state is restored using the method restore_vault_state in PasswordManagerCoreNative.

Which Thread Restores the Vault State?

The Tutorial Password Manager can have up to three threads executing at any given time. They are:

  • The main UI
  • The heartbeat
  • The power event handler

Only one of these threads should be responsible for actually restoring the enclave, but it is possible that both the heartbeat and the main UI thread are in the middle of an ECALL when a power event occurs. In that case, both ECALLs will fail with the error code SGX_ERROR_ENCLAVE_LOST while the power event handler is executing. Given this potential race condition, it’s necessary to decide which thread is given the job of enclave recovery.

If the lock timeout is set to zero, there won’t be a heartbeat thread at all, so it doesn’t make sense to put enclave recovery logic there. If the heartbeat ECALL returns SGX_ERROR_ENCLAVE_LOST, it simply stops the heartbeat and assumes other threads will deal with it.

That leaves the UI thread and the power event handler, and a good argument can be made that both threads need the ability to recover an enclave. The event handler will catch all suspend/resume cycles immediately, so it makes sense to have enclave recovery happen there. However, as we pointed out earlier, it is entirely possible for a power event to occur during an active ECALL on the UI thread, and there’s no reason to prevent that thread from starting the recovery, especially since it might occur before the power event message is received. This not only provides a safety net in case the event handler fails to execute for some reason, but it also provides a quick and easy retry loop for the operation.

Since we can’t have both of these threads run the recovery at the same time, we need to use locking to ensure that only the first thread to arrive is given the job. The second one simply waits for the first to finish.

It’s also possible that a failed ECALL will complete the recovery process before the event handler enters the recovery loop. To prevent the event handler from blindly repeating the enclave recovery procedure, we have added a quick test to make sure the enclave hasn’t already been recreated.

Detection in the UI Thread

The UI thread detects power events by looking for ECALLs that fail with SGX_ERROR_ENCLAVE_LOST. The wrapper functions in EnclaveBridge.cpp automatically relaunch the enclave and pass the error NL_STATUS_RECREATED_ENCLAVE back up to the PasswordManagerCore object.

Each method in PasswordManagerCore handles this return code uniquely. Some methods, such as initialize, initialize_from_header, and lock_vault, don’t actually have to restore state at all, but most of the others do, and they call in to restore_vault as shown in Figure 11.

int PasswordManagerCore::accounts_password_to_clipboard(UInt32 idx)
{
	UINT32 index = idx;
	int rv;
	int tries = 3;

	while (tries--) {
		rv = _nlink->accounts_password_to_clipboard(index);
		if (rv == NL_STATUS_RECREATED_ENCLAVE) {
			if (!restore_vault()) {
				rv = NL_STATUS_LOST_ENCLAVE;
				tries = 0;
			}
		}
		else break;
	}

	return rv;
}

Figure 11. Detecting a power event on the main UI thread.

Here, the method gets three attempts to restore the vault before giving up. This retry count of three is an arbitrary limit: it’s not likely that we’ll have multiple power events in rapid succession, but it’s possible. Though we don’t want to just give up after one attempt, we also don’t want to loop forever in case there’s a system issue that prevents the enclave from ever being recreated.

Restoring and Checking State

The last step is to examine the state data for the vault and determine whether the vault should be locked or unlocked. In the Intel SGX code path, the sealed state data is sent into the enclave where it is unsealed, and then compared to current system data obtained from the OCALL ve_o_process_info. This method, restore_state, is shown in Figure 12.

int E_Vault::restore_state(char *state_data, uint32_t sz)
{
	sgx_status_t status;
	vault_state_t vault_state;
	uint64_t now, thispid;
	uint32_t szout = sz;

	// First, make an OCALL to get the current process ID and system time.
	// Make these OCALLs so that the parameters aren't supplied by the
	// ECALL (which would make it trivial for the calling process to fake
	// this information)

	status = ve_o_process_info(&now, &thispid);
	if (status != SGX_SUCCESS) {
		// Zap the state data.
		memset_s(state_data, sz, 0, sz);
		return NL_STATUS_SGXERROR;
	}

	status = sgx_unseal_data((sgx_sealed_data_t *)state_data, NULL, 0, (uint8_t *)&vault_state, &szout);
	// Zap the state data.
	memset_s(state_data, sz, 0, sz);

	if (status != SGX_SUCCESS) return NL_STATUS_SGXERROR;

	if (thispid != vault_state.pid) return NL_STATUS_PERM;
	if (now < vault_state.lastheartbeat) return NL_STATUS_PERM;
	if (now > vault_state.lockafter) return NL_STATUS_PERM;

	// Everything checks out. Restore the key and mark the vault as unlocked.

	lock_delay = vault_state.lock_delay;

	memcpy(db_key, vault_state.db_key, 16);
	_VST_CLEAR(_VST_LOCKED);

	return NL_STATUS_OK;
}

Figure 12. Restoring state in the enclave.

Note that unsealing data is programmatically simpler than sealing it: the key derivation and policy information is embedded in the sealed data blob. Unlike data sealing, there is only one unseal function, sgx_unseal_data, and it takes fewer parameters than its counterpart.

This method returns NL_STATUS_OK if the vault is restored to the unlocked state, and NL_STATUS_PERM if it is restored to the locked state.

Lingering Issues

The Tutorial Password Manager as currently implemented still has issues that need to be addressed.

  • There is still a race condition in the enclave recovery logic. Because the ECALL wrappers in EnclaveBridge.cpp immediately recreate the enclave before returning an error code to the PasswordManagerCore layer, it is possible for the power event handler thread to enter the restore_vault method after the enclave has been recreated but before the enclave recovery has completed. This can cause the power event handler to return the wrong status to the UI layer, placing the UI in the “locked” or “unlocked” state incorrectly.
  • We depend on the system clock when validating our state data, but the system clock is actually untrusted. A malicious user can manipulate the time in order to force the password vault into an unlocked state when the system wakes up (this can be addressed by using trusted time, instead).

Summary

In order to prevent cold boot attacks and other attacks against memory images in RAM, Intel SGX destroys the Enclave Page Cache whenever the system enters a low-power state. However, this added security comes at a price: software complexity that can’t be avoided. All real-world Intel SGX applications need to plan for power events and incorporate enclave recovery logic because failing to do so will lead to runtime errors during the application’s execution.

Power event planning can rapidly escalate the application’s level of sophistication. The user experience needs of the Tutorial Password Manager took us from a single-threaded application with relatively simple constructs to one with multiple, asynchronous threads, locking, and atomic memory updates via simple journaling. As a general rule, seamless enclave recovery requires careful design and a significant amount of added program logic.

Sample Code

The code sample for this part of the series builds against the Intel SGX SDK version 1.7 using Microsoft Visual Studio* 2015.

Release Notes

  • Running a mixed-mode Intel SGX application under the debugger in Visual Studio will cause an exception to be thrown if a power event is triggered. The exception occurs when an ECALL detects the lost enclave and returns SGX_ERROR_ENCLAVE_LOST.
  • The non-Intel SGX code path was updated to use Microsoft’s DPAPI to store the database encryption key. This is a better solution than the in-memory XOR’ing.

Coming Up Next

In Part 10 of the series, we’ll discuss debugging mixed-mode Intel SGX applications with Visual Studio. Stay tuned!

Intel® Parallel Studio XE 2018 Composer Edition Fortran - Debug Solutions Release Notes


This page provides the current Release Notes for the Debug Solutions from Intel® Parallel Studio XE 2018 Composer Edition for Fortran Linux*, Windows* and OS X*/macOS* products.

To get product updates, log in to the Intel® Software Development Products Registration Center.

For questions or technical support, visit Intel® Software Products Support.

For the top-level Release Notes, visit:


Change History

This section highlights important changes since the previous product version, including changes in product updates.

Changes since Intel® Parallel Studio XE 2017 Composer Edition

  • Support for Intel® Xeon Phi™ Product Family x200 (formerly code named Knights Landing)
  • A best-known method (BKM) is provided below in case Knights Landing debugging inside Visual Studio fails to attach to the Knights Landing targets

Product Contents

  • Linux*:
    • GNU* Project Debugger (GDB) 7.12:
      Command line for host CPU and Intel® Xeon Phi™ coprocessor, and Eclipse* IDE plugin for offload enabled applications.
  • OS X*/macOS*:
    • GNU* Project Debugger (GDB) 7.12:
      Command line for CPU only.
  • Windows*:
    • Intel® Debugger Extension for Intel® Many Integrated Core Architecture (Intel® MIC Architecture)
    • Fortran Expression Evaluator (FEE) as an extension to the Microsoft Visual Studio* debugger

GNU* GDB

This section summarizes the changes, new features, customizations and known issues related to the GNU* GDB provided with Intel® Parallel Studio XE 2018 Composer Edition.
 

Features

GNU* GDB provided with Intel® Parallel Studio XE 2018 Composer Edition and above is based on GDB 7.12 with additional enhancements provided by Intel. This debugger replaces the Intel® Debugger from previous releases. In addition to features found in GDB 7.12, there are several other new features:
  • Intel® Processor Trace (Intel® PT) support for 5th generation Intel® Core™ Processors:
    (gdb) record btrace pt
  • Support for Intel® Xeon Phi™ coprocessor & processor X200
  • Support for Intel® Transactional Synchronization Extensions (Intel® TSX) (Linux & OSX)
  • Register support for Intel® Memory Protection Extensions (Intel® MPX) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
  • Data Race Detection (pdbx):
    Detect and locate data races for applications threaded using POSIX* thread (pthread) or OpenMP* models
  • Branch Trace Store (btrace):
    Record branches taken in the execution flow to backtrack easily after events like crashes, signals, exceptions, etc.
All features are available for Linux*, but only Intel® TSX is supported for OS X*/macOS*.
 

Using GNU* GDB

GNU* GDB provided with Intel® Parallel Studio XE 2018 Composer Edition comes in different versions:
  • IA-32/Intel® 64 debugger:
    Debug applications natively on IA-32 or Intel® 64 systems with gdb-ia on the command line.
    A standard Eclipse* IDE can be used for this as well if a graphical user interface is desired.
  • Intel® Xeon Phi™ coprocessor debugger (only for Linux*):
    Debug applications remotely on Intel® Xeon Phi™ coprocessor systems. The debugger will run on a host system and a debug agent (gdbserver) on the coprocessor.
    There are two options:
    • Use the command line version of the debugger with gdb-mic.
      This only works for native Intel® Xeon Phi™ coprocessor X100 applications. For Intel® Xeon Phi™ coprocessor & processor X200 use gdb-ia.
      A standard Eclipse* IDE can be used for this as well if a graphical user interface is desired.
    • Use an Eclipse* IDE plugin shipped with Intel® Parallel Studio XE 2018 Composer Edition.
      This works only for offload enabled Intel® Xeon Phi™ coprocessor applications. Instructions on how to use GNU* GDB can be found in the Documentation section.

Documentation

The documentation for the provided GNU* GDB can be found here:
<install-dir>/documentation_2018/en/debugger/gdb-ia/gdb.pdf
<install-dir>/documentation_2018/en/debugger/ps2018/get_started.htm

The latter is available online as well:

Known Issues and Changes

Knights Landing Debugging in Visual Studio

If Knights Landing debugging inside Visual Studio fails to attach to the Knights Landing targets, try the following two steps:
1) Resolving the IP address of the mic card

  • Open the Intel Compiler x64 environment command prompt.
  • Check if the mic0 card address can be resolved: DOS> ping mic0
  • If that fails to find the mic0 card, then you need to register it manually, otherwise continue with 2).
  • First we need to find out the IP address of the mic0 card: DOS> ipconfig
  • Search for the mic0 adapter IP address. The mic0 card IP address is the same, but with 1 as the last digit.
  • Then add the following line to the hosts file in C:\Windows\system32\drivers\etc\hosts.
  • <mic0-IP-address> mic0
  • (If you have more than one mic card, do the same for mic1, …)
  • Afterwards reboot your PC.

2) Running the MIC application in the Intel Compiler Environment

  • If you want to run or debug a MIC application, ALWAYS use the Intel Compiler x64 environment command prompt!
  • To debug the mic application, start Visual Studio from the Intel Compiler x64 environment command prompt:
  • DOS> "C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\devenv.com"

Not found: libncurses.so.5

On some systems, using the GNU* GDB version that is provided by Intel fails due to a missing libncurses.so.5 (e.g., Fedora 24 and 25). Please install the package ncurses-compat-libs, which provides the missing library.

Not found: libtinfo.so.5

On some systems, using the GNU* GDB version that is provided by Intel fails due to a missing libtinfo.so.5 (e.g., SLES 11 SP3). If a package for libtinfo is not available, the following workaround can be applied:

$ sudo ln -s <path>/libncurses.so.5.6 <path>/libtinfo.so.5

As <path>, use the location of the system's ncurses library.

Safely ending offload debug sessions

To avoid issues like orphan processes or stale debugger windows when ending offload applications, manually end the debugging session before the application reaches its exit code. The following procedure is recommended for terminating a debug session:
  1. Manually stop a debug session before the application reaches the exit-code.
  2. When stopped, press the red stop button in the tool-bar in the Intel® MIC Architecture-side debugger first. This will end the offloaded part of the application.
  3. Next, do the same for the CPU-side debugger.
  4. The link between the two debuggers will be kept alive. The Intel® MIC Architecture-side debugger will stay connected to the debug agent and the application will remain loaded in the CPU-side debugger, including all breakpoints that have been set.
  5. At this point, both debugger windows can safely be closed.

Intel® MIC Architecture-side debugger asserts on setting source directories

Setting source directories in the GNU* GDB might lead to an assertion.
Resolution:
The assertion should not affect debugger operation. To avoid the assertion anyway, don’t use source directory settings. The debugger will prompt you to browse for files it cannot locate automatically.
 

Debugger and debugged application required to be located on local drive (OS X*/macOS* only)

In order to use the provided GNU* GDB (gdb-ia), it has to be installed on a local drive. As such, the entire Intel® Parallel Studio XE 2018 package has to be installed locally. Any application that is being debugged needs to be located on a local drive as well. This is a general requirement that’s inherent to GNU GDB with OS X*/macOS*.
 

Debugging Fortran applications with Eclipse* IDE plugin for Intel® Xeon Phi™ coprocessor

If the Eclipse* IDE plugin for the Intel® Xeon Phi™ coprocessor is used for debugging Fortran applications, evaluation of arrays in the locals window might be incorrect. The underlying CDT applies the C/C++ syntax with brackets to arrays to retrieve their contents. This does not work for Fortran.
Solution: Use a fully qualified Fortran expression to retrieve the contents of arrays (e.g. with array sections like array(1:10)).
 
 
Intel® Debugger Extension for Intel® Many Integrated Core Architecture (Intel® MIC Architecture)

This section summarizes new features and changes, usage, and known issues related to the Intel® Debugger Extension. This debugger extension only supports code targeting Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
 

Features

  • Support for both native Intel® Xeon Phi™ coprocessor applications and host applications with offload extensions
  • Debug multiple Intel® Xeon Phi™ coprocessors at the same time (with offload extension)

Using the Intel® Debugger Extension

The Intel® Debugger Extension is a plug-in for the Microsoft Visual Studio* IDE. It transparently enables debugging of projects defined by that IDE. Applications for Intel® Xeon Phi™ coprocessors can be either loaded and executed or attached to. This extension supports debugging of offload enabled code, using:
  • Microsoft Visual Studio* 2013
  • Microsoft Visual Studio* 2015
  • Microsoft Visual Studio* 2017

Documentation

The full documentation for the Intel® Debugger Extension can be found here:
<install-dir>\documentation_2018\en\debugger\ps2018\get_started.htm

This is available online as well:

Known Issues and Limitations

  • Disassembly window cannot be scrolled outside of 1024 bytes from the starting address within an offload section.
  • Handling of exceptions from the Intel® MIC Architecture application is not supported.
  • Starting an Intel® MIC Architecture native application is not supported. You can attach to a currently running application, though.
  • The Thread Window in Microsoft Visual Studio* offers context menu actions to Freeze, Thaw and Rename threads. These context menu actions are not functional when the thread is on an Intel® Xeon Phi™ coprocessor.
  • Setting a breakpoint right before an offload section sets a breakpoint at the first statement of the offload section. This is only true if there is no statement for the host between the set breakpoint and the offload section. This is normal Microsoft Visual Studio* breakpoint behavior but might become more visible with interweaved code from the host and the Intel® Xeon Phi™ coprocessor. The superfluous breakpoint for the offload section can be manually disabled (or removed) if desired.
  • Only Intel® 64 applications containing offload sections can be debugged with the Intel® Debugger Extension for Intel® Many Integrated Core Architecture.
  • Stepping out of an offload section does not step back into the host code. It rather continues execution without stopping (unless another event occurs). This is intended behavior.
  • The functionality “Set Next Statement” is not working within an offload section.
  • If breakpoints have been set for an offload section in a project already, starting the debugger might show bound breakpoints without addresses. Those do not have an impact on functionality.
  • For offload sections, breakpoints with the following hit-count conditions do not work: “break when the hit count is equal to” and “break when the hit count is a multiple of”.
  • The following options in the Disassembly window do not work within offload sections: “Show Line Numbers”, “Show Symbol Names” and “Show Source Code”.
  • Evaluating variables declared outside the offload section shows wrong values.
  • Please consult the Output (Debug) window for detailed reporting. It will name unimplemented features (see above) or provide additional information about configuration problems in a debugging session. You can open the window in Microsoft Visual Studio* via the menu Debug->Windows->Output.
  • When debugging an offload-enabled application and a variable assignment is entered in the Immediate Window, the debugger may hang if assignments read memory locations before writing to them (for example, x=x+1). Please do not use the Immediate Window for changing variable values for offload-enabled applications.
  • Depending on the debugger extensions provided by Intel, the behavior (for example, run control) and output (for example, disassembly) could differ from what is experienced with the Microsoft Visual Studio* debugger. This is because of the different debugging technologies implemented by each and should not have a significant impact to the debugging experience.

Fortran Expression Evaluator (FEE) for debugging Fortran applications with Microsoft Visual Studio*

Fortran Expression Evaluator (FEE) is a plug-in for Microsoft Visual Studio* that is installed with Intel® Visual Fortran Compiler. It extends the standard debugger in Microsoft Visual Studio* IDE by handling Fortran expressions. There is no other change in usability.

Known Issues and Limitations

Conditional breakpoints limited

Conditional breakpoints that contain expressions with allocatable variables are not supported for Microsoft Visual Studio 2012* or later.

Debugging mixed language programs with Fortran does not work

To enable debugging Fortran code called from a .NET managed code application in Visual Studio 2012 or later, unset the following configuration:
Menu Tools->Options, under section Debugging->General, clear the "Managed C++ Compatibility Mode" or "Use Managed Compatibility Mode" check box

For any managed code application, one must also check the project property Debug > Enable unmanaged code debugging.

Native edit and continue

With Microsoft Visual Studio 2015*, Fortran debugging of mixed code applications is enabled if "native edit and continue" is enabled for the C/C++ part of the code. In earlier versions this is not supported.

FEE truncates entries in locals window

To increase debugging performance, the maximum number of locals queried by the debug engine is limited with Intel® Parallel Studio XE 2016 and later releases. If a location in the source code has more than that number of locals, they are truncated and a note is shown:

Note: Too many locals! For performance reasons the list got cut after 500 entries!

The threshold can be controlled via the environment variable FEE_MAX_LOCALS. Specify a positive value for the new threshold (default is 500). A value of -1 can be used to turn off truncation entirely (restores previous behavior) - but at the cost of slower debug state transitions. In order to take effect, Microsoft Visual Studio* needs to be restarted.
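For example, to raise the threshold to 1,000 entries (an arbitrary value chosen for illustration), set the variable in the environment that launches Microsoft Visual Studio*:

DOS> set FEE_MAX_LOCALS=1000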

Problem with debugging C# applications

If Microsoft Visual Studio 2015* is used, debugging of C# applications might cause problems; i.e., evaluations like watches won't work. If you experience issues like that, try to enable "Managed Compatibility Mode". More details on how to enable it can be found here:
http://blogs.msdn.com/b/visualstudioalm/archive/2013/10/16/switching-to-managed-compatibility-mode-in-visual-studio-2013.aspx

The problem is known and will be fixed in a future version.

Attributions

This product includes software developed at:

GDB – The GNU* Project Debugger

Copyright Free Software Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.

This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, version 2, as published by the Free Software Foundation.

This program is distributed in the hope it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.

GNU* Free Documentation License

Version 1.3, 3 November 2008

 

Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc. <http://fsf.org/>

 

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

 

0. PREAMBLE

The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

Intel® Parallel Studio XE 2018 Composer Edition C++ - Debug Solutions Release Notes


This page provides the current Release Notes for the Debug Solutions from Intel® Parallel Studio XE 2018 Composer Edition for C++ Linux*, Windows* and OS X*/macOS* products.

To get product updates, log in to the Intel® Software Development Products Registration Center.

For questions or technical support, visit Intel® Software Products Support.

For the top-level Release Notes, visit:


Change History

This section highlights important changes from the previous product version as well as changes in product updates.

Changes since Intel® Parallel Studio XE 2017 Composer Edition

  • Support for the Intel® Xeon Phi™ Product Family x200 (formerly code named Knights Landing)
  • A Best Known Method is provided below in case Knights Landing debugging inside Visual Studio fails to attach to the Knights Landing targets

Product Contents

This section lists the individual Debug Solutions components for each supported host OS. Not all components are available for all host OSes.

  • Linux*:
    • GNU* Project Debugger (GDB) 7.12:
      Command line for host CPU and Intel® Xeon Phi™ coprocessor & processor, and Eclipse* IDE plugin for offload enabled applications.
    • Intel® Debugger for Heterogeneous Compute 2018
  • OS X*/macOS*:
    • GNU* Project Debugger (GDB) 7.12:
      Command line for CPU only.
  • Windows*:
    • Intel® Debugger Extension for Intel® Many Integrated Core Architecture (Intel® MIC Architecture)

GNU* GDB

This section summarizes the changes, new features, customizations and known issues related to the GNU* GDB provided with Intel® Parallel Studio XE 2018 Composer Edition.
 

Features

GNU* GDB provided with Intel® Parallel Studio XE 2018 Composer Edition and above is based on GDB 7.12 with additional enhancements provided by Intel. This debugger replaces the Intel® Debugger from previous releases. In addition to features found in GDB 7.12, there are several other new features:
  • Intel® Processor Trace (Intel® PT) support for 5th generation Intel® Core™ Processors:
    (gdb) record btrace pt
  • Support for Intel® Xeon Phi™ coprocessor & processor X200
  • Support for Intel® Transactional Synchronization Extensions (Intel® TSX) (Linux* & OS X*/macOS*)
  • Register support for Intel® Memory Protection Extensions (Intel® MPX) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
  • Data Race Detection (pdbx):
    Detect and locate data races for applications threaded using POSIX* thread (pthread) or OpenMP* models
  • Branch Trace Store (btrace):
    Record branches taken in the execution flow to backtrack easily after events like crashes, signals, exceptions, etc.
  • Pointer Checker:
    Assist in finding pointer issues if the application is compiled with the Intel® C++ Compiler and has the Pointer Checker feature enabled (see the Intel® C++ Compiler documentation for more information)
  • Improved Intel® Cilk™ Plus Support:
    Serialized execution of Intel® Cilk™ Plus parallel applications can be turned on and off during a debug session using the following command:
    (gdb) set cilk-serialization [on|off]
All features are available for Linux*, but only Intel® TSX is supported for OS X*/macOS*.
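To illustrate, a minimal session sketch follows; the binary name ./sample and the breakpoint location are hypothetical, and a processor with Intel® PT support is assumed:

$ gdb-ia ./sample
(gdb) break main
(gdb) run
(gdb) record btrace pt
(gdb) next
(gdb) next
(gdb) record function-call-history
(gdb) record instruction-history

After recording is enabled, the function-call and instruction histories let you backtrack through the executed control flow without re-running the program.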
 

Using GNU* GDB

GNU* GDB provided with Intel® Parallel Studio XE 2018 Composer Edition comes in different versions:
  • IA-32/Intel® 64 debugger:
    Debug applications natively on IA-32 or Intel® 64 systems with gdb-ia on the command line.
    A standard Eclipse* IDE can be used for this as well if a graphical user interface is desired.
  • Intel® Xeon Phi™ coprocessor & processor debugger (only for Linux*):
    Debug applications remotely on Intel® Xeon Phi™ coprocessor systems. The debugger will run on a host system and a debug agent (gdbserver) on the coprocessor.
    There are two options:
    • Use the command line version of the debugger with gdb-mic.
      This only works for native Intel® Xeon Phi™ coprocessor X100 applications. For Intel® Xeon Phi™ coprocessor & processor X200 use gdb-ia.
      A standard Eclipse* IDE can be used for this as well if a graphical user interface is desired.
    • Use an Eclipse* IDE plugin shipped with Intel® Parallel Studio XE 2018 Composer Edition.
      This works only for offload enabled Intel® Xeon Phi™ coprocessor & processor applications. Instructions on how to use GNU* GDB can be found in the Documentation section.
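For illustration, a remote session for a native coprocessor application might look as follows; the application name my_native_app, the port 2000, and the card name mic0 are hypothetical:

# On the coprocessor: start the debug agent.
mic0$ gdbserver :2000 ./my_native_app

# On the host: start the matching debugger and connect to the agent.
$ gdb-mic ./my_native_app
(gdb) target remote mic0:2000
(gdb) break main
(gdb) continue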

Documentation

The documentation for the provided GNU* GDB can be found here:
<install-dir>/documentation_2018/en/debugger/gdb-ia/gdb.pdf
<install-dir>/documentation_2018/en/debugger/ps2018/get_started.htm

The latter is available online as well:

Known Issues and Changes

Knights Landing Debugging in Visual Studio

If Knights Landing debugging inside Visual Studio fails to attach to the Knights Landing targets, try the following two steps:
1) Resolving the IP address of the mic card

  • Open the Intel Compiler x64 environment command prompt.
  • Check whether the mic0 card address can be resolved: DOS> ping mic0
  • If that fails to find the mic0 card, you need to register it manually; otherwise, continue with step 2.
  • First, find out the IP address of the mic0 adapter: DOS> ipconfig
  • Search for the mic0 adapter's IP address. The mic0 card's IP address is the same, except that the last digit is 1.
  • Then add the following line to the file C:\Windows\system32\drivers\etc\hosts:
  • <mic0-IP-address> mic0
  • (If you have more than one mic card, do the same for mic1, etc.)
  • Afterwards, reboot your PC.
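For example, assuming ipconfig reports the hypothetical address 172.31.1.254 for the mic0 adapter, the card itself would be reachable at 172.31.1.1, and the hosts entry would read:

172.31.1.1    mic0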

2) Running the MIC application in the Intel Compiler Environment

  • If you want to run or debug a MIC application, ALWAYS use the Intel Compiler x64 environment command prompt!
  • To debug the MIC application, start Visual Studio from the Intel Compiler x64 environment command prompt:
  • DOS> "C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\devenv.com"

Not found: libncurses.so.5

On some systems (e.g. Fedora 24 and 25), the GNU* GDB version provided by Intel fails due to a missing libncurses.so.5. Please install the package ncurses-compat-libs, which provides the missing library.

Not found: libtinfo.so.5

On some systems (e.g. SLES 11 SP3), the GNU* GDB version provided by Intel fails due to a missing libtinfo.so.5. If a package providing libtinfo is not available, the following workaround can be applied:

$ sudo ln -s <path>/libncurses.so.5.6 <path>/libtinfo.so.5

As <path>, use the location of the system's ncurses library.
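For example, assuming the system's ncurses library resides in /usr/lib64 (a hypothetical location), the workaround would be:

$ sudo ln -s /usr/lib64/libncurses.so.5.6 /usr/lib64/libtinfo.so.5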

Safely ending offload debug sessions

To avoid issues like orphan processes or stale debugger windows when ending offload applications, manually end the debugging session before the application reaches its exit code. The following procedure is recommended for terminating a debug session:
  1. Manually stop the debug session before the application reaches its exit code.
  2. When stopped, press the red stop button in the toolbar of the Intel® MIC Architecture-side debugger first. This ends the offloaded part of the application.
  3. Next, do the same for the CPU-side debugger.
  4. The link between the two debuggers is kept alive. The Intel® MIC Architecture-side debugger stays connected to the debug agent, and the application remains loaded in the CPU-side debugger, including all breakpoints that have been set.
  5. At this point, both debugger windows can safely be closed.

Intel® MIC Architecture-side debugger asserts on setting source directories

Setting source directories in GNU* GDB might lead to an assertion.
Resolution:
The assertion should not affect debugger operation. To avoid it entirely, do not use source directory settings. The debugger will prompt you to browse for files it cannot locate automatically.
 

Accessing _Cilk_shared variables in the debugger

Writing to a shared variable in an offloaded section from within the CPU-side debugger, before the CPU-side debuggee has accessed that variable, may result in the loss of the written value, the display of a wrong value, or an application crash.

Consider the following code snippet:

_Cilk_shared bool is_active;

_Cilk_shared void my_target_func() {
  // Accessing "is_active" from the debugger here *could* lead to
  // unexpected results, e.g. a write is lost or outdated data is read.
  is_active = true;
  // Accessing "is_active" (read or write) from the debugger at this
  // point is considered safe, e.g. the correct value is displayed.
}

Debugger and debugged application required to be located on local drive (OS X*/macOS* only)

In order to use the provided GNU* GDB (gdb-ia), it has to be installed on a local drive. As such, the entire Intel® Parallel Studio XE 2018 package has to be installed locally. Any application that is being debugged needs to be located on a local drive as well. This is a general requirement that’s inherent to GNU GDB with OS X*/macOS*.

 

Intel® Debugger for Heterogeneous Compute 2018

Features

The version of Intel® Debugger for Heterogeneous Compute 2018 provided as part of Intel® Parallel Studio XE 2018 Composer Edition uses GDB version 7.6. It provides the following features:

  • Debugging applications containing offload enabled code to Intel® Graphics Technology
  • Eclipse* IDE integration

The provided documentation (<install-dir>/documentation_2018/en/debugger/ps2018/get_started.htm) contains more information.

Requirements

For Intel® Debugger for Heterogeneous Compute 2018, the following is required:

  • Hardware
    • A dedicated host system is required, as the GPU on the target system is stopped while debugging; hence, no visual feedback on the target is possible.
    • Network connection (TCP/IP) between host and target system.
    • 4th generation Intel® Core™ processor or later with Intel® Graphics Technology up to GT3 for the target system.
  • Software

Documentation

The documentation can be found here:
<install-dir>/documentation_2018/en/debugger/gdb-igfx/gdb.pdf
<install-dir>/documentation_2018/en/debugger/ps2018/get_started.htm
 

Known Issues and Limitations

No call-stack

There is currently no provision for call-stack display. This will be addressed in a future version of the debugger.

Un-interruptible threads

Due to hardware limitations, it is not possible to interrupt a running thread. This may cause intermittent side effects while debugging, where the debugger displays incorrect register and variable values for these threads. It may also manifest as SIGTRAP messages when breakpoints are removed while other threads are running.

Evaluation of expressions with side-effects

The debugger does not evaluate expressions that contain assignments which read memory locations before writing to them (e.g. x = x + 1). Please do not use such assignments when evaluating expressions.
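As a workaround sketch, read the value first and then assign a precomputed result, so that no memory location is both read and written within one evaluated expression; the variable name x and the values below are hypothetical:

(gdb) print x
$1 = 41
(gdb) print x = 42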

 
Intel® Debugger Extension for Intel® Many Integrated Core Architecture (Intel® MIC Architecture)

This section summarizes new features and changes, usage and known issues related to the Intel® Debugger Extension. This debugger extension only supports code targeting Intel® Many Integrated Core Architecture (Intel® MIC Architecture).
 

Features

  • Support for both native Intel® Xeon Phi™ coprocessor applications and host applications with offload extensions
  • Debug multiple Intel® Xeon Phi™ coprocessors at the same time (with offload extension)

Using the Intel® Debugger Extension

The Intel® Debugger Extension is a plug-in for the Microsoft Visual Studio* IDE. It transparently enables debugging of projects defined by that IDE. Applications for Intel® Xeon Phi™ coprocessors can be either loaded and executed or attached to. This extension supports debugging of offload enabled code, using:
  • Microsoft Visual Studio* 2013
  • Microsoft Visual Studio* 2015
  • Microsoft Visual Studio* 2017
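To give an idea of what such offload enabled code looks like, here is a minimal sketch assuming the Intel® C++ Compiler's offload extensions; the function and variable names are hypothetical:

#include <cstdio>

// Mark the function for execution on the Intel® MIC Architecture target.
__declspec(target(mic)) int sum(const int *a, int n) {
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += a[i];
  return s;
}

int main() {
  int data[4] = { 1, 2, 3, 4 };
  int result = 0;
  // Breakpoints can be set inside this offloaded section; the debugger
  // extension steps through it on the coprocessor.
  #pragma offload target(mic) in(data)
  {
    result = sum(data, 4);
  }
  printf("result = %d\n", result);
  return 0;
}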

Documentation

The full documentation for the Intel® Debugger Extension can be found here:
<install-dir>\documentation_2018\en\debugger\ps2018\get_started.htm

This is available online as well:

Known Issues and Limitations

  • Using conditional breakpoints for offload sections might stall the debugger. If a conditional breakpoint is created within an offload section, the debugger might hang when hitting it and evaluating the condition. This is currently being analyzed and will be resolved in a future version of the product.
  • Data breakpoints are not yet supported within offload sections.
  • Disassembly window cannot be scrolled outside of 1024 bytes from the starting address within an offload section.
  • Handling of exceptions from the Intel® MIC Architecture application is not supported.
  • Changing breakpoints while the application is running does not work. The changes will appear to be in effect but they are not applied.
  • Starting an Intel® MIC Architecture native application is not supported. You can attach to a currently running application, though.
  • The Thread Window in Microsoft Visual Studio* offers context menu actions to Freeze, Thaw and Rename threads. These context menu actions are not functional when the thread is on an Intel® Xeon Phi™ coprocessor.
  • Setting a breakpoint right before an offload section sets a breakpoint at the first statement of the offload section. This is only true if there is no host statement between the set breakpoint and the offload section. This is normal Microsoft Visual Studio* breakpoint behavior but might become more visible with interwoven code from the host and the Intel® Xeon Phi™ coprocessor. The superfluous breakpoint for the offload section can be manually disabled (or removed) if desired.
  • Only Intel® 64 applications containing offload sections can be debugged with the Intel® Debugger Extension for Intel® Many Integrated Core Architecture.
  • Stepping out of an offload section does not step back into the host code. It rather continues execution without stopping (unless another event occurs). This is intended behavior.
  • The functionality “Set Next Statement” is not working within an offload section.
  • If breakpoints have been set for an offload section in a project already, starting the debugger might show bound breakpoints without addresses. Those do not have an impact on functionality.
  • For offload sections, setting breakpoints by address or within the Disassembly window won’t work.
  • For offload sections, using breakpoints with the following conditions of hit counts do not work: “break when the hit count is equal to” and “break when the hit count is a multiple of”.
  • The following options in the Disassembly window do not work within offload sections: “Show Line Numbers”, “Show Symbol Names” and “Show Source Code”.
  • Evaluating variables declared outside the offload section shows wrong values.
  • Please consult the Output (Debug) window for detailed reporting. It will name unimplemented features (see above) or provide additional information on configuration problems in a debugging session. You can open the window in Microsoft Visual Studio* via the menu Debug->Windows->Output.
  • When debugging an offload-enabled application and a variable assignment is entered in the Immediate Window, the debugger may hang if assignments read memory locations before writing to them (for example, x=x+1). Please do not use the Immediate Window for changing variable values for offload-enabled applications.
  • Depending on the debugger extensions provided by Intel, the behavior (for example, run control) and output (for example, disassembly) could differ from what is experienced with the Microsoft Visual Studio* debugger. This is because of the different debugging technologies implemented by each and should not have a significant impact on the debugging experience.

Attributions

This product includes software developed at:

GDB – The GNU* Project Debugger

Copyright Free Software Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.

This program is free software; you can redistribute it and/or modify it under the terms and conditions of the GNU General Public License, version 2, as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.

GNU* Free Documentation License

Version 1.3, 3 November 2008

 

Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc. <http://fsf.org/>

 

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

 

0. PREAMBLE

The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.

 

1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.

A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.

A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.

The "publisher" means any person or entity that distributes copies of the Document to the public.

A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition.

The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.

 

2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.

 

3. COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

 

4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.

B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.

C. State on the Title page the name of the publisher of the Modified Version, as the publisher.

D. Preserve all the copyright notices of the Document.

E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.

F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.

G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.

H. Include an unaltered copy of this License.

I. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.

J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.

K. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.

L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.

M. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version.

N. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section.

O. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.

You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties—for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

 

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements".

 

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

 

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.

 

8. TRANSLATION

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.

If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.

 

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, or distribute it is void, and will automatically terminate your rights under this License.

 

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, receipt of a copy of some or all of the same material does not give you any rights to use it.

 

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. If the Document specifies that a proxy can decide which future versions of this License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Document.

 

11. RELICENSING

"Massive Multiauthor Collaboration Site" (or "MMC Site") means any World Wide Web server that publishes copyrightable works and also provides prominent facilities for anybody to edit those works. A public wiki that anybody can edit is an example of such a server. A "Massive Multiauthor Collaboration" (or "MMC") contained in the site means any set of copyrightable works thus published on the MMC site.

"CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0 license published by Creative Commons Corporation, a not-for-profit corporation with a principal place of business in San Francisco, California, as well as future copyleft versions of that license published by that same organization.

"Incorporate" means to publish or republish a Document, in whole or in part, as part of another Document.

An MMC is "eligible for relicensing" if it is licensed under this License, and if all works that were first published under this License somewhere other than this MMC, and subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or invariant sections, and (2) were thus incorporated prior to November 1, 2008.

The operator of an MMC Site may republish an MMC contained in the site under CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is eligible for relicensing.

 

Disclaimer and Legal Information

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to:  http://www.intel.com/design/literature.htm

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to:
http://www.intel.com/products/processor_number/

MPEG-1, MPEG-2, MPEG-4, H.261, H.263, H.264, MP3, DV, VC-1, MJPEG, AC3, AAC, G.711, G.722, G.722.1, G.722.2, AMRWB, Extended AMRWB (AMRWB+), G.167, G.168, G.169, G.723.1, G.726, G.728, G.729, G.729.1, GSM AMR, GSM FR are international standards promoted by ISO, IEC, ITU, ETSI, 3GPP and other organizations. Implementations of these standards, or the standard enabled platforms may require licenses from various entities, including Intel Corporation.

BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, InTru, the InTru logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Moblin, Pentium, Pentium Inside, skoool, the skoool logo, Sound Mark, The Journey Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.

* Other names and brands may be claimed as the property of others.

Microsoft, Windows, Visual Studio, Visual C++, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Java is a registered trademark of Oracle and/or its affiliates.

Copyright (C) 2008–2017, Intel Corporation. All rights reserved.

 

Intel® Software Development tools integration to Microsoft* Visual Studio 2017 issue


Issue: Installation of Intel® Parallel Studio XE with Microsoft* Visual Studio 2017 integration hangs and fails on some systems. The problem is intermittent and not reproducible on every system. Any attempt to repair it fails with the message "Incomplete installation of Microsoft Visual Studio* 2017 is detected". Note that in some cases the installation may complete successfully with no errors or crashes; however, the integration with VS2017 is not installed. The issue may be observed with Intel® Parallel Studio XE 2017 Update 4, Intel® Parallel Studio XE 2018 Beta and later versions, as well as Intel® System Studio installations.

Environment: Microsoft* Windows, Visual Studio 2017

Root Cause: A root cause was identified and reported to Microsoft*. Note that there may be different reasons for integration failures. We are documenting all cases and providing them to Microsoft for further root-cause analysis.

Workaround:

Note that with Intel Parallel Studio XE 2017 Update 4 there is no automated workaround for this integration problem; integrate the components manually as described below. The automated workaround is expected to be implemented in Intel Parallel Studio XE 2017 Update 5 and is already implemented in Intel Parallel Studio XE 2018 Beta Update 1.

Integrate the Intel Parallel Studio XE components manually. You need to run all the files from the corresponding folders (a sample command line is sketched after the list):

  • C++/Fortran Compiler IDE: <installdir>/ide_support_2018/VS15/*.vsix
  • Amplifier: <installdir>/VTune Amplifier 2018/amplxe_vs2017-integration.vsix
  • Advisor: <installdir>/Advisor 2018/advi_vs2017-integration.vsix
  • Inspector: <installdir>/Inspector 2018/insp_vs2017-integration.vsix
  • Debugger: <InstallDir>/ide_support_2018/MIC/*.vsix
                      <InstallDir>/ide_support_2018/CPUSideRDM/*.vsix
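Each .vsix package can be installed by double-clicking it in Explorer, or from a command prompt via the VSIXInstaller.exe utility that ships with Visual Studio. A minimal sketch, assuming a default Visual Studio 2017 Professional location; the package file name is a hypothetical placeholder, and <installdir> stays whatever it is on your system:

:: Hypothetical paths -- adjust the VS edition, <installdir>, and the .vsix name
"C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\Common7\IDE\VSIXInstaller.exe" "<installdir>\ide_support_2018\VS15\some_package.vsix"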

If this workaround doesn't work and the installation still fails, please report the problem to Intel through the Intel® Developer Zone Forums or the Online Service Center. You will need to supply the installation log file and the error message from the Microsoft installer.

Coarray Fortran 32-bit doesn't work on 64-bit Microsoft* Windows


Version : Intel® Visual Fortran Compiler 17.0, 18.0

Operating System : Microsoft* Windows 10 64-bit, Microsoft* Windows Server 2012 R2 64-bit

Problem Description : Coarray Fortran 32-bit doesn't work on Microsoft* Windows 10 or Microsoft* Windows Server 2012 R2 (only on 64-bit OS) because the required utilities "mpiexec.exe" and "smpd.exe" do not work properly.

Resolution Status :

It is a compatibility issue. You need to change the compatibility properties in order to run "mpiexec.exe" and "smpd.exe" correctly. The following workaround should resolve the problem:

1. Go to the folder where your "mpiexec.exe" and "smpd.exe" files are located.
2. For both files, follow these steps:

  • Right click > Properties > Compatibility Tab
  • Make sure the “Run this program in compatibility mode for:” box is checked and Windows Vista (Service Pack 2) is chosen.
  • Click Apply and close the Properties window.

A Coarray Fortran 32-bit application should work fine if all steps are followed carefully.
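If you prefer to script this change instead of using the Properties dialog, Windows records per-executable compatibility modes under the AppCompatFlags registry key. A minimal sketch of that approach; the file path is a placeholder, and the "VISTASP2" layer name is an assumption to verify on your system:

:: Placeholder path -- repeat for smpd.exe; verify the layer name on your OS
reg add "HKCU\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers" /v "C:\path\to\mpiexec.exe" /t REG_SZ /d "VISTASP2" /f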

Intel® Data Analytics Acceleration Library - Decision Trees


Introduction

The decision tree method is one of the most popular approaches in machine learning. Decision trees can easily be used to solve various classification and regression tasks. They are often valued for their universality and for the fact that the model obtained by learning a decision tree is easy to interpret, even for a non-expert.

The universality of decision trees is a consequence of two main factors. First, the decision tree method is a non-parametric machine learning method: its usage does not require knowing or assuming the probabilistic characteristics of the data it is supposed to work with. Second, the decision tree method naturally incorporates mixtures of variables with different levels of measurement [1].

At the same time, the decision tree model is a white box: it is clear which class (for the classification problem), or which value of the dependent variable (for the regression problem), will be predicted for particular data, which features have an impact on this prediction, and how.

This article describes the decision trees algorithm and how Intel® Data Analytics Acceleration Library (Intel® DAAL) [2] helps optimize this algorithm when running it on systems equipped with Intel® Xeon® processors.

What is a Decision tree?

Decision trees partition the feature space into a set of hypercubes and then fit a simple model in each one. Such a simple model can be a prediction model that ignores all predictors and predicts the majority (most frequent) class (or the mean of the dependent variable for regression), also known as the 0-R or constant classifier.

Decision tree induction constructs a tree-like graph structure, as shown in the figure below, where each internal (non-leaf) node denotes a test on features, each branch descending from a node corresponds to an outcome of the test, and each external (leaf) node denotes the simple model mentioned above.

The test is a rule, depending on feature values, that performs the partitioning of the feature space: each outcome of the test represents an appropriate hypercube associated with both the test and one of the descending branches. If the test is a Boolean expression (e.g., f < c or f = c, where f is a feature and c is a constant fitted during decision tree induction), the induced decision tree is a binary tree, so each of its non-leaf nodes has exactly two branches ("true" and "false") according to the result of the Boolean expression. In this case, the left branch is often implicitly assumed to be associated with the "true" outcome, and the right branch with the "false" outcome.

Test selection is performed as a search through all reasonable tests to find the best one according to some criterion, called the split criterion. There are many widely used split criteria, including the Gini index [3] and Information Gain [4] for classification, and the Mean-Squared Error (MSE) [3] for regression.
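For reference, the classical definitions of these criteria are as follows (standard formulations from [3, 4]); here $p_k$ is the fraction of observations of class $k$ in a node, child $i$ receives $n_i$ of the node's $n$ observations, and $\bar{y}$ is the mean of the dependent variable in the node:

$$\text{Gini} = 1 - \sum_{k} p_k^2, \qquad \text{InfoGain} = H(\text{parent}) - \sum_{i} \frac{n_i}{n}\, H(\text{child}_i) \ \ \text{with}\ H = -\sum_{k} p_k \log_2 p_k, \qquad \text{MSE} = \frac{1}{n} \sum_{j=1}^{n} \left(y_j - \bar{y}\right)^2$$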

To improve prediction, a decision tree can be pruned [5]. Pruning techniques that are embedded in the training process are named pre-pruning, because they stop further growth of the decision tree. There are also post-pruning techniques that replace an already completely trained decision tree with another one [5].

For instance, Reduced Error Pruning (REP), described in [5], assumes the existence of a separate pruning dataset; each observation in it is used to get a prediction from the original (unpruned) tree. For every non-leaf subtree, the change in mispredictions over the pruning dataset that would occur if this subtree were replaced by the best possible leaf is examined:

ΔE = E_leaf - E_subtree

where E_subtree and E_leaf are the numbers of errors (for classification) or the MSE (for regression) for the given subtree and for the best possible leaf that replaces it, respectively. If the new tree gives an equal or smaller number of mispredictions (ΔE ≤ 0) and the subtree contains no subtree with the same property, the subtree is replaced by the leaf. The process continues until any further replacement would increase mispredictions over the pruning dataset. The final tree is the most accurate subtree of the original tree with respect to the pruning dataset and the smallest tree with that accuracy. The pruning dataset can be some fraction of the original training dataset (e.g., a randomly chosen 20% of observations), but in that case those observations must be excluded from the training dataset.

The prediction is performed by starting at the root node of the tree, testing the features with the test specified by this node, then moving down the tree branch corresponding to the outcome of the test for the given example. This process is then repeated for the subtree rooted at the new node. The final prediction result is the prediction of the simple model at the reached leaf node.
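As a minimal illustration of this procedure, the following sketch walks a binary tree with Boolean tests of the form f < c; the Node structure and its field names are hypothetical, for illustration only (they are not part of Intel® DAAL):

/* Hypothetical node layout for a binary decision tree with tests "x[feature] < cut" */
struct Node
{
    bool isLeaf;
    int feature;         /* index of the tested feature (internal nodes only) */
    double cut;          /* the constant c fitted during induction */
    double prediction;   /* class label or mean response (leaves only) */
    const Node *left;    /* "true" branch */
    const Node *right;   /* "false" branch */
};

/* Start at the root and follow the branch matching each test outcome */
double predict(const Node *node, const double *x)
{
    while (!node->isLeaf)
        node = (x[node->feature] < node->cut) ? node->left : node->right;
    return node->prediction;
}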

Applications of Decision trees

Decision trees can be used in many real-world applications [6]:

  • Agriculture
  • Astronomy (e.g. for filtering noise from Hubble Space Telescope images)
  • Biomedical Engineering
  • Control Systems
  • Financial analysis
  • Manufacturing and Production
  • Medicine
  • Molecular biology
  • Object recognition
  • Pharmacology
  • Physics (e.g. for the detection of physical particles)
  • Plant diseases (e.g. to assess the hazard of mortality to pine trees)
  • Power systems (e.g. power system security assessment and power stability prediction)
  • Remote sensing
  • Software development (e.g. to estimate the development effort of a given software module)
  • Text processing (e.g. medical text classification)
  • Personal learning assistants
  • Classifying sleep signals

Advantages and disadvantages of Decision trees

Using Decision trees has advantages and disadvantages [7]:

  • Advantages
    • Simple to understand and interpret. Have a white-box model.
    • Able to handle both numerical and categorical data.
    • Requires little data preparation.
    • Non-statistical approach that makes no assumptions about the training data or prediction residuals; e.g., no distributional, independence, or constant variance assumptions
    • Performs well even with large datasets.
    • Mirrors human decision making more closely than other approaches.
    • Robust against co-linearity.
    • Have built in feature selection.
    • Have value even with small datasets.
    • Can be combined with other techniques.
  • Disadvantages
    • Trees do not tend to be as accurate as other approaches.
    • Trees can be very non-robust. A small change in the training data can result in a big change in the tree, and thus a big change in final predictions.
    • The problem of learning an optimal decision tree is known to be NP-complete under several aspects of optimality and even for simple concepts. Consequently, practical decision-tree learning algorithms are based on heuristics such as the greedy algorithm where locally-optimal decisions are made at each node.
    • Decision-tree learners can create over-complex trees that do not generalize well from the training data. Mechanisms such as pruning are necessary to avoid this problem.
    • There are concepts that are hard to learn because decision trees do not express them easily, such as XOR, parity or multiplexer problems. In such cases, the decision tree becomes prohibitively large.

Intel® Data Analytics Acceleration Library

Intel® DAAL is a library consisting of many basic building blocks that are optimized for data analytics and machine learning. Those building blocks are highly optimized for the latest features of the latest Intel® processors. More about Intel® DAAL can be found in [2]. Intel® DAAL provides decision tree classification and regression algorithms.

Using Decision trees in Intel® Data Analytics Acceleration Library

This section shows how to invoke Decision trees classification and regression using Intel® DAAL.

Do the following steps to invoke Decision tree classification algorithm from Intel® DAAL:

1.	Ensure that you have Intel® DAAL installed and the environment is prepared. See details in [8, 9, 10] according to your operating system.
2.	Include header file daal.h into your application:
#include <daal.h>
3.	To simplify the usage of Intel® DAAL namespaces, we will use the following using directives:
using namespace daal;
using namespace daal::algorithms;
4.	We will assume that the training, pruning, and testing datasets are in appropriate .csv files. If so, we must read the first and second of them into Intel® DAAL numeric tables:
const size_t nFeatures = 5; /* Number of features in training and testing data sets */

/* Initialize FileDataSource<CSVFeatureManager> to retrieve the input data from a .csv
   file */
FileDataSource<CSVFeatureManager> trainDataSource("train.csv",
    DataSource::notAllocateNumericTable, DataSource::doDictionaryFromContext);

/* Create Numeric Tables for training data and labels */
NumericTablePtr trainData(new HomogenNumericTable<>(nFeatures, 0,
    NumericTable::notAllocate));
NumericTablePtr trainGroundTruth(new HomogenNumericTable<>(1, 0,
    NumericTable::notAllocate));
NumericTablePtr mergedData(new MergedNumericTable(trainData, trainGroundTruth));

/* Retrieve the data from the input file */
trainDataSource.loadDataBlock(mergedData.get());

/* Initialize FileDataSource<CSVFeatureManager> to retrieve the pruning input data from a
   .csv file */
FileDataSource<CSVFeatureManager> pruneDataSource("prune.csv",
    DataSource::notAllocateNumericTable, DataSource::doDictionaryFromContext);

/* Create Numeric Tables for pruning data and labels */
NumericTablePtr pruneData(new HomogenNumericTable<>(nFeatures, 0,
    NumericTable::notAllocate));
NumericTablePtr pruneGroundTruth(new HomogenNumericTable<>(1, 0,
    NumericTable::notAllocate));
NumericTablePtr pruneMergedData(new MergedNumericTable(pruneData, pruneGroundTruth));

/* Retrieve the data from the pruning input file */
pruneDataSource.loadDataBlock(pruneMergedData.get());
5.	Create an algorithm object to train the model:
const size_t nClasses = 5;  /* Number of classes */
/* Create an algorithm object to train the Decision tree model */
decision_tree::classification::training::Batch<> algorithm1(nClasses);
6.	Pass the training data and labels with pruning data and labels to the algorithm:
/* Pass the training data set, labels, and pruning dataset with labels to the algorithm */
algorithm1.input.set(classifier::training::data, trainData);
algorithm1.input.set(classifier::training::labels, trainGroundTruth);
algorithm1.input.set(decision_tree::classification::training::dataForPruning, pruneData);
algorithm1.input.set(decision_tree::classification::training::labelsForPruning,
    pruneGroundTruth);
7.	Train the model:
/* Train the Decision tree model */
algorithm1.compute();
where algorithm1 is the variable defined in step 5.
8.	Store result of training in variable:
decision_tree::classification::training::ResultPtr trainingResult =
    algorithm1.getResult();
9.	Read the testing dataset from the appropriate .csv file:
/* Initialize FileDataSource<CSVFeatureManager> to retrieve the test data from a .csv
   file */
FileDataSource<CSVFeatureManager> testDataSource("test.csv",
    DataSource::notAllocateNumericTable, DataSource::doDictionaryFromContext);

 /* Create Numeric Tables for testing data and labels */
NumericTablePtr testData(new HomogenNumericTable<>(nFeatures, 0,
    NumericTable::notAllocate));
NumericTablePtr testGroundTruth(new HomogenNumericTable<>(1, 0,
    NumericTable::notAllocate));
NumericTablePtr testMergedData(new MergedNumericTable(testData, testGroundTruth));

/* Retrieve the data from the input file */
testDataSource.loadDataBlock(testMergedData.get());
10.	Create an algorithm object to test the model:
/* Create algorithm objects for Decision tree prediction with the default method */
decision_tree::classification::prediction::Batch<> algorithm2;
11.	Pass the testing data and trained model to the algorithm:
/* Pass the testing data set and trained model to the algorithm */
algorithm2.input.set(classifier::prediction::data, testData);
algorithm2.input.set(classifier::prediction::model,
    trainingResult->get(classifier::training::model));
12.	Test the model:
/* Compute prediction results */
algorithm2.compute();
13.	Retrieve the results of the prediction:
/* Retrieve algorithm results */
classifier::prediction::ResultPtr predictionResult = algorithm2.getResult();
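To inspect the predicted labels, the prediction table can be retrieved from predictionResult and read through a block descriptor. The following is a minimal sketch of the usual Intel® DAAL access pattern; it additionally assumes <cstdio> for printf and that BlockDescriptor and readOnly (from daal::data_management) are visible, so verify the identifiers against your Intel® DAAL version:

/* Retrieve the table with the predicted labels and print the first rows */
NumericTablePtr predictedLabels =
    predictionResult->get(classifier::prediction::prediction);

BlockDescriptor<double> block;
size_t nRows = predictedLabels->getNumberOfRows();
if (nRows > 10) nRows = 10;
predictedLabels->getBlockOfRows(0, nRows, readOnly, block);
double *labels = block.getBlockPtr();
for (size_t i = 0; i < nRows; ++i)
{
    printf("observation %u -> class %.0f\n", (unsigned)i, labels[i]);
}
predictedLabels->releaseBlockOfRows(block);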

For decision tree regression, steps 1-4, 7, 9, and 12 are the same, while the others are very similar:

1.	Ensure that you have Intel® DAAL installed and the environment is prepared. See details in [8, 9, 10] according to your operating system.
2.	Include header file daal.h into your application:
#include <daal.h>
3.	To simplify the usage of Intel® DAAL namespaces, we will use the following using directives:
using namespace daal;
using namespace daal::algorithms;
4.	We will assume that the training, pruning, and testing datasets are in appropriate .csv files. If so, we must read the first and second of them into Intel® DAAL numeric tables:
const size_t nFeatures = 5; /* Number of features in training and testing data sets */

/* Initialize FileDataSource<CSVFeatureManager> to retrieve the input data from a .csv
   file */
FileDataSource<CSVFeatureManager> trainDataSource("train.csv",
    DataSource::notAllocateNumericTable, DataSource::doDictionaryFromContext);

/* Create Numeric Tables for training data and labels */
NumericTablePtr trainData(new HomogenNumericTable<>(nFeatures, 0,
    NumericTable::notAllocate));
NumericTablePtr trainGroundTruth(new HomogenNumericTable<>(1, 0,
    NumericTable::notAllocate));
NumericTablePtr mergedData(new MergedNumericTable(trainData, trainGroundTruth));

/* Retrieve the data from the input file */
trainDataSource.loadDataBlock(mergedData.get());

/* Initialize FileDataSource<CSVFeatureManager> to retrieve the pruning input data from a
   .csv file */
FileDataSource<CSVFeatureManager> pruneDataSource("prune.csv",
    DataSource::notAllocateNumericTable, DataSource::doDictionaryFromContext);

/* Create Numeric Tables for pruning data and labels */
NumericTablePtr pruneData(new HomogenNumericTable<>(nFeatures, 0,
    NumericTable::notAllocate));
NumericTablePtr pruneGroundTruth(new HomogenNumericTable<>(1, 0,
    NumericTable::notAllocate));
NumericTablePtr pruneMergedData(new MergedNumericTable(pruneData, pruneGroundTruth));

/* Retrieve the data from the pruning input file */
pruneDataSource.loadDataBlock(pruneMergedData.get());
5.	Create an algorithm object to train the model:
/* Create an algorithm object to train the Decision tree model */
decision_tree::regression::training::Batch<> algorithm;
6.	Pass the training data and labels with pruning data and labels to the algorithm:
/* Pass the training data set, dependent variables, and pruning dataset with dependent
   variables to the algorithm */
algorithm.input.set(decision_tree::regression::training::data, trainData);
algorithm.input.set(decision_tree::regression::training::dependentVariables,
    trainGroundTruth);
algorithm.input.set(decision_tree::regression::training::dataForPruning, pruneData);
algorithm.input.set(decision_tree::regression::training::dependentVariablesForPruning,
    pruneGroundTruth);
7.	Train the model:
/* Train the Decision tree model */
algorithm.compute();
where algorithm is the variable defined in step 5.
8.	Store result of training in variable:
decision_tree::regression::training::ResultPtr trainingResult =
    algorithm.getResult();
9.	Read the testing dataset from the appropriate .csv file:
/* Initialize FileDataSource<CSVFeatureManager> to retrieve the test data from a .csv
   file */
FileDataSource<CSVFeatureManager> testDataSource("test.csv",
    DataSource::notAllocateNumericTable, DataSource::doDictionaryFromContext);

 /* Create Numeric Tables for testing data and labels */
NumericTablePtr testData(new HomogenNumericTable<>(nFeatures, 0,
    NumericTable::notAllocate));
NumericTablePtr testGroundTruth(new HomogenNumericTable<>(1, 0,
    NumericTable::notAllocate));
NumericTablePtr testMergedData(new MergedNumericTable(testData, testGroundTruth));

/* Retrieve the data from the input file */
testDataSource.loadDataBlock(testMergedData.get());
10.	Create an algorithm object to test the model:
/* Create algorithm objects for Decision tree prediction with the default method */
decision_tree::regression::prediction::Batch<> algorithm2;
11.	Pass the testing data and trained model to the algorithm:
/* Pass the testing data set and trained model to the algorithm */
algorithm2.input.set(decision_tree::regression::prediction::data, testData);
algorithm2.input.set(decision_tree::regression::prediction::model,
    trainingResult->get(decision_tree::regression::training::model));
12.	Test the model:
/* Compute prediction results */
algorithm2.compute();
13.	Retrieve the results of the prediction:
/* Retrieve algorithm results */
decision_tree::regression::prediction::ResultPtr predictionResult =
    algorithm2.getResult();

Conclusion

The decision tree is a powerful method that can be used for both classification and regression. Intel® DAAL provides an optimized implementation of the decision tree algorithm. By using Intel® DAAL, developers can take advantage of new features in future generations of Intel® Xeon® processors without having to modify their applications; they only need to link their applications to the latest version of Intel® DAAL.

References

  1. https://en.wikipedia.org/wiki/Level_of_measurement
  2. https://software.intel.com/en-us/blogs/daal
  3. Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone. Classification and Regression Trees. Chapman & Hall. 1984.
  4. J. R. Quinlan. Induction of Decision Trees. Machine Learning, Volume 1 Issue 1. pp. 81-106. 1986.
  5. J. R. Quinlan. Simplifying decision trees. International journal of Man-Machine Studies, Volume 27 Issue 3. pp. 221-234. 1987.
  6. http://www.cbcb.umd.edu/~salzberg/docs/murthy_thesis/survey/node32.html
  7. https://en.wikipedia.org/wiki/Decision_tree_learning
  8. https://software.intel.com/en-us/get-started-with-daal-for-linux
  9. https://software.intel.com/en-us/get-started-with-daal-for-windows
  10. https://software.intel.com/en-us/get-started-with-daal-for-macos

Intel(R) Math Kernel Library - Introducing Vectorized Compact Routines


Introduction     

    Many high performance computing applications depend on matrix operations performed on large groups of matrices of small sizes. Intel® Math Kernel Library (Intel® MKL) 2018 and later versions provide new compact routines that include optimizations for problems of this type.

The main idea behind these compact routines is to create true SIMD computations, in which subgroups of matrices are operated on with kernels that abstractly appear as scalar kernels while registers are filled by cross-matrix vectorization. Intel MKL compact routines provide significant performance benefits compared to batched techniques (see https://software.intel.com/en-us/articles/introducing-batch-gemm-operations for more detailed information about Intel MKL Batch functions), while maintaining ease-of-use through the inclusion of compact service functions that facilitate the reformatting of matrix data for use in these routines.

Compact routines operate on matrices that have been packed into a contiguous segment of memory in an interleaved format, called compact format. Six compact routines have been introduced in Intel MKL 2018: general matrix-multiply (mkl_?gemm_compact), triangular matrix equation solve (mkl_?trsm_compact), inverse calculation (mkl_?getrinp_compact), LU factorization (mkl_?getrfnp_compact), Cholesky decomposition (mkl_?potrf_compact), and QR decomposition (mkl_?geqrf_compact). These routines can only be used for groups of matrices of identical dimensions, where the layout (row-major or column-major) and the stride are identical throughout the group. 

Compact Format

    In compact format, for real precisions, matrices are organized into packs of size V, where V is related to the SIMD vector length of the underlying architecture. Each pack is a 3D tensor with the matrix index incrementing the fastest. These packs can then be loaded into registers and operated on using SIMD instructions.

The picture below demonstrates the packing of a set of 4, 3 x 3, real-precision matrices into compact format. The pack length for this example is V = 2, resulting in 2 compact packs.

 

   Figure 1: Compact format for 4, 3 x 3, real precision matrices with pack length V = 2

The particular form of the packs for each architecture and problem precision is specified by an MKL_COMPACT_PACK enum type.
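As a plain-C illustration of this interleaving (a sketch of the layout idea only, not Intel MKL code), the following packs the Figure 1 example by hand: 4 matrices of 9 elements each, pack length V = 2, with the matrix index incrementing fastest within each pack:

#define NMAT 4   /* matrices in the group             */
#define NN   9   /* elements per 3 x 3 matrix         */
#define V    2   /* pack length (SIMD vector length)  */

/* a[m][e] is element e of matrix m (column-major element order).
   In compact format, pack p stores element e of matrices
   p*V .. p*V+V-1 in V consecutive slots. */
void pack_compact(const double a[NMAT][NN], double compact[NMAT * NN])
{
    for (int p = 0; p < NMAT / V; ++p)      /* pack index            */
        for (int e = 0; e < NN; ++e)        /* element within matrix */
            for (int v = 0; v < V; ++v)     /* lane: matrix in pack  */
                compact[(p * NN + e) * V + v] = a[p * V + v][e];
}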

Before calling a BLAS or LAPACK compact function, the input data must be packed in compact format. After execution, the output data should be unpacked from this compact format, unless another compact routine will be called immediately following the first. Two service functions, mkl_?gepack_compact and mkl_?geunpack_compact, facilitate the process of storing matrices in compact format. It is recommended that the user call the mkl_get_format_compact service function before calling the mkl_?gepack_compact routine to obtain the optimal format for performance, but advanced users can pack and unpack the matrices themselves and still use Intel MKL compact kernels on the packed set.

For more details, including a description of the compact format of complex-type arrays, see <Compact Format> in the Intel MKL User’s guide.

A SIMPLE VISUAL EXAMPLE

A simple compact version of a matrix multiplication is illustrated in this section, performing the operation C = A * B for a set of 4, 3 x 3, real-precision matrices. Generic (or batched) routines require 4 matrix-matrix multiplications to be performed for a problem of this type, as illustrated in Figure 2.

                               Figure 2: Generic GEMM for a set of 4, 3 x 3 matrices

Assuming that the matrices have been packed into compact format using a pack length of V = 2, the compact version of this problem involves two matrix-matrix multiplications, as illustrated in Figure 3.

                                Figure 3: Compact GEMM for a set of 4, 3 x 3 matrices

The elements of the matrices involved in these two multiplications are vectors of length V, which are loaded into registers and operated on as if they were a scalar element in an ordinary matrix-matrix multiplication. Clearly, it is optimal to have pack length V equal to the length of the SIMD registers of the architecture.
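A plain-C sketch of such a kernel for a single pack follows; it is an illustration of the idea under the layout above, not the Intel MKL implementation: an ordinary triple-loop multiplication where every "scalar" operation acts on V values at once, one per matrix in the pack:

/* C += A * B for one compact pack of vlen interleaved n x n matrices.
   a[(i*n + k)*vlen + v] holds element (i,k) of matrix v in the pack;
   c is assumed zero-initialized before the call. */
static void compact_gemm_pack(int n, int vlen,
                              const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                for (int v = 0; v < vlen; ++v)   /* maps to one SIMD operation */
                    c[(i * n + j) * vlen + v] += a[(i * n + k) * vlen + v]
                                               * b[(k * n + j) * vlen + v];
}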

NUMERICAL LIMITATIONS

Compact routines are subject to a set of numerical limitations, and they skip most of the checks present in regular BLAS and LAPACK routines in order to provide effective vectorization. Error checking is the responsibility of the user. For more information on the limitations of compact routines, see <MKL User Guide Numerical Limitations>.

BLAS COMPACT ROUTINES

Intel MKL BLAS provides compact routines for general matrix-matrix multiplication and solving triangular matrix equations. The following table provides a brief description of the new routines. For detailed information on usage for these routines, see the Intel MKL User’s Guide.

  • mkl_?gemm_compact: General matrix-matrix multiply. Performs the operation

C = alpha*op(A)*op(B) + beta*C

where op(X) is one of op(X) = X, op(X) = X^T, or op(X) = X^H, alpha and beta are scalars, and A, B, and C are matrices stored in compact format.

  • mkl_?trsm_compact: Solves a triangular matrix equation. Computes the solution of one of the following matrix equations:

op(A) * X = alpha * B, or X*op(A) = alpha*B

where alpha is a scalar, X and B are m x n matrices stored in compact format, and A is a unit (or non-unit) triangular matrix stored in compact format.

LAPACK COMPACT ROUTINES

Intel MKL LAPACK provides compact functions to calculate QR, LU, and Cholesky decompositions, as well as inverses, in Intel MKL 2018 (and later versions). The compact routines for LAPACK follow the same optimization principles as the compact BLAS routines. The following table provides a brief description of the new routines. For detailed information on these routines, see the Intel MKL User’s Guide.

  • mkl_?geqrf_compact: QR decomposition. Computes the QR factorization of a set of general m x n matrices, stored in the compact format.

  • mkl_?getrfnp_compact: LU decomposition, without pivoting. Computes the LU factorization, without pivoting, of a set of general m x n matrices A, which are stored in array ap in the compact format (see Compact Format).

  • mkl_?getrinp_compact: Inverse, without pivoting. Computes the inverse of a set of LU-factorized (without pivoting) general matrices A, which are stored in the compact format (see Compact Format).

  • mkl_?potrf_compact: Cholesky decomposition. Computes the Cholesky factorization of a set of symmetric (Hermitian), positive-definite matrices, stored in the compact format.

 

Example

The following example uses Intel MKL compact routines to calculate first the LU factorizations, then the inverses (from the LU factorizations), of a group of 2048, 8x8 matrices. Within this example, the same calculations are made using an OpenMP loop on the group of matrices. The time that each routine takes is printed so that the user can verify the performance improvement when using compact routines.

Notice that the routines mkl_dgetrfnp_compact and mkl_dgetrinp_compact are called between the mkl_dgepack_compact and mkl_dgeunpack_compact functions. Because the mkl_?gepack_compact and mkl_?geunpack_compact functions add overhead, users who call multiple compact routines on the same group of matrices will see the greatest performance benefit from using compact routines.

The complex compact routines are executed similarly, but it is important to note that for complex precisions, all input parameters are of real type. For more details, see <Compact Format> in the Intel MKL User’s guide. Examples of the calling sequences for each individual routine can be found in the Intel MKL 2018 product.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include "mkl.h"

#define N                        8
#define NMAT                  2048

#define NITER_WARMUP            10

void test(double *t_compact, double *t_omp) {
    MKL_INT i, j;

    MKL_LAYOUT layout = MKL_COL_MAJOR;
    MKL_INT m = N;
    MKL_INT n = N;
    MKL_INT lda = m;

    MKL_INT info;
    MKL_COMPACT_PACK format;
    MKL_INT nmat = NMAT;

    /* Set up standard arrays in P2P (pointer-to-pointer) format */
    MKL_INT a_size = lda * n;
    MKL_INT na = a_size * nmat;
    double *a_ref = (double *)mkl_malloc(na * sizeof(double), 128);
    double *a = (double *)mkl_malloc(na * sizeof(double), 128);
    double *a_array[NMAT];
    double *a_compact;

    /* For random generation of matrices */
    MKL_INT idist = 1;
    MKL_INT iseed[] = { 0, 1, 2, 3 };
    double diag_offset = (double)n;

    /* For workspace calculation */
    MKL_INT imone = -1;
    MKL_INT lwork;
    double work_query[1];
    double *work_compact;

    /* For threading */
    MKL_INT nthr = omp_get_max_threads();
    MKL_INT ithr;
    MKL_INT lwork_i;
    double *work_omp;
    double *work_i;

    /* For setting up compact arrays */
    MKL_INT a_buffer_size;
    MKL_INT ldap = lda;
    MKL_INT sdap = n;

    /* Random generation of matrices */
    dlarnv(&idist, iseed, &na, a);

    for (i = 0; i < nmat; i++) {
        /* Make matrix diagonal dominant to avoid accuracy issues
                 in the non-pivoted LU factorization */
        for (j = 0; j < m; j++) {
            a[i * a_size + j + j * lda] += diag_offset;
        }
        a_array[i] = &a[i * a_size];
    }
    /* Set up a_ref to use in OMP version */
    for (i = 0; i < na; i++) {
        a_ref[i] = a[i];
    }

    /* -----Start Compact----- */

    /* Set up Compact arrays */
    format = mkl_get_format_compact();

    a_buffer_size = mkl_dget_size_compact(ldap, sdap, format, nmat);

    a_compact = (double *)mkl_malloc(a_buffer_size, 128);

    /* Workspace query */
    mkl_dgetrinp_compact(layout, n, a_compact, ldap, work_query, imone, &info, format, nmat);
    lwork = (MKL_INT)work_query[0];
    work_compact = (double *)mkl_malloc(sizeof(double) * lwork, 128);

    /* Start timing compact */
    *t_compact = dsecnd();

    /* Pack from P2P to Compact format */
    mkl_dgepack_compact(layout, n, n, a_array, lda, a_compact, ldap, format, nmat);

    /* Perform Compact LU Factorization */
    mkl_dgetrfnp_compact(layout, n, n, a_compact, ldap, &info, format, nmat);

    /* Perform Compact Inverse Calculation */
    mkl_dgetrinp_compact(layout, n, a_compact, ldap, work_compact, lwork, &info, format, nmat);

    /* Unpack from Compact to P2P format */
    mkl_dgeunpack_compact(layout, n, n, a_array, lda, a_compact, ldap, format, nmat);

    /* End timing compact */
    *t_compact = dsecnd() - *t_compact;
    /* -----End Compact----- */

    /* -----Start OMP----- */
    for (i = 0; i < nmat; i++) {
        a_array[i] = &a_ref[i * a_size];
    }

    /* Workspace query */
    mkl_dgetrinp(&n, a_array[0], &lda, work_query, &imone, &info);
    lwork = (MKL_INT)work_query[0] * nthr;
    work_omp = (double *)mkl_malloc(sizeof(double) * lwork, 128);

    /* Start timing OMP */
    *t_omp = dsecnd();

    /* OpenMP loop */
    #pragma omp parallel for private(ithr, lwork_i, work_i)
    for (i = 0; i < nmat; i++) {
        /* Set up workspace for thread */
        ithr = omp_get_thread_num();
        lwork_i = lwork / nthr;
        work_i = &work_omp[ithr * lwork_i];

        /* Perform LU Factorization */
        mkl_dgetrfnp(&n, &n, a_array[i], &lda, &info);

        /* Perform Inverse Calculation */
        mkl_dgetrinp(&n, a_array[i], &lda, work_i, &lwork_i, &info);
    }

    /* End timing OMP */
    *t_omp = dsecnd() - *t_omp;
    /* -----End OMP----- */

    /* Deallocate arrays */
    mkl_free(a_compact);
    mkl_free(a);
    mkl_free(a_ref);
    mkl_free(work_compact);
    mkl_free(work_omp);
}

int main() {
    MKL_INT i = 0;
    double t_compact;
    double t_omp;
    double flops = NMAT * ((2.0 / 3.0 + 4.0 / 3.0) * N * N * N);
    for (i = 0; i < NITER_WARMUP; i++) {
        test(&t_compact, &t_omp);
    }
    test(&t_compact, &t_omp);
    printf("N = %d, NMAT = %d\n", N, NMAT);
    printf("Compact time = %fs, GFlops = %f\n", t_compact, flops / t_compact / 1e9);
    printf("OMP     time = %fs, GFlops = %f\n", t_omp,     flops / t_omp / 1e9);
    return 0;
}

PERFORMANCE RESULTS

The following four charts demonstrate the performance improvement for the following operations: general matrix-matrix multiplication (GEMM), triangular matrix equation solve (TRSM), non-pivoting LU-factorization of a general matrix (GETRFNP), and inverse calculation of an LU-factorized (without pivoting) general matrix (GETRINP). The results were measured against calls to the generic BLAS and LAPACK functions, as in the above example.

 

Introducing Batch GEMM Operations


The general matrix-matrix multiplication (GEMM) is a fundamental operation in most scientific, engineering, and data applications. There is an everlasting desire to make this operation run faster. Optimized numerical libraries like Intel® Math Kernel Library (Intel® MKL) typically offer parallel high-performing GEMM implementations to leverage the concurrent threads supported by modern multi-core architectures. This strategy works well when multiplying large matrices because all cores are used efficiently. When multiplying small matrices, however, individual GEMM calls may not optimally use all the cores. Developers wanting to improve utilization usually batch multiple independent small GEMM operations into a group and then spawn multiple threads for different GEMM instances within the group. While this is a classic example of an embarrassingly parallel approach, making it run optimally requires a significant programming effort that involves threads creation/termination, synchronization, and load balancing. That is, until now. 

Intel MKL 11.3 Beta (part of Intel® Parallel Studio XE 2016 Beta) includes a new flavor of GEMM feature called "Batch GEMM". This allows users to achieve the same objective described above with minimal programming effort. Users can specify multiple independent GEMM operations, which can be of different matrix sizes and different parameters, through a single call to the "Batch GEMM" API. At runtime, Intel MKL will intelligently execute all of the matrix multiplications so as to optimize overall performance. Here is an example that shows how "Batch GEMM" works:

Example

Let A0, A1 be two real double precision 4x4 matrices; Let B0, B1 be two real double precision 8x4 matrices. We'd like to perform these operations:

C0 = 1.0 * A0 * B0^T, and C1 = 1.0 * A1 * B1^T

where C0 and C1 are two real double precision 4x8 result matrices. 

Again, let X0, X1 be two real double precision 3x6 matrices; Let Y0, Y1 be another two real double precision 3x6 matrices. We'd like to perform these operations:

Z0 = 1.0 * X0 * Y0^T + 2.0 * Z0, and Z1 = 1.0 * X1 * Y1^T + 2.0 * Z1

where Z0 and Z1 are two real double precision 3x3 result matrices.

We could accomplish these multiplications using four individual calls to the standard DGEMM API. Instead, here we use a single "Batch GEMM" call to do the same, with potentially improved overall performance. We illustrate this using the "cblas_dgemm_batch" function in the example below.

#define    GRP_COUNT    2

MKL_INT    m[GRP_COUNT] = {4, 3};
MKL_INT    k[GRP_COUNT] = {4, 6};
MKL_INT    n[GRP_COUNT] = {8, 3};

MKL_INT    lda[GRP_COUNT] = {4, 6};
MKL_INT    ldb[GRP_COUNT] = {4, 6};
MKL_INT    ldc[GRP_COUNT] = {8, 3};

CBLAS_TRANSPOSE    transA[GRP_COUNT] = {CblasNoTrans, CblasNoTrans};
CBLAS_TRANSPOSE    transB[GRP_COUNT] = {CblasTrans, CblasTrans};

double    alpha[GRP_COUNT] = {1.0, 1.0};
double    beta[GRP_COUNT] = {0.0, 2.0};

MKL_INT    size_per_grp[GRP_COUNT] = {2, 2};

// Total number of multiplications: 4
double    *a_array[4], *b_array[4], *c_array[4];
a_array[0] = A0, b_array[0] = B0, c_array[0] = C0;
a_array[1] = A1, b_array[1] = B1, c_array[1] = C1;
a_array[2] = X0, b_array[2] = Y0, c_array[2] = Z0;
a_array[3] = X1, b_array[3] = Y1, c_array[3] = Z1;

// Call cblas_dgemm_batch
cblas_dgemm_batch (
        CblasRowMajor,
        transA,
        transB,
        m,
        n,
        k,
        alpha,
        a_array,
        lda,
        b_array,
        ldb,
        beta,
        c_array,
        ldc,
        GRP_COUNT,
        size_per_grp);



The "Batch GEMM" interface resembles the GEMM interface. It is simply a matter of passing arguments as arrays of pointers to matrices and parameters, instead of as matrices and the parameters themselves. We see that it is possible to batch the multiplications of different shapes and parameters by packaging them into groups. Each group consists of multiplications of the same matrices shape (same m, n, and k) and the same parameters. 

Performance

While this example does not show performance advantages of "Batch GEMM", when you have thousands of independent small matrix multiplications then the advantages of "Batch GEMM" become apparent. The chart below shows the performance of 11K small matrix multiplications with various sizes using "Batch GEMM" and the standard GEMM, respectively. The benchmark was run on a 28-core Intel Xeon processor (Haswell). The performance metric is Gflops, and higher bars mean higher performance or a faster solution.

The second chart shows the same benchmark running on a 61-core Intel Xeon Phi co-processor (KNC). Because "Batch GEMM" is able to exploit parallelism using many concurrent threads, its advantages are more evident on architectures with a larger core count.

Summary

This article introduces the new API for batch computation of matrix-matrix multiplications. It is an ideal solution when many small independent matrix multiplications need to be performed. "Batch GEMM" supports all precision types (S/D/C/Z). It has Fortran 77 and Fortran 95 APIs, and also CBLAS bindings. It is available in Intel MKL 11.3 Beta and later releases. Refer to the reference manual for additional documentation.  


Wrong Intel® Fortran compiler version displayed in Microsoft* Visual Studio 2012


Issue: Microsoft* Visual Studio 2012 is supported by Intel® Parallel Studio XE 2017. It is not supported by Intel® Parallel Studio XE 2018. The wrong Intel® Fortran compiler version is displayed in Microsoft* Visual Studio 2012 when both Intel® Parallel Studio XE 2017 and Intel® Parallel Studio XE 2018 are installed on the same system with Microsoft* Visual Studio 2012.

This may be observed when opening "Tools > Options > Intel Compilers and Tools > Visual Fortran > Compilers": the 'Selected compiler' may be shown as "Intel(R) Visual Fortran Compiler 18.0", which is not correct.

Once the compilation process is invoked, the correct compiler version is used, but the output window shows the wrong compiler name "Intel(R) Visual Fortran Compiler 18.0". For example, with both the 17.0 Update 4 and 18.0 compiler versions installed:

1>------ Rebuild All started: Project: Console8, Configuration: Debug Win32 ------
1>Deleting intermediate files and output files for project 'Console8', configuration 'Debug|Win32'.
1>Compiling with Intel(R) Visual Fortran Compiler 18.0.0.118 [IA-32]...
1>Console8.f90
1>Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on IA-32, Version 17.0.4.210 Build 20170411
1>Copyright (C) 1985-2017 Intel Corporation. All rights reserved.
1>Linking...

Environment: Both Intel(R) Parallel Studio XE 2017 Update 4 and Intel(R) Parallel Studio XE 2018 are installed, Microsoft* Visual Studio 2012 is installed

Root Cause: A root cause was identified and will be fixed in upcoming compiler versions.

Workaround:

The user should select "Intel(R) Visual Fortran Compiler 17.0" in the 'Selected compiler' option at "Tools > Options > Intel Compilers and Tools > Visual Fortran > Compilers". Then the correct name and compiler will be displayed as expected:

1>------ Rebuild All started: Project: Console8, Configuration: Debug Win32 ------
1>Deleting intermediate files and output files for project 'Console8', configuration 'Debug|Win32'.
1>Compiling with Intel(R) Visual Fortran Compiler 17.0.4.210 [IA-32]...
1>Console8.f90
1>Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on IA-32, Version 17.0.4.210 Build 20170411
1>Copyright (C) 1985-2017 Intel Corporation. All rights reserved.
1>Linking...

An update to the integration of Intel® Media SDK and FFmpeg


Introduction

Intel® GPUs contain fixed-function hardware to accelerate video encode, decode, and frame processing, which can now be used with a variety of interfaces.  Media SDK and Media Server Studio provide great performance with an API designed around delivering full hardware capabilities that is portable between OSes.  However, there is a big limitation: the Media SDK API only processes video elementary streams.  FFmpeg is one of the most popular media frameworks.  It is open source and easily expandable.  Because of this, it has a very wide range of functionality beyond just codecs: muxing and demuxing (splitting), audio, network streaming, and more.  It is straightforward to extend FFmpeg with wrappers for Intel® HW acceleration.  Various forms of these wrappers have existed for many years, and they provide important ease-of-use benefits compared to writing encode/decode code directly with the Media SDK API.  However, the tradeoff for this ease of use is that performance is still left on the table.  To get the best of both worlds, full performance and access to the full range of capabilities in FFmpeg, a hybrid approach is recommended.

Intel® provides several ways for you to use hardware acceleration in FFmpeg.  

  • FFmpeg wrappers for lower level APIs "underneath" Media SDK in the stack: libva (Linux) and DXVA (Windows)
  • FFmpeg supports the default Media SDK plugin; this article describes the transcoding performance of the plugin, and the detailed installation and validation guide is here.
  • The Intel® FFmpeg plug-in project is a fork of FFmpeg which attempts to explore additional options to improve performance for Intel hardware within the FFmpeg framework.
  • A 2012 article by Petter Larsson began exploring how to use the FFmpeg libav* APIs and Media SDK APIs together in the same application.

This article provides important updates to the 2012 article.  It describes the process to use the FFmpeg libraries on Ubuntu 16.04.  The example code will be based on our tutorial code so the user will have a better view on how the FFmpeg API is integrated with the media pipeline.  The example code will also update the deprecated FFmpeg API so it is synced with the latest FFmpeg releases.

Build FFmpeg libraries and run the tutorial code

Requirements

  • Hardware: An Intel® hardware platform with the Intel Quick Sync Video capability. It is recommended to use the latest hardware version, since it has better support. For Linux, a computer with a 5th or 6th generation Core processor; for Windows®, 5th generation or later.
  • Linux OS: The sample code was tested on Ubuntu 16.04.3 LTS, but the user can try other Linux distributions, such as CentOS.
  • Intel® Media Server Studio: For the hardware you have, go to the MSS documentation page to check the release notes and identify the right MSS version. For the latest release, click the Linux link under "Essential/Community Edition"; for previous releases, click the link under "Historical release notes and blogs".
  • FFmpeg: This should be the latest release from the FFmpeg website; for this article, V3.4 is used.
  • Video File: Any mp4 video container with H.264 video content. For testing purposes, we use BigBuckBunny320x180.mp4.

Project File Structure

The project to run the tutorial has the following file structure:

  • simple_decode_ffmpeg (src/simple_decode_ffmpeg.cpp, Makefile): simple_decode_ffmpeg.cpp is the Media SDK application that creates a simple decode pipeline and calls the functions defined in ffmpeg_utils.h to hook up the demux APIs of the FFmpeg library.
  • simple_encode_ffmpeg (src/simple_encode_ffmpeg.cpp, Makefile): simple_encode_ffmpeg.cpp is the Media SDK application that creates a simple encode pipeline and calls the FFmpeg adapter functions defined in ffmpeg_utils.h to hook up the mux APIs of the FFmpeg library.
  • common (ffmpeg_utils.h, ffmpeg_utils.cpp): defines and implements the API to initialize, execute, and close the mux and demux functions of the FFmpeg library.
  • $(HOME)/ffmpeg_build (include, lib): the built FFmpeg libraries; the libraries involved are libavformat.so, libavcodec.so, and libavutil.so.

 

How to build and execute the workload

  1. Download the Media Server Studio and validate the successful installation
    • Based on the hardware platform, identify the right Media Server Studio version.
    • Go to Media Server Studio landing page to download the release package.
    • Follow this instruction to install the Media Server Studio on Ubuntu 16.04; follow this instruction if you install on CentOS 7.3 (the instructions can also be found in the release package).
    • Follow the above instructions to validate the installation before the next step.
  2. Download the FFmpeg source code package and build the libraries.
    • Follow the instructions in the generic compilation guide of FFmpeg; in the guide, select Linux and the distribution you are working on, for example, the Ubuntu build instructions. This project requires the shared FFmpeg library; refer to the following instructions to build the final FFmpeg library.
    • After building the requested FFmpeg modules, build the shared library. Several arguments should be appended to the general instructions: when configuring the final build, append the following arguments to the original "./configure..." command: "--enable-shared --enable-pic --extra-cflags=-fPIC", for example, 
      PATH="$HOME/bin:$PATH" PKG_CONFIG_PATH="$HOME/ffmpeg_build/lib/pkgconfig" ./configure \
        --prefix="$HOME/ffmpeg_build" \
        --pkg-config-flags="--static" \
        --extra-cflags="-I$HOME/ffmpeg_build/include" \
        --extra-ldflags="-L$HOME/ffmpeg_build/lib" \
        --extra-libs=-lpthread \
        --bindir="$HOME/bin" \
        --enable-gpl \
        --enable-libass \
        --enable-libfdk-aac \
        --enable-libfreetype \
        --enable-libmp3lame \
        --enable-libopus \
        --enable-libtheora \
        --enable-libvorbis \
        --enable-libvpx \
        --enable-libx264 \
        --enable-libx265 \
        --enable-nonfree \
        --enable-shared \
        --enable-pic \
        --extra-cflags=-fPIC

       

    • Note: the general instructions download the latest (snapshot) package with the following command: "wget http://ffmpeg.org/releases/ffmpeg-snapshot.tar.bz2". There might be build/configure mistakes with this package since it is not an official release; please download your preferred release package if the build fails. For this tutorial, version 3.4 is used.
    • Set the path LD_LIBRARY_PATH to point to $HOME/ffmpeg_build/lib. It is recommended to set it as a system environment variable, for example, by adding it to /etc/environment; or the user can use the following command as a temporary way to set it in the current environment:
      # export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$HOME/ffmpeg_build/lib"
  3. Download the sample code attached to this article and uncompress it in a local directory.
  4. Build the source code
    • Check the Makefile in the directories "simple_decode_ffmpeg" and "simple_encode_ffmpeg"; notice "FFMPEG_BUILD=$(HOME)/ffmpeg_build". The directory "$(HOME)/ffmpeg_build" is the default build directory if you followed the general FFmpeg compilation instructions; if you built the library in a different directory, you have to change the $(FFMPEG_BUILD) variable to that directory.
    • At the root directory of the project, run "make"; the binaries will be built in the "~/_build" directory.
    • The user can disable the audio code and check video only as follows:
      Remove "-DDECODE_AUDIO" from the Makefile in the simple_decode_ffmpeg project
      Remove "-DENCODE_AUDIO" from the Makefile in the simple_encode_ffmpeg project
    • The user can also turn off the debug build by changing the Makefile to switch the definition of "CFLAG".
  5. Run the binary with the video workload
    • Download the BigBuckBunny320x180.mp4 and save it locally.
    • To decode the video file with the following command:
      # _build/simple_decode_ffmpeg ~/Downloads/BigBuckBunny_320x180.mp4 out.yuv

      The command generates 2 output files: out.yuv--the raw video stream; audio.dat--the raw audio PCM 32bit stream.

    • To encode the result from the decoding with the following command:
      # _build/simple_encode_ffmpeg -g 320x180 -b 20000 -f 24/1 out.yuv out.mp4

      The command reads the raw audio with the name "audio.dat" by default.

 

Known Issue

  • When running the sample to validate the MSS installation, there is a failure when the patched kernel was not applied to the platform. Run the following command to check (taking the patched kernel 4.4 as an example):
    uname -r
    4.4.0

    In the installation instructions, the kernel 4.4 was patched; this provides the driver update needed to access the media fixed functions. If the command doesn't show the expected kernel version, the user has to switch the kernel at boot time in the grub option menu; to show the grub menu, refer to this page.

  • The following table shows all the video clips that were tested successfully; for other codecs and containers, please feel free to extend the current code.
    Tested sample vs. container with codecs:
      • .mp4: sample_decode_ffmpeg (h.264/hevc/MPEG2, aac); sample_encode_ffmpeg (h.264, aac)
      • .mkv: sample_decode_ffmpeg (h.264/hevc/MPEG2, ac3); sample_encode_ffmpeg (h.264, ac3)
      • .ts: sample_decode_ffmpeg (h264/hevc, ac3); sample_encode_ffmpeg (MPEG2, aac)
      • .mpg, .mpeg: sample_decode_ffmpeg (MPEG2, ac3); sample_encode_ffmpeg (MPEG2, aac)
  • The audio codecs use FFmpeg's library. Among the audio codecs, only AAC is well tested; for the others, Vorbis and AC3 have encoding errors, so the default audio for the containers ".mkv", ".mpeg", ".mpg", and ".ts" is forced to another audio codec.
  • To validate the successful installation of the Media Server Studio, after installing it, download the Media SDK sample from this page and run the following command:
    ./sample_multi_transcode -i::h264 test_stream.264 -o::h264 out.264
    Multi Transcoding Sample Version 8.0.24.698
    
    libva info: VA-API version 0.99.0
    libva info: va_getDriverName() returns 0
    libva info: User requested driver 'iHD'
    libva info: Trying to open /opt/intel/mediasdk/lib64/iHD_drv_video.so
    libva info: Found init function __vaDriverInit_0_32
    libva info: va_openDriver() returns 0
    Pipeline surfaces number (DecPool): 20
    MFX HARDWARE Session 0 API ver 1.23 parameters:
    Input  video: AVC
    Output video: AVC
    
    Session 0 was NOT joined with other sessions
    
    Transcoding started
    ..
    Transcoding finished
    
    Common transcoding time is 0.094794 sec
    -------------------------------------------------------------------------------
    *** session 0 PASSED (MFX_ERR_NONE) 0.094654 sec, 101 frames
    -i::h264 test_stream.264 -o::h264 out.264
    
    -------------------------------------------------------------------------------
    
    The test PASSED
    

The design of the mux/demux functions with the FFmpeg library APIs

The sample code is modified based on our original tutorial code, simple_decode and simple_encode. The calls for the FFmpeg integration are added to the original source code; the modified areas are wrapped by the following comment lines:

// =========== ffmpeg splitter integration ============
......

// =========== ffmpeg splitter integration end ============

Demux functions

The structure demuxControl keeps the control parameters of the demux process. The function openDemuxControl() initializes and configures the demuxControl structure, which is then used for the demux and decoding process. During decoding, the function ffmpegReadFrame() reads a video frame after demuxing; finally, the function closeDemuxControl() releases the system resources.

In the code "DECODE_AUDIO" turns on the audio decoding and demux the audio stream and use the FFMpeg audio decoder to uncompress the audio stream into the raw audio file "Audio.dat".

Mux functions

The structure muxControl keeps the control parameters of the mux process. The function openMuxControl() initializes and configures the muxControl structure, which is then used for the encoding and mux process. During encoding, the function ffmpegWriteFrame() writes the encoded stream into the output container via the FFmpeg muxer; finally, the function closeMuxControl() releases the system resources.

In the code "ENCODE_AUDIO" turns on the audio encoding and mux/compress the audio raw data from "Audio.dat" to the video container.

Reference

FFmpeg: examples

FFmpeg: build with shared libraries

Luca Barbato's blog about the bitstream filtering

Luca Barbato's blog about the new AVCodec API

 

Build an Autonomous Mobile Robot with the Intel® RealSense™ Camera, ROS*, and SAWR


Overview

The Simple Autonomous Wheeled Robot (SAWR) project defines the hardware and software required for a basic "example" robot capable of autonomous navigation using the Robot Operating System* (ROS*) and an Intel® RealSense™ camera. In this article, we give an overview of the SAWR project and also offer some tips for building your own robot using the Intel RealSense camera and SAWR projects.

Mobile Robots – What They Need

Mobile robots require the following capabilities:

  • Sense a potentially dynamic environment. The environment surrounding robots is not static. Obstacles, such as furniture, humans, or pets, are sometimes moving, and can appear or disappear.
  • Determine current location. For example, imagine that you are driving a car. You need to answer "Where am I?" on the map, or at least know your position relative to a destination.
  • Navigate from one location to another. For example, to drive your car to your destination, you need both driver (deciding on how much power to apply and how to steer) and navigator (keeping track of the map and planning a route to the destination) skills.
  • Interact with humans as needed. Robots in human environments need to be able to interact appropriately with humans. This may mean the ability to recognize an object as a human, follow him or her, and respond to voice or gesture commands.

The SAWR project, based on ROS and the Intel RealSense camera, covers the first three of these requirements. It can also serve as a platform to explore how to satisfy the last requirement: human interaction.

A Typical Robot Software Stack

To fulfill the above requirements, a typical robot software stack consists of many modules (see Figure 1). At the bottom of the stack, sensor hardware drivers, including those for the Intel RealSense camera in the case of the SAWR, deliver environmental information to a set of sensing modules. These modules recognize environmental information as well as human interaction. Several sources of information are fused to create various models: a world model, an estimate of the robot state (including position in the world), and command inputs (for example, voice recognition).

The Plan module decides how the robot will act in order to achieve a goal. For mobile robotics, the main purpose is navigating from one place to another, for which it is necessary to calculate obstacle-free paths given the current world model and state.

Based on the calculated plan, the Act module manages the actual movement of the robot. Typically, motor control is the main function of this segment, but other actions are possible, such as speech output. When carrying out an action, a robot may also be continuously updating its world model and replanning. For example, if an unexpected obstacle arises, the robot may have to update its model of the world and also replan its path. The robot may even make mistakes (for example, its estimate of its position in the world might be incorrect), in which case it has to figure out how to recover.

Autonomous navigation requires a lot of computation to do the above tasks. Some tasks can be offloaded to the cloud, but due to connectivity and latency issues this is frequently not an option. The SAWR robot can do autonomous navigation using only onboard computational resources, but the cloud can still be useful for adding other capabilities, such as voice control (for example, using Amazon Voice Services*).


Figure 1. A typical robot software stack.

Navigation Capabilities - SLAM

Simultaneous localization and mapping (SLAM) is one of the most vital capabilities for autonomous mobile robots. In a typical implementation, the robot navigates (plans paths) through a space using an occupancy map. This map needs to be dynamically updated as the environment changes. In lower-end systems, this map is typically 2D, but more advanced systems might use a 3D representation such as a point cloud. This map is part of the robot’s world representation. The “localization” part of SLAM means that in addition to maintaining the map, the robot needs to estimate where it is located in the map. Normally this estimation uses a probabilistic method; rather than a single estimated location, the robot maintains a probability distribution and the most probable location is used for planning. This allows the robot to recover from errors and reason about uncertainty. For example, if the estimate for the current location is too uncertain, the robot could choose to acquire more information from the environment (for example, by rotating to scan for landmarks) to refine its estimate.
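
As a toy illustration of the idea (not SAWR code), the snippet below keeps a few weighted pose hypotheses, plans with the most probable one, and uses the weighted spread of the cloud as a crude uncertainty signal. The particle values and the threshold are invented for the example:

#include <algorithm>
#include <cmath>
#include <vector>

// Each hypothesis is a pose with a probability weight.
struct Pose { double x, y, theta, weight; };

int main()
{
    std::vector<Pose> belief = {
        {1.0, 2.0, 0.10, 0.55},   // made-up particle cloud
        {1.1, 2.1, 0.12, 0.30},
        {4.0, 0.5, 3.00, 0.15},
    };

    // The most probable hypothesis is the one used for path planning.
    const Pose &best = *std::max_element(belief.begin(), belief.end(),
        [](const Pose &a, const Pose &b) { return a.weight < b.weight; });

    // Crude uncertainty measure: weighted spread around the best pose.
    // If it is too large, the robot should gather more sensor data
    // (for example, by rotating to scan for landmarks).
    double spread = 0.0;
    for (const Pose &p : belief)
        spread += p.weight * std::hypot(p.x - best.x, p.y - best.y);

    bool need_more_sensing = spread > 0.5;      // threshold is an assumption
    return need_more_sensing ? 1 : 0;
}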

In the default SAWR software stack, the open source slam_gmapping package is used to create and manage the map, although there are several other options available, such as cartographer and rgbd-slam. This module is continually integrating new sensor data into the map and clearing out old data if it is proven incorrect. Another module, amcl, is used to estimate the current location by matching sensor data against the map. These modules run in parallel to constantly update the map and the estimate of the robot’s position. Figure 2 shows a typical indoor environment and a 2D map created by this process.


Figure 2. Simultaneous localization and mapping (SLAM) with 2D mapping.

Hardware for Robotics

Figure 3 shows the hardware architecture of the SAWR project. Like many robotics systems, the architecture consists of a master and slave system. The master takes care of high-level processing (such as SLAM and planning), and the slave takes care of real-time processing (such as motor speed control). This is similar to how the brain and spinal reflexes work together in animals. Several different options can be used for this model, but typically a Linux* system is used for the master and one or more microcontroller units (MCUs) are used for the slave.


Figure 3. Robot architecture.

In this article, Intel RealSense cameras are used as the primary environmental sensor. These cameras provide depth data and can be used as input to a SLAM system. The Intel® RealSense™ camera R200 or Intel® RealSense™ camera ZR300 are used in the current SAWR project. The Intel® RealSense™ camera D400 series, shown in Figure 4, will soon become a common depth camera of choice; since it provides similar data with improved range and accuracy, and uses the same driver, an upgrade is straightforward. As for drivers, the librealsense and realsense_ros_camera drivers are available on GitHub*. You can use any Intel RealSense camera with them.


Figure 4. Intel® RealSense™ Depth Camera D400 Series.

For the master computer, you can choose from various hardware, including Intel® NUC with Intel® Core™ i5 and Intel® Core™ i7 processors (see Figure 5). This choice provides maximum performance for robotics development. You can also use OEM boards for robotics, such as one of the Aaeon UP* boards, for rapid prototype-to-production for robotics development. Even the diminutive Aaeon UP Core* has enough performance to do SLAM. The main requirement is that the board runs Linux. The SAWR software stack uses ROS, which runs best under Ubuntu*, although it is possible to install it under other distributions, such as Debian* or Yocto*.


Figure 5. Intel® NUC.

SAWR Basic Mobile Robot

The following is a spec overview of the SAWR basic mobile robot, shown in Figure 6, which is meant to be an inexpensive reference design that is easy to reproduce (the GitHub site includes the files to laser-cut your own frame). The SAWR software stack can be easily adapted to other robot frames. For this design, the slave computers are actually embedded inside the Dynamixel servos. The MCUs in these smart motors take care of low-level issues like position sensing and speed control, making the rest of the robot much simpler.

Computer: Aaeon UP board

Camera: Intel RealSense camera

Actuation: Two Dynamixel MX-12W* smart servos with magnetic encoders

Software: Xubuntu* 16.04 and ROS Kinetic*

Frame: Laser-cut acrylic or POM, Pololu sphere casters, O-ring tires and belt transmission

Other: DFRobot 25W/5V power regulator

Extras: Jabra Speak* 510+ USB speakerphone (for voice I/O, if desired)

Instructions and software: https://github.com/01org/sawr


Figure 6. SAWR basic mobile robot.

One of the distinctive parts of the SAWR project is that both the hardware and the software have been developed in an open source style. The software is based on modifying and simplifying the Open Source Robotics Foundation Turtlebot* stack, but adds a custom motor driver using the Dynamixel Linux* SDK. For the hardware, the frame is parametrically modeled using OpenSCAD*, and then converted to laser-cut files using Inkscape*. You can download all the data from GitHub, and then make your own frame using a laser cutter (or a laser-cutting service). Most of the other parts are available from a hardware store. Detailed instructions, assembly, and setup plans are available online.

Using an OEM Board for Robotics

When you choose an OEM board for robotics, such as an UP board for SAWR or any other robotics system, using active cooling to get higher performance is strongly recommended. Robotics middleware usually consumes a high level of CPU resources, and a lack of CPU resources sometimes translates into low quality or low speed of autonomous movement. With active cooling, you can maintain the CPU's highest speed indefinitely. In particular, with active cooling the UP board can enter turbo mode and run at a much higher clock rate than without it.

You may be concerned about the power required for active cooling and higher clock rates. However, power consumption is not usually a limiting factor in robotics, because the motors are usually the primary power load. In fact, instead of the basic UP board, you can select the UP Squared*, which has much better performance.

Another issue is memory. The absolute minimum is 2 GB, but 4 GB is highly recommended. The SLAM system uses a lot of memory to maintain the world state and position estimate. Remember that the OS needs memory too, and Ubuntu tends to use about 500 MB doing nothing. So a 4 GB system has 7x the available space for applications compared to a 1 GB system, not just 4x (3.5 GB free versus 0.5 GB free after the OS takes its share).

ROS Overview

Despite its name, ROS is not an OS, but a middleware software stack that can run on top of various operating systems, although it is primarily used with Ubuntu. ROS supports a distributed, concurrent processing model based on a graph of communicating nodes. Thanks to this basic architecture, you can not only easily network together multiple processing boards on the same robot if you need to, but you can also physically locate boards away from the actual robot by using Wi-Fi* (with some loss of performance and reliability, however). From a knowledge base perspective, ROS has a large community with many existing open source nodes supporting a wide range of sensors, actuators, and algorithms. That and its excellent documentation are good reasons to choose ROS. From a development and debugging perspective, various powerful and attractive visualization tools and simulators are also available and useful.

Basic ROS Concepts

This section covers the primary characteristics of the ROS architecture. To learn more, refer to the ROS documentation and tutorials.

  • Messages and topics (see Figure 7). ROS uses a publish-and-subscribe system for sending and receiving data on uniquely named topics. Each topic can have multiple publishers and subscribers. Messages are typed and can carry multiple elements. Message delivery is asynchronous, and it is the recommended mechanism for most interprocess communication in ROS (a minimal publisher/subscriber node is sketched after this list).


    Figure 7. Messages and topics.

  • Service calls (see Figure 8). Service calls use synchronous remote procedure call semantics, also known as “request/response.” When using service calls, the caller blocks until a response is received. Because this behavior can lead to problems such as deadlocks and hung processes, consider whether you really need to build your communication with service calls. They are primarily used for updating parameters, in cases where message buffering would create too much overhead (for example, updating maps), or where synchronization between activities is actually needed.


    Figure 8.  Service calls.

  • Actions (see Figure 9). Actions are used to define long-running tasks with goals, the possibility of failure, and where periodic status reports are useful. In the SAWR software stack actions are mainly used for setting the destination goal and monitoring the progress of navigation tasks. Actions generally support asynchronous goal-directed behavior control based on a standard set of topics. In the case of SAWR, you can trigger a navigation action by using Rviz (the visualizer) and the 2D Nav Goal button.


    Figure 9. Actions.

  • Parameters (see Figure 10). Parameters are used to set various values for each node. A parameter server provides typed constant data at startup, and the latest version of ROS also supports dynamic parameter update after node launch. Parameters can be specified in various ways, including through the command line, parameter files, or launch file parameters.


    Figure 10. Parameters.

  • Other ROS concepts. There are several other important concepts relevant to the ROS architecture.
    • Packages: Collections of files used to implement or specify a service or node in ROS, built together using the catkin build system (typically).
    • Universal Robot Description Format (URDF): XML files describing joints and transformations between joints in a 3D model of the robot.
    • Launch files: XML files describing a set of nodes and parameters for a ROS graph.
    • YAML (originally “Yet Another Markup Language”): Used for parameter specification on the command line and in files.
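
The following minimal roscpp node sketches the message, topic, and parameter mechanisms described above. The node, topic, and parameter names are arbitrary examples, not part of the SAWR stack:

#include <ros/ros.h>
#include <std_msgs/String.h>

// Callback invoked asynchronously for each message on the subscribed topic.
void onMessage(const std_msgs::String::ConstPtr &msg)
{
    ROS_INFO("heard: %s", msg->data.c_str());
}

int main(int argc, char **argv)
{
    ros::init(argc, argv, "demo_node");
    ros::NodeHandle nh;
    ros::NodeHandle pnh("~");                   // private namespace for parameters

    double rate_hz;
    pnh.param("rate", rate_hz, 1.0);            // parameter with a default value

    ros::Publisher pub = nh.advertise<std_msgs::String>("chatter_out", 10);
    ros::Subscriber sub = nh.subscribe("chatter_in", 10, onMessage);

    ros::Rate rate(rate_hz);
    while (ros::ok()) {
        std_msgs::String msg;
        msg.data = "hello";
        pub.publish(msg);                       // asynchronous, buffered delivery
        ros::spinOnce();                        // dispatch queued callbacks
        rate.sleep();
    }
    return 0;
}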

ROS Tools

A lot of powerful development and debug tools are available for ROS. The following tools are typically used for autonomous mobile robots.

  • Rviz (see Figure 11). Visualize various forms of dynamic 3D data in context: transforms, maps, point clouds, images, goal positions, and so on.


    Figure 11. Rviz.

  • Gazebo. Robot simulator, including collisions, inertia, perceptual errors, and so on.
  • Rqt. Visualize graphs of nodes and topics.
  • Command-line tools. Listen to and publish on topics, make service calls, initiate actions. Can filter and monitor error messages.
  • Catkin. Build system and package management.

ROS Common Modules for Autonomous Movement

The following modules are commonly used for autonomous mobile robots, and SAWR adopts them as well.

  • tf (tf2) (see Figure 12). The coordinate transform library, one of the most important packages in ROS. Thanks to tf, you can manage all coordinate values, including the position of the robot or the relations between the camera and the wheels. To handle the various categories of coordinates, tf adopts several distinctive concepts, such as frames and trees.


    Figure 12. tf frame example.

  • slam_gmapping. ROS wrapper for OpenSlam's Gmapping, one of the most famous SLAM algorithms. While it is still popular, several alternatives now exist for this function.
  • move_base. Core module for autonomous navigation. This package provides various functions, including planning a route, maintaining cost maps, and issuing speed and direction commands for motors (a sketch of sending a navigation goal to move_base appears after this list).
  • robot_state_publisher. Publishes the 3D poses of the robot links, which are important for a manipulator or humanoid. In the case of SAWR, the most important data maintained by this module is the position and orientation of the robot and the location of the camera relative to the robot’s position.
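
As an example of the actions interface in practice, the sketch below sends move_base a navigation goal programmatically, the equivalent of pressing Rviz's 2D Nav Goal button. The goal coordinates in the "map" frame are arbitrary:

#include <actionlib/client/simple_action_client.h>
#include <move_base_msgs/MoveBaseAction.h>
#include <ros/ros.h>

int main(int argc, char **argv)
{
    ros::init(argc, argv, "nav_goal_demo");

    // Connect to the move_base action server (spins in its own thread).
    actionlib::SimpleActionClient<move_base_msgs::MoveBaseAction> ac("move_base", true);
    ac.waitForServer();

    move_base_msgs::MoveBaseGoal goal;
    goal.target_pose.header.frame_id = "map";   // goal expressed in the map frame
    goal.target_pose.header.stamp = ros::Time::now();
    goal.target_pose.pose.position.x = 1.0;     // arbitrary example goal
    goal.target_pose.pose.position.y = 0.5;
    goal.target_pose.pose.orientation.w = 1.0;  // no rotation

    ac.sendGoal(goal);                          // long-running task with feedback
    ac.waitForResult();
    return ac.getState() == actionlib::SimpleClientGoalState::SUCCEEDED ? 0 : 1;
}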

Tips for Building a Custom Robot using the SAWR Stack

SAWR consists of the following subdirectories, which you can use as-is if you want to utilize the complete SAWR software and hardware package (see Figure 13). You can also use them as a starting point for your original robot with the Intel RealSense camera. Also below are tips for customizing the SAWR stack for use with other robot hardware.

  • sawr_master: Master package, launch scripts.
    • Modify this if you change or replace other ROS modules.
  • sawr_description: Runtime physical description (URDF files).
    • Modify the urdf and xacro files according to your robot’s dimensions (check with the tf tree/frames).
  • sawr_base: Motor controller and hardware interfacing.
    • Prepare your own motor controller and odometry libraries.
  • sawr_scan: Camera configuration.
  • sawr_mapping: SLAM configuration.
    • You can begin as-is if you use the same Intel RealSense camera configuration with SAWR.
  • sawr_navigation: Move-base configuration.
    • Modify and tune the parameters of the global/local costmaps and move_base. This is the most difficult part of tuning for your own hardware.


Figure 13. SAWR ROS node graph viewed by rqt_graph.

Conclusion

Autonomous mobile robotics is an emerging area, but the technology for mobile robotics is already relatively mature. ROS is a key framework for robot software development that provides a wide range of modules covering many areas of robotics; the latest version, Lunar, is the 12th generation. Robotics involves all aspects of computer science and engineering, including artificial intelligence, computer vision, machine learning, speech understanding, the Internet of Things, networking, and real-time control, and the SAWR project is a good starting point for developing ROS-based robotics.

About the Author

Sakemoto is an application engineer in the Intel® Software and Services Group. He is responsible for software enabling and also works with application vendors in the area of embedded systems and robotics. Prior to his current job, he was a software engineer for various mobile devices including embedded Linux and Windows*.
