Unit Testing in C

Like any good software developer, I’m sure you’re beginning to wonder how to write sustainable C code. The way to do that in any language is with unit tests. The idea behind unit tests is to have each “unit” of your code validated by assertion-based tests. A “unit” is typically a single function, and the goal of unit testing that function is to validate that it behaves as you expect for a range of inputs.

While there are many unit testing frameworks for C, for the purposes of this article I’ll be using minunit. Minunit is an incredibly simple framework that makes learning how to unit test easy. In fact, the source code is a whole 3 lines (seen below).

To install minunit, copy the below code into a “minunit.h” file. It doesn’t matter where it’s located in your file system, as long as you know how to include it in your unit test files.

#define mu_assert(message, test) do { if (!(test)) return message; } while (0)
#define mu_run_test(test) do { char *message = test(); tests_run++; \
                               if (message) return message; } while (0)
extern int tests_run;

Minunit provides two critical macros, mu_assert and mu_run_test. mu_assert takes a message and an expression that evaluates to a boolean value (e.g. my_int == 5), returning the message if the expression is false. mu_run_test handles calling the functions containing your mu_assert calls and counting the tests run.

A simple test suite using minunit looks something like this:

#include <stdio.h>
#include "minunit.h"

int tests_run = 0;

// Here's a sample function we'll test
int add(int x, int y) {
    return x + y;
}

// Here's another sample function,
// except this one has an error (* instead of +)
int bad_add(int x, int y) {
    return x * y;
}

// Testing the add() function
static char * test_add() {
    mu_assert("error, adding 2 and 3 != 5", add(2, 3) == 5);
    return 0;
}

// Testing the bad_add() function
static char * test_bad_add() {
    mu_assert("error, adding 2 and 3 != 5", bad_add(2, 3) == 5);
    return 0;
}

// Here is where we call all of the unit test functions
static char * all_tests() {
    mu_run_test(test_add);
    mu_run_test(test_bad_add);
    return 0;
}

int main(int argc, char **argv) {
    char *result = all_tests();
    // result will be the failed test message if a test fails
    if (result != 0) {
        printf("%s\n", result);
    }
    else {
        printf("ALL TESTS PASSED\n");
    }
    printf("Tests run: %d\n", tests_run);

    return result != 0;
}

test_add and test_bad_add are the unit testing functions in this example. Each should contain one or more calls to mu_assert that test something about the behavior of add and bad_add, respectively. A good unit test should exercise a representative sample of the function’s possible inputs as well as its expected outputs.

As a note, the above code does not do this well. For example, if I had chosen to have the assertions test adding 2 and 2, both unit tests would have passed, because bad_add computes 2 * 2 = 4, which is also the expected result.

In minunit, there is no automation regarding running the test suite. As such, you have to call mu_run_test on each unit test function, which is aggregated in the all_tests function in the above sample.

While the functionality of minunit is limited, I think it makes a good place to start with unit testing. When you feel comfortable with the idea of unit testing, I’d recommend going and finding another, more feature-rich alternative. You can find an excellent discussion regarding the available choices on Stack Overflow.


Debugging Memory Usage with Valgrind for Fun and Profit

Everyone is familiar with ‘malloc()’ and ‘free()’, and with their reputation for being notoriously dangerous when misused. Memory leaks, and other allocation-related bugs, are some of the most common and pernicious issues affecting software. With the advent of more sophisticated compilers, as well as the burgeoning use of interpreted languages, memory leaks are becoming a less commonly encountered problem, but they are still something anyone working in C will probably face. Since C allows the programmer to do more or less whatever they please with their pointers, it’s easy to mismanage a few bytes of memory in a program of non-trivial size; the language enforces no semantics to ensure any degree of memory safety.

Memory management being so common an issue, a set of tools has naturally developed to help us deal with it. One of these tools, Valgrind, can be incredibly helpful in the quest for perfect memory management. If you don’t have a working Valgrind install, one can be obtained as follows. As of the writing of this article the current version is 3.11.0.

wget http://www.valgrind.org/downloads/valgrind-3.11.0.tar.bz2
tar -jxvf valgrind-3.11.0.tar.bz2
# once in the untarred directory:
./configure
make && make install

Now that we have Valgrind we need some code that mismanages memory. One of Valgrind’s abilities is detecting memory leaks, so we need some code that leaks.

#include <stdlib.h>

int main() {
    void *unfreed;
    int idx;
    for (idx = 0; idx < 5; idx++) {
        unfreed = malloc(100);   /* each pointer is overwritten and never freed */
    }
    return 0;
}

While Valgrind works with literally any binary (try it out on some of the coreutils), in practice it’s helpful to have debugging symbols, so it’s worth compiling the program under test with -g. Once you have a binary, it’s time to break out Valgrind. Since Valgrind supports a number of tools, your invocation has to specify the subset you wish to use. For this example you want to check for leaks, so you will have to enable the leak check tool. This should look something like

valgrind --tool=memcheck --leak-check=full ./program_name

Running this will print information about the program’s memory (mis)usage that should look something like the listing below.

...
==24810== Command: ./program_name
==24810== 
==24810== 
==24810== HEAP SUMMARY:
==24810==     in use at exit: 500 bytes in 5 blocks
==24810==   total heap usage: 5 allocs, 0 frees, 500 bytes allocated
==24810== 
==24810== 500 bytes in 5 blocks are definitely lost in loss record 1 of 1
==24810==    at 0x4A05E46: malloc (vg_replace_malloc.c:195)
==24810==    by 0x400525: main (test.c:7)
==24810== 
==24810== LEAK SUMMARY:
==24810==    definitely lost: 500 bytes in 5 blocks
==24810==    indirectly lost: 0 bytes in 0 blocks
==24810==      possibly lost: 0 bytes in 0 blocks
==24810==    still reachable: 0 bytes in 0 blocks
==24810==         suppressed: 0 bytes in 0 blocks
...

Valgrind is kind enough to show us that the allocation on line 7 results in the program leaking 500 bytes across 5 different allocations. Knowing how much, and where, your program is leaking is a big step up from the total lack of information you typically have. This gives me an excellent opportunity to quote the legendary Billy Mays: “but wait, there’s more!” Valgrind can also tell you when you use invalid memory. Time for another broken program.

#include <stdlib.h>

int main() {
    char *x = malloc(10);
    x[10] = 'a';   /* writes one byte past the end of the block */
    return 0;
}

The mistake here is pretty obvious, but it’s also a startlingly easy class of error to make. Write enough MATLAB and all your indices go to hell. Fortunately, Valgrind has tools to detect invalid heap access. If we invoke Valgrind, we can see that it detects the invalid memory access.

valgrind --tool=memcheck --leak-check=yes ./program_name
...
==9814==  Invalid write of size 1
==9814==    at 0x804841E: main (program_name.c:6)
==9814==  Address 0x1BA3607A is 0 bytes after a block of size 10 alloc'd
==9814==    at 0x1B900DD0: malloc (vg_replace_malloc.c:131)
==9814==    by 0x804840F: main (program_name.c:5)
...

If you order today for the low low price of free, memcheck will also let you know when you’re using uninitialized memory. For example, the statement

int x;
if (x==0) {...}

will generate a warning along the lines of

==17943== Conditional jump or move depends on uninitialised value(s)
==17943==    at 0x804840A: main (program_name.c:6)

In fact, Valgrind can trace the use of uninitialized variables throughout the program. Also worth mentioning is Valgrind’s ability to detect double frees.

As powerful as memcheck is, Valgrind has additional capabilities, including thread error detection, call graph generation, and cache and branch prediction profiling. See the docs for more information. Also, for those interested in the LLVM ecosystem, AddressSanitizer is an interesting tool for memory analysis using clang.

gprof, because pants are for squares, I want to go fast

Sometimes, when you want to go REALLY fast, just using a compiled language like C won’t be enough. This situation, as strange as it sounds, is unfortunately common. Fortunately, St. IGNUcius provides. Enter gprof, a tool for understanding the “execution profile” of a program written in C, Pascal, or Fortran 77. For those of you without a Fortran 77 program handy, IBM has a nice (read: silly) piece of code in their gprof tutorial, listed below.


#include <stdio.h>
#include <stdlib.h>   /* for atoi() and exit() */

int a(void) {
     int i=0,g=0;
     while(i++<100000)
     {
          g+=i;
     }
     return g;
}
int b(void) {
     int i=0,g=0;
     while(i++<400000)
     {
          g+=i;
     }
     return g;
}

int main(int argc, char** argv)
{
     int iterations;

     if(argc != 2)
     {
          printf("Usage %s <No of Iterations>\n", argv[0]);
          exit(-1);
     }
     else
          iterations = atoi(argv[1]);

     printf("No of iterations = %d\n", iterations);

     while(iterations--)
     {
          a();
          b();
     }
}

Looking at it, the behavior is fairly obvious. Given a number of iterations n, the program calls functions a and b n times each. Both functions perform the same task, but b does four times as many loop iterations as a.

Intuitively, the program should spend exactly four times as long in b as it does in a, but we don’t know how long that is, and validating this hypothesis is a good demonstration of gprof’s value. gprof depends on instrumentation injected by the compiler. To enable it, compile with the ‘-pg’ flag, e.g. ‘gcc -pg -o program_under_test test.c’. Once you’ve done this, invoke the program with the relevant command line arguments. The rest of this article assumes that the program is invoked with argv[1]==1000. A program compiled with ‘-pg’ behaves identically to one compiled without it; however, it also writes a file, gmon.out, containing statistics about the program’s runtime. Once this file has been generated you can run ‘gprof <program name>’, and a wall of text will appear describing the runtime characteristics of the program.

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 79.72      8.69     8.69     1000     8.69     8.69  b
 20.28     10.90     2.21     1000     2.21     2.21  a

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.
 
 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
           else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this 
           function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
           the function in the gprof listing. If the index is
           in parenthesis it shows where it would appear in
           the gprof listing if it were to be printed.
Copyright (C) 2012-2014 Free Software Foundation, Inc.

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.
                     Call graph (explanation follows)


granularity: each sample hit covers 2 byte(s) for 0.09% of 10.90 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.00   10.90                 main [1]
                8.69    0.00    1000/1000        b [2]
                2.21    0.00    1000/1000        a [3]
-----------------------------------------------
                8.69    0.00    1000/1000        main [1]
[2]     79.7    8.69    0.00    1000         b [2]
-----------------------------------------------
                2.21    0.00    1000/1000        main [1]
[3]     20.3    2.21    0.00    1000         a [3]
-----------------------------------------------

 This table describes the call tree of the program, and was sorted by
 the total amount of time spent in each function and its children.

 Each entry in this table consists of several lines.  The line with the
 index number at the left hand margin lists the current function.
 The lines above it list the functions that called this function,
 and the lines below it list the functions this one called.
 This line lists:
     index      A unique number given to each element of the table.
                Index numbers are sorted numerically.
                The index number is printed next to every function name so
                it is easier to look up where the function is in the table.

     % time     This is the percentage of the `total' time that was spent
                in this function and its children.  Note that due to
                different viewpoints, functions excluded by options, etc,
                these numbers will NOT add up to 100%.

     self       This is the total amount of time spent in this function.

     children   This is the total amount of time propagated into this
                function by its children.

     called     This is the number of times the function was called.
                If the function called itself recursively, the number
                only includes non-recursive calls, and is followed by
                a `+' and the number of recursive calls.

     name       The name of the current function.  The index number is
                printed after it.  If the function is a member of a
                cycle, the cycle number is printed between the
                function's name and the index number.


 For the function's parents, the fields have the following meanings:

     self       This is the amount of time that was propagated directly
                from the function into this parent.

     children   This is the amount of time that was propagated from
                the function's children into this parent.

     called     This is the number of times this parent called the
                function `/' the total number of times the function
                was called.  Recursive calls to the function are not
                included in the number after the `/'.

     name       This is the name of the parent.  The parent's index
                number is printed after it.  If the parent is a
                member of a cycle, the cycle number is printed between
                the name and the index number.

 If the parents of the function cannot be determined, the word
 `<spontaneous>' is printed in the `name' field, and all the other
 fields are blank.

 For the function's children, the fields have the following meanings:

     self       This is the amount of time that was propagated directly
                from the child into the function.

     children   This is the amount of time that was propagated from the
                child's children to the function.

     called     This is the number of times the function called
                this child `/' the total number of times the child
                was called.  Recursive calls by the child are not
                listed in the number after the `/'.

     name       This is the name of the child.  The child's index
                number is printed after it.  If the child is a
                member of a cycle, the cycle number is printed
                between the name and the index number.

 If there are any cycles (circles) in the call graph, there is an
 entry for the cycle-as-a-whole.  This entry shows who called the
 cycle (as parents) and the members of the cycle (as children.)
 The `+' recursive calls entry shows the number of function calls that
 were internal to the cycle, and the calls entry for each member shows,
 for that member, how many times it was called from other members of
 the cycle.
Index by function name

   [3] a                       [2] b

gprof is kind enough to provide explanations of its outputs, so I won’t go into significant depth about how to interpret them. However, there are a couple of things it doesn’t tell you. gprof profiles the program by sampling at even intervals, so the values displayed, while roughly four to one, are not exact. In fact, were you to decrease the number of iterations to 100 and run gprof a couple of times, you would see noticeably different results. With an appropriate number of samples, though, you can see that b runs about four times as long as a overall. The flat profile also shows the number of calls made to each function and the ms/call. These statistics can be very helpful in identifying the sections of a project that incur significant runtime cost. For example, a function that is called from several places and accounts for a significant portion of execution time is easy to identify and optimize.

The call graph provides a textual representation of all the function calls made by the program, as well as subgraphs for each individual function that include its callers. Each listing provides information about how much time is spent in the function itself, as well as how much of that time is spent in the functions it calls. I find the call graph helpful in developing a more nuanced understanding of a program’s behavior. You can see where functions are called often, and identify other potential routes for optimization, for example places where reducing the use of a function would be valuable.

Now, since we’re good (read: lazy) programmers who wield glorious multipass optimizing compilers, the first (and possibly only) step we might take is turning some optimizations on. This can be done by recompiling the program with ‘-O’; in this case I used ‘-O3’, but you can use lower numbers (or none) for less optimization. If we run the newly created program we see some interesting things, namely the following flat profile.

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

This seems a little odd: the program’s execution time has more or less dropped to nothing. To see why, we must delve into the machine code the compiler is generating. To do this, we once again look to GNU. objdump is a tool for disassembling binary files; it shows section-delimited listings of the program translated to the relevant assembly language, and can be invoked by calling ‘objdump -d <program name>’. If you look at the section containing main() in the optimized and unoptimized programs, there are some interesting things to be seen. The unoptimized version is about what we expect. It does some setup and calls (the callq opcode calls the associated function) the functions we expect.

0000000000400706 <main>:
  400706:       55                      push   %rbp
  400707:       48 89 e5                mov    %rsp,%rbp
  40070a:       48 83 ec 20             sub    $0x20,%rsp
  40070e:       e8 fd fd ff ff          callq  400510 <mcount@plt>
  400713:       89 7d ec                mov    %edi,-0x14(%rbp)
  400716:       48 89 75 e0             mov    %rsi,-0x20(%rbp)
  40071a:       83 7d ec 02             cmpl   $0x2,-0x14(%rbp)
  40071e:       74 23                   je     400743 <main+0x3d>
  400720:       48 8b 45 e0             mov    -0x20(%rbp),%rax
  400724:       48 8b 00                mov    (%rax),%rax
  400727:       48 89 c6                mov    %rax,%rsi
  40072a:       bf 5c 08 40 00          mov    $0x40085c,%edi
  40072f:       b8 00 00 00 00          mov    $0x0,%eax
  400734:       e8 a7 fd ff ff          callq  4004e0 <printf@plt>
  400739:       bf ff ff ff ff          mov    $0xffffffff,%edi
  40073e:       e8 fd fd ff ff          callq  400540 <exit@plt>
  400743:       48 8b 45 e0             mov    -0x20(%rbp),%rax
  400747:       48 83 c0 08             add    $0x8,%rax
  40074b:       48 8b 00                mov    (%rax),%rax
  40074e:       48 89 c7                mov    %rax,%rdi
  400751:       b8 00 00 00 00          mov    $0x0,%eax
  400756:       e8 d5 fd ff ff          callq  400530 <atoi@plt>
  40075b:       89 45 fc                mov    %eax,-0x4(%rbp)
  40075e:       8b 45 fc                mov    -0x4(%rbp),%eax
  400761:       89 c6                   mov    %eax,%esi
  400763:       bf 79 08 40 00          mov    $0x400879,%edi
  400768:       b8 00 00 00 00          mov    $0x0,%eax
  40076d:       e8 6e fd ff ff          callq  4004e0 <printf@plt>
  400772:       eb 0a                   jmp    40077e <main+0x78>
  400774:       e8 1d ff ff ff          callq  400696 <a>
  400779:       e8 50 ff ff ff          callq  4006ce <b>
  //Calls to a and b, the source of most of the runtime overhead
  40077e:       8b 45 fc                mov    -0x4(%rbp),%eax
  400781:       8d 50 ff                lea    -0x1(%rax),%edx
  400784:       89 55 fc                mov    %edx,-0x4(%rbp)
  400787:       85 c0                   test   %eax,%eax
  400789:       75 e9                   jne    400774 <main+0x6e>
  40078b:       b8 00 00 00 00          mov    $0x0,%eax
  400790:       c9                      leaveq 
  400791:       c3                      retq   
  400792:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  400799:       00 00 00 
  40079c:       0f 1f 40 00             nopl   0x0(%rax)

Everything in the unoptimized version is about what we’d expect; the optimized version, however, holds some surprises.

0000000000400550 <main>:
  400550:       55                      push   %rbp
  400551:       48 89 e5                mov    %rsp,%rbp
  400554:       e8 b7 ff ff ff          callq  400510 <mcount@plt>
  400559:       83 ff 02                cmp    $0x2,%edi
  40055c:       75 1d                   jne    40057b <main+0x2b>
  40055e:       48 8b 7e 08             mov    0x8(%rsi),%rdi
  400562:       31 c0                   xor    %eax,%eax
  400564:       e8 c7 ff ff ff          callq  400530 <atoi@plt>
  400569:       bf b1 08 40 00          mov    $0x4008b1,%edi
  40056e:       89 c6                   mov    %eax,%esi
  400570:       31 c0                   xor    %eax,%eax
  400572:       e8 69 ff ff ff          callq  4004e0 <printf@plt>
  400577:       31 c0                   xor    %eax,%eax
  400579:       5d                      pop    %rbp
  40057a:       c3                      retq   
  40057b:       48 8b 36                mov    (%rsi),%rsi
  40057e:       bf 94 08 40 00          mov    $0x400894,%edi
  400583:       31 c0                   xor    %eax,%eax
  400585:       e8 56 ff ff ff          callq  4004e0 <printf@plt>
  40058a:       83 cf ff                or     $0xffffffff,%edi
  40058d:       e8 ae ff ff ff          callq  400540 <exit@plt>
  400592:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  400599:       00 00 00 
  40059c:       0f 1f 40 00             nopl   0x0(%rax)

The optimizer has produced more succinct machine code, and, interestingly, it has removed the calls to a and b, accounting for the massive reduction in execution time. The compiler has some ability to assess the side effects of a function; since a and b have no relevant side effects and their return values are unused, it can optimize the calls away entirely. Compiler optimizations often won’t be a silver bullet, but they can prove very helpful. When they aren’t enough, gprof can be incredibly helpful in identifying the sections of code worth hand optimizing.