Practically all of my recent projects, both for school and otherwise, are done in high level programming languages, like Java or Python. As such I've been missing out on many fun low-level features, such as: manual memory management, pointer arithmetic, lean standard library, and of course, debugging adventures! Therefore, given the option to choose the programming language for my graphs and networks project, I immediately decided to go for the good, old C.
The project is a relatively simple application, that visually demonstrates some graph coloring algorithms, using Allegro 5 for graphics. Since the design goals include reading a lot of data from configuration files, I've also included libconfig, to avoid having to parse the files by myself. One of those files contains an adjacency matrix of unknown size, which I'm storing in a dynamically allocated char array. To avoid having to deal with dynamic two-dimensional arrays, I've implemented it as a 1-D array, with indexing done like this:
char elem = node_matrix[x + (matrix_width*y)];
The pointer to this array is stored in an AppConfig structure along with some other program settings. At this point all was well, and the development was proceeding just fine.
The problem appeared right after adding some code to draw the links between graph vertices: the connections were all wrong. After playing with the drawing code for a few minutes, and finding nothing wrong with it, I've added a function call to print out the adjacency matrix just before drawing the links. Lo, and behold: the matrix was garbled. I've added another call to print the matrix just after reading it from file, and at that point it was fine. By moving those two calls around in the code, I've eventually found a single line that caused the matrix to garble:
After commenting out the line, the matrix was fine at both points. Putting it back caused the matrix to appear garbled again. Could libconfig be somehow corrupting my program's memory? A quick google search found no mention of any memory problems with the library. As a quick test, I've replaced the offending line with the following:
int *test = malloc (90000 * sizeof(int));
memset(test, 9, 90000 * sizeof(int));
It allocated around 350 kB of memory and filled it all with nines. As I suspected, with this code in place, my matrix had a few rows replaced with all nines. At that point I was certain that my matrix was being stored in an un-allocated memory. How could this happen? I took a quick look at the malloc call that was allocating memory for my matrix, and it looked fine:
ac->node_matrix = malloc( matrix_width * matrix_width * sizeof(char) ); // square matrix
I took another look at the print function, and the indexing was the same as for the code that was setting the values. I took yet another look at the code setting the values, and THEN I saw it!
ac->node_matrix[x + (ac->screen_width * y)] = (char) result;
A true facepalm moment: I've been using the screen width instead of the matrix width for indexing! I suppose that's one argument against storing all your variables in one structure.
The first row was always written in the correct memory, but all the other rows were written wherever the width of the application window caused them to be written, and it was most certainly outside of the allocated memory block. Since the print function and the set function used the same copy-pasted indexing, the array was consistently written to, and read from the wrong place. The values written in the unallocated memory stayed there, until some other piece of code claimed that particular address. In this case, it was the libconfig call, but it could've been anything. In C, writing values outside of the array does not generate an array-out-of-bounds exception. The code 'just works', until suddenly it doesn't. Debugging is so much fun!