Modern programming languages that we work with these days have made memory management simpler that ever, they manage it on their own using things like garbage collectors, smart pointers, ownership, etc.

Creating an empty vector is as simple as

let empty_vec: Vec<i32> = Vec::new();

But some questions remain unanswered if you haven’t tried manually doing memory operations like:

  • how is a sized vector created using Vec::with_capacity?
  • how is a vector resized on calling pop_back or push_back?
  • and finally how is the memory freed when vector goes out of scope?

Let’s engage in old god C as its closest we get to actual system calls and try to figure out these question inside out.

memory inside out

Memory Allocation

To store a vector or object we need a backing address in the physical memory of our system. We use malloc to ask operating system for more memory which looks something like below.

int* numbers = (int *)malloc(8 * 42);

In above code we are trying to create an array of numbers or more pedantically a pointer to integer which is the starting point of our array.

The parameter to malloc is the number of bytes we want to allocate. Given on a 64 bit system an integer takes 8 bytes so we are allocating for 42 integers.

malloc returns a void * which is C’s way of saying it can be anything so we have to type cast it based on our requirement and understanding.

Dynamic Sizing

Generally we can’t rely on hard coding the size of objects integers might be simple but can still vary on a 32 bit machine. A more complex thing like a struct of multiple values will vary even more.

struct foo
{
  int element;
  struct foo *next;
};

So its better to use a utility function sizeof for calculating size of our type.

int* numbers = (int *)malloc(sizeof(int) * 42);

In case you are running on 512 MB system and system runs out of memory malloc will return NULL value.

It’s not the null you see in Java and other garbage collected languages, NULL means a zero value something like (void *)0 whereas the other keyword is used to identify a variable as not having any reference.

Pointer inside Pointer

How about we handle an even more complex scenario

struct wallet {
	int version;
	int *notes;
};

Now we have a dynamic array inside of a dynamic struct which we have to allocate memory for, using just sizeof will give an empty array to us and it will segfault if you try to access array’s memory.

struct wallet* w = (struct wallet *)malloc(sizeof(struct wallet));
// this will segfault
// w->notes[0] = 42;

So we have to allocate memory of the array of integers inside of our wallet.

wallet->notes = (int *)malloc(sizeof(int) * 42);

Zeroing Memory

If you use malloc the memory is allocated to us but its never zeroed out before we get it. It can have garbage values which if not cleaned can cause your program to behave in unexpected ways.

cleaning dirty

The solution for this is to use one shot allocation and memory zeroing using calloc.

w->notes = (int *)calloc(4, sizeof(int));

calloc allocates an array of size 4 where size of each element is sizeof(int). The returned memory is set to zero.

Now the question arises why not always use calloc:

  • First it only works for array allocation so you can’t allocate a struct using it.
  • It is slow, in case you don’t need to zero memory out and pass the buffer to some other function which maybe reads the file or does some other io just use the big brother malloc.

Resizing

To answer the next question that we had in regard of a vector is basically related to allocating more memory and de-allocating extra memory. We do that using realloc function.

We will declare our vector struct as below it will track backing store, current position and total size of store.

struct vector {
  int *backing_store;
  int current_position;
  int backing_store_size;
};

When we call push_back the vector checks whether the size of its backing store is sufficient or not. If not, it allocates a backing store of double size using realloc and the places the element there. The newly allocated memory will be uninitialised.

void push_back(struct vector *v, int element) {
  if (v->backing_store_size <= v->current_position) {
	v->backing_store_size = v->backing_store_size * 2;
    v->backing_store = (int *)realloc(v->backing_store, sizeof(int) * v->backing_store_size);
  }
  v->backing_store[v->current_position] = element;
  v->current_position = v->current_position + 1;
}

Similarly, we can handle pop_back, just one crucial optimisation we won’t reduce to half but instead if current position is less that 1/4th of backing store size then we reduce the backing store to quarter of its previous size.

vector pop

All the functions above return a NULL value in case of error and set errno which is Cs way of communicating error kind.

Freeing

It’s also important that we keep our memory usage in check otherwise it can exceed the physical memory capacity really fast. We have to free unused memory and return it back to the operating system.

be free

It’s done using free function call, you pass in the pointer where memory is allocated and it will return it back to the system.

int* numbers = (int *)malloc(sizeof(int) * 42);
free(numbers);

Common Errors

The rise of RUST memory safe languages, paradigm like smart pointers in C++ and system like RAII is because of the reason that there are so many memory issues that keep happening in systems.

The recent Crowd Strike one was also a memory issue where invalid memory buffer was being accessed.

blue screen of death

Let’s go over some common gotchas that you have to keep in mind while managing memory.

Invalid Free

Accessing a memory which was already freed will cause SEGFAULT and trying to free (double free) it again is an undefined behaviour you might not wanna encounter.

Free expects a pointer from the result of call to malloc passing anything else will also result in erroneous program execution.

Less Allocation

It’s possible you might have allocated less memory than required or no memory at all for an operation in which case buffer overflow will happen or SEGFAULT.

buffer overflow

It is one of the binary exploitation ways hackers use and will probably cost your service a lot of secret credentials.

char *src = "hello world";
char *dst = (char *)malloc(sizeof(char) * 4);
// dst has size 4 while required is strlen(src) + 1
strcpy(dst, src);

In above example we could have probably used strdup.

Garbage Values

Accessing garbage or uninitialised value without clearing your memory will result in undefined behaviour of your program. It’s sometimes good to use calloc or clear memory on your own.

Memory Leak

Allocated memory is returned back to operating system when program dies even if you don’t call free on your own.

But that’s only in case of short running processes, in case of long ones like databases and servers if you don’t free unused memory, the memory allocated to program will keep increasing and finally your system will run out of available memory leading to crash.

no memory

Finding memory issues

valgrind is one of the tools we can use to identify memory related issues. It can tell us about:

  • uninitialised reads
  • invalid memory read/write
  • memory leak
  • invalid free, and a lot more

For a buggy program like this

int *data = (int *)malloc(sizeof(int) * 100);
data[100] = 42;

It will tell us about the invalid write if you run it through and also about the lack of a free call.

gcc -g test.c -o test
valgrind --leak-check=yes ./test
# Invalid write of size 4
# 400 bytes in 1 blocks are definitely lost in loss record 1 of 1

In case we try to access uninitialised memory and run it through valgrind it will let us know

printf("value at index 11", data[11]);
# Use of uninitialised value of size 8

Conclusion

Knowing about memory management is very crucial to build safe and performance critical software which basically is every software.

Offloading tasks to garbage collectors in programming languages with automatic memory management does make it easy but things like memory leak can still happen there so you should know what’s happening in your program otherwise AI replacement is coming for you :P.

ai destroy humans

That was all for this week, I wrote this piece after reading through Operating System: Three Easy Pieces its a very nice book with exercises and references to research papers I am enjoying reading it will keep sharing my learning from it as I try to complete it fingers crossed.