I’ve been testing the RAM usage of various versions of Ruby, to compare the effectiveness of Narihiro Nakamura’s bitmap marking garbage collector. I’ll be publishing the results very soon, but in the meantime I thought I’d write a bit about how I measured RAM usage for this particular case.
Modern kernels, like Linux, have advanced memory management systems that can make it tricky to know for sure how much RAM a process is really using. Tools like ps don’t quite give us everything, and pmap comes closer, but it still misses out some important details. Using a combination of information from /proc, in particular /proc/$pid/smaps, we can piece together what we need to compare copy-on-write friendliness. I’ll be using the sum of all the Private_Dirty and Shared_Dirty values in smaps, so I wrote a script to total them up, described at the end.
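The totalling itself is simple, because smaps lists a block of counters for every mapping in the process. Here is a minimal sketch of that step (not the actual cowstat.rb; the method name sum_smaps and the sample mappings are made up for illustration), which adds up every Private_Dirty and Shared_Dirty line it finds:

```ruby
# Adds up the Private_Dirty and Shared_Dirty counters (always in kB) across
# every mapping in the text of a /proc/$pid/smaps file.
def sum_smaps(smaps_text)
  totals = Hash.new(0)
  smaps_text.each_line do |line|
    m = line.match(/\A(Private_Dirty|Shared_Dirty):\s+(\d+) kB/)
    totals[m[1]] += m[2].to_i if m
  end
  totals
end

# Two made-up mappings, in the format smaps uses.
sample = <<~SMAPS
  00600000-00601000 rw-p 00000000 08:01 1234       /tmp/cowtest
  Size:                  4 kB
  Shared_Dirty:          0 kB
  Private_Dirty:         4 kB
  7f0000000000-7f0040000000 rw-p 00000000 00:00 0
  Size:            1048576 kB
  Shared_Dirty:       1080 kB
  Private_Dirty:        36 kB
SMAPS

totals = sum_smaps(sample)
puts "private_dirty: #{totals['Private_Dirty']} kB shared_dirty: #{totals['Shared_Dirty']} kB"
# prints "private_dirty: 40 kB shared_dirty: 1080 kB"
```

On a Linux box you would feed it File.read("/proc/#{pid}/smaps") for the process you are interested in.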
First, let me summarise the way memory allocation on Linux works. I won’t go into shared libraries, memory-mapped files, swap space or any of that more advanced stuff right now, though, so just bear in mind that these contribute to memory usage too.
When your process allocates memory, its virtual memory size (VmSize) increases, but no real RAM is actually used yet.
When you write something to the allocated memory, it starts to use real RAM: the pages you wrote to are now considered “private” and “dirty”.
So if you allocate 1GB of memory to store an array of bytes, but only write one million entries, then your VmSize will be 1GB but your Private_Dirty size will only be 1MB.
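You can compare allocated and resident memory for any process: Linux reports VmSize, along with the resident set size VmRSS, in /proc/$pid/status. The following sketch (the method name vm_stats is hypothetical) pulls both figures out. Note that VmRSS counts every resident page, including clean pages from shared libraries, so it is not the same thing as Private_Dirty.

```ruby
# Pulls VmSize (allocated virtual memory) and VmRSS (resident memory)
# out of the text of /proc/$pid/status. Both are reported in kB.
def vm_stats(status_text)
  stats = {}
  status_text.each_line do |line|
    key, value = line.split(":", 2)
    next unless %w[VmSize VmRSS].include?(key)
    stats[key] = value[/\d+/].to_i
  end
  stats
end

# On Linux, look at the current Ruby process.
if File.readable?("/proc/self/status")
  stats = vm_stats(File.read("/proc/self/status"))
  puts "VmSize: #{stats['VmSize']} kB, VmRSS: #{stats['VmRSS']} kB"
end
```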
When forking a process, Linux doesn’t copy the whole allocated address space over from the parent process to the new child; instead it only copies pages as they are written to (hence “copy on write”). Until then, the dirty memory is shared between the two processes.
So if you fork this 1GB process, then that 1MB of Private_Dirty memory will become Shared_Dirty. Between them, the processes now use only 1MB of real RAM (though both think they have 1MB each).
If your newly forked child process then adds another million entries to the array, the child process is now using 2MB of memory in total, but is still sharing 1MB of it with its parent process. The child process will see its Private_Dirty stat increase to 1MB, while Shared_Dirty for both processes stays the same at 1MB. So only 2MB of real RAM is in use, even though adding up both processes makes it look like 3MB.
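The arithmetic in that last sentence is worth spelling out, because it is exactly the mistake you make by adding up per-process numbers. A small sketch, assuming (as in this example) that every process shares the same set of pages, so shared memory is counted once rather than once per process; the ProcMem struct and the method names are illustrative:

```ruby
# Per-process Private_Dirty and Shared_Dirty figures, in MB.
ProcMem = Struct.new(:name, :private_dirty, :shared_dirty)

# Naively adding up what each process reports double-counts shared pages.
def apparent_usage(procs)
  procs.sum { |pr| pr.private_dirty + pr.shared_dirty }
end

# Assumes every process maps the same shared pages (a parent and the
# children forked from it), so the shared part is counted only once.
def real_usage(procs)
  procs.sum(&:private_dirty) + procs.map(&:shared_dirty).max.to_i
end

procs = [ProcMem.new("parent", 0, 1), ProcMem.new("child", 1, 1)]
puts "apparent: #{apparent_usage(procs)}MB, real: #{real_usage(procs)}MB"
# prints "apparent: 3MB, real: 2MB"
```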
Now, if the child process attempts to overwrite the first million entries of the array, Linux first has to make a copy of the shared memory so that the child doesn’t trample over the parent’s array. At that point the two processes are no longer sharing any memory: the parent will have 1MB of Private_Dirty, the child will have 2MB of Private_Dirty, and Shared_Dirty will be 0 for both processes.
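The whole sequence can be reproduced with a toy model of copy-on-write. This is a simplified sketch, not how the kernel is implemented: each “page” stands in for one of the 1MB chunks in the example (real pages are usually 4kB), and the class names are hypothetical. A physical page counts as shared while more than one process maps it, and a write to a shared page gives the writer its own copy:

```ruby
# A PhysPage is a chunk of real RAM; refs counts how many processes map it.
class PhysPage
  attr_accessor :refs

  def initialize
    @refs = 1
  end
end

class ToyProcess
  def initialize(pages = {})
    @pages = pages # virtual page number => PhysPage (pages written so far)
  end

  # Forking copies the page table, not the pages: the child maps the same
  # physical pages, so each one gains a reference.
  def fork_child
    @pages.each_value { |pg| pg.refs += 1 }
    ToyProcess.new(@pages.dup)
  end

  def write(vpage)
    pg = @pages[vpage]
    if pg.nil?
      @pages[vpage] = PhysPage.new # first write: real RAM is used now
    elsif pg.refs > 1
      pg.refs -= 1                 # copy on write: let go of the shared page
      @pages[vpage] = PhysPage.new # and take a private copy instead
    end                            # refs == 1: already private, write in place
  end

  def private_dirty
    @pages.values.count { |pg| pg.refs == 1 }
  end

  def shared_dirty
    @pages.values.count { |pg| pg.refs > 1 }
  end
end

def report(stage, parent, child)
  puts "#{stage}: parent private=#{parent.private_dirty} shared=#{parent.shared_dirty}, " \
       "child private=#{child.private_dirty} shared=#{child.shared_dirty}"
end

parent = ToyProcess.new
parent.write(0)           # parent writes the first million entries (1MB)
child = parent.fork_child
report("after fork", parent, child)
child.write(1)            # child appends another 1MB: private to the child
report("after append", parent, child)
child.write(0)            # child overwrites the first 1MB: it gets copied
report("after overwrite", parent, child)
```

The final line reports parent private=1 shared=0 and child private=2 shared=0, matching the walkthrough above.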
So, to actually measure the shared RAM, I threw together a simple little script to sum up all the Private_Dirty and Shared_Dirty values for some given processes.
The script is called cowstat.rb. You give it a regexp as the first argument, which it uses to filter processes (you can give it PIDs instead if you prefer).
$ ruby cowstat.rb cowtest
28167: cowtest ./cowtest vm_size:1052744 kB vm_rss:1404 kB private_dirty: 40 kB shared_dirty: 1080 kB
28168: cowtest ./cowtest vm_size:1052740 kB vm_rss:2140 kB private_dirty: 1064 kB shared_dirty: 1080 kB
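The process selection could be sketched like this. It is a guess at the behaviour described, not the real cowstat.rb, and select_pids and its proc_root parameter are hypothetical. All-numeric arguments are treated as PIDs; otherwise the first argument is a regexp matched against each process’s command line. The sketch skips its own PID, since the command line `ruby cowstat.rb cowtest` would otherwise match the pattern too:

```ruby
# All-numeric arguments are PIDs; otherwise the first argument is a regexp
# matched against each process's command line in <proc_root>/<pid>/cmdline.
def select_pids(args, proc_root: "/proc")
  return args.map(&:to_i) if args.all? { |a| a.match?(/\A\d+\z/) }

  pattern = Regexp.new(args.first)
  Dir.glob(File.join(proc_root, "[0-9]*")).filter_map do |dir|
    pid = File.basename(dir).to_i
    next if pid == Process.pid # our own command line contains the pattern too
    # cmdline separates arguments with NUL bytes; turn them into spaces.
    cmdline = File.read(File.join(dir, "cmdline")).tr("\0", " ").strip
    pid if cmdline.match?(pattern)
  rescue SystemCallError
    nil # the process exited, or we are not allowed to read it
  end.sort
end

p select_pids(ARGV) unless ARGV.empty?
```

Each selected PID’s /proc/<pid>/smaps can then be totalled as described earlier.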
Historically, Ruby hasn’t been very copy-on-write friendly. Web servers like Unicorn or Passenger can be configured to initialise your app in the parent process before forking off child processes. In theory, this means that the RAM allocated for your models, controllers and so on should be shared between all your processes (showing up as Shared_Dirty). If your code takes up 50MB of RAM and you have 10 workers, then the parent and its workers share a single 50MB copy instead of holding eleven separate ones, so you just saved 500MB of RAM.
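Spelled out, the saving counts the parent as well as the workers, since the parent loaded the app before forking. A trivial sketch (memory_mb is a made-up helper), assuming each process would otherwise hold its own full copy and ignoring whatever memory each worker goes on to allocate for itself:

```ruby
# Made-up helper: RAM (in MB) taken by the app's code in a preforking server
# with one parent and `workers` children. Without sharing, every process
# holds its own copy; with copy-on-write sharing, there is a single copy.
def memory_mb(app_mb:, workers:, shared:)
  copies = shared ? 1 : workers + 1 # +1 for the parent that loaded the app
  copies * app_mb
end

without_sharing = memory_mb(app_mb: 50, workers: 10, shared: false) # 550
with_sharing    = memory_mb(app_mb: 50, workers: 10, shared: true)  # 50
puts "saved #{without_sharing - with_sharing}MB" # prints "saved 500MB"
```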
The problem is that Ruby’s garbage collector, which runs every so often, makes lots of writes to your memory as part of its bookkeeping: during the mark phase it sets a mark flag inside every live object it finds. This means that even if there is nothing to be garbage collected, much of that lovely Shared_Dirty memory turns into Private_Dirty.
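To get a feel for the scale of the problem, here is a toy model of how many pages a single mark phase dirties in a forked child. It assumes, as in Ruby 1.9’s collector, that the mark bit lives inside each object, and compares that with keeping the mark bits in a separate bitmap, which is the idea behind bitmap marking. The 4096-byte page and 40-byte object slot sizes are assumptions, and this is not how either collector actually lays out its heap:

```ruby
# Toy model only: 4096-byte pages, 40-byte object slots laid out back to
# back, and one mark bit per slot for the bitmap variant.
PAGE_SIZE = 4096
SLOT_SIZE = 40

# Mark bit stored inside each object: marking writes to every page
# that holds a live object, so each such page stops being shared.
def pages_dirtied_by_flag_marking(live_slots)
  live_slots.map { |slot| slot * SLOT_SIZE / PAGE_SIZE }.uniq.size
end

# Mark bits kept in a separate bitmap: only the bitmap pages are written,
# and the object pages themselves stay untouched (and shared).
def pages_dirtied_by_bitmap_marking(live_slots)
  live_slots.map { |slot| slot / 8 / PAGE_SIZE }.uniq.size
end

live = (0...100_000).to_a # objects loaded by the parent before forking
puts "flag marking:   #{pages_dirtied_by_flag_marking(live)} pages dirtied"   # 977
puts "bitmap marking: #{pages_dirtied_by_bitmap_marking(live)} pages dirtied" # 4
```

With 100,000 live objects, flag marking writes to 977 heap pages (about 3.8MB that each child can no longer share), while the bitmap needs only 4 pages.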
More on this in my next post, where I’ll look at how this has been improved.
In the meantime, you can have a play with our Ruby 1.9.3 Ubuntu packages, which include Narihiro Nakamura’s bitmap marking garbage collector (backported as part of Sokolov Yura’s performance patches). The p327 packages with the latest patch are currently in the experimental repository.