Published on 05/12/23
On February 4, 2020, Discord, a VoIP and instant messaging social platform, has switched its programming language of the “Read States” service from Go to Rust. The sole purpose of the Read States service was to keep track of which channels and messages the user had read. The Read States service was a data structure accessed every time the user connected to Discord, every time a message was sent, and every time a message was read. The Go implementation was fast most of the time, but there were large latency spikes every few minutes that could degrade user experience as the service was a hot path. After investigation, it was determined that the spikes were due to the garbage collector of the Go language.
Conveniently called the “Read State,” there are billions of data structure Discord used to store read state information, one per user per channel. Each stores data like how many @mentions has been the user called and needs to be updated atomically and often reset to 0. To update the atomic counter instantly, each server for the Read States has a Least Recently Used (LRU) cache that stores the data of millions of users, tens of millions of Read States in each cache. Every second there are hundreds of thousands of cache updates. That’s a significant amount of memory being used. As the memory is freed, an action to make the unused memory reusable, every 2 minutes at a minimum because of the architecture of the Go’s garbage collector, the latency spikes occurred. This is shown in Figure 1. To solve the problem, Discord changed its language to Rust, a language that is blazing fast and memory efficient, as it has no runtime or garbage collector. Then why is a garbage collector necessary? What is a garbage collector in the first place? What is memory, and how does memory management work? This article will clear out your curiosity.
To understand memory management, you must first comprehend how memory is organized. A program’s data is stored in memory during runtime, the execution step of the program. Memory is divided into two areas: stack and heap.
Stack
The stack is where the values of local variables of functions, such as function parameters are stored and work accordingly to the LIFO (last in, first out) manner, as the name suggests. When a function is called, values of the local variables are stored in a structure called a stack frame, and the frame is pushed onto the stack. When the function ends and returns, the frame is popped from the stack as it has been used.
Heap
The problem with the stack is that there can be data whose lifetime is irrelevant to the call and return of functions. Such data would cause unnecessary time for copying data when returning the values. The heap is the storage that stores values that stays even after when the stored value returns. As it can stay forever and occupy space when unnecessary, programmers call memory manager, a runtime system that allocates- the act of putting a value on the heap- and deallocates- the act of removing a value on the heap- value.
Like there are diverse programming languages, each language provides its mechanism to decide on stack or heap to store values. Low-level languages, the languages that provide minimal abstractions to programmers, like C, C++, and Rust, give programmers the freedom to control memory by themselves. By default, values go to the stack. But programmers can also allocate values in the heap by utilizing designated keywords like malloc[1] in C, new[2] in C++, and Box::new[3] in Rust.
The high-level languages that provide strong abstraction, like Java, Python, Scala, JavaScript, and Go, each have a prewritten method to manage the memory. In most languages, primitive values like integer and Boolean are stored in the stack, and other objects are stored in the heap. Therefore, adding any values besides primitives is impossible in high-level languages.
Memory Management
Function f( ) = {
for some iterations {
g (i)
}
}
Function g(int a) {
var b = C(a, 2, 3, 4)
// …. Use b in some way
}
When such pseudocode is exists, after calling function f, function g will be repeatedly allocated and take the space of the heap. If the iteration is large enough, there could be no more free space. Each language has different methods of indicating that the heap space is full; low-level languages would return a null pointer instead of an existing address for new values when allocating them, and high-level languages would throw an exception. This is because the heap space is limited. The objects that are useless but take up space are called garbage.
Memory management is used to remove the garbage and restore the occupied space for new objects. Memory management is the decision of when to deallocate objects and is in two forms: manual memory management and automatic memory management.
Manual memory management
Manual memory management is when programmers take care of memory management. Low-level languages like C, C++, and Rust provide designated keywords for the deallocation of values in the heap along with the keywords for allocation (free[4] in C, delete[5] in C++, and drop[6] in Rust). After using an object, the programmer can manually deallocate memories, requiring no complex and time-consuming algorithm to detect garbage that needs to be deallocated automatically. The utilization of heap space can be efficient and optimal as long as the programmer makes no mistake.
However, programmers are humans, and they make mistakes all the time. If deallocation occurs before using the object that computer scientists call use-after-free (UAF), bugs called dangling pointers are created. A value corruption in the heap is one possible side effect of dangling pointers. If a value existing in an arbitrary address is deallocated and a new value is allocated in the same address, errors resulting in unexpected results would occur because the existing value no longer exists when the programmer tries to access it. One famous real-life example of UAF is the vulnerability in Adobe Acrobat Reader DC, where attackers can make a PDF that executes arbitrary code[7]. Video https://youtu.be/sGQYG_Qbjvg demonstrates the example. If deallocation occurs too late or does not occur, memory leaks, a waste of the heap space by leaving garbage, happen. It is not a serious problem nowadays because the space of the memory has increased significantly. However, memory leaks can still cause errors when the codes are in a lousy design.
Automatic memory management
Computer scientists have developed memory managers that automatically find and deallocate garbage to prevent such critical bugs and relieve programmers' stress about all their objects in the heap. A garbage collector (GC) is the most used form of automatic memory management. It finds garbage based on reachability, the notion of whether an object is reachable by the program from the stack and following the pointers. If there is no way to reach from the stack to the object, it means that the object is inaccessible. There are multiple known algorithms that decide whether an object is reachable or not, and each of them has advantages and disadvantages. The remainder of the article explains three widely used algorithms to implement GC.
Reference counting
The simplest method to implement GC is reference counting. In reference counting, each object has a counter representing the number of pointers referring to the object. When a new object is allocated, the counter starts from zero. Every time a new pointer refers to the object, the counter increments by one. Every time the existing pointer referring to the object is removed, the counter decrements by one, and if the counter reaches zero, the object is considered unreachable and is deallocated.
The advantage of reference counting is that it is easy to implement. For example, making a data structure containing the object's values and an integer storing the number of pointers referring to the object can solve the problem. There are also implementations in the libraries of languages, such as shared_ptr[8] in C++ and Rc[9] Rust. Furthermore, because the objects are deallocated immediately after it has been identified as garbage, there is very little space dissipated in the heap.
However, the significant problem of reference counting is that it cannot handle cyclic structures. The cyclic structure is when the objects are not connected with the stack but themselves. The program cannot access the objects even when the number of pointers referring to the objects is greater than 1. If such a cyclic structure exists, memory leaks and significant performance degradation could happen. Additional space is also required to store the reference counters.
Furthermore, the spaces taken up by the objects can be split the heap space into multiple blocks. This is called external fragmentation. External fragmentation limits the ability of heap space to store large-sized objects as the object may not be allocated, although there is enough space. External fragmentation also degrades the locality, a term to describe whether multiple objects that are desirable to put together are close to each other, as the spaces between allocated and deallocated objects create divided heap space.
Although reference counting is criticized for its severe drawbacks, languages like Python[10] and Swift[11] still use reference counting with additional algorithms or features to overcome its disadvantage.
Mark-and-Sweep GC
Mark-and-Sweep GC is an algorithm triggered when more heap space needs to be available. It consists of two phases: the marking phase and the sweeping phase. To identify whether an object is a garbage or not, this GC categorizes objects into three: unreached, where the object has not been gone through the process yet and is a candidate for deallocation; unscanned, where the object is reachable and will not be deallocated but has pointers pointing to other objects that have been not scanned; scanned, where the objects and the objects it is referring is all unscanned or scanned and will not be deallocated. In the marking phase, all the objects start as the unreached state. Starting from the objects referred from the stack, the GC checks objects that are referred ultimately by the stack and changes the state from unreached to unscanned. If an unscanned object refers to no more unreached or unscanned objects, the object becomes scanned. This method would mark objects that are reachable from the stack as scanned and object that are not as unreachable. In the sweeping phase, all the unreachable objects are deallocated.
An advantage of the Mark-and-Sweep GC is that the problem of cyclic structure can be handled by it. No cyclic structures can be created as the GC deallocates all the unreachable objects from the stack.
The disadvantages are similar to reference counting; Mark-and-Sweep GC requires space, preferably a free list, to contain the states of the objects. It also suffers external fragmentation, as GC only deallocates garbage and does not reorganize all living objects. Furthermore, unlike reference counting, the program's execution must be stopped during the GC process because the memory manager must scan the entire heap space and identify whether stored objects are unreachable. Any access to heap memory while the memory manager works would cause an unexpected result.
Copying GC
Like Mark-and-Sweep GC, copying GC identifies unreachable objects and deallocates them. However, the GC copies all the reachable objects and reorganizes them from the from-space to to-space. Cheney's algorithm is a widespread and efficient implementation of this GC that manipulates two pointers, free- a pointer to the beginning of the free space of the to-space- and scan- a pointer to the first Unscanned object in the to-space. This article will not specify how Cheney's algorithm works.
By copying and reorganizing objects into a new space, copying GC solves the two problems of space wasted by the values used during GC and external fragmentation. As a result, allocation is faster than in other GC, and the memory has good locality, so efficient use of the heap can be possible.
The disadvantage of copying GC is the cost of copying the objects. Like the Mark-and-Swap GC, execution must be stopped during the GC process. Furthermore, copying objects in the heap is triggered every time the GC is triggered. Copying large-sized objects require a long time and would cause a significant latency spike. Moreover, because copying divides the heap space into two, half of the heap needs to be available for the place to be copied, causing less available memory and more frequent GC trigger.
Note that not all automatic memory managements are GC and run in runtime, like the Rust language, where the compiler analyzes the code and inserts drop in necessary places during compile time. Still, the GC is the most popular one.
Now back to the issue of Discord. You might now understand why the two-minute latency spikes happened in the Read State Service of Discord. It was because the GC deallocated unnecessary values in the heap during runtime. The latency spikes were gone by changing the programming language from Go to Rust. Even with this small optimization, the Rust version of the service was able to defeat every single performance metric of Go. Discord learned a lesson about the importance of optimization in memory management.
However, this does not mean that manual memory management is always better than automatic memory management. Sacrificing performance, automatic memory management provides programmers with a safe and undemanding environment to focus on the program's design. When designing a program, consider balancing the advantages and disadvantages of manual and automatic memory management and choose an appropriate programming language that fits the objective.
[1] https://en.cppreference.com/w/c/memory/malloc [2] https://en.cppreference.com/w/cpp/language/new [3] https://doc.rust-lang.org/std/boxed/index.html [4] https://en.cppreference.com/w/c/memory/free [5] https://en.cppreference.com/w/cpp/language/delete [6] https://doc.rust-lang.org/std/mem/fn.drop.html [7] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-9715 [8] https://en.cppreference.com/w/cpp/memory/shared_ptr [9] https://doc.rust-lang.org/std/rc/struct.Rc.html [10] https://docs.python.org/3/library/gc.html [11] https://docs.swift.org/swift-book/LanguageGuide/AutomaticReferenceCounting.html
Comments