Introduction

While writing code in JavaScript, you aren't required to manage any of the memory yourself. However, I've found that developing a solid mental model of how memory works has been crucial in my journey to improve as a software engineer.

There are applications where dynamic, and sometimes even garbage collected, languages may not be the best choice. This post contains some of my experiences where I learned this the hard way.

This post will be a blend of theory and practical knowledge. I'll start by sharing some of what I've learned about computer memory in general, then delve deeper into specifics around memory management in Node. We'll also explore ways to debug memory leaks.

While the focus is on JavaScript and Node, the post will include some simple examples from Go as well. Comparing a dynamic language to a static one offers a valuable contrast.

RAM

The applications we write receive a portion of our computer's working memory (RAM). We can think of this memory as a long list of boxes. Each box has an address and can hold 8 bits:

┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│      01001011       │ │      11001011       │ │      01001111       │ │      00001001       │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘ └─────────────────────┘
           0                       1                       2                       3

Between these boxes and the CPU sits the memory controller:

┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│      01001011       │ │      11001011       │ │      01001111       │ │      00001001       │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘ └─────────────────────┘
           0                       1                       2                       3
           ▲                       ▲                       ▲                       ▲
           │                       │                       │                       │
           │                       │                       │                       │
           │                       │                       │                       │
           │                       │                       │                       │
           │                       │                       │                       │
           │                       │                       │                       │
           │                       │ ┌───────────────────┐ │                       │
           └───────────────────────┴─│ Memory controller │─┴───────────────────────┘
                                     └───────────────────┘

                                           ┌───────┐
                                           │  CPU  │
                                           └───────┘

The memory controller maintains a direct connection to each of these boxes. This allows the CPU to request any random address from the memory controller and receive an immediate response, hence the name random access memory (RAM). Without this direct connection, we would have to scan through all the addresses sequentially to find the data our programs are looking for.

When the memory controller receives a request for an address, it retrieves the contents of the neighboring boxes as well. This concept is known as a cache line: a chunk of typically 64 boxes (bytes) that the CPU stores in an internal cache. Consequently, placing frequently accessed items next to each other can enhance program speed, as they're likely to be retrieved and cached together.

In fact, many languages go even further to ensure proper memory alignment. For instance, if you create a struct in Go that requires 7 bytes, the compiler will add an extra padding byte to better align it with the underlying memory architecture.

Memory segments

The piece of RAM allocated to our program is divided into different segments, and the exact layout varies depending on the programming language. In statically compiled languages, these typically include the stack, heap, data, and text segments.

Dynamic languages have a memory layout that aligns well with their flexible needs. Given that this post is primarily focused on JavaScript, we will concentrate on the segments most relevant to this language: the stack and the heap.

The stack is a memory region where programs automatically manage data using a last-in, first-out approach. We can think of it as having two pointers: one pointing to the base of the stack, and another to its top. When a function is invoked, the pointer to the stack's top moves, allocating memory for all the function's local variables. Upon the function's return, this pointer moves back to its position before the call.

Moving the pointer back effectively "deallocates" the memory by allowing the next function to overwrite it. If you were to inspect the memory just after the stack pointer had been adjusted, you would still be able to see the bits from the previous function.

To place anything on the stack, the compiler must know how much memory to allocate. Without this information, it cannot determine where to move the top pointer.

Let's use this Go struct as an example:

type example struct {
  num int64
}

When we define a type like this, we're informing the compiler precisely how much memory is needed to create an instance of it. In this case it's 8 bytes, to accommodate a 64-bit integer.

Thus, suppose we create an instance of this struct within a function like this:

func someFunc() {
  x := example{num: 5}
  fmt.Println(x)
}

When this function is called, the stack pointer moves to accommodate this allocation. Once the function returns, the stack pointer reverts to its original position. This reversion allows the previously allocated object to be overwritten during the execution of the next function, effectively deallocating it.

In fact, Go tries so hard to avoid unnecessary heap allocations that, instead of passing a reference to the struct to the fmt.Println function, it creates a completely new instance and copies all the values over.

This might sound stupid at first. Why would we create another instance of the same struct instead of simply passing a reference to it, as we're used to doing in JavaScript? Wouldn't passing a reference be more efficient? To understand this, let's think of it from our program's perspective. With the stack, it's straightforward for the program to determine when memory can be reclaimed: the memory is released as soon as the function exits.

If we instead had some code like this, where the function returns the memory address for the bits of that struct (a pointer):

func someFunc() *example {
  x := example{num: 5}
  return &x
}

It's not immediately obvious when that piece of memory can be reclaimed. Another issue arises if we want the object to stay alive for a longer period. Perhaps it represents some long-lived state. If it's placed on the stack, the next function might overwrite it. Luckily, there's another location for storing data that may need a longer lifespan: the heap.

While the heap offers greater flexibility, deallocating memory here is much more complex. In some languages, such as C and C++, the responsibility to explicitly deallocate memory falls on the programmer.

However, in a large codebase, determining the exact moment when a piece of memory is no longer needed is far from trivial. Additionally, if you deallocate the memory too soon, you risk encountering issues when the program attempts to read or write to that memory location later.

Therefore, both Go and JavaScript use a garbage collector to handle the deallocations on the heap automatically.

We will explore the garbage collector in more detail later in the post, but in a simplified form, we can say that it divides its work into three phases. In the first phase, the garbage collector starts with variables on the stack and identifies which ones are pointing directly to values on the heap. Here we need to separate variables from values: a variable x might be local to our function, but it could point to a _value_ on the heap.

During this initial phase, the garbage collector will temporarily pause all other execution. This means that our code cannot run until this phase is complete. Once the garbage collector has identified all objects on the heap that are directly reachable from the stack, it places them into a queue.

Next, the second phase begins, which can run concurrently with our code. This means that our code has the opportunity to execute simultaneously, although the garbage collector will still compete with us for CPU cycles. This phase is commonly referred to as the marking/painting phase, and it operates similarly to the previous one. The garbage collector examines the values in the queue, marks/paints them, and then adds the values they point to to the queue. We can think of the heap as a graph, and the garbage collector continues this process until it has marked every reachable value.
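
To make the traversal concrete, here is a highly simplified sketch of that marking phase in JavaScript. This is not V8's actual implementation; value.references is a hypothetical stand-in for a value's outgoing pointers:

// Simplified mark phase: a breadth-first traversal of the heap graph,
// starting from the roots (values directly reachable from the stack).
function mark(roots) {
  const queue = [...roots]
  const marked = new Set()
  while (queue.length > 0) {
    const value = queue.shift()
    if (marked.has(value)) continue
    marked.add(value) // "paint" the value
    queue.push(...value.references) // hypothetical: the values it points to
  }
  return marked // anything not in this set is unreachable, i.e. garbage
}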

Once that work is complete, every value that hasn't been marked is no longer reachable from any variables in our program, and the memory for those unmarked values can safely be reclaimed.

This occurs during the third phase, during which the garbage collector once again pauses our code while it deallocates the "garbage".

Now it might be clearer why Go prefers to copy things by value instead of sharing references. By keeping a value local to a function, deallocation is simplified to just moving a pointer back and forth. This approach is typically more performant than having to traverse a graph that could potentially contain millions of records.

So, why doesn't JavaScript do the same thing? Well, it can't. All objects, arrays, and functions in JavaScript are dynamic. We can attach any property we'd like to them during runtime. This flexibility makes it fast to write our code. We don't need to define the structure of our objects beforehand. However, if the compiler isn't aware of the structure, it can't predict how much memory to allocate. And, without that knowledge, it can't determine where to move the stack pointers. Consequently, JavaScript, along with other dynamic languages, tends to have a lot more heap allocations compared to compiled static languages like Go or C++.
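
As a tiny illustration of that dynamism (the object here is hypothetical):

const user = { name: 'Ada' }
// Legal in JavaScript: the object's shape changes at runtime,
// so the engine can't know the final size of `user` up front.
user.lastSeen = Date.now()

In Go, lastSeen would have to be declared as a struct field before compilation, which is precisely what allows the compiler to size the allocation ahead of time.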

When heap allocations can become a problem

I'd say that for a large portion of applications, it's unlikely you'll encounter any problems due to this memory allocation approach. However, there are certain scenarios where problems could arise, and when they do, they might not be trivial to fix.

Consider this code:

const data = await someApi()
const result = data.map((x) => ({ ...x, name: x.name.toUpperCase() }))

The anonymous function we pass to map is allocated on the heap. Inside this function, we're spreading values into new objects, which are also allocated on the heap. The results from this function are then placed into a new array, which, as you might have guessed, is also allocated on the heap. The garbage collector will have to clean these up once a request is finished.

Contrast this with Go, where equivalent code wouldn't require a single heap allocation. To reclaim that memory after a request, all that's needed is to move a stack pointer.

Now, if you benchmark a realistic Node application with numerous functions like the one described, you'll find that a significant portion of the server's CPU time is spent on garbage collection. However, for an application with a consistent load, this is unlikely to cause any major issues.

However, I've faced some challenging experiences when my Node applications had to handle huge bursts of incoming requests. These situations often led to problems at the worst possible times.

To illustrate, imagine we have a server that executes a lot of business logic. To respond to a single request, it needs to make a thousand allocations on the heap. Now, if this server suddenly starts receiving a surge of traffic, the heap will fill up quickly, triggering the garbage collector. Remember, the garbage collector pauses our code during two of its three phases.

As we wait for the garbage collector to complete its work, we're unable to handle any new requests, leading them to accumulate in a queue. This queue may include health checks from your load balancer. If these requests don't get a response in time, the load balancer might consider the container unhealthy and shut it down. Consequently, the remaining containers will have to handle an increased volume of traffic, causing their request queues to grow longer. This can create a domino effect where they, too, might be deemed unhealthy.

The V8 engine, which Node uses, does its best to assist in such situations. It analyzes your code and identifies opportunities to JIT compile frequently used paths. As part of this JIT compilation, it conducts what is known as escape analysis.

Escape analysis is a process that determines whether an object can be broken up into multiple variables, thereby avoiding a heap allocation. This optimization could significantly improve your application's performance. Consider the following code:

function calculateSum(a, b) {
  const result = { sum: a + b }
  return result.sum
}

The result object is created inside the calculateSum function and is used only to hold the sum of a and b. Since the object itself does not escape the function (only the value of result.sum is returned), the V8 engine can optimize the memory allocation, because it's able to identify that our function is equivalent to this:

function calculateSum(a, b) {
  const result = a + b
  return result
}

Creating the object is unnecessary, and the function above can easily be converted into bytecode where the numbers are loaded directly into registers.

However, based on my experience, it's quite challenging to predict how these optimizations will unfold in less contrived examples. I've made the mistake of relying on simple benchmarks, which suggested that Node might perform on par with some compiled languages, and then assuming I'd see similar results in my own application. However, my application consisted of several thousand lines of JavaScript (excluding node modules), and wasn't JIT compiled to the same extent.

As a rule of thumb, I'd say that if Node is performing close to a compiled language, it indicates that the JIT compiler has done an excellent job, and you're likely comparing machine code with machine code. However, if performance is critical for the type of application you're building and you require more predictable results, I would honestly suggest that you at least consider the use of another language.

However, if I/O is the primary bottleneck for you, and you aren't performing numerous transformations that excessively fill the heap, Node can be almost as effective a choice as any other technology in terms of performance.

Now that we've established Node's propensity to utilize the heap more than some other technologies, let's dive even deeper into that particular memory segment.

Node/V8 Heap

As I mentioned earlier, V8 is the JavaScript engine powering Node. However, V8 is also used by Google Chrome. When executing, it adheres to the memory layout and management policies of the host process. In the context of Node, the Node runtime oversees memory management, and V8's allocations occur within the memory segments managed by Node.

Thus, the V8 engine gains access to the heap segments created by Node and further divides the heap into three distinct parts.

The first two parts of the heap are known as the new space and the old space, designated for storing objects, strings, and other data types. This division is made for optimization purposes.

Initially, when an object is created, it's placed in the new space segment. Being smaller, the new space facilitates faster garbage collection runs due to having less data to scan.

In fact, the new space is further divided into two areas: the nursery and the intermediate. Objects are first placed in the nursery. If they survive a garbage collection run, they move to the intermediate section, and if they survive yet another run, they are then moved to the old space.

The old space is much larger and, depending on the machine, can grow to several gigabytes. While allocations in this space are still fast, the garbage collection runs are much slower and occur less frequently.

By keeping the new space small, Node can perform cleanup operations quickly. It may not be as efficient as merely moving a pointer like the stack, but it's still fast.

This division is also important because the speed of the garbage collector's traversal isn't the only thing that can slow down your applications; memory fragmentation can also play a significant role.

Let's revisit our earlier analogy of RAM as a series of boxes and imagine these boxes are part of our program's heap:

┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│      01001011       │ │      11001011       │ │      01001111       │ │      00001001       │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘ └─────────────────────┘
           0                       1                       2                       3

Now, suppose boxes 0 and 2 are cleared out by the garbage collector because they're no longer reachable from any roots:

┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│      00000000       │ │      11001011       │ │      00000000       │ │      00001001       │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘ └─────────────────────┘
           0                       1                       2                       3

This creates a problem: we can no longer allocate two contiguous bytes. Remember, we prefer not to spread data too widely, as having related values in close proximity speeds up our programs. This is because the CPU fetches everything it needs in one go when it retrieves a cache line.

Thus, it's a clever strategy that the V8 engine only moves long-lived objects to the larger old space. If this space were shared with many short-lived objects, it would necessitate much more frequent reshuffling of data, which would result in worse performance.

Now that we've covered the first two divisions that the V8 engine makes of the heap space, let's discuss the third, known as the code space.

The code space is used to store the generated machine code. Other languages that compile to a binary do not store the machine code on the heap; instead, it gets stored in the text segment. However, since the machine code in Node is generated by JIT compilation at runtime, the amount of memory needed is unpredictable. It may need to grow, or be discarded if the engine decides to deoptimize a function. Therefore, the V8 engine reserves a part of the heap for the compiled code.

Memory leaks

Up until now, we've delved into a lot of memory theory. I'd like to shift our focus to something more practical, and share my experiences with identifying and resolving memory leaks.

A memory leak occurs when an object on the heap remains accessible through some unintended reference. Since heap deallocations are automated, the objects will persist for as long as the garbage collector is able to reach them during its traversal.
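
To make this concrete, here is a minimal sketch of what such an unintended reference can look like. The handler and fetchData are hypothetical; the point is the module-level array:

// A module-level cache that is only ever written to. Every response
// stays reachable from `cache`, so the garbage collector can never
// reclaim it, and the heap grows with every request.
const cache = []

async function handleRequest(req) {
  const data = await fetchData(req) // hypothetical data fetch
  cache.push(data) // unintended long-lived reference: the leak
  return data
}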

I've encountered memory leaks of varying complexity. Some were straightforward to detect and fix, while others were very difficult. Next, I'll share some practical tips and tricks that I've found useful.

The simplest issues to resolve are those where you can quickly reproduce an Allocation failed - JavaScript heap out of memory error.

However, in my experience, it's rarely the case that the errors are that easy to reproduce. I've dealt with leaks that occurred gradually. For instance, there was a case where a reference was inadvertently retained due to the error handling of specific HTTP codes. The server would take several days to crash, often with new deployments occurring well before any signs of trouble.

So, if the server never crashes, how do you recognize a leak? The most obvious method is to monitor your application's memory metrics. Another indicator is a gradual degradation in performance. As the old space expands, the V8 engine is going to trigger slower, more resource-intensive garbage collector runs. The runs will be triggered more frequently and consume more and more CPU cycles, which is going to impact your application's performance.
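
If you don't already collect such metrics, Node's built-in process.memoryUsage() is a simple way to start. A sketch of logging heap usage periodically (the one-minute interval is arbitrary):

// Log heap usage once a minute; these numbers can be shipped to
// whatever metrics system you use.
setInterval(() => {
  const { rss, heapTotal, heapUsed } = process.memoryUsage()
  console.log(`rss=${rss} heapTotal=${heapTotal} heapUsed=${heapUsed}`)
}, 60_000)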

The most effective way to confirm a slow-brewing memory issue like this is to run your application locally with the --trace-gc flag. By continuously sending realistic requests to your local server, you can generate logs that look like this:

[42662:0x128008000] 38 ms: Scavenge 6.2 (6.3) -> 5.5 (7.3) MB, 0.25 / 0.00 ms  (average mu = 1.000, current mu = 1.000) allocation failure;

There is a lot of information in this single line of text, so let's dissect it piece by piece.

38 ms is the time since the process started (not the duration of the garbage collection itself). 6.2 is the amount of heap space used before the garbage collector ran, in MB, and (6.3) is the total heap size before the run. 5.5 is the amount of heap used after the run, and (7.3) is the total heap size after the run.

0.25 / 0.00 ms is the time spent in the garbage collection itself, in milliseconds, and (average mu = 1.000, current mu = 1.000) reports mutator utilization, i.e. the share of time our code, rather than the garbage collector, gets to run. Finally, allocation failure is the reason for running the GC. The term may sound alarming, but it simply means that V8 has allocated a significant amount of memory in the new space, triggering a garbage collection run to either deallocate some objects or promote them to the old space.

You can find the actual print statement in V8's source code.

I've saved an important detail for last: the term Scavenge. Scavenge refers to the algorithm that performs garbage collection runs in the new space.

It's beneficial to us that the algorithm's name is explicitly printed in these logs, because we're going to ignore all Scavenges, since they trigger frequently, and instead focus on the Mark-sweep runs, which are the garbage collection processes occurring in the old space.
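
Since the algorithm name appears on every line, filtering the logs is straightforward. Assuming a Unix-like shell and an entry point called server.js:

node --trace-gc server.js | grep Mark-sweep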

In fact, before encountering the dreaded Allocation failed - JavaScript heap out of memory error, you'll typically see a log entry stating <--- Last few GCs --->, followed by several Mark-sweep statements. This indicates that V8 was desperately trying to free up some memory before it crashed.

The Mark-sweep algorithm, as we've discussed, involves two phases. The first phase marks objects so the garbage collector knows which ones to delete. The second phase, sweep, deallocates memory based on those markings.

Between these two phases, there's another important step. After the marking is complete, the garbage collector identifies contiguous gaps left by unreachable objects and adds them to a free-list.

These free-lists improve the performance of future allocations. You see, when V8 needs to allocate memory, it consults these lists to find an appropriately sized chunk, eliminating the need to scan the entire heap. This approach helps to significantly reduce memory fragmentation. In fact, the garbage collector will only move objects around (a process known as compaction) if the memory pages are highly fragmented. From my experience, it's quite rare to see log statements indicating that compaction from Node's garbage collector took place.
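
As a conceptual sketch only (not V8's actual implementation, and bumpAllocate is a hypothetical fallback), a free-list allocator works roughly like this:

// Conceptual free-list: map chunk sizes to the gaps left by swept objects.
const freeList = new Map() // size -> array of addresses

function allocate(size) {
  const gaps = freeList.get(size)
  if (gaps && gaps.length > 0) {
    return gaps.pop() // reuse a gap instead of scanning the heap
  }
  return bumpAllocate(size) // hypothetical fallback: take fresh heap space
}

function free(address, size) {
  if (!freeList.has(size)) freeList.set(size, [])
  freeList.get(size).push(address) // record the gap for future reuse
}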

Seeing Mark-sweep statements in your logs is normal and shouldn't be a cause for immediate concern. You should only start to worry if you notice that the intervals between these statements are getting shorter while the time spent on garbage collection is increasing. Clearly, this is a horrible combination for your application's performance: garbage collection runs become more frequent and take longer.

Suppose you've now been able to confirm the presence of a memory leak. How should you go about locating the problematic code? My approach varies depending on how slow the leak is.

If it's noticeable fairly quickly, I typically launch the application with a reduced old space size by setting --max-old-space-size to a low value. This adjustment makes the application crash faster. In conjunction with this, I use the --heapsnapshot-near-heap-limit=1 flag to capture a snapshot of the heap just before it crashes.
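
Assuming an entry point called server.js and an arbitrary 128 MB limit, the invocation could look like this:

node --max-old-space-size=128 --heapsnapshot-near-heap-limit=1 server.js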

We can then use the developer tools in Google Chrome to load and inspect the snapshot:

[Image: a heap snapshot loaded in Chrome DevTools]

If the memory leak is so gradual that significantly reducing the old space size is required to force a crash, pinpointing the actual issue can be challenging. In such cases, the app might crash from normal usage, and the leaking allocations might not even have occurred yet.

In this scenario, I might leave my computer running overnight with only a moderate reduction in heap size. If it crashes, I can analyze the snapshot in the morning, looking for either one exceptionally large object or numerous small ones cumulatively exceeding the allocated memory.

If this approach doesn't yield results, or if I fail to reproduce the issue locally, I might add a temporary endpoint to my server. This endpoint, accessible only within my application's VPC and with proper authentication, triggers some code that takes a snapshot of the server's heap and uploads it to some storage like S3. I might take a few of these snapshots over a couple of days before I try to analyze them.
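
A minimal sketch of what such an endpoint could look like, assuming an Express-style app; authenticate and uploadToS3 are hypothetical placeholders for your own middleware and storage client:

import { writeHeapSnapshot } from 'node:v8'

// Hypothetical debug endpoint. writeHeapSnapshot blocks the event loop
// and temporarily needs roughly as much memory as the heap itself, so
// keep it behind authentication and call it sparingly.
app.post('/debug/heap-snapshot', authenticate, async (req, res) => {
  const filename = writeHeapSnapshot() // writes a .heapsnapshot file to disk
  await uploadToS3(filename)
  res.json({ filename })
})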

I then, once again, load these snapshots into Chrome and choose the comparison option like this:

[Image: the Comparison option in the Chrome DevTools dropdown]

[Image: comparing a heap snapshot to the previous snapshot]

This allows you to see how the heap has changed between different snapshots.

The next thing I like to do is sort based on Size Delta:

[Image: snapshot entries ordered by Size Delta]

This is, of course, a contrived example, but should this code have contained a memory leak, we would expect to see an increasing delta over time.

This brings us to the end of this post. I hope you have found the information useful!

The end

I usually tweet something when I've finished writing a new post. You can find me on Twitter.
