Tips about golang memory diagnostics

When diagnosing performance issues in a running system, memory is always a must-check item, since it can affect latency, throughput, and jitter in many ways. This article focuses on memory issues in golang programs and summarizes how to diagnose them with the help of the underlying principles and the toolchain.

Golang memory management

Understanding the principles of golang memory management is a prerequisite for digging into memory problems. Memory management is a broad topic that includes the memory model, memory layout, stack and heap usage, and memory allocation and release; in golang, release is the job of the garbage collector. First of all, this article lists some great resources on these topics.

How to diagnose memory problem

The following memory problems are common. This section discusses how to diagnose golang memory problems, and the next section covers some best practices for writing memory-friendly code in golang.

  • runtime.gcBgMarkWorker consumes a large amount of CPU time. What is actually going on?
  • I have added a memory allocation limit to my program, so why does it still run out of memory (OOM)?
  • Why is memory allocation so slow, sometimes taking nearly one second?
  • Besides memory usage, are there other metrics that can help diagnose memory problems?
  • Once a memory problem is located, what can I do to fix it or improve the situation?

Metrics

The Go runtime provides built-in memory metrics, and it is easy to access them with prometheus/client_golang. Note that before go1.17 these metrics were collected by the runtime.ReadMemStats function, which requires stopping the world, so take that cost into account when deciding whether to use the Go collector. Since go1.17 the collector uses runtime/metrics instead. A detailed description of each metric can be found in prometheus go metrics; this article pays attention to a subset of them.
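For reference, the runtime/metrics interface mentioned above can also be read directly, without Prometheus. The following is a minimal sketch rather than what the Go collector does internally: readUint64Metric is a hypothetical helper, and the three metric names are examples from the runtime/metrics package that may not be available in every Go version.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readUint64Metric (hypothetical helper) reads one uint64-valued metric
// from runtime/metrics. ok is false when the running Go version does not
// support the name or the metric has a different value kind.
func readUint64Metric(name string) (value uint64, ok bool) {
	samples := []metrics.Sample{{Name: name}}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return samples[0].Value.Uint64(), true
}

func main() {
	// Example names only; the supported set depends on the Go version.
	names := []string{
		"/gc/heap/goal:bytes",
		"/memory/classes/heap/objects:bytes",
		"/gc/cycles/total:gc-cycles",
	}
	for _, name := range names {
		if v, ok := readUint64Metric(name); ok {
			fmt.Printf("%-40s %d\n", name, v)
		} else {
			fmt.Printf("%-40s unsupported\n", name)
		}
	}
}
```

The list that follows uses the Prometheus metric names rather than the runtime/metrics names.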

  • process_resident_memory_bytes: the memory obtained from the OS, that is, the amount of memory in bytes that belongs specifically to this process, known as the resident set size (RSS).
  • go_memstats_next_gc_bytes: the target heap size for the end of the next GC cycle.
  • go_memstats_next_gc_bytes / (1 + GOGC / 100): an estimate of the heap size the program is actually using, which is also the memory the program can fully manage. It is derived from the next GC threshold and the GOGC parameter, and it should be almost equal to inuse_space in the pprof heap profile. Between two consecutive GCs this value stays unchanged (because go_memstats_next_gc_bytes doesn’t change during that time), which is why inuse_space and the heap profile details stay unchanged between two GCs.
  • go_memstats_heap_alloc_bytes - go_memstats_next_gc_bytes / (1 + GOGC / 100): approximately the size of garbage memory. By combining the in-use estimate with the garbage estimate, we can deduce whether a memory increase is caused by a real increase in memory usage or by slow GC.
  • go_memstats_last_gc_time_seconds: the Unix timestamp at which the last GC finished. We can use (clamp_max(idelta(go_memstats_last_gc_time_seconds{}[1m]), 1)) > 0 to tell when a GC happens.
  • go_gc_duration_seconds: calls debug.ReadGCStats() with PauseQuantiles set to a slice of length 5, which returns the minimum, 25%, 50%, 75%, and maximum pause times. From this metric we can get the maximum GC stop-the-world (STW) duration, which adds the same amount of latency to the program.
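The two derived estimates in the list above can be reproduced inside a process to sanity-check what the dashboards show. This is a rough sketch under stated assumptions: estimateLiveHeap and estimateGarbage are hypothetical helper names, GOGC is assumed to be the default of 100, and the values come from runtime.ReadMemStats (which briefly stops the world) instead of from Prometheus. The results are approximations only.

```go
package main

import (
	"fmt"
	"runtime"
)

// estimateLiveHeap applies next_gc / (1 + GOGC/100) from the list above.
// A negative gogc means GC is off, so there is no heap-based target and
// the estimate is reported as 0.
func estimateLiveHeap(nextGC uint64, gogc int) uint64 {
	if gogc < 0 {
		return 0
	}
	return uint64(float64(nextGC) / (1 + float64(gogc)/100))
}

// estimateGarbage applies heap_alloc - live estimate, clamped at zero
// because both inputs are approximations.
func estimateGarbage(heapAlloc, live uint64) uint64 {
	if heapAlloc < live {
		return 0
	}
	return heapAlloc - live
}

func main() {
	const gogc = 100 // assumption: the process runs with the default GOGC

	var m runtime.MemStats
	runtime.ReadMemStats(&m) // stops the world briefly

	live := estimateLiveHeap(m.NextGC, gogc)
	garbage := estimateGarbage(m.HeapAlloc, live)
	fmt.Printf("next GC target: %d bytes\n", m.NextGC)
	fmt.Printf("estimated live: %d bytes\n", live)
	fmt.Printf("estimated junk: %d bytes\n", garbage)
}
```

If the estimated live heap grows steadily across GC cycles, real usage is increasing; if the garbage estimate is what grows, collection is falling behind allocation.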

Allocation and free rate

  • irate(go_memstats_alloc_bytes_total): go_memstats_alloc_bytes_total keeps increasing as objects are allocated on the heap and doesn’t decrease when they are freed. Applying irate to it gives the memory allocation throughput (bytes per second).
  • irate((go_memstats_alloc_bytes_total{} - go_memstats_heap_alloc_bytes{})[30s:]): this queries the throughput of freed memory (bytes per second).
  • irate(go_memstats_mallocs_total): go_memstats_mallocs_total counts how many heap objects have been allocated. Applying irate to it gives the rate of allocation operations (ops).
  • irate(go_memstats_frees_total): go_memstats_frees_total counts how many heap objects have been freed. Applying irate to it gives the rate of free operations (ops).
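The same four rates can be approximated in-process from two snapshots of the cumulative counters, which is roughly what irate does with the last two samples. This is an illustrative sketch: the counters and rates types and the readCounters and computeRates helpers are hypothetical names, and the mapping to the Prometheus metrics in the comments is an assumption based on the metric descriptions above.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// counters holds the cumulative values behind the queries above.
type counters struct {
	totalAlloc uint64 // assumed to back go_memstats_alloc_bytes_total
	heapAlloc  uint64 // assumed to back go_memstats_heap_alloc_bytes
	mallocs    uint64 // assumed to back go_memstats_mallocs_total
	frees      uint64 // assumed to back go_memstats_frees_total
}

// rates holds per-second rates derived from two counter snapshots.
type rates struct {
	allocBytes, freeBytes, mallocOps, freeOps float64
}

func readCounters() counters {
	var m runtime.MemStats
	runtime.ReadMemStats(&m) // stops the world briefly
	return counters{m.TotalAlloc, m.HeapAlloc, m.Mallocs, m.Frees}
}

// computeRates divides the counter deltas by the elapsed seconds.
// Cumulative freed bytes are taken as totalAlloc - heapAlloc.
func computeRates(prev, cur counters, seconds float64) rates {
	if seconds <= 0 {
		return rates{}
	}
	freedPrev := float64(prev.totalAlloc - prev.heapAlloc)
	freedCur := float64(cur.totalAlloc - cur.heapAlloc)
	return rates{
		allocBytes: float64(cur.totalAlloc-prev.totalAlloc) / seconds,
		freeBytes:  (freedCur - freedPrev) / seconds,
		mallocOps:  float64(cur.mallocs-prev.mallocs) / seconds,
		freeOps:    float64(cur.frees-prev.frees) / seconds,
	}
}

// sink keeps allocations reachable so the compiler cannot elide them.
var sink [][]byte

func main() {
	prev, start := readCounters(), time.Now()
	for i := 0; i < 1000; i++ {
		sink = append(sink, make([]byte, 1024))
	}
	sink = nil
	runtime.GC() // make the freed memory visible in the counters

	r := computeRates(prev, readCounters(), time.Since(start).Seconds())
	fmt.Printf("alloc: %.0f B/s, free: %.0f B/s, mallocs: %.0f ops, frees: %.0f ops\n",
		r.allocBytes, r.freeBytes, r.mallocOps, r.freeOps)
}
```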

More diagnostic methods

Common memory-related optimizations

This part discusses some common optimizations. GC is not a silver bullet for every scenario, and the golang GC and runtime change with every release, so each of the following optimizations suits specific scenarios.

  • The hacky way: trade memory for CPU usage and GC performance by configuring GOGC. A real case of GOGC optimization is discussed in How We Saved 70K Cores Across 30 Mission-Critical Services (Large-Scale, Semi-Automated Go GC Tuning @Uber). Basically, this approach adjusts GOGC to an appropriate value based on the hard memory limit. It suits scenarios with a high memory/CPU ratio, or where the live_dataset is much smaller than the hard memory limit; in such scenarios, increasing GOGC makes GC happen less frequently. The golang doc contains a proof of why increasing GOGC decreases CPU cost. Note that although increasing GOGC can decrease CPU cost, it doesn’t mean the larger the better, since the golang GC has three trigger conditions: gcTriggerHeap, gcTriggerTime and gcTriggerCycle. If GOGC is set to a large enough number (according to the hard memory limit), every GC is triggered by gcTriggerTime (2 minutes by default) instead of gcTriggerHeap, so the golang process never consumes as much memory as live_dataset * (1+GOGC/100) and a lot of memory goes unused.
  • Use a go ballast to reduce GC frequency. This technique originated at Twitch and is described in the article Go memory ballast: How I learnt to stop worrying and love the heap; there is also a related discussion in golang/go/issues/23044.
  • Think carefully about whether memory is allocated on the stack or the heap, get comfortable with escape analysis, and follow best practices for passing variables. For example, pass small data by value instead of by pointer: values that do not escape to the heap are reclaimed promptly, which reduces GC pressure. This matters most when the code passing the variables is on a hot path. The article Golang Memory Escape In-Depth Analysis has more details about escape analysis.
  • Avoid common memory leak patterns.
  • Since go1.19 (which has not been released so far), a new runtime/debug function called SetMemoryLimit will be provided. This feature adds a memory limit that gives the Go runtime the information it needs both to respect users’ memory limits and to let them optionally always use that memory in order to cut the cost of garbage collection. It mainly aims to solve the out-of-memory problem, but note that OOM can still happen when a memory limit is set. The tracking issue for this feature is golang/go/issues/48409.
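The GOGC and memory limit items above can be sketched in code. This is only an illustration, not the tuning logic from the Uber article: gogcForLimit is a hypothetical helper that solves live_dataset * (1+GOGC/100) = limit for GOGC, the clamp bounds minGOGC and maxGOGC are arbitrary, and the live data set and limit sizes are made-up numbers. The call to debug.SetMemoryLimit assumes a Go release that includes it (go1.19 or later).

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// Arbitrary clamp bounds chosen for illustration only.
const (
	minGOGC = 25
	maxGOGC = 500
)

// gogcForLimit (hypothetical helper) solves
// live * (1 + GOGC/100) = limit for GOGC and clamps the result.
func gogcForLimit(liveBytes, limitBytes uint64) int {
	if liveBytes == 0 {
		return maxGOGC
	}
	if limitBytes <= liveBytes {
		return minGOGC
	}
	gogc := int((float64(limitBytes)/float64(liveBytes) - 1) * 100)
	if gogc < minGOGC {
		return minGOGC
	}
	if gogc > maxGOGC {
		return maxGOGC
	}
	return gogc
}

func main() {
	// Made-up sizes: a 1 GiB live data set inside a 4 GiB hard limit.
	const (
		liveBytes  uint64 = 1 << 30
		limitBytes uint64 = 4 << 30
	)

	gogc := gogcForLimit(liveBytes, limitBytes)
	prev := debug.SetGCPercent(gogc) // same effect as setting the GOGC variable
	fmt.Printf("GOGC changed from %d to %d\n", prev, gogc)

	// Soft limit set a little below the hard limit to leave headroom.
	// It is not a hard cap, so OOM can still happen.
	soft := int64(limitBytes / 10 * 9)
	debug.SetMemoryLimit(soft)
	fmt.Printf("soft memory limit set to %d bytes\n", soft)
}
```

With a very large GOGC the time-based trigger described above may fire before the heap-based one, so measuring actual GC frequency after the change is a good idea.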

Reference