Some best practices with go etcd library
Etcd is a distributed reliable key-value store written by golang and native golang client binding is provided in its official repository, which makes it very convenient as well as robust to write golang code to communicate with etcd server. Here robust means the etcd client should guarantee correctness and high availability under faulty conditions, more details can be found in etcd official client design document. This article summarizes some best practises with etcd client library from the lessons I learned from the production environment. To explain some implementation mechanisms I will also link to etcd source code (mainly based on branch release-3.4).
Consume data from the watch response channel ASAP
Watch works with a bi-directional gRPC stream between watch client and etcd server, the most common way to use watch is as follows, receive keyspace changed result from the watch channel and consume these KV events one by one.
1 | cli, err := clientv3.New(clientv3.Config{ |
Supposing the process throughput for changed KV events is less than the watched keyspace changed frequency, it is easy to find the delay between consumption and keyspace update in etcd, with the help of monitoring system or some data comparison, etc. If we can’t reduce the update frequency of etcd keyspace, then we must ensure the process throughput can match the keyspace update frequency, by either increasing the consume speed of consumer or cache the KV changed events from watch channel and consume data asynchronously.
There exist other ways to use watch, for example, multiple clients change the same key randomly, in an atomic way by using Txn API. Another client wants to subscribe to the update event of this key as soon as possible, but when it knows the key has changed, it doesn’t need to know data for each MVCC version exactly. Such as three clients, each of them updates the key once, and the key has three versions: v1, v2, v3. The subscribe client wants to know the key is updated and queries the data in v3 is enough. The logic for subscribe client can be as follows. It creates a channel to receive keyspace change notification from the watch channel and the outer code consumes this channel continually.
1 | output := make(channel struct{}, 1) |
This is a real production scenario and the subscribe client works well (But in fact it is not accurate, I will explain later), and we can always get the keyspace update notification in time. However we observed a slowly increment with heap memory, and a heap pprof shows large amount of inuse memory with go.etcd.io/etcd/mvcc/mvccpb.(*KeyValue).Unmarshal
and go.etcd.io/etcd/clientv3.(*watchGrpcStream).dispatchEvent
. After simplifying the working model to the above code we find out the root cause is consume throughput of output channel is less then keyspace update frequent, which is the same thing as the first scenario! Since the buffer size of output channel is 1, KV changed events are accumulated in etcd client, and the notification from output channel is out of date, the subscribe client works well is just because the keyspace is keeping updated, but the notification is out of date already. The fix for this scenario is quite simple, we change the notification to a non-blocking way by adding a default branch in select.
1 | select { |
From etcd source code we can know each watcherStream
holds an unlimited WatchResponse
buffer, and the watched result is appended to this buffer no matter how large it is. So if the watched keyspace keeps changing, and the consume speed from the watch channel is slow, large amount of WatchResponse
objects will be accumulated in etcd client. This is the lesson we learn from this case, when using etcd watch, we should consume data from the watch response channel as soon as possible in order to keep the keyspace change notification up to date, and prevent unnecessary memory consumption.
Besides if a watch client just wants to subscribe to the keyspace change notification, but doesn’t care about the content of key and value, is there any OpOption
in etcd client library to ignore key or value. Unfortunately, there is no such option for watch API in golang etcd library currently.
Tips for using a lease Session
Lease is a mechanism for detecting client liveness in etcd. The etcd golang library defines an Lease
interface and provides a KeepAlive
API which will try to keep the given lease alive forever. The KeepAlive
API returns a *LeaseKeepAliveResponse
chan, we can read from this channel to know the latest TTL for given leaseID. The common way to create a new Lease and set keepalive for it is as follows.
1 | // ignore error handling |
Note the response channel size of KeepAlive
has a buffer size of 16, if the channel is not consumed promptly the channel may become full. After channel is full, the lease client will continue sending keepalive requests to the etcd server, but will drop responses until there is capacity on the channel to send more responses.
If we don’t want to maintain the keepalive for a lease manually, the etcd golang library also provides a powerful Leased Session encapsulation. When creating a session via NewSession
API, it grants a new Lease and setup KeepAlive
for the binding client automatically, more details can be found in the etcd source code.
Lease TTL is the most important option when we interact with Lease, it is necessary to know how keepalive works with TTL. The first question is how often will a client send the keepalive request to etcd server. It is easy to find in source code, the lessor uses a time barrier mechanism to determine when to send a keepalive message. For each time a keepalive response is received, the lessor sets the nextKeepAlive
time to 1/3
of keepalive TTL. After each round of keepalive request the sendKeepAliveLoop
will sleep 500ms before the next keepalive round.
There is another question, what will happen when network is not stable between etcd client and etcd server? In this scenario the keepalive request message could be blocked because there is no sufficient flow control to schedule messages with the gRPC transport, and the lease could be timeout, the timeout will be detected in the deadlineLoop
.
Lease TTL should be set carefully, according to the discussion in github, the minimum TTL needs to be larger than the election timeout of etcd server, which is 1s by default, and can be 3-5s usually. What’s more, etcd server has a minimum lease TTL restriction, any requests for shorter TTLs will be extended to the minimum TTL. The value of minimum lease TTL is based on election-timeout
of etcd server, and it equals to math.Ceil(3 * election-timeout / 2)
(ref). For example, if election-timeout
is 3s, the minimum lease TTL is math.Ceil(3 * 3 / 2) = 5
s.
Summary
The etcd source code has good readability, detail comments and full completed test cases, and I have also learned many use tips from the etcd source code. If you also use etcd golang library in your code, I strongly recommend reading the source code of etcd to know exactly how it works and what we need to pay attention to.