Thinking about etcd lease
This article first talks about a wrong use case of etcd lease. Then, based on that case, I will dig into the design principles of etcd lease and some scenarios for using it.
A wrong use case with etcd lease
There are two roles in the following scenario: the client and the coordinator. Multiple clients may exist at the same time, and each has a unique ID. When a new client starts, it grants a new lease from etcd and puts a key corresponding to its client ID with the lease attached. The client must call KeepAlive on its lease periodically to prevent the lease from timing out. Once the lease times out and is deleted by the etcd server, the client becomes invalid and must not access the etcd resources anymore. The coordinator watches the client ID keys. When a new client registers, the coordinator allocates a new resource/task to it, which can be represented by a key value related to the client ID. When it detects that a client ID key has been deleted (which means the client's lease has timed out), it recycles the resources allocated to this client.
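To make the coordinator's role concrete, here is a minimal sketch of how it could react to watch events on client keys. It runs without etcd: the event type is a simplified stand-in for an etcd watch event, and the /clients/ and /tasks/ key layout is a hypothetical choice for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical key layout, chosen only for this sketch:
//   /clients/<id> is put by a client with its lease attached;
//   /tasks/<id> is the resource the coordinator allocates to that client.
const (
	clientPrefix = "/clients/"
	taskPrefix   = "/tasks/"
)

// event is a simplified stand-in for an etcd watch event on a client key.
type event struct {
	deleted bool   // true for a DELETE (e.g. the client's lease expired), false for a PUT
	key     string // e.g. "/clients/c1"
}

// coordinator records the task key allocated to each client ID.
type coordinator struct {
	tasks map[string]string
}

// handle allocates a task when a client key appears and recycles it when the
// key is deleted. Events on other keys are ignored.
func (c *coordinator) handle(ev event) {
	if !strings.HasPrefix(ev.key, clientPrefix) {
		return
	}
	id := strings.TrimPrefix(ev.key, clientPrefix)
	if ev.deleted {
		delete(c.tasks, id) // the client's lease timed out: recycle its resource
		return
	}
	if _, ok := c.tasks[id]; !ok {
		c.tasks[id] = taskPrefix + id // a new client registered: allocate a resource
	}
}

func main() {
	c := &coordinator{tasks: map[string]string{}}
	c.handle(event{key: "/clients/c1"})
	c.handle(event{key: "/clients/c2"})
	c.handle(event{deleted: true, key: "/clients/c1"})
	fmt.Println(c.tasks) // map[c2:/tasks/c2]
}
```

A real coordinator would receive these events from a watch on the client key prefix and write or delete the task keys in etcd; only the bookkeeping is shown here.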
Here we use an etcd session to maintain the client lease and its keepalive. A simple work model of the client is as follows:
```go
session, err := concurrency.NewSession(etcdCli, concurrency.WithTTL(10)) // 10s lease, kept alive in the background
```
In the above code, session.Done() is used to check lease liveness. However, this is not a strict liveness guarantee for resource access, for two reasons:
- After the client receives a new update from the watch channel, its lease may time out while the update is being processed, which means the client can still access the resource after it has become invalid.
- session.Done() for a lease is not triggered in real time: when the lease times out and is revoked by the etcd server, the session.Done() channel may not fire immediately. This is because session.Done() is only notified after the etcd client sends a new keepalive request, so there can be a window of up to 1/3 of the session TTL during which session.Done() has not yet been notified.
The liveness guarantee for resource access can be achieved simply with an etcd Txn. Since a key is bound to the client lease, the client can use this key to guarantee that the lease has not timed out while performing other etcd key-value operations: each operation is wrapped in a Txn that executes only if that key still exists. The main logic can be as follows:
```go
var ErrLeaseTimeout = errors.New("lease associated key is deleted") // returned when the guard key is gone
```
Dig into the implementation principle of etcd lease
In this part I will talk about the implementation principle of etcd lease based on the code at tag v3.4.14. Basically, each etcd server runs a lease manager which implements the Lessor interface. Most lease management goes through raft to keep lease information consistent among multiple etcd servers. Take the lease grant operation as an example. When an etcd server receives a LeaseGrantRequest, the gRPC request is processed by the LeaseGrant method of the lease server, which returns a LeaseGrantResponse after processing. While being processed, the LeaseGrantRequest is passed to the LeaseGrant function of EtcdServer/Lessor, which triggers an internal raft request. The raft message is then applied on all servers through the internal raft mechanism. When the LeaseGrant message is applied on each etcd server, the Grant function of its Lessor is finally called.
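The key point of this flow is that a grant becomes a replicated log entry that every member applies deterministically. A toy model of that idea is below; grantOp, memberLessor and cluster are hypothetical types, and raft is reduced to "append to the log, then apply on every member", so there is no leader election, quorum or failure handling.

```go
package main

import "fmt"

// grantOp is a hypothetical replicated log entry carrying a lease grant.
type grantOp struct {
	ID  int64
	TTL int64 // seconds
}

// memberLessor is a toy per-member lease table, not etcd's Lessor.
type memberLessor struct {
	leases map[int64]int64 // lease ID -> TTL in seconds
}

// Grant is deterministic, so members that apply the same entries in the same
// order end up with the same lease table.
func (l *memberLessor) Grant(op grantOp) {
	l.leases[op.ID] = op.TTL
}

// cluster reduces raft to "append to the log, then apply on every member".
type cluster struct {
	log     []grantOp
	members []*memberLessor
}

func newCluster(n int) *cluster {
	c := &cluster{}
	for i := 0; i < n; i++ {
		c.members = append(c.members, &memberLessor{leases: map[int64]int64{}})
	}
	return c
}

// propose loosely plays the role of EtcdServer.LeaseGrant submitting a raft request.
func (c *cluster) propose(op grantOp) {
	c.log = append(c.log, op) // assume the entry commits immediately
	for _, m := range c.members {
		m.Grant(op) // the apply step on each member
	}
}

func main() {
	c := newCluster(3)
	c.propose(grantOp{ID: 1, TTL: 10})
	c.propose(grantOp{ID: 2, TTL: 5})
	for i, m := range c.members {
		fmt.Println("member", i, m.leases)
	}
}
```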
The main event loop of a Lessor contains two periodic jobs, revokeExpiredLeases and checkpointScheduledLeases, both of which run every 500ms.
- revokeExpiredLeases finds all leases past their expiry and sends them to an expired channel to be revoked; this channel is consumed in the etcd server's main loop. Each lease is associated with a LeaseItem, and all lease items are stored in a min-heap ordered by lease expiration time. While reading the code that iterates over the expiration heap, I found an interesting snippet: each time the lessor pops an expired item from the heap, it puts back a new item with the same lease ID, scheduled expiredLeaseRetryInterval later. This patch fixes a bug where, if the receiver of the expired channel failed to revoke a lease, the lease would never be revoked because it could no longer be retrieved from the expiration heap. More details can be found in this PR.
- checkpointScheduledLeases was introduced in etcd 3.4 by this PR, which describes the requirement and mechanism of lease checkpointing in detail. It is designed for the scenario where etcd leadership is transferred: the new leader rebuilds the lease information and inherits the remaining TTL of existing leases instead of auto-renewing them to their full TTL.
Precision of etcd lease
In short, etcd lease precision is at the second level, which is reflected in two aspects:
- When a lease is granted, the TTL is specified in seconds. Besides, etcd enforces a minimum TTL: a requested TTL below it is raised to that minimum.
- Since the etcd server detects lease expiry lazily, by periodic checks rather than a more precise notification mechanism, lease expiry gets extra latency. If we grant a lease with TTL = N seconds and send no keepalive requests for it, the etcd server revokes it somewhere in the window [N, N + delta] seconds, where delta is generally 0.5 but can exceed 0.5 once the time cost of other logic is taken into account. Take a sample program as an example: it grants a new lease with TTL = 5s every 50ms, 20 leases in total, attaches a key to each lease and sends a keepalive request to refresh it, then watches for the key deletions and records how long each lease took to time out. In the test result, revocation took between 5s and 5.6s, as expected.
What’s more, the etcd server has a hard-coded limit on lease revocation: each round of expired-lease revoking revokes at most 500 leases. This can be easily verified with the code snippet. In this scenario lease expiry has even more latency; a test result is as follows:
duration (s) | lease count |
---|---|
5 | 23 |
5.1 | 2 |
5.2 | 40 |
5.3 | 147 |
5.4 | 346 |
5.5 | 29 |
5.7 | 470 |
5.8 | 30 |
6 | 125 |
6.1 | 375 |
6.4 | 1 |
6.5 | 499 |
6.8 | 21 |
6.9 | 479 |
7.2 | 1 |
7.3 | 499 |
7.7 | 500 |
8.1 | 265 |
8.2 | 148 |
In most cases, making a large number of keys expire at the same time is not a good design, and when we use etcd lease we must be aware of its lazy expiration mechanism.
Tolerance of clock drift
Operating systems provide both a “wall clock”, which is subject to changes for clock synchronization, and a “monotonic clock”, which is not. The general rule is that the wall clock is for telling time and the monotonic clock is for measuring time. Is etcd lease reliable if the system’s wall clock is adjusted by an NTP service? The answer is yes: on both the etcd server side and the etcd client side, the lease implementation is reliable because it uses the monotonic clock. Since Go 1.9, time.Now() records a monotonic clock reading, and comparisons and subtractions between two such times use that reading; etcd makes use of this feature to keep its time comparisons safe.
- On the server side, both the code that sets a lease’s expiry time and the expiry checker use monotonic time.
- On the client side, Time.Before() is used to check whether a keepalive request should be sent, which also tolerates clock drift.
Summary
Etcd lease is powerful but has some restrictions. Knowing its underlying principles helps us use it correctly and reasonably.