Latency is one of the most important metrics of system performance, and different systems have different latency requirements. For example, the read latency of a relational database may need to be under 50ms, the GC latency of a programming language should be under 10ms or even 1ms, and the latency between two microservices in the same data center could be required to stay under 0.2ms. Not every part of a system is latency sensitive, but many components are, and we must be very careful when designing and implementing them.

Many articles have discussed system latency at two levels. At the high level, or macroscopic perspective, they cover the latency of a complex architecture; the latency of interactions between systems, such as HTTP API invocations, database reads and writes, and cache access; and the latency of operations in a programming language, such as memory allocation or function calls. At the low level, or the underlying system, they cover the latency of memory access, IO access, TCP packet transmission, and so on. The following latency table is from the book Systems Performance, and the project napkin-math also provides a table of latency numbers.

Read more »

Recently I ran into a context deadline exceeded error when a gRPC client called DialContext to connect to a gRPC server in a customer’s internal environment. After some investigation I found the root cause: the gRPC client was behind an HTTP proxy, and the proxy had no permission to access the gRPC server. This was routine troubleshooting, but it made me curious about how we can force a program to use an HTTP or SOCKS proxy. In this article I will dive into that question and compare the pros and cons of different solutions.

Ways to enable a proxy

There are many ways to make an application use a proxy. They can be classified into the following three types, and I will discuss each of them briefly.

  • Set an explicit environment variable to activate a transport feature that is built into the application.
  • Use a hook to hijack the application's network calls, without changing the program code.
  • Hijack network packets, for example with the kernel netfilter module, to inspect and modify them.
Read more »

It is well known in Go that when storage is allocated for a new variable and no explicit initialization is provided, the variable is given a default value: the zero value for primitive types, or nil for nullable types such as pointers, functions, and maps. More details can be found in the Go spec. Some of the nullable types can cause a panic when they are accessed without initialization. Some examples are as follows.
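
A minimal sketch of this behavior, not taken from the original article: the helper panics is a hypothetical name introduced here to turn a runtime panic into a boolean, so that several operations on uninitialized variables can be compared side by side.

```go
package main

import "fmt"

// panics runs f and reports whether it panicked.
// It is a hypothetical helper used only for this example.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	var m map[string]int // nil map
	var s []int          // nil slice
	var p *int           // nil pointer
	var f func()         // nil function

	// Reading a nil map is safe and yields the element's zero value.
	fmt.Println("read nil map:", panics(func() { _ = m["k"] }))
	// Writing to a nil map panics.
	fmt.Println("write nil map:", panics(func() { m["k"] = 1 }))
	// Appending to a nil slice is safe; append allocates as needed.
	fmt.Println("append nil slice:", panics(func() { s = append(s, 1) }))
	// Dereferencing a nil pointer panics.
	fmt.Println("deref nil pointer:", panics(func() { _ = *p }))
	// Calling a nil function value panics.
	fmt.Println("call nil func:", panics(func() { f() }))
}
```

Not every nil value is dangerous: the nil map read and the nil slice append succeed, while the map write, the pointer dereference and the function call panic.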

Read more »

I recently read the first part of the book Streaming Systems and learned many high-level concepts and useful tips about streaming systems. Many of the topics the book discusses are very instructive for the TiCDC project that I have been focusing on over the past year. TiCDC is a change data capture system that replicates change data from a distributed NewSQL database to various downstream systems. In this article I will discuss some core concepts that a streaming system should pay attention to, comparing the opinions expressed in Streaming Systems with practical experience from the CDC project I participated in.

Concepts in change data capture system

The book Streaming Systems introduces many concepts that describe a streaming system along different dimensions. I will roughly map these concepts to the change data capture system.

Read more »

This article first describes an incorrect use of the etcd lease. Based on that case, I then dig into the design principles of the etcd lease and some of its usage scenarios.

A wrong use case with etcd lease

There are two roles in the following scenario: the client and the coordinator. More than one client can exist at the same time, and each has a unique ID. When a new client starts, it requests a new lease from etcd and puts a key corresponding to its client ID, with the lease attached as an option. The client must call KeepAlive on its lease periodically to keep the lease from expiring. Once the lease has expired and been deleted by the etcd server, the client becomes invalid and should no longer access etcd resources. The coordinator monitors the client ID keys. When a new client registers, the coordinator allocates a new resource or task to it, represented by a key-value pair associated with the client ID. When the coordinator detects that a client ID key has been deleted (which means the client's lease has expired), it reclaims the resources allocated to that client.
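
To make the lifecycle above concrete, here is a toy, in-memory model of a lease. It is a sketch under loud assumptions: leaseStore and all of its methods are hypothetical names, time is an integer tick so the run is deterministic, and none of this is etcd's real API or implementation. It only mirrors the rules described above: a key lives as long as its lease, and a keepalive that arrives after expiry is too late.

```go
package main

import "fmt"

// leaseStore is a toy in-memory stand-in for etcd. Time is an integer tick
// so the example is deterministic; none of this is etcd's real API.
type leaseStore struct {
	now     int
	nextID  int
	ttl     map[int]int    // lease ID -> TTL in ticks
	expires map[int]int    // lease ID -> tick at which the lease expires
	keys    map[string]int // key -> lease ID it is attached to
}

func newLeaseStore() *leaseStore {
	return &leaseStore{ttl: map[int]int{}, expires: map[int]int{}, keys: map[string]int{}}
}

// grant creates a lease that expires ttl ticks from now.
func (s *leaseStore) grant(ttl int) int {
	s.nextID++
	s.ttl[s.nextID] = ttl
	s.expires[s.nextID] = s.now + ttl
	return s.nextID
}

// put attaches key to a lease; it fails if the lease no longer exists.
func (s *leaseStore) put(key string, lease int) error {
	if _, ok := s.expires[lease]; !ok {
		return fmt.Errorf("lease %d not found", lease)
	}
	s.keys[key] = lease
	return nil
}

// keepAlive renews a live lease and reports whether it still existed.
func (s *leaseStore) keepAlive(lease int) bool {
	if _, ok := s.expires[lease]; !ok {
		return false
	}
	s.expires[lease] = s.now + s.ttl[lease]
	return true
}

// advance moves time forward and returns the keys deleted by lease expiry,
// which is what the coordinator would observe as delete events.
func (s *leaseStore) advance(ticks int) []string {
	s.now += ticks
	var deleted []string
	for key, lease := range s.keys {
		if s.expires[lease] <= s.now {
			deleted = append(deleted, key)
			delete(s.keys, key)
		}
	}
	for lease, at := range s.expires {
		if at <= s.now {
			delete(s.expires, lease)
			delete(s.ttl, lease)
		}
	}
	return deleted
}

func main() {
	store := newLeaseStore()
	lease := store.grant(10)
	_ = store.put("/clients/client-1", lease)

	store.advance(6)
	fmt.Println("keepalive ok:", store.keepAlive(lease)) // renewed at tick 6
	fmt.Println("deleted:", store.advance(6))            // tick 12, lease still alive
	// The client now misses its keepalives, for example because it is stalled.
	fmt.Println("deleted:", store.advance(10))           // tick 22, lease expired at 16
	fmt.Println("keepalive ok:", store.keepAlive(lease)) // too late, lease is gone
}
```

The last two lines are the interesting part: the coordinator sees the client key disappear, while the client only learns about it when its next keepalive fails.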

Here we use an etcd session to maintain the client lease and its keepalive. A simple working model of the client is as follows.

Read more »

Etcd is a distributed, reliable key-value store written in Go, and a native Go client binding is provided in its official repository, which makes it convenient to write robust Go code that communicates with an etcd server. Here, robust means that the etcd client should guarantee correctness and high availability under faulty conditions; more details can be found in the official etcd client design document. This article summarizes some best practices for the etcd client library, based on lessons I learned in production environments. To explain some implementation mechanisms, I will also link to the etcd source code (mainly based on branch release-3.4).

Consume data from the watch response channel ASAP

Watch works over a bi-directional gRPC stream between the watch client and the etcd server. The most common way to use watch is as follows: receive keyspace change results from the watch channel and consume the KV events one by one.
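
The shape of that loop can be sketched without an etcd server. In the example below, a plain Go channel stands in for the watch channel, and the types event and watchResponse as well as the function consume are hypothetical stand-ins, not the etcd client's own types. Only the structure matters: an outer loop over responses and an inner loop over the events in each response.

```go
package main

import "fmt"

// event and watchResponse are hypothetical stand-ins for the etcd client's
// event and watch response types, so the sketch runs without an etcd server.
type event struct {
	Type  string // "PUT" or "DELETE"
	Key   string
	Value string
}

type watchResponse struct {
	Events []event
}

// consume drains the watch channel and applies the events, in order, to a
// local view of the keyspace. It returns when the channel is closed.
func consume(watchCh <-chan watchResponse) map[string]string {
	view := make(map[string]string)
	for resp := range watchCh {
		for _, ev := range resp.Events {
			switch ev.Type {
			case "PUT":
				view[ev.Key] = ev.Value
			case "DELETE":
				delete(view, ev.Key)
			}
		}
	}
	return view
}

func main() {
	// Pre-fill a buffered channel to simulate two watch responses.
	watchCh := make(chan watchResponse, 2)
	watchCh <- watchResponse{Events: []event{
		{Type: "PUT", Key: "/a", Value: "1"},
		{Type: "PUT", Key: "/b", Value: "2"},
	}}
	watchCh <- watchResponse{Events: []event{{Type: "DELETE", Key: "/a"}}}
	close(watchCh)

	fmt.Println(consume(watchCh)) // prints map[/b:2]
}
```

Because the receiver handles events inline, any slow work inside the inner loop delays the next receive from the channel.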

Read more »

This article describes how to write redis loadable modules in Go, and analyzes the pitfalls you may run into when writing modules and using cgo.

What are redis loadable modules

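Loadable modules are the newest feature added to redis, and currently they are only available on the unstable branch. Put simply, the module system is a set of APIs exposed by the redis C code and defined in the header file redismodule.h. An external module includes this header to access all of the API functions, which provide many capabilities, including accessing the redis key space, calling redis commands, and returning data to the client. External modules are loaded and used by the redis server as dynamic libraries; they can be loaded when the redis server starts, or loaded dynamically after startup. More details can be found in the redis module INTRO document.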

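Before this, there were two ways to extend redis: one was to use lua scripts; the other required modifying the redis source code, similar to the approach provided by Kit of Redis Module Tools. The extensibility of lua scripts is limited, and lua calls redis commands through the upper-level redis API, so it cannot directly access the underlying stored data or call the lower-level redis APIs. Modifying the source code is even more of a hack, and it makes it impossible to keep merging with the upstream branch.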

Read more »

I recently used rq, a simple queue processing library. Some of the tasks need a MySQL connection or a redis connection, which led me to some thoughts.


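rq provides two worker models: a fork-based worker model, and a worker model that executes tasks directly in the main thread. A fork-based worker forks a child process before executing a task and runs the task in the child, while the parent process waits for the child to finish and return. Under the fork-based worker model, if the parent process holds a MySQL/redis connection, the child process has the same MySQL/redis connection, because the child inherits the parent's address space and has the same open files, sockets, pipes, and so on. In that case, can this connection be used directly? Let's run a simple test with the following code, using torndb to connect to MySQL and redis-py to connect to redis.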

Read more »

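Sometimes, for network or security reasons, we cannot connect directly to a target host over ssh and instead need to go through a proxy server or a jump host. This article summarizes the various ways to use ssh through a proxy or jump host, and analyzes the basic principles behind them.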

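We assume the following setup. The local host is homepc, which is bound to a public IP. The proxy server or jump host that runs the various proxies is proxy-server, which is bound to a public IP and also to an internal IP (assumed to be 10.0.10.252). The target host to connect to is target-server, which is bound to an internal IP. All user names and login names are apple.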

Read more »

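This article is a continuation of the previous article, "python 拾遗" (Python odds and ends). It continues to collect Python usage tips, along with some details that may be overlooked.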

Note: the following discussion mainly targets Python 2.7; Python 3 content will be covered later.

Read more »