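My GitHub address
The GitHub repository contains the fully commented code; download it to read along more comfortably.
The SCAN command
SCAN is mainly used to browse the keys in the Redis main keyspace. KEYS has a similar effect, but it returns every matching key in a single call.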
typedef struct redisDb {
    // The main keyspace: all data of this DB is stored in this dict
    dict *dict;                 /* The keyspace for this DB */
    // Expire times: the dict key is the key itself, the dict value is its expire time as a UNIX timestamp (in milliseconds)
    dict *expires;              /* Timeout of keys with a timeout set */
    // Used by blocking commands such as BLPOP
    dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP)*/
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    // Keys related to the WATCH command
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
    // Database id
    int id;                     /* Database ID */
    long long avg_ttl;          /* Average TTL, just for stats */
    // Where to resume when an expire cycle did not finish;
    // a dict's table is an array of entries, and this is a position in it
    unsigned long expires_cursor; /* Cursor of the active expire cycle. */
    list *defrag_later;         /* List of key names to attempt to defrag one by one, gradually. */
} redisDb;
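In other words, SCAN retrieves the keys stored in dict.
Syntax: scan $cursor [MATCH $pattern] [COUNT $count] [TYPE $type]
cursor: an integer, and the only required argument. A scan normally starts at 0, and the cursor returned by each call is passed to the next call.
pattern: a glob-style pattern (not a regular expression); wildcards such as * and [] are supported. Optional.
count: how much work to do per call. It is only a hint: COUNT 10 does not mean exactly 10 results come back. Optional; the default is 10.
type: the type of the value stored at each returned key. It is effectively an enum with the values [string,list,set,zset,hash,stream]. Optional.
Example: scan 0 match *test count 10 type string
This starts at cursor 0 and returns keys whose value type is string and whose name ends in test. The count is 10, although the server may actually perform up to 10*10 iteration steps, as the code shows later.
Related to SCAN are ZSCAN, SSCAN and HSCAN.
ZSCAN browses a zset. Unlike SCAN, its first step is to look up the value stored at the given key, which must be a zset; only then does the iteration start.
SSCAN browses a set.
HSCAN browses a hash.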
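SCAN command source walkthrough
First, look at how the SCAN command is declared in the command table:
// SCAN is read-only, returns results in no deterministic order ("random"), and belongs to the keyspace category
// -2 is the arity: at least two arguments, counting the command name; a negative value means optional arguments may follow
{"scan",scanCommand,-2,
 "read-only random @keyspace",
 0,NULL,0,0,0,0,0,0},
/* The SCAN command completely relies on scanGenericCommand. */
void scanCommand(client *c) {
    unsigned long cursor;
    if (parseScanCursorOrReply(c,c->argv[1],&cursor) == C_ERR) return;
    scanGenericCommand(c,NULL,cursor);
}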
/* This command implements SCAN, HSCAN and SSCAN commands.
 * If object 'o' is passed, then it must be a Hash, Set or Zset object, otherwise
 * if 'o' is NULL the command will operate on the dictionary associated with
 * the current database.
 * When 'o' is not NULL the function assumes that the first argument in
 * the client arguments vector is a key so it skips it before iterating
 * in order to parse options.
 * In the case of a Hash object the function returns both the field and value
 * of every element on the Hash. */
void scanGenericCommand(client *c, robj *o, unsigned long cursor) {
    int i, j;
    // list is a doubly linked list
    list *keys = listCreate();
    listNode *node, *nextnode;
    long count = 10;
    sds pat = NULL;
    sds typename = NULL;
    int patlen = 0, use_pattern = 0;
    dict *ht;
    /* Object must be NULL (to iterate keys names), or the type of the object
     * must be Set, Sorted Set, or Hash. */
    // Check that robj is NULL or one of the types below
    serverAssert(o == NULL || o->type == OBJ_SET || o->type == OBJ_HASH ||
                o->type == OBJ_ZSET);
    /* Set i to the first option argument. The previous one is the cursor. */
    // SCAN has two mandatory arguments (command name and cursor);
    // HSCAN, SSCAN and ZSCAN also end up here, with a key as an extra argument
    i = (o == NULL) ? 2 : 3; /* Skip the key argument if needed. */
    /* Step 1: Parse options. */
    // The loop is skipped when i >= argc, i.e. no options were given, as in: scan 1000
    while (i < c->argc) {
        j = c->argc - i;
        if (!strcasecmp(c->argv[i]->ptr, "count") && j >= 2) {
            if (getLongFromObjectOrReply(c, c->argv[i+1], &count, NULL)
                != C_OK)
            {
                goto cleanup;
            }
            if (count < 1) {
                addReply(c,shared.syntaxerr);
                goto cleanup;
            }
            // i+2, because each option is a keyword followed by its value,
            // e.g. count 2
            i += 2;
        } else if (!strcasecmp(c->argv[i]->ptr, "match") && j >= 2) {
            pat = c->argv[i+1]->ptr;
            patlen = sdslen(pat);
            /* The pattern always matches if it is exactly "*", so it is
             * equivalent to disabling it. */
            use_pattern = !(pat[0] == '*' && patlen == 1);
            i += 2;
        // TYPE is only accepted by the SCAN command itself (o == NULL)
        } else if (!strcasecmp(c->argv[i]->ptr, "type") && o == NULL && j >= 2) {
            /* SCAN for a particular type only applies to the db dict */
            typename = c->argv[i+1]->ptr;
            i+= 2;
        } else {
            addReply(c,shared.syntaxerr);
            goto cleanup;
        }
    }
    /* Step 2: Iterate the collection.
     * Note that if the object is encoded with a ziplist, intset, or any other
     * representation that is not a hash table, we are sure that it is also
     * composed of a small number of elements. So to avoid taking state we
     * just return everything inside the object in a single call, setting the
     * cursor to zero to signal the end of the iteration. */
    /* Handle the case of a hash table. */
    ht = NULL;
    if (o == NULL) {
        // The SCAN command takes this branch
        ht = c->db->dict;
    } else if (o->type == OBJ_SET && o->encoding == OBJ_ENCODING_HT) {
        ht = o->ptr;
    } else if (o->type == OBJ_HASH && o->encoding == OBJ_ENCODING_HT) {
        ht = o->ptr;
        count *= 2; /* We return key / value for this type. */
    } else if (o->type == OBJ_ZSET && o->encoding == OBJ_ENCODING_SKIPLIST) {
        zset *zs = o->ptr;
        ht = zs->dict;
        count *= 2; /* We return key / value for this type. */
    }
    // When ht is set (the data is backed by a dict), this branch runs
    if (ht) {
        void *privdata[2];
        /* We set the max number of iterations to ten times the specified
         * COUNT, so if the hash table is in a pathological state (very
         * sparsely populated) we avoid to block too much time at the cost
         * of returning no or very few elements. */
        // The iteration cap is count*10;
        // count defaults to 10
        long maxiterations = count*10;
        /* We pass two pointers to the callback: the list to which it will
         * add new elements, and the object containing the dictionary so that
         * it is possible to fetch more data in a type-dependent way. */
        // Two pointers handed to the callback
        privdata[0] = keys;
        privdata[1] = o;
        do {
            // cursor can be read as the position where the previous call stopped
            cursor = dictScan(ht, cursor, scanCallback, NULL, privdata);
        } while (cursor &&
              maxiterations-- &&
              listLength(keys) < (unsigned long)count);
    } else if (o->type == OBJ_SET) {
        int pos = 0;
        int64_t ll;
        // A small set of integers is stored as an intset instead of a hash table
        while(intsetGet(o->ptr,pos++,&ll))
            listAddNodeTail(keys,createStringObjectFromLongLong(ll));
        cursor = 0;
    } else if (o->type == OBJ_HASH || o->type == OBJ_ZSET) {
        unsigned char *p = ziplistIndex(o->ptr,0);
        unsigned char *vstr;
        unsigned int vlen;
        long long vll;
        // Both zset and hash are stored as a ziplist while they are small
        while(p) {
            ziplistGet(p,&vstr,&vlen,&vll);
            listAddNodeTail(keys,
                (vstr != NULL) ? createStringObject((char*)vstr,vlen) :
                                 createStringObjectFromLongLong(vll));
            p = ziplistNext(o->ptr,p);
        }
        cursor = 0;
    } else {
        serverPanic("Not handled encoding in SCAN.");
    }
    /* Step 3: Filter elements. */
    node = listFirst(keys);
    while (node) {
        robj *kobj = listNodeValue(node);
        // The next node
        nextnode = listNextNode(node);
        int filter = 0;
        /* Filter element if it does not match the pattern. */
        if (!filter && use_pattern) {
            // Is the object sds-encoded?
            if (sdsEncodedObject(kobj)) {
                if (!stringmatchlen(pat, patlen, kobj->ptr, sdslen(kobj->ptr), 0))
                    filter = 1;
            } else {
                // Otherwise treat it as an integer
                char buf[LONG_STR_SIZE];
                int len;
                serverAssert(kobj->encoding == OBJ_ENCODING_INT);
                // Convert the number to a string
                len = ll2string(buf,sizeof(buf),(long)kobj->ptr);
                if (!stringmatchlen(pat, patlen, buf, len, 0)) filter = 1;
            }
        }
        /* Filter an element if it isn't the type we want. */
        // Only the SCAN command has the typename option
        if (!filter && o == NULL && typename){
            // Look up the value for the key; LOOKUP_NOTOUCH means the lookup does not update the key's last access time
            robj* typecheck = lookupKeyReadWithFlags(c->db, kobj, LOOKUP_NOTOUCH);
            char* type = getObjectTypeName(typecheck);
            // strcasecmp returns 0 when the two strings are equal (ignoring case),
            // so filter is left untouched when the type matches
            if (strcasecmp((char*) typename, type)) filter = 1;
        }
        /* Filter element if it is an expired key. */
        if (!filter && o == NULL && expireIfNeeded(c->db, kobj)) filter = 1;
        /* Remove the element and its associated value if needed. */
        if (filter) {
            decrRefCount(kobj);
            listDelNode(keys, node);
        }
        /* If this is a hash or a sorted set, we have a flat list of
         * key-value elements, so if this element was filtered, remove the
         * value, or skip it if it was not filtered: we only match keys. */
        if (o && (o->type == OBJ_ZSET || o->type == OBJ_HASH)) {
            node = nextnode;
            nextnode = listNextNode(node);
            if (filter) {
                kobj = listNodeValue(node);
                decrRefCount(kobj);
                listDelNode(keys, node);
            }
        }
        node = nextnode;
    }
    /* Step 4: Reply to the client. */
    addReplyArrayLen(c, 2);
    addReplyBulkLongLong(c,cursor);
    addReplyArrayLen(c, listLength(keys));
    while ((node = listFirst(keys)) != NULL) {
        robj *kobj = listNodeValue(node);
        addReplyBulk(c, kobj);
        decrRefCount(kobj);
        listDelNode(keys, node);
    }
cleanup:
    listSetFreeMethod(keys,decrRefCountVoid);
    listRelease(keys);
}
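That is the complete flow of the SCAN command. The iteration cap is 10 times COUNT because a step may land on an empty bucket, while a single bucket may also chain several keys (hash collisions). So COUNT bounds the work done per call (the iteration cap is COUNT*10); it is not the number of keys returned.
The TYPE option is supported as of Redis 6.0.
The shared function above also shows that HSCAN, ZSCAN and SSCAN go through the same path. When the structure they traverse is still in a compact encoding (a ziplist, or an intset for sets), all elements are returned in a single call.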
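To make the relationship between COUNT and the iteration cap concrete, here is a minimal sketch of the same loop shape on a toy table. It is not Redis code: toy_scan and chain_len are hypothetical names, and the cursor simply counts upward instead of using the reversed-bit order of dictScan.

```c
/* Toy model, not Redis code: chain_len[i] is the number of keys chained in
 * bucket i, and the table has mask + 1 buckets (a power of two).
 * Simplification: the cursor counts upward and wraps to 0 at the end. */
static unsigned long toy_scan(const int *chain_len, unsigned long mask,
                              unsigned long cursor, long count,
                              long *returned) {
    long maxiterations = count * 10;  /* same cap as in scanGenericCommand */
    long collected = 0;
    do {
        /* One step emits the whole chain of one bucket, never part of it. */
        collected += chain_len[cursor & mask];
        cursor = (cursor + 1) & mask;
    } while (cursor && maxiterations-- && collected < count);
    *returned = collected;
    return cursor;
}
```

With 256 buckets of which only bucket 200 holds a key, a call with count 10 stops at the cap and returns no keys together with a non-zero cursor. With three keys chained in bucket 0 and count 1, the call returns all three keys.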
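The cursor comes back non-zero only when a hash table (dict) is being traversed. So what exactly does the cursor represent? Let's look at the next piece of code.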
/* dictScan() is used to iterate over the elements of a dictionary.
 * Iterating works the following way:
 * // The cursor starts at 0
 * 1) Initially you call the function using a cursor (v) value of 0.
 * // The returned cursor is used for the next call
 * 2) The function performs one step of the iteration, and returns the
 *    new cursor value you must use in the next call.
 * // A returned cursor of 0 means the iteration is finished
 * 3) When the returned cursor is 0, the iteration is complete.
 * // The function guarantees that every element is returned to the client,
 * // although some elements may be returned more than once
 * The function guarantees all elements present in the
 * dictionary get returned between the start and end of the iteration.
 * However it is possible some elements get returned multiple times.
 * // For every element fetched, a callback is invoked: the first argument
 * // is privdata, the second is the fetched entry de
 * For every element returned, the callback argument 'fn' is
 * called with 'privdata' as first argument and the dictionary entry
 * 'de' as second argument.
 * // Now the algorithm itself:
 * // the cursor is incremented starting from the high-order bits
 * // instead of being incremented in the usual way:
 * // reverse the bits, add one, then reverse again
 * The iteration algorithm was designed by Pieter Noordhuis.
 * The main idea is to increment a cursor starting from the higher order
 * bits. That is, instead of incrementing the cursor normally, the bits
 * of the cursor are reversed, then the cursor is incremented, and finally
 * the bits are reversed again.
 * // This strategy is used because the hash table
 * // may be resized while the iteration is in progress
 * This strategy is needed because the hash table may be resized between
 * iteration calls.
 * // The hash table size is a power of two, so an element's position is hash(key) & (size-1),
 * // which is exactly hash(key) modulo size
 * dict.c hash tables are always power of two in size, and they
 * use chaining, so the position of an element in a given table is given
 * by computing the bitwise AND between Hash(key) and SIZE-1
 * (where SIZE-1 is always the mask that is equivalent to taking the rest
 * of the division between the Hash of the key and SIZE).
 * // Example: with this method only the bits of hash(key) covered by the mask
 * // matter; every bit above the mask becomes 0
 * For example if the current hash table size is 16, the mask is
 * (in binary) 1111. The position of a key in the hash table will always be
 * the last four bits of the hash output, and so forth.
 * // Example: the iterator has reached 1100 while the size is only 16,
 * // so the mask is 1111
 * If the hash table grows, elements can go anywhere in one multiple of
 * the old bucket: for example let's say we already iterated with
 * a 4 bit cursor 1100 (the mask is 1111 because hash table size = 16).
 * // If the hash table grows to 64, the new mask is 111111,
 * // and the keys of old bucket 1100 can only move to a bucket of the form ??1100,
 * // that is 001100, 101100, 011100 or 111100.
 * // Those buckets hold only keys that were already visited
 * If the hash table will be resized to 64 elements, then the new mask will
 * be 111111. The new buckets you obtain by substituting in ??1100
 * with either 0 or 1 can be targeted only by keys we already visited
 * when scanning the bucket 1100 in the smaller hash table.
 * // A normal counter goes 0000|0000, 0000|0001, 0000|0010 (the bits right of | are the ones covered by the mask),
 * // whereas here we need 0000|0000, 0000|1000, 0000|0100.
 * // The benefit shows when the table grows and the highest masked bit becomes the 6th bit:
 * // if the cursor is at 00000100, adding 1 from the high end gives 00|100100, so the
 * // high-order buckets created by the growth are not missed
 * By iterating the higher bits first, because of the inverted counter, the
 * cursor does not need to restart if the table size gets bigger. It will
 * continue iterating using cursors without '1100' at the end, and also
 * without any other combination of the final 4 bits already explored.
 * // The comment does not spell out the principle behind the algorithm, but in short:
 * // after a grow or a shrink, avoid revisiting buckets that were already covered
 * Similarly when the table size shrinks over time, for example going from
 * 16 to 8, if a combination of the lower three bits (the mask for size 8
 * is 111) were already completely explored, it would not be visited again
 * because we are sure we tried, for example, both 0111 and 1111 (all the
 * variations of the higher bit) so we don't need to test it again.
 * // If the above is clear, the next paragraph is easy to follow.
 * // First, understand what rehash does.
 * // Whether the table is growing or shrinking, the increment is
 * // driven by the larger table, so entries are reached whether
 * // they still sit in the old table or already in the new one
 * WAIT... YOU HAVE *TWO* TABLES DURING REHASHING!
 * Yes, this is true, but we always iterate the smaller table first, then
 * we test all the expansions of the current cursor into the larger
 * table. For example if the current cursor is 101 and we also have a
 * larger table of size 16, we also test (0)101 and (1)101 inside the larger
 * table. This reduces the problem back to having only one table, where
 * the larger one, if it exists, is just an expansion of the smaller one.
 * This iterator is completely stateless, and this is a huge advantage,
 * including no additional memory used.
 * The disadvantages resulting from this design are:
 * // This design can return duplicates. Why? After a shrink, buckets are merged,
 * // so keys that were already returned get visited again.
 * 1) It is possible we return elements more than once. However this is usually
 *    easy to deal with in the application level.
 * // Each call visits every element chained in the bucket, so that no data is lost after a resize
 * 2) The iterator must return multiple elements per call, as it needs to always
 *    return all the keys chained in a given bucket, and all the expansions, so
 *    we are sure we don't miss keys moving during rehashing.
 * // The reversed cursor is not easy to grasp, but with the notes above it should be clear
 * 3) The reverse cursor is somewhat hard to understand at first, but this
 *    comment is supposed to help.
 */
unsigned long dictScan(dict *d,
                       unsigned long v,
                       dictScanFunction *fn,
                       dictScanBucketFunction* bucketfn,
                       void *privdata)
{
    dictht *t0, *t1;
    const dictEntry *de, *next;
    unsigned long m0, m1;
    if (dictSize(d) == 0) return 0;
    /* Having a safe iterator means no rehashing can happen, see _dictRehashStep.
     * This is needed in case the scan callback tries to do dictFind or alike. */
    // On entering this function
    // the iterator count is incremented;
    // while it is non-zero, no rehash step may run
    d->iterators++;
    // If no rehash is in progress, take the branch below
    if (!dictIsRehashing(d)) {
        // t0 is the hash table
        t0 = &(d->ht[0]);
        // sizemask is size-1
        m0 = t0->sizemask;
        /* Emit entries at cursor */
        // bucketfn may be NULL;
        // it is a callback invoked once per bucket
        if (bucketfn) bucketfn(privdata, &t0->table[v & m0]);
        de = t0->table[v & m0];
        while (de) {
            next = de->next;
            // Invoke the callback here;
            // it appends the entry to a list used to build the reply
            fn(privdata, de);
            de = next;
        }
        /* Set unmasked bits so incrementing the reversed cursor
         * operates on the masked bits */
        // Example: sizemask is 00000111 and v = 0.
        // A sizemask always has the form 00000111, so its complement always has the form 11111000.
        // After v |= ~m0, v is 11111xyz (x, y, z are the bits of v under the mask).
        // rev() turns that into zyx11111, and adding 1 carries through the trailing ones, giving zy(x+1)00000.
        // Reversing again gives 00000(x+1)yz. (If x+1 carries, the carry moves in the -> direction, toward y and z, because the +1 was applied to the reversed value.)
        v |= ~m0;
        /* Increment the reverse cursor */
        v = rev(v);
        v++;
        v = rev(v);
    } else {
        t0 = &d->ht[0];
        t1 = &d->ht[1];
        /* Make sure t0 is the smaller and t1 is the bigger table */
        if (t0->size > t1->size) {
            t0 = &d->ht[1];
            t1 = &d->ht[0];
        }
        m0 = t0->sizemask;
        m1 = t1->sizemask;
        /* Emit entries at cursor */
        if (bucketfn) bucketfn(privdata, &t0->table[v & m0]);
        de = t0->table[v & m0];
        while (de) {
            next = de->next;
            fn(privdata, de);
            de = next;
        }
        /* Iterate over indices in larger table that are the expansion
         * of the index pointed to by the cursor in the smaller table */
        do {
            /* Emit entries at cursor */
            if (bucketfn) bucketfn(privdata, &t1->table[v & m1]);
            de = t1->table[v & m1];
            while (de) {
                next = de->next;
                fn(privdata, de);
                de = next;
            }
            /* Increment the reverse cursor not covered by the smaller mask.*/
            v |= ~m1;
            v = rev(v);
            v++;
            v = rev(v);
            /* Continue while bits covered by mask difference is non-zero */
            // The effect is that every high-bit expansion is visited: for 0100, after growth, 100100, 010100 and 110100 are all reached
        } while (v & (m0 ^ m1));
    }
    /* undo the ++ at the top */
    // Decrement once the step is done, so that set and get can resume incremental rehashing
    d->iterators--;
    return v;
}
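Link: Redis series: the most complete walkthrough of the dictionary.
Let's analyze growth first, using a dict of size 4 as the example. In binary the size is 0100 (leading zeros omitted), the sizemask is 0011, and the buckets are 0000, 0001, 0010 and 0011.
Bucket 0000 has now been traversed and cursor 0010 has been returned to the client (first request).
The client sends a second request, which resumes from 0010.
At this point the dict turns out to have grown: the size is now 1000 and the sizemask is 0111.
Assuming no further resize, the buckets still to be visited are 0010, 0110, 0001, 0101, 0011 and 0111.
Under Redis's rule, bucket 0100 of the grown table is not visited, because it can only hold keys from the already visited bucket 0000. What is the benefit of this?
With a plain ascending cursor, the high-order buckets created by the growth would be traversed as well; when the dict is large, that means returning a large number of elements a second time.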
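The growth example above can be checked with a small sketch. It applies the same three steps as the non-rehashing branch of dictScan (set the unmasked bits, reverse, increment, reverse again); the helper names rev_bits, next_cursor and walk_buckets are made up for this illustration and do not exist in Redis.

```c
#include <limits.h>
#include <stddef.h>

/* Reverse all bits of v, using the same parallel swap as rev() in dict.c. */
static unsigned long rev_bits(unsigned long v) {
    unsigned long s = CHAR_BIT * sizeof(v);
    unsigned long mask = ~0UL;
    while ((s >>= 1) > 0) {
        mask ^= (mask << s);
        v = ((v >> s) & mask) | ((v << s) & ~mask);
    }
    return v;
}

/* Hypothetical helper: advance cursor v for a table whose sizemask is mask. */
static unsigned long next_cursor(unsigned long v, unsigned long mask) {
    v |= ~mask;       /* set the bits above the mask */
    v = rev_bits(v);
    v++;              /* the carry runs through the bits that were set */
    v = rev_bits(v);
    return v;
}

/* Hypothetical helper: record the buckets visited from start until the
 * cursor wraps to 0. Stores at most cap entries and returns how many. */
static size_t walk_buckets(unsigned long start, unsigned long mask,
                           unsigned long *out, size_t cap) {
    size_t n = 0;
    unsigned long v = start;
    do {
        if (n < cap) out[n++] = v & mask;
        v = next_cursor(v, mask);
    } while (v != 0);
    return n;
}
```

Calling walk_buckets(0, 3, ...) yields the order 0000, 0010, 0001, 0011 for the size-4 table. Calling walk_buckets(2, 7, ...) resumes from cursor 0010 in the size-8 table and yields 0010, 0110, 0001, 0101, 0011, 0111, so bucket 0100 is indeed skipped.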
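For the shrink case, take the dict of size 4 again: the first call traversed bucket 0000 and returned 0010 to the client. If the table then shrinks to size 2 (sizemask 0001), cursor 0010 maps to bucket 0, which now also holds the keys of the old bucket 0000, so those keys are returned a second time.
In the rehashing case, the cursor increment is driven by the larger of the two hash tables, so no data is lost whether the rehash is shrinking or growing the table.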
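The rehashing branch can be sketched the same way. The function below is a hypothetical helper, not Redis code: it mirrors only the do/while loop over the larger table and records which larger-table buckets one step touches. The smaller-table bucket for the same step is simply v & m0.

```c
#include <limits.h>
#include <stddef.h>

/* Reverse all bits of v, using the same parallel swap as rev() in dict.c. */
static unsigned long rev_bits(unsigned long v) {
    unsigned long s = CHAR_BIT * sizeof(v);
    unsigned long mask = ~0UL;
    while ((s >>= 1) > 0) {
        mask ^= (mask << s);
        v = ((v >> s) & mask) | ((v << s) & ~mask);
    }
    return v;
}

/* Hypothetical helper: one scan step while two tables exist.
 * m0 is the sizemask of the smaller table, m1 that of the larger one.
 * Stores the larger-table buckets visited in big (at most cap entries),
 * writes their number to *n, and returns the next cursor. */
static unsigned long scan_step_rehashing(unsigned long v,
                                         unsigned long m0, unsigned long m1,
                                         unsigned long *big, size_t cap,
                                         size_t *n) {
    *n = 0;
    do {
        if (*n < cap) big[(*n)++] = v & m1;
        v |= ~m1;
        v = rev_bits(v);
        v++;
        v = rev_bits(v);
    } while (v & (m0 ^ m1));  /* stop once the extra high bits wrap to 0 */
    return v;
}
```

For cursor 101 with a smaller table of size 8 and a larger table of size 16, the step visits (0)101 and (1)101 in the larger table and returns 011, which is also the cursor that follows 101 in the size-8 order.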
/* Function to reverse bits. Algorithm from:
 * http://graphics.stanford.edu/~seander/bithacks.html#ReverseParallel */
static unsigned long rev(unsigned long v) {
    unsigned long s = CHAR_BIT * sizeof(v); // bit size; must be power of 2
    unsigned long mask = ~0UL;
    while ((s >>= 1) > 0) {
        mask ^= (mask << s);
        v = ((v >> s) & mask) | ((v << s) & ~mask);
    }
    return v;
}
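Let's trace it with a single byte. Suppose v is 10001000; reversed, the result should be 00010001.
Before entering the loop: s=8, mask=11111111, v=10001000.
After the first iteration: s=4, mask=00001111, v=10001000. The upper 4 bits and the lower 4 bits have been swapped.
After the second iteration: s=2, mask=00110011, v=00100010. Neighbouring 2-bit pairs have been swapped; in effect the pairs cross over, which takes a moment to picture.
After the third iteration: s=1, mask=01010101, v=00010001. Neighbouring single bits have been swapped, which completes the reversal.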
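The byte-sized trace can be reproduced with an 8-bit variant of rev(). This is only a sketch for checking the steps by hand: rev8 is a hypothetical name, and the casts are there because C promotes uint8_t operands to int before shifting.

```c
#include <stdint.h>

/* 8-bit variant of rev(): the same loop, narrowed to one byte. */
static uint8_t rev8(uint8_t v) {
    unsigned s = 8;       /* bit width; must be a power of two */
    uint8_t mask = 0xFF;
    while ((s >>= 1) > 0) {
        /* mask becomes 00001111, then 00110011, then 01010101 */
        mask ^= (uint8_t)(mask << s);
        /* swap neighbouring groups of s bits */
        v = (uint8_t)(((v >> s) & mask) | ((v << s) & (uint8_t)~mask));
    }
    return v;
}
```

rev8(0x88) returns 0x11, that is 10001000 reversed into 00010001, matching the trace above.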
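SCAN command best practices
Having read the source, we can see that SCAN is still a fairly expensive command. Each call costs O(k), where k depends on COUNT, and a complete scan has to walk the entire keyspace. It also gives no consistency guarantee: keys added while the scan is in progress may or may not be returned, and a shrink can make the same key come back more than once. In addition, one SCAN cannot traverse a whole redis-cluster, since it only sees the keyspace of the node it runs on, nor would the Redis author want it to.
First, then, the best practice is to use HSET instead of SET wherever possible and to group entries of the same kind into one hash. For example, suppose each car type must map to its attributes and there are many car types. The usual layout is a key of the form xxx.car.%s, where %s is the car type, with the car's attributes as the value.
We can turn set xxx.car.toyota economical into hset xxx.car toyota economical,
and browse the entries with hscan xxx.car 0 count $count
HSCAN also works on Redis Cluster. Moving keys of the same kind into their own dicts also lightens the load on the main keyspace. One caveat: not every key can be moved into a hash, because in the Redis version discussed here a single field inside a hash cannot be given its own expiry, so take this into account in the design.
Second, when SCAN really is necessary, keep COUNT small. The larger COUNT is, the longer each call runs and the more it delays other commands, especially in production; a value of at most 100 is recommended. Likewise, avoid running Redis browsing tools against production: most of them browse with SCAN, which is very time-consuming.
This chapter covered how the SCAN command is used and how it works. If you found it useful, please follow, like and share. The next chapter will probably cover how strings are compressed and under what conditions.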