An Analysis of How C#'s Dictionary Is Implemented

1: Fields used in the implementation

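```csharp
// Simplified outline of the private fields of Dictionary<TKey, TValue>
public class Dictionary<TKey, TValue>
{
    // For each bucket (hash value modulo the array length), the index in entries
    // of that bucket's first element; -1 if the bucket is empty
    private int[] buckets;
    // Array holding the keys and values: C#'s Dictionary organizes its data in arrays
    private Entry[] entries;
    private struct Entry
    {
        public int hashCode;   // hash code of the key (-1 marks a freed slot)
        public int next;       // index of the next entry in the same chain, -1 at the end
        public TKey key;
        public TValue value;
    }
    // Number of slots of entries used so far (the Count property returns count - freeCount)
    private int count;
    // Version stamp of the dictionary. For example, if you remove an entry while
    // enumerating, version changes and the enumerator, which checks it, throws
    // InvalidOperationException. This field is not analyzed further below.
    private int version;
    // Removals leave gaps in entries, and the gaps are chained like a linked list.
    // freeList is the index of one empty slot in entries (the head of that list,
    // -1 if there are no gaps), from which all the other gaps can be reached
    private int freeList;
    // Number of gaps (freed slots) in entries
    private int freeCount;
}
```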
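To make the starting values concrete, here is a minimal sketch of these fields right after initialization with a capacity of 5. It is not the BCL source: a value tuple stands in for the private Entry struct (TKey = int, TValue = string), and the convention that -1 means "no entry" follows the .NET Framework-era implementation this article describes. As far as I know, newer .NET versions encode buckets and the free list differently.

```csharp
using System;

// Sketch only: a value tuple stands in for Dictionary<TKey, TValue>.Entry,
// with TKey = int and TValue = string. Capacity 5 is assumed, as in the example.
const int Capacity = 5;

int[] buckets = new int[Capacity];
Array.Fill(buckets, -1);   // -1: the bucket has no first element yet
var entries = new (int hashCode, int next, int key, string value)[Capacity];
int count = 0;             // slots of entries used so far
int version = 0;           // would change on every modification
int freeList = -1;         // -1: there are no gaps to reuse
int freeCount = 0;         // number of gaps

Console.WriteLine($"buckets = [{string.Join(", ", buckets)}]");
Console.WriteLine($"entries.Length = {entries.Length}, count = {count}, version = {version}");
Console.WriteLine($"freeList = {freeList}, freeCount = {freeCount}");
```

Note that buckets and entries are allocated with the same length, which is why the example below can treat both as having Length 5.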

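2: Worked example

The analysis focuses on adding, removing, and looking up entries.

For example:
Adding
Suppose we store some people's IDs and names, for example these five people: (1000, "Ally"), (1005, "Green"), (1010, "Curry"), (1011, "Tompson"), (1014, "Wiseman").

```csharp
using System.Collections.Generic;

Dictionary<int, string> dict = new Dictionary<int, string>();
dict.Add(1000, "Ally");
dict.Add(1005, "Green");
dict.Add(1010, "Curry");
dict.Add(1011, "Tompson");
dict.Add(1014, "Wiseman");
```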

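Note: to keep the analysis simple, assume that after initialization both entries and buckets have Length 5 (the two arrays always have the same length, but the initial length is not necessarily 5).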
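Every step below starts from the same formula: bucket index = hashCode % buckets.Length. The short program here computes it for the five keys under the same assumption of Length 5. The mask with 0x7FFFFFFF mirrors, to my knowledge, what the .NET Framework source does to keep the hash non-negative; for these positive keys it changes nothing.

```csharp
using System;

// Bucket index = (non-negative hashCode) % buckets.Length, with Length assumed to be 5.
int[] keys = { 1000, 1005, 1010, 1011, 1014 };
const int BucketCount = 5;

var bucketOf = new int[keys.Length];
for (int i = 0; i < keys.Length; i++)
{
    // For int, GetHashCode() returns the value itself; the mask clears the sign bit.
    int hashCode = keys[i].GetHashCode() & 0x7FFFFFFF;
    bucketOf[i] = hashCode % BucketCount;
    Console.WriteLine($"key {keys[i]} -> hashCode {hashCode} -> bucket {bucketOf[i]}");
}
```

Three of the five keys land in bucket 0, which is why the steps below build a chain there.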

  1. dict.Add(1000, "Ally"): the hashCode of key 1000 is 1000 (the implementation gets it from the key comparer; here we assume the hashCode equals the key, which is exactly what Int32.GetHashCode() returns for an int). 1000 % 5 = 0 (5 is buckets.Length), so:
    entries => [{hashCode = 1000, next = -1, key = 1000, value = "Ally"}, {}, {}, {}, {}]
    buckets => [0, -1, -1, -1, -1]
    meaning that the first element whose hashCode leaves remainder 0 is at index 0 in entries.
  2. dict.Add(1005, "Green"): the hashCode of key 1005 is 1005, and 1005 % 5 = 0, so the new element becomes the first element of bucket 0:
    buckets => [1, -1, -1, -1, -1]
    entries => [{hashCode = 1000, next = -1, key = 1000, value = "Ally"}, {hashCode = 1005, next = 0, key = 1005, value = "Green"}, {}, {}, {}]
    Note that the next of 1005 points to the previous first element, at index 0.
  3. dict.Add(1010, "Curry"): the hashCode of key 1010 is 1010, and 1010 % 5 = 0, so the new element again becomes the first element of bucket 0:
    buckets => [2, -1, -1, -1, -1]
    entries => [{hashCode = 1000, next = -1, key = 1000, value = "Ally"}, {hashCode = 1005, next = 0, key = 1005, value = "Green"}, {hashCode = 1010, next = 1, key = 1010, value = "Curry"}, {}, {}]
    All three elements have remainder 0; next links them into a chain, and buckets[0] holds the index of the first one.
  4. dict.Add(1011, "Tompson"): 1011 % 5 = 1, and bucket 1 is still empty, so the new element's next is -1:
    entries => [{hashCode = 1000, next = -1, key = 1000, value = "Ally"}, {hashCode = 1005, next = 0, key = 1005, value = "Green"}, {hashCode = 1010, next = 1, key = 1010, value = "Curry"}, {hashCode = 1011, next = -1, key = 1011, value = "Tompson"}, {}]
    buckets => [2, 3, -1, -1, -1]
  5. dict.Add(1014, "Wiseman"): 1014 % 5 = 4, and bucket 4 is empty, so:
    entries => [{hashCode = 1000, next = -1, key = 1000, value = "Ally"}, {hashCode = 1005, next = 0, key = 1005, value = "Green"}, {hashCode = 1010, next = 1, key = 1010, value = "Curry"}, {hashCode = 1011, next = -1, key = 1011, value = "Tompson"}, {hashCode = 1014, next = -1, key = 1014, value = "Wiseman"}]
    buckets => [2, 3, -1, -1, 4]
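The five steps can be replayed with a small program. It is a hypothetical, simplified stand-in for Dictionary.Add, not the BCL code: int keys, string values, a fixed capacity of 5, no resizing and no free-list reuse. Add is a made-up local function, and a value tuple replaces the Entry struct. The printed state should match step 5.

```csharp
using System;

// Hypothetical sketch of the Add steps: fixed capacity 5, no resizing,
// hashCode taken from Int32.GetHashCode(), which returns the key itself.
const int Size = 5;
int[] buckets = new int[Size];
Array.Fill(buckets, -1);                                     // -1 = empty bucket
var entries = new (int hashCode, int next, int key, string value)[Size];
int count = 0;

void Add(int key, string value)
{
    int hashCode = key.GetHashCode() & 0x7FFFFFFF;
    int bucket = hashCode % buckets.Length;
    // Walk the chain first: adding an existing key is an error, as in Dictionary.Add.
    for (int i = buckets[bucket]; i >= 0; i = entries[i].next)
        if (entries[i].hashCode == hashCode && entries[i].key == key)
            throw new ArgumentException($"duplicate key {key}");
    if (count == entries.Length)
        throw new InvalidOperationException("full; resizing is not part of this sketch");
    int index = count++;                                      // next unused slot
    entries[index] = (hashCode, buckets[bucket], key, value); // new entry points at the old head
    buckets[bucket] = index;                                  // and becomes the new head
}

Add(1000, "Ally");
Add(1005, "Green");
Add(1010, "Curry");
Add(1011, "Tompson");
Add(1014, "Wiseman");

Console.WriteLine($"buckets = [{string.Join(", ", buckets)}]");
foreach (var e in entries)
    Console.WriteLine($"{{hashCode = {e.hashCode}, next = {e.next}, key = {e.key}, value = {e.value}}}");
```

Inserting at the head of the chain keeps Add cheap: only buckets[bucket] and the new entry's next are written, no matter how long the chain already is.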

Lookup

```csharp
bool b1 = dict.ContainsKey(1010);
bool b2 = dict.ContainsKey(1040);
```

  1. bool b1 = dict.ContainsKey(1010): the hashCode of key 1010 is 1010, and 1010 % 5 = 0. Read buckets[0], which is 2, and check entries[2]: {hashCode = 1010, next = 1, key = 1010, value = "Curry"}. This is the key we want, so return true.
  2. bool b2 = dict.ContainsKey(1040): the hashCode of key 1040 is 1040, and 1040 % 5 = 0. Read buckets[0], which is 2, and check entries[2]: {hashCode = 1010, next = 1, key = 1010, value = "Curry"}. It does not match, but its next is 1, so check entries[1]: {hashCode = 1005, next = 0, key = 1005, value = "Green"}. No match either, but its next is 0, so check entries[0]: {hashCode = 1000, next = -1, key = 1000, value = "Ally"}. No match, and its next is -1, which marks the end of the chain: the key is not present, so return false.
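The lookup path can be sketched the same way. As far as I know, the .NET Framework's ContainsKey calls a private FindEntry and checks for a result of at least 0; the FindEntry below is a hypothetical local function with an extra visited list so the probe sequence can be printed. Comparing the stored hashCode before the key is a cheap pre-filter before the (possibly expensive) key comparison.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the lookup path. Same assumptions as before:
// int keys, capacity 5, hashCode == key.
const int Size = 5;
int[] buckets = new int[Size];
Array.Fill(buckets, -1);
var entries = new (int hashCode, int next, int key, string value)[Size];
int count = 0;

void Add(int key, string value)   // simplified: no duplicate check, no resize
{
    int hashCode = key.GetHashCode() & 0x7FFFFFFF;
    int bucket = hashCode % buckets.Length;
    entries[count] = (hashCode, buckets[bucket], key, value);
    buckets[bucket] = count++;
}

// Returns the entries index of key, or -1; records every index it inspects.
int FindEntry(int key, List<int> visited)
{
    int hashCode = key.GetHashCode() & 0x7FFFFFFF;
    for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next)
    {
        visited.Add(i);
        if (entries[i].hashCode == hashCode && entries[i].key == key) return i;
    }
    return -1;   // reached next == -1 without a match
}

Add(1000, "Ally"); Add(1005, "Green"); Add(1010, "Curry");
Add(1011, "Tompson"); Add(1014, "Wiseman");

var path1010 = new List<int>();
var path1040 = new List<int>();
bool b1 = FindEntry(1010, path1010) >= 0;
bool b2 = FindEntry(1040, path1040) >= 0;
Console.WriteLine($"b1 = {b1}, visited [{string.Join(", ", path1010)}]");
Console.WriteLine($"b2 = {b2}, visited [{string.Join(", ", path1040)}]");
```

The miss for 1040 has to visit the whole bucket-0 chain (indices 2, 1, 0), which is the cost the text describes for buckets with many colliding keys.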

Removal

```csharp
dict.Remove(1005);
dict.Remove(1010);
```

  1. dict.Remove(1005): the hashCode of key 1005 is 1005, and 1005 % 5 = 0. Read buckets[0], which is 2, and initialize Index = -1 (Index tracks the previous element in the chain). Check entries[2]: {hashCode = 1010, next = 1, key = 1010, value = "Curry"}. It does not match, but its next is 1, so set Index = 2 and check entries[1]: {hashCode = 1005, next = 0, key = 1005, value = "Green"}. This is the element; its next is 0, so set entries[Index].next, that is entries[2].next, to 0, which unlinks it from the chain. Then reset the element: hashCode = -1, next = freeList (still -1 at this point), and key and value set to their defaults (0 and null). After the operation:
    entries => [{hashCode = 1000, next = -1, key = 1000, value = "Ally"}, {hashCode = -1, next = -1, key = 0, value = null}, {hashCode = 1010, next = 0, key = 1010, value = "Curry"}, {hashCode = 1011, next = -1, key = 1011, value = "Tompson"}, {hashCode = 1014, next = -1, key = 1014, value = "Wiseman"}]
    buckets => [2, 3, -1, -1, 4]
    Now freeList comes into play: freeList = 1, the index of the gap, and freeCount = 1.
  2. dict.Remove(1010): the hashCode of key 1010 is 1010, and 1010 % 5 = 0. Read buckets[0], which is 2, and initialize Index = -1. Check entries[2]: {hashCode = 1010, next = 0, key = 1010, value = "Curry"} (its next was changed to 0 by the previous removal). This is the element; its next is 0 and Index is still -1, which means it is the first element of the bucket, so its next is written directly into buckets[0]. The slot is then reset, this time with next = freeList = 1. After the operation:
    entries => [{hashCode = 1000, next = -1, key = 1000, value = "Ally"}, {hashCode = -1, next = -1, key = 0, value = null}, {hashCode = -1, next = 1, key = 0, value = null}, {hashCode = 1011, next = -1, key = 1011, value = "Tompson"}, {hashCode = 1014, next = -1, key = 1014, value = "Wiseman"}]
    buckets => [0, 3, -1, -1, 4], freeList = 2, freeCount = 2
    The gaps now form a linked list as well: its head is index 2, and the element at index 2 has next = 1, which chains the gaps together.
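Both removals, and the way a later Add reuses the gaps, can be checked with the sketch below. It is a hypothetical simplification (int keys, string values, fixed capacity 5, no resize) of the .NET Framework-style scheme walked through above; Add, Remove and State are made-up local functions, the variable last plays the role of Index in the text, and the keys 1020 and 1003 with their values are invented purely to show that the next two additions fill slot 2 and then slot 1.

```csharp
using System;

// Hypothetical sketch of Remove and of free-list reuse on Add
// (-1 = empty bucket / end of chain / end of free list).
const int Size = 5;
int[] buckets = new int[Size];
Array.Fill(buckets, -1);
var entries = new (int hashCode, int next, int key, string value)[Size];
int count = 0, freeList = -1, freeCount = 0;

void Add(int key, string value)   // simplified: no duplicate check, no resize
{
    int hashCode = key.GetHashCode() & 0x7FFFFFFF;
    int bucket = hashCode % buckets.Length;
    int index;
    if (freeCount > 0)
    {
        index = freeList;               // reuse the most recently freed slot
        freeList = entries[index].next; // the free list continues through next
        freeCount--;
    }
    else
    {
        index = count++;                // otherwise take the next unused slot
    }
    entries[index] = (hashCode, buckets[bucket], key, value);
    buckets[bucket] = index;
}

bool Remove(int key)
{
    int hashCode = key.GetHashCode() & 0x7FFFFFFF;
    int bucket = hashCode % buckets.Length;
    int last = -1;                      // "Index" in the text: previous element in the chain
    for (int i = buckets[bucket]; i >= 0; last = i, i = entries[i].next)
    {
        if (entries[i].hashCode != hashCode || entries[i].key != key) continue;
        if (last < 0) buckets[bucket] = entries[i].next;   // removing the chain head
        else entries[last].next = entries[i].next;         // bypass the removed element
        entries[i] = (-1, freeList, 0, null);              // mark free, push onto the free list
        freeList = i;
        freeCount++;
        return true;
    }
    return false;
}

string State()
{
    int[] nexts = Array.ConvertAll(entries, e => e.next);
    return $"buckets=[{string.Join(",", buckets)}] next=[{string.Join(",", nexts)}] freeList={freeList} freeCount={freeCount}";
}

Add(1000, "Ally"); Add(1005, "Green"); Add(1010, "Curry");
Add(1011, "Tompson"); Add(1014, "Wiseman");

Remove(1005);
Remove(1010);
string afterRemove = State();
Console.WriteLine(afterRemove);

// Hypothetical extra keys: the gaps are reused, slot 2 first, then slot 1.
Add(1020, "New1");
Add(1003, "New2");
string afterReuse = State();
Console.WriteLine(afterReuse);
```

Because freed slots are pushed onto the front of the free list, the most recently freed slot is the first one reused, and count does not grow while gaps remain.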

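That is essentially how adding, looking up, and removing work. C#'s Dictionary is a hash table: elements whose hashes map to the same bucket are chained together in a linked list (through the next field), and a lookup walks that list from its head. The gaps that removals leave in the entries array are likewise chained into a linked list and reused by later additions.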
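(My impression is that it is probably not as efficient as a structure such as a red-black tree: once the arrays are full, growing them is slow, because the arrays are copied and every element is re-bucketed, and lookups are slow when many keys share the same bucket.)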
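To make the cost of growing concrete, here is a hypothetical sketch of what a resize involves: allocate larger arrays, copy the occupied entries, and rebuild every bucket chain, which is O(n) work for the Add that triggers it. The new size is the smallest prime at least twice the old count, which as far as I know approximates what the .NET Framework does through an internal helper; NextPrimeAtLeast, IsPrime, Resize and Add are made-up names for this sketch, and the key 1015 is invented.

```csharp
using System;

// Hypothetical sketch of Resize with the same tuple-based layout as before:
// int keys, string values, -1 meaning "empty" / "end of chain".
int[] buckets = new int[5];
Array.Fill(buckets, -1);
var entries = new (int hashCode, int next, int key, string value)[5];
int count = 0;

static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

static int NextPrimeAtLeast(int n)   // hypothetical helper, not a BCL API
{
    while (!IsPrime(n)) n++;
    return n;
}

void Resize()
{
    int newSize = NextPrimeAtLeast(2 * count);
    int[] newBuckets = new int[newSize];
    Array.Fill(newBuckets, -1);
    var newEntries = new (int hashCode, int next, int key, string value)[newSize];
    Array.Copy(entries, newEntries, count);          // copy the occupied slots
    for (int i = 0; i < count; i++)                  // rebuild every chain for the new length
    {
        int bucket = newEntries[i].hashCode % newSize;
        newEntries[i].next = newBuckets[bucket];
        newBuckets[bucket] = i;
    }
    buckets = newBuckets;
    entries = newEntries;
}

void Add(int key, string value)   // simplified: no duplicate check, no free list
{
    int hashCode = key.GetHashCode() & 0x7FFFFFFF;
    if (count == entries.Length) Resize();           // full: grow before inserting
    int bucket = hashCode % buckets.Length;          // computed after any resize
    entries[count] = (hashCode, buckets[bucket], key, value);
    buckets[bucket] = count++;
}

Add(1000, "Ally"); Add(1005, "Green"); Add(1010, "Curry");
Add(1011, "Tompson"); Add(1014, "Wiseman");
Console.WriteLine($"before: Length = {buckets.Length}, buckets = [{string.Join(", ", buckets)}]");

Add(1015, "New");   // hypothetical sixth key: the arrays are full, so this triggers Resize
Console.WriteLine($"after:  Length = {buckets.Length}, buckets = [{string.Join(", ", buckets)}]");
```

After the resize the three keys that shared bucket 0 are spread over buckets 10, 4 and 9. Since the capacity roughly doubles each time, full copies become rarer as the dictionary grows, so the expensive step is the individual Add that triggers one.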
