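M-LAG and E-Trunk both implement cross-device link aggregation to remove single points of failure, and most of their features overlap. M-LAG has the better working mode because it supports active-active forwarding, and its design feels like an upgraded VRRP + MSTP aimed at improving network reliability. E-Trunk is built as an extension of LACP; it feels like a link aggregation extension created mainly to make cross-device aggregation possible and then extended further. Personally, I find M-LAG more flexible and convenient.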
M-LAG
M-LAG (cross-device link aggregation) is a high-availability technology that is more flexible than stacking.
DRCP
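Distributed Relay Control Protocol.
The protocol runs over the peer-link and carries the information exchange between the members of the M-LAG group. If the local interface receives no DRCP packet from the peer before the timeout expires, the peer-link is considered down.
The DRCP timeout can be short (3 s, with packets sent every 1 s) or long (90 s, with packets sent every 30 s).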
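The timer relationship above can be sketched in Python. This is only a toy model for reasoning about the timers, not the protocol implementation: `DrcpTimers` and `peer_link_down` are hypothetical names, and only the two timer pairs come from the notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrcpTimers:
    send_interval: float  # seconds between DRCP packets
    timeout: float        # seconds of silence before the peer-link is declared down


# Timer pairs taken from the notes above.
SHORT = DrcpTimers(send_interval=1.0, timeout=3.0)
LONG = DrcpTimers(send_interval=30.0, timeout=90.0)


def peer_link_down(last_rx: float, now: float, timers: DrcpTimers) -> bool:
    """Return True when no DRCP packet has arrived within the timeout."""
    return (now - last_rx) >= timers.timeout
```

In both modes the timeout is three send intervals, so a single lost packet does not bring the peer-link down.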
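M-LAG interface
The downlink interface of the M-LAG system. Access is usually single-homed or dual-homed.
Single-homed access: a device connects to only one member of the M-LAG system (also called single-attached). The M-LAG system still backs up the MAC address table, ARP table, and similar entries for a single-homed device, which gives it a backup path and improves reliability.
Dual-homed access: a device connects to both member devices. The two upstream devices share the load, and traffic switches back quickly after a failure.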
keepalive
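Used to detect the state of the keepalive-link. Keepalive packets are sent periodically; if packets from the peer are received, the keepalive-link is considered up, otherwise down.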
peer-link
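When this link is up, M-LAG is working normally. If the keepalive-link is down at the same time, the system keeps working but logs a message prompting the administrator to check it.
When this link is down, the primary device is elected from the packets received over the keepalive-link, so the system can keep forwarding data.
When the peer-link is down, the peer device is assumed to have failed and the keepalive timeout timer starts. When the timer expires, the outcome depends on the device's role:
- Primary device: stays primary if it has an M-LAG interface that is up; otherwise its role becomes None.
- Secondary device: is promoted to primary; it then stays primary if it has an M-LAG interface that is up, otherwise it switches to None.
- None: the device cannot send or receive keepalive packets, and the keepalive-link is down.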
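The three cases above fit in one small function. This is a sketch of the decision as these notes describe it, not vendor code: the function name and the role strings are made up for illustration, and the assumption that a None device simply stays None is mine.

```python
def role_after_keepalive_timeout(role: str, mlag_interface_up: bool) -> str:
    """Role once the keepalive timeout timer expires while the peer-link is down.

    role is "primary", "secondary", or "none".
    """
    if role not in ("primary", "secondary", "none"):
        raise ValueError(f"unknown role: {role}")
    if role == "none":
        # Assumption: a None device cannot exchange keepalives, so it stays None.
        return "none"
    # A primary keeps its role and a secondary is first promoted to primary;
    # either way the device ends up primary only if an M-LAG interface is up.
    return "primary" if mlag_interface_up else "none"
```

Written this way, primary and secondary devices end in the same state, and the only input that matters is whether an M-LAG interface is up.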
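Role calculation
After all that, it really comes down to this one diagram, and it is similar to stacking. The comparison runs in this order:
- M-LAG interface state: the device with a working M-LAG interface wins
- Previous role
- MAD Down state: the device that is not in MAD Down wins
- Device health value: smaller is better
- Priority: the higher priority wins
- MAC address: smaller is better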
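The ordered comparison above maps naturally onto a tuple sort key. The sketch below is illustrative only: `Candidate`, `election_key`, and `elect_primary` are hypothetical names. It assumes that "previous role" means the device that was already primary is preferred, and it treats `priority` as an abstract level where higher wins (how a configured value maps to that level is vendor-specific).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    has_working_mlag_interface: bool
    was_primary: bool   # assumption: the previous primary is preferred
    in_mad_down: bool
    health: int         # smaller is better
    priority: int       # abstract level, higher is preferred (assumption)
    mac: str            # e.g. "0001-0001-0001"; smaller is better


def election_key(c: Candidate) -> tuple:
    """Build a key whose smallest value identifies the primary."""
    return (
        not c.has_working_mlag_interface,  # False sorts before True
        not c.was_primary,
        c.in_mad_down,
        c.health,
        -c.priority,
        int(c.mac.replace("-", ""), 16),
    )


def elect_primary(a: Candidate, b: Candidate) -> Candidate:
    """Compare the criteria in order; a later one only breaks earlier ties."""
    return min((a, b), key=election_key)
```

Because Python compares tuples element by element, a later criterion is consulted only when every earlier one ties, which is the behavior the list describes.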
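Loop prevention
Device A attaches through a non-M-LAG interface. Which path does its traffic take?
Suppose the primary device is E: A sends a packet to D, and D forwards it to E. Is the packet dropped?
Local forwarding first
If the device that receives the traffic has a matching forwarding entry, it forwards the traffic itself instead of sending it across the peer-link.
In the figure below, device B load-shares across its links. Which path does the traffic take?
Following the rule above: if only D has the entry while D and E share the load, half of the traffic has to detour over the peer-link. That is the drawback.
What about loops?
Unidirectional isolation
Traffic received from the peer-link is not forwarded out of M-LAG interfaces.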
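The isolation rule can be expressed as a filter on flood targets. This is a simplified sketch, not how a switch ASIC implements it: `egress_ports` and the port kinds `"peer-link"`, `"m-lag"`, and `"other"` are names invented for the example, and the port names are placeholders.

```python
from typing import Dict, List


def egress_ports(ingress: str, ports: Dict[str, str]) -> List[str]:
    """Return the ports a flooded frame may leave through.

    ports maps a port name to its kind: "peer-link", "m-lag", or "other".
    """
    from_peer_link = ports[ingress] == "peer-link"
    allowed = []
    for name, kind in ports.items():
        if name == ingress:
            continue  # never send a frame back out of its ingress port
        if from_peer_link and kind == "m-lag":
            continue  # isolation: peer-link traffic never exits an M-LAG interface
        allowed.append(name)
    return allowed


ports = {"BAGG1": "peer-link", "BAGG2": "m-lag", "XGE1/0/10": "other"}
print(egress_ports("BAGG1", ports))  # ['XGE1/0/10']
print(egress_ports("BAGG2", ports))  # ['BAGG1', 'XGE1/0/10']
```

A frame that arrives on the M-LAG interface may still cross the peer-link, but the copy that arrives on the other device can no longer go back down to the dual-homed host, which breaks the loop.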
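MAD detection
To keep a re-election after a peer-link failure from misdirecting traffic, configure MAD Down. When the failure occurs, the secondary device shuts down all of its interfaces except those the administrator has excluded, and the device enters the MAD Down state.
If the keepalive-link then fails as well, the secondary device promotes itself to primary. The network now has two primary devices, which can cause a second failure. Enabling MAD Down state persistence guards against this.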
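Consistency check
To make sure the two devices match and that packet forwarding is not affected, the relevant configuration is checked for consistency. There are currently two types:
Type 1 configuration: settings that affect forwarding in the M-LAG system, such as VLAN configuration. If Type 1 settings do not match, the M-LAG interfaces on the secondary device are set to down.
Type 2 configuration: settings that affect only a service module, such as web authentication or port security mode. If Type 2 settings do not match, the M-LAG interfaces on the secondary device stay up and the M-LAG system keeps working normally. The service module that owns the setting decides whether to disable its own function; other service modules are unaffected.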
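The difference between the two types can be shown with a small comparison function. Everything here is illustrative: the key names, the flat dictionaries, and `check_consistency` are assumptions made for the sketch, and a real device compares far more settings than these.

```python
from typing import Dict, List, Tuple

# Example classification based on the settings named in the notes.
TYPE1_KEYS = {"vlan"}
TYPE2_KEYS = {"web_auth", "port_security_mode"}


def check_consistency(
    primary: Dict[str, str], secondary: Dict[str, str]
) -> Tuple[bool, List[str]]:
    """Return (secondary M-LAG interfaces stay up, mismatched Type 2 settings)."""
    all_keys = primary.keys() | secondary.keys()
    mismatched = {k for k in all_keys if primary.get(k) != secondary.get(k)}
    # Any Type 1 mismatch takes the secondary's M-LAG interfaces down.
    interfaces_stay_up = not (mismatched & TYPE1_KEYS)
    # Type 2 mismatches are only reported; each service module decides for itself.
    return interfaces_stay_up, sorted(mismatched & TYPE2_KEYS)
```

For example, a VLAN mismatch yields `(False, [])`, while a web authentication mismatch alone yields `(True, ["web_auth"])`.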
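To avoid M-LAG interface flapping, the device runs the configuration consistency check after half of the delayed-recovery timer (30 s by default) has elapsed.
Active-active gateway
Configure the same VLAN and IP address on both devices and dual-home the access device. This topology differs from stacking only in the underlying mechanism; simplified, it is link aggregation plus stacking. The benefits of this wiring are clear: load sharing and fast switchover.
Summary
DRCP negotiation exchanges configuration and resources and builds the M-LAG system. The devices then negotiate primary and secondary roles and check consistency, and they send keepalive packets to verify that the neighbor is healthy. Once it is, they synchronize data and exchange information, and M-LAG starts running. The peer-link determines the working state of the M-LAG system, keepalive checks whether the peer is healthy, and the two results are evaluated together. The information exchanged between the devices is what delivers high reliability.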
Lab
Topology
Configuration
1. M-LAG virtual MAC address and priority
[sw1]m-lag system-number 1
Changing the system number might flap the peer link and cause M-LAG system setup failure. Continue? [Y/N]:y
[sw1]%Jan 22 22:15:47:060 2025 sw1 M-LAG/6/MLAG_SYSEVENT_NUMBER_CHANGE: System number changed from default to 1.
[sw1]m-lag system-mac 1-1-1
Changing the system MAC address might flap the peer link and cause M-LAG system setup failure. Continue? [Y/N]:y
[sw1]%Jan 22 22:16:16:894 2025 sw1 M-LAG/6/MLAG_SYSEVENT_MAC_CHANGE: System MAC address changed from N/A to 0001-0001-0001.
[sw1]m-lag system-priority 120
Changing the system priority might flap the peer link and cause M-LAG system setup failure. Continue? [Y/N]:y
[sw1]%Jan 22 22:18:11:451 2025 sw1 M-LAG/6/MLAG_SYSEVENT_PRIORITY_CHANGE: System priority changed from 32768 to 120.
2. Keepalive link
[sw1-Ten-GigabitEthernet1/0/50]port link-mode route
[sw1-Ten-GigabitEthernet1/0/50]ip add 10.0.0.1 24
[sw1-Ten-GigabitEthernet1/0/50]qui
[sw1]m-lag keepalive ip destination 10.0.0.2 source 10.0.0.1
[sw1]m-lag mad exclude int Ten-GigabitEthernet 1/0/50
3. Peer-link
[sw1]int Bridge-Aggregation 1
[sw1-Bridge-Aggregation1]link-aggregation mode dynam
[sw1]int Ten-GigabitEthernet 1/0/51
[sw1-Ten-GigabitEthernet1/0/51]port link-aggregation group 1
[sw1-Ten-GigabitEthernet1/0/51]%Jan 22 22:27:58:073 2025 sw1 LAGG/6/LAGG_LACP_RECEIVE_TIMEOUT: LACPDU reception timed out on member port XGE1/0/51 in aggregation group BAGG1.
%Jan 22 22:27:58:082 2025 sw1 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Ten-GigabitEthernet1/0/51 changed to down.
%Jan 22 22:28:02:030 2025 sw1 LAGG/6/LAGG_ACTIVE: Member port XGE1/0/51 of aggregation group BAGG1 changed to the active state.
%Jan 22 22:28:02:036 2025 sw1 STP/6/STP_NOTIFIED_TC: Instance 0's port Ten-GigabitEthernet1/0/52 was notified a topology change.
%Jan 22 22:28:02:040 2025 sw1 STP/6/STP_DETECTED_TC: Instance 0's port Bridge-Aggregation1 detected a topology change.
%Jan 22 22:28:02:041 2025 sw1 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Ten-GigabitEthernet1/0/51 changed to up.
%Jan 22 22:28:02:042 2025 sw1 IFNET/3/PHY_UPDOWN: Physical state on the interface Bridge-Aggregation1 changed to up.
%Jan 22 22:28:02:042 2025 sw1 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Bridge-Aggregation1 changed to up.
[sw1-Ten-GigabitEthernet1/0/51]int ten1/0/52
[sw1-Ten-GigabitEthernet1/0/52]port link-aggregation g 1
[sw1-Ten-GigabitEthernet1/0/52]int bri 1
[sw1-Bridge-Aggregation1]port m-lag peer-link 1
[sw1]m-lag consistency-check disable
4. M-LAG interface
[sw1]int Bridge-Aggregation 2
[sw1-Bridge-Aggregation2]link-aggregation mode dynamic
[sw1-Bridge-Aggregation2]int g1/0/1
[sw1-GigabitEthernet1/0/1]port link-aggregation g 2
[sw1]int Bridge-Aggregation 2
[sw1-Bridge-Aggregation2]port m-lag group 2
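5. Peer configuration
Note: remember to disable the consistency check (simulator environment only; not recommended on real hardware) and to disable static source MAC check (on the pee…); otherwise the downstream interfaces will not come up.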
[sw2]m-lag system-mac 1-1-1
Changing the system MAC address might flap the peer link and cause M-LAG system setup failure. Continue? [Y/N]:y
[sw2]%Jan 22 22:37:01:457 2025 sw2 M-LAG/6/MLAG_SYSEVENT_MAC_CHANGE: System MAC address changed from N/A to 0001-0001-0001.
[sw2]m-lag system-priority 120
Changing the system priority might flap the peer link and cause M-LAG system setup failure. Continue? [Y/N]:y
[sw2]%Jan 22 22:37:15:524 2025 sw2 M-LAG/6/MLAG_SYSEVENT_PRIORITY_CHANGE: System priority changed from 32768 to 80.
[sw2]m-lag system-number 2
Changing the system number might flap the peer link and cause M-LAG system setup failure. Continue? [Y/N]:y
[sw2]%Jan 22 22:37:27:386 2025 sw2 M-LAG/6/MLAG_SYSEVENT_NUMBER_CHANGE: System number changed from default to 2.
[sw2]int Ten-GigabitEthernet 1/0/50
[sw2-Ten-GigabitEthernet1/0/50]port link-mode route
[sw2-Ten-GigabitEthernet1/0/50]ip address 10.0.0.2 24
[sw2]m-lag keepalive ip destination 10.0.0.1 source 10.0.0.2
[sw2]m-lag mad exclude int Ten-GigabitEthernet 1/0/50
[sw2]int Bridge-Aggregation 1
[sw2-Bridge-Aggregation1]link-aggregation mode dynamic
[sw2-Bridge-Aggregation1]int ten1/0/51
[sw2-Ten-GigabitEthernet1/0/51]port link-aggregation g 1
%Jan 22 22:43:58:431 2025 sw2 LAGG/6/LAGG_LACP_RECEIVE_TIMEOUT: LACPDU reception timed out on member port XGE1/0/51 in aggregation group BAGG1.
[sw2-Ten-GigabitEthernet1/0/51]int t1/0/52
[sw2-Ten-GigabitEthernet1/0/51]port link-aggregation g 1
[sw2]int Ten-GigabitEthernet1/0/52
[sw2-Ten-GigabitEthernet1/0/52]port link-aggregation g 1
[sw2-Ten-GigabitEthernet1/0/52]int bri 1
[sw2-Bridge-Aggregation1]port m-lag peer-link 1
[sw2]int bri 2
[sw2-Bridge-Aggregation2]link-aggregation mode dynamic
[sw2-Bridge-Aggregation2]int g1/0/1
[sw2-GigabitEthernet1/0/1]port link-ag g 2
[sw2-GigabitEthernet1/0/1]int bri 2
[sw2-Bridge-Aggregation2]port m-lag group 2
[sw2]m-lag consistency-check disable
Configure link aggregation on the downstream device
[sw3]int bri 2
[sw3-Bridge-Aggregation2]link mode dy
[sw3-Bridge-Aggregation2]int ran g1/0/1 g1/0/2
[sw3-if-range]port link g 2
Check the keepalive status: no problems.
E-trunk
LACP
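Link aggregation basics; not explained again here, see the earlier notes.
System ID: the smaller the value, the higher the preference. By default it is the MAC address of the interface that uses E-Trunk.
E-Trunk priority: decides the master/backup state of the two devices in the aggregation group; the smaller the value, the higher the priority.
E-Trunk ID: a unique identifier.
Master/backup negotiation
The CE connects directly to both PE1 and PE2, and E-Trunk runs between PE1 and PE2.
PE side
Create an E-Trunk and an Eth-Trunk with the same IDs on PE1 and PE2, and add the Eth-Trunk to the E-Trunk.
CE side
Configure an Eth-Trunk in LACP mode on the CE; this Eth-Trunk connects to both PE1 and PE2.
E-Trunk is invisible to the CE.
Determining the E-Trunk master/backup state
PE1 and PE2 negotiate master and backup through E-Trunk packets. Normally the result is one master and one backup.
The master/backup state on a PE is decided by the E-Trunk priority and the E-Trunk system ID carried in the packets. A smaller priority value means a higher priority, and the device with the higher priority becomes the master. If the E-Trunk priorities are equal, the device with the smaller E-Trunk system ID becomes the master.
Put plainly, the CE is tricked into believing that it is bundled with a single device.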
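The master/backup decision described above is a two-level comparison. The sketch below is an illustration, not Huawei's implementation: `etrunk_master` is a hypothetical helper, and each PE is modeled as a `(name, priority, system_id)` tuple with made-up values.

```python
from typing import Tuple

Pe = Tuple[str, int, str]  # (name, E-Trunk priority, E-Trunk system ID)


def etrunk_master(a: Pe, b: Pe) -> str:
    """Return the name of the PE that becomes master.

    The smaller priority value wins; a tie goes to the smaller system ID.
    """
    def key(pe: Pe) -> Tuple[int, int]:
        _name, priority, system_id = pe
        return (priority, int(system_id.replace("-", ""), 16))

    return min((a, b), key=key)[0]


# Different priorities: the smaller value wins regardless of system ID.
print(etrunk_master(("PE1", 100, "0001-0001-0002"), ("PE2", 200, "0001-0001-0001")))  # PE1
# Equal priorities: the smaller system ID wins.
print(etrunk_master(("PE1", 100, "0001-0001-0002"), ("PE2", 100, "0001-0001-0001")))  # PE2
```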
Lab
Configuration
[sw2]lacp e-trunk system-id 1-1-1
Jan 22 2025 23:33:08-08:00 sw2 DS/4/DATASYNC_CFGCHANGE:OID 1.3.6.1.4.1.2011.5.25.191.3.1 configurations have been changed. The current change number is 6, the change loop count is 0, and the maximum number of records is 4095.
[sw2]lacp e-trunk priority 12800
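Configure link aggregation on the CE side. On the PE side, after setting the E-Trunk parameters, add each PE's Eth-Trunk to the E-Trunk. For better reliability you can associate E-Trunk with BFD and add another link between 2 and 3.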