Transformer Encoder Implementation, Part 2 (Feed-Forward + Normalization + Sublayer Connection + Encoder Layer + Full Encoder Code)

Some of this content comes from an online tutorial. If it infringes on your rights, please contact me and I will remove it.

Tutorial link (2.3.5 Normalization layer, part 1, bilibili): https://www.bilibili.com/video/BV17Y4y1j7cf?p=22&spm_id_from=pageDriver&vd_source=eca88088d891c6950b5aea556143b41c

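1. Feed-forward layer

In the Transformer, the feed-forward layer is a fully connected network made of two linear layers.

Purpose: the attention mechanism alone may not fit complex processes well enough, so a two-layer network is added to increase the model's capacity.

Code for the feed-forward layer: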

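# Feed-forward network
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionwiseFeedForward(nn.Module):
    def __init__(self, d_model, d_ff, dropout=0.1):
        # d_model: word-embedding dimension; also the input and output dimension of this block
        # d_ff: output dimension of the first linear layer and input dimension of the second
        super(PositionwiseFeedForward, self).__init__()

        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, x):
        # x: output of the previous layer
        return self.fc2(self.dropout(F.relu(self.fc1(x))))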
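
As a quick check of the class above, the following sketch runs a random tensor through the feed-forward block. The sizes (`d_model=8`, `d_ff=32`) and the input shape are arbitrary values chosen for illustration, not values from the tutorial. It shows that the output shape matches the input shape and that each position is transformed independently of the others.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PositionwiseFeedForward(nn.Module):
    def __init__(self, d_model, d_ff, dropout=0.1):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, x):
        return self.fc2(self.dropout(F.relu(self.fc1(x))))


# Illustrative sizes only (assumed for this example).
d_model, d_ff = 8, 32
ff = PositionwiseFeedForward(d_model, d_ff, dropout=0.1)
ff.eval()  # turn dropout off so the result is deterministic

x = torch.randn(2, 4, d_model)  # (batch, sequence length, d_model)
out = ff(x)
print(out.shape)  # torch.Size([2, 4, 8])

# Feeding one position on its own gives the same result as that
# position's slice of the full output: the block is "position-wise".
single = ff(x[:, 1:2, :])
same = torch.allclose(out[:, 1:2, :], single, atol=1e-5)
print(same)
```

Calling `ff.eval()` matters here: in training mode dropout zeroes random activations, so two forward passes on the same input would differ.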

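2. Normalization layer

As a network gets deeper, the values produced after many layers of computation can become too large or too small. This can destabilize training and make the model converge very slowly. For this reason a normalization layer is placed after a certain number of layers to keep the feature values within a reasonable range.

Code for the normalization layer: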

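class LayerNorm(nn.Module):
    def __init__(self, features, eps=1e-6):
        '''
            features: word-embedding dimension
            eps: a small number added to the denominator of the normalization formula to avoid division by zero
        '''
        super(LayerNorm, self).__init__()

        # Learnable scale (a2) and shift (b2), one value per feature
        self.a2 = nn.Parameter(torch.ones(features))
        self.b2 = nn.Parameter(torch.zeros(features))
        self.eps = eps

    def forward(self, x):
        # x: output of the previous layer
        # Mean over the last dimension; keepdim=True keeps the number of dimensions unchanged
        mean = x.mean(-1, keepdim=True)
        # Standard deviation over the last dimension; keepdim=True keeps the number of dimensions unchanged
        std = x.std(-1, keepdim=True)
        return self.a2 * (x - mean) / (std + self.eps) + self.b2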
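
To make the formula concrete, here is a small worked example in plain Python for a single feature vector. The helper name `layer_norm_1d` and the input values are made up for illustration, and the scale and shift are simplified to scalars at their initial values (1 and 0). `statistics.stdev` computes the sample standard deviation (dividing by n-1), which matches the default behaviour of `x.std(...)` in PyTorch.

```python
import statistics


def layer_norm_1d(values, gain=1.0, bias=0.0, eps=1e-6):
    """Normalize one feature vector: gain * (x - mean) / (std + eps) + bias."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [gain * (v - mean) / (std + eps) + bias for v in values]


features = [1.0, 2.0, 3.0, 4.0]  # mean = 2.5, sample std ≈ 1.291
normed = layer_norm_1d(features)
print([round(v, 4) for v in normed])  # [-1.1619, -0.3873, 0.3873, 1.1619]
```

After normalization the vector has mean 0 and a sample standard deviation very close to 1. Note that PyTorch's built-in `nn.LayerNorm` uses the biased variance and adds `eps` inside the square root, so its numbers differ slightly from this formulation.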

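3. Sublayer connection structure

Building on the modules above, we now implement the sublayer connection structure with a residual connection. Each encoder layer contains two such sublayers: the multi-head self-attention sublayer and the feed-forward sublayer.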

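class SublayerConnection(nn.Module):
    def __init__(self, size, dropout=0.1):
        # size: word-embedding dimension
        super(SublayerConnection, self).__init__()
        self.norm = LayerNorm(size)
        self.dropout = nn.Dropout(p=dropout)
        self.size = size

    def forward(self, x, sublayer):
        # x: output of the previous layer
        # sublayer: a callable implementing the sublayer (attention or feed-forward)
        # Normalize x, apply the sublayer, apply dropout, then add the residual x
        return x + self.dropout(sublayer(self.norm(x)))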
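
The residual path can be checked with a short sketch. To keep the example self-contained it uses PyTorch's built-in `nn.LayerNorm` as a stand-in for the custom `LayerNorm` class, and the size 8 is an arbitrary choice. If the sublayer contributes nothing (it returns zeros), the input passes through unchanged, which is exactly what the `x + ...` term is for.

```python
import torch
import torch.nn as nn


class SublayerConnection(nn.Module):
    def __init__(self, size, dropout=0.1):
        super().__init__()
        # Stand-in: built-in LayerNorm instead of the custom class above.
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, x, sublayer):
        return x + self.dropout(sublayer(self.norm(x)))


size = 8  # illustrative embedding dimension
block = SublayerConnection(size, dropout=0.1)
block.eval()  # disable dropout for a deterministic result

x = torch.randn(2, 4, size)

# A sublayer that outputs zeros: only the residual path remains.
out_zero = block(x, lambda t: torch.zeros_like(t))
print(torch.equal(out_zero, x))  # True

# Any shape-preserving callable can serve as the sublayer.
linear = nn.Linear(size, size)
out = block(x, linear)
print(out.shape)  # torch.Size([2, 4, 8])
```

Passing the sublayer as a callable is what lets the same wrapper serve both the attention sublayer (through a lambda) and the feed-forward sublayer (passed directly).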

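4. Encoder layer

Having covered the structures above, we now connect all the parts to form a complete encoder layer.

Role of the encoder layer: it is the building block of the encoder. Each encoder layer performs one round of feature extraction on its input, that is, one encoding step.

Encoder layer diagram (figure omitted):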

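class EncoderLayer(nn.Module):
    def __init__(self, size, self_attn, feed_forward, dropout):
        # size: word-embedding dimension
        # self_attn: an instance of the multi-head self-attention sublayer
        # feed_forward: an instance of the feed-forward layer
        # dropout: dropout rate used in the sublayer connections
        super(EncoderLayer, self).__init__()

        self.self_attn = self_attn
        self.feed_forward = feed_forward
        self.size = size

        # clones is a helper that returns N copies of a module; it is not defined in this section
        self.sublayer = clones(SublayerConnection(size, dropout), 2)

    def forward(self, x, mask):
        # x: output tensor of the previous layer
        # mask: mask tensor
        # Returns the feature representation after passing through the whole encoder layer
        x = self.sublayer[0](x, lambda x: self.self_attn(x, x, x, mask))
        return self.sublayer[1](x, self.feed_forward)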
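
`EncoderLayer` relies on a `clones` helper that does not appear in this section. A minimal sketch follows, assuming the helper uses the common deep-copy pattern; the original helper may differ in detail. Deep copies matter because each copy must own its parameters; reusing the same module object N times would share one set of weights across all layers.

```python
import copy

import torch
import torch.nn as nn


def clones(module, N):
    """Return N independent deep copies of `module` in an nn.ModuleList (assumed helper)."""
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])


layers = clones(nn.Linear(4, 4), 3)
print(len(layers))  # 3

# The copies start with identical values but are separate tensors,
# so training one layer does not change the others.
print(torch.equal(layers[0].weight, layers[1].weight))  # True
print(layers[0].weight is layers[1].weight)  # False
```

Wrapping the copies in `nn.ModuleList` rather than a plain Python list registers them as submodules, so their parameters show up in `model.parameters()`.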

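5. Encoder

The encoder performs feature extraction on the input, a process also called encoding. It consists of N stacked encoder layers.

The encoder's output is the Transformer's encoded feature representation of the input, and it becomes part of the decoder's input.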

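class Encoder(nn.Module):
    def __init__(self, layer, N):
        # layer: an encoder layer
        # N: number of encoder layers
        super(Encoder, self).__init__()
        # Use the clones helper to make N copies of the encoder layer
        self.layers = clones(layer, N)
        # Initialize the final normalization layer
        self.norm = LayerNorm(layer.size)

    def forward(self, x, mask):
        # x: output of the previous layer
        # mask: mask tensor
        # Pass x through the N encoder layers in turn, then through the normalization layer
        for layer in self.layers:
            x = layer(x, mask)
        return self.norm(x)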
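
To see how the pieces fit together, the sketch below assembles a small encoder and runs one forward pass. Several parts are assumptions made only to keep the example self-contained: `nn.LayerNorm` stands in for the custom `LayerNorm`, an `nn.Sequential` stands in for `PositionwiseFeedForward`, and `StandInSelfAttention` is a hypothetical wrapper around `nn.MultiheadAttention` that replaces the multi-head attention class, which is not shown in this section. Its mask convention (True marks padding) belongs to this sketch and may differ from the tutorial's. All sizes are arbitrary.

```python
import copy

import torch
import torch.nn as nn


def clones(module, N):
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])


class SublayerConnection(nn.Module):
    def __init__(self, size, dropout):
        super().__init__()
        self.norm = nn.LayerNorm(size)  # stand-in for the custom LayerNorm
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return x + self.dropout(sublayer(self.norm(x)))


class EncoderLayer(nn.Module):
    def __init__(self, size, self_attn, feed_forward, dropout):
        super().__init__()
        self.self_attn = self_attn
        self.feed_forward = feed_forward
        self.size = size
        self.sublayer = clones(SublayerConnection(size, dropout), 2)

    def forward(self, x, mask):
        x = self.sublayer[0](x, lambda t: self.self_attn(t, t, t, mask))
        return self.sublayer[1](x, self.feed_forward)


class Encoder(nn.Module):
    def __init__(self, layer, N):
        super().__init__()
        self.layers = clones(layer, N)
        self.norm = nn.LayerNorm(layer.size)  # stand-in for the custom LayerNorm

    def forward(self, x, mask):
        for layer in self.layers:
            x = layer(x, mask)
        return self.norm(x)


class StandInSelfAttention(nn.Module):
    """Hypothetical wrapper exposing a (query, key, value, mask) call signature."""

    def __init__(self, d_model, heads, dropout):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, heads, dropout=dropout, batch_first=True)

    def forward(self, query, key, value, mask=None):
        # mask: bool tensor (batch, seq_len); True marks padding positions to ignore
        out, _ = self.mha(query, key, value, key_padding_mask=mask)
        return out


# Illustrative hyperparameters (assumed for this example).
d_model, d_ff, heads, N, dropout = 16, 64, 4, 2, 0.1
attn = StandInSelfAttention(d_model, heads, dropout)
ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                   nn.Dropout(dropout), nn.Linear(d_ff, d_model))
encoder = Encoder(EncoderLayer(d_model, attn, ff, dropout), N)
encoder.eval()

x = torch.randn(2, 5, d_model)              # (batch, sequence length, d_model)
mask = torch.zeros(2, 5, dtype=torch.bool)  # no padding by default
mask[1, 3:] = True                          # last two tokens of sample 1 are padding
out = encoder(x, mask)
print(out.shape)  # torch.Size([2, 5, 16])
```

Because `clones` deep-copies the whole `EncoderLayer`, the attention and feed-forward modules passed in are copied along with it, so the N layers do not share weights.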
