Contents
1. Introduction to RNNs
2. The RNN Cell computation in detail
3. How to use RNN Cell
4. How to use RNN
5. Hands-on practice
6. Word embeddings
1. Introduction to RNNs
h0 represents prior knowledge.
A CNN plus a fully connected layer can produce h0 as the RNN's input, which turns an image into text; if there is no prior knowledge, simply initialize h0 to all zeros.
h1 and x2 are fed into the next RNN step, whose linear computation produces h2, and so on for h3, h4, and beyond.
Every RNN Cell in the unrolled diagram is the same linear layer: one linear layer is reused throughout the computation, so its weights take part in every step and a deep computational graph is built up (the loop uses weight sharing; only a single linear layer exists).
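To make the weight-sharing point concrete, here is a minimal sketch (the helper rnn_step and the sizes are illustrative, not from the lecture): one shared linear layer is applied at every time step.

import torch

# One shared linear layer plays the role of every RNN Cell in the unrolled diagram.
linear = torch.nn.Linear(4 + 2, 2)  # (input_size + hidden_size) -> hidden_size

def rnn_step(x, h):
    # The same `linear` (the same weights) is reused at every step of the loop.
    return torch.tanh(linear(torch.cat([x, h], dim=1)))

h = torch.zeros(1, 2)          # h0: no prior knowledge, so all zeros
for x in torch.rand(3, 1, 4):  # three time steps, batch_size = 1, input_size = 4
    h = rnn_step(x, h)
print(h.shape)  # torch.Size([1, 2])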
2. The RNN Cell computation in detail
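The section boils down to the update rule h_t = tanh(x_t @ W_ih^T + b_ih + h_{t-1} @ W_hh^T + b_hh). A minimal check (assuming the default tanh nonlinearity) against torch.nn.RNNCell:

import torch

cell = torch.nn.RNNCell(input_size=4, hidden_size=2)
x = torch.rand(1, 4)   # one sample with input_size = 4
h = torch.zeros(1, 2)  # previous hidden state

# h_t = tanh(x W_ih^T + b_ih + h W_hh^T + b_hh)
manual = torch.tanh(x @ cell.weight_ih.T + cell.bias_ih
                    + h @ cell.weight_hh.T + cell.bias_hh)
print(torch.allclose(cell(x, h), manual))  # True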
3. How to use RNN Cell
Note the two constructor arguments of torch.nn.RNNCell: input_size, the size of each input x, and hidden_size, the size of the hidden state. The hidden state that goes in and the hidden state that comes out have the same size.
Call cell(input, hidden) in a loop, e.g. h1 = cell(x1, h0), where the hidden argument is the previous step's output (h0 for the first step).
When batch_size = 1:
import torch

batch_size = 1
seq_len = 3
input_size = 4
hidden_size = 2

cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

# (seq_len, batch, feature)
dataset = torch.rand(seq_len, batch_size, input_size)
print('dataset::', dataset)
hidden = torch.zeros(batch_size, hidden_size)

for idx, input in enumerate(dataset):
    print('=' * 20, idx, '=' * 20)
    print('input:', input)
    print('input size:', input.shape)
    hidden = cell(input, hidden)
    print('hidden size:', hidden.shape)
    print(hidden)
G:\python_files\DeepLearning\Scripts\python.exe G:/python_files/DeepLearningProgram/RNNCell.py
dataset:: tensor([[[0.0838, 0.6044, 0.0810, 0.7452]],
[[0.6009, 0.8458, 0.0021, 0.5979]],
[[0.4665, 0.0486, 0.2486, 0.5683]]])
==================== 0 ====================
input: tensor([[0.0838, 0.6044, 0.0810, 0.7452]])
input size: torch.Size([1, 4])
hidden size: torch.Size([1, 2])
tensor([[-0.8042, 0.7211]], grad_fn=<TanhBackward0>)
==================== 1 ====================
input: tensor([[0.6009, 0.8458, 0.0021, 0.5979]])
input size: torch.Size([1, 4])
hidden size: torch.Size([1, 2])
tensor([[-0.9128, 0.4984]], grad_fn=<TanhBackward0>)
==================== 2 ====================
input: tensor([[0.4665, 0.0486, 0.2486, 0.5683]])
input size: torch.Size([1, 4])
hidden size: torch.Size([1, 2])
tensor([[-0.8416, 0.2773]], grad_fn=<TanhBackward0>)
Process finished with exit code 0
When batch_size = 2:
import torch

batch_size = 2
seq_len = 3
input_size = 4
hidden_size = 2

cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

# (seq_len, batch, feature)
dataset = torch.rand(seq_len, batch_size, input_size)
print('dataset::', dataset)
hidden = torch.zeros(batch_size, hidden_size)

for idx, input in enumerate(dataset):
    print('=' * 20, idx, '=' * 20)
    print('input:', input)
    print('input size:', input.shape)
    hidden = cell(input, hidden)
    print('hidden size:', hidden.shape)
    print(hidden)
G:\python_files\DeepLearning\Scripts\python.exe G:/python_files/DeepLearningProgram/RNNCell.py
dataset:: tensor([[[0.6906, 0.7810, 0.2862, 0.9390],
[0.8915, 0.2699, 0.4300, 0.3093]],
[[0.4048, 0.7307, 0.3674, 0.7294],
[0.0394, 0.9579, 0.3730, 0.3643]],
[[0.8211, 0.2256, 0.7045, 0.1283],
[0.5108, 0.5651, 0.0126, 0.4928]]])
==================== 0 ====================
input: tensor([[0.6906, 0.7810, 0.2862, 0.9390],
[0.8915, 0.2699, 0.4300, 0.3093]])
input size: torch.Size([2, 4])
hidden size: torch.Size([2, 2])
tensor([[ 0.2782, -0.7428],
[ 0.4471, -0.7467]], grad_fn=<TanhBackward0>)
==================== 1 ====================
input: tensor([[0.4048, 0.7307, 0.3674, 0.7294],
[0.0394, 0.9579, 0.3730, 0.3643]])
input size: torch.Size([2, 4])
hidden size: torch.Size([2, 2])
tensor([[-0.4059, -0.8885],
[-0.5068, -0.8562]], grad_fn=<TanhBackward0>)
==================== 2 ====================
input: tensor([[0.8211, 0.2256, 0.7045, 0.1283],
[0.5108, 0.5651, 0.0126, 0.4928]])
input size: torch.Size([2, 4])
hidden size: torch.Size([2, 2])
tensor([[ 0.1510, -0.9392],
[ 0.2471, -0.9277]], grad_fn=<TanhBackward0>)
Process finished with exit code 0
When seq_len = 4 and input_size = 1 (same code as above, changing only these constants):
batch_size = 2
seq_len = 4
input_size = 1
hidden_size = 2
G:\python_files\DeepLearning\Scripts\python.exe G:/python_files/DeepLearningProgram/RNNCell.py
dataset:: tensor([[[0.6525],
[0.6306]],
[[0.7393],
[0.8773]],
[[0.7610],
[0.1576]],
[[0.1989],
[0.8072]]])
==================== 0 ====================
input: tensor([[0.6525],
[0.6306]])
input size: torch.Size([2, 1])
hidden size: torch.Size([2, 2])
tensor([[-0.7688, 0.2979],
[-0.7707, 0.2928]], grad_fn=<TanhBackward0>)
==================== 1 ====================
input: tensor([[0.7393],
[0.8773]])
input size: torch.Size([2, 1])
hidden size: torch.Size([2, 2])
tensor([[-0.8193, 0.0798],
[-0.8090, 0.1117]], grad_fn=<TanhBackward0>)
==================== 2 ====================
input: tensor([[0.7610],
[0.1576]])
input size: torch.Size([2, 1])
hidden size: torch.Size([2, 2])
tensor([[-0.7975, -0.0332],
[-0.8421, -0.1672]], grad_fn=<TanhBackward0>)
==================== 3 ====================
input: tensor([[0.1989],
[0.8072]])
input size: torch.Size([2, 1])
hidden size: torch.Size([2, 2])
tensor([[-0.8264, -0.2116],
[-0.7667, -0.1379]], grad_fn=<TanhBackward0>)
Process finished with exit code 0
4. How to use RNN
num_layers is the number of stacked RNN layers; one layer is usually enough, since more layers are much more expensive to train.
The inputs argument contains the entire sequence X = [x1, x2, x3, ..., xN], and hidden is h0. The RNN returns two tensors: out = [h1, h2, h3, ..., hN], and hidden, which is hN. Compared with calling RNN Cell directly, using RNN simplifies the code: the for loop over the sequence runs internally, so the whole sequence X is passed in at once.
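A minimal shape check (sizes arbitrary) of what torch.nn.RNN returns:

import torch

seq_len, batch_size, input_size, hidden_size, num_layers = 5, 1, 4, 2, 1
rnn = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
inputs = torch.randn(seq_len, batch_size, input_size)  # the whole sequence X
h0 = torch.zeros(num_layers, batch_size, hidden_size)
out, hn = rnn(inputs, h0)
print(out.shape)  # torch.Size([5, 1, 2]): (seq_len, batch_size, hidden_size), i.e. [h1, ..., hN]
print(hn.shape)   # torch.Size([1, 1, 2]): (num_layers, batch_size, hidden_size), i.e. hN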
What num_layers means: with num_layers stacked layers, the hidden state has shape (num_layers, batch_size, hidden_size).
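With stacked layers only the hidden-state shape changes; a quick check with num_layers = 2 (sizes arbitrary):

import torch

rnn = torch.nn.RNN(input_size=4, hidden_size=2, num_layers=2)
out, hn = rnn(torch.randn(5, 1, 4), torch.zeros(2, 1, 2))
print(out.shape)  # torch.Size([5, 1, 2]): out comes from the top layer only
print(hn.shape)   # torch.Size([2, 1, 2]): one final hidden state per layer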
5. Hands-on practice
Vectorizing the characters.
Attaching a softmax layer: CrossEntropyLoss already includes it, as the quick check below shows.
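CrossEntropyLoss applies log-softmax internally, so the model outputs raw scores; a quick equivalence check with random values:

import torch

logits = torch.randn(5, 4)                  # (seq_len, num_class) raw scores
target = torch.LongTensor([3, 1, 2, 3, 2])  # 'ohlol' as class indices
ce = torch.nn.CrossEntropyLoss()(logits, target)
manual = torch.nn.NLLLoss()(torch.log_softmax(logits, dim=1), target)
print(torch.allclose(ce, manual))  # True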
Preparing the dataset:
idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]  # 'hello' as indices into idx2char
y_data = [3, 1, 2, 3, 2]  # 'ohlol' as indices into idx2char
one_hot_lookup = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
x_one_hot = [one_hot_lookup[x] for x in x_data]  # (seq_len, input_size)
# print('x_one_hot:', x_one_hot)
# x_one_hot: [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)
# print('inputs:', inputs)
# inputs: tensor([[[0., 1., 0., 0.]],
#                 [[1., 0., 0., 0.]],
#                 [[0., 0., 1., 0.]],
#                 [[0., 0., 1., 0.]],
#                 [[0., 0., 0., 1.]]])
labels = torch.LongTensor(y_data).view(-1, 1)
# print('labels:', labels)
# labels: tensor([[3],
#                 [1],
#                 [2],
#                 [3],
#                 [2]])
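As an aside, torch.nn.functional.one_hot (available in modern PyTorch) builds the same inputs tensor without the manual lookup table; a minimal sketch on the same data:

import torch

x_data = [1, 0, 2, 2, 3]
inputs = torch.nn.functional.one_hot(torch.tensor(x_data), num_classes=4)
inputs = inputs.float().view(-1, 1, 4)  # (seq_len, batch_size, input_size)
print(inputs.shape)  # torch.Size([5, 1, 4])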
Full code: using RNN Cell to map hello -> ohlol
import torch

input_size = 4
hidden_size = 4
batch_size = 1

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]  # 'hello'
y_data = [3, 1, 2, 3, 2]  # 'ohlol'
one_hot_lookup = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
x_one_hot = [one_hot_lookup[x] for x in x_data]  # (seq_len, input_size)
inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)  # (seq_len, batch_size, input_size)
labels = torch.LongTensor(y_data).view(-1, 1)
# print('labels:', labels.shape)  # labels: torch.Size([5, 1])

class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size):
        super(Model, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.batch_size = batch_size
        self.RNNCell = torch.nn.RNNCell(input_size=self.input_size,
                                        hidden_size=self.hidden_size)

    def forward(self, input, hidden):
        hidden = self.RNNCell(input, hidden)
        return hidden

    def init_hidden(self):  # initialize h0
        return torch.zeros(self.batch_size, self.hidden_size)

net = Model(input_size, hidden_size, batch_size)
criterion = torch.nn.CrossEntropyLoss()  # already includes the softmax layer
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

for epoch in range(15):
    loss = 0
    optimizer.zero_grad()
    hidden = net.init_hidden()
    print('Predicted string:', end='')
    # inputs: (seq_len, batch_size, input_size) -> input: (batch_size, input_size)
    # labels: (seq_len, 1) -> label: (1,)
    for input, label in zip(inputs, labels):
        hidden = net(input, hidden)
        # Accumulate with +=: this keeps extending the computational graph, and the
        # sum over all steps is the sequence loss. Writing loss = criterion(hidden, label)
        # would keep only a single step's loss.
        loss += criterion(hidden, label)
        # Index of the largest score, e.g. tensor([[0.1205, 0.0546, 0.4058, 0.3396]])
        # indexes into ['e', 'h', 'l', 'o'].
        _, idx = hidden.max(dim=1)
        print(idx2char[idx.item()], end='')
    loss.backward()
    optimizer.step()
    print(',Epoch [%d/15] loss = %0.4f' % (epoch + 1, loss.item()))
G:\python_files\DeepLearning\Scripts\python.exe "G:/python_files/DeepLearningProgram/RNN Cell.py"
labels: torch.Size([5, 1])
Predicted string:hhhee,Epoch [1/15] loss = 6.5936
Predicted string:ohlol,Epoch [2/15] loss = 5.1479
Predicted string:ohlol,Epoch [3/15] loss = 4.1417
Predicted string:ohlol,Epoch [4/15] loss = 3.5055
Predicted string:ohlol,Epoch [5/15] loss = 3.0920
Predicted string:ohlol,Epoch [6/15] loss = 2.8072
Predicted string:ohlol,Epoch [7/15] loss = 2.5990
Predicted string:ohlol,Epoch [8/15] loss = 2.4372
Predicted string:ohlol,Epoch [9/15] loss = 2.3106
Predicted string:ohlol,Epoch [10/15] loss = 2.2115
Predicted string:ohlol,Epoch [11/15] loss = 2.1293
Predicted string:ohlol,Epoch [12/15] loss = 2.0575
Predicted string:ohlol,Epoch [13/15] loss = 1.9963
Predicted string:ohlol,Epoch [14/15] loss = 1.9460
Predicted string:ohlol,Epoch [15/15] loss = 1.9059
Process finished with exit code 0
Full code: using RNN to map hello -> ohlol
import torch

input_size = 4
hidden_size = 4
batch_size = 1
num_layers = 1
seq_len = 5

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]  # 'hello'
y_data = [3, 1, 2, 3, 2]  # 'ohlol'
one_hot_lookup = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
x_one_hot = [one_hot_lookup[x] for x in x_data]
inputs = torch.Tensor(x_one_hot).view(seq_len, batch_size, input_size)
labels = torch.LongTensor(y_data)  # (seq_len * batch_size,)
print('labels.shape:', labels.shape)
print('labels', labels)

class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(Model, self).__init__()
        self.num_layers = num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.batch_size = batch_size
        self.RNN = torch.nn.RNN(input_size=self.input_size,
                                hidden_size=self.hidden_size,
                                num_layers=self.num_layers)

    def forward(self, input):
        hidden = torch.zeros(
            self.num_layers,
            self.batch_size,
            self.hidden_size
        )
        out, _ = self.RNN(input, hidden)
        return out.view(-1, self.hidden_size)  # (seq_len * batch_size, hidden_size)

net = Model(input_size, hidden_size, batch_size)
criterion = torch.nn.CrossEntropyLoss()  # already includes the softmax layer
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

for epoch in range(15):
    optimizer.zero_grad()
    # inputs: (seq_len, batch_size, input_size) -> outputs: (seq_len * batch_size, hidden_size)
    outputs = net(inputs)
    print('outputs:', outputs)
    loss = criterion(outputs, labels)  # labels: (seq_len * batch_size,)
    loss.backward()
    optimizer.step()
    _, idx = outputs.max(dim=1)  # indices of the max score per step
    print('outputs.max(dim=1)', outputs.max(dim=1))
    idx = idx.data.numpy()
    print('idx:', idx)
    print('Predicted:', ''.join([idx2char[x] for x in idx]), end='')
    print(',Epoch [%d/15] loss = %0.4f' % (epoch + 1, loss.item()))
G:\python_files\DeepLearning\Scripts\python.exe G:/python_files/DeepLearningProgram/RNN.py
labels.shape: torch.Size([5])
labels tensor([3, 1, 2, 3, 2])
outputs: tensor([[-0.4914, -0.5676, 0.3506, 0.7260],
[-0.6970, -0.7721, 0.5879, -0.0892],
[-0.5054, -0.6028, 0.2994, 0.4381],
[-0.4411, -0.6743, 0.3136, 0.5604],
[-0.6518, -0.5096, 0.4113, 0.2888]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.7260, 0.5879, 0.4381, 0.5604, 0.4113], grad_fn=<MaxBackward0>),
indices=tensor([3, 2, 3, 3, 2]))
idx: [3 2 3 3 2]
Predicted: olool,Epoch [1/15] loss = 1.1690
outputs: tensor([[-0.6847, -0.3311, 0.4351, 0.8397],
[-0.9187, -0.4628, 0.8491, -0.0610],
[-0.8763, -0.1816, 0.8216, 0.4632],
[-0.8988, -0.3068, 0.8900, 0.5991],
[-0.9491, -0.2364, 0.9254, 0.1066]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.8397, 0.8491, 0.8216, 0.8900, 0.9254], grad_fn=<MaxBackward0>),
indices=tensor([3, 2, 2, 2, 2]))
idx: [3 2 2 2 2]
Predicted: ollll,Epoch [2/15] loss = 1.0504
outputs: tensor([[-0.7974, -0.2362, 0.4075, 0.8494],
[-0.9674, -0.0028, 0.8895, -0.3728],
[-0.9541, 0.1519, 0.8939, 0.4081],
[-0.9655, 0.0087, 0.9500, 0.5349],
[-0.9834, 0.0539, 0.9663, -0.2075]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.8494, 0.8895, 0.8939, 0.9500, 0.9663], grad_fn=<MaxBackward0>),
indices=tensor([3, 2, 2, 2, 2]))
idx: [3 2 2 2 2]
Predicted: ollll,Epoch [3/15] loss = 0.9826
outputs: tensor([[-0.8670, -0.1775, 0.3186, 0.8979],
[-0.9834, 0.4190, 0.8858, -0.3463],
[-0.9782, 0.2251, 0.9323, 0.6671],
[-0.9844, 0.1764, 0.9735, 0.6935],
[-0.9929, 0.2073, 0.9807, -0.1195]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.8979, 0.8858, 0.9323, 0.9735, 0.9807], grad_fn=<MaxBackward0>),
indices=tensor([3, 2, 2, 2, 2]))
idx: [3 2 2 2 2]
Predicted: ollll,Epoch [4/15] loss = 0.9192
outputs: tensor([[-0.9103, -0.2122, 0.1858, 0.9287],
[-0.9894, 0.6635, 0.8448, -0.3441],
[-0.9863, 0.1154, 0.9428, 0.8075],
[-0.9912, 0.2510, 0.9800, 0.7649],
[-0.9962, 0.2230, 0.9848, -0.0682]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9287, 0.8448, 0.9428, 0.9800, 0.9848], grad_fn=<MaxBackward0>),
indices=tensor([3, 2, 2, 2, 2]))
idx: [3 2 2 2 2]
Predicted: ollll,Epoch [5/15] loss = 0.8708
outputs: tensor([[-0.9374, -0.3276, 0.0160, 0.9436],
[-0.9920, 0.7855, 0.7425, -0.4376],
[-0.9888, -0.1326, 0.9356, 0.8573],
[-0.9942, 0.2462, 0.9809, 0.7434],
[-0.9977, 0.0970, 0.9850, -0.1693]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9436, 0.7855, 0.9356, 0.9809, 0.9850], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 2, 2, 2]))
idx: [3 1 2 2 2]
Predicted: ohlll,Epoch [6/15] loss = 0.8216
outputs: tensor([[-0.9546, -0.4712, -0.1802, 0.9536],
[-0.9932, 0.8538, 0.5203, -0.5241],
[-0.9882, -0.3814, 0.9068, 0.8885],
[-0.9959, 0.1872, 0.9789, 0.7184],
[-0.9984, -0.1151, 0.9835, -0.3094]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9536, 0.8538, 0.9068, 0.9789, 0.9835], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 2, 2, 2]))
idx: [3 1 2 2 2]
Predicted: ohlll,Epoch [7/15] loss = 0.7695
outputs: tensor([[-0.9658, -0.6085, -0.3808, 0.9628],
[-0.9938, 0.8942, 0.1230, -0.5613],
[-0.9837, -0.5155, 0.8168, 0.9232],
[-0.9968, 0.0580, 0.9713, 0.7732],
[-0.9989, -0.3260, 0.9811, -0.4025]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9628, 0.8942, 0.9232, 0.9713, 0.9811], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 3, 2, 2]))
idx: [3 1 3 2 2]
Predicted: oholl,Epoch [8/15] loss = 0.7178
outputs: tensor([[-0.9735, -0.7232, -0.5464, 0.9701],
[-0.9944, 0.9159, -0.3144, -0.5824],
[-0.9749, -0.5678, 0.6341, 0.9446],
[-0.9972, -0.1100, 0.9520, 0.8546],
[-0.9992, -0.5120, 0.9777, -0.4605]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9701, 0.9159, 0.9446, 0.9520, 0.9777], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 3, 2, 2]))
idx: [3 1 3 2 2]
Predicted: oholl,Epoch [9/15] loss = 0.6879
outputs: tensor([[-0.9790, -0.8102, -0.6357, 0.9743],
[-0.9953, 0.9232, -0.5247, -0.6337],
[-0.9689, -0.6435, 0.5774, 0.9440],
[-0.9977, -0.3335, 0.9436, 0.8861],
[-0.9994, -0.6626, 0.9760, -0.5683]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9743, 0.9232, 0.9440, 0.9436, 0.9760], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 3, 2, 2]))
idx: [3 1 3 2 2]
Predicted: oholl,Epoch [10/15] loss = 0.6628
outputs: tensor([[-0.9831, -0.8715, -0.6769, 0.9762],
[-0.9962, 0.9216, -0.5938, -0.7109],
[-0.9665, -0.7309, 0.6727, 0.9275],
[-0.9985, -0.5930, 0.9549, 0.8892],
[-0.9996, -0.7745, 0.9758, -0.6960]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9762, 0.9216, 0.9275, 0.9549, 0.9758], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 3, 2, 2]))
idx: [3 1 3 2 2]
Predicted: oholl,Epoch [11/15] loss = 0.6299
outputs: tensor([[-0.9862, -0.9124, -0.6969, 0.9765],
[-0.9970, 0.9158, -0.6110, -0.7885],
[-0.9655, -0.8035, 0.7984, 0.8932],
[-0.9990, -0.7852, 0.9671, 0.8858],
[-0.9997, -0.8599, 0.9756, -0.7793]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9765, 0.9158, 0.8932, 0.9671, 0.9756], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 3, 2, 2]))
idx: [3 1 3 2 2]
Predicted: oholl,Epoch [12/15] loss = 0.6009
outputs: tensor([[-0.9885, -0.9389, -0.7099, 0.9760],
[-0.9976, 0.9095, -0.6146, -0.8505],
[-0.9651, -0.8554, 0.8884, 0.8322],
[-0.9993, -0.8886, 0.9745, 0.8837],
[-0.9998, -0.9177, 0.9749, -0.8228]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9760, 0.9095, 0.8884, 0.9745, 0.9749], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 2, 2, 2]))
idx: [3 1 2 2 2]
Predicted: ohlll,Epoch [13/15] loss = 0.5797
outputs: tensor([[-0.9903, -0.9562, -0.7232, 0.9747],
[-0.9980, 0.9053, -0.6240, -0.8955],
[-0.9645, -0.8899, 0.9388, 0.7242],
[-0.9994, -0.9388, 0.9790, 0.8783],
[-0.9998, -0.9516, 0.9737, -0.8467]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9747, 0.9053, 0.9388, 0.9790, 0.9737], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 2, 2, 2]))
idx: [3 1 2 2 2]
Predicted: ohlll,Epoch [14/15] loss = 0.5627
outputs: tensor([[-0.9916, -0.9675, -0.7397, 0.9724],
[-0.9983, 0.9052, -0.6472, -0.9274],
[-0.9631, -0.9118, 0.9649, 0.5346],
[-0.9994, -0.9638, 0.9829, 0.8552],
[-0.9998, -0.9704, 0.9720, -0.8646]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9724, 0.9052, 0.9649, 0.9829, 0.9720], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 2, 2, 2]))
idx: [3 1 2 2 2]
Predicted: ohlll,Epoch [15/15] loss = 0.5464
Process finished with exit code 0
6. Word embeddings
Drawbacks of one-hot vectors:
(1) the dimensionality is too high;
(2) the matrix is extremely sparse;
(3) they are hard-coded rather than learned.
The remedy for these drawbacks is the embedding.
An embedding maps high-dimensional, sparse samples into a dense, low-dimensional space, which is the familiar idea of dimensionality reduction.
The network's output size must match the number of classes, while the hidden size often does not, so a final linear layer is added to map hidden_size to num_class; the shape walk-through below makes this concrete.
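Before the full listing, a minimal shape walk-through of the Embedding -> RNN -> Linear pipeline used below (with batch_first=True the input is a (batch, seq_len) index tensor):

import torch

emb = torch.nn.Embedding(num_embeddings=4, embedding_dim=10)
rnn = torch.nn.RNN(input_size=10, hidden_size=4, num_layers=2, batch_first=True)
fc = torch.nn.Linear(4, 4)               # hidden_size -> num_class

x = torch.LongTensor([[1, 0, 2, 2, 3]])  # (batch, seq_len): plain indices, not one-hot
e = emb(x)                               # (1, 5, 10): dense, learned vectors
h0 = torch.zeros(2, 1, 4)                # hidden stays (num_layers, batch, hidden_size)
out, _ = rnn(e, h0)                      # (1, 5, 4) with batch_first=True
print(fc(out).shape)                     # torch.Size([1, 5, 4]); view(-1, 4) before the loss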
Code implementation
import torch

num_class = 4
input_size = 4
hidden_size = 4
embedding_size = 10
num_layers = 2
batch_size = 1
seq_len = 5

idx2char = ['e', 'h', 'l', 'o']
x_data = [[1, 0, 2, 2, 3]]  # (batch, seq_len)
y_data = [3, 1, 2, 3, 2]    # (batch * seq_len)
inputs = torch.LongTensor(x_data)
print('inputs', inputs)
labels = torch.LongTensor(y_data)

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # input_size doubles as the vocabulary size (4 characters)
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        self.rnn = torch.nn.RNN(input_size=embedding_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers,
                                batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        print('x.size(0)', x.size(0))
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        print('hidden.shape', hidden.shape)
        print('hidden', hidden)
        print('x_0:', x)
        x = self.emb(x)  # (batch, seqLen, embeddingSize)
        print('x_1:', x)
        x, _ = self.rnn(x, hidden)
        print('x_2', x)
        x = self.fc(x)
        print('x_3', x)  # (1, 5, 4)
        print('x.view:', x.view(-1, num_class))  # (5, 4)
        return x.view(-1, num_class)

net = Model()
criterion = torch.nn.CrossEntropyLoss()  # already includes the softmax layer
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)
    print('outputs:', outputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    _, idx = outputs.max(dim=1)  # indices of the max score per step
    idx = idx.data.numpy()
    print('idx:', idx)
    print('Predicted:', ''.join([idx2char[x] for x in idx]), end='')
    print(',Epoch [%d/15] loss = %0.4f' % (epoch + 1, loss.item()))
G:\python_files\DeepLearning\Scripts\python.exe G:/python_files/DeepLearningProgram/词嵌入.py
inputs tensor([[1, 0, 2, 2, 3]])
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.0979, 0.9104, 1.7472, -0.0186, 0.4478, -1.3402, 0.1717,
1.5238, -0.3157, 1.8523],
[-0.5575, 1.0836, 1.4183, 1.4768, -0.3469, -0.3848, -1.2278,
-0.4501, 0.5026, 1.3427],
[ 0.9381, 1.2550, -1.2158, -0.9144, 0.3298, -1.1841, 0.6781,
0.1703, 0.3936, -0.4299],
[ 0.9381, 1.2550, -1.2158, -0.9144, 0.3298, -1.1841, 0.6781,
0.1703, 0.3936, -0.4299],
[ 0.1590, -0.8760, 0.2953, -1.7547, -0.7498, 0.3259, -0.0867,
0.8370, -1.5976, 0.2770]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[ 0.3772, 0.4659, -0.4824, 0.3167],
[ 0.4253, 0.6224, -0.5328, 0.8509],
[ 0.1511, -0.2543, -0.4202, 0.3263],
[ 0.2619, -0.2187, -0.2259, 0.5462],
[ 0.6239, 0.4253, -0.1566, -0.1336]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[ 0.5114, -0.2779, 0.4754, -0.7368],
[ 0.7004, -0.0807, 0.3180, -0.9904],
[ 0.2221, -0.2761, 0.2885, -0.6399],
[ 0.3802, -0.1152, 0.1688, -0.7735],
[ 0.5466, -0.3278, 0.6095, -0.6360]]], grad_fn=<ViewBackward0>)
x.view: tensor([[ 0.5114, -0.2779, 0.4754, -0.7368],
[ 0.7004, -0.0807, 0.3180, -0.9904],
[ 0.2221, -0.2761, 0.2885, -0.6399],
[ 0.3802, -0.1152, 0.1688, -0.7735],
[ 0.5466, -0.3278, 0.6095, -0.6360]], grad_fn=<ViewBackward0>)
outputs: tensor([[ 0.5114, -0.2779, 0.4754, -0.7368],
[ 0.7004, -0.0807, 0.3180, -0.9904],
[ 0.2221, -0.2761, 0.2885, -0.6399],
[ 0.3802, -0.1152, 0.1688, -0.7735],
[ 0.5466, -0.3278, 0.6095, -0.6360]], grad_fn=<ViewBackward0>)
idx: [0 0 2 0 2]
Predicted: eelel,Epoch [1/15] loss = 1.6110
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.1979, 1.0104, 1.6472, 0.0814, 0.5478, -1.2402, 0.2717,
1.4238, -0.4157, 1.7523],
[-0.6575, 0.9836, 1.3183, 1.5768, -0.4469, -0.4848, -1.1278,
-0.5501, 0.6026, 1.2427],
[ 1.0381, 1.1550, -1.1158, -1.0144, 0.4298, -1.0841, 0.5781,
0.2703, 0.2936, -0.5299],
[ 1.0381, 1.1550, -1.1158, -1.0144, 0.4298, -1.0841, 0.5781,
0.2703, 0.2936, -0.5299],
[ 0.2590, -0.7760, 0.1953, -1.8547, -0.6498, 0.4259, -0.1867,
0.9370, -1.4976, 0.3770]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.4243, -0.1475, -0.7790, -0.4171],
[-0.5545, 0.3204, -0.5777, 0.8650],
[-0.4827, -0.4827, -0.7665, -0.3961],
[-0.7910, -0.2632, -0.7781, -0.0279],
[-0.4898, 0.4215, -0.7939, -0.0173]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-2.7128e-01, -6.4384e-01, 5.6473e-01, -6.7270e-04],
[ 1.0902e-01, 7.2345e-02, -1.8547e-01, -3.3819e-01],
[-3.5608e-01, -6.7826e-01, 5.2131e-01, -2.3273e-02],
[-2.9731e-01, -5.2175e-01, 2.0260e-01, -1.3013e-02],
[-8.2588e-02, -3.9467e-01, 3.4536e-01, -4.8831e-02]]],
grad_fn=<ViewBackward0>)
x.view: tensor([[-2.7128e-01, -6.4384e-01, 5.6473e-01, -6.7270e-04],
[ 1.0902e-01, 7.2345e-02, -1.8547e-01, -3.3819e-01],
[-3.5608e-01, -6.7826e-01, 5.2131e-01, -2.3273e-02],
[-2.9731e-01, -5.2175e-01, 2.0260e-01, -1.3013e-02],
[-8.2588e-02, -3.9467e-01, 3.4536e-01, -4.8831e-02]],
grad_fn=<ViewBackward0>)
outputs: tensor([[-2.7128e-01, -6.4384e-01, 5.6473e-01, -6.7270e-04],
[ 1.0902e-01, 7.2345e-02, -1.8547e-01, -3.3819e-01],
[-3.5608e-01, -6.7826e-01, 5.2131e-01, -2.3273e-02],
[-2.9731e-01, -5.2175e-01, 2.0260e-01, -1.3013e-02],
[-8.2588e-02, -3.9467e-01, 3.4536e-01, -4.8831e-02]],
grad_fn=<ViewBackward0>)
idx: [2 0 2 2 2]
Predicted: lelll,Epoch [2/15] loss = 1.1571
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.2837, 1.1049, 1.5748, 0.1688, 0.6327, -1.2290, 0.3694,
1.3295, -0.5141, 1.6724],
[-0.7490, 0.8857, 1.2296, 1.6622, -0.5267, -0.5428, -1.0679,
-0.6400, 0.6822, 1.2036],
[ 1.1272, 1.0887, -1.0406, -1.0973, 0.5064, -1.0112, 0.5653,
0.3511, 0.2163, -0.5147],
[ 1.1272, 1.0887, -1.0406, -1.0973, 0.5064, -1.0112, 0.5653,
0.3511, 0.2163, -0.5147],
[ 0.3371, -0.6994, 0.1869, -1.9517, -0.5497, 0.4916, -0.2523,
1.0365, -1.5213, 0.3916]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.7139, -0.6269, -0.8682, -0.7788],
[-0.7125, 0.2163, -0.4858, 0.9187],
[-0.5879, -0.7182, -0.9006, -0.8135],
[-0.9251, -0.5227, -0.8896, -0.4147],
[-0.8109, 0.0873, -0.9328, -0.3958]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-0.6093, -0.9582, 0.8367, 0.3744],
[-0.1254, 0.3039, -0.3237, -0.1519],
[-0.6249, -0.9832, 0.9368, 0.3376],
[-0.6056, -0.7613, 0.5163, 0.3487],
[-0.4986, -0.6101, 0.5658, 0.3484]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-0.6093, -0.9582, 0.8367, 0.3744],
[-0.1254, 0.3039, -0.3237, -0.1519],
[-0.6249, -0.9832, 0.9368, 0.3376],
[-0.6056, -0.7613, 0.5163, 0.3487],
[-0.4986, -0.6101, 0.5658, 0.3484]], grad_fn=<ViewBackward0>)
outputs: tensor([[-0.6093, -0.9582, 0.8367, 0.3744],
[-0.1254, 0.3039, -0.3237, -0.1519],
[-0.6249, -0.9832, 0.9368, 0.3376],
[-0.6056, -0.7613, 0.5163, 0.3487],
[-0.4986, -0.6101, 0.5658, 0.3484]], grad_fn=<ViewBackward0>)
idx: [2 1 2 2 2]
Predicted: lhlll,Epoch [3/15] loss = 0.9631
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.3574, 1.1832, 1.5158, 0.1763, 0.7025, -1.2111, 0.4509,
1.3209, -0.6047, 1.6154],
[-0.8435, 0.7868, 1.1817, 1.7446, -0.6141, -0.5730, -1.0491,
-0.7232, 0.7654, 1.1728],
[ 1.1879, 1.0463, -0.9724, -1.1533, 0.5616, -0.9556, 0.5106,
0.4087, 0.1604, -0.5426],
[ 1.1879, 1.0463, -0.9724, -1.1533, 0.5616, -0.9556, 0.5106,
0.4087, 0.1604, -0.5426],
[ 0.4226, -0.6146, 0.1662, -2.0365, -0.4497, 0.5195, -0.2856,
1.1228, -1.5541, 0.3787]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.8227, -0.8103, -0.9144, -0.9030],
[-0.7148, 0.2632, -0.2010, 0.9611],
[-0.5948, -0.8889, -0.9465, -0.9445],
[-0.9611, -0.7075, -0.9076, -0.6075],
[-0.9114, -0.4499, -0.9663, -0.6871]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-0.8294, -1.1220, 1.0145, 0.7481],
[-0.1595, 0.6444, -0.5075, -0.1069],
[-0.8183, -1.1517, 1.1933, 0.6761],
[-0.8278, -0.9028, 0.7260, 0.7028],
[-0.8176, -0.8830, 0.8363, 0.7229]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-0.8294, -1.1220, 1.0145, 0.7481],
[-0.1595, 0.6444, -0.5075, -0.1069],
[-0.8183, -1.1517, 1.1933, 0.6761],
[-0.8278, -0.9028, 0.7260, 0.7028],
[-0.8176, -0.8830, 0.8363, 0.7229]], grad_fn=<ViewBackward0>)
outputs: tensor([[-0.8294, -1.1220, 1.0145, 0.7481],
[-0.1595, 0.6444, -0.5075, -0.1069],
[-0.8183, -1.1517, 1.1933, 0.6761],
[-0.8278, -0.9028, 0.7260, 0.7028],
[-0.8176, -0.8830, 0.8363, 0.7229]], grad_fn=<ViewBackward0>)
idx: [2 1 2 2 2]
Predicted: lhlll,Epoch [4/15] loss = 0.8192
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.4267, 1.2433, 1.4607, 0.1329, 0.7618, -1.1703, 0.5067,
1.3648, -0.6942, 1.5761],
[-0.9382, 0.6893, 1.1630, 1.8207, -0.7039, -0.5756, -1.0618,
-0.7933, 0.8278, 1.1424],
[ 1.2312, 1.0134, -0.9101, -1.1945, 0.6050, -0.9109, 0.4517,
0.4526, 0.1170, -0.5812],
[ 1.2312, 1.0134, -0.9101, -1.1945, 0.6050, -0.9109, 0.4517,
0.4526, 0.1170, -0.5812],
[ 0.4976, -0.5416, 0.1449, -2.1153, -0.3613, 0.5406, -0.3104,
1.2017, -1.5887, 0.3667]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.8584, -0.8936, -0.9523, -0.9560],
[-0.4321, 0.7502, 0.1346, 0.9841],
[-0.5137, -0.9572, -0.9765, -0.9855],
[-0.9732, -0.7780, -0.9136, -0.7397],
[-0.9417, -0.7083, -0.9768, -0.8273]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.0491, -1.2480, 1.1723, 1.1317],
[ 0.0416, 1.2057, -0.5718, -0.3294],
[-0.9937, -1.2801, 1.4419, 0.9943],
[-1.0338, -1.0294, 0.9075, 1.0789],
[-1.0596, -1.0816, 1.0316, 1.1138]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.0491, -1.2480, 1.1723, 1.1317],
[ 0.0416, 1.2057, -0.5718, -0.3294],
[-0.9937, -1.2801, 1.4419, 0.9943],
[-1.0338, -1.0294, 0.9075, 1.0789],
[-1.0596, -1.0816, 1.0316, 1.1138]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.0491, -1.2480, 1.1723, 1.1317],
[ 0.0416, 1.2057, -0.5718, -0.3294],
[-0.9937, -1.2801, 1.4419, 0.9943],
[-1.0338, -1.0294, 0.9075, 1.0789],
[-1.0596, -1.0816, 1.0316, 1.1138]], grad_fn=<ViewBackward0>)
idx: [2 1 2 3 3]
Predicted: lhloo,Epoch [5/15] loss = 0.7005
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.4901, 1.2916, 1.4100, 0.0776, 0.8134, -1.1251, 0.5502,
1.4207, -0.7790, 1.5456],
[-1.0348, 0.5908, 1.1532, 1.8903, -0.7921, -0.5631, -1.0896,
-0.8520, 0.8735, 1.1142],
[ 1.2653, 0.9852, -0.8546, -1.2276, 0.6409, -0.8731, 0.3970,
0.4884, 0.0814, -0.6178],
[ 1.2653, 0.9852, -0.8546, -1.2276, 0.6409, -0.8731, 0.3970,
0.4884, 0.0814, -0.6178],
[ 0.5434, -0.4947, 0.1270, -2.1831, -0.2947, 0.5668, -0.3411,
1.2738, -1.6243, 0.3547]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.8337, -0.9457, -0.9761, -0.9795],
[-0.1997, 0.9313, 0.3498, 0.9923],
[-0.3049, -0.9841, -0.9885, -0.9954],
[-0.9743, -0.8286, -0.9331, -0.8420],
[-0.9416, -0.8127, -0.9829, -0.8923]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.2439, -1.3849, 1.3886, 1.4863],
[ 0.1852, 1.6604, -0.6304, -0.6072],
[-1.1077, -1.4186, 1.8161, 1.2222],
[-1.2382, -1.1933, 1.1352, 1.4628],
[-1.2636, -1.2427, 1.2345, 1.4863]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.2439, -1.3849, 1.3886, 1.4863],
[ 0.1852, 1.6604, -0.6304, -0.6072],
[-1.1077, -1.4186, 1.8161, 1.2222],
[-1.2382, -1.1933, 1.1352, 1.4628],
[-1.2636, -1.2427, 1.2345, 1.4863]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.2439, -1.3849, 1.3886, 1.4863],
[ 0.1852, 1.6604, -0.6304, -0.6072],
[-1.1077, -1.4186, 1.8161, 1.2222],
[-1.2382, -1.1933, 1.1352, 1.4628],
[-1.2636, -1.2427, 1.2345, 1.4863]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 3]
Predicted: ohloo,Epoch [6/15] loss = 0.6164
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.5421, 1.3355, 1.3689, 0.0425, 0.8572, -1.0925, 0.5906,
1.4555, -0.8462, 1.5182],
[-1.1188, 0.5053, 1.1448, 1.9505, -0.8686, -0.5520, -1.1142,
-0.9025, 0.9126, 1.0897],
[ 1.2961, 0.9593, -0.8079, -1.2573, 0.6717, -0.8397, 0.3473,
0.5202, 0.0509, -0.6477],
[ 1.2961, 0.9593, -0.8079, -1.2573, 0.6717, -0.8397, 0.3473,
0.5202, 0.0509, -0.6477],
[ 0.5538, -0.4760, 0.1132, -2.2306, -0.2548, 0.6023, -0.3846,
1.3373, -1.6594, 0.3401]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.7471, -0.9725, -0.9854, -0.9893],
[-0.3517, 0.9665, 0.4550, 0.9949],
[ 0.0556, -0.9936, -0.9926, -0.9979],
[-0.9699, -0.8757, -0.9572, -0.9119],
[-0.9166, -0.8488, -0.9876, -0.9242]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.3869, -1.5366, 1.6755, 1.7767],
[ 0.1424, 2.0382, -0.9614, -0.6583],
[-1.1275, -1.5890, 2.3763, 1.3071],
[-1.4277, -1.3849, 1.3876, 1.8341],
[-1.4314, -1.3925, 1.4655, 1.8173]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.3869, -1.5366, 1.6755, 1.7767],
[ 0.1424, 2.0382, -0.9614, -0.6583],
[-1.1275, -1.5890, 2.3763, 1.3071],
[-1.4277, -1.3849, 1.3876, 1.8341],
[-1.4314, -1.3925, 1.4655, 1.8173]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.3869, -1.5366, 1.6755, 1.7767],
[ 0.1424, 2.0382, -0.9614, -0.6583],
[-1.1275, -1.5890, 2.3763, 1.3071],
[-1.4277, -1.3849, 1.3876, 1.8341],
[-1.4314, -1.3925, 1.4655, 1.8173]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 3]
Predicted: ohloo,Epoch [7/15] loss = 0.5447
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.5831, 1.3767, 1.3371, 0.0296, 0.8942, -1.0715, 0.6280,
1.4669, -0.8946, 1.4927],
[-1.1921, 0.4308, 1.1374, 2.0029, -0.9352, -0.5423, -1.1355,
-0.9466, 0.9465, 1.0684],
[ 1.3378, 0.9308, -0.7848, -1.2930, 0.7004, -0.8074, 0.3021,
0.5565, 0.0210, -0.6600],
[ 1.3378, 0.9308, -0.7848, -1.2930, 0.7004, -0.8074, 0.3021,
0.5565, 0.0210, -0.6600],
[ 0.5230, -0.4931, 0.1048, -2.2382, -0.2558, 0.6543, -0.4468,
1.3878, -1.6942, 0.3178]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.5910, -0.9851, -0.9898, -0.9939],
[-0.5726, 0.9818, 0.4660, 0.9963],
[ 0.4976, -0.9972, -0.9948, -0.9989],
[-0.9616, -0.9171, -0.9793, -0.9567],
[-0.8080, -0.7351, -0.9928, -0.9290]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.4724, -1.6956, 2.0267, 1.9924],
[-0.0162, 2.3976, -1.3327, -0.6035],
[-1.0645, -1.7939, 3.0675, 1.2728],
[-1.5973, -1.5757, 1.6237, 2.1922],
[-1.5276, -1.4356, 1.7385, 2.0237]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.4724, -1.6956, 2.0267, 1.9924],
[-0.0162, 2.3976, -1.3327, -0.6035],
[-1.0645, -1.7939, 3.0675, 1.2728],
[-1.5973, -1.5757, 1.6237, 2.1922],
[-1.5276, -1.4356, 1.7385, 2.0237]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.4724, -1.6956, 2.0267, 1.9924],
[-0.0162, 2.3976, -1.3327, -0.6035],
[-1.0645, -1.7939, 3.0675, 1.2728],
[-1.5973, -1.5757, 1.6237, 2.1922],
[-1.5276, -1.4356, 1.7385, 2.0237]], grad_fn=<ViewBackward0>)
idx: [2 1 2 3 3]
Predicted: lhloo,Epoch [8/15] loss = 0.4840
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6121, 1.4172, 1.3161, 0.0415, 0.9245, -1.0620, 0.6623,
1.4528, -0.9211, 1.4685],
[-1.2567, 0.3651, 1.1308, 2.0491, -0.9940, -0.5338, -1.1543,
-0.9854, 0.9764, 1.0496],
[ 1.3987, 0.8823, -0.8199, -1.3511, 0.7364, -0.7660, 0.2629,
0.6143, -0.0220, -0.6298],
[ 1.3987, 0.8823, -0.8199, -1.3511, 0.7364, -0.7660, 0.2629,
0.6143, -0.0220, -0.6298],
[ 0.4673, -0.5445, 0.1095, -2.1954, -0.3020, 0.7193, -0.5116,
1.3979, -1.7339, 0.2716]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.4527, -0.9909, -0.9924, -0.9963],
[-0.6973, 0.9901, 0.4570, 0.9972],
[ 0.7326, -0.9986, -0.9961, -0.9995],
[-0.9495, -0.9442, -0.9915, -0.9789],
[-0.4742, -0.1649, -0.9969, -0.9163]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.5354, -1.8545, 2.2894, 2.2032],
[-0.1696, 2.7432, -1.5921, -0.5847],
[-1.0375, -2.0079, 3.5300, 1.3327],
[-1.7407, -1.7375, 1.7487, 2.5414],
[-1.4432, -1.1063, 2.1209, 1.8419]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.5354, -1.8545, 2.2894, 2.2032],
[-0.1696, 2.7432, -1.5921, -0.5847],
[-1.0375, -2.0079, 3.5300, 1.3327],
[-1.7407, -1.7375, 1.7487, 2.5414],
[-1.4432, -1.1063, 2.1209, 1.8419]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.5354, -1.8545, 2.2894, 2.2032],
[-0.1696, 2.7432, -1.5921, -0.5847],
[-1.0375, -2.0079, 3.5300, 1.3327],
[-1.7407, -1.7375, 1.7487, 2.5414],
[-1.4432, -1.1063, 2.1209, 1.8419]], grad_fn=<ViewBackward0>)
idx: [2 1 2 3 2]
Predicted: lhlol,Epoch [9/15] loss = 0.3933
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6304, 1.4563, 1.3052, 0.0712, 0.9488, -1.0597, 0.6923,
1.4208, -0.9300, 1.4458],
[-1.3141, 0.3067, 1.1249, 2.0901, -1.0461, -0.5263, -1.1710,
-1.0199, 1.0030, 1.0329],
[ 1.4601, 0.8359, -0.8590, -1.4087, 0.7705, -0.7275, 0.2316,
0.6713, -0.0634, -0.5980],
[ 1.4601, 0.8359, -0.8590, -1.4087, 0.7705, -0.7275, 0.2316,
0.6713, -0.0634, -0.5980],
[ 0.4012, -0.6077, 0.1150, -2.1428, -0.3615, 0.7934, -0.5860,
1.4010, -1.7799, 0.2185]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.3853, -0.9934, -0.9941, -0.9975],
[-0.7139, 0.9947, 0.4585, 0.9978],
[ 0.8025, -0.9992, -0.9971, -0.9997],
[-0.9425, -0.9481, -0.9946, -0.9854],
[-0.1859, 0.1561, -0.9982, -0.9351]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.6075, -2.0042, 2.3485, 2.4694],
[-0.2678, 3.0697, -1.7005, -0.6504],
[-1.0594, -2.2112, 3.7004, 1.5058],
[-1.8626, -1.8568, 1.7014, 2.8946],
[-1.3754, -0.9494, 2.4632, 1.7062]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.6075, -2.0042, 2.3485, 2.4694],
[-0.2678, 3.0697, -1.7005, -0.6504],
[-1.0594, -2.2112, 3.7004, 1.5058],
[-1.8626, -1.8568, 1.7014, 2.8946],
[-1.3754, -0.9494, 2.4632, 1.7062]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.6075, -2.0042, 2.3485, 2.4694],
[-0.2678, 3.0697, -1.7005, -0.6504],
[-1.0594, -2.2112, 3.7004, 1.5058],
[-1.8626, -1.8568, 1.7014, 2.8946],
[-1.3754, -0.9494, 2.4632, 1.7062]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 2]
Predicted: ohlol,Epoch [10/15] loss = 0.3060
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6398, 1.4934, 1.3030, 0.1129, 0.9680, -1.0617, 0.7178,
1.3771, -0.9260, 1.4252],
[-1.3652, 0.2547, 1.1197, 2.1267, -1.0926, -0.5195, -1.1859,
-1.0507, 1.0268, 1.0180],
[ 1.5150, 0.7967, -0.8929, -1.4594, 0.8011, -0.6938, 0.2075,
0.7214, -0.0999, -0.5697],
[ 1.5150, 0.7967, -0.8929, -1.4594, 0.8011, -0.6938, 0.2075,
0.7214, -0.0999, -0.5697],
[ 0.3403, -0.6667, 0.1149, -2.0970, -0.4171, 0.8623, -0.6555,
1.4089, -1.8243, 0.1687]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.3940, -0.9944, -0.9953, -0.9983],
[-0.6586, 0.9972, 0.4641, 0.9982],
[ 0.7998, -0.9995, -0.9977, -0.9998],
[-0.9390, -0.9369, -0.9959, -0.9881],
[ 0.1076, 0.4632, -0.9988, -0.9531]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.7029, -2.1352, 2.2365, 2.8054],
[-0.3196, 3.3653, -1.6951, -0.7863],
[-1.1068, -2.3992, 3.7077, 1.7407],
[-1.9717, -1.9502, 1.5561, 3.2540],
[-1.2795, -0.7815, 2.8303, 1.4801]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.7029, -2.1352, 2.2365, 2.8054],
[-0.3196, 3.3653, -1.6951, -0.7863],
[-1.1068, -2.3992, 3.7077, 1.7407],
[-1.9717, -1.9502, 1.5561, 3.2540],
[-1.2795, -0.7815, 2.8303, 1.4801]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.7029, -2.1352, 2.2365, 2.8054],
[-0.3196, 3.3653, -1.6951, -0.7863],
[-1.1068, -2.3992, 3.7077, 1.7407],
[-1.9717, -1.9502, 1.5561, 3.2540],
[-1.2795, -0.7815, 2.8303, 1.4801]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 2]
Predicted: ohlol,Epoch [11/15] loss = 0.2176
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6420, 1.5285, 1.3081, 0.1630, 0.9830, -1.0660, 0.7389,
1.3256, -0.9127, 1.4069],
[-1.4111, 0.2080, 1.1150, 2.1595, -1.1343, -0.5135, -1.1992,
-1.0783, 1.0480, 1.0047],
[ 1.5665, 0.7662, -0.9229, -1.5048, 0.8296, -0.6646, 0.1957,
0.7661, -0.1322, -0.5440],
[ 1.5665, 0.7662, -0.9229, -1.5048, 0.8296, -0.6646, 0.1957,
0.7661, -0.1322, -0.5440],
[ 0.2858, -0.7200, 0.1117, -2.0572, -0.4670, 0.9246, -0.7181,
1.4189, -1.8650, 0.1223]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.4566, -0.9949, -0.9960, -0.9987],
[-0.5359, 0.9986, 0.4468, 0.9986],
[ 0.7477, -0.9997, -0.9982, -0.9999],
[-0.9363, -0.9097, -0.9968, -0.9897],
[ 0.3717, 0.7125, -0.9992, -0.9664]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.8195, -2.2432, 2.0060, 3.1966],
[-0.3505, 3.6022, -1.5531, -0.9777],
[-1.1767, -2.5662, 3.6074, 2.0221],
[-2.0694, -2.0173, 1.3663, 3.6031],
[-1.1839, -0.6435, 3.2147, 1.2127]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.8195, -2.2432, 2.0060, 3.1966],
[-0.3505, 3.6022, -1.5531, -0.9777],
[-1.1767, -2.5662, 3.6074, 2.0221],
[-2.0694, -2.0173, 1.3663, 3.6031],
[-1.1839, -0.6435, 3.2147, 1.2127]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.8195, -2.2432, 2.0060, 3.1966],
[-0.3505, 3.6022, -1.5531, -0.9777],
[-1.1767, -2.5662, 3.6074, 2.0221],
[-2.0694, -2.0173, 1.3663, 3.6031],
[-1.1839, -0.6435, 3.2147, 1.2127]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 2]
Predicted: ohlol,Epoch [12/15] loss = 0.1534
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6390, 1.5616, 1.3187, 0.2181, 0.9946, -1.0718, 0.7561,
1.2696, -0.8933, 1.3908],
[-1.4523, 0.1661, 1.1108, 2.1890, -1.1718, -0.5081, -1.2112,
-1.1030, 1.0671, 0.9927],
[ 1.6168, 0.7464, -0.9501, -1.5458, 0.8570, -0.6401, 0.2013,
0.8063, -0.1607, -0.5206],
[ 1.6168, 0.7464, -0.9501, -1.5458, 0.8570, -0.6401, 0.2013,
0.8063, -0.1607, -0.5206],
[ 0.2369, -0.7679, 0.1075, -2.0219, -0.5119, 0.9806, -0.7746,
1.4290, -1.9017, 0.0794]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.5107, -0.9952, -0.9963, -0.9990],
[-0.4077, 0.9993, 0.3503, 0.9988],
[ 0.7038, -0.9998, -0.9986, -0.9999],
[-0.9323, -0.8659, -0.9975, -0.9908],
[ 0.5859, 0.8539, -0.9996, -0.9759]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.9274, -2.3400, 1.7745, 3.5726],
[-0.4598, 3.7436, -1.3084, -1.1086],
[-1.2415, -2.7221, 3.5107, 2.2849],
[-2.1554, -2.0559, 1.1835, 3.9173],
[-1.1110, -0.5972, 3.5983, 0.9758]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.9274, -2.3400, 1.7745, 3.5726],
[-0.4598, 3.7436, -1.3084, -1.1086],
[-1.2415, -2.7221, 3.5107, 2.2849],
[-2.1554, -2.0559, 1.1835, 3.9173],
[-1.1110, -0.5972, 3.5983, 0.9758]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.9274, -2.3400, 1.7745, 3.5726],
[-0.4598, 3.7436, -1.3084, -1.1086],
[-1.2415, -2.7221, 3.5107, 2.2849],
[-2.1554, -2.0559, 1.1835, 3.9173],
[-1.1110, -0.5972, 3.5983, 0.9758]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 2]
Predicted: ohlol,Epoch [13/15] loss = 0.1226
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6325, 1.5927, 1.3329, 0.2753, 1.0037, -1.0782, 0.7701,
1.2122, -0.8705, 1.3769],
[-1.4895, 0.1282, 1.1070, 2.2156, -1.2056, -0.5032, -1.2220,
-1.1254, 1.0843, 0.9818],
[ 1.6677, 0.7394, -0.9750, -1.5829, 0.8845, -0.6207, 0.2268,
0.8424, -0.1856, -0.4992],
[ 1.6677, 0.7394, -0.9750, -1.5829, 0.8845, -0.6207, 0.2268,
0.8424, -0.1856, -0.4992],
[ 0.1928, -0.8111, 0.1033, -1.9902, -0.5524, 1.0311, -0.8257,
1.4385, -1.9348, 0.0400]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.5066, -0.9955, -0.9963, -0.9992],
[-0.3985, 0.9996, 0.1104, 0.9990],
[ 0.7850, -0.9999, -0.9989, -1.0000],
[-0.9345, -0.8266, -0.9981, -0.9919],
[ 0.7445, 0.9242, -0.9998, -0.9823]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.9964, -2.4442, 1.6575, 3.8568],
[-0.8103, 3.7678, -1.1052, -0.9469],
[-1.2312, -2.9084, 3.6352, 2.3763],
[-2.2378, -2.0857, 1.0320, 4.2019],
[-1.0650, -0.6130, 3.9680, 0.7795]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.9964, -2.4442, 1.6575, 3.8568],
[-0.8103, 3.7678, -1.1052, -0.9469],
[-1.2312, -2.9084, 3.6352, 2.3763],
[-2.2378, -2.0857, 1.0320, 4.2019],
[-1.0650, -0.6130, 3.9680, 0.7795]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.9964, -2.4442, 1.6575, 3.8568],
[-0.8103, 3.7678, -1.1052, -0.9469],
[-1.2312, -2.9084, 3.6352, 2.3763],
[-2.2378, -2.0857, 1.0320, 4.2019],
[-1.0650, -0.6130, 3.9680, 0.7795]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 2]
Predicted: ohlol,Epoch [14/15] loss = 0.0988
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],
[[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6223, 1.6222, 1.3512, 0.3352, 1.0106, -1.0852, 0.7810,
1.1529, -0.8445, 1.3652],
[-1.5231, 0.0941, 1.1036, 2.2396, -1.2361, -0.4988, -1.2318,
-1.1456, 1.0999, 0.9721],
[ 1.7193, 0.7444, -0.9979, -1.6165, 0.9125, -0.6061, 0.2687,
0.8750, -0.2070, -0.4799],
[ 1.7193, 0.7444, -0.9979, -1.6165, 0.9125, -0.6061, 0.2687,
0.8750, -0.2070, -0.4799],
[ 0.1530, -0.8501, 0.0993, -1.9616, -0.5889, 1.0768, -0.8718,
1.4471, -1.9647, 0.0039]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.4767, -0.9958, -0.9958, -0.9994],
[-0.4806, 0.9998, -0.2376, 0.9991],
[ 0.9011, -0.9999, -0.9993, -1.0000],
[-0.9424, -0.8069, -0.9985, -0.9927],
[ 0.8440, 0.9591, -0.9999, -0.9863]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-2.0427, -2.5502, 1.6240, 4.0715],
[-1.3645, 3.6887, -0.9591, -0.5144],
[-1.1925, -3.1052, 3.8751, 2.3741],
[-2.3188, -2.1251, 0.9092, 4.4707],
[-1.0488, -0.6509, 4.2951, 0.6302]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-2.0427, -2.5502, 1.6240, 4.0715],
[-1.3645, 3.6887, -0.9591, -0.5144],
[-1.1925, -3.1052, 3.8751, 2.3741],
[-2.3188, -2.1251, 0.9092, 4.4707],
[-1.0488, -0.6509, 4.2951, 0.6302]], grad_fn=<ViewBackward0>)
outputs: tensor([[-2.0427, -2.5502, 1.6240, 4.0715],
[-1.3645, 3.6887, -0.9591, -0.5144],
[-1.1925, -3.1052, 3.8751, 2.3741],
[-2.3188, -2.1251, 0.9092, 4.4707],
[-1.0488, -0.6509, 4.2951, 0.6302]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 2]
Predicted: ohlol,Epoch [15/15] loss = 0.0782
Process finished with exit code 0