
Recurrent Neural Networks (Basics)

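Contents

1. Introduction to RNN

2. How the RNN Cell Computes

3. How to Use RNN Cell

When batch_size = 1

When batch_size = 2

When seq_len = 4 and input_size = 1

4. How to Use RNN

What is num_layers

5. Hands-on Practice

Vectorizing characters

Adding a softmax layer

Preparing the dataset

Full code: mapping hello -> ohlol with RNN Cell

Full code: mapping hello -> ohlol with RNN

6. Word Embedding

Drawbacks of one-hot vectors

Solving the drawbacks of one-hot vectors with Embedding

Code implementation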


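1. Introduction to RNN

h0 represents prior knowledge.

For example, the output of a CNN + FC network can serve as h0 and be fed into the RNN, which completes the conversion from image to text. If there is no prior knowledge, simply initialize h0 to all zeros.

h1 and x2 are the inputs to the next RNN Cell, whose linear transformation (followed by the tanh activation) produces h2, and in the same way we obtain h3, h4, and so on.

Each RNN Cell drawn on the right is in fact the same linear layer. That single layer takes part in the computation again and again, so the weights used at an earlier step are reused at every later step, building up a complex computation graph. In other words, the loop relies on weight sharing and uses only one linear layer.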
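The weight sharing described above can be illustrated with a minimal pure-Python sketch. The scalar parameters `w_x`, `w_h`, `b` and the helper names `rnn_step` and `run_rnn` are illustrative assumptions, not part of the original code; the only point is that one set of parameters is reused at every time step.

```python
import math

# Hypothetical scalar parameters, shared by every time step.
w_x, w_h, b = 0.5, -0.3, 0.1

def rnn_step(x, h_prev):
    """One recurrent step: combine the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(xs, h0=0.0):
    """Unroll the recurrence over a sequence, reusing the same parameters at each step."""
    h = h0                  # no prior knowledge: start from zero
    hs = []
    for x in xs:
        h = rnn_step(x, h)  # h_t depends on x_t and h_{t-1}
        hs.append(h)
    return hs

hidden_states = run_rnn([1.0, 2.0, 3.0])
print(hidden_states)
```

Because `rnn_step` is called with the same `w_x`, `w_h`, and `b` every time, a gradient computed through the whole sequence would accumulate contributions from all steps into those same parameters.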

2. How the RNN Cell Computes
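As a reference point, the update performed by one RNN Cell with the default tanh activation (the form documented for `torch.nn.RNNCell`) can be written as follows, where x_t is the input at step t with input_size features and h_{t-1} is the previous hidden state with hidden_size features:

```latex
h_t = \tanh\left(W_{ih}\, x_t + b_{ih} + W_{hh}\, h_{t-1} + b_{hh}\right),
\qquad
W_{ih} \in \mathbb{R}^{\text{hidden\_size} \times \text{input\_size}},\quad
W_{hh} \in \mathbb{R}^{\text{hidden\_size} \times \text{hidden\_size}}
```

The tanh keeps every component of h_t in the range (-1, 1), which is consistent with the `TanhBackward0` values printed in the examples of the next section.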

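3. How to Use RNN Cell

An RNN Cell is configured by two sizes: input_size, the feature dimension of the input x, and hidden_size, the dimension of the hidden state. Note that the hidden state going in and the hidden state coming out have the same size.

Call cell(input, hidden) in a loop, for example h1 = cell(x1, h0). The hidden argument is always the output of the previous step, with h0 as the initial state.

When batch_size = 1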

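import torch

batch_size = 1
seq_len = 3
input_size = 4     # feature dimension of each x_t
hidden_size = 2    # dimension of the hidden state
cell = torch.nn.RNNCell(input_size=input_size,hidden_size=hidden_size)

# dataset shape: (seq_len, batch_size, input_size)
dataset = torch.rand(seq_len,batch_size,input_size)
print('dataset::',dataset)
hidden = torch.zeros(batch_size,hidden_size)   # h0: all zeros, no prior knowledge

# Iterating over dataset yields one time step at a time, each of shape (batch_size, input_size)
for idx, input in enumerate(dataset):
    print('=' * 20,idx,'=' * 20)
    print('input:',input)
    print('input size:',input.shape)
    hidden = cell(input,hidden)   # h_t = cell(x_t, h_{t-1})

    print('hidden size:',hidden.shape)
    print(hidden)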

Output:
dataset:: tensor([[[0.0838, 0.6044, 0.0810, 0.7452]],

        [[0.6009, 0.8458, 0.0021, 0.5979]],

        [[0.4665, 0.0486, 0.2486, 0.5683]]])
==================== 0 ====================
input: tensor([[0.0838, 0.6044, 0.0810, 0.7452]])
input size: torch.Size([1, 4])
hidden size: torch.Size([1, 2])
tensor([[-0.8042,  0.7211]], grad_fn=<TanhBackward0>)
==================== 1 ====================
input: tensor([[0.6009, 0.8458, 0.0021, 0.5979]])
input size: torch.Size([1, 4])
hidden size: torch.Size([1, 2])
tensor([[-0.9128,  0.4984]], grad_fn=<TanhBackward0>)
==================== 2 ====================
input: tensor([[0.4665, 0.0486, 0.2486, 0.5683]])
input size: torch.Size([1, 4])
hidden size: torch.Size([1, 2])
tensor([[-0.8416,  0.2773]], grad_fn=<TanhBackward0>)


When batch_size = 2

import torch

batch_size = 2
seq_len = 3
input_size = 4
hidden_size = 2
cell = torch.nn.RNNCell(input_size=input_size,hidden_size=hidden_size)

# (seq_len, batch ,feature)
dataset = torch.rand(seq_len,batch_size,input_size)
print('dataset::',dataset)
hidden = torch.zeros(batch_size,hidden_size)

for idx, input in enumerate(dataset):
    print('=' * 20,idx,'=' * 20)
    print('input:',input)
    print('input size:',input.shape)
    hidden = cell(input,hidden)

    print('hidden size:',hidden.shape)
    print(hidden)

Output:
dataset:: tensor([[[0.6906, 0.7810, 0.2862, 0.9390],
         [0.8915, 0.2699, 0.4300, 0.3093]],

        [[0.4048, 0.7307, 0.3674, 0.7294],
         [0.0394, 0.9579, 0.3730, 0.3643]],

        [[0.8211, 0.2256, 0.7045, 0.1283],
         [0.5108, 0.5651, 0.0126, 0.4928]]])
==================== 0 ====================
input: tensor([[0.6906, 0.7810, 0.2862, 0.9390],
        [0.8915, 0.2699, 0.4300, 0.3093]])
input size: torch.Size([2, 4])
hidden size: torch.Size([2, 2])
tensor([[ 0.2782, -0.7428],
        [ 0.4471, -0.7467]], grad_fn=<TanhBackward0>)
==================== 1 ====================
input: tensor([[0.4048, 0.7307, 0.3674, 0.7294],
        [0.0394, 0.9579, 0.3730, 0.3643]])
input size: torch.Size([2, 4])
hidden size: torch.Size([2, 2])
tensor([[-0.4059, -0.8885],
        [-0.5068, -0.8562]], grad_fn=<TanhBackward0>)
==================== 2 ====================
input: tensor([[0.8211, 0.2256, 0.7045, 0.1283],
        [0.5108, 0.5651, 0.0126, 0.4928]])
input size: torch.Size([2, 4])
hidden size: torch.Size([2, 2])
tensor([[ 0.1510, -0.9392],
        [ 0.2471, -0.9277]], grad_fn=<TanhBackward0>)


When seq_len = 4 and input_size = 1

Only the hyperparameters change; the rest of the script stays the same.

batch_size = 2
seq_len = 4
input_size = 1
hidden_size = 2

Output:
dataset:: tensor([[[0.6525],
         [0.6306]],

        [[0.7393],
         [0.8773]],

        [[0.7610],
         [0.1576]],

        [[0.1989],
         [0.8072]]])
==================== 0 ====================
input: tensor([[0.6525],
        [0.6306]])
input size: torch.Size([2, 1])
hidden size: torch.Size([2, 2])
tensor([[-0.7688,  0.2979],
        [-0.7707,  0.2928]], grad_fn=<TanhBackward0>)
==================== 1 ====================
input: tensor([[0.7393],
        [0.8773]])
input size: torch.Size([2, 1])
hidden size: torch.Size([2, 2])
tensor([[-0.8193,  0.0798],
        [-0.8090,  0.1117]], grad_fn=<TanhBackward0>)
==================== 2 ====================
input: tensor([[0.7610],
        [0.1576]])
input size: torch.Size([2, 1])
hidden size: torch.Size([2, 2])
tensor([[-0.7975, -0.0332],
        [-0.8421, -0.1672]], grad_fn=<TanhBackward0>)
==================== 3 ====================
input: tensor([[0.1989],
        [0.8072]])
input size: torch.Size([2, 1])
hidden size: torch.Size([2, 2])
tensor([[-0.8264, -0.2116],
        [-0.7667, -0.1379]], grad_fn=<TanhBackward0>)


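4. How to Use RNN

num_layers is the number of stacked RNN layers. One layer is usually enough; adding more makes training noticeably slower.

When calling the RNN module, the inputs argument holds the entire sequence X = [x1, x2, x3, x4, ..., xn] and hidden is h0. The call returns two tensors: the first is out = [h1, h2, h3, h4, ..., hn], and the second is hidden, which is hn. Compared with using RNN Cell directly, RNN simplifies the code: we no longer write the for loop ourselves because the module iterates over the sequence for us, which is why we pass in the whole X.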
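The call described above can be sketched as follows. The sizes are arbitrary values chosen for illustration; the default layout (without `batch_first=True`) puts the sequence dimension first.

```python
import torch

seq_len, batch_size, input_size, hidden_size, num_layers = 3, 1, 4, 2, 1

rnn = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)

inputs = torch.randn(seq_len, batch_size, input_size)   # the whole sequence X
h0 = torch.zeros(num_layers, batch_size, hidden_size)   # initial hidden state

out, hn = rnn(inputs, h0)   # no explicit for loop over time steps

print(out.shape)  # (seq_len, batch_size, hidden_size): h1 ... hn
print(hn.shape)   # (num_layers, batch_size, hidden_size): the final hidden state

# With a single layer, the last time step of `out` is the final hidden state.
print(torch.allclose(out[-1], hn[0]))
```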

 

What is num_layers
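With num_layers greater than 1, the RNN layers are stacked: the hidden-state sequence produced by one layer becomes the input sequence of the layer above it. A small sketch of how this affects the tensor shapes, with sizes that are arbitrary assumptions:

```python
import torch

seq_len, batch_size, input_size, hidden_size = 3, 1, 4, 2
num_layers = 2   # two stacked RNN layers

rnn = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)

inputs = torch.randn(seq_len, batch_size, input_size)
h0 = torch.zeros(num_layers, batch_size, hidden_size)   # one initial state per layer

out, hn = rnn(inputs, h0)

print(out.shape)  # outputs of the top layer only: (seq_len, batch_size, hidden_size)
print(hn.shape)   # final state of every layer:    (num_layers, batch_size, hidden_size)

# The last time step of `out` matches the final state of the top layer.
print(torch.allclose(out[-1], hn[-1]))
```

Only the shape of the hidden state depends on num_layers; the shape of `out` does not.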

5. Hands-on Practice

Vectorizing characters
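A minimal pure-Python sketch of turning characters into vectors for the hello -> ohlol task. Building the dictionary with `sorted(set(...))` and the name `char2idx` are assumptions made for illustration; the resulting indices match the `x_data` and `y_data` used in the rest of this section.

```python
text_in, text_out = "hello", "ohlol"

# Dictionary: every distinct character gets an index.
idx2char = sorted(set(text_in))                      # ['e', 'h', 'l', 'o']
char2idx = {c: i for i, c in enumerate(idx2char)}

x_data = [char2idx[c] for c in text_in]              # [1, 0, 2, 2, 3]
y_data = [char2idx[c] for c in text_out]             # [3, 1, 2, 3, 2]

# One-hot: a vector of zeros with a single 1 at the character's index.
num_chars = len(idx2char)
x_one_hot = [[1 if i == idx else 0 for i in range(num_chars)] for idx in x_data]

print(x_one_hot)
```

The width of each one-hot vector equals the number of distinct characters, which is why input_size is 4 in the code that follows.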

Adding a softmax layer
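Predicting a character at each step is a multi-class classification problem, so the per-step scores are converted to probabilities with softmax and scored with cross-entropy. A minimal pure-Python sketch of that computation; the example scores are made up for illustration:

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)                               # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(softmax(scores)[target])

scores = [0.1, 0.2, 2.0, 0.3]    # hypothetical scores for ['e', 'h', 'l', 'o']
probs = softmax(scores)
predicted = probs.index(max(probs))

print(probs)
print(predicted)                  # index of the most likely character
print(cross_entropy(scores, 2))   # loss when the correct character is 'l'
```

`torch.nn.CrossEntropyLoss` performs the softmax (as log-softmax) and the negative log-likelihood together, so the model passes raw scores to the loss and does not apply softmax itself.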

 

Preparing the dataset

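import torch

input_size = 4   # one-hot width = number of distinct characters
batch_size = 1

idx2char = ['e','h','l','o']   # index -> character
x_data = [1, 0, 2, 2, 3]       # "hello"
y_data = [3, 1, 2, 3, 2]       # "ohlol"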

one_hot_lookup = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

x_one_hot = [one_hot_lookup[x] for x in x_data]   # (seq_len,input)
# print('x_one_hot:',x_one_hot) [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
inputs = torch.Tensor(x_one_hot).view(-1,batch_size,input_size)
# print('inputs:',inputs)
# inputs: tensor([[[0., 1., 0., 0.]],
#
#         [[1., 0., 0., 0.]],
#
#         [[0., 0., 1., 0.]],
#
#         [[0., 0., 1., 0.]],
#
#         [[0., 0., 0., 1.]]])
labels = torch.LongTensor(y_data).view(-1,1)
# print('labels:',labels)
# labels: tensor([[3],
#         [1],
#         [2],
#         [3],
#         [2]])

Full code: mapping hello -> ohlol with RNN Cell

import torch
input_size = 4
hidden_size = 4
batch_size = 1

idx2char = ['e','h','l','o']
x_data = [1, 0, 2, 2, 3]
y_data = [3, 1, 2, 3, 2]

one_hot_lookup = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

x_one_hot = [one_hot_lookup[x] for x in x_data]   # (seq_len,input)
# print('x_one_hot:',x_one_hot) [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
inputs = torch.Tensor(x_one_hot).view(-1,batch_size,input_size)
# print('inputs:',inputs)
# inputs: tensor([[[0., 1., 0., 0.]],
#
#         [[1., 0., 0., 0.]],
#
#         [[0., 0., 1., 0.]],
#
#         [[0., 0., 1., 0.]],
#
#         [[0., 0., 0., 1.]]])
labels = torch.LongTensor(y_data).view(-1,1)
# print('labels:',labels.shape)   # labels: torch.Size([5, 1])
# labels: tensor([[3],
#         [1],
#         [2],
#         [3],
#         [2]])

class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size):
        super(Model, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.batch_size = batch_size
        self.RNNCell = torch.nn.RNNCell(input_size = self.input_size,
                                        hidden_size = self.hidden_size)
    def forward(self, input, hidden):
        hidden = self.RNNCell(input,hidden)
        return hidden

    def init_hidden(self):    # initialize h0 to zeros
        return torch.zeros(self.batch_size,self.hidden_size)

net = Model(input_size, hidden_size, batch_size)

criterion = torch.nn.CrossEntropyLoss()   # applies softmax internally, so the model outputs raw scores
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

for epoch in range(15):
    loss = 0
    optimizer.zero_grad()
    hidden = net.init_hidden()
    print('Predicted string: ',end='')
    # inputs: (seq_len, batch_size, input_size); each input: (batch_size, input_size)
    # labels: (seq_len, 1); each label: (1,)
    for input, label in zip(inputs, labels):
        hidden = net(input,hidden)   # hidden: torch.Size([1, 4])
        # Accumulate with += so that every time step stays in the computation graph;
        # loss = criterion(hidden, label) would keep only a single step's loss.
        loss += criterion(hidden,label)
        _, idx = hidden.max(dim=1)  # index of the largest score, i.e. an index into ['e','h','l','o']
        print(idx2char[idx.item()], end='')
    loss.backward()
    optimizer.step()
    print(',Epoch [%d/15] loss =  %0.4f' % (epoch+1, loss.item()))

Output:
labels: torch.Size([5, 1])
Predicted string: hhhee,Epoch [1/15] loss =  6.5936
Predicted string: ohlol,Epoch [2/15] loss =  5.1479
Predicted string: ohlol,Epoch [3/15] loss =  4.1417
Predicted string: ohlol,Epoch [4/15] loss =  3.5055
Predicted string: ohlol,Epoch [5/15] loss =  3.0920
Predicted string: ohlol,Epoch [6/15] loss =  2.8072
Predicted string: ohlol,Epoch [7/15] loss =  2.5990
Predicted string: ohlol,Epoch [8/15] loss =  2.4372
Predicted string: ohlol,Epoch [9/15] loss =  2.3106
Predicted string: ohlol,Epoch [10/15] loss =  2.2115
Predicted string: ohlol,Epoch [11/15] loss =  2.1293
Predicted string: ohlol,Epoch [12/15] loss =  2.0575
Predicted string: ohlol,Epoch [13/15] loss =  1.9963
Predicted string: ohlol,Epoch [14/15] loss =  1.9460
Predicted string: ohlol,Epoch [15/15] loss =  1.9059

Full code: mapping hello -> ohlol with RNN

import torch
input_size = 4
hidden_size = 4
batch_size = 1
num_layers = 1
seq_len = 5

idx2char = ['e','h','l','o']
x_data = [1, 0, 2, 2, 3]
y_data = [3, 1, 2, 3, 2]

one_hot_lookup = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

x_one_hot = [one_hot_lookup[x] for x in x_data]
inputs = torch.Tensor(x_one_hot).view(seq_len,batch_size,input_size)
# print('inputs:',inputs)
# inputs: tensor([[[0., 1., 0., 0.]],
#
#         [[1., 0., 0., 0.]],
#
#         [[0., 0., 1., 0.]],
#
#         [[0., 0., 1., 0.]],
#
#         [[0., 0., 0., 1.]]])
labels = torch.LongTensor(y_data)   # shape (seq_len*batch_size,): one class index per time step
print('labels.shape:',labels.shape)
print('labels',labels)

class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(Model, self).__init__()
        self.num_layers = num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.batch_size = batch_size
        self.RNN = torch.nn.RNN(input_size=self.input_size,
                                        hidden_size=self.hidden_size,num_layers=self.num_layers)
    def forward(self, input):
        hidden = torch.zeros(
            self.num_layers,
            self.batch_size,
            self.hidden_size
        )
        out, _ = self.RNN(input,hidden)
        return out.view(-1,self.hidden_size)  # (seq_size*batch_size,hidden_size)


net = Model(input_size, hidden_size, batch_size)

criterion = torch.nn.CrossEntropyLoss()   # applies softmax internally
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

for epoch in range(15):

    optimizer.zero_grad()
    outputs = net(inputs)   # inputs: (seq_len, batch_size, input_size) -> outputs: (seq_len*batch_size, hidden_size)
    print('outputs:',outputs)
    loss = criterion(outputs,labels)   # labels: (seq_len*batch_size,)
    loss.backward()
    optimizer.step()
    _, idx = outputs.max(dim=1)  # index of the largest score at each time step
    print('outputs.max(dim=1)',outputs.max(dim=1))
    idx = idx.data.numpy()
    print('idx:',idx)
    print('Predicted:',''.join([idx2char[x] for x in idx]),end='')
    print(',Epoch [%d/15] loss =  %0.4f' % (epoch+1, loss.item()))

Output:
labels.shape: torch.Size([5])
labels tensor([3, 1, 2, 3, 2])
outputs: tensor([[-0.4914, -0.5676,  0.3506,  0.7260],
        [-0.6970, -0.7721,  0.5879, -0.0892],
        [-0.5054, -0.6028,  0.2994,  0.4381],
        [-0.4411, -0.6743,  0.3136,  0.5604],
        [-0.6518, -0.5096,  0.4113,  0.2888]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.7260, 0.5879, 0.4381, 0.5604, 0.4113], grad_fn=<MaxBackward0>),
indices=tensor([3, 2, 3, 3, 2]))
idx: [3 2 3 3 2]
Predicted: olool,Epoch [1/15] loss =  1.1690
outputs: tensor([[-0.6847, -0.3311,  0.4351,  0.8397],
        [-0.9187, -0.4628,  0.8491, -0.0610],
        [-0.8763, -0.1816,  0.8216,  0.4632],
        [-0.8988, -0.3068,  0.8900,  0.5991],
        [-0.9491, -0.2364,  0.9254,  0.1066]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.8397, 0.8491, 0.8216, 0.8900, 0.9254], grad_fn=<MaxBackward0>),
indices=tensor([3, 2, 2, 2, 2]))
idx: [3 2 2 2 2]
Predicted: ollll,Epoch [2/15] loss =  1.0504
outputs: tensor([[-0.7974, -0.2362,  0.4075,  0.8494],
        [-0.9674, -0.0028,  0.8895, -0.3728],
        [-0.9541,  0.1519,  0.8939,  0.4081],
        [-0.9655,  0.0087,  0.9500,  0.5349],
        [-0.9834,  0.0539,  0.9663, -0.2075]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.8494, 0.8895, 0.8939, 0.9500, 0.9663], grad_fn=<MaxBackward0>),
indices=tensor([3, 2, 2, 2, 2]))
idx: [3 2 2 2 2]
Predicted: ollll,Epoch [3/15] loss =  0.9826
outputs: tensor([[-0.8670, -0.1775,  0.3186,  0.8979],
        [-0.9834,  0.4190,  0.8858, -0.3463],
        [-0.9782,  0.2251,  0.9323,  0.6671],
        [-0.9844,  0.1764,  0.9735,  0.6935],
        [-0.9929,  0.2073,  0.9807, -0.1195]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.8979, 0.8858, 0.9323, 0.9735, 0.9807], grad_fn=<MaxBackward0>),
indices=tensor([3, 2, 2, 2, 2]))
idx: [3 2 2 2 2]
Predicted: ollll,Epoch [4/15] loss =  0.9192
outputs: tensor([[-0.9103, -0.2122,  0.1858,  0.9287],
        [-0.9894,  0.6635,  0.8448, -0.3441],
        [-0.9863,  0.1154,  0.9428,  0.8075],
        [-0.9912,  0.2510,  0.9800,  0.7649],
        [-0.9962,  0.2230,  0.9848, -0.0682]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9287, 0.8448, 0.9428, 0.9800, 0.9848], grad_fn=<MaxBackward0>),
indices=tensor([3, 2, 2, 2, 2]))
idx: [3 2 2 2 2]
Predicted: ollll,Epoch [5/15] loss =  0.8708
outputs: tensor([[-0.9374, -0.3276,  0.0160,  0.9436],
        [-0.9920,  0.7855,  0.7425, -0.4376],
        [-0.9888, -0.1326,  0.9356,  0.8573],
        [-0.9942,  0.2462,  0.9809,  0.7434],
        [-0.9977,  0.0970,  0.9850, -0.1693]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9436, 0.7855, 0.9356, 0.9809, 0.9850], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 2, 2, 2]))
idx: [3 1 2 2 2]
Predicted: ohlll,Epoch [6/15] loss =  0.8216
outputs: tensor([[-0.9546, -0.4712, -0.1802,  0.9536],
        [-0.9932,  0.8538,  0.5203, -0.5241],
        [-0.9882, -0.3814,  0.9068,  0.8885],
        [-0.9959,  0.1872,  0.9789,  0.7184],
        [-0.9984, -0.1151,  0.9835, -0.3094]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9536, 0.8538, 0.9068, 0.9789, 0.9835], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 2, 2, 2]))
idx: [3 1 2 2 2]
Predicted: ohlll,Epoch [7/15] loss =  0.7695
outputs: tensor([[-0.9658, -0.6085, -0.3808,  0.9628],
        [-0.9938,  0.8942,  0.1230, -0.5613],
        [-0.9837, -0.5155,  0.8168,  0.9232],
        [-0.9968,  0.0580,  0.9713,  0.7732],
        [-0.9989, -0.3260,  0.9811, -0.4025]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9628, 0.8942, 0.9232, 0.9713, 0.9811], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 3, 2, 2]))
idx: [3 1 3 2 2]
Predicted: oholl,Epoch [8/15] loss =  0.7178
outputs: tensor([[-0.9735, -0.7232, -0.5464,  0.9701],
        [-0.9944,  0.9159, -0.3144, -0.5824],
        [-0.9749, -0.5678,  0.6341,  0.9446],
        [-0.9972, -0.1100,  0.9520,  0.8546],
        [-0.9992, -0.5120,  0.9777, -0.4605]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9701, 0.9159, 0.9446, 0.9520, 0.9777], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 3, 2, 2]))
idx: [3 1 3 2 2]
Predicted: oholl,Epoch [9/15] loss =  0.6879
outputs: tensor([[-0.9790, -0.8102, -0.6357,  0.9743],
        [-0.9953,  0.9232, -0.5247, -0.6337],
        [-0.9689, -0.6435,  0.5774,  0.9440],
        [-0.9977, -0.3335,  0.9436,  0.8861],
        [-0.9994, -0.6626,  0.9760, -0.5683]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9743, 0.9232, 0.9440, 0.9436, 0.9760], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 3, 2, 2]))
idx: [3 1 3 2 2]
Predicted: oholl,Epoch [10/15] loss =  0.6628
outputs: tensor([[-0.9831, -0.8715, -0.6769,  0.9762],
        [-0.9962,  0.9216, -0.5938, -0.7109],
        [-0.9665, -0.7309,  0.6727,  0.9275],
        [-0.9985, -0.5930,  0.9549,  0.8892],
        [-0.9996, -0.7745,  0.9758, -0.6960]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9762, 0.9216, 0.9275, 0.9549, 0.9758], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 3, 2, 2]))
idx: [3 1 3 2 2]
Predicted: oholl,Epoch [11/15] loss =  0.6299
outputs: tensor([[-0.9862, -0.9124, -0.6969,  0.9765],
        [-0.9970,  0.9158, -0.6110, -0.7885],
        [-0.9655, -0.8035,  0.7984,  0.8932],
        [-0.9990, -0.7852,  0.9671,  0.8858],
        [-0.9997, -0.8599,  0.9756, -0.7793]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9765, 0.9158, 0.8932, 0.9671, 0.9756], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 3, 2, 2]))
idx: [3 1 3 2 2]
Predicted: oholl,Epoch [12/15] loss =  0.6009
outputs: tensor([[-0.9885, -0.9389, -0.7099,  0.9760],
        [-0.9976,  0.9095, -0.6146, -0.8505],
        [-0.9651, -0.8554,  0.8884,  0.8322],
        [-0.9993, -0.8886,  0.9745,  0.8837],
        [-0.9998, -0.9177,  0.9749, -0.8228]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9760, 0.9095, 0.8884, 0.9745, 0.9749], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 2, 2, 2]))
idx: [3 1 2 2 2]
Predicted: ohlll,Epoch [13/15] loss =  0.5797
outputs: tensor([[-0.9903, -0.9562, -0.7232,  0.9747],
        [-0.9980,  0.9053, -0.6240, -0.8955],
        [-0.9645, -0.8899,  0.9388,  0.7242],
        [-0.9994, -0.9388,  0.9790,  0.8783],
        [-0.9998, -0.9516,  0.9737, -0.8467]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9747, 0.9053, 0.9388, 0.9790, 0.9737], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 2, 2, 2]))
idx: [3 1 2 2 2]
Predicted: ohlll,Epoch [14/15] loss =  0.5627
outputs: tensor([[-0.9916, -0.9675, -0.7397,  0.9724],
        [-0.9983,  0.9052, -0.6472, -0.9274],
        [-0.9631, -0.9118,  0.9649,  0.5346],
        [-0.9994, -0.9638,  0.9829,  0.8552],
        [-0.9998, -0.9704,  0.9720, -0.8646]], grad_fn=<ViewBackward0>)
outputs.max(dim=1) torch.return_types.max(
values=tensor([0.9724, 0.9052, 0.9649, 0.9829, 0.9720], grad_fn=<MaxBackward0>),
indices=tensor([3, 1, 2, 2, 2]))
idx: [3 1 2 2 2]
Predicted: ohlll,Epoch [15/15] loss =  0.5464

 

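6. Word Embedding

Drawbacks of one-hot vectors

(1) The dimension is too high.

(2) The matrix is too sparse.

(3) The encoding is hard-coded rather than learned.

Solving the drawbacks of one-hot vectors with Embedding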

Embedding maps a high-dimensional sparse sample into a dense, low-dimensional space. This is what is commonly called dimensionality reduction.
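A minimal pure-Python sketch of what an embedding lookup does. The table here is filled with random numbers purely for illustration (in a real model its values are learned during training), and the names `table`, `embed`, and `one_hot_matmul` are hypothetical.

```python
import random

random.seed(0)
num_embeddings, embedding_size = 4, 3   # 4 characters, 3-dimensional dense vectors (illustrative sizes)

# The embedding table: one dense row per character index.
table = [[random.uniform(-1, 1) for _ in range(embedding_size)]
         for _ in range(num_embeddings)]

def embed(idx):
    """Embedding lookup: pick the row that belongs to this index."""
    return table[idx]

def one_hot_matmul(idx):
    """Same result via one-hot vector times table, the product that the lookup avoids computing."""
    one_hot = [1.0 if i == idx else 0.0 for i in range(num_embeddings)]
    return [sum(one_hot[i] * table[i][j] for i in range(num_embeddings))
            for j in range(embedding_size)]

x_data = [1, 0, 2, 2, 3]                 # "hello" as indices
embedded = [embed(i) for i in x_data]    # shape: (seq_len, embedding_size)

print(embedded)
```

The input to the embedding is therefore a list of integer indices rather than one-hot vectors, which is why the code below passes a `LongTensor` of indices to the model.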

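The output passed to the loss must have as many features as there are classes. When hidden_size differs from the number of classes, add a linear layer after the RNN to map hidden_size to num_class.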

Code implementation

import torch

num_class = 4
input_size = 4
hidden_size = 4
embedding_size = 10
num_layers = 2
batch_size = 1
seq_len = 5

idx2char = ['e', 'h', 'l', 'o']
x_data = [[1, 0, 2, 2, 3]]  # (batch, seq_len)
y_data = [3, 1, 2, 3, 2]  # (batch * seq_len)
inputs = torch.LongTensor(x_data)
print('inputs',inputs)
labels = torch.LongTensor(y_data)

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        self.rnn = torch.nn.RNN(input_size=embedding_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers,
                                batch_first=True
                                )
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self,x):
        print('x.size(0)',x.size(0))
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        print('hidden.shape',hidden.shape)
        print('hidden',hidden)
        print('x_0:',x)
        x = self.emb(x)  # (batch, seqLen, embeddingSize)
        print('x_1:',x)
        x, _ = self.rnn(x, hidden)
        print('x_2',x)
        x = self.fc(x)
        print('x_3',x)  # (1,5,4)
        print('x.view:',x.view(-1,num_class))  # (5,4)
        return x.view(-1, num_class)

net = Model()
criterion = torch.nn.CrossEntropyLoss()   # applies softmax internally
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)
    print('outputs:', outputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    _, idx = outputs.max(dim=1)  # index of the largest score at each time step
    idx = idx.data.numpy()
    print('idx:', idx)
    print('Predicted:', ''.join([idx2char[x] for x in idx]), end='')
    print(',Epoch [%d/15] loss =  %0.4f' % (epoch + 1, loss.item()))

Output:
inputs tensor([[1, 0, 2, 2, 3]])
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.0979,  0.9104,  1.7472, -0.0186,  0.4478, -1.3402,  0.1717,
           1.5238, -0.3157,  1.8523],
         [-0.5575,  1.0836,  1.4183,  1.4768, -0.3469, -0.3848, -1.2278,
          -0.4501,  0.5026,  1.3427],
         [ 0.9381,  1.2550, -1.2158, -0.9144,  0.3298, -1.1841,  0.6781,
           0.1703,  0.3936, -0.4299],
         [ 0.9381,  1.2550, -1.2158, -0.9144,  0.3298, -1.1841,  0.6781,
           0.1703,  0.3936, -0.4299],
         [ 0.1590, -0.8760,  0.2953, -1.7547, -0.7498,  0.3259, -0.0867,
           0.8370, -1.5976,  0.2770]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[ 0.3772,  0.4659, -0.4824,  0.3167],
         [ 0.4253,  0.6224, -0.5328,  0.8509],
         [ 0.1511, -0.2543, -0.4202,  0.3263],
         [ 0.2619, -0.2187, -0.2259,  0.5462],
         [ 0.6239,  0.4253, -0.1566, -0.1336]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[ 0.5114, -0.2779,  0.4754, -0.7368],
         [ 0.7004, -0.0807,  0.3180, -0.9904],
         [ 0.2221, -0.2761,  0.2885, -0.6399],
         [ 0.3802, -0.1152,  0.1688, -0.7735],
         [ 0.5466, -0.3278,  0.6095, -0.6360]]], grad_fn=<ViewBackward0>)
x.view: tensor([[ 0.5114, -0.2779,  0.4754, -0.7368],
        [ 0.7004, -0.0807,  0.3180, -0.9904],
        [ 0.2221, -0.2761,  0.2885, -0.6399],
        [ 0.3802, -0.1152,  0.1688, -0.7735],
        [ 0.5466, -0.3278,  0.6095, -0.6360]], grad_fn=<ViewBackward0>)
outputs: tensor([[ 0.5114, -0.2779,  0.4754, -0.7368],
        [ 0.7004, -0.0807,  0.3180, -0.9904],
        [ 0.2221, -0.2761,  0.2885, -0.6399],
        [ 0.3802, -0.1152,  0.1688, -0.7735],
        [ 0.5466, -0.3278,  0.6095, -0.6360]], grad_fn=<ViewBackward0>)
idx: [0 0 2 0 2]
Predicted: eelel,Epoch [1/15] loss =  1.6110
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.1979,  1.0104,  1.6472,  0.0814,  0.5478, -1.2402,  0.2717,
           1.4238, -0.4157,  1.7523],
         [-0.6575,  0.9836,  1.3183,  1.5768, -0.4469, -0.4848, -1.1278,
          -0.5501,  0.6026,  1.2427],
         [ 1.0381,  1.1550, -1.1158, -1.0144,  0.4298, -1.0841,  0.5781,
           0.2703,  0.2936, -0.5299],
         [ 1.0381,  1.1550, -1.1158, -1.0144,  0.4298, -1.0841,  0.5781,
           0.2703,  0.2936, -0.5299],
         [ 0.2590, -0.7760,  0.1953, -1.8547, -0.6498,  0.4259, -0.1867,
           0.9370, -1.4976,  0.3770]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.4243, -0.1475, -0.7790, -0.4171],
         [-0.5545,  0.3204, -0.5777,  0.8650],
         [-0.4827, -0.4827, -0.7665, -0.3961],
         [-0.7910, -0.2632, -0.7781, -0.0279],
         [-0.4898,  0.4215, -0.7939, -0.0173]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-2.7128e-01, -6.4384e-01,  5.6473e-01, -6.7270e-04],
         [ 1.0902e-01,  7.2345e-02, -1.8547e-01, -3.3819e-01],
         [-3.5608e-01, -6.7826e-01,  5.2131e-01, -2.3273e-02],
         [-2.9731e-01, -5.2175e-01,  2.0260e-01, -1.3013e-02],
         [-8.2588e-02, -3.9467e-01,  3.4536e-01, -4.8831e-02]]],
       grad_fn=<ViewBackward0>)
x.view: tensor([[-2.7128e-01, -6.4384e-01,  5.6473e-01, -6.7270e-04],
        [ 1.0902e-01,  7.2345e-02, -1.8547e-01, -3.3819e-01],
        [-3.5608e-01, -6.7826e-01,  5.2131e-01, -2.3273e-02],
        [-2.9731e-01, -5.2175e-01,  2.0260e-01, -1.3013e-02],
        [-8.2588e-02, -3.9467e-01,  3.4536e-01, -4.8831e-02]],
       grad_fn=<ViewBackward0>)
outputs: tensor([[-2.7128e-01, -6.4384e-01,  5.6473e-01, -6.7270e-04],
        [ 1.0902e-01,  7.2345e-02, -1.8547e-01, -3.3819e-01],
        [-3.5608e-01, -6.7826e-01,  5.2131e-01, -2.3273e-02],
        [-2.9731e-01, -5.2175e-01,  2.0260e-01, -1.3013e-02],
        [-8.2588e-02, -3.9467e-01,  3.4536e-01, -4.8831e-02]],
       grad_fn=<ViewBackward0>)
idx: [2 0 2 2 2]
Predicted: lelll,Epoch [2/15] loss =  1.1571
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.2837,  1.1049,  1.5748,  0.1688,  0.6327, -1.2290,  0.3694,
           1.3295, -0.5141,  1.6724],
         [-0.7490,  0.8857,  1.2296,  1.6622, -0.5267, -0.5428, -1.0679,
          -0.6400,  0.6822,  1.2036],
         [ 1.1272,  1.0887, -1.0406, -1.0973,  0.5064, -1.0112,  0.5653,
           0.3511,  0.2163, -0.5147],
         [ 1.1272,  1.0887, -1.0406, -1.0973,  0.5064, -1.0112,  0.5653,
           0.3511,  0.2163, -0.5147],
         [ 0.3371, -0.6994,  0.1869, -1.9517, -0.5497,  0.4916, -0.2523,
           1.0365, -1.5213,  0.3916]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.7139, -0.6269, -0.8682, -0.7788],
         [-0.7125,  0.2163, -0.4858,  0.9187],
         [-0.5879, -0.7182, -0.9006, -0.8135],
         [-0.9251, -0.5227, -0.8896, -0.4147],
         [-0.8109,  0.0873, -0.9328, -0.3958]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-0.6093, -0.9582,  0.8367,  0.3744],
         [-0.1254,  0.3039, -0.3237, -0.1519],
         [-0.6249, -0.9832,  0.9368,  0.3376],
         [-0.6056, -0.7613,  0.5163,  0.3487],
         [-0.4986, -0.6101,  0.5658,  0.3484]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-0.6093, -0.9582,  0.8367,  0.3744],
        [-0.1254,  0.3039, -0.3237, -0.1519],
        [-0.6249, -0.9832,  0.9368,  0.3376],
        [-0.6056, -0.7613,  0.5163,  0.3487],
        [-0.4986, -0.6101,  0.5658,  0.3484]], grad_fn=<ViewBackward0>)
outputs: tensor([[-0.6093, -0.9582,  0.8367,  0.3744],
        [-0.1254,  0.3039, -0.3237, -0.1519],
        [-0.6249, -0.9832,  0.9368,  0.3376],
        [-0.6056, -0.7613,  0.5163,  0.3487],
        [-0.4986, -0.6101,  0.5658,  0.3484]], grad_fn=<ViewBackward0>)
idx: [2 1 2 2 2]
Predicted: lhlll,Epoch [3/15] loss =  0.9631
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.3574,  1.1832,  1.5158,  0.1763,  0.7025, -1.2111,  0.4509,
           1.3209, -0.6047,  1.6154],
         [-0.8435,  0.7868,  1.1817,  1.7446, -0.6141, -0.5730, -1.0491,
          -0.7232,  0.7654,  1.1728],
         [ 1.1879,  1.0463, -0.9724, -1.1533,  0.5616, -0.9556,  0.5106,
           0.4087,  0.1604, -0.5426],
         [ 1.1879,  1.0463, -0.9724, -1.1533,  0.5616, -0.9556,  0.5106,
           0.4087,  0.1604, -0.5426],
         [ 0.4226, -0.6146,  0.1662, -2.0365, -0.4497,  0.5195, -0.2856,
           1.1228, -1.5541,  0.3787]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.8227, -0.8103, -0.9144, -0.9030],
         [-0.7148,  0.2632, -0.2010,  0.9611],
         [-0.5948, -0.8889, -0.9465, -0.9445],
         [-0.9611, -0.7075, -0.9076, -0.6075],
         [-0.9114, -0.4499, -0.9663, -0.6871]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-0.8294, -1.1220,  1.0145,  0.7481],
         [-0.1595,  0.6444, -0.5075, -0.1069],
         [-0.8183, -1.1517,  1.1933,  0.6761],
         [-0.8278, -0.9028,  0.7260,  0.7028],
         [-0.8176, -0.8830,  0.8363,  0.7229]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-0.8294, -1.1220,  1.0145,  0.7481],
        [-0.1595,  0.6444, -0.5075, -0.1069],
        [-0.8183, -1.1517,  1.1933,  0.6761],
        [-0.8278, -0.9028,  0.7260,  0.7028],
        [-0.8176, -0.8830,  0.8363,  0.7229]], grad_fn=<ViewBackward0>)
outputs: tensor([[-0.8294, -1.1220,  1.0145,  0.7481],
        [-0.1595,  0.6444, -0.5075, -0.1069],
        [-0.8183, -1.1517,  1.1933,  0.6761],
        [-0.8278, -0.9028,  0.7260,  0.7028],
        [-0.8176, -0.8830,  0.8363,  0.7229]], grad_fn=<ViewBackward0>)
idx: [2 1 2 2 2]
Predicted: lhlll,Epoch [4/15] loss =  0.8192
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.4267,  1.2433,  1.4607,  0.1329,  0.7618, -1.1703,  0.5067,
           1.3648, -0.6942,  1.5761],
         [-0.9382,  0.6893,  1.1630,  1.8207, -0.7039, -0.5756, -1.0618,
          -0.7933,  0.8278,  1.1424],
         [ 1.2312,  1.0134, -0.9101, -1.1945,  0.6050, -0.9109,  0.4517,
           0.4526,  0.1170, -0.5812],
         [ 1.2312,  1.0134, -0.9101, -1.1945,  0.6050, -0.9109,  0.4517,
           0.4526,  0.1170, -0.5812],
         [ 0.4976, -0.5416,  0.1449, -2.1153, -0.3613,  0.5406, -0.3104,
           1.2017, -1.5887,  0.3667]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.8584, -0.8936, -0.9523, -0.9560],
         [-0.4321,  0.7502,  0.1346,  0.9841],
         [-0.5137, -0.9572, -0.9765, -0.9855],
         [-0.9732, -0.7780, -0.9136, -0.7397],
         [-0.9417, -0.7083, -0.9768, -0.8273]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.0491, -1.2480,  1.1723,  1.1317],
         [ 0.0416,  1.2057, -0.5718, -0.3294],
         [-0.9937, -1.2801,  1.4419,  0.9943],
         [-1.0338, -1.0294,  0.9075,  1.0789],
         [-1.0596, -1.0816,  1.0316,  1.1138]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.0491, -1.2480,  1.1723,  1.1317],
        [ 0.0416,  1.2057, -0.5718, -0.3294],
        [-0.9937, -1.2801,  1.4419,  0.9943],
        [-1.0338, -1.0294,  0.9075,  1.0789],
        [-1.0596, -1.0816,  1.0316,  1.1138]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.0491, -1.2480,  1.1723,  1.1317],
        [ 0.0416,  1.2057, -0.5718, -0.3294],
        [-0.9937, -1.2801,  1.4419,  0.9943],
        [-1.0338, -1.0294,  0.9075,  1.0789],
        [-1.0596, -1.0816,  1.0316,  1.1138]], grad_fn=<ViewBackward0>)
idx: [2 1 2 3 3]
Predicted: lhloo,Epoch [5/15] loss =  0.7005
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.4901,  1.2916,  1.4100,  0.0776,  0.8134, -1.1251,  0.5502,
           1.4207, -0.7790,  1.5456],
         [-1.0348,  0.5908,  1.1532,  1.8903, -0.7921, -0.5631, -1.0896,
          -0.8520,  0.8735,  1.1142],
         [ 1.2653,  0.9852, -0.8546, -1.2276,  0.6409, -0.8731,  0.3970,
           0.4884,  0.0814, -0.6178],
         [ 1.2653,  0.9852, -0.8546, -1.2276,  0.6409, -0.8731,  0.3970,
           0.4884,  0.0814, -0.6178],
         [ 0.5434, -0.4947,  0.1270, -2.1831, -0.2947,  0.5668, -0.3411,
           1.2738, -1.6243,  0.3547]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.8337, -0.9457, -0.9761, -0.9795],
         [-0.1997,  0.9313,  0.3498,  0.9923],
         [-0.3049, -0.9841, -0.9885, -0.9954],
         [-0.9743, -0.8286, -0.9331, -0.8420],
         [-0.9416, -0.8127, -0.9829, -0.8923]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.2439, -1.3849,  1.3886,  1.4863],
         [ 0.1852,  1.6604, -0.6304, -0.6072],
         [-1.1077, -1.4186,  1.8161,  1.2222],
         [-1.2382, -1.1933,  1.1352,  1.4628],
         [-1.2636, -1.2427,  1.2345,  1.4863]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.2439, -1.3849,  1.3886,  1.4863],
        [ 0.1852,  1.6604, -0.6304, -0.6072],
        [-1.1077, -1.4186,  1.8161,  1.2222],
        [-1.2382, -1.1933,  1.1352,  1.4628],
        [-1.2636, -1.2427,  1.2345,  1.4863]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.2439, -1.3849,  1.3886,  1.4863],
        [ 0.1852,  1.6604, -0.6304, -0.6072],
        [-1.1077, -1.4186,  1.8161,  1.2222],
        [-1.2382, -1.1933,  1.1352,  1.4628],
        [-1.2636, -1.2427,  1.2345,  1.4863]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 3]
Predicted: ohloo,Epoch [6/15] loss =  0.6164
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.5421,  1.3355,  1.3689,  0.0425,  0.8572, -1.0925,  0.5906,
           1.4555, -0.8462,  1.5182],
         [-1.1188,  0.5053,  1.1448,  1.9505, -0.8686, -0.5520, -1.1142,
          -0.9025,  0.9126,  1.0897],
         [ 1.2961,  0.9593, -0.8079, -1.2573,  0.6717, -0.8397,  0.3473,
           0.5202,  0.0509, -0.6477],
         [ 1.2961,  0.9593, -0.8079, -1.2573,  0.6717, -0.8397,  0.3473,
           0.5202,  0.0509, -0.6477],
         [ 0.5538, -0.4760,  0.1132, -2.2306, -0.2548,  0.6023, -0.3846,
           1.3373, -1.6594,  0.3401]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.7471, -0.9725, -0.9854, -0.9893],
         [-0.3517,  0.9665,  0.4550,  0.9949],
         [ 0.0556, -0.9936, -0.9926, -0.9979],
         [-0.9699, -0.8757, -0.9572, -0.9119],
         [-0.9166, -0.8488, -0.9876, -0.9242]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.3869, -1.5366,  1.6755,  1.7767],
         [ 0.1424,  2.0382, -0.9614, -0.6583],
         [-1.1275, -1.5890,  2.3763,  1.3071],
         [-1.4277, -1.3849,  1.3876,  1.8341],
         [-1.4314, -1.3925,  1.4655,  1.8173]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.3869, -1.5366,  1.6755,  1.7767],
        [ 0.1424,  2.0382, -0.9614, -0.6583],
        [-1.1275, -1.5890,  2.3763,  1.3071],
        [-1.4277, -1.3849,  1.3876,  1.8341],
        [-1.4314, -1.3925,  1.4655,  1.8173]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.3869, -1.5366,  1.6755,  1.7767],
        [ 0.1424,  2.0382, -0.9614, -0.6583],
        [-1.1275, -1.5890,  2.3763,  1.3071],
        [-1.4277, -1.3849,  1.3876,  1.8341],
        [-1.4314, -1.3925,  1.4655,  1.8173]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 3]
Predicted: ohloo,Epoch [7/15] loss =  0.5447
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.5831,  1.3767,  1.3371,  0.0296,  0.8942, -1.0715,  0.6280,
           1.4669, -0.8946,  1.4927],
         [-1.1921,  0.4308,  1.1374,  2.0029, -0.9352, -0.5423, -1.1355,
          -0.9466,  0.9465,  1.0684],
         [ 1.3378,  0.9308, -0.7848, -1.2930,  0.7004, -0.8074,  0.3021,
           0.5565,  0.0210, -0.6600],
         [ 1.3378,  0.9308, -0.7848, -1.2930,  0.7004, -0.8074,  0.3021,
           0.5565,  0.0210, -0.6600],
         [ 0.5230, -0.4931,  0.1048, -2.2382, -0.2558,  0.6543, -0.4468,
           1.3878, -1.6942,  0.3178]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.5910, -0.9851, -0.9898, -0.9939],
         [-0.5726,  0.9818,  0.4660,  0.9963],
         [ 0.4976, -0.9972, -0.9948, -0.9989],
         [-0.9616, -0.9171, -0.9793, -0.9567],
         [-0.8080, -0.7351, -0.9928, -0.9290]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.4724, -1.6956,  2.0267,  1.9924],
         [-0.0162,  2.3976, -1.3327, -0.6035],
         [-1.0645, -1.7939,  3.0675,  1.2728],
         [-1.5973, -1.5757,  1.6237,  2.1922],
         [-1.5276, -1.4356,  1.7385,  2.0237]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.4724, -1.6956,  2.0267,  1.9924],
        [-0.0162,  2.3976, -1.3327, -0.6035],
        [-1.0645, -1.7939,  3.0675,  1.2728],
        [-1.5973, -1.5757,  1.6237,  2.1922],
        [-1.5276, -1.4356,  1.7385,  2.0237]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.4724, -1.6956,  2.0267,  1.9924],
        [-0.0162,  2.3976, -1.3327, -0.6035],
        [-1.0645, -1.7939,  3.0675,  1.2728],
        [-1.5973, -1.5757,  1.6237,  2.1922],
        [-1.5276, -1.4356,  1.7385,  2.0237]], grad_fn=<ViewBackward0>)
idx: [2 1 2 3 3]
Predicted: lhloo,Epoch [8/15] loss =  0.4840
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6121,  1.4172,  1.3161,  0.0415,  0.9245, -1.0620,  0.6623,
           1.4528, -0.9211,  1.4685],
         [-1.2567,  0.3651,  1.1308,  2.0491, -0.9940, -0.5338, -1.1543,
          -0.9854,  0.9764,  1.0496],
         [ 1.3987,  0.8823, -0.8199, -1.3511,  0.7364, -0.7660,  0.2629,
           0.6143, -0.0220, -0.6298],
         [ 1.3987,  0.8823, -0.8199, -1.3511,  0.7364, -0.7660,  0.2629,
           0.6143, -0.0220, -0.6298],
         [ 0.4673, -0.5445,  0.1095, -2.1954, -0.3020,  0.7193, -0.5116,
           1.3979, -1.7339,  0.2716]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.4527, -0.9909, -0.9924, -0.9963],
         [-0.6973,  0.9901,  0.4570,  0.9972],
         [ 0.7326, -0.9986, -0.9961, -0.9995],
         [-0.9495, -0.9442, -0.9915, -0.9789],
         [-0.4742, -0.1649, -0.9969, -0.9163]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.5354, -1.8545,  2.2894,  2.2032],
         [-0.1696,  2.7432, -1.5921, -0.5847],
         [-1.0375, -2.0079,  3.5300,  1.3327],
         [-1.7407, -1.7375,  1.7487,  2.5414],
         [-1.4432, -1.1063,  2.1209,  1.8419]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.5354, -1.8545,  2.2894,  2.2032],
        [-0.1696,  2.7432, -1.5921, -0.5847],
        [-1.0375, -2.0079,  3.5300,  1.3327],
        [-1.7407, -1.7375,  1.7487,  2.5414],
        [-1.4432, -1.1063,  2.1209,  1.8419]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.5354, -1.8545,  2.2894,  2.2032],
        [-0.1696,  2.7432, -1.5921, -0.5847],
        [-1.0375, -2.0079,  3.5300,  1.3327],
        [-1.7407, -1.7375,  1.7487,  2.5414],
        [-1.4432, -1.1063,  2.1209,  1.8419]], grad_fn=<ViewBackward0>)
idx: [2 1 2 3 2]
Predicted: lhlol,Epoch [9/15] loss =  0.3933
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6304,  1.4563,  1.3052,  0.0712,  0.9488, -1.0597,  0.6923,
           1.4208, -0.9300,  1.4458],
         [-1.3141,  0.3067,  1.1249,  2.0901, -1.0461, -0.5263, -1.1710,
          -1.0199,  1.0030,  1.0329],
         [ 1.4601,  0.8359, -0.8590, -1.4087,  0.7705, -0.7275,  0.2316,
           0.6713, -0.0634, -0.5980],
         [ 1.4601,  0.8359, -0.8590, -1.4087,  0.7705, -0.7275,  0.2316,
           0.6713, -0.0634, -0.5980],
         [ 0.4012, -0.6077,  0.1150, -2.1428, -0.3615,  0.7934, -0.5860,
           1.4010, -1.7799,  0.2185]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.3853, -0.9934, -0.9941, -0.9975],
         [-0.7139,  0.9947,  0.4585,  0.9978],
         [ 0.8025, -0.9992, -0.9971, -0.9997],
         [-0.9425, -0.9481, -0.9946, -0.9854],
         [-0.1859,  0.1561, -0.9982, -0.9351]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.6075, -2.0042,  2.3485,  2.4694],
         [-0.2678,  3.0697, -1.7005, -0.6504],
         [-1.0594, -2.2112,  3.7004,  1.5058],
         [-1.8626, -1.8568,  1.7014,  2.8946],
         [-1.3754, -0.9494,  2.4632,  1.7062]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.6075, -2.0042,  2.3485,  2.4694],
        [-0.2678,  3.0697, -1.7005, -0.6504],
        [-1.0594, -2.2112,  3.7004,  1.5058],
        [-1.8626, -1.8568,  1.7014,  2.8946],
        [-1.3754, -0.9494,  2.4632,  1.7062]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.6075, -2.0042,  2.3485,  2.4694],
        [-0.2678,  3.0697, -1.7005, -0.6504],
        [-1.0594, -2.2112,  3.7004,  1.5058],
        [-1.8626, -1.8568,  1.7014,  2.8946],
        [-1.3754, -0.9494,  2.4632,  1.7062]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 2]
Predicted: ohlol,Epoch [10/15] loss =  0.3060
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6398,  1.4934,  1.3030,  0.1129,  0.9680, -1.0617,  0.7178,
           1.3771, -0.9260,  1.4252],
         [-1.3652,  0.2547,  1.1197,  2.1267, -1.0926, -0.5195, -1.1859,
          -1.0507,  1.0268,  1.0180],
         [ 1.5150,  0.7967, -0.8929, -1.4594,  0.8011, -0.6938,  0.2075,
           0.7214, -0.0999, -0.5697],
         [ 1.5150,  0.7967, -0.8929, -1.4594,  0.8011, -0.6938,  0.2075,
           0.7214, -0.0999, -0.5697],
         [ 0.3403, -0.6667,  0.1149, -2.0970, -0.4171,  0.8623, -0.6555,
           1.4089, -1.8243,  0.1687]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.3940, -0.9944, -0.9953, -0.9983],
         [-0.6586,  0.9972,  0.4641,  0.9982],
         [ 0.7998, -0.9995, -0.9977, -0.9998],
         [-0.9390, -0.9369, -0.9959, -0.9881],
         [ 0.1076,  0.4632, -0.9988, -0.9531]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.7029, -2.1352,  2.2365,  2.8054],
         [-0.3196,  3.3653, -1.6951, -0.7863],
         [-1.1068, -2.3992,  3.7077,  1.7407],
         [-1.9717, -1.9502,  1.5561,  3.2540],
         [-1.2795, -0.7815,  2.8303,  1.4801]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.7029, -2.1352,  2.2365,  2.8054],
        [-0.3196,  3.3653, -1.6951, -0.7863],
        [-1.1068, -2.3992,  3.7077,  1.7407],
        [-1.9717, -1.9502,  1.5561,  3.2540],
        [-1.2795, -0.7815,  2.8303,  1.4801]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.7029, -2.1352,  2.2365,  2.8054],
        [-0.3196,  3.3653, -1.6951, -0.7863],
        [-1.1068, -2.3992,  3.7077,  1.7407],
        [-1.9717, -1.9502,  1.5561,  3.2540],
        [-1.2795, -0.7815,  2.8303,  1.4801]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 2]
Predicted: ohlol,Epoch [11/15] loss =  0.2176
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6420,  1.5285,  1.3081,  0.1630,  0.9830, -1.0660,  0.7389,
           1.3256, -0.9127,  1.4069],
         [-1.4111,  0.2080,  1.1150,  2.1595, -1.1343, -0.5135, -1.1992,
          -1.0783,  1.0480,  1.0047],
         [ 1.5665,  0.7662, -0.9229, -1.5048,  0.8296, -0.6646,  0.1957,
           0.7661, -0.1322, -0.5440],
         [ 1.5665,  0.7662, -0.9229, -1.5048,  0.8296, -0.6646,  0.1957,
           0.7661, -0.1322, -0.5440],
         [ 0.2858, -0.7200,  0.1117, -2.0572, -0.4670,  0.9246, -0.7181,
           1.4189, -1.8650,  0.1223]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.4566, -0.9949, -0.9960, -0.9987],
         [-0.5359,  0.9986,  0.4468,  0.9986],
         [ 0.7477, -0.9997, -0.9982, -0.9999],
         [-0.9363, -0.9097, -0.9968, -0.9897],
         [ 0.3717,  0.7125, -0.9992, -0.9664]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.8195, -2.2432,  2.0060,  3.1966],
         [-0.3505,  3.6022, -1.5531, -0.9777],
         [-1.1767, -2.5662,  3.6074,  2.0221],
         [-2.0694, -2.0173,  1.3663,  3.6031],
         [-1.1839, -0.6435,  3.2147,  1.2127]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.8195, -2.2432,  2.0060,  3.1966],
        [-0.3505,  3.6022, -1.5531, -0.9777],
        [-1.1767, -2.5662,  3.6074,  2.0221],
        [-2.0694, -2.0173,  1.3663,  3.6031],
        [-1.1839, -0.6435,  3.2147,  1.2127]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.8195, -2.2432,  2.0060,  3.1966],
        [-0.3505,  3.6022, -1.5531, -0.9777],
        [-1.1767, -2.5662,  3.6074,  2.0221],
        [-2.0694, -2.0173,  1.3663,  3.6031],
        [-1.1839, -0.6435,  3.2147,  1.2127]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 2]
Predicted: ohlol,Epoch [12/15] loss =  0.1534
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6390,  1.5616,  1.3187,  0.2181,  0.9946, -1.0718,  0.7561,
           1.2696, -0.8933,  1.3908],
         [-1.4523,  0.1661,  1.1108,  2.1890, -1.1718, -0.5081, -1.2112,
          -1.1030,  1.0671,  0.9927],
         [ 1.6168,  0.7464, -0.9501, -1.5458,  0.8570, -0.6401,  0.2013,
           0.8063, -0.1607, -0.5206],
         [ 1.6168,  0.7464, -0.9501, -1.5458,  0.8570, -0.6401,  0.2013,
           0.8063, -0.1607, -0.5206],
         [ 0.2369, -0.7679,  0.1075, -2.0219, -0.5119,  0.9806, -0.7746,
           1.4290, -1.9017,  0.0794]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.5107, -0.9952, -0.9963, -0.9990],
         [-0.4077,  0.9993,  0.3503,  0.9988],
         [ 0.7038, -0.9998, -0.9986, -0.9999],
         [-0.9323, -0.8659, -0.9975, -0.9908],
         [ 0.5859,  0.8539, -0.9996, -0.9759]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.9274, -2.3400,  1.7745,  3.5726],
         [-0.4598,  3.7436, -1.3084, -1.1086],
         [-1.2415, -2.7221,  3.5107,  2.2849],
         [-2.1554, -2.0559,  1.1835,  3.9173],
         [-1.1110, -0.5972,  3.5983,  0.9758]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.9274, -2.3400,  1.7745,  3.5726],
        [-0.4598,  3.7436, -1.3084, -1.1086],
        [-1.2415, -2.7221,  3.5107,  2.2849],
        [-2.1554, -2.0559,  1.1835,  3.9173],
        [-1.1110, -0.5972,  3.5983,  0.9758]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.9274, -2.3400,  1.7745,  3.5726],
        [-0.4598,  3.7436, -1.3084, -1.1086],
        [-1.2415, -2.7221,  3.5107,  2.2849],
        [-2.1554, -2.0559,  1.1835,  3.9173],
        [-1.1110, -0.5972,  3.5983,  0.9758]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 2]
Predicted: ohlol,Epoch [13/15] loss =  0.1226
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6325,  1.5927,  1.3329,  0.2753,  1.0037, -1.0782,  0.7701,
           1.2122, -0.8705,  1.3769],
         [-1.4895,  0.1282,  1.1070,  2.2156, -1.2056, -0.5032, -1.2220,
          -1.1254,  1.0843,  0.9818],
         [ 1.6677,  0.7394, -0.9750, -1.5829,  0.8845, -0.6207,  0.2268,
           0.8424, -0.1856, -0.4992],
         [ 1.6677,  0.7394, -0.9750, -1.5829,  0.8845, -0.6207,  0.2268,
           0.8424, -0.1856, -0.4992],
         [ 0.1928, -0.8111,  0.1033, -1.9902, -0.5524,  1.0311, -0.8257,
           1.4385, -1.9348,  0.0400]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.5066, -0.9955, -0.9963, -0.9992],
         [-0.3985,  0.9996,  0.1104,  0.9990],
         [ 0.7850, -0.9999, -0.9989, -1.0000],
         [-0.9345, -0.8266, -0.9981, -0.9919],
         [ 0.7445,  0.9242, -0.9998, -0.9823]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-1.9964, -2.4442,  1.6575,  3.8568],
         [-0.8103,  3.7678, -1.1052, -0.9469],
         [-1.2312, -2.9084,  3.6352,  2.3763],
         [-2.2378, -2.0857,  1.0320,  4.2019],
         [-1.0650, -0.6130,  3.9680,  0.7795]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-1.9964, -2.4442,  1.6575,  3.8568],
        [-0.8103,  3.7678, -1.1052, -0.9469],
        [-1.2312, -2.9084,  3.6352,  2.3763],
        [-2.2378, -2.0857,  1.0320,  4.2019],
        [-1.0650, -0.6130,  3.9680,  0.7795]], grad_fn=<ViewBackward0>)
outputs: tensor([[-1.9964, -2.4442,  1.6575,  3.8568],
        [-0.8103,  3.7678, -1.1052, -0.9469],
        [-1.2312, -2.9084,  3.6352,  2.3763],
        [-2.2378, -2.0857,  1.0320,  4.2019],
        [-1.0650, -0.6130,  3.9680,  0.7795]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 2]
Predicted: ohlol,Epoch [14/15] loss =  0.0988
x.size(0) 1
hidden.shape torch.Size([2, 1, 4])
hidden tensor([[[0., 0., 0., 0.]],

        [[0., 0., 0., 0.]]])
x_0: tensor([[1, 0, 2, 2, 3]])
x_1: tensor([[[ 0.6223,  1.6222,  1.3512,  0.3352,  1.0106, -1.0852,  0.7810,
           1.1529, -0.8445,  1.3652],
         [-1.5231,  0.0941,  1.1036,  2.2396, -1.2361, -0.4988, -1.2318,
          -1.1456,  1.0999,  0.9721],
         [ 1.7193,  0.7444, -0.9979, -1.6165,  0.9125, -0.6061,  0.2687,
           0.8750, -0.2070, -0.4799],
         [ 1.7193,  0.7444, -0.9979, -1.6165,  0.9125, -0.6061,  0.2687,
           0.8750, -0.2070, -0.4799],
         [ 0.1530, -0.8501,  0.0993, -1.9616, -0.5889,  1.0768, -0.8718,
           1.4471, -1.9647,  0.0039]]], grad_fn=<EmbeddingBackward0>)
x_2 tensor([[[-0.4767, -0.9958, -0.9958, -0.9994],
         [-0.4806,  0.9998, -0.2376,  0.9991],
         [ 0.9011, -0.9999, -0.9993, -1.0000],
         [-0.9424, -0.8069, -0.9985, -0.9927],
         [ 0.8440,  0.9591, -0.9999, -0.9863]]], grad_fn=<TransposeBackward1>)
x_3 tensor([[[-2.0427, -2.5502,  1.6240,  4.0715],
         [-1.3645,  3.6887, -0.9591, -0.5144],
         [-1.1925, -3.1052,  3.8751,  2.3741],
         [-2.3188, -2.1251,  0.9092,  4.4707],
         [-1.0488, -0.6509,  4.2951,  0.6302]]], grad_fn=<ViewBackward0>)
x.view: tensor([[-2.0427, -2.5502,  1.6240,  4.0715],
        [-1.3645,  3.6887, -0.9591, -0.5144],
        [-1.1925, -3.1052,  3.8751,  2.3741],
        [-2.3188, -2.1251,  0.9092,  4.4707],
        [-1.0488, -0.6509,  4.2951,  0.6302]], grad_fn=<ViewBackward0>)
outputs: tensor([[-2.0427, -2.5502,  1.6240,  4.0715],
        [-1.3645,  3.6887, -0.9591, -0.5144],
        [-1.1925, -3.1052,  3.8751,  2.3741],
        [-2.3188, -2.1251,  0.9092,  4.4707],
        [-1.0488, -0.6509,  4.2951,  0.6302]], grad_fn=<ViewBackward0>)
idx: [3 1 2 3 2]
Predicted: ohlol,Epoch [15/15] loss =  0.0782

Process finished with exit code 0
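
The log above can be checked by hand. The following is a minimal sketch, not the original training script: it copies the `outputs:` logits printed for Epoch [15/15], takes the argmax of each row to recover `idx`, maps the indices back to characters, and recomputes the mean cross-entropy. The vocabulary `idx2char = ['e', 'h', 'l', 'o']` is an assumption inferred from the log (`x_0 = [1, 0, 2, 2, 3]` spells "hello"), and the target `[3, 1, 2, 3, 2]` ("ohlol") is assumed from the task description. The helper names `argmax` and `cross_entropy` are illustrative only. Plain Python is used so the arithmetic is visible.

```python
import math

# Assumed vocabulary, inferred from the log: x_0 = [1, 0, 2, 2, 3] spells "hello".
idx2char = ['e', 'h', 'l', 'o']

# Logits copied from the `outputs:` tensor printed for Epoch [15/15].
# Shape is (seq_len, num_class) = (5, 4) after x.view.
logits = [
    [-2.0427, -2.5502, 1.6240, 4.0715],
    [-1.3645, 3.6887, -0.9591, -0.5144],
    [-1.1925, -3.1052, 3.8751, 2.3741],
    [-2.3188, -2.1251, 0.9092, 4.4707],
    [-1.0488, -0.6509, 4.2951, 0.6302],
]

# Assumed targets: "ohlol" encoded with idx2char above.
targets = [3, 1, 2, 3, 2]


def argmax(row):
    """Index of the largest logit in one row."""
    return max(range(len(row)), key=lambda i: row[i])


def cross_entropy(row, target):
    """Negative log-softmax of the target class, using a stable log-sum-exp."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
    return log_sum_exp - row[target]


idx = [argmax(row) for row in logits]
predicted = ''.join(idx2char[i] for i in idx)
# Mean over the 5 time steps, i.e. the usual mean reduction.
loss = sum(cross_entropy(r, t) for r, t in zip(logits, targets)) / len(targets)

print('idx:', idx)
print('Predicted:', predicted)
print('loss = %.4f' % loss)
```

Under these assumptions the recovered indices are `[3, 1, 2, 3, 2]`, the decoded string is `ohlol`, and the recomputed loss agrees with the logged value of about 0.0782 to within the rounding of the printed logits.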
 
