CLASS torch.nn.Embedding(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[torch.Tensor] = None)
A simple lookup table that stores embeddings of a fixed dictionary and size.
This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.
Parameters
- num_embeddings (int) – size of the dictionary of embeddings
- embedding_dim (int) – the size of each embedding vector
- padding_idx (int, optional) – If given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index.
- max_norm (float, optional) – If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm (see the sketch after this list).
- norm_type (float, optional) – The p of the p-norm to compute for the max_norm option. Default 2.
- scale_grad_by_freq (bool, optional) – If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default False.
- sparse (bool, optional) – If True, the gradient w.r.t. the weight matrix will be a sparse tensor. See Notes for more details regarding sparse gradients.
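As a quick, illustrative sketch of the max_norm / norm_type options (the sizes and the value 1.0 here are arbitrary, not from the original docs): rows whose p-norm exceeds max_norm are renormalized to max_norm when they are looked up.
>>> import torch
>>> import torch.nn as nn
>>> # sketch: with max_norm set, looked-up rows are renormalized to max_norm
>>> emb = nn.Embedding(5, 4, max_norm=1.0, norm_type=2.0)
>>> out = emb(torch.tensor([0, 1, 2]))
>>> out.norm(p=2, dim=1)  # every returned row now has 2-norm <= 1.0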
Variables
Embedding.weight (Tensor) – the learnable weights of the module of shape (num_embeddings, embedding_dim), initialized from \mathcal{N}(0, 1)
Shape:
- Input: (*), LongTensor of arbitrary shape containing the indices to extract
- Output: (*, H), where * is the input shape and H = \text{embedding\_dim}
NOTE
Keep in mind that only a limited number of optimizers support sparse gradients: currently it's optim.SGD (CUDA and CPU), optim.SparseAdam (CUDA and CPU) and optim.Adagrad (CPU).
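A minimal sketch of the sparse path (layer sizes are arbitrary): with sparse=True the gradient of weight comes back as a sparse tensor, which an optimizer that supports sparse gradients, such as optim.SparseAdam, can consume.
>>> import torch
>>> import torch.nn as nn
>>> emb = nn.Embedding(1000, 16, sparse=True)
>>> opt = torch.optim.SparseAdam(emb.parameters())
>>> loss = emb(torch.tensor([[1, 2, 4, 5]])).sum()
>>> loss.backward()
>>> emb.weight.grad.is_sparse  # the gradient is a sparse tensor
True
>>> opt.step()  # SparseAdam accepts the sparse gradient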
NOTE
With padding_idx set, the embedding vector at padding_idx is initialized to all zeros. However, note that this vector can be modified afterwards, e.g., using a customized initialization method, thus changing the vector used to pad the output. The gradient for this vector from Embedding is always zero.
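A short sketch of this behaviour (the fill value 0.5 is arbitrary): the padding row gets a zero gradient during backward, but it can still be overwritten manually afterwards, e.g. under torch.no_grad().
>>> import torch
>>> import torch.nn as nn
>>> emb = nn.Embedding(10, 3, padding_idx=0)
>>> emb(torch.tensor([[0, 2, 0, 5]])).sum().backward()
>>> emb.weight.grad[0]  # the padding row receives no gradient
tensor([0., 0., 0.])
>>> with torch.no_grad():
...     emb.weight[0].fill_(0.5)  # custom (re-)initialization of the padding row
...
>>> emb.weight[0]  # now holds the custom padding vector [0.5, 0.5, 0.5]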
Examples:
>>> # an Embedding module containing 10 tensors of size 3
>>> embedding = nn.Embedding(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
>>> embedding(input)
tensor([[[-0.0251, -1.6902,  0.7172],
         [-0.6431,  0.0748,  0.6969],
         [ 1.4970,  1.3448, -0.9685],
         [-0.3677, -2.7265, -0.1685]],

        [[ 1.4970,  1.3448, -0.9685],
         [ 0.4362, -0.4004,  0.9400],
         [-0.6431,  0.0748,  0.6969],
         [ 0.9124, -2.3616,  1.1151]]])

>>> # example with padding_idx
>>> embedding = nn.Embedding(10, 3, padding_idx=0)
>>> input = torch.LongTensor([[0,2,0,5]])
>>> embedding(input)
tensor([[[ 0.0000,  0.0000,  0.0000],
         [ 0.1535, -2.0309,  0.9315],
         [ 0.0000,  0.0000,  0.0000],
         [-0.1655,  0.9897,  0.0635]]])
Explanation:
embedding = nn.Embedding(10, 3)
The first argument, 10, is the dictionary size (the total number of words in the dictionary).
The second argument, 3, is the embedding dimension, i.e., how many dimensions each index is embedded into.
input = torch.LongTensor([[1,2,4,5],[4,3,2,9]]); embs = embedding(input). The shape of input here is not restricted (see the sketch below).
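To illustrate that the input shape is unrestricted (the shape (2, 4, 5) is chosen arbitrarily): a 3-dimensional index tensor yields a 4-dimensional output, with embedding_dim appended as the last dimension.
>>> import torch
>>> import torch.nn as nn
>>> embedding = nn.Embedding(10, 3)
>>> input = torch.randint(0, 10, (2, 4, 5))  # any index shape works
>>> embedding(input).shape
torch.Size([2, 4, 5, 3])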