Bootstrap

Python欧式距离矩阵算法

Python欧式距离矩阵算法

根据欧式距离矩阵(Euclidean Distance Matrix, EDM) 的定义,它是一个n × \times ×n的矩阵,表示欧式空间中一组n个点的空间距离,每个元素为点之间的距离平方。假设 { x i ∈ R d : i ∈ { 1 , . . . , n } } \{x_i \in \mathbb{R}^d : i \in \{1,...,n\}\} {xiRd:i{1,...,n}}

X = ( ∣ ∣ ∣ x 1 x 2 . . . x n ∣ ∣ ∣ ) d × n X =\begin{pmatrix} | & | & & |\\ x_1 & x_2 &... & x_n\\ | & | & & |\\ \end{pmatrix}_{d\times n} X=x1x2...xnd×n

距离矩阵, D i j = ∣ ∣ x i − x j ∣ ∣ 2 2 D_{ij} = ||x_i-x_j||^2_2 Dij=xixj22
= ( x i − x j ) T ( x i − x j ) =(x_i - x_j)^T(x_i - x_j) =(xixj)T(xixj)
= x i T x i + x j x j T − 2 x i T x j =x_i^Tx_i + x_jx_j^T-2x_i^Tx_j =xiTxi+xjxjT2xiTxj
= ( 0 d 12 2 d 13 2 … d 1 n 2 d 21 2 0 d 23 2 … d 2 n 2 d 31 2 d 32 2 0 … d 3 n 2 ⋮ ⋮ ⋮ ⋱ ⋮ d n 1 2 d n 2 2 d 1 n 2 … 0 ) =\begin{pmatrix} 0 &d_{12}^2 & d_{13}^2 & \dots & d_{1n}^2\\ d_{21}^2&0 & d_{23}^2 & \dots & d_{2n}^2\\ d_{31}^2 & d_{32}^2 & 0&\dots & d_{3n}^2\\ \vdots&\vdots&\vdots &\ddots & \vdots\\ d_{n1}^2 & d_{n2}^2 & d_{1n}^2 & \dots& 0\\ \end{pmatrix} =0d212d312dn12d1220d322dn22d132d2320d1n2d1n2d2n2d3n20
欧式距离矩阵,edm(X)定义为
e d m ( X ) = d e f d i a g ( G ) + d i a g ( G ) T − 2 G edm(X) \overset{\mathrm{def}}{=} diag(G) + diag(G)^T - 2G edm(X)=defdiag(G)+diag(G)T2G
其中,格拉姆矩阵(Gram matrix) G = X T X G=X^TX G=XTX, d i a g ( G ) diag(G) diag(G)为含有对角线元素的列向量。

在计算两个矩阵 X d × m X_{d\times m} Xd×m Y d × n Y_{d\times n} Yd×n之间的欧式距离矩阵时,可从上式推导出两个矩阵的欧式距离矩阵公式:
G X = X T X G_X = X^TX GX=XTX
G Y = Y T Y G_Y = Y^TY GY=YTY
e d m ( X ) = d i a g ( G X ) + d i a g ( G Y ) T − 2 X T Y edm(X) = diag(G_X) + diag(G_Y)^T - 2X^TY edm(X)=diag(GX)+diag(GY)T2XTY
= ( x 11 x 22 ⋮ x m m ) m × 1 + ( y 11 y 22 ⋮ y n n ) n × 1 T − 2 X T Y =\begin{pmatrix} x_{11}\\ x_{22}\\ \vdots\\ x_{mm}\\ \end{pmatrix}_{m\times1} + \begin{pmatrix} y_{11}\\ y_{22}\\ \vdots\\ y_{nn}\\ \end{pmatrix}^T_{n\times1} - 2X^TY =x11x22xmmm×1+y11y22ynnn×1T2XTY

这里的 G X G_X GX G Y G_Y GY的对角线元素会经过numpy数组的广播机制(broadcasting),分别广播成 n × n n\times n n×n m × m m\times m m×m的矩阵,进行相加。EDM矩阵的大小应为 m × n m\times n m×n

例子:
X = ( 1 2 3 2 3 4 0 1 2 ) 3 × 3 T ; Y = ( 1 2 3 4 3 2 ) 2 × 3 T X = \begin{pmatrix} 1&2&3\\ 2&3&4\\ 0&1&2\\ \end{pmatrix}^T_{3\times 3}; Y = \begin{pmatrix} 1&2&3\\ 4&3&2\\ \end{pmatrix}^T_{2\times 3} X=1202313423×3T;Y=(142332)2×3T

Python代码:

import numpy as np


def euclidean_dist(x, y=None):
    """
    Compute all pairwise distances between vectors in X and Y matrices.
    :param x: numpy array, with size of (d, m)
    :param y: numpy array, with size of (d, n)
    :return: EDM:   numpy array, with size of (m,n). 
                    Each entry in EDM_{i,j} represents the distance between row i in x and row j in y.
    """
    if y is None:
        y = x

    # calculate Gram matrices
    G_x = np.matmul(x.T, x)
    G_y = np.matmul(y.T, y)

    # convert diagonal matrix into column vector
    diag_Gx = np.reshape(np.diag(G_x), (-1, 1))
    diag_Gy = np.reshape(np.diag(G_y), (-1, 1))

    # Compute Euclidean distance matrix
    EDM = diag_Gx + diag_Gy.T - 2*np.matmul(x.T, y)  # broadcasting
    
    print('This is matrix EDM: ')
    print(EDM)
    return EDM


x = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [0.0, 1.0, 2.0]]).T
y = np.array([[1.0, 2.0, 3.0], [4.0, 3.0, 2.0]]).T
euclidean_dist(x,y)

输出: E D M = ( 0 11 3 8 3 20 ) 3 × 2 EDM = \begin{pmatrix} 0&11\\ 3&8\\ 3&20\\ \end{pmatrix}_{3\times 2} EDM=033118203×2

备注: 欧式距离矩阵中的元素为距离的平方,若需要距离,可用np.sqrt(EDM)开方。

参考文献:

  1. https://www.dabblingbadger.com/blog/2020/2/27/implementing-euclidean-distance-matrix-calculations-from-scratch-in-python
  2. Dokmanic, Ivan, et al. “Euclidean distance matrices: essential theory, algorithms, and applications.” IEEE Signal Processing Magazine 32.6 (2015): 12-30.
  3. Albanie, Samuel. “Euclidean Distance Matrix Trick.” Retrieved from Visual Geometry Group, University of Oxford (2019).
  4. https://ccrma.stanford.edu/~dattorro/EDM.pdf
;