
A Beginner's Introduction_Python_Machine Learning (4)_Dimensionality Reduction with PCA and MDS, and Clustering


@sprt
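*A note up front: before I started learning machine learning and Python, I had no programming experience at all. I am writing this series one month into studying the field, recording my learning process entirely from a beginner's point of view. The code is not optimized and is for reference only. Corrections are welcome.
System: win7-CPU;
Environment: Anaconda2-Python2.7, IDE: pycharm5;
Reference books:
"Neural Networks and Learning Machines (Third Edition)" - Simon Haykin;
"Machine Learning in Action" - Peter Harrington;
"Building Machine Learning Systems with Python" - Willi Richert;
All of them can be found on CSDN, and Chinese translations are available.
I am grateful to be able to follow my teacher into the world of machine learning, starting from the very basics.*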


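Dimensionality reduction and clustering are important areas of unsupervised learning. As before, whether the topic is PCA, MDS, or K-means clustering, experts online have already written excellent summaries, so here are a few reference links: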
http://www.cnblogs.com/jerrylead/archive/2011/04/18/2020209.html
http://bbezxcy.iteye.com/blog/2090591
http://www.tuicool.com/articles/7nIvum
http://www.cnblogs.com/python27/p/MachineLearningWeek08.html
http://blog.pluskid.org/?p=407
http://www.cnblogs.com/Key-Ky/archive/2013/11/24/3440684.html
http://www.cnblogs.com/coser/archive/2013/04/10/3013044.html

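I derived both PCA and MDS by hand. I had planned to photograph the derivations and post them, but after the trouble that caused the last few times I could not muster the enthusiasm. Fortunately, a friend of mine has written them up very well: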
http://blog.csdn.net/totodum/article/details/51049165
http://blog.csdn.net/totodum/article/details/51097329

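Here is the assignment for this session:

•The Cat4D3Groups data set contains 4-dimensional observations.
•First use MDS to reduce it to 3D, producing the Cat3D3Groups data; plot and inspect it.
•Apply linear PCA to Cat3D3Groups to reduce it to 2D, producing the Cat2D3Groups data; plot and inspect it.
•Cluster the Cat2D3Groups data with K-means, settle on a final value of K, and show the clustering result.
•Cluster the Cat2D3Groups data with hierarchical clustering and show the result.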
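
To make the linear PCA step in the list above a little more concrete, here is a minimal sketch of PCA via SVD. It is only an illustration and not necessarily the approach used for the assignment: the function name `pca_reduce` is hypothetical, and the synthetic 3-D points merely stand in for Cat3D3Groups.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                       # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                           # principal directions, one per row
    explained = (s ** 2) / np.sum(s ** 2)         # fraction of variance per component
    return Xc.dot(components.T), explained[:k]

# Synthetic 3-D data (an assumption): very little variance along the third axis
rng = np.random.RandomState(1)
X3 = rng.randn(50, 3) * np.array([3.0, 1.5, 0.1])

X2, ratio = pca_reduce(X3, 2)
print(X2.shape)
print(ratio)
```

Because the third axis carries almost no variance here, the two retained components should explain nearly all of it; on real data, the explained-variance fractions are a reasonable first check on how much is lost by going from 3D to 2D.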

Once the theory has been worked out, the code is easy to write:

Part 1: Dimensionality reduction
MDS:
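
As a rough, self-contained sketch of the idea behind classical MDS (assuming Euclidean distances): double-center the squared distance matrix, take the top K eigenpairs, and scale the eigenvectors by the square roots of the eigenvalues. The names `pairwise_distances` and `classical_mds` and the random 4-D data are made up for this illustration. The check at the end relies on the fact that keeping all 4 dimensions should reproduce the original distances, while dropping to 3 can only shrink them.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X, via broadcasting."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, k):
    """Embed points in k dimensions from a Euclidean distance matrix D."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * H.dot(D ** 2).dot(H)           # Gram matrix of the centered points
    w, V = np.linalg.eigh(B)                  # B is symmetric, so eigh returns real values
    idx = np.argsort(w)[::-1][:k]             # indices of the k largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)        # guard against tiny negative round-off
    return V[:, idx] * np.sqrt(w_top)

# Synthetic stand-in for the 4-D observations (an assumption, not the real data)
rng = np.random.RandomState(0)
X = rng.randn(30, 4)
D = pairwise_distances(X)

Y4 = classical_mds(D, 4)                      # same dimension: distances preserved
Y3 = classical_mds(D, 3)                      # one dimension dropped: distances shrink
err4 = np.abs(pairwise_distances(Y4) - D).max()
print(err4)
```

Comparing the distance matrix of the embedding with the original one is a simple way to see how much structure a given K throws away.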

# -*- coding:gb2312 -*-
from pylab import *
import numpy as np
from mpl_toolkits.mplot3d import Axes3D

def print_D(data):
    # Build the N x N matrix of pairwise Euclidean distances between the rows of data
    N = np.shape(data)[0]
    d = np.zeros((N, N))
    for i in range(N):
        c = data[i, :]
        for j in range(N):
            e = data[j, :]
            d[i, j] = np.sqrt(np.sum(np.power(c - e, 2)))
    return d

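def MDS(D, K):
    # Classical MDS: embed N points in K dimensions from the distance matrix D
    N = np.shape(D)[0]
    D2 = D ** 2
    H = np.eye(N) - 1.0/N # centering matrix
    T = -0.5 * np.dot(np.dot(H, D2), H) # double-centered squared distances
    eigVal, eigVec = np.linalg.eigh(T) # T is symmetric, so eigh keeps the results real
    indices = np.argsort(eigVal) # indices that sort the eigenvalues in ascending order
    indices = indices[::-1] # reverse to descending order

    eigVal = eigVal[indices] # eigenvalues from largest to smallest
    eigVec = eigVec[:, indices] # eigenvectors in the matching order

    m = eigVec[:, :K]
    n = np.diag(np.sqrt(eigVal[:K]))
    X = np.dot(m, n)

    return X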

# test
'''
data = genfromtxt("CAT4D3GROUPS.txt")
D = print_D(data)
# print D

# 4D to 3D
CAT3D3GROUPS = MDS(D, 3)
# print CAT3D3GROUPS
# D_3D = print_D(CAT3D3GROUPS)
# print D_3D
figure(1)
ax = subplot(111,projection='3d')
ax.scatter(CAT3D3GROUPS[:, 0], CAT3D3GROUPS[:, 1], CAT3D3GROUPS[:, 2], c = 'b')
ax.set_zlabel('Z') # axis labels
ax.set_ylabel('Y')
ax.set_xlabel('X')
title('MDS_4to3')

# 4D to 2D
CAT2D3GROUPS = MDS(D, 2)
# print CAT2D3GROUPS
# D_2D = print_D(CAT2D3GROUPS)
# print D_2D
figure(2)
plot(CAT2D3GROUPS
;