Based on Andrew Ng's machine learning course.
1 Univariate Linear Regression
1.1 Visualizing the Dataset
Dataset download:
Link: http://pan.baidu.com/s/1bpezP2f  Password: mwcl
fprintf('Plotting Data ...\n')
data = load('ex1data1.txt');
% the first column is x and the second column is y
X = data(:, 1);
%(97,1)
y = data(:, 2);
%(97,1)
m = length(y);
% number of training examples, 97
figure; % open a new figure window
plot(X, y, 'rx', 'MarkerSize', 10);
% marker size 10; 'rx' draws red x markers
1.2 Gradient Descent
The cost function is:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

where $i$ indexes the training examples (there are 97 in total), $h_\theta(x^{(i)})$ is the prediction, $y^{(i)}$ is the true value, and $J(\theta)$ is the squared-error cost.
For a single variable the hypothesis $h_\theta$ is:

$$h_\theta(x) = \theta_0 + \theta_1 x_1$$

where the only variable is $x_1$.
The partial derivative of $J(\theta)$ with respect to $\theta_j$ is:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

The gradient descent update is:

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

The gradient is the direction of steepest ascent; since we want to minimize the cost $J(\theta)$, we step along the negative gradient. $\alpha$ is the learning rate.
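For reference, the same gradient and update can also be written compactly in matrix form; this vectorized expression is standard and is what the code below computes one example at a time:

$$\nabla_\theta J(\theta) = \frac{1}{m}X^\top\left(X\theta - y\right), \qquad \theta := \theta - \frac{\alpha}{m}X^\top\left(X\theta - y\right)$$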
1.2.1 Parameter Initialization and Settings
X = [ones(m, 1), data(:,1)];
% a column of ones is prepended, so X becomes (97,2)
theta = zeros(2, 1);
% initialize the parameters to zero
A column of ones is prepended to x so that $\theta_0$ is folded into the matrix product: the prediction is simply $X\cdot\theta = 1\cdot\theta_0 + x_1\theta_1$, with no separate $\theta_0$ term to add.
Without this trick the prediction would be $X\cdot\theta + \theta_0 = x_1\theta_1 + \theta_0$, which is less clean.
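A minimal check of the dot-product form (the values below are only illustrative, not taken from the actual fit):
x_row = [1, 6.1101];        % [1, x1] for one example, with the leading 1 added
theta_demo = [-3.6; 1.2];   % illustrative parameter values
h = x_row * theta_demo;     % equals 1*theta0 + x1*theta1 in a single product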
The number of iterations is 1500 and the learning rate is 0.01.
% Some gradient descent settings
iterations = 1500;
alpha = 0.01;
1.2.2 Computing the Cost Function
The cost is computed by calling the computeCost() function:
computeCost(X, y, theta)
The implementation is:
function J = computeCost(X, y, theta)
m = length(y);
% number of training examples
J = 0;
% initialize the cost
J = sum((X*theta - y).^2) / (2 * m);
% vectorized squared-error cost
end
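A quick sanity check one can run right after defining computeCost (a sketch; the variable name J0 is just for illustration). With theta initialized to zeros(2,1), the course exercise reports an initial cost of roughly 32.07 on this dataset:
J0 = computeCost(X, y, theta);
fprintf('Initial cost: %f\n', J0);
% roughly 32.07 for ex1data1 with theta = [0; 0]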
1.2.3 Minimizing $J(\theta)$
Use gradient descent to find the minimum of $J(\theta)$ (the squared-error cost of linear regression is convex, so this minimum is global) and keep the optimal $\theta$. Call the gradientDescent() function, which returns the optimal $\theta$.
theta = gradientDescent(X, y, theta, alpha, iterations);
The implementation is:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);
% number of training examples
J_history = zeros(num_iters, 1);
% records the cost at every iteration, (1500,1), all elements zero
for iter = 1:num_iters
H = X * theta;
% (97,2)*(2,1) = (97,1), predictions for all examples
T = [0 ; 0];
% (2,1), accumulates the gradient
for i = 1 : m,
T = T + (H(i) - y(i)) * X(i,:)';
% (1,1) * transpose of (1,2), giving (2,1)
end
theta = theta - (alpha * T) / m;
J_history(iter) = computeCost(X, y, theta);
% evaluate the cost with the updated theta and record it in J_history
end
end
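The inner loop over the m examples can be collapsed into a single matrix product; a vectorized sketch with the same interface (the name gradientDescentVec is mine, not from the exercise):
function [theta, J_history] = gradientDescentVec(X, y, theta, alpha, num_iters)
% Vectorized variant: the whole gradient is X' * (X*theta - y) / m.
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
theta = theta - (alpha / m) * (X' * (X * theta - y));
J_history(iter) = computeCost(X, y, theta);
end
end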
Display the optimal theta:
fprintf('Theta found by gradient descent: ');
fprintf('%f %f \n', theta(1), theta(2));
% (-3.630291,1.166362)
Plot the fitted regression line on top of the training data:
hold on;
% keep previous plot visible
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression')
hold off
% don't overlay any more plots on this figure
Use the fitted theta to predict y for two new values of x:
% Predict values for X 35,000 and 70,000
predict1 = [1, 3.5] *theta;
fprintf('For X = 35,000, we predict a y %f\n', predict1*10000);
predict2 = [1, 7] * theta;
fprintf('For X = 70,000, we predict a y %f\n',predict2*10000);
For X = 35,000, we predict a y 4519.767868
For X = 70,000, we predict a y 45342.450129
1.2.4 Visualizing $J(\theta)$
fprintf('Visualizing J(theta_0, theta_1) ...\n')
%(-3.630291,1.166362)
% Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100);
% linspace(x1, x2, N) generates a row vector of N evenly spaced points, where x1, x2, N are the start value, end value, and number of elements
theta1_vals = linspace(-1, 4, 100);
% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));
%(100,100), all elements zero
% Fill out J_vals
for i = 1:length(theta0_vals)
for j = 1:length(theta1_vals)
t = [theta0_vals(i); theta1_vals(j)];
J_vals(i,j) = computeCost(X, y, t);
end
end
% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';
% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals)
% draw a 3-D surface from theta0_vals, theta1_vals and J_vals; z gives the height and must be a matrix for surf
xlabel('\theta_0'); ylabel('\theta_1');
% Contour plot
figure;
% Plot J_vals as 20 contours spaced logarithmically between 0.01 and 1000
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
% logspace(a, b, n) generates an n-element logarithmically spaced row vector with x(1) = 10^a and x(n) = 10^b
% contour(X, Y, Z, n):
% Z must be a 2-D array and n is the number of contour lines;
% a vector v can be passed instead of n to specify the z values at which contours are drawn;
% X and Y must be grid arrays matching Z, usually generated with meshgrid
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);
2 Multivariate Linear Regression
2.1 Loading the Dataset
Dataset download:
Link: http://pan.baidu.com/s/1bQUFoE  Password: 29fv
%% Load Data
data = load('ex1data2.txt');
%(47,3)
X = data(:, 1:2);
%(47,2), the first two columns
y = data(:, 3);
%(47,1), the third column
m = length(y);
%47
2.2 Feature Normalization
Call the featureNormalize() function; mu and sigma are the per-feature mean and standard deviation:
[X mu sigma] = featureNormalize(X);
The implementation is:
function [X_norm, mu, sigma] = featureNormalize(X)
X_norm = X;
%(47,2)
mu = zeros(1, size(X, 2));
% size(X, 2) is the number of columns of X, so mu is (1,2)
sigma = zeros(1, size(X, 2));
%(1,2)
m = size(X , 1);
%47
mu = mean(X);
% column means: 2000.7, 3.2
for i = 1 : m,
X_norm(i, :) = X(i , :) - mu;
end
sigma = std(X);
% column standard deviations: 794.7024, 0.7610
for i = 1 : m,
X_norm(i, :) = X_norm(i, :) ./ sigma;
end
end
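The two per-row loops can also be written without explicit loops; a sketch using bsxfun, which works even in Octave/MATLAB versions without implicit broadcasting (the name featureNormalizeVec is mine):
function [X_norm, mu, sigma] = featureNormalizeVec(X)
% Subtract each column's mean, then divide by each column's standard deviation.
mu = mean(X);
sigma = std(X);
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
end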
Note that at prediction time the input x must also be normalized, using the same mu and sigma computed from the training data.
2.3 Gradient Descent
2.3.1 Setup
alpha = 0.3;
% learning rate; try 0.3, 0.1, 0.03, 0.01 in turn, since different learning rates produce different cost curves; experiment yourself
num_iters = 50;
% number of iterations
theta = zeros(3, 1);
% Init Theta and Run Gradient Descent
Call gradientDescentMulti():
X = [ones(m, 1) X];
% prepend a column of ones to X, as in the univariate case, giving (47,3)
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);
The implementation is:
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
%m=47
J_history = zeros(num_iters, 1);
n = size(X , 2);
%n=3
for iter = 1:num_iters
H = X * theta;
%(47,3)*(3,1) = (47,1)
T = zeros(n , 1);
%(3,1), accumulates the gradient
for i = 1 : m,
T = T + (H(i) - y(i)) * X(i,:)';
end
theta = theta - (alpha * T) / m;
% Save the cost J in every iteration
J_history(iter) = computeCostMulti(X, y, theta);
end
end
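computeCostMulti() is called above but its listing is not included in these notes; a minimal sketch consistent with the univariate computeCost, vectorized so it works for any number of features:
function J = computeCostMulti(X, y, theta)
m = length(y);
% number of training examples
err = X * theta - y;
% (47,1) vector of prediction errors
J = (err' * err) / (2 * m);
% squared-error cost in vectorized form
end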
2.3.2 Visualizing the Cost Decrease
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');
% Display gradient descent's result
fprintf('Theta computed from gradient descent: \n');
fprintf(' %f \n', theta);
fprintf('\n');
2.4 Prediction
Predict y for $x_1 = 1650$ and $x_2 = 3$. Note that $x_1$ and $x_2$ must be normalized with the training mu and sigma before being plugged into the model.
y = 0;
y = [1, (1650 - mu(1)) / sigma(1), (3 - mu(2)) / sigma(2)] * theta
% normalize each feature with its own training mean and standard deviation before predicting
The predicted y is 308408.290853.
2.5 The Normal Equation Method
data = csvread('ex1data2.txt');
X = data(:, 1:2);
y = data(:, 3);
m = length(y);
% Add intercept term to X
X = [ones(m, 1) X];
% Calculate the parameters from the normal equation
theta = normalEqn(X, y);
The normalEqn() function computes theta via the normal equation; the implementation is:
function [theta] = normalEqn(X, y)
theta = zeros(size(X, 2), 1);
theta = pinv(X' * X) * X' * y;
% pinv(A) computes the pseudoinverse; inv can only invert square matrices, while pinv also handles non-square (and singular) matrices
end
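For reference, the normal equation follows from setting the gradient of $J(\theta)$ to zero (a standard derivation, not spelled out in the original notes):

$$\frac{1}{m}X^\top\left(X\theta - y\right) = 0 \;\Rightarrow\; X^\top X\,\theta = X^\top y \;\Rightarrow\; \theta = \left(X^\top X\right)^{-1} X^\top y$$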
This gives $\theta$ directly, with no feature normalization needed, though a column of ones must still be prepended to $X$. Now check the prediction:
y = 0;
y = [1, 1650, 3] *theta
The predicted y is 293081.464335, close to the gradient-descent result of 308408.290853.