
[MATLAB] Linear Regression

Based on Andrew Ng's machine learning course.

1 Linear Regression with One Variable

1.1 Visualizing the Dataset

Dataset download:
Link: http://pan.baidu.com/s/1bpezP2f  Password: mwcl

fprintf('Plotting Data ...\n')
data = load('ex1data1.txt');

% the first and second columns are x and y respectively
X = data(:, 1); 
%(97,1)

y = data(:, 2);
%(97,1)


m = length(y); 
% number of training examples, 97

figure; % open a new figure window

plot(X , y , 'rx' , 'MarkerSize' , 10);
% 'rx' draws red x markers, MarkerSize 10

(Figure: scatter plot of the training data)

1.2 Gradient Descent

  The cost function is:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

  Here $i$ indexes the training examples (97 in total), $h_\theta(x^{(i)})$ is the predicted value, $y^{(i)}$ is the true value, and $J(\theta)$ is the cost function, which uses the squared error.

  For a single variable, the hypothesis $h_\theta$ is:

$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$$

  where $x_1$ is the input variable.

  The partial derivative of $J(\theta)$ with respect to $\theta_j$ is:

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

  The gradient descent update is:

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

  The gradient points in the direction of steepest ascent; since we want to minimize the cost $J(\theta)$, we move along the negative gradient. $\alpha$ is the learning rate.
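
  The update can also be written as one vectorized MATLAB step. A minimal sketch (assuming X already carries the column of ones added in the next section and alpha is the learning rate):

% Vectorized gradient descent step (sketch, equivalent to the component-wise update above)
h = X * theta;                               % predictions, (m,1)
theta = theta - (alpha/m) * (X' * (h - y));  % simultaneous update of all theta_j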

1.2.1 Initializing Parameters and Settings

X = [ones(m, 1), data(:,1)]; 
% a column of ones is prepended to x, giving (97,2)

theta = zeros(2, 1); 
% initialize the parameters

  A column of ones is added on the left of x so that $\theta_0$ is folded into the matrix product: $X\theta = 1\cdot\theta_0 + x_1\theta_1$, with no separate $\theta_0$ term to add.

  Without this trick we would have to compute $X\theta + \theta_0 = x_1\theta_1 + \theta_0$, which is less tidy.
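
  As a quick numerical check of the trick (hypothetical values, not from the dataset):

% with theta_0 = 2 and theta_1 = 0.5, a row [1, x1] times theta gives theta_0 + theta_1*x1
theta_demo = [2; 0.5];
x_demo = [1, 6];                 % [1, x1]
h_demo = x_demo * theta_demo;    % 2 + 0.5*6 = 5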

  The number of iterations is 1500 and the learning rate is 0.01:

% Some gradient descent settings
iterations = 1500;

alpha = 0.01;

1.2.2 Computing the Cost Function

  The cost function:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

  Call the computeCost() function:

computeCost(X, y, theta)

  The implementation is:

function J = computeCost(X, y, theta)

m = length(y); 
% number of training examples

J = 0;
% initialize

J = sum((X*theta - y).^2) / (2 * m);
% compute the cost

end
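
  A quick sanity check, assuming ex1data1.txt is loaded and X has its column of ones (the expected values are the approximate ones quoted in the course exercise):

computeCost(X, y, [0; 0])     % should be about 32.07
computeCost(X, y, [-1; 2])    % should be about 54.24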

1.2.3 Minimizing J(θ)

  Use gradient descent to find the global minimum of $J(\theta)$ and keep the optimal $\theta$. Call the gradientDescent() function, which returns the optimal $\theta$.

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

theta = gradientDescent(X, y, theta, alpha, iterations);

  The implementation is:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)

m = length(y); 
% number of training examples

J_history = zeros(num_iters, 1);
% records the cost at every iteration, (1500,1), all elements zero

for iter = 1:num_iters
    H = X * theta;
    %(97,2)*(2,1)=(97,1)
    T = [0 ; 0];
    %(2,1), accumulates the gradient

    for i = 1 : m,
        T = T + (H(i) - y(i)) * X(i,:)';    
        % (1,1) times the transpose of (1,2), giving (2,1)
    end

    theta = theta - (alpha * T) / m;

    J_history(iter) = computeCost(X, y, theta);
    % plug theta in, call the cost function, and record the cost in J_history

end
end
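
  The J_history output is useful for checking that the cost really decreases. A small sketch, assuming both outputs are captured as [theta, J_history] = gradientDescent(X, y, theta, alpha, iterations):

% Plot the recorded cost over the 1500 iterations; the curve should decrease and flatten out
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');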

  Display the optimal theta parameters:

fprintf('Theta found by gradient descent: ');
fprintf('%f %f \n', theta(1), theta(2));
% (-3.630291,1.166362)

  Plot the fitted regression line over the training data:

hold on; 
% keep previous plot visible

plot(X(:,2), X*theta, '-')

legend('Training data', 'Linear regression')

hold off 
% don't overlay any more plots on this figure

(Figure: training data with the fitted regression line)

  Use the optimal theta to predict y for two values of x:

% Predict values for X 35,000 and 70,000

predict1 = [1, 3.5] * theta;
fprintf('For X = 35,000, we predict y = %f\n', predict1*10000);

predict2 = [1, 7] * theta;
fprintf('For X = 70,000, we predict y = %f\n', predict2*10000);

  For X = 35,000, we predict y = 4519.767868
  For X = 70,000, we predict y = 45342.450129

1.2.4 Visualizing J(θ)

fprintf('Visualizing J(theta_0, theta_1) ...\n')
%(-3.630291,1.166362)

% Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100);
%linspace(x1, x2, N) generates a row vector of N evenly spaced points from x1 to x2
theta1_vals = linspace(-1, 4, 100);

% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));
%(100,100), all elements are zero

% Fill out J_vals
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
      t = [theta0_vals(i); theta1_vals(j)];    
      J_vals(i,j) = computeCost(X, y, t);
    end
end


% Because of the way meshgrids work in the surf command, we need to 
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';
% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals)
%surf draws a 3-D surface from the vectors x, y and the matrix z, where z gives the height; surf requires z to be a matrix

xlabel('\theta_0'); ylabel('\theta_1');

% Contour plot
figure;
% Plot J_vals as 20 contours spaced logarithmically between 0.01 and 1000
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
%logspace(a,b,n) generates a row vector of n logarithmically spaced points, with x(1)=10^a and x(n)=10^b
%contour(X,Y,Z,n)
%Z must be a 2-D array; n is the number of contour lines to draw
%a vector v can be passed instead of n to draw contours at those values of Z
%X and Y must be grid arrays matching Z, usually produced with meshgrid

xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);

(Figure: surface plot of J(θ_0, θ_1))

(Figure: contour plot of J(θ_0, θ_1) with the optimal θ marked by a red x)

2 Multivariate Linear Regression

2.1 Loading the Dataset

Dataset download:
Link: http://pan.baidu.com/s/1bQUFoE  Password: 29fv

%% Load Data
data = load('ex1data2.txt');
%(47,3)
X = data(:, 1:2);
%(47,2), the first two columns
y = data(:, 3);
%(47,1), the third column
m = length(y);
%47

2.2 Feature Normalization

  Call the featureNormalize() function; mu and sigma are the mean and the standard deviation respectively:

[X mu sigma] = featureNormalize(X);

  The implementation is:

function [X_norm, mu, sigma] = featureNormalize(X)

X_norm = X;
%(47,2)

mu = zeros(1, size(X, 2));
%size(X, 2) is the second dimension of X, so mu is (1,2)

sigma = zeros(1, size(X, 2));
%(1,2)

m = size(X , 1);
%47

mu = mean(X);
%column means: 2000.7    3.2

for i = 1 : m,
    X_norm(i, :) = X(i , :) - mu;
end

sigma = std(X);
%column standard deviations: 794.7024    0.7610

for i = 1 : m,
    X_norm(i, :) = X_norm(i, :) ./ sigma;
end
end
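
  The two loops can be replaced by a vectorized version. A sketch, assuming MATLAB R2016b+ or Octave (implicit expansion):

% Vectorized feature normalization, equivalent to the loops above
mu = mean(X);                  % (1,2) column-wise means
sigma = std(X);                % (1,2) column-wise standard deviations
X_norm = (X - mu) ./ sigma;    % subtract and divide column by column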

  Note: at prediction time, a new input X must be normalized with the same mu and sigma, as sketched below.
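
  For example, a new input with $x_1 = 1650$ and $x_2 = 3$ would be normalized like this before prediction (a sketch; mu and sigma come from featureNormalize above):

x_new = [1650, 3];                % raw features of a new example
x_norm = (x_new - mu) ./ sigma;   % normalize with the training-set mu and sigma
% prediction afterwards: [1, x_norm] * theta, see section 2.4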

2.3 Gradient Descent

2.3.1 Preprocessing

alpha = 0.3;
% learning rate; try 0.3, 0.1, 0.03, 0.01 -- different learning rates give different cost-descent curves, experiment yourself

num_iters = 50;
% number of iterations

theta = zeros(3, 1);
% Init Theta and Run Gradient Descent 

  Call gradientDescentMulti():

X = [ones(m, 1) X];
% prepend a column of ones to X (same reasoning as in the univariate case), giving (47,3)

[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);

  The implementation is:

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)

m = length(y); % number of training examples
%m=47

J_history = zeros(num_iters, 1);

n = size(X , 2);
%n=3

for iter = 1:num_iters
    H = X * theta;
    %(47,3)*(3,1) = (47,1)
    T = zeros(n , 1);
    %(3,1), accumulates the gradient
    for i = 1 : m,
        T = T + (H(i) - y(i)) * X(i,:)';    
    end

    theta = theta - (alpha * T) / m;

    % Save the cost J in every iteration    
    J_history(iter) = computeCostMulti(X, y, theta);
end
end
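
  computeCostMulti() is called above but its code is not listed in the post; a minimal version consistent with the cost definition in section 1.2.2 could look like this:

function J = computeCostMulti(X, y, theta)
m = length(y);                                 % number of training examples
J = (X*theta - y)' * (X*theta - y) / (2 * m);  % vectorized squared-error cost
end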

2.3.2 Visualizing the Cost Descent

figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');

% Display gradient descent's result
fprintf('Theta computed from gradient descent: \n');
fprintf(' %f \n', theta);
fprintf('\n');

(Figure: cost J versus number of iterations for gradient descent)

2.4 Prediction

  Predict y for $x_1 = 1650$ and $x_2 = 3$. Note that $x_1$ and $x_2$ must be normalized before they are plugged into the model:

y = 0; 
y = [1, (1650 - mu(1))/sigma(1), (3 - mu(2))/sigma(2)] * theta
% normalize each feature with its own mu and sigma before applying theta

  The predicted y is approximately 293,081, which matches the normal-equation result in the next section.

2.5 The Normal Equation Method

$$\theta = (X^T X)^{-1} X^T y$$

data = csvread('ex1data2.txt');
X = data(:, 1:2);
y = data(:, 3);
m = length(y);

% Add intercept term to X
X = [ones(m, 1) X];

% Calculate the parameters from the normal equation
theta = normalEqn(X, y);

  The normalEqn() function computes theta; the implementation is:

function [theta] = normalEqn(X, y)
theta = zeros(size(X, 2), 1);
theta = pinv(X' * X) * X' * y;

%pinv(A) computes the pseudo-inverse; inv only inverts square matrices, while pinv also handles non-square or singular ones
end

  This solves for $\theta$ directly, with no feature normalization needed, although a column of ones must still be added on the left of $X$. Now check the prediction:

y = 0; 
% You should change this
y =  [1, 1650, 3] *theta

  The predicted y is 293081.464335, essentially the same as the gradient-descent result.
