Based on Andrew Ng's machine learning course.
1 Univariate Linear Regression
1.1 Visualizing the Dataset
Dataset download:
Link: http://pan.baidu.com/s/1bpezP2f  Password: mwcl
fprintf('Plotting Data ...\n')
data = load('ex1data1.txt');
% the first column is x and the second column is y
X = data(:, 1);
%(97,1)
y = data(:, 2);
%(97,1)
m = length(y);
% number of training examples, 97
figure; % open a new figure window
plot(X, y, 'rx', 'MarkerSize', 10);
% marker size 10; 'rx' draws red x markers
1.2 Gradient Descent
The cost function is:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

where $i$ indexes the training examples (there are 97 in total), $h_\theta(x^{(i)})$ is the prediction, $y^{(i)}$ is the true value, and $J(\theta)$ is the squared-error cost.
For a single variable the hypothesis $h_\theta$ is:

$$h_\theta(x) = \theta_0 + \theta_1 x_1$$

where the only variable is $x_1$.
The partial derivative of $J(\theta)$ with respect to $\theta_j$ is:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

The gradient descent update is:

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

The gradient is the direction of steepest ascent; since we want to minimize the cost $J(\theta)$, we step along the negative gradient. $\alpha$ is the learning rate.
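For reference, the same gradient and update can also be written compactly in matrix form; this vectorized expression is standard and is what the code below computes one example at a time:

$$\nabla_\theta J(\theta) = \frac{1}{m}X^\top\left(X\theta - y\right), \qquad \theta := \theta - \frac{\alpha}{m}X^\top\left(X\theta - y\right)$$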
1.2.1 Parameter Initialization and Settings
X = [ones(m, 1), data(:,1)];
% a column of ones is prepended, so X becomes (97,2)
theta = zeros(2, 1);
% initialize the parameters to zero
A column of ones is prepended to x so that $\theta_0$ is folded into the matrix product: the prediction is simply $X\cdot\theta = 1\cdot\theta_0 + x_1\theta_1$, with no separate $\theta_0$ term to add.
Without this trick the prediction would be $X\cdot\theta + \theta_0 = x_1\theta_1 + \theta_0$, which is less clean.
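A minimal check of the dot-product form (the values below are only illustrative, not taken from the actual fit):
x_row = [1, 6.1101];        % [1, x1] for one example, with the leading 1 added
theta_demo = [-3.6; 1.2];   % illustrative parameter values
h = x_row * theta_demo;     % equals 1*theta0 + x1*theta1 in a single product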
The number of iterations is 1500 and the learning rate is 0.01.
% Some gradient descent settings
iterations = 1500;
alpha = 0.01;
1.2.2 Computing the Cost Function
The cost is computed by calling the computeCost() function:
computeCost(X, y, theta)
The implementation is:
function J = computeCost(X, y, theta)
m = length(y);
% number of training examples
J = 0;
% initialize the cost
J = sum((X*theta - y).^2) / (2 * m);
% vectorized squared-error cost
end
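A quick sanity check one can run right after defining computeCost (a sketch; the variable name J0 is just for illustration). With theta initialized to zeros(2,1), the course exercise reports an initial cost of roughly 32.07 on this dataset:
J0 = computeCost(X, y, theta);
fprintf('Initial cost: %f\n', J0);
% roughly 32.07 for ex1data1 with theta = [0; 0]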
1.2.3 Minimizing $J(\theta)$
Use gradient descent to find the minimum of $J(\theta)$ (the squared-error cost of linear regression is convex, so this minimum is global) and keep the optimal $\theta$. Call the gradientDescent() function, which returns the optimal $\theta$.
theta = gradientDescent(X, y, theta, alpha, iterations);
The implementation is:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);
% number of training examples
J_history = zeros(num_iters, 1);
% records the cost at every iteration, (1500,1), all elements zero
for iter = 1:num_iters
H = X * theta;
% (97,2)*(2,1) = (97,1), predictions for all examples
T = [0 ; 0];
% (2,1), accumulates the gradient
for i = 1 : m,
T = T + (H(i) - y(i)) * X(i,:)';
% (1,1) * transpose of (1,2), giving (2,1)
end
theta = theta - (alpha * T) / m;
J_history(iter) = computeCost(X, y, theta);
% evaluate the cost with the updated theta and record it in J_history
end
end
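The inner loop over the m examples can be collapsed into a single matrix product; a vectorized sketch with the same interface (the name gradientDescentVec is mine, not from the exercise):
function [theta, J_history] = gradientDescentVec(X, y, theta, alpha, num_iters)
% Vectorized variant: the whole gradient is X' * (X*theta - y) / m.
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
theta = theta - (alpha / m) * (X' * (X * theta - y));
J_history(iter) = computeCost(X, y, theta);
end
end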
Display the optimal theta:
fprintf('Theta found by gradient descent: ');
fprintf('%f %f \n', theta(1), theta(2));
% (-3.630291,1.166362)
Plot the fitted regression line on top of the training data:
hold on;
% keep previous plot visible
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression')
hold off
% don't overlay any more plots on this figure
Use the fitted theta to predict y for two new values of x:
% Predict values for X 35,000 and 70,000
predict1 = [1, 3.5] *theta;
fprintf('For X = 35,000, we predict a y %f\n', predict1*10000);
predict2 = [1, 7] * theta;
fprintf('For X = 70,000, we predict a y %f\n',predict2*10000);
For X = 35,000, we predict a y 4519.767868
For X = 70,000, we predict a y 45342.450129
1.2.4 Visualizing $J(\theta)$
fprintf('Visualizing J(theta_0, theta_1) ...\n')
%(-3.630291,1.166362)
% Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100);
% linspace(x1, x2, N) generates a row vector of N evenly spaced points, where x1, x2, N are the start value, end value, and number of elements
theta1_vals = linspace(-1, 4, 100);
% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));
%(100,100), all elements zero
% Fill out J_vals
for i = 1:length(theta0_vals)
for j = 1:length(theta1_vals)
t = [theta0_vals(i); theta1_vals(j)];
J_vals(i,j) = computeCost(X, y, t);
end
end
% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';
% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals)
% draw a 3-D surface from theta0_vals, theta1_vals and J_vals; z gives the height and must be a matrix for surf
xlabel('\theta_0'); ylabel('\theta_1');
% Contour plot
figure;
% Plot J_vals as 20 contours spaced logarithmically between 0.01 and 1000
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
% logspace(a, b, n) generates an n-element logarithmically spaced row vector with x(1) = 10^a and x(n) = 10^b
% contour(X, Y, Z, n):
% Z must be a 2-D array and n is the number of contour lines;
% a vector v can be passed instead of n to specify the z values at which contours are drawn;
% X and Y must be grid arrays matching Z, usually generated with meshgrid
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);
2 Multivariate Linear Regression
2.1 Loading the Dataset
Dataset download:
Link: http://pan.baidu.com/s/1bQUFoE  Password: 29fv
%% Load Data
data = load('ex1data2.txt');
%(47,3)
X = data(:, 1:2);
%(47,2), the first two columns
y = data(:, 3);
%(47,1), the third column
m = length(y);
%47
2.2 Feature Normalization
Call the featureNormalize() function; mu and sigma are the per-feature mean and standard deviation:
[X mu sigma] = featureNormalize(X);
The implementation is:
function [X_norm, mu, sigma] = featureNormalize(X)
X_norm = X;
%(47,2)
mu = zeros(1, size(X, 2));
% size(X, 2) is the number of columns of X, so mu is (1,2)
sigma = zeros(1, size(X, 2));
%(1,2)
m = size(X , 1);
%47
mu = mean(X);
% column means: 2000.7, 3.2
for i = 1 : m,
X_norm(i, :) = X(i , :) - mu;
end
sigma = std(X);
% column standard deviations: 794.7024, 0.7610
for i = 1 : m,
X_norm(i, :) = X_norm(i, :) ./ sigma;
end
end
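The two per-row loops can also be written without explicit loops; a sketch using bsxfun, which works even in Octave/MATLAB versions without implicit broadcasting (the name featureNormalizeVec is mine):
function [X_norm, mu, sigma] = featureNormalizeVec(X)
% Subtract each column's mean, then divide by each column's standard deviation.
mu = mean(X);
sigma = std(X);
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
end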
Note that at prediction time the input x must also be normalized, using the same mu and sigma computed from the training data.
2.3 Gradient Descent
2.3.1 Setup
alpha = 0.3;
% learning rate; try 0.3, 0.1, 0.03, 0.01 in turn, since different learning rates produce different cost curves; experiment yourself
num_iters = 50;
% number of iterations
theta = zeros(3, 1);
% Init Theta and Run Gradient Descent
Call gradientDescentMulti():
X = [ones(m, 1) X];
% prepend a column of ones to X, as in the univariate case, giving (47,3)
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);
The implementation is:
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
%m=47
J_history = zeros(num_iters, 1);
n = size(X , 2);
%n=3
for iter = 1:num_iters
H = X * theta;
%(47,3)*(3,1) = (47,1)
T = zeros(n , 1);
%(3,1), accumulates the gradient
for i = 1 : m,
T = T + (H(i) - y(i)) * X(i,:)';
end
theta = theta - (alpha * T) / m;
% Save the cost J in every iteration
J_history(iter) = computeCostMulti(X, y, theta);
end
end
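computeCostMulti() is called above but its listing is not included in these notes; a minimal sketch consistent with the univariate computeCost, vectorized so it works for any number of features:
function J = computeCostMulti(X, y, theta)
m = length(y);
% number of training examples
err = X * theta - y;
% (47,1) vector of prediction errors
J = (err' * err) / (2 * m);
% squared-error cost in vectorized form
end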
2.3.2 Visualizing the Cost Decrease
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');
% Display gradient descent's result
fprintf('Theta computed from gradient descent: \n');
fprintf(' %f \n', theta);
fprintf('\n');
2.4 Prediction
Predict y for $x_1 = 1650$ and $x_2 = 3$. Note that $x_1$ and $x_2$ must be normalized with the training mu and sigma before being plugged into the model.
y = 0;
y = [1, (1650 - mu(1)) / sigma(1), (3 - mu(2)) / sigma(2)] * theta
% normalize each feature with its own training mean and standard deviation before predicting
The predicted y is 308408.290853.
2.5 The Normal Equation Method
data = csvread('ex1data2.txt');
X = data(:, 1:2);
y = data(:, 3);
m = length(y);
% Add intercept term to X
X = [ones(m, 1) X];
% Calculate the parameters from the normal equation
theta = normalEqn(X, y);
The normalEqn() function computes theta via the normal equation; the implementation is:
function [theta] = normalEqn(X, y)
theta = zeros(size(X, 2), 1);
theta = pinv(X' * X) * X' * y;
% pinv(A) computes the pseudoinverse; inv can only invert square matrices, while pinv also handles non-square (and singular) matrices
end
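For reference, the normal equation follows from setting the gradient of $J(\theta)$ to zero (a standard derivation, not spelled out in the original notes):

$$\frac{1}{m}X^\top\left(X\theta - y\right) = 0 \;\Rightarrow\; X^\top X\,\theta = X^\top y \;\Rightarrow\; \theta = \left(X^\top X\right)^{-1} X^\top y$$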
This gives $\theta$ directly, with no feature normalization needed, though a column of ones must still be prepended to $X$. Now check the prediction:
y = 0;
y = [1, 1650, 3] *theta
The predicted y is 293081.464335, close to the gradient-descent result of 308408.290853.