Take-Away Notes for Machine Learning by Stanford University on Coursera.
Week 2, Lectures 4-5: "Linear Regression with Multiple Variables"
How linear regression can be extended to accommodate multiple input features.
Environment Setup Instructions
Omitted.
Multivariate Linear Regression
Multiple Features
Notations:
$x_j^{(i)}$ = value of feature $j$ in the $i^{th}$ training example
$x^{(i)}$ = the input (features) of the $i^{th}$ training example
$m$ = the number of training examples
$n$ = the number of features
Hypothesis:
$ h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^T x $
Gradient Descent
repeat until convergence {
$ \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \, x_j^{(i)} $
} for $j := 0, 1, \dots, n$ (update every $\theta_j$ simultaneously)
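A minimal Octave sketch of one vectorized update step (assuming $X$ is the $m \times (n+1)$ design matrix with a leading column of ones, $y$ is the $m \times 1$ target vector, and the function name is illustrative):

function theta = gradientDescentStep(X, y, theta, alpha)
  % one simultaneous, vectorized update of all theta_j
  m = size(X, 1);                        % number of training examples
  grad = (1/m) * X' * (X * theta - y);   % (n+1) x 1 gradient of J(theta)
  theta = theta - alpha * grad;
end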
Feature Scaling & Mean Normalization speed up gradient descent by having each of our input values in roughly the same range, e.g.
$ x_i := \frac{x_i - \mu_i}{s_i} $
where $\mu_i$ stands for the average of all the values for feature $(i)$ and $s_i$ is either the range (max − min) of those values or their standard deviation.
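A short Octave sketch of mean normalization (assuming $X$ is an $m \times n$ matrix with one column per feature, and using the standard deviation as the scale):

mu = mean(X);                % 1 x n vector of feature means
sigma = std(X);              % 1 x n vector of feature standard deviations
X_norm = (X - repmat(mu, size(X, 1), 1)) ./ repmat(sigma, size(X, 1), 1);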
Learning Rate
Debugging Gradient Descent:
If $J(\theta)$ ever increases, then $\alpha$ should probably be decreased.
Automatic Convergence Test:
Declare convergence if $J(\theta)$ decreases by less than $\epsilon$ in one iteration, where $\epsilon$ is some small value, e.g. $10^{-3}$.
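One way to debug this in Octave is to record the cost on every iteration and plot it against the iteration number; the curve should decrease monotonically when $\alpha$ is small enough. A sketch, assuming X, y, theta, alpha, and num_iters are already defined, and using the costFunctionJ given at the end of these notes:

m = size(X, 1);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters,
  theta = theta - (alpha / m) * X' * (X * theta - y);   % vectorized update
  J_history(iter) = costFunctionJ(X, y, theta);         % record J(theta)
end;
plot(1:num_iters, J_history); xlabel('iteration'); ylabel('J(theta)');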
Features & Polynomial Regression
Combine multiple features into one.
Change the behavior or curve of the hypothesis function by making it quadratic, cubic, square root, or a function of any other form.
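For example, a cubic hypothesis in a single feature can be built by treating powers of that feature as extra columns of the design matrix. A small Octave sketch (x is assumed to be an m x 1 column vector; the variable names are illustrative):

X_poly = [ones(length(x), 1), x, x.^2, x.^3];   % columns: 1, x, x^2, x^3
% Feature scaling matters here, since x.^3 spans a much larger range than x.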
Computing Parameters Analytically
Normal Equation
$ \theta = (X^T X)^{-1} X^T y $
Note: There is no need to do feature scaling with the normal equation.
Gradient Descent | Normal Equation
Need to choose $\alpha$ | No need to choose $\alpha$
Need many iterations | No need to iterate
$O(kn^2)$ | $O(n^3)$, need to calculate the inverse of $X^TX$
Works well when $n$ is large | Slow if $n$ is very large

Where:
$\alpha$ refers to the learning rate;
$n$ refers to the number of attributes of inputs, or features.
Normal Equation Non-Invertibility
In Octave:
pinv(X' * X) * X' * y   % pinv stands for `pseudo-inverse`,
                        % which even works for singular/degenerate matrices
$X^TX$ is non-invertible if:
- Redundant features (linearly dependent)
- Too many features (e.g. $m \le n$)
Solutions to the above problems include deleting a feature that is
linearly dependent with another or deleting one or more features when
there are too many features.
%%% Plotting Data %%%
t = [0:0.01:1.0];
y1 = sin(2*pi*4*t);
y2 = cos(2*pi*4*t);
plot(t, y1);
hold on;                   % retains the current plot so further plot commands overlay it
plot(t, y2, 'r');
xlabel('time');
ylabel('value');
legend('sin', 'cos');      % curve labels
print -dpng 'myplot.png'   % save the figure to a file
close                      % closes the figure window, literally as it is

figure(1); plot(t, y1);    % separate figure windows
figure(2); plot(t, y2);

subplot(1, 2, 1);          % divides the plot into a 1*2 grid, accesses the first element
plot(t, y1);
subplot(1, 2, 2);          % allows 2 plots in a single figure window
plot(t, y2);
%%% Control Statements: for, while, if %%%
v = zeros(10, 1);
for i = 1:10,
  v(i) = 2^i;
end;                      % for loop
indices = 1:10;           % `for i = indices,` does the same

i = 1;
while i <= 5,
  v(i) = 100 + i;
  i = i + 1;
  if i == 5,
    break;                % if-break
  end;
end;                      % while loop

if v(1) == 1,
  disp('v_1 is 1');
elseif v(1) == 2,
  disp('v_1 is 2');
else
  disp('v_1 is unknown');
end;                      % if-else

% Octave search path (advanced/optional)
addpath('C:\Users')       % add the directory/path of self-defined functions
Cost Function J: Least Mean Square
function J = costFunctionJ(X, y, theta)
  % X is the "design matrix" containing our training examples;
  % y is the class labels
  m = size(X, 1);                  % number of training examples
  pred = X * theta;                % predictions of hypothesis on all m examples
  sqrErrors = (pred - y).^2;       % squared errors
  J = 1/(2*m) * sum(sqrErrors);    % mean squared error cost
end
% Vectorized implementation
prediction = theta' * x;           % theta and x are (n+1) x 1 column vectors
...when you vectorize later algorithms that we'll see in this class, there's a good trick, whether in Octave or some other language like C++ or Java, for getting your code to run more efficiently.
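As a small illustration of that trick (a sketch; theta and x are assumed to be (n+1) x 1 column vectors with x(1) = 1 for the intercept term), the same hypothesis computed with a loop and in vectorized form:

% Unvectorized:
prediction = 0.0;
for j = 1:n+1,                                  % n = number of features
  prediction = prediction + theta(j) * x(j);
end;

% Vectorized:
prediction = theta' * x;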