ML by Stanford: Wk2
Ch'i YU

Take-Away Notes for Machine Learning by Stanford University on Coursera.

Week 2, Lecture 4-5

"Linear Regression with Multiple Variables"

How linear regression can be extended to accommodate multiple input features.

Environment Setup Instructions

Omitted.

Multivariate Linear Regression

Multiple Features

Notations:

$x_j^{(i)}$ = value of feature $j$ in the $i^{th}$ training example

$x^{(i)}$ = the input (features) of the $i^{th}$ training example

$m$ = the number of training examples

$n$ = the number of features

Hypothesis:

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^T x$
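As a sketch of the vectorized hypothesis (Python/NumPy used here for illustration; the data values are made up), prepending $x_0 = 1$ to every example lets all $m$ predictions be computed in a single matrix product:

```python
import numpy as np

# Made-up data: m = 3 examples, n = 2 features.
X = np.array([[2.0, 3.0],
              [1.0, 5.0],
              [4.0, 2.0]])
theta = np.array([1.0, 0.5, 2.0])  # [theta_0, theta_1, theta_2]

# Prepend the bias feature x_0 = 1 so h(x) = theta^T x includes theta_0.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])
h = Xb @ theta  # predictions for all m examples at once
```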

Gradient Descent

repeat until convergence {

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$

} for $j := 0 \ldots n$ (update all $\theta_j$ simultaneously)
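The update rule above can be sketched as follows (an illustrative Python/NumPy version, not the course's official code; the toy data and step size are assumptions):

```python
import numpy as np

def gradient_descent(X, y, alpha, iters):
    """Batch gradient descent for linear regression (illustrative sketch).
    X must already contain the leading column of ones for theta_0."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        # Simultaneous update:
        # theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
        grad = X.T @ (X @ theta - y) / m
        theta = theta - alpha * grad
    return theta

# Toy data generated from y = 1 + 2x (an assumption for the demo).
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
theta = gradient_descent(X, y, alpha=0.1, iters=5000)  # approaches [1, 2]
```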

  1. Feature Scaling & Mean Normalization speed up gradient descent by bringing each of our input values into roughly the same range, e.g. $-1 \le x_i \le 1$:

    $x_i := \dfrac{x_i - \mu_i}{s_i}$

    where $\mu_i$ stands for the average of all the values for feature $i$ and $s_i$ is either the range ($\max - \min$) of feature $i$ or its standard deviation.

  2. Learning Rate $\alpha$

    • Debugging Gradient Descent:
      • If $J(\theta)$ ever increases, then $\alpha$ probably should be decreased.
    • Automatic Convergence Test:
      • Declare convergence if $J(\theta)$ decreases by less than $\epsilon$ in one iteration, where $\epsilon$ is some small value, e.g. $10^{-3}$.
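Mean normalization and feature scaling amount to subtracting each feature's mean and dividing by its spread. A Python/NumPy sketch (the feature values are made up):

```python
import numpy as np

# Made-up feature matrix: 3 examples, 2 features with very different scales.
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])

mu = X.mean(axis=0)    # mu_i: mean of each feature
s = X.std(axis=0)      # s_i: standard deviation (the max-min range also works)
X_norm = (X - mu) / s  # mean-normalized, scaled features
```

After this, every column of `X_norm` has mean 0 and comparable spread, so gradient descent takes similarly sized steps in every direction.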

Features & Polynomial Regression

  • Combine multiple features into one
  • Change the behavior or curve of the hypothesis function by making it quadratic, cubic or square root or functions of any other form.
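Polynomial regression only changes the design matrix: extra columns hold powers of the original feature, and the hypothesis stays linear in the parameters. A Python/NumPy sketch (values illustrative):

```python
import numpy as np

# One made-up input feature.
x = np.array([1.0, 2.0, 3.0, 4.0])

# Quadratic and cubic terms become extra columns of the design matrix;
# ordinary linear regression then applies unchanged.
X_poly = np.column_stack([np.ones_like(x), x, x**2, x**3])
```

Note that because $x^2$ and $x^3$ grow quickly, feature scaling matters even more with polynomial features.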

Computing Parameters Analytically

Normal Equation

$\theta = (X^T X)^{-1} X^T y$

Note: There is no need to do feature scaling with the normal equation.

| Gradient Descent | Normal Equation |
| --- | --- |
| Need to choose $\alpha$ | No need to choose $\alpha$ |
| Need many iterations | No need to iterate |
| $O(kn^2)$ | $O(n^3)$, need to calculate inverse of $X^T X$ |
| Works well when $n$ is large | Slow if $n$ is very large |

Where:

$\alpha$ refers to the learning rate;

$n$ refers to the number of attributes of inputs or features.
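The closed form $\theta = (X^T X)^{-1} X^T y$ can be sketched as follows (Python/NumPy with toy data; `np.linalg.solve` is used instead of forming the inverse explicitly, which is the numerically safer route):

```python
import numpy as np

# Toy data generated from y = 1 + 2x (an assumption for the demo).
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

# Solve (X^T X) theta = X^T y, i.e. theta = (X^T X)^{-1} X^T y.
theta = np.linalg.solve(X.T @ X, X.T @ y)
```

No learning rate and no iteration: one linear solve recovers $\theta$ exactly on this noiseless data.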


Normal Equation Non-Invertibility

For $\theta = (X^T X)^{-1} X^T y$:

pinv(X' * X) * X' * y % pinv stands for `pseudo-inverse`,
                      % which even works for singular/degenerate matrices

$X^T X$ is non-invertible if:

  • Redundant features (linearly dependent)
  • Too many features (e.g. $m \le n$)

Solutions to the above problems include deleting a feature that is linearly dependent with another or deleting one or more features when there are too many features.
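To see the non-invertible case concretely, here is an illustrative Python/NumPy sketch in which a redundant (linearly dependent) feature makes $X^T X$ singular, yet the pseudo-inverse still produces a valid least-squares fit:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
# Redundant feature: the third column is exactly twice the second,
# so the columns of X are linearly dependent and X^T X is singular.
X = np.column_stack([np.ones_like(x), x, 2.0 * x])
y = 1.0 + 2.0 * x

XtX = X.T @ X
rank = np.linalg.matrix_rank(XtX)  # 2 rather than 3: non-invertible

# The pseudo-inverse still returns a valid least-squares solution.
theta = np.linalg.pinv(XtX) @ X.T @ y
```

This mirrors why Octave's `pinv` is preferred over `inv` in the one-liner above: it degrades gracefully when $X^T X$ is singular.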

Octave / Matlab Tutorial

Basic Operations:

%%% Basic Operations %%%
% Add / Sub / Mult / Div / Power omitted.

1 == 2 % False => 0
1 ~= 2 % True => 1
1 && 0 % And
1 || 0 % Or

PS1('>> '); % change the Octave prompt

disp(pi); % 4 decimal places by default (short format)

disp(sprintf('2 decimals: %0.2f', pi))

disp(sprintf('6 decimals: %0.6f', pi))

format long % show ~16 significant digits
format short % back to 4 decimal places (default)

A = [1, 2; 3, 4; 5, 6] % assign a Matrix

v = [1; 2; 3] % assign a Vector

v1 = 1:0.1:2 % vector of 1.0, 1.1, ..., 2.0 in short format

v2 = 1:6 % vector of 1, 2, 3, 4, 5, 6

rand(3, 3) % 3 by 3 matrix of uniform random values
randn(1, 3) % 1 by 3 matrix of Gaussian (normal) random values

eye(3) % 3 by 3 Identity Matrix

help eye % return help info of func eye
%%% Moving Data Around %%%
[m, n] = size(A)

length(A) % longest Dimension of a Matrix

pwd % current directory
cd % move to ...
ls % display ...

load somefile.dat

who % display variables in the current scope
whos % display variables in the current scope with details

clear % clear all variables in the current scope

v = priceY(1:10) % first 10 elements of vector priceY

save data.mat v % save variable v in binary format
save data.txt v -ascii % save as human-readable text

A_32 = A(3, 2) % 3rd row, 2nd column

A([1, 3], :) % get all elements of A whose 1st index is 1 or 3

A(:, 2) % get all elements of A whose 2nd index is 2

A = [A, [1; 10; 1]] % append another column vector on the right

A(:, 2) = [10; 11; 12] % assign this vector to the 2nd column of A

A(:) % put all elements of A into a single vector

C = [A, B] % concatenate 2 matrices with A on the left, B on the right

D = [A; B] % concatenate 2 matrices with A on top, B on the bottom

Computing on Data:

%%% Computing on Data %%%

A = [1, 2; 3, 4; 5, 6]
B = [11, 12; 13, 14; 15, 16]
C = [1, 2; 2, 2]

% Note: in octave `.` usually represents element-wise operations

A .* B % element-wise mult
A * C % matrix mult

v = [1; 2; 4]

1./ v % element-wise div

v + ones(length(v), 1) % element-wise + 1 for vector v

A' % matrix transpose

[val, ind] = max(v) % val = 4, ind = 3

max(A) % returns column-wise maximum

magic(4) % generates a 4 by 4 magic square

[r, c] = find(A >= 7) % return r as row-index and c as column index
% pointing to target elements

sum(v) % sum of all elements
prod(v) % product of all elements
floor(v) % round down element-wise
ceil(v) % round up element-wise

max(A, [], 1) % per-column maximum; e.g. for A = magic(3), returns [8, 9, 7]
max(A, [], 2) % per-row maximum; e.g. for A = magic(3), returns [8; 7; 9]

sum(A, 1) % column-wise sum
sum(A, 2) % row-wise sum
sum(sum(A)) % sum all elements

pinv(A) % pseudo-inverse of a matrix

Plotting Data:

%%% Plotting Data %%%
t = [0:0.01:1.0]
y1 = sin(2*pi*4*t)
y2 = cos(2*pi*4*t)

plot(t, y1)
hold on % keep the current plot so further plots are overlaid
plot(t, y2, 'r')
xlabel('time')
ylabel('value')
legend('sin', 'cos') % curve labels
print -dpng 'myplot.png' % save files
close % literally as it is

figure(1); plot(t, y1)
figure(2); plot(t, y2)

subplot(1, 2, 1); % divides the figure into a 1x2 grid, access the 1st element
plot(t, y1)
subplot(1, 2, 2) % allows 2 plots in a single figure window
plot(t, y2)

axis([0.5, 1, -1, 1]) % set x range [0.5, 1] and y range [-1, 1]

clf % clear figure

imagesc(A), colorbar, colormap gray; % heat map with a color bar, in grayscale

Control Statements: for, while, if Statement

%%% Control Statements: for, while, if Statement %%%

v = zeros(10, 1);

for i = 1:10,
  v(i) = 2^i;
end; % for loop

indices = 1:10; % `for i = indices,` does the same

i = 1;
while i <= 5,
  v(i) = 100 + i;
  i = i + 1;
  if i == 5,
    break; % if-break
  end;
end; % while loop

if v(1) == 1,
  disp('v_1 is 1');
elseif v(1) == 2,
  disp('v_1 is 2');
else
  disp('v_1 is unknown'); % if-elseif-else
end;

% Octave search path(advanced/optional)
addpath('C:\Users') % add the directory/path of self-defined functions

Cost Function J: Least Mean Square

function J = costFunctionJ(X, y, theta)
% X is the "design matrix" containing our training examples;
% y is the class labels

m = size(X, 1); % number of training examples
pred = X * theta; % predictions of hypothesis on all m examples
sqrErrors = (pred - y).^2; % squared errors

J = 1 / (2 * m) * sum(sqrErrors);
end

Vectorization:

%%% Vectorization %%%

%Unvectorized Implementation%
prediction = 0.0;
for j = 1:n+1,
  prediction = prediction + theta(j) * x(j);
end;

%Vectorized Implementation%
prediction = theta' * x

...when you vectorize later algorithms that we'll see in this class, there's a good trick, whether in Octave or some other language like C++ or Java, for getting your code to run more efficiently.
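The same loop-versus-vectorized contrast, sketched in Python/NumPy (parameter and feature values are made up):

```python
import numpy as np

n = 4
theta = np.array([0.5, 1.0, -2.0, 3.0, 0.1])  # n+1 made-up parameters
x = np.array([1.0, 2.0, 0.5, -1.0, 4.0])      # x_0 = 1 already included

# Unvectorized: explicit loop over j = 0..n
prediction_loop = 0.0
for j in range(n + 1):
    prediction_loop += theta[j] * x[j]

# Vectorized: a single inner product gives the same number,
# and lets the library use optimized numerical routines.
prediction_vec = theta @ x
```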