Take-Away Notes for Machine Learning by Stanford University on Coursera.
Week 1, Lecture 1-3
"Introduction"
Introduces the core idea of teaching a computer to learn concepts from data without being explicitly programmed.
Definition of Machine Learning
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Supervised & Unsupervised Learning
- Supervised Learning
  - Trained with labeled data
  - Classification (discrete), Regression (continuous)
    - Multi-Class Classification
    - Decision Tree
    - Bayesian Logic
    - Linear Regression
    - Logistic Regression
    - Support Vector Machine
- Unsupervised Learning
  - Trained with unlabeled data
  - Clustering
    - KNN
    - Apriori Algorithm
"Linear Regression with One Variable"
Linear regression predicts a real-valued output based on an input value.
Model & Cost Function
Model Representation
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Model Notations:
- $x^{(i)}$ - input variable / features
- $y^{(i)}$ - output / target variable
- $m$ - number of training examples
- e.g. $(x^{(i)}, y^{(i)})$ denotes the $i$-th training example
Cost Function: Least Mean Square / Mean Squared Error
Purpose: Choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for the training examples $(x, y)$.
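A minimal Octave sketch of this cost function, using a toy training set made up for illustration (the variable names x, y, theta are my own, not prescribed by the lecture):
% Squared-error cost J(theta_0, theta_1) on a toy training set
x = [1; 2; 3];                 % input features x^(i)
y = [2; 4; 6];                 % target values y^(i)
theta = [0; 1.5];              % candidate parameters [theta_0; theta_1]
m = length(y);                 % number of training examples
h = theta(1) + theta(2) .* x;  % hypothesis h_theta(x) for every example
J = (1 / (2 * m)) * sum((h - y) .^ 2)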
Parameter Learning
Gradient Descent Algorithm
Mathematical Algorithm: Repeat Simultaneous Update Algorithm, i.e.
repeat until convergence {
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ (simultaneously for $j = 0$ and $j = 1$)
}
where $\alpha$ is the learning rate. For linear regression the partial derivatives work out to
$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$ if $j = 0$
$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$ if $j = 1$
The intuition behind the convergence is that as $\theta_j$ approaches a local minimum, the derivative term shrinks, so gradient descent automatically takes smaller steps even with a fixed learning rate $\alpha$.
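A short Octave sketch of batch gradient descent for this model, reusing the toy data above; the learning rate and iteration count are arbitrary choices of mine, not values from the lecture:
% Batch gradient descent with simultaneous update of theta_0 and theta_1
x = [1; 2; 3]; y = [2; 4; 6];    % toy training set
theta = [0; 0];                  % initial parameters [theta_0; theta_1]
alpha = 0.1;                     % learning rate
m = length(y);
for iter = 1:1000
  h = theta(1) + theta(2) .* x;            % current predictions
  grad0 = (1 / m) * sum(h - y);            % partial derivative w.r.t. theta_0
  grad1 = (1 / m) * sum((h - y) .* x);     % partial derivative w.r.t. theta_1
  theta = theta - alpha * [grad0; grad1];  % simultaneous update
end
theta   % approaches [0; 2], the exact fit for this toy data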
"Linear Algebra Review"
A basic understanding of linear algebra is necessary for the rest of the course, especially as we begin to cover models with multiple variables.
Matrices & Vectors
Notations:
- $A_{ij}$ refers to the element in the $i$-th row and $j$-th column of matrix $A$.
- A vector with $n$ rows is referred to as an $n$-dimensional vector.
- Octave/Matlab matrices and vectors are 1-indexed. Note that in some programming languages (e.g. Python), arrays are 0-indexed.
- Matrices are usually denoted by uppercase names while vectors are lowercase.
- "Scalar" means that an object is a single value, not a vector or matrix.
Coding Kit:
% Initializing Matrix
A = [1, 2, 3; 4, 5, 6; 7, 8, 9; 10, 11, 12]
% Initializing Vector
v = [1; 2; 3; 4; 5; 6; 7]
% Size of Matrix
[m, n] = size(A)
% Size of Vector
l = size(v)
% Indexed Term
A_23 = A(2, 3)
Addition & Scalar Multiplication
Element-wise operations. To add or subtract two matrices, their dimensions must be the same.
Coding Kit:
% Initialize Matrix A and B
A = [1, 2, 4; 5, 3, 2]
B = [1, 3, 4; 1, 1, 1]
% Initialize Scalar s
s = 2
% Element-wise Addition
add_AB = A + B
% Element-wise Subtraction
sub_AB = A - B
% Scalar Multiplication
mult_As = A * s
% Scalar Division
div_As = A / s
% What happens if we have a Matrix + scalar?
add_As = A + s % Element-wise addition, e.g. A_ij = A_ij + s
Matrix-Matrix (Vector) Multiplication
The general rule is that an m x n matrix (or vector) multiplied by an n x p matrix (or vector) results in an m x p matrix (or vector).
Matrices are NOT commutative: in general $A \times B \neq B \times A$.
Matrices are associative: $(A \times B) \times C = A \times (B \times C)$.
The identity matrix $I$ satisfies $A \cdot I = I \cdot A = A$.
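A quick Octave check of these properties, using small matrices chosen only for illustration:
% Matrix multiplication: not commutative, but associative; eye(2) is the identity
A = [1, 2; 3, 4];
B = [0, 1; 1, 0];
C = [2, 0; 0, 2];
A * B              % differs from B * A in general
B * A
(A * B) * C        % equals A * (B * C)
A * (B * C)
A * eye(2)         % multiplying by the identity returns A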
See "Essence of Linear Algebra" for more tricks of fast calculation of matrix multiplication.
Coding Kit:
% Initialize Matrix A
A = [1, 2, 3; 4, 5, 6; 7, 8, 9]
% Initialize Vector v
v = [1; 1; 1]
% Initialize 3 by 3 Identity Matrix
I = eye(3)
% Multiply A * v
Av = A * v
% Multiply A * A
AA = A * A
% Multiply A * I
AI = A * I
Inverse & Transpose
The inverse of a matrix $A$ is denoted as $A^{-1}$, and it satisfies $A A^{-1} = A^{-1} A = I$.
The transposition of a matrix $A$ is denoted as $A^T$, where $(A^T)_{ij} = A_{ji}$.
Note: An invertible (also non-singular or non-degenerate) matrix is one for which the matrix inverse exists; equivalently, its determinant is non-zero.
Coding Kit:
% Initialize Matrix A
A = [1, 2, 0; 0, 5, 6; 7, 0, 9]
% Transpose A
A_trans = A'
% Take the inverse of A
A_inv = inv(A)
% What is A^(-1)*A?
A_invA = inv(A)*A % Identity Matrix
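As a follow-up to the invertibility note above, a small sketch (with a matrix I chose to be singular) showing why the determinant matters; pinv computes the Moore-Penrose pseudoinverse, which exists even when inv does not:
% A singular matrix: the second row is a multiple of the first
S = [1, 2; 2, 4];
det(S)      % 0, so S is non-invertible
% inv(S)    % would warn that the matrix is singular to working precision
pinv(S)     % the pseudoinverse still exists and is often used instead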