ML by Stanford: Wk4
Ch'i YU Lv3

Take-Away Notes for Machine Learning by Stanford University on Coursera.

Week 4, Lecture 8

Neural Networks: Representation

Motivations

Non-Linear Hypotheses

Some Machine Learning problems include too many features to be well suited to Linear Regression or Logistic Regression.

Take Computer Vision as an example:

A 50*50 pixel image contains 2500 pixels, so the feature dimension will be $n = 2500$ for a gray-scale image and $n = 7500$ for an RGB image with separate red, green and blue channels.

Therefore, it would be computationally very expensive to find and represent all of these features per training example, especially once non-linear combinations of them are included (e.g. all quadratic terms, roughly $n^2/2$, which is about 3 million for $n = 2500$).
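As a quick back-of-the-envelope check of those feature counts (the 50*50 image size is the lecture's example; the snippet itself is just an illustrative sketch):

```python
# Raw pixel feature counts for a 50 x 50 image
n_gray = 50 * 50             # 2500 features for a gray-scale image
n_rgb = 50 * 50 * 3          # 7500 features with separate R, G, B channels

# Number of quadratic terms x_i * x_j (i <= j) for the gray-scale case
n_quadratic = n_gray * (n_gray + 1) // 2
print(n_gray, n_rgb, n_quadratic)   # 2500 7500 3126250
```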

Neurons and the Brain

Neural Networks originated as algorithms that try to mimic the brain.

It's pretty amazing to what extent this works: it is as if you could plug in almost any sensor to the brain, and the brain's learning algorithms would just figure out how to learn from that data and deal with that data!

Neural Networks

Model Representation

The values for each of the "activation" nodes are obtained as follows:

$$a_1^{(2)} = g(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3)$$

$$a_2^{(2)} = g(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3)$$

$$a_3^{(2)} = g(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3)$$

$$h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)})$$

where:

$a_i^{(j)}$ = "activation" of unit $i$ in layer $j$

$\Theta^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j+1$

Thus, we compute the activation nodes by using a matrix of parameters:

- Apply each row of the parameters to the inputs to obtain the value of one activation node;
- Apply the Logistic Function to the sum of the values of the activation nodes, which have been multiplied by yet another parameter matrix containing the weights for our second layer of nodes.

(Needs a concrete implementation, though; see the sketch below.)
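Here is a minimal NumPy sketch of that row-by-row computation, assuming a hypothetical 3-3-1 network (3 inputs, 3 hidden units, 1 output) with randomly initialized weights; the sizes and names are illustrative, not taken from the lecture:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, -1.2, 0.3])   # [x0 = 1 (bias), x1, x2, x3]
Theta1 = rng.standard_normal((3, 4))  # Theta^(1): layer 1 -> layer 2
Theta2 = rng.standard_normal((1, 4))  # Theta^(2): layer 2 -> layer 3 (one output unit)

# Each row of Theta1 produces one activation node of layer 2:
# a_i^(2) = g(Theta_i0 * x0 + Theta_i1 * x1 + Theta_i2 * x2 + Theta_i3 * x3)
a2 = np.array([sigmoid(Theta1[i] @ x) for i in range(Theta1.shape[0])])

# Add the bias unit a_0^(2) = 1, then apply Theta^(2) and g once more:
a2 = np.concatenate(([1.0], a2))
h = sigmoid(Theta2 @ a2)              # h_Theta(x), a single number
print(h)
```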

Dimensions

If a network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ will be of dimension $s_{j+1} \times (s_j + 1)$.

Where the $+1$ comes from the addition in $\Theta^{(j)}$ of the bias nodes, $x_0$ and $\Theta_0^{(j)}$.

In other words, the output nodes do not include the bias node, while the inputs do.
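For instance, a tiny helper (hypothetical, just for checking the rule) that returns the expected shape of $\Theta^{(j)}$:

```python
def theta_shape(s_j, s_j_plus_1):
    """Dimension of Theta^(j): s_{j+1} rows and (s_j + 1) columns."""
    return (s_j_plus_1, s_j + 1)

# e.g. 2 units in layer 1 and 4 units in layer 2:
print(theta_shape(2, 4))   # (4, 3), so Theta^(1) is a 4 x 3 matrix
```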


Forward Propagation: Vectorized Implementation

Define a new variable $z_k^{(j)}$ that encompasses the parameters inside the $g$ function.

Replacing the parameters by the variable $z$, we would get:

$$a_1^{(2)} = g(z_1^{(2)}), \quad a_2^{(2)} = g(z_2^{(2)}), \quad a_3^{(2)} = g(z_3^{(2)})$$

In other words, for layer $j = 2$ and node $k$, the variable $z$ will be:

$$z_k^{(2)} = \Theta_{k,0}^{(1)} x_0 + \Theta_{k,1}^{(1)} x_1 + \cdots + \Theta_{k,n}^{(1)} x_n$$

The vector representations of $x$ and $z^{(j)}$ are:

$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} \qquad z^{(j)} = \begin{bmatrix} z_1^{(j)} \\ z_2^{(j)} \\ \vdots \\ z_n^{(j)} \end{bmatrix}$$

Therefore, setting $x = a^{(1)}$, we can rewrite the equation as:

$$z^{(j)} = \Theta^{(j-1)} a^{(j-1)}$$

Now we can get a vector of the activation nodes for layer $j$:

$$a^{(j)} = g(z^{(j)})$$

where the function $g$ is applied element-wise to the vector $z^{(j)}$.

Then add a bias unit (equal to 1) to layer $j$ after we have computed $a^{(j)}$. This will be element $a_0^{(j)} = 1$.

To compute the final hypothesis, first compute another $z$ vector:

$$z^{(j+1)} = \Theta^{(j)} a^{(j)}$$

The last theta matrix $\Theta^{(j)}$ will have only one row, which is multiplied by the one column $a^{(j)}$, so that our result is a single number.

We then get our final result with:

$$h_\Theta(x) = a^{(j+1)} = g(z^{(j+1)})$$

Notice that in this last step, between layer $j$ and layer $j+1$, we are doing exactly the same thing as we did in logistic regression. Adding all these intermediate layers in neural networks allows us to more elegantly produce interesting and more complex non-linear hypotheses.
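Putting the vectorized steps together, here is a minimal NumPy sketch; the 3-5-1 layer sizes, the random weights, and the `forward_propagate` name are illustrative assumptions rather than anything prescribed by the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """Vectorized forward propagation.

    x      : input feature vector without the bias unit, shape (n,)
    thetas : list of weight matrices [Theta^(1), Theta^(2), ...],
             where Theta^(j) has shape (s_{j+1}, s_j + 1)
    """
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))   # add the bias unit a_0 = 1
        z = theta @ a                    # z^(j+1) = Theta^(j) a^(j)
        a = sigmoid(z)                   # a^(j+1) = g(z^(j+1)), element-wise
    return a                             # h_Theta(x)

# Hypothetical 3-5-1 network:
rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((5, 4))     # layer 1 (3 units) -> layer 2 (5 units)
Theta2 = rng.standard_normal((1, 6))     # layer 2 (5 units) -> layer 3 (1 unit)
print(forward_propagate(np.array([2.0, -1.0, 0.5]), [Theta1, Theta2]))
```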

Applications

Examples and Intuitions

To be continued. See also: Shower Thoughts.

Multi-Class Classification

To define our set of resulting classes as $y$:

$$y \in \left\{ \begin{bmatrix}1\\0\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\\0\end{bmatrix}, \ldots, \begin{bmatrix}0\\0\\0\\1\end{bmatrix} \right\}$$

where each element represents the label for a different input corresponding to the $i$-th class.

The inner layers each provide us with some new information, which leads to the final hypothesis function. For example, the resulting hypothesis for one set of inputs may look like:

$$h_\Theta(x) = \begin{bmatrix}0\\0\\1\\0\end{bmatrix}$$

in which case the resulting class is the third one down, or what $\left(h_\Theta(x)\right)_3$ represents.
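A small sketch of what a one-hot label and a multi-class hypothesis output might look like; the four class names follow the lecture's image-classification example, while the numbers and variable names are made up:

```python
import numpy as np

# One-hot labels for a 4-class problem (pedestrian, car, motorcycle, truck):
classes = ["pedestrian", "car", "motorcycle", "truck"]
y_third_class = np.eye(4)[2]          # [0, 0, 1, 0], the label for the third class

# A hypothesis output from a 4-unit output layer (made-up activations):
h = np.array([0.05, 0.10, 0.92, 0.03])
predicted = int(np.argmax(h))         # index of the largest activation
print(classes[predicted])             # "motorcycle", the third class down
```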