The input attributes x1, x2, ... are expected to be numeric, and the output is expected to be numeric as well.
Here our goal is to learn the parameters of the underlying model, i.e. the coefficients.
Linear Regression

Notice that a non-linear relationship is equivalent to a linear relationship in a higher-dimensional space. e.g. if we add a derived attribute x2 = x1 * x1, the model captures a quadratic relationship in x1 while remaining linear in its attributes. Because of this, polynomial regression can be done using the linear regression technique.
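As a quick sketch of this trick (the data, coefficients, and NumPy usage here are illustrative additions, not from the original post), adding the squared attribute turns a quadratic fit into an ordinary linear regression:

import numpy as np

# Hypothetical data: one attribute x1 whose relationship to y is quadratic.
x1 = np.linspace(-3.0, 3.0, 50)
y = 2.0 + 1.5 * x1 + 0.8 * x1 * x1

# Add the derived attribute x2 = x1 * x1; the model is still linear in (1, x1, x2).
X = np.column_stack([np.ones_like(x1), x1, x1 * x1])

# Ordinary least squares now recovers the quadratic coefficients.
W, *_ = np.linalg.lstsq(X, y, rcond=None)
print(W)   # -> approximately [2.0, 1.5, 0.8]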
Given a batch of training data, we want to figure out the weight vector W such that the total error (typically the sum of squared differences between the predicted output and the actual output) is minimized.
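A minimal sketch of that objective in code (the function name and NumPy representation are mine, not from the post):

def sum_squared_error(W, X, y):
    # Total squared difference between the predicted outputs X @ W
    # and the actual outputs y (NumPy arrays assumed).
    residual = X @ W - y
    return float(residual @ residual)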

Gradient Descent

In the case of Linear Regression ...
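For the linear case, the gradient of the squared-error objective has a simple closed form, so a batch gradient-descent loop can be sketched as follows (learning rate, iteration count, and function names are illustrative assumptions):

import numpy as np

def gradient_descent_linear(X, y, learning_rate=0.01, steps=2000):
    # Start from all-zero weights and repeatedly step against the gradient of the
    # (averaged) squared error; for the linear model the gradient is 2 X^T (X W - y).
    W = np.zeros(X.shape[1])
    for _ in range(steps):
        gradient = 2.0 * X.T @ (X @ W - y) / len(y)
        W -= learning_rate * gradient
    return W

On the quadratic design matrix from the earlier sketch, this should drift towards the same W as the closed-form least-squares solution, only more slowly.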

Logistic Regression

We use the exact same gradient descent approach to determine the weight vector W.
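A minimal sketch of that, assuming the usual sigmoid output and binary 0/1 labels (the gradient below is the standard log-loss gradient, an assumption rather than something quoted from the post). The descent loop is the same as in the linear sketch; only the prediction changes:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_logistic(X, y, learning_rate=0.1, steps=2000):
    # Same descent loop as the linear case; the predicted output is now sigmoid(X @ W).
    W = np.zeros(X.shape[1])
    for _ in range(steps):
        predictions = sigmoid(X @ W)
        gradient = X.T @ (predictions - y) / len(y)
        W -= learning_rate * gradient
    return W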

Neural Network

Learning in a Neural Network means discovering all the hidden weight values w. In general, we use the same gradient descent technique as above to adjust the weights, layer by layer, starting from the output layer and moving towards the input layer (this technique is called backpropagation). Unlike the output layer, we don't know the error at the hidden layers exactly, so we need a way to estimate it.
But notice there is a symmetry between the weights and the inputs: the same weights that carry a hidden unit's activation forward can carry the output error backwards, so we can reuse the technique we use to adjust the weights to estimate the error at the hidden layer, as in the sketch below.
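Here is a minimal sketch of that idea for a single hidden layer of sigmoid units (layer sizes, learning rate, and initialization are illustrative assumptions): the output-layer error is known directly, and the hidden-layer error is estimated by sending the output error back through the same weights W2 that carried the hidden activations forward.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_hidden_layer(X, y, hidden_units=4, learning_rate=0.5, steps=5000):
    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden_units))   # input  -> hidden weights
    W2 = rng.normal(scale=0.5, size=(hidden_units, 1))            # hidden -> output weights
    y = y.reshape(-1, 1)
    for _ in range(steps):
        # Forward pass through both layers.
        hidden = sigmoid(X @ W1)
        output = sigmoid(hidden @ W2)
        # At the output layer the error is known directly.
        output_delta = (output - y) * output * (1.0 - output)
        # At the hidden layer the error is estimated by sending the output error
        # backwards through W2 -- the same weights used in the forward direction.
        hidden_delta = (output_delta @ W2.T) * hidden * (1.0 - hidden)
        # Adjust the weights layer by layer, starting from the output layer.
        W2 -= learning_rate * hidden.T @ output_delta
        W1 -= learning_rate * X.T @ hidden_delta
    return W1, W2

The line computing hidden_delta is where the weight/input symmetry is used: the output error is distributed back to each hidden unit in proportion to the weight connecting it to the output.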

1 comment:
Hi Mr. Ho, first of all congratulations on your blog. It's so interesting! :)
With respect to Neural Networks, there are different techniques we can use to adjust the weights, and they largely depend on the neural network topology.
In particular there are:
- Wiener-Hopf Method
- Steepest Descent Method (your Gradient Algorithm)
- Least-Mean-Square Algorithm (Stochastic Gradient Algorithm)
From Kolmogorov's theorem we know that every continuous, bounded and monotone function (of n variables) can be represented as a sum of single-variable functions. The problem is that this is an existence theorem: it says that such a set of functions exists, but it does not say how to compute them. For this reason we use a Neural Network.
Bye, and congratulations again.