Pragmatic Programming Techniques: Machine Learning with Linear Model

Sunday, November 8, 2009

Machine Learning with Linear Model

Linear models are a family of model-based learning approaches that assume the output y can be expressed as a linear combination of the input attributes x1, x2, ...

The input attributes x1, x2, ... are expected to be numeric, and the output is expected to be numeric as well.

Here our goal is to learn the parameters of the underlying model, which are the coefficients.

Linear Regression

Here the input and output are both numeric, related through a simple linear relationship, for example y = w0 + w1 * x1 + w2 * x2. The learning goal is to figure out the hidden weight values (i.e., the W vector).

Notice that a non-linear relationship can be expressed as a linear relationship in a higher dimension. For example, if we add a derived attribute x2 = x1 * x1, a model that is linear in x1 and x2 is quadratic in x1. Because of this, polynomial regression can be done using the linear regression technique.
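As a small illustration of this idea, the sketch below expands a single input into powers of itself and then applies an ordinary linear model to the expanded attributes. The function names and the weights are hypothetical, chosen only for the example.

```python
def polynomial_features(x, degree):
    """Expand a single numeric input x into [1, x, x^2, ..., x^degree]."""
    return [x ** d for d in range(degree + 1)]


def predict(weights, features):
    """Linear model: the dot product of the weights and the features."""
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical quadratic relationship: y = 2 + 3*x + 0.5*x^2
weights = [2.0, 3.0, 0.5]
x = 4.0
features = polynomial_features(x, degree=2)  # [1.0, 4.0, 16.0]
y = predict(weights, features)               # 2 + 12 + 8 = 22.0
print(features, y)
```

The model stays linear in the weights, which is what lets the linear regression machinery apply unchanged.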

Given a batch of training data, we want to find the weight vector W that minimizes the total error, measured as the sum of the squared differences between the predicted output and the actual output.
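For a single input attribute, the batch solution can be written in closed form. The sketch below assumes the model y = w0 + w1 * x and uses the standard least-squares formulas; the function names and the sample data are made up for illustration.

```python
def fit_least_squares(xs, ys):
    """Return (w0, w1) minimizing the sum of squared errors for y = w0 + w1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1


def sum_squared_error(w0, w1, xs, ys):
    """Total squared difference between predicted and actual outputs."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical training batch generated from y = 1 + 2*x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w0, w1 = fit_least_squares(xs, ys)
print(w0, w1, sum_squared_error(w0, w1, xs, ys))
```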

Instead of processing the whole batch at once, an approach that often scales better is to learn incrementally, updating the weight vector for each input data point using gradient descent.

Gradient Descent

Gradient descent is a very general technique that we can use to incrementally adjust the parameters of the linear model. The basic idea of "gradient descent" is to adjust each dimension (w0, w1, w2) of the W vector according to its contribution to the squared error. That contribution is measured by the gradient along the dimension, which is the partial derivative of the squared error with respect to w0, w1, w2. Each weight is then moved a small step in the direction opposite to its gradient.
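In symbols, the general rule can be sketched as follows. The names here are assumptions for illustration: E is the squared error for one training example, \hat{y} is the predicted output, and \eta is a small positive step size (the learning rate).

```latex
E = \frac{1}{2}\,(y - \hat{y})^2,
\qquad
w_i \leftarrow w_i - \eta\,\frac{\partial E}{\partial w_i},
\quad i = 0, 1, 2
```

The factor of 1/2 is only a convenience that cancels when differentiating; it does not change where the minimum is.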

In the case of Linear Regression ...
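A minimal sketch of the per-example update for linear regression, assuming the squared error 0.5 * (y - prediction)^2, whose derivative with respect to w_i is -(y - prediction) * x_i. The function names, the learning rate, the number of epochs and the data are all hypothetical.

```python
def predict(weights, x):
    """Linear prediction; x[0] is fixed at 1.0 so weights[0] acts as w0."""
    return sum(w * xi for w, xi in zip(weights, x))


def sgd_linear_regression(data, learning_rate=0.05, epochs=2000):
    """Update the weights after every training example."""
    weights = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            error = y - predict(weights, x)
            # Step against the gradient: w_i <- w_i + rate * error * x_i
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
    return weights


# Hypothetical data generated from y = 1 + 2*x1 (x[0] is the constant 1.0)
data = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0),
        ([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)]
weights = sgd_linear_regression(data)
print(weights)  # should be close to [1.0, 2.0]
```

The learning rate has to be small enough for the updates to settle down; too large a value makes the weights oscillate or diverge.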

Logistic Regression

Logistic regression is used when the output y is binary rather than a real number. The first step is the same as in linear regression; in a second step, a sigmoid function is applied to squash the output value into the range between 0 and 1.
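The two steps can be sketched as below. The helper names are hypothetical, and the sigmoid is written in a form that avoids overflow for large negative inputs.

```python
import math


def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe: z is negative, so exp(z) cannot overflow
    return e / (1.0 + e)


def predict_probability(weights, x):
    """Step 1: linear combination. Step 2: squash it with the sigmoid."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


print(sigmoid(0.0))                                  # 0.5
print(predict_probability([0.0, 1.0], [1.0, 3.0]))   # sigmoid(3.0)
```

The result is usually read as the probability that y is 1, and thresholded at 0.5 to get a binary answer.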

We use the exact same gradient descent approach to determine the weight vector W.
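Here is a minimal sketch of that training loop, assuming the same squared error as before. Because of the sigmoid, the derivative picks up an extra factor p * (1 - p), where p is the predicted value. The names, learning rate and data are hypothetical; many implementations minimize log loss instead, which drops that extra factor.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(weights, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(data, learning_rate=0.5, epochs=5000):
    """Per-example gradient descent on the squared error 0.5 * (y - p)^2."""
    weights = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            p = predict_probability(weights, x)
            # d/dw_i of 0.5*(y - p)^2 is -(y - p) * p * (1 - p) * x_i
            delta = (y - p) * p * (1.0 - p)
            weights = [w + learning_rate * delta * xi
                       for w, xi in zip(weights, x)]
    return weights


# Hypothetical data: the label is 1 for the larger x1 values (x[0] is 1.0)
data = [([1.0, 0.0], 0), ([1.0, 1.0], 0),
        ([1.0, 3.0], 1), ([1.0, 4.0], 1)]
weights = train_logistic(data)
for x, y in data:
    print(x[1], y, round(predict_probability(weights, x), 3))
```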

Neural Network

Inspired by how our brain works, a neural network organizes many logistic regression units into layers of perceptrons (the inputs and outputs of each unit are in binary form, or values between 0 and 1 that approximate it).

Learning in a neural network means discovering all the hidden values of w. In general, we use the same technique as above to adjust the weights using gradient descent, layer by layer. We start from the output layer and move towards the input layer (this technique is called backpropagation). At the output layer we know the error exactly, but at the hidden layers we do not, so we need a way to estimate the error there.

Notice, however, that there is a symmetry between the weights and the inputs: we can use the same technique we use to adjust the weights to estimate the error at the hidden layer.
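To make this concrete, here is a minimal sketch of one backpropagation step for a tiny network with one hidden layer of sigmoid units and a single sigmoid output, again assuming the squared error. The layout, names, starting weights and learning rate are all hypothetical. The error estimate for a hidden unit is the output error sent back through the weight that connects that unit to the output.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """x[0] is the constant 1.0. Returns (hidden values with bias, output)."""
    hidden = [1.0] + [sigmoid(sum(w * xi for w, xi in zip(row, x)))
                      for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def backprop_step(x, y, w_hidden, w_out, learning_rate=0.5):
    """One gradient descent update, output layer first, then hidden layer."""
    hidden, output = forward(x, w_hidden, w_out)
    # Output layer: the error (y - output) is known directly.
    delta_out = (y - output) * output * (1.0 - output)
    # Hidden layer: estimate each unit's error by passing delta_out back
    # through the (old) weight linking that unit to the output.
    delta_hidden = [w_out[j + 1] * delta_out * hidden[j + 1] * (1.0 - hidden[j + 1])
                    for j in range(len(w_hidden))]
    new_w_out = [w + learning_rate * delta_out * h
                 for w, h in zip(w_out, hidden)]
    new_w_hidden = [[w + learning_rate * delta_hidden[j] * xi
                     for w, xi in zip(row, x)]
                    for j, row in enumerate(w_hidden)]
    return new_w_hidden, new_w_out


# Hypothetical starting weights: 2 inputs (plus bias), 2 hidden units
w_hidden = [[0.1, 0.2, -0.3], [-0.2, 0.4, 0.1]]
w_out = [0.05, 0.3, -0.4]
x, y = [1.0, 0.0, 1.0], 1.0

_, before = forward(x, w_hidden, w_out)
new_w_hidden, new_w_out = backprop_step(x, y, w_hidden, w_out)
_, after = forward(x, new_w_hidden, new_w_out)
print(before, after)  # the output should move towards the target y
```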

1 comment:

Anonymous said...: Hi Mr. Ho, first of all congratulations on your blog. It's so interesting! :)

With respect to neural networks, there are different techniques we can use to adjust the weights, and they depend substantially on the neural network topology.

In particular there are:
- Wiener-Hopf Method
- Steepest Descent Method (your Gradient Algorithm)
- Least-Mean-Square Algorithm (Stochastic Gradient Algorithm)

From Kolmogorov's theorem we know that every continuous, bounded and monotone function of n variables can be represented as a sum of many single-variable functions. The problem is that this is an existence theorem: it says that such a set of functions exists but does not say how we can calculate them. For this reason we use a neural network.

Bye and congratulations again.; December 2, 2009 at 12:39 AM