Hi Mr. Ho, first of all, congratulations on your blog. It's so interesting! :)

When it comes to neural networks, there are different techniques we can use to adjust the weights, and which one applies depends substantially on the network's topology.

In particular, there are:
 - Wiener-Hopf Method
 - Steepest Descent Method (your Gradient Algorithm)
 - Least-Mean-Square Algorithm (Stochastic Gradient Algorithm)
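To make the difference between the three methods concrete, here is a minimal sketch. It assumes the simplest topology, a single linear neuron y = w·x trained to minimize the mean-square error against a desired output d. Wiener-Hopf solves R w = p directly (R = E[x xᵀ], p = E[d x]). Steepest descent iterates towards that solution using the exact gradient. LMS replaces the exact gradient with the instantaneous estimate e(n)·x(n). The function names, step sizes (mu) and the synthetic data are illustrative assumptions, not anything from your blog.

```python
import random

def estimate_stats(xs, ds):
    """Sample estimates of R = E[x x^T] and p = E[d x]."""
    n, N = len(xs[0]), len(xs)
    R = [[sum(x[i] * x[j] for x in xs) / N for j in range(n)] for i in range(n)]
    p = [sum(d * x[i] for x, d in zip(xs, ds)) / N for i in range(n)]
    return R, p

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def wiener_hopf(R, p):
    # Optimal weights in closed form: solve the Wiener-Hopf equations R w = p.
    return solve(R, p)

def steepest_descent(R, p, mu=0.1, steps=500):
    # w <- w + mu * (p - R w): exact MSE gradient (factor 2 absorbed into mu).
    n = len(p)
    w = [0.0] * n
    for _ in range(steps):
        grad = [p[i] - sum(R[i][j] * w[j] for j in range(n)) for i in range(n)]
        w = [w[i] + mu * grad[i] for i in range(n)]
    return w

def lms(xs, ds, mu=0.01, epochs=20):
    # w <- w + mu * e(n) * x(n): stochastic (instantaneous) gradient estimate.
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, d in zip(xs, ds):
            e = d - sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + mu * e * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical data: d = 2*x1 - 1*x2 + small noise.
random.seed(0)
true_w = [2.0, -1.0]
xs = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(1000)]
ds = [sum(t * xi for t, xi in zip(true_w, x)) + random.gauss(0, 0.1) for x in xs]

R, p = estimate_stats(xs, ds)
print("Wiener-Hopf:     ", wiener_hopf(R, p))
print("Steepest descent:", steepest_descent(R, p))
print("LMS:             ", lms(xs, ds))
```

All three should land close to [2, -1]. The trade-off is that Wiener-Hopf and steepest descent need the statistics R and p up front, while LMS only needs one sample at a time, which is why it is the one normally used for online training.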

From Kolmogorov's theorem we know that every continuous function of n variables on a bounded domain can be represented as a finite sum of compositions of continuous single-variable functions (the inner functions can even be chosen monotone). The problem is that this is an existence theorem: it says that such a set of functions exists but not how to compute them. For this reason we use a neural network to approximate them.
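For reference, the Kolmogorov (Kolmogorov-Arnold) representation theorem is usually stated for a continuous f on the unit cube [0,1]^n as follows, where Φ_q are continuous "outer" functions and φ_{q,p} are continuous "inner" functions of one variable:

```latex
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right)
```

The theorem guarantees that suitable Φ_q and φ_{q,p} exist but gives no practical way to construct them, which is the gap a trained network fills approximately.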

Bye, and congratulations again.