Understanding the Power of Neural Networks

Why Are Neural Networks So Powerful?
Well, that is a difficult question that has haunted us for about 30 years. The more successes we see from the applications of deep learning, the more puzzled we are. Why is it that we can plug this “thing” into pretty much any problem, classification or prediction, and with just a limited amount of tuning, almost always get good results? For curious minds, having such a powerful tool without knowing why it works is very unsettling: what if there is something even more powerful right around the corner?

As one works more and more with neural networks, this sense of unease deepens. Many design parameters, such as the number of hidden layers, the number of nodes, and the type of activation function, often have to be chosen by trial and error. Even more puzzling is the fact that neural networks are often considered hungry for training samples. At the same time, the backpropagation training procedure is perhaps the most mysterious element of neural networks. Without understanding this procedure, it is hard to imagine how to improve its efficiency.

All of these problems essentially point to the same question that we have not answered: what exactly is a neural network doing? Where does its power come from?

Now here is an answer! Ready?
In short, neural networks extract from the data the most relevant part of the information that describes the statistical dependence between the features and the labels. In other words, the size of a neural network specifies a data structure that we can compute and store, and the result of training the network is the best approximation of the statistical relationship between the features and the labels that can be represented by this data structure.

I know you have two questions right away: '''REALLY? WHY?'''

The “why” part is a bit involved. We have a new paper that covers this. Briefly, we first define a metric that quantifies how valuable a piece of partial information is for a specific inference task, and then show that neural networks actually draw the most valuable part of the information from the data. As a bonus, the same argument can also be used to understand and compare other learning algorithms, such as PCA, compressed sensing, etc. So everything ends up in the same picture. Pretty cool, huh? (Be aware, it is a loooong paper.)

On this page, we try to answer the “Really?” question. One way to do that would be a mathematical proof, which is included in the paper. Here, we will only try to demonstrate some of the results numerically, with just a little bit of mathematical and intuitive explanation. At the end of these experiments we will introduce a new metric that can be used to measure how effective a network is. This is particularly useful when we are designing a complex network and having difficulty choosing the network structure and the hyperparameters.

This page has several goals:

 * 1) For beginners with neural networks, we provide links to the commonly used packages and comments in the code. The examples we use are all simple and small. Hopefully this is a good way to help you start programming;
 * 2) For engineers with extensive experience in using neural networks, we use the code as a common language to explain the main ideas of our research, and hope it helps you design your next project more flexibly and more efficiently;
 * 3) For statisticians who are more interested in the theory, this page is only supplementary material, not a replacement, for the mathematical proofs in our paper. We hope that you find the experiments an effective tool for explaining the ideas behind them.

To follow the experiments:
You can just read the code and comments on this page and trust me with the results, or you can run everything yourself. To do that, you will need a standard Python environment, including NumPy, Matplotlib, etc., as well as a standard neural network package. For that, I use Keras, running on TensorFlow. You can follow this link to get them installed. I recommend using Anaconda to install them, which takes, if you do it right, less than 10 minutes. Trust me, it’s a worthwhile effort: these packages are really well made and powerful. Of course, you are also welcome to just sit back, relax, and enjoy the journey, as I will show you everything that you are supposed to see from these experiments.

You need the following lines to initialize your environment.
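Something like the following should work (a minimal sketch; depending on your installation, Keras may instead be imported on its own as `import keras`, while recent installations ship it inside TensorFlow):

```python
# Minimal initialization for the experiments on this page.
# (A sketch: adjust the Keras import to match your installation.)
import numpy as np                 # numerical arrays
import matplotlib.pyplot as plt    # plotting the results

# Keras running on the TensorFlow backend; here we assume the
# Keras bundled with TensorFlow.
from tensorflow import keras
from tensorflow.keras import layers

# Quick sanity check that the packages loaded.
print("NumPy", np.__version__, "- Keras loaded:", keras is not None)
```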

If you receive no error message after these, then congratulations, you have installed your packages correctly.