Saturday, July 1, 2017

Deep Learning Demystified


Computers capable of thinking for themselves have been a topic of discussion for a long time. It was easy to get carried away by the hype and hyperbole, so much so that many of the older generation brushed these advancements aside as if they were scenes from a science fiction movie. Not any more: there is enough evidence to show that some of that fiction may become fact sooner rather than later. Today you can deposit a check at a machine: it reads the amount on the check and credits exactly that amount to your account. Amazon can tell you the items you are most likely to need based on what you have already added to your shopping cart. Your email system can recognize which messages may be important to you and categorize them based on their content. Netflix can help you choose your next movie based on the movies you have already seen and the ratings you gave them. There are systems today that can predict the legal outcome of a case if you provide the required evidence. Medications can be tailored to an individual's genome. Black and white photos for which no color version exists can be converted to color: the system learns from black and white photos that do have color counterparts and applies that learning to photos that exist only in black and white. And we have just scratched the surface of the possibilities; there is more to come.

Although computers have amazed humans with their ability to crunch numbers fast, they always lacked human-like thinking; a computer was nothing more than a machine that always needed some kind of assistance. But in the last decade a lot of progress has been made in the field of Artificial Intelligence (AI), especially in the sub-areas of machine learning and deep learning. This is the technology behind all the experiences described in the opening paragraph. Machine learning involves feeding a system data about a subject and asking yes-or-no questions about that data; the answers are treated as learning about the object the data describes. As time goes by, the system learns to ask the right kind of questions more accurately. Given a set of questions about an unknown object, the system can tell you a lot about the nature of that object from the answers you give. Welcome to machine learning. Now, if the machine can get something wrong and you can correct it so that it reaches the right conclusion the next time something similar happens, that is deep learning. Engineers, especially in the technology domain, are already spending their time designing systems that ask the right kind of questions and tackling difficult problems that traditional computers are not capable of solving.

As this article is intended to give a layman an idea of how all this works under the hood, I want to start with a simple example. We will take a simple challenge: given an image containing numbers, your computer needs to make a best guess about which digit is in the image. If you solve this problem and can scale it up to much larger real-life problems, you can identify what is written on a signboard, and even tell what language it is in and what it means in a given context. If you have an autonomous car, your car should be able to understand the signboards, shouldn't it?

Take the example of the picture below. This is a pixel-by-pixel depiction of how the digits in an image really appear as we zoom in. In the picture below we see the number 32 represented in pixels.






Note that there are 64 (8x8) pixels in each smaller square, and a total of 6 such squares form the wire mesh above. Now, how can we detect the digits in these pixels programmatically? One way of doing this is with filters. The idea is to slide 8x8 filters (something like those shown below) over the six boxes above. For example, here is a filter which can be used to detect the digit one (1):



Similarly, for the digit three (3) you can have something like the filter below:


So you slide these filters over the six boxes in the wire mesh and count the number of black boxes that agree as the 8x8 filter moves along. If you get 64 dark boxes, the digit in the image is completely covered by the filter and the full 8x8 box appears dark: the number in the filter matches the number present in the larger grid. Each digit is drawn in such a way that no filter produces exactly 64 black squares unless it matches the digit in the image, so this tells you exactly which number is involved. This is essentially the way our brain detects objects too, but it does so very efficiently and very fast. Here we have ten filters, one per digit; in the case of our eyes, millions of filters are applied to what we see, and in a fraction of a second. Finally, some analysis of the results determines what we are looking at.
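To make the idea concrete, here is a minimal sketch in Python (using NumPy). The filter pattern is made up purely for illustration; it compares an 8x8 filter against an 8x8 patch of pixels and counts how many of the 64 positions agree.

```python
import numpy as np

# A made-up 8x8 binary "filter" for the digit 1: a vertical bar of dark (1) pixels.
filter_one = np.zeros((8, 8), dtype=int)
filter_one[:, 4] = 1

def match_count(patch, digit_filter):
    """Count how many of the 64 pixels agree between the patch and the filter."""
    return int(np.sum(patch == digit_filter))

# An 8x8 patch cut out of the larger image; here it is simply a copy of the
# filter, so every one of the 64 pixels agrees.
patch = filter_one.copy()
print(match_count(patch, filter_one))   # 64 -> the patch shows a 1
```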

Convolutional Neural Networks (CNN)

Machine learning is done by mimicking how the cells (neurons) in the human brain behave. The kind of problem discussed above is very much analogous to the way our brain detects objects through our eyes. The above problem is a very simple example of a convolutional neural network (CNN); in reality the problem space is much more complex than this. But the model we used can be scaled up to solve real-world problems, like those faced by an autonomous car that has to understand its surroundings in a fraction of a second so that it can take a reasonable action. For real-life problems we need powerful computers to do all these calculations; depending on the complexity of the problem, it can take data centers running for weeks or months. Often filters like the ones above need to be applied to the data repeatedly, in passes called epochs, to solve a problem.

Sliding the square over the bigger wire frame is the convolution, and the size of each step the filter takes is called the stride. The filter in this case takes a stride of 8 pixels (the size of each square) as it proceeds with its scan. In each stride it counts the number of black squares it sees for that square. If the count is 64, that square shows the number the filter represents; otherwise, the number that filter represents can be omitted from consideration. This is a convolutional neural network, as simple as it can get.
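Here is a rough sketch of that scan, again with NumPy and made-up filters (the digit shapes are arbitrary, not real handwriting templates): the mesh is six 8x8 squares laid side by side, the filter steps across it with a stride of 8, and a count of 64 matching pixels flags a match.

```python
import numpy as np

# Two made-up 8x8 filters, purely for illustration.
filters = {
    "1": np.zeros((8, 8), dtype=int),
    "7": np.zeros((8, 8), dtype=int),
}
filters["1"][:, 4] = 1          # a vertical bar
filters["7"][0, :] = 1          # a top bar ...
filters["7"][:, 6] = 1          # ... and a right-hand stroke

# The wire mesh: six 8x8 squares side by side, i.e. an 8x48 grid of pixels.
mesh = np.zeros((8, 48), dtype=int)
mesh[:, 12] = 1                 # the second square contains a "1"

stride = 8                      # the filter jumps one full square at a time
for col in range(0, mesh.shape[1], stride):
    patch = mesh[:, col:col + stride]
    for digit, f in filters.items():
        if np.sum(patch == f) == 64:          # all 64 pixels agree
            print(f"square {col // 8}: looks like a {digit}")
```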

Now let's analyse the way the neurons in our brain work: let's look at the structure of a neuron and try to map the problem we solved above onto the way a neuron would handle it. Along the way we will relate it to some machine learning jargon too.



In the picture above you can see neurons. One neuron is connected to the next through a junction called a synapse. The signal coming down the axon of a neuron passes through the synapse at the end of the axon, which transmits the signal to a dendrite of the next neuron. Thus neurons form a network. Below is a diagram of connected neurons.




As you can see, the information coming in through several dendrites is consolidated, some logic is applied (based on the neuron's previous learning), and the result is transmitted to the neighbouring neuron. The step where we applied a filter to the wire frame is analogous to the function of this neuron: it gets some data, transforms it into more meaningful content, and passes it on to the next filter. Here one filter can tell whether the input is the number it expects, or clearly rule that number out. With each layer you pass through, you get a deeper understanding of the number the wire frame represents. This may seem simple, but there are about 100 billion neurons in a human brain, and a neuron may be as small as 4 microns. Now you can imagine how complex the brain's workings are.
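As a rough sketch of that analogy (plain Python; the signal values and weights below are made up), an artificial neuron consolidates the signals arriving on its "dendrites" as a weighted sum and passes the result through a simple activation before handing it on:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """Consolidate incoming signals (weighted sum), then 'fire' through an activation."""
    total = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-total))      # sigmoid: a smooth on/off switch

# Three signals arriving from upstream neurons, with made-up learned weights.
incoming = np.array([0.9, 0.1, 0.4])
weights  = np.array([0.8, -0.5, 0.3])
print(neuron(incoming, weights, bias=-0.2))  # the signal passed to the next neuron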

Machine Learning 

Now let's bring some machine learning terminology into the scenarios we discussed. The scenario above may be good enough for understanding a number displayed on a wire frame, but more complex models are required to understand 2D images, and with 3D data the complexity increases further: the number of layers where filters are applied also grows dramatically. The model we adopted may be well suited to identifying written numbers, but if the input is speech and you want to identify which number was said, a totally different model is required, although the underlying principle of layers that classify, or perform regression, to reach a conclusion about some data is the same.
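To give a feel for what "more layers of filters" looks like in code, here is a minimal sketch of a small stack of convolutional layers for 8x8 digit images (assuming PyTorch is installed; the layer sizes are arbitrary choices for illustration, not a recommendation):

```python
import torch
import torch.nn as nn

# A toy CNN for 8x8 single-channel digit images, with 10 output classes (digits 0-9).
tiny_cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # first layer of learned filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 8x8 -> 4x4
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # second layer of filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 4x4 -> 2x2
    nn.Flatten(),
    nn.Linear(16 * 2 * 2, 10),                   # scores for the ten digits
)

scores = tiny_cnn(torch.randn(1, 1, 8, 8))       # one fake 8x8 image
print(scores.shape)                              # torch.Size([1, 10])
```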

One of the widely used models for recognising what is in a 2D image is AlexNet. This is a network like the one above, but built for determining what a 2D image contains. Its layers are shown below.



As you can see, this has many more layers (filters) applied. It has an accuracy of above 80% in determining what exactly is in a picture, and it was a winning model in the ImageNet competition, in which millions of images are classified into 1,000 categories. Many similar models have since been developed: ResNet, GoogLeNet and LeNet (which was built for handwritten digits like the ones above) are all models developed for different applications.
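As a minimal sketch (assuming PyTorch and torchvision are available; the image filename is a hypothetical placeholder, and the flag for downloading pretrained weights differs slightly across torchvision versions), you can load a pretrained AlexNet and run one image through it like this:

```python
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

alexnet = models.alexnet(pretrained=True)   # downloads ImageNet-trained weights
alexnet.eval()

# Standard ImageNet preprocessing: resize, crop, normalise.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("cat.jpg")                 # hypothetical input image
batch = preprocess(img).unsqueeze(0)        # add a batch dimension

with torch.no_grad():
    scores = alexnet(batch)                 # scores for the 1,000 ImageNet categories
print(scores.argmax(dim=1))                 # index of the most likely category
```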

In the example above there is no learning involved; it is more about extracting features from the available data (feature extraction, in machine learning parlance). So we come to the second part: learning.


Learning: In the scenario above we used filters that we took from somewhere. Imagine a scenario where you need to build your own filters. When a baby is born, his or her brain is a clean slate; the filters have to be generated. How does that happen? How can previous learning help generate such filter layers in your brain? See the cat experiment by Hubel and Wiesel.








Interesting, right? What happens is that your previous learning helps define the filters the brain uses the next time such a scenario arises. Each filter is given a weight. Each time a filter is applied, the result leads to a right or wrong conclusion, and that feedback adjusts the weight given to that filter's contribution to the decision. In machine learning this is called back propagation: the method of back propagation is used to adjust the weights of the nodes (in our example, the filters) in the models we use. This is the learning part.
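Here is a minimal sketch of that weight adjustment for a single neuron (plain NumPy, with made-up data): the error in the output is propagated back to nudge each weight a little in the direction that reduces the error.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One training example: three input signals and the answer we wanted.
x = np.array([0.9, 0.1, 0.4])
target = 1.0

weights = np.zeros(3)            # a "blank slate", like the newborn brain
learning_rate = 0.5

for step in range(100):
    out = sigmoid(np.dot(weights, x))          # forward pass
    error = out - target                       # how wrong were we?
    grad = error * out * (1 - out) * x         # back-propagated gradient for each weight
    weights -= learning_rate * grad            # nudge the weights to reduce the error

print(weights, sigmoid(np.dot(weights, x)))    # the output is now much closer to the target
```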


I have not explained the statistical and mathematical principles behind all of this, and there are many software platforms built to deal with it. Discussing all of those subjects is beyond the scope of this article; I plan to get into those details in another post.


Happy Machine Learning ! :) 
