Deep Reinforcement Learning

How can we understand something that we have never seen before?

What impels us to think and act?

How do we learn?

Once upon a time, these questions were purely philosophical endeavors, and seemingly hopeless ones at that: questions too far removed from practical application to be pursued with any meaningful consequence.

However, amidst the ever-rising tide of technological innovation, these questions have reemerged from the depths of philosophy to be examined in closer detail by new fields, seemingly far removed from their philosophical origins.

Now, these questions are no longer things merely to ponder, for they have grown to immense consequence in the young yet highly promising field of Deep Reinforcement Learning.

First, some background: the field of Deep Reinforcement Learning is a natural extension of Deep Learning, which is a natural extension of machine learning, which is a natural extension of statistical inference.

All clear?

If not, then don’t worry… because you are about to be a lot more confused anyways.

But before we get confused by Reinforcement Learning, let’s start with its older brother – Deep Learning.

Deep Learning is a pretty name for the type of machine learning that statistical theory hasn’t yet been able to catch up to. Made possible by a steady increase in computational power and in the size of public datasets, Deep Learning is essentially the application of basic machine learning principles to more complicated functions, often called Neural Nets. The complexity of these functions allows them to map incoming data to a new, often higher-dimensional, representation that accounts for relationships between elements of that data. During computation, various tricks are used to achieve useful properties: nonlinearities between hidden layers allow the approximation of more complex functions, while dropout and entropy reduction during training push the model toward desirable statistical properties. This mess of engineering tricks, fastened onto the underlying structure provided by traditional machine learning, has allowed Neural Nets to solve problems that previously seemed too difficult for traditional statistics.
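To see why the nonlinearity between hidden layers matters, here is a minimal sketch in plain Python. The tiny two-unit network and its hand-picked weights are assumptions made up for this illustration, not a trained model. With a ReLU nonlinearity it computes XOR, a function no purely linear model can represent; with the nonlinearity removed, the same weights collapse into a single straight-line formula.

```python
def relu(z):
    """Rectified linear unit: pass positive values through, clip negatives to zero."""
    return max(0.0, z)


def identity(z):
    """No nonlinearity at all: the hidden layer stays purely linear."""
    return z


def tiny_net(x1, x2, activation):
    """A hypothetical network with two hidden units and hand-picked weights."""
    h1 = activation(x1 + x2)        # hidden unit 1
    h2 = activation(x1 + x2 - 1.0)  # hidden unit 2
    return h1 - 2.0 * h2            # output layer


inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

# With ReLU, the network outputs 1 only when exactly one input is on (XOR).
with_relu = [tiny_net(a, b, relu) for a, b in inputs]

# Without it, the network reduces to the linear formula 2 - x1 - x2.
without_relu = [tiny_net(a, b, identity) for a, b in inputs]

print(with_relu)     # [0.0, 1.0, 1.0, 0.0]
print(without_relu)  # [2.0, 1.0, 1.0, 0.0]
```

Stacking more layers would not rescue the linear version: a linear function of a linear function is still linear, which is why some nonlinearity is needed before depth buys any extra expressive power.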

For example, by seeking to encode the underlying information in a dataset of images and paintings, combinations of Neural Networks called GANs (Generative Adversarial Networks) can create mesmerizing, psychedelic artwork.

In Deep Learning, the goal is often a simple form of classification: mapping a set of data to what it means. In essence, we want to be able to say that an image of a duck is a duck and that an image of a cat is a cat. From a human perspective, this task seems trivial, yet we can hardly begin to describe a solution. What about this arrangement of data causes us to say that the image it represents truly is a cat?

If, for instance, you laid out the list of pixels that represent the image, you could go through them one by one and verify that no single pixel makes this a picture of a cat, and the same can be done with every pair of pixels. Nevertheless, however indescribable in mere language, Deep Learning is able to capture the underlying patterns in images and match each image to its meaning with shocking, sometimes superhuman, accuracy. Every day, new applications are imagined for this technology, whether in the realm of art, medicine, or business.

Reinforcement Learning is a fascinating extension to Deep Learning, relying essentially on a modification to traditional classification that is simple in premise: instead of seeking the proper name for an image, we seek the proper action for an image, where "proper" is defined by the action's propensity to maximize total reward. This is most simply interpreted in the context of games, and then extrapolated to more consequential scenarios, such as the development of self-driving cars.
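To make "maximize total reward" concrete, here is a minimal sketch in plain Python. The game, its states, and its reward values are all hypothetical, invented purely for this illustration. It compares the action with the biggest immediate payoff against the action that leads to the biggest total reward over the whole game.

```python
# Hypothetical toy game: GAME[state][action] = (reward, next_state).
# A next_state of None means the game is over.
GAME = {
    "start": {"left": (1.0, "dead_end"), "right": (0.0, "treasure_room")},
    "dead_end": {"wait": (0.0, None)},
    "treasure_room": {"open_chest": (5.0, None)},
}


def best_total_reward(state):
    """Return (best achievable total reward, best first action) from a state."""
    if state is None:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action, (reward, next_state) in GAME[state].items():
        # Total reward = reward now + the best we can still collect afterwards.
        total = reward + best_total_reward(next_state)[0]
        if total > best_value:
            best_value, best_action = total, action
    return best_value, best_action


# The short-sighted choice looks only at the immediate reward.
greedy_action = max(GAME["start"], key=lambda a: GAME["start"][a][0])

# The far-sighted choice looks at total reward over the whole game.
value, action = best_total_reward("start")

print(greedy_action)  # left
print(action, value)  # right 5.0
```

Exhaustive search like this only works because the toy game is tiny. In a real game the number of states is far too large to enumerate, which is where a Neural Net comes in: it is asked to estimate the best action directly from the observed state.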

At both levels of this problem, in games and in the real world, the goal is to recognize that a certain action is likely to achieve a desired outcome given an observed state. In essence, this is a variant of the classic classification problem, but clearly, more complexities are introduced by the ability to alter the state of the environment through action. Nevertheless, much of the existing Deep Learning architecture can remain, and we can simply change the loss function so that it seeks to maximize the total reward gained over a task, rather than to minimize the error in predicting the proper image label.
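As a rough sketch of what changing the loss function can look like, the plain Python below puts a standard classification loss (cross-entropy) next to a reward-weighted loss in the style of policy-gradient methods such as REINFORCE. This is one common choice, not the only one, and the function names and example scores are assumptions made up for this illustration. Minimizing the reward-weighted loss pushes the network toward actions that were followed by high total reward.

```python
import math


def log_softmax(scores):
    """Turn raw network outputs into log-probabilities (numerically stable)."""
    m = max(scores)
    log_total = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_total for s in scores]


def classification_loss(scores, true_label):
    """Cross-entropy: small when the network puts high probability on the label."""
    return -log_softmax(scores)[true_label]


def policy_loss(scores, chosen_action, total_reward):
    """Reward-weighted loss: the same log-probability term, scaled by the reward.

    Minimizing it makes well-rewarded actions more likely and
    poorly rewarded (negative-reward) actions less likely.
    """
    return -total_reward * log_softmax(scores)[chosen_action]


# Hypothetical network outputs for three labels (or three possible actions).
scores = [2.0, 0.5, -1.0]

supervised = classification_loss(scores, true_label=0)
rewarded = policy_loss(scores, chosen_action=0, total_reward=1.0)
penalized = policy_loss(scores, chosen_action=0, total_reward=-1.0)

print(supervised, rewarded, penalized)
```

With a reward of exactly 1, the two losses coincide: classification can be read as the special case in which the "action" is naming the image and the reward is always 1 for the right name. When the reward is negative, the sign flips and the network is steered away from the action it took.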

As researchers seek to improve the performance of their Reinforcement Learning algorithms, they are, whether they know it or not, seeking answers to the philosophical questions we started with. And as we create algorithms that learn more efficiently, we are offered a glimpse of the answers to these questions, and of the inner workings of the mind, in the process.

How can we understand something that we have never seen before?

What impels us to think and act?

How do we learn?



Misha Obukhov

Misha Obukhov is a student at UCSB. He is interested in linear algebra and making humans obsolete.