Reinforcement learning is one of the hottest research topics right now, and its popularity keeps growing. Reinforcement learning (RL) is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error, using feedback from its own actions and experiences. Although both supervised and reinforcement learning use a mapping between input and output, they differ in the feedback the agent receives: in supervised learning it is the correct set of actions for performing a task, whereas reinforcement learning uses rewards and punishments as signals for positive and negative behavior.
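The trial-and-error loop can be illustrated with the simplest reinforcement learning setting, a multi-armed bandit: the agent repeatedly picks one of several actions and only sees the reward of the action it took. This is a minimal sketch, assuming an epsilon-greedy strategy and three hypothetical actions whose true average rewards (`true_means`) are hidden from the agent; `run_bandit` and its parameters are names chosen for illustration.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that learns which action pays best by trial and error."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # how often each action was tried
    estimates = [0.0] * n_arms     # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                              # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])    # exploit
        reward = rng.gauss(true_means[arm], 1.0)   # noisy feedback from the environment
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 1.0])
best = max(range(3), key=lambda a: estimates[a])
print(best, counts)
```

With enough steps the agent settles on the action with the highest average reward; `epsilon` controls how often it keeps exploring the others instead of exploiting what it already knows.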
Compared with unsupervised learning, reinforcement learning differs in its goal. While the goal in unsupervised learning is to find similarities and differences between data points, in reinforcement learning the goal is to find a suitable action model that maximizes the total cumulative reward of the agent.
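The total cumulative reward is often formalized as a discounted return, where a factor `gamma` between 0 and 1 makes rewards received sooner count more than rewards received later. The sketch below assumes that common convention; with `gamma=1.0` it reduces to a plain sum of rewards.

```python
def discounted_return(rewards, gamma=0.99):
    """Return r_0 + gamma*r_1 + gamma**2*r_2 + ... for one episode."""
    g = 0.0
    # Work backwards so each step reuses the return of the step after it.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1, 1, 1], gamma=1.0))   # 3.0
print(discounted_return([0, 0, 10], gamma=0.9))  # about 8.1
```

Working backwards avoids computing powers of `gamma` explicitly: each step's return is its own reward plus the discounted return of everything after it.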
The figure below represents the basic idea and elements involved in a reinforcement learning model. Value: the future reward that an agent would receive by taking an action in a particular state. A reinforcement learning problem can best be explained through games, but the same feedback loop appears in everyday business decisions. If you hire someone without a lot of experience and their performance hurts the bottom line, most companies will update their job descriptions the next time around.
Or if you start to see a higher rate of unsubscribes to a marketing email, you may send fewer promotional messages. Negative reinforcement is a powerful force, too. Reinforcement learning is obviously more sophisticated, but the principle of having an intelligent agent use trial and error and improve its ability to achieve an objective based on rewards is the same.
This trial-and-error approach distinguishes reinforcement learning from AI techniques in which an algorithm is told what to look for based on known historical examples, an approach known as supervised learning. Instead of simply scanning data sets to find a mathematical equation that can reproduce historical outcomes, reinforcement learning focuses on discovering the optimal actions that lead to the desired outcome.
It was a bot developed by Google that leveraged reinforcement learning. That is far from the only example of where organizations are winning with reinforcement learning. A group of university researchers recently developed an automated tuning system using reinforcement learning to help train prosthetic legs to adjust to the natural gait of the person wearing them. Other technology companies are looking at reinforcement learning as the basis for designing chatbots in a variety of business settings.
And of course, reinforcement learning is a natural fit for those trying to design self-driving cars that are both efficient and safe. Meanwhile, a recent research report from MIT suggested that the era of deep learning may be ending, based on citations by other scientists. Although machine learning is often seen as a monolith, this cutting-edge technology is diversified, with various sub-types including classic machine learning, deep learning, and the state-of-the-art technology of deep reinforcement learning.
Reinforcement learning is the training of machine learning models to make a sequence of decisions. The agent learns to achieve a goal in an uncertain, potentially complex environment. In reinforcement learning, an artificial intelligence faces a game-like situation. The computer employs trial and error to come up with a solution to the problem. To get the machine to do what the programmer wants, the artificial intelligence gets either rewards or penalties for the actions it performs.
Its goal is to maximize the total reward. Although the designer sets the reward policy—that is, the rules of the game—he gives the model no hints or suggestions for how to solve the game.
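The idea that the designer only writes the rules of the game can be sketched with a hypothetical six-cell corridor: the reward function gives +1 for reaching the last cell and a small -0.01 penalty for every other step, and the agent, here using tabular Q-learning as an illustrative algorithm, has to discover on its own that moving right is best. The environment, reward values and hyperparameters are assumptions chosen for illustration.

```python
import random

N_STATES = 6          # corridor cells 0..5; cell 5 is the goal
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Hypothetical environment: the reward function is the only rule the designer writes."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True       # reward for reaching the goal
    return nxt, -0.01, False        # small penalty for every other step

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                         # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])   # exploit
            nxt, reward, done = step(state, action)
            # Q-learning target: reward now plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # greedy action per non-goal cell; +1 means "move right"
```

Nothing in the code tells the agent which direction leads to the goal; the preference for moving right emerges only from the rewards and penalties it collects.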
In contrast to human beings, artificial intelligence can gather experience from thousands of parallel gameplays if a reinforcement learning algorithm is run on a sufficiently powerful computer infrastructure. Applications of reinforcement learning were in the past limited by weak computer infrastructure.
That situation is now changing rapidly, with powerful new computational technologies opening the way to completely new, inspiring applications. Training the models that control autonomous cars is an excellent example of a potential application of reinforcement learning. In an ideal situation, the computer should get no instructions on driving the car. The programmer would avoid hard-wiring anything connected with the task and allow the machine to learn from its own errors.
In a perfect situation, the only hard-wired element would be the reward function. For more real-life applications of reinforcement learning, check this article. The main challenge in reinforcement learning lies in preparing the simulation environment, which is highly dependent on the task to be performed.
When the model has to go superhuman in Chess, Go or Atari games, preparing the simulation environment is relatively simple. When it comes to building a model capable of driving an autonomous car, building a realistic simulator is crucial before letting the car ride on the street. The model has to figure out how to brake or avoid a collision in a safe environment, where sacrificing even a thousand cars comes at a minimal cost.
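In such a simulator, the reward function is the one piece the designer writes by hand. The sketch below is purely illustrative: the `DrivingState` fields, the penalty values and the weights are hypothetical assumptions, not values from any real self-driving system.

```python
from dataclasses import dataclass

@dataclass
class DrivingState:
    """Hypothetical per-step summary reported by a driving simulator."""
    progress_m: float   # metres travelled along the planned route this step
    collided: bool      # True if the car hit anything
    off_road: bool      # True if the car left the drivable area
    jerk: float         # size of the sudden change in acceleration (comfort proxy)

def driving_reward(s: DrivingState) -> float:
    """Illustrative reward; every constant here is an untuned assumption."""
    if s.collided:
        return -100.0               # a crash outweighs any progress made
    reward = 1.0 * s.progress_m     # reward making progress along the route
    if s.off_road:
        reward -= 10.0              # penalize leaving the road
    reward -= 0.1 * s.jerk          # penalize uncomfortable, jerky driving
    return reward

print(driving_reward(DrivingState(progress_m=5.0, collided=False, off_road=False, jerk=2.0)))
```

Because the agent optimizes exactly this number, the relative sizes of the penalties matter: if leaving the road cost less than the progress gained by cutting a corner, the agent could learn to cut corners.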
Transferring the model out of the training environment and into the real world is where things get tricky. Scaling and tweaking the neural network controlling the agent is another challenge. There is no way to communicate with the network other than through the system of rewards and penalties.
This in particular may lead to catastrophic forgetting, where acquiring new knowledge causes some of the old knowledge to be erased from the network. To read up on this issue, see this paper, published at the International Conference on Machine Learning.
Yet another challenge is getting stuck in a local optimum: the agent performs the task, but not in the optimal or required way. Finally, some agents optimize the reward without performing the task they were designed for. An interesting example can be found in the OpenAI video below, where the agent learned to gain rewards but not to complete the race. More broadly, there should be no clear divide between machine learning, deep learning and reinforcement learning.
It is like the relation between parallelograms, rectangles and squares, where machine learning is the broadest category and deep reinforcement learning the narrowest one. Reader features refer to how the reader interacts with the content. Context features include news aspects such as the timing and freshness of the news. A reward is then defined based on these user behaviors.
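As an illustration of defining a reward from user behavior, the sketch below combines a click, reading time and a return visit into a single number. The signals, the 120-second cap and the 0.5 weights are hypothetical choices, not the reward used by any particular news recommender.

```python
def news_reward(clicked: bool, dwell_seconds: float, returned_next_day: bool) -> float:
    """Hypothetical reward built from logged reader behavior for one recommendation."""
    reward = 1.0 if clicked else 0.0                 # the reader opened the story
    # Bonus for reading time, capped so very long sessions don't dominate.
    reward += 0.5 * min(dwell_seconds, 120.0) / 120.0
    if returned_next_day:
        reward += 0.5                                # the reader came back later
    return reward

print(news_reward(clicked=True, dwell_seconds=60.0, returned_next_day=False))  # 1.25
```

Rewarding a later return visit, not just the click, is one way to push a recommender toward long-term engagement rather than clickbait.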
Using reinforcement learning, AlphaGo Zero was able to learn the game of Go from scratch. It learned by playing against itself. After 40 days of self-training, AlphaGo Zero was able to outperform the version of AlphaGo known as Master, which had defeated world number one Ke Jie. It used only the black and white stones on the board as input features, and a single neural network.
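In the AlphaGo Zero paper (Silver et al., 2017), the tree search selects the move that maximizes Q(s, a) + U(s, a), where Q is the mean value of the move so far and U(s, a) = c_puct · P(s, a) · sqrt(Σ_b N(s, b)) / (1 + N(s, a)) favors moves that the network's prior P rates highly but that have few visits N. The function below is a minimal sketch of that selection step only; the dictionary-based interface and `c_puct=1.5` are assumptions for illustration.

```python
import math

def puct_select(priors, visits, values, c_puct=1.5):
    """Return the move maximizing Q(s, a) + U(s, a) at one search node.

    priors: move -> P(s, a) from the policy network
    visits: move -> N(s, a), how often the search has tried the move
    values: move -> Q(s, a), mean value of the move so far
    """
    sqrt_total = math.sqrt(sum(visits.values()))

    def score(move):
        # Exploration bonus: large for moves with a high prior and few visits.
        u = c_puct * priors[move] * sqrt_total / (1 + visits[move])
        return values[move] + u

    return max(priors, key=score)

print(puct_select({"a": 0.7, "b": 0.3}, {"a": 10, "b": 0}, {"a": 0.1, "b": 0.0}))  # b
```

Here the unvisited move "b" wins despite its lower prior, because its exploration bonus is not yet divided down by visits. As N(s, a) grows, the bonus shrinks and well-explored moves are judged increasingly on their Q-values.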
A simple tree search that relies on this single neural network is used to evaluate positions and sample moves, without using any Monte Carlo rollouts. In this paper, the authors propose real-time bidding with multi-agent reinforcement learning. To handle a large number of advertisers, they use a clustering method and assign each cluster a strategic bidding agent.
In marketing, the ability to accurately target an individual is crucial, because the right targets lead to a high return on investment. The proposed method outperforms state-of-the-art single-agent reinforcement learning approaches.
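The source does not say which clustering method the bidding paper uses. As a stand-in, the sketch below groups hypothetical advertisers by daily budget with plain one-dimensional k-means and then assigns one placeholder bidding agent per cluster; `cluster_advertisers`, the budget figures and the agent names are all hypothetical.

```python
def cluster_advertisers(budgets, k, iters=20):
    """Plain 1-D k-means over advertiser budgets; returns k lists of budgets."""
    ordered = sorted(budgets)
    # Deterministic start: spread the k initial centers across the sorted budgets.
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for b in budgets:
            nearest = min(range(k), key=lambda c: abs(b - centers[c]))
            groups[nearest].append(b)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups

budgets = [50, 60, 55, 1000, 1100, 5000, 5200]   # hypothetical daily budgets
groups = cluster_advertisers(budgets, k=3)
agents = {f"bidding-agent-{i}": g for i, g in enumerate(groups)}  # one agent per cluster
print(agents)
```

The point of the design is scale: instead of training one agent per advertiser, only k agents are trained, each bidding on behalf of a group of similar advertisers.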
This can, for example, be used in building products on an assembly line. It is achieved by combining large-scale distributed optimization with a variant of deep Q-learning called QT-Opt. QT-Opt's support for continuous action spaces makes it suitable for robotics problems. A model is first trained offline and then deployed and fine-tuned on the real robot.
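QT-Opt handles continuous actions by searching for the action with the highest Q-value using the cross-entropy method (CEM), rather than enumerating a fixed set of actions. The sketch below shows only that CEM step; the toy quadratic `toy_q` stands in for a learned Q-network at a fixed state, and the population size, elite count and iteration count are assumptions, not QT-Opt's actual settings.

```python
import random
import statistics

def cem_argmax(q_fn, dim, iters=15, pop=100, n_elite=10, seed=0):
    """Cross-entropy method: approximately find the action that maximizes q_fn."""
    rng = random.Random(seed)
    mean, std = [0.0] * dim, [1.0] * dim
    for _ in range(iters):
        # Sample candidate actions from the current Gaussian.
        samples = [[rng.gauss(mean[d], std[d]) for d in range(dim)] for _ in range(pop)]
        # Keep the highest-scoring candidates ("elites").
        elite = sorted(samples, key=q_fn, reverse=True)[:n_elite]
        # Refit the Gaussian to the elites; a small floor keeps std from collapsing to 0.
        mean = [sum(a[d] for a in elite) / n_elite for d in range(dim)]
        std = [statistics.pstdev([a[d] for a in elite]) + 1e-3 for d in range(dim)]
    return mean

def toy_q(action):
    """Stand-in for a learned Q(s, a) at one fixed state; best action is (0.3, -0.5)."""
    return -((action[0] - 0.3) ** 2 + (action[1] + 0.5) ** 2)

print(cem_argmax(toy_q, dim=2))  # close to [0.3, -0.5]
```

CEM only needs to evaluate the Q-function, not differentiate it, which is why it pairs easily with a neural network whose input includes the continuous action.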
Google AI applied QT-Opt to robotic grasping, where 7 real-world robots ran for robot hours in a 4-month period. Although reinforcement learning is still a very active research area, significant progress has been made to advance the field and apply it in real life. In this article, we have barely scratched the surface as far as application areas of reinforcement learning are concerned.
Hopefully, this has sparked some curiosity that will drive you to dive in a little deeper into this area. Derrick Mwiti is a data scientist who has a great passion for sharing knowledge. His content has been viewed over a million times on the internet. Derrick is also an author and online instructor. He also trains and works with various institutions to implement data science solutions as well as to upskill their staff.