02

Reinforcement learning

Implement our AI in a short time
through faster learning.

What is reinforcement learning?

Reinforcement learning is a type of machine learning, which is a branch of AI. Computers usually act by following a program written by humans. With reinforcement learning, however, a computer can assess the current situation by itself, set its own rules, and determine what action to take. Humans do not need to define the rules in a program.
To determine what action to take next, a computer needs a lot of experience, including experience of failure, just as humans do.
When we teach a robot an action, such as tightening a screw, we have it try that action again and again. This is how it learns.

During reinforcement learning, a computer makes repeated attempts at an action and is evaluated (rewarded) based on how well each attempt achieves the objective. It revises its action to earn a higher evaluation, gradually getting closer and closer to the objective. Reinforcement learning is the part of AI that learns through the principle of “practice makes perfect.” It is the part of AI that learns to succeed from failure.
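The try, evaluate, and revise cycle described above can be illustrated with a minimal sketch. It is not Mitsubishi Electric's implementation. It assumes a hypothetical set of five screw-tightening torque settings, made-up reward values, and a simple epsilon-greedy rule (mostly repeat the best-rated action, occasionally try another one).

```python
import random

def run_trials(true_rewards, n_trials=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which action earns the highest evaluation."""
    rng = random.Random(seed)
    n_actions = len(true_rewards)
    estimates = [0.0] * n_actions  # running average evaluation per action
    counts = [0] * n_actions       # how often each action has been tried
    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: occasionally try an action at random, even if it may fail.
            action = rng.randrange(n_actions)
        else:
            # Exploit: repeat the action with the best evaluation so far.
            action = max(range(n_actions), key=lambda a: estimates[a])
        # Each trial is evaluated (rewarded); the evaluation is noisy.
        reward = true_rewards[action] + rng.gauss(0.0, 0.1)
        # Revise the estimate for this action with an incremental average.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

# Hypothetical evaluations for five torque settings; setting 3 is the best.
true_rewards = [0.1, 0.3, 0.6, 0.9, 0.4]
estimates = run_trials(true_rewards)
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("Best action found:", best_action)
```

No rule saying which setting is correct is ever written into the program. The preference emerges from repeated trials and their evaluations, including the poorly rated ones.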

Strengths of Mitsubishi Electric

Reducing the number of pre-learning trials
by estimating degree of success.

Reinforcement learning does not require a human to set rules with a program. However, learning can take a lot of time because a huge number of trials are needed for pre-learning.
Mitsubishi Electric has developed proprietary technology that reduces the number of trials to about 1/50 of the number conventionally required. Conventional reinforcement learning senses trial results and sets control parameters based on an evaluation of those results. In addition, Mitsubishi Electric’s technology uses our knowledge of the machinery that incorporates the AI to estimate the degree of success of each trial result, and sends feedback to the AI on what motions would bring the equipment close to the target state faster. Control parameters are then set accordingly. This allows learning with fewer trials, making it possible to greatly reduce the time and cost of implementing AI.
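The following toy sketch illustrates only the general idea of why feedback about the degree of success can cut the number of trials. It is not the proprietary technology described above and does not attempt to reproduce the 1/50 figure. The target value, tolerance, step size, and gain are all hypothetical, and the sketch assumes that machine knowledge can supply a perfect signed estimate of the remaining distance to the target, which a real system would only approximate.

```python
import random

TARGET = 7.3      # hypothetical target state of the equipment
TOLERANCE = 0.05  # hypothetical acceptable distance from the target

def evaluate(param):
    """Scalar evaluation of one trial: higher means closer to the target."""
    return -abs(param - TARGET)

def learn_by_evaluation_only(seed=0, max_trials=10000):
    """Baseline: perturb the control parameter at random, keep improvements."""
    rng = random.Random(seed)
    param = 0.0
    best = evaluate(param)
    for trial in range(1, max_trials + 1):
        candidate = param + rng.uniform(-0.5, 0.5)
        score = evaluate(candidate)
        if score > best:
            param, best = candidate, score
        if -best <= TOLERANCE:
            return trial  # number of trials needed to reach the target
    return max_trials

def learn_with_estimated_feedback(gain=0.5, max_trials=10000):
    """Guided: an assumed machine model tells the learner which way to move."""
    param = 0.0
    for trial in range(1, max_trials + 1):
        # Assumption: machine knowledge yields a signed distance to the target.
        estimated_error = TARGET - param
        if abs(estimated_error) <= TOLERANCE:
            return trial
        # Move the control parameter toward the target by a fraction of the error.
        param += gain * estimated_error
    return max_trials

baseline_trials = learn_by_evaluation_only()
guided_trials = learn_with_estimated_feedback()
print("Trials with evaluation only:", baseline_trials)
print("Trials with estimated feedback:", guided_trials)
```

In this sketch the baseline learns only whether a trial was better or worse, so many of its trials are spent on moves that do not help. The guided learner receives a direction and a magnitude on every trial, which is why it needs fewer of them.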

Functional block diagram of Mitsubishi Electric's reinforcement learning
Comparison of number of trials