Survey on Using Deep RL for Quadruped Robot Locomotion
Legged robots are an attractive alternative to wheeled robots, primarily for applications in rough terrain and difficult, cluttered environments. The freedom to choose contact points lets such robots interact with the environment while effectively avoiding obstacles. With these capabilities, legged robots can be applied to tasks such as rescuing people in mountains and forests, climbing stairs, carrying payloads on construction sites, and inspecting unstructured underground tunnels. Designing dynamic and agile locomotion for such robots is a longstanding research problem because it is difficult to control an under-actuated robot performing highly dynamic motions that require delicate balance. Recent advances in Deep Reinforcement Learning (DRL) have made it possible to learn robot locomotion policies from scratch without human intervention. In this post, we discuss various research directions where DRL can be employed to solve locomotion problems in quadruped robots.
Sim-to-real Gap
Policies learnt in simulation are often observed to perform poorly on the real robot. This is due to the sim-to-reality gap caused by model discrepancies such as incorrect simulation parameters, simplifying simulation assumptions, unmodelled dynamics, and numerical errors. The reality gap is greatly amplified in locomotion tasks because of the complex motor dynamics and ground contacts involved, and even small model discrepancies compound as time progresses. An alternative to learning in simulation is to train directly on the physical robot. While this is feasible for some tasks, training a quadruped robot this way is very expensive in terms of both time and data collection; other difficulties include resetting the robot between episodes, and a failure at any point can damage the hardware. Training in simulation is therefore faster, cheaper, and safer, which makes handling the reality gap essential for learning locomotion tasks.
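One common way to narrow this gap is domain randomization: perturbing the simulator's physical parameters during training so the learnt policy does not overfit to a single model of the robot and terrain. The sketch below is a minimal illustration of that idea; the parameter ranges and the `QuadrupedEnv` class are hypothetical placeholders, not values from any specific paper.

```python
import numpy as np

# Minimal sketch of domain randomization for sim-to-real transfer.
# The parameter ranges below are illustrative, not tuned values.
def randomize_dynamics(rng):
    """Sample a new set of simulation parameters for one training episode."""
    return {
        "ground_friction": rng.uniform(0.5, 1.25),   # contact friction coefficient
        "base_mass_scale": rng.uniform(0.8, 1.2),    # +/-20% error in body mass
        "motor_strength":  rng.uniform(0.8, 1.0),    # motors weaker than modelled
        "latency_s":       rng.uniform(0.0, 0.04),   # unmodelled control latency
    }

rng = np.random.default_rng(0)
for episode in range(1000):
    params = randomize_dynamics(rng)
    # env = QuadrupedEnv(**params)    # hypothetical simulated environment
    # rollout_and_update_policy(env)  # standard DRL update on the randomized sim
```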
Gaits
While quadruped robots have diverse, far-reaching applications, speed is an important aspect of their usage. Most quadruped gaits are hand-tuned by experts, which is a very time-consuming process that requires a good deal of human expertise. Furthermore, any change in the robot's hardware or in the surface it is supposed to walk on requires the gait parameters to be tuned again. A solution to this problem is to use machine learning to learn the best gait parameters autonomously. Since reinforcement learning is built on optimization, it can be used to learn the fastest possible gaits for a quadruped robot, and previous works have reported gaits that are faster than both previously known gaits and hand-tuned expert gaits while consuming significantly less energy.
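As a rough illustration, gait learning is often framed as optimizing a small parameter vector (step length, step frequency, body height, and so on) that drives a fixed gait generator, scoring each candidate by the forward speed it achieves. The sketch below uses simple random-search hill climbing under that assumption; `evaluate_speed` stands in for a rollout on the robot or in simulation and is a hypothetical placeholder.

```python
import numpy as np

# Hypothetical gait parameters: [step_length_m, step_frequency_hz, body_height_m]
def evaluate_speed(params):
    """Placeholder: run the gait generator with these parameters on the robot
    or in simulation and return the measured forward speed (m/s)."""
    raise NotImplementedError

def optimize_gait(initial_params, iterations=200, noise=0.05, seed=0):
    """Simple random-search hill climbing over gait parameters."""
    rng = np.random.default_rng(seed)
    best_params = np.asarray(initial_params, dtype=float)
    best_speed = evaluate_speed(best_params)
    for _ in range(iterations):
        candidate = best_params + rng.normal(0.0, noise, size=best_params.shape)
        speed = evaluate_speed(candidate)
        if speed > best_speed:            # keep the faster gait
            best_params, best_speed = candidate, speed
    return best_params, best_speed
```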
Locomotion Controllers
Designing locomotion controllers for legged robots is a longstanding research challenge in the robotics community. Each component of the controller, such as trajectory optimization, model-predictive control, foot-placement planning, state estimation, and contact scheduling, has to be designed by experts, and an accurate dynamics model of the robot is often difficult to acquire. Furthermore, whenever the application or the hardware changes, these components have to be redesigned. Reinforcement learning promises to overcome these limitations by learning effective controllers directly from experience. Because it assumes no prior knowledge of the gait, the environment, or even the robot's dynamics, RL can be used to learn end-to-end control policies and thus automate controller design. However, directly applying DRL to this process requires a large amount of training data, so developing DRL algorithms that are both sample efficient and robust to the choice of hyper-parameters is an exciting direction for research.
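For concreteness, an end-to-end policy in this setting is typically a small network that maps proprioceptive observations (joint angles, base orientation, velocities) to target joint positions, trained with an off-the-shelf DRL algorithm. The sketch below shows only the policy network and a single control step, assuming PyTorch; the observation and action sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LocomotionPolicy(nn.Module):
    """Small MLP mapping proprioceptive observations to target joint positions."""
    def __init__(self, obs_dim=36, act_dim=12):   # 12 actuated joints is typical
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.Tanh(),
            nn.Linear(256, 256), nn.Tanh(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs):
        # Output is interpreted as desired joint positions, tracked by a PD controller.
        return self.net(obs)

policy = LocomotionPolicy()
obs = torch.zeros(1, 36)              # placeholder observation (joint angles, IMU, velocities)
target_joint_positions = policy(obs)  # one control step
# In training, an algorithm such as PPO or SAC would update `policy`
# from rollouts collected in simulation.
```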
Reward Function in DRL
Deep reinforcement learning algorithms require a well-designed reward function, which is often cumbersome to construct in practice; some research therefore explores unsupervised reinforcement learning approaches for reward-free training. Hand-designed reward functions are also brittle: a slight error in defining the reward signal can lead to catastrophic failure of the system or to behaviour very different from what was intended. As an alternative to manual reward tuning, inverse reinforcement learning can be used: given expert trajectories in a variety of situations, recover the reward function the expert is optimizing. The recovered reward function is then used to generate a desired policy for a given environment. Many works also focus on imitation-learning approaches for reproducing the diverse and agile locomotion skills of animals without manually designing controllers to emulate each complex behaviour. Such learnt control policies can substantially outperform expert-tuned gaits.
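A common concrete form of such an imitation objective is a pose-tracking reward: the policy is rewarded for matching a retargeted reference motion (for example, from animal motion capture) at each time step. The snippet below is a minimal sketch of that idea; the weights and exponential kernels are illustrative assumptions rather than any specific paper's exact formulation.

```python
import numpy as np

def imitation_reward(joint_pos, ref_joint_pos, base_vel, ref_base_vel,
                     w_pose=0.7, w_vel=0.3, k_pose=5.0, k_vel=1.0):
    """Reward for tracking a reference motion clip at the current time step.

    joint_pos / ref_joint_pos : actual and reference joint angles (rad)
    base_vel / ref_base_vel   : actual and reference base velocity (m/s)
    Errors are turned into bounded rewards with exponential kernels.
    """
    pose_err = np.sum((np.asarray(joint_pos) - np.asarray(ref_joint_pos)) ** 2)
    vel_err = np.sum((np.asarray(base_vel) - np.asarray(ref_base_vel)) ** 2)
    r_pose = np.exp(-k_pose * pose_err)   # equals 1 when the pose matches exactly
    r_vel = np.exp(-k_vel * vel_err)
    return w_pose * r_pose + w_vel * r_vel
```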
Even though there has already been extensive research in this field, many gaps remain to be filled before legged robots can be deployed in real environments. First, learning locomotion policies that can dynamically change running speed and direction according to environmental conditions is a promising direction for future work. Second, learning other complex behaviours that are valuable in real-world applications, such as climbing stairs or jumping over obstacles, can also be pursued.