Large language models (LLMs) can accelerate the training of robotics systems beyond what human engineers achieve, according to a new study by scientists at Nvidia, the University of Pennsylvania and the University of Texas at Austin. The study introduces DrEureka (short for Domain Randomization Eureka), a technique that automatically creates reward functions and domain randomization distributions for robotics systems. DrEureka requires only a high-level description of the target task, and it transfers learned policies from simulated environments to the real world faster and more efficiently than human-designed rewards.

The implications could be significant for the fast-moving field of robotics, which has recently received a fresh boost from advances in language and vision models.

When designing a robotics system for a new task, engineers usually train a control policy in a simulated environment and then deploy it to the real world. The difference between simulated and real-world environments, referred to as the "sim-to-real" gap, is one of the major challenges of any robotics system. Configuring and fine-tuning the policy for optimal performance usually requires repeated back-and-forth between simulation and the real world.

Recent work has shown that LLMs can combine their vast world knowledge and reasoning capabilities with the physics engines of virtual simulators to learn complex low-level skills. For example, LLMs can be used to design reward functions, the components that steer a robotics reinforcement learning (RL) system toward the correct sequences of actions for the desired task (a hypothetical sketch of such a reward appears below). However, once a policy is learned in simulation, transferring it to the real world requires a lot of manual tweaking of the reward functions and simulation parameters, including the randomization ranges illustrated in the second sketch below.
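To make the reward-design idea concrete, here is a minimal sketch of the kind of reward function an LLM might generate from a task description such as "make the quadruped walk forward at 1 m/s." All names and coefficients (`compute_reward`, `target_vel`, the penalty weights) are illustrative assumptions for this article, not DrEureka's actual output.

```python
import numpy as np

def compute_reward(base_lin_vel, base_ang_vel, joint_torques,
                   target_vel=np.array([1.0, 0.0])):
    """Illustrative reward for a quadruped velocity-tracking task.

    Rewards tracking of a commanded planar velocity while penalizing
    yaw rate and actuation effort; the weighted-term structure mirrors
    the kind of reward an LLM is prompted to write from a plain-text
    task description. Names and weights are hypothetical.
    """
    # Exponential tracking term: approaches 1.0 as the base velocity
    # matches the commanded velocity.
    vel_error = np.sum((base_lin_vel[:2] - target_vel) ** 2)
    tracking_reward = np.exp(-vel_error / 0.25)

    # Stability penalty: discourage spinning about the vertical axis.
    yaw_penalty = 0.05 * base_ang_vel[2] ** 2

    # Effort penalty: discourage large torques (smoother, safer gaits).
    torque_penalty = 1e-4 * np.sum(joint_torques ** 2)

    return tracking_reward - yaw_penalty - torque_penalty

# Example: robot moving at 0.9 m/s forward with a slight yaw rate.
r = compute_reward(
    base_lin_vel=np.array([0.9, 0.05, 0.0]),
    base_ang_vel=np.array([0.0, 0.0, 0.1]),
    joint_torques=np.random.uniform(-20, 20, size=12),
)
print(f"reward: {r:.3f}")
```

The weighted-sum structure is what makes automated reward design tractable: the LLM proposes terms and coefficients as ordinary code, which can then be scored by training runs in simulation.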
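Similarly, a domain randomization configuration amounts to resampling simulator physics parameters each training episode so the policy cannot overfit to one exact simulation. The sketch below shows this general mechanism with assumed parameter names and ranges (`friction`, `base_mass_offset_kg`, and so on); DrEureka's actual parameters and ranges are proposed by the LLM and differ.

```python
import random
from dataclasses import dataclass

@dataclass
class PhysicsParams:
    """One sampled set of simulator physics parameters (hypothetical)."""
    friction: float
    base_mass_offset_kg: float
    motor_strength_scale: float
    gravity_z: float

# Illustrative randomization ranges of the kind an LLM is asked to
# propose from the task description; values here are assumptions.
RANDOMIZATION_RANGES = {
    "friction": (0.3, 1.5),
    "base_mass_offset_kg": (-1.0, 2.0),
    "motor_strength_scale": (0.8, 1.2),
    "gravity_z": (-10.5, -9.0),
}

def sample_physics_params() -> PhysicsParams:
    """Draw a fresh set of physics parameters for one episode, so the
    learned policy sees many plausible "worlds" during training."""
    return PhysicsParams(**{
        name: random.uniform(lo, hi)
        for name, (lo, hi) in RANDOMIZATION_RANGES.items()
    })

# Training-loop sketch: re-randomize the simulator every episode.
for episode in range(3):
    params = sample_physics_params()
    print(f"episode {episode}: {params}")
    # env.set_physics(params); run_episode(env, policy)  # hypothetical
```

Choosing those ranges well is exactly the manual, trial-and-error step that DrEureka aims to automate: too narrow and the policy overfits to simulation, too wide and training fails to converge.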