SEED RL

SEED RL is a reinforcement learning framework that scales to thousands of machines, enabling training at millions of frames per second while significantly improving computational efficiency and reducing costs by up to 80%.

SEED RL is built on Google's TensorFlow 2.0 framework and features an architecture that takes advantage of graphics cards and Tensor Processing Units (TPUs) by centralizing model inference. To avoid data-transfer bottlenecks, inference is performed centrally by a learner component that also trains the model using input from distributed actors. The model's variables and state information are kept local to the learner, while observations are sent to it at every environment step; latency is kept to a minimum thanks to a network library built on gRPC, the open-source universal RPC framework.
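To make the centralized-inference idea more concrete, the snippet below is an illustrative Python/TensorFlow sketch only, not SEED RL's actual API: class names such as `CentralInferenceServer` and `Actor` are hypothetical, and the gRPC round trip is replaced by a local method call.

```python
# Illustrative sketch of centralized inference (not SEED RL's real API):
# actors hold no model variables; they send observations to a central
# learner-side service, which runs batched inference and returns actions.
import numpy as np
import tensorflow as tf

class CentralInferenceServer:
    """Learner-side component: owns the model and serves inference."""
    def __init__(self, num_actions):
        self.model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(num_actions),
        ])

    @tf.function
    def infer(self, observations):
        # Batched forward pass on the accelerator; sample one action per row.
        logits = self.model(observations)
        return tf.random.categorical(logits, num_samples=1)[:, 0]

class Actor:
    """Environment-side component: keeps only observations, no weights."""
    def __init__(self, server, obs_dim):
        self.server = server
        self.obs = np.zeros(obs_dim, dtype=np.float32)

    def step(self):
        # In SEED RL this round trip goes over gRPC; here it is a local call.
        action = int(self.server.infer(self.obs[None, :]).numpy()[0])
        # ... apply `action` to the environment and update self.obs ...
        return action

server = CentralInferenceServer(num_actions=4)
actors = [Actor(server, obs_dim=8) for _ in range(4)]
actions = [a.step() for a in actors]
```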

For this architecture to be successful, two state-of-the-art algorithms are integrated into SEED RL. The first is V-trace, a policy-gradient-based method first introduced with IMPALA. Policy-gradient methods predict an action distribution from which an action can be sampled; V-trace is an off-policy method and therefore works well in the asynchronous SEED RL architecture. The second is R2D2, a Q-learning method that selects an action based on the predicted future value of that action, using recurrent distributed replay. This approach allows the Q-learning algorithm to run at scale while still using recurrent neural networks that can predict future values from the information of all past frames in an episode.
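To illustrate the V-trace part, the following is a minimal NumPy sketch of the off-policy corrected value targets as defined in the IMPALA paper (Espeholt et al., 2018). It is an illustration of the formula, not SEED RL's production implementation; the function name `vtrace_targets` and the use of a single scalar discount are simplifications.

```python
# Minimal NumPy sketch of V-trace value targets (illustrative only).
import numpy as np

def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, discount=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute off-policy corrected targets v_s for one trajectory.

    behaviour_logp, target_logp, rewards, values: arrays of shape [T].
    bootstrap_value: value estimate for the state after the trajectory.
    """
    # Truncated importance weights between target and behaviour policies.
    ratios = np.exp(target_logp - behaviour_logp)
    rhos = np.minimum(rho_bar, ratios)
    cs = np.minimum(c_bar, ratios)

    values_next = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + discount * values_next - values)

    # Backward recursion: v_s - V_s = delta_s + gamma * c_s * (v_{s+1} - V_{s+1}).
    vs_minus_v = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(values))):
        acc = deltas[t] + discount * cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v

# Example: a 4-step trajectory collected by a slightly off-policy actor.
T = 4
targets = vtrace_targets(
    behaviour_logp=np.log(np.full(T, 0.25)),
    target_logp=np.log(np.full(T, 0.30)),
    rewards=np.array([0.0, 0.0, 1.0, 0.0]),
    values=np.array([0.5, 0.6, 0.7, 0.4]),
    bootstrap_value=0.3,
)
```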

SEED RL’s learner component can be scaled across thousands of cores (e.g., up to 2,048 on Cloud TPUs), and the number of actors — which iterate between taking steps in the environment and running inference on the model to predict the next action — can scale up to thousands of machines.
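As a rough illustration of how a TensorFlow 2 learner can be spread across TPU cores, the sketch below uses `tf.distribute.TPUStrategy`. It is not SEED RL's actual learner code: the TPU address and the placeholder supervised loss are assumptions, and a real learner would plug in the V-trace or R2D2 loss instead.

```python
import tensorflow as tf

# Connect to the TPU slice; the address ("local") depends on your Cloud TPU setup.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="local")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Variables created here are replicated across every TPU core in the slice.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(18),  # e.g. action logits for an Atari game
    ])
    optimizer = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(obs, actions):
    def step_fn(obs, actions):
        with tf.GradientTape() as tape:
            logits = model(obs)
            # Placeholder supervised loss; a real learner would use V-trace or R2D2.
            loss = tf.reduce_mean(
                tf.keras.losses.sparse_categorical_crossentropy(
                    actions, logits, from_logits=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss
    # strategy.run executes step_fn in parallel on every core; feeding it from a
    # distributed dataset gives each core its own shard of the batch.
    per_replica_loss = strategy.run(step_fn, args=(obs, actions))
    return strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica_loss, axis=None)
```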

By significantly improving computational efficiency and reducing costs by up to 80%, SEED RL could help secure Google's leadership in machine learning and AI.

 
