Adaptive Representations for Reinforcement Learning by Shimon Whiteson


This book presents new algorithms for reinforcement learning, a form of machine learning in which an autonomous agent seeks a control policy for a sequential decision task. Since current methods typically rely on manually designed solution representations, agents that automatically adapt their own representations have the potential to dramatically improve performance. This book introduces two novel approaches for automatically discovering high-performing representations. The first approach synthesizes temporal difference methods, the traditional approach to reinforcement learning, with evolutionary methods, which can learn representations for a broad class of optimization problems. This synthesis is accomplished by customizing evolutionary methods to the on-line nature of reinforcement learning and using them to evolve representations for value function approximators. The second approach automatically learns representations based on piecewise-constant approximations of value functions. It begins with coarse representations and gradually refines them during learning, analyzing the current policy and value function to infer the best refinements. This book also introduces a novel method for devising input representations. This method addresses the feature selection problem by extending an algorithm that evolves the topology and weights of neural networks so that it evolves their inputs too. In addition to introducing these new methods, this book presents extensive empirical results in multiple domains demonstrating that these techniques can substantially improve performance over methods with manual representations.
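As a rough illustration of the second approach described above, the sketch below maintains a one-dimensional piecewise-constant value function that starts coarse and splits a cell whenever value estimates tracked for its two halves diverge. This is only a minimal Python sketch under stated assumptions; the class, the split criterion, and the thresholds are illustrative stand-ins, not the book's algorithm.

```python
# Minimal sketch (not the book's algorithm): a 1-D piecewise-constant value
# function that starts coarse and splits a cell when the values estimated on
# its two halves diverge. All names and thresholds are illustrative.

class AdaptiveCell:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.value = 0.0                  # value estimate for the whole cell
        self.half_values = [0.0, 0.0]     # shadow estimates for each half
        self.children = None              # (left, right) after a split

    def locate(self, x):
        """Return the leaf cell containing state x."""
        if self.children is None:
            return self
        mid = (self.lo + self.hi) / 2.0
        return self.children[0].locate(x) if x < mid else self.children[1].locate(x)

    def update(self, x, target, alpha=0.1, split_threshold=0.5):
        """TD-style update toward `target`, splitting when the halves disagree."""
        cell = self.locate(x)
        cell.value += alpha * (target - cell.value)
        half = 0 if x < (cell.lo + cell.hi) / 2.0 else 1
        cell.half_values[half] += alpha * (target - cell.half_values[half])
        if abs(cell.half_values[0] - cell.half_values[1]) > split_threshold:
            mid = (cell.lo + cell.hi) / 2.0
            cell.children = (AdaptiveCell(cell.lo, mid), AdaptiveCell(mid, cell.hi))
            for child, v in zip(cell.children, cell.half_values):
                child.value = v           # children inherit the half estimates

    def predict(self, x):
        return self.locate(x).value
```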


Example text

The second possibility is a Darwinian implementation, in which the changes made by TD are discarded and the new population is bred from the original genomes, as they were at birth. It has long since been determined that biological systems are Darwinian, not Lamarckian. However, it remains unclear which approach is better computationally, despite substantial research (110; 168; 171). The potential advantage of Lamarckian evolution is obvious: it prevents each generation from having to repeat the same learning.
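To make the distinction concrete, the sketch below shows where the two implementations diverge when evolving weight vectors that are further tuned by TD during each individual's lifetime: the Lamarckian variant breeds from the trained weights, while the Darwinian variant breeds from the genomes as they were at birth. The functions td_train and fitness are hypothetical stand-ins, and the loop is a simplified illustration, not the algorithm studied in the book.

```python
import random

# Illustrative sketch only: the Lamarckian/Darwinian choice when evolving
# weight vectors that are tuned by TD during each individual's lifetime.
# `td_train` and `fitness` are hypothetical stand-ins supplied by the caller.

def evolve(population, td_train, fitness, lamarckian=True, mutation_std=0.1):
    evaluated = []
    for genome in population:
        trained = td_train(list(genome))          # weights after TD learning
        score = fitness(trained)
        # Lamarckian: breed from the trained weights, keeping what TD learned.
        # Darwinian: breed from the genome as it was at "birth", so each
        # generation must repeat the same learning.
        evaluated.append((score, trained if lamarckian else list(genome)))
    evaluated.sort(key=lambda pair: pair[0], reverse=True)
    parents = [g for _, g in evaluated[: max(1, len(evaluated) // 2)]]
    return [
        [w + random.gauss(0.0, mutation_std) for w in random.choice(parents)]
        for _ in population
    ]
```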

Finally, we took the setting with the highest performance and conducted an additional 20 runs, for a total of 25. For simplicity, the graphs that follow show only this Q-learning result: the best configuration with the best initial weight setting. The figure shows the results of these experiments. For each method, the corresponding line in the graph represents a uniform moving average over the aggregate reward received in the past 1,000 episodes, averaged over all 25 runs. Using average performance, as we do throughout this book, is somewhat unorthodox for evolutionary methods, which are more commonly evaluated on the performance of the generation champion.
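For reference, the evaluation measure described above could be computed along the following lines, assuming rewards_per_run is a (runs x episodes) array of per-episode aggregate reward; the array layout and function name are assumptions for illustration.

```python
import numpy as np

# Sketch of the measure described above: average the per-episode aggregate
# reward across runs, then take a uniform moving average over a 1,000-episode
# window. Input layout (runs x episodes) is an assumption for illustration.

def uniform_moving_average(rewards_per_run, window=1000):
    mean_per_episode = np.asarray(rewards_per_run, dtype=float).mean(axis=0)
    kernel = np.ones(window) / window
    # 'valid' mode: each output point averages exactly `window` past episodes.
    return np.convolve(mean_per_episode, kernel, mode="valid")
```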

McWherter et al. (94) present methods for scheduling jobs with different priority classes. However, unlike in the server job scheduling task, the relative importance of a job type does not change as a function of time. McGovern et al. (91) use reinforcement learning for CPU instruction scheduling but aim only to minimize completion time. One method that can be adapted to the server job scheduling task is the generalized cμ rule (95), in which the server always processes at time t the oldest job of the type k that maximizes C′k(ok)/pk, where C′k is the derivative of the cost function for job type k, ok is the age of the oldest job of type k, and pk is the average processing time for jobs of type k.
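A scheduler following the generalized cμ rule as described above might pick the next job type roughly as follows; the data layout and names are assumptions for illustration, and cost_derivatives[k] stands in for C′k.

```python
# Hedged sketch of the generalized c-mu rule described above: serve the
# oldest waiting job of the type k that maximizes C'_k(o_k) / p_k.
# Cost derivatives, queues, and processing times are supplied by the caller.

def select_job_type(waiting, cost_derivatives, avg_processing_time, now):
    """waiting: dict mapping job type k -> list of arrival times of queued jobs."""
    best_type, best_index = None, float("-inf")
    for k, arrivals in waiting.items():
        if not arrivals:
            continue
        age_of_oldest = now - min(arrivals)                       # o_k
        index = cost_derivatives[k](age_of_oldest) / avg_processing_time[k]
        if index > best_index:
            best_type, best_index = k, index
    return best_type
```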
