Recurrent policy gradients

Daan Wierstra; Alexander Förster; Jan Peters; Jürgen Schmidhuber

Download from

dx.doi.org

More download options

Recurrent policy gradients

Daan Wierstra, Alexander Förster, Jan Peters & Jürgen Schmidhuber

Logic Journal of the IGPL 18 (5):620-634 (2010) Copy BIBT_EX

Abstract

Reinforcement learning for partially observable Markov decision problems is a challenge as it requires policies with an internal state. Traditional approaches suffer significantly from this shortcoming and usually make strong assumptions on the problem domain such as perfect system models, state-estimators and a Markovian hidden system. Recurrent neural networks offer a natural framework for dealing with policy learning using hidden state and require only few limiting assumptions. As they can be trained well using gradient descent, they are suited for policy gradient approaches. In this paper, we present a policy gradient method, the Recurrent Policy Gradient which constitutes a model-free reinforcement learning method. It is aimed at training limited-memory stochastic policies on problems which require long-term memories of past observations. The approach involves approximating a policy gradient for a recurrent neural network by backpropagating return-weighted characteristic eligibilities through time. Using a ‘‘Long Short-Term Memory’’ RNN architecture, we are able to outperform previous RL methods on three important benchmark tasks. Furthermore, we show that using history-dependent baselines helps reducing estimation variance significantly, thus enabling our approach to tackle more challenging, highly stochastic environments

Author's Profile

Alexander Förster

Keywords

Add keywords

Reprint years

DOI

10.1093/jigpal/jzp049

Other Versions

No versions found

Links

PhilArchive

This entry is not archived by us. If you are the author and have permission from the publisher, we recommend that you archive it. Many publishers automatically grant permission to authors to archive pre-prints. By uploading a copy of your work, you will enable us to better index it, making it easier to find.

Upload a copy of this work Papers currently archived: 106,894

External links

Setup an account with your affiliations in order to access resources via your University's proxy server

Through your library

Sign in / register and customize your OpenURL resolver
Configure custom resolver

My notes

Analytics

Added to PP
2015-02-04

Downloads
14 (#1,385,309)

6 months
1 (#1,601,934)

Historical graph of downloads

How can I increase my downloads?

Author's Profile

Alexander Förster

Citations of this work

No citations found.

Add more citations

References found in this work

No references found.

Add more references

Applied ethics	Epistemology	History of Western Philosophy	Meta-ethics	Metaphysics	Normative ethics
Philosophy of biology	Philosophy of language	Philosophy of mind	Philosophy of religion	Science Logic and Mathematics	More ...

Recurrent policy gradients

Abstract

Author's Profile

Categories

Keywords

Reprint years

DOI

Other Versions

Links

PhilArchive

External links

Through your library

My notes

Similar books and articles

Analytics

Author's Profile

Citations of this work

References found in this work