Extension of the Rational Policy Making Algorithm to Continuous-Valued Inputs

Transactions of the Japanese Society for Artificial Intelligence 22 (3):332-341 (2007)

Abstract

Reinforcement learning is a kind of machine learning. Profit Sharing, the Rational Policy Making algorithm (RPM), the Penalty Avoiding Rational Policy Making algorithm, and PS-r* are known to guarantee rationality in a typical class of Partially Observable Markov Decision Processes. However, they cannot handle continuous state spaces. In this paper, we present a way to adapt them to continuous state spaces. We give RPM a mechanism for handling continuous state spaces in environments that have only one type of reward. We show the effectiveness of the proposed method through numerical examples.

Similar books and articles

Extension of Profit Sharing to Partially Observable Environments: Proposal and Evaluation of PS-r*. Kobayashi Shigenobu, Miyazaki Kazuteru - 2003 - Transactions of the Japanese Society for Artificial Intelligence 18:286-296.
Learning Rational Policies That Avoid Penalties. 坪井 創吾, 宮崎 和光 - 2001 - Transactions of the Japanese Society for Artificial Intelligence 16 (2):185-192.
Improvement of the Penalty Avoiding Policy Making Algorithm and Its Application to the Game of Othello. 坪井 創吾, 宮崎 和光 - 2002 - Transactions of the Japanese Society for Artificial Intelligence 17:548-556.
Q-Learning with Dynamic Generation of the Search Space by GA. Matsuno Fumitoshi, Ito Kazuyuki - 2001 - Transactions of the Japanese Society for Artificial Intelligence 16:510-520.
A Study of the Reinforcement Function in the Profit Sharing Method. Tatsumi Shoji, Uemura Wataru - 2004 - Transactions of the Japanese Society for Artificial Intelligence 19:197-203.
A Profit Sharing Method That Does Not Cling to Experience. Ueno Atsushi, Uemura Wataru - 2006 - Transactions of the Japanese Society for Artificial Intelligence 21:81-93.
Introducing a Hierarchical Decision-Making Method into Reinforcement Learning Agents: The Pursuit Problem as an Example. 輿石 尚宏, 謙吾 片山 - 2004 - Transactions of the Japanese Society for Artificial Intelligence 19:279-291.
Profit Sharing Incorporating a Method for Judging Incomplete Perception. Masuda Shiro, Saito Ken - 2004 - Transactions of the Japanese Society for Artificial Intelligence 19:379-388.
A Reinforcement Learning Method Using a Temperature Distribution Based on Likelihood Information. 鈴木 健嗣, 小堀 訓成 - 2005 - Transactions of the Japanese Society for Artificial Intelligence 20:297-305.
Acquisition of Walking Motion for a Multi-Legged Robot by QDSEGA. Matsuno Fumitoshi, Ito Kazuyuki - 2002 - Transactions of the Japanese Society for Artificial Intelligence 17:363-372.
