Planning and learning for weakly-coupled distributed agents

Stochastic planning has gained popularity over classical planning in recent years by offering principled ways to model the uncertainty inherent in many real-world applications. Markov decision process (MDP) based approaches are particularly well suited to problems that involve sequential decision making. Partially observable stochastic games (POSGs) are a multi-agent extension of the MDP framework to decentralized decision making. While elegant and expressive, this model has been shown to be intractable. Many real-world problems exhibit additional structure that can be leveraged for computational gain. The focus of this thesis is on a class of problems in which the agents are largely independent but may interact by constraining each other's actions. The best-known solution concepts in game theory are computing a Nash equilibrium and performing strategy elimination; this dissertation addresses both. First, we show that finding a Nash equilibrium where each agent achieves reward at least k is NP-hard. Then, we present the Multi-Agent Action Elimination (MAAE) algorithm, which performs iterated elimination of dominated strategies. We show experimentally that it is able to solve much larger problems than the general-purpose algorithm by pruning a large portion of the policy space. In situations where the agent does not have a model of the environment, it has to learn to act with just the reward as feedback. In this dissertation, we explore reinforcement learning based approaches. In particular, we develop the Bound-Based Exploration with Soft Pruning (BBESP) algorithm, a Q-learning based algorithm that carries over many of the key ideas from our planning algorithm. It uses learned bounds to perform soft pruning of actions as well as to focus an agent's exploration. We show that this approach learns faster than traditional Q-learning on this class of problems.
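
The record itself contains no code, but to make the bound-based soft-pruning idea from the abstract concrete, the following is a minimal, hypothetical Python sketch of tabular Q-learning in which actions whose learned upper bound falls below the best learned lower bound are explored only rarely. The toy chain MDP, the bound-update rule, and the pruning probability are illustrative assumptions made for this sketch; it is not the thesis's actual MAAE or BBESP algorithm.

import random

N_STATES, N_ACTIONS = 5, 3
GAMMA, ALPHA, EPISODES = 0.9, 0.1, 500
R_MIN, R_MAX = 0.0, 1.0
V_MIN, V_MAX = R_MIN / (1 - GAMMA), R_MAX / (1 - GAMMA)

def step(s, a):
    # Toy chain MDP (an assumption for this sketch): action 0 moves right,
    # other actions stay put; reaching the last state gives reward 1 and ends.
    s2 = min(s + 1, N_STATES - 1) if a == 0 else s
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

# Point estimate of Q(s, a) plus optimistic (hi) and pessimistic (lo) bounds.
Q    = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
Q_hi = [[V_MAX] * N_ACTIONS for _ in range(N_STATES)]
Q_lo = [[V_MIN] * N_ACTIONS for _ in range(N_STATES)]

def select_action(s, eps=0.1, pruned_prob=0.01):
    # Soft pruning: actions whose upper bound is below the best lower bound are
    # not removed outright, only explored with a much smaller probability.
    best_lo = max(Q_lo[s])
    active = [a for a in range(N_ACTIONS) if Q_hi[s][a] >= best_lo]
    pruned = [a for a in range(N_ACTIONS) if Q_hi[s][a] < best_lo]
    if pruned and random.random() < pruned_prob:
        return random.choice(pruned)
    if random.random() < eps:
        return random.choice(active)
    return max(active, key=lambda a: Q[s][a])

for _ in range(EPISODES):
    s, done = 0, False
    while not done:
        a = select_action(s)
        s2, r, done = step(s, a)
        # Standard Q-learning backups; the bounds are backed up the same way
        # from their own tables, so Q_lo <= Q <= Q_hi is preserved.
        nxt, nxt_hi, nxt_lo = (0.0, 0.0, 0.0) if done else (max(Q[s2]), max(Q_hi[s2]), max(Q_lo[s2]))
        Q[s][a]    += ALPHA * (r + GAMMA * nxt    - Q[s][a])
        Q_hi[s][a] += ALPHA * (r + GAMMA * nxt_hi - Q_hi[s][a])
        Q_lo[s][a] += ALPHA * (r + GAMMA * nxt_lo - Q_lo[s][a])
        s = s2

print("greedy action in start state:", max(range(N_ACTIONS), key=lambda a: Q[0][a]))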

Detailed description

Main author: Guo, Anyuan
Format: Thesis
Degree: Doctor of Philosophy
Language: en_US
Published: University of Massachusetts Amherst, 2008
Online access: http://ir.vnulib.edu.vn/handle/123456789/1714