University of Notre Dame

Cooperative Multi-Agent Reinforcement Learning in Decentralized Networks

Thesis, posted on 2022-07-25, authored by Martin Figura

An optimal decision-making policy satisfies both short-term and long-term objectives. Policy optimization in complex dynamic environments is notoriously difficult due to the absence of succinct mathematical models. Reinforcement learning (RL) leverages data from experiments or simulations to learn optimal policies that, in many domains, have already outperformed policies designed by humans. RL still faces numerous challenges in the multi-agent setting, where the participating agents interact in shared environments. This dissertation addresses two challenges in decentralized cooperative multi-agent reinforcement learning, a training paradigm that offers scalability and privacy guarantees for cooperative agents.


In the first part, we study the behavior of cooperative agents in a network that includes adversarial agents. Adversarial attacks during training can strongly degrade the performance of multi-agent RL algorithms. It is thus highly desirable to augment existing algorithms so that the impact of adversarial attacks on cooperative networks is eliminated, or at least bounded. We introduce a resilient projection-based consensus multi-agent actor-critic algorithm, whereby each agent receives a private reward and communicates with its neighbors to estimate the team-average reward and value function. We show that in the presence of Byzantine agents, whose estimation and communication strategies are arbitrary, the estimates of the cooperative agents converge to a bounded consensus value, provided that the network is (2H+1)-robust and contains at most H Byzantine agents. Furthermore, we prove that the joint cooperative policy converges to a bounded neighborhood around a locally optimal cooperative policy.
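
The dissertation's update is projection-based; as a loose illustration of the trimming idea behind resilient consensus in a (2H+1)-robust network, the following is a minimal W-MSR-style scalar update sketch. The function name, the unweighted averaging, and the example values are assumptions for illustration, not the dissertation's exact algorithm.

```python
def wmsr_step(own_value, neighbor_values, H):
    """One trimmed-mean (W-MSR-style) resilient consensus update, scalar case.

    The agent discards up to H neighbor estimates strictly above its own
    value and up to H strictly below, then averages what remains together
    with its own estimate. In a (2H+1)-robust graph this keeps the
    cooperative agents' estimates bounded despite up to H Byzantine
    neighbors sending arbitrary values.
    """
    above = sorted(v for v in neighbor_values if v > own_value)
    below = sorted(v for v in neighbor_values if v < own_value)
    equal = [v for v in neighbor_values if v == own_value]

    kept = above[:max(len(above) - H, 0)]  # drop the H largest outliers
    kept += below[H:]                      # drop the H smallest outliers
    kept += equal
    kept.append(own_value)
    return sum(kept) / len(kept)


# Example: five cooperative neighbors plus one Byzantine neighbor reporting
# an extreme value; with H = 1 the outlier is discarded before averaging.
estimate = wmsr_step(own_value=1.0,
                     neighbor_values=[0.9, 1.1, 1.2, 0.8, 1.0, 50.0],
                     H=1)
print(estimate)  # about 1.04; the value 50.0 had no influence
```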


In the second part, we consider a fully cooperative network subject to communication delays and packet dropouts. The assumption of disrupted communication is reasonable in online decentralized training, where agents continuously accumulate new experiences from the environment and communicate periodically. We present a multi-agent actor-critic algorithm with TD error aggregation, where the aggregation of TD errors ensures cooperation between the agents. Handling delays and dropouts increases each agent's communication burden, measured by the dimension of the transmitted data; nonetheless, the burden grows only quadratically with the graph size, so the algorithm remains applicable in large networks. We prove analytically that the agents approximately maximize the team-average objective function.
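
As a rough sketch of how TD-error aggregation might look under delayed, lossy communication, the class below keeps the most recently received TD error from each peer and sums them into a team learning signal. The class name, buffer structure, and update rule are illustrative assumptions, not the dissertation's exact algorithm.

```python
import numpy as np

class TDErrorAggregator:
    """Hypothetical helper: aggregate TD errors under unreliable communication.

    Each agent computes a one-step TD error from its private reward and
    stores the latest TD error that actually arrived from every other
    agent (messages may be delayed or dropped). The sum of these entries
    serves as an approximate team-average learning signal.
    """
    def __init__(self, n_agents, agent_id):
        self.agent_id = agent_id
        self.latest_td = np.zeros(n_agents)  # last known TD error per agent

    def local_td_error(self, reward, value, next_value, gamma=0.99):
        # Standard one-step TD error computed from the agent's private reward.
        return reward + gamma * next_value - value

    def receive(self, sender_id, td_error):
        # Overwrite with the newest message that arrived; stale or dropped
        # messages simply leave the previous entry in place.
        self.latest_td[sender_id] = td_error

    def team_signal(self, own_td_error):
        # Aggregate the agent's own TD error with the latest (possibly
        # stale) errors received from the other agents.
        self.latest_td[self.agent_id] = own_td_error
        return self.latest_td.sum()
```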

History

Date Modified

2022-08-08

Defense Date

2022-07-22

CIP Code

  • 14.1001

Research Director(s)

Vijay Gupta

Committee Members

Panos Antsaklis, Ji Liu, Xenofon Koutsoukos

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Alternate Identifier

1339093749

Library Record

6264470

OCLC Number

1339093749

Program Name

  • Electrical Engineering
