University of Notre Dame

Cooperative Multi-Agent Reinforcement Learning in Decentralized Networks

Thesis, posted on 2022-07-25, authored by Martin Figura

An optimal decision-making policy satisfies both short-term and long-term objectives. Policy optimization in complex dynamic environments is notoriously difficult due to the absence of succinct mathematical models. Reinforcement learning (RL) leverages data from experiments or simulations to learn optimal policies that, in many domains, have already outperformed policies designed by humans. RL still faces numerous challenges in the multi-agent setting, where the participating agents interact in shared environments. This dissertation addresses two challenges in decentralized cooperative multi-agent reinforcement learning, a training paradigm that offers scalability and privacy guarantees for cooperative agents.


In the first part, we study the behavior of cooperative agents in a network that includes adversarial agents. Adversarial attacks during training can strongly degrade the performance of multi-agent RL algorithms. It is thus highly desirable to augment existing algorithms so that the impact of adversarial attacks on cooperative networks is eliminated, or at least bounded. We introduce a resilient projection-based consensus multi-agent actor-critic algorithm, whereby each agent receives a private reward and communicates with its neighbors to estimate the team-average reward and value function. We show that in the presence of Byzantine agents, whose estimation and communication strategies are arbitrary, the estimates of the cooperative agents converge to a bounded consensus value, provided that the network is (2H+1)-robust and contains at most H Byzantine agents. Furthermore, we prove that the joint cooperative policy converges to a bounded neighborhood around a locally optimal cooperative policy.
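
The dissertation's update is projection-based; as a loose illustration of the trimming idea behind resilient consensus in a (2H+1)-robust network, the following is a minimal W-MSR-style scalar update sketch. The function name, the unweighted averaging, and the example values are assumptions for illustration, not the dissertation's exact algorithm.

```python
def wmsr_step(own_value, neighbor_values, H):
    """One trimmed-mean (W-MSR-style) resilient consensus update, scalar case.

    The agent discards up to H neighbor estimates strictly above its own
    value and up to H strictly below, then averages what remains together
    with its own estimate. In a (2H+1)-robust graph this keeps the
    cooperative agents' estimates bounded despite up to H Byzantine
    neighbors sending arbitrary values.
    """
    above = sorted(v for v in neighbor_values if v > own_value)
    below = sorted(v for v in neighbor_values if v < own_value)
    equal = [v for v in neighbor_values if v == own_value]

    kept = above[:max(len(above) - H, 0)]  # drop the H largest outliers
    kept += below[H:]                      # drop the H smallest outliers
    kept += equal
    kept.append(own_value)
    return sum(kept) / len(kept)


# Example: five cooperative neighbors plus one Byzantine neighbor reporting
# an extreme value; with H = 1 the outlier is discarded before averaging.
estimate = wmsr_step(own_value=1.0,
                     neighbor_values=[0.9, 1.1, 1.2, 0.8, 1.0, 50.0],
                     H=1)
print(estimate)  # about 1.04; the value 50.0 had no influence
```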


In the second part, we consider a fully cooperative network subject to communication delays and packet dropouts. The assumption of disrupted communication is reasonable in online decentralized training, where agents continuously accumulate new experiences from the environment and communicate periodically. We present a multi-agent actor-critic algorithm with TD error aggregation, where the aggregation of TD errors ensures cooperation between the agents. Handling delays and dropouts increases each agent's communication burden, measured by the dimension of the transmitted data; nonetheless, the burden grows only quadratically with the graph size, so the algorithm remains applicable in large networks. We prove analytically that the agents approximately maximize the team-average objective function.
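
As a rough sketch of how TD-error aggregation might look under delayed, lossy communication, the class below keeps the most recently received TD error from each peer and sums them into a team learning signal. The class name, buffer structure, and update rule are illustrative assumptions, not the dissertation's exact algorithm.

```python
import numpy as np

class TDErrorAggregator:
    """Hypothetical helper: aggregate TD errors under unreliable communication.

    Each agent computes a one-step TD error from its private reward and
    stores the latest TD error that actually arrived from every other
    agent (messages may be delayed or dropped). The sum of these entries
    serves as an approximate team-average learning signal.
    """
    def __init__(self, n_agents, agent_id):
        self.agent_id = agent_id
        self.latest_td = np.zeros(n_agents)  # last known TD error per agent

    def local_td_error(self, reward, value, next_value, gamma=0.99):
        # Standard one-step TD error computed from the agent's private reward.
        return reward + gamma * next_value - value

    def receive(self, sender_id, td_error):
        # Overwrite with the newest message that arrived; stale or dropped
        # messages simply leave the previous entry in place.
        self.latest_td[sender_id] = td_error

    def team_signal(self, own_td_error):
        # Aggregate the agent's own TD error with the latest (possibly
        # stale) errors received from the other agents.
        self.latest_td[self.agent_id] = own_td_error
        return self.latest_td.sum()
```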

History

Date Modified

2022-08-08

Defense Date

2022-07-22

CIP Code

  • 14.1001

Research Director(s)

Vijay Gupta

Committee Members

Panos Antsaklis, Ji Liu, Xenofon Koutsoukos

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Alternate Identifier

1339093749

Library Record

6264470

OCLC Number

1339093749

Program Name

  • Electrical Engineering
