Multi-Objective Reinforcement Learning

Published in Université du Luxembourg [FSTM], 2024

Recommended citation: Florian Felten, ‘Multi-Objective Reinforcement Learning’, in Unilu - Université du Luxembourg [FSTM], Luxembourg. https://hdl.handle.net/10993/61488

The recent surge in artificial intelligence (AI) agents assisting us in daily tasks suggests that these agents can comprehend key aspects of our environment, thereby facilitating better decision-making. Presently, this understanding is predominantly acquired through data-driven learning methods. Notably, reinforcement learning (RL) stands out as a natural framework for agents to acquire behaviors by interacting with their environment and learning from feedback. However, while RL is effective at training agents to optimize a single objective, such as minimizing cost or maximizing performance, it overlooks the inherent complexity of decision-making in real-world scenarios where multiple objectives must be considered simultaneously. Indeed, an essential aspect that remains understudied is the human tendency to make compromises in various situations, influenced by values, circumstances, or mood. This limitation underscores the need for advancements in AI methodologies that address the nuanced trade-offs inherent in human decision-making. Thus, this work explores the extension of RL principles to multi-objective settings, where agents can learn behaviors that balance competing objectives, enabling more adaptable and personalized AI systems.

In the first part of this thesis, we explore the domain of multi-objective reinforcement learning (MORL), a recent technique that enables AI agents to acquire diverse behaviors, associated with different trade-offs, from multiple feedback signals. While MORL is relatively recent, works in this field often draw on knowledge from older fields such as multi-objective optimization (MOO) and RL. Our initial contribution is a comprehensive analysis of the relationships between RL, MOO, and MORL. This examination culminates in a taxonomy for categorizing MORL algorithms, built on concepts derived from these preceding fields.
Building upon this foundational understanding, we investigate the feasibility of leveraging techniques from MOO and RL to enhance MORL methodologies. This exploration yields several contributions. Among these, we introduce the use of metaheuristics to address the exploration-exploitation dilemma in MORL. Additionally, we introduce a versatile framework, rooted in the derived taxonomy, that facilitates the creation of novel MORL algorithms from techniques originating in MOO and RL. Furthermore, our efforts extend to improving the scientific rigor and practical applicability of MORL in real-world scenarios. To this end, we introduce methods and a suite of open-source tools that have become the standard in MORL.

Many real-world situations also involve collaboration among multiple agents to accomplish tasks efficiently. Therefore, the second part of this thesis transitions to settings involving multiple agents, leading to the nascent field of multi-objective multi-agent reinforcement learning (MOMARL). In this domain, as an initial contribution, we release a comprehensive set of open-source utilities aimed at accelerating and establishing a robust foundation for research in this evolving domain. Furthermore, we perform an initial study exploring the transferability of knowledge and methodologies from both MORL and multi-agent RL to the MOMARL setting. Finally, we validate our approach in a real-world application: we automatically learn the coordination of multiple drones with different objectives, harnessing the MOMARL framework to orchestrate their actions effectively. This empirical validation serves as evidence of the viability and versatility of the proposed methodologies in addressing complex real-world challenges.
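To make the central idea concrete, the sketch below illustrates what distinguishes MORL from standard RL: feedback is a *vector* of rewards (one component per objective), and which behavior is "best" depends on how a user weighs the objectives. The policies, their vector returns, and the weights here are purely illustrative assumptions for this page, not values or an API from the thesis; linear scalarization is just one of the simplest ways to express a trade-off.

```python
# Minimal illustration of multi-objective trade-offs via linear scalarization.
# The policies and their vector returns (speed, energy efficiency) are
# hypothetical numbers chosen for the example.

def linear_scalarization(vector_return, weights):
    """Collapse a vector return into a scalar via a weighted sum."""
    return sum(w * r for w, r in zip(weights, vector_return))

# Hypothetical policies, each achieving a different trade-off.
policies = {
    "fast":     (10.0, 2.0),   # high speed, poor efficiency
    "balanced": (6.0, 6.0),
    "frugal":   (2.0, 10.0),   # low speed, high efficiency
}

def best_policy(weights):
    """Return the policy whose scalarized return is highest for these weights."""
    return max(policies, key=lambda p: linear_scalarization(policies[p], weights))

# Different user preferences (weights) select different behaviors.
print(best_policy((0.9, 0.1)))  # a speed-focused user
print(best_policy((0.1, 0.9)))  # an efficiency-focused user
```

Varying the weights recovers different points on the trade-off curve, which is why MORL methods aim to learn a *set* of policies covering diverse compromises rather than a single optimal one.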
