Metaheuristics-based Exploration Strategies for Multi-Objective Reinforcement Learning

Published in Proceedings of the 14th International Conference on Agents and Artificial Intelligence, 2022

Recommended citation: Felten, F., Danoy, G., Talbi, E. and Bouvry, P. (2022). Metaheuristics-based Exploration Strategies for Multi-Objective Reinforcement Learning. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-547-0; ISSN 2184-433X, pages 662-673. DOI: 10.5220/0010989100003116. https://www.scitepress.org/Link.aspx?doi=10.5220/0010989100003116

Abstract

The fields of Reinforcement Learning (RL) and Optimization aim to find an optimal solution to a problem, characterized by an objective function. The exploration-exploitation dilemma (EED) is a well-known subject in those fields. Indeed, a substantial body of literature has already addressed the subject and shown that it must be carefully considered in order to achieve good performance. Yet many real-world problems involve the optimization of multiple objectives. Multi-Policy Multi-Objective Reinforcement Learning (MPMORL) offers a way to learn various optimised behaviours for the agent in such problems. This work introduces a modular framework for the learning phase of such algorithms, which eases the study of the EED in inner-loop MPMORL algorithms. We present three new exploration strategies inspired by the metaheuristics domain. To assess the performance of our methods on various environments, we use a classical benchmark, the Deep Sea Treasure (DST), and also propose a harder version of it. Our experiments show that all of the proposed strategies outperform the current state-of-the-art ε-greedy based methods on the studied benchmarks.
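
For readers unfamiliar with the ε-greedy baseline the paper compares against, below is a minimal, illustrative sketch of ε-greedy action selection over a linearly scalarized multi-objective Q-table. The function name, weights, and dimensions are assumptions chosen for illustration only; this is not the paper's implementation or its proposed metaheuristics-based strategies.

```python
import numpy as np

def epsilon_greedy_action(q_values, weights, epsilon, rng=None):
    """Pick an action from a multi-objective Q-table row.

    q_values : array of shape (n_actions, n_objectives) for the current state.
    weights  : array of shape (n_objectives,) for linear scalarization
               (a common choice in inner-loop MORL; values are illustrative).
    epsilon  : exploration probability.
    """
    rng = rng or np.random.default_rng()
    n_actions = q_values.shape[0]
    if rng.random() < epsilon:
        # Explore: pick a uniformly random action.
        return int(rng.integers(n_actions))
    # Exploit: scalarize each action's vector-valued estimate and take the argmax.
    scalarized = q_values @ weights
    return int(np.argmax(scalarized))

# Example usage with hypothetical values (4 actions, 2 objectives).
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 2))   # vector-valued Q estimates for one state
w = np.array([0.7, 0.3])      # scalarization weights (illustrative)
print(epsilon_greedy_action(q, w, epsilon=0.1, rng=rng))
```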