Metaheuristics-based Exploration Strategies for Multi-Objective Reinforcement Learning

Published in Proceedings of the 14th International Conference on Agents and Artificial Intelligence, 2022

Abstract

The fields of Reinforcement Learning (RL) and Optimization both aim to find an optimal solution to a problem characterized by an objective function. The exploration-exploitation dilemma (EED) is a well-known subject in these fields: a substantial body of literature has already addressed it and shown that handling it properly is essential to achieving good performance. Yet many real-world problems involve the optimization of multiple objectives. Multi-Policy Multi-Objective Reinforcement Learning (MPMORL) offers a way to learn a variety of optimised behaviours for the agent in such problems. This work introduces a modular framework for the learning phase of such algorithms, which eases the study of the EED in Inner-Loop MPMORL algorithms. We present three new exploration strategies inspired by the metaheuristics domain. To assess the performance of our methods across environments, we use a classical benchmark, the Deep Sea Treasure (DST), and also propose a harder version of it. Our experiments show that all of the proposed strategies outperform the current state-of-the-art ε-greedy based methods on the studied benchmarks.
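The abstract names ε-greedy action selection as the state-of-the-art baseline against which the proposed metaheuristics-based strategies are compared. As a point of reference only, the sketch below shows ε-greedy selection over a scalarized multi-objective Q-table in Python; the weighted-sum scalarization, the `epsilon_greedy` helper, and all variable names are illustrative assumptions and do not reproduce the paper's implementation or its metaheuristics-based strategies.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values: np.ndarray, epsilon: float) -> int:
    """With probability epsilon, pick a uniformly random action (explore);
    otherwise pick the action with the highest value estimate (exploit).

    q_values: per-action value estimates for the current state, e.g. a
    scalarization of the multi-objective Q-vectors (assumption for this sketch).
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))              # exploit

# Example: scalarize a 2-objective Q-table row with a hypothetical
# preference weight vector, then select an action.
weights = np.array([0.7, 0.3])           # preference over the two objectives
q_vectors = np.array([[1.0, 0.2],        # Q-vector of action 0 (one row per action)
                      [0.4, 0.9],
                      [0.6, 0.5]])
scalar_q = q_vectors @ weights           # weighted-sum scalarization
action = epsilon_greedy(scalar_q, epsilon=0.1)
```

The exploration step here is purely uniform random; the strategies proposed in the paper replace this kind of ε-greedy rule with metaheuristics-inspired alternatives.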

Recommended citation: Felten, F., Danoy, G., Talbi, E. and Bouvry, P. (2022). Metaheuristics-based Exploration Strategies for Multi-Objective Reinforcement Learning. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-547-0; ISSN 2184-433X, pages 662-673. DOI: 10.5220/0010989100003116.