Pruning dominated policies in multiobjective Pareto Q-learning

Abstract

The solution to a multiobjective reinforcement learning problem is a set of Pareto-optimal policies. MPQ-learning is a recent algorithm that approximates the whole set of Pareto-optimal deterministic policies by directly generalizing Q-learning to the multiobjective setting. In this paper we present a modification of MPQ-learning that avoids useless cyclical policies and thus reduces the number of training steps required for convergence.
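
To make the notion of a Pareto-optimal set concrete, the following minimal sketch filters a list of vector-valued returns down to its non-dominated members. It illustrates Pareto dominance only and is not MPQ-learning or the modification presented in the paper. The names `dominates`, `pareto_front`, and `candidates` are hypothetical, and the sketch assumes every objective is to be maximized.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """Return True if `a` Pareto-dominates `b` (assumption: maximization).

    `a` dominates `b` when it is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(vectors: List[Vector]) -> List[Vector]:
    """Keep only the vectors that no other vector dominates."""
    unique = list(dict.fromkeys(vectors))  # drop duplicates, keep order
    return [v for v in unique if not any(dominates(u, v) for u in unique)]


# Hypothetical two-objective returns of five candidate policies.
candidates = [(1.0, 5.0), (2.0, 4.0), (1.0, 4.0), (3.0, 1.0), (2.0, 4.0)]
front = pareto_front(candidates)
print(front)
```

Here `(1.0, 4.0)` is discarded because `(1.0, 5.0)` is at least as good in both objectives and strictly better in the second. The remaining vectors are mutually incomparable trade-offs.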
