Browse LCC - Articles by author "Mandow-Andaluz, Lorenzo"
Showing items 1-6 of 6
An evaluation of best compromise search in graphs
Machuca, Enrique; Mandow-Andaluz, Lorenzo; Galand, Lucie (Springer, 2013-09) This work evaluates two different approaches for multicriteria graph search problems using compromise preferences. This approach focuses search on a single solution that represents a balanced tradeoff between objectives, ...
Multi-objective dynamic programming with limited precision
Mandow-Andaluz, Lorenzo; Pérez-de-la-Cruz-Molina, José Luis; Pozas García, Nicolás (Springer, 2021-11-02) This paper addresses the problem of approximating the set of all solutions for Multi-objective Markov Decision Processes. We show that in the vast majority of interesting cases, the number of solutions is exponential or ...
PQ-learning: aprendizaje por refuerzo multiobjetivo [PQ-learning: multi-objective reinforcement learning]
In this article we describe and analyze PQ-learning, an algorithm for multi-objective reinforcement learning problems. The algorithm is an extension of Q-learning, an algorithm for learning problems ...
Pruning dominated policies in multiobjective Pareto Q-learning
Mandow-Andaluz, Lorenzo; Pérez-de-la-Cruz-Molina, José Luis (2019-10-18) The solution for a Multi-Objective Reinforcement Learning problem is a set of Pareto optimal policies. MPQ-learning is a recent algorithm that approximates the whole set of all Pareto-optimal deterministic policies by ...
Randomness and control in design processes: an empirical study with architecture students.
Belmonte-Martínez, María Victoria; Millán-Valldeperas, Eva; Ruiz-Montiel, Manuela; Badillo, Reyes; Boned-Purkiss, Francisco Javier; Mandow-Andaluz, Lorenzo; Pérez-de-la-Cruz-Molina, José Luis [et al.] (2014-02-12) The aim of this study is to explore designers' preferences between randomness and control in the generation of architectural forms. To this end, a generative computer tool was implemented that allows both random and ...
A temporal difference method for multi-objective reinforcement learning
This work describes MPQ-learning, a temporal-difference method that approximates the set of all non-dominated policies in multi-objective Markov decision problems, where rewards are vectors and each component stands for ...