A framework for assessing the capabilities of code generation of constraint domain-specific languages with large language models

Delgado, David; Burgueño-Caballero, Lola; Clarisó, Robert

doi:10.1016/j.jss.2026.112871

dc.centro	E.T.S.I. Informática
dc.contributor.author	Delgado, David
dc.contributor.author	Burgueño-Caballero, Lola
dc.contributor.author	Clarisó, Robert
dc.date.accessioned	2026-04-09T10:02:31Z
dc.date.issued	2026
dc.departamento	Lenguajes y Ciencias de la Computación
dc.description.abstract	Large language models (LLMs) can be used to support software development tasks, e.g., through code completion or code generation. However, their effectiveness drops significantly when considering less popular programming languages such as domain-specific languages (DSLs). In this paper, we propose a generic framework for evaluating the capabilities of LLMs generating DSL code from textual specifications. The generated code is assessed from the perspectives of well-formedness and correctness. This framework is applied to a particular type of DSL, constraint languages, focusing our experiments on OCL and Alloy and comparing their results to those achieved for Python, a popular general-purpose programming language. Experimental results show that, in general, LLMs have better performance for Python than for OCL and Alloy. LLMs with smaller context windows such as open-source LLMs may be unable to generate constraint-related code, as this requires managing both the constraint and the domain model where it is defined. Moreover, some improvements to the code generation process such as code repair (asking an LLM to fix incorrect code) or multiple attempts (generating several candidates for each coding task) can improve the quality of the generated code. Meanwhile, other decisions like the choice of a prompt template have less impact. All these dimensions can be systematically analyzed using our evaluation framework, making it possible to decide the most effective way to set up code generation for a particular type of task.
dc.description.sponsorship	Ministerio de Ciencia, Innovación y Universidades
dc.description.sponsorship	Funding for open access charge: Universidad de Málaga / CBUA
dc.description.sponsorship	Universitat Oberta de Catalunya
dc.identifier.citation	David Delgado, Lola Burgueño, Robert Clarisó, A framework for assessing the capabilities of code generation of constraint domain-specific languages with large language models, Journal of Systems and Software, Volume 238, 2026, 112871, ISSN 0164-1212, https://doi.org/10.1016/j.jss.2026.112871.
dc.identifier.doi	10.1016/j.jss.2026.112871
dc.identifier.issn	0164-1212
dc.identifier.uri	https://hdl.handle.net/10630/46312
dc.language.iso	eng
dc.publisher	Elsevier
dc.rights	Attribution-NonCommercial 4.0 International	en
dc.rights.accessRights	open access
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/
dc.subject	Logic programming languages
dc.subject	Software engineering
dc.subject	Artificial intelligence
dc.subject	Quality
dc.subject.other	Large language models
dc.subject.other	Domain-specific languages
dc.subject.other	Framework
dc.subject.other	Code generation
dc.subject.other	Quality
dc.title	A framework for assessing the capabilities of code generation of constraint domain-specific languages with large language models
dc.type	journal article
dc.type.hasVersion	VoR
dspace.entity.type	Publication
relation.isAuthorOfPublication	31808e70-d2ec-4318-8ead-dded38954d40
relation.isAuthorOfPublication.latestForDiscovery	31808e70-d2ec-4318-8ead-dded38954d40

Files

Original bundle

Name: 1-s2.0-S0164121226001044-main.pdf
Size: 9.26 MB
Format: Adobe Portable Document Format

Collections

Articles