A framework for assessing the capabilities of code generation of constraint domain-specific languages with large language models

Delgado, David; Burgueño-Caballero, Lola; Clarisó, Robert

doi:10.1016/j.jss.2026.112871


Identifiers

URI: https://hdl.handle.net/10630/46312

ISSN: 0164-1212

DOI: 10.1016/j.jss.2026.112871

Publication date

2026

Authors

Delgado, David

Burgueño-Caballero, Lola

Clarisó, Robert

Publisher

Elsevier

Center

E.T.S.I. Informática

Department/Institute

Lenguajes y Ciencias de la Computación

Keywords

Logic programming languages
Software engineering
Artificial intelligence
Quality

Abstract

Large language models (LLMs) can be used to support software development tasks, e.g., through code completion or code generation. However, their effectiveness drops significantly for less popular programming languages such as domain-specific languages (DSLs). In this paper, we propose a generic framework for evaluating the capabilities of LLMs in generating DSL code from textual specifications. The generated code is assessed from the perspectives of well-formedness and correctness. The framework is applied to a particular type of DSL, constraint languages, focusing our experiments on OCL and Alloy and comparing their results to those achieved for Python, a popular general-purpose programming language. Experimental results show that, in general, LLMs perform better for Python than for OCL and Alloy. LLMs with smaller context windows, such as open-source LLMs, may be unable to generate constraint-related code, as this requires managing both the constraint and the domain model where it is defined. Moreover, some refinements to the code generation process, such as code repair (asking an LLM to fix incorrect code) or multiple attempts (generating several candidates for each coding task), can raise the quality of the generated code. Meanwhile, other decisions, like the choice of a prompt template, have less impact. All these dimensions can be systematically analyzed using our evaluation framework, making it possible to determine the most effective way to set up code generation for a particular type of task.
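The evaluation loop described in the abstract (generate a candidate, check well-formedness, check correctness, optionally ask for a repair, and allow several attempts) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' framework: it targets Python rather than OCL or Alloy, uses ast.parse as the well-formedness check and hand-written test cases as the correctness oracle, and replaces the LLM with plain callables. The names Task, is_well_formed, is_correct, evaluate, generate and repair are hypothetical. The toy constraint "age must be non-negative" mirrors an OCL invariant such as context Person inv: self.age >= 0.

```python
import ast
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Task:
    """Hypothetical coding task: a textual spec plus test cases for correctness."""
    spec: str
    func_name: str
    tests: List[Tuple[tuple, object]]  # (arguments, expected result)

def is_well_formed(code: str) -> Optional[str]:
    """Return None if the code parses, otherwise the syntax error message."""
    try:
        ast.parse(code)
        return None
    except SyntaxError as err:
        return str(err)

def is_correct(code: str, task: Task) -> bool:
    """Run the candidate and compare its results on every test case."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # assumption: candidates are trusted or sandboxed
        fn = namespace[task.func_name]
        return all(fn(*args) == expected for args, expected in task.tests)
    except Exception:
        return False

def evaluate(task: Task,
             generate: Callable[[str], str],
             repair: Callable[[str, str, str], str],
             attempts: int = 3,
             repair_rounds: int = 1) -> dict:
    """Try up to `attempts` fresh candidates, each with up to `repair_rounds` fixes."""
    well_formed = False
    for _ in range(attempts):
        code = generate(task.spec)
        for round_ in range(repair_rounds + 1):
            error = is_well_formed(code)
            if error is None:
                well_formed = True
                if is_correct(code, task):
                    return {"well_formed": True, "correct": True}
                error = "wrong result on test cases"
            if round_ < repair_rounds:
                # Feed the diagnostic back to the (stubbed) LLM for a repair.
                code = repair(task.spec, code, error)
    return {"well_formed": well_formed, "correct": False}

# Usage with a stubbed "LLM": the first candidate misses a colon, the repair fixes it.
task = Task("Return True if age is non-negative", "valid_age",
            [((30,), True), ((-1,), False)])
broken = "def valid_age(age)\n    return age >= 0"
fixed = "def valid_age(age):\n    return age >= 0"
wrong = "def valid_age(age):\n    return True"

print(evaluate(task, generate=lambda s: broken,
               repair=lambda s, c, e: fixed, attempts=1, repair_rounds=1))
```

Separating the syntactic check from the test-based check mirrors the abstract's distinction between well-formedness and correctness, so a candidate such as wrong above counts as well-formed but incorrect.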

Bibliographic citation

David Delgado, Lola Burgueño, Robert Clarisó, A framework for assessing the capabilities of code generation of constraint domain-specific languages with large language models, Journal of Systems and Software, Volume 238, 2026, 112871, ISSN 0164-1212, https://doi.org/10.1016/j.jss.2026.112871.

Collections

Articles

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial 4.0 International
