gApp: a text preprocessing system to improve the neural machine translation of discontinuous multiword expressions

Loading...
Thumbnail Image

Identifiers

Publication date

Reading date

Authors

Hidalgo Ternero, Carlos Manuel
Zhou Lian, Xiaoqing

Collaborators

Advisors

Tutors

Editors

Journal Title

Journal ISSN

Volume Title

Publisher

Metrics

Google Scholar

Share

Research Projects

Organizational Units

Journal Issue

Department/Institute

Abstract

In this paper we present research results with gApp, a text-preprocessing system designed for automati-cally detecting and converting discontinuous multiword expressions (MWEs) into their continuous forms so as to improve the performance of current neural machine translation systems (NMT) (see Hidalgo-Ternero, 2021 & 2022, Hidalgo-Ternero & Corpas Pastor, 2020, 2022a & 2022b, Hidalgo-Ternero, Lista, and Corpas Pastor, 2022, and Hidalgo-Ternero and Zhou-Lian, 2022a & 2022b). To test its effectiveness, eight experiments with several NMT systems such as DeepL, Google Translate, ModernMT and VIP have been carried out in different language directionalities (ES/FR/IT > ES/EN/DE/FR/IT/PT/ZH) for the trans-lation of somatisms, i.e., MWEs containing lexemes referring to human or animal body parts (Mellado Blanco, 2004). More specifically, we have analysed both flexible verb-noun idiomatic constructions (VNICs) and flexible verb + prepositional phrase (VPP) constructions. In this regard, the promising results obtained for these typologies of MWEs throughout experiments 1-8 will shed some light on new avenues for enhancing MWE-aware NMT systems.

Description

Bibliographic citation

Endorsement

Review

Supplemented By

Referenced by

Creative Commons license

Except where otherwised noted, this item's license is described as Atribución 4.0 Internacional