OneSpace: Detecting cross-language clones by learning a common embedding space

Publisher

Elsevier

Abstract

Identifying cloned code fragments across different languages can enhance the productivity of software developers in several ways. However, clone detection is usually studied in the context of a single language and is less explored for code snippets that span different languages. In this paper, we present OneSpace, a new cross-language clone detection approach. OneSpace projects different programming languages into the same embedding space using both code and API data. OneSpace then leverages a Siamese network to infer the similarity of the embedded programs. We evaluate OneSpace by detecting clones across three language pairs: Java-Python, Java-C++, and Java-C. We compare OneSpace with two other state-of-the-art techniques, SupLearn and CLCDSA. In our evaluation, OneSpace provided higher effectiveness than the state of the art. Our ablation study validated some of our intuitions in designing OneSpace, particularly that using a single embedding space (as opposed to separate ones) provides higher effectiveness. Additionally, we designed a variant of OneSpace that uses the Word Mover's Distance algorithm; it provides lower effectiveness but is much more efficient. We also found that OneSpace provides higher effectiveness than the state of the art across complex implementations, single-method implementations, varying ratios of positive to negative clones in training, varying amounts of training data, and additional programming languages.
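
As a rough illustration of the idea of a common embedding space, the sketch below places tokens from two languages in one vector space and compares two programs by cosine similarity. It is not the OneSpace implementation: the token vectors are made-up values rather than learned embeddings, the names `SHARED_EMBEDDINGS`, `embed_program`, and `cosine_similarity` are hypothetical, and plain averaging plus cosine similarity stands in for the Siamese network described in the abstract.

```python
import math

# Hypothetical toy embeddings: tokens from Java and Python placed in ONE
# shared vector space. Real embeddings would be learned from code and API
# data; these numbers are invented for illustration only.
SHARED_EMBEDDINGS = {
    "System.out.println": [0.90, 0.10, 0.00],  # Java
    "print":              [0.88, 0.12, 0.02],  # Python
    "for":                [0.10, 0.90, 0.10],  # both languages
    "ArrayList.add":      [0.00, 0.20, 0.90],  # Java
    "list.append":        [0.05, 0.18, 0.92],  # Python
}


def embed_program(tokens):
    """Average the shared-space vectors of the known tokens in a program."""
    vectors = [SHARED_EMBEDDINGS[t] for t in tokens if t in SHARED_EMBEDDINGS]
    if not vectors:
        raise ValueError("no known tokens to embed")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


java_program = ["for", "System.out.println", "ArrayList.add"]
python_program = ["for", "print", "list.append"]

similarity = cosine_similarity(
    embed_program(java_program), embed_program(python_program)
)
print(f"cross-language similarity: {similarity:.3f}")
```

Because both programs are embedded in the same space, their vectors can be compared directly without a per-language translation step; a trained model would replace both the hand-written vectors and the fixed similarity function.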

Bibliographic citation

Mohammed El Arnaoty, Francisco Servant, OneSpace: Detecting cross-language clones by learning a common embedding space, Journal of Systems and Software, Volume 208, 2024, 111911, ISSN 0164-1212, DOI: https://doi.org/10.1016/j.jss.2023.111911

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International