Corpus sense: A comprehensive tool for advanced text and discourse exploration

Files

1-s2.0-S2666799125000280-main.pdf (7.2 MB)

Description: Main article

Publisher

Elsevier

Abstract

Corpus Sense is a web application for content and discourse analysis, designed to facilitate the exploration, analysis and visualization of linguistic corpora. It incorporates some advanced functionalities not available in existing software. The tool combines quantitative, qualitative and AI-powered features so that users can obtain useful insights with minimal effort. It is designed for small to medium-sized corpora (currently up to 2.5 million tokens), permits online corpus sharing, and offers distinctive functionalities such as NLP-based keyword extraction, named entity recognition, semantic search and advanced topic modelling with interpretable, LLM-generated labels. The interface is kept simple and intuitive to make the application accessible to a wide range of user profiles. This paper provides a comprehensive overview of the application's development, architecture and applications in corpus linguistics and discourse analysis research, complemented by a discussion of how novel NLP-based and AI-assisted tools are integrated with traditional corpus analysis methods.

Bibliographic citation

Antonio Moreno-Ortiz, Corpus sense: A comprehensive tool for advanced text and discourse exploration, Applied Corpus Linguistics, Volume 5, Issue 3, 2025, 100145, ISSN 2666-7991, https://doi.org/10.1016/j.acorp.2025.100145. (https://www.sciencedirect.com/science/article/pii/S2666799125000280)

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International