GENERAL INFORMATION
...........................

1. Dataset title: DISPARSA Corpus

2. Authors:
   Perez-Hernández, Chantal
   Fernández-Cruz, Javier
   García-Gámez, María
   Fernández-Melendres, Carla

3. Author contact information: Antonio Moreno-Ortiz (amo@uma.es)


METHODOLOGICAL INFORMATION
.................................

1. Description of the methods for collection/generation of data:
   This is a collection of user-generated reviews of hospitality resources in the province of Malaga (Spain). The original reviews were obtained from Tripadvisor. The corpus was manually annotated for discourse information (functional discourse units), entities, and aspect-based sentiment analysis categories.

   Publications:
   - Moreno-Ortiz, A., & García-Gámez, M. (2025). Corpus Annotation of Functional Discourse Units for Aspect-Based Sentiment Analysis. Corpus Pragmatics. https://doi.org/10.1007/s41701-025-00199-0
   - Fernández-Cruz, J. (2025). Dissecting Economic Op-Eds: An Annotation Schema for Editorials in Quality Newspapers. Corpus Pragmatics, 9 (2), 293-318. https://doi.org/10.1007/s41701-025-00192-7

2. Data processing methods:
   Annotation was performed in Prodigy with a custom schema. The annotation results were further processed to extract statistics and human-readable views, including HTML.

3. Software or instruments needed to interpret the data:
   The annotated data is distributed in JSONL format, as generated by Prodigy. Post-processed output is provided in CSV, HTML and XLSX formats.

4. Standards and calibration information, if appropriate: N/A

5. Environmental or experimental conditions: N/A


FILE OVERVIEW
----------------------

Restaurants/
   restaurants1.jsonl (JSONL, 1.14 MB)
   restaurants1_spans.html (HTML, 1.11 MB)
   restaurants1_spans.xlsx (Excel, 199.06 KB)
   restaurants1_stats.html (HTML, 201.28 KB)
   restaurants1_texts.html (HTML, 162.98 KB)
   restaurants2.jsonl (JSONL, 1.13 MB)
   restaurants2_spans.html (HTML, 1.01 MB)
   restaurants2_spans.xlsx (Excel, 188.26 KB)
   restaurants2_stats.html (HTML, 186.49 KB)
   restaurants2_texts.html (HTML, 162.79 KB)

Hotels/
   hotels.jsonl (JSONL, 1.24 MB)
   hotels_sentences.csv (CSV, 117.20 KB)
   hotels_span_aspects.csv (CSV, 102.30 KB)
   hotels_span_fdus.csv (CSV, 105.06 KB)
   hotels_spans.html (HTML, 1.52 MB)
   hotels_spans.xlsx (Excel, 250.10 KB)
   hotels_stats.html (HTML, 366.39 KB)
   hotels_texts.html (HTML, 158.56 KB)

Monuments/
   monuments.jsonl (JSONL, 3.38 MB)
   monuments_log.txt (Text, 965.00 B)
   monuments_sentences.csv (CSV, 297.24 KB)
   monuments_span_aspects.csv (CSV, 155.43 KB)
   monuments_span_fdus.csv (CSV, 248.09 KB)
   monuments_spans.html (HTML, 3.34 MB)
   monuments_spans.xlsx (Excel, 575.47 KB)
   monuments_stats.html (HTML, 511.27 KB)
   monuments_texts.html (HTML, 655.82 KB)

Cultural_Centers/
   cultural_centers.jsonl (JSONL, 745.87 KB)
   cultural_centers_sentences.csv (CSV, 46.43 KB)
   cultural_centers_span_aspects.csv (CSV, 23.66 KB)
   cultural_centers_span_fdus.csv (CSV, 40.15 KB)
   cultural_centers_spans.html (HTML, 769.31 KB)
   cultural_centers_spans.xlsx (Excel, 127.46 KB)
   cultural_centers_stats.html (HTML, 130.84 KB)
   cultural_centers_texts.html (HTML, 121.01 KB)

Parks/
   parks.jsonl (JSONL, 1.28 MB)
   parks_sentences.csv (CSV, 142.73 KB)
   parks_span_aspects.csv (CSV, 62.85 KB)
   parks_span_fdus.csv (CSV, 101.91 KB)
   parks_spans.html (HTML, 1.06 MB)
   parks_spans.xlsx (Excel, 198.67 KB)
   parks_stats.html (HTML, 149.40 KB)
   parks_texts.html (HTML, 181.44 KB)


DATA-SPECIFIC INFORMATION:
-------------------------------------------

1. Filename: *.jsonl (all annotation files follow the same format)

1.1. Variables list:
   _input_hash
   _task_hash
   _timestamp
   _view_id
   answer
   meta
   meta.date
   meta.eval_entity_type
   meta.genre
   meta.name
   meta.province
   meta.score
   meta.source
   meta.url
   spans
   spans.end
   spans.label
      [ "ASP:ACCESSIBILITY", "ASP:AIR_CONDITIONING", "ASP:ATMOSPHERE", "ASP:BATHROOM",
        "ASP:BED", "ASP:CLEANLINESS", "ASP:COMFORT", "ASP:DESIGN", "ASP:ENTERTAINMENT",
        "ASP:FACILITIES", "ASP:FOOD_DRINKS", "ASP:GARDENS", "ASP:GENERAL", "ASP:INFO",
        "ASP:INTERNET", "ASP:LOCATION", "ASP:MAINTENANCE", "ASP:PARKING", "ASP:POOL",
        "ASP:PRICE", "ASP:QUIETNESS", "ASP:RECEPTION", "ASP:ROOMS", "ASP:SIZE",
        "ASP:STAFF", "ASP:VIEWS", "ASP:WELLNESS",
        "ENT:EVAL_ENTITY", "ENT:OP_HOLDER", "ENT:OTHER_EVAL_ENTITY", "ENT:READER",
        "FDU:ACTIVITY", "FDU:ADVICE", "FDU:CONTEXT", "FDU:DESCRIPTION",
        "FDU:EVAL_NEG", "FDU:EVAL_POS",
        "LEX:NEG", "LEX:NEG\u201d", "LEX:POS" ]
   spans.start
   spans.token_end
   spans.token_start
   text
   tokens
   tokens.end
   tokens.id
   tokens.start
   tokens.text
   tokens.ws

1.3. Special formats or abbreviations used:
   ASP: ASPECT
   ENT: ENTITY
   FDU: FUNCTIONAL DISCOURSE UNIT
   LEX: LEXICAL UNIT
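
Example of reading the annotation files (illustrative only, not part of the original processing pipeline): the sketch below loads one *.jsonl file with the Python standard library, recovers the surface text of each annotated span, and counts labels per category prefix (ASP, ENT, FDU, LEX). It assumes that spans.start and spans.end are character offsets into the text field, as is usual for Prodigy output. The function names are hypothetical, and the path must be adjusted to wherever the dataset was unpacked.

```python
import json
from collections import Counter
from pathlib import Path


def iter_records(lines):
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def labelled_spans(record):
    """Yield (label, surface text) pairs for one review.

    Assumption: spans.start / spans.end are character offsets into `text`.
    """
    text = record.get("text", "")
    for span in record.get("spans") or []:
        yield span["label"], text[span["start"]:span["end"]]


def count_labels(records):
    """Return span counts per full label and per prefix (ASP, ENT, FDU, LEX)."""
    by_label = Counter()
    by_prefix = Counter()
    for record in records:
        for label, _ in labelled_spans(record):
            by_label[label] += 1
            by_prefix[label.split(":", 1)[0]] += 1
    return by_label, by_prefix


if __name__ == "__main__":
    # Path relative to the unpacked dataset; adjust as needed.
    path = Path("Hotels/hotels.jsonl")
    if path.exists():
        with path.open(encoding="utf-8") as handle:
            by_label, by_prefix = count_labels(iter_records(handle))
        print(by_prefix.most_common())
        print(by_label.most_common(10))
```

Grouping by the text before the colon means that the label "LEX:NEG\u201d" listed above is counted under LEX together with "LEX:NEG" and "LEX:POS", while still appearing as a separate entry in the per-label counts.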