From words to visuals: a transformer-based multi-modal framework for emotion-driven tourism analytics

dc.centro: Facultad de Ciencias Económicas y Empresariales
dc.contributor.author: Calderón-Fajardo, Víctor
dc.contributor.author: Rodríguez-Rodríguez, Ignacio
dc.contributor.author: Puig-Cabrera, Miguel
dc.date.accessioned: 2025-10-29T12:36:28Z
dc.date.available: 2025-10-29T12:36:28Z
dc.date.issued: 2025
dc.departamento: Economía y Administración de Empresas
dc.description.abstract: Traditional tourism analytics have relied primarily on isolated sentiment analysis and image processing techniques, often failing to capture the subtle interaction between textual expressions and visual aesthetics inherent in tourist experiences. This study addresses these limitations by proposing a novel multi-modal framework that transforms textual reviews into AI-generated images using standardized prompts, thereby converting affective signals into explicit visual features. Leveraging state-of-the-art models, such as Distilled Bidirectional Encoder Representations from Transformers (DistilBERT) for fine-grained emotion recognition and Contrastive Language–Image Pre-training (CLIP) for semantic extraction of visual attributes, our approach maps complex sentiments onto interpretable visual characteristics and integrates explainable features to uncover the underlying structure of tourist perceptions. This approach improves classification performance and provides a transparent mechanism for understanding how distinct emotional states correspond to specific visual cues. Experimental evaluations on a dataset covering four diverse tourist destinations (Berlin, Dublin, Cairo, and Málaga) demonstrate high classification accuracy, close to that of more powerful embedding methods, and robust correlations between text-derived emotions and image-based features. Significant correlations were observed between emotions and visual features, e.g., between brightness and contentment and between entropy and shame, indicating that our method efficiently captures the affective resonance between the visual and textual modalities. Our findings underscore the transformative potential of converting textual sentiment into visual representations to enable more accurate, interpretable, and actionable analytics in the tourism sector.
This framework suggests promising avenues for dynamic destination characterization, informed marketing strategies, and enhanced urban planning initiatives, laying the...
dc.description.sponsorship: Funding for open access charge: Universidad de Málaga / CBUA
dc.identifier.citation: Calderón-Fajardo, V., Rodríguez-Rodríguez, I. & Puig-Cabrera, M. From words to visuals: a transformer-based multi-modal framework for emotion-driven tourism analytics. Inf Technol Tourism 27, 939–979 (2025). https://doi.org/10.1007/s40558-025-00334-2
dc.identifier.doi: 10.1007/s40558-025-00334-2
dc.identifier.uri: https://hdl.handle.net/10630/40505
dc.language.iso: eng
dc.publisher: Springer
dc.rights: Attribution 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Tourism
dc.subject: Artificial intelligence
dc.subject.other: Multimodal framework
dc.subject.other: Tourist experiences
dc.title: From words to visuals: a transformer-based multi-modal framework for emotion-driven tourism analytics
dc.type: journal article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
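The abstract reports correlations between image-side features (brightness, entropy) and text-derived emotions but does not define those features. As a minimal sketch, assuming brightness is the mean grayscale intensity scaled to [0, 1] and entropy is the Shannon entropy (in bits) of the intensity histogram, the Python below computes both and a Pearson coefficient against per-image emotion scores. The nested-list image representation, the function names, the choice of Pearson's r, and the toy scores are all hypothetical; the study's DistilBERT emotion classifier and its text-to-image generation step are not reproduced here.

```python
import math
from collections import Counter
from typing import List, Sequence

# Hypothetical image representation: rows of grayscale intensities in 0..255.
Image = List[List[int]]


def _pixels(img: Image) -> List[int]:
    """Flatten an image into a list of intensities; reject empty input."""
    pixels = [p for row in img for p in row]
    if not pixels:
        raise ValueError("image has no pixels")
    return pixels


def brightness(img: Image) -> float:
    """Assumed definition: mean intensity scaled to [0, 1]."""
    pixels = _pixels(img)
    return sum(pixels) / (255 * len(pixels))


def entropy(img: Image) -> float:
    """Assumed definition: Shannon entropy (bits) of the intensity histogram."""
    pixels = _pixels(img)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between a visual feature and an emotion score."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy inputs only: three synthetic 2x2 images and made-up emotion scores.
    images = [
        [[0, 0], [0, 0]],
        [[128, 128], [128, 128]],
        [[255, 255], [255, 255]],
    ]
    contentment = [0.1, 0.5, 0.9]  # hypothetical per-image emotion scores
    feats = [brightness(img) for img in images]
    print("brightness:", [round(f, 3) for f in feats])
    print("r(brightness, contentment):", round(pearson(feats, contentment), 3))
    print("entropy of a checkerboard:", entropy([[0, 255], [255, 0]]))
```

Histogram entropy is used here only because it is a common, easily checked notion of visual complexity; the study may define entropy, or extract attributes through CLIP, differently, and a significance test would be needed before reading any coefficient as "significant".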

Files

Original bundle

Name: s40558-025-00334-2.pdf
Size: 3.72 MB
Format: Adobe Portable Document Format
