Mapping of political events related to the COVID-19 pandemic on Twitter using topic modelling and keywords over time.

Fernández-Melendres, Carla; Moreno-Ortiz, Antonio Jesús

dc.centro	Facultad de Filosofía y Letras	es_ES
dc.contributor.author	Fernández-Melendres, Carla
dc.contributor.author	Moreno-Ortiz, Antonio Jesús
dc.date.accessioned	2023-07-06T09:12:39Z
dc.date.available	2023-07-06T09:12:39Z
dc.date.created	2023-05
dc.date.issued	2023
dc.departamento	Filología Inglesa, Francesa y Alemana
dc.description.abstract	This research aims to study the relationship between real-world events related to the COVID-19 pandemic and the impact these events had on social media. To achieve this objective, we employ topic modelling and keyword extraction techniques. Topic modelling is a Natural Language Processing technique that attempts to identify topics automatically in a collection of documents (Vayansky & Kumar, 2020). It is similar to keyword extraction but, unlike it, topic modelling algorithms return clusters of words that together make up each topic. Thus, a second objective is to compare the results of these two methods when it comes to identifying the salient topics in a corpus. We used the publicly available, multilingual COVID-19 Twitter dataset, collected since January 21, 2020 (and still ongoing) and available via the COVID-19-TweetIDs GitHub repository (Chen, Lerman & Ferrara, 2020). For this study, we focused on English-language tweets from 2020 and 2021; this period contains 1 billion tweets (31 billion tokens), from which we extracted a random, time-stratified sample of 0.1%, resulting in approximately 1 million tweets (31 million tokens). We employed unsupervised machine learning methods for both tasks. For topic modelling, we used BERT embeddings and the BERTopic library (Grootendorst, 2022). Our script generates a full list of topics and their assigned terms, a coherence score, and several data visualisations, such as topics-over-time graphs, heatmaps, and topic hierarchies. For keyword extraction, we used TextRank (Mihalcea & Tarau, 2004), a language-independent, graph-based ranking model. We then compare the results returned by both methods in terms of usefulness and, finally, interpret the results by relating the extracted topics to the state of the global pandemic at different stages of the crisis.	es_ES
dc.description.sponsorship	Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.	es_ES
dc.identifier.uri	https://hdl.handle.net/10630/27190
dc.language.iso	eng	es_ES
dc.relation.eventdate	2023-05-10	es_ES
dc.relation.eventplace	Oviedo	es_ES
dc.relation.eventtitle	XIV Congreso Internacional de Lingüística de Corpus	es_ES
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.accessRights	open access	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	COVID-19 - In social media	es_ES
dc.subject	Online social networks	es_ES
dc.subject	Computational linguistics	es_ES
dc.subject.other	COVID-19	es_ES
dc.subject.other	Twitter	es_ES
dc.subject.other	Topic modelling	es_ES
dc.subject.other	Keywords	es_ES
dc.subject.other	Political events	es_ES
dc.title	Mapping of political events related to the COVID-19 pandemic on Twitter using topic modelling and keywords over time.	es_ES
dc.type	conference output	es_ES
dspace.entity.type	Publication
relation.isAuthorOfPublication	72db3906-0c3b-4b43-b5bb-c0ac4ded0b88
relation.isAuthorOfPublication	3233c4af-5a32-40f2-9c82-103bc48c43cd
relation.isAuthorOfPublication.latestForDiscovery	72db3906-0c3b-4b43-b5bb-c0ac4ded0b88
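The abstract names TextRank (Mihalcea & Tarau, 2004) as the keyword extraction method but gives no implementation details. The following is a minimal, hypothetical Python sketch of the core TextRank idea, not the authors' script: candidate words become nodes of an undirected co-occurrence graph and are ranked with a PageRank-style iteration using the customary damping factor of 0.85. The stopword list, the function names `candidate_tokens` and `textrank_keywords`, and all other parameter defaults are assumptions. The stopword filter stands in for the part-of-speech filter of the original algorithm, and the step that merges adjacent keywords into multi-word keyphrases is omitted.

```python
import re
from collections import defaultdict

# Hypothetical stopword list. The original TextRank keeps only candidates that
# pass a part-of-speech filter (e.g. nouns and adjectives); this list is a
# dependency-free stand-in for that filter.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has",
    "have", "in", "is", "it", "new", "of", "on", "or", "our", "that", "the",
    "this", "to", "was", "we", "were", "will", "with",
}


def candidate_tokens(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords and short words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def textrank_keywords(texts, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank candidate words by a PageRank-style score on a co-occurrence graph."""
    graph = defaultdict(set)  # word -> set of neighbouring words (undirected)
    for text in texts:
        tokens = candidate_tokens(text)
        # Connect words that co-occur within `window` positions of the
        # filtered token sequence (window=2 links adjacent candidates only).
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    if not graph:
        return []

    # Every node has at least one neighbour, so len(graph[nb]) is never zero.
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        scores = {
            node: (1 - damping)
            + damping * sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            for node in graph
        }

    # Highest score first; ties broken alphabetically for reproducible output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Toy, made-up tweets used only to illustrate the call (not study data).
    tweets = [
        "Lockdown extended as vaccine rollout begins",
        "Vaccine rollout slows during lockdown",
        "New lockdown rules announced",
    ]
    print(textrank_keywords(tweets, top_k=3))  # → ['rollout', 'lockdown', 'rules']
```

Words that link several otherwise separate contexts ("rollout", "lockdown") collect score from more neighbours and therefore rank highest. A fuller implementation would add a real part-of-speech tagger and keyphrase merging.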

Files

Original bundle

Name: CILC_2023_abstract.pdf
Size: 108.77 KB
Format: Adobe Portable Document Format

Collections

Papers, Conference Communications and Posters