Mapping of political events related to the COVID-19 pandemic on Twitter using topic modelling and keywords over time.

dc.centro: Facultad de Filosofía y Letras [es_ES]
dc.contributor.author: Fernández-Melendres, Carla
dc.contributor.author: Moreno-Ortiz, Antonio Jesús
dc.date.accessioned: 2023-07-06T09:12:39Z
dc.date.available: 2023-07-06T09:12:39Z
dc.date.created: 2023-05
dc.date.issued: 2023
dc.departamento: Filología Inglesa, Francesa y Alemana
dc.description.abstract: This research studies the relationship between real-world events related to the COVID-19 pandemic and the impact these events had on social media. To achieve this objective, we employ topic modelling and keyword extraction techniques. Topic modelling is a Natural Language Processing technique that attempts to identify topics automatically from a collection of documents (Vayansky & Kumar, 2020). It is similar to keyword extraction but, unlike keyword extraction, topic modelling algorithms return clusters of words that together make up a topic. A second objective is therefore to compare the results of these two methods in identifying the salient topics in a corpus. We used the publicly available, multilingual COVID-19 Twitter dataset, which has been collected since January 21, 2020 (collection is still ongoing) and is available via the COVID-19-TweetIDs GitHub repository (Chen, Lerman & Ferrara, 2020). For this study, we focus on tweets written in English from 2020 and 2021. The 2020-2021 data contains 1 billion tweets (31 billion tokens), from which we extracted a random, time-stratified sample of 0.1%, resulting in approximately 1 million tweets (31 million tokens). We employed unsupervised machine learning methods for both tasks. For topic modelling we used BERT embeddings and the BERTopic library (Grootendorst, 2022). Our script generates a full list of topics and their assigned terms, a coherence score, and several data visualisations, such as topics-over-time graphs, heatmaps, and topic hierarchies. For keyword extraction, we used TextRank (Mihalcea & Tarau, 2004), a language-independent, graph-based ranking model. We then compare the results returned by both methods in terms of usefulness and, finally, interpret them by relating the extracted topics to the state of the global pandemic at different stages of the crisis. [es_ES]
dc.description.sponsorship: Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. [es_ES]
dc.identifier.uri: https://hdl.handle.net/10630/27190
dc.language.iso: eng [es_ES]
dc.relation.eventdate: 10/05/2023 [es_ES]
dc.relation.eventplace: Oviedo [es_ES]
dc.relation.eventtitle: XIV Congreso Internacional de Lingüística de Corpus [es_ES]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 Internacional [*]
dc.rights.accessRights: open access [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: COVID-19 - En los medios de comunicación social [es_ES]
dc.subject: Redes sociales en Internet [es_ES]
dc.subject: Lingüística computacional [es_ES]
dc.subject.other: COVID-19 [es_ES]
dc.subject.other: Twitter [es_ES]
dc.subject.other: Topic modelling [es_ES]
dc.subject.other: Keywords [es_ES]
dc.subject.other: Political events [es_ES]
dc.title: Mapping of political events related to the COVID-19 pandemic on Twitter using topic modelling and keywords over time. [es_ES]
dc.type: conference output [es_ES]
dspace.entity.type: Publication
relation.isAuthorOfPublication: 72db3906-0c3b-4b43-b5bb-c0ac4ded0b88
relation.isAuthorOfPublication: 3233c4af-5a32-40f2-9c82-103bc48c43cd
relation.isAuthorOfPublication.latestForDiscovery: 72db3906-0c3b-4b43-b5bb-c0ac4ded0b88
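
The abstract describes drawing a random, time-stratified sample of the tweets. A minimal sketch of what such a step could look like is given below. It is illustrative only and is not the authors' script: the function name `time_stratified_sample`, the choice of calendar months as strata, the fixed seed, and the toy records are all assumptions introduced here, since the record does not state how the strata were defined.

```python
import random
from collections import defaultdict
from datetime import date


def time_stratified_sample(records, fraction, seed=0):
    """Draw roughly `fraction` of the records from each calendar month.

    `records` is an iterable of (created_date, text) pairs. Using months
    as strata is an assumption; the abstract only says "time-stratified".
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Group records by (year, month) so every period is represented.
    strata = defaultdict(list)
    for created, text in records:
        strata[(created.year, created.month)].append((created, text))

    sample = []
    for key in sorted(strata):
        bucket = strata[key]
        # Same sampling rate in every stratum, rounded to whole tweets.
        k = round(len(bucket) * fraction)
        sample.extend(rng.sample(bucket, k))
    return sample


# Toy data (hypothetical): 1,000 placeholder tweets per month, 2020-2021.
toy_records = [
    (date(year, month, 1 + i % 28), f"tweet {year}-{month:02d}-{i}")
    for year in (2020, 2021)
    for month in range(1, 13)
    for i in range(1000)
]

toy_sample = time_stratified_sample(toy_records, fraction=0.001)
print(len(toy_records), len(toy_sample))
```

Sampling at the same rate inside each period keeps the sample's distribution over time close to that of the full collection, which matters when topics and keywords are later tracked over time.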

Files

Original bundle

Name: CILC_2023_abstract.pdf
Size: 108.77 KB
Format: Adobe Portable Document Format