Processing Structured Data Streams

Barquero Moreno, Gala

dc.contributor.advisor	Vallecillo-Moreno, Antonio Jesús
dc.contributor.advisor	Troya-Castilla, Javier
dc.contributor.author	Barquero Moreno, Gala
dc.contributor.other	Lenguajes y Ciencias de la Computación	es_ES
dc.date.accessioned	2021-05-06T12:10:47Z
dc.date.available	2021-05-06T12:10:47Z
dc.date.created	2021
dc.date.issued	2021
dc.date.submitted	2021-02-19
dc.identifier.uri	https://hdl.handle.net/10630/21686
dc.description	We conducted this study in order to choose the most suitable technology for developing our proposal. Second, we propose three methods to reduce the set of data that a query must process when working with large graphs, namely spatial, temporal and random approximations. These methods are based on Approximate Query Processing techniques and consist of discarding the information that is considered not relevant to the query. The data reduction is performed online, together with the processing, and considers both spatial and temporal aspects of the data. Since discarding information from the source data may decrease the validity of the results, we also define the transformation error obtained with these methods in terms of accuracy, precision and recall. Finally, we present a preprocessing algorithm, called the SDR algorithm, that also reduces the set of data to be processed, but without compromising the accuracy of the results. It computes a subgraph of the source graph that contains only the information relevant to a given query. Since this technique is a preprocessing algorithm, it is run offline before the actual processing begins. In addition, we develop an incremental version of the algorithm that updates the subgraph as new information arrives in the system.	es_ES
dc.description.abstract	A large amount of data is generated daily from different sources such as social networks, recommendation systems or geolocation systems. Moreover, this information tends to grow exponentially every year. Companies have discovered that processing these data can yield useful conclusions that support decision-making or help detect and resolve problems more efficiently, for instance, through the study of trends, habits or customs of the population. The information provided by these sources typically consists of a non-structured and continuous data flow, where the relations among data elements form graph structures. Inevitably, processing performance progressively decreases as the size of the data increases. For this reason, non-structured information is usually handled by taking into account only the most recent data and discarding the rest, since it is considered not relevant when drawing conclusions. However, this approach is not enough for sources that provide graph-structured data, since spatial features must be considered as well as temporal ones. These spatial features refer to the relationships among the data elements. For example, spatial aspects are important in marketing techniques, which require information on the location of users and their possible needs, and in the detection of diseases, which uses data about genetic relationships among subjects or the geographic scope. This dissertation makes three main contributions. First, we provide a comparative study of seven of the most common platforms for processing huge graphs and of the languages used to query them. This study measures the performance of the queries in terms of execution time, and the syntactic complexity of the languages according to three parameters: number of characters, number of operators and number of internal variables.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	UMA Editorial	es_ES
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Data processing	es_ES
dc.subject	Performance	es_ES
dc.subject.other	Data Stream Processing	es_ES
dc.subject.other	Dynamic Graphs	es_ES
dc.subject.other	Performance Optimisation	es_ES
dc.subject.other	Data Queries	es_ES
dc.subject.other	Model-driven engineering	es_ES
dc.title	Processing Structured Data Streams	es_ES
dc.type	info:eu-repo/semantics/doctoralThesis	es_ES
dc.centro	E.T.S.I. Informática	es_ES
dc.rights.cc	Attribution-NonCommercial-NoDerivatives 4.0 International	*

Files in this item

Name: TD_BARQUERO_MORENO_Gala.pdf
Size: 12.35Mb
Format: PDF

This item appears in the following collection(s)

LCC - Tesis

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International