Web Page Classification based on unsupervised learning using MIME type analysis
Authors
Jiménez, Luis Roberto
Abstract
The properties of a web page have a strong impact on the experience of web users. In this work, a classification method based on unsupervised clustering is proposed to group web pages into classes according to the downloaded content that may affect the Quality of Experience (QoE) perceived by the user. Groups are defined from the Multipurpose Internet Mail Extensions (MIME) content breakdown and the external subdomain connections of each page, obtained with a desktop personal computer (PC) running the WebPageTest tool. The dataset is generated with a PC as the terminal, emulating the first access to 500 popular websites. The collected data is divided into groups with a classical unsupervised learning algorithm, namely K-means clustering. The results show that the web pages are classified into six groups and describe the characteristics of each cluster.
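To make the clustering step concrete, the following is a minimal sketch of K-means applied to MIME content breakdowns. It is not the authors' implementation. The four pages, their byte counts, the four MIME categories, and the choice of two clusters are all made up for illustration (the work itself uses 500 websites and reports six groups). The external subdomain feature is omitted, and the deterministic farthest-first seeding is an assumption chosen so that the example is reproducible.

```python
# Illustrative K-means over MIME byte shares (hypothetical data, not the paper's).

MIME_TYPES = ["image", "javascript", "css", "html"]  # assumed categories


def mime_shares(bytes_by_type, mime_types=MIME_TYPES):
    """Convert per-MIME-type byte counts into fractions of the page weight."""
    total = sum(bytes_by_type.get(t, 0) for t in mime_types)
    if total == 0:
        return [0.0] * len(mime_types)
    return [bytes_by_type.get(t, 0) / total for t in mime_types]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centroids(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = farthest_first_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each page goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical byte counts per MIME category for four pages.
pages = [
    {"image": 800, "javascript": 100, "css": 50, "html": 50},
    {"image": 700, "javascript": 150, "css": 100, "html": 50},
    {"image": 100, "javascript": 700, "css": 100, "html": 100},
    {"image": 150, "javascript": 650, "css": 100, "html": 100},
]

features = [mime_shares(p) for p in pages]
labels, centroids = kmeans(features, k=2)
print(labels)
```

In this toy dataset the two image-heavy pages receive one label and the two script-heavy pages receive the other. Using byte shares rather than raw byte counts keeps total page weight from dominating the distance; whether the original work normalizes its features this way is not stated in the abstract.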






