Advancing edge-based clustering and graph embedding for biological network analysis: a case study in RASopathies

Publisher

Oxford University Press

Abstract

Understanding and predicting biological processes from protein–protein interaction (PPI) networks requires accurate and efficient representations of their structure. However, many existing methods fail to capture the complex, overlapping modular structure of biological systems. To address this, we propose a network embedding strategy that improves both biological interpretability and predictive power. By transforming networks into a low-dimensional space while preserving key topological properties, embedding enables the discovery of novel functional relationships. Pre-clustering a network before embedding enhances representation quality, i.e. the ability to preserve meaningful structural and functional properties in the embedding space. However, traditional non-overlapping clustering methods can introduce bias by ignoring the overlapping nature of biological communities. We overcome this limitation by integrating the Hierarchical Link Clustering (HLC) algorithm into an embedding workflow tailored for large, weighted, undirected networks. First, we introduce two optimized HLC implementations for Python and R, both outperforming existing methods in clustering accuracy and scalability. Then, by restricting random walks to HLC-defined communities, we improve the representation of biological pathways, as shown using Reactome on the human PPI network. We also apply our full cluster embedding workflow to analyze RASopathies, a group of interrelated disorders with a diverse range of phenotypes, caused by mutations in genes from the RAS/MAPK pathway. This approach was used not only to represent known pathways, but also to identify potential novel gene candidates associated with RASopathies, including Noonan and Costello syndrome. HLC implementations are available in the CDLIB library (https://github.com/GiulioRossetti/cdlib), and at https://github.com/jimrperkins/linkcomm for Python and R, respectively.
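To make the edge-based clustering step concrete, below is a minimal sketch of unweighted Hierarchical Link Clustering as originally described by Ahn, Bagrow and Lehmann (2010). It builds a Jaccard similarity for every pair of edges that share a node and merges edges by single linkage. It then keeps the edge partition with the highest partition density, which lets a node belong to several communities. It is not the optimized, weighted CDLIB or linkcomm implementation described above. The function names `hlc`, `node_communities` and `restricted_walk` are hypothetical and not part of either library's API. Restricting a walk to the nodes of one community's induced subgraph is an assumption about how "restricting random walks to HLC-defined communities" might be realized.

```python
import random
from itertools import combinations


def hlc(adj):
    """Unweighted Hierarchical Link Clustering sketch (after Ahn et al., 2010).

    adj: dict mapping node -> set of neighbours (undirected, no self-loops,
    at least one edge). Returns the edge partition with maximal partition
    density, as a list of sets of (u, v) edges with u < v.
    """
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    inc = {n: adj[n] | {n} for n in adj}  # inclusive neighbourhoods n+(i)

    # Similarity of edges e_ik and e_jk that share node k:
    # Jaccard index of the inclusive neighbourhoods of i and j.
    sims = []
    for k in adj:
        for i, j in combinations(sorted(adj[k]), 2):
            s = len(inc[i] & inc[j]) / len(inc[i] | inc[j])
            sims.append((s, tuple(sorted((i, k))), tuple(sorted((j, k)))))
    sims.sort(key=lambda t: -t[0])  # most similar edge pairs first

    parent = {e: e for e in edges}  # union-find over edges

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    def communities():
        groups = {}
        for e in edges:
            groups.setdefault(find(e), set()).add(e)
        return list(groups.values())

    def partition_density(comms):
        # D = (2/M) * sum_c m_c (m_c - n_c + 1) / ((n_c - 2)(n_c - 1));
        # communities spanning only two nodes contribute 0.
        total = 0.0
        for c in comms:
            m = len(c)
            n = len({x for e in c for x in e})
            if n > 2:
                total += m * (m - n + 1) / ((n - 2) * (n - 1))
        return 2 * total / len(edges)

    best = communities()  # every edge in its own community, D = 0
    best_d = partition_density(best)
    idx = 0
    while idx < len(sims):
        level = sims[idx][0]
        # Single linkage: merge all edge pairs at this similarity level.
        while idx < len(sims) and sims[idx][0] == level:
            _, a, b = sims[idx]
            parent[find(a)] = find(b)
            idx += 1
        comms = communities()
        d = partition_density(comms)
        if d > best_d:
            best, best_d = comms, d
    return best


def node_communities(edge_comms):
    """Node sets of the edge communities; a node may appear in several."""
    return [{x for e in c for x in e} for c in edge_comms]


def restricted_walk(adj, community, start, length, rng):
    """Random walk that only steps to neighbours inside one community's nodes."""
    walk = [start]
    for _ in range(length - 1):
        options = sorted(adj[walk[-1]] & community)
        if not options:
            break
        walk.append(rng.choice(options))
    return walk


# Two triangles sharing node 3: the edges split into two communities
# whose node sets overlap at node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
comms = node_communities(hlc(adj))
print(sorted(sorted(c) for c in comms))  # [[1, 2, 3], [3, 4, 5]]
walk = restricted_walk(adj, comms[0], min(comms[0]), 8, random.Random(0))
```

Node 3 ends up in both communities, which is the overlap that non-overlapping pre-clustering would discard. In a DeepWalk or node2vec-style workflow, walks like `walk` would typically be passed to a skip-gram model to learn the embedding. Recomputing partition density at every similarity level is done here for readability only and scales poorly compared with the optimized implementations.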

Bibliographic citation

Federico García-Criado, Pedro Seoane, Elena Rojano, Juan A G Ranea, James R Perkins, Advancing edge-based clustering and graph embedding for biological network analysis: a case study in RASopathies, Briefings in Bioinformatics, Volume 26, Issue 4, July 2025, bbaf320, https://doi.org/10.1093/bib/bbaf320

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial 4.0 International.