An Efficient Computational Model for Large-Scale Prediction of Protein-Protein Interactions Based on Accurate and Scalable Graph Embedding
Su, XR (Su, Xiao-Rui)[ 1,2,3 ]; You, ZH (You, Zhu-Hong)[ 1,2,3 ]; Hu, L (Hu, Lun)[ 1,2,3 ]; Huang, YA (Huang, Yu-An)[ 1 ]; Wang, Y (Wang, Yi)[ 1,2,3 ]; Yi, HC (Yi, Hai-Cheng)[ 1,2,3 ]
Journal: FRONTIERS IN GENETICS
Year: 2021
Volume: 12; Issue: 2; Pages: 1-10
Keywords: large-scale protein-protein interaction; GraphZoom; weighted graph; graph embedding
ISSN: 1664-8021
DOI: 10.3389/fgene.2021.635451
Abstract

Protein-protein interactions (PPIs) are the basis of the molecular mechanisms of living cells. Although traditional experiments can detect PPIs accurately, they are often costly and time-consuming. As a result, computational methods have been used to predict PPIs and avoid these problems. Graphs, as important and pervasive data carriers, are considered the most suitable structure for representing biomedical entities and their relationships. Although graph embedding is the most popular approach to graph representation learning, it usually incurs high computational and memory costs, especially on large-scale graphs. Developing a framework that both accelerates graph embedding and improves the accuracy of the embedding results is therefore important for large-scale PPI prediction. In this paper, we propose LPPI, a multi-level model that improves both the quality and the speed of large-scale PPI prediction. First, basic information about each protein is collected as its attributes, including positional gene sets, motif gene sets, and immunological signatures. Second, we construct a weighted graph by using these protein attributes to calculate node similarity. GraphZoom is then used to accelerate the embedding process by reducing the size of the weighted graph. Next, graph embedding methods are used to learn topological features from the reconstructed graph. Finally, a linear logistic regression (LR) model predicts the probability that two proteins interact. LPPI achieved accuracies of 0.99997 and 0.9979 on the PPI network dataset and the GraphSAGE-PPI dataset, respectively. Further results show that LPPI is promising for large-scale PPI prediction in terms of both accuracy and efficiency, and that it can benefit the detection of interactions among other large-scale biomedical molecules.
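The first and last stages of the pipeline described in the abstract can be illustrated with a short Python sketch. This is a hypothetical illustration, not the authors' LPPI code: it assumes Jaccard similarity over gene-set attributes for the edge weights (the abstract does not name a similarity measure), it skips the GraphZoom coarsening and graph-embedding stages by using made-up 2-D node embeddings, and it assumes the element-wise (Hadamard) product of two protein embeddings as the pair feature fed to logistic regression. All protein names, attributes, embeddings and helper functions are illustrative.

```python
import math
from itertools import combinations

# Hypothetical sketch of two LPPI-style stages; NOT the authors' implementation.


def jaccard(a, b):
    """Jaccard similarity of two attribute sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_weighted_graph(attrs, threshold=0.2):
    """Map each protein pair (u, v), u < v, to its similarity if it is >= threshold."""
    edges = {}
    for u, v in combinations(sorted(attrs), 2):
        w = jaccard(attrs[u], attrs[v])
        if w >= threshold:
            edges[(u, v)] = w
    return edges


def hadamard(x, y):
    """Element-wise product of two embeddings, used as the pair feature (assumption)."""
    return [a * b for a, b in zip(x, y)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit logistic regression with full-batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    """Predicted probability that the pair with feature vector x interacts."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy gene-set attributes (hypothetical) -> weighted similarity graph.
attrs = {"P1": {"a", "b", "c"}, "P2": {"a", "b"}, "P3": {"x"}}
edges = build_weighted_graph(attrs)

# Made-up embeddings standing in for the GraphZoom + graph-embedding output.
emb = {"P1": [1.0, 0.0], "P2": [1.0, 0.1], "P3": [0.0, 1.0], "P4": [0.1, 1.0]}
pairs = [("P1", "P2"), ("P3", "P4"), ("P1", "P3"), ("P2", "P4"), ("P1", "P4"), ("P2", "P3")]
labels = [1, 1, 0, 0, 0, 0]  # 1 = interacting pair, 0 = non-interacting pair
X = [hadamard(emb[u], emb[v]) for u, v in pairs]
w, b = train_logreg(X, labels)
probs = [predict(w, b, x) for x in X]
print(edges)
print([round(p, 3) for p in probs])
```

The Hadamard product is only one common way to turn two node embeddings into an edge feature; concatenation or averaging are alternatives. Real attributes, a real embedding method and a held-out evaluation split would be needed before reading anything into the predicted probabilities.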

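WOS accession number: WOS:000627776400001
Content type: Journal article
Source URL: http://ir.xjipc.cas.cn/handle/365002/7832
Collection: Xinjiang Technical Institute of Physics and Chemistry, Multilingual Information Technology Research Laboratory
Corresponding author: You, ZH (You, Zhu-Hong)[ 1,2,3 ]
Author affiliations:
1. Xinjiang Lab Minor Speech & Language Informat Pro, Urumqi, Peoples R China
2. Univ Chinese Acad Sci, Beijing, Peoples R China
3. Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi, Peoples R China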
Recommended citation
GB/T 7714
Su, XR, You, ZH, Hu, L, et al. An Efficient Computational Model for Large-Scale Prediction of Protein-Protein Interactions Based on Accurate and Scalable Graph Embedding[J]. FRONTIERS IN GENETICS, 2021, 12(2): 1-10.
APA: Su, XR, You, ZH, Hu, L, Huang, YA, Wang, Y, & Yi, HC. (2021). An Efficient Computational Model for Large-Scale Prediction of Protein-Protein Interactions Based on Accurate and Scalable Graph Embedding. FRONTIERS IN GENETICS, 12(2), 1-10.
MLA: Su, XR, et al. "An Efficient Computational Model for Large-Scale Prediction of Protein-Protein Interactions Based on Accurate and Scalable Graph Embedding." FRONTIERS IN GENETICS 12.2 (2021): 1-10.