Abstract
Many proteins remain functionally unannotated. Sequence alignment (SA)
uncovers missing annotations by transferring functional knowledge between
species' sequence-conserved regions. Because SA is imperfect, network alignment
(NA) complements SA by transferring functional knowledge between conserved
regions of the biological networks, rather than just the sequences, of
different species.
Existing NA assumes that it is topological similarity (isomorphic-like
matching) between network regions that corresponds to the regions' functional
relatedness. However, we recently found that functionally unrelated proteins
are almost as topologically similar as functionally related proteins. So, we
redefined NA as a data-driven framework, TARA, which learns from network and
protein functional data what kind of topological relatedness (rather than
similarity) between proteins corresponds to the proteins' functional
relatedness. TARA used topological information (within each network) but not
sequence information (between proteins across networks). Yet, its alignments
yielded higher protein functional prediction accuracy than alignments of
existing NA methods, even those that used both topological and sequence
information. Here, we propose TARA++, which, like TARA and unlike other existing
methods, is data-driven but, unlike TARA, uses across-network sequence
information on top of within-network topological information. To
deal with the within-and-across-network analysis, we adapt social network
embedding to the problem of biological NA. TARA++ achieves higher protein
functional prediction accuracy than existing methods.
Citation
ID: 282316
Ref Key: milenkovic2020datadriven