Word correlation matrices for protein sequence analysis and remote homology detection

Meinicke, Peter

doi:10.1186/1471-2105-9-259

Files

1471-2105-9-259_Lingner.pdf (342.72 KB)

Date

2008

Authors

Lingner, Thomas

Meinicke, Peter

Publisher

Biomed Central Ltd

Abstract

Background: Classification of protein sequences is a central problem in computational biology. Among current computational methods, discriminative kernel-based approaches provide the most accurate results. However, kernel-based methods often lack an interpretable model for analyzing discriminative sequence features, and predictions on new sequences are usually computationally expensive.

Results: In this work we present a novel kernel for protein sequences based on the average word similarity between two sequences. We show that this kernel gives rise to a feature space that allows analysis of discriminative features and fast classification of new sequences. We demonstrate the performance of our approach on a widely used benchmark setup for protein remote homology detection.

Conclusion: Our word correlation approach provides highly competitive performance compared with state-of-the-art methods for protein remote homology detection. The learned model is interpretable in terms of biologically meaningful features. In particular, analysis of discriminative words allows the identification of characteristic regions in biological sequences. Because of its high computational efficiency, our method can be applied to ranking potential homologs in large databases.
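The abstract does not give the kernel's definition, so the following is only a rough illustration of what "average word similarity between two sequences" could look like, not the authors' formulation. It assumes words are overlapping substrings of a fixed length k and that two words are compared by their fraction of matching positions. The function names, the choice k=3, and the similarity measure are all hypothetical.

```python
def words(seq, k):
    """Return all overlapping length-k words of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def word_similarity(u, v):
    """Fraction of positions at which two equal-length words agree.

    Illustrative choice only; the paper may use a different measure.
    """
    return sum(a == b for a, b in zip(u, v)) / len(u)


def average_word_similarity(s, t, k=3):
    """Mean similarity over all pairs of words drawn from s and t."""
    ws, wt = words(s, k), words(t, k)
    if not ws or not wt:
        # A sequence shorter than k contains no words.
        return 0.0
    total = sum(word_similarity(u, v) for u in ws for v in wt)
    return total / (len(ws) * len(wt))
```

Under these assumptions the score is symmetric in its two arguments and lies between 0 and 1, with 1 reached only when every word of one sequence matches every word of the other.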

URI

https://resolver.sub.uni-goettingen.de/purl?gro-2/54043

Collections

Publications
