Percolation of annotation errors through hierarchically structured protein sequence databases.

Walter R Gilks; Benjamin Audit; Daniela de Angelis; Sophia Tsoka; Christos A Ouzounis

doi:10.1016/j.mbs.2004.08.001

Journal article, Mathematical Biosciences, Year: 2005

Walter R Gilks

Role: Author

Medical Research Council Biostatistics Unit

Benjamin Audit

Role: Author
PersonId : 180725
IdHAL : benjamin-audit
ORCID : 0000-0003-2683-9990
IdRef : 140755896

European Bioinformatics Institute [Hinxton]

Laboratoire Joliot Curie

Daniela de Angelis

Role: Author

Medical Research Council Biostatistics Unit

Sophia Tsoka

Role: Author

European Bioinformatics Institute [Hinxton]

Christos A Ouzounis

Role: Author

European Bioinformatics Institute [Hinxton]

Abstract

Databases of protein sequences have grown rapidly in recent years as a result of genome sequencing projects. Annotating protein sequences with descriptions of their biological function ideally requires careful experimentation, but this work lags far behind. Instead, biological function is often imputed by copying annotations from similar protein sequences. This gives rise to annotation errors and, more seriously, to chains of misannotation. An earlier paper, "Percolation of annotation errors in a database of protein sequences" (2002), developed a probabilistic framework for exploring the consequences of this percolation of errors through protein databases and applied the theory to a simple database model. Here we apply the theory to hierarchically structured protein sequence databases and draw conclusions about database quality at different levels of the hierarchy.

Keywords

Automated annotation; hierarchical protein classification

Domains

Bioinformatics [q-bio.QM]; Bioinformatics, Systems Biology [q-bio.QM]

Contributor: Benjamin Audit

https://ens-lyon.hal.science/ensl-00175660

Submitted on: Saturday, 29 September 2007, 12:54:42

Last modified on: Friday, 12 May 2023, 04:10:39

Dates and versions

ensl-00175660 , version 1 (29-09-2007)

Identifiers

HAL Id : ensl-00175660 , version 1
DOI : 10.1016/j.mbs.2004.08.001
PUBMED : 15748731

Cite

Walter R Gilks, Benjamin Audit, Daniela de Angelis, Sophia Tsoka, Christos A Ouzounis. Percolation of annotation errors through hierarchically structured protein sequence databases. Mathematical Biosciences, 2005, 193 (2), pp. 223-234. ⟨10.1016/j.mbs.2004.08.001⟩. ⟨ensl-00175660⟩

Collections

ENS-LYON CNRS UDL
