ProPhylER Dataflow and Analysis Pipeline I
Overview
ProPhylER's data were generated by a dataflow that consists of three main parts:
Cluster generation is based on Smith-Waterman alignments of pairs of sequences and is tuned to identify likely orthologs and close paralogs from whole-genome proteomes while excluding distant paralogs. Two manual curation steps ensured that clusters contain enough sequences to be informative but that divergent paralogs do not share a cluster. After the genome protein clusters are built, they are augmented with UniProt sequences, which contribute substantial additional information because of their annotations and because many UniProt sequences come from species whose genomes have not been sequenced. A further manual curation step at this point ensured the integrity and specificity of the clusters.
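As a rough illustration of clustering driven by pairwise Smith-Waterman scores, the sketch below scores every pair of sequences and merges pairs whose normalized score clears a threshold. It is not ProPhylER's implementation: the simple match/mismatch scoring, the linear gap penalty, the normalization, the 0.6 threshold, the single-linkage merging, and the names sw_score and cluster are all assumptions made for the example, and the manual curation steps are not modeled.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 2, -1, -2  # assumed toy scoring, not ProPhylER's


def sw_score(a, b):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def cluster(seqs, min_norm_score=0.6):
    """Single-linkage clusters of sequences whose normalized score passes the cutoff.

    seqs maps a sequence name to its residues. The threshold is hypothetical.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(seqs, 2):
        a, b = seqs[p], seqs[q]
        # Normalize by the best score the shorter sequence could achieve.
        norm = sw_score(a, b) / (MATCH * min(len(a), len(b)))
        if norm >= min_norm_score:
            parent[find(p)] = find(q)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    toy = {
        "human_A": "MKTAYIAKQR",
        "mouse_A": "MKTAYIAKQK",
        "yeast_X": "GGPLWWSHHD",
    }
    print(cluster(toy))
```

A stricter threshold keeps divergent paralogs out of a cluster at the cost of smaller clusters, which is the trade-off the curation steps described above are meant to balance.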
From this point on, we require robust alignments of orthologs and close paralogs that
This was enforced by manual curation of the alignments, in which sequences that violated ProPhylER's high standards for alignment specificity were removed.
For each alignment, trees are then built automatically. After treebuilding, curator input may also be required. ProPhylER's post-treebuilding algorithms are then applied to
A number of large clusters have undergone additional curation to generate subclusters that can be viewed in the same ProPhylER session, allowing comparisons between different phylogenetic scopes or paralogs.
Detailed Dataflow Descriptions
1. From Individual Protein Sequences to Clusters of Closely Related Homologs
2. From Clusters of Close Homologs to Alignments and Trees
3. From Alignments and Trees to Profiles, Constrained Regions, and Mutation Impact Scores
Last updated 8/25/08
Release
ProPhylER 1.0 is live now.
News
January 5 2010
The ProPhylER paper is now published in Genome Research
March 12 2010
Searching by name is now supported on the search page
March 12 2010
Searching with hg18 coordinates for evaluating coding SNPs is now supported
Contacts
prophyler [at] prophyler.org
arend [at] stanford.edu
Resource Links
Ensembl
UniProt
PDB
WU-BLAST
ProbCons
SEMPHY
Jmol
Java