Release

ProPhylER 1.0 is live now.

News

January 5 2010
The ProPhylER paper is now published in Genome Research

March 12 2010
Searching by name is now supported on the search page

March 12 2010
Searching with hg18 coordinates to evaluate coding SNPs is now supported

Contacts

prophyler [at] prophyler.org
arend [at] stanford.edu

Funded by

NIH/NHGRI

Cluster Statistics Home Page

Each topic header links to a graphical summary.

All Clusters with Viewable Data

These pages summarize key statistics for all clusters for which ProPhylER analyses were successful, and which contain sufficient evolutionary variation to be useful.

Note: we added a few additional clusters in late 2009 but did not regenerate the statistics cited here.

There are 8,967 clusters with viewable analyses.
These clusters contain 217,760 sequences in high-quality alignments.
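
As a quick sanity check on these totals, the overall mean cluster size and the human share of all sequences can be derived from the figures quoted on this page. This is only a sketch: the variable names are illustrative, and the human count is the one given under "Number of Sequences per Species".

```python
# Figures quoted on this page; variable names are illustrative only.
total_clusters = 8_967
total_sequences = 217_760
human_sequences = 13_861

# Mean number of sequences per viewable cluster, across all scopes.
mean_per_cluster = total_sequences / total_clusters

# Human sequences as a percentage of all aligned sequences.
human_share_pct = 100 * human_sequences / total_sequences

print(f"mean sequences per cluster: {mean_per_cluster:.1f}")
print(f"human share of sequences: {human_share_pct:.1f}%")
```

The overall mean falls between the per-scope averages reported for the smallest (Amniote) and largest (Eukaryote) scopes.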


Number of Sequences per Species

Human tops the list at 13,861 sequences. Model organisms are also well-represented. Species that are highly diverged such as worm or sea squirt, and species with poor gene predictions such as Tetraodon, are less well-represented but still contribute thousands of sequences each.


Number of Clusters by Phylogenetic Scope

The best-represented phylogenetic scope is Eukaryotes with 2,887 clusters, meaning that all of those clusters contain sequences from a diverse sampling of eukaryotes. Having been in existence for well over a billion years and still being alignable among diverse organisms, these proteins are among the most constrained of all. The second-best represented scope is Vertebrates, with 2,349 clusters, mostly representing proteins that originated in the ancestral lineage of vertebrates after its divergence from that of other animals. The third scope is Metazoa, which is highly enriched in proteins of signal transduction and multicellular function.


Mean Tree Length and Alignment Size by Phylogenetic Scope

Eukaryotic clusters have an average of 43.1 sequences. Amniote clusters (containing sequences only from mammals and birds/reptiles) tend to be the smallest, with an average of 6.3 sequences. The older the protein, the more sequences its cluster tends to contain. Tree length scales almost identically, by a factor of five, underscoring the general tendency of large, old clusters to have better resolution in ProPhylER's analyses than clusters of young proteins.


Alignment Length and Tree Length Distributions

This chart depicts the ranges of alignment length or tree length for each decile of clusters.
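
A decile summary of this kind can be sketched as follows. This is a minimal illustration, not the code used to build the chart: the function name and the sample alignment lengths are hypothetical, and the grouping rule (ten near-equal groups of the sorted values) is an assumption.

```python
def decile_ranges(values):
    """Sort the values, split them into ten near-equal groups,
    and return the (min, max) of each non-empty group."""
    ordered = sorted(values)
    n = len(ordered)
    ranges = []
    for i in range(10):
        group = ordered[i * n // 10:(i + 1) * n // 10]
        if group:  # groups can be empty when there are fewer than ten values
            ranges.append((group[0], group[-1]))
    return ranges


# Hypothetical alignment lengths (in columns) for twenty clusters.
alignment_lengths = [212, 98, 455, 1310, 167, 302, 88, 640, 275, 519,
                     143, 731, 390, 254, 1022, 61, 188, 466, 335, 870]

for decile, (low, high) in enumerate(decile_ranges(alignment_lengths), start=1):
    print(f"decile {decile}: {low} to {high}")
```

The same function applies unchanged to tree lengths, since it only needs a list of numbers.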


Clusters with Subsets

Some clusters contain so many paralogs, or so many sequences from narrower phylogenetic scopes, that we broke them up into subsets (as explained here). Of the 8,967 viewable clusters, 355 are such 'subsetted' clusters, and together they contain 2,802 subsets. The number of subsetted clusters will grow as we check whether other clusters have sufficient depth and as we curate the problematic clusters mentioned below.
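
To put the subset counts in proportion, the short calculation below derives the fraction of clusters that were subsetted and the mean number of subsets per subsetted cluster. It uses only the counts quoted above; the variable names are illustrative.

```python
# Counts quoted in the paragraph above; variable names are illustrative only.
viewable_clusters = 8_967
subsetted_clusters = 355
total_subsets = 2_802

# Percentage of viewable clusters that were broken up into subsets.
subsetted_pct = 100 * subsetted_clusters / viewable_clusters

# Mean number of subsets in each subsetted cluster.
subsets_per_cluster = total_subsets / subsetted_clusters

print(f"subsetted clusters: {subsetted_pct:.1f}% of viewable clusters")
print(f"mean subsets per subsetted cluster: {subsets_per_cluster:.1f}")
```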

Problematic Clusters: Not Yet Viewable

Problematic Clusters

Some large clusters have closely related "neighbors" that may need to be merged into one cluster, and then subsetted as described above. Others are very diverse, and we need to decide whether to split them into separate clusters. Doing so will take some time, and our goal is to work through these before the next ProPhylER release. (If your ProPhylER search identified a protein from such a cluster, the search results page will ask you to contact us so that we can expedite analysis and make the cluster viewable.)



Last updated 1/11/10