Release
ProPhylER 1.0 is live now.
News
January 5, 2010
The ProPhylER paper is now published in Genome Research.
March 12, 2010
Searching by name is now supported on the search page.
March 12, 2010
Searching with hg18 coordinates to evaluate coding SNPs is now supported.
Contacts
prophyler [at] prophyler.org
arend [at] stanford.edu
Resource Links
Ensembl
UniProt
PDB
WuBlast
Probcons
Semphy
Jmol
Java
Other Links
Sidow Lab
Stanford Pathology Dept
Stanford Genetics Dept
Stanford School of Medicine
Limitations of ProPhylER
ProPhylER does not handle splice variants. One and only one protein sequence per gene is used.
Please contact us if you think something is amiss.
Limitations with Regards to Search
Certain limitations were uncovered in the course of building and curating the database and are mostly related to the underlying sequence data.
Names: The ProPhylER names of most clusters are simply concatenations of the UniProt or Ensembl names of the constituent proteins; some large clusters have curated ProPhylER names. Naming sequences and clusters is fraught with difficulty, and automatically generated names can be quite confusing. This is the main reason why ProPhylER does not (yet) have Browse-by-name functionality and does not allow searches by name. This is a shortcoming, but the workaround is simple: search by sequence. The sequence of a protein is its own best unique identifier.
Sequence Representation -- Clusters: Evolution generates a huge diversity of sequences, and some simply do not have many orthologs in the current databases. Especially in organisms that are, on average, highly diverged, such as C. elegans, only a minority of genes have enough orthologs or closely related paralogs from which an alignment can be built. As a consequence, ProPhylER cannot guarantee that your favorite protein from your favorite critter has made it into a cluster.
Sequence Representation -- Alignments: The alignment curation of ProPhylER is extremely stringent. More than 50% of sequences that received a cluster assignment were not used in the final alignment. As a consequence, your protein may be part of a cluster, but the alignment may not include it. Please see the Search Failures Help Page for further discussion of sequence representation and how it affects your search results.
Limitations with Regards to Displayed Data
ProPhylER is based on statistical methodology that analyzes evolutionary data. The methodology is not perfect, and neither is evolution. All predictions are subject to uncertainty. Part of using ProPhylER effectively is being able to gauge how good the underlying data are.
The most important parameter to understand in this regard is the "Tree Length", measured in substitutions per site (subs/site). More subs/site means more variation and therefore more resolving power. You can find the subs/site number in the Selector tab, as pointed out in this set of screenshots. That page also illustrates the effect of much versus little evolutionary variation (tree length) on ProPhylER analyses. (For a refresher on what Tree Length is and how it is calculated, visit the HTML slide show about trees, available from the Documentation link.)
ProPhylER has a hard cutoff of 0.5 subs/site; below this cutoff, no analyses are displayed because statistical fluctuations contribute too much noise for analyses to be reliable.
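To make the cutoff concrete, here is a minimal Python sketch. It assumes the common definition of tree length as the sum of all branch lengths in the tree, and it assumes the tree is available as a Newick string whose branch lengths are in substitutions per site, with no colons inside quoted labels. The function names and the example trees are hypothetical illustrations, not part of ProPhylER.

```python
import re

# Cutoff quoted in the text above, in substitutions per site.
MIN_TREE_LENGTH = 0.5

# Matches ":<number>" branch-length annotations in a Newick string.
# Assumption: labels contain no colons (no quoted labels with ":").
_BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def tree_length(newick: str) -> float:
    """Sum all branch lengths (assumed to be in substitutions per site)."""
    return sum(float(value) for value in _BRANCH_LENGTH.findall(newick))


def passes_cutoff(newick: str, cutoff: float = MIN_TREE_LENGTH) -> bool:
    """Return True when the tree length reaches the cutoff."""
    return tree_length(newick) >= cutoff


if __name__ == "__main__":
    # Hypothetical trees: one with ample variation, one with very little.
    deep_tree = "((A:0.4,B:0.3):0.2,C:0.5);"
    shallow_tree = "((A:0.05,B:0.1):0.02,C:0.1);"
    for name, tree in (("deep", deep_tree), ("shallow", shallow_tree)):
        print(name, round(tree_length(tree), 3), passes_cutoff(tree))
```

Under these assumptions, the shallow tree falls below 0.5 subs/site and would have no analyses displayed, while the deep tree clears the cutoff.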
ProPhylER has little control over tree length, which is a function of representation in the sequence databases and of evolution -- mostly evolution. Some proteins are 'young' and present in only a small part of the tree of life; for those, ProPhylER is unlikely to have an analysis because too few species are represented and not enough time has elapsed for evolutionary variation to accumulate. Other proteins are old, and ProPhylER has many sequences that together contribute substantial tree length.
Last updated 8/25/08