Release
ProPhylER 1.0 is live now.
News
January 5 2010
The ProPhylER paper is now published in Genome Research
March 12 2010
Searching by name is now supported on the search page
March 12 2010
Searching with hg18 coordinates for evaluating coding SNPs is now supported
Contacts
prophyler [at] prophyler.org
arend [at] stanford.edu
Phylogenetic Scope and Alignment Subsets
The Phylogenetic Scope of a set of sequences is the taxon defined by the last common ancestor of the species that are represented in the set. For example, an alignment with sequences from human, chicken, frog, and Ciona would have a "Chordata" phylogenetic scope. A cluster of sequences from the same species, but without Ciona, would have a "Tetrapoda" scope.
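As a minimal sketch of this definition, the scope of a set of species can be computed as the deepest taxon shared by all of their lineages. The lineages below are heavily simplified and written by hand for illustration. They are assumptions, not ProPhylER data or the full NCBI taxonomy, and the function name is hypothetical.

```python
# Simplified, hand-written lineages, listed from the root downward.
# Illustrative assumption only; not taken from ProPhylER or NCBI.
LINEAGES = {
    "human":   ["Metazoa", "Chordata", "Vertebrata", "Tetrapoda", "Amniota", "Mammalia"],
    "chicken": ["Metazoa", "Chordata", "Vertebrata", "Tetrapoda", "Amniota", "Aves"],
    "frog":    ["Metazoa", "Chordata", "Vertebrata", "Tetrapoda", "Amphibia"],
    "ciona":   ["Metazoa", "Chordata", "Tunicata"],
}


def phylogenetic_scope(species):
    """Return the deepest taxon shared by every species' lineage (or None)."""
    lineages = [LINEAGES[name] for name in species]
    scope = None
    # Walk down all lineages in parallel until they first disagree.
    for taxa in zip(*lineages):
        if len(set(taxa)) != 1:
            break
        scope = taxa[0]
    return scope


print(phylogenetic_scope(["human", "chicken", "frog", "ciona"]))  # Chordata
print(phylogenetic_scope(["human", "chicken", "frog"]))           # Tetrapoda
```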
The term "Phylogenetic Scope" was coined in this paper to help explain sensitivity/specificity tradeoffs in comparative sequence analyses of genomic DNA. The same concepts apply to comparative analyses of protein data.
Phylogenetic Scope of Major Organisms
The tree below depicts a mostly bifurcating taxonomy of some important organisms. The indicated scope on an internal branch of the tree encompasses the species that are descendants of that branch, to the exclusion of all other species. For example, a ProPhylER Cluster that has a tetrapod scope certainly contains an amphibian such as frog, and one or more species of birds, reptiles, or mammals.
Because the tree of life is so rich, there is a very large number of possible scopes, not all of which are represented here. For purposes of collecting statistics on the ProPhylER database, we used a simplified tree that contains only the scopes shown here. A cluster with a Coelomate scope would be counted as a cluster with a Metazoan scope, for example.
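The mapping from an actual scope to a scope of the simplified tree can be pictured as walking up the taxonomy until a retained scope is reached. The sketch below assumes a tiny hand-written parent table and a two-entry set of retained scopes. Both are illustrative assumptions and do not reproduce the tree in Figure 1.

```python
# Hypothetical child -> parent links for a handful of taxa (illustrative only).
PARENT = {
    "Coelomata": "Bilateria",
    "Bilateria": "Eumetazoa",
    "Eumetazoa": "Metazoa",
    "Metazoa": "Eukaryota",
}

# Assumed subset of scopes retained in the simplified tree.
SIMPLIFIED_SCOPES = {"Eukaryota", "Metazoa"}


def simplified_scope(scope):
    """Walk up from the actual scope to the nearest retained scope."""
    while scope not in SIMPLIFIED_SCOPES:
        scope = PARENT[scope]  # raises KeyError for a taxon not in the table
    return scope


print(simplified_scope("Coelomata"))  # Metazoa
```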
The ProPhylER Interface does provide the actual phylogenetic scope of the cluster, based on a relatively recent version of the NCBI taxonomy. If you do not recognize the scientific name of the phylogenetic scope, search the NCBI taxonomy or click the Taxonomy tab of the Interface to see the species that are represented in the tree.

Figure 1. Simplified taxonomy of major organisms and some important phylogenetic scopes.
Cluster Subsets
Note: this is relevant only for a relatively small fraction of ProPhylER clusters; most clusters have only the Master scope and one ProPhylER analysis.
ProPhylER may have several different analyses that are specific to different levels in the phylogenetic hierarchy. This 'subsetting' of the alignment, tree, and ultimately of the analyses, was performed by a curator who decided that there was enough information in the tree to make analyses of individual subsets potentially useful. Figure 2 shows how a gene tree can contain different scopes.
Figure 2. A hypothetical tree of a cluster with 16 sequences, with phylogenetic scopes indicated. Three sequences, two human and one yeast, are labeled for purposes of discussion below. ProPhylER would have separate analyses that are limited to the indicated subsets, selectable in the Selector tab of the Interface. Note that Human YFG1 is in three sets: Metazoa YFG1, Metazoa YFG, and the Master set. Similarly, Human YFG2 is also in three sets, whereas Yeast YFG is in two sets.
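The set memberships described in the caption follow from the nesting of the subsets. The sketch below encodes that nesting for the hypothetical tree. It assumes that Metazoa YFG2 and Fungi YFG (a scope named in the Caveats section) are the subsets holding Human YFG2 and Yeast YFG; the data structures and function name are made up for illustration.

```python
# Each subset points to the subset that directly encloses it (assumed nesting).
ENCLOSING = {
    "Metazoa YFG1": "Metazoa YFG",
    "Metazoa YFG2": "Metazoa YFG",
    "Metazoa YFG": "Master",
    "Fungi YFG": "Master",
    "Master": None,
}

# Narrowest subset containing each labeled sequence (assumed).
NARROWEST = {
    "Human YFG1": "Metazoa YFG1",
    "Human YFG2": "Metazoa YFG2",
    "Yeast YFG": "Fungi YFG",
}


def sets_containing(sequence):
    """List every subset that contains the sequence, narrowest first."""
    sets = []
    subset = NARROWEST[sequence]
    while subset is not None:
        sets.append(subset)
        subset = ENCLOSING[subset]
    return sets


print(sets_containing("Human YFG1"))  # ['Metazoa YFG1', 'Metazoa YFG', 'Master']
print(sets_containing("Yeast YFG"))   # ['Fungi YFG', 'Master']
```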
Why are subset analyses potentially useful? Why don't we just use the Master set?
If we could always be sure that there have been no shifts in constraint whatsoever over the course of the evolutionary history of the sequences in the tree, then just focusing on the Master set would be the right thing to do. The Master set gives the greatest power for constraint analysis because it contains the most (presumably functionally neutral) diversity. But what if there was a functional shift between fungi and metazoa, or perhaps after the duplication that gave rise to the two metazoan paralogs? The residues in the protein that were responsible for the functional shift would appear unconstrained in the Master set. In the appropriate subset, however, they would appear to be constrained.
One of the interesting features of ProPhylER is that it allows you to compare different subsets, if they are available for your protein. By comparing subsets you may be able to find regions or amino acids that are constrained in one subset but not in the other, or that are different between subsets, but constrained within each subset.
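To make the idea of a subset comparison concrete, the sketch below flags positions that look constrained in a subset analysis but not in the Master analysis. The boolean flags are invented for illustration and are not ProPhylER output. The sketch also assumes that both analyses use the same reference sequence, so that position i means the same residue in both.

```python
def newly_constrained(master, subset):
    """Return 1-based positions constrained in the subset but not in the Master set."""
    return [
        position
        for position, (in_master, in_subset) in enumerate(zip(master, subset), start=1)
        if in_subset and not in_master
    ]


# Invented per-position flags: True means the position appears constrained.
master_flags = [True, True, False, False, True, False]
subset_flags = [True, True, True, False, True, True]

print(newly_constrained(master_flags, subset_flags))  # [3, 6]
```

Positions reported this way are only candidates for a functional shift; the Caveats section explains why a narrower subset can make positions look constrained simply because it contains less variation.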
Choosing a Scope and Reference Sequence
The Quick Help tab of the Interface, and the dedicated Help Page, explain how to choose the phylogenetic scope of the analysis. Here we explain why you would choose a particular subset for a particular purpose. The descriptions below use the format Reference Sequence [Phylogenetic Scope].
Deciding which scope is right for your purpose (assuming you are interested in Human YFG1):
Human YFG1 [Metazoa YFG1] versus Human YFG1 [Metazoa YFG] and
Human YFG1 [Metazoa YFG] versus Human YFG1 [Master]
Given the region or the residue of the protein in which you are interested, which scope has sufficient information in terms of variation, without losing alignability? If you find that you are interested in a region that is well-aligned among all fungi and metazoa, such as a highly conserved domain, then choosing the Master scope would be best because it maximizes the available information. On the other hand, if you find that the region of interest only exists in the metazoan YFG1, then any scope broader than Metazoa YFG1 will obscure a potential constraint signal.
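The two comparisons listed above for Human YFG1 step from each scope to the next broader one. A small sketch of how such pairs could be generated in the Reference Sequence [Phylogenetic Scope] format is shown here. The helper names are hypothetical and the scope list is typed in by hand, narrowest first.

```python
def label(reference, scope):
    """Format an analysis as 'Reference Sequence [Phylogenetic Scope]'."""
    return f"{reference} [{scope}]"


def stepwise_comparisons(reference, scopes):
    """Pair each scope with the next broader one for a single reference sequence."""
    return [
        (label(reference, narrow), label(reference, broad))
        for narrow, broad in zip(scopes, scopes[1:])
    ]


scopes = ["Metazoa YFG1", "Metazoa YFG", "Master"]  # narrowest first
for narrow, broad in stepwise_comparisons("Human YFG1", scopes):
    print(f"{narrow} versus {broad}")
```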
Comparing paralogs to look for regions or residues that may differ in constraint:
Human YFG1 [Metazoa YFG1] versus Human YFG2 [Metazoa YFG2]
Finding regions in paralogs that differ but are constrained:
Human YFG1 [Metazoa YFG1] versus Human YFG1 [Metazoa YFG], and, reciprocally,
Human YFG2 [Metazoa YFG2] versus Human YFG2 [Metazoa YFG]
Caveats
ProPhylER, like many other tools of prediction or inference, is subject to specificity/sensitivity tradeoffs. If there is little variation in a particular scope then everything will look 'conserved'. While ProPhylER does not give you analyses that are below a certain threshold of evolutionary variation, you still need to exercise judgment as to whether the overall amount of variation that is present in the analysis is sufficient to conclude that a particular residue or region is constrained.
This is of special importance for MAPP analyses because MAPP only looks at a single position, which provides inherently less power than the windowing analyses that underlie ESF.
In comparisons between paralogs or orthologs, as discussed here, you also need to be mindful of the fact that some subsets have more variation than others. For example, as can be seen by the branch lengths in the tree above, the scope Metazoa YFG has more than twice the evolutionary variation, and therefore a great deal more power, than the scope Fungi YFG. Apparent differences in constraint may simply be due to one scope having more power to detect constraint than the other.
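One rough way to compare the variation available in two scopes is to sum the branch lengths of their subtrees. The sketch below does this for two small trees with invented branch lengths. The tree encoding, the numbers, and the use of total branch length as a stand-in for "evolutionary variation" are assumptions made for illustration, not a description of how ProPhylER measures variation.

```python
def total_branch_length(tree):
    """Sum all branch lengths in a tree encoded as (branch_length, [children])."""
    length, children = tree
    return length + sum(total_branch_length(child) for child in children)


# Invented subtrees; each node is (length of the branch leading to it, children).
metazoa_yfg = (0.0, [
    (0.2, [(0.3, []), (0.4, [])]),
    (0.25, [(0.35, []), (0.3, [])]),
])
fungi_yfg = (0.0, [(0.4, []), (0.35, [])])

ratio = total_branch_length(metazoa_yfg) / total_branch_length(fungi_yfg)
print(f"Metazoa YFG has {ratio:.1f}x the total branch length of Fungi YFG")
```

With a ratio this large, a residue that looks constrained only in the smaller scope deserves extra caution before it is interpreted as a real difference in constraint.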
Last updated 10/05/08