PRED-GPCR Help page
Version 1.01

What do the different E-values mentioned in the Filtering options and on the Results page mean?

PRED-GPCR, like Pfam, is based on profile Hidden Markov Model (HMM) searches implemented by the
HMMER software package. HMMER uses accurate empirical methods to estimate E-values (expectation values)
for a given query sequence against the profile HMMs included in the PRED-GPCR library. These E-values are
referred to as "Individual motif E-values" in the PRED-GPCR system. An E-value measures how consistent the
results actually obtained are with the "pure chance" explanation for those results. In other words, the E-value
is the number of hits that would be expected in a database of the same size made of randomly generated sequences
with a match score at least as high as the match score of your sequence with a single motif. Typically, E-values
well below 1 are considered significant. These E-values are therefore, by themselves, a measure of the significance
of the matches returned for a given query (for a more in-depth discussion of profile HMMs and HMMER scores, see the
HMMER software package documentation).
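To make the definition concrete, the minimal Python sketch below relates an E-value to the chance probability of a single random match and to the probability of seeing at least one chance hit. It assumes the usual Poisson approximation used for database-search statistics (P = 1 - e^(-E)); it is not how HMMER itself estimates E-values, and the function names and database size are hypothetical.

```python
import math


def e_value(p_value: float, database_size: int) -> float:
    """Expected number of chance hits: each of `database_size` random
    sequences scores at least as well as the query with probability `p_value`."""
    return p_value * database_size


def prob_any_chance_hit(e: float) -> float:
    """Probability of at least one chance hit, under a Poisson approximation."""
    return 1.0 - math.exp(-e)


# Hypothetical database of 10,000 random sequences.
print(e_value(1e-5, 10_000))  # about 0.1
for e in (0.001, 0.1, 1.0, 10.0):
    print(f"E = {e:>6}: P(at least one chance hit) = {prob_any_chance_hit(e):.4f}")
```

For small E the chance probability is close to E itself (E = 0.001 gives about 0.001), while E = 1 already means roughly a 63% chance of at least one random hit. This is why only E-values well below 1 count as significant.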
The PRED-GPCR library, however, includes more than one profile HMM for each GPCR family, so a statistically
valid method is needed to combine the evidence from all the HMMs derived from the same family. This method is
provided by the Qfast algorithm (Bailey et al. 1998), which PRED-GPCR uses to produce a single E-value for each
family (referred to as "Combined E-values" in the PRED-GPCR system).
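The combination step can be sketched in Python as follows. The function computes the closed-form probability for a product of independent p-values, which is the quantity Qfast evaluates, assuming the per-motif p-values are independent and uniformly distributed under the null hypothesis. This page does not describe how PRED-GPCR turns individual motif E-values into p-values, or the combined p-value into the reported Combined E-value, so both steps are left out; the name `qfast` and the example values are our own.

```python
import math
from typing import Sequence


def qfast(p_values: Sequence[float]) -> float:
    """Combined p-value of the product of n independent p-values.

    If p_1..p_n are independent and uniform on (0, 1] under the null
    hypothesis, their product q has P(Q <= q) = q * sum_{i<n} (-ln q)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    # Work with log q rather than the raw product; x = -ln q >= 0.
    log_q = sum(math.log(p) for p in p_values)
    x = -log_q
    term, total = 1.0, 1.0  # i = 0 term of the series
    for i in range(1, len(p_values)):
        term *= x / i  # now equals x^i / i!
        total += term
    # min() guards against rounding just above 1.
    return min(1.0, math.exp(log_q + math.log(total)))


# Two motifs of one family, each with p = 0.01 (hypothetical values).
print(0.01 * 0.01)          # naive product
print(qfast([0.01, 0.01]))  # combined p-value, about 1.02e-3
```

The naive product (1e-4) is about ten times smaller than the combined p-value: simply multiplying motif p-values would overstate the significance of a family match.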

I can't decide what filtering options I should choose...

Well, that depends on what you are after. The Combined E-value cutoff filters your results by their combined family
E-values. The default value for this field is 0.004, which was determined to be the weighted Minimum Error Point on
a test set of unseen examples. The Minimum Error Point is the E-value cutoff at which a classifier makes the fewest
errors (false positives plus false negatives). You can, however, use a higher combined E-value cutoff if you wish to
broaden your search.
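A minimal Python sketch of locating a (weighted) Minimum Error Point on a labelled test set follows. The page does not say how PRED-GPCR weighted the two kinds of error, so `fp_weight` and `fn_weight` are hypothetical parameters, and the toy data is invented for illustration.

```python
from typing import Iterable, Tuple


def minimum_error_point(
    scored: Iterable[Tuple[float, bool]],
    fp_weight: float = 1.0,
    fn_weight: float = 1.0,
) -> Tuple[float, float]:
    """Return (cutoff, weighted_error) minimising fp_weight*FP + fn_weight*FN.

    `scored` holds (e_value, is_family_member) pairs from a labelled test set;
    a hit counts as a predicted family member when e_value <= cutoff.
    """
    data = sorted(scored)
    total_pos = sum(1 for _, member in data if member)
    # A cutoff of 0 accepts nothing (assuming all E-values > 0),
    # so every family member starts out as a false negative.
    best_cutoff, best_err = 0.0, fn_weight * total_pos
    fp, fn = 0, total_pos
    for i, (e_value, member) in enumerate(data):
        # Raising the cutoff to e_value accepts this example.
        if member:
            fn -= 1
        else:
            fp += 1
        # Tied E-values are accepted together, so score only after the last one.
        if i + 1 < len(data) and data[i + 1][0] == e_value:
            continue
        err = fp_weight * fp + fn_weight * fn
        if err < best_err:
            best_cutoff, best_err = e_value, err
    return best_cutoff, best_err


toy = [(1e-10, True), (1e-6, True), (1e-3, False), (0.01, True), (0.5, False)]
print(minimum_error_point(toy))                 # (1e-06, 1.0)
print(minimum_error_point(toy, fn_weight=2.0))  # (0.01, 1.0)
```

With equal weights the toy set yields a cutoff of 1e-6; penalising false negatives twice as heavily moves the optimum to the looser cutoff 0.01, which shows how the weighting shifts the balance between missed members and false hits.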
The Individual motifs cutoff filters single motifs and can be set to either:
- A motif-specific cutoff, which filters your results using discrete predefined thresholds for each motif
in the PRED-GPCR library. These thresholds are empirical cutoffs, weighted between the E-value of the Last True
Positive (last matching family member) and that of the First True Negative (first matching sequence not belonging
to the family) in a tuning data set.
- A Global E-value cutoff, which filters all motifs with a common, user-defined threshold. Again, you can use
a loose Global E-value cutoff if you wish to include distant hits in your query results.
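The two kinds of individual motif cutoff can be sketched in Python as below. The page only says the motif-specific thresholds are "weighted between" the Last True Positive and First True Negative E-values; the log-scale interpolation, its `weight` parameter, and the motif names are assumptions made for illustration.

```python
import math
from typing import Dict, Optional


def motif_cutoff(last_tp_e: float, first_tn_e: float, weight: float = 0.5) -> float:
    """Place a cutoff between the last true positive and first true negative E-values.

    Assumed scheme: interpolate on a log10 scale, so weight=0 gives the last
    true positive's E-value, weight=1 the first true negative's, and 0.5
    their geometric mean.
    """
    lo, hi = math.log10(last_tp_e), math.log10(first_tn_e)
    return 10 ** (lo + weight * (hi - lo))


def passing_motifs(
    motif_e: Dict[str, float],
    motif_cutoffs: Dict[str, float],
    global_cutoff: Optional[float] = None,
) -> Dict[str, float]:
    """Keep motif hits below their motif-specific cutoff, or below one shared
    global cutoff when `global_cutoff` is given."""
    kept = {}
    for name, e in motif_e.items():
        limit = global_cutoff if global_cutoff is not None else motif_cutoffs[name]
        if e < limit:
            kept[name] = e
    return kept


# Hypothetical tuning results and query hits.
cutoffs = {"motif_1": motif_cutoff(1e-6, 1e-2), "motif_2": motif_cutoff(1e-4, 1e-2)}
hits = {"motif_1": 1e-5, "motif_2": 3e-3}
print(passing_motifs(hits, cutoffs))                      # motif-specific cutoffs
print(passing_motifs(hits, cutoffs, global_cutoff=0.01))  # loose global cutoff
```

Under the motif-specific cutoffs only `motif_1` survives, while the loose global cutoff keeps both hits, including the more distant `motif_2` match.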
As a general principle, keep in mind that different E-value cutoffs trade selectivity against sensitivity: stricter
cutoffs sacrifice sensitivity in favour of selectivity, and looser cutoffs do the reverse.
The low complexity filter implements the CAST algorithm (Promponas et al. 2000), which detects low complexity
regions and selectively masks them. This filter can improve the selectivity of the method: because the sequence
score takes all scoring domains into account, low-scoring domains repeated along a sequence can add up and
produce false positives.
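The CAST paper describes detecting compositionally biased regions by comparing the query with homopolymers of each amino acid and masking the best-scoring region in repeated passes. The Python sketch below mimics that idea only loosely: it uses a simple match/mismatch score instead of a real substitution matrix, and the scores, threshold, and 'X' masking character are hypothetical choices, so it is not a reimplementation of CAST.

```python
from typing import Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def best_segment(seq: str, residue: str, match: int, mismatch: int) -> Tuple[int, int, int]:
    """Best ungapped local alignment of `seq` against a homopolymer of `residue`.

    With no gaps allowed this is a maximum-scoring segment problem, solved with
    Kadane's algorithm. Returns (score, start, end) with `end` exclusive.
    """
    best = (0, 0, 0)
    run_score, run_start = 0, 0
    for i, aa in enumerate(seq):
        run_score += match if aa == residue else mismatch
        if run_score <= 0:
            # A non-positive prefix never helps: restart after this position.
            run_score, run_start = 0, i + 1
        elif run_score > best[0]:
            best = (run_score, run_start, i + 1)
    return best


def mask_low_complexity(seq: str, threshold: int = 20, match: int = 5, mismatch: int = -4) -> str:
    """Repeatedly mask the highest-scoring single-residue-biased region with 'X'.

    Stops when no residue type yields a region scoring at least `threshold`.
    Each pass turns at least one matching residue into 'X', so the loop ends.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    seq = seq.upper()
    while True:
        score, start, end, _residue = max(
            best_segment(seq, aa, match, mismatch) + (aa,) for aa in AMINO_ACIDS
        )
        if score < threshold:
            return seq
        seq = seq[:start] + "X" * (end - start) + seq[end:]


print(mask_low_complexity("MKTAYIAKER" + "Q" * 12 + "LSTNEVWG"))
# MKTAYIAKERXXXXXXXXXXXXLSTNEVWG
```

Only the poly-Q stretch is masked; the scattered repeats of K, A, and E elsewhere in the toy sequence score far below the threshold.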

How should I evaluate my results?

Trusted results are ONLY those with a combined E-value below the weighted Minimum Error Point (see above) and with
the motif E-values of the corresponding family below the individual motif-specific E-value cutoffs. These matches are
marked with the "!" symbol on the Results page. Less significant matches are flagged with the "?" symbol (see the
Example Output). All results that fail either of the criteria above but still produce significant E-values (less than
one order of magnitude above the predefined cutoffs) are marginal and should be used with discretion.
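The decision rule above can be written as a small Python function. It assumes that every reported motif hit of the family must pass its cutoff, and it reads "less than one order of magnitude above the predefined cutoffs" as "below ten times each cutoff"; the function name, motif names, and motif cutoff values are hypothetical, while 0.004 is the default combined cutoff given above.

```python
from typing import Dict


def match_flag(
    combined_e: float,
    combined_cutoff: float,
    motif_e: Dict[str, float],
    motif_cutoffs: Dict[str, float],
) -> str:
    """Return '!' (trusted), '?' (marginal) or '' (not significant) for one family."""

    def passes(factor: float) -> bool:
        # The combined E-value and every reported motif hit must lie below
        # `factor` times their respective cutoffs.
        return combined_e < factor * combined_cutoff and all(
            e < factor * motif_cutoffs[name] for name, e in motif_e.items()
        )

    if passes(1.0):
        return "!"
    if passes(10.0):  # within one order of magnitude of the cutoffs
        return "?"
    return ""


# Hypothetical motif cutoffs; 0.004 is the default combined E-value cutoff.
cutoffs = {"motif_1": 1e-4, "motif_2": 1e-3}
print(match_flag(1e-3, 0.004, {"motif_1": 1e-5, "motif_2": 1e-4}, cutoffs))  # !
print(match_flag(0.02, 0.004, {"motif_1": 1e-5, "motif_2": 1e-4}, cutoffs))  # ?
```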
In addition, a sequence that does not produce significant results cannot safely be assumed to be a non-GPCR. Some
GPCR families have not been included in the PRED-GPCR classification system; this is the case for a few sparsely
populated GPCR families and some ill-characterised Orphan GPCR families.

Should I query the PRED-GPCR system with a sequence fragment?

Of course, if you feel lucky... The PRED-GPCR library motifs each correspond to a
confined segment of the protein family's multiple sequence alignment. Even if a protein
family is represented in the PRED-GPCR library by more than one motif, these motifs
may correspond to regions of the sequence other than the fragment you have in hand.
It is therefore quite probable that a fragment of a sequence belonging to one of the GPCR
families included in the PRED-GPCR system will not produce significant matches. So
always keep in mind that the PRED-GPCR system is most effective when queried with whole
sequences.
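As a rough illustration of why fragments often miss, the Python sketch below lists which of a family's motifs fall entirely inside a fragment. It is a simplification: the motif regions and coordinates are hypothetical, and a profile HMM may still give a weak partial match to a motif that the fragment only partly covers.

```python
from typing import Dict, List, Tuple


def motifs_within_fragment(
    fragment: Tuple[int, int],
    motif_regions: Dict[str, Tuple[int, int]],
) -> List[str]:
    """Names of motifs whose whole region lies inside the fragment.

    All coordinates are half-open [start, end) positions on the full-length
    protein; they are a hypothetical mapping used only for illustration.
    """
    frag_start, frag_end = fragment
    return [
        name
        for name, (start, end) in motif_regions.items()
        if frag_start <= start and end <= frag_end
    ]


# Hypothetical family with two motifs at different places along the protein.
regions = {"motif_1": (30, 90), "motif_2": (250, 310)}
print(motifs_within_fragment((0, 350), regions))    # ['motif_1', 'motif_2']
print(motifs_within_fragment((100, 240), regions))  # []
```

A whole sequence covers both motifs, whereas the middle fragment covers neither, so no family motif can contribute full evidence even though the fragment belongs to the family.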

How are the family-related Swiss-Prot and TrEMBL entries gathered?

These entries are obtained automatically using the PRED-GPCR system.
The Swiss-Prot and TrEMBL databases are regularly queried against the PRED-GPCR
library. All sequences with a combined E-value below the weighted Minimum Error Point AND with
motif E-values below the individual motif-specific cutoffs are treated as trustworthy,
assumed family members.