PRED-COUPLE: a method that implements refined profile Hidden Markov Models to predict the coupling specificity of G-protein coupled receptors to G-proteins

Version 1.00

General description of the Method

The core of the PRED-COUPLE predictor is a library of refined profile Hidden Markov Models (pHMMs) that were trained on the intracellular domain sequences of 282 GPCRs with known coupling properties and selected for high discriminative power: of the 6149 pHMMs originally constructed, 18 were selected for the refined library. All pHMMs were constructed and calibrated with the HMMER software package, in the same way as the signatures in the Pfam database. HMMER is also used to calculate scores and E-values when a query is searched against the refined PRED-COUPLE library. Thus, each motif match is assigned a score and an E-value that corresponds to that score.
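The link between a score and its E-value can be illustrated with the extreme value distribution (EVD) fit that HMMER 2.x performs when a model is calibrated: P(S >= x) = 1 - exp(-exp(-lambda(x - mu))), and E = N x P(S >= x) for a database of N sequences. This is a sketch assuming HMMER 2-style EVD calibration; the parameter values in the usage block are made up, and HMMER's own output should be used for real queries.

```python
import math

def evd_pvalue(score: float, mu: float, lam: float) -> float:
    """Tail probability P(S >= score) under a Gumbel (extreme value) distribution.

    mu and lam are the location and slope parameters a calibration step would fit
    (assumed HMMER 2-style calibration). expm1 keeps precision for tiny probabilities.
    """
    return -math.expm1(-math.exp(-lam * (score - mu)))

def evalue(score: float, mu: float, lam: float, db_size: int) -> float:
    """Expected number of chance hits scoring >= score among db_size sequences."""
    return db_size * evd_pvalue(score, mu, lam)

if __name__ == "__main__":
    # Hypothetical calibration parameters and database size, for illustration only.
    mu, lam, n = -5.0, 0.7, 1000
    for s in (0.0, 10.0, 20.0):
        print(f"score {s:5.1f} -> E-value {evalue(s, mu, lam, n):.3g}")
```

Because the tail probability falls as the score rises, higher scores always map to lower E-values for a given model.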
In a subsequent step, the motif-specific E-values of motifs that belong to the same coupling group (e.g. gioloop1, gioloop2, gioloop3a, gioloop3b and giocterminal) are combined using the QFAST algorithm (Bailey and Gribskov, 1998). The combined E-values are then sorted in increasing order; this ranked list is the final prediction of the method.
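QFAST gives the p-value of the product of n independent p-values, p = p_1 p_2 ... p_n, as p * sum_{i=0}^{n-1} (-ln p)^i / i! (Bailey and Gribskov, 1998). The sketch below applies it per coupling group and ranks the groups by combined E-value. How PRED-COUPLE converts per-motif E-values to p-values is not described here, so this sketch assumes p = E / N and combined E = combined p * N, where N is the database size behind the E-values; the function names and the group names in the usage block are hypothetical.

```python
import math

def qfast(pvalues):
    """P-value of the product of n independent p-values (QFAST formula)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    product = math.prod(pvalues)
    if product <= 0.0:
        return 0.0
    x = -math.log(product)
    term, total = 1.0, 1.0  # i = 0 term of the sum
    for i in range(1, len(pvalues)):
        term *= x / i  # builds x**i / i! incrementally
        total += term
    return min(1.0, product * total)

def rank_coupling_groups(motif_evalues, db_size):
    """motif_evalues maps a coupling group to its per-motif E-values.

    Assumed conversion: p = E / N, combined E = combined p * N.
    Returns (group, combined_p, combined_E) tuples sorted by combined E.
    """
    results = []
    for group, evalues in motif_evalues.items():
        pvals = [min(1.0, e / db_size) for e in evalues]
        p = qfast(pvals)
        results.append((group, p, p * db_size))
    results.sort(key=lambda r: r[2])
    return results

if __name__ == "__main__":
    # Hypothetical per-motif E-values for two coupling groups.
    groups = {"gio": [2e-4, 0.03, 0.5], "gs": [0.8, 1.2]}
    for group, p, e in rank_coupling_groups(groups, 1000):
        print(f"{group}\tcombined p = {p:.3g}\tcombined E = {e:.3g}")
```

Note that QFAST assumes the per-motif p-values are independent; a group with several moderately good motif hits can therefore outrank a group with a single strong one.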

Results interpretation

Although scores do have a formal probabilistic interpretation, only E-values are presented in the output of this method. An E-value measures how often a match at least as good as the observed one would be expected to occur by "pure chance" (i.e. in sequences unrelated to the region modeled by a motif). Specifically, the E-value is the number of false positives expected to score at least as high as the query sequence in a search of a database of a given size, so higher scores correspond to lower E-values. Typically, E-values below 1 are considered significant, so E-values are by themselves a measure of the significance of the matches returned for a given query (for a more in-depth discussion of profile HMMs and HMMER scores, see the HMMER documentation). In addition, each profile has a motif-specific cutoff value, estimated beforehand by ROC curve analysis. Trusted results are those with an E-value equal to or below the motif-specific cutoff, and are indicated by the ! symbol. Statistically significant hits with an E-value above the motif-specific cutoff but below 1.5, which is the reporting limit of the method, are indicated by the ? symbol.
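The flagging rule just described can be sketched as follows. The actual motif-specific cutoffs are not listed in this document, so any cutoff passed in is a hypothetical placeholder; treating 1.5 as a strict upper bound (E < 1.5) is our reading of "below 1.5".

```python
from typing import Optional

REPORTING_LIMIT = 1.5  # E-value limit of the method, as stated in this document

def flag_hit(evalue: float, motif_cutoff: float) -> Optional[str]:
    """Return '!' for trusted hits, '?' for weaker reported hits, None if not reported.

    motif_cutoff is the motif-specific (ROC-derived) cutoff; values used here are hypothetical.
    """
    if evalue <= motif_cutoff:
        return "!"
    if evalue < REPORTING_LIMIT:
        return "?"
    return None

if __name__ == "__main__":
    hypothetical_cutoff = 0.01
    for e in (1e-3, 0.01, 0.5, 2.0):
        print(e, flag_hit(e, hypothetical_cutoff))
```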
For each motif, the match region within the query sequence is also presented; it is expected to lie within one of the receptor's intracellular domains. This information can help decipher the structural themes of the interaction between GPCRs and G-proteins, and could therefore be used to guide mutagenesis studies.
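If a membrane topology for the receptor is available (for example, from a transmembrane topology predictor), a script can check which intracellular domain contains each reported match region. This is a minimal sketch assuming 1-based, inclusive residue coordinates; the function name, domain names and boundaries are hypothetical and not part of the PRED-COUPLE output.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive (assumed convention)

def locate_match(match: Interval, intracellular_domains: Dict[str, Interval]) -> Optional[str]:
    """Return the name of the intracellular domain that fully contains the match, or None."""
    start, end = match
    for name, (d_start, d_end) in intracellular_domains.items():
        if d_start <= start and end <= d_end:
            return name
    return None  # the match straddles a boundary or lies outside all listed domains

if __name__ == "__main__":
    # Hypothetical topology for illustration.
    domains = {"loop1": (60, 72), "loop2": (130, 150), "loop3": (220, 260), "cterminal": (320, 370)}
    print(locate_match((225, 240), domains))
```

A match that falls outside every intracellular interval is worth a second look, since the text above expects motif matches to lie within these domains.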
The Combined E-values field is the final prediction of the method. Specific cutoffs have already been applied to the combined E-values, so all combined E-values presented are considered reliable. The combined p-value (i.e. the probability that a random sequence would score at least as high as the query sequence) is also presented here. Promiscuous receptors are expected to produce combined E-value hits for more than one coupling group; however, the order of these hits is not expected to be accurate.
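When post-processing the output in a script, one cautious way to respect the warning about ordering is to treat the reported coupling groups as an unordered set and flag a receptor as promiscuous when more than one group is reported. The input format and group names below are hypothetical.

```python
from typing import Iterable, Set, Tuple

def summarize_prediction(combined: Iterable[Tuple[str, float]]) -> Tuple[Set[str], bool]:
    """combined: (coupling_group, combined_evalue) pairs as reported by the method.

    Cutoffs are already applied by the method, so every reported pair counts as a prediction.
    The ranking is deliberately discarded because it may be inaccurate for promiscuous receptors.
    """
    groups = {group for group, _ in combined}
    return groups, len(groups) > 1

if __name__ == "__main__":
    # Hypothetical output for a receptor reported in two coupling groups.
    print(summarize_prediction([("gio", 1e-6), ("gq11", 1e-3)]))
```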

Implementation of Pfam profiles

The PRED-COUPLE method also includes a library of six profile HMMs from Pfam version 14.0 that describe seven-transmembrane (7TM) receptors, namely 7tm_1 to 7tm_6. In a preliminary step, the query sequence is searched against these six pHMMs to test whether it is indeed a GPCR. However, additional GPCR signatures exist (or are expected to be added) in Pfam and in other protein pattern and domain databases such as PROSITE, PRINTS and InterPro. We strongly suggest querying your sequence(s) against those databases before running PRED-COUPLE. If your queries include GPCR fragments or other such sequences, simply ignore the "CAUTION!!!!Probable non-GPCR sequence" message that might appear.
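The GPCR pre-check can be pictured as the decision rule below. This document does not state which E-value threshold triggers the caution message, so the default of 1.0 is an assumption, as is the input format (the best E-value per Pfam model, with models that produced no hit simply omitted).

```python
import math
from typing import Dict, Optional

PFAM_7TM_MODELS = [f"7tm_{i}" for i in range(1, 7)]  # the six Pfam 7TM families
CAUTION = "CAUTION!!!!Probable non-GPCR sequence"

def gpcr_check(evalues: Dict[str, float], threshold: float = 1.0) -> Optional[str]:
    """Return None if any 7TM model matches below threshold, else the caution message.

    evalues maps a Pfam model name to its best E-value; missing models count as no hit.
    The threshold is a hypothetical choice, not the method's documented value.
    """
    best = min(evalues.get(m, math.inf) for m in PFAM_7TM_MODELS)
    return None if best < threshold else CAUTION

if __name__ == "__main__":
    print(gpcr_check({"7tm_1": 3e-40}))  # full-length GPCR-like hit
    print(gpcr_check({}))               # e.g. a short fragment with no 7TM hit
```

As the text above notes, a fragment can legitimately fail this check, which is why the message is a caution rather than a hard stop.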

Original paper:
"A method for the prediction of GPCRs coupling specificity to G-proteins using refined profile Hidden Markov Models"
Sgourakis N.G., Bagos P.G., Papasaikas P.G. and Hamodrakas S.J. BMC Bioinformatics, 6:104, 2005.