FT - Study of Residue Periodicities in Sequences
Pasquier, C.M., Promponas, V.I., Varvayannis, N.J. and Hamodrakas, S.J.
3. Input description
The input form, which lets the user specify the data needed
by the program, is divided into eight panels (cf. Fig. 1):
- all data concerning the sequence to be analysed (the sequence itself,
the format used, the selection of a part of the sequence) are specified
in the Sequence Specification panel,
- the residues whose periodicity is to be analysed are selected
in the Selection of Residues panel,
- additional parameters that restrict the results returned by the
program are specified in the Filter panel,
- the way the results are displayed is specified in
the Output Format panel,
- command buttons to search for periodicities of the selected
residues, to check the input data or to clear the form are available in
the Command panel,
- shortcut buttons that run searches for predefined groups of
residues are grouped in the Shortcut Buttons panel,
- the Statistics panel is used by the program to display statistics
about the occurrence of residues in the sequence,
- errors relating to the data entered on the form are displayed in the
Errors panel.
[Fig. 1: Layout of the input form, showing the Sequence Specification, Statistics, Output Formatting, Shortcut Buttons, Selection of Residues, Filter, Command and Errors panels]
In the Sequence Specification panel, a sequence can be either typed directly
in the text area or pasted from a text editor. Alternatively,
the user can specify the URL (of the form 'http://...') of a sequence
accessible over the Internet.
The sequence can be given in FASTA,
SwissProt or
PDB format. The user should select the format used by checking the
appropriate box in the format frame.
As a PDB entry may contain more than one chain of residues, it
is necessary to indicate which chain is to be analysed. This is done by
entering the name of the selected chain (exactly as it appears in the PDB
entry) in the ChainID area. If the PDB entry contains only one
chain, its identifier should be a space and this parameter can be left
blank. If no ChainID is specified and the PDB entry contains more than
one chain, the first chain found is used.
For convenience, it is also possible to enter a sequence as plain text
using the 1-letter codes of the residues.
Capital letters must be used for proteins and lowercase letters for DNA
sequences. The letter 'X' is used for unknown or unimportant
residues. Spaces and other whitespace characters (tabs, line breaks) are
allowed; they do not affect the results of the program. The maximum length
of a sequence is 4096 residues.
By default, the whole sequence is analysed, but it is possible to select
only a part of the sequence by changing the two parameters below:
- Start residue
- This integer indicates the position in the sequence of the first
residue to be selected.
- End residue
- This integer indicates the position in the sequence of the last
residue to be selected. The special value '#' for the end residue
selects the sequence from the start residue through to the last residue.
Both values are inclusive, so the length of the selected sequence
is equal to (End residue - Start residue + 1).
- Embedding size
- The sequence is 'embedded' in an array of 0's for computational
purposes. The size of this array must be a power of 2 and
must be greater than the length of the selected part of the sequence.
The size of the array affects the resolution of the results: a larger
array gives a better resolution but slows down
the computation. This number must not be greater than 4096.
The special character '#' means that the program automatically
chooses the smallest power of 2 that is greater than the length of the
selected sequence.
- Ignore gaps
- By default, gaps are represented by the character '-' in sequences
expressed with the 1-letter codes of residues, and the program treats them
as unknown residues. The 'Ignore gaps' option can be checked
to make the program ignore gaps and handle them as space characters.
- Accept Unknown
- By default, unknown residues are specified with the letter 'X'. In some
databases, however, other letters are used to indicate an unknown residue.
The 'Accept Unk.' checkbox should be used in such cases to indicate that
each unrecognised letter found in the sequence must be treated as an 'X'.
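The selection and embedding rules above can be sketched as follows. This is only an illustration, not the program's own code: the function names `select_region` and `embedding_size` are hypothetical, and "greater than the length" is read here as strictly greater, which is an assumption.

```python
def select_region(sequence, start=1, end="#"):
    """Return residues start..end (1-based, inclusive); '#' means the last residue."""
    last = len(sequence) if end == "#" else int(end)
    return sequence[start - 1:last]


def embedding_size(length, requested="#", maximum=4096):
    """Return the size of the zero-filled array for a selection of the given length."""
    if requested == "#":
        size = 1
        while size <= length:  # assumption: strictly greater than the length
            size *= 2
    else:
        size = int(requested)
        # A power of 2 has exactly one bit set, so size & (size - 1) is 0.
        if size & (size - 1) != 0 or size <= length:
            raise ValueError("embedding size must be a power of 2 greater than the length")
    if size > maximum:
        raise ValueError("embedding size must not exceed %d" % maximum)
    return size


region = select_region("MISLIAALAVD", start=3, end=8)
print(region, len(region), embedding_size(len(region)))  # SLIAAL 6 8
```

The selected length is 8 - 3 + 1 = 6, and the smallest power of 2 above 6 is 8.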
The Selection of Residues panel displays the names of all residues, each
followed by a text field that holds the weight of the corresponding residue.
The weights are used to create the pulse that is analysed by the program.
For example, with a weight of 1 for 'A' and 2 for 'L', the sequence
MISLIAALAVD is transformed into the array
{0 0 0 2 0 1 1 2 1 0 0}
by the program.
The weight of a residue can represent a particular property of that residue
(its charge, for example) and can have a non-integer value.
This lets a user apply any available table of amino acid or
base properties (e.g. a hydrophobicity scale): the user simply
fills in the text field of each amino acid or base with the numerical value
found in the relevant table and then runs the program.
Assigning a weight to a residue that does not appear in the sequence does
not affect the results.
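The conversion of a sequence into a weighted array can be sketched as below. The function name `build_pulse` is hypothetical, and treating every residue without a weight as 0 is an assumption drawn from the example above.

```python
def build_pulse(sequence, weights):
    """Replace each residue by its weight; residues without a weight count as 0."""
    return [weights.get(residue, 0) for residue in sequence]


# Weights from the example in the text: 1 for 'A' and 2 for 'L'.
weights = {"A": 1, "L": 2}
print(build_pulse("MISLIAALAVD", weights))
# [0, 0, 0, 2, 0, 1, 1, 2, 1, 0, 0]
```

A weight given for a residue that is absent from the sequence (for example 'W' here) leaves the array unchanged.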
In the Filter panel, additional parameters may be specified. These parameters
are passed to the calculation program and determine which results are returned.
- Cutoff
- The cutoff is the minimum intensity to be displayed.
The probability of finding by chance a periodicity with intensity I
is equal to exp(-I).
- Minimum period
- This parameter is the minimum value of the periodicities presented
in the output. Periodicities lower than this limit are not displayed.
The minimum period must be greater than '1'.
- Maximum period
- This parameter is the maximum value of the periodicities presented
in the output. Periodicities greater than this limit are not displayed.
The maximum period must not exceed the length of the selected part of the
sequence. The special character '#' means that the program uses
half of the selected sequence length as the maximum period.
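The effect of the three filter parameters can be sketched as follows. The names `chance_probability` and `filter_peaks` are hypothetical, the (period, intensity) pairs are made-up values for illustration, and treating the limits as inclusive is an assumption based on the wording above.

```python
import math


def chance_probability(intensity):
    """Probability of finding a periodicity of this intensity by chance: exp(-I)."""
    return math.exp(-intensity)


def filter_peaks(peaks, cutoff, min_period, max_period="#", selected_length=None):
    """Keep (period, intensity) pairs that pass the cutoff and the period limits."""
    if max_period == "#":
        max_period = selected_length / 2  # '#' means half the selected length
    return [(period, intensity) for period, intensity in peaks
            if intensity >= cutoff and min_period <= period <= max_period]


# Illustrative values only, not real program output.
peaks = [(2.0, 1.2), (3.6, 6.5), (7.0, 4.1), (30.0, 5.0)]
print(filter_peaks(peaks, cutoff=4.0, min_period=2, selected_length=40))
# [(3.6, 6.5), (7.0, 4.1)]
print(round(chance_probability(4.0), 4))  # 0.0183
```

With a cutoff of 4.0, every displayed periodicity has a probability of less than about 2% of arising by chance.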
The display of the output is controlled from the Output Format panel.
- Stats in %
- By default, the 'Statistics' panel displays the number of occurrences of
each residue in the sequence. Statistics can be displayed as percentages by
checking this option.
- 3-letter
- By default, amino acids are represented by their 1-letter code. By
checking this option, one can make the program display amino acids
by their 3-letter code (see the correspondence table
between 1-letter and 3-letter codes).
- HTML page / plain text
- These checkboxes allow the user to choose how the results are
displayed. Choose 'plain text' to copy and paste the results
into a text editor or to save them without the HTML markup.
Note that the graphical representation is not included in
plain text mode.
The Command panel contains three buttons:
- Submit
- Checks the data on the form, calls the program and displays the results.
If errors are detected in the form, the data are not processed and
detailed error messages are displayed in the Errors panel.
- Check
- Checks the data on the form but does not call the program. This button
displays the statistics, the length of the sequence and
the minimum embedding size on the form.
- Reset
- Clears the form and restores the default values of all input fields.
The Shortcut Buttons panel contains eleven buttons.
The first one, called 'Most frequent', automatically searches for periodicities
of all the residues in the sequence. The most frequent residues are examined
first. The content of the result page depends on the chosen display mode
(see above).
If 'Plain text' is selected, the outputs of all the searches are merged
into a single result page that can be saved or copied into a text editor.
If 'HTML page' is selected, the result page for the most frequent
residue in the sequence is displayed first. Result pages for the next most
frequent residue(s) are obtained by pressing a button at the top of the page.
Each of the other buttons runs a search for a predefined group of residues:
- α-helix
- Runs the program for the group of residues
A, V, L, I, F, W, D, E, Q, M, H, K
- β-sheet
- Runs the program for the group of residues
V, L, I, F, W, Y, T, C, Q, M
- β-turn
- Runs the program for the group of residues
G, P, D, N, S, C, K, W, Y, Q, T, R, E
- Hydrophobic
- Runs the program for the group of residues
C, V, I, L, M, F, Y, W, A, P
- Polar
- Runs the program for the group of residues
D, E, H, K, R, S, T, N, Q
- Charged
- Runs the program for the group of residues
H, R, K, D, E
- Charged+
- Runs the program for the group of residues
H, R, K
- Charged-
- Runs the program for the group of residues
D, E
- Aromatic
- Runs the program for the group of residues
H, F, W, Y
- Aliphatic
- Runs the program for the group of residues
V, L, I, A
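The predefined groups listed above can be written down as a simple lookup table. This is a sketch only: the names `SHORTCUT_GROUPS` and `group_weights` are hypothetical, and giving every member of a group the weight 1 is an assumption, since the text does not state which weight the shortcut buttons use.

```python
# Residue groups as listed for the ten shortcut buttons.
SHORTCUT_GROUPS = {
    "alpha-helix": "AVLIFWDEQMHK",
    "beta-sheet": "VLIFWYTCQM",
    "beta-turn": "GPDNSCKWYQTRE",
    "Hydrophobic": "CVILMFYWAP",
    "Polar": "DEHKRSTNQ",
    "Charged": "HRKDE",
    "Charged+": "HRK",
    "Charged-": "DE",
    "Aromatic": "HFWY",
    "Aliphatic": "VLIA",
}


def group_weights(name, weight=1):
    """Build a residue -> weight table for one group (weight 1 is an assumption)."""
    return {residue: weight for residue in SHORTCUT_GROUPS[name]}


print(group_weights("Charged-"))  # {'D': 1, 'E': 1}
```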
The Statistics panel shows the number of occurrences of each residue in the
whole sequence.
Statistics are updated each time the user clicks on the 'Check' button.
The most frequent residues appear first. With the checkboxes
in the 'Output Format' panel, the user can choose to represent the amino acids
by their 1-letter or 3-letter code and to display the statistics as counts or as
percentages.
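The statistics described above amount to a frequency count sorted in decreasing order. A minimal sketch, assuming whitespace is skipped and using the hypothetical function name `residue_statistics`:

```python
from collections import Counter


def residue_statistics(sequence, as_percent=False):
    """Return (residue, value) pairs, most frequent first, as counts or percentages."""
    counts = Counter(ch for ch in sequence if not ch.isspace())
    total = sum(counts.values())
    ranked = counts.most_common()  # sorted by decreasing count
    if as_percent:
        return [(residue, 100.0 * n / total) for residue, n in ranked]
    return ranked


print(residue_statistics("MISLIAALAVD")[:3])
# [('A', 3), ('I', 2), ('L', 2)]
```

Passing `as_percent=True` corresponds to checking the 'Stats in %' option.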
Error messages are displayed in the Errors panel after the user clicks on the
'Submit' or 'Check' button.
Explanation of the error messages:
- E01: The character [char] is not allowed in the sequence.
- The specified character cannot be used in a sequence. Allowed characters
are the twenty amino acid letters, the four DNA letters, the letter 'X'
(to indicate an unknown residue), the gap symbol '-', the space
character, the carriage return and the tab character.
If the option 'Accept unknown' is selected, all other letters
(lowercase and uppercase) are accepted and converted to 'X'.
- E02: 'Start residue' must be an integer.
E11: 'Start residue' cannot be less than '1'.
- 'Start residue' is used to select only a part of the specified sequence.
It is the index of the first residue of this sub-sequence and only integer
values are possible.
- E03: 'End residue' must be an integer or '#'.
E12: 'End residue' cannot be greater than the length of the sequence.
E13: 'End residue' must be greater than 'Start residue'.
- Like 'Start residue', 'End residue' is used to select only a part of
the specified sequence.
It is the index of the last residue of this sub-sequence and only integer
values are possible.
Unlike 'Start residue', however, it also accepts a special character in
place of the last index of the sequence: '#' means that all residues
from the index given by 'Start residue' to the end of the sequence
are used for the calculation. An error is reported if 'End residue'
is less than or equal to 'Start residue'.
- E04: 'Embedding size' must be an integer or '#'.
E16: 'Embedding size' must be a power of 2.
E14: 'Embedding size' cannot be less than the length of the selected part of the sequence.
E15: the maximum value for the 'Embedding size' is 4096.
- As described above, 'Embedding size' is the
size of the array in which the sequence is 'embedded'. This value must
be an integer, a power of 2, and greater than the length of the selected part of the sequence.
The special character '#' is also allowed. It indicates that the program
must choose the smallest power of 2 that is greater than the length of the selected sequence.
The maximum value for this parameter is 4096.
- E05: 'Cutoff' must be a float.
- The cutoff represents the minimum intensity to be displayed.
It must be a floating point number.
- E06: 'Minimum period' must be a float.
E17: 'Minimum period' must be greater than '1'.
- This parameter is the minimum value of the periodicities
presented in the result. It must be a number greater than 1.
- E07: 'Maximum period' must be a float or '#'.
E18: 'Maximum period' cannot be greater than the length of the selected part of the sequence.
E19: 'Maximum period' must be greater than the 'Minimum period'.
- This parameter is the maximum value of the periodicities
presented in the result. It must be a number that does not exceed the length of
the selected part of the sequence and is greater than the minimum period.
The special character '#' is also allowed. It indicates that the program
must use half of the selected sequence length as the maximum period.
- E08: the weight specified for the '[char] residue' must be a float.
- The weights are used to create the pulse that will be analysed.
Every specified weight must be a floating point number.
- E09: the length of the sequence must not exceed 4096 characters.
- The maximum size of the array in which the sequence is embedded is fixed
at 4096. A sequence with more than 4096 residues cannot be processed by
the program.
- E10: a sequence must be specified.
- This message indicates that the program failed to handle the sequence.
Check that the sequence in the text area is compatible with the
specified format.
- E20: URL of the sequence incorrect.
E21: Cannot access the sequence at the specified URL.
- These messages indicate that the program failed to connect to the URL of the
sequence or to read data from this URL.