FT - Study of Residue Periodicities in Sequences
Pasquier, C.M., Promponas, V.I.,
Varvayannis, N.J. and Hamodrakas, S.J.


3. Input description

The input form, which allows a user to specify the data needed by the program, is divided into eight panels (cf. Fig. 1):


 
[Fig. 1: Layout of the input form, showing the panels Sequence Specification, Statistics, Output Formatting, Shortcut Buttons, Selection of Residues, Filter, Command and Errors]

3.1 Sequence Specification

A sequence can either be typed directly in the text area or pasted from a text editor. Alternatively, the user can specify the URL (in the form 'http://...') of a sequence accessible through the Internet.

The sequence can be given in FASTA, SwissProt or PDB format. The user should select the format used by checking the appropriate box in the format frame.

As a PDB file may contain more than one chain of residues, it is necessary to indicate which chain you want to analyse. This is done by entering the name of the selected chain (in the same format as in the PDB sequence) in the ChainID area. If the PDB sequence contains only one chain, then its identifier should be a space and you can leave this parameter blank. If no ChainID is specified and the PDB sequence contains more than one chain, then the first chain found is used.

For convenience, it is also possible to enter a sequence as plain text by using the 1-letter code of residues.

Capital letters must be used for proteins and lowercase letters for DNA sequences. The letter 'X' is used for unknown or unimportant residues. Spaces and other whitespace characters (carriage returns, tabs) are allowed, since they do not affect the results of the program. The maximum length of a sequence is 4096 residues.

By default, the whole sequence is analysed but it is possible to select only a part of the sequence by changing the two parameters below:

Start residue
This integer indicates the position in the sequence of the first residue to be selected.
End residue
This integer indicates the position in the sequence of the last residue to be selected. The special value '#' for the end residue selects the sequence from the start residue to the end.
Both values are inclusive, so the length of the selected sequence is equal to (End residue - Start residue + 1).
Embedding size
The sequence is 'embedded' in an array of 0's for computational purposes. The size of this array must be a power of 2 and it must be greater than the length of the selected part of the sequence. The size of this array affects the resolution of the results: a bigger array gives a better resolution but slows down the computation. This number should not be greater than 4096.
The special character '#' means that the program must automatically choose the smallest power of 2 that is greater than the length of the selected sequence.
Ignore gaps
By default, gaps are represented by the character '-' in sequences expressed with the 1-letter codes of residues. Gaps are considered as unknown residues by the program. The 'Ignore gaps' option can be checked to force the program to ignore gaps and to handle them as space characters.
Accept Unknown
By default, unknown residues are specified with the letter 'X'. In some databases, however, other letters are used to indicate an unknown residue. The Accept Unk. checkbox should be used in such cases to indicate that each unknown letter found in the sequence must be considered as an 'X'.
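
The selection and embedding rules above can be sketched in a few lines of Python. This is an illustration only, not the program's actual code: the function names `select_region` and `embedding_size` are hypothetical, and the sketch follows the description literally in choosing a power of 2 strictly greater than the selected length.

```python
def select_region(sequence, start=1, end="#"):
    """Return residues start..end (1-based, inclusive); '#' means 'to the end'."""
    last = len(sequence) if end == "#" else int(end)
    return sequence[start - 1:last]


def embedding_size(selected_length, size="#"):
    """Resolve the embedding size; '#' picks the smallest power of 2 above the length."""
    if size != "#":
        return int(size)
    n = 1
    while n <= selected_length:
        n *= 2
    return n


region = select_region("MISLIAALAVD", start=4, end=8)
print(region, len(region))          # LIAAL 5  (8 - 4 + 1 residues)
print(embedding_size(len(region)))  # 8
```

With the default values (start 1, end '#', size '#'), the whole 11-residue example sequence would be embedded in an array of 16 elements.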

3.2 Selection of Residues

This panel displays the names of all residues, each followed by a text area which holds the weight of the corresponding residue. The weights are used to create the pulse which will be analysed by the program. For example, by selecting the weight 1 for 'A' and 2 for 'L', the sequence MISLIAALAVD will be transformed into the array {0 0 0 2 0 1 1 2 1 0 0} by the program.

The weight of a residue can represent a particular property of the residue (the charge, for example) and can have a non-integer value. This allows a user to work easily with published tables of amino acid/base properties (e.g. a hydrophobicity scale): the user simply fills in the text area for each amino acid/base with the numerical value found in the relevant table and then runs the program. If a residue that does not appear in the sequence is weighted, the results will not be affected.
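
The transformation of a sequence into a weighted pulse can be sketched as follows. The function name `build_pulse` is hypothetical, and the charge-like values in the second example are invented for illustration; they are not taken from any published scale.

```python
def build_pulse(sequence, weights):
    """Map each residue to its weight; residues without a weight count as 0."""
    return [weights.get(residue, 0) for residue in sequence]


print(build_pulse("MISLIAALAVD", {"A": 1, "L": 2}))
# [0, 0, 0, 2, 0, 1, 1, 2, 1, 0, 0]

# Non-integer weights work the same way (illustrative values only).
charge = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0, "H": 0.5}
print(build_pulse("MKDEH", charge))
# [0, 1.0, -1.0, -1.0, 0.5]
```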

3.3 Filter

In this panel, additional parameters may be specified. These variables are transmitted to the calculation program and act on the returned results.

Cutoff
The cutoff represents the minimum intensity to be displayed. The probability of finding by chance a periodicity with intensity I is equal to exp(-I).
Minimum period
This parameter represents the minimum value of the periodicities presented in the output. Periodicities lower than this limit are not displayed. The minimum period must be greater than '1'.
Maximum period
This parameter represents the maximum value of the periodicities presented in the output. Periodicities greater than this limit are not displayed. The maximum period must not exceed the length of the selected part of the sequence. The special character '#' means that the program must use half of the selected sequence length as the maximum period.
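
As a rough sketch of how these three parameters act on the results, assume the calculation yields a list of (period, intensity) pairs. The function names, the default minimum period of 2, the inclusive comparisons and the sample values are all assumptions made for this illustration; only the exp(-I) relation and the meaning of '#' come from the description above.

```python
import math


def filter_periodicities(peaks, selected_length, cutoff, min_period=2.0, max_period="#"):
    """Keep (period, intensity) pairs inside the period range and at or above the cutoff."""
    # '#' stands for half of the selected sequence length.
    upper = selected_length / 2 if max_period == "#" else float(max_period)
    return [(p, i) for p, i in peaks if min_period <= p <= upper and i >= cutoff]


def chance_probability(intensity):
    """Probability of finding a periodicity of this intensity by chance: exp(-I)."""
    return math.exp(-intensity)


peaks = [(2.0, 1.2), (3.6, 7.5), (7.0, 4.1), (40.0, 9.0)]  # invented values
kept = filter_periodicities(peaks, selected_length=60, cutoff=4.0)
for period, intensity in kept:
    print(period, intensity, chance_probability(intensity))
```

Here the pair with period 40 is dropped because '#' limits the maximum period to 30 for a 60-residue selection, and the pair with intensity 1.2 falls below the cutoff.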

3.4 Output Format

The display of the output is controlled from this panel.

Stats in %
By default, the panel 'Statistics' displays the number of each residue in the sequence. Statistics can be displayed as percentages by checking this option.
3-letter
By default, amino acids are represented with their 1-letter code. By checking this option, one can force the program to display amino acids with their 3-letter code (correspondence table between 1-letter and 3-letter code).
HTML page / plain text
These checkboxes allow the user to choose how the results should be displayed. Choose 'plain text' if you want to copy and paste the results into a text editor or save them without the HTML markup. Note that the graphical representation is not included in plain text mode.

3.5 Commands

This panel contains three buttons:

Submit
Checks the data on the form, calls the program and displays the results. If errors are detected in the form, the data are not processed and descriptive error messages are displayed in the 'Errors' panel.
Check
Checks the data on the form but does not call the program. It displays the statistics, the length of the sequence and the minimum embedding size on the form.
Reset
Clears the form and restores the default values of all the input fields.

3.6 Shortcut Buttons

This panel contains eleven buttons:

The first one, called 'Most frequent', automatically searches for periodicities for all the residues in the sequence. The most frequent residues are examined first. The content of the result page depends on the chosen display mode (see above). If 'Plain text' is selected, then the outputs of all the searches are merged into a single result page that you can save or copy into a text editor. If 'HTML page' is selected, then the result page for the most frequent residue in the sequence is displayed first. Result pages for the next most frequent residue(s) are obtained by pressing a button at the top of the page.

All the other buttons generate a search for a predefined group of residues:

α-helix
Execute the program for the group of residues A, V, L, I, F, W, D, E, Q, M, H, K
β-sheet
Execute the program for the group of residues V, L, I, F, W, Y, T, C, Q, M
β-turn
Execute the program for the group of residues G, P, D, N, S, C, K, W, Y, Q, T, R, E
Hydrophobic
Execute the program for the group of residues C, V, I, L, M, F, Y, W, A, P
Polar
Execute the program for the group of residues D, E, H, K, R, S, T, N, Q
Charged
Execute the program for the group of residues H, R, K, D, E
Charged+
Execute the program for the group of residues H, R, K
Charged-
Execute the program for the group of residues D, E
Aromatic
Execute the program for the group of residues H, F, W, Y
Aliphatic
Execute the program for the group of residues V, L, I, A
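
The predefined groups can be written down as a simple lookup table. The sketch below assumes that a shortcut button gives every member of the group the same weight (1 here) and every other residue the weight 0; the description does not state which weights the program actually uses, and the names `RESIDUE_GROUPS` and `group_weights` are hypothetical.

```python
# Residue groups as listed above, keyed by ASCII names chosen for this sketch.
RESIDUE_GROUPS = {
    "alpha-helix": "AVLIFWDEQMHK",
    "beta-sheet": "VLIFWYTCQM",
    "beta-turn": "GPDNSCKWYQTRE",
    "hydrophobic": "CVILMFYWAP",
    "polar": "DEHKRSTNQ",
    "charged": "HRKDE",
    "charged+": "HRK",
    "charged-": "DE",
    "aromatic": "HFWY",
    "aliphatic": "VLIA",
}


def group_weights(group_name, weight=1):
    """Build a weight table giving the same weight to every residue of a group."""
    return {residue: weight for residue in RESIDUE_GROUPS[group_name]}


weights = group_weights("charged")
print([weights.get(r, 0) for r in "MKDELH"])  # [0, 1, 1, 1, 0, 1]
```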

3.7 Statistics

This panel shows the number of occurrences of each residue in the whole sequence. Statistics are updated each time the user clicks on the 'check' button. The most frequent residues appear first. With the checkboxes in the 'output format' panel, the user can choose to represent the amino acids by their 1-letter or 3-letter code and to display the statistics as counts or as percentages.
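
A minimal sketch of such a residue count, assuming a sequence given in 1-letter code. The function name `residue_statistics` is hypothetical, and the order in which equally frequent residues are listed is an artefact of this sketch, not a documented behaviour of the program.

```python
from collections import Counter


def residue_statistics(sequence, as_percent=False):
    """Return (residue, value) pairs, most frequent residue first."""
    residues = [c for c in sequence if not c.isspace()]
    counts = Counter(residues).most_common()
    if not as_percent:
        return counts
    total = len(residues)
    return [(residue, 100.0 * n / total) for residue, n in counts]


stats = residue_statistics("MISLIAALAVD")
print(stats[0])  # ('A', 3)
print(residue_statistics("MISLIAALAVD", as_percent=True)[0])
```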

3.8 Errors

Error messages are displayed in this panel after the user clicks on the 'submit' or 'check' buttons.

Explanation of the error messages:

E01: The character [char] is not allowed in the sequence.
The character specified cannot be used in a sequence. Allowed characters are the twenty amino acid letters, the four DNA letters, the letter 'X' (to indicate an unknown residue), the gap symbol '-', the space character, the carriage return and the tab character. If the option 'Accept unknown' is selected, then all the other letters (lowercase and uppercase) are accepted and converted to an 'X'.
E02: 'Start residue' must be an integer.
E11: 'Start residue' cannot be less than '1'.
'Start residue' allows the user to select only a part of the specified sequence. It is the index of the beginning of this sub-sequence and only integer values are possible.
E03: 'End residue' must be an integer or '#'.
E12: 'End residue' cannot be greater than the length of the sequence.
E13: 'End residue' must be greater than 'Start residue'.
Like 'Start residue', 'End residue' allows the user to select only a part of the specified sequence. It is the index of the end of this sub-sequence and only integer values are possible. Unlike 'Start residue', however, the special character '#' can be used in place of the last index of the sequence: it means that all residues from the index given by 'Start residue' to the end of the sequence are used for the calculation. An error is reported if 'End residue' is less than or equal to 'Start residue'.
E04: 'Embedding size' must be an integer or '#'.
E16: 'Embedding size' must be a power of 2.
E14: 'Embedding size' cannot be less than the length of the selected part of the sequence.
E15: the maximum value for the 'Embedding size' is 4096.
As specified in the documentation, 'Embedding size' indicates the size of the array where the sequence is 'embedded'. This value must be an integer, a power of 2, and must be greater than the length of the sequence. The special character '#' is also allowed. It indicates that the program must choose the smallest power of 2 that is greater than the length of the selected sequence. The maximum value for this parameter is 4096.
E05: 'Cutoff' must be a float.
The cutoff represents the minimum intensity to be displayed. It must be a floating point number.
E06: 'Minimum period' must be a float.
E17: 'Minimum period' must be greater than '1'.
This parameter represents the minimum value of the periodicities presented in the result. It must be a number greater than 1.
E07: 'Maximum period' must be a float or '#'.
E18: 'Maximum period' cannot be greater than the length of the selected part of the sequence.
E19: 'Maximum period' must be greater than the 'Minimum period'.
This parameter represents the maximum value of the periodicities presented in the result. It must be a number less than the length of the selected part of the sequence and greater than the minimum period. The special character '#' is also allowed. It indicates that the program must use half of the selected sequence length as the maximum period.
E08: the weight specified for the '[char] residue' must be a float.
The weights are used to create the pulse which will be analysed. All specified weights must be floating point numbers.
E09: the length of the sequence must not exceed 4096 characters.
The maximum size of the array in which the sequence is embedded is fixed and is equal to 4096. A sequence with more than 4096 residues cannot be processed by the program.
E10: a sequence must be specified.
This message indicates that the program fails to handle the sequence. Check if the sequence in the text area is compatible with the format specified.
E20: URL of the sequence incorrect.

E21: Cannot access the sequence at the specified URL.
This message indicates that the program fails to connect to the URL of the sequence or to read data from this URL.
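
To make the relationship between some of these codes concrete, the checks on 'Start residue', 'End residue' and 'Embedding size' can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the field values are taken as strings typed by the user, and the order of the checks, the reporting of several codes at once and the skipping of the size checks after an earlier error are choices made for this sketch, not documented behaviour. The size comparison follows the wording of E14.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (bit trick: a power of 2 has a single bit set)."""
    return n > 0 and n & (n - 1) == 0


def check_region_and_embedding(seq_length, start, end, size):
    """Return the list of error codes raised by the three text fields."""
    errors = []
    try:
        start_val = int(start)
        if start_val < 1:
            errors.append("E11")
    except ValueError:
        start_val = None
        errors.append("E02")

    if end == "#":
        end_val = seq_length
    else:
        try:
            end_val = int(end)
        except ValueError:
            end_val = None
            errors.append("E03")
    if end_val is not None:
        if end_val > seq_length:
            errors.append("E12")
        if start_val is not None and end_val <= start_val:
            errors.append("E13")

    if errors:
        return errors  # the selected length is undefined, so skip the size checks

    selected = end_val - start_val + 1
    if size != "#":
        try:
            size_val = int(size)
        except ValueError:
            return ["E04"]
        if not is_power_of_two(size_val):
            errors.append("E16")
        if size_val < selected:
            errors.append("E14")
        if size_val > 4096:
            errors.append("E15")
    return errors


print(check_region_and_embedding(11, "1", "#", "#"))    # []
print(check_region_and_embedding(11, "0", "20", "16"))  # ['E11', 'E12']
print(check_region_and_embedding(11, "1", "#", "8"))    # ['E14']
```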