Foldit Wiki

FASTA format is a text file format used to represent protein and DNA sequences as strings of single-letter codes. The format originated with FASTA, a program written in the 1980s for searching protein and DNA sequence databases for similar sequences.

The FASTA format is now widely used by other programs. Several Foldit recipes use a variation of this format to represent the amino acids that make up the primary structure of a protein.

Here's an example of a FASTA-format file, taken from the PDB entry identified as 1HA8:

>1HA8:A|PDBID|CHAIN|SEQUENCE
GECEQCFSDGGDCTTCFNNGTGPCANCLAGYPAGCSNSDCTAFLSQCYGGC

The first line, which begins with a greater-than sign (">"), is a header or comment that helps identify the contents of the file. The second line, beginning with "GECEQ", is the primary structure of the protein, with each amino acid identified by its one-letter code. (See amino acids for the codes.)
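As a rough illustration (not code taken from any Foldit recipe), the Python sketch below splits a single-record FASTA string into its header and its sequence. The function name `parse_single_fasta` is hypothetical, and the sketch assumes the header, if present, is the first line and starts with ">"; any remaining lines are joined into one sequence.

```python
def parse_single_fasta(text):
    """Split a one-record FASTA string into (header, sequence).

    Hypothetical helper: assumes an optional '>' header on the first line.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    header = ""
    if lines and lines[0].startswith(">"):
        header = lines[0][1:]  # drop the leading '>'
        lines = lines[1:]
    sequence = "".join(lines)  # any remaining lines form the sequence
    return header, sequence


record = """>1HA8:A|PDBID|CHAIN|SEQUENCE
GECEQCFSDGGDCTTCFNNGTGPCANCLAGYPAGCSNSDCTAFLSQCYGGC"""
header, sequence = parse_single_fasta(record)
print(header)         # 1HA8:A|PDBID|CHAIN|SEQUENCE
print(len(sequence))  # 51
```

If the input has no header line, the header comes back as an empty string and the whole text is treated as sequence.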

In Foldit, the internal functions use lowercase for amino acid codes. The Foldit version of the sequence for 1HA8 would be:

geceqcfsdggdcttcfnngtgpcanclagypagcsnsdctaflsqcyggc
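A minimal Python sketch of this conversion follows. The function name `to_foldit` is hypothetical, and the check against the 20 standard one-letter codes is an assumption added for illustration; it is not necessarily what Foldit or any recipe does with unusual codes such as "X".

```python
# The 20 standard one-letter amino acid codes. Rejecting anything else is an
# assumption of this sketch, not documented Foldit behavior.
STANDARD_CODES = set("ACDEFGHIKLMNPQRSTVWY")


def to_foldit(sequence):
    """Convert a FASTA sequence (no header) to Foldit-style lowercase."""
    seq = "".join(sequence.split()).upper()  # remove any whitespace or line breaks
    unknown = sorted(set(seq) - STANDARD_CODES)
    if unknown:
        raise ValueError("unrecognized residue codes: " + ", ".join(unknown))
    return seq.lower()


print(to_foldit("GECEQCFSDGGDCTTCFNNGTGPCANCLAGYPAGCSNSDCTAFLSQCYGGC"))
# geceqcfsdggdcttcfnngtgpcanclagypagcsnsdctaflsqcyggc
```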

Many online tools accept a sequence in this format and don't require a header line.

The original FASTA format recommends keeping lines to 80 characters or fewer, so a long sequence is split across several lines. The PDB and some other sources may wrap sequences this way. In Foldit, it's more common to list the entire sequence as one long line.
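The Python sketch below shows both directions: wrapping a one-line sequence at a fixed width, and joining wrapped lines back into the single line more common in Foldit. The helper names and the default width of 80 are assumptions chosen for illustration.

```python
def wrap_sequence(seq, width=80):
    """Split a one-line sequence into lines of at most `width` characters."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


def unwrap_sequence(text):
    """Join wrapped sequence lines back into one line (no header expected)."""
    return "".join(line.strip() for line in text.splitlines())


seq = "GECEQCFSDGGDCTTCFNNGTGPCANCLAGYPAGCSNSDCTAFLSQCYGGC"
wrapped = wrap_sequence(seq, width=20)
print(wrapped)
# GECEQCFSDGGDCTTCFNNG
# TGPCANCLAGYPAGCSNSDC
# TAFLSQCYGGC
```

At the default width of 80, the 51-residue 1HA8 sequence fits on a single line, so wrapping leaves it unchanged.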

Foldit recipes that use this modified FASTA format include:

The full version of the FASTA format includes other features, such as the ability to include multiple sequences in a single file. In this scenario, each sequence has its own identifying header line. This feature is not currently used in any Foldit recipes.
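For illustration, a multi-sequence file could be read with a loop like the Python sketch below, which starts a new record at each ">" header line. The function name and the two sample records are hypothetical, and in this sketch any sequence lines that appear before the first header are simply ignored.

```python
def parse_multi_fasta(text):
    """Return a list of (header, sequence) pairs, one per '>' record.

    Hypothetical helper; sequence lines before the first header are ignored.
    """
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]  # a new record starts here
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))  # flush the last record
    return records


# Two hypothetical records; the second is wrapped over two lines.
sample = """>example_1
MKTAYIAK
>example_2
GECEQCFSDG
GDCTTCFNNG"""
for header, seq in parse_multi_fasta(sample):
    print(header, seq)
# example_1 MKTAYIAK
# example_2 GECEQCFSDGGDCTTCFNNG
```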

In addition to the greater-than sign (">") that marks the header line, a semicolon (";") can be used to mark a comment line. Anything after the ";" should be treated as a comment. In some cases, the ";" may appear at the start of the header line. Again, no Foldit recipes use these features.
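A minimal Python sketch of comment handling follows, as an illustration under stated assumptions: the helper name `strip_comments` is hypothetical, header lines starting with ">" are left untouched, and on every other line the ";" and anything after it are dropped. It does not handle the older convention of a ";" at the start of the header line. The sample input is the 1HA8 sequence with comments added for the example.

```python
def strip_comments(text):
    """Remove ';' comments from FASTA text.

    Assumption for this sketch: lines starting with '>' are kept as-is;
    on all other lines, ';' and everything after it are dropped.
    """
    kept = []
    for raw in text.splitlines():
        if raw.startswith(">"):
            kept.append(raw)  # leave header lines alone
            continue
        line = raw.split(";", 1)[0].strip()  # text before the first ';'
        if line:
            kept.append(line)  # skip lines that were only a comment
    return "\n".join(kept)


sample = """>1HA8:A|PDBID|CHAIN|SEQUENCE
; this whole line is a comment
GECEQCFSDGGDCTTCFNNG ; trailing note
TGPCANCLAGYPAGCSNSDCTAFLSQCYGGC"""
print(strip_comments(sample))
# >1HA8:A|PDBID|CHAIN|SEQUENCE
# GECEQCFSDGGDCTTCFNNG
# TGPCANCLAGYPAGCSNSDCTAFLSQCYGGC
```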

See also:
