Mimi's solution to the Easy Mini Freestyle protein, from Puzzle 430 in 2011.

Easy Mini Freestyle is a beginner puzzle, an example of a de-novo puzzle.

A de-novo puzzle involves a protein with an unknown shape. The protein is presented as a straight extended chain.

The Easy Mini Freestyle puzzle starts with no secondary structure assigned. This is different from most de-novo puzzles in Foldit, which start with a secondary structure predicted by an automated method.

The "mini" part of the puzzle's name means that this is a relatively short protein, with only 49 segments.

The goal of this puzzle is simply to find the best possible fold for this protein. The "freestyle" in the name means that you can use whatever tools you think will work best.

The "easy" part of the name is not so clear. This protein has been available in Foldit since at least 2011, and presumably scientists have known about it for longer. Despite this, the protein has never been solved.

Tools and strategies

De-novo puzzles generally start with the sidechains curled toward the backbone. This tends to make many of the sidechains look like proline. Simply shaking the protein at the start for a cycle or two unfolds the sidechains and moves the score into positive territory.

This puzzle starts without any secondary structure, meaning all segments are marked as loop. Looking at the sidechains can give clues about the secondary structure.

Foldit divides the amino acids in a protein into hydrophobics and hydrophilics. The hydrophobic segments should be in the core of the protein as much as possible -- "hide the hydrophobics". Hydrophilic segments tend to be found on the surface of the protein, especially the long ones, lysine and arginine.
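As a rough illustration of this split, the sketch below labels each residue of a one-letter amino acid sequence as hydrophobic or not. The `HYDROPHOBIC` set is an assumption based on a common textbook grouping and may not match the coloring Foldit itself uses. The example sequence is made up and is not the puzzle's sequence.

```python
# Assumed grouping (a common textbook choice, not necessarily Foldit's own).
HYDROPHOBIC = set("AVLIMFWYC")


def classify(sequence):
    """Return a string with 'H' for hydrophobic residues and '-' for all others.

    Glycine and proline are lumped in with "others" here for simplicity.
    """
    return "".join("H" if aa in HYDROPHOBIC else "-" for aa in sequence.upper())


def hydrophobic_fraction(sequence):
    """Return the fraction of residues labelled hydrophobic (0.0 if empty)."""
    pattern = classify(sequence)
    return pattern.count("H") / len(pattern) if pattern else 0.0


# Hypothetical 8-residue fragment, for illustration only.
print(classify("KVKIEFRG"))  # -H-H-H--
```

A string like this makes it easier to spot stretches that should probably be buried and stretches that can sit on the surface.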

Easy Mini Freestyle. In segments 9-19, hydrophobic segments alternate with hydrophilic segments, indicating a possible sheet.

The amino acids glycine and proline often indicate a spot where the protein bends or curves. They are not usually found in the middle of a sheet or helix, although they're sometimes at the ends. Proline and glycine are most commonly found in the sections of loop between sheets and helixes.

In this puzzle, segments 9 through 19 offer some clues. First, segments 9 and 10 are glycine. Glycine is very flexible, so two glycines in a row often means a sharp turn in the direction of the backbone.
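The search for likely bends can be sketched in a few lines. The helper names `likely_bend_sites` and `glycine_runs` are hypothetical, and the example sequence is invented rather than taken from the puzzle. Segment numbers are 1-based, as in Foldit's segment information window.

```python
import re


def likely_bend_sites(sequence):
    """Return 1-based segment numbers of every glycine (G) and proline (P)."""
    return [i for i, aa in enumerate(sequence.upper(), start=1) if aa in "GP"]


def glycine_runs(sequence, min_len=2):
    """Return 1-based inclusive (start, end) ranges of consecutive glycines."""
    pattern = r"G{%d,}" % min_len
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, sequence.upper())]


# Hypothetical 12-residue fragment, for illustration only.
fragment = "AKLVTGGSEVPK"
print(likely_bend_sites(fragment))  # [6, 7, 11]
print(glycine_runs(fragment))       # [(6, 7)]
```

These are only hints: a flagged position is a candidate for a turn, not proof of one.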

(The puzzle starts with segment 1 to the right. If you hover over a segment and hit the tab key, the segment information window appears, which gives you the segment number.)

Following the two glycines, you may notice that there are three short hydrophobics -- segments 14, 16, and 18 -- alternating with three long hydrophilics -- segments 13, 15, and 17. One possibility is that this section should be a sheet. The sheet could be aligned with the hydrophobics pointing toward the core of the protein, and the hydrophilics on the surface.
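The alternating pattern described above can also be searched for automatically. This is a minimal sketch, assuming the same textbook hydrophobic set as a stand-in for Foldit's own classification; the function name `alternating_stretches` and the example sequence are hypothetical.

```python
# Assumed grouping (a common textbook choice, not necessarily Foldit's own).
HYDROPHOBIC = set("AVLIMFWYC")


def alternating_stretches(sequence, min_len=5):
    """Return 1-based inclusive (start, end) ranges in which hydrophobic and
    non-hydrophobic residues strictly alternate for at least min_len segments.
    """
    flags = [aa in HYDROPHOBIC for aa in sequence.upper()]
    stretches = []
    start = 0
    for i in range(1, len(flags) + 1):
        # A stretch ends at the end of the sequence, or where two
        # neighbouring residues fall in the same class.
        if i == len(flags) or flags[i] == flags[i - 1]:
            if i - start >= min_len:
                stretches.append((start + 1, i))
            start = i
    return stretches


# Hypothetical fragment: two glycines, then an alternating run.
print(alternating_stretches("GGSEKVKIEFRG"))  # [(5, 11)]
```

A stretch found this way is a candidate sheet strand only. An alternating pattern is consistent with a strand that has one face buried and one exposed, but it does not rule out other structures.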

The same method can be applied to the rest of the protein, but there aren't any more glycines, just a proline two segments from the end. In segments 36 through 42, there are three phenylalanines on one side (at 36, 38, and 42), so this might once again be part of a sheet.
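The "one side" reasoning relies on the idea that in an idealized sheet strand, sidechains alternate from one face to the other, so segments whose numbers share the same parity point the same way. The sketch below applies that assumption to the phenylalanine positions given above; the helper name `same_side_groups` is hypothetical.

```python
def same_side_groups(positions):
    """Split 1-based segment numbers by parity.

    Assumes an idealized strand in which neighbouring sidechains point to
    opposite faces, so same-parity segments share a face.
    """
    return {
        "even": [p for p in positions if p % 2 == 0],
        "odd": [p for p in positions if p % 2 == 1],
    }


# Phenylalanine positions quoted in the text.
print(same_side_groups([36, 38, 42]))  # {'even': [36, 38, 42], 'odd': []}
```

All three positions land in the same group, which fits the suggestion that they could line up on one face of a sheet.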

There are also various secondary structure prediction tools available online. These tools use the primary structure (the sequence of amino acids in the protein) to predict the secondary structure.

(Use of these prediction tools seems to fall within the Foldit Community Rules, which restrict certain uses of outside information.)

Since they generally involve unsolved proteins, de-novo puzzles can be quite challenging. There are several strategy pages on this wiki which give detailed examples:

There's also a video, "Black-belt folding: de-novo".

See Puzzle 427 and Puzzle 430 for player solutions to early versions of this puzzle.

Technical stuff

The structure of this protein has never been published in the Protein Data Bank, which puts it in the "unsolved" category.

The Jpred secondary structure prediction tool finds several similar sequences in the UniProt database. One of these matches, available as id UPI00032AE178, is part of a larger sequence associated with Ochotona princeps, the American pika, a relative of rabbits and hares. (Or, a "lagomorph".)

The protein has been identified by looking at DNA, which gives the primary structure, but doesn't reveal anything about the protein's final shape.

The pika protein has a WD40 domain. WD40 repeats tend to form a beta-propeller when several of them occur together. It's not known if this WD40 keeps the pika from squeaking. This particular part of the protein does not contain the WD40 domain, or its structure would be obvious; it is, however, known as the Wdr18 C-terminus domain because it is a common ending for some WD40-containing proteins. You can get a rough idea of how important each residue is for the structure by looking at the domain's HMM logo, which shows the frequency of each amino acid at each position.
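The idea behind a logo can be illustrated with a simple per-position frequency count over aligned sequences. This is a simplification: a real HMM logo is drawn from a profile hidden Markov model rather than from raw counts. The function name `column_frequencies` and the four toy sequences are hypothetical and are not taken from the Wdr18 domain.

```python
from collections import Counter


def column_frequencies(aligned):
    """Return, for each alignment column, a dict of residue -> fraction.

    `aligned` is a list of equal-length strings; '-' marks a gap and is
    ignored when counting.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned if seq[col] != "-")
        total = sum(counts.values())
        result.append({aa: n / total for aa, n in counts.items()} if total else {})
    return result


# Toy alignment, for illustration only.
freqs = column_frequencies(["GKVF", "GRVF", "GKIF", "AKVY"])
print(freqs[0]["G"])  # 0.75
```

A position where one amino acid dominates is strongly conserved, which suggests that residue matters for the structure or function.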
