AlphaFold: Using artificial intelligence to understand life

  On July 15, two leading international journals, Nature and Science, simultaneously published research showing that artificial intelligence software can predict protein structures quickly and accurately. Among these programs, AlphaFold, developed by the British company DeepMind, and specifically its upgraded version AlphaFold 2, achieved remarkable results in 2020, which ought to be cause for celebration. Unfortunately, the vast majority of people do not know what AlphaFold is.
  Put simply, AlphaFold may be easiest to understand as a successor to AlphaGo applied to the life sciences, or as AlphaGo's “next generation”. AlphaGo became famous by defeating Lee Sedol, South Korea's top professional Go player, in 2016, and Ke Jie of China, then the world's No. 1 Go player, in 2017. The achievements of AlphaFold, however, may take time to become widely known.
Why determining protein structure matters

  AlphaFold is likewise artificial intelligence (AI) software. Its main purpose is to determine the shape of a protein, especially its three-dimensional (3D) shape, quickly and accurately.
  Proteins are the basis of life. Each protein consists of one or more polypeptide chains, which are made of amino acids linked in a specific order. Protein structure can be described at three levels: one-dimensional (the amino acid sequence), two-dimensional (the distances between pairs of residues), and three-dimensional (atomic coordinates). A chain folds into an intricate shape, and that shape enables the protein to carry out its function.
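  The three levels of description can be made concrete with a small sketch. The four-residue sequence and its coordinates below are invented purely for illustration (they are not a real protein), and the helper name `distance_matrix` is an assumption of this example. It shows how the two-dimensional view, a table of pairwise distances, follows from the three-dimensional coordinates.

```python
import math

# Hypothetical toy chain, invented for illustration only.
sequence = "GAVL"  # 1D: the amino acid sequence
coords = [         # 3D: one coordinate (in angstroms) per residue
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]


def distance_matrix(points):
    """2D view: the distance between every pair of residues."""
    return [[math.dist(p, q) for q in points] for p in points]


dmat = distance_matrix(coords)
for residue, row in zip(sequence, dmat):
    print(residue, [round(d, 2) for d in row])
```

  Going from coordinates to distances is easy, as here. The hard direction, which structure prediction has to solve, is recovering the distances and coordinates from the sequence alone.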
  Many diseases are closely linked to the folded shape of proteins. For example, the folded form of the spike protein (S protein) of the novel coronavirus determines how quickly it invades human cells and how readily it causes disease. Similarly, the folded form of the prion protein determines the pathogenicity and lethality of transmissible spongiform encephalopathies in mammals, including humans.
  In theory, a polypeptide chain can fold in an astronomical number of ways. As early as 1969, the American molecular biologist Cyrus Levinthal pointed out that, because an unfolded polypeptide chain has so many degrees of freedom, a protein molecule has an astronomical number of possible conformations, on the order of 3^300, or roughly 10^143. Mutations add further variants, as with the S protein of the novel coronavirus.
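  The arithmetic behind that figure can be checked in a few lines. This sketch assumes one common reading of the number: 300 independent degrees of freedom with three possible states each. The sampling rate is a purely hypothetical value, chosen only to show that trying every conformation in turn is out of the question.

```python
import math

STATES_PER_UNIT = 3    # assumed number of states per degree of freedom
FLEXIBLE_UNITS = 300   # assumed number of independent degrees of freedom

# Exact count (Python integers have arbitrary precision).
conformations = STATES_PER_UNIT ** FLEXIBLE_UNITS

# The same number expressed as a power of ten.
exponent = FLEXIBLE_UNITS * math.log10(STATES_PER_UNIT)

# Hypothetical search speed: ten trillion conformations per second.
SAMPLES_PER_SECOND = 10 ** 13
SECONDS_PER_YEAR = 365 * 24 * 3600
years_log10 = exponent - math.log10(SAMPLES_PER_SECOND * SECONDS_PER_YEAR)

print(f"3^300 is about 10^{exponent:.1f}")
print(f"exhaustive search would take about 10^{years_log10:.0f} years")
```

  Real proteins nevertheless fold quickly, so they cannot be searching at random. That gap is why prediction methods have to learn patterns rather than enumerate possibilities.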
  Determining a protein's conformation experimentally takes a great deal of time and effort, and the result may still be inaccurate. This makes the development of drugs, vaccines, and treatments laborious. For example, although vaccines for COVID-19 exist, the viral proteins mutate frequently. If the structure of a mutated protein cannot be accurately determined, it will be difficult to develop new vaccines and effective drugs. So far, there is no effective drug for treating COVID-19, and one reason is that the structure of the viral proteins is not well enough understood.
  AI that helps people determine protein structures accurately is therefore both significant and highly practical. Although the amino acid sequences of billions of proteins from humans and other species have been determined, only about 100,000 structures have so far been solved by experimental methods.
  In the 14th Critical Assessment of Structure Prediction (CASP14), held from May to July 2020, AlphaFold 2 stood out. The competition requires participating teams to predict the structure of a protein from its amino acid sequence. The proteins used in the contest have already been solved by experimental methods, but the results are withheld from the participants. In that respect, it is a little like the matches between AlphaGo and Lee Sedol or Ke Jie.
  AlphaFold 2 determined the structures of most proteins very accurately. Its predictions were not only comparable in accuracy to structures measured by experimental methods, but also far superior to those of other prediction methods. Its median backbone accuracy, measured as the root-mean-square deviation of superimposed backbone atoms at 95% residue coverage, was 0.96 angstroms (0.096 nanometers), whereas the second-ranked method achieved only 2.8 angstroms.
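  To make the accuracy figure concrete, here is a simplified sketch of a deviation score at 95% coverage. It assumes the two structures have already been superimposed (a real evaluation performs an optimal superposition first), uses one point per residue, and scores a single protein rather than taking a median over many targets. The function name `rmsd_at_coverage` and the toy coordinates are inventions of this example, not the CASP scoring code.

```python
import math


def rmsd_at_coverage(predicted, experimental, coverage_percent=95):
    """Root-mean-square deviation over the best-fitting share of residues.

    Both inputs are lists of (x, y, z) tuples in angstroms, assumed to be
    already superimposed. The worst-fitting residues are dropped until only
    `coverage_percent` of them remain.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and of equal length")
    squared = sorted(math.dist(p, q) ** 2 for p, q in zip(predicted, experimental))
    keep = max(1, len(squared) * coverage_percent // 100)
    return math.sqrt(sum(squared[:keep]) / keep)


# Toy data: 19 residues are off by 1 angstrom, one residue is off by 10.
experimental = [(3.8 * i, 0.0, 0.0) for i in range(20)]
predicted = [(x, y + 1.0, z) for x, y, z in experimental[:19]]
predicted.append((3.8 * 19, 10.0, 0.0))

rmsd_95 = rmsd_at_coverage(predicted, experimental)
rmsd_all = rmsd_at_coverage(predicted, experimental, coverage_percent=100)
print(round(rmsd_95, 2), round(rmsd_all, 2))
```

  Dropping the worst 5% of residues keeps a single badly placed loop from dominating the score, which is why the two numbers printed here differ so much.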
  This means that AlphaFold 2 outperformed every other method of predicting protein structure in the contest. Moreover, its neural network can predict the structure of a typical protein in a few minutes, and can generate a high-precision structure in a few days.
From AlphaFold to RoseTTAFold

  AlphaFold 2's accuracy comes from training with deep learning. The training data consist of approximately 170,000 known protein structures, together with large databases of protein sequences whose structures are unknown. The model operates on two representations, one of the protein sequence and one of amino acid residue pairs, and iteratively passes information between them to generate a structure. Like AlphaGo, then, AlphaFold 2 depends on deep learning to determine protein structures accurately.
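  The idea of passing information back and forth between a per-residue representation and a per-pair representation can be illustrated with a deliberately tiny toy. This is not AlphaFold 2's network: each feature here is a single number, the update rules are made-up averages, and nothing is learned. The function name `exchange_information` is an assumption of this sketch.

```python
def exchange_information(seq_rep, pair_rep, rounds=3):
    """Alternate updates between a per-residue and a per-pair representation.

    seq_rep  -- list of n numbers, one toy feature per residue
    pair_rep -- n x n list of numbers, one toy feature per residue pair
    """
    n = len(seq_rep)
    for _ in range(rounds):
        # Sequence -> pairs: each pair blends in the product of its two residues.
        pair_rep = [
            [0.5 * pair_rep[i][j] + 0.5 * seq_rep[i] * seq_rep[j] for j in range(n)]
            for i in range(n)
        ]
        # Pairs -> sequence: each residue blends in the mean of its pair row.
        seq_rep = [0.5 * seq_rep[i] + 0.5 * sum(pair_rep[i]) / n for i in range(n)]
    return seq_rep, pair_rep


seq_rep = [1.0, 0.2, 0.5]                  # hypothetical starting features
pair_rep = [[0.0] * 3 for _ in range(3)]   # pairs start with no information
new_seq, new_pair = exchange_information(seq_rep, pair_rep)
print([round(v, 3) for v in new_seq])
```

  After a few rounds, every residue's feature depends on every other residue through the pair table, and every pair depends on the whole chain. That mutual refinement is the point of the iteration; a real model replaces the averages with learned layers.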
  One difference from AlphaGo, however, is that AI software of this kind has already multiplied into a family of tools with a wider range of new techniques. One example is RoseTTAFold, developed at the University of Washington School of Medicine in the United States. As mentioned at the beginning of this article, AlphaFold 2 was published in Nature; at the same time, RoseTTAFold was published in Science.
  RoseTTAFold uses deep learning to predict protein structures quickly and accurately from limited information on an ordinary gaming computer, and can build complex biological models in a short time. It is a “three-track” neural network that simultaneously considers patterns in protein sequences, how amino acids interact with one another, and a protein's possible three-dimensional structure. In this architecture, information flows back and forth between the one-, two-, and three-dimensional levels, allowing the network to infer the relationship between a protein's chemical parts and its folded structure.
  By comparison, the 3D structures predicted by RoseTTAFold are almost as accurate as those of AlphaFold 2, and RoseTTAFold is faster and requires less computing power, so it may be more practical. The University of Washington research team has used RoseTTAFold to compute hundreds of new protein structures, including many poorly understood proteins from the human genome, such as proteins related to lipid metabolism disorders, inflammatory disorders, and cancer cell growth.
  Humans have tens of thousands of proteins, and other species, including bacteria and viruses, have billions more. In the past, protein structures could only be determined with cryo-electron microscopy (cryo-EM), nuclear magnetic resonance (NMR), and X-ray crystallography, and it took a great deal of trial and error to arrive at a final structure. Some structures could not be solved at all. For example, four proteins had resisted structure solution by molecular replacement (MR) using models available in the Protein Data Bank (PDB): bovine glycine N-acyltransferase, a bacterial oxidoreductase, a bacterial surface layer protein (SLP), and a secreted protein from the fungus Phanerochaete chrysosporium.
  Now, with AlphaFold 2 and RoseTTAFold, determining protein structures is much easier. These tools offer powerful new means of uncovering the phenomena and nature of life, and of developing the drugs, vaccines, and therapies needed to fight disease. In time, AlphaFold 2 and RoseTTAFold will contribute more to humanity than AlphaGo did, giving us stronger tools for understanding the nature of life.