Ideally, for doing metabolic engineering—making exotic biomaterials, pharmaceuticals, and chemicals—we would have all possible enzymes encoded in one master genome and we’d be able to turn their expression on and off, up and down, like a light switch, as needed for various tasks. So instead of a minimal genome, we want a maximal genome—like the vast online shopping world as opposed to the bare survival backpack. We call this collection of nearly all enzymes E. pluri, as in E. coli meets E pluribus unum. “From many states to one nation” becomes, in the world of genome engineering, “From many enzyme genes to one integrated, customizable, all-purpose biofactory.”
While natural free-living cellular genomes range in size about 150,000-fold—from 580 kbp (M. genitalium) to roughly 90 billion bp (trumpet lily)—a rule of thumb is that a cell can double its genome size without any major reengineering. A cell can (and often does in the wild) double its own size, as well as double the size of its genome, without any help from human genome engineers. A cell can do this on its own by means of a variation on the process whereby a genome duplicates itself before the cell divides. If the cell wall misses its part of the division dance, the cell finds itself with twice the normal DNA content. This generally means a bigger cell, which can be useful for intimidating or eating smaller cells. Later on it can be useful in providing vast resources for making new gene functions without hurting the old gene functions, since you now have two copies of each gene. When this is done by genome engineers, they are simply taking advantage of the natural flexibility of cellular processes that have evolved across billions of years.
So, E. pluri might be able to accommodate 4,000 new genes encoding enzymes (each about 1 kbp long). To give a feeling for how huge this is, the full set of known enzymes not already in E. coli (and not counting minor variants) is 7,000. In much larger genomes the number of genes does not go up directly with total genomic bulk; this phenomenon is formally known as the C-value paradox, which states that the size of a genome does not correlate with the size or complexity of the organism, due to the presence of junk DNA. Possibly useful junk, but not all that useful.
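As a rough sanity check on that 4,000-gene figure, here is a back-of-the-envelope calculation (a minimal sketch; the 4.6 Mbp E. coli genome size is an assumed reference value, and the 1 kbp gene length is the figure given above):

```python
# Back-of-the-envelope check: how many ~1 kbp enzyme genes fit
# if E. coli roughly doubles its genome size?
ecoli_genome_bp = 4.6e6      # assumed reference size for the E. coli genome (~4.6 Mbp)
avg_enzyme_gene_bp = 1.0e3   # ~1 kbp per enzyme gene, as stated in the text

extra_room_bp = ecoli_genome_bp                 # doubling adds one genome's worth of room
new_genes = extra_room_bp / avg_enzyme_gene_bp  # how many new genes that room holds
print(f"Room for roughly {new_genes:,.0f} new enzyme genes")  # ~4,600, on the order of 4,000
```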
It’s an open question whether synthetic biology can overcome this limit to growth in complexity (a 1,000-fold increase in genome size from mustard to lily with no increase in gene number), if indeed the trend is real. As we head off into these complex cellular systems, we must reevaluate, for example, how to integrate multiple new amino acids into an already complex mixture. This integration requires freeing up some of the sixty-four codons (i.e., eliminating their existing uses, which are essential for life) and replacing them with orthogonal machinery for bonding nonstandard amino acids to tRNAs. (“Orthogonal” in this setting means highly reactive in making one kind of bond without getting engaged by any of the other bonds.)
The in vitro minigenome depends on the bonding of each tRNA with a specific amino acid using the simple 47-mer ribozyme depicted in Figure 3.1, but this process requires heavy human involvement in the lab and twenty separate reaction tubes. A normal cell has twenty synthetase enzymes, one for each of the twenty amino acids. Rarely does the wrong amino acid get bonded to a tRNA. The number of nonstandard amino acids (NSAAs) now exceeds seventy-five (largely from Peter Schultz and his students). The main barrier to expanding beyond the twenty standard amino acids is that all sixty-four codons already have an essential function and many have two such functions (e.g., overlaps of codons recognized by tRNAs; see Figure 3.2). For example, the anticodon for one of the tRNAs for valine (V) specifically binds to the mRNA codons GUU, GUA, GUG, and the other valine tRNA binds to GUU and GUC (in order to add a valine to the growing protein chain). So, even if we deleted the second tRNA, we would only free up GUC because GUU is still in use by the first tRNA.
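The codon bookkeeping in the valine example can be made concrete with a tiny sketch; the two tRNA-to-codon assignments below are just the ones described above, and the tRNA names are illustrative labels rather than official gene names:

```python
# Which codons become free if we delete one tRNA?
# Codon sets for the two valine tRNAs described in the text.
trna_codons = {
    "tRNA-Val-1": {"GUU", "GUA", "GUG"},
    "tRNA-Val-2": {"GUU", "GUC"},
}

deleted = "tRNA-Val-2"
# Codons still read by the tRNAs that remain after the deletion.
still_covered = set().union(*(codons for name, codons in trna_codons.items()
                              if name != deleted))
freed = trna_codons[deleted] - still_covered
print(freed)  # {'GUC'} -- GUU stays in use by tRNA-Val-1
```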
The number of new synthetases (for bonding amino acids to the correct tRNAs) characterized so far is nine. These are intended to be specific for making only one kind of NSAA bond to one kind of new tRNA, with close to zero effect on any of the other NSAAs or the standard twenty amino acids, but nothing is perfect. Even the standard twenty synthetases make mistakes at about 1 in 100,000 tries—and the use of all twenty-nine at once in the same cell is not yet fully tested. These NSAAs enable new chemical reactions, each specific to its NSAA, that do not react with any other cellular molecules. Each new reaction chemistry was probably a really big deal evolutionarily (considering the amount of energy that, over the ages, nature devoted to evolving from four bases to twenty amino acids), and it is a big deal today too, as these chemistries will enable new routes to microbial drug manufacturing, new safety features in biotechnology, and so on.
But beyond RNA and proteins there’s yet another innovation possible in the realm of protein or RNA backbones. Backbones are the constant parts of the protein or RNA chemistry, while the side chains (groups of molecules attached to the backbones) provide the variation (the spice of life). Normal (meaning naturally occurring) RNA and DNA backbones consist of alternating sugar and phosphate molecules bonded together in an extended chain, while normal protein backbones are chains of repeating peptide units. Can we utilize totally new backbones? You bet! Another key polymer backbone is that of the polyketides—polymers of monomeric units consisting of two carbons plus one oxygen (and various derivatives of this) found in lipids and antibiotics.
Like RNA and proteins, the polyketide polymer is colinear with the basic information in DNA, meaning that the order of base pairs in the DNA determines the order of monomer type in the other polymers. The chemotherapeutic drug Doxorubicin is biosynthesized as a 10-mer (10 ketones in a linear polymer) with the sixth position modified from a ketone to an alcohol. In going from DNA to protein, the number of letters drops by threefold (e.g., UUU becomes F [phenylalanine]). In principle, we could make a ribosome that translates from mRNA directly to polyketides (instead of directly to polypeptides), but in nature this is accomplished in a more indirect manner by going from polynucleotide (mRNA) to polypeptide (enzyme) to polyketide. Each ketide monomer addition step requires its own enzyme domain (and these domains are each about 500 amino acids long), so the number of letters gets compressed down another 500-fold. This may seem verbose, but it gets the job done. Still, the possibility of constructing a more general and less verbose version on the ribosome is being explored in my lab.
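To see where the verbosity comes from, it helps to count letters at each stage. The sketch below uses the rough figures given above (3 bases per amino acid, about 500 amino acids of enzyme per ketide added):

```python
# Letters of information per polyketide monomer, going DNA -> protein -> ketide.
bases_per_codon = 3           # DNA/mRNA letters per amino acid
aa_per_ketide_domain = 500    # ~500 amino acids of enzyme domain per ketide added (from text)

dna_letters_per_ketide = bases_per_codon * aa_per_ketide_domain
print(f"{dna_letters_per_ketide} DNA bases encode the machinery for adding one ketide unit")
# -> 1500 bases per two-carbon monomer: verbose, but it gets the job done.
```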
Polyketides turn into long carbon chains (like fatty acids) and flat aromatic plates (like the tetracycline family of antibiotics). Long carbon chains and aromatics are also the key components of diamond and graphene, materials with truly amazing properties: graphene, for example, can form single-electron transistors and is among the strongest and stiffest materials known, with the highest thermal conductivity of any material yet discovered (as recognized by the 2010 Nobel Prize in Physics). This class of biopolymer has many uses already when harvested from cells, and has even more profound applications in the future in biological and nonbiological systems as we get better at designing, selecting, and manufacturing this class of molecules.
Figure 3.2 The translational code. Four RNA bases (U, C, A, G) make sixty-four triplets (3-base sequences, read in order from the innermost ring): 4 × 4 × 4 = 64. Each triplet encodes one of the twenty amino acids (they appear in the outermost ring: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y). In addition, three of the triplets serve as stop codons (UAA, UAG, UGA), which tell the ribosome to stop synthesizing whatever protein molecule is currently being made. These stop codons are often called by their historic names ochre, amber, and umber, and are abbreviated here B, O, and U, respectively. (The O and U codons encode the less common amino acids pyrrolysine and selenocysteine, respectively, in some organisms.) In the early 1960s, members of the Steinberg, Epstein, and Benzer labs at Caltech were studying mutant viruses that could infect some strains of E. coli but not others. The researchers offered to name any mutant with these amazing properties after the person who found it. It just so happened that a student named Harris Bernstein (who normally worked on fungi, not viruses) found one. “Bernstein” in German means “amber,” and so they named the UAG stop codon amber. They later dubbed the other two stop codons ochre and umber to maintain the color theme. To
let everyone know how hip you are, you can purchase (at thednastore.com) a bumper sticker that says “I Stop for UAA.”
So what have we learned in this discussion of the mirror world, the genetic code, and the generation of diversity? Basically, the following lessons: (1) It is indeed possible to create mirror amino acids and mirror proteins with predictable (small- and large-scale) properties. (2) With the addition of more work and more parts, we can create a fully mirror biological world. (3) These, along with new amino acids and backbones, can then give us access to an entire new world of exotic biomaterials, pharmaceuticals, chemicals—and who knows what else.
How Fast Can Evolution Go?
As we continue our crusade for increased diversity, we must now pose the essential question: How fast and how diverse can evolution be made to go? We can increase diversity and replexity by adding genes and by adding needed polymer types, including but not limited to mirror versions of standard chemical reaction types and totally new reaction types, but how does evolution scale up to make the truly marvelous functional diversity in the world? The answer lies in the key parameters involved in maximizing the rate of evolutionary change: population and subpopulation sizes, mutation rate, time, selection and replication rates, recombination (the rearrangement of genetic material), multicellularity, and macro-evolution (evolutionary change that occurs at or above the level of the species).
Worldwide we have a steady state of 10^27 organisms and a mutation rate of 10^-7 per base pair per cell division. The Cambrian explosion happened in the brief period from 580 to 510 million years ago, when the rate of evolution is said to have accelerated by a factor of ten, as seen in the number of species coming and going and as defined by fossil morphologies. The rearrangement of genetic material (recombination) occurs at a rate of about once or a few times per chromosome in a variety of organisms. On a lab scale we are typically limited to 10^10 organisms and less than a year for selection, so we are at a 10^17-fold and 7 × 10^8-fold disadvantage, respectively, compared to what occurs in nature. So if we want to evolve in the lab as fast as nature did it during the Cambrian, or even faster, then we have to make up for these disadvantages with higher mutation, selection, and recombination rates.
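These two disadvantage factors are simple ratios, and a minimal Python sketch makes the arithmetic explicit (the worldwide and lab population sizes and the Cambrian duration are the round numbers quoted above; taking the selection time as roughly a tenth of a year is my own assumption for the “less than a year” figure):

```python
# How far behind nature is a single lab experiment?
wild_population = 1e27    # steady-state organisms worldwide (figure from the text)
lab_population = 1e10     # typical lab-scale culture (figure from the text)
cambrian_years = 70e6     # ~580 to ~510 million years ago
lab_years = 0.1           # assumed: about a tenth of a year of selection ("less than a year")

population_gap = wild_population / lab_population   # ~1e17
time_gap = cambrian_years / lab_years                # ~7e8
print(f"population disadvantage ~{population_gap:.0e}, time disadvantage ~{time_gap:.0e}")
```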
The fastest lab replication times for a free-living cell are held by Vibrio and E. coli, which clock in at about eleven minutes and eighteen minutes per replication, respectively, speeds that are probably limited by the rate at which a ribosome can make another ribosome. In principle, even though a bacterial virus or phage might take sixteen minutes to complete an infection cycle, the burst size (the number of virus [phage] babies released when the host cell bursts) is 128, and so the doubling time of the virus is 16/7 = 2.3 minutes (since 2^7 = 128).
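The phage doubling-time claim is just logarithm bookkeeping; here is a minimal sketch of the calculation, using only the burst size and cycle time quoted above:

```python
import math

# Effective doubling time of a phage: one 16-minute infection cycle
# yields 128 offspring, i.e. 7 doublings (2**7 = 128).
cycle_minutes = 16
burst_size = 128

doublings_per_cycle = math.log2(burst_size)          # 7.0
doubling_time = cycle_minutes / doublings_per_cycle  # ~2.3 minutes
print(f"{doubling_time:.1f} minutes per effective doubling")
```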
Other replication speed demons are flies (midges and fruit flies), thanks to their boom-and-bust lifestyle. Flies crawl around starving. Then suddenly a piece of fruit appears, and the flies that manage to convert that piece of fruit into fly eggs fastest win. The best flies lay an egg every forty minutes. But even more impressive, some of their genomes—indeed whole nuclei (which look a bit like cells surrounded by a nuclear membrane)—can divide in six minutes, beating even E. coli, but only by cheating, since those nuclei depend on prefabricated ribosomes lurking in the vast resources of the newly fertilized fly egg.
The limiting factor on mutation rate is the finite size of the population in question and the deadly consequences of mutations hitting positions in the genome that are essential for life. Some viruses are highly mutable; for example, lentiviruses such as HIV have mutation rates as high as 0.1 percent per replication cycle. This is possible because their small genome of 9,000 base pairs will carry, on average, only one (or a few) serious changes, and some copies will carry none. In addition, sharing of genomic material can occur between two adjacent viral genomes that are dysfunctional due to different mutations. In contrast, with 300,000 base pairs that matter per E. coli genome, and probably three times that for humans, the number of errors per base pair per division must be close to one per million (and can get better than one per billion).
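The same trade-off can be stated as an expected-value calculation. The sketch below uses the round numbers quoted above (300,000 essential base pairs, an error rate of about one per million bases per division); it is illustrative only:

```python
# Expected hits to essential bases per cell division, using the E. coli
# figures quoted in the text (a rough expected-value estimate).
essential_bp = 300_000       # bases "that matter" per E. coli genome (from the text)
per_bp_error_rate = 1e-6     # ~one error per million bases per division (from the text)

expected_essential_hits = essential_bp * per_bp_error_rate
print(f"~{expected_essential_hits:.1f} essential-base mutations per division")
# ~0.3: most daughter cells escape a lethal hit, which is why the error rate
# cannot be pushed much higher for a genome of this size.
```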
How far could we push this if we could mutate only the nonessential or, better yet, the most likely to be useful bases, in order to succeed in our quest to turbo-charge the rate of evolution of organisms? If we had forty sites in the genome for which we would like to try out two possible variants in all possible combinations, then that would require a population of at least 2^40 ≈ 10^12 (a trillion) cells just to hold all of the combinations at once. Those cells would all fit in a liter (≈ a quart). This would correspond to a mutation rate of forty genetic changes (rather than one) per genome per cell division. We could get away with fewer by spreading them over time or if the selection is additive. (“Additive” means that each change has some advantage and the order of changes doesn’t matter much.) So this additive scenario provides an alternative to exploring all 2^40 combinations at once. If we don’t need to explore every combination exhaustively but want the highest mutation rate, that rate could be (theoretically) millions per cell per generation, depending on the efficiency of synthesizing and/or editing genomes (described below).
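The population and volume numbers in this paragraph are easy to verify; in the sketch below, the cell density of about 10^9 cells per milliliter is an assumed figure for a dense culture, not something stated in the text:

```python
# 40 sites x 2 variants each: how many cells to hold every combination at once?
sites = 40
combinations = 2 ** sites                  # ~1.1e12, about a trillion
cells_per_ml = 1e9                         # assumed density for a dense E. coli culture
volume_liters = combinations / (cells_per_ml * 1000)
print(f"{combinations:.2e} combinations, ~{volume_liters:.1f} liters at 1e9 cells/mL")
# -> roughly a liter of culture holds every one of the 2**40 variants.
```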
My gut feeling (by no means proven) is that, despite limitations of space and time, we humans can suddenly start to evolve thousands of times faster than during the impressive Cambrian era, and that we can direct this diversity toward our material needs instead of letting it occur randomly.
The Real Point of Reading Genomes: Comparative and Synthetic Genomics
All these changes and innovations lie in the future. In the nearer term we will reap the benefits not only by manipulating genetic codes and mutating genomes but by reading them.
The object of the Human Genome Project (unsung predecessor to the well-known Personal Genome Project) was not, ironically, to read a real genome. Its goal (and final result) was to sequence a composite genome of several individuals, a veritable MixMaster blend of humanity. Furthermore, the genome that was actually sequenced was riddled with hundreds of gaps. While it was a historic milestone of science, it was nevertheless mostly symbolic—like the moon landing—and had relatively little value in practical, personal, or medical terms.
But supposing that we had sequenced a number of whole, intact, and genuine human genomes, the real point in reading them would be to compare them against each other and to mine biological widgets from them— genetic sequences that performed specific, known, and useful functions. Discovering such sequences would extend our ability to change ourselves and the world because, essentially, we could copy the relevant genes and paste them into our own genomes, thereby acquiring those same useful functions or capacities.
A functional stretch of DNA defines a biological object that does something—and not always something good. There are genes that give us diseases and genes that protect us from diseases. Others give us special talents, control our height and weight, and so forth, with new gene discoveries being made all the time. In general, the more highly conserved (unchanged) an RNA or protein sequence is, the more valuable it is in the functions that it encodes, and the farther back in time it goes. The most highly conserved sequences of all are the components of protein synthesis, the ribosomal RNAs and the tRNAs that transport amino acids from the cytoplasm of a cell to the ribosome, which then strings the amino acids together into proteins. Even though these structures are hard to change evolutionarily, they can be changed via genome engineering to make multivirus-resistant organisms (Chapter 5), and, by using mirror-image amino acids, they can be changed to make multi-enzyme-resistant biology (Chapter 1).
Today it is possible to read genetic sequences directly into computers where we can store, copy, and alter them and finally insert them back into living cells. We can experiment with those cells and let them compete among themselves to evolve into useful cellular factories. This is a way of placing biology, and evolution, under human direction and control. The J. Craig Venter Institute spent $40 million constructing the first tiny bacterial genomes in 2010 without spelling out the reasons for doing so. So let’s identify these reasons now, beginning with the reasons for making smaller viral genomes.
The first synthetic genome was made by Blight, Rice, and colleagues in 2000. They did this with little fanfare, basically burying the achievement in footnote 9 of their paper in Science (“cDNAs spanning 600 to 750 bases in length were assembled in a stepwise PCR assay with 10 to 12 gel-purified oligonucleotides [60 to 80 nucleotides (nt)] with unique complementary overlaps of 16 nt”).
The authors had in fact synthesized the hepatitis C virus (HCV). This virus, which infects 170 million people, is the leading cause of liver transplantation. Its genome is about 9,600 bases long. The synthesis allowed researchers to make rapid changes and discover which of them improved or impaired the ability of the various strains to grow in vitro (outside of the human body), which was a big deal at the time.
In 2002 Cello, Paul, and Wimmer synthesized the second genome, that of poliovirus. This feat received more press coverage even though the genome was smaller (7,500 bases) and the amount of damage done to people per year was considerably less than that done by HCV (polio is nearly extinct). The heightened awareness was due in part to the authors’ decision to put the achievement right in the title: “Generation of Infectious Virus in the Absence of Natural Template.” One rationale for the synthesis was to develop safer attenuated (weakened) vaccine strains.
In 2003 Hamilton Smith and coworkers synthesized the third genome, that of the phiX174 virus, in order to improve the speed of genome assembly from oligonucleotides. This exercise received even more attention, despite the fact that the genome was still smaller (5,386 bases) and didn’t impact human health at all.