Regenesis

  In 2008 the causative agent of SARS (severe acute respiratory syndrome), a coronavirus, was made from scratch in order to get access to the virus when the original researchers refused to share samples of it, perhaps under the mistaken impression that synthesis of the 30,000-base pair genome would provide an unacceptable barrier to their competitors. The researchers created the synthetic SARS virus anyway, for the purpose of investigating how the virus evolved, to establish where it came from, and to develop vaccines and other treatments for the disease it caused.

  It turned out, however, that the synthetic versions of the SARS virus didn’t work due to published sequencing errors. When the errors were discovered and corrected, the synthetic viral genome did work, and was infective both in cultured cells and in mice.

  So, in summary, the reasons for synthesizing virus genomes are to better understand where they came from and how they evolved, and to assist in the development of drugs, vaccines, and associated therapeutics. Why did the JCVI spend $40 million making a copy of a tiny bacterial genome? Some of this is attributable to the cost of research, which is full of dead ends and expensive discoveries. The other factor was that the core technologies used were first-generation technologies for reading and writing DNA, provided in the form of 1,000 base pair chunks by commercial vendors at about $0.50 per base pair plus similar costs for (semiautomated) assembly. So to redo a 1 million base pair genome would cost $1 million, while synthesizing a useful industrial microbe like E. coli would cost $12 million. But using second-generation technologies for genome engineering (see below), not just one but a billion such genomes can cost less than $9,000. This is done by making many combinations of DNA snippets harvested from inexpensive yet complex DNA chips.
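
  The first-generation arithmetic can be checked in a few lines. The sketch below assumes the round figures quoted above ($0.50 per base pair for synthesis and a similar amount for assembly); the function name and its defaults are illustrative, not part of any real pricing tool.

```python
def first_gen_cost_usd(genome_bp, synthesis_per_bp=0.50, assembly_per_bp=0.50):
    """Rough cost of synthesizing and assembling a genome, in US dollars.

    The per-base-pair prices are the approximate figures quoted in the
    text, used here as assumptions.
    """
    return genome_bp * (synthesis_per_bp + assembly_per_bp)


# A 1 million base pair genome at about $1 per assembled base pair.
print(first_gen_cost_usd(1_000_000))

# Second generation: a billion genomes for under $9,000 (the upper
# bound given in the text), expressed as a cost per genome.
second_gen_cost_per_genome = 9_000 / 1_000_000_000
print(second_gen_cost_per_genome)
```

  On these assumptions the per-genome cost drops from about a million dollars to a small fraction of a cent.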

  Most fundamentally, the object of synthesizing genomes is to create new organisms that we can experiment with and optimize for various narrow and targeted purposes, from the creation of new drugs and vaccines to biofuels, chemicals, and new materials.

  In the quest to engineer genomes into existence, my lab mates and I have developed a technique called multiplex automated genome engineering (MAGE). The kernel of the technique is the idea of multiplexing, a term derived from communications theory and practice. It refers to the simultaneous transmission of several messages over a single communication channel, as for example through an optical fiber. In the context of molecular genetics, multiplexing refers to the process of inserting several small pieces of synthetic DNA into a genome at multiple sites, simultaneously. Doing this would make it possible to introduce as many as 10 million genetic modifications into a genome within a reasonable time period.

  Here’s how it works. To modify a genome, MAGE uses the smallest bits of DNA that are unique in the genome to be changed. The smallest unique pieces are theoretically around 12-mers for 5 million base pair bacterial genomes, since the number of ways that you can arrange four bases (A, C, G, T) into words twelve base pairs long is 4^12, or about 16.8 million. In practice, the optimum seems to be 90-mers. We get these 90-mers into cells by means of a short, 2,500-volt electrical pulse. We chemically armor the ends of the 90-mer against enzyme damage in the cell, coat them with a protein that enhances pairing, and then let these pieces sneak into the gaps wherever the genome is replicating. Among other tricks, we use multiple 90-mers and multiple cycles. This collection of lab shortcuts and other techniques raises the chances of getting the desired genomic mutations from practically infinitesimal (one in a million) to very common (several per hour per cell). So, instead of assembling small DNAs into a huge piece of DNA, at great cost and with multiple errors, we use many small bits in parallel, a process that allows us to test several combinations for the purpose of seeing which work well and which are deleterious to the cell.
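
  The 12-mer estimate follows from counting: a word of length k can be spelled in 4^k ways, so k must be large enough that the number of possible words exceeds the genome length. Here is a minimal sketch of that bound. It assumes an idealized random sequence (real genomes contain repeats, so this is only a theoretical lower limit), and the function name is made up for illustration.

```python
def min_unique_kmer_length(genome_bp, alphabet_size=4):
    """Smallest k for which the number of possible k-mers exceeds genome_bp.

    Assumes an idealized random genome; repeats in real genomes mean the
    practical length is longer.
    """
    k = 1
    while alphabet_size ** k <= genome_bp:
        k += 1
    return k


print(4 ** 11)  # 4194304, fewer words than a 5 million bp genome has positions
print(4 ** 12)  # 16777216, comfortably more
print(min_unique_kmer_length(5_000_000))  # 12
```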

  The other technique that greatly enables MAGE is selection, which is of course the key mechanism propelling Darwinian evolution. Selection in this context means rapidly separating out cells (or molecules) that have the more extreme values of the desired properties from all the rest. “Give me a fulcrum and I will move the world,” from Archimedes, gets updated in the synthetic genomics lab to, “Give me a selection, and I will change the world.”

  We can easily select for leaving behind babies, since that is the best-tested selection paradigm of all time (tested by 10^27 organisms for 3 billion years each). The players who leave behind the most surviving grandchildren tend to dominate subsequent history. This is of course a selection for all of the processes required for replication, not just the properties that we want to enhance. But frequently human engineers want to select for some property other than replication: for example, for overproducing milk in cows or producing insulin in bacteria, or artemisinin (an anti-malarial precursor) in yeast.

  Given that we want to engineer a whole genome, which is better? (1) Synthesize all of it, all at once, with possibly hundreds of design flaws to debug simultaneously? Or (2) make targeted changes combinatorially, dividing and conquering each bug swiftly and in parallel (and harnessing lab evolution)? The Church lab voted strongly for (2).

  Summing up, genome engineering already embraces much more than just genome copying, and much more than even the mature engineering disciplines (i.e., design and fabrication). Genome engineering has evolution on its side—not slow evolution but intelligently designed, fast evolution to create a range of special-purpose organisms that can produce desired substances.

  Symbiosis, Morphology, and the Cambrian Diversity Explosion

  So now let’s suppose that we have created entire new genomes, and perhaps even some new single-cell organisms. The next step, obviously, is multicellularity.

  One of the first signs of multicellularity, even before cells clump and differentiate, is sex. Exchanging genetic material allows for highly parallel discovery of new genomic sections. For example, if it takes a day for a bacterial cell to replicate in the wild, it will take thirty days for a mutant cell to replace a population of about a billion cells, and it will take 100,000 years to change 20 percent of a bacterial genome (a working definition of a new species). This assumes that selection pressure is fairly steady in one direction and that there is a constant stream of beneficial mutations. In practice a significant fraction of the evolutionary steps toward that 20 percent are neutral, neither enhancing nor harming the organism, and the number of setbacks is such that a million years is not unusual for the full appearance of new species. If new favorable mutations can combine by means of sex, then the time spent sweeping away the competing population is saved (keeping in mind that sex can separate favorable alternative forms of a gene as well as bring them together).
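
  The thirty-day figure is just the number of doublings needed for one cell to become a billion. The sketch below works that out for an idealized lineage that doubles once a day with no losses. Its second half is an assumption of the sketch, not something stated in the text: it supposes a 5 million base pair genome changed one base per sweep, with sweeps happening strictly one after another, to show how a figure on the order of 100,000 years could arise.

```python
import math


def sweep_time_days(population, doubling_days=1.0):
    """Days for one cell's descendants to reach `population`, doubling each generation."""
    doublings = math.ceil(math.log2(population))
    return doublings * doubling_days


days_per_sweep = sweep_time_days(1_000_000_000)
print(days_per_sweep)  # 30.0

# ASSUMPTION (hypothetical, not from the text): a 5 million bp genome,
# one base changed per sweep, sweeps occurring one after another.
genome_bp = 5_000_000
changed_bp = genome_bp * 20 // 100          # 20 percent of the genome
years = changed_bp * days_per_sweep / 365   # serial sweeps, in years
print(round(years))
```

  Under those assumptions the serial route takes roughly 82,000 years, the same order of magnitude as the estimate above, and it is this waiting time that sex can cut by letting favorable mutations combine in parallel.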

  To drive all this toward its limits, we are using a second lab technique, conjugative assembly genome engineering (CAGE), which selectively mates two bacterial strains in order to combine parts of their genomes. If MAGE is a means of breaking up a long piece of DNA and making small changes in each part, CAGE is a means of putting the changed parts together again. This allows us to divide and conquer the E. coli genome, for example, into thirty-two pieces. We first apply the MAGE technique, focusing on only 140 kbp in each of thirty-two separate strains in parallel. We then mate strains 1 and 2, bringing together adjacent sectors in the genomes of the strains. Similar matings bring together sixteen even-numbered sectors with the adjacent sixteen odd-numbered sectors, thereby reducing the number of strains by a factor of two. Then we repeat the process to get down to eight strains, then four, then two, and then finally one bacterial strain.
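
  The bookkeeping of this pairwise merging can be sketched as a toy model. Each strain is represented only by the list of genome sectors it carries in modified form, and a "mating" simply concatenates the lists of two adjacent strains. The function name and data layout are illustrative assumptions; nothing here models the biology of conjugation.

```python
def cage_rounds(num_strains):
    """Merge adjacent strains pairwise until a single strain remains.

    Returns the number of strains after each round and the sector list
    of the final strain. Strains are modeled as lists of sector numbers.
    """
    strains = [[sector] for sector in range(1, num_strains + 1)]
    history = [len(strains)]
    while len(strains) > 1:
        merged = []
        for i in range(0, len(strains), 2):
            pair = strains[i:i + 2]  # two adjacent strains (or one leftover)
            merged.append([s for strain in pair for s in strain])
        strains = merged
        history.append(len(strains))
    return history, strains[0]


history, final_strain = cage_rounds(32)
print(history)            # [32, 16, 8, 4, 2, 1]
print(len(final_strain))  # 32
```

  Five rounds of mating are enough to go from thirty-two strains to one, because each round halves the count.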

  Figure 3.3 Divide and conquer with MAGE and CAGE. For example, to change all UAG stop codon targets (322 total in the E. coli genome) into the functionally identical codon UAA, we divide this task into thirty-two sectors of the genome pie (far left) and design thirty-two pools of ten oligos each (second column) that can change ten codons in each sector (third column). By selective mating we merge adjacent pairs of sectors (e.g., 1 and 2) resulting in sixteen bacterial strains, each with a double-size sector (fourth column) in the genome of that strain. This process of merging adjacent sectors continues to eight, then four, then two (columns 5 to 7); the final mating results in one strain that has all of the UAG codons changed to UAA, as was planned.

  A second advantage of multicellularity is cross-feeding, where one cell type can focus on one type of food chemistry and can barter its specialty food for a different food from an adjacent cell. You can divide your genome over two or more cell types and hence save space, modularize and parallelize selection, and quickly benefit from sharing the multitude of tasks.

  A third advantage of multicellularity is shape selection. Single cells have some diversity in morphology, but their limit becomes rapidly evident. For example, making a cell that is as hard as a tooth doesn’t automatically confer chewing ability, since that requires structures larger and more complex than a cell. How to get from the straight (one-dimensional) DNA to complex three-dimensional dynamic shapes is still mysterious. But, as always, synthesis offers us both a path to discovery and a rigorous assessment of progress toward understanding the processes involved.

  Figure 3.4 Basic DNA origami. One long (single-stranded) DNA circle is represented as a twelve-hour clock face on the left. The addition of two (single-stranded) staple oligos connects (via base-pairing each segment of the circle) 11 to 1 and 9 to 3. This results in a shape resembling a Dali-esque odalisque with a double-stranded collar and belt.

  Between 1977 and 1981, based on the first folded RNA structure (tRNA, Figure 3.1), Ned Seeman and I invented ways to design and establish morphology from the basic base pairing rules of RNA and DNA (G with C, and A with U/T, Figure 1.4). We can now design and build atomically precise shapes swiftly and with generally high yield. We call these shapes DNA nanostructures (from Ned Seeman and William Shih), or DNA origami (from Paul Rothemund).

  As a further example, 170 DNA staples (40-mers) bind to precise locations around a 7,000 base long single-stranded circle with half of each 40-mer staple binding one place and the other half binding somewhere else (designed by caDNAno, a computer-aided design process).

  Shawn Douglas, Ido Bachelet, and I at the Wyss Institute for Biologically Inspired Engineering at Harvard have used this technique to construct nano robots—essentially cages made of DNA that hold cancer-killing antibodies. These nano cages open up and release their cargo of antibodies only when they touch cancer cells. The rules for defining the shape of protein structures are considerably harder than those for designing DNA structures (the awesome base pairs of Figure 1.4). The rules for designing cell and multicellular shapes are harder yet (unless, of course, we’ve seen them before).

  Figure 3.5 3-D DNA origami. One long DNA circle (represented as mostly horizontal lines plus far left and far right loops) is guided to fold into a specific 3-D log pile of eighteen double helices. The short, linear staples (mostly vertical lines plus internal loops) connect specific distant points on the circle.

  Despite the three advantages listed above, multicellularity tends to come with a hard consequence: you give up your immortality. Most single-cell species seem to live indefinitely, their members still evolving, traceable as a continuous genetic lineage (especially in the highly conserved ribosomal RNA genes) back to the dawn of time. Selection in species with diverse cell types comes along with planned obsolescence and a circle-of-life progression, from egg to chicken to egg. As single-cell organisms replicate by simple cell division, lessons learned (mutations and environmental responses) just prior to replication are often retained in the daughter cells. But with multicellular organisms, the larger they are, the more phases they have to pass through in going from egg to adult. In species that learn, a lifetime of experience is lost at death. This loss can be partially compensated for by inheritance of cultural (non-DNA) artifacts, such as in the Cyanistes birds in the 1960s that taught each other how to open foil-topped British milk bottles.

  In any case, the advantages and disadvantages of multicellularity may no longer apply to humans because of the huge impact of cultural inheritance and the prescient design of our technological culture. The advantages of multicellularity (reassortment of genes, division of labor and shapes) are now affected by markets and software. The downside of knowledge lost at death is offset by vast libraries and education, but a full education can take sixty years. So we’d like to either speed up education or slow down decay once a person is educated. The former might be made possible by making more rapid and accurate bridges between human knowledge and computer knowledge, either by the optimal use of existing senses or by means of multi-electrode neuronal input and output. Retarding the processes of decay, by contrast, might be made possible by discovering why some animals live long and vigorous lives while others die quickly (ranging from 3 days to 400 years; see Chapter 9). To study this phenomenon, we’d like to have two species that are closely related but have radically different life spans. As for example . . .

  The Naked Mole Rat and Longevity

  As a physical specimen, the naked mole rat is the stuff of nightmares. With its saggy pink skin, piglike nose, spindly legs, tiny, almost vestigial eyes, and mere holes for ears, it looks like the ultimate misbegotten animal.

  This highly unusual specimen isn’t even a rat, strictly speaking, nor is it a mole. It belongs to a genus (Heterocephalus) that has no other known members. It lives up to its name in being “naked,” for it is almost hairless. The mole rat spends virtually its entire life underground in total darkness, in a complex maze of subterranean channels and tunnels whose cumulative length can add up to two miles or more. This is remarkable because these are small animals, generally about the length of a human finger, and they weigh little more than a mouse. Native to the hot grassland regions of Kenya, Ethiopia, and Somalia, the mole rat does not drink water (or anything else!). It can run backward and forward equally fast. Because its skin lacks a key neurotransmitter that in mammals is responsible for transmitting pain signals, the naked mole rat can feel no skin pain.

  As if it’s not already distinctive enough, the mole rat has a social structure that is almost unique among mammals. The species is “eusocial,” meaning that its colonies are organized like those of ants or bees, with the members existing in strict hierarchical castes. At the top is a queen who breeds with only a few select males. Next down are the soldiers, who defend the colony against predatory snakes or foreign invader rats. At the bottom of the social scale are the workers, who forage for food, mainly roots and tubers.

  But however odd their appearance and behavior (one observer has called them “fauna incognita”), naked mole rats possess two additional characteristics that make them of special interest to biologists. The first is that they are the world’s longest-lived rodents. Whereas the house mouse, for example, has an average life span of two or three years, the naked mole rat can live for twenty-five years or more (the current record is 28.3 years). The second is that they are extremely resistant to cancer. Indeed, cancer has never been detected among these animals.

  These two facts illustrate the importance of the new science of comparative genomics. Genomics in general relies on our ability to read, or sequence, the DNA of a given organism. One goal of sequencing the human genome is to identify genes that play a role in disease (see Chapter 9). But reading genomes has another and equally important objective, which is to find useful biological widgets in other organisms—special-purpose apps, as it were. Comparative genomics, the study of how similar genes function across different species, will allow us to locate genetic structures that confer distinct advantages on certain classes of organisms. The long life span of the naked mole rat is one example. Its longevity is a trait that must be rooted somewhere in its genetic makeup. If we can find the gene—or more likely the combination of genes—that gives such great longevity to the animal, this will be a genomic component that we can exploit to our benefit, and perhaps even import to the human genome.

  In humans, old age is a factor in ailments such as heart disease, type 2 diabetes, cancer, and neurodegenerative diseases including Parkinson’s and Alzheimer’s. But nobody knows why some organisms have fleeting life spans while others live for a century or more. There is at least a forty-fold variation in maximum longevity among mammals. The white-faced capuchin monkey has a life span of over fifty years. Humans can live for over one hundred years. And then there’s the case of the bowhead whale: with an estimated life span of over two hundred years, the bowhead whale is the only mammal known to outlive human beings, and is possibly the longest-lived mammal on earth.

  Still, it’s a mystery why different species that share a similar body plan, biochemistry, and physiology nevertheless age at such different rates. Comparative genomics may help us solve the riddle. Sequencing the genomes of these long-lived mammalian species may reveal a set of homologous (similar or shared) genes responsible for their extended life spans. Discovering these genetic structures would provide us with insights into the mechanisms of aging and of age-related human diseases, and this in turn will lead to better diagnoses and treatments.

  In 2007 a group of researchers, including myself and my colleague Joao Pedro Magalhaes at the University of Liverpool, supported by seventy-nine scientists from other institutions, formally proposed sequencing the genomes of the naked mole rat, the capuchin monkey, and the bowhead whale to the National Human Genome Research Institute (NHGRI). This initial proposal was rejected, essentially on the grounds that it is a long way from knowing the respective sequences to understanding exactly how the genetic structures in question function to lengthen life span. This may be true, but knowing the sequences would nevertheless be a genuine first step toward making progress in solving the problem.

  A year later, the same group, supported by the same seventy-nine scientists, submitted a second proposal, this time to sequence the genome of the naked mole rat alone. Not only does the mole rat have exceptional longevity, but it also seems to be immune to cancer. But this proposal too was rejected, on the grounds that identifying the precise complex of genetic structures that underpin the longevity of the animal would be difficult.