Hacking the Code of Life

  Scientists in pretty much all biological disciplines rapidly took up and improved this amazing new box of tools. The basic technology was expanded, and made faster, easier and cheaper to use. For nearly fifty years these techniques provided the methods required to create astonishing new breakthroughs, from gene therapy for rare human diseases to nutritionally-enhanced rice that could save hundreds of thousands of lives a year. But although scientists expanded the range of questions they could address using these tools, the technology remained fundamentally unaltered. It was very recognisably the same as that developed by Cohen and Boyer back in the days of bell bottom trousers, platform shoes and the original series of Hawaii Five-0.

  But in 2012 all this changed, when a new technology emerged which has altered once again how we can manipulate the DNA of living organisms. This new technology is cheap, incredibly easy to use, fast, flexible and may prove to be the silicon chip to the Cohen and Boyer valve. But to understand why, we need to look in more detail at DNA.

  DNA Class 101

  DNA is the genetic material of almost all organisms. The acronym stands for deoxyribonucleic acid, which is a bit of a mouthful. A helpful way to think of DNA is as a written text like a script or a book. Any written text is made up of letters from an alphabet. In the case of DNA, the alphabet contains only four ‘letters’ called A, C, G and T. Technically, these are referred to as bases, but ‘letters’ probably serves our purpose better here.

  It might seem odd that the basic alphabet of complex life is so simple. But you can do a lot with four letters if you have enough of them. When your parents had sex and created you, your mother and father each contributed 3,000,000,000 of these letters, arranged in very specific sequences. At most of the 3 billion positions the letter is the same in both your mother and father. But about once in every 300 positions the letter will be different in your mother and father. In your mother it might be a T, in your father a G, for example. This means potentially there are 10 million sites where your DNA sequence will be different from someone else’s.3
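  The 10 million figure follows directly from the two numbers quoted above. As a minimal sketch of that arithmetic (the variable names are illustrative, not from the source):

```python
# Figures taken from the text; the division is the only step added here.
letters_per_parent = 3_000_000_000   # letters contributed by each parent
positions_per_difference = 300       # roughly one differing letter every 300 positions

variable_sites = letters_per_parent // positions_per_difference
print(f"{variable_sites:,}")  # 10,000,000
```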

  This is one reason why humans vary so much. We have different DNA scripts from each other, because we will have inherited different combinations of those 10 million potential variations. It’s also why closely related members of a family are more similar to each other than to unrelated individuals – we are more likely to have inherited similar genetic variations because we have shared close ancestors. You look like your own mother, not like your partner’s mum.

  Similarly, all humans are far more similar to each other in our genetic scripts than we are to other species. The sequence of letters in human DNA differs from that in other organisms, and the differences become more pronounced the further back we have to go in evolutionary history to find a common ancestor. If we compare the sequence of DNA letters between humans and chimpanzees, they are about 98.8% similar.4 But if we compare humans and bananas the figure drops to about 50%. This doesn’t mean we are half-banana. There are complexities in the way these figures are calculated that make precise numbers a bit misleading, but you get the point.
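  To give a feel for what a ‘percentage similar’ figure means at its simplest, here is a rough sketch that counts matching letters between two sequences of the same length. It assumes the sequences are already lined up position by position, which is exactly the kind of complexity that real genome comparisons have to deal with and that this toy example ignores. The function name and the two short sequences are invented for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences carry the same letter."""
    if len(a) != len(b):
        raise ValueError("sequences must already be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


# Invented 10-letter sequences that differ at a single position.
seq_one = "ACGTACGTAC"
seq_two = "ACGTTCGTAC"
print(percent_identity(seq_one, seq_two))  # 90.0
```

  Real comparisons also have to decide how to treat inserted, deleted and rearranged stretches of DNA, which is one reason published figures vary.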

  Boyer and Cohen’s breakthrough gave scientists the tools to interrogate and to use the genetic material of living organisms. Instead of inferring why a particular region of DNA was important, by examining what happened when you crossed individuals with and without a trait you were interested in, you could use the DNA itself to address the question.

  You could directly test a hypothesis at the level of the genetic material itself. If, for example, you thought a particular region of DNA in a strain of bacteria made that bug resistant to an antibiotic, you could test the idea quickly using Boyer and Cohen’s method. You just take the relevant region of DNA out of an antibiotic-resistant bacterium, and put it into one that is normally killed by the same drug. If the genetically engineered bacterium is now resistant to the antibiotic, you can feel a lot more confident that you were right about the role of that region of DNA.

  If we think of DNA as an alphabet, then the complete sequence of those letters in an organism can be thought of as its book. This complete sequence is generally known as the genome. The genes – the DNA sequences that code for Mendel’s invisible units of heredity – can be thought of as paragraphs within that book.

  Often these genes code for proteins. Proteins are the molecules that carry out many of the actions in the cells and bodies of living organisms. The haemoglobin that carries oxygen in red blood cells; the insulin that controls the uptake of glucose from the bloodstream after a meal; the rhodopsin pigment in our eyes that responds to light signals are all examples of proteins.

  Unless a writer is particularly avant-garde, they will usually use paragraphs when they write a book. Sometimes they may write a paragraph and then decide it’s in the wrong place and they want to insert it somewhere else in their book. If we think of an early writer – Mary Shelley for instance – this would have been a very cumbersome process. But for a modern writer like Stephen King, it’s not really a problem. He can just cut and paste. That’s what Boyer and Cohen’s innovation essentially enabled researchers to do – to cut and paste genomes.

  Often a writer will cut and paste within a single document such as a book. But there’s also nothing to stop them from pasting the paragraph into a completely different book. That was also possible with the first generation of genetic engineering, as scientists were finally able to move genetic ‘paragraphs’ from one life-form to another. Pasting a particular DNA gene/paragraph from a jellyfish into the genome of a mouse created mice who glowed bright green under ultraviolet light. Thousands of other applications developed, with major impacts for basic research and with practical applications such as the development of improved crops or the creation of new treatments for human diseases.

  But even though researchers made multiple improvements to the basic technology, there were still fundamental problems holding back progress. Genetic engineering in bacteria is easy. Their genomes are small, and it’s very simple to persuade bacteria to absorb new genes. You can generate genetically engineered bacteria in just a few days. It’s much more complicated to do similar experiments in mammals. It’s harder to persuade mammalian cells to incorporate the new genes, for a start. And if you want living organisms such as live mice, rather than just mouse cells in the lab, you need to inject DNA into fertilised mouse eggs, implant these eggs into female mice, and hope the tiny embryos develop and grow OK. If they don’t, you may have lost months of time, during which your competitors are getting ahead of you, and your grant funding is running out with nothing to show for it.

  When a writer performs a cut-and-paste on a manuscript, they control where they put the paragraph they have moved. This is a good thing, as randomly placed paragraphs rarely work well. But with the original technology for moving genes around it was very difficult to control where they were inserted. This created profound problems because in living organisms the expression of a gene is highly influenced by where it is in the genome. Put it into the wrong location and it can be like sticking a ballerina into concrete, or a seal on a trampoline. The outcomes might be interestingly weird but they aren’t likely to tell you much about normal activity of the gene.

  In 2001 scientists finally had access to the entire genome sequence of humans, our complete 3 billion letters of genetic information. It’s not really a book, more like a multi-volume opus that fills a two-metre-high bookshelf, and it’s been extraordinarily useful. We humans aren’t the only species for which this book of life has been recorded. Researchers have sequenced the genomes of over 180 other species and the number is increasing all the time.5

  Scientific curiosity has increased over this period as well. Whenever new technologies are introduced, they increase the range of questions that researchers can tackle experimentally. But the inquisitive nature of scientists means that we always want to probe with greater sophistication and more complexity. The limitations of the Boyer and Cohen approach, even with all the improvements made to it over a period of more than forty years, were a source of ever-growing frustration.

  What if instead of wanting to know about the actions of an entire gene (a paragraph), you actually want to know the precise role of just one letter? After all, that could be the difference between your business card describing you as an ‘interior designer’ or an ‘inferior designer’. Of course, a business card is very small, with only a tiny amount of text. Can it really be true that one letter in our 3-billion-letter books of human life could be equally important? Well, yes. Boys with just a single letter change in a specific gene6 develop a devastating condition characterised by gout, cerebral palsy, mental retardation and self-mutilation of lips and fingers.7 That’s just one example. There are hundreds – possibly thousands – of other human disorders caused by such single letter errors.

  It was extremely difficult, costly and time-consuming to use the original techniques to make changes to just one letter in a complex genome. It was even more difficult to change a few letters at different positions in a genetic book simultaneously. But being able to do this is vital if we want to explore how some of the 10 million variable letters in the human genome work together to affect our lives.

  That’s why the entirely new technology that developed from 2012 onwards has been such a breakthrough. Almost in one bound, scientists were free of the constraints imposed by the technical limitations of the existing methodologies. In this exciting new landscape, any lab could tackle fascinating new questions, cheaply, quickly and easily, with a high likelihood of technical success, and a degree of precision that had previously been the stuff of dreams. Welcome to the wonderful and sometimes worrying world of gene editing.

  Notes

  1. http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1000653

  2. http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1000653

  3. https://ghr.nlm.nih.gov/primer/genomicresearch/snp

  4. https://www.amnh.org/exhibitions/permanent-exhibitions/human-origins-and-cultural-halls/anne-and-bernard-spitzer-hall-of-human-origins/understanding-our-past/dna-comparing-humans-and-chimps/

  5. http://www.genomenewsnetwork.org/resources/sequenced_genomes/genome_guide_p1.shtml

  6. Davidson, B.L., Tarle, S.A., Palella, T.D., Kelley, W.N. ‘Molecular basis of hypoxanthine-guanine phosphoribosyltransferase deficiency in 10 subjects determined by direct sequencing of amplified transcripts’. J. Clin. Invest. (1989); 84: 342–346.

  7. https://www.omim.org/entry/300322?search=lesch-nyhan%20mutation&highlight=leschnyhan%20lesch%20nyhan%20mutation#40

  * Another researcher, Paul Berg, was awarded the Nobel Prize in 1980 for some foundational work on recombining DNA.

  2

  CREATING THE TOOLBOX TO HACK THE CODE OF LIFE

  For the first time in Earth’s history, one species has the capability to alter the genomes of other living organisms, including itself. Because of gene editing this can be performed in any moderately equipped lab and by people with relatively basic scientific skills. Changing the raw material of natural selection is becoming commodified. New tools are being developed every week to make the process faster, cheaper, even more precise, ever more flexible and applicable. But all of these are enhancements or variations of the original technology. So it’s worth asking – who invented this wonderful new approach and how did they do it?

  The science of progress is the art of the possible

  Sometimes science moves forwards in a very directed fashion. There is a need, and scientists step up to find a way of meeting that need. Think of NASA creating the technology which sent astronauts to the Moon, and more importantly got them back to Earth safely again, in response to President Kennedy’s ambitions for the United States’ space programme. Think of Gertrude Elion and her colleagues, creating azathioprine, the first drug that really prevented rejection in organ transplants and turned a medical dream into a clinical reality.

  But this isn’t really the norm in science; it’s not how the discipline usually progresses. For a start, this approach only works quite late in a technology or innovation cycle. This isn’t to belittle the work of those cited in the previous paragraph, who achieved fabulous outcomes. But the underlying disciplines were far enough advanced that the brutally ambitious targets that had been set were ultimately achievable. Political will matters, but it can’t overcome technical impossibilities. When Queen Victoria let it be known she’d find it convenient to have a railway station near to her country estate in Norfolk, a branch line was built and a station constructed. But if the monarch had announced she wanted her bravest courtiers to fly to the Moon, that target would inevitably have been missed. There was simply no way to approach this target at that point in technological history.

  President Nixon announced a ‘war on cancer’ in 1971, but cancer still kills over 8 million people a year globally.1 In 1971 we didn’t understand enough about all the different forms of cancer to make the political ambition a reality.

  In fact, most great scientific and technological developments have their origins in curiosity-driven research. In 1978 Louise Brown, the world’s first ‘test-tube’/in vitro fertilisation (IVF) baby, was born. By 2012 it was estimated that 5 million babies owed their existence to this clinical intervention in all its various forms.2 But this has only happened because of the decades of developmental biology research that were carried out from the early part of the 20th century onwards. The motivation for the scientists who conducted all this basic research wasn’t a drive to address human infertility so that childless women could become mothers. It was simple curiosity about fundamental biological processes. It was only once the field of developmental biology had become very advanced that IVF became a real possibility.

  The same is true for gene editing. Because gene editing is such a game-changing technology that fills a large number of technological needs, it’s tempting to assume that every step of its creation has been driven by a desire to devise a better way of hacking the genome. But it wasn’t. Instead, the foundation of the field came about because a scientist in Spain started to find weird DNA sequences in some bacteria he was studying.

  When bacteria go to war

  Isaac Asimov, the hugely influential science fiction writer and scientist, once said: ‘The most exciting phrase to hear in science, the one that heralds new discoveries, is not “Eureka!” but “That’s funny …”’ The field of gene editing owes its start in life to a 28-year-old PhD student called Francisco Mojica, who was carrying out his doctoral studies at the University of Alicante in Spain. Mojica was sequencing the genome of a particular bacterium, and when he analysed his results he found some sequences that looked unusual to him. He didn’t have a Eureka moment, but more importantly, he didn’t dismiss them as just something trivial and boring. Instead, he thought ‘That’s funny’.

  Mojica was awarded his PhD and eventually started his own group. Despite receiving almost no funding and no interest from his peers in the scientific community, he couldn’t bring himself to give up on the funny little sequences he had found. He sequenced more types of bacteria and by the turn of the millennium, seven years after the initial finding, Mojica had found equivalents of these strange sequences in twenty different species.3

  What was it that made these sequences so unusual, so that they captured Mojica’s interest? The same sequence of about 30 DNA letters was repeated multiple times, but each of these 30-letter blocks was separated by about 36 letters. The 36 letters were different from each other and he called these ‘spacers’. This is shown schematically in Figure 1.
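  The repeat-and-spacer layout described here can be mimicked with a toy example. The sketch below builds a miniature array from an invented 6-letter repeat and three invented 8-letter spacers (rather than the roughly 30 and 36 letters in the real sequences), then recovers the spacers by splitting on the repeat. It assumes the repeat is already known and copied perfectly each time; it is not a reconstruction of how Mojica analysed his data.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Return the stretches of sequence that sit between copies of `repeat`."""
    pieces = array.split(repeat)
    # The first and last pieces are whatever flanks the array, not spacers.
    return [piece for piece in pieces[1:-1] if piece]


# Invented sequences, much shorter than the real ones, for illustration only.
REPEAT = "GTTCCA"
SPACERS = ["ACGTACGA", "TTGACCAT", "CAGGTTAC"]

# Layout: repeat, spacer, repeat, spacer, repeat, spacer, repeat.
array = REPEAT + "".join(spacer + REPEAT for spacer in SPACERS)
found = extract_spacers(array, REPEAT)
print(found)  # ['ACGTACGA', 'TTGACCAT', 'CAGGTTAC']
```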

  With no funding, Mojica was severely limited in the experiments he could run to investigate the function of these strange regions. The 30-letter repeats were like nothing else that had been reported so it was hard to know how to begin looking for their function. So after a while Mojica turned his attention to the bits between the repeats, the spacers of 36 DNA letters that varied from each other. Over and over again he entered the sequence of these individual spacers into computer databases where scientists store the data they generate from sequencing the genes and genomes of a wide variety of organisms. At first he couldn’t find any sequences that matched. But every day scientists throughout the world uploaded more and more sequences into the databases, and one day in 2003 Mojica got a hit.

  Figure 1. The structure of the strange repeat regions that Francisco Mojica identified in bacteria

  The solid triangles are the identical 30-letter sequences. The other blocks are the different sequences of 36 letters that Mojica realised provided a record of infections by viruses, and the defence system to see off future attacks by the same viruses.

  A spacer from a strain of E. coli bacteria that he had sequenced quite recently matched a new sequence in the database, from a virus that infects bacteria. And not just any old bacteria, but E. coli. Even more significantly, the strain of E. coli that contained this viral spacer sequence was one that was resistant to the virus.

  Invigorated by his discovery, Mojica painstakingly ran every spacer sequence he had ever generated – all 4,500 of them – through all the databases again. This time 88 of them found a match in the databases, and in about 65% of these the match was to a sequence from a virus that infected the bacterium that the spacer was in.4
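  The idea of running spacers through a database can be sketched in a few lines. This is a deliberately simplified illustration: it looks for exact matches only, whereas real sequence searches use specialised tools that tolerate mismatches, and every sequence and name below is invented. The last few lines simply restate the proportions quoted above.

```python
def find_matches(spacers: list[str], database: dict[str, str]) -> dict[str, list[str]]:
    """Map each spacer to the names of database entries that contain it exactly."""
    hits = {}
    for spacer in spacers:
        names = [name for name, sequence in database.items() if spacer in sequence]
        if names:
            hits[spacer] = names
    return hits


# Hypothetical database entries and spacers, for illustration only.
database = {
    "phage_A": "TTGACGTACGATTGCA",
    "phage_B": "GGCATCAGGTTACGGA",
    "plasmid_C": "AAAAAAAACCCCCCCC",
}
spacers = ["ACGTACGA", "TTGACCAT", "CAGGTTAC"]

hits = find_matches(spacers, database)
print(hits)  # {'ACGTACGA': ['phage_A'], 'CAGGTTAC': ['phage_B']}

# Proportions from the figures in the text: 88 matches out of 4,500 spacers,
# of which about 65% were to a virus infecting the spacer's own bacterium.
share_matched = 88 / 4500
approx_viral_matches = round(0.65 * 88)
print(f"{share_matched:.1%} matched; roughly {approx_viral_matches} of those were viral")
```

  On these numbers only about one spacer in fifty found any match at all, presumably because the databases still held only a fraction of the relevant sequences.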