Life doesn’t make trash
A genome is not a blueprint for building a human being, so is there any way to judge whether DNA is junk or not?
by Itai Yanai and Martin Lercher
Illustration by Matt W. Moore
Humans are astounding creatures, our unique and highly complex traits encoded by our genome – a vast sequence of DNA ‘letters’ (called nucleotides) directing the building and maintenance of the body and brain. Yet science has served up the confounding paradox that the bulk of our genome appears to be dead wood, biologically inert junk.
Could all this mysterious ‘dark matter’ in our genome really be non-functional?
Our genome has more than 20,000 genes, relatively stable stretches of DNA transmitted largely unchanged between generations. These genes contain recipes for molecules, especially proteins, that are the main building blocks and molecular machines of our bodies. Yet DNA that codes for such known structures accounts for just over 3 per cent of our genome. What about the other 97 per cent? With the publication of the first draft of the human genome in 2001, that shadow world came into focus. It emerged that roughly half our DNA consisted of ‘repeats’, long stretches of letters sometimes found in millions of copies at seemingly random places throughout the genome. Were all these repeats just junk?
To answer this question, hundreds of scientists worldwide joined a massive science project called the Encyclopedia of DNA Elements, or ENCODE. After working hard for almost a decade, in 2012 ENCODE came to a surprising conclusion: rather than being composed mostly of useless junk, 80 per cent of the human genome is in fact functional.
To reach that conclusion, ENCODE systematically scouted the genome as a whole for specific functions. One function could be coding for proteins; another function could be acting as a ‘molecular switch’ that regulates the operation of other genes. In one experiment, for example, ENCODE surveyed the entire genome for DNA that is bound by ‘transcription factors’ – proteins known for calling other genes into action. In this way, ENCODE compiled a comprehensive and very useful catalogue that provided a functional clue for 80 per cent of the 3 billion nucleotides that make up the human genome. The ENCODE results seemed to confirm that our genome is indeed a tidy blueprint; that almost every bit of the human genome is there for a reason, and that our genetic heritage is not a small heap of information buried under a pile of junk.
Consider the so-called ‘LINE-1 elements’, a DNA sequence formerly classed as junk. Our genome teems with 500,000 copies of this 6,000-letter sequence that seems to do nothing but reproduce copies of itself, the very definition of the ‘selfish gene’. According to ENCODE, these LINE-1 elements are functional since they are biochemically active. But does this mean they function to further human survival itself?
Likely not. ‘Function’ is a loaded word, and ENCODE chose a very inclusive definition: in the ENCODE world, function can be ascribed to any stretch of the genome that is related to a specific biochemical activity. But such inclusiveness can lead to ridiculous conclusions. To make an analogy, consider spam emails. What spam emails mostly do is occupy email servers that aim to separate them from genuine email. Few people would argue that occupying spam filters is a function of spam – but an ENCODE-like definition would say just that. Indeed, many of ENCODE’s 80 per cent ‘functional elements’ are unlikely to contribute to human survival and the reproduction of human genomes, which is what you would expect if you consider function from the perspective of a human blueprint.
Yet viewing our genome as an elegant and tidy blueprint for building humans misses a crucial fact: our genome does not exist to serve us humans at all. Instead, we exist to serve our genome, a collection of genes that have been surviving from time immemorial, skipping down the generations. These genes have evolved to build human ‘survival machines’, programmed as tools to make additional copies of the genes (by producing more humans who carry them in their genomes). From the cold-hearted view of biological reality, we exist only to ensure the survival of these travellers in our genomes.
This is the central idea in Richard Dawkins’s milestone book, The Selfish Gene (1976), and the fundamental shift in perspective it entails might be as hard to accept as it was hard to acknowledge that our world revolves around the sun, not the sun around us. The selfish gene metaphor remains the single most relevant metaphor about our genome.
Building on the work of generations of biologists since Charles Darwin, Dawkins took the theory of evolution to its logical conclusion. Darwin’s greatest contribution to science was the concept of natural selection: the fundamental logical principle that inevitably causes a population to gradually adapt to its environment. At first, variation arises in individuals as genes mutate randomly over time. Then, through the mindless process of natural selection, some individuals fare better than others in the task of surviving and reproducing because of differences in their genes, which are then passed on.
Darwin showed that one simple logical principle could lead to all of the spectacular living design around us, including humankind, previously believed to have been specially created in the image of a god. The logic of natural selection applies far beyond the evolution of species: anything that is good at replicating itself promotes its own survival.
Our genomes are reassembled from the genes found in our parents’ genomes at each generation: when your mother and father prepared the DNA passed on to you, they recombined the genome copies they inherited from their own parents into new combinations. From the viewpoint of natural selection, each gene is a long-lived replicator, its essential property being its ability to spawn copies. In order to spawn copies, many genes have evolved functions important for the survival of the organism in which they reside. Those genes that fail at replicating are no longer around, while even those that are good face stiff competition from other replicators. Only the best can secure the resources needed to reproduce themselves.
It is those replicators that are at the heart of the natural world, that jump from generation to generation, abusing us (or any other species) as their survival machines. When looking at our genome, we might take pride in how individual genes co-operate in order to build the human body in seemingly unselfish ways. But co-operation in making and maintaining a human body is just a highly successful strategy to make gene copies, perfectly consistent with selfishness.
So why are we fooled into believing that humans (and animals and plants) rather than genes are what counts in biology? It is a matter of scale: the world we can see is too big to include genomes, and our lifespan is too short to see how individual genes come into existence, change, and disappear again, processes that unfold over millions of years. What we see are not the real players of the game of life; we just see the consequences of their strategies to stay in the game.
Many genes in our genomes survived because they contributed to making better survival machines – humans better at spreading those genes. But what about the alleged junk, what about, for example, the 500,000 LINE-1 elements? The answer is beautifully simple: each LINE-1 element consists of a set of genes. Together, these encode proteins that execute a molecular programme of inserting additional copies of itself into the genome – a grandiose ‘copy-paste’ strategy. The fact that there are 500,000 copies of them is a testament to their successful proliferation programme. By copying themselves into the genome over and over again, LINE-1s ensure that they remain associated with those genes that make the survival machine.
Even if a large number of LINE-1 copies are removed, lost, or damaged by mutations, there will always be more copies somewhere else in the genome. This is the only explanation needed to justify the LINE-1s’ continued existence. They don’t need to have a specific function in the human blueprint at all – they are freeloaders. ENCODE, however, would reason that these DNA segments are functional since they engage in the process of transcription, whereby a molecular template of the gene works to churn out more of the same. Thus, while most LINE-1s are no longer even capable of making proteins, ENCODE would conclude that they are part of the human blueprint.
To emphasise this point, consider another kind of junk in the genome, the ‘Alu’ element, about 300 letters long. Each of your two genome halves contains 1 million copies of this gene. What does it do? Looking at Alu’s sequence reveals a very uninteresting gene. The only exception is the very last part of its sequence: it matches precisely the last section of LINE-1 elements. In LINE-1s, this stretch of letters is used as a signal, so that the LINE-1 proteins know which sequence they should copy back into another genomic location. By having the same signal, Alus effectively masquerade as LINE-1 elements, fooling the LINE-1 machinery into copy‑pasting them into the genome. It turns out that the freeloaders themselves have freeloaders!
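The masquerade can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the signal string, the sequences and the function name `gets_copied` are invented, and the real machinery is far more involved. The point is only that a copier which checks nothing but the tail signal cannot tell a freeloader from the element it was built to copy.

```python
# Hypothetical stand-in for the tail signal; the real sequence is not given in the text.
TAIL_SIGNAL = "GATTACA"


def gets_copied(element: str, signal: str = TAIL_SIGNAL) -> bool:
    """Toy copy machinery: it checks only whether the element ends with the signal."""
    return element.endswith(signal)


# Made-up toy sequences, far shorter than real elements.
line1 = "ATGCCGTTAGGC" + TAIL_SIGNAL  # carries the signal legitimately
alu = "GGCC" + TAIL_SIGNAL            # short element that borrows the same tail
other = "GGCCTTAACG"                  # no signal, so it is ignored

results = {
    name: gets_copied(seq)
    for name, seq in [("LINE-1", line1), ("Alu", alu), ("other", other)]
}
print(results)
```

In this sketch the short element is copied exactly as the long one is, although it contributes nothing to the copying machinery.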
At the most fundamental level, then, our genome is not a blueprint for making humans at all. Instead, it is a set of genes that seek to replicate themselves, making and using humans as their agents. Our genome does of course contain a human blueprint – but building us is just one of the things our genome does, just one of the strategies used by the genes to stay alive. In their selfish desire to leave offspring, our genes have evolved to form a society where they work together efficiently, dividing the labour to ensure that each makes it into the next generation. Like Adam Smith’s invisible hand, the genes in this society co-operate with one another not from a sense of fairness or design, but simply to maximise their own survival. From the myriad interactions of genes in this complex society emerge the striking biological adaptations we see in the living world.
Our genome is filled with freeloaders that manage to hang on, simply because the damage they do is not large enough to make the effort to weed them out worthwhile for other genes, or because their strategy for survival is so conniving that they are difficult to expel. From the point of view of the society of genes, any freeloader DNA – DNA that does not contribute to the genome’s ability to leave offspring, that is, any DNA that does not contribute to organising our bodies – is junk.
ENCODE has called 80 per cent of the human genome functional, yet 97 per cent of the genome does not encode proteins or other molecules that support human life. Is all this DNA just junk? Of course not. There are undoubtedly many molecules whose function we have not yet grasped. And a blueprint alone is not enough to build anything – you also need assembly instructions and a time plan that orchestrates the building process. The portion of the genome responsible for this organisational feat likely adds another 7 per cent or so to the blueprint’s 3 per cent, leading scientists to suspect that about 10 per cent of the genome is actually needed to specify a functioning human.
There is good evidence for this 10 per cent. If we compare our genome to that of other mammals, we find that 90 per cent of the genome was free to change through random mutations. Those DNA letters apparently did not contribute to the efficiency of the survival machine, us. By contrast, mutations in the remaining 10 per cent were weeded out by natural selection because they would have compromised the DNA sequences’ ability to spread – either by damaging the survival machine’s functioning, or by reducing the sequences’ freeloading capacity. This is the definition of function that has traditionally been used by evolutionary biologists as well as by philosophers of science: if something is conserved by natural selection, then it is functional. Function, then, is identified as the feature that ensures the spread or maintenance of a particular DNA sequence.
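The comparison described above can be sketched in a few lines. This is a minimal illustration under loud assumptions: the three aligned sequences are invented, the function name `conserved_fraction` is hypothetical, and real comparative genomics uses statistical models of mutation rather than a simple identity count. It shows only the core idea, that positions left unchanged across species are the candidates for function.

```python
def conserved_fraction(aligned):
    """Return the share of alignment columns in which every sequence has the same letter."""
    if not aligned or not aligned[0]:
        raise ValueError("need at least one non-empty sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    # A column counts as conserved when all species share one letter there.
    conserved = sum(1 for column in zip(*aligned) if len(set(column)) == 1)
    return conserved / length


# Invented toy alignment standing in for the same region in three species.
toy_alignment = [
    "ACGTACGTAC",
    "ACGTTCGAAC",
    "ACGTACGCAC",
]

fraction = conserved_fraction(toy_alignment)
print(f"{fraction:.0%} of positions are identical in all three sequences")
```

Here eight of the ten toy positions are unchanged. In the essay's terms, the two variable positions behave like the 90 per cent that was free to mutate, and the unchanged ones like the 10 per cent that selection preserved.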
Junk is not trash and, as the Nobel laureate and genetics pioneer Sydney Brenner has pointed out, it might come in handy at some point, even if that is not its function. Any stretch of DNA can by accident turn into something that then contributes to the spread and survival of the genome. And sure enough, we do, for example, find individual LINE-1 or Alu sequences whose insertion has changed the expression of neighbouring genes in useful ways. These few members of the freeloader community effectively switched sides: they became part of the society of genes that provides the blueprint for human life.
But such examples don’t mean that our genome hoards junk because it might become useful in some future situation – the vast majority of repeats freeload off our bodies, the survival machines built by a co-operative society of genes. To explain why these junkish repeats litter our genome, we do not need to search for any other explanation, any other function, than their capacity to ensure their own persistence in the society of genes.
A misunderstanding persists in the wrong-headed notion that our genome encodes the blueprint of human life. It does not. The blueprint analogy does not apply to the majority of our genome, nor is the non-blueprint component useless junk. Someone or something benefits from much of this genetic code, but value is in the eye of the beholder. For the majority of functional repeats such as Alu and LINE-1, the only beneficiaries are they themselves; attributing human benefit to junk imagines harmony and purpose where none exist. The truth is that many DNA sequences have survived inside our genome going all the way back to the first replicators, through aeons of evolutionary time.
What a piece of work!
25 August 2014
Itai Yanai is associate professor in biology at the Israel Institute of Technology.
Martin Lercher is professor at the Institute of Bioinformatics in Dusseldorf, Germany. Lercher and Yanai are co-authors of the forthcoming book, The Society of Genes, from Harvard University Press.