Dark Zones of the Genome

Meta: The dark zones in your genome are regions of DNA hiding critical data about genetic variation that hasn’t been entirely or correctly mapped.

Dark zones in your genome

There are dark zones in your genome, but nothing shadowy is going on. They’re “dark” because these regions, long stretches of ATCG code, haven’t been entirely or correctly mapped.

Scientists once assumed the regions didn’t have important functions, such as genes coding for a protein. Some thought the DNA in the dark zones was junk. That was just plain wrong: as most now suspected, the zones were hiding critical information.

It’s a technology problem. Most current sequencing machines use next-generation sequencing (NGS, nicknamed NexGen), which is a short-read technology. It’s a fast and cheap method that sequences (reads) short stretches of DNA.

NexGen is adequate for most applications, but reassembling short pieces of a sequence is tricky. The order of the As, Ts, Cs, and Gs can get mixed up. More importantly, short reads can’t resolve long, repetitive sequences and large duplications in the non-coding zones, because a read shorter than the repeat can’t show where it belongs or how many copies there are.
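To see why repeats trip up short reads, here is a toy sketch in Python. It is only an illustration under loud assumptions: the sequences are made up, the reads are error-free, the read lengths are scaled far down from real ones, and the helper name `all_reads` is hypothetical. It also ignores read depth, which real pipelines can use as a clue. Two "genomes" differ only in how many copies of a repeat they carry, yet their short reads come out identical.

```python
def all_reads(genome, read_length):
    """Every error-free read of a fixed length, as a set (toy model, ignores read depth)."""
    return {genome[i:i + read_length] for i in range(len(genome) - read_length + 1)}

# Made-up sequences for illustration only.
LEFT, RIGHT, UNIT = "TTGACCTA", "GGATCCAT", "CAG"
genome_a = LEFT + UNIT * 3 + RIGHT  # 3 copies of the repeat
genome_b = LEFT + UNIT * 5 + RIGHT  # 5 copies of the repeat

# Reads shorter than the repeat stretch never see both edges of it.
short_same = all_reads(genome_a, 6) == all_reads(genome_b, 6)
# Reads long enough to span the stretch and its flanks do.
long_same = all_reads(genome_a, 20) == all_reads(genome_b, 20)

print("6-base reads identical:", short_same)   # True
print("20-base reads identical:", long_same)   # False
```

In this toy case the 6-base reads cannot tell three copies from five, while the 20-base reads can, because some of them cover the whole repeat plus the unique sequence on both sides.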

Newer, long-read technology can light up complex areas of the genome. Biotech companies such as PacBio and Oxford Nanopore specialize in long-read tech. They sequence whole, individual molecules rather than breaking them into bits for reassembly. 

Now scientists can see into the dark zones. No longer considered junk, the non-coding regions are the frontiers of genomic discovery. Researchers are using long-read technology to:

  • Untangle long, repetitive elements to determine where a gene starts and ends
  • Reveal “camouflaged” areas to find new genes and de novo mutations that could be disease-causing variants
  • Identify and decipher copy number variants in long repetitive stretches
  • Find indels (insertions and deletions) that alter the genomic sequence
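As a rough picture of the copy-number item above, the sketch below counts back-to-back copies of a repeat unit inside a single read that spans the whole repetitive stretch. It is a minimal illustration, not how production tools work: the function name `count_tandem_copies` and the example read are made up, and the read is assumed to be error-free.

```python
import re

def count_tandem_copies(read, unit):
    """Return the largest number of back-to-back copies of `unit` found in `read`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    # Keep the longest uninterrupted run of the unit; empty string if none is found.
    longest = max((m.group(0) for m in pattern.finditer(read)), key=len, default="")
    return len(longest) // len(unit)

# Hypothetical long read: unique flank, five copies of CAG, unique flank.
long_read = "TTGACCTA" + "CAG" * 5 + "GGATCCAT"
print(count_tandem_copies(long_read, "CAG"))  # 5
```

Because the read covers the unique sequence on both sides, the count is for the whole stretch rather than a fragment of it.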

In addition to medical applications, geneticists are looking in the dark zones for variants to add to the Pangenome reference library. New variants and structural alterations are also increasing our knowledge of human evolution. 

For example, geneticist Dr. Evan Eichler uses long-read sequencing to identify species-specific structural variants that evolved over deep time and made us human. He says, “Long-read sequence and assembly are revolutionizing our ability to assemble new genomes and understand the full spectrum of genetic variation.”