There have been some incredible advances in data storage technology in the last several decades, moving from magnetic tape to optical discs to solid state drives. But with demand for storage capacity continuing to grow exponentially, Microsoft researchers are re-examining a form of data storage technology that is billions of years old: DNA.
The tiny molecule responsible for transmitting the genetic data for every living thing on Earth could be the answer to the IT industry’s quest for a more compact storage medium. In fact, researchers from Microsoft and the University of Washington recently succeeded in storing 200 MB of data on a few strands of DNA, occupying a small dot in a test tube many times smaller than the tip of a pencil.
The Internet in a Shoebox
Despite the small space occupied by the DNA strands, the researchers were able to store and retrieve high-definition digital video, the top 100 books from Project Gutenberg, and copies of the Universal Declaration of Human Rights in more than 100 languages.
“Think of the amount of data in a big data center compressed into a few sugar cubes,” Microsoft wrote in a blog post announcing the achievement. “Or all the publicly accessible data on the Internet slipped into a shoebox.” That is the promise DNA data storage represents, according to Microsoft, at least once the scientists are able to overcome a few important roadblocks and scale the technology up.
“DNA is an amazing information storage molecule that encodes data about how a living system works,” said Luis Henrique Ceze, a UW associate professor of computer science and engineering and the university’s principal researcher on the project. “This is one important example of the potential of borrowing from nature to build better computer systems.”
From Binary to Biology
DNA has several advantages as a storage medium. Besides being compact, it is also extremely durable, capable of lasting for a very long time if kept in good conditions.
Although the technology has a long way to go before it can be commercialized, the researchers said they are upbeat. The team has already managed to increase the storage capacity of its DNA system 1,000-fold in just the last year.
To store data as DNA, the binary bits of a digital file must first be translated into sequences of the four nucleotide bases that make up a DNA strand: adenine, cytosine, guanine and thymine. The molecules are then synthetically built to match the encoded sequences. The result is what appears to be a bit of dried salt at the bottom of a test tube.
To read the data back, the team uses polymerase chain reaction, a technique commonly employed by molecular biologists, to make multiple copies of the DNA strands they want to read. They can then sample, sequence, and decode the relevant data.
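The translation between bits and bases described above can be sketched in a few lines of Python. This is only an illustration: the article does not spell out the researchers' actual coding rules, so the two-bits-per-base mapping and the names `BITS_TO_BASE`, `encode` and `decode` are assumptions made for the example.

```python
# Hypothetical mapping: each pair of bits becomes one nucleotide letter.
# The researchers' real coding rules are not given in the article.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a string of nucleotide letters, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate a string of nucleotide letters back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"DNA"
strand = encode(message)
print(strand)                     # CACACATGCAAC
print(decode(strand) == message)  # True
```

Under this simple mapping, every byte becomes four bases, so the three-byte message above turns into a 12-letter sequence. A working system would likely need more than this, such as a way to label and locate individual strands and to recover from errors introduced during synthesis and sequencing.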
The entire process is currently too expensive to replace magnetic tape storage. But with the cost of tools for manipulating DNA falling thanks to a growing biotech industry, using DNA to store data may eventually become more cost-effective.
Pictured above: UW Associate Professor Luis Henrique Ceze, in blue, and research scientist Lee Organick prepare DNA containing digital data for sequencing, which allows them to "read" and retrieve the original files. Photo by Tara Brown Photography/University of Washington.