September 23, 2020

When Will DNA Solve the Data Storage Crisis?

Craig de Ridder

Pillsbury Winthrop Shaw Pittman LLP

TAKEAWAYS

Digital data storage devices are approaching their scaling limits and have limited life spans, and cloud-based storage systems in data centers are costly and consume too much energy.

Synthetic DNA could store vast amounts of data with low maintenance costs for a very long time.

DNA data storage for archival purposes could be available in the near future.

“DNA is an incredible molecule that … provides ultra-high-density storage for thousands of years. In fact, the DNA contained within all cells in a human body could store all the movies created to date in the 21st century three billion times over.” —Emily Leproust, CEO of Twist Bioscience

The Data Storage Crisis
Ninety percent of the digital data in the world has been generated in the past two years. Moreover, the pace is accelerating with the growth of search engines, social media sites, smart cars and the Internet of Things (IoT). Google receives an estimated 3.5 billion search requests every day, and WhatsApp users exchange up to 65 billion messages daily. Tesla drivers have logged more than three billion miles with Autopilot activated, with multiple car sensors collecting data along the way.

International Data Corporation forecasts that global data storage demand will grow to 175 zettabytes (175 trillion gigabytes) by 2025 (up from 33 zettabytes today), which will exceed the storage capacity of currently available storage devices, such as magnetic tapes, hard drives and optical discs. These storage devices are approaching their density limits, can be damaged and have limited life spans. Magnetic tapes, which serve as the basis of most digital archives, have a maximum life span of under 30 years.
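The unit conversion in the forecast above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where one gigabyte is 10^9 bytes and one zettabyte is 10^21 bytes; the variable names are illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption of this sketch).
GIGABYTE = 10**9
ZETTABYTE = 10**21

forecast_2025_zb = 175  # IDC forecast quoted in the article
current_zb = 33         # "today" figure quoted in the article

# Convert the forecast to gigabytes and compute the implied growth multiple.
forecast_gb = forecast_2025_zb * ZETTABYTE // GIGABYTE
growth_factor = forecast_2025_zb / current_zb

print(f"{forecast_gb:,} GB")            # 175,000,000,000,000 GB (175 trillion)
print(f"{growth_factor:.1f}x growth")   # 5.3x growth
```

In other words, the forecast implies that demand would grow roughly fivefold from the current figure.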

Public and private organizations spend considerable sums migrating data from older storage devices to newer generations of tapes, drives and discs. These organizations are increasingly turning to cloud-based storage systems, but the large data centers of cloud providers are very costly to build, still require periodic transfers to newer storage devices and consume huge amounts of energy. Once migration, security and operating costs are counted, the cloud can be much more expensive than organizations anticipate.

A better data storage option is needed, and synthetic DNA has been considered a promising candidate since Richard Feynman gave a lecture in 1959 describing the prospects for creating artificial objects similar to those in biology with similar capabilities. In 2012, a team led by George Church at Harvard University converted a 52,000-word book into DNA strands manufactured in a laboratory to demonstrate that DNA could store data.

In 2017, Yaniv Erlich and Dina Zielinski at the New York Genome Center developed a new coding system that randomly separated DNA strings into small tagged “droplets” to achieve significantly greater storage capacity. The researchers worked with Twist Bioscience, a San Francisco-based biology company that has developed a new synthesis platform for “writing” DNA on silicon chips. Today, the science behind storing digital data in synthetic DNA is firmly established.

From Bits to Molecules and Back to Bits

“The information density of DNA is remarkable—just one gram can store 215 million gigabytes of data. For context, the average hard drive in a laptop can house just one millionth of that amount.” —John Cumbers in “DNA Storage Is About To Go Viral,” Forbes
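To put the quoted density in perspective, the sketch below works through the arithmetic. It assumes the 215 million gigabytes per gram figure from the quote and the 175-zettabyte IDC forecast cited earlier in the article, and it treats the density as a theoretical upper bound rather than something a practical system would achieve.

```python
GB_PER_GRAM = 215_000_000  # density quoted above (215 million GB per gram)

# "One millionth of that amount" is the laptop drive size implied by the quote.
laptop_drive_gb = GB_PER_GRAM / 1_000_000

# Mass of DNA that would hold the 175 zettabytes forecast for 2025,
# using decimal units (1 ZB = 10**12 GB).
demand_gb = 175 * 10**12
grams_needed = demand_gb / GB_PER_GRAM

print(f"{laptop_drive_gb:.0f} GB per laptop drive")  # 215 GB per laptop drive
print(f"{grams_needed / 1000:.0f} kg of DNA")        # 814 kg of DNA
```

Under these assumptions, the entire 2025 forecast would fit in well under a metric ton of DNA.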

Computers and organic cells have a lot in common. In a computer, information is encoded in strings of binary digits (bits), 1s and 0s, which, when read, execute programs. In a cell, information is stored in the four nucleotide bases represented by the letters A (adenine), T (thymine), C (cytosine) and G (guanine), which, when read, produce proteins. These bases are the building blocks of molecules called deoxyribonucleic acid (DNA). Strung together, these bases make up the biological instructions (code) that govern cells.

DNA data storage involves two main processes: writing the code via DNA synthesis and reading the code via DNA sequencing.

First, a binary data file is translated into a sequence of the four bases by mapping each pair of bits to one base (for example, 00 to A, 01 to T, 10 to C and 11 to G), and the DNA molecules are synthesized letter by letter with chemical reactions or enzyme catalysts and indexed. After the segments are written, they are stored in a container that regulates temperature and light to maintain stability. Standard options for storing the DNA include freezing it in solution, drying it or encapsulating it in a bead.
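The writing step can be illustrated with a short sketch that uses the example mapping given above (00 to A, 01 to T, 10 to C, 11 to G). The function names `encode` and `split_into_segments` and the segment size are hypothetical choices made for illustration; production encoding schemes are considerably more elaborate than this direct mapping.

```python
# Example mapping from the text: two bits per base.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "C", "11": "G"}

def encode(data: bytes) -> str:
    """Translate bytes into a base sequence, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def split_into_segments(strand: str, size: int) -> list[tuple[int, str]]:
    """Cut a long sequence into short segments, each tagged with an index
    so the original order can be restored after sequencing."""
    return [(n, strand[i:i + size])
            for n, i in enumerate(range(0, len(strand), size))]

strand = encode(b"Hi")
print(strand)                          # TACATCCT
print(split_into_segments(strand, 4))  # [(0, 'TACA'), (1, 'TCCT')]
```

Each byte becomes four bases, so the two-byte input yields an eight-base sequence, which is then cut into two indexed segments.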

Next, a targeted section of the DNA strand is decoded by a commercial sequencing machine (initially developed for genome sequencing) and translated back into the original digital file. Error-correcting algorithms are used during the encode/decode processes so that the data is recovered as error-free as possible.
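The reading step can be sketched in the same spirit. The example below assumes the sequencer returns indexed segments in arbitrary order and that the same example mapping is used (A for 00, T for 01, C for 10, G for 11). The names `reassemble` and `decode` are hypothetical, and error correction is deliberately left out to keep the sketch short.

```python
# Inverse of the example mapping from the text: one base back to two bits.
BASE_TO_BITS = {"A": "00", "T": "01", "C": "10", "G": "11"}

def reassemble(reads: list[tuple[int, str]]) -> str:
    """Put indexed segments back in order and join them into one sequence."""
    return "".join(segment for _, segment in sorted(reads))

def decode(strand: str) -> bytes:
    """Translate a base sequence back into bytes, four bases per byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

# Segments arrive out of order, as they might from a pool of DNA fragments.
reads = [(1, "TCCT"), (0, "TACA")]
print(decode(reassemble(reads)))  # b'Hi'
```

The index attached to each segment is what allows the file to be rebuilt even though the fragments are not stored in any physical order.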

DNA as a Data Storage Medium

“DNA is an extremely stable molecule with a half-life of over 500 years. If stored in cold conditions, DNA is capable of remaining intact for hundreds of thousands of years. [A] 700,000-year-old horse’s DNA, stored in the permafrost, was sequenced in 2013.” —“The Future of DNA Data Storage,” Potomac Institute for Policy Studies

The information density and stability of DNA are much greater than those of existing storage options. The digital information in a warehouse-sized data center could be stored in a space roughly the size of a sugar cube and would require little energy to maintain. And, unlike traditional forms of data storage, DNA will always be important to mankind, so the technology for reading it should never become obsolete. Once widely adopted, it will not go the way of the floppy disk.

The cost of DNA synthesis has dropped significantly in the past decade, and DNA can be ordered on the websites of companies like Twist Bioscience and Thermo Fisher Scientific.

However, the major constraints on commercial storage are that DNA synthesis and sequencing are still too costly, in part because they are based on organic chemistry methods designed for different use cases, and that both processes are susceptible to a high rate of errors.

Research Advances
Automation, next-generation DNA synthesizers and sequencers and improved encoding schemes should drive down costs dramatically.

Recent breakthroughs pointing in that direction include the following:

In March of 2019, researchers at Microsoft and the University of Washington announced that they had developed the first fully automated system to store and retrieve data in synthetic DNA (eliminating the need for laboratory technicians)—a key step in moving the technology out of the research laboratory and into commercial data centers.
In June of 2019, Catalog Technologies, a startup based in Boston, set a record in DNA data storage by coding all of Wikipedia (16 gigabytes of data) using technology similar to inkjet printers and a new method of coding with different combinations of pre-built DNA molecules (similar to a printing press with movable typefaces).
In April of 2020, researchers at the University of Illinois and the University of Texas demonstrated a new method of recording information in DNA (akin to the cardboard punch cards used with early computers). They used enzymes to leave small “nicks” in distinct locations on the DNA strand that can be used to hold and retrieve information with fewer errors.
In a paper published in July of 2020 in Proceedings of the National Academy of Sciences, a group at the University of Texas at Austin described a new encoding algorithm for DNA data storage permitting more efficient and accurate data retrieval. Other forms of DNA storage address replication errors by repeating the code 10 to 15 times over. The researchers found a way to build the DNA in a lattice shape where each piece of data reinforces the next, so that it only needs to be read once.
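The repetition approach mentioned in the last item, in which the same code is stored many times over, can be sketched as a per-position majority vote across redundant reads. This is a simplified illustration rather than the method used in any of the papers above: it assumes all reads have the same length and that errors are single-letter substitutions, and the function name `consensus` is hypothetical.

```python
from collections import Counter

def consensus(reads: list[str]) -> str:
    """Recover a sequence from redundant copies by taking the most common
    base at each position (assumes equal-length reads, substitutions only)."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("expected a non-empty list of equal-length reads")
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*reads))

# Five copies of the same segment; two of them contain a single wrong base.
reads = ["TACATCCT", "TACATCCT", "TAGATCCT", "TACATCCA", "TACATCCT"]
print(consensus(reads))  # TACATCCT
```

The cost of this redundancy is that every segment has to be written and read many times, which is the overhead the lattice-style encoding described above aims to avoid.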

Commercial Applications of DNA Data Storage

“Our ultimate goal is to put a system into production that, to the end user, looks very much like any other cloud storage service—bits are sent to a data center and stored there and then they just appear when the customer wants them.” —Microsoft principal researcher Karin Strauss, University of Washington News, March 1, 2019

Twist Bioscience currently synthesizes more than one million pieces of synthetic DNA on a single silicon chip using semiconductor technology. The company is working toward the next generation of silicon chip that will allow it to write 10 gigabytes of data as DNA on each chip, reducing the cost of digital data storage significantly. “We see the first applications of commercial DNA data storage being long-term markets … a project like a clinical trial or government bodies [retaining] vast amounts of historical information … [or] the consumer market for large archives of photos and videos,” said Twist Bioscience CEO Leproust. Twist and Netflix announced last month that they had partnered with Robert Grass, a professor at ETH Zurich, to store an episode of the show Biohackers on DNA. Netflix wanted to illustrate that one of the show’s concepts was more than just science fiction.

Microsoft is planning to store archival data in DNA in its cloud within the next three years. Catalog Technologies claims that it will make DNA data storage-as-a-service economically feasible in the near future.

Data center operators and cloud providers should already be paying attention to the implications of DNA data storage. Organizations and businesses needing long-term data storage solutions for archival purposes should be factoring it into their business plans.

Beyond Data Centers
Other possibilities for DNA data storage are emerging, including the so-called “DNA of Things” (DoT). Robert Grass of ETH Zurich and Yaniv Erlich in Israel want to use DNA data storage to provide all types of inanimate objects, both solids and liquids, with a memory of their own. The researchers first created a 3D-printed plastic Stanford bunny (a common graphical test object) that contained a DNA-encoded digital blueprint for its own synthesis. The DNA was encapsulated in silica beads for protection and then fused into the “ink.” Five generations of the bunny were synthesized, each printed from the digital files retrieved from a tiny sliver of the previous generation.

For the next, more ambitious experiment, they encapsulated a 1.4-megabyte YouTube video in a transparent plexiglass polymer that was 3D-printed into ordinary glasses. The glasses worked normally when worn, but they contained a secret DNA-encoded video message. Once again, a tiny fragment taken from the frames allowed for full recovery of the file.

Information could be stored or concealed in any everyday object for future reference or replication. In contrast to the IoT, a system of interrelated computing devices and machines, the DoT creates objects that are stand-alone information storage devices, completely off-grid. For now, the focus appears to be on objects with long lifespans, such as construction materials, which could retain their own instructions for replication long after traditional data storage methods have been lost or become obsolete. In the future, interest is likely to grow in keeping sensitive data away from the cloud, which is increasingly associated with security and privacy concerns, and in storing massive quantities of information in small devices.

DISCLAIMER: Because of the generality of this update, the information provided herein may not be applicable in all situations and should not be acted upon without specific legal advice based on particular situations. Attorney Advertising.