All life is based on DNA code. DNA determines your height, sex, and hair and eye colour. It also decides whether you are an amoeba, a flower, a tree, a tiger or a human. This information is called the genome. The amount of data needed to describe each individual genome is enormous, yet it is stored in a tiny structure. DNA is also very persistent: scientists have decoded DNA from animals that died thousands of years ago. When the DNA helix splits, each strand is used to build a new partner, giving two complete helices that are both exact copies of the original. So, given how much data DNA can hold, how it replicates and how durable it is, could it be used to store ordinary data?
DNA is a double-helix-shaped molecule. The two strands of the helix are connected by pairs of chemical bases known by their initials: A, T, G and C (adenine, thymine, guanine and cytosine). The DNA code is made up of sequences of these bases. Stretched out, a single human DNA helix would be about 2 inches long, but in fact it is very tightly packed. According to New Scientist magazine, one gram of DNA can potentially hold up to 455 exabytes of data; other authorities say 2.2 petabytes per gram. Either way, that is a lot of data in a very small space. Scientists have been able to read DNA code for a long time using a process called sequencing, and synthesis is the equivalent process for writing out new DNA chains. DNA is also incredibly stable: scientists have managed to sequence the complete genome of a fossil horse that lived more than 500,000 years ago. One last advantage is that storing DNA does not require much energy.
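The base-pairing rule is what makes copying possible: A always pairs with T, and G always pairs with C, so each strand completely determines its partner. The Python sketch below is a simplified model under stated assumptions: strands are plain strings, the helper names `complement` and `replicate` are hypothetical, and the real antiparallel orientation of the two strands is ignored.

```python
# Base pairing: each base on one strand has a fixed partner on the other.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner strand, base by base.

    Real DNA strands run in opposite directions (antiparallel); this toy
    model ignores that and simply pairs position i with position i.
    """
    return "".join(PAIR[base] for base in strand)


def replicate(helix: tuple[str, str]) -> list[tuple[str, str]]:
    """Split a double helix and rebuild a new partner for each old strand."""
    top, bottom = helix
    # Each old strand acts as the template for a new complementary strand.
    return [(top, complement(top)), (complement(bottom), bottom)]


original = ("ATGCGT", complement("ATGCGT"))
copies = replicate(original)
print(original)  # ('ATGCGT', 'TACGCA')
print(copies)    # two helices, each identical to the original
```

Because the pairing rule leaves no choice about which base goes opposite which, both rebuilt helices come out identical to the starting one.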
The downside is that DNA storage is currently expensive and slow. Storage costs measured in millions of dollars per gigabyte will put most people off, and data read and write times are still measured in hours.
A chromosome is a combination of DNA and proteins that keep the DNA helix stable. Chromosomes contain genes, which are short sequences of DNA and are the basic unit of genetic information. (Which leads to the old joke: Question - "How do you tell the sex of a chromosome?" Answer - "You take its genes off!") Sorry.
In 2017 a group at Harvard adopted a DNA-editing technology called CRISPR, which can identify specific DNA sequences with precision and slice into them like a molecular scalpel. This makes it possible to select a target gene and either remove it or replace it with a new sequence. The Harvard team used CRISPR to record images of a human hand into the genome of E. coli bacteria, and then read the images back out with more than 90 percent accuracy.
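As a rough analogy only, the "find a target sequence, then remove or replace it" idea can be modelled as a string operation. This sketch does not model the real chemistry (the guide RNA, the Cas enzyme, or how the cell repairs the cut); the function name `edit_at_target` and the example sequences are made up for illustration.

```python
def edit_at_target(genome: str, target: str, replacement: str = "") -> str:
    """Cut out the first occurrence of `target` and splice in `replacement`.

    An empty replacement models removing the target sequence entirely.
    Raises ValueError if the target does not occur in the genome.
    """
    start = genome.find(target)
    if start == -1:
        raise ValueError(f"target {target!r} not found")
    end = start + len(target)
    # "Cut" at both ends of the target, then join the pieces around the new sequence.
    return genome[:start] + replacement + genome[end:]


genome = "TTAGGCATCCGATTACA"
print(edit_at_target(genome, "CATCCG"))            # TTAGGATTACA (target removed)
print(edit_at_target(genome, "CATCCG", "GGGAAA"))  # TTAGGGGGAAAATTACA (target replaced)
```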
Researchers at the University of Washington and Microsoft Research have developed a fully automated system for writing, storing and reading data encoded in DNA. In March 2019 they ran a proof-of-concept test in which the system encoded the word 'hello' in short lengths of artificial DNA, then converted it back to digital data, with no manual steps from end to end.
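To make "encoding a word in DNA" concrete, here is a minimal sketch that maps every two bits of data to one of the four bases. The mapping (00 to A, 01 to C, 10 to G, 11 to T) is a hypothetical textbook-style choice, not the encoding the University of Washington and Microsoft system actually used; practical schemes add error correction and avoid long runs of the same base.

```python
# Hypothetical two-bits-per-base mapping, for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a base string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"hello")
print(strand)          # CGGACGCCCGTACGTACGTT
print(decode(strand))  # b'hello'
```

Each byte becomes four bases, so the five bytes of 'hello' become a 20-base strand.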
Twist Bioscience states 'We Make DNA'. The company has created what it describes as a revolutionary silicon-based DNA synthesis platform, designed to be cheap and scalable. While Twist Bioscience primarily supplies DNA for medical research, it states that 'Twist is also pursuing longer-term opportunities in digital data storage in DNA'.
There is a lot of active development in DNA storage, some of it focused on sequencing techniques that will allow billions of DNA sequences to be read easily and simultaneously. This should speed up access to data and bring the price down. However, both speed and cost need to improve by orders of magnitude before DNA storage can compete with electronic storage. The accuracy of reading back written data also needs to be much closer to 100 percent before the technology can be considered reliable.
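A quick calculation shows why read-back accuracy matters so much. If each base is read correctly with probability p, and errors are independent (a simplifying assumption), then a strand of n bases comes back perfect with probability p^n. The 200-base strand length used here is a hypothetical example.

```python
def error_free_probability(per_base_accuracy: float, length: int) -> float:
    """Chance that every base in a strand is read correctly.

    Assumes each base is read independently with the same accuracy,
    which simplifies how real sequencing errors behave.
    """
    return per_base_accuracy ** length


STRAND_LENGTH = 200  # hypothetical strand length, in bases
for accuracy in (0.99, 0.999, 0.9999):
    chance = error_free_probability(accuracy, STRAND_LENGTH)
    print(f"{accuracy:.2%} per base -> {chance:.1%} chance of a perfect strand")
```

Under these assumptions, 99 percent per-base accuracy gives only about a 13 percent chance of a perfect 200-base strand, 99.9 percent gives about 82 percent, and 99.99 percent gives about 98 percent.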
In 2020, North Carolina State University (NCSU) researchers announced a new technique that takes us a step closer to a practical DNA storage device. Earlier DNA data storage systems use a technique called PCR, or polymerase chain reaction, to retrieve stored data. The data index is stored at the ends of the DNA strands in the double helix, as short sequences called primer-binding sequences. PCR separates the two strands of the helix so that these index sequences can be found and the matching data read. This process involves repeated large changes in temperature, and it leaves both the primer-binding indexes and the data-carrying DNA sequences essentially running free in a soup of DNA. PCR is then used to sort through this soup to find and extract the data.
The need for these temperature changes is a problem for commercial use, and the PCR technique also destroys the original data.
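The PCR retrieval described above can be sketched as a toy model: each strand carries a primer-binding index plus a data payload, retrieval picks out and copies the strands whose index matches, and, to reflect the point that the technique destroys the original data, the pool is used up in the process. The `Strand` class, its field names and the copy count are illustrative assumptions; heating, primer binding and amplification chemistry are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Strand:
    primer_site: str  # index sequence at the end of the strand (hypothetical field name)
    payload: str      # bases that carry the stored data


def pcr_retrieve(pool: list[Strand], primer: str, copies: int = 4) -> list[str]:
    """Pick out strands whose primer-binding site matches `primer` and copy them.

    In this toy model the whole pool is consumed by the reaction, mirroring
    the point that PCR-based retrieval does not leave the original intact.
    """
    matches = [s.payload for s in pool if s.primer_site == primer]
    pool.clear()  # the original "soup" cannot be reused afterwards
    # Amplification: make several copies of each matching payload.
    return [payload for payload in matches for _ in range(copies)]


soup = [
    Strand("ACGTAC", "CGGACGCC"),
    Strand("TTGCAA", "GATTACA"),
    Strand("ACGTAC", "CGTACGTT"),
]
result = pcr_retrieve(soup, "ACGTAC", copies=2)
print(result)     # ['CGGACGCC', 'CGGACGCC', 'CGTACGTT', 'CGTACGTT']
print(len(soup))  # 0
```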
The NCSU solution allows users to read or modify data files without destroying them, and it works at room temperature, which makes it far more practical for commercial systems. NCSU calls its solution Dynamic Operations and Reusable Information Storage, or DORIS. Instead of using double-stranded DNA as a primer-binding or index sequence, DORIS uses a single strand of DNA that hangs off the double-stranded DNA that actually stores the data. If you think of this single-stranded primer-binding 'tail' as a file name, then DORIS can scan many strands of DNA without needing to separate the helix, and find the strand that contains the required 'file'. Once DORIS finds the correct DNA sequence, it transcribes the DNA into RNA; this leaves the original data intact, and the RNA copy can then be processed independently.
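The DORIS idea of treating the single-stranded tail as a file name can be sketched as a toy model. Lookup compares only the exposed tail, so the double-stranded body is never split, and "transcription" produces an RNA copy by swapping thymine (T) for uracil (U), assuming `body` holds the coding strand. The class and function names are hypothetical, and the frozen dataclass simply makes the point that the original strand is never modified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DorisStrand:
    tail: str  # single-stranded overhang that acts like a file name
    body: str  # double-stranded data region (only the coding strand is stored here)


def find_by_tail(pool: list[DorisStrand], tail: str) -> list[DorisStrand]:
    """Match on the exposed tail only; the double-stranded body is left alone."""
    return [strand for strand in pool if strand.tail == tail]


def transcribe(strand: DorisStrand) -> str:
    """Return an RNA copy of the body: same sequence, with U in place of T."""
    return strand.body.replace("T", "U")


pool = [DorisStrand("GGATCC", "CGGACGCCCGTA"), DorisStrand("AAGCTT", "TTTACG")]
rna = [transcribe(s) for s in find_by_tail(pool, "GGATCC")]
print(rna)           # ['CGGACGCCCGUA']
print(pool[0].body)  # CGGACGCCCGTA (original DNA unchanged)
```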
Another benefit of DORIS is that the single-stranded 'tails' can also be modified, which means the DNA 'files' can be renamed, deleted, or locked to prevent unwanted access to the data.
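Renaming, deleting and locking can be modelled in the same spirit, by acting on the tails and never editing the data-carrying body. This is a hypothetical sketch: `DnaFilePool` and its methods are invented names, and the `locked` flag stands in for whatever chemistry blocks a tail in the real system, which is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class TaggedStrand:
    tail: str             # single-stranded "file name"
    body: str             # stored data; no operation below edits it
    locked: bool = False  # hypothetical stand-in for a blocked tail


class DnaFilePool:
    """Toy model of renaming, deleting and locking DNA 'files' via their tails."""

    def __init__(self, strands: list[TaggedStrand]) -> None:
        self.strands = strands

    def _accessible(self, tail: str) -> list[TaggedStrand]:
        # A locked tail cannot be matched, so locked strands are skipped.
        return [s for s in self.strands if s.tail == tail and not s.locked]

    def read(self, tail: str) -> list[str]:
        return [s.body for s in self._accessible(tail)]

    def rename(self, old: str, new: str) -> None:
        for s in self._accessible(old):
            s.tail = new

    def delete(self, tail: str) -> None:
        # Remove unlocked strands with this tail from the pool; locked ones are protected.
        self.strands = [s for s in self.strands if s.tail != tail or s.locked]

    def lock(self, tail: str) -> None:
        for s in self._accessible(tail):
            s.locked = True


files = DnaFilePool([TaggedStrand("GGATCC", "CGGACG"), TaggedStrand("AAGCTT", "TTTACG")])
files.rename("GGATCC", "CCTAGG")
print(files.read("CCTAGG"))  # ['CGGACG']
files.lock("CCTAGG")
print(files.read("CCTAGG"))  # [] (locked, so no access)
files.delete("AAGCTT")
print(len(files.strands))    # 1
```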