
The genetic activity of every living organism is organized at the level of its individual cells, which can number in the billions [1]. The control center of each cell is its deoxyribonucleic acid (DNA), which contains the complete set of instructions needed to direct the functioning of that cell. The chemical makeup of DNA is the same in all living organisms: a DNA sequence consists of four nucleotide bases, namely Adenine, Cytosine, Guanine, and Thymine, which are denoted by the first letters of their names, A, C, G, and T respectively [2, 3]. Sequence data also uses a fifth symbol, N, which denotes a base whose identity is unknown or could not be determined. A DNA sequence is therefore written over the alphabet {A, C, G, T, N}. The four bases form the double helix, in which A on one strand pairs with T on the other and C pairs with G. The symbol N does not correspond to a specific base and so has no place in this pairing scheme, but it still appears in stored sequences and must be handled when they are processed. DNA sequences are used extensively in genetic engineering, forensics, bioinformatics, DNA nanotechnology, and anthropology.

The DNA database GenBank is created and maintained by the National Center for Biotechnology Information (NCBI). Two other repositories that maintain similar data are the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ). The contents of GenBank are growing at an exponential rate and require an enormous amount of storage space, so DNA sequence data needs to be compressed before it is stored. The sheer quantity of data creates severe storage and data communication problems. This increases the need for research into DNA compression algorithms that attain a better compression ratio on DNA data sets.
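
To give a rough sense of the saving at stake, the following is a minimal sketch of the simplest fixed-length baseline, in which each of the four bases is stored in 2 bits instead of one 8-bit text character. It is not the HUFFDNAC method proposed in this paper. The names CODES, BASES, pack, and unpack are illustrative assumptions, and the symbol N is deliberately left out, since a 2-bit code has no room for a fifth symbol.

```python
# Hypothetical 2-bit codes for the four bases (an assumption for this sketch).
# N is not covered: a 2-bit code can only distinguish four symbols.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index i is the base whose code is i


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unpack can read fixed positions.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    # The original length must be stored separately to drop padding bases.
    return "".join(bases[:length])


seq = "ACGTACGTAC"
packed = pack(seq)
assert unpack(packed, len(seq)) == seq
print(len(seq), "bytes as text,", len(packed), "bytes packed")
```

For a sequence of n bases this baseline needs about n/4 bytes, that is, 2 bits per base, which is the figure a DNA-specific compression algorithm is usually expected to improve on.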

The paper is structured as follows: Section II reviews the existing DNA compression algorithms; Section III proposes the new algorithm, called HUFFDNAC; Section IV demonstrates the feasibility and effectiveness of the proposed method; and finally, Section V presents the conclusions.