[Image: "What?"]
I. Problems? What Problems?
What is the cause of difficulties in analyzing DNA and RNA sequences?
The accuracy of collected samples is a serious problem:
"A computer-aided analysis of almost 12,000 human-genetics papers has found more than 700 studies with errors in the DNA or RNA sequences of their experimental reagents. That amounts to a “problem of alarming proportions”, because it suggests that a worrying fraction of studies on human genes are not reliable, says the team that conducted the analysis, led by cancer researcher Jennifer Byrne at the University of Sydney in Australia."
(Nature, "Errors in genetic sequences", 2021; cf. this). But beyond that there is also:
"The problem of the creation of numerical constants has haunted the Genetic Programming (GP) community for a long time and is still considered one of the principal open research issues. Many problems tackled by GP include finding mathematical formulas, which often contain numerical constants."
(Creation of Numerical Constants in Robust Gene Expression Programming). As a Dylan lyric puts it, "You can't win with a losing hand" ("Things Have Changed").
And since the "dealers who hand out the cards" are handing out 'cards' that are the hundreds of thousands of genetic sequences (e.g. those in GenBank), it would be nice to have a comparatively simple method of scanning that sequence data to find deviations from official values.
II. A Place To Begin
The basic point to note is that DNA sequences are composed of the "ACTG" nucleotides, whereas RNA sequences are composed of "ACUG".
DNA:
(Adenine) 'A' = "C5H5N5" (5 Carbon, 5 Hydrogen, 5 Nitrogen)
(Cytosine) 'C' = "C4H5N3O1" (4 Carbon, 5 Hydrogen, 3 Nitrogen, 1 Oxygen)
(Thymine) 'T' = "C5H6N2O2" (5 Carbon, 6 Hydrogen, 2 Nitrogen, 2 Oxygen)
(Guanine) 'G' = "C5H5N5O1" (5 Carbon, 5 Hydrogen, 5 Nitrogen, 1 Oxygen)
Together, the four DNA bases contain 59 atoms (A 15 + C 13 + T 15 + G 16)
Carbon Atoms in DNA molecules
(Adenine5,Cytosine4,Thymine5,Guanine5) = 19
19÷59 = 0.322033898
a DNA molecule is 32.2033898 percent Carbon
Hydrogen Atoms in DNA molecules
(Adenine5,Cytosine5,Thymine6,Guanine5) = 21
21÷59 = 0.355932203
a DNA molecule is 35.5932203 percent Hydrogen
Nitrogen Atoms in DNA molecules
(Adenine5,Cytosine3,Thymine2,Guanine5) = 15
15÷59 = 0.254237288
a DNA molecule is 25.4237288 percent Nitrogen
Oxygen Atoms in DNA molecules
(Adenine0,Cytosine1,Thymine2,Guanine1) = 4
4÷59 = 0.06779661
a DNA molecule is 6.779661 percent Oxygen
Thus, the DNA Genetic Constant:
32.2033898 + 35.5932203 + 25.4237288 + 6.779661 = 99.9999999
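As a sketch of the arithmetic above (not the author's own software), the DNA percentages can be recomputed in Python from the base formulas listed above. The helper name `base_set_percentages` is hypothetical. Exact fractions are used so that the four percentages sum to exactly 100; the 99.9999999 total above is a side effect of shortening each value to a fixed number of decimal places.

```python
from fractions import Fraction

# Atoms per base as (C, H, N, O), taken from the formulas listed above
DNA_BASES = {
    "A": (5, 5, 5, 0),  # adenine  C5H5N5
    "C": (4, 5, 3, 1),  # cytosine C4H5N3O1
    "T": (5, 6, 2, 2),  # thymine  C5H6N2O2
    "G": (5, 5, 5, 1),  # guanine  C5H5N5O1
}


def base_set_percentages(bases):
    """Hypothetical helper: percent of each element across one copy of each base."""
    # Sum each element column over the four bases, e.g. carbon 5+4+5+5 = 19
    totals = [sum(column) for column in zip(*bases.values())]
    all_atoms = sum(totals)  # 59 for the DNA bases
    percents = {
        element: Fraction(count * 100, all_atoms)
        for element, count in zip(("C", "H", "N", "O"), totals)
    }
    return all_atoms, percents


atoms, pct = base_set_percentages(DNA_BASES)
print(atoms)  # 59
for element, value in pct.items():
    print(element, f"{float(value):.7f}")
print(sum(pct.values()))  # 100 (exact)
```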
RNA:
(Adenine) 'A' = "C5H5N5" (5 Carbon, 5 Hydrogen, 5 Nitrogen)
(Cytosine) 'C' = "C4H5N3O1" (4 Carbon, 5 Hydrogen, 3 Nitrogen, 1 Oxygen)
(Uracil) 'U' = "C4H4N2O2" (4 Carbon, 4 Hydrogen, 2 Nitrogen, 2 Oxygen)
(Guanine) 'G' = "C5H5N5O1" (5 Carbon, 5 Hydrogen, 5 Nitrogen, 1 Oxygen)
Together, the four RNA bases contain 56 atoms (A 15 + C 13 + U 12 + G 16)
Carbon Atoms in RNA molecules
(Adenine5,Cytosine4,Uracil4,Guanine5) = 18
18÷56 = 0.321428571
an RNA molecule is 32.1428571 percent Carbon
Hydrogen Atoms in RNA molecules
(Adenine5,Cytosine5,Uracil4,Guanine5) = 19
19÷56 = 0.339285714
an RNA molecule is 33.9285714 percent Hydrogen
Nitrogen Atoms in RNA molecules
(Adenine5,Cytosine3,Uracil2,Guanine5) = 15
15÷56 = 0.267857143
an RNA molecule is 26.7857143 percent Nitrogen
Oxygen Atoms in RNA molecules
(Adenine0,Cytosine1,Uracil2,Guanine1) = 4
4÷56 = 0.071428571
an RNA molecule is 7.1428571 percent Oxygen
Thus, the RNA Genetic Constant:
32.1428571 + 33.9285714 + 26.7857143 + 7.1428571 = 99.9999999
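The only difference between the DNA and RNA base sets is uracil in place of thymine, and thymine carries one more carbon and two more hydrogens than uracil (a CH2 difference). That is why the RNA total drops from 59 to 56 atoms. A short Python sketch, using only the formulas given above, makes the comparison explicit and recomputes the RNA percentages:

```python
from fractions import Fraction

# Atoms per base as (C, H, N, O), from the formulas listed above
RNA_BASES = {
    "A": (5, 5, 5, 0),  # adenine  C5H5N5
    "C": (4, 5, 3, 1),  # cytosine C4H5N3O1
    "U": (4, 4, 2, 2),  # uracil   C4H4N2O2
    "G": (5, 5, 5, 1),  # guanine  C5H5N5O1
}
THYMINE = (5, 6, 2, 2)  # C5H6N2O2, used only for the comparison below

# Thymine minus uracil: one carbon and two hydrogens
diff = tuple(t - u for t, u in zip(THYMINE, RNA_BASES["U"]))
print(diff)  # (1, 2, 0, 0)

totals = [sum(column) for column in zip(*RNA_BASES.values())]  # [18, 19, 15, 4]
all_atoms = sum(totals)  # 56, i.e. 59 - 3
pct = {element: Fraction(100 * n, all_atoms) for element, n in zip("CHNO", totals)}
for element, value in pct.items():
    print(element, f"{float(value):.7f}")
```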
III. Using These Constants
There are many examples of the use of these constants in previous series, including both DNA (On The Origin Of A Genetic Constant, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13), and RNA (On The Origin Of Another Genetic Constant, 2, 3, 4, 5).
The basic process, using GenBank "FASTA" format sequences, is to:
1) load the entire sequence into your software analyzer
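2) count the individual 'A', 'C', 'T', and 'G' letters in that sequence (for RNA, 'U' takes the place of 'T')
3) multiply each letter count by the number of carbon, hydrogen, nitrogen, and oxygen atoms in that base, and sum the results for each atom type
4) divide each atom type's sum by the total atom count (section IV. below)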
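Steps 2 through 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's analyzer: `atom_percentages` is a hypothetical helper, it takes an already-loaded sequence string (step 1), and it silently skips any letter other than A, C, G, T, or U, so 'N' letters contribute no atoms.

```python
from collections import Counter

# Atoms per base as (C, H, N, O), from the formulas in section II.
# 'U' (uracil) is included so the same table serves RNA sequences.
ATOMS = {
    "A": (5, 5, 5, 0),  # adenine
    "C": (4, 5, 3, 1),  # cytosine
    "G": (5, 5, 5, 1),  # guanine
    "T": (5, 6, 2, 2),  # thymine (DNA)
    "U": (4, 4, 2, 2),  # uracil (RNA)
}
ELEMENTS = ("C", "H", "N", "O")


def atom_percentages(sequence):
    """Hypothetical helper applying steps 2-4 to a plain sequence string."""
    # Step 2: count each letter; letters not in ATOMS (such as 'N') are ignored
    letters = Counter(sequence.upper())
    # Step 3: multiply each letter count by its atoms and sum per element
    counts = dict.fromkeys(ELEMENTS, 0)
    for base, atoms in ATOMS.items():
        for element, n in zip(ELEMENTS, atoms):
            counts[element] += letters[base] * n
    # Step 4: divide each element's count by the total atom count
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, T, or U letters")
    percents = {e: 100 * c / total for e, c in counts.items()}
    return counts, total, percents


# One copy of each DNA base reproduces the 59-atom DNA constant
counts, total, percents = atom_percentages("ACGT")
print(counts, total)  # {'C': 19, 'H': 21, 'N': 15, 'O': 4} 59
```

The sketch does not check that a sequence uses only DNA letters or only RNA letters; a sequence mixing 'T' and 'U' would simply be counted with both.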
A general example is found in the appendix of a previous post, which details the results of a scan of Cuculus canorus (Appendix Cuckoo Chromosomes).
Here is a section from that large sequence in FASTA format:
>NC_071419.1 Cuculus canorus isolate bCucCan1 chromosome 19
TAACCCTAACCCTAAACCCTAAGCCTAACCCTAACCCTACCCTAACCCTAACCCTAACCAAACCCATAAC
CTACCCTAACCCTAACCCTAACCCTAACCATAAACCTAACCCCTAACCCTAAACCCTAAACCCTAACCCG
AACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAAAC
CCTAACCACTAAAACCCTAACCCTAACCCTAAACCCTAACCCTAACCCTAACCCTAACCTAACCCTACCA
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTACCCTACCCCTAA
CCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAA
CCCCTAACCCTAACCCTAAACCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCTAACCCT
AACCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC
TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
...
The process is to go through each 'A', 'C', 'G', and 'T' in a DNA sequence (or each 'A', 'C', 'G', and 'U' in an RNA sequence) and add its atoms to the running total:
each (Adenine) 'A' = "C5H5N5" (5 Carbon, 5 Hydrogen, 5 Nitrogen);
each (Cytosine) 'C' = "C4H5N3O1" (4 Carbon, 5 Hydrogen, 3 Nitrogen, 1 Oxygen);
each (Guanine) 'G' = "C5H5N5O1" (5 Carbon, 5 Hydrogen, 5 Nitrogen, 1 Oxygen);
each (Thymine) 'T' = "C5H6N2O2" (5 Carbon, 6 Hydrogen, 2 Nitrogen, 2 Oxygen).
Sometimes there are 'N' letters in the sequence; an 'N' marks a position where the nucleotide is unknown (ResearchGate, cf. SEQanswers). The number of 'N' letters is usually small, but it can be a large enough percentage of the sequence to require more analysis.
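The excerpt above is in FASTA format: a '>' header line followed by wrapped sequence lines. Here is a minimal Python sketch for reading one record and reporting its 'N' count and percentage. `read_fasta` is a hypothetical helper, not part of any library, and it uppercases every letter on the assumption that lowercase (soft-masked) bases, which some downloaded files may contain, should count the same as uppercase ones. For whole genomes, an established parser such as Biopython's `SeqIO` may be a better fit.

```python
from collections import Counter


def read_fasta(text):
    """Hypothetical minimal reader: returns (header, sequence) for the first
    record, dropping the '>' line and joining the wrapped sequence lines."""
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts; this sketch reads only one
            header = line[1:]
        else:
            parts.append(line.upper())  # treat lowercase letters as uppercase
    return header, "".join(parts)


# Header and first two sequence lines copied from the excerpt above
sample = """>NC_071419.1 Cuculus canorus isolate bCucCan1 chromosome 19
TAACCCTAACCCTAAACCCTAAGCCTAACCCTAACCCTACCCTAACCCTAACCCTAACCAAACCCATAAC
CTACCCTAACCCTAACCCTAACCCTAACCATAAACCTAACCCCTAACCCTAAACCCTAAACCCTAACCCG
"""

header, seq = read_fasta(sample)
letters = Counter(seq)
n_count = letters["N"]
print(header)
print(f"length {len(seq)}, 'N' count {n_count}, {100 * n_count / len(seq):.4f}% 'N'")
```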
IV. ... And Then
When all is said and done, useful percentages can be derived by dividing each atom type's count by the total atom count, 188,585,905:
carbon count: 60,816,461 ÷ 188,585,905 = 32.25%
hydrogen count: 67,198,608 ÷ 188,585,905 = 35.63%
nitrogen count: 47,829,318 ÷ 188,585,905 = 25.36%
oxygen count: 12,741,518 ÷ 188,585,905 = 6.76%
As shown in the following "Table 1" from that appendix:
Atom | Atom Count | Percent |
Carbon | 60,816,461 | 32.25 |
Hydrogen | 67,198,608 | 35.63 |
Nitrogen | 47,829,318 | 25.36 |
Oxygen | 12,741,518 | 6.76 |
Totals | 188,585,905 | 100.00 |
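The divisions in this section can be checked with a few lines of Python. The counts are copied from Table 1 and nothing else is assumed; the check confirms that the four counts add up to the stated total and that carbon comes out at 32.25%.

```python
# Atom counts copied from Table 1 (Cuculus canorus appendix)
counts = {
    "carbon": 60_816_461,
    "hydrogen": 67_198_608,
    "nitrogen": 47_829_318,
    "oxygen": 12_741_518,
}
total = sum(counts.values())
print(f"total: {total:,}")  # total: 188,585,905

# Each atom type's share of the total, rounded to two decimals
for atom, n in counts.items():
    print(f"{atom}: {n:,} / {total:,} = {100 * n / total:.2f}%")
# prints 32.25%, 35.63%, 25.36%, and 6.76% for the four rows
```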
Finally, after taking note of the 'N' count ("'N' Nucleotide Count: 300") and its percentage, the sequence's percentages can be compared with the official values to see how far they deviate:
"the DNA Genetic Constant:
32.2033898 + 35.5932203 + 25.4237288 + 6.779661"
So, we can calculate the deviation as:
carbon (32.25 - 32.2033898 = +0.0466102 percentage points);
hydrogen (35.63 - 35.5932203 = +0.0367797 percentage points);
nitrogen (25.36 - 25.4237288 = −0.0637288 percentage points);
oxygen (6.76 - 6.779661 = −0.019661 percentage points).
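Because the sequence values and the official values are both already percentages, subtracting one from the other gives a difference in percentage points. The Python sketch below uses only the values quoted above. It prints that point difference and, as an alternative measure added here rather than taken from the original method, the difference expressed as a percent of the official value.

```python
# Official DNA constant percentages (section II) and the rounded
# Cuculus canorus percentages from Table 1
official = {"carbon": 32.2033898, "hydrogen": 35.5932203,
            "nitrogen": 25.4237288, "oxygen": 6.779661}
observed = {"carbon": 32.25, "hydrogen": 35.63,
            "nitrogen": 25.36, "oxygen": 6.76}

deviations = {}
for atom, expected in official.items():
    points = observed[atom] - expected   # difference in percentage points
    relative = 100 * points / expected   # difference as a percent of the official value
    deviations[atom] = (points, relative)
    print(f"{atom}: {points:+.7f} percentage points ({relative:+.3f}% relative)")
```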
The appendices in the On The Origin Of A Genetic Constant and On The Origin Of Another Genetic Constant series contain thousands of such examples, taken from scores of different flora, fauna, humans, viruses, and microbes.
V. Closing Comments
The quality required in genetic sequences will vary with the type of project being considered.
For example, DNA evidence in a criminal murder trial would seem to require more accurate sequences than a study determining the DNA content of ancient mummies would.
Anyway, give me a shout at dreddblog@gmail.com if need be.
(Thanks to Christie L. Mills for editing this post, and others, for grammar).
The next post in this series is here.