Some time back I asked the keepers of government databases of genetic material to stop using "T" and "t" in RNA nucleotide data, because RNA does not contain "T".
"T" means thymine.
RNA contains uracil ("U" or "u") instead of thymine ("T").
A well-known source indicates that SARS-CoV-2 is an RNA virus, not a DNA virus: "Coronaviruses contain a non-segmented, positive-sense RNA genome of ~30 kb" (Coronaviruses, NIH, emphasis added; cf. here).
I have been downloading a lot of data from GenBank to further detail the ~(32/35/25/6) constant pattern in genomes:
"I have finished downloading the almost 8,000 ".gz" compressed files stored at the Genbank FTP site (~1 terabyte of data)."
(On The Origin Of A Genetic Constant - 2). I discovered that in those files I downloaded, there are 234,350 rows of SARS-CoV-2 data which say the virus is composed of "DNA":
mysql> select count(*) from DNA_html_tables where source like '%SARS%';
+----------+
| count(*) |
+----------+
|   234350 |
+----------+
(SQL search results on my SQL server). Example GenBank links to six of them are:
Covid 1, Covid 2, Covid 3, Covid 4, Covid 5, Covid 6 (Covid 7 is a better representation of an RNA genome, but not a perfect one, since it uses "t" instead of "u"; the FASTA version uses "T" instead of "U").
Notice that the "LOCUS" line (first line of a GBFF genome) says "DNA" in each and every one of them (they are actually "RNA" viruses, not "DNA" viruses).
Furthermore, notice that every one of those genomes has "t" instead of "u" in the nucleotide "ORIGIN" area.
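Both of the things noted above (the molecule type on the "LOCUS" line, and "t" versus "u" in the sequence) can be checked mechanically. Here is a minimal sketch in Python. It is not the software used for this post; the function names and the sample LOCUS line are hypothetical, and it assumes the molecule type is the token that follows "bp" on the LOCUS line.

```python
def locus_molecule_type(locus_line):
    """Return the molecule-type token from a GenBank LOCUS line, or None.

    Assumption: the molecule type (e.g. "DNA", "RNA", "ss-RNA") is the
    whitespace-separated token that immediately follows "bp".
    """
    tokens = locus_line.split()
    if "bp" in tokens:
        i = tokens.index("bp")
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def to_rna_alphabet(sequence):
    """Replace thymine letters with uracil letters, preserving case."""
    return sequence.replace("t", "u").replace("T", "U")


# Hypothetical LOCUS line for illustration only (not a real GenBank record).
sample_locus = "LOCUS       EXAMPLE01      29903 bp    DNA     linear   VRL 01-JAN-2021"

print(locus_molecule_type(sample_locus))
print(to_rna_alphabet("attaaaggtt"))
```

A record whose LOCUS line reports "DNA" for an RNA virus would be flagged by the first function; the second shows how little it takes to write the sequence in the RNA alphabet.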
Instead, they chronicle what they do and don't know in certain instances:
"I just want to begin this by thinking about a bridge. In this particular case, it's an obvious bridge. AND IF YOU THINK ABOUT EVOLUTION, YOU KNOW WHERE WE'VE COME TO, BUT YOU DON'T KNOW WHERE WE BEGAN. So origins of life is one of the most challenging problems facing science. Actually, as my friend and colleague Nick Lane says, it's the black hole of science. It's an embarrassment. And it's a very complicated problem. Obviously, in this university, Jack Szostak and others work on this from one angle. I'm trying to understand it from another angle. But we know a certain amount about how we have a certain number of ingredients. So this is a very, very famous plot. And I'm not going to bother you with all the details, but I just want to give you a little bit of an idea. Victor Goldschmidt, who created this plot originally, was a Swiss Jew born in Zurich.
[Image caption: Carbon Origins?]
And his father took a position at the University in Oslo. So when he was a young man, he actually graduated university I believe at the age of 21 with a PhD. He became a professor himself, and he created a field which is geochemistry. He created it. And he was obsessed with understanding the distribution of elements in the universe. So he did several things. He analyzed meteorites, he looked at spectra of stars, and so on. And WE COME UP WITH THIS BASIC PLOT, which is THE TWO FIRST ELEMENTS ARE HYDROGEN AND HELIUM, which are created ABOUT 13.8, 13.9 BILLION YEARS AGO IN THE BIG BANG. And everything else on this plot the astronomers called metals, but EVERYTHING ELSE [atomic] IS CREATED IN A SUPERNOVA ... A very hot short life star. ["NOT OUR STAR" ... not the Sun]."
(Dr. Falkowski, partial transcript of video below, emphasis added). That excerpt is an excellent place for a beginning.
So, let's move on from "hydrogen" (which is one of the "two first elements ... everything else is created in a supernova, not our star") to carbon, nitrogen, and oxygen.
The first element group, represented by hydrogen, is found in DNA (~35%), so was it also in the supernova that is said to have created the other three elements in DNA (carbon ~32%, nitrogen ~25%, and oxygen ~6%)?
Dr. Falkowski's statement "but you don't know where we began" seems relevant because we don't know how the hydrogen could have gotten into a supernova, and then have gotten here from that supernova.
But more than that, we don't know how all four then ended up in the ~32/35/25/6 percentages and quantities needed to form abiotic DNA nucleotides.
Especially when one considers the time it takes for particles to travel from point A to point B when induced by gravitational force:
" Formulas, Constants, & Variables Used
Formula:
g = G * (H atoms mass / r^2) (gravitational acceleration)
Constants:
secs_day = (60.0 * 60.0) * 24.0
secs_year = secs_day * 365.25
G = 6.67e-11 (gravitational constant)
Variables:
r = distance (between center) of atoms (m)
Calculations of gravitational actions between Atom Group A and Atom group B in terms of being pulled into contact by gravity

| Group A | Group B | Dist (m) | mass per group (kg) | atoms per group | years to contact |
|---|---|---|---|---|---|
| atoms-1A | atoms-1B | 1e-09 | 1.67e-21 | 1e+06 | 71,124 |
| atoms-2A | atoms-2B | 2e-09 | 3.34e-21 | 2e+06 | 142,245 |
| atoms-3A | atoms-3B | 3e-09 | 5.01e-21 | 3e+06 | 213,365 |
| atoms-4A | atoms-4B | 4e-09 | 6.68e-21 | 4e+06 | 284,486 |
| atoms-5A | atoms-5B | 5e-09 | 8.35e-21 | 5e+06 | 355,606 |
| atoms-6A | atoms-6B | 6e-09 | 1.002e-20 | 6e+06 | 426,726 |
| atoms-7A | atoms-7B | 7e-09 | 1.169e-20 | 7e+06 | 497,847 |
| atoms-8A | atoms-8B | 8e-09 | 1.336e-20 | 8e+06 | 568,967 |

Adding a bunch of them together to make dust doesn't really solve the Mr. Time problem:
Calculations of various gravitational interactions (Cloud A to Cloud B)

| Cloud A | Cloud B | Dist (m) | mass (kg) | years to contact |
|---|---|---|---|---|
| DC-1A | DC-1B | 1 | 1 | 121 |
| DC-2A | DC-2B | 10 | 10 | 1,191 |
| DC-3A | DC-3B | 100 | 100 | 11,881 |
| DC-4A | DC-4B | 1000 | 1000 | 118,775 |
| DC-5A | DC-5B | 10,000 | 10,000 | 1,187,714 |
| DC-6A | DC-6B | 100,000 | 100,000 | 11,877,100 |
| DC-7A | DC-7B | 1,000,000 | 1,000,000 | 118,770,950 |
| DC-8A | DC-8B | 10,000,000 | 10,000,000 | 1,187,709,445 |

When acceleration of gravity is considered when contemplating the concept of the protons, electrons, neutrons becoming builders of atoms, and then atoms becoming builders of molecules, and non-living molecules then becoming the builders of life, a particular episode in that 'movie' has to take place in which a decision must be made."
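The formula and constants in the quoted material above can be written out as a short Python sketch. This is only an illustration of the quoted formula for g. The quoted material does not show how the "years to contact" column is derived from g, so the sketch stops at g and does not attempt to reproduce that column. The hydrogen atom mass of 1.67e-27 kg is an assumption, implied by the table's 1.67e-21 kg for 1e+06 atoms; the function name is hypothetical.

```python
# Constants as given in the quoted material.
G = 6.67e-11                      # gravitational constant
SECS_DAY = (60.0 * 60.0) * 24.0   # seconds per day
SECS_YEAR = SECS_DAY * 365.25     # seconds per year

# Assumption: mass of one hydrogen atom, implied by the table
# (1.67e-21 kg per group of 1e+06 atoms).
H_ATOM_MASS_KG = 1.67e-27


def gravitational_acceleration(mass_kg, r_m):
    """g = G * (mass / r^2), in m/s^2."""
    return G * (mass_kg / r_m ** 2)


# First row of the atom-group table: 1e+06 atoms per group, 1e-09 m apart.
group_mass = 1e6 * H_ATOM_MASS_KG
g_row1 = gravitational_acceleration(group_mass, 1e-9)

print(f"mass per group: {group_mass:.3g} kg")
print(f"g: {g_row1:.3g} m/s^2")
print(f"seconds per year: {SECS_YEAR:.0f}")
```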
I have finished downloading the almost 8,000 ".gz" compressed files stored at the Genbank FTP site (~1 terabyte of data).
And I completed the filtering-out of partial genome segments, choosing to use only relatively complete genome sequences (>10,000 bp).
The result is that I am using millions of genomes (now in SQL database tables, which are the end result of that download effort) to more clearly expose "a genetic constant" ~(32/35/25/6).
III. Simplicity
The exercise involved in exposing this "Genetic Constant" is very, very simple at its core:
"The number of atoms in a given nucleotide ('A', 'C', 'G', 'T', and 'U') are:
(Quantum Biology - 18). As you can see the analysis of the atom content of any given genome in the appendix (Link) is a function of simple arithmetic (it is not "rocket science").
The percentages are derived by dividing the count of each atom type by the total atom count."
(On The Origin of The Containment Entity - 16). How that constant is calculated depends on whether the genetic sequence is DNA or RNA (a DNA 'T' count is not the same as an RNA 'U' count; note that this current exercise only involves DNA).
For example, a secretary or student could load a genome from GenBank into a word processor, then add up the atoms in each 'A', 'C', 'G' and 'T' in the text of that genome until the last nucleotide has been processed. At that point each of the four atom-type totals (carbon, hydrogen, nitrogen, oxygen) would be divided by the total number of atoms in the genome so as to derive the constants, i.e. the percentages.
That is the basic numerical origin of the "~(32/35/25/6)" genetic constant.
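The arithmetic described above can be sketched in a few lines of Python. This is not the Dredd Blog software. The per-nucleotide atom table did not survive in this post, so the sketch assumes the standard molecular formulas of the free bases (adenine C5H5N5, cytosine C4H5N3O, guanine C5H5N5O, thymine C5H6N2O2, uracil C4H4N2O2); the function names are hypothetical.

```python
# Assumption: atoms per nucleotide letter, taken from the standard
# molecular formulas of the free bases (no sugar or phosphate).
BASE_ATOMS = {
    "A": {"C": 5, "H": 5, "N": 5, "O": 0},  # adenine  C5H5N5
    "C": {"C": 4, "H": 5, "N": 3, "O": 1},  # cytosine C4H5N3O
    "G": {"C": 5, "H": 5, "N": 5, "O": 1},  # guanine  C5H5N5O
    "T": {"C": 5, "H": 6, "N": 2, "O": 2},  # thymine  C5H6N2O2
    "U": {"C": 4, "H": 4, "N": 2, "O": 2},  # uracil   C4H4N2O2
}


def atom_counts(sequence):
    """Total carbon, hydrogen, nitrogen and oxygen atoms in a sequence."""
    totals = {"C": 0, "H": 0, "N": 0, "O": 0}
    for letter in sequence.upper():
        atoms = BASE_ATOMS.get(letter)
        if atoms is None:
            continue  # skip anything that is not A, C, G, T or U
        for element, per_base in atoms.items():
            totals[element] += per_base
    return totals


def atom_percentages(sequence):
    """Each atom-type total divided by the total atom count, as a percent."""
    counts = atom_counts(sequence)
    total = sum(counts.values())
    if total == 0:
        return {element: 0.0 for element in counts}
    return {element: 100.0 * n / total for element, n in counts.items()}


for element, pct in atom_percentages("ACGT").items():
    print(f"{element}: {pct:.4f}%")
```

With these assumed formulas, a sequence with equal numbers of 'A', 'C', 'G' and 'T' has 19 carbon, 21 hydrogen, 15 nitrogen and 4 oxygen atoms per four nucleotides (59 in all), which lands near the ~(32/35/25/6) pattern.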
In other words, the difficulty is not complexity but volume: some genomes are composed of millions of nucleotides, so it would take an inordinate amount of time to calculate them 'by hand'.
Thus, the Dredd Blog software that does the calculations "at light speed" was written.
IV. Appendices Containing GenBank Genomes
I generated appendices for this series using some taxonomy data from Introduction to Binomial Nomenclature, which I may have to expand by using the more extensive NCBI taxonomy data.
Anyway, the appendices in today's post contain 272 actual GenBank genomes for everything from Arabian Camels to humans to plants (Appendix 1, 2, 3, 4, 5, 6, 7, 8).
V. Closing Comments
I was amazed once again to see the ~(32/35/25/6) constant pattern in the genomes in the appendices.
A snippet from Appendix 8 sums it up:
Grand Totals For Tables 1-272: (Total atom count: 79,158,605,941)

| Atom Type | Atom Type Count | Low (%) | High (%) | Variation | Average |
|---|---|---|---|---|---|
| Carbon | 25,707,295,089 | 32.3541 | 32.5186 | 0.1645% | 32.4757% |
| Hydrogen | 28,380,643,650 | 35.7008 | 35.9416 | 0.2408% | 35.8529% |
| Nitrogen | 19,727,729,048 | 24.4739 | 25.2710 | 0.7971% | 24.9218% |
| Oxygen | 5,342,938,154 | 6.6279 | 7.1642 | 0.5363% | 6.7497% |

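The grand totals can be cross-checked with simple division. The sketch below, in Python, takes the four atom counts from the snippet above, confirms they add up to the stated total atom count, and divides each by that total. Whether the "Average" column was produced exactly this way is an assumption; the variable names are mine.

```python
# Atom counts copied from the Appendix 8 grand totals.
counts = {
    "Carbon": 25_707_295_089,
    "Hydrogen": 28_380_643_650,
    "Nitrogen": 19_727_729_048,
    "Oxygen": 5_342_938_154,
}
reported_total = 79_158_605_941

# The four counts should sum to the reported total atom count.
total = sum(counts.values())
print("sum matches reported total:", total == reported_total)

# Percentage of each atom type = its count / total atom count.
percentages = {name: 100.0 * n / total for name, n in counts.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct:.4f}%")
```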
The next post in this series is here, the previous post in this series is here.