Saturday, October 28, 2023

On The Origin Of A Genetic Constant - 6

DNA    ...    RNA

I. Preface

In the previous post of this series we took a look at the ~(32/35/25/6) genetic constant concerning DNA.

Today we ask and answer the same question concerning the ~(32/33/26/7) genetic constant in RNA.

II. Where Did The ~(32/33/26/7) Originate?

That is a fair question about RNA just as it was with DNA.

So this post answers the question again, just as the previous post did for DNA. It provides the exact values for RNA and shows where the RNA genetic constant "~(32/33/26/7)" originates, deep in the atomic structure of many, if not all, genomes.

Let's start with the nucleotides, their atoms, and the quantities that are in BOTH DNA and RNA (just as we did with DNA):

Nucleotides(ACG), atoms, and counts in both DNA and RNA:

A: (Adenine) 15 atoms (5 carbon, 5 hydrogen, 5 nitrogen, 0 oxygen)

C: (Cytosine) 13 atoms (4 carbon, 5 hydrogen, 3 nitrogen, 1 oxygen)

G: (Guanine) 16 atoms (5 carbon, 5 hydrogen, 5 nitrogen, 1 oxygen)

44 total atoms (14 carbon, 15 hydrogen, 13 nitrogen, 2 oxygen)

Percentages:
(carbon 31.8182, hydrogen 34.0909, nitrogen 29.5455, oxygen 4.5455)


Now let's add the missing ingredient ("U") needed to make the nucleotide group complete for RNA:

Additional nucleotide(U), atoms, and count (only in RNA):
U: (Uracil) 12 atoms, (4 carbon, 4 hydrogen, 2 nitrogen, 2 oxygen)
56 total atoms (18 carbon, 19 hydrogen, 15 nitrogen, 4 oxygen)

Percentages:
(carbon 32.1429, hydrogen 33.9286, nitrogen 26.7857, oxygen 7.1429)

That is the source for the RNA (ACGU) ~(32/33/26/7) genetic constant.

It is 18, 19, 15, and 4, each divided by 56 (x 100.0), that determines the percentages of those atoms in RNA genomes.
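The arithmetic above can be reproduced in a few lines of Python. This is only a minimal sketch, not the Dredd Blog software; the names BASE_ATOMS and atom_percentages are made up for illustration, and the atom counts are the ones listed above (they are the counts of the free bases).

```python
# Atom counts per base as listed above: (carbon, hydrogen, nitrogen, oxygen).
BASE_ATOMS = {
    "A": (5, 5, 5, 0),  # adenine
    "C": (4, 5, 3, 1),  # cytosine
    "G": (5, 5, 5, 1),  # guanine
    "U": (4, 4, 2, 2),  # uracil (RNA only)
}

def atom_percentages(bases):
    """Sum C, H, N, O over one copy of each base; return (totals, percentages)."""
    totals = [sum(BASE_ATOMS[b][i] for b in bases) for i in range(4)]
    grand_total = sum(totals)
    percents = [round(100.0 * t / grand_total, 4) for t in totals]
    return totals, percents

totals, percents = atom_percentages("ACGU")
print(totals)    # [18, 19, 15, 4]
print(percents)  # [32.1429, 33.9286, 26.7857, 7.1429]
```

Calling atom_percentages("ACG") instead gives the 44-atom subtotal shared by DNA and RNA.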

Here is an early test on several hundred RNA genomes (more to come when I build up my SQL RNA database):

GenBank FASTA Files Genome Analysis Report (RNA)

after processing 100 genomes:
variation count @ <1.0% = 400
variation count @ <2.0% = 0
variation count @ <3.0% = 0
variation count @ <4.0% = 0
variation count @ >=4.0% = 0
after processing 200 genomes:
variation count @ <1.0% = 800
variation count @ <2.0% = 0
variation count @ <3.0% = 0
variation count @ <4.0% = 0
variation count @ >=4.0% = 0
after processing 300 genomes:
variation count @ <1.0% = 1,200
variation count @ <2.0% = 0
variation count @ <3.0% = 0
variation count @ <4.0% = 0
variation count @ >=4.0% = 0
after processing 400 genomes:
variation count @ <1.0% = 1,600
variation count @ <2.0% = 0
variation count @ <3.0% = 0
variation count @ <4.0% = 0
variation count @ >=4.0% = 0
Total processed 405 genomes:
variation count @ <1.0% = 1,620 (100.0000%)
variation count @ <2.0% = 0 (0.0000%)
variation count @ <3.0% = 0 (0.0000%)
variation count @ <4.0% = 0 (0.0000%)
variation count @ >=4.0% = 0 (0.0000%)

III. Update

I added 43,279 SARS-CoV-2 RNA virus genomes to the SQL table (the total is now 43,684 RNA genomes).

Here are the new results:

GenBank Flat Files Genome Analysis Report

after processing 5,000 genomes:
variation count @ <1.0% = 20,000
variation count @ <2.0% = 0
variation count @ <3.0% = 0
variation count @ <4.0% = 0
variation count @ >=4.0% = 0
after processing 10,000 genomes:
variation count @ <1.0% = 40,000
variation count @ <2.0% = 0
variation count @ <3.0% = 0
variation count @ <4.0% = 0
variation count @ >=4.0% = 0
after processing 15,000 genomes:
variation count @ <1.0% = 60,000
variation count @ <2.0% = 0
variation count @ <3.0% = 0
variation count @ <4.0% = 0
variation count @ >=4.0% = 0
after processing 20,000 genomes:
variation count @ <1.0% = 80,000
variation count @ <2.0% = 0
variation count @ <3.0% = 0
variation count @ <4.0% = 0
variation count @ >=4.0% = 0
after processing 25,000 genomes:
variation count @ <1.0% = 100,000
variation count @ <2.0% = 0
variation count @ <3.0% = 0
variation count @ <4.0% = 0
variation count @ >=4.0% = 0
after processing 30,000 genomes:
variation count @ <1.0% = 120,000
variation count @ <2.0% = 0
variation count @ <3.0% = 0
variation count @ <4.0% = 0
variation count @ >=4.0% = 0
after processing 35,000 genomes:
variation count @ <1.0% = 140,000
variation count @ <2.0% = 0
variation count @ <3.0% = 0
variation count @ <4.0% = 0
variation count @ >=4.0% = 0
after processing 40,000 genomes:
variation count @ <1.0% = 160,000
variation count @ <2.0% = 0
variation count @ <3.0% = 0
variation count @ <4.0% = 0
variation count @ >=4.0% = 0
Total processed 43,684 genomes:
variation count @ <1.0% = 174,736 (100.0000%)
variation count @ <2.0% = 0 (0.0000%)
variation count @ <3.0% = 0 (0.0000%)
variation count @ <4.0% = 0 (0.0000%)
variation count @ >=4.0% = 0 (0.0000%)

IV. Closing Comments

This may end up being a useful tool for judging how valid the collection and processing of RNA samples has been ("how closely does a genome match the constant?" would be the question to ask).
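One way to put that question into code is sketched below. It rests on an assumption that this post does not spell out: that a genome's atom percentages are obtained by weighting each base by how often it occurs in the sequence. The function names and the sample sequence are hypothetical, and this is not the software that produced the reports above.

```python
from collections import Counter

# RNA constant from above: carbon, hydrogen, nitrogen, oxygen percentages.
RNA_CONSTANT = (32.1429, 33.9286, 26.7857, 7.1429)

# Atom counts per base: (carbon, hydrogen, nitrogen, oxygen).
BASE_ATOMS = {
    "A": (5, 5, 5, 0),
    "C": (4, 5, 3, 1),
    "G": (5, 5, 5, 1),
    "U": (4, 4, 2, 2),
}

def sequence_atom_percentages(seq):
    """Weight each base's atoms by its count in seq (assumed method)."""
    totals = [0, 0, 0, 0]
    for base, n in Counter(seq.upper()).items():
        if base in BASE_ATOMS:  # skip anything that is not A, C, G, or U
            for i, atoms in enumerate(BASE_ATOMS[base]):
                totals[i] += n * atoms
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("sequence contains no A, C, G, or U")
    return [100.0 * t / grand_total for t in totals]

def deviations(seq):
    """Absolute difference, in percentage points, from the RNA constant."""
    observed = sequence_atom_percentages(seq)
    return [abs(o - c) for o, c in zip(observed, RNA_CONSTANT)]

sample = "ACGU" * 10  # hypothetical sequence with equal base usage
print([round(d, 4) for d in deviations(sample)])  # all four round to 0.0
```

Under this assumption, a sequence that uses the four bases equally lands exactly on the constant, and the deviations grow as the base composition becomes more lopsided.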

The next post in this series is here, the previous post in this series is here.

Friday, October 27, 2023

On The Origin Of A Genetic Constant - 5

Calculate this!

I. Where Did The ~(32/35/25/6) Originate?

That is a fair question, so this post will answer it by providing the updated exact values and showing where "~(32/35/25/6)" originated.

Let's start with the nucleotides, their atoms, and the quantities that are in BOTH DNA and RNA:

Nucleotides(ACG), atoms, and counts in both DNA and RNA:

A: (Adenine) 15 atoms (5 carbon, 5 hydrogen, 5 nitrogen, 0 oxygen)

C: (Cytosine) 13 atoms (4 carbon, 5 hydrogen, 3 nitrogen, 1 oxygen)

G: (Guanine) 16 atoms (5 carbon, 5 hydrogen, 5 nitrogen, 1 oxygen)

44 total atoms (14 carbon, 15 hydrogen, 13 nitrogen, 2 oxygen)

Percentages:
(carbon 31.8182, hydrogen 34.0909, nitrogen 29.5455, oxygen 4.5455)


Now let's add the missing ingredient ("T") needed to make the nucleotide group complete for DNA:

Additional nucleotide(T), atoms, and count (only in DNA):

T: (Thymine) 15 atoms, (5 carbon, 6 hydrogen, 2 nitrogen, 2 oxygen)
59 total atoms (19 carbon, 21 hydrogen, 15 nitrogen, 4 oxygen)

Percentages:
(carbon 32.2034, hydrogen 35.5932, nitrogen 25.4237, oxygen 6.7797)

That is the source for the DNA (ACGT) ~(32/35/25/6) genetic constant.

It is 19, 21, 15, and 4, each divided by 59 (x 100.0), that determines the percentages of those atoms in DNA genomes.
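As a quick check of those numbers, here is a minimal Python sketch (the variable names are made up for illustration). It also shows that the shorthand ~(32/35/25/6) is what you get when the decimals are dropped rather than rounded.

```python
import math

# Atom counts per base as listed above: (carbon, hydrogen, nitrogen, oxygen).
DNA_BASE_ATOMS = {
    "A": (5, 5, 5, 0),  # adenine
    "C": (4, 5, 3, 1),  # cytosine
    "G": (5, 5, 5, 1),  # guanine
    "T": (5, 6, 2, 2),  # thymine (DNA only)
}

# Column-wise sums give the per-element totals across A, C, G, and T.
totals = [sum(column) for column in zip(*DNA_BASE_ATOMS.values())]
grand_total = sum(totals)
percents = [100.0 * t / grand_total for t in totals]
shorthand = [math.floor(p) for p in percents]

print(totals, grand_total)             # [19, 21, 15, 4] 59
print([round(p, 4) for p in percents]) # [32.2034, 35.5932, 25.4237, 6.7797]
print(shorthand)                       # [32, 35, 25, 6]
```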


II. Calculation Partially Changed

In the previous posts in this series and others I calculated the variations from the ~(32/35/25/6) constant by subtracting ONLY the whole-number values 32, 35, 25, and 6, without their decimal parts (.2034, .5932, .4237, and .7797).

I changed the software so it calculates the variations using the full values, i.e. including the decimals (carbon 32.2034, hydrogen 35.5932, nitrogen 25.4237, oxygen 6.7797).

As a result, the variations obviously changed, but not to the detriment of the hypothesis.

The new calculation has a more accurate foundation which actually further supports the hypothesis about genomes.
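To illustrate why the decimals matter, here is a small Python sketch. It assumes that a "variation" is the absolute difference, in percentage points, between a genome's atom percentage and the constant, and that each of the four atoms is then placed in one of the report's five bins. The genome values are hypothetical and the function is not the actual software.

```python
FULL = (32.2034, 35.5932, 25.4237, 6.7797)  # carbon, hydrogen, nitrogen, oxygen
TRUNCATED = (32, 35, 25, 6)                 # same constant without decimals
BIN_LABELS = ("<1.0%", "<2.0%", "<3.0%", "<4.0%", ">=4.0%")

def variation_bins(genome_percents, constant):
    """Return one bin label per atom (C, H, N, O) for a single genome."""
    labels = []
    for observed, expected in zip(genome_percents, constant):
        variation = abs(observed - expected)
        index = min(int(variation), 4)  # 0.x -> bin 0, 1.x -> bin 1, ... 4+ -> last bin
        labels.append(BIN_LABELS[index])
    return labels

# Hypothetical genome whose four percentages add up to 100.
genome = (32.5, 36.1, 24.7, 6.7)

print(variation_bins(genome, TRUNCATED))  # ['<1.0%', '<2.0%', '<1.0%', '<1.0%']
print(variation_bins(genome, FULL))       # ['<1.0%', '<1.0%', '<1.0%', '<1.0%']
```

The truncated values add up to only 98, so comparing against them pushes some atoms into a higher bin than the full values do.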

Here are the results of the software now that it has been improved:

GenBank Flat Files Genome Analysis Report:

after processing 100,000 genomes:
variation count @ <1.0% = 394,219
variation count @ <2.0% = 5,337
variation count @ <3.0% = 345
variation count @ <4.0% = 96
variation count @ >=4.0% = 1
after processing 200,000 genomes:
variation count @ <1.0% = 779,819
variation count @ <2.0% = 16,466
variation count @ <3.0% = 3,508
variation count @ <4.0% = 189
variation count @ >=4.0% = 9
after processing 300,000 genomes:
variation count @ <1.0% = 1,170,895
variation count @ <2.0% = 24,378
variation count @ <3.0% = 4,461
variation count @ <4.0% = 239
variation count @ >=4.0% = 13
after processing 400,000 genomes:
variation count @ <1.0% = 1,563,636
variation count @ <2.0% = 31,006
variation count @ <3.0% = 4,842
variation count @ <4.0% = 484
variation count @ >=4.0% = 15
after processing 500,000 genomes:
variation count @ <1.0% = 1,961,111
variation count @ <2.0% = 33,418
variation count @ <3.0% = 4,937
variation count @ <4.0% = 499
variation count @ >=4.0% = 16
after processing 600,000 genomes:
variation count @ <1.0% = 2,358,280
variation count @ <2.0% = 36,205
variation count @ <3.0% = 4,978
variation count @ <4.0% = 500
variation count @ >=4.0% = 18
after processing 700,000 genomes:
variation count @ <1.0% = 2,739,424
variation count @ <2.0% = 53,245
variation count @ <3.0% = 6,559
variation count @ <4.0% = 697
variation count @ >=4.0% = 53
after processing 800,000 genomes:
variation count @ <1.0% = 3,136,523
variation count @ <2.0% = 56,056
variation count @ <3.0% = 6,639
variation count @ <4.0% = 704
variation count @ >=4.0% = 55
after processing 900,000 genomes:
variation count @ <1.0% = 3,531,585
variation count @ <2.0% = 60,814
variation count @ <3.0% = 6,814
variation count @ <4.0% = 704
variation count @ >=4.0% = 56
after processing 1,000,000 genomes:
variation count @ <1.0% = 3,930,305
variation count @ <2.0% = 62,013
variation count @ <3.0% = 6,872
variation count @ <4.0% = 722
variation count @ >=4.0% = 60
Total processed 1,052,789 genomes!
variation count @ <1.0% = 4,139,865 (98.3078%)
variation count @ <2.0% = 63,494 (1.5078%)
variation count @ <3.0% = 6,960 (0.1653%)
variation count @ <4.0% = 743 (0.0176%)
variation count @ >=4.0% = 62 (0.0015%)

III. Closing Comment

As you can see, the share of variations under one percent increased from "92.6949%" to "98.3078%", so the hypothesis is looking good:

(combined 98.3078% + 1.5078% = 99.8156%).

But rather than bloviate about how that DNA genetic constant got there, let's just say "we don't know" as the professor suggests in the video below. 

UPDATE: The RNA constant is brought up in a new series (On The Origin Of Another Genetic Constant).

The next post in this series is here, the previous post in this series is here.



Wednesday, October 25, 2023

On The Origin Of A Genetic Constant - 4

The DNA of dna

I. Preface

In the previous post of this series, the Dredd Blog software reported 37 instances where the variation from the ~(32/35/25/6) genetic constant was 4% or higher (On The Origin Of A Genetic Constant - 3).

The following list details the "uid", "Link" to the GenBank "version", and "atom id" shown in that software report.

The "uid" is the unique id of the SQL table row (in my SQL database), the URL link is to the GenBank "version" of the genome, and the "atom id" is a 'C' for carbon, an 'H' for hydrogen, an 'N' for nitrogen, and an 'O' for oxygen.

As you can see, nitrogen was the predominant atom among the variations of 4% or higher.

II. Analysis

Nitrogen appears for every genome on the list of highest variations.

Only at uid "786399" were hydrogen and oxygen also high along with the nitrogen:

{786399,CP045289.2:H}
{786399,CP045289.2:N}
{786399,CP045289.2:O}

Anyway, here is the complete list of genomes with 4% or more variation from the norm which were identified in the report:

{92511,MK361035.1:N}
{105813,MZ636522.1:N}
{317731,MQ014117.1:N}
{374519,AP015624.1:N}
{374632,AP015737.1:N}
{374650,AP015755.1:N}
{374739,AP015844.1:N}
{374820,AP015925.1:N}
{375180,AP016285.1:N}
{375269,AP016374.1:N}
{375285,AP016390.1:N}
{375387,AP016492.1:N}
{375492,AP016597.1:N}
{375513,AP016618.1:N}
{375534,AP016639.1:N}
{516503,MF597730.1:N}
{516505,MF597734.1:N}
{537232,FO082333.3:N}
{616035,JF760210.1:N}
{632280,HM640930.1:N}
{633162,LC533411.1:N}
{633167,LC534895.1:N}
{633168,LC535032.1:N}
{633169,LC535118.1:N}
{645952,MG655622.1:N}
{645953,MG655623.1:N}
{645954,MG655624.1:N}
{647440,KX265049.1:N}
{786399,CP045289.2:H}
{786399,CP045289.2:N}
{786399,CP045289.2:O}
{819779,CP095532.1:N}
{918032,AC027353.4:N}
{984989,LN898113.1:N}
{984990,LN898114.1:N}
{994274,LN006378.1:N}
{1043240,OE848969.1:N}

Those are the 37 isolated outliers from the previous post that registered variations of 4% or higher in the average percentages of atoms in the genome.

Variation can also be the result of collection and handling problems: variations are to be expected because collecting DNA sequences is not without processing errors (e.g. Why a DNA Sample May Fail, cf. A practical guide to mitochondrial DNA error prevention in clinical, forensic, and population genetics).
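Entries in the {uid,version:atom id} format described in the Preface are easy to tally. The Python sketch below is only an illustration: the regular expression is my guess at a pattern that fits the entries shown, and the sample is four entries copied from the list rather than the whole list.

```python
import re
from collections import Counter

# Matches entries such as {92511,MK361035.1:N}: uid, accession.version, atom id.
ENTRY = re.compile(r"\{(\d+),([A-Z]+\d+\.\d+):([CHNO])\}")

sample = """
{92511,MK361035.1:N}
{786399,CP045289.2:H}
{786399,CP045289.2:N}
{786399,CP045289.2:O}
"""

entries = ENTRY.findall(sample)  # list of (uid, version, atom id) tuples
atom_tally = Counter(atom for _uid, _version, atom in entries)
genomes = {version for _uid, version, _atom in entries}

print(dict(atom_tally))            # {'N': 2, 'H': 1, 'O': 1}
print(len(entries), len(genomes))  # 4 2
```

Run over the full list, the same tally separates the number of flagged atom values from the number of distinct genomes they belong to.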

III. Closing Comments

Here are my comments about these GenBank database members after perusing them to see what makes them outliers for the ~(32/35/25/6) genetic constant:

AC027353, AP015737, AP015755, AP015844, AP015925, AP016285, AP016374, AP016390, AP016492, AP016597, AP016618, LN006378 and AP016639
notes: These are cut-out segments rather than a natural genome.

CP095532
notes: This plasmid has unusual strands of repeat nucleotides "tttttttt" (183), "ttttttttg" (121), "ttttttttgt" (113), "ttttttttgtg" (38) etc. Not sufficiently natural.

FO082333
notes: "Large depth read coverage across a clone" ... not natural enough.

HM640930, JF760210, KX265049, LC533411, LC534895, LC535032, LC535118, LN898113, LN898114, MG655622, MG655623, MG655624, MK361035, MZ636522
notes: Mixing of RNA with DNA without designation of uracil "u" conflates the DNA with RNA. The ~(32/35/25/6) genetic constant relates to DNA.

MF597730, MF597734
notes: "This sequence was generated to improve a reference assembly gap in the mouse reference genome sequence." Not natural, just repair jobs.

MQ014117
notes: "synthetic construct" ... not a natural sequence.

OE848969
notes: too many repeat sequences and odd patterns to be natural.

Only 37 problematic rows out of over a million. NOT BAD!

The next post in this series is here, the previous post in this series is here.

Tuesday, October 24, 2023

On The Origin Of A Genetic Constant - 3

DNA Atoms

In this Dredd Blog series a lot of appendices contain HTML tables showing examples of the ~(32/35/25/6) genetic constant concerning the percentages of Carbon, Hydrogen, Nitrogen, and Oxygen atoms in DNA (On The Origin Of A Genetic Constant, 2).

I suspect some readers might ask "but is that a large enough sample to substantiate the ~(32/35/25/6) genetic constant hypothesis?"

Good question.

So, today I present a report that shows the results of analyzing 1,052,789 GenBank genomes instead of a measly 30,000 or so in some of the previous posts of this series.

Here is the report:

GenBank Flat Files Genome Analysis Report

...after processing 100,000 genomes:
variation count @ <1.0% = 368,476
variation count @ <2.0% = 30,784
variation count @ <3.0% = 714
variation count @ <4.0% = 9
variation count @ >=4.0% = 1

...after processing 200,000 genomes:
variation count @ <1.0% = 733,172
variation count @ <2.0% = 62,030
variation count @ <3.0% = 4,717
variation count @ <4.0% = 54
variation count @ >=4.0% = 2

...after processing 300,000 genomes:
variation count @ <1.0% = 1,107,232
variation count @ <2.0% = 86,680
variation count @ <3.0% = 5,956
variation count @ <4.0% = 94
variation count @ >=4.0% = 2

...after processing 400,000 genomes:
variation count @ <1.0% = 1,484,226
variation count @ <2.0% = 108,655
variation count @ <3.0% = 6,754
variation count @ <4.0% = 304
variation count @ >=4.0% = 15

...after processing 500,000 genomes:
variation count @ <1.0% = 1,853,186
variation count @ <2.0% = 139,569
variation count @ <3.0% = 6,862
variation count @ <4.0% = 315
variation count @ >=4.0% = 15

...after processing 600,000 genomes:
variation count @ <1.0% = 2,195,693
variation count @ <2.0% = 197,025
variation count @ <3.0% = 6,888
variation count @ <4.0% = 315
variation count @ >=4.0% = 18

...after processing 700,000 genomes:
variation count @ <1.0% = 2,541,536
variation count @ <2.0% = 248,706
variation count @ <3.0% = 9,229
variation count @ <4.0% = 426
variation count @ >=4.0% = 28

...after processing 800,000 genomes:
variation count @ <1.0% = 2,924,960
variation count @ <2.0% = 265,213
variation count @ <3.0% = 9,272
variation count @ <4.0% = 429
variation count @ >=4.0% = 31

...after processing 900,000 genomes:
variation count @ <1.0% = 3,304,749
variation count @ <2.0% = 285,295
variation count @ <3.0% = 9,387
variation count @ <4.0% = 430
variation count @ >=4.0% = 32

...after processing 1,000,000 genomes:
variation count @ <1.0% = 3,699,202
variation count @ <2.0% = 290,735
variation count @ <3.0% = 9,477
variation count @ <4.0% = 438
variation count @ >=4.0% = 36

Total processed: 1,052,789 genomes!
variation count @ <1.0% = 3,903,413 (92.6949%)
variation count @ <2.0% = 297,538 (7.0657%)
variation count @ <3.0% = 9,591 (0.2278%)
variation count @ <4.0% = 454 (0.0108%)
variation count @ >=4.0% = 37 (0.0009%)

As you can see, the software proceeded through the "DNA_html_tables" on my SQL Server while analyzing the percentages of Carbon, Hydrogen, Nitrogen, and Oxygen atoms in DNA genomes that had been downloaded from the GenBank FTP site.

The vast majority of variations from the ~(32/35/25/6) genetic constant in the report are under 1% ("92.6949%"), and second place is under 2% ("7.0657%"). Together that is 99.7606%, and 99.7606% of 1,052,789 genomes is about 1,050,268 genomes.
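The percentages in the final block of the report can be reproduced from the counts themselves: each one matches its count divided by the sum of the five counts (4,211,033 variation values). The Python sketch below redoes that arithmetic; the variable names are made up for illustration.

```python
# Final counts from the report above, in bin order.
counts = {
    "<1.0%": 3_903_413,
    "<2.0%": 297_538,
    "<3.0%": 9_591,
    "<4.0%": 454,
    ">=4.0%": 37,
}
total_genomes = 1_052_789

total_variations = sum(counts.values())
percents = {label: round(100.0 * n / total_variations, 4) for label, n in counts.items()}
under_two = round(percents["<1.0%"] + percents["<2.0%"], 4)

print(total_variations)  # 4211033
print(percents)          # {'<1.0%': 92.6949, '<2.0%': 7.0657, '<3.0%': 0.2278, '<4.0%': 0.0108, '>=4.0%': 0.0009}
print(under_two)         # 99.7606
print(int(total_genomes * under_two / 100.0))  # 1050268
```

Note that the percentages describe variation values (one each for carbon, hydrogen, nitrogen, and oxygen in every genome), so the genome figure on the last line is an approximation.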

Thus, in 1,052,789 genomes there is very little variation in DNA percentages in terms of the ~(32/35/25/6) genetic constant.

I consider these results to provide added support to the ~(32/35/25/6) genetic constant hypothesis.

The next post in this series is here, the previous post in this series is here.