Friday, October 20, 2023

Good Nomenclature: A Matter of Life and Death - 6

"Parts is Parts"

Some time back I asked the keepers of government databases of genetic material to stop using "T" and "t" in RNA nucleotide data, because RNA does not contain thymine ("T").

"T" means thymine.

RNA contains uracil ("U" or "u") instead of thymine ("T" or "t").
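To make the letter substitution concrete, here is a minimal Python sketch that rewrites a sequence string so that thymine letters become uracil letters. The function name `dna_to_rna_letters` is my own illustration (it is not part of GenBank or any library), and it only swaps letters; it does not validate the sequence in any way.

```python
def dna_to_rna_letters(seq: str) -> str:
    """Replace T/t with U/u, leaving every other character unchanged.

    Hypothetical helper for illustration only: it performs a plain
    letter substitution and does not check that the input is a
    valid nucleotide sequence.
    """
    return seq.translate(str.maketrans("Tt", "Uu"))


if __name__ == "__main__":
    # A made-up fragment, not taken from any real record.
    print(dna_to_rna_letters("ATTAaagt"))
```

Case is preserved, so a lowercase "ORIGIN"-style sequence stays lowercase and an uppercase FASTA-style sequence stays uppercase.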

A well-known source indicates that SARS-CoV-2 is an RNA virus, not a DNA virus: "Coronaviruses contain a non-segmented, positive-sense RNA genome of ~30 kb" (Coronaviruses, NIH, emphasis added; cf. here).

I mentioned this in a previous post to no avail (It's In The GenBank - 4).

I have been downloading a lot of data from GenBank to further detail the ~(32/35/25/6) constant pattern in genomes:

"I have finished downloading the almost 8,000 ".gz" compressed files stored at the Genbank FTP site (~1 terabyte of data)."

(On The Origin Of A Genetic Constant - 2). I discovered that the files I downloaded contain 234,350 rows of SARS-CoV-2 data which say the virus is composed of "DNA":

mysql> select count(*) from DNA_html_tables where source like '%SARS%';
+----------+
| count(*) |
+----------+
|   234350 |
+----------+

(SQL search results on my SQL server). Example GenBank links to six of them are:

Covid 1 Covid 2 Covid 3 Covid 4 Covid 5 Covid 6 (here is a better representation of an RNA genome Covid 7, but not a perfect one since it uses "t" instead of "u"; the FASTA version uses "T" instead of "U")

Notice that the "LOCUS" line (first line of a GBFF genome) says "DNA" in each and every one of them (they are actually "RNA" viruses, not "DNA" viruses).

Furthermore, notice that every one of those genomes has "t" instead of "u" in the nucleotide "ORIGIN" area.
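The two checks just described (the molecule type on the "LOCUS" line, and "t" in the "ORIGIN" area) can be sketched in Python. This is only a rough illustration under stated assumptions: I assume the "LOCUS" line is whitespace-separated and that the molecule type is the token right after "bp", and the function names and the sample lines are hypothetical, not copied from any real GenBank record.

```python
from typing import Iterable, Optional


def locus_molecule_type(locus_line: str) -> Optional[str]:
    """Return the token that follows "bp" on a LOCUS line, or None.

    Assumption: fields are whitespace-separated and the molecule
    type (e.g. "DNA", "RNA") comes immediately after "bp".
    """
    tokens = locus_line.split()
    if not tokens or tokens[0] != "LOCUS" or "bp" not in tokens:
        return None
    i = tokens.index("bp")
    return tokens[i + 1] if i + 1 < len(tokens) else None


def origin_has_thymine(origin_lines: Iterable[str]) -> bool:
    """Return True if any sequence line contains the letter t/T.

    Pass only the sequence lines that follow "ORIGIN"; position
    numbers and spaces are ignored.
    """
    for line in origin_lines:
        letters = "".join(ch for ch in line if ch.isalpha())
        if "t" in letters.lower():
            return True
    return False


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample_locus = "LOCUS       EXAMPLE01    100 bp    DNA     linear   VRL 01-JAN-2020"
    sample_origin = ["        1 attaaaggtt tataccttcc"]
    print(locus_molecule_type(sample_locus))
    print(origin_has_thymine(sample_origin))
```

A record for an RNA virus that reports "DNA" from the first function and True from the second would show both of the problems described above.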

IMO this is not good nomenclature; it is a titanic mistake (Good Nomenclature: A Matter of Life and Death).

The next post in this series is here; the previous post in this series is here.


