Monday, June 30, 2025

Abiology Or Quantum Biology? - 3

Where's It At?
In the previous two posts of this series, the atom count constants of DNA and RNA were analyzed.

That is because DNA and RNA each have a well-defined, unique chemical fingerprint that can be used to determine the accuracy of a nucleotide sequence written with the letters A, C, G, T, and U:

'A' = "C5H5N5" (5 Carbon, 5 Hydrogen, 5 Nitrogen)

'C' = "C4H5N3O" (4 Carbon, 5 Hydrogen, 3 Nitrogen, 1 Oxygen)

'T' = "C5H6N2O2" (5 Carbon, 6 Hydrogen, 2 Nitrogen, 2 Oxygen)

'G' = "C5H5N5O" (5 Carbon, 5 Hydrogen, 5 Nitrogen, 1 Oxygen)

'U' = "C4H4N2O2" (4 Carbon, 4 Hydrogen, 2 Nitrogen, 2 Oxygen)
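To make the fingerprint concrete, here is a minimal Python sketch (Python is used only for illustration; the post itself has no code). It turns each formula string above into per-element atom counts and totals them for a sequence. It assumes that 'N', and any other letter outside A, C, G, T, U, contributes no atoms, which is how the chromosome 1 table below treats 'N'. The names `parse_formula` and `sequence_atoms` are hypothetical helpers.

```python
import re
from collections import Counter

# The five base formulas listed above.
BASE_FORMULAS = {
    "A": "C5H5N5",
    "C": "C4H5N3O",
    "T": "C5H6N2O2",
    "G": "C5H5N5O",
    "U": "C4H4N2O2",
}


def parse_formula(formula: str) -> Counter:
    """Split a simple formula such as 'C4H5N3O' into element counts.

    Handles element symbols followed by an optional number (a missing
    number means 1). Parentheses and charges are not handled.
    """
    atoms = Counter()
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms[element] += int(count) if count else 1
    return atoms


# Per-base atom counts, e.g. BASE_ATOMS["C"] -> C 4, H 5, N 3, O 1.
BASE_ATOMS = {base: parse_formula(f) for base, f in BASE_FORMULAS.items()}


def sequence_atoms(sequence: str) -> Counter:
    """Total atoms in a sequence; 'N' and any other non-ACGTU letter is skipped."""
    total = Counter()
    for letter in sequence.upper():
        if letter in BASE_ATOMS:
            total.update(BASE_ATOMS[letter])
    return total


print(sequence_atoms("ACGTN"))
```

For "ACGTN" this returns 19 carbon, 21 hydrogen, 15 nitrogen and 4 oxygen atoms; the 'N' adds nothing.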

It is well known that current technology for sequencing DNA and RNA is not yet perfect, but we can determine how accurate any sequence is by how well its fingerprint matches the real fingerprint.

Counting the atoms therefore helps determine the accuracy, or lack of it, of sequences that have been collected and deposited in public databases such as GenBank.
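The "real fingerprint" can be computed from the five formulas. This sketch assumes, as an inference rather than a stated rule, that a constant is the atom percentage of one copy of each base. For DNA that gives 19 carbon, 21 hydrogen, 15 nitrogen and 4 oxygen atoms out of 59, which matches the DNA Const column in the chromosome 1 table below. The RNA line applies the same assumption with U in place of T, so treat it as illustrative. `atom_constant` is a hypothetical helper.

```python
from fractions import Fraction

# Atoms per base, taken from the formulas listed earlier in the post.
BASE_ATOMS = {
    "A": {"C": 5, "H": 5, "N": 5, "O": 0},
    "C": {"C": 4, "H": 5, "N": 3, "O": 1},
    "G": {"C": 5, "H": 5, "N": 5, "O": 1},
    "T": {"C": 5, "H": 6, "N": 2, "O": 2},
    "U": {"C": 4, "H": 4, "N": 2, "O": 2},
}
ELEMENTS = ("C", "H", "N", "O")


def atom_constant(bases: str) -> dict:
    """Exact atom fraction per element for one copy of each base in `bases`.

    Assumption: the constant weights every base equally (one A, one C, ...).
    """
    per_element = {e: sum(BASE_ATOMS[b][e] for b in bases) for e in ELEMENTS}
    total = sum(per_element.values())
    return {e: Fraction(per_element[e], total) for e in ELEMENTS}


dna = atom_constant("ACGT")  # 19/59, 21/59, 15/59, 4/59
rna = atom_constant("ACGU")  # 18/56, 19/56, 15/56, 4/56 (Fraction reduces these)

for e in ELEMENTS:
    print(f"{e}: DNA {float(dna[e]) * 100:.7f}%   RNA {float(rna[e]) * 100:.7f}%")
```

Exact fractions are used so the constants carry no rounding until they are printed.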

Today's appendix (Appendix 1) focuses on that issue: every sequence featured in it contains one or more 'N' positions.

So, first let's consider the 'N' positions in human chromosome 1, which contains 18,475,408 of them.

Check out this official record if you don't believe it: GenBank FASTA.
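A minimal sketch for checking counts like these against a downloaded copy of the record, assuming it has been saved locally as plain FASTA text. Header lines starting with '>' are skipped and every other character is tallied. The function names are hypothetical, and the upper-casing is only a precaution in case a copy uses lowercase letters.

```python
from collections import Counter
from typing import Iterable


def count_fasta_letters(lines: Iterable[str]) -> Counter:
    """Tally every sequence character in FASTA text.

    Lines starting with '>' are headers and are skipped. Letters are
    upper-cased first, so lowercase stretches count with their base.
    """
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            continue
        counts.update(line.strip().upper())
    return counts


def n_uncertainty(counts: Counter) -> float:
    """Share of all positions that are 'N' (unknown base); 0.0 if empty."""
    total = sum(counts.values())
    return counts["N"] / total if total else 0.0
```

Usage on a local copy would look like `with open("chr1.fasta") as fh: counts = count_fasta_letters(fh)`, where the file name is hypothetical. Because every non-header character is counted, the total also includes any ambiguity codes other than 'N'.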

The tables in today's appendix are structured as follows:

Link: NC_000001.11
Organism: chromosome 1

Nucleotide Count: 248,956,422

'A' count: 67,070,277
'C' count: 48,055,043
'G' count: 48,111,528
'T' count: 67,244,164
'N' count: 18,475,408
'N' uncertainty: 0.07421

| Atom Type | Atom Count | Atom Percent | DNA Const | Variation From Const | Atom Count Variation | Error Percent | Plus 'N' Uncert. |
|---|---|---|---|---|---|---|---|
| carbon | 1,104,350,017 | 32.3930726 | 32.2033898 | 0.1896828 | 209,476,253 | 18.97% | 20.38% |
| hydrogen | 1,219,649,224 | 35.7750580 | 35.5932203 | 0.1818377 | 221,778,214 | 18.18% | 19.53% |
| nitrogen | 854,562,482 | 25.0662418 | 25.4237288 | 0.3574870 | 305,494,976 | 35.75% | 38.40% |
| oxygen | 230,654,899 | 6.7656275 | 6.7796610 | 0.0140335 | 3,236,884 | 1.40% | 1.51% |
| Totals | 3,409,216,622 | 100 | 100 | 0.7430410 | 739,986,327 | 21.71% | 23.32% |

Here is a description of the table's parts:

Link: link to the GenBank data URL
Organism: scientific name of the organism (or chromosome) whose sequence is analyzed
Nucleotide count: total number of positions in the GenBank sequence, 'N' positions included

'A' count: Adenine
'C' count: Cytosine
'G' count: Guanine
'T' count: Thymine
'U' count: Uracil
'N' count: unknown nucleotide type
'N' uncertainty: 'N' count as a fraction of the nucleotide count

Atom Type: the atoms that make up DNA/RNA nucleotides
Atom Count: quantity of that atom type in the subject genome
Atom Percent: that atom type's share of all atoms, in percent
DNA/RNA Const: the DNA or RNA constant of the nucleotides for that atom type, in percent
Variation From Const: absolute difference, in percentage points, between Atom Percent and the constant
Atom Count Variation: number of atoms varying from the constant
Error Percent: Atom Count Variation as a percentage of Atom Count
Plus 'N' uncert.: Error Percent after the 'N' uncertainty is considered, i.e. Error Percent × (1 + 'N' uncertainty)
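The last three columns can be reproduced from Atom Count, Variation From Const and the 'N' uncertainty. The relationships in this sketch are inferred from the published numbers rather than taken from a formula stated in the post: Atom Count Variation is Atom Count times the Variation From Const value as printed, Error Percent is that as a share of Atom Count, and Plus 'N' uncert. scales Error Percent by (1 + 'N' uncertainty). Under those assumptions the rounded percentages match the chromosome 1 table, and the atom count variations agree to within a few parts per million because the printed variations are rounded. `derived_columns` is a hypothetical helper.

```python
# Values as printed in the chromosome 1 table above.
N_COUNT = 18_475_408
NUCLEOTIDE_COUNT = 248_956_422
ROWS = {
    # atom type: (Atom Count, Variation From Const)
    "carbon": (1_104_350_017, 0.1896828),
    "hydrogen": (1_219_649_224, 0.1818377),
    "nitrogen": (854_562_482, 0.3574870),
    "oxygen": (230_654_899, 0.0140335),
}

# 'N' uncertainty: share of positions whose base is unknown (about 0.07421).
n_uncertainty = N_COUNT / NUCLEOTIDE_COUNT


def derived_columns(atom_count: int, variation: float) -> tuple:
    """Return (Atom Count Variation, Error Percent, Plus 'N' uncert.) for one row.

    Inferred from the published numbers: the Variation From Const value is
    used as printed (not divided by 100), and the 'N' adjustment scales the
    error by (1 + 'N' uncertainty).
    """
    count_variation = atom_count * variation
    error_pct = 100 * count_variation / atom_count
    plus_n = error_pct * (1 + n_uncertainty)
    return count_variation, error_pct, plus_n


results = {name: derived_columns(count, var) for name, (count, var) in ROWS.items()}
total_atoms = sum(count for count, _ in ROWS.values())
total_variation = sum(r[0] for r in results.values())
total_error = 100 * total_variation / total_atoms
total_plus_n = total_error * (1 + n_uncertainty)

for name, (cv, err, plus_n) in results.items():
    print(f"{name:<9} {cv:>15,.0f} {err:6.2f}% {plus_n:6.2f}%")
print(f"{'Totals':<9} {total_variation:>15,.0f} {total_error:6.2f}% {total_plus_n:6.2f}%")
```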

The point of analyzing those parts is to compare how many atoms of each type should be in the sequence with how many there actually are.

Even though this determines whether a sequence is scientifically accurate, whether it is adequate for any particular purpose depends on the purpose for which it is being used at the time.

Different strokes for different folks?

The previous post in this series is here.



1 comment:

  1. AI Overview
    Grammatical Rules for DNA Sequence Representation

    "In a genetic sequence, the letter "N" signifies that the nucleotide base at that specific position could not be identified during DNA sequencing. It represents an unknown or ambiguous base, meaning any of the four DNA bases (adenine, guanine, cytosine, or thymine) could potentially occupy that spot. This ambiguity often arises from low-quality sequence data or technical limitations during the sequencing process, such as hairpin loops or overlapping traces."
