Sunday, November 19, 2023

On The Origin Of Another Genetic Constant - 2

Abridge

I. Good Falsification

This is an update concerning the hypothesis that there are constants in DNA and RNA genomes.

When researchers, scientists, or other persons set forth a hypothesis, they should present it together with a way to falsify it.

Sometimes the falsification technique is more difficult to explain than the hypothesis itself, and perhaps that is why the falsification part is most often left out.

But that should not keep us from the sound practice, so let's take a look at it.

II. The Genetic Constant Potential Falsification Difficulty

In the print-outs made by the software detailed in this series (e.g. appendices) the falsification potential has been presented as "variation" from the constant, expressed as a percentage (On The Origin Of A Genetic Constant, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10; On The Origin Of Another Genetic Constant).

The argument for falsification might be conceptualized by asking "what is so constant about something that varies, Dredd?"

Indeed, that is an avenue of potential falsification, but at this time it is not solid enough to be relied upon to settle the matter.

In fact, some well-known "constants" such as The Fine Structure Constant and The Hubble Constant have had known variations for some time, but are still referred to as "constants".

Those problems are different from the genetic constant variations, but our genetic science is not mature enough yet to give enough weight to tiny variations to consider them a falsification of the hypothesized ≈(32/35/25/6) DNA genetic constant or the hypothesized ≈(32/33/26/7) RNA genetic constant.

III. Variation in DNA and RNA
Collection, Storage, and Interpretation

In today's appendices I have added another detail which helps to point out more of the reasons for the variation from the constants.

That factor is the collection and storage of the DNA and RNA samples.

It depends on the degree of expertise of the researchers, scientists, and other persons who collect, store, and interpret the DNA and RNA samples which make their way into public genome databases such as GenBank.

The GenBank disclaimer is instructive in that regard: 

"Website Disclaimer

Liability: For documents and software available from this server, the U.S. Government does not warrant or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed.

Endorsement: NCBI does not endorse or recommend any commercial products, processes, or services. The views and opinions of authors expressed on NCBI's Web sites do not necessarily state or reflect those of the U.S. Government, and they may not be used for advertising or product endorsement purposes."

(GenBank Website). That disclaimer does not mean that the data they collect and share is not worthy, or is not very useful.

To the contrary, it is a wonderful thing for the U.S. Government to do IMO.

IV. The Additions To The Data
In Dredd Blog Appendices

A new count and related feature has been added.

It is far simpler to use than some other explanations for the cause of variations in constants, such as "Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data".

This Dredd Blog consideration for the variations is a simple detection and count of the "false" or invalid nucleotide letters in the data of the sequences.

When scrutinizing the sequences the preferred method is to use the FASTA version of the sequences (click on the FASTA button at the upper left of the linked page).

Remember that the only valid letters that can be used to represent a nucleotide are 'A', 'C', 'G', and 'T' for DNA, with 'U' taking the place of 'T' in RNA.

In the new Dredd Blog feature shown in today's appendices, the number of false or invalid letters that show up (such as 'Y', 'M', 'R', etc.) is stated.

The 'N' letter placed into the nucleotide sequence by the researcher, scientist, or other person is put there to indicate an unknown.

It can be considered a reason for variation even though it is placed there on purpose (it means "not knowing", i.e. the nucleotide value in the collected sample of DNA or RNA is unknown).

So, when any letters other than the valid nucleotide letters show up, they can be considered a factor of variation.

But those letters standing alone are not conclusive factors of variation or falsification of the constant hypothesis.
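The detection and count described above can be sketched in a few lines of Python. This is only a minimal sketch, not the software behind the appendices: the function name `count_letters`, the made-up sample record, and the exact print-out layout are assumptions made for illustration.

```python
from collections import Counter

# Letters treated as valid nucleotides in this post.
VALID = set("ACGTU")

def count_letters(fasta_text):
    """Return (valid_count, Counter_of_invalid_letters) for FASTA text."""
    valid = 0
    invalid = Counter()
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and FASTA header lines, which start with '>'.
        if not line or line.startswith(">"):
            continue
        for ch in line.upper():
            if ch in VALID:
                valid += 1
            else:
                invalid[ch] += 1
    return valid, invalid

# A made-up record, not taken from GenBank.
sample = ">example record (hypothetical)\nACGTNNACGY\nacgtK\n"
valid, invalid = count_letters(sample)

# Assumed layout: one "letter count(n)" entry per invalid letter, else "n/a".
listing = ", ".join(f"{c} count({n:,})" for c, n in sorted(invalid.items()))
print(f"Valid Nucleotide count: {valid:,} bp")
print(f"Invalid Nucleotide count: {sum(invalid.values()):,}")
print(f"Invalid character(s): {listing or 'n/a'}")
```

Lower-case letters are folded to upper case before counting, since FASTA files sometimes mix the two; whether the appendix software does the same is not stated.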

V. On To The Appendices

As an example "right off the bat", notice that the genome BA000047.1 has a considerable number of invalid characters:

"Valid Nucleotide count: 12,263,273 bp
Invalid Nucleotide count: 1,100,000
Invalid character(s): N count(1,100,000)"

(DNA Appendix 1). Notice also that genome CP015622.1 in the same appendix has only two invalid characters:

"Valid Nucleotide count: 3,047,373 bp
Invalid Nucleotide count: 2
Invalid character(s): KM"

(DNA Appendix 1). Note also that most of the sample genomes don't have invalid characters, so an "n/a" is used to indicate that fact.

The new addition (a count and list of invalid letters) to the appendices of the two genetic constant series affords the reader valuable information about the variation considerations.

Listing and counting all of the non-compliant characters in the genome gives the reader one way to scientifically confirm or scientifically falsify the hypothesis.

VI. Closing Comments

The RNA Gen Const Appendix 1 and RNA Gen Const Appendix 2 have similar examples for Dredd Blog reader perusal so that you have a data basis for knowledge (The Pillars of Knowledge: Faith and Trust?).

Bias is not sufficient.

Let's continue to keep an eye on public data because it helps us keep an eye on scientific developments which impact us all (such as atom percents in our DNA):

"ACGT Atom Percents:

Adenine ('A'):
Carbon: 10.3926, Hydrogen: 10.3926, Nitrogen: 10.3926, Oxygen: 0.0000
Cytosine ('C'):
Carbon: 4.7400, Hydrogen: 5.9250, Nitrogen: 3.5550, Oxygen: 1.1850
Guanine ('G'):
Carbon: 6.4405, Hydrogen: 6.4405, Nitrogen: 6.4405, Oxygen: 1.2881
Thymine ('T'):
Carbon: 10.9358, Hydrogen: 13.1230, Nitrogen: 4.3743, Oxygen: 4.3743

Atom percent sums (see ACGT Atom Percents above):

Carbon: 32.5090% (10.3926% + 4.7400% + 6.4405% + 10.9358%)
Hydrogen: 35.8811% (10.3926% + 5.9250% + 6.4405% + 13.1230%)
Nitrogen: 24.7625% (10.3926% + 3.5550% + 6.4405% + 4.3743%)
Oxygen: 6.8474% (0.0000% + 1.1850% + 1.2881% + 4.3743%)

Total percent: 100%

Atom variation from ~(32/35/25/6) constant (see Atom percent sums):

Carbon: 0.3056% (> constant: 32.2034%)
Hydrogen: 0.2879% (> constant: 35.5932%)
Nitrogen: -0.6612% (< constant: 25.4237%)
Oxygen: 0.0677% (> constant: 6.7797%)"

(DNA Appendix 1 at NZ_JABGIE010000657.1). Atoms are not talked about enough, which abridges genetic comprehension so as to cover up the way to find out where the bridge begins.
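For readers who want to check the arithmetic in a print-out like the one quoted above, here is a hedged sketch. It assumes (this is an inference from the quoted numbers, not a description of the appendix software) that each percentage is the number of atoms of an element contributed by a base, divided by the total atoms across all bases, using the standard molecular formulas of the four bases. Under that assumption, equal counts of A, C, G, and T reproduce the "constant" values shown in the print-out. The function names and the base counts in the usage example are hypothetical.

```python
# Atoms per base, from the standard molecular formulas of the free bases.
ATOMS = {
    "A": {"C": 5, "H": 5, "N": 5, "O": 0},  # adenine  C5H5N5
    "C": {"C": 4, "H": 5, "N": 3, "O": 1},  # cytosine C4H5N3O
    "G": {"C": 5, "H": 5, "N": 5, "O": 1},  # guanine  C5H5N5O
    "T": {"C": 5, "H": 6, "N": 2, "O": 2},  # thymine  C5H6N2O2
}
ELEMENTS = ("C", "H", "N", "O")

def atom_percents(base_counts):
    """Percent of all atoms that are C, H, N, O for the given base counts."""
    totals = {
        e: sum(base_counts.get(b, 0) * ATOMS[b][e] for b in ATOMS)
        for e in ELEMENTS
    }
    grand_total = sum(totals.values())
    return {e: 100.0 * totals[e] / grand_total for e in ELEMENTS}

def variation(percents, constant):
    """Difference, in percentage points, between percents and the constant."""
    return {e: percents[e] - constant[e] for e in ELEMENTS}

# Equal base counts: 19 C, 21 H, 15 N, 4 O atoms out of 59.
constant = atom_percents({"A": 1, "C": 1, "G": 1, "T": 1})
print({e: round(p, 4) for e, p in constant.items()})
# → {'C': 32.2034, 'H': 35.5932, 'N': 25.4237, 'O': 6.7797}

# Hypothetical base counts, for illustration only.
example = atom_percents({"A": 300, "C": 200, "G": 200, "T": 300})
print({e: round(v, 4) for e, v in variation(example, constant).items()})
```

The variation here is a plain subtraction, which matches the quoted print-out (for example 32.5090 minus 32.2034 gives 0.3056 for carbon).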

The next post in this series is here, the previous post in this series is here.



A polymath visits Harvard...

