Wednesday, October 25, 2023

On The Origin Of A Genetic Constant - 4

The DNA of dna

I. Preface

In the previous post of this series, the Dredd Blog software reported 37 instances where the variation from the ~(32/35/25/6) genetic constant was 4% or higher (On The Origin Of A Genetic Constant - 3).

The following list details the "uid", "Link" to the GenBank "version", and "atom id" shown in that software report.

The "uid" is the unique id of the SQL table row (in my SQL database), the URL link is to the GenBank "version" of the genome, and the "atom id" is a 'C' for carbon, an 'H' for hydrogen, an 'N' for nitrogen, and an 'O' for oxygen.

As you can see, nitrogen was the predominant atom among the variations of 4% or higher.
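For readers who want to experiment, here is a minimal Python sketch of how atom percentages of this kind could be computed and compared with the constant. It is not the Dredd Blog software. It assumes the four numbers in ~(32/35/25/6) are in C/H/N/O order, that only the atoms of the free nucleobases are counted (standard molecular formulas), and that "4% or higher" means an absolute difference in percentage points. The names `BASE_ATOMS`, `REFERENCE`, `atom_percentages`, and `flag_outliers` are made up for this illustration.

```python
from collections import Counter

# Atom counts of the free nucleobases (standard molecular formulas).
BASE_ATOMS = {
    "a": {"C": 5, "H": 5, "N": 5, "O": 0},  # adenine  C5H5N5
    "c": {"C": 4, "H": 5, "N": 3, "O": 1},  # cytosine C4H5N3O
    "g": {"C": 5, "H": 5, "N": 5, "O": 1},  # guanine  C5H5N5O
    "t": {"C": 5, "H": 6, "N": 2, "O": 2},  # thymine  C5H6N2O2
}

# ASSUMPTION: the constant ~(32/35/25/6) is read in C/H/N/O order.
REFERENCE = {"C": 32.0, "H": 35.0, "N": 25.0, "O": 6.0}


def atom_percentages(sequence):
    """Return the percentage of C, H, N and O atoms in a DNA sequence."""
    totals = Counter()
    for base in sequence.lower():
        atoms = BASE_ATOMS.get(base)
        if atoms is None:
            continue  # skip 'n' and any other non-acgt symbol
        totals.update(atoms)  # adds the per-base atom counts
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("sequence contains no a/c/g/t bases")
    return {atom: 100.0 * totals[atom] / grand_total for atom in "CHNO"}


def flag_outliers(percentages, threshold=4.0):
    """List atom ids whose percentage differs from REFERENCE by >= threshold.

    ASSUMPTION: the difference is measured in absolute percentage points.
    """
    return [
        atom
        for atom in "CHNO"
        if abs(percentages[atom] - REFERENCE[atom]) >= threshold
    ]
```

Under these assumptions, a sequence with equal numbers of a, c, g, and t comes out at roughly 32.2% carbon, 35.6% hydrogen, 25.4% nitrogen, and 6.8% oxygen, so `flag_outliers` returns an empty list for it. A sequence of nothing but "a" would be flagged for both nitrogen and oxygen.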

II. Analysis

A nitrogen percentage appears for every genome in the list of 37 highest-variation entries.

Only one genome, at uid "786399", showed high hydrogen and oxygen variation along with the nitrogen:

{786399,CP045289.2:H}
{786399,CP045289.2:N}
{786399,CP045289.2:O}

Anyway, here is the complete list of genomes with 4% or more variation from the norm which were identified in the report:

{92511,MK361035.1:N}
{105813,MZ636522.1:N}
{317731,MQ014117.1:N}
{374519,AP015624.1:N}
{374632,AP015737.1:N}
{374650,AP015755.1:N}
{374739,AP015844.1:N}
{374820,AP015925.1:N}
{375180,AP016285.1:N}
{375269,AP016374.1:N}
{375285,AP016390.1:N}
{375387,AP016492.1:N}
{375492,AP016597.1:N}
{375513,AP016618.1:N}
{375534,AP016639.1:N}
{516503,MF597730.1:N}
{516505,MF597734.1:N}
{537232,FO082333.3:N}
{616035,JF760210.1:N}
{632280,HM640930.1:N}
{633162,LC533411.1:N}
{633167,LC534895.1:N}
{633168,LC535032.1:N}
{633169,LC535118.1:N}
{645952,MG655622.1:N}
{645953,MG655623.1:N}
{645954,MG655624.1:N}
{647440,KX265049.1:N}
{786399,CP045289.2:H}
{786399,CP045289.2:N}
{786399,CP045289.2:O}
{819779,CP095532.1:N}
{918032,AC027353.4:N}
{984989,LN898113.1:N}
{984990,LN898114.1:N}
{994274,LN006378.1:N}
{1043240,OE848969.1:N}

Those are the 37 isolated outliers from the previous post that registered variations of 4% or higher in the average percents of atoms in the genome.
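If you want to work with the list yourself, the entries are easy to parse. Here is a small Python sketch that reads lines in the {uid,accession.version:atom} form shown above and groups the atom ids by uid. The regular expression and the function names `parse_entries` and `atoms_by_uid` are my own illustration for this post, not part of the report software.

```python
import re
from collections import defaultdict

# Matches entries such as {786399,CP045289.2:N}
ENTRY = re.compile(r"\{(\d+),([A-Z]+\d+)\.(\d+):([CHNO])\}")


def parse_entries(text):
    """Return a list of (uid, accession, version, atom) tuples found in text."""
    return [
        (int(uid), accession, int(version), atom)
        for uid, accession, version, atom in ENTRY.findall(text)
    ]


def atoms_by_uid(rows):
    """Group the atom ids by uid, e.g. {786399: {'H', 'N', 'O'}}."""
    grouped = defaultdict(set)
    for uid, _accession, _version, atom in rows:
        grouped[uid].add(atom)
    return dict(grouped)


# A few entries copied from the list above.
sample = """
{92511,MK361035.1:N}
{786399,CP045289.2:H}
{786399,CP045289.2:N}
{786399,CP045289.2:O}
"""

rows = parse_entries(sample)
grouped = atoms_by_uid(rows)
```

Grouping by uid makes it easy to spot the rows, like uid 786399, where more than one atom exceeded the threshold.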

Variation can also be the result of collection and handling problems: "variations are to be expected because collecting DNA sequences is not without processing errors" (e.g. Why a DNA Sample May Fail; cf. A practical guide to mitochondrial DNA error prevention in clinical, forensic, and population genetics).

III. Closing Comments

Here are my comments about these GenBank database members after perusing them to see what makes them outliers for the ~(32/35/25/6) genetic constant:

AC027353, AP015737, AP015755, AP015844, AP015925, AP016285, AP016374, AP016390, AP016492, AP016597, AP016618, LN006378 and AP016639
notes: These are cut-out segments rather than a natural genome.

CP095532
notes: This plasmid has unusual strands of repeat nucleotides "tttttttt" (183), "ttttttttg" (121), "ttttttttgt" (113), "ttttttttgtg" (38) etc. Not sufficiently natural.

FO082333
notes: "Large depth read coverage across a clone" ... not natural enough.

HM640930, JF760210, KX265049, LC533411, LC534895, LC535032, LC535118, LN898113, LN898114, MG655622, MG655623, MG655624, MK361035, MZ636522
notes: Mixing of RNA with DNA without designation of uracil "u" conflates the DNA with RNA. The ~(32/35/25/6) genetic constant relates to DNA.

MF597730, MF597734
notes: "This sequence was generated to improve a reference assembly gap in the mouse reference genome sequence." Not natural, just repair jobs.

MQ014117
notes: "synthetic construct" ... not a natural sequence.

OE848969
notes: too many repeat sequences and odd patterns to be natural.

Only 37 problematic rows out of over a million. NOT BAD!

The next post in this series is here, the previous post in this series is here.
