Friday, August 4, 2023

Chimera Detection

Genieology, Geneology
I. Background

In a recent post I wrote:

"In a future post I will show how it can be used to detect chimera genomes so one doesn't use a chimera genome when only a sound genome is called for, or one does use a chimera genome when analyzing only chimera cases."

(Quantum Biology - 22). Regular readers know that Dredd Blog has considered, for quite a while, instances of generic chimera caused by antibiotics and other toxic chemicals (e.g. Some Of My Best Friends Are Germs - 2; MetaSUB - 3).

The mass media folk like to talk only about the chimera that are 'natural' (whatever that means), but those are, if anything, a 'drop in the bucket' compared with the increasing quantities of the phenomenon.

Lots of software, perhaps even AI, has already been developed or is in the process of being developed (GitHub-1, Improving Long Read Assembly of Microbial Genomes and Plasmids, chimeric reads...cause severe disruption in many single-cell studies, Chimera genetics (Wikipedia), Chimera molecular biology (Wikipedia), Chimera virus (Wikipedia), ChimPipe: Accurate detection ... chimeras from RNA-seq data, How DNA is forensically extracted, Shotgun Sequencing, Github).

If you don't want to spend countless hours perusing one "chimera fixing" software package after another, neither do I (but 'it is a thing': The "It's In Your Genes" Myth - 2).

Even though I am more capable of fathoming them after decades of software engineering work, it is still more trouble than it is worth ("got AI" ... LOL).

II. Chimera Detecting Technique (Counting Atoms)

So, as regular readers know, I am working on a technique that may detect chimera before they put icing on the cake to cover it up.

The appendices today show results that are being used with the atom counting technique (Appendix 1, Appendix 2, Appendix 3, Appendix 4, Appendix 5, Appendix 6, Appendix 7, Appendix 8, Appendix 9).

They show results of analyses of various complete genomes, gene TATA Box areas within those genomes, and "cleaned genes" (TATA Box areas that have been "corrected").

The genomes vary in kind, from modern and ancient human chromosomes to bacteria.

III. How It Works

The nine appendices contain three types of analysis.

One type of analysis counts the atoms in every 'A','C','G', and 'T' nucleotide ("Full Genome") in the relevant genome of the GenBank database; another type of analysis extracts only the gene nucleotides between a promoter and a terminator ("Promoter-Terminator Genes"); and the final type of analysis used in these appendices is of the same genes, but they have had chimera or other invalid codons removed ("Cleaned Genes").
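As a rough illustration of the "Full Genome" style of counting, here is a minimal sketch. It assumes the counts are taken from the molecular formulas of the free nucleobases (adenine C5H5N5, cytosine C4H5N3O, guanine C5H5N5O, thymine C5H6N2O2) and that any character other than A, C, G, or T is simply skipped. The function names are hypothetical, and this is not the software used to produce the appendices.

```python
from collections import Counter

# Atoms per free nucleobase, taken from the molecular formulas.
# Assumption: bases only, with no sugar-phosphate backbone included.
BASE_ATOMS = {
    "A": {"C": 5, "H": 5, "N": 5, "O": 0},  # adenine  C5H5N5
    "C": {"C": 4, "H": 5, "N": 3, "O": 1},  # cytosine C4H5N3O
    "G": {"C": 5, "H": 5, "N": 5, "O": 1},  # guanine  C5H5N5O
    "T": {"C": 5, "H": 6, "N": 2, "O": 2},  # thymine  C5H6N2O2
}


def count_atoms(sequence):
    """Total carbon, hydrogen, nitrogen and oxygen atoms in a sequence."""
    base_counts = Counter(sequence.upper())
    totals = {"C": 0, "H": 0, "N": 0, "O": 0}
    for base, atoms in BASE_ATOMS.items():
        for element, per_base in atoms.items():
            totals[element] += per_base * base_counts[base]
    return totals


def atom_percentages(sequence):
    """Each element's share of all counted atoms, in percent."""
    totals = count_atoms(sequence)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {element: 0.0 for element in totals}
    return {element: 100.0 * n / grand_total for element, n in totals.items()}


if __name__ == "__main__":
    print(count_atoms("ACGT"))
    print(atom_percentages("ACGT"))
```

Under those assumptions, a sequence with equal amounts of A, C, G, and T works out to roughly 32.2% carbon, 35.6% hydrogen, 25.4% nitrogen, and 6.8% oxygen.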

In other words, all three use the same genome to begin with. The difference in outcome is that in the second and third types genes, not single nucleotides, are selected for atom counting: one uses the full gene, and the other uses the gene after bad codons have been removed.
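The "cleaning" step can be sketched as follows. This post does not spell out how a chimera codon is recognized, so the sketch covers only the "other invalid codons" case, and it assumes that a codon is invalid when it contains any character other than A, C, G, or T (for example the ambiguity code N). The function names are hypothetical.

```python
VALID_BASES = set("ACGT")


def split_codons(gene):
    """Split a gene into consecutive 3-letter codons.

    Assumption: a trailing partial codon (1 or 2 leftover bases) is dropped.
    """
    gene = gene.upper()
    usable = len(gene) - len(gene) % 3
    return [gene[i:i + 3] for i in range(0, usable, 3)]


def clean_gene(gene):
    """Keep only codons made up entirely of A, C, G and T."""
    kept = [codon for codon in split_codons(gene) if set(codon) <= VALID_BASES]
    return "".join(kept)


if __name__ == "__main__":
    # The NNN codon and the trailing "TA" are removed.
    print(clean_gene("ATGNNNGGCTA"))
```

The cleaned gene can then be atom counted the same way as the full gene, which is what makes the "Promoter-Terminator Genes" and "Cleaned Genes" tables comparable.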

The result is that, in the case of the gene analyses, not as many genes are valid, so there are fewer "Table" printouts of gene atom counts.

If you haven't already, see the MIT video below for an example of transcription; it lays out the dynamics that take place in DNA/RNA.

The variations in the tables are caused by the construction of the genomes, which can be affected by improper collection of samples, improper analysis by the researchers, database errors, or degradation of the samples prior to their being collected.

So far then, the ~32/35/25/6 fingerprint pattern is not a slam dunk way of detecting chimera, but it offers clues (see episode two).
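One hedged way to treat the fingerprint as a clue rather than a verdict is to measure how far a table's percentages sit from the ~32/35/25/6 pattern. The sketch below assumes the four numbers are percentages of carbon, hydrogen, nitrogen, and oxygen, in that order, which is my reading and not something stated above. The function names are hypothetical, and no cutoff is proposed because this post does not give one.

```python
# Reference fingerprint from the ~32/35/25/6 pattern.
# Assumption: the order is carbon / hydrogen / nitrogen / oxygen, in percent.
REFERENCE = {"C": 32.0, "H": 35.0, "N": 25.0, "O": 6.0}


def fingerprint_deviation(percentages, reference=REFERENCE):
    """Signed difference (observed minus reference) for each element."""
    return {element: percentages[element] - reference[element]
            for element in reference}


def max_deviation(percentages, reference=REFERENCE):
    """Largest absolute difference from the reference, in percentage points."""
    deviations = fingerprint_deviation(percentages, reference)
    return max(abs(d) for d in deviations.values())


if __name__ == "__main__":
    observed = {"C": 30.0, "H": 35.0, "N": 27.5, "O": 7.5}
    print(fingerprint_deviation(observed))
    print(max_deviation(observed))
```

A large deviation only flags a table for a closer look; as noted above, it could equally come from sample collection, analysis, or database problems.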

I want to try this on some SARS-CoV-2 viruses in a future post of this series.

The next post in this series is here.


