Sunday, June 28, 2026

Human DNA Found In 2-3 Mya eDNA?

Ancient human DNA found
on cave art for the first time

I. Foreword

A Nature paper from a while back (A 2-million-year-old ecosystem in Greenland uncovered by environmental DNA) discussed environmental DNA (eDNA).

I also looked at a more recent paper concerning eDNA:

Highlights
• eDNA is a genetic mixture of different living organisms often referred to as a genetic shake.
• eDNA approach is an environmentally friendly and non-invasive survey method.
• eDNA represents all DNA present in the environment derived from biological tissue fragments and excrement.
• eDNA metabarcoding characterizes the DNA of various species from a single eDNA sample.

(A systematic review on environmental DNA (eDNA) Science). The ancient eDNA interested me most (cf. Abiology Or Quantum Biology? - 5), so I asked myself: why not look for "human eDNA"?

II. A Combo? 

The title of this series names the interest I wanted to share with Dredd Blog readers after I looked into the eDNA files from the Greenland project mentioned in the Nature paper above (located here).

I have downloaded about half of those files so far, and have processed the file GCF_015227675.2_mRatBN7.2_genomic.fna.gz to extract FASTA files for chromosomes 1-20, X, and Y from it.
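As a rough illustration of that extraction step, here is a minimal Python sketch that splits a multi-record FASTA stream into (header, sequence) pairs. It is not the program I actually used; the function names read_fasta and read_fasta_gz are hypothetical, and it assumes each record starts with a '>' header line followed by sequence lines.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) for each record in a FASTA stream."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # drop the leading '>'
        elif header is not None:
            chunks.append(line.upper())  # normalize soft-masked lowercase
    if header is not None:
        yield header, "".join(chunks)


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Same as read_fasta, but reads a gzip-compressed file (hypothetical helper)."""
    with gzip.open(path, "rt") as handle:
        yield from read_fasta(handle)


# Tiny in-memory demonstration with made-up records.
demo = [">chr1 demo", "ACGT", "acgt", ">chr2 demo", "TTTT"]
records = list(read_fasta(demo))
```

Each record could then be written to its own file, one per chromosome. Note that this sketch holds a whole chromosome in memory at once.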

Next I loaded my modern human DNA files from my SQL database with a program I wrote.

It builds a single "string" out of that human DNA for quicker searches (the string is over 50 megabytes long).
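A sketch of how such a search string might work, in Python. The '|' delimiter is an assumption based on the seg_str values shown in appendix A2, and the names build_search_string and contains_segment are hypothetical; this is not the actual program.

```python
def build_search_string(segments):
    """Join segments into one string, with '|' before and after every segment."""
    return "|" + "|".join(segments) + "|"


def contains_segment(search_string, segment):
    """True only if the whole segment is present, never a partial one."""
    return f"|{segment}|" in search_string


# Made-up segments for illustration only.
human = build_search_string(["ATG*GCC*TGA", "ATG*TTT*CCC*TAA"])
whole = contains_segment(human, "ATG*GCC*TGA")
partial = contains_segment(human, "GCC*TGA")
```

Wrapping the probe in delimiters means a fragment of a longer segment is not reported as a match.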

That program then loads SQL tables composed of each of the chromosome files extracted from the GCF_015227675.2_mRatBN7.2 file.

I can report that four human DNA matches were found in the 2-3 million year old eDNA files (see appendices A1 and A2).

III. First Things First

When searching for sections of DNA in other DNA, the software prepares the nucleotides so that partial segments can't be confused one for the other (see the example in the video below).

First you isolate a TATA box section by finding a promoter (TATAAA) and a terminator (TATCTC) segment:

TATAAAATGGCCGAGCGGTCTAAGGCGCTGC
GTTCAGGTCGCAGTCTCCCCTGGAGGCGTGG
GTTCGAATCCCACTCCTGATATCTCTATCTC 

Next, erase the promoter and terminator from the string of nucleotides:

ATGGCCGAGCGGTCTAAGGCGCTGC
GTTCAGGTCGCAGTCTCCCCTGGAGGCGTGG
GTTCGAATCCCACTCCTGA
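The two steps above can be sketched in Python as follows. This is an illustration under assumptions, not my actual code: the function name isolate_segment is hypothetical, and it keeps only what lies between the first promoter and the first terminator that follows it.

```python
PROMOTER = "TATAAA"
TERMINATOR = "TATCTC"


def isolate_segment(strand, promoter=PROMOTER, terminator=TERMINATOR):
    """Return the nucleotides between the first promoter and the next
    terminator, or None if either marker is missing."""
    start = strand.find(promoter)
    if start == -1:
        return None
    body_start = start + len(promoter)
    end = strand.find(terminator, body_start)
    if end == -1:
        return None
    return strand[body_start:end]


# The three lines from the example above, joined into one strand.
strand = (
    "TATAAAATGGCCGAGCGGTCTAAGGCGCTGC"
    "GTTCAGGTCGCAGTCTCCCCTGGAGGCGTGG"
    "GTTCGAATCCCACTCCTGATATCTCTATCTC"
)
segment = isolate_segment(strand)
```

For the example strand, the result is the 75-nucleotide string shown above, with the leading TATAAA and the trailing TATCTC repeats gone.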

Next, go through the strand sequentially, placing a '*' after every third nucleotide (the line breaks of the listing above are kept, so a codon may span two lines):

ATG*GCC*GAG*CGG*TCT*AAG*GCG*CTG*C
GT*TCA*GGT*CGC*AGT*CTC*CCC*TGG*AGG*CGT*GG
G*TTC*GAA*TCC*CAC*TCC*TGA
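A minimal Python sketch of that marking step, with the three lines treated as one continuous strand. The function name mark_codons is hypothetical.

```python
def mark_codons(segment, sep="*"):
    """Insert a separator after every third nucleotide."""
    return sep.join(segment[i:i + 3] for i in range(0, len(segment), 3))


segment = (
    "ATGGCCGAGCGGTCTAAGGCGCTGC"
    "GTTCAGGTCGCAGTCTCCCCTGGAGGCGTGG"
    "GTTCGAATCCCACTCCTGA"
)
marked = mark_codons(segment)
```

On the example strand this reproduces the second chromosome13 seg_str value in appendix A2, minus the surrounding '|' characters.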

Then you look within that segment for a start codon (ATG) and one of three stop codons (TAA, TGA, TAG).

The ATG begins the translated sequence, and the first stop codon that follows the start codon in-frame ends the in-frame string of nucleotides.
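Here is a small Python sketch of that start-to-stop scan. It is an assumption about one reasonable way to do it, not my actual program; the name first_orf is hypothetical. It steps three nucleotides at a time from the first ATG, so stop codons that are out of frame are ignored.

```python
START = "ATG"
STOPS = {"TAA", "TGA", "TAG"}


def first_orf(segment):
    """Return the stretch from the first ATG through the first in-frame
    stop codon, or None if there is no ATG or no in-frame stop."""
    start = segment.find(START)
    if start == -1:
        return None
    # Walk codon by codon, beginning with the codon after ATG.
    for i in range(start + 3, len(segment) - 2, 3):
        if segment[i:i + 3] in STOPS:
            return segment[start:i + 3]
    return None


inside = first_orf("CCATGAAATAGGG")       # ATG AAA TAG
skips_shifted = first_orf("ATGATAGCCTAA")  # the TAG at offset 4 is out of frame
```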

I do this to avoid what is described as an "out of frame" event, where the codon boundaries shift and every codon after the shift is misread.

If any group between '*' characters does not contain exactly three nucleotides, then the strand is "not in-frame".
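That check can be sketched in a few lines of Python. The function name is_in_frame is hypothetical, and tolerating optional '|' characters at the ends is an assumption based on the stored values in appendix A2.

```python
def is_in_frame(marked, sep="*"):
    """True if every group between separators is exactly three of A, C, G, T."""
    codons = marked.strip("|").split(sep)
    return all(len(c) == 3 and set(c) <= set("ACGT") for c in codons)


good = is_in_frame("|ATG*GCC*TGA|")
bad = is_in_frame("ATG*GC*TGA")  # middle group has only two nucleotides
```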

I store them in the SQL database with the '*' characters in place to preserve the proper arrangement. 
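As an illustration of storing and looking up such strings, here is a Python sketch. It uses the standard library's sqlite3 module as a stand-in for my MySQL database, so it is an assumption, not my actual setup. The table and column names (chromosome13, uid, seg_str) mirror the queries in appendix A2; the helper names are hypothetical.

```python
import sqlite3


def make_table(conn, table):
    # The table name is interpolated directly, so it must come from trusted code.
    conn.execute(f"CREATE TABLE {table} (uid INTEGER PRIMARY KEY, seg_str TEXT)")


def add_segment(conn, table, seg_str):
    """Store one marked segment and return its uid."""
    cur = conn.execute(f"INSERT INTO {table} (seg_str) VALUES (?)", (seg_str,))
    return cur.lastrowid


def find_uid(conn, table, seg_str):
    """Exact-match lookup, like the SELECT statements in appendix A2."""
    row = conn.execute(
        f"SELECT uid FROM {table} WHERE seg_str = ?", (seg_str,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
make_table(conn, "chromosome13")
uid = add_segment(conn, "chromosome13", "|ATG*GCC*TGA|")  # made-up segment
found = find_uid(conn, "chromosome13", "|ATG*GCC*TGA|")
missing = find_uid(conn, "chromosome13", "|ATG*GCC*TAA|")
```

Because the '*' characters are stored with the nucleotides, an exact string comparison also compares the codon boundaries.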

IV. Rats!

Looking up the sequence numbers found in file GCF_015227675.2_mRatBN7.2 raises significant questions.

One is: how is Norway rat DNA, sequenced from 2-3 million year old eDNA found in a frozen realm beneath the Greenland ice cap, related to these human DNA segments (see appendices A1, A2, A3)?

But I digress; on to the next huge DNA file from Greenland.




A3

This is an appendix to: Human DNA Found In 2-3 Mya eDNA?


>NC_051347.1 Rattus norvegicus strain BN/NHsdMcwi chromosome 12, mRatBN7.2, whole genome shotgun sequence
comment @ gbff file:

Rattus norvegicus strain BN/NHsdMcwi chromosome 12, mRatBN7.2, whole genome shotgun sequence

NCBI Reference Sequence: NC_051347.1

    Record suppressed. This RefSeq record was removed as a result of standard genome annotation processing. Contact info@ncbi.nlm.nih.gov for further information.


LOCUS       NC_051347           46669029 bp    DNA     linear   CON 11-JUN-2023
DEFINITION  Rattus norvegicus strain BN/NHsdMcwi chromosome 12, mRatBN7.2,
            whole genome shotgun sequence.
ACCESSION   NC_051347
VERSION     NC_051347.1
DBLINK      BioProject: PRJNA677964
            BioSample: SAMN16261960
            Assembly: GCF_015227675.2
KEYWORDS    WGS; RefSeq.
SOURCE      Rattus norvegicus (Norway rat)
  ORGANISM  Rattus norvegicus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha;
            Muroidea; Muridae; Murinae; Rattus.
COMMENT     REFSEQ INFORMATION: The reference sequence is identical to
            CM026985.1.
            Assembly name: mRatBN7.2
            The genomic sequence for this RefSeq record is from the
            whole-genome assembly released by the Wellcome Sanger Institute on
            2020/11/10. The original whole-genome shotgun project has the
            accession JACYVU000000000.1.
            
************************************************************************************            
related comments:
Rattus norvegicus (rat) is a species of rodent in the family Muridae that is widely used as an experimental model organism. 
https://www.ncbi.nlm.nih.gov/nuccore/?term=Rattus+norvegicus

Rattus norvegicus
Norway rat (Rattus norvegicus) is a species of rodent in the family Muridae that is widely used as an experimental model organism.

NCBI Taxonomy ID
    10116
Taxonomic rank
    species
Current scientific name
    Rattus norvegicus
Basionym
    Mus norvegicus Berkenhout, 1769
Common name
    Norway rat

A2

This is an appendix to: Human DNA Found In 2-3 Mya eDNA?



mysql> select uid from chromosome09 where seg_str='|ATG*TTT*CCC*TTC*ATA*CAT*CAC*GTG*ACT*CTA*TTC*CTT*GTG*AAC*ATC*AGC*TAA|';
+-------+
| uid   |
+-------+
| 44739 |
+-------+
mysql> select uid from chromosome13 where seg_str='|ATG*GCC*GAG*CGG*TCT*AAG*GCG*CTG*CGT*TCA*GGT*CGC*AGT*CTC*CCC*TGG*AGG*CGT*GGG*TTC*GAA*TCC*CAC*TCC*TGA|';
+-------+
| uid   |
+-------+
| 57294 |
+-------+
mysql> select uid from chromosome13 where seg_str='|ATG*TTC*AGT*GGA*ATC*CAA*CAG*GAG*AGG*AGG*TGT*GAA*CAG*GTG*CAG*GGT*TGC*TGA|';
+-------+
| uid   |
+-------+
| 60989 |
+-------+
mysql> select uid from chromosome13 where seg_str='|ATG*GGA*GTC*CAT*GGG*GTC*TCG*TTA*TAT*ATA*ATG*GTT*GGT*TTA*ATT*TGT*TTC*ACT*TCT*ATC*TTG*TTA*ATT*GTA*AAT*TGA|';
+-------+
| uid   |
+-------+
| 61010 |
+-------+

***************************************************************************************

mysql> select uid from human_mrna_segs where mrna_segs='ATG*TTT*CCC*TTC*ATA*CAT*CAC*GTG*ACT*CTA*TTC*CTT*GTG*AAC*ATC*AGC*TAA|';
+-------+
| uid   |
+-------+
| 94962 |
+-------+
mysql> select uid from human_mrna_segs where mrna_segs='ATG*GGA*GTC*CAT*GGG*GTC*TCG*TTA*TAT*ATA*ATG*GTT*GGT*TTA*ATT*TGT*TTC*ACT*TCT*ATC*TTG*TTA*ATT*GTA*AAT*TGA|';
+--------+
| uid    |
+--------+
| 111723 |
+--------+
mysql> select uid from human_mrna_segs where mrna_segs='ATG*TTC*AGT*GGA*ATC*CAA*CAG*GAG*AGG*AGG*TGT*GAA*CAG*GTG*CAG*GGT*TGC*TGA|';
+--------+
| uid    |
+--------+
| 111661 |
+--------+
mysql> select uid from human_mrna_segs where mrna_segs='ATG*GCC*GAG*CGG*TCT*AAG*GCG*CTG*CGT*TCA*GGT*CGC*AGT*CTC*CCC*TGG*AGG*CGT*GGG*TTC*GAA*TCC*CAC*TCC*TGA|';
+--------+
| uid    |
+--------+
| 279761 |
+--------+





A1

This is an appendix to: Human DNA Found In 2-3 Mya eDNA?



NW_015495299.1 Homo sapiens (human) DNA found in the ancient Greenland files |ATG*TTT*CCC*TTC*ATA*CAT*CAC*GTG*ACT*CTA*TTC*CTT*GTG*AAC*ATC*AGC*TAA|
comment in nuccore gbff file:
LOCUS       NW_015495299          535088 bp    DNA     linear   CON 06-AUG-2025
DEFINITION  Homo sapiens chromosome 2 genomic patch of type NOVEL, GRCh38.p14
            PATCHES HSCHR2_6_CTG7_2.
ACCESSION   NW_015495299
VERSION     NW_015495299.1
DBLINK      BioProject: PRJNA168
            Assembly: GCF_000001405.40
KEYWORDS    RefSeq; NOVEL_PATCH.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
COMMENT     REFSEQ INFORMATION: The reference sequence is identical to
            KQ983256.1.
            In this scaffold, AC225768.2 contains a 4608 bp insertion relative
            to GRCh38 chromosome 2 (NC_000002.12) representing novel sequence
            not present in GRCh38.
            Region: REGION233.
            Assembly Name: GRCh38.p14 PATCHES
            The DNA sequence is composed of genomic sequence, primarily
            finished clones that were sequenced as part of the Human Genome
            Project. PCR products and WGS shotgun sequence have been added
            where necessary to fill gaps or correct errors. All such additions
            are manually curated by GRC staff. For more information see:
            https://genomereference.org.
**********************************************************************************************
NW_025791780.1 Homo sapiens (human) DNA found in the ancient Greenland files |ATG*GCC*GAG*CGG*TCT*AAG*GCG*CTG*CGT*TCA*GGT*CGC*AGT*CTC*CCC*TGG*AGG*CGT*GGG*TTC*GAA*TCC*CAC*TCC*TGA|
comment in nuccore gbff file:
LOCUS       NW_025791780          383128 bp    DNA     linear   CON 06-AUG-2025
DEFINITION  Homo sapiens chromosome 6 genomic patch of type NOVEL, GRCh38.p14
            PATCHES HSCHR6_1_CTG1.
ACCESSION   NW_025791780
VERSION     NW_025791780.1
DBLINK      BioProject: PRJNA168
            Assembly: GCF_000001405.40
KEYWORDS    RefSeq; NOVEL_PATCH.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
COMMENT     REFSEQ INFORMATION: The reference sequence is identical to
            MU273357.1.
            This scaffold includes CHM1 clone AC270753.1, which contains an
            approximately 12kbp inversion haplotype relative to GRCh38 chr 6.
            Region: REGION320.
            Assembly Name: GRCh38.p14 PATCHES
            The DNA sequence is composed of genomic sequence, primarily
            finished clones that were sequenced as part of the Human Genome
            Project. PCR products and WGS shotgun sequence have been added
            where necessary to fill gaps or correct errors. All such additions
            are manually curated by GRC staff. For more information see:
            https://genomereference.org.
****************************************************************************
NT_187519.1 Homo sapiens (human) DNA found in the ancient Greenland files |ATG*TTC*AGT*GGA*ATC*CAA*CAG*GAG*AGG*AGG*TGT*GAA*CAG*GTG*CAG*GGT*TGC*TGA|
comment in nuccore gbff file:
LOCUS       NT_187519             911658 bp    DNA     linear   CON 06-AUG-2025
DEFINITION  Homo sapiens chromosome 1 genomic scaffold, GRCh38.p14 alternate
            locus group ALT_REF_LOCI_1 HSCHR1_3_CTG32_1.
ACCESSION   NT_187519
VERSION     NT_187519.1
DBLINK      BioProject: PRJNA168
            Assembly: GCF_000001405.40
KEYWORDS    RefSeq; ALTERNATE_LOCUS.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 911658)
  AUTHORS   Genovese,G., Handsaker,R.E., Li,H., Kenny,E.E. and McCarroll,S.A.
  TITLE     Mapping the human reference genome's missing sequence by three-way
            admixture in Latino genomes
  JOURNAL   Am. J. Hum. Genet. 93 (3), 411-421 (2013)
   PUBMED   23932108
REFERENCE   2  (bases 1 to 911658)
  AUTHORS   Genovese,G., Handsaker,R.E., Li,H., Altemose,N., Lindgren,A.M.,
            Chambert,K., Pasaniuc,B., Price,A.L., Reich,D., Morton,C.C.,
            Pollak,M.R., Wilson,J.G. and McCarroll,S.A.
  TITLE     Using population admixture to help complete maps of the human
            genome
  JOURNAL   Nat. Genet. 45 (4), 406-414 (2013)
   PUBMED   23435088
REFERENCE   3  (bases 1 to 911658)
  CONSRTM   International Human Genome Sequencing Consortium
  TITLE     Finishing the euchromatic sequence of the human genome
  JOURNAL   Nature 431 (7011), 931-945 (2004)
   PUBMED   15496913
REFERENCE   4  (bases 1 to 911658)
  AUTHORS   Lander,E.S., Linton,L.M., Birren,B., Nusbaum,C., Zody,M.C.,
            Baldwin,J., Devon,K., Dewar,K., Doyle,M., FitzHugh,W., Funke,R.,
            Gage,D., Harris,K., Heaford,A., Howland,J., Kann,L., Lehoczky,J.,
            LeVine,R., McEwan,P., McKernan,K., Meldrim,J., Mesirov,J.P.,
            Miranda,C., Morris,W., Naylor,J., Raymond,C., Rosetti,M.,
            Santos,R., Sheridan,A., Sougnez,C., Stange-Thomann,N.,
            Stojanovic,N., Subramanian,A., Wyman,D., Rogers,J., Sulston,J.,
            Ainscough,R., Beck,S., Bentley,D., Burton,J., Clee,C., Carter,N.,
            Coulson,A., Deadman,R., Deloukas,P., Dunham,A., Dunham,I.,
            Durbin,R., French,L., Grafham,D., Gregory,S., Hubbard,T.,
            Humphray,S., Hunt,A., Jones,M., Lloyd,C., McMurray,A., Matthews,L.,
            Mercer,S., Milne,S., Mullikin,J.C., Mungall,A., Plumb,R., Ross,M.,
            Shownkeen,R., Sims,S., Waterston,R.H., Wilson,R.K., Hillier,L.W.,
            McPherson,J.D., Marra,M.A., Mardis,E.R., Fulton,L.A.,
            Chinwalla,A.T., Pepin,K.H., Gish,W.R., Chissoe,S.L., Wendl,M.C.,
            Delehaunty,K.D., Miner,T.L., Delehaunty,A., Kramer,J.B., Cook,L.L.,
            Fulton,R.S., Johnson,D.L., Minx,P.J., Clifton,S.W., Hawkins,T.,
            Branscomb,E., Predki,P., Richardson,P., Wenning,S., Slezak,T.,
            Doggett,N., Cheng,J.F., Olsen,A., Lucas,S., Elkin,C.,
            Uberbacher,E., Frazier,M., Gibbs,R.A., Muzny,D.M., Scherer,S.E.,
            Bouck,J.B., Sodergren,E.J., Worley,K.C., Rives,C.M., Gorrell,J.H.,
            Metzker,M.L., Naylor,S.L., Kucherlapati,R.S., Nelson,D.L.,
            Weinstock,G.M., Sakaki,Y., Fujiyama,A., Hattori,M., Yada,T.,
            Toyoda,A., Itoh,T., Kawagoe,C., Watanabe,H., Totoki,Y., Taylor,T.,
            Weissenbach,J., Heilig,R., Saurin,W., Artiguenave,F., Brottier,P.,
            Bruls,T., Pelletier,E., Robert,C., Wincker,P., Smith,D.R.,
            Doucette-Stamm,L., Rubenfield,M., Weinstock,K., Lee,H.M.,
            Dubois,J., Rosenthal,A., Platzer,M., Nyakatura,G., Taudien,S.,
            Rump,A., Yang,H., Yu,J., Wang,J., Huang,G., Gu,J., Hood,L.,
            Rowen,L., Madan,A., Qin,S., Davis,R.W., Federspiel,N.A.,
            Abola,A.P., Proctor,M.J., Myers,R.M., Schmutz,J., Dickson,M.,
            Grimwood,J., Cox,D.R., Olson,M.V., Kaul,R., Raymond,C., Shimizu,N.,
            Kawasaki,K., Minoshima,S., Evans,G.A., Athanasiou,M., Schultz,R.,
            Roe,B.A., Chen,F., Pan,H., Ramser,J., Lehrach,H., Reinhardt,R.,
            McCombie,W.R., de la Bastide,M., Dedhia,N., Blocker,H.,
            Hornischer,K., Nordsiek,G., Agarwala,R., Aravind,L., Bailey,J.A.,
            Bateman,A., Batzoglou,S., Birney,E., Bork,P., Brown,D.G.,
            Burge,C.B., Cerutti,L., Chen,H.C., Church,D., Clamp,M.,
            Copley,R.R., Doerks,T., Eddy,S.R., Eichler,E.E., Furey,T.S.,
            Galagan,J., Gilbert,J.G., Harmon,C., Hayashizaki,Y., Haussler,D.,
            Hermjakob,H., Hokamp,K., Jang,W., Johnson,L.S., Jones,T.A.,
            Kasif,S., Kaspryzk,A., Kennedy,S., Kent,W.J., Kitts,P.,
            Koonin,E.V., Korf,I., Kulp,D., Lancet,D., Lowe,T.M., McLysaght,A.,
            Mikkelsen,T., Moran,J.V., Mulder,N., Pollara,V.J., Ponting,C.P.,
            Schuler,G., Schultz,J., Slater,G., Smit,A.F., Stupka,E.,
            Szustakowski,J., Thierry-Mieg,D., Thierry-Mieg,J., Wagner,L.,
            Wallis,J., Wheeler,R., Williams,A., Wolf,Y.I., Wolfe,K.H.,
            Yang,S.P., Yeh,R.F., Collins,F., Guyer,M.S., Peterson,J.,
            Felsenfeld,A., Wetterstrand,K.A., Patrinos,A., Morgan,M.J., de
            Jong,P., Catanese,J.J., Osoegawa,K., Shizuya,H., Choi,S. and
            Chen,Y.J.
  CONSRTM   International Human Genome Sequencing Consortium
  TITLE     Initial sequencing and analysis of the human genome
  JOURNAL   Nature 409 (6822), 860-921 (2001)
   PUBMED   11237011
  REMARK    Erratum:[Nature 2001 Aug 2;412(6846):565]
COMMENT     REFSEQ INFORMATION: The reference sequence is identical to
            KI270763.1.
            An ALT_LOCI has been created to capture sequence from AC225099.4
            present in the 1kG ph1 decoy but absent from the Reference
            Assembly. Data used in the assembly of this scaffold were
            contributed in part by the authors of PMID: 23932108 (Mapping the
            Human Reference Genome's Missing Sequence by Three-Way Admixture in
            Latino Genomes. Genovese G, Handsaker RE, Li H, Kenny EE, McCarroll
            SA.Am J Hum Genet. 2013) and PMID:23435088 (Using population
            admixture to help complete maps of the human genome.Genovese G,
            Handsaker RE, Li H, Altemose N, Lindgren AM, Chambert K, Pasaniuc
            B, Price AL, Reich D, Morton CC, Pollak MR, Wilson JG, McCarroll
            SA.Nat Genet. 2013 Apr;45(4):406-14).
            Region: REGION111.
            Assembly Name: GRCh38.p14 ALT_REF_LOCI_1
            The DNA sequence is composed of genomic sequence, primarily
            finished clones that were sequenced as part of the Human Genome
            Project. PCR products and WGS shotgun sequence have been added
            where necessary to fill gaps or correct errors. All such additions
            are manually curated by GRC staff. For more information see:
            https://genomereference.org.

******************************************************************************
NT_187519.1 Homo sapiens (human) DNA found in the ancient Greenland files |ATG*GGA*GTC*CAT*GGG*GTC*TCG*TTA*TAT*ATA*ATG*GTT*GGT*TTA*ATT*TGT*TTC*ACT*TCT*ATC*TTG*TTA*ATT*GTA*AAT*TGA|
(same gbff comments as for NT_187519.1 just above)