Tuesday, March 15, 2022

It's In The GenBank - 4

Fig. 1 Transcription: DNA~>mRNA

I. Background

Dredd Blog posts have featured genes of microbes and viruses for years (On the Origin of the Genes of Viruses, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17).

The general focus and effort is to find details less covered by mainstream media, including not only those which may impact disease outbreaks locally, but also those that may become pandemics (On The Origin Of The Home Of COVID-19, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27).

II. Lately

I have been trying to persuade microbiologists, virologists, and government agencies (e.g. NCBI GenBank) to correct errors in the format used for genome nucleotide (base pair) sequences (It's In The GenBank, 2, 3; Small Things Considered).

My requests for a discussion made on 'scientific blogs' tend to be either marked as spam, deleted, delayed or otherwise 'disappeared'.

But, lo and behold, honest, full, professional, and friendly consideration was given by the esteemed and helpful NCBI (of the dot gov realm) when Eric Cox forwarded my email to Wayne Matten:

[my email to Customer services was forwarded to a specialist]

NCBI Customer Services Division NCBI | NLM | NIH 

Case Information:
Case #: CAS-867526-D3B5Z4
Customer Name: unknown unknown
Customer Email: dreddblog@gmail.com
Case Created: 3/11/2022, 10:59:57 AM
Summary: Feedback via Datasets feedback form
Details:

Hi Helpdesk,

We got the following feedback via the Datasets feedback button. Since this is not a Datasets issue but rather an NCBI policy (and I don't know the answer!), I'm forwarding to you guys.

[my email:] "I have tried and tried to communicate with anyone having information about why RNA genomes are not represented (in GenBank GBFF or FASTA files) with a 'U' (Uracil) nucleotide, but are instead represented by a 'T' (Thymine) nucleotide. see https://blogdredd.blogspot.com/2022/02/its-in-genbank-3.html (email: dreddblog@gmail.com)"

Thanks! Eric Cox 

[this is the result which was sent back to Dredd Blog]

Re: case #CAS-867526-D3B5Z4: Feedback via Datasets feedback form TRACKING:000270000004949

Hello, Thank you for your question. The short answer is that this is the convention for GenBank and the other members of the INSDC, http://www.insdc.org/ [https://www.insdc.org/documents/feature-table#7.1.2] 

The practical explanation is that RNA sequences are generated via cDNA technology, so all U's are replaced by T's. 

The cDNA sequences are then submitted to one of the INSDC members. Directly sequencing RNA is possible, although on a smaller scale than DNA sequencing, yet the INSDC maintains the convention, most likely to simplify data management. 

Best regards, Wayne 


Wayne Matten, PhD 
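The convention described in the reply above amounts to a one-letter substitution, which can be sketched in a few lines of Python. This is only an illustration, not NCBI's code: the function names `to_rna_letters` and `to_insdc_letters` are hypothetical, and the sketch assumes sequences written with the letters a, c, g, t, u (either case) and optional spaces, as in the Appendix listings.

```python
# Hypothetical helpers (not part of any NCBI tool) illustrating the
# INSDC convention: RNA records are stored with 't' where the molecule has 'u'.

def to_rna_letters(seq: str) -> str:
    """Show a stored sequence with uracil letters (t -> u, T -> U)."""
    return seq.translate(str.maketrans("tT", "uU"))


def to_insdc_letters(seq: str) -> str:
    """Return to the stored convention (u -> t, U -> T)."""
    return seq.translate(str.maketrans("uU", "tT"))


# First 20 letters of the current NC_026431.1 listing in the Appendix.
stored = "atgagtcttc taaccgaggt"
shown = to_rna_letters(stored)
print(shown)

# Round trip: converting back recovers the stored letters.
assert to_insdc_letters(shown) == stored
```

Only the letter t is touched in this sketch; every other letter, and the order of the letters, is left as stored.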

At least the .gov sites did not censor Dredd Blog, or mark my emails as spam, which was a pleasant surprise.

I responded as follows:

Hello Wayne,

Thank you kindly for your response.

The problem I have is that RNA does not contain 'T' (thymine) and DNA does not contain 'U' uracil (NIH, Some Of My Best Friends Are Germs).

When we place the 'T' in a database of RNA nucleotides we are promoting false information.

Attached are some examples in a .zip file of what they would look like if corrected.


Wayne replied,

Hello Dredd,
I see your point of view. Another way to look at it is, since researchers submit their sequences with T's in place of U's, we would be misrepresenting their data by replacing the T's with U's.
The reason researchers submit sequences the way they do is because that is the convention. So within the constraints of the convention, the data are not false.
Here is a bit of official documentation:
Best regards,

 I replied:

That is not the whole picture. Just misrepresenting 'U' uracil for 'T' thymine is not the only falsification. The sequence itself changes. The positions of the base pairs change. "We've been doing it wrong so long that makes it right" is not a practice I subscribe to. Here is a paper by a serious scientist who knows a 'U' from a 'T' (Mutational Analysis of the Influenza Virus cRNA Promoter and Identification of Nucleotides Critical for Replication). If he sends a GBFF file, or FASTA, he will send the right letters (don't reject his data for that).

The .zip file I sent you shows how the locations of the molecules change during translation.

I know you can't do anything about it, so thanks for responding anyway.

Some example files that were in the .zip file I sent to Wayne are in today's Appendix.
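Comparing the first pair of listings in the Appendix letter by letter, the "proposed" lines appear to be the base-by-base complement of the "current" lines, written with 'u' instead of 't' (a pairs with u, c pairs with g), in the same left-to-right order. A minimal Python sketch of that relationship follows. The function name `complement_as_rna` is hypothetical, and this is a reading of the listings, not a description of whatever tool produced the files.

```python
# Hypothetical helper: complement each base and write the result with RNA letters.
# Assumption: input uses the stored DNA-style alphabet (a, c, g, t), either case;
# spaces and any other characters pass through unchanged.
_COMPLEMENT_RNA = str.maketrans("acgtACGT", "ugcaUGCA")


def complement_as_rna(seq: str) -> str:
    """Base-by-base complement (no reversal), written with 'u' for uracil."""
    return seq.translate(_COMPLEMENT_RNA)


# Opening letters of the two NC_026431.1 listings in the Appendix.
current = "atgagtcttc taaccgaggt cgaaacgtac"
proposed = "uacucagaag auuggcucca gcuuugcaug"
assert complement_as_rna(current) == proposed
```

This differs from a plain t-to-u substitution: here every letter changes, not only the t's.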

III. Closing Comments

The graphic at Fig. 1 reveals how RNA is depicted in drawings intended to show the chemical makeup of RNA ('U' for uracil).

Fig. 2 GenBank can do this too
The graphic at Fig. 2 shows how serious scientific papers use that same technique to reveal deep-down RNA genomic dynamics ('U' for uracil).

In the case of the scientific paper (see link at Fig. 2), if 'T' for thymine had been used instead of 'U' for uracil, the paper would not have been published in The Journal of Virology.

I don't think GenBank (a very good source of data) should continue its practice of using a 'T' in place of the 'U' when placing RNA data in its vast database.

We must take this and related issues very seriously:

"Assembly pipelines often result in viral genomes contaminated with host genetic material, some of which are currently deposited into public databases ... For years now the study of viruses and their genetic composition has been important in their identification and classification. Especially in these times of the pandemic turmoil, accurate knowledge of a virus’ exact genetic composition can help identify its strengths and weaknesses allowing us to track its evolution and assist in the development of vaccines and antiviral agents. The reconstruction of these genomic sequences is called the assembly process, a bioinformatics approach which can be complicated and full of pitfalls. This work identifies one such issue, concerning artifacts introduced in viral genomes from the new technologies of nucleic acid sequencing. The proposed algorithm helps alleviate this problem by tentatively removing these problematic regions while keeping the vast majority of the genetic information required to produce a more complete viral genome. This work is anticipated to assist in the submission of higher integrity and accuracy viral genomes in public databases used for novel virus identification and characterization ... Open databases, such as GenBank, while they contain most of the current sequences, are poorly curated regarding the integrity and the accuracy of the submitted sequences. Indeed, various reports have highlighted contamination of sequences with bacterial moieties that were erroneously incorporated in the final assembly. Such errors are especially important as they result in false positive identification of viruses that happen to contain in their proposed viral genomes parts of host DNA or RNA."

(ZWA: Viral genome assembly and characterization hindrances from virus-host chimeric reads; a refining approach, emphasis added; cf. Some Of My Best Friends Are Germs - 2).

"Parts is parts" is not the way to deal with this.

On another front, Dredd Blog recently pointed out that the IPCC's latest report is critical of the way officialdom reacts to problems that are a concern to the public, calling it a "crime" (How Microbes Communicate In The Tiniest Language - 3).

The next post in this series is here; the previous post in this series is here.

Welcome to Wayne's World:

Appendix NTB

This is an appendix to: It's In The GenBank - 4
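The listings that follow share a simple layout: a header line beginning with '>' that carries an accession, a declared length such as 982bp or 1,180bp, and a description, followed by lines of ten-letter groups separated by spaces. A minimal Python sketch for reading one such record and comparing the declared length with the letters actually present is given here. The name `parse_listing` is hypothetical, and the regular expression assumes the header shape used in this Appendix, not the full FASTA or GBFF specifications.

```python
import re


def parse_listing(text: str) -> tuple[str, int, str]:
    """Return (accession, declared_length, sequence) for one Appendix-style record."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header, body = lines[0], lines[1:]
    # Header shape assumed from this Appendix: ">ACCESSION 1,180bp description..."
    match = re.match(r">(\S+)\s+([\d,]+)bp\b", header)
    if match is None:
        raise ValueError(f"unrecognized header: {header!r}")
    accession = match.group(1)
    declared = int(match.group(2).replace(",", ""))
    # Drop the spaces between ten-letter groups and join the lines.
    sequence = "".join("".join(ln.split()) for ln in body)
    return accession, declared, sequence


# Made-up record for illustration only (not a real accession).
record = """>EXAMPLE_1.1 22bp RNA hypothetical record
acaguuguau cucgaucuca
uu
"""
accession, declared, sequence = parse_listing(record)
print(accession, declared, len(sequence))
```

If the declared length and the counted length disagree for a record, the listing was probably truncated or altered when it was copied.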

Proposed GenBank RNA format:

>NC_026431.1 982bp cRNA Influenza A virus (A/California/07/2009(H1N1))
uacucagaag auuggcucca gcuuugcaug caagaaagau aguagggcag uccgggggag
uuucggcucu agcgcgucuc ugaccuuuca cagaaacguc cuuucuugug ucuagaacuc
cgagaguacc uuaccgauuu cuguucuggu uagaacagug gagacugauu cccuuaaaau
ccuaaacaca agugcgagug gcacggguca cucgcuccug acgucgcauc ugcgaaacag
guuuuacggg auuuacccuu accccugggc uuguuguacc uaucucguca auuugauaug
uucuucgagu uuucucuuua uugcaaggua ccccgguucc uccacaguga uucgauaagu
ugaccacgug aacggucaac guacccggag uauauguugu ccuacccuug ucacuggugu
cuucgacgaa aaccagauca cacacgguga acacuugucu aacgacuaag ugucguagcc
agagugucug ucuaccgaug auggugguua ggugauuagu ccguacuuuu gucuuaccac
gaccgaucgu gaugccguuu ccgauaccuu gucuaccgac cuagcucacu uguccgucgc
cuccgguacc uccaacgauu agucugaucc gucuaccaug uacguuacuc uugauaaccc
ugaguaggau cgaggucacg accagacuuu cuacuggaag aacuuuuaaa cguccggaug
gucuucgcuu acccucacgu cuacgucgcu aaguucacua ggagagcagu aacgucguuu
auaguaaccc uagaacgugg acuauaacac cuaaugacua gcagaaaaaa aguuuacaua
aauagcagcg aaauuuaugc caaacuuuuc ucccggaaga ugccuuccuc acggacucag
guacucccuu cuuauaguug uccuugucgu cucacgacac cuacaacugc uaccaguaaa
acaguuguau cucgaucuca uu

Current GenBank RNA format:

>NC_026431.1 982bp cRNA Influenza A virus (A/California/07/2009(H1N1))
atgagtcttc taaccgaggt cgaaacgtac gttctttcta tcatcccgtc aggccccctc
aaagccgaga tcgcgcagag actggaaagt gtctttgcag gaaagaacac agatcttgag
gctctcatgg aatggctaaa gacaagacca atcttgtcac ctctgactaa gggaatttta
ggatttgtgt tcacgctcac cgtgcccagt gagcgaggac tgcagcgtag acgctttgtc
caaaatgccc taaatgggaa tggggacccg aacaacatgg atagagcagt taaactatac
aagaagctca aaagagaaat aacgttccat ggggccaagg aggtgtcact aagctattca
actggtgcac ttgccagttg catgggcctc atatacaaca ggatgggaac agtgaccaca
gaagctgctt ttggtctagt gtgtgccact tgtgaacaga ttgctgattc acagcatcgg
tctcacagac agatggctac taccaccaat ccactaatca ggcatgaaaa cagaatggtg
ctggctagca ctacggcaaa ggctatggaa cagatggctg gatcgagtga acaggcagcg
gaggccatgg aggttgctaa tcagactagg cagatggtac atgcaatgag aactattggg
actcatccta gctccagtgc tggtctgaaa gatgaccttc ttgaaaattt gcaggcctac
cagaagcgaa tgggagtgca gatgcagcga ttcaagtgat cctctcgtca ttgcagcaaa
tatcattggg atcttgcacc tgatattgtg gattactgat cgtctttttt tcaaatgtat
ttatcgtcgc tttaaatacg gtttgaaaag agggccttct acggaaggag tgcctgagtc
catgagggaa gaatatcaac aggaacagca gagtgctgtg gatgttgacg atggtcattt
tgtcaacata gagctagagt aa

Proposed GenBank RNA format:

>NC_026432.1 863bp cRNA Influenza A virus (A/California/07/2009(H1N1))
uaccugaggu ugugguacag uucgaaaguc caucugacaa aggaaaccgu auaggcguuc
gcuaaacguc uguuaccuaa cccacuacgg gguaaggaac uagccgaggc ggcucuaguu
uucaggaauu uuccuucucc guugugggaa ccggagcuau agcuuugucg gugagaacaa
cccuuuguuu agcaccuuac cuagaacuuu cuccuuaggu cgcucuguga aucuuacugu
uaacguagac auggaugaag cgcgauggaa agacuguacu gggagcuccu uuacagugcu
cugaccaagu acgaguacgg auccguuuuc uauuauccgg gagaaacgca cgcuaaccug
guccgcuagu accuuuucuu guaucaugac uuucgcuuga agucacauua gaaauuggcu
aaucucugga acuaugauga uucccgaaag ugacuccucc cucguuauca accucuuuaa
agugguaaug gaagagaagg uccuguauga auacuccuac aguuuuuacg ucaaccccag
gaguagccuc cugaacuuac cuuaccauug ugccaagcuc agagacuuuu auaugucucu
aagcgaaccu cuuugacacu acucuuaccc ucuggaagug auggaggucu cgucuuuacu
uuucaccgcu cucguuaacc cugucuuuaa acuccuuuau uccaccaauu aacuucuuua
cgccgugucu aacuuucgcu gucucuuauc aaagcuuguu uauuguaaau acguucggaa
uguugaugac gaacuucauc uuguucucua uucucgaaag agcaaagucg aauaaauuac
uauuuuuugu gggaacaaag aug

Current GenBank RNA format:

>NC_026432.1 863bp cRNA Influenza A virus (A/California/07/2009(H1N1))
atggactcca acaccatgtc aagctttcag gtagactgtt tcctttggca tatccgcaag
cgatttgcag acaatggatt gggtgatgcc ccattccttg atcggctccg ccgagatcaa
aagtccttaa aaggaagagg caacaccctt ggcctcgata tcgaaacagc cactcttgtt
gggaaacaaa tcgtggaatg gatcttgaaa gaggaatcca gcgagacact tagaatgaca
attgcatctg tacctacttc gcgctacctt tctgacatga ccctcgagga aatgtcacga
gactggttca tgctcatgcc taggcaaaag ataataggcc ctctttgcgt gcgattggac
caggcgatca tggaaaagaa catagtactg aaagcgaact tcagtgtaat ctttaaccga
ttagagacct tgatactact aagggctttc actgaggagg gagcaatagt tggagaaatt
tcaccattac cttctcttcc aggacatact tatgaggatg tcaaaaatgc agttggggtc
ctcatcggag gacttgaatg gaatggtaac acggttcgag tctctgaaaa tatacagaga
ttcgcttgga gaaactgtga tgagaatggg agaccttcac tacctccaga gcagaaatga
aaagtggcga gagcaattgg gacagaaatt tgaggaaata aggtggttaa ttgaagaaat
gcggcacaga ttgaaagcga cagagaatag tttcgaacaa ataacattta tgcaagcctt
acaactactg cttgaagtag aacaagagat aagagctttc tcgtttcagc ttatttaatg
ataaaaaaca cccttgtttc tac

Proposed GenBank RNA format:

>NC_026427.1 985bp cRNA Influenza A virus (A/Shanghai/02/2013(H7N9))
uacucagaag auuggcucca gcuuugcaug caagagagau aguaagguag uccgggggag
uuucggcucu agcgugucuc ugaacuccua caaaaacguc ccuucuugcg ucuagagcuc
cgagaguacc ucaccuauuu cuguucuggu uaggacagug gagacugauu ccccuaaaau
cccaaacaca agugcgagug gcacggguca cucgcuccug acgucgcauc ugccaaacag
guuuugcggg auuuacccuu accucugggu uuguuguacc uguuccgcca auuuaauaug
uucuuugacu ucucccuuua cuguaaagua ccucguuucc uucaacguga gucaaugagu
ugaccacgcg aacggucgac guacccagag uauauguugu cuuaccccug acacuggcgu
cuuccccgag aaccugauca uacacgguga acacucgucu aacgacugcg uguuguagcc
aggguguccg ucuaccgcug augaugauug ggugauuaau ccguacucuu aucuuaccau
gaucggucgu gaugccgauu ccgauaccuc gucuaccgac cuaguucacu uguccgucgc
cuucgguacc uucaacguuc aguccgaucc guuuaccacg uccgauacuc uugucaaccc
ugagugggau ugaggucaug uccagauuuu cuacuagaau aacuuuuaaa cguccggaug
gucuuggccu acccucacgu ugacgucgcc aaguucacuc ggagaucagc aacgucgauu
guaauaaccc uauaacguga acuauaacac cuaagaacua gcagaaaaga aguuuacgua
aauagcagca aaauuuaugc caaacuuuuc ucccggaaga ugccuuccuu acggacucag
auacucccuu cuuauagccg uccuugucgu cuuacgacac cuacaacugc uaccaguaaa
acaguuguau cucgacuuca uuuuu

Current GenBank RNA format:

>NC_026427.1 985bp cRNA Influenza A virus (A/Shanghai/02/2013(H7N9))
atgagtcttc taaccgaggt cgaaacgtac gttctctcta tcattccatc aggccccctc
aaagccgaga tcgcacagag acttgaggat gtttttgcag ggaagaacgc agatctcgag
gctctcatgg agtggataaa gacaagacca atcctgtcac ctctgactaa ggggatttta
gggtttgtgt tcacgctcac cgtgcccagt gagcgaggac tgcagcgtag acggtttgtc
caaaacgccc taaatgggaa tggagaccca aacaacatgg acaaggcggt taaattatac
aagaaactga agagggaaat gacatttcat ggagcaaagg aagttgcact cagttactca
actggtgcgc ttgccagctg catgggtctc atatacaaca gaatggggac tgtgaccgca
gaaggggctc ttggactagt atgtgccact tgtgagcaga ttgctgacgc acaacatcgg
tcccacaggc agatggcgac tactactaac ccactaatta ggcatgagaa tagaatggta
ctagccagca ctacggctaa ggctatggag cagatggctg gatcaagtga acaggcagcg
gaagccatgg aagttgcaag tcaggctagg caaatggtgc aggctatgag aacagttggg
actcacccta actccagtac aggtctaaaa gatgatctta ttgaaaattt gcaggcctac
cagaaccgga tgggagtgca actgcagcgg ttcaagtgag cctctagtcg ttgcagctaa
cattattggg atattgcact tgatattgtg gattcttgat cgtcttttct tcaaatgcat
ttatcgtcgt tttaaatacg gtttgaaaag agggccttct acggaaggaa tgcctgagtc
tatgagggaa gaatatcggc aggaacagca gaatgctgtg gatgttgacg atggtcattt
tgtcaacata gagctgaagt aaaaa

Proposed GenBank RNA format:

>NC_026428.1 841bp cRNA Influenza A virus (A/Shanghai/02/2013(H7N9))
uaccuaaggu uaugacacag uucgaagguc caucugacga aagaaaccgu acaggcguuu
gcuaaacguc ugguucuuua cccacuacgg gguaaagauc uggccgaagc ggcucuaguc
uucagggacu cuccuucuuc gucgugagaa ccagaccugu agucuugacg gugcgcacuu
ccuuucguau aucaccucgc cuaaaaucuc cuucucaguc uacuucguaa auuuuacuca
uaacgaaguc acggucgagg ugcgauagau ugacuguacu gagaacuucu uuacaguucu
cuaaccaauu acgaguaagg guuugucuuu uauuguccca gggauacgua aucuuaccug
guucguuauc accuguuuuu guaguguaac uuucguuuaa agucacacua aaaguuagcc
gaacuucggg acuaugauga aucucgaaaa ugccuucuuc cucguuaaca uccgcuuuag
agugguaaug gaagagaagg uccuguauga cuguuccuac aguuuuuacg uuaacucuag
gaguagccuc cuaaacuuac cuuacuauug ugucaagcuc agagacuuug agaugucucu
aagcgaaccu cuucgucgcu acuccuaccc ucuagaggug agagauguuu caucuuugcc
cuuuaccucu cuugucaauu cggucuucaa gcuucuuuau ucuaccaacu aacuucuuca
ugcuguaucu aauuuuuaau gccucuuauc gaaacucguu uauugaaaau acguucggaa
uguugauaac gaacuucacc ucguucucua uucuugaaag agcaaagucg aauaaauuac

Current GenBank RNA format:

>NC_026428.1 841bp cRNA Influenza A virus (A/Shanghai/02/2013(H7N9))
atggattcca atactgtgtc aagcttccag gtagactgct ttctttggca tgtccgcaaa
cgatttgcag accaagaaat gggtgatgcc ccatttctag accggcttcg ccgagatcag
aagtccctga gaggaagaag cagcactctt ggtctggaca tcagaactgc cacgcgtgaa
ggaaagcata tagtggagcg gattttagag gaagagtcag atgaagcatt taaaatgagt
attgcttcag tgccagctcc acgctatcta actgacatga ctcttgaaga aatgtcaaga
gattggttaa tgctcattcc caaacagaaa ataacagggt ccctatgcat tagaatggac
caagcaatag tggacaaaaa catcacattg aaagcaaatt tcagtgtgat tttcaatcgg
cttgaagccc tgatactact tagagctttt acggaagaag gagcaattgt aggcgaaatc
tcaccattac cttctcttcc aggacatact gacaaggatg tcaaaaatgc aattgagatc
ctcatcggag gatttgaatg gaatgataac acagttcgag tctctgaaac tctacagaga
ttcgcttgga gaagcagcga tgaggatggg agatctccac tctctacaaa gtagaaacgg
gaaatggaga gaacagttaa gccagaagtt cgaagaaata agatggttga ttgaagaagt
acgacataga ttaaaaatta cggagaatag ctttgagcaa ataactttta tgcaagcctt
acaactattg cttgaagtgg agcaagagat aagaactttc tcgtttcagc ttatttaatg

Proposed GenBank RNA format:

>NC_006312.2 1,180bp RNA Influenza C virus (C/Ann Arbor/1/50)
ucgucuucgu ccccuaaagu uuuguuaccg uguacuuuau gacuaacgac uuugucuccg
uaaagauuuu uuacaacgag gacucugguc cugucguuau uaaagucguu auuguccucc
uuuuagucgg acguuuaguc gucgauuuga cuaauucuua cuuguagaag gggauuacag
accucuucgg ugguguuacg uguagcaaua cuccacgaau auaggacuuu auuuugguac
cuuuuuccgu ucgcuguacg acuuauuucg uugaagauca aacuuuuuua gucuuccuuc
ucuguauucu uucguuuacu uucgucgacc ucugaagaac ccucaccuca guuacuacuu
uuacucccgg aagucucuac ugguuuauua ccuuuaccaa cuucuucaua uacuaguggg
ucugcugaug uguggucugu aggcuuaucc uuguuagugu cgaaccaacu cuacguuuuu
guucuuuuca cuuucuaugu ccucauuaca gagucuuuca ccuucuuguc gaaauuuuua
aguacuucau ucuuuucggu cgugucguua cuugcucuaa cgaccauaau gaccggaacc
ucuucuucgu gauagagagg uuucuguuug ucuuucaaac cgguauaaua cauuagugug
aaaaccuuca uuauauuacu cuggggugaa ccuuuuucgu uauuuuccuc aacuuccguc
ucaaccucuc uacccugcuu accguuacuu uaccaaucaa caauauuaua caaagagaua
uuguucaguu ggacgaagac gaacguuaga uuucuggaca gauuuugaua aauuguuaug
acuacgccau ugacaaguaa caaaauuacu uuugguuccu auguacgauu guaaucggag
aaacccuaau cccuauuaau gauacaacau aaauaaucau uuuuaguauu aacuugaaca
guuaccaaaa cacgagccgu cuacccucuc uaccacaccu cuauauuucu gguguuaaua
cggacuuuaa cugagcuacc uuuuucuaua acgggaaaga ucccucucug aacuggaccc
ucuccuacga ggacugcuuu ggcuguugag ugguuaagga aaaagguuac uaccauaaaa
acuuuaaauu aauggaacuu uuuuagggga acgaugacga

Current GenBank RNA format:

>NC_006312.2 1,180bp RNA Influenza C virus (C/Ann Arbor/1/50)
agcagaagca ggggatttca aaacaatggc acatgaaata ctgattgctg aaacagaggc
atttctaaaa aatgttgctc ctgagaccag gacagcaata atttcagcaa taacaggagg
aaaatcagcc tgcaaatcag cagctaaact gattaagaat gaacatcttc ccctaatgtc
tggagaagcc accacaatgc acatcgttat gaggtgctta tatcctgaaa taaaaccatg
gaaaaaggca agcgacatgc tgaataaagc aacttctagt ttgaaaaaat cagaaggaag
agacataaga aagcaaatga aagcagctgg agacttcttg ggagtggagt caatgatgaa
aatgagggcc ttcagagatg accaaataat ggaaatggtt gaagaagtat atgatcaccc
agacgactac acaccagaca tccgaatagg aacaatcaca gcttggttga gatgcaaaaa
caagaaaagt gaaagataca ggagtaatgt ctcagaaagt ggaagaacag ctttaaaaat
tcatgaagta agaaaagcca gcacagcaat gaacgagatt gctggtatta ctggccttgg
agaagaagca ctatctctcc aaagacaaac agaaagtttg gccatattat gtaatcacac
ttttggaagt aatataatga gaccccactt ggaaaaagca ataaaaggag ttgaaggcag
agttggagag atgggacgaa tggcaatgaa atggttagtt gttataatat gtttctctat
aacaagtcaa cctgcttctg cttgcaatct aaagacctgt ctaaaactat ttaacaatac
tgatgcggta actgttcatt gttttaatga aaaccaagga tacatgctaa cattagcctc
tttgggatta gggataatta ctatgttgta tttattagta aaaatcataa ttgaacttgt
caatggtttt gtgctcggca gatgggagag atggtgtgga gatataaaga ccacaattat
gcctgaaatt gactcgatgg aaaaagatat tgccctttct agggagagac ttgacctggg
agaggatgct cctgacgaaa ccgacaactc accaattcct ttttccaatg atggtatttt
tgaaatttaa ttaccttgaa aaaatcccct tgctactgct

Proposed GenBank RNA format:

>NC_006306.2 935bp cRNA Influenza C virus (C/Ann Arbor/1/50)
ucgucuucgu ccccaugaaa agguuuuaca ggcuguuuug ucaguuuagu uguuuaaauu
accguaaaca ucgguguuuu uacaaucucu cuguucuucu aaaucugugu acgugacuuu
acguucaucu uuuuuacuuu ugcaguuguu uucgguccaa cuuuugucuu agaagaaaac
guggaucuug uacccuucua cguuauuuuc uaccacucga agauaaguug cccugcuaag
acguucgucu cagaggaugu uacugcgguc gcaggcaucu uuacuucccc uucuuuaaag
gauaacuaaa acgagguucu uuguaucgug guuaacccgu uuuagguuau auaaacagug
guacauaagg auugaaacua ccuuugcaga cccuucguug cuacauagua guagcaccuc
guugaaacug uuucuguuac uuaacguuga caaaaguuuc uuguuaaacc acgguagguu
uaggaagugc auacucuaac ucgauacgua aacaaaacau aacgucuuua ugauucuucu
agacaccuau ggaguagcga ucuguucacc ggccuuaacu uuguccuuaa ucuuuuacaa
agucuacgua auuuucgccu aagcaauacc gauggcuacu uuagagagag ugauaugagg
uuucauaguu uaguccucgg gucgagcuag ggauaacccc uuuacuuugu ggucuauaac
uguucugacu ucgaauauac gagagcgaau cucuucgacc uggaauugga cucguuucgu
cagaauccuu agguuuuaag acuucuagaa uaaaacuagu auauauuguc ucuacaaaca
uuuuugugau auaauuacua uuuuagaaac acauuaagug aauauauuaa caaaauucaa
caauaagguu ucaauuuuuu ggggaacgag gacga

Current GenBank RNA format:

>NC_006306.2 935bp cRNA Influenza C virus (C/Ann Arbor/1/50)
agcagaagca ggggtacttt tccaaaatgt ccgacaaaac agtcaaatca acaaatttaa
tggcatttgt agccacaaaa atgttagaga gacaagaaga tttagacaca tgcactgaaa
tgcaagtaga aaaaatgaaa acgtcaacaa aagccaggtt gaaaacagaa tcttcttttg
cacctagaac atgggaagat gcaataaaag atggtgagct tctattcaac gggacgattc
tgcaagcaga gtctcctaca atgacgccag cgtccgtaga aatgaagggg aagaaatttc
ctattgattt tgctccaaga aacatagcac caattgggca aaatccaata tatttgtcac
catgtattcc taactttgat ggaaacgtct gggaagcaac gatgtatcat catcgtggag
caactttgac aaagacaatg aattgcaact gttttcaaag aacaatttgg tgccatccaa
atccttcacg tatgagattg agctatgcat ttgttttgta ttgcagaaat actaagaaga
tctgtggata cctcatcgct agacaagtgg ccggaattga aacaggaatt agaaaatgtt
tcagatgcat taaaagcgga ttcgttatgg ctaccgatga aatctctctc actatactcc
aaagtatcaa atcaggagcc cagctcgatc cctattgggg aaatgaaaca ccagatattg
acaagactga agcttatatg ctctcgctta gagaagctgg accttaacct gagcaaagca
gtcttaggaa tccaaaattc tgaagatctt attttgatca tatataacag agatgtttgt
aaaaacacta tattaatgat aaaatctttg tgtaattcac ttatataatt gttttaagtt
gttattccaa agttaaaaaa ccccttgctc ctgct

Proposed GenBank RNA format:

>NC_003847.1 826bp ss-RNA Panicum mosaic satellite virus
cccauaaggu ugcgaucguu gcucacauuc ugcagguaga cguucaccgc guugucguua
acuugaucag agugcuccuu augaggacua ccgaggauuc gcaagguccg cuagauuagc
agcccgcccg agggcccgac gacgguguag ugaccacaug cuaugcacga ugcaguggaa
cugccucgcg cgaugaugga gaaaagucuc cgucucaaag ggcugggagu uccccuaccc
ccuggcacgu aagguccaac agcgcaaaug uuaggucccc cacagucguc ggggggacua
cauauugcgc gcggacauau ugggcccgcu gugucugaga cagguacggu ggccccaugu
caacuacccg ugucagggau ccuggcaagc cgagugggga ucccacccgg ucuuguugac
caagaaacca uugugacuuc uucggcucug guaaaaccgg uagcugccug agcacagaug
guucccacga uugcgggggu cguuauggca guaacaaugc ccaacgaaau ccgaccgcgg
aucgcucgaa gucagaagua uuucauaggg ggagaauacc ccacagcggc auguguggca
aucagggcgc ggagauggcc gacagccuga ggaugguacg ggaggccacc uacaacuccu
ucacccccuu aguccuccga ugcuucggca gcgucguuuc cgcuggcaca cauguugguu
gggccggucu uuauaugggg cuuucccccc caggacgccg acccagggga aaaguuaccg
uuacgguaaa aggacccccc ucuacgcaga gggggaggau ccuggg

Current GenBank RNA format:

>NC_003847.1 826bp ss-RNA Panicum mosaic satellite virus
gggtattcca acgctagcaa cgagtgtaag acgtccatct gcaagtggcg caacagcaat
tgaactagtc tcacgaggaa tactcctgat ggctcctaag cgttccaggc gatctaatcg
tcgggcgggc tcccgggctg ctgccacatc actggtgtac gatacgtgct acgtcacctt
gacggagcgc gctactacct cttttcagag gcagagtttc ccgaccctca aggggatggg
ggaccgtgca ttccaggttg tcgcgtttac aatccagggg gtgtcagcag cccccctgat
gtataacgcg cgcctgtata acccgggcga cacagactct gtccatgcca ccggggtaca
gttgatgggc acagtcccta ggaccgttcg gctcacccct agggtgggcc agaacaactg
gttctttggt aacactgaag aagccgagac cattttggcc atcgacggac tcgtgtctac
caagggtgct aacgccccca gcaataccgt cattgttacg ggttgcttta ggctggcgcc
tagcgagctt cagtcttcat aaagtatccc cctcttatgg ggtgtcgccg tacacaccgt
tagtcccgcg cctctaccgg ctgtcggact cctaccatgc cctccggtgg atgttgagga
agtgggggaa tcaggaggct acgaagccgt cgcagcaaag gcgaccgtgt gtacaaccaa
cccggccaga aatatacccc gaaagggggg gtcctgcggc tgggtcccct tttcaatggc
aatgccattt tcctgggggg agatgcgtct ccccctccta ggaccc

Proposed GenBank RNA format:

>NC_007377.1 1,027bp cRNA Influenza A virus (A/Korea/426/1968(H2N2))
ucguuuucgu ccaucuauaa cuuucuacuc ggaagauugg cuccagcuuu gcaugcaaga
gagauagcag ggcaguccgg gggaguuucg gcucuagcgu gucucugaac uucuacagaa
acgacccuuc uugugucuag aacuccgaga guaccuuacc gauuucuguu cugguuagga
caguggagac ugauuccccu aaaacccuaa acauaagugc gaguggcacg guucacucgc
uccugacguc gcaucugcga aacagguuuu acgggaguua cccuuacccc uagguuuauu
guaccugucu cgucaauuug acauaucuuu cgaauucucc cucuauugua agguaccccg
guuucuucau cgcgagucaa uaagacgacc acgugaacgg ucaacguacc cggaguauau
guuguccuac ccccgacacu ggugacuuca ccggaaacgg caccauacac guuggacacu
ugucuaacga cugagggucg uauccagagu guccguuuac cacuguuguu gguuagguga
uuauucugua cucuugucuu accaagaccg gucgugaugu cgauuccgau accucguuua
ccgaccuagc ucacucguuc gucgucuccg guaccuccaa cgaucagucc gguccguuua
ccacguccgu uacucucggu aacccugagg aggaucgagg ucacgaccag auuuucuacu
agaagaacuu uuaaacgucc ggauagucuu ugcuuacccc cacgucuacg uugcuaaguu
cacuggggga acaacaacga cgcucauagu aacccuagaa cgugaaauau aacaccuaag
aacuagcaga aaaaaaguuu acguaaauag cgaagaaauu ugugccagac uuuucucccg
gaagaugccu uccucaugga cucagauacu cccuucuuau agcuuuccuu gucgucucac
gacaccuacg acugcuauca guaaaacagu cguaucucga ccucauuuuu ugauggaaca

Current GenBank RNA format:

>NC_007377.1 1,027bp cRNA Influenza A virus (A/Korea/426/1968(H2N2))
agcaaaagca ggtagatatt gaaagatgag ccttctaacc gaggtcgaaa cgtacgttct
ctctatcgtc ccgtcaggcc ccctcaaagc cgagatcgca cagagacttg aagatgtctt
tgctgggaag aacacagatc ttgaggctct catggaatgg ctaaagacaa gaccaatcct
gtcacctctg actaagggga ttttgggatt tgtattcacg ctcaccgtgc caagtgagcg
aggactgcag cgtagacgct ttgtccaaaa tgccctcaat gggaatgggg atccaaataa
catggacaga gcagttaaac tgtatagaaa gcttaagagg gagataacat tccatggggc
caaagaagta gcgctcagtt attctgctgg tgcacttgcc agttgcatgg gcctcatata
caacaggatg ggggctgtga ccactgaagt ggcctttgcc gtggtatgtg caacctgtga
acagattgct gactcccagc ataggtctca caggcaaatg gtgacaacaa ccaatccact
aataagacat gagaacagaa tggttctggc cagcactaca gctaaggcta tggagcaaat
ggctggatcg agtgagcaag cagcagaggc catggaggtt gctagtcagg ccaggcaaat
ggtgcaggca atgagagcca ttgggactcc tcctagctcc agtgctggtc taaaagatga
tcttcttgaa aatttgcagg cctatcagaa acgaatgggg gtgcagatgc aacgattcaa
gtgaccccct tgttgttgct gcgagtatca ttgggatctt gcactttata ttgtggattc
ttgatcgtct ttttttcaaa tgcatttatc gcttctttaa acacggtctg aaaagagggc
cttctacgga aggagtacct gagtctatga gggaagaata tcgaaaggaa cagcagagtg
ctgtggatgc tgacgatagt cattttgtca gcatagagct ggagtaaaaa actaccttgt

Proposed GenBank RNA format:

>NC_007380.1 838bp cRNA Influenza A virus (A/Korea/426/1968(H2N2))
uaccuaagau ugugacacag uucaaaaguc caucuaacga aggaaaccgu acaggcuuuu
guucaacauc ugguucuuga uccacuacgg gguaaggaac uagccgaagc ggcucuaguc
uucagggauu ccccuucucc gucgugagag cuagaucugu agcuucgucg gugggcacaa
ccuuucgucu aucaucucuc cuaagacuuc cuucuuaggc uacuccguga auuuuacugg
uaccggaggc guggacgaag cgcuauggau ugacuguacu gauaacuccu uaacaguucc
cugaccaagu acgauuacgg guucgucuuu caccuuccgg gagaaacgua gucuuaucug
guccguuagu accuauucuu guaguacaac uuucgcuuaa agucacacua aaaacuggcc
gaucucuggg auuauaauga uucccgaaag uggcuucucc cucguuaaca accgcuuuaa
agugguaacg gaagagaagg uccuguauga uaacuccuac aguuuuuacg uuaaccccag
gaguagccuc cugaacuuac cuuacuauug ugucaagcuc agagauuuug agaugucucu
aagcgaaccu cuucgucauu acucuuaccc ucuggaggug agugagguuu ugucuuugcc
uuuuaccgcu cuuguuaauc caguuuucaa gcuucucuau ucuaccgacu aacuucuuca
cucugugucu aacuucuauu gucucuuauc aaaacucguu uauuguaaau acguucggaa
ugucgaugau aaacuucacc uuguucucua uucuugaaag agcaaagucg aauaaauu

Current GenBank RNA format:

>NC_007380.1 838bp cRNA Influenza A virus (A/Korea/426/1968(H2N2))
atggattcta acactgtgtc aagttttcag gtagattgct tcctttggca tgtccgaaaa
caagttgtag accaagaact aggtgatgcc ccattccttg atcggcttcg ccgagatcag
aagtccctaa ggggaagagg cagcactctc gatctagaca tcgaagcagc cacccgtgtt
ggaaagcaga tagtagagag gattctgaag gaagaatccg atgaggcact taaaatgacc
atggcctccg cacctgcttc gcgataccta actgacatga ctattgagga attgtcaagg
gactggttca tgctaatgcc caagcagaaa gtggaaggcc ctctttgcat cagaatagac
caggcaatca tggataagaa catcatgttg aaagcgaatt tcagtgtgat ttttgaccgg
ctagagaccc taatattact aagggctttc accgaagagg gagcaattgt tggcgaaatt
tcaccattgc cttctcttcc aggacatact attgaggatg tcaaaaatgc aattggggtc
ctcatcggag gacttgaatg gaatgataac acagttcgag tctctaaaac tctacagaga
ttcgcttgga gaagcagtaa tgagaatggg agacctccac tcactccaaa acagaaacgg
aaaatggcga gaacaattag gtcaaaagtt cgaagagata agatggctga ttgaagaagt
gagacacaga ttgaagataa cagagaatag ttttgagcaa ataacattta tgcaagcctt
acagctacta tttgaagtgg aacaagagat aagaactttc tcgtttcagc ttatttaa

Proposed GenBank RNA format:

>NC_007367.1 1,027bp cRNA Influenza A virus (A/New York/392/2004(H3N2))
ucguuuucgu ccaucuauaa cuuucuacuc ggaagauugg cuccagcuuu gcauacaaga
gagauagcaa gguaguccgg gggaguuucg gcucuagcgc gucucugaac uucuacagaa
acgacccuuu uugugucuag aacuccgaga guaccuuacc gauuucuguu cugguuaaga
caguggagac ugauuccccu aaaaccccaa acacaagugc gaguggcacg ggucacucgc
uccugacguc gcaucugcga aacagguuuu acgggaguua cccuuaccuc uagguuuauu
guaccuguuu cgucaauuug acauauccuu ugaauucucc cucuauugca agguaccccg
guuucuuuau cgagagucaa uaagacgacc acgugaacgg ucaacguacc cggaguauau
guuauccuac ccccgacauu ggugacuuca ccguaaaccg gaccauacac guuguacacu
ugucuaacga cugagggucg uguccagagu auccguuuac caccguuguu gguuagguaa
uuauuuugua cucuugucuu accaaaaccg gucgugaugu cgauuccgau accucguuua
ccgaccuagu ucacucgucc gucgccuccg guaccuuuaa cgaucagucc gguccguuua
ccacguccgu uacucucggc aacccugagu aggaucgagg ucaugaccag auucucuacu
agaagaacuu uuaaacgucu ggauagucuu ugcuuacccc cacgucuacg uugcuaaguu
cacugggcga acaacaacgg cgcucauagu aacccuagaa cgugaacuau aacaccuaag
aacuagcaga aaaaaaguuu acgcagauag cugagaaguu ugugccggaa uuuucuccgg
gaagaugccu uccucaugga cucagauacu cccuucuuau agcuuuccuu gucgucuuac
gacaccuacg acugcuguca guaaaacagu cguaucucaa ccucauuuuu ugauggaaca

Current GenBank RNA format:

>NC_007367.1 1,027bp cRNA Influenza A virus (A/New York/392/2004(H3N2))
agcaaaagca ggtagatatt gaaagatgag ccttctaacc gaggtcgaaa cgtatgttct
ctctatcgtt ccatcaggcc ccctcaaagc cgagatcgcg cagagacttg aagatgtctt
tgctgggaaa aacacagatc ttgaggctct catggaatgg ctaaagacaa gaccaattct
gtcacctctg actaagggga ttttggggtt tgtgttcacg ctcaccgtgc ccagtgagcg
aggactgcag cgtagacgct ttgtccaaaa tgccctcaat gggaatggag atccaaataa
catggacaaa gcagttaaac tgtataggaa acttaagagg gagataacgt tccatggggc
caaagaaata gctctcagtt attctgctgg tgcacttgcc agttgcatgg gcctcatata
caataggatg ggggctgtaa ccactgaagt ggcatttggc ctggtatgtg caacatgtga
acagattgct gactcccagc acaggtctca taggcaaatg gtggcaacaa ccaatccatt
aataaaacat gagaacagaa tggttttggc cagcactaca gctaaggcta tggagcaaat
ggctggatca agtgagcagg cagcggaggc catggaaatt gctagtcagg ccaggcaaat
ggtgcaggca atgagagccg ttgggactca tcctagctcc agtactggtc taagagatga
tcttcttgaa aatttgcaga cctatcagaa acgaatgggg gtgcagatgc aacgattcaa
gtgacccgct tgttgttgcc gcgagtatca ttgggatctt gcacttgata ttgtggattc
ttgatcgtct ttttttcaaa tgcgtctatc gactcttcaa acacggcctt aaaagaggcc
cttctacgga aggagtacct gagtctatga gggaagaata tcgaaaggaa cagcagaatg
ctgtggatgc tgacgacagt cattttgtca gcatagagtt ggagtaaaaa actaccttgt

Proposed GenBank RNA format:

>NC_007370.1 890bp cRNA Influenza A virus (A/New York/392/2004(H3N2))
ucguuuucgu cccacuguuu cuguauuacc uaagguugug acacaguuca aagguccauc
uaacgaaaga aaccguauag gccuuuguuc aacaucuggu ucuugacuca cuacggggua
aggaacuagc cgaagcggcu cuagucucca gggauucccc uucuccguua ugagagccag
aucuguaguu ucgucggugg guacaaccuu ucguuuaaca ucuuuucuaa gacuuucuuc
uuagacuacu ccgugaauuu uacugguacc agaggugugg acgaagcgcu auguauugac
uguacugaua acuccuuaac aguucuuuga ccaaguacga uuacggguuc gucuuucacc
uuccuggaga aacguagucu uaccuggucc guuaguaccu cuuuuuguag uacaacuuuc
gcuuaaaguc acacuaaaaa cuggcugauc ucugguauca uaaugauucc cgaaaguggc
uucucccucg uuaacaaccg cuuuagagug guaacggaag aaaagguccu guaugauaac
uccuacaguu uuuacguuaa ccccaggagu agccuccuga acuuaccuua cuauuguguc
aagcucagag auuuuuagau gucucuaagc gaaccucuuc gucauuacuc uuacccccug
gaggugaaug agguuuuguc uuugccuuuu accgcucuug ucgauccagu uuucaaacuu
cucuauucua ccgacuaacu ucuucacucu gugucugauu uuuguugacu uuuaucgaaa
cuuguuuauu guaaguacgu ucguaauguu gacgacaaac uucaccuugu ccucuauucu
ugaaagagua aagucgaaua aauuacuauu uuuuguggga acaaagauga

Current GenBank RNA format:

>NC_007370.1 890bp cRNA Influenza A virus (A/New York/392/2004(H3N2))
agcaaaagca gggtgacaaa gacataatgg attccaacac tgtgtcaagt ttccaggtag
attgctttct ttggcatatc cggaaacaag ttgtagacca agaactgagt gatgccccat
tccttgatcg gcttcgccga gatcagaggt ccctaagggg aagaggcaat actctcggtc
tagacatcaa agcagccacc catgttggaa agcaaattgt agaaaagatt ctgaaagaag
aatctgatga ggcacttaaa atgaccatgg tctccacacc tgcttcgcga tacataactg
acatgactat tgaggaattg tcaagaaact ggttcatgct aatgcccaag cagaaagtgg
aaggacctct ttgcatcaga atggaccagg caatcatgga gaaaaacatc atgttgaaag
cgaatttcag tgtgattttt gaccgactag agaccatagt attactaagg gctttcaccg
aagagggagc aattgttggc gaaatctcac cattgccttc ttttccagga catactattg
aggatgtcaa aaatgcaatt ggggtcctca tcggaggact tgaatggaat gataacacag
ttcgagtctc taaaaatcta cagagattcg cttggagaag cagtaatgag aatgggggac
ctccacttac tccaaaacag aaacggaaaa tggcgagaac agctaggtca aaagtttgaa
gagataagat ggctgattga agaagtgaga cacagactaa aaacaactga aaatagcttt
gaacaaataa cattcatgca agcattacaa ctgctgtttg aagtggaaca ggagataaga
actttctcat ttcagcttat ttaatgataa aaaacaccct tgtttctact