Tuesday, December 26, 2023

The Advent Of The Sulfur Atom - 2

Fig. 1 RNA Codons

This series is about atoms in proteins, and there are good reasons why:

"Proteins are key to all processes in our cells and understanding their functions and regulation is of major importance."

(Researchers solve a mystery). Proteins are arguably more important than the genes themselves for understanding genetic dynamics (but protein dynamics are more difficult to grasp).

Somehow this reminds me of the Ringo Starr song about 'the blues' which contains the lyrics "you know it don't come easy" (but I digress, so back to the protein stuff):

"The human body consists of tens of thousands of proteins. What’s more, these occur in several variants whose concentration in the organism can change over time. Matthias Mann from the Max Planck Institute of Biochemistry in Martinsried therefore needs clever algorithms and a lot of computing power for his research. His goal, after all, is to decode the entire human proteome – that is, the full set of proteins in the human body – for the benefit of medical science."

"Proteome: It is now estimated that the human body contains between 80,000 and 400,000 proteins. However, they aren’t all produced by all the body’s cells at any given time. Cells have different proteomes depending on their cell type. There are thus at least 250 different proteomes corresponding to the 250 cell types in the human body. The proteome depends on many factors. For example, different proteins may be produced depending on an organism’s age, diet and state of health. The protein composition is also affected by environmental influences such as medications and pollutants."

(The Protein Puzzle, emphasis added). These days, anything affected by environmental influences, medications, and pollutants keeps getting more complicated.

But we should nevertheless try to grasp as much as we can:

"Protein sequences are the fundamental determinants of biological structure and function." 

(National Library of Medicine). One of the ways of grasping things is to take them apart piece by piece.

So, today I am adding an example of what is required to take apart a protein made by a ribosome, one that received an mRNA message from a cell nucleus.

That is the nucleus that transcribed the mRNA message from the DNA of the organism in which the DNA, mRNA, and ribosome all reside (see video below).

I wrote software to process a database of proteins I downloaded from GenBank.

The software attempts to 'reverse engineer' each protein to determine what the protein-coding portion of its mRNA could have looked like when it left the nucleus, after being transcribed from the base pairs of DNA in the organism's genome.

Here is one example out of about 90,000 that were generated from the software I wrote early this morning:

Results:

Protein XP_008921924.1: [Manacus vitellinus]
MGPVMPPSKKPEGSGISVPSGLSQRYQDSDLSKALHDDEDLDFSLPAIRLDKGAMEDEEL
TNLNWLHESKNLLKSFGDSVLRSVSPVQDIGDDTPPSPAHSDMPYDAKQNPNCKPPYSFS
CLIFMAIEDSPTKRLPVKDIYNWILEHFPYFANAPTGWKNSVRHNLSLNKCFKKVDKDRS
QSIGKGSLWCIDPEYRQNLIQALKKTPYHPYSHVFNTPPTSPQAYQSTSGPPIWPGSTFF
KRNGALLQVPSGVIQNGARVLSRGIFPGARPLPINPIGSMAVAVRNGITNCRMRTESEQS
CGSPVVSGDPKEDHNYSSAKSSNHRSTSPASDSVSSSSADDQYEFATKGSQDSSEGSEVS
FQSHESHSETEEEDKKQSRKETKDSLADSGYASQHKKRQHLMKAKKVPSDTLPLKKRRTE
KPPESDDEEMKEAAGSLLHLAGIRSCLNNITNRTAKGQKEQKETTKN

mRNA XP_008921924.1:
AUGGGUCCU{GUU.GUC.GUA.GUG}AUGCCUCCU{UCU.UCC.UCA.UCG.AGU}{AAA.AAG}{AAA.AAG}
CCU{GAA.GAG}GGU{UCU.UCC.UCA.UCG.AGU}GGU{AUU.AUC.AUA}{UCU.UCC.UCA.UCG.AGU}{GUU.GUC.GUA.GUG}CCU{UCU.UCC.UCA.UCG.AGU}
GGU{CUU.CUC.CUA.CUG.UUA.UUG}{UCU.UCC.UCA.UCG.AGU}{CAA.CAG}CGU{UAU.UAC}{CAA.CAG}GAU{UCU.UCC.UCA.UCG.AGU}GAU
{CUU.CUC.CUA.CUG.UUA.UUG}{UCU.UCC.UCA.UCG.AGU}{AAA.AAG}GCU{CUU.CUC.CUA.CUG.UUA.UUG}CAUGAUGAU{GAA.GAG}GAU
{CUU.CUC.CUA.CUG.UUA.UUG}GAU{UUU.UUC}{UCU.UCC.UCA.UCG.AGU}{CUU.CUC.CUA.CUG.UUA.UUG}CCUGCU{AUU.AUC.AUA}CGU{CUU.CUC.CUA.CUG.UUA.UUG}
GAU{AAA.AAG}GGUGCUAUG{GAA.GAG}GAU{GAA.GAG}{GAA.GAG}{CUU.CUC.CUA.CUG.UUA.UUG}
ACUAAU{CUU.CUC.CUA.CUG.UUA.UUG}AAUUGG{CUU.CUC.CUA.CUG.UUA.UUG}CAU{GAA.GAG}{UCU.UCC.UCA.UCG.AGU}{AAA.AAG}
AAU{CUU.CUC.CUA.CUG.UUA.UUG}{CUU.CUC.CUA.CUG.UUA.UUG}{AAA.AAG}{UCU.UCC.UCA.UCG.AGU}{UUU.UUC}GGUGAU{UCU.UCC.UCA.UCG.AGU}{GUU.GUC.GUA.GUG}
{CUU.CUC.CUA.CUG.UUA.UUG}CGU{UCU.UCC.UCA.UCG.AGU}{GUU.GUC.GUA.GUG}{UCU.UCC.UCA.UCG.AGU}CCU{GUU.GUC.GUA.GUG}{CAA.CAG}GAU{AUU.AUC.AUA}
GGUGAUGAUACUCCUCCU{UCU.UCC.UCA.UCG.AGU}CCUGCUCAU
{UCU.UCC.UCA.UCG.AGU}GAUAUGCCU{UAU.UAC}GAUGCU{AAA.AAG}{CAA.CAG}AAU
CCUAAU{UGU.UGC}{AAA.AAG}CCUCCU{UAU.UAC}{UCU.UCC.UCA.UCG.AGU}{UUU.UUC}{UCU.UCC.UCA.UCG.AGU}
{UGU.UGC}{CUU.CUC.CUA.CUG.UUA.UUG}{AUU.AUC.AUA}{UUU.UUC}AUGGCU{AUU.AUC.AUA}{GAA.GAG}GAU{UCU.UCC.UCA.UCG.AGU}
CCUACU{AAA.AAG}CGU{CUU.CUC.CUA.CUG.UUA.UUG}CCU{GUU.GUC.GUA.GUG}{AAA.AAG}GAU{AUU.AUC.AUA}
{UAU.UAC}AAUUGG{AUU.AUC.AUA}{CUU.CUC.CUA.CUG.UUA.UUG}{GAA.GAG}CAU{UUU.UUC}CCU{UAU.UAC}
{UUU.UUC}GCUAAUGCUCCUACUGGUUGG{AAA.AAG}AAU
{UCU.UCC.UCA.UCG.AGU}{GUU.GUC.GUA.GUG}CGUCAUAAU{CUU.CUC.CUA.CUG.UUA.UUG}{UCU.UCC.UCA.UCG.AGU}{CUU.CUC.CUA.CUG.UUA.UUG}AAU{AAA.AAG}
{UGU.UGC}{UUU.UUC}{AAA.AAG}{AAA.AAG}{GUU.GUC.GUA.GUG}GAU{AAA.AAG}GAUCGU{UCU.UCC.UCA.UCG.AGU}
{CAA.CAG}{UCU.UCC.UCA.UCG.AGU}{AUU.AUC.AUA}GGU{AAA.AAG}GGU{UCU.UCC.UCA.UCG.AGU}{CUU.CUC.CUA.CUG.UUA.UUG}UGG{UGU.UGC}
{AUU.AUC.AUA}GAUCCU{GAA.GAG}{UAU.UAC}CGU{CAA.CAG}AAU{CUU.CUC.CUA.CUG.UUA.UUG}{AUU.AUC.AUA}
{CAA.CAG}GCU{CUU.CUC.CUA.CUG.UUA.UUG}{AAA.AAG}{AAA.AAG}ACUCCU{UAU.UAC}CAUCCU
{UAU.UAC}{UCU.UCC.UCA.UCG.AGU}CAU{GUU.GUC.GUA.GUG}{UUU.UUC}AAUACUCCUCCUACU
{UCU.UCC.UCA.UCG.AGU}CCU{CAA.CAG}GCU{UAU.UAC}{CAA.CAG}{UCU.UCC.UCA.UCG.AGU}ACU{UCU.UCC.UCA.UCG.AGU}GGU
CCUCCU{AUU.AUC.AUA}UGGCCUGGU{UCU.UCC.UCA.UCG.AGU}ACU{UUU.UUC}{UUU.UUC}
{AAA.AAG}CGUAAUGGUGCU{CUU.CUC.CUA.CUG.UUA.UUG}{CUU.CUC.CUA.CUG.UUA.UUG}{CAA.CAG}{GUU.GUC.GUA.GUG}CCU
{UCU.UCC.UCA.UCG.AGU}GGU{GUU.GUC.GUA.GUG}{AUU.AUC.AUA}{CAA.CAG}AAUGGUGCUCGU{GUU.GUC.GUA.GUG}
{CUU.CUC.CUA.CUG.UUA.UUG}{UCU.UCC.UCA.UCG.AGU}CGUGGU{AUU.AUC.AUA}{UUU.UUC}CCUGGUGCUCGU
CCU{CUU.CUC.CUA.CUG.UUA.UUG}CCU{AUU.AUC.AUA}AAUCCU{AUU.AUC.AUA}GGU{UCU.UCC.UCA.UCG.AGU}AUG
GCU{GUU.GUC.GUA.GUG}GCU{GUU.GUC.GUA.GUG}CGUAAUGGU{AUU.AUC.AUA}ACUAAU
{UGU.UGC}CGUAUGCGUACU{GAA.GAG}{UCU.UCC.UCA.UCG.AGU}{GAA.GAG}{CAA.CAG}{UCU.UCC.UCA.UCG.AGU}
{UGU.UGC}GGU{UCU.UCC.UCA.UCG.AGU}CCU{GUU.GUC.GUA.GUG}{GUU.GUC.GUA.GUG}{UCU.UCC.UCA.UCG.AGU}GGUGAUCCU
{AAA.AAG}{GAA.GAG}GAUCAUAAU{UAU.UAC}{UCU.UCC.UCA.UCG.AGU}{UCU.UCC.UCA.UCG.AGU}GCU{AAA.AAG}
{UCU.UCC.UCA.UCG.AGU}{UCU.UCC.UCA.UCG.AGU}AAUCAUCGU{UCU.UCC.UCA.UCG.AGU}ACU{UCU.UCC.UCA.UCG.AGU}CCUGCU
{UCU.UCC.UCA.UCG.AGU}GAU{UCU.UCC.UCA.UCG.AGU}{GUU.GUC.GUA.GUG}{UCU.UCC.UCA.UCG.AGU}{UCU.UCC.UCA.UCG.AGU}{UCU.UCC.UCA.UCG.AGU}{UCU.UCC.UCA.UCG.AGU}GCUGAU
GAU{CAA.CAG}{UAU.UAC}{GAA.GAG}{UUU.UUC}GCUACU{AAA.AAG}GGU{UCU.UCC.UCA.UCG.AGU}
{CAA.CAG}GAU{UCU.UCC.UCA.UCG.AGU}{UCU.UCC.UCA.UCG.AGU}{GAA.GAG}GGU{UCU.UCC.UCA.UCG.AGU}{GAA.GAG}{GUU.GUC.GUA.GUG}{UCU.UCC.UCA.UCG.AGU}
{UUU.UUC}{CAA.CAG}{UCU.UCC.UCA.UCG.AGU}CAU{GAA.GAG}{UCU.UCC.UCA.UCG.AGU}CAU{UCU.UCC.UCA.UCG.AGU}{GAA.GAG}ACU
{GAA.GAG}{GAA.GAG}{GAA.GAG}GAU{AAA.AAG}{AAA.AAG}{CAA.CAG}{UCU.UCC.UCA.UCG.AGU}CGU{AAA.AAG}
{GAA.GAG}ACU{AAA.AAG}GAU{UCU.UCC.UCA.UCG.AGU}{CUU.CUC.CUA.CUG.UUA.UUG}GCUGAU{UCU.UCC.UCA.UCG.AGU}GGU
{UAU.UAC}GCU{UCU.UCC.UCA.UCG.AGU}{CAA.CAG}CAU{AAA.AAG}{AAA.AAG}CGU{CAA.CAG}CAU
{CUU.CUC.CUA.CUG.UUA.UUG}AUG{AAA.AAG}GCU{AAA.AAG}{AAA.AAG}{GUU.GUC.GUA.GUG}CCU{UCU.UCC.UCA.UCG.AGU}GAU
ACU{CUU.CUC.CUA.CUG.UUA.UUG}CCU{CUU.CUC.CUA.CUG.UUA.UUG}{AAA.AAG}{AAA.AAG}CGUCGUACU{GAA.GAG}
{AAA.AAG}CCUCCU{GAA.GAG}{UCU.UCC.UCA.UCG.AGU}GAUGAU{GAA.GAG}{GAA.GAG}AUG
{AAA.AAG}{GAA.GAG}GCUGCUGGU{UCU.UCC.UCA.UCG.AGU}{CUU.CUC.CUA.CUG.UUA.UUG}{CUU.CUC.CUA.CUG.UUA.UUG}CAU{CUU.CUC.CUA.CUG.UUA.UUG}
GCUGGU{AUU.AUC.AUA}CGU{UCU.UCC.UCA.UCG.AGU}{UGU.UGC}{CUU.CUC.CUA.CUG.UUA.UUG}AAUAAU{AUU.AUC.AUA}
ACUAAUCGUACUGCU{AAA.AAG}GGU{CAA.CAG}{AAA.AAG}{GAA.GAG}
{CAA.CAG}{AAA.AAG}{GAA.GAG}ACUACU{AAA.AAG}AAU{UAA.UAG.UGA}

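(Fig. 2). The groups enclosed between a '{' and a '}' are multiple choices, i.e. any of the listed codons (a codon is three nucleotides in sequence) can code for the amino acid at that location. Note that the lists are not exhaustive: in the standard genetic code, several amino acids shown here with a single codon also have synonymous codons (glycine, for example, is encoded by GGU, GGC, GGA, and GGG), and serine has a sixth codon, AGC.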
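The brace notation can be reproduced with a short script. What follows is a minimal sketch in Python, not the software used to produce Fig. 2: the names `CODON_TO_AA`, `AA_TO_CODONS`, and `back_translate` are illustrative, and the table is the full standard genetic code, so it lists every synonymous codon (glycine gets four choices rather than the single GGU shown in the listing).

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; amino acids are listed in UCAG x UCAG x UCAG
# codon order, with '*' marking the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each codon (e.g. "AUG") to its one-letter amino acid.
CODON_TO_AA = {
    "".join(bases): aa
    for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: each amino acid maps to all codons that encode it.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def back_translate(protein):
    """Return the candidate mRNA for a protein, in the brace notation.

    A position with one possible codon is written bare; a position with
    several is written as {C1.C2...}. A stop position is appended.
    """
    parts = []
    for aa in protein + "*":
        codons = AA_TO_CODONS[aa]
        if len(codons) == 1:
            parts.append(codons[0])
        else:
            parts.append("{" + ".".join(codons) + "}")
    return "".join(parts)


print(back_translate("MK"))
```

For the two-residue input "MK" this prints AUG{AAA.AAG}{UAA.UAG.UGA}, which matches the way methionine, lysine, and the stop position appear in the listing.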

Fig. 3 Manacus vitellinus

For example, notice that "AUGGGUCCU" is composed of three codons that begin the mRNA sequence.

They were reverse engineered from the first three amino acids ("MGP") in the protein "XP_008921924.1" (each letter/character in the protein represents an amino acid).

You can reverse engineer them using the data in Fig. 1.

Notice also that proteins of this sort begin with an 'M' (methionine).

This is the 'start' amino acid, which is specified in the mRNA by the codon 'AUG'.
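The same lookup can be run in the forward direction as a sanity check: translating the first nine nucleotides should give back the first three amino acids. This is a small Python sketch under the assumption of the standard genetic code; `translate` is an illustrative name, not a function from the software described above.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG x UCAG x UCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(bases): aa
    for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string codon by codon, stopping at a stop codon."""
    protein = []
    # Step through the sequence three nucleotides at a time,
    # ignoring any incomplete codon at the end.
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TO_AA[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGGUCCU"))  # prints MGP
```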

To go further and pick the specific codon out of each list between the curly braces '{' and '}', we must look at the GenBank GBFF file.

We'll take that up next time.
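In the meantime, one rough way to see why the protein sequence alone cannot settle the choice is to count how many distinct coding sequences could produce it. The Python sketch below assumes the standard genetic code; `DEGENERACY` and `count_coding_sequences` are illustrative names, and the stop codon is left out of the count.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_coding_sequences(protein):
    """Count the mRNA coding sequences that translate to this protein.

    Each position contributes independently, so the total is the
    product of the per-residue codon counts.
    """
    return prod(DEGENERACY[aa] for aa in protein)


print(count_coding_sequences("MGP"))  # prints 16
```

Even the three residues "MGP" allow 1 x 4 x 4 = 16 coding sequences, and the count multiplies with every further residue, so for a protein of several hundred residues only the nucleotide record can say which codons were actually used.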

The next post in this series is here, the previous post in this series is here.


Where:

the three-letter amino acid names "met-his-tyr-leu-asp-ser-arg-leu" correspond to
the one-letter codes "M-H-Y-L-D-S-R-L" used in protein sequences:

