Amino Acid Sequence: Definition, Chart and Analysis Methods

Amino acids are the building blocks of proteins, which are fundamental molecules in all living organisms. The order in which these amino acids are linked together is the amino acid sequence, also known as the primary structure of a protein. The amino acid sequence determines the three-dimensional structure and function of a protein and is key to understanding how the protein behaves in vivo.

What is the Amino Acid Sequence?

The amino acids in a sequence are usually represented by one- or three-letter codes. For instance, the letter A or the three-letter code Ala stands for the amino acid alanine. By stringing these codes together, the complete amino acid sequence of a protein can be described in a concise and standardized way. There are 20 standard amino acids that make up proteins, each with a unique side chain that gives it specific properties. These amino acids can be arranged in a practically unlimited number of combinations, which accounts for the diversity of proteins found in nature. In addition to the 20 standard amino acids, non-standard amino acids are found in proteins. Many of these arise when a standard residue is chemically modified during or after its incorporation into the growing protein chain, which can change the structure and function of the protein. Some non-standard amino acids are essential for specific biological functions.
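The one- and three-letter notation described above can be sketched as a simple lookup. This is a minimal illustration covering only the 20 standard amino acids; the function names and the hyphen separator are assumptions made for the example, not part of any standard tool.

```python
# One-letter and three-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(three_letter_sequence, separator="-"):
    """Convert a sequence such as 'Ala-Gly-Ser' into 'AGS'."""
    codes = three_letter_sequence.split(separator)
    return "".join(THREE_TO_ONE[code] for code in codes)


def to_three_letter(one_letter_sequence, separator="-"):
    """Convert a sequence such as 'AGS' into 'Ala-Gly-Ser'."""
    return separator.join(ONE_TO_THREE[letter] for letter in one_letter_sequence)


print(to_one_letter("Ala-Gly-Ser"))
print(to_three_letter("MKV"))
```

Non-standard or modified residues are not in the lookup, so they raise a KeyError rather than being silently mistranslated.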

Knowledge of the sequence of amino acids in a protein can provide insight into the protein's three-dimensional structure, function, cellular location, and evolution. For example, proteins can be assigned to families based on the degree of similarity of their amino acid sequences, and families that share common structural or functional features can be identified this way. Members of a family are typically identical across 25% or more of their sequences, and they usually share at least some structural and functional features. In addition, certain amino acid sequences serve as signals that determine the cellular location, chemical modification, and half-life of a protein. Special signal sequences, often at the amino terminus, target certain proteins for export from the cell; other proteins are targeted to the nucleus, cell surface, cytosol, or other cellular locations.
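As a rough illustration of how a similarity figure such as "25% or more" might be computed, the sketch below counts identical positions in two sequences that have already been aligned. It assumes the alignment is given (equal-length strings with "-" for gaps) and that gapped columns are excluded from the denominator. Other definitions of percent identity exist, so this is one convention, not the only one.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the ungapped aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Two short, made-up aligned fragments (hypothetical data).
print(percent_identity("MKTAYIAK", "MKSAY-AR"))
```

Producing the alignment itself is a separate problem, usually handled by dedicated alignment software.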

Amino Acids and Proteins

Proteins are among the most important macromolecules in living organisms, with diverse functions and structures. They are long chain-like molecules composed of amino acids linked by peptide bonds, and usually contain hundreds to thousands of amino acid residues. Different combinations and arrangements of amino acids determine the structural and functional diversity of proteins. Amino acids are also used to make industrial proteins, including enzymes for industrial use, recombinant proteins in pharmaceuticals, and protein supplements in nutritional products. The following table lists some amino acid products that can be used in recombinant protein production:

Catalog | Product Name | CAS Number | Category
BAT-002155 | O-Trityl-L-serine | 25840-83-9 | L-Amino Acids
BAT-002805 | Boc-L-serine | 3262-72-4 | BOC-Amino Acids
BAT-003598 | DL-Threonine | 80-68-2 | DL-Amino Acids
BAT-002736 | Boc-D-threonine | 55674-67-4 | BOC-Amino Acids
BAT-000461 | Nα-Z-L-asparagine | 2304-96-3 | CBZ-Amino Acids
BAT-002718 | Boc-D-glutamine | 61348-28-5 | BOC-Amino Acids
BAT-003290 | Z-D-glutamine | 13139-52-1 | CBZ-Amino Acids
BAT-003452 | Phthaloyl-glycine | 4702-13-0 | Cyclic Amino Acids
BAT-003623 | Trifluoroacetyl glycine | 383-70-0 | DL-Amino Acids
BAT-004210 | Trityl-glycine | 5893-05-0 | L-Amino Acids

Amino Acid Sequence Chart

An amino acid sequence chart is an important tool used in biochemistry and molecular biology to analyze and interpret the sequence of amino acids in a protein. It is a visual representation of that sequence, usually displayed as a series of letters. Each letter corresponds to a specific amino acid, and the order of the letters in the chart represents the linear order of the amino acids in the protein.

Fig. 1. Amino acid sequencing chart.

An amino acid sequence chart is a reference guide for scientists and researchers who study proteins and their functions. By analyzing the amino acid sequence of a protein, researchers can gain insight into its structure, function, and evolutionary relationships. They can also use this information to predict the three-dimensional structure of a protein and study how it interacts with other molecules in the cell.

DNA Sequence to Amino Acid

The sequence of amino acids in a protein is determined by the nucleotide sequence of its gene, which is read according to the genetic code. A gene is a specific region of DNA that codes for a particular protein, and protein synthesis begins with its transcription. During transcription, the DNA is copied into messenger RNA (mRNA), which then carries the genetic information to the ribosome, the cellular machinery responsible for protein synthesis.
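The copying step can be sketched in a few lines. This is a simplified model that ignores promoters, introns, and RNA processing. The function names are illustrative assumptions, and the template strand is assumed to be written 3' to 5' so that the mRNA comes out 5' to 3'.

```python
# Base pairing used when RNA is synthesized against a DNA template.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_coding_strand(coding_dna):
    """mRNA has the same sequence as the coding strand, with U instead of T."""
    return coding_dna.upper().replace("T", "U")


def transcribe_template_strand(template_dna_3_to_5):
    """Build the mRNA (5'->3') base by base against a template written 3'->5'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_dna_3_to_5.upper())


# Both strands of the same hypothetical six-base stretch give the same mRNA.
print(transcribe_coding_strand("ATGGCC"))
print(transcribe_template_strand("TACCGG"))
```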

mRNA to Amino Acid Sequence

On the ribosome, the mRNA is read in groups of three nucleotides, called codons, each of which specifies an amino acid or a stop signal. Transfer RNA (tRNA) molecules carry the corresponding amino acids to the ribosome, where they are added to the growing protein chain in the order specified by the mRNA sequence. As the ribosome reads the mRNA and adds amino acids to the chain, the amino acid sequence is formed. This sequence is critical in determining the folding and structure of the protein and its function within the cell. Even a small change in the amino acid sequence, such as a single substitution, can have a dramatic effect on the properties of the protein and may lead to disease.
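Codon-by-codon reading can be illustrated with the standard genetic code. The sketch below is a simplification: it starts at the first AUG it finds, stops at the first stop codon, and ignores everything else the ribosome does. The compact table layout and the function name are choices made for this example.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCAAGUAA"))
# A single-base change in one codon can change the encoded amino acid:
print(CODON_TABLE["GAG"], CODON_TABLE["GUG"])
```

The last line shows how one substituted base (GAG to GUG) swaps glutamate for valine, which is the kind of small change the paragraph above refers to.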

How to Find an Amino Acid Sequence?

Amino acid sequencing is the process of identifying the arrangement of amino acids in proteins and peptides. Several methods and tools can be used to find the amino acid sequence of a protein. One of the most widely used is to search databases such as UniProt, NCBI, and ExPASy. These databases contain a large number of protein sequences that have been experimentally determined or predicted by computational analysis. By searching them with the name or accession number of the protein, the amino acid sequence of the target protein can be found easily.
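A database lookup by accession number might look like the sketch below. It assumes that UniProt's REST service serves FASTA text at the URL pattern shown, which should be checked against the current UniProt documentation before relying on it, and it uses P69905 purely as an example accession. The FASTA parser is a minimal one that reads only the first record.

```python
from urllib.request import urlopen


def uniprot_fasta_url(accession):
    """Assumed UniProt REST URL pattern for a FASTA record."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop after the first
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def fetch_sequence(accession, timeout=30):
    """Download and parse one record (requires network access)."""
    with urlopen(uniprot_fasta_url(accession), timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))


# Offline demonstration with a made-up record (hypothetical data):
print(parse_fasta(">example|hypothetical protein\nMKTAYI\nAKQR\n"))
# Online use, for example: header, sequence = fetch_sequence("P69905")
```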

Another common way to find the amino acid sequence of a protein is through protein sequencing technology. Protein sequencing determines the order of amino acids experimentally, either by removing residues one at a time from the end of the chain or by fragmenting the protein and analyzing the pieces. Currently, the main amino acid sequencing methods are Edman degradation and mass spectrometry. These methods give researchers a direct and accurate way to determine the amino acid sequence of a protein.

  • Edman Degradation Method

Amino acid sequences in proteins or peptide chains can be identified by Edman degradation, a method developed by Pehr Edman. This method labels and cleaves the residue at the N-terminus of a peptide without breaking the peptide bonds between the other amino acid residues. The Edman degradation reaction was automated by Edman and Begg in 1967. Today, automated Edman degradation, carried out on instruments known as protein sequencers, is widely used and is capable of sequencing peptides of up to about 50 amino acids.

Fig. 2. Edman chemistry and automated N-terminal sequence analysis (Curr Protoc Protein Sci. 2001, (11): Unit 11.10).

The cyclic degradation of peptides is based on the reaction of phenylisothiocyanate (PITC) with the free amino group of the N-terminal residue, so that amino acids are removed one at a time and identified as their phenylthiohydantoin (PTH) derivatives. In each cycle, PITC reacts with the uncharged N-terminal amino group under mildly alkaline conditions to form a phenylthiocarbamoyl peptide (PTC-peptide). Under acidic conditions, the thiocarbonyl sulfur of this derivative then attacks the carbonyl carbon of the N-terminal residue, cleaving the first peptide bond. The first amino acid is released as an anilinothiazolinone derivative (ATZ-amino acid), and the shortened peptide can be isolated and subjected to the next degradation cycle. The ATZ-amino acid is extracted with an organic solvent such as ethyl acetate and converted under acidic conditions into the more stable phenylthiohydantoin derivative (PTH-amino acid). The PTH-amino acid generated in each cycle is identified by chromatography.

Because each cycle cleaves only the N-terminal residue, the first cycle identifies the exact N-terminal amino acid of the protein. In addition, because the released amino acids are identified and quantified by chromatography, amino acids with the same mass can be distinguished. For example, isoleucine and leucine residues both have a mass of about 113 Da, but their PTH derivatives have different retention times. Edman sequencing can also be performed on PVDF blots of 1D and 2D gels, which makes N-terminal sequencing of proteins from a mixture possible. The method has limitations, however. When the N-terminus is chemically modified, for example by acetylation, PITC cannot react and Edman degradation cannot proceed. Because PITC cannot react with non-α-amino acids, sequencing stops if a non-α-amino acid such as isoaspartic acid is encountered. Finally, large proteins cannot be sequenced in full: read length is limited to roughly the first 50 residues, so longer proteins must first be cleaved into smaller peptides.
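The cycle-by-cycle logic and its stopping conditions can be captured in a toy model. The sketch below does not simulate any chemistry. It only mimics the bookkeeping: one residue is read per cycle from the N-terminus, nothing is read if the N-terminus is blocked, and reading stops at a non-α-amino acid or after a maximum number of cycles. The residue label "isoAsp", the function name, and the default of 50 cycles are assumptions made for illustration.

```python
# Hypothetical labels for residues that PITC cannot react with.
NON_ALPHA_RESIDUES = {"isoAsp"}


def edman_sequence(residues, n_terminus_blocked=False, max_cycles=50):
    """Return the residues identified, in order, by successive Edman cycles."""
    if n_terminus_blocked:
        return []  # e.g. an acetylated N-terminus: the first coupling step fails
    identified = []
    for residue in residues:
        if len(identified) >= max_cycles:
            break  # practical read-length limit of the method
        if residue in NON_ALPHA_RESIDUES:
            break  # sequencing stops at a non-alpha-amino acid
        identified.append(residue)  # one residue is released and read per cycle
    return identified


print(edman_sequence(["Gly", "Ile", "Val", "Glu"]))
print(edman_sequence(["Ala", "isoAsp", "Gly"]))
print(edman_sequence(["Gly", "Ile"], n_terminus_blocked=True))
```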

  • Mass Spectrometry

Protein mass spectrometry is the application of mass spectrometry to the study of proteins. Mass spectrometry is an important method for accurately determining protein mass and characterizing proteins, and a variety of methods and instruments have been developed for its many uses. Its applications include identifying proteins and their post-translational modifications, elucidating protein complexes and their subunits and functional interactions, and global measurement of proteins in proteomics. It can also be used to localize proteins to different organelles and to determine interactions between proteins and with membrane lipids. The two main methods used to ionize proteins in mass spectrometry are electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI). These ionization techniques are used in conjunction with mass analyzers, often in a tandem mass spectrometry configuration. Typically, proteins are either analyzed intact using a "top-down" approach or first digested into peptides using a "bottom-up" approach. A "middle-down" approach, which analyzes larger peptide fragments and falls between the two, is sometimes used as well.
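The digestion step of a bottom-up workflow can be sketched in silico. The article does not name a protease, so the example assumes trypsin and the commonly cited rule that it cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). Real digests also produce missed cleavages, which this sketch ignores.

```python
def tryptic_digest(protein):
    """Split a one-letter sequence after K or R, except when followed by P."""
    if not protein:
        return []
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        if residue in "KR" and not is_last and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])  # C-terminal peptide
    return peptides


# Made-up sequence (hypothetical data); the R before P is not cleaved.
print(tryptic_digest("MKTAYRPGKLL"))
```

The resulting peptides are what a bottom-up experiment would go on to ionize and measure.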

Fig. 3. Mass spectrometry analysis of amino acid sequences (Computational Systems Biology of Cancer. Chapman & Hall/CRC Mathematical & Computational Biology, 2012).

Protein mass spectrometry requires that proteins in solution or solid form be converted to gas-phase ions, which are then injected and accelerated in an electric or magnetic field for analysis. This is most commonly done by electrospray ionization (ESI) or matrix-assisted laser desorption/ionization (MALDI). In ESI, ions are generated from proteins in solution, which allows fragile molecules to be ionized intact, sometimes with non-covalent interactions preserved. In MALDI, proteins are usually embedded in a solid matrix, and ions are generated by laser pulses. ESI produces more multiply charged ions than MALDI, which allows high-mass proteins to be measured and provides better fragment identification, while MALDI is fast and less affected by contaminants, buffers, and additives. Intact protein mass analysis is mainly performed using time-of-flight (TOF) mass spectrometry or Fourier transform ion cyclotron resonance (FT-ICR). These two instrument types are preferred for their wide mass range and, in the case of FT-ICR, their high mass accuracy.
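Why multiply charged ions help with high-mass proteins can be seen from a small calculation: for a protonated ion, m/z = (M + z × m_proton) / z, so higher charge states bring a large molecule into a lower m/z range. The sketch below computes a peptide's monoisotopic mass and its m/z at several charge states. The residue masses are commonly tabulated monoisotopic values rounded to five decimals, included here as an assumption rather than taken from the article, and the function names are illustrative.

```python
# Monoisotopic residue masses in Da (rounded; commonly tabulated values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.007276  # mass of the charge carrier in positive-ion mode


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def mz(neutral_mass, charge):
    """m/z of the ion carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON) / charge


mass = monoisotopic_mass("TAYRPGK")  # made-up peptide (hypothetical data)
for charge in (1, 2, 3):
    print(charge, round(mz(mass, charge), 4))
```

Leucine and isoleucine share the same entry value in this table, which is why mass alone cannot tell them apart.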

Catalog | Product Name | CAS Number | Category
BAT-003422 | Acetyl-L-proline | 68-95-1 | Cyclic Amino Acids
BAT-003423 | Acetyl-L-proline amide | 16395-58-7 | Cyclic Amino Acids
BAT-003427 | DL-Proline | 609-36-9 | Cyclic Amino Acids
BAT-014305 | L-Hydroxyproline | 51-35-4 | Cyclic Amino Acids
BAT-007195 | cis-D-4-Hydroxyproline | 2584-71-6 | Cyclic Amino Acids
BAT-005729 | trans-D-4-Hydroxyproline | 3398-22-9 | Cyclic Amino Acids
BAT-003475 | D-Arginine | 157-06-2 | D-Amino Acids
BAT-008096 | D-Aspartic acid | 1783-96-6 | D-Amino Acids
BAT-004935 | Ala-Val-OH | 3303-45-5 | Others
AT-004962 | Boc-Val-Val-OH | 69209-73-0 | Others

References:

  1. Reim, D.F. et al. N-terminal sequence analysis of proteins and peptides. Curr Protoc Protein Sci. 2001, (11): Unit 11.10.
  2. Barillot, E. et al. Computational Systems Biology of Cancer. Chapman & Hall/CRC Mathematical & Computational Biology, 2012.