Visualization of Protein Structure
Introduction
Proteins are fundamental molecules that play an important role in biological processes. They are formed by the condensation of amino acids, which are joined by peptide bonds. Each amino acid consists of a central carbon atom (the alpha carbon) linked to an amino group, a carboxyl group, a hydrogen atom, and a side chain (R group). Understanding the three-dimensional structure of a protein is a prerequisite for understanding its function. Existing experimental methods, such as X-ray crystallography, Nuclear Magnetic Resonance (NMR), Electron Microscopy, Small Angle X-ray Scattering (SAXS), and the more recent X-ray Free-Electron Lasers (XFEL), provide established routes to determine structure and dynamic behavior. However, challenges remain with respect to complexity, time, and cost, and it has been noted that the gap between the number of known protein sequences and the number of known protein structures is increasing exponentially. This created the need for computational techniques to predict the structure and molecular dynamics of proteins that have known sequences but unknown structures, to understand protein folding, and to determine specific protein functions for uses such as drug design. Many computational techniques for protein structure prediction are now available in structural biology. They complement expensive and laborious experimental methods with increasingly high accuracy and deepen the basic understanding of the relationship between protein structure and function.
Theory
Proteins are complex polypeptide structures made of one or more long chains of amino acid residues. All 20 standard amino acids share a central carbon atom (Cα) to which a hydrogen atom, an amino group (NH2), a carboxyl group (COOH), and a side chain (R group) are attached. The R group is the functional component of an amino acid: it determines the polarity and charge of the amino acid, which in turn affect the chemical and biological properties of the protein. During protein synthesis, amino acids are joined end to end by peptide bonds, where the carboxyl group of one amino acid condenses with the amino group of the next with the elimination of a water molecule. The process repeats to elongate the peptide chain. Proteins play a wide array of roles from the cellular level to the molecular level, and research focuses on determining complete protein structures at the atomic level. Proteins function in every task of cellular life: for example, structural proteins maintain cell shape, enzymes catalyze the biochemical reactions inside cells, and other proteins monitor metabolic signals from outside the cell or take part in replication.
Depending on the level of complexity, protein structure is described at four levels: primary, secondary, tertiary, and, for some proteins, quaternary. In the primary structure, amino acids are arranged in a linear sequence to form a polypeptide chain. That is, each amino acid is connected to the next: the carboxyl group of one molecule and the amine group of the adjacent molecule form a peptide bond (-CO-NH-) with the elimination of a water molecule. The peptide bond linking the C-terminal side of one amino acid to the N-terminal side of the next is resistant to heat and chemical action. Mutations in this sequence can affect protein folding and protein function.
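The condensation described above can be made concrete with a small calculation: a linear chain of n amino acids contains n - 1 peptide bonds, and each bond releases one water molecule. The Python sketch below is only illustrative. The function names and the small table of approximate average masses (in daltons) for five free amino acids are assumptions added here, not part of the original text.

```python
# Approximate average molecular masses (Da) of a few free amino acids.
# Illustrative subset only; values are rounded to two decimals.
FREE_AMINO_ACID_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
    "V": 117.15,  # valine
    "L": 131.17,  # leucine
}
WATER_MASS = 18.02  # one water molecule is eliminated per peptide bond


def peptide_bond_count(sequence: str) -> int:
    """Number of peptide bonds in a linear chain of len(sequence) residues."""
    return max(len(sequence) - 1, 0)


def peptide_mass(sequence: str) -> float:
    """Approximate mass of a linear peptide (hypothetical helper).

    Sum of the free amino acid masses minus one water per peptide bond.
    """
    total = sum(FREE_AMINO_ACID_MASS[residue] for residue in sequence)
    return total - peptide_bond_count(sequence) * WATER_MASS


if __name__ == "__main__":
    for seq in ("GA", "GAVLS"):
        print(seq, peptide_bond_count(seq), round(peptide_mass(seq), 2))
```

For the dipeptide glycine-alanine ("GA") this gives one peptide bond and a mass of about 146.14 Da, that is, the two free amino acids minus one water molecule.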
Proteins generally assume unique secondary, tertiary, and quaternary structures that produce the native conformation of the protein molecule. Secondary structure refers to the local folding of the polypeptide chain through interactions between atoms in the backbone, with hydrogen bonds forming between nearby amino acid groups in the chain. This folding usually produces helices, pleated sheets, and random coils. There are two main types of secondary structure, the α-helix and the β-pleated sheet, which are thought to have structural importance in most globular and fibrous proteins. The α-helix and β-pleated sheet conformations are the thermodynamically stable forms of protein secondary structure. A helical structure is characterized by the number of residues per turn of the helix and the distance between the alpha carbon atoms of neighboring residues measured along the helix axis. An α-helix is a right-handed coil of the amino acid sequence in a polypeptide chain, typically spanning 4-40 amino acid residues. Here, a hydrogen bond forms between the oxygen atom of the backbone carbonyl group (C=O) of one amino acid and the backbone amino group (N−H) of another amino acid four residues further along the chain, giving a helical turn of 3.6 amino acid residues. The side chains (R groups) protrude outward from the α-helix and are not involved in the hydrogen bonds that maintain the α-helix structure. The pitch of the helix is the distance the helix advances along its axis in one complete turn, which is 5.4 Å for the α-helix. Examples: keratin, myoglobin, and hemoglobin.
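The two helix parameters quoted above, 3.6 residues per turn and a pitch of 5.4 Å, fix the other geometric quantities of an ideal α-helix. The short Python sketch below works them out. The constant and function names are hypothetical, and real helices deviate somewhat from these ideal values.

```python
RESIDUES_PER_TURN = 3.6   # residues in one full turn (from the text)
PITCH_ANGSTROM = 5.4      # advance along the helix axis per full turn (from the text)

# Rise per residue: how far each residue advances along the helix axis.
RISE_PER_RESIDUE = PITCH_ANGSTROM / RESIDUES_PER_TURN   # about 1.5 Å

# Rotation per residue around the helix axis.
ROTATION_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN        # about 100 degrees


def helix_length(n_residues: int) -> float:
    """Approximate axial length (Å) of an ideal α-helix with n_residues."""
    return n_residues * RISE_PER_RESIDUE


if __name__ == "__main__":
    print(f"rise per residue: {RISE_PER_RESIDUE:.2f} Å")
    print(f"rotation per residue: {ROTATION_PER_RESIDUE:.0f} degrees")
    print(f"length of a 40-residue helix: {helix_length(40):.1f} Å")
```

Under these idealized assumptions each residue advances the helix by about 1.5 Å, so a 40-residue helix at the upper end of the quoted range would be roughly 60 Å long.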
At the tertiary level, a protein has a three-dimensional shape that results from interactions between the R groups of the amino acids that make up the molecule. Functional groups on the outer surface interact with other molecules through non-covalent bonds: hydrogen bonding, ionic bonding, dipole-dipole interactions, and London dispersion forces. Hydrophobic interactions are also common in tertiary structure: nonpolar amino acids with hydrophobic R groups cluster inside the protein, leaving hydrophilic amino acids on the outside to interact with surrounding water molecules. Covalent linkages between the sulfur-containing side chains of cysteine residues produce disulfide bonds, which also contribute to the tertiary structure. The tertiary structure gives the polypeptide chain its functional properties. When proteins such as enzymes lose this structure, for example at high temperature, they are denatured and lose their biological activity. The primary structure, in turn, determines this more complex folding. Examples of tertiary structures include globular proteins (most proteins and enzymes within our cells) and fibrous proteins (such as those found in cartilage).
Most proteins are made up of a single polypeptide chain and have only three levels of structure. Some proteins have multiple polypeptide chains, called subunits, and these subunits join to form the quaternary structure of the protein. Each subunit has its own primary, secondary, and tertiary structure, and the subunits are held together by hydrogen bonds and by van der Waals forces between nonpolar side chains. Examples include hemoglobin and DNA polymerase. Common genetic disorders such as cystic fibrosis, sickle cell anemia, and albinism are due to mutations in the primary structure of a protein that lead to alterations in the secondary, tertiary, and possibly quaternary structure.
Visualization of Protein Structure
Understanding the structure and function of proteins helps biologists study the role of proteins in evolution, mutation, drug discovery, gene finding, and microarray technologies, and solve biological problems. Advanced study of protein structures and structure-function relationships relies on molecular-level visualization to examine the atomistic details of proteins, their different representations, and related biomolecular structures. Many resources are available online for visualizing, analyzing, and exploring structural information at the molecular level. Software for displaying and examining the structural information of biomolecules is called a structure visualization tool. A few examples are PyMOL, RasMol, VMD, Jmol, JSmol, Chimera, Chime, Cn3D, and Swiss-PdbViewer.
Users can visualize the secondary and tertiary structure of proteins with the aid of the graphical user interface developed for this simulator. The simulator was designed and developed with reference to an online protein visualization tool, PDBe Molstar. Mol* (Molstar) is a web-based, open-source toolkit that is also used throughout RCSB.org, where it generates the 2D images available for every structure and entity in the PDB.
Protein Data Bank (PDB)
The Protein Data Bank (PDB) is the single global archive for storing, processing, and distributing three-dimensional (3D) structure data of biological macromolecules. The PDB (pdb.org) was established in 1971 and is thought to be the first open-access molecular data resource in biology. It serves as a global repository of structural information on proteins, DNA, RNA, and their complexes with other molecules and metal ions. It includes data derived from NMR, X-ray crystallography, cryo-electron microscopy, and theoretical modeling. Since 2003, the PDB has been jointly managed by the Worldwide Protein Data Bank (wwPDB), a network of four organizations:
• Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) (USA)
• PDB in Europe (PDBe) (Europe)
• PDB Japan (PDBj) (Japan)
• Biological Magnetic Resonance Data Bank (BMRB) (USA)
The mission is to maintain a single Protein Data Bank repository of macromolecular structural data that is freely and publicly available to the global community, in order to promote research and education across the biosciences. The wwPDB partners ensure adherence to the FAIR Principles of Findability, Accessibility, Interoperability, and Reusability.
The PDB includes more than 144,000 experimentally determined protein structures. The number of structures added to the database is growing rapidly and includes ribosomal subunits, viruses, and enzyme complexes. PDBe developed a browser-based approach for accessing and analyzing the structural archive using classification systems that are familiar to chemists and biologists. PDBe works in association with the European Bioinformatics Institute (EBI) and the international scientific community to develop new resources with value-added information. PDBe Molstar is a streamlined structure viewer that lets users explore a PDB structure within a browser rather than requiring pre-installed molecular graphics software. It displays validation and domain information for the given protein. It can also visualize hundreds of superimposed protein structures simultaneously, play molecular dynamics trajectories, render cell-level models at atomic detail with tens of millions of atoms, or display huge models obtained by integrative/hybrid methods (I/HM), such as the Nuclear Pore Complex (https://molstar.org/). Data in the PDB is stored in PDB file format, mmCIF format (macromolecular Crystallographic Information File), and PDBML format (Protein Data Bank Markup Language), and each entry is associated with a PDB ID.
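To give a feel for the legacy PDB file format mentioned above, the following Python sketch reads the fixed-width columns of ATOM and HETATM coordinate records. The sample records are made up for illustration and are not taken from a real entry, and the Atom tuple and function names are hypothetical. The column positions follow the published PDB format description. In practice an established parser library would normally be preferred over hand-written slicing.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    record: str      # "ATOM" or "HETATM"
    serial: int      # atom serial number
    name: str        # atom name, e.g. "CA" for the alpha carbon
    res_name: str    # residue name, e.g. "GLY"
    chain: str       # chain identifier
    res_seq: int     # residue sequence number
    x: float         # orthogonal coordinates in Å
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using the fixed PDB column layout."""
    return Atom(
        record=line[0:6].strip(),      # columns 1-6
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


def parse_atoms(lines: List[str]) -> List[Atom]:
    """Keep only coordinate records and parse them."""
    return [parse_atom_line(line) for line in lines
            if line.startswith(("ATOM  ", "HETATM"))]


# Made-up sample records, for illustration only.
SAMPLE = [
    "HEADER    ILLUSTRATIVE EXAMPLE",
    "ATOM      1  N   GLY A   1      10.000  10.000  10.000",
    "ATOM      2  CA  GLY A   1      11.000  10.000  10.000",
    "ATOM      3  CA  ALA A   2      14.800  10.000  10.000",
    "HETATM    4  O   HOH A 101      20.000  20.000  20.000",
    "END",
]

if __name__ == "__main__":
    for atom in parse_atoms(SAMPLE):
        print(atom)
```

Because the format is column-based rather than whitespace-delimited, the sketch slices by position instead of calling split(); fields in real files can run together without separating spaces.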
A PDB identification code, or PDB ID, is a 4-character alphanumeric accession code given to every molecular model in the PDB. The first character is a digit in the range 1-9, and each of the last three characters can be either a digit (0-9) or a letter (A-Z). PDB identifiers are used at all levels of the structural hierarchy, for example to select, locate, and visualize a specific ligand or amino acid in a protein chain. PDB IDs are also used to connect different data resources in the PDB archive and to indicate structural relationships between biological molecules. Some examples of PDB IDs are 7YTT, 7UQ2, 7W4K, and 8DOV.
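The format rule above translates directly into a pattern check. The Python sketch below is a minimal illustration: the function name is hypothetical, accepting lower-case letters is an assumption made here for convenience, and the check only tests the format of a code, not whether an entry with that ID actually exists in the archive.

```python
import re

# First character is a digit 1-9; the remaining three are digits or letters.
# Assumption: lower-case letters are accepted as well as upper-case ones.
PDB_ID_PATTERN = re.compile(r"[1-9][0-9A-Za-z]{3}")


def is_valid_pdb_id(code: str) -> bool:
    """Return True if code has the 4-character PDB ID format described above."""
    return PDB_ID_PATTERN.fullmatch(code) is not None


if __name__ == "__main__":
    for candidate in ("7YTT", "7UQ2", "7W4K", "8DOV", "0ABC", "7YT"):
        print(candidate, is_valid_pdb_id(candidate))
```

The four example IDs from the text pass the check, while "0ABC" (first character not in 1-9) and "7YT" (only three characters) do not.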
RasMol
RasMol is a user-friendly, open-source molecular graphics program created by Roger Sayle in 1992 for visualizing protein structure. It is used to explore and depict the structures of biological macromolecules such as DNA and proteins, as well as smaller molecules found in the PDB, and to prepare publication-quality images. RasMol displays a biomolecule with a variety of color schemes and molecular representations, including depth-cued wireframes, 'Dreiding' sticks, space-filling (CPK) spheres, ball and stick, biomolecular ribbons (either smooth-shaded solid ribbons or parallel strands), hydrogen bonding, atom labels, and dot surfaces.
PyMOL
PyMOL is a cross-platform, open-source molecular visualization system with a user-friendly interface for viewing the 3D structure of proteins, nucleic acids, small molecules, and trajectories in different representations, including ribbons, cartoons, dots, surfaces, spheres, sticks, and lines. PyMOL is applied in computational drug discovery to find new drug candidates for various targets. With PyMOL, one can study the structure and function of molecules in order to design and optimize drugs, study protein-ligand interactions, and produce visual representations of proteins and their main structural features.