What is PTGL?


Overview

PTGL stands for Protein Topology Graph Library. It is a database of protein structure topologies modeled as undirected, labeled graphs. It provides a web server to visualize and analyze protein structure topologies at different scales, e.g., the secondary structure level and the chain level. The graphs are computed by our software Visualization of Protein-Ligand Graphs (VPLG). VPLG works on the 3D atomic coordinates from the PDB, read either from a legacy PDB file or from a macromolecular Crystallographic Information File (mmCIF). Reading mmCIF files allows processing large protein structures (> 62 chains or > 99,999 atoms), which do not fit the legacy PDB format. The secondary structure assignment is parsed, with small adaptations, from a DSSP file.


Protein Graph computation

a) Sphere-style, b) stick-style and c) cartoon-style three-dimensional protein structure representation. d) Red and black circles denote helices and strands, respectively. e) Lines connecting the circles denote spatial neighborhood. f) Protein Graph visualization.


Contact definition

We define contacts at several levels of abstraction (scales): atom, residue, secondary structure element, and chain.

Atom

Atoms are modelled as hard spheres with a radius of 2 Å for atoms of amino acids and of 3 Å for ligand atoms. An atom-atom contact is defined if two hard spheres overlap. For atoms of amino acids the contact is differentiated depending on the position of the involved atoms in the amino acid backbone or side chain. This yields atom level contacts of the following types:

Residue

A residue-residue contact is defined if two residues share an atom contact.
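
The atom and residue definitions above can be sketched in a few lines. This is a minimal illustration, not the VPLG implementation: the function names and the coordinate layout are hypothetical, and treating "overlap" as a strict inequality on the sum of the radii is an assumption.

```python
import math
from itertools import product

# Radii from the text: 2 Å for amino-acid atoms, 3 Å for ligand atoms.
PROTEIN_ATOM_RADIUS = 2.0
LIGAND_ATOM_RADIUS = 3.0


def atoms_in_contact(pos_a, pos_b, a_is_ligand=False, b_is_ligand=False):
    """Return True if the hard spheres around two atoms overlap.

    Assumption: overlap means centre distance < sum of radii (strict).
    """
    r_a = LIGAND_ATOM_RADIUS if a_is_ligand else PROTEIN_ATOM_RADIUS
    r_b = LIGAND_ATOM_RADIUS if b_is_ligand else PROTEIN_ATOM_RADIUS
    return math.dist(pos_a, pos_b) < r_a + r_b


def residues_in_contact(atoms_a, atoms_b):
    """Return True if at least one atom pair of two residues is in contact.

    Each residue is given as a list of (x, y, z) coordinates in Å
    (a hypothetical layout chosen for this sketch).
    """
    return any(atoms_in_contact(a, b) for a, b in product(atoms_a, atoms_b))
```

With these radii, two amino-acid atoms are in contact when their centres are less than 4 Å apart, and an amino-acid atom and a ligand atom when they are less than 5 Å apart.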

Secondary structure element

Depending on the type of the secondary structure elements (SSEs), we applied a rule set:
SSE 1 type     SSE 2 type     Required contacts
Beta strand    Beta strand    BB > 1 or BC > 2
Helix          Beta strand    BB > 1 or BC > 3 or CC > 3
Helix          Helix          BC > 3 or CC > 3
Ligand         Any type       LX >= 1
For more details on the contact definition, please see the following publication: Schäfer T, May P, Koch I (2012). Computation and Visualization of Protein Topology Graphs Including Ligand Information. German Conference on Bioinformatics 2012; 108-118.
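
As a rough sketch, the rule set can be written as a single predicate over contact counts. The type codes ("E", "H", "L") and the argument names are illustrative. Reading BB, BC, and CC as backbone-backbone, backbone-side chain, and side chain-side chain contacts, and LX as ligand contacts, is an assumption based on the atom-level description above.

```python
def sse_contact(type_1, type_2, bb=0, bc=0, cc=0, lx=0):
    """Decide whether two SSEs are in contact, following the rule table.

    type_1, type_2: "E" (beta strand), "H" (helix) or "L" (ligand);
    these codes are hypothetical and chosen only for this sketch.
    bb, bc, cc, lx: numbers of atom-level contacts of each kind
    (assumed meaning: backbone/backbone, backbone/side chain,
    side chain/side chain, ligand/any).
    """
    types = {type_1, type_2}
    if "L" in types:
        return lx >= 1
    if types == {"E"}:
        return bb > 1 or bc > 2
    if types == {"H", "E"}:
        return bb > 1 or bc > 3 or cc > 3
    if types == {"H"}:
        return bc > 3 or cc > 3
    raise ValueError(f"unknown SSE types: {type_1!r}, {type_2!r}")
```

Note that, according to the table, backbone-backbone contacts alone never establish a helix-helix contact.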

Chain

A chain-chain contact is defined if two chains share at least one residue-residue contact.


Protein Graphs

A Protein Graph is defined as a labeled, undirected graph. The vertices correspond to the secondary structure elements or ligands, and they are labeled with the secondary structure element type (alpha helix, beta strand or ligand). The vertices of the Protein Graph are enumerated as they occur in the sequence from the N- to the C-terminus.

The edges of the Protein Graph represent spatial adjacencies of secondary structure elements (see contact definition). Depending on the relative direction of two spatially adjacent SSEs, their orientation to each other can be parallel (p), anti-parallel (a), or mixed (m).
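
One possible in-memory representation of such a graph is sketched below. The class and its methods are hypothetical and not taken from VPLG. The vertex codes "H", "E", and "L" and the extra edge label "l" for ligand contacts are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Illustrative codes: vertices are "H" (helix), "E" (strand) or "L" (ligand).
# Edge labels "p", "a", "m" follow the text; "l" for ligand contacts is assumed.
VERTEX_TYPES = {"H", "E", "L"}
EDGE_LABELS = {"p", "a", "m", "l"}


@dataclass
class ProteinGraph:
    """Labeled, undirected graph; vertex i is the i-th SSE from the N-terminus."""

    vertex_types: list
    edges: dict = field(default_factory=dict)  # frozenset({i, j}) -> label

    def __post_init__(self):
        unknown = set(self.vertex_types) - VERTEX_TYPES
        if unknown:
            raise ValueError(f"unknown vertex types: {sorted(unknown)}")

    def add_edge(self, i, j, label):
        n = len(self.vertex_types)
        if not (0 <= i < n and 0 <= j < n) or i == j:
            raise ValueError("edge endpoints must be two distinct vertices")
        if label not in EDGE_LABELS:
            raise ValueError(f"unknown edge label: {label!r}")
        # A frozenset key makes the edge undirected: (i, j) equals (j, i).
        self.edges[frozenset((i, j))] = label

    def label(self, i, j):
        """Return the label of edge {i, j}, or None if there is no edge."""
        return self.edges.get(frozenset((i, j)))

    def neighbors(self, i):
        """Return the sorted list of vertices adjacent to vertex i."""
        return sorted(next(iter(e - {i})) for e in self.edges if i in e)
```

Storing each edge under an unordered key keeps the graph undirected without duplicating entries.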


Visualization

In the graph visualizations available on the PTGL server, the secondary structure elements are drawn as red circles (helices), black squares (strands), or magenta rings (ligands) on a straight line according to their sequential order from the N- to the C-terminus. The spatial neighborhoods are drawn as arcs between secondary structure elements. The edges are colored according to their labeling as parallel (red), anti-parallel (blue), mixed (green) or ligand (magenta). This is the key for the images:

PTGL graph image key

Below the vertices there are the numbers of the secondary structure elements for this type of Protein Graph (PG) and for the occurrence in the sequence of the whole list of secondary structure elements (SQ).


Graph types

If only certain secondary structure element types are of interest, the graph model allows excluding the other types. According to the secondary structure element type of interest, the Protein Graph can be defined as an Alpha graph, Beta graph, or Alpha-Beta graph. If you are interested in the ligands as well, you can also use the Alpha-Ligand graph, the Beta-Ligand graph, and the Alpha-Beta-Ligand graph.

The Alpha graph contains only alpha helices and the contacts between them. The Alpha-Beta graph contains alpha helices, beta strands and the contacts between them, and so on.
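
Deriving a graph type amounts to taking the subgraph induced by the vertex types of interest. The sketch below assumes a simple representation (a list of type codes plus a dictionary of labeled edges). The function name, the type codes, and the returned index list are hypothetical.

```python
# Vertex types kept by each graph type ("H" helix, "E" strand, "L" ligand;
# the codes are illustrative).
GRAPH_TYPES = {
    "alpha": {"H"},
    "beta": {"E"},
    "alpha-beta": {"H", "E"},
    "alpha-ligand": {"H", "L"},
    "beta-ligand": {"E", "L"},
    "alpha-beta-ligand": {"H", "E", "L"},
}


def filter_graph(vertex_types, edges, graph_type):
    """Return the induced subgraph for one graph type.

    vertex_types: list of type codes in sequence order.
    edges: dict mapping (i, j) index pairs to an edge label.
    Returns (types, edges, kept), where kept[k] is the index in the full
    SSE list of vertex k of the filtered graph.
    """
    keep = GRAPH_TYPES[graph_type]
    kept = [i for i, t in enumerate(vertex_types) if t in keep]
    new_index = {old: new for new, old in enumerate(kept)}
    new_types = [vertex_types[i] for i in kept]
    new_edges = {}
    for (i, j), label in edges.items():
        if i in new_index and j in new_index:
            a, b = sorted((new_index[i], new_index[j]))
            new_edges[(a, b)] = label
    return new_types, new_edges, kept
```

The `kept` list records, for each vertex of the filtered graph, its position in the complete list of secondary structure elements, so both numberings stay available.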

As an example of Protein Graphs and their graph types, we present the structure with PDB ID 7tim [Davenport et al., 1991, Biochemistry].


Alpha Graph
The Alpha Graph of the protein 7tim chain A consisting of 13 helices.

Alpha Graph of 7timA


Beta Graph
The Beta Graph of the protein 7tim chain A consisting of 8 strands. Note the beta barrel in the protein, which is clearly visible as a circle of parallel beta strands in this graph.

Beta Graph of 7timA


Alpha-Beta Graph
The Alpha-Beta Graph of the protein 7tim chain A consisting of 21 secondary structure elements (13 helices and 8 strands).

Alpha-Beta Graph of 7timA


Alpha-Ligand Graph
The Alpha-Ligand Graph of the protein 7tim chain A consisting of 13 helices and 1 ligand.

Alpha-Ligand Graph of 7timA


Beta-Ligand Graph
The Beta-Ligand Graph of the protein 7tim chain A consisting of 8 strands and 1 ligand.

Beta-Ligand Graph of 7timA


Alpha-Beta-Ligand Graph
The Alpha-Beta-Ligand Graph of the protein 7tim chain A consisting of 22 secondary structure elements (13 helices, 8 strands and 1 ligand).

Alpha-Beta-Ligand Graph of 7timA


Folding Graphs

A connected component of a Protein Graph is called a Folding Graph. Folding Graphs are denoted with capital letters in alphabetical order according to their occurrence in the sequence, beginning at the N-terminus. Folding Graphs consisting of only one secondary structure element are found mostly at the protein surface and not in the protein core. Especially in Folding Graphs that contain beta sheets, the maximal vertex degree is rarely larger than two. We therefore distinguish between bifurcated and non-bifurcated topological structures: a Protein Graph or a Folding Graph is called bifurcated if its maximal vertex degree is greater than two, and non-bifurcated otherwise.
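
The two definitions above map directly onto standard graph routines. The following sketch finds the connected components and tests for bifurcation. The function names are hypothetical, and ordering the components by their first vertex is an assumed reading of "occurrence in the sequence". Labels are limited to 26 letters in this sketch.

```python
import string


def folding_graphs(n_vertices, edges):
    """Split a Protein Graph into Folding Graphs (connected components).

    edges: iterable of (i, j) vertex index pairs.
    Returns a dict mapping "A", "B", ... to sorted vertex lists, ordered by
    the first vertex of each component (an assumption of this sketch).
    """
    adjacency = {v: set() for v in range(n_vertices)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, components = set(), []
    for start in range(n_vertices):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first search from the unvisited vertex
            v = stack.pop()
            component.append(v)
            for w in adjacency[v] - seen:
                seen.add(w)
                stack.append(w)
        components.append(sorted(component))
    if len(components) > len(string.ascii_uppercase):
        raise ValueError("this sketch only labels up to 26 Folding Graphs")
    return dict(zip(string.ascii_uppercase, components))


def is_bifurcated(vertices, edges):
    """Return True if any of the given vertices has more than two neighbours."""
    members = set(vertices)
    degree = {v: 0 for v in members}
    for i, j in edges:
        if i in members and j in members:
            degree[i] += 1
            degree[j] += 1
    return any(d > 2 for d in degree.values())
```

For a graph with six vertices and the edges (0, 1), (1, 2), and (3, 5), this yields Folding Graph A with vertices 0 to 2, B with vertices 3 and 5, and C with the isolated vertex 4. None of them is bifurcated.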


Visualization

In the graph visualizations available on the PTGL server, Folding Graphs follow the Protein Graph visualization. The footer additionally contains the number of the secondary structure element in the Folding Graph (FG).

As an example, we present an antigen receptor protein structure with PDB ID 1bec [Bentley et al., 1995, Science]. 1bec is a transport membrane protein that detects foreign molecules at the cell surface. The protein consists of a single chain, A, and exhibits three Folding Graphs. Its two domains are represented by Folding Graphs A and C, which are built mainly from strands. These two Folding Graphs (1bec_A and 1bec_C) are large enough to be of interest, whereas the third (1bec_B) consists of only a single helix (see Protein Graph of 1bec: helix 13).


3D structure of 1bec:

3D structure of 1bec

Alpha-Beta Protein Graph of 1bec:

Alpha-Beta Protein Graph of 1becA

Alpha-Beta Folding Graph A of 1bec:

Alpha-Beta Folding Graph A of 1becA

Alpha-Beta Folding Graph C of 1bec:

Alpha-Beta Folding Graph C of 1becA


Linear Notations

A notation serves as a unique, canonical, and linear description and classification of structures. The notations for Folding Graphs represent a protein structure as a linear sequence of secondary structure elements and describe the arrangement of secondary structure elements uniquely. Linear notations enable you to search the PTGL for protein motifs. When you search for a structure, SQL-based string matching on the linear notation strings is used to find all Folding Graphs that match the query.
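
The following sketch illustrates the idea of string matching on stored notations, using an in-memory SQLite database. The table name, column names, entry identifiers, and notation strings are all made up for the example and do not reflect the actual PTGL schema or data.

```python
import sqlite3

# Hypothetical schema and demo rows; not the real PTGL database layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE folding_graph (entry TEXT, fg_label TEXT, notation TEXT)"
)
conn.executemany(
    "INSERT INTO folding_graph VALUES (?, ?, ?)",
    [
        ("demo1", "A", "{e,1ae,1ae,1ae}"),
        ("demo2", "A", "{h,1ah,1ah}"),
        ("demo3", "B", "{e,1pe,1pe}"),
    ],
)


def search_notation(connection, fragment):
    """Return (entry, fg_label) for all notations containing the fragment."""
    rows = connection.execute(
        "SELECT entry, fg_label FROM folding_graph "
        "WHERE instr(notation, ?) > 0 ORDER BY entry, fg_label",
        (fragment,),
    )
    return rows.fetchall()
```

Here `search_notation(conn, "1ae,1ae")` returns only the demo entry whose notation contains two consecutive antiparallel strand steps. The fragment is passed as a bound parameter rather than interpolated into the SQL text.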

The linear notations are written in different brackets:

Secondary structure elements are denoted by single characters:

For alpha or beta Folding Graphs, the characters of secondary structure elements not described are left out.

Edges are completely described by the start and end vertex and their label, i.e., parallel, antiparallel, mixed or ligand. The start and end vertex are stored implicitly: the linear notation traverses the graph and records only the direction in which each edge goes, i.e., "+" towards the C-terminus and "-" towards the N-terminus. Edge labels are denoted by single characters:


Notation types

There are two possibilities of representing Folding Graphs: first, one can order the secondary structure elements in one line according to their occurrence in sequence, or second, according to their occurrence in space. In the first case, the adjacent (ADJ), the reduced (RED), and the sequence (SEQ) Folding Graphs, secondary structure elements are ordered as points on a straight line according to their sequential order from the N- to the C-terminus. In the second case, the key (KEY) Folding Graph, secondary structure elements are represented as red rectangles and black arrows for helices and strands, respectively. They are ordered in a straight line corresponding to their spatial arrangement. This is difficult, because in most proteins, secondary structure elements exhibit more than two spatial neighbours. Therefore, KEY Folding Graphs can only be drawn for non-bifurcated Folding Graphs.

Folding Graphs for the different graph types can be derived from the different Protein Graph types: alpha, beta, alpha-beta, alpha-ligand, beta-ligand and alpha-beta-ligand.


Adjacent (ADJ) Folding Graphs

Secondary structure elements are ordered by their occurrence in the sequence, from N- to C-terminus. All vertices of the Protein Graph are considered in the adjacent notation of a Folding Graph. This means that adjacent Folding Graphs account for secondary structure elements that lie between the secondary structure elements of the Folding Graph without being connected to any of them. Vertices of the Protein Graph that are unconnected to vertices of the Folding Graph are colored grey.

ADJ Alpha-Beta Folding Graph of 1d3tB

Adjacent linear notation: {e,1ae,1ae,5ae,-1ae,-2mh,1me,-5ae,5ze,1ae,11mh,-10me,1zh,9mh,-1me,-4ae,-2ae,-1ae,5ae,1ae,1ae,-5zh,1me}
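
To make the structure of this string concrete, the sketch below parses it and replays the traversal. It assumes that the first entry is the type of the start vertex and that every following token consists of a signed position offset, a one-character edge label, and the type of the vertex that is reached. This reading is consistent with the example above but is not an official grammar, and the function names are hypothetical.

```python
import re

# Assumed token layout: signed offset, edge label, type of the reached SSE.
TOKEN = re.compile(r"([+-]?\d+)([a-z])([a-z])")


def parse_adj_notation(notation):
    """Return (start_type, steps) with steps as (offset, edge_label, sse_type)."""
    text = notation.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected a notation enclosed in curly brackets")
    start_type, *tokens = text[1:-1].split(",")
    steps = []
    for token in tokens:
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse token {token!r}")
        offset, edge_label, sse_type = match.groups()
        steps.append((int(offset), edge_label, sse_type))
    return start_type, steps


def visited_positions(steps):
    """Replay the offsets from position 0 and return all positions reached."""
    position, visited = 0, {0}
    for offset, _, _ in steps:
        position += offset
        visited.add(position)
    return sorted(visited)


ADJ = ("{e,1ae,1ae,5ae,-1ae,-2mh,1me,-5ae,5ze,1ae,11mh,-10me,1zh,9mh,"
       "-1me,-4ae,-2ae,-1ae,5ae,1ae,1ae,-5zh,1me}")
start_type, steps = parse_adj_notation(ADJ)
positions = visited_positions(steps)
```

Under these assumptions the traversal reaches 16 positions between 0 and 17 and never touches positions 3 and 13. These would be the Protein Graph vertices that lie within the span of the Folding Graph without belonging to it.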


Reduced (RED) Folding Graphs

Secondary structure elements are ordered by their occurrence in the sequence, from N- to C-terminus. Reduced Folding Graphs are the same as adjacent Folding Graphs, but only those secondary structure elements that are part of the Folding Graph are considered.

RED Alpha-Beta Folding Graph of 1d3tB

Reduced linear notation: {e,1ae,1ae,4ae,-1ae,-2mh,1me,-4ae,4ze,1ae,10mh,-9me,1zh,8mh,-1me,-3ae,-2ae,-1ae,4ae,1ae,1ae,-4zh,1me}
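
The relation between the two notations can be checked with a short sketch: replay the adjacent notation, renumber the visited positions consecutively, and write the offsets again. The token layout (signed offset, edge label, SSE type) is an assumed reading of the examples, and `adj_to_red` is a hypothetical helper, not a PTGL function.

```python
import re

# Assumed token layout: signed offset, edge label, type of the reached SSE.
STEP = re.compile(r"(-?\d+)([a-z])([a-z])")


def adj_to_red(adj_notation):
    """Convert an adjacent notation string into the reduced form."""
    first, *tokens = adj_notation.strip("{}").split(",")
    steps = [STEP.fullmatch(token).groups() for token in tokens]
    # Absolute positions in the Protein Graph, starting at 0.
    positions = [0]
    for offset, _, _ in steps:
        positions.append(positions[-1] + int(offset))
    # Renumber only the visited positions: 0, 1, 2, ... without gaps.
    rank = {p: r for r, p in enumerate(sorted(set(positions)))}
    red_tokens = [
        f"{rank[positions[k + 1]] - rank[positions[k]]}{label}{sse}"
        for k, (_, label, sse) in enumerate(steps)
    ]
    return "{" + ",".join([first] + red_tokens) + "}"


ADJ = ("{e,1ae,1ae,5ae,-1ae,-2mh,1me,-5ae,5ze,1ae,11mh,-10me,1zh,9mh,"
       "-1me,-4ae,-2ae,-1ae,5ae,1ae,1ae,-5zh,1me}")
RED = ("{e,1ae,1ae,4ae,-1ae,-2mh,1me,-4ae,4ze,1ae,10mh,-9me,1zh,8mh,"
       "-1me,-3ae,-2ae,-1ae,4ae,1ae,1ae,-4zh,1me}")
```

For the 1d3tB example, `adj_to_red(ADJ)` reproduces the reduced string shown above. Only offsets that span a skipped vertex change, for example 5ae becomes 4ae.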


Sequential (SEQ) Folding Graphs

Secondary structure elements are ordered by their occurrence in the sequence, from N- to C-terminus. Sequential Folding Graphs are the same as adjacent Folding Graphs, but the edges stand for sequential instead of spatial neighborhood. Vertices of the Protein Graph that are unconnected to vertices of the Folding Graph are included in the numbering but left out of the sequential neighborhood, so they are bypassed in the sequence of edges.

Although the sequence notation is trivial, the graphs can be useful. For example, searching for ψ-loops requires a special SEQ notation.

SEQ Alpha-Beta Folding Graph of 1d3tB

Sequential linear notation: {e,1e,1e,2h,1e,1e,1e,1h,1e,1e,1h,1e,2e,1e,1e,1h}
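
The bypassing of unconnected vertices shows up as offsets larger than one. A small sketch, assuming each token after the first is a position offset followed by the type of the next SSE (a reading of the example, not an official grammar):

```python
def seq_positions(notation):
    """Return (positions, types) for a sequential notation string.

    Positions are cumulative offsets starting at 0; types are the
    one-character SSE codes in traversal order.
    """
    first, *tokens = notation.strip("{}").split(",")
    positions, types = [0], [first]
    for token in tokens:
        offset, sse_type = int(token[:-1]), token[-1]
        positions.append(positions[-1] + offset)
        types.append(sse_type)
    return positions, types


SEQ = "{e,1e,1e,2h,1e,1e,1e,1h,1e,1e,1h,1e,2e,1e,1e,1h}"
positions, types = seq_positions(SEQ)
# Positions inside the covered range that are never reached are bypassed.
bypassed = sorted(set(range(positions[-1] + 1)) - set(positions))
```

For this example the two offsets of 2 correspond to the bypassed positions 3 and 13, and the Folding Graph comprises 16 secondary structure elements, four of them helices.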


KEY Folding Graphs

KEY Folding Graphs can only be created for non-bifurcated Folding Graphs. KEY Folding Graphs are very close to the topology diagrams of biologists, e.g. Brändén and Tooze (1999). Topologies are described by diagrams of black arrows for strands and red rectangles for helices. As in reduced Folding Graphs, only secondary structure elements of the Folding Graph are considered. Secondary structure elements are ordered spatially and connected in sequential order. See the KEY Folding Graph of the Alpha-Beta Folding Graph B of a histocompatibility antigen 1iebB. The Folding Graph consists of three helices and four strands. This topology exhibits one cross-over connection from helix 6 to helix 7 and forms an Alpha-Beta barrel structure.

Linear notation characteristics: If the arrangement of secondary structure elements is parallel, an x is noted (Richardson, 1977). In this case, the protein chain crosses to the other side of the sheet (cross-over). Antiparallel arrangements are called same end and are more stable (Chothia and Finkelstein, 1990). Mixed arrangements are treated as same end.

KEY Alpha-Beta Folding Graph of 3j9yO

KEY linear notation: [e,1e,1h,1xh]


Complex Graphs

A Complex Graph is defined as an undirected graph. The vertices correspond to protein chains and are named by their author-provided chain ID from the PDB file. Edges denote a spatial contact, and the edge weight corresponds to the number of residue-residue contacts. Below the graph there is a label per vertex for the number of the vertex (C#), the chain name (CN) and its molecule identifier (ML).
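
Building the weighted edges is a simple aggregation over residue-residue contacts. In the sketch below the input format (pairs of chain ID and residue number) and the function name are hypothetical.

```python
from collections import Counter


def complex_graph_edges(residue_contacts):
    """Count residue-residue contacts per pair of distinct chains.

    residue_contacts: iterable of ((chain_a, res_a), (chain_b, res_b)),
    a layout assumed for this sketch. Returns a Counter mapping
    frozenset({chain_a, chain_b}) to the edge weight.
    """
    weights = Counter()
    for (chain_a, _), (chain_b, _) in residue_contacts:
        if chain_a != chain_b:  # contacts within one chain add no edge
            weights[frozenset((chain_a, chain_b))] += 1
    return weights
```

Two chains are connected in the Complex Graph as soon as their weight is at least one, which matches the chain-chain contact definition.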

Complex Graph


Motifs

A motif is a common supersecondary structure. A motif consists of only a few secondary structure elements, and it may occur with very different functions. PTGL implements motif detection in Protein Graphs for some chosen motifs based on the linear notations of folding graphs. This enables the search for all chains containing one of the predefined motifs. If you want to search for an arbitrary arrangement of secondary structure elements, use the linear notation search.


Alpha motifs

Four Helix Bundle

The Four Helix Bundle is a protein motif that consists of four alpha helices arranged in a bundle. There are two types of the Four Helix Bundle, which differ in the connections between the alpha helices. The first type is all antiparallel, and the second type has two pairs of parallel helices that are connected antiparallel.
Found 6726 times in the current database.


Globin Fold

The Globin Fold is an alpha helix structure motif which is composed of a bundle, consisting of eight alpha helices, which are connected over short loop regions. The helices do not have a fixed arrangement, but the last two helices in sequential order are antiparallel.
Found 3280 times in the current database.


Beta motifs

Up-and-down barrel

The up-and-down barrel is composed of a series of antiparallel beta strands which are connected via hydrogen bonds. There are two major families of the up-and-down barrel, the ten-stranded and the eight-stranded version.
Found 1810 times in the current database.


Immunoglobulin fold

The immunoglobulin fold is a two-layer sandwich. Usually, it consists of seven antiparallel beta strands, arranged in two beta sheets. The first is composed of four and the second of three strands. Both are connected via a disulfide bond to build the sandwich.
Found 12397 times in the current database.


Beta Propeller

This beta motif contains between four and eight beta sheets, which are arranged around the center of the protein. Each sheet is formed by four antiparallel beta strands. One sheet makes up one of the propeller blades. To build a four-bladed propeller, for example, four of these sheets are grouped together.
Found 4718 times in the current database.


Jelly Roll

The Jelly Roll motif has a barrel structure that resembles a jelly roll. The barrel includes eight beta strands, which build a two-layer sandwich of four strands each.
Found 5346 times in the current database.


Publications