
What is PTGL?

PTGL is a web-based database application for the analysis of protein topologies. It uses a graph-based model to describe the structure of protein chains on the super-secondary structure level. A protein graph is computed from the 3D atomic coordinates of a single chain in a PDB file and the secondary structure assignments of the DSSP algorithm. The computation of the protein graph is done by our software Visualization of Protein Ligand Graphs (VPLG). In a protein graph, vertices represent secondary structure elements (SSEs, usually alpha helices and beta strands) or ligand molecules, while the edges model contacts and relative orientations between them. The result is an undirected, labelled graph for a single protein chain.

Protein graph computation

Protein Graphs

Using the 3D structure data from the PDB, the SSEs are defined according to the assignment of the DSSP algorithm with some modifications. Then, the spatial contacts between the SSEs are computed according to Koch et al., 2013. For the ligand versions, the explanation can be found in Schäfer et al., 2012. This information forms the basis for the description of protein structures as graphs.

A Protein graph is defined as a labelled, undirected graph. The vertices correspond to the SSEs or ligands, and they are labelled with the SSE type (alpha helix, beta strand or ligand). The vertices of the Protein graph are enumerated as they occur in the sequence from the N- to the C-terminus.

The edges of the Protein graph represent spatial adjacencies of SSEs. These adjacencies are defined through atom contacts between SSEs, based on the van-der-Waals radius. Each edge is labelled with the relative orientation of the two SSEs it connects: two spatially neighbouring SSEs can have a parallel (p), anti-parallel (a), or mixed (m) neighbourhood.
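The definition above maps onto a small data structure. The following is a minimal sketch in Python, not the VPLG implementation: the class name `ProteinGraph`, its methods, and the single-letter vertex codes `H`, `E` and `L` are assumptions made for illustration. Only the edge labels p, a and m come from the text.

```python
from dataclasses import dataclass, field

# Assumed single-letter vertex labels (hypothetical, chosen for this sketch).
SSE_TYPES = {"H": "alpha helix", "E": "beta strand", "L": "ligand"}
# Edge labels as given in the text.
EDGE_TYPES = {"p": "parallel", "a": "anti-parallel", "m": "mixed"}


@dataclass
class ProteinGraph:
    """Undirected, labelled graph; vertex i is the i-th SSE from the N-terminus."""

    sse_types: list                             # vertex labels in sequence order
    edges: dict = field(default_factory=dict)   # {(i, j): label} with i < j

    def add_contact(self, i, j, label):
        """Store an undirected edge once, keyed by the ordered vertex pair."""
        if i == j:
            raise ValueError("an SSE cannot be in contact with itself")
        if label not in EDGE_TYPES:
            raise ValueError(f"unknown edge label: {label!r}")
        self.edges[(min(i, j), max(i, j))] = label

    def neighbours(self, i):
        """Return the spatial neighbours of vertex i in sequence order."""
        return sorted(b if a == i else a for (a, b) in self.edges if i in (a, b))


# Toy example: one helix followed by two strands.
graph = ProteinGraph(["H", "E", "E"])
graph.add_contact(2, 1, "a")   # stored as (1, 2) because the graph is undirected
graph.add_contact(0, 1, "m")
```

Keying each edge by the ordered pair `(i, j)` with `i < j` keeps the graph undirected without storing every contact twice.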

Protein graph

Atom contacts and SSE contacts

Protein graphs are based on contacts between SSEs. Here, we explain how an SSE contact is defined in the PTGL. The computation of SSE contacts is a 2-step process: first, the atom contacts for the residues are computed, and the residues are assigned to SSEs. Then, a ruleset is used to determine whether enough atom contacts exist between a pair of SSEs to define this as a contact on the SSE level.

Atom level contacts
We use a hard-sphere model to compute atom contacts. Atom positions are parsed from PDB files, and a collision sphere with radius 2 Angstroem is assigned to each protein atom (hydrogens are ignored). For ligand atoms, the radius is 3 Angstroem. If the collision spheres of 2 atoms from different residues overlap, this is considered an atom level contact.
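The overlap test can be written down in a few lines. This is a minimal sketch under stated assumptions: the function name and argument layout are hypothetical, coordinates are plain `(x, y, z)` tuples in Angstroem, and spheres that only touch (distance exactly equal to the sum of the radii) are assumed not to overlap. The caller is expected to pass only atoms from different residues.

```python
import math

PROTEIN_ATOM_RADIUS = 2.0  # Angstroem, as stated in the text
LIGAND_ATOM_RADIUS = 3.0   # Angstroem, as stated in the text


def atoms_in_contact(pos_a, pos_b, a_is_ligand=False, b_is_ligand=False):
    """Return True if the collision spheres of two atoms overlap.

    Two spheres overlap when the distance between their centres is
    smaller than the sum of their radii.
    """
    radius_a = LIGAND_ATOM_RADIUS if a_is_ligand else PROTEIN_ATOM_RADIUS
    radius_b = LIGAND_ATOM_RADIUS if b_is_ligand else PROTEIN_ATOM_RADIUS
    return math.dist(pos_a, pos_b) < radius_a + radius_b


# Two protein atoms 3.9 Angstroem apart overlap (3.9 < 2 + 2).
close_pair = atoms_in_contact((0.0, 0.0, 0.0), (3.9, 0.0, 0.0))
# A protein atom and a ligand atom 4.5 Angstroem apart overlap (4.5 < 2 + 3).
ligand_pair = atoms_in_contact((0.0, 0.0, 0.0), (4.5, 0.0, 0.0), b_is_ligand=True)
```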
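The atoms of a residue are either backbone atoms or side chain atoms. Based on this differentiation, each atom level contact is assigned to a type. The rule table below uses the abbreviations BB (backbone to backbone), BC (backbone to side chain), CC (side chain to side chain), and LX (ligand atom to any atom).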

SSE level contacts
Based on the atom level contacts, a rule set is applied to decide whether or not a pair of SSEs is in contact. The rules depend on the SSE types, and are as follows:
SSE 1 type     SSE 2 type     Required contacts
Beta strand    Beta strand    BB > 1 or CC > 2
Helix          Beta strand    (BB > 1 and BC > 3) or CC > 3
Helix          Helix          BC > 3 or CC > 3
Ligand         Any type       LX >= 1
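The rule set translates directly into a predicate on the contact counts. The sketch below is illustrative only: the function name, the type strings `"helix"`, `"strand"` and `"ligand"`, and the keyword arguments are assumptions, while the thresholds are copied from the table.

```python
def sse_contact(type_1, type_2, bb=0, bc=0, cc=0, lx=0):
    """Decide whether two SSEs are in contact, given their atom contact counts.

    bb, bc, cc and lx are the numbers of atom level contacts of each type
    between the two SSEs.
    """
    if "ligand" in (type_1, type_2):
        return lx >= 1
    pair = tuple(sorted((type_1, type_2)))  # make the rule order-independent
    if pair == ("strand", "strand"):
        return bb > 1 or cc > 2
    if pair == ("helix", "strand"):
        return (bb > 1 and bc > 3) or cc > 3
    if pair == ("helix", "helix"):
        return bc > 3 or cc > 3
    raise ValueError(f"unsupported SSE type pair: {pair}")


# Two strands with two backbone-backbone contacts are in contact.
strands_touch = sse_contact("strand", "strand", bb=2)
# A helix and a strand with BB = 2 but only BC = 3 do not meet the rule.
helix_strand_touch = sse_contact("helix", "strand", bb=2, bc=3)
```

Sorting the type pair before matching means the caller does not have to care which of the two SSEs is passed first.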

For more details on the contact definition, please see the following publication: Schäfer T, May P, Koch I (2012). Computation and Visualization of Protein Topology Graphs Including Ligand Information. German Conference on Bioinformatics 2012; 108-118.

Graph Types

If only a certain SSE type is of interest, the graph model allows you to exclude the other SSE types. According to the SSE type of interest, the Protein graph can be defined as an Alpha graph, Beta graph, or Alpha-Beta graph. If you are interested in the ligands as well, you can also use the Alpha-Ligand graph, the Beta-Ligand graph, and the Alpha-Beta-Ligand graph.

The Alpha graph only contains alpha helices and the contacts between them. The Alpha-Beta graph contains alpha helices, beta strands and the contacts between them. The remaining graph types are defined analogously.
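Each graph type can be seen as the subgraph induced by the SSE types of interest. The following sketch illustrates that idea; the dictionary keys, the vertex codes `H`, `E` and `L`, the edge label `l` for ligand contacts, and the function name are assumptions made for this example and are not taken from VPLG.

```python
# Assumed mapping from graph type name to the vertex types it keeps.
GRAPH_TYPES = {
    "alpha": {"H"},
    "beta": {"E"},
    "alpha-beta": {"H", "E"},
    "alpha-ligand": {"H", "L"},
    "beta-ligand": {"E", "L"},
    "alpha-beta-ligand": {"H", "E", "L"},
}


def induced_subgraph(sse_types, edges, graph_type):
    """Keep only vertices of the wanted types and the edges between them.

    sse_types: list of vertex labels in sequence order.
    edges: dict {(i, j): label} using indices into sse_types.
    Returns the kept vertex indices and the kept edges; original indices
    are preserved so vertices can still be located in the full graph.
    """
    wanted = GRAPH_TYPES[graph_type]
    kept = [i for i, sse in enumerate(sse_types) if sse in wanted]
    kept_set = set(kept)
    kept_edges = {
        pair: label
        for pair, label in edges.items()
        if pair[0] in kept_set and pair[1] in kept_set
    }
    return kept, kept_edges


# Toy chain: helix, strand, strand, ligand.
toy_types = ["H", "E", "E", "L"]
toy_edges = {(0, 1): "m", (1, 2): "a", (2, 3): "l"}
beta_vertices, beta_edges = induced_subgraph(toy_types, toy_edges, "beta")
```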

Interpreting the graph images

In the graph visualizations available on the PTGL server, the SSEs are drawn as red circles (helices), black squares (strands), or magenta rings (ligands) on a straight line according to their sequential order from the N- to the C-terminus. The spatial neighbourhoods are drawn as arcs between SSEs. The edges are coloured according to their labelling: red for parallel, green for mixed, blue for anti-parallel, and magenta for ligand neighbourhood. Here is the key for the images:

PTGL graph image key

PDB 7TIM as an example for the different graph types

Alpha Graph
The Alpha-Graph of the protein 7TIM chain A consisting only of 13 helices.

Alpha Graph of 7timA

Beta Graph
The Beta-Graph of the protein 7TIM chain A consisting only of 8 strands. Note the beta barrel in the protein, which is clearly visible as a circle of parallel beta-strands in this graph.

Beta Graph of 7timA

Alpha-Beta Graph
The Alpha-Beta Graph of the protein 7TIM chain A consisting of 21 SSEs (13 helices and 8 strands).

Alpha-Beta Graph of 7timA

Alpha-Ligand Graph
The Alpha-Ligand Graph of the protein 7TIM chain A consisting of 13 helices and 1 ligand.

Alpha-Ligand Graph of 7timA

Beta-Ligand Graph
The Beta-Ligand-Graph of the protein 7TIM chain A consisting of 8 strands and 1 ligand.

Beta-Ligand Graph of 7timA

Alpha-Beta-Ligand Graph
The Alpha-Beta-Ligand Graph of the protein 7TIM chain A consisting of 22 SSEs (13 helices, 8 strands and 1 ligand).

Alpha-Beta-Ligand Graph of 7timA

Folding Graphs

A connected component of the Protein graph is called a Folding graph. Folding graphs are denoted with capital letters in alphabetical order according to their occurrence in the sequence, beginning at the N-terminus.
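Finding the Folding graphs is a standard connected-components search. The sketch below is one possible way to do it and is not the VPLG code: the function name is hypothetical, and "occurrence in the sequence" is interpreted as ordering the components by their first (lowest-numbered) vertex, which is an assumption.

```python
from string import ascii_uppercase


def folding_graphs(n_vertices, edges):
    """Split a Protein graph into its connected components.

    Vertices are numbered 0..n_vertices-1 from the N- to the C-terminus.
    Returns {label: sorted vertex list}, labelled A, B, C, ... in the order
    in which the components first appear in the sequence.
    This sketch supports at most 26 components.
    """
    adjacency = {v: set() for v in range(n_vertices)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = {}
    for start in range(n_vertices):      # ascending index = sequence order
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:                     # depth-first search from start
            vertex = stack.pop()
            members.append(vertex)
            for neighbour in adjacency[vertex]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        label = ascii_uppercase[len(components)]
        components[label] = sorted(members)
    return components


# Five SSEs: 0 and 2 are in contact, 3 and 4 are in contact, 1 is isolated.
toy_components = folding_graphs(5, [(0, 2), (3, 4)])
```

In this toy example, the isolated vertex 1 forms its own Folding graph B, which mirrors the single-helix Folding graphs described for 1BEC.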

Protein graphs are built of one or more Folding graphs. Below, you find the schematic representation of the antigen receptor protein 1BEC, with helices coloured red and strands blue. 1BEC is a transport membrane protein that detects foreign molecules at the cell surface. It has two domains, represented by the Folding graphs A and E, which consist mainly of strands. The protein consists of one chain, A, and exhibits six Folding graphs: two large Folding graphs (1BEC_A and 1BEC_E) and four Folding graphs (1BEC_B, 1BEC_C, 1BEC_D, and 1BEC_F) that each consist of a single helix (see the Protein graph of 1BEC: helices 9, 11, 14, and 22). Folding graphs consisting of only one SSE are found mostly at the protein surface and not in the protein core.

Especially in beta-sheet containing Folding graphs, the maximal vertex degree is rarely larger than two. We therefore distinguish between bifurcated and non-bifurcated topological structures: a Protein graph or Folding graph is called bifurcated if any vertex has a degree greater than 2; otherwise, it is non-bifurcated.
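The bifurcation test only needs the vertex degrees. A minimal sketch, with a hypothetical function name and edges given as a list of vertex pairs:

```python
from collections import Counter


def is_bifurcated(edges):
    """Return True if any vertex of the graph has a degree greater than 2."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return any(d > 2 for d in degree.values())


# A simple chain of three SSEs: every degree is at most 2.
chain_is_bifurcated = is_bifurcated([(0, 1), (1, 2)])
# One SSE in contact with three others: degree 3, hence bifurcated.
star_is_bifurcated = is_bifurcated([(0, 1), (0, 2), (0, 3)])
```

Note that a closed ring of SSEs, where every vertex has degree exactly 2, is non-bifurcated under this definition.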

3D structure of 1BEC:

3D structure of 1BEC

Alpha-Beta Protein graph of 1BEC:

Alpha-Beta Protein graph of 1BEC

Alpha-Beta Folding graph A of 1BEC:

Alpha-Beta Folding graph A of 1BEC

Alpha-Beta Folding graph B of 1BEC:

Alpha-Beta Folding graph B of 1BEC

Linear Notations

A notation serves as a unique, canonical, and linear description and classification of structures. The notations for Folding graphs reflect the fact that a protein is a linear sequence of amino acids, and they describe the arrangement of SSEs correctly and completely.

There are two possibilities of representing Protein graphs: first, one can order the SSEs in one line according to their occurrence in sequence, or second, according to their occurrence in space. In the first case, which covers the adjacent (ADJ), reduced (RED), and sequence (SEQ) notations, SSEs are ordered as points on a straight line according to their sequential order from the N- to the C-terminus.

It is difficult to draw the spatial arrangements of the SSEs in a straight line, because in most proteins SSEs exhibit more than two spatial neighbours. Therefore, the second description type, the key notation, KEY, can be drawn only for non-bifurcated Folding graphs. Helices and strands are represented by cylinders and arrows, respectively. The sequential neighbourhood is described by arcs between arrows and cylinders.

The notations are written in different brackets: [] denote non-bifurcated, {} bifurcated folding graphs, and () indicate barrel structures.

The adjacent and reduced notation

All vertices of the Protein graph are considered in the adjacent (ADJ) notation of a Folding graph. SSEs of the Folding graph are ordered according to their occurrence in the sequence. Beginning with the first SSE and following the spatial neighbourhoods, the sequential distances are noted, each followed by the neighbourhood type.

The reduced (RED) notation is built in the same way as the ADJ notation, but only the SSEs of the considered Folding graph are counted when determining sequential distances. See below the ADJ and RED notations of the Beta-Folding graph E in human alpha thrombin chain B (1D3T). The beta sheet consists of six strands arranged both in parallel with one additional mixed edge to helix 12.
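The difference between ADJ and RED lies only in how the SSEs are numbered before the sequential distances are taken. The sketch below illustrates that difference and nothing more: it assumes the order in which the spatial neighbourhoods are followed is already known and passed in as `path`, it does not produce the actual PTGL notation strings, and all function names are hypothetical.

```python
def sequential_distances(path, numbering):
    """Distances between consecutive SSEs of a path under a given numbering."""
    return [numbering[b] - numbering[a] for a, b in zip(path, path[1:])]


def adj_and_red_distances(path, folding_graph_vertices):
    """Compare ADJ-style and RED-style sequential distances.

    path: SSE indices (Protein graph numbering) in the order in which the
          spatial neighbourhoods are followed.
    folding_graph_vertices: all SSE indices belonging to the Folding graph.
    """
    # ADJ: every vertex of the Protein graph counts, so the Protein graph
    # index itself is the position.
    adj_numbering = {v: v for v in folding_graph_vertices}
    # RED: only SSEs of this Folding graph count, so they are renumbered
    # 0, 1, 2, ... in sequence order.
    red_numbering = {v: k for k, v in enumerate(sorted(folding_graph_vertices))}
    return (
        sequential_distances(path, adj_numbering),
        sequential_distances(path, red_numbering),
    )


# Toy Folding graph made of SSEs 2, 5, 6 and 9 of a larger Protein graph,
# traversed spatially as 2 -> 5 -> 9 -> 6.
adj_distances, red_distances = adj_and_red_distances([2, 5, 9, 6], {2, 5, 6, 9})
```

For this toy input the ADJ-style distances are 3, 4 and -3, while the RED-style distances are 1, 2 and -1, because SSEs outside the Folding graph are skipped in the RED numbering.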

ADJ Notation

Adjacent notation

RED Notation

Reduced notation

KEY Notation
The KEY notation is very close to the topology diagrams used by biologists, e.g. Brändén and Tooze (1999). Topologies are described by diagrams of arrows for strands and cylinders for helices. As in the RED notation, only SSEs of the considered Folding graph are taken into account. SSEs are ordered spatially and are connected in sequential order. Beginning with the first SSE in the sequence and following the sequential edges, the spatial distances are noted; in Alpha-Beta graphs, each distance is followed by the type of the SSE, h for a helix and e for a strand. If the arrangement of SSEs is parallel, an x is noted (Richardson, 1977). In this case, the protein chain moves to the other side of the sheet by crossing it (cross over). Antiparallel arrangements are called same end and are more stable (Chothia and Finkelstein, 1990). Mixed arrangements are treated as same end. The notation starts with the type of the first SSE. See the KEY notation of the Alpha-Beta Folding graph B of chain B of the histocompatibility antigen (1IEB). The Folding graph consists of 3 helices and 4 strands. This topology exhibits one cross over connection from helix 6 to helix 7 and forms an Alpha-Beta barrel structure.

Key notation

SEQ Notation
This notation is the same as the ADJ notation, but the sequential differences are counted. Although the SEQ notation is trivial, it can be useful; for example, searching for ψ-loops requires a special SEQ notation.

Sequence notation

The linear notations enable you to search the PTGL for protein motifs (and arbitrary other 3D arrangements of SSEs). When you search for a motif, SQL-based string matching on the linear notation strings is used to find all Folding graphs that match a query.
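The idea of SQL-based string matching can be illustrated with an in-memory SQLite database. Everything specific in this sketch is an assumption: the table name `folding_graph_notation`, its columns, the identifiers, and the notation strings are made-up placeholders and do not reflect the real PTGL schema or real PTGL entries.

```python
import sqlite3


def find_motif(connection, pattern):
    """Return the ids of all Folding graphs whose notation contains pattern.

    Uses a parameterised LIKE query, so the pattern is never spliced into
    the SQL text itself.
    """
    rows = connection.execute(
        "SELECT graph_id FROM folding_graph_notation "
        "WHERE adj_notation LIKE ? ORDER BY graph_id",
        (f"%{pattern}%",),
    )
    return [row[0] for row in rows]


# Hypothetical table with made-up notation strings.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE folding_graph_notation ("
    "graph_id TEXT PRIMARY KEY, adj_notation TEXT)"
)
conn.executemany(
    "INSERT INTO folding_graph_notation VALUES (?, ?)",
    [
        ("demo_A", "[1a,1a,1a]"),
        ("demo_B", "{1p,2p,-1a}"),
        ("demo_C", "[2m]"),
    ],
)
hairpin_like = find_motif(conn, "1a,1a")
```

Because `LIKE` performs plain substring matching, a short pattern such as `1a` would also match `-1a`; a real motif search therefore needs patterns that are specific enough for the notation being queried.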

Linking PTGL

You can link PTGL in several ways, depending on the kind of data you want:


We offer a REST API for programmers. Please see the API documentation for details.