PTGL is a web-based database application for the analysis of protein topologies. It uses a graph-based model to describe the structure
of protein chains on the super-secondary structure level. A protein graph is computed from the 3D atomic coordinates of a single chain in
a PDB file and the secondary structure assignments of the DSSP algorithm. The protein graph is computed by our software Visualization of Protein Ligand Graphs (VPLG). In a protein graph, vertices represent secondary
structure elements (SSEs, usually alpha helices and beta strands) or ligand molecules, while the edges model contacts and relative orientations between
them. The result is an undirected, labelled graph for a single protein chain.

Using the 3D structure data from the PDB, the SSEs are defined according to
the assignment of the DSSP algorithm, with some
modifications. Then, the spatial
contacts between the SSEs are computed according to Koch *et al.*, 2013. For the ligand versions, see
Schäfer *et al.*, 2012. This information forms the basis for describing protein structures as
graphs.

A Protein graph is defined as a labelled, undirected graph. The vertices correspond to the SSEs or ligands, and they are labelled with the SSE type (alpha helix, beta strand, or ligand). The vertices of the Protein graph are enumerated in the order in which they occur in the sequence, from the N- to the C-terminus.

The edges of the Protein graph represent spatial adjacencies of SSEs. These adjacencies are defined through atom contacts between SSEs, based on the van der Waals radius. Depending on the relative orientation of the two SSEs, a pair of spatially neighbouring SSEs that are in contact has a parallel (p), anti-parallel (a), or mixed (m) neighbourhood.

Protein graphs are based on contacts between SSEs. Here, we explain how an SSE contact is defined in the PTGL. The computation of SSE contacts is a 2-step process: first, the atom contacts for the residues are computed, and the residues are assigned to SSEs. Then, a ruleset is used to determine whether enough atom contacts exist between a pair of SSEs to define this as a contact on the SSE level.

We use a hard-sphere model to compute atom contacts. Atom positions are parsed from PDB files, and a collision sphere with a radius of 2 Å is assigned to each protein atom (hydrogens are ignored). For ligand atoms, the radius is 3 Å. If the collision spheres of two atoms from different residues overlap, this is considered an atom level contact.
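The overlap test described above can be sketched as follows. This is a minimal illustration, not VPLG's code: the `Atom` class and its fields are hypothetical, and we assume that "overlap" means the distance between the two atom centres is strictly smaller than the sum of their radii.

```python
import math
from dataclasses import dataclass

# Collision-sphere radii in Angstrom, as given in the text.
PROTEIN_ATOM_RADIUS = 2.0
LIGAND_ATOM_RADIUS = 3.0


@dataclass
class Atom:
    residue_id: int   # hypothetical: any value that identifies the residue
    element: str      # element symbol, e.g. "C", "N", "H"
    is_ligand: bool
    x: float
    y: float
    z: float


def collision_radius(atom: Atom) -> float:
    """Return the collision-sphere radius for an atom."""
    return LIGAND_ATOM_RADIUS if atom.is_ligand else PROTEIN_ATOM_RADIUS


def is_atom_contact(a: Atom, b: Atom) -> bool:
    """True if two atoms of different residues have overlapping collision spheres."""
    if a.element == "H" or b.element == "H":
        return False  # hydrogens are ignored
    if a.residue_id == b.residue_id:
        return False  # only contacts between different residues count
    distance = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
    # Spheres overlap when the centre distance is below the sum of the radii.
    return distance < collision_radius(a) + collision_radius(b)
```

For example, two protein atoms 3.5 Å apart are in contact (3.5 < 2 + 2), two protein atoms 4.5 Å apart are not, but a protein atom and a ligand atom 4.5 Å apart are (4.5 < 2 + 3).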

The atoms of a residue are either backbone atoms or side chain atoms. Based on this differentiation, each atom level contact is assigned to one of the following types:

- BB: backbone - backbone contact
- BC: backbone - side chain contact
- CC: side chain - side chain contact
- LB: ligand - backbone contact
- LC: ligand - side chain contact
- LL: ligand - ligand contact
- LX: ligand - non-ligand contact, i.e., LX = LB or LC
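One way to assign these types and tally them is sketched below. It assumes that the backbone consists of the standard atom names N, CA, C, and O and that every other protein atom belongs to the side chain; the function names and the tuple layout of a contact are hypothetical.

```python
from collections import Counter

# Assumption: standard protein backbone atom names; every other atom of a
# residue is treated as a side chain atom.
BACKBONE_ATOM_NAMES = {"N", "CA", "C", "O"}

# Canonical order used to build the two-letter codes: ligand, backbone, side chain.
_ORDER = {"L": 0, "B": 1, "C": 2}


def atom_class(atom_name: str, is_ligand: bool) -> str:
    """Classify an atom as 'L' (ligand), 'B' (backbone) or 'C' (side chain)."""
    if is_ligand:
        return "L"
    return "B" if atom_name in BACKBONE_ATOM_NAMES else "C"


def contact_type(name1: str, lig1: bool, name2: str, lig2: bool) -> str:
    """Return the two-letter contact type: BB, BC, CC, LB, LC or LL."""
    classes = sorted((atom_class(name1, lig1), atom_class(name2, lig2)),
                     key=_ORDER.__getitem__)
    return "".join(classes)


def count_contact_types(atom_contacts) -> Counter:
    """Count types for (name1, lig1, name2, lig2) tuples and add LX = LB + LC."""
    counts = Counter(contact_type(*c) for c in atom_contacts)
    counts["LX"] = counts["LB"] + counts["LC"]  # missing keys count as 0
    return counts
```

Sorting the two classes makes the code independent of argument order, so a side chain atom contacting a backbone atom is always reported as BC.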

Based on the atom level contacts, a rule set is applied to decide whether or not a pair of SSEs is in contact. The rules depend on the SSE types, and are as follows:

SSE 1 type | SSE 2 type | Required contacts |
---|---|---|
Beta strand | Beta strand | BB > 1 or CC > 2 |
Helix | Beta strand | (BB > 1 and BC > 3) or CC > 3 |
Helix | Helix | BC > 3 or CC > 3 |
Ligand | Any type | LX >= 1 |
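The rule set in the table translates directly into a small decision function. In this sketch the one-letter SSE codes ('H', 'E', 'L') and the function name are hypothetical, and the input is assumed to be a mapping from contact type to count for one SSE pair; the table is applied literally.

```python
from collections.abc import Mapping


def sse_pair_in_contact(type1: str, type2: str, counts: Mapping[str, int]) -> bool:
    """Apply the SSE contact rule set to atom-level contact counts.

    SSE types use hypothetical one-letter codes: 'H' helix, 'E' beta strand,
    'L' ligand. `counts` maps contact types (BB, BC, CC, LX, ...) to numbers;
    missing keys count as 0.
    """
    def n(kind: str) -> int:
        return counts.get(kind, 0)

    types = {type1, type2}
    if "L" in types:                      # Ligand | Any type
        return n("LX") >= 1
    if types == {"E"}:                    # Beta strand | Beta strand
        return n("BB") > 1 or n("CC") > 2
    if types == {"H", "E"}:               # Helix | Beta strand
        return (n("BB") > 1 and n("BC") > 3) or n("CC") > 3
    if types == {"H"}:                    # Helix | Helix
        return n("BC") > 3 or n("CC") > 3
    raise ValueError(f"unknown SSE types: {type1!r}, {type2!r}")
```

Using a set of the two types makes the helix/strand rule symmetric, so the order in which the SSEs are passed does not matter.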

For more details on the contact definition, please see the following publication: **Schäfer T, May P, Koch I (2012). Computation and Visualization of Protein Topology Graphs Including Ligand Information. German Conference on Bioinformatics 2012; 108-118**.

If only a certain SSE type is of interest, the graph model allows you to exclude the other SSE types. Depending on the SSE types of interest, the Protein graph can be defined as an Alpha graph, Beta graph, or Alpha-Beta graph. If you are interested in the ligands as well, you can also use the Alpha-Ligand graph, the Beta-Ligand graph, and the Alpha-Beta-Ligand graph.

The Alpha graph only contains alpha helices and the contacts between them. The Alpha-Beta graph contains alpha helices, beta strands, and the contacts between them. The ligand variants additionally contain the ligands and their contacts.
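Deriving one of these graph types amounts to taking an induced subgraph: keep the vertices of the selected SSE types and only the edges whose both ends survive. The sketch below reuses the graph type names from the PTGL link parameters; the 'H'/'E'/'L' codes, the edge-label letters, and the function name are hypothetical, and kept vertices retain their original numbering for simplicity.

```python
# SSE types kept by each graph type. The keys follow the graph type names in
# the PTGL link parameters; 'H', 'E', 'L' are hypothetical codes for helix,
# strand and ligand.
GRAPH_TYPE_SSES = {
    "alpha": {"H"},
    "beta": {"E"},
    "albe": {"H", "E"},
    "alphalig": {"H", "L"},
    "betalig": {"E", "L"},
    "albelig": {"H", "E", "L"},
}


def filter_protein_graph(sse_types, edges, graph_type):
    """Keep only vertices of the requested types and the edges between them.

    sse_types: list of SSE type codes in N- to C-terminal order.
    edges: set of (i, j, label) tuples over vertex indices.
    Returns the kept vertex indices (original numbering) and the kept edges.
    """
    allowed = GRAPH_TYPE_SSES[graph_type]
    kept = [i for i, t in enumerate(sse_types) if t in allowed]
    kept_set = set(kept)
    kept_edges = {(i, j, label) for (i, j, label) in edges
                  if i in kept_set and j in kept_set}
    return kept, kept_edges
```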

In the graph visualizations available on the PTGL server, the SSEs are drawn as red circles (helices), black squares (strands), or magenta rings (ligands) on a straight line according to their sequential order from the N- to the C-terminus. The spatial neighbourhoods are drawn as arcs between SSEs. The edges are coloured according to their labelling: red for parallel, green for mixed, blue for anti-parallel, and magenta for ligand neighbourhood. Here is the key for the images:

A connected component of the Protein graph is called Folding graph. Folding graphs are denoted with capital letters in alphabetical order according to their occurrence in the sequence, beginning at the N-terminus.
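Computing the Folding graphs is a standard connected-components search. In the sketch below (function name and input format are hypothetical), starting the search from vertices in sequence order means each component is discovered at its N-terminal-most SSE, so the letters A, B, C, ... follow the order of occurrence described above.

```python
from string import ascii_uppercase


def folding_graphs(vertices, edges):
    """Split a Protein graph into Folding graphs (connected components).

    vertices: iterable of vertex indices (sequence positions).
    edges: iterable of (i, j) pairs.
    Returns a dict mapping letters 'A', 'B', ... to sorted vertex lists.
    """
    neighbours = {v: set() for v in vertices}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)

    seen = set()
    components = []
    for start in sorted(neighbours):          # N-terminus first
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:                          # iterative depth-first search
            v = stack.pop()
            component.append(v)
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(component))

    if len(components) > len(ascii_uppercase):
        raise ValueError("more Folding graphs than single letters")
    return {ascii_uppercase[k]: c for k, c in enumerate(components)}
```

An isolated SSE forms a Folding graph of its own, which matches the single-helix Folding graphs in the 1BEC example.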

Protein graphs are built of one or more Folding graphs. Below, you find the schematic representation of the antigen receptor protein 1BEC. Helices are coloured red and strands blue. 1BEC is a transport membrane protein
that detects foreign molecules at the cell surface. It has two domains, represented by the Folding graphs A and E, which are built mainly
of strands. The protein consists of a single chain, A, and exhibits six Folding graphs: two large Folding graphs (1BEC_A and
1BEC_E) and four Folding graphs (1BEC_B, 1BEC_C, 1BEC_D, and 1BEC_F) that each consist of only a single helix.

A notation serves as a unique, canonical, and linear description and classification of structures. The notations for Folding graphs reflect the linear nature of a protein as a sequence of amino acids, and they describe the arrangement of SSEs correctly and completely.

There are two ways of representing Protein graphs: the SSEs can be ordered on a line either according to their occurrence in the sequence or according to their occurrence in space. In the first case, which covers the adjacent notation (ADJ), the reduced notation (RED), and the sequence notation (SEQ), the SSEs are ordered as points on a straight line according to their sequential order from the N- to the C-terminus.

It is difficult to draw the spatial arrangements of the SSEs in a straight line, because in most proteins SSEs exhibit more than two spatial neighbours. Therefore, the second description type, the key notation, KEY, can be drawn only for non-bifurcated Folding graphs. Helices and strands are represented by cylinders and arrows, respectively. The sequential neighbourhood is described by arcs between arrows and cylinders.

The notations are written in different brackets: [] denote non-bifurcated Folding graphs, {} bifurcated Folding graphs, and () barrel structures.

All vertices of the Protein graph are considered in the adjacent (ADJ) notation of a Folding graph. The SSEs of the Folding graph are ordered according to their occurrence in the sequence. Beginning with the first SSE and following the spatial neighbourhoods, each step is written as the sequential distance followed by the neighbourhood type.
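The principle of writing "sequential distance, then neighbourhood type" while walking along spatial neighbours can be illustrated for the simplest case, a non-bifurcated Folding graph that forms a simple path. This is a hypothetical sketch, not PTGL's notation algorithm: the real strings may differ in format, and starting at the path end that comes first in the sequence is our assumption. The `reduced` flag counts distances only over the Folding graph's own SSEs, which is how the RED notation is defined in the next paragraph.

```python
def linear_notation(fg_vertices, edges, reduced=False):
    """Sketch of an ADJ-style (or, with reduced=True, RED-style) string.

    fg_vertices: Protein graph positions of the SSEs in one Folding graph.
    edges: dict mapping frozenset({i, j}) to the neighbourhood type ('p', 'a', 'm').
    Only handles non-bifurcated, non-barrel Folding graphs (simple paths).
    """
    vertices = sorted(fg_vertices)
    neighbours = {v: [] for v in vertices}
    for pair in edges:
        i, j = tuple(pair)
        neighbours[i].append(j)
        neighbours[j].append(i)
    if any(len(nb) > 2 for nb in neighbours.values()):
        raise ValueError("bifurcated Folding graph: not handled by this sketch")
    ends = [v for v in vertices if len(neighbours[v]) <= 1]
    if not ends:
        raise ValueError("cyclic (barrel) Folding graph: not handled by this sketch")

    # ADJ: distances in Protein graph positions.
    # RED: the Folding graph's own SSEs are renumbered 0, 1, 2, ...
    if reduced:
        position = {v: k for k, v in enumerate(vertices)}
    else:
        position = {v: v for v in vertices}

    # Assumption: start at the path end that comes first in the sequence.
    previous, current = None, ends[0]
    steps = []
    while True:
        onward = [w for w in neighbours[current] if w != previous]
        if not onward:
            break
        nxt = onward[0]
        kind = edges[frozenset((current, nxt))]
        steps.append(f"{position[nxt] - position[current]}{kind}")
        previous, current = current, nxt
    return "[" + ",".join(steps) + "]"
```

For a Folding graph with SSEs at Protein graph positions 0, 2, and 4, where 0 and 4 as well as 4 and 2 are anti-parallel neighbours, the sketch yields `[4a,-2a]` in ADJ style and `[2a,-1a]` in RED style, because positions 1 and 3 belong to other Folding graphs and are skipped in the reduced count.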

The reduced (RED) notation is the same as the ADJ notation, except that only the SSEs of the considered Folding graph are counted. Below are the ADJ and RED notations of the Beta-Folding graph E in human alpha thrombin chain B (1D3T). The beta sheet consists of six strands arranged both in parallel with one additional mixed edge to helix 12.

The linear notations enable you to search the PTGL for protein motifs (and arbitrary other 3D arrangements of SSEs). When you search for a motif, SQL-based string matching on the linear notation strings is used to find all Folding graphs that match the query.
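SQL-based string matching of this kind can be sketched with Python's built-in `sqlite3` module and the `LIKE` operator. The table name, columns, Folding graph IDs, and notation strings below are made up for illustration and do not reflect the PTGL database schema or real entries.

```python
import sqlite3


def build_toy_database() -> sqlite3.Connection:
    """Create an in-memory table of made-up Folding graph notations."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE folding_graphs (fg_id TEXT, notation TEXT)")
    conn.executemany(
        "INSERT INTO folding_graphs VALUES (?, ?)",
        [("toyA", "[1a,1a,1a]"), ("toyB", "[2p,-1a]"), ("toyC", "[1a,1a,-3a]")],
    )
    return conn


def search_motif(conn: sqlite3.Connection, motif: str) -> list:
    """Return IDs of Folding graphs whose notation contains `motif` as a substring."""
    cursor = conn.execute(
        "SELECT fg_id FROM folding_graphs WHERE notation LIKE ? ORDER BY fg_id",
        ("%" + motif + "%",),  # parameterized query; % matches any characters
    )
    return [row[0] for row in cursor.fetchall()]
```

With this toy data, searching for `1a,1a` returns `toyA` and `toyC`. Plain substring matching ignores element boundaries (for example, `1a,1a` would also match inside `-1a,1a`), so a production search needs more carefully anchored patterns.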

You can link PTGL in several ways, depending on the kind of data you want:

- Link to all protein graphs of a chain:
  - Format: `http://ptgl.uni-frankfurt.de/results.php?q=<pdbid><chain>`
  - The allowed values for the parameters are:
    - `<pdbid>`: a PDB identifier
    - `<chain>`: a PDB chain name
  - Example for PDB 7tim, chain A: http://ptgl.uni-frankfurt.de/results.php?q=7timA
- Link to all folding graph linear notations of a protein graph:
  - Format: `http://ptgl.uni-frankfurt.de/foldinggraphs.php?pdbchain=<pdbid><chain>&graphtype_int=<graphtype_code>&notationtype=<notation>`
  - The allowed values for the parameters are:
    - `<pdbid>`: a PDB identifier
    - `<chain>`: a PDB chain name
    - `<graphtype_code>`: 1=alpha, 2=beta, 3=albe, 4=alphalig, 5=betalig, 6=albelig
    - `<notation>`: the notation type: adj, red, seq, or key
  - Example for the ADJ notation folding graphs of the alpha protein graph of PDB 7tim chain A: http://ptgl.uni-frankfurt.de/foldinggraphs.php?pdbchain=7timA&graphtype_int=1&notationtype=adj
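If you generate such links from a script, a small helper avoids typos in the query string. The function names below are hypothetical; the base URL, parameter names, and graph type codes are those listed above, and the query string is built with the standard `urllib.parse.urlencode`.

```python
from urllib.parse import urlencode

BASE_URL = "http://ptgl.uni-frankfurt.de/"
GRAPH_TYPE_CODES = {"alpha": 1, "beta": 2, "albe": 3,
                    "alphalig": 4, "betalig": 5, "albelig": 6}
NOTATIONS = {"adj", "red", "seq", "key"}


def results_url(pdbid: str, chain: str) -> str:
    """Link to all protein graphs of a chain."""
    return BASE_URL + "results.php?" + urlencode({"q": pdbid + chain})


def foldinggraphs_url(pdbid: str, chain: str, graph_type: str, notation: str) -> str:
    """Link to all folding graph linear notations of one protein graph."""
    if notation not in NOTATIONS:
        raise ValueError(f"unknown notation: {notation!r}")
    params = {
        "pdbchain": pdbid + chain,
        "graphtype_int": GRAPH_TYPE_CODES[graph_type],
        "notationtype": notation,
    }
    return BASE_URL + "foldinggraphs.php?" + urlencode(params)
```

For example, `foldinggraphs_url("7tim", "A", "alpha", "adj")` reproduces the example link for the ADJ notation folding graphs of the alpha protein graph of 7tim chain A.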

We offer a REST API for programmers. Please see the API documentation for details.