TM-align

Status

Done

What it does:

Performs TM-align between specific chains in two PDB files

(TM-align is an algorithm for sequence-independent protein structure comparisons)

When to use:

• When you want to align two protein chains using only structural information, without a sequence or residue-level alignment.

• When you are not sure whether two chains have identical sequences, or at least equal sequence lengths, and need the best structural match.

• When comparing protein structures that may have different evolutionary origins but similar folds.

Python code

from Bio.PDB import PDBParser
from Bio.Data.PDBData import protein_letters_3to1
from tmtools import tm_align
import numpy as np

def perform_tm_align(pdb_fn1: str, chain_id1: str, pdb_fn2: str, chain_id2: str, missing_coord_placeholder: float = 99.999):
    """
    Performs TM-align between specific chains in two PDB files.
    
    Parameters:
    -----------
    pdb_fn1 : str
        Path to the first PDB file
    chain_id1 : str
        Chain ID in the first PDB file
    pdb_fn2 : str
        Path to the second PDB file
    chain_id2 : str
        Chain ID in the second PDB file
    missing_coord_placeholder : float
        Value used as a placeholder for missing coordinates
        
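    Returns:
    --------
    result : object
        Result object returned by tmtools.tm_align. It exposes the alignment
        as attributes rather than dict keys (e.g. tm_norm_chain1,
        tm_norm_chain2, rotation matrix u, translation vector t).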
    
    Raises:
    -------
    KeyError: If specified chain IDs don't exist in the structures
    ValueError: If structures have multiple models or insufficient CA atoms
    """

    parser = PDBParser(QUIET=True)
    structure1 = parser.get_structure('X', pdb_fn1)
    structure2 = parser.get_structure('Y', pdb_fn2)
    
    # Ensure we're working with single models
    if len(structure1) != 1:
        raise ValueError(f"First structure contains {len(structure1)} models, expected 1")
    if len(structure2) != 1:
        raise ValueError(f"Second structure contains {len(structure2)} models, expected 1")

    # Check if chains exist
    if chain_id1 not in structure1[0]:
        raise KeyError(f"Chain '{chain_id1}' not found in {pdb_fn1}")
    if chain_id2 not in structure2[0]:
        raise KeyError(f"Chain '{chain_id2}' not found in {pdb_fn2}")

    chain1 = structure1[0][chain_id1]
    chain2 = structure2[0][chain_id2]

    # Extract the sequence and CA coordinates together so both always have the
    # same length. Residues without a CA atom (e.g. waters) and residues whose
    # CA x-coordinate equals the placeholder are skipped in both.
    def extract_ca(chain, chain_id, pdb_fn):
        seq, coords = [], []
        for residue in chain.get_residues():
            if 'CA' not in residue:
                continue
            coord = residue['CA'].coord
            # Compare with a tolerance: Biopython stores coordinates as float32
            if abs(float(coord[0]) - missing_coord_placeholder) < 1e-4:
                continue
            seq.append(protein_letters_3to1.get(residue.get_resname(), 'X'))
            coords.append(coord)
        if not coords:
            raise ValueError(f"No valid CA atoms found in chain '{chain_id}' of {pdb_fn}")
        return np.array(coords), ''.join(seq)

    ca_coords1, seq1 = extract_ca(chain1, chain_id1, pdb_fn1)
    ca_coords2, seq2 = extract_ca(chain2, chain_id2, pdb_fn2)

    # Perform TM-align
    result = tm_align(ca_coords1, ca_coords2, seq1, seq2)
    return result

Example usage

pdb_fn1 = "1mbn.pdb"
chain_id1 = "A"
pdb_fn2 = "1pmb.pdb"
chain_id2 = "A"
missing_coord_placeholder = 99.999

result = perform_tm_align(pdb_fn1, chain_id1, pdb_fn2, chain_id2, missing_coord_placeholder)

output
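
A minimal sketch of how the returned object might be inspected. It assumes the result exposes `tm_norm_chain1` and `tm_norm_chain2` as attributes, as the `tmtools` result object does to the best of our knowledge. The helper `summarize_tm_result` and the stand-in object with made-up scores are hypothetical and only illustrate the access pattern. Taking the larger of the two scores for the fold check is a design choice; a stricter check could use the smaller one.

```python
from types import SimpleNamespace


def summarize_tm_result(result, same_fold_cutoff: float = 0.5) -> dict:
    """Collect both TM-scores from a result object and apply the 0.5 rule of thumb."""
    tm1 = float(result.tm_norm_chain1)  # TM-score normalized by the length of chain 1
    tm2 = float(result.tm_norm_chain2)  # TM-score normalized by the length of chain 2
    return {
        "tm_norm_chain1": tm1,
        "tm_norm_chain2": tm2,
        # Assumption: use the larger score for the same-fold check
        "likely_same_fold": max(tm1, tm2) > same_fold_cutoff,
    }


# Stand-in object with made-up scores, used instead of a real tm_align result
fake_result = SimpleNamespace(tm_norm_chain1=0.82, tm_norm_chain2=0.79)
summary = summarize_tm_result(fake_result)
print(summary)
```

With a real run, `summarize_tm_result(result)` would be called on the value returned by `perform_tm_align`.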

Tips and Tricks

• The TM-score lies in (0, 1]; values above 0.5 generally indicate the same fold.

• Install the `tmtools` package with `pip install tmtools`.

• The function checks each residue for a CA atom, which makes it more robust against incomplete structures.
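
To make the (0, 1] range concrete, the sketch below evaluates the standard TM-score formula for one fixed superposition: the sum of 1 / (1 + (d_i / d0)^2) over aligned residue pairs, divided by the normalizing chain length L, with d0 = 1.24 (L - 15)^(1/3) - 1.8. TM-align itself searches for the alignment and superposition that maximize this value; this code does not. The function name `tm_score_from_distances` is hypothetical, and the 0.5 Å floor on d0 for short chains is an assumption.

```python
import numpy as np


def tm_score_from_distances(distances, target_length: int, d0_floor: float = 0.5) -> float:
    """TM-score of a fixed superposition, given CA-CA distances of aligned pairs (in Å)."""
    d = np.asarray(distances, dtype=float)
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = d0_floor
    d0 = max(d0, d0_floor)  # assumption: keep d0 positive for short chains
    # Unaligned residues contribute nothing, so the sum is divided by the full length
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / target_length)


# Every residue of a 100-residue chain aligned at zero distance
perfect = tm_score_from_distances(np.zeros(100), target_length=100)

# Only half of the residues aligned, again at zero distance
half = tm_score_from_distances(np.zeros(50), target_length=100)
print(perfect, half)
```

Because each aligned pair contributes at most 1 and the sum is divided by L, the score cannot exceed 1, and it drops as pairs move apart or go unaligned.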