
remove_missing_atoms

Status
Done
What it does:
Removes atoms with missing coordinates from a structure parsed from a PDB file and returns the cleaned structure (the file itself is not modified).
When to use:
When a PDB file contains atoms whose unknown coordinates have been filled in with a specific placeholder value (e.g. 99.999).

Python code

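```python
from Bio.PDB import PDBParser
import numpy as np


def remove_missing_atoms(pdb_fn: str, missing_coord_placeholder: float):
    """
    Removes atoms with missing coordinates from a PDB structure.

    Parameters:
    -----------
    pdb_fn : str
        Path to the PDB file
    missing_coord_placeholder : float
        Value used as a placeholder for missing coordinates

    Returns:
    --------
    structure : Bio.PDB.Structure.Structure
        Structure object with missing atoms removed
    """
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure('X', pdb_fn)
    # NaN never compares equal to itself, so it needs its own test
    placeholder_is_nan = np.isnan(missing_coord_placeholder)

    # Iterate over list() snapshots so detaching children does not disturb the loops
    for model in list(structure):
        for chain in list(model):
            for residue in list(chain):
                for atom in list(residue):
                    if placeholder_is_nan:
                        is_missing = np.any(np.isnan(atom.coord))
                    else:
                        # Coordinates are stored as float32; compare with a small tolerance
                        is_missing = np.any(np.isclose(atom.coord, missing_coord_placeholder,
                                                       rtol=0.0, atol=1e-4))
                    if is_missing:
                        # Remove only the offending atom, not its whole residue
                        residue.detach_child(atom.id)
                # Clean up containers left empty by the removals, bottom-up
                if len(residue) == 0:
                    chain.detach_child(residue.id)
            if len(chain) == 0:
                model.detach_child(chain.id)
        if len(model) == 0:
            structure.detach_child(model.id)
    return structure
```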

Example usage

```python
pdb_fn = "1ABC.pdb"
missing_coord_placeholder = 99.999  # Common placeholder value
structure = remove_missing_atoms(pdb_fn, missing_coord_placeholder)
```

Tips and Tricks

Always iterate over children with for {ChildInstance} in list({ParentInstance}): whenever the loop body may detach them.
If you iterate with for {ChildInstance} in {ParentInstance}: while calling {ParentInstance}.detach_child({ChildInstance}.id), the parent's underlying child list shrinks while the loop is still walking it. This typically makes the loop silently skip some children (which then remain in the structure) rather than raise an error, so the bug is easy to miss. list({ParentInstance}) takes a snapshot copy first, so removing children from the parent does not affect the loop.
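The pitfall can be reproduced with a plain Python list. This minimal sketch assumes, as in current Biopython releases, that iterating an entity walks its underlying child list; the two `drop_negatives_*` functions are hypothetical stand-ins for a parent entity and the detach loop.

```python
def drop_negatives_in_place(values: list[int]) -> list[int]:
    """Remove negatives while iterating the list itself (buggy pattern)."""
    for v in values:
        if v < 0:
            values.remove(v)  # shrinks the list under the running iterator
    return values


def drop_negatives_snapshot(values: list[int]) -> list[int]:
    """Remove negatives while iterating a copy (the list(...) pattern)."""
    for v in list(values):
        if v < 0:
            values.remove(v)  # safe: the loop walks the snapshot, not `values`
    return values


print(drop_negatives_in_place([-1, -2, 3, -4, -5]))  # [-2, 3, -5]
print(drop_negatives_snapshot([-1, -2, 3, -4, -5]))  # [3]
```

No exception is raised in the first case: every element that slides into the slot just vacated is skipped, which is exactly how stray atoms or residues can survive a removal loop.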
The function removes atoms with the specified placeholder value and then cleans up any empty residues, chains, or models that might result from removing those atoms.
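Common placeholder values in PDB files include 99.999, 0.0, or NaN, depending on the source of the file. NaN never compares equal to anything, including itself, so a check such as atom.coord == missing_coord_placeholder can never match it; test for it with np.isnan instead.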
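Matching a placeholder robustly needs two precautions: NaN must be tested with `np.isnan`, and because Biopython stores atom coordinates as float32 arrays, an exact `==` against a float can miss a match depending on the placeholder's type and NumPy's promotion rules. The helper `is_placeholder_coord` below is a hypothetical sketch, not part of Biopython; its tolerance `atol=1e-4` is an assumption chosen to sit well below the 0.001 Å precision of PDB coordinate columns.

```python
import numpy as np


def is_placeholder_coord(coord, placeholder: float, atol: float = 1e-4) -> bool:
    """Return True if any component of `coord` matches the placeholder.

    Hypothetical helper: handles NaN placeholders and float32 rounding.
    """
    coord = np.asarray(coord, dtype=np.float64)
    if np.isnan(placeholder):
        # NaN != NaN, so equality can never detect it; test explicitly
        return bool(np.any(np.isnan(coord)))
    # rtol=0 keeps the tolerance from growing with the magnitude of the value
    return bool(np.any(np.isclose(coord, placeholder, rtol=0.0, atol=atol)))


# Atom coordinates as float32 arrays, mimicking what Biopython stores
missing = np.array([99.999, 99.999, 99.999], dtype=np.float32)
real = np.array([12.345, 99.998, -3.210], dtype=np.float32)
nan_coord = np.array([np.nan, np.nan, np.nan], dtype=np.float32)

print(is_placeholder_coord(missing, 99.999))            # True
print(is_placeholder_coord(real, 99.999))               # False
print(np.any(nan_coord == np.nan))                      # False: equality never matches NaN
print(is_placeholder_coord(nan_coord, float("nan")))    # True
```

The same check could stand in for the condition inside the atom loop of `remove_missing_atoms`. Note that with 0.0 as the placeholder, any genuine coordinate of exactly 0.000 would also be treated as missing, which is a limitation of placeholder-based detection in general.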