What it does:
Removes atoms whose coordinates contain a missing-value placeholder from a structure parsed from a PDB file
When to use:
When missing atoms are present in the file but marked with a specific placeholder coordinate value
Python code
from Bio.PDB import PDBParser
import numpy as np


def remove_missing_atoms(pdb_fn: str, missing_coord_placeholder: float):
    """
    Removes atoms with missing coordinates from a PDB structure.

    Parameters:
    -----------
    pdb_fn : str
        Path to the PDB file
    missing_coord_placeholder : float
        Value used as a placeholder for missing coordinates

    Returns:
    --------
    structure : Bio.PDB.Structure.Structure
        Structure object with missing atoms removed
    """
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure('X', pdb_fn)
    # Iterate over list(...) snapshots because children are detached inside the loops.
    for model in list(structure):
        for chain in list(model):
            for residue in list(chain):
                for atom in list(residue):
                    # An atom counts as missing if any coordinate equals the placeholder.
                    if np.any(atom.coord == missing_coord_placeholder):
                        residue.detach_child(atom.id)
                # Clean up containers left empty by the removals.
                if len(residue) == 0:
                    chain.detach_child(residue.id)
            if len(chain) == 0:
                model.detach_child(chain.id)
        if len(model) == 0:
            structure.detach_child(model.id)
    return structure
Example usage
pdb_fn = "1ABC.pdb"
missing_coord_placeholder = 99.999 # Common placeholder value
structure = remove_missing_atoms(pdb_fn, missing_coord_placeholder)
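The function compares coordinates with an exact ==, which can never match a NaN placeholder and may miss values that differ only by floating-point rounding. Below is a minimal sketch of a more tolerant check, assuming NumPy is available. The helper name has_missing_coord and the atol default are illustrative choices, not part of the original function.

```python
import numpy as np


def has_missing_coord(coord, placeholder, atol=1e-4):
    """Return True if any coordinate matches the placeholder.

    Hypothetical helper: atol=1e-4 is an assumed tolerance, chosen to be
    smaller than the 0.001 resolution of PDB coordinate fields.
    """
    coord = np.asarray(coord, dtype=float)
    # equal_nan=True makes a NaN placeholder match NaN coordinates,
    # which a plain == comparison never does.
    matches = np.isclose(coord, placeholder, rtol=0.0, atol=atol, equal_nan=True)
    return bool(np.any(matches))


# Coordinates stored as float32, as they would be after parsing.
stored = np.array([12.345, 99.999, -3.210], dtype=np.float32)
print(has_missing_coord(stored, 99.999))
print(has_missing_coord([np.nan, 1.0, 2.0], float("nan")))
print(has_missing_coord([1.0, 2.0, 3.0], 99.999))
```

If this behaviour is wanted, the np.any(atom.coord == missing_coord_placeholder) test could be replaced with has_missing_coord(atom.coord, missing_coord_placeholder). The stricter exact comparison remains the safer choice when the placeholder is known to be stored exactly.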
Tips and Tricks
• Always iterate over children with for {ChildInstance} in list({ParentInstance}): when the loop body detaches children.
If you iterate with for {ChildInstance} in {ParentInstance}: while calling {ParentInstance}.detach_child({ChildInstance}.id), the collection changes size while it is being traversed, so the iterator can skip children or otherwise misbehave. list({ParentInstance}) takes a snapshot first, so the loop visits every original child no matter what is detached.
• The function removes atoms with the specified placeholder value and then cleans up any empty residues, chains, or models that might result from removing those atoms.
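• Common placeholder values in PDB files include 99.999, 0.0, or NaN, depending on the source of the file. Because the check matches any single coordinate, a placeholder of 0.0 can also remove real atoms that have one coordinate of exactly 0.0. NaN never compares equal to anything, so a NaN placeholder needs np.isnan rather than ==.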
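The first tip can be illustrated without Biopython. The sketch below uses a hypothetical stand-in class, Parent, which assumes the container iterates over an internal list of children. It is not Biopython's own class, only a model of the same failure mode.

```python
class Parent:
    """Hypothetical stand-in for a container that iterates over a child list."""

    def __init__(self, child_ids):
        self.child_list = list(child_ids)

    def __iter__(self):
        return iter(self.child_list)

    def detach_child(self, child_id):
        self.child_list.remove(child_id)


def detach_all_unsafe(parent):
    # Mutates the very list the for-loop is traversing.
    for child in parent:
        parent.detach_child(child)
    return list(parent)


def detach_all_safe(parent):
    # list(parent) is a snapshot, so every original child is visited.
    for child in list(parent):
        parent.detach_child(child)
    return list(parent)


leftover_unsafe = detach_all_unsafe(Parent(["A", "B", "C", "D"]))
leftover_safe = detach_all_safe(Parent(["A", "B", "C", "D"]))
print(leftover_unsafe)  # ['B', 'D'] : every second child was skipped
print(leftover_safe)    # []
```

In this model the unsafe loop raises no error. It silently leaves children behind, which is harder to notice than a crash.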