from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import AllChem
= True
IPythonConsole.drawOptions.addStereoAnnotation =True IPythonConsole.ipython_useSVG
Assigning chiral information with SMARTS templates
How to transform an achiral molecule into a chiral based on a chiral template in RDKit
Introduction
Molecules can be described with undefined stereochemistry, for example by not specifiying whether a certain stereocenter is R or S. Here, we will investigate how we can fix the stereochemistry of such molecules based on a SMARTS template that matches (parts of) the molecule. This could be a useful trick for example with SMILES strings generated with STONED SELFIES that do not (yet) have chiral information.
Code
We import the usual RDKit packages and turn on stereo annotion of the images.
First we will generate the molecule without chiral information
= "C(Cl)(Br)CC"
smiles = Chem.MolFromSmiles(smiles)
mol display(mol)
Our goal is to convert this molecule into the reference below, which has a R configuration of the stereocenter.
= "Br[C@@H](Cl)CC"
smiles_ref = Chem.MolFromSmiles(smiles_ref)
mol_ref display(mol_ref)
We construct a SMARTS pattern with chiral information that will match the stereocenter. RDKit’s visualization of SMARTS patterns is quite limited, and we could instead use SMARTSView to get a better view.
= "[C][C@@H]([Br])[Cl]"
smarts = Chem.MolFromSmarts(smarts)
pattern display(pattern)
We use the GetSubstructMatch
method to obtain the atom indices of mol
that match the pattern. They will be listed in the order of the atoms in the SMARTS pattern. The display of the Mol object is automatically updated to show the matched atoms.
= mol.GetSubstructMatch(pattern)
match print("Matched indices:", match)
display(mol)
Matched indices: (3, 0, 2, 1)
The key feature for determining chiral tags in RDKit is the order of the bonds in the Mol object. We will therefore use the following strategy:
- Reorder the bonds in the Mol object so that they match the order in the SMARTS pattern
- Set the chiral tags of each matched atom so that they match the SMARTS pattern
First, we find the indices of the bonds in mol
that match those in pattern
:
= []
indices_matched for bond in pattern.GetBonds():
= bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
i, j = mol.GetBondBetweenAtoms(match[i], match[j])
bond
indices_matched.append(bond.GetIdx())print("Matched bond indices:", indices_matched)
Matched bond indices: [2, 1, 0]
Then we reorder the bond indices from mol
so that the matched bonds (a) come before the unmatched ones and (b) have the same order as in pattern
= list(range(mol.GetNumBonds()))
indices_all = [i for i in indices_all if i not in indices_matched]
indices_rest = indices_matched + indices_rest
indices_new print("Rest of bond indices:", indices_rest)
print("New bond index order:", indices_new)
Rest of bond indices: [3]
New bond index order: [2, 1, 0, 3]
To actually reorder the bonds, we need to create an editable RWMol
object. We (a) remove all the bonds and then (b) add them back together in the new order.
= Chem.RWMol(mol)
rw_mol
= []
bond_info for bond in list(rw_mol.GetBonds()):
bond_info.append((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondType()))
rw_mol.RemoveBond(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx())
for i in indices_new:
*bond_info[i])
rw_mol.AddBond( display(rw_mol)
Finally, we set the chiral tags of each matched atom to that of the SMARTS pattern and recover a new Mol
object that matches the stereochemistry that we want.
for i, atom in enumerate(pattern.GetAtoms()):
= atom.GetChiralTag()
chiral_tag
rw_mol.GetAtomWithIdx(match[i]).SetChiralTag(chiral_tag)
= rw_mol.GetMol()
mol_new
Chem.SanitizeMol(mol_new)
Chem.AssignCIPLabels(mol_new)
=["Original", "Templated", "Reference"], molsPerRow=3, useSVG=True) Chem.Draw.MolsToGridImage([mol, mol_new, mol_ref], legends
Function
Finally we can put everything together in one function:
def apply_chiral_template(mol, pattern):
# Apply SMARTS pattern to Mol
= mol.GetSubstructMatch(pattern)
match
# Find indices of matched bonds in Mol
= []
indices_matched for bond in pattern.GetBonds():
= bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
i, j = mol.GetBondBetweenAtoms(match[i], match[j])
bond
indices_matched.append(bond.GetIdx())
# Reorder bond indices to match SMARTS pattern
= list(range(mol.GetNumBonds()))
indices_all = [i for i in indices_all if i not in indices_matched]
indices_rest = indices_matched + indices_rest
indices_new
# Create new Mol. Delete and re-add bonds in new order
= Chem.RWMol(mol)
rw_mol
= []
bond_info for bond in list(rw_mol.GetBonds()):
bond_info.append((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondType()))
rw_mol.RemoveBond(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx())
for i in indices_new:
*bond_info[i])
rw_mol.AddBond(
# Set chiral tags from template to new Mol
for i, atom in enumerate(pattern.GetAtoms()):
= atom.GetChiralTag()
chiral_tag
rw_mol.GetAtomWithIdx(match[i]).SetChiralTag(chiral_tag)
# Recover new Mol
= rw_mol.GetMol()
new_mol
Chem.SanitizeMol(new_mol)
return new_mol
More complex example
Now we will use the function on a more complex example with four stereocenters. Note that R and S labels depend on the CIP order of the substituents, which is not defined in terms of the wildcard atom “*”
= "CC1C(C)C2CCC1C2"
smiles = Chem.MolFromSmiles(smiles)
mol
= "C[C@@H]1[C@H](C)[C@H]2CC[C@@H]1C2"
smiles_ref = Chem.MolFromSmiles(smiles_ref)
mol_ref
= "[*]-[C@@H]1-[C@H](-[*])-[C@H]2-C-C-[C@@H]1-C2"
smarts = Chem.MolFromSmarts(smarts)
pattern # Needed to avoid bug in MolsToGridImage
Chem.SanitizeMol(pattern)
=["Undefined", "Defined", "SMARTS"], useSVG=True) Chem.Draw.MolsToGridImage([mol, mol_ref, pattern], legends
To get the right CIP labels, we need to apply the function AssignCIPLabels
but that should not be crucial except for visualization.
= apply_chiral_template(mol, pattern)
mol_new
Chem.AssignCIPLabels(mol_new)
=["Original", "Templated", "Reference"], molsPerRow=3, useSVG=True) Chem.Draw.MolsToGridImage([mol, mol_new, mol_ref], legends