MolSetInspector

MolSetInspector (Molecular Sets Inspector) is a Python package for processing multiple molecular sets stored in various text file formats. As input, it takes a directory containing sets of molecules stored in sdf, csv, smi or txt files. The sets are read and merged into a single library of distinct molecules. During processing, the molecules are canonicalized; they can also be standardised (neutralised, unsalted etc.) and tautomers can be removed. As output, MolSetInspector reports the intersections of the individual molecular sets, the IDs of defective molecules (those that could not be parsed) and the list of distinct molecules together with a hit table, which shows the set or sets in which each molecule was found. MolSetInspector can also filter the distinct molecules by diversity using one of two approaches: by setting 1) the maximum total number of diverse molecules or 2) the maximum similarity threshold for any pair of molecules in the set.
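
To make the idea of a hit table and set intersections concrete, here is a minimal sketch in plain Python. It is not MolSetInspector's own API: the function names `build_hit_table` and `pairwise_intersections`, the set names and the example molecules are hypothetical, and the sketch assumes each molecule is already represented by a canonical SMILES string, so that identical molecules compare equal as text.

```python
from itertools import combinations


def build_hit_table(sets_by_name):
    """Map each distinct molecule to the sorted names of the sets containing it.

    Hypothetical helper, not part of MolSetInspector. Assumes molecules are
    given as already-canonical SMILES strings.
    """
    hits = {}
    for name, molecules in sets_by_name.items():
        # set() removes duplicates within a single input set
        for mol in set(molecules):
            hits.setdefault(mol, set()).add(name)
    return {mol: sorted(names) for mol, names in hits.items()}


def pairwise_intersections(sets_by_name):
    """Return the molecules shared by every pair of input sets."""
    result = {}
    for a, b in combinations(sorted(sets_by_name), 2):
        result[(a, b)] = set(sets_by_name[a]) & set(sets_by_name[b])
    return result


# Illustrative input: three small sets of canonical SMILES strings
example_sets = {
    "set_a": ["CCO", "c1ccccc1", "CC(=O)O"],
    "set_b": ["CCO", "CC(=O)O", "CCN"],
    "set_c": ["c1ccccc1", "CCO"],
}

hit_table = build_hit_table(example_sets)
intersections = pairwise_intersections(example_sets)

for mol, names in sorted(hit_table.items()):
    print(mol, names)
```

The keys of `hit_table` form the library of distinct molecules, and each value lists the sets in which that molecule occurs. In a real workflow the canonical strings would come from a cheminformatics toolkit rather than being typed by hand.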