
Chemistry is the natural science that probes the composition, structure, properties, and reactivity of atoms, molecules, and materials.
“One of the simplest questions that can be asked about molecular diversity is how many organic molecules are possible in total?” – Jean-Louis Reymond, The Chemical Space Project.
In practice, the answer is far from trivial. An upper bound for “drug-like” molecules has been proposed in the region of one hundred quinsexagintillion, or 10^200; for reference, the number of atoms in the known universe is estimated to be of the order of 10^80. By contrast, existing computer databases currently contain at most about 100 billion (10^11) distinct chemical structures. To use a space analogy, we have explored a region barely larger than the Earth.
To call chemical space large is an understatement. Yet it is within this space that every known medicine has been discovered, and it is here that the cure for cancer and the optimal light-absorbing material for solar cells lie.
The size and complexity of this space make computational tools for exploring it not merely attractive but necessary. Even if every atom in the known universe were used, only a vanishingly small fraction of all possible compounds could ever be made.
Focusing on “drug-like” molecules and the typical representation of the drug discovery process, current experimental methods face two key problems: cost and time. Developing a new drug typically takes 10-15 years and costs roughly 1-2 billion US dollars for each drug approved for clinical use, even for the most important therapies. The mRNA vaccines developed during the COVID-19 pandemic are a notable exception. Most drug candidates originate from “high-throughput screening”, which is often the first step in the drug discovery pipeline. This step alone consumes an immense amount of time and resources, because thousands of compounds are tested and each one must first be synthesized or isolated. The first four steps of the pipeline carry a high risk of failure and can leave the process with no viable candidate. Moreover, around 90% of drug candidates that enter clinical trials fail during the clinical phases.
Now, what if we lived in a world where we could accurately and efficiently design bespoke drugs and therapies? In that world, a molecule could be designed and optimized computationally, and its permeability, selectivity, solubility, and most bioavailable and stable form could be determined before anyone set foot in a laboratory. Would such an endeavor be interesting, or useful? I believe so.