Superlabs

Bayesian Optimization for the discovery of new biomaterials.

This project is part of my work at Superlabs, based at the Greek national research centre NCSR Demokritos.

Motivation

The aim of this project is to accelerate the discovery of new biomaterial formulations. Instead of manually testing every single formulation from a vast design space of combinations, an autonomous platform can conduct the entire experimental cycle. However, even when the burden of testing is delegated to a machine, the time needed to arrive at a new formulation remains unchanged if candidates are chosen naively. Data-driven optimization, and specifically Bayesian Optimization, can greatly reduce the time and resources needed to discover new formulations.


Surrogate Model

We need to optimize certain properties of the final formulation. The relationship between the input concentrations of raw materials and those properties is typically a black-box, derivative-free function. Moreover, evaluating this function is expensive, so we build a surrogate model that successfully approximates the unknown function and is cheap to evaluate.

Fig.1: Surrogate Model.

Gaussian Process Regression

To create our surrogate model, Gaussian Process (GP) Regression is used. A GP is a non-parametric Bayesian approach to regression. It allows us to incorporate prior knowledge into the model and provides uncertainty estimates. The more insight we gain by sampling the black-box function, the more precise the surrogate model becomes.

Fig.2: Gaussian Process Regression.
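As a minimal sketch of the idea (the black-box function, sample points, and kernel choice below are illustrative assumptions, not the project's actual chemistry), fitting a GP surrogate with scikit-learn might look like:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Hypothetical black-box objective; in the real platform this would be
# an expensive wet-lab experiment, not a closed-form function.
def black_box(x):
    return np.sin(3 * x) + 0.5 * x

# A handful of expensive observations.
X_train = np.array([[0.2], [0.9], [1.7], [2.5]])
y_train = black_box(X_train).ravel()

# The RBF kernel encodes the prior assumption of a smooth underlying function.
kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-6, normalize_y=True)
gp.fit(X_train, y_train)

# The surrogate is cheap to evaluate and reports its own uncertainty.
X_query = np.linspace(0.0, 3.0, 100).reshape(-1, 1)
mean, std = gp.predict(X_query, return_std=True)
```

The standard deviation returned alongside the mean is exactly the uncertainty indicator that Bayesian Optimization relies on.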

Bayesian Optimization

Bayesian Optimization exploits the uncertainty bounds produced by GP Regression to decide whether to exploit or explore the function. Various acquisition functions have been developed to resolve this dilemma. If our goal is solely to learn the function, a pure-exploration acquisition function is optimized to propose the next point to observe. Figure 3 shows an Upper Confidence Bound (UCB) acquisition function in action.

Fig.3: Pure Exploration Animation. (refresh to replay)
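A UCB step could be sketched as follows (the objective, kappa value, and fixed kernel are illustrative assumptions; a large kappa approaches the pure-exploration behaviour of the animation):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical black-box objective used only for illustration.
def black_box(x):
    return np.sin(3 * x) + 0.5 * x

X_train = np.array([[0.2], [1.5], [2.8]])
y_train = black_box(X_train).ravel()

# A fixed kernel (optimizer=None) keeps the example deterministic.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                              optimizer=None, alpha=1e-6, normalize_y=True)
gp.fit(X_train, y_train)

# UCB trades off the predicted mean (exploitation) against the predictive
# standard deviation (exploration) via the kappa weight.
def ucb(X, gp, kappa=2.0):
    mean, std = gp.predict(X, return_std=True)
    return mean + kappa * std

# Propose the next point to observe by maximizing the acquisition on a grid.
X_grid = np.linspace(0.0, 3.0, 200).reshape(-1, 1)
x_next = X_grid[np.argmax(ucb(X_grid, gp))]
```

Maximizing over a dense grid is a simplification that works in one dimension; higher-dimensional design spaces typically optimize the acquisition with a gradient-based or multi-start routine.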

Other acquisition functions can lead to the optimum of the unknown function in a limited number of expensive observations. Expected Improvement is an acquisition function whose maximum indicates, at each cycle, the observation predicted to contribute the most. In Figure 4 one can observe that the algorithm needs only a small number of observations to find the minimum of the unknown function. Furthermore, there is no need to reduce uncertainty globally, as it is highly unlikely that those high-uncertainty areas hide an optimum.

Fig.4: Expected Improvement Animation. (refresh to replay)
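A minimal Bayesian Optimization loop with Expected Improvement (written for minimization, mirroring Figure 4) might be sketched like this; the objective, 1-D domain, evaluation budget, and xi value are all illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical black-box objective; its minimum is unknown to the optimizer.
def black_box(x):
    return np.sin(3 * x) + 0.5 * x

def expected_improvement(X, gp, y_best, xi=0.01):
    """EI for minimization: expected amount by which X improves on y_best."""
    mean, std = gp.predict(X, return_std=True)
    std = np.maximum(std, 1e-12)      # guard against division by zero
    imp = y_best - mean - xi
    z = imp / std
    return imp * norm.cdf(z) + std * norm.pdf(z)

# Two initial expensive observations on the 1-D domain [0, 3].
X_obs = np.array([[0.3], [2.7]])
y_obs = black_box(X_obs).ravel()
X_grid = np.linspace(0.0, 3.0, 300).reshape(-1, 1)

for _ in range(8):                    # a small budget of expensive evaluations
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                                  optimizer=None, alpha=1e-6, normalize_y=True)
    gp.fit(X_obs, y_obs)
    x_next = X_grid[[np.argmax(expected_improvement(X_grid, gp, y_obs.min()))]]
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, black_box(x_next).ravel())

x_best = X_obs[np.argmin(y_obs)]
```

In the real platform each `black_box` call would be an automated experiment, so shrinking the loop's budget directly cuts time and reagent cost.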

Packages Used