ACTIVE DESIGN OF EXPERIMENTS
Many independently adjustable formulation and processing variables often affect the performance characteristics of a product in a very complex and nonlinear manner. These dependencies sometimes also feature interactions between different independent variables. Data mining techniques are rapidly becoming powerful tools for optimizing such systems. This article introduces one such technique.
Design-of-experiments (DOE) strategies are commonly used in many industries to identify the best solutions to such problems in a systematic and mathematically rigorous manner. When the number of independent variables is very large, however, a complete conventional DOE is often cumbersome, and sometimes unmanageable, because of the experimental effort required. For example, in following a conventional DOE approach, one could run a two-level full-factorial DOE (2^N experiments, where N is the number of independent variables), try to determine which variables are important under a linear model assumption, and then try to optimize the variables that were tagged as significant. The problem is that, after the first round of experiments (1024 experiments if N = 10), one would still know very little about the system response behavior, and one would furthermore be relying on the hope that the simplifying assumptions were not oversimplifications. This is the main reason why companies often give up on performing a conventional DOE and continue to try to optimize very complex systems by trial-and-error experimentation.
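To make the growth in experimental effort concrete, the short Python sketch below enumerates a two-level full-factorial design and prints the run count for a few values of N. The function name full_factorial and the coded levels -1 and +1 are illustrative choices, not taken from any particular DOE package.

```python
from itertools import product

def full_factorial(n_vars, levels=(-1, 1)):
    """Return every combination of the given levels for n_vars variables.

    The default levels (-1, +1) are the usual coded "low" and "high"
    settings of a two-level design; this is an illustrative assumption.
    """
    return list(product(levels, repeat=n_vars))

# A three-variable design is still easy to list by hand: 2^3 = 8 runs.
design = full_factorial(3)
print(len(design))  # 8

# The run count doubles with every added variable.
for n in (5, 10, 12):
    print(n, "variables ->", len(full_factorial(n)), "runs")
```

With N = 10 the list already holds 1024 runs, and each of those runs is a physical experiment rather than a line of code.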
The use of an active design-of-experiments (ADOE) rather than a conventional DOE is a revolutionary approach for surmounting these challenges. In a conventional DOE effort, all experiments that will be performed are determined in advance. In an ADOE approach, by contrast, powerful data mining tools (symbolic regression techniques) are used to learn from the data continuously, potentially resulting in tremendous gains in efficiency and efficacy. The typical ADOE strategy is to: (a) collect an initial (small) dataset, (b) build a model (which will feature a trustability measure) by applying symbolic regression techniques to this initial dataset, (c) collect more data to confirm or refute the estimated optimal response and thus to reduce the uncertainty of the model (effectively, seeking to maximize the information content of each collected data sample), and (d) rebuild the model on the enlarged dataset, repeating steps (b) and (c) until a “good enough” result has been achieved. ADOE is therefore essentially an adaptive data collection approach.
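The loop in steps (a) through (d) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by any specific ADOE tool: run_experiment is a hypothetical stand-in for a real measurement, a bootstrap ensemble of inverse-distance-weighted predictors is swapped in for symbolic regression, the spread of the ensemble predictions serves as the trustability measure, and a fixed number of rounds stands in for the "good enough" stopping test.

```python
import random
import statistics

def run_experiment(x):
    """Hypothetical stand-in for a lab measurement: nonlinear, with an interaction."""
    a, b = x
    return -(a - 0.3) ** 2 - (b - 0.7) ** 2 + 0.5 * a * b

def fit_ensemble(data, n_models, rng):
    """Build a bootstrap ensemble: each member is a resampled copy of the data.

    Assumption: this replaces symbolic regression purely for illustration.
    """
    return [[rng.choice(data) for _ in data] for _ in range(n_models)]

def predict_one(sample, x):
    """Inverse-distance-weighted prediction from one bootstrap sample."""
    num = den = 0.0
    for xi, yi in sample:
        d2 = sum((p - q) ** 2 for p, q in zip(xi, x))
        if d2 == 0.0:
            return yi  # x coincides with a measured point
        num += yi / d2
        den += 1.0 / d2
    return num / den

def predict(ensemble, x):
    """Return (mean prediction, ensemble spread); the spread is the trust measure."""
    preds = [predict_one(sample, x) for sample in ensemble]
    return statistics.mean(preds), statistics.pstdev(preds)

def adoe(n_init=6, n_rounds=10, n_candidates=200, n_models=20, seed=0):
    rng = random.Random(seed)

    def rand_point():
        return (rng.random(), rng.random())

    # (a) collect a small initial dataset
    data = [(x, run_experiment(x)) for x in (rand_point() for _ in range(n_init))]
    for _ in range(n_rounds):  # (d) repeat; fixed budget stands in for "good enough"
        # (b) build models that carry a trust measure
        ensemble = fit_ensemble(data, n_models, rng)
        candidates = [rand_point() for _ in range(n_candidates)]
        scored = [(x, *predict(ensemble, x)) for x in candidates]
        # (c) test the predicted optimum, and probe where the models disagree most
        best = max(scored, key=lambda s: s[1])[0]
        unsure = max(scored, key=lambda s: s[2])[0]
        for x in (best, unsure):
            data.append((x, run_experiment(x)))
    return data

data = adoe()
best_x, best_y = max(data, key=lambda d: d[1])
print(len(data), "experiments; best response", round(best_y, 3), "at", best_x)
```

Each round spends only two experiments: one to check the current estimate of the optimum and one at the point where the ensemble is least trustworthy. That choice is one simple way to balance confirming the optimum against reducing model uncertainty; other selection rules are equally plausible.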
The implementation of an ADOE approach can also be viewed as an "artificial intelligence" endeavor involving the development of an "expert system" which "learns" as work proceeds rather than following a predetermined script.
It is important to emphasize that ADOE opens up new possibilities which would not be viable with conventional DOE approaches. The performance gain relative to conventional DOE depends on the number of input variables as well as the dimensionality (number of variables) of the response space. Experience with previous applications of ADOE suggests that one typically sees at least an order-of-magnitude efficiency gain relative to a conventional DOE in a design involving 10 to 12 variables.
Data Modeler by Evolved Analytics is a powerful tool for ADOE design. Applications of ADOE thus far have been held closely as trade secrets by the companies that have used it, so no case study has yet been published in the open literature.