Racing algorithms

Given an optimization problem, the solution provided by a stochastic optimizer will not be deterministic, as the name implies. Moreover, the intrinsic properties of each individual optimizer make it perform better on some classes of problems and worse on others. We are not too concerned with the No Free Lunch theorem here; there is another interesting question: given a problem (or a list of problems), which of the optimization algorithms at our disposal is the most appropriate one, i.e. the one that will yield the best optimization performance? Is there a way to find the answer without resorting to a pure brute-force approach?

The approach of racing algorithms is based on statistical testing. Each algorithm is evaluated iteratively throughout the race. The performance of an algorithm is defined as its ability to evolve a population, measured by the fitness of the evolved champion. Whenever statistically significant evidence of the inferiority of an algorithm is found, that algorithm is dropped out of the race. As a result, the algorithms that eventually survive are statistically significantly better than those that were discarded. The underlying statistical testing machinery is the Friedman test.
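
Conceptually, such a race can be sketched as follows. This sketch is only an illustration of the idea, not PyGMO's internal implementation: it assumes a user-supplied sample_performance callable that returns one noisy fitness value (lower is better) per call, and the race function and its n_final, delta and max_count parameters are purely illustrative. It uses SciPy's Friedman test to detect that the surviving candidates differ, followed by a simplified post-hoc rule (dropping the candidate with the worst mean rank) in place of race_algo's actual bookkeeping.

import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def race(candidates, sample_performance, n_final=3, delta=0.05, max_count=500):
    # One growing list of fitness samples (lower is better) per candidate.
    samples = [[] for _ in candidates]
    active = list(range(len(candidates)))      # indices of surviving candidates
    n_evaluations = 0
    while len(active) > n_final and n_evaluations < max_count:
        # Collect one more performance sample for every surviving candidate.
        for i in active:
            samples[i].append(sample_performance(candidates[i]))
            n_evaluations += 1
        n_trials = len(samples[active[0]])
        if n_trials < 3 or len(active) < 3:
            continue   # the Friedman test needs >= 3 candidates and a few trials
        data = [[samples[i][t] for i in active] for t in range(n_trials)]
        # Friedman test: is there significant evidence that the candidates differ?
        _, p_value = friedmanchisquare(*np.transpose(data))
        if p_value < delta:
            # Simplified post-hoc rule: drop the candidate with the worst mean rank.
            mean_ranks = np.mean([rankdata(row) for row in data], axis=0)
            active.pop(int(np.argmax(mean_ranks)))
    return [candidates[i] for i in active], n_evaluations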

Using race_algo to race algorithms

Let us first set up a problem and the algorithms of interest.

from PyGMO import *
prob = problem.ackley()
algo_list = []
algo_list.append(algorithm.de(gen=100))
algo_list.append(algorithm.cmaes(gen=100))
algo_list.append(algorithm.pso(gen=100))

race_algo contains the mechanisms that can be used to race different algorithms. Its interface is very similar to that of race_pop. The following code sets up the necessary racing object and starts the race. The population size initialization argument (pop_size) specifies how large the internal populations evolved during the evaluation of each algorithm should be.

racer = util.race_algo(algo_list, prob, pop_size=100, seed=0)
winners, n_evaluation = racer.run(n_final=1)

There are a couple of arguments that can be specified for run, such as max_count, which sets the allowed computation budget. As above, the indices of the winning algorithms are stored in winners, while n_evaluation reflects how many calls to evolve() have been made.
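
For instance, the evaluation budget can be capped explicitly. The snippet below assumes that max_count, like n_final above, is accepted as a keyword argument; if not, it can also be supplied positionally, as in the larger example later in this tutorial.

# Assumption: max_count is accepted by keyword and caps the number of evolve() calls.
winners, n_evaluation = racer.run(n_final=1, max_count=500)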

In addition, it is possible to cover a range of problems simultaneously.

probs = [problem.ackley(), problem.griewank()]
racer = util.race_algo(algo_list, probs, pop_size=100, seed=0)
winners, n_evaluation = racer.run(n_final=1)

Internally, when race_algo assigns a performance measure to an algorithm, it randomly samples one of the problems and evolves a population with respect to that problem.
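
The following lines only illustrate this idea (they are not PyGMO's internal code): one of the problems is drawn at random, and a population on that problem is evolved by the algorithm under evaluation.

import random

# Illustration: obtain one performance sample for the first algorithm in the list.
prob_for_trial = random.choice(probs)
pop = population(prob_for_trial, 100)
performance_sample = algo_list[0].evolve(pop).champion.f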

Example: Identifying good configurations of an algorithm

This example shows how race_algo can be used to select good configurations of an algorithm. Taking pso_gen as an example, there are quite a few parameters to be configured. We will focus on three of them, as shown in the following code.

from PyGMO import *
import itertools

prob = problem.ackley()

pop_size = 20

variants = [1,2,3,4,5,6]
neighb_types = [1,2,3,4]
neighb_param = [1,2,3,4,5]
# Enumerate all 6 * 4 * 5 = 120 parameter combinations.
pars = list(itertools.product(variants, neighb_types, neighb_param))
algo_list = []
for p in pars:
    args = {}
    args['variant'], args['neighb_type'], args['neighb_param'] = p
    algo_list.append(algorithm.pso_gen(gen=100, **args))

racer = util.race_algo(algo_list, prob, pop_size=pop_size, seed=0)
# The first argument is n_final (race down to 5 survivors); the third,
# len(algo_list)*30, is max_count, the overall evaluation budget.
winners, n_evaluated = racer.run(5, 0, len(algo_list)*30, 0.05, [], True, True)

print 'Winning algorithms:'
for i in winners:
    print zip(['variant', 'neighb_type', 'neighb_param'], pars[i])

print 'Evaluated algorithms %d times' % n_evaluated

# Compare the default configuration with the selected one, starting both
# from the same initial population.
algo_default = algorithm.pso_gen(gen=100)
algo_selected = algo_list[winners[0]]
pop = population(prob, pop_size)
pop_default = algo_default.evolve(pop)
pop_selected = algo_selected.evolve(pop)
print 'Default algorithm:', pop_default.champion.f
print 'Selected algorithm:', pop_selected.champion.f

The output should be similar to the following (the exact numbers will vary between runs due to the random population initialization):

Winning algorithms:
[('variant', 5), ('neighb_type', 1), ('neighb_param', 1)]
[('variant', 5), ('neighb_type', 1), ('neighb_param', 2)]
[('variant', 5), ('neighb_type', 1), ('neighb_param', 3)]
[('variant', 5), ('neighb_type', 1), ('neighb_param', 4)]
[('variant', 5), ('neighb_type', 1), ('neighb_param', 5)]
Evaluated algorithms 354 times
Default algorithm: (0.8463040459318276,)
Selected algorithm: (0.05200250235673254,)

The combinations of parameters initially result in a total of 120 configurations to choose from. Using race_algo, statistically significant results can be obtained after 354 evaluations. Without the racing mechanism, the same budget would translate to only about 3 trials per configuration, from which a statistically significant conclusion would be difficult to establish. This issue, as well as the appropriate distribution of the computational budget, is handled automatically by race_algo.
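
As a quick back-of-the-envelope check, reusing the variables defined in the example above:

n_configs = len(variants) * len(neighb_types) * len(neighb_param)    # 6 * 4 * 5 = 120
print 'Configurations:', n_configs
print 'Trials per configuration without racing: %.1f' % (354.0 / n_configs)    # roughly 3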

Why not choose a single best winner?

However, race_algo is not magic. Consider changing n_final in the above example from 5 to 4 (or fewer).
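
Concretely, this means re-issuing the run call from above with its first argument changed:

winners, n_evaluated = racer.run(4, 0, len(algo_list)*30, 0.05, [], True, True)

The output becomes: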

Winning algorithms:
[('variant', 5), ('neighb_type', 1), ('neighb_param', 1)]
[('variant', 5), ('neighb_type', 1), ('neighb_param', 2)]
[('variant', 5), ('neighb_type', 1), ('neighb_param', 3)]
[('variant', 5), ('neighb_type', 1), ('neighb_param', 4)]
Evaluated algorithms 3599 times
Default algorithm: (2.187472414230023,)
Selected algorithm: (0.04567759094164403,)

The consumed evaluation count is dramatically larger! The reason is that the last five surviving algorithms are so close to each other that it is very difficult to distinguish their performance in a statistically significant way. In fact, the race is forcefully terminated because the provided budget is exhausted.

Note

The racing mechanism is most useful when the diversity of the entities in the racing pool is large, as the worse ones can be weeded out quickly with high statistical confidence, saving the evaluation budget for the rest. When the entities are too similar to each other, the number of evaluations required to reach a statistically significant conclusion becomes very high.