Evolution strategies

Handwriting Version for this post

Problems in BackPropagation and Gradient Descent

the gradient of reward signals given to the agent is realised many timesteps in the future. Questions above can be seem as Credit Assignment
there is the issue of being stuck in a local optimum.

Pseudo code of Basic Evolution Strategy

solver = EvolutionStrategy()
while True:
  # ask the ES to give us a set of candidate solutions
  solutions = solver.ask()
  # create an array to hold the fitness results.
  fitness_list = np.zeros(solver.popsize)
  # evaluate the fitness for each given solution.
  for i in range(solver.popsize):
    fitness_list[i] = evaluate(solutions[i])
  # give list of fitness results back to ES
  solver.tell(fitness_list)
  # get best parameter, fitness from ES
  best_solution, best_fitness = solver.result()
  if best_fitness > MY_REQUIRED_FITNESS:
    break

Advantages in Evolution Strategies

Easier to scale in a distributed setting(easy to parallelize).
It does not suffer in settings with sparse rewards.
It has fewer hyperparameters.
It is effective at finding solutions for RL tasks.

Improvement of Covariance Matrix Adaptive Evolution Strategy

We want to explore more and increase the standard deviation of our search space.
And there are times when we are confident we are close to good optima and just want to fine-tune the solution.

Details of algorithm

Calculate the fitness score of each candidate solution in generation $(g)$.
Isolates the best 25% of the population in generation $(g)$, in purple.

Using only the best solutions, along with the mean $\mu^{(g)}$ of the current generation (the green dot), calculate the covariance matrix $C^{(g+1)}$ of the next generation.