troysquillaci.me

My collection of frivolous fables.

Simple Genetic Algorithm

Written on

Note: This program has been updated to work with Python 3.

Evolutionary computation is a subfield of artificial intelligence concerned with solving optimization problems with techniques inspired by Darwinian evolution. Typically a metaheuristic is used in conjunction with trail-and-error methods on a population of candidate solutions until a viable one is discovered. Such techniques are often used in situations where the search space is far too vast for exhaustive methods to be effective. Evolutionary systems are non-deterministic and typically yield different results each run, nevertheless they have become an important tool in finding solutions to some complex problems.

Naturally, this approach is iterative and requires populations to demonstrate reproductive mechanisms that are common in Darwinian evolution. Genetic algorithms are one of the simplest ways to integrate such concepts into computer programs. The following is a brief introduction to these simple and fascinating strategies at tackling complex problems. The code provided is written is Python, it is encouraged you follow along!

Approximate String Matching

Genetic algorithms require problems to exhibit a notion of partial correctness. Take for example a scenario in which two strings, "TroqSyuiilaei" and "TroySquillaci", are compared for equality. Rather than performing an exact string match (that will lead to a binary response of true or false), we can use fuzzy logic to define degrees of truth that can be used in quantifying partial equality. With these "stepping stones" of partial information, we may be able to make informed modifications to the former string until it matches the latter.

If incremental progress cannot be either performed or determined then the genetic algorithm essentially boils down to a far less efficient exhaustive search.

Genetic Algorithm Overview

The approximate string matching problem is simple, intuitive, and fully captures the core components of a genetic algorithm. Our genetic algorithm will take an input string provided by the user then generate a population of random strings. Throughout the course of evolution it will:

  • Compare the random strings for equality with the input string using fuzzy string matching.
  • Rank the strings in terms of equality and choose a subset of them to undergo reproduction.
  • Reproduce a new population with the chosen strings.
  • Repeat until a threshold equality value is met.

We're going to use a Python package called fuzzywuzzy to perform fuzzy string matching for us, so go ahead and download it now.

pip install fuzzywuzzy

Genetic Algorithm Implementation

Genetic algorithms are incredibly simple to implement. Its important to understand that the functions described below are implementations of fundamental abstract components of any genetic algorithm. Depending on the application, other implementations of the same components are likely necessary to achieve desirable results.

Initializing the Population

The first stage in any genetic algorithm involves initializing a population of domain-specific agents which are representative of candidate solutions. In this case we are generating random strings. A wrapper class is defined for an agent which contains the random string along with its fitness value.

class Agent:

    def __init__(self, length):

        self.string = ''.join(random.choice(string.ascii_letters) for _ in range(length))
        self.fitness = -1

    def __str__(self):

        return f'String: {self.string} Fitness: {self.fitness}'

This makes initializing a population easy.

def init_agents(population, length):

    return [Agent(length) for _ in range(population)]

Fitness

Next is the hardest part, quantifying the fitness of agents in the population. In this case, strings are compared to the input string with fuzzy string matching. This produces an equality value between 0 (no match) and 100 (exact match).

def fitness(agents):

    for agent in agents:

        agent.fitness = fuzz.ratio(agent.string, in_str)

    return agents

Fitness functions are considered hard to design because they are commonly susceptible to deception. In other words what you might think to be a reasonable function to quantify fitness may inadvertently exclude the possibility of traversing parts of the search space where viable solution reside. Recent research surrounding this problem has led to evidence suggesting that rewarding agents based on other metrics (aside from fitness) can lead to more diverse populations better able to explore the search space. This is not explained here however, it is out of the scope of this article.

Selection

Once fitness values for all agents are calculated, we've got to determine which ones are more likely to produce successful offspring for the next generation. This is called selection. We're going to use a particular implementation call truncation selection which sorts the agents by fitness values and then chooses a proportion of top candidates. Agents not selected will not survive and be excluded from the next generation. Unlike living systems, a well performing agent can stick around in the population indefinitely, so long as it continuously demonstrates high fitness.

def selection(agents):

    agents = sorted(agents, key=lambda agent: agent.fitness, reverse=True)
    print('\n'.join(map(str, agents)))
    agents = agents[:int(0.2 * len(agents))]

    return agents

Some other common selection methods include roulette selection and tournament selection. It is suggested to experiments with different methods because they impose different selective pressures on a population, which can potentially stunt growth (too selective) or prevent good solutions from establishing prominence (not selective enough).

Crossover

Now that we've got a group of chosen agents, we need to recombine their genotypes (strings) by using crossover to produce the next generation. One-point crossover is a good choice here. It simply chooses two selected agents at random to act as parents, then randomly selects a location in their genotypes and splits both of them into two disjoint segments. Segments from a parent are recombined with the segments from the other parent yielding two offspring. They are introduced with the parents back into the population.

def crossover(agents):

    offspring = []

    for _ in range((population - len(agents)) // 2):

        parent1 = random.choice(agents)
        parent2 = random.choice(agents)
        child1 = Agent(in_str_len)
        child2 = Agent(in_str_len)
        split = random.randint(0, in_str_len)
        child1.string = parent1.string[0:split] + parent2.string[split:in_str_len]
        child2.string = parent2.string[0:split] + parent1.string[split:in_str_len]

        offspring.append(child1)
        offspring.append(child2)

    agents.extend(offspring)

    return agents

Other crossover methods exist but typically have a lesser impact on the overall effectiveness of the algorithm. Still, you should experiment with different methods.

Mutation

We're now going to use mutation to randomly alter parts of the genotypes of all agents. This is done to prevent stagnation by allowing the population to explore the search space. In this case a simple uniform mutation is suitable.

def mutation(agents):

    for agent in agents:

        for idx, param in enumerate(agent.string):

            if random.uniform(0.0, 1.0) <= 0.1:

                g_start = agent.string[0:idx]
                g_middle = random.choice(string.ascii_letters)
                g_end = agent.string[idx+1:in_str_len]

                agent.string = g_start + g_middle + g_end

    return agents

Mutation also has a variety of implementations, which you should try to determine the best option.

Core Loop

Finally we must construct the core loop that controls the evolutionary process. Additional middleware can be added here to introduce functionality like logging and data collection if so desired.

def ga():

    agents = init_agents(population, in_str_len)

    for generation in range(generations):

        print('Generation: ' + str(generation))

        agents = fitness(agents)
        agents = selection(agents)
        agents = crossover(agents)
        agents = mutation(agents)

        if any(agent.fitness >= 90 for agent in agents):

            print('Threshold met!')
            sys.exit(0)

Random Resources