Genetic algorithm Search for stable FCC alloys #

In this tutorial we will emulate an older paper [Johannesson] and determine the most stable FCC alloy using the genetic algorithm. Since the purpose is only the tutorial we will limit the phase space to the elements supported by the EMT potential. The search is also equivalent to the recent search for mixed metal ammines with superior properties for ammonia storage described here:

P. B. Jensen, S. Lysgaard, U. J. Quaade and T. Vegge

Designing Mixed Metal Halide Ammines for Ammonia Storage Using Density Functional Theory and Genetic Algorithms

Physical Chemistry Chemical Physics, Vol 16, No. 36, pp. 19732-19740, (2014)

Basic outline of the search #

Choose the phase space of your problem. Is the number of possible individuals large enough to prevent a full screening and is the fitness function too discontinuous for a traditional optimization by derivation? If so continue.
Choose model structures and calculate references in those structures. Put the results somewhere accesible for a script initiated by the genetic algorithm.
Choose suitable parameters like population size (general rule of thumb for the population size: \(log_2(N)\) < pop size < \(2log_2(N)\), where \(N\) is the size of the phase space), convergence criteria etc.
Create the initial population.
Choose procreation operators, i.e. how should offspring be produced. New operators can easily be created by modifying the existing operators.
Run the algorithm.

Here we would like to predict the most stable fcc alloys. In this tutorial we only have the ase.calculators.emt available thus we are limited to the supported metal elements: Al, Ni, Cu, Pd, Ag, Pt and Au. We limit ourselves to at most 4 different metals in one structure, thereby having only \(7^4 = 2401\) candidates in the phase space, symmetry would make this number even lower but the number is fitting for this tutorial.

For a real application of the algorithm it is necessary to use a more sophisticated calculator, in that case each individual calculation is performed on a cluster by submitting to a queuing system. How this is achieved in the algorithm is covered in Optimization with a Genetic Algorithm.

defined for an alloy ABC₂: A + B + 2C -> ABC₂ as: \(\Delta H_f = E_{ABC2} - E_A - E_B - 2E_C\)

Setting up reference database #

Now we need to set up a database in which reference calculations can be stored. This can either be in a central database server where keywords distinguish between different references or dedicated separate databases for each different type of reference calculations.

In the following script, ga_fcc_references.py, we put the references in the database file refs.db. Our model structure is fcc which is loaded with ase.lattice.cubic.FaceCenteredCubic(). We perform a volume relaxation to find the optimal lattice constant and lowest energy, which we save in the database as key-value pairs for quick retrieval.

import numpy as np

from ase.calculators.emt import EMT
from ase.db import connect
from ase.eos import EquationOfState
from ase.lattice.cubic import FaceCenteredCubic

db = connect('refs.db')

metals = ['Al', 'Au', 'Cu', 'Ag', 'Pd', 'Pt', 'Ni']
for m in metals:
    atoms = FaceCenteredCubic(m)
    atoms.calc = EMT()
    e0 = atoms.get_potential_energy()
    a = atoms.cell[0][0]

    eps = 0.05
    volumes = (a * np.linspace(1 - eps, 1 + eps, 9)) ** 3
    energies = []
    for v in volumes:
        atoms.set_cell([v ** (1.0 / 3)] * 3, scale_atoms=True)
        energies.append(atoms.get_potential_energy())

    eos = EquationOfState(volumes, energies)
    v1, e1, B = eos.fit()

    atoms.set_cell([v1 ** (1.0 / 3)] * 3, scale_atoms=True)
    ef = atoms.get_potential_energy()

    db.write(
        atoms,
        metal=m,
        latticeconstant=v1 ** (1.0 / 3),
        energy_per_atom=ef / len(atoms),
    )

Initial population #

We choose a population size of 10 individuals and create the initial population by randomly selecting four elements for each starting individual.

import random

from ase import Atoms
from ase.ga.data import PrepareDB

metals = ['Al', 'Au', 'Cu', 'Ag', 'Pd', 'Pt', 'Ni']

population_size = 10

# Create database
db = PrepareDB('fcc_alloys.db', population_size=population_size, metals=metals)

# Create starting population
for i in range(population_size):
    atoms_string = [random.choice(metals) for _ in range(4)]
    db.add_unrelaxed_candidate(
        Atoms(atoms_string), atoms_string=''.join(atoms_string)
    )

Note how we add the population size and metals as extra key-value pairs when we create the database fcc_alloys.db. We can then retrieve these parameters later when running the main script to avoid having to input the same parameters twice.

We can study our initial population by doing (on the command-line):

$ ase db fcc_alloys.db -c +atoms_string

the term atoms_string determines the order in which the elements are put into the model structure. So it is possible to fully describe an individual by just providing the atoms_string.

Run the algorithm #

The following script runs the algorithm, also find it here: ga_fcc_alloys_main.py. Note that the relaxation script is imported from an external file ga_fcc_alloys_relax.py.

from ga_fcc_alloys_relax import relax

from ase.ga.convergence import GenerationRepetitionConvergence
from ase.ga.data import DataConnection
from ase.ga.element_crossovers import OnePointElementCrossover
from ase.ga.element_mutations import RandomElementMutation
from ase.ga.offspring_creator import OperationSelector
from ase.ga.population import Population

# Specify the number of generations this script will run
num_gens = 40

db = DataConnection('fcc_alloys.db')
ref_db = 'refs.db'

# Retrieve saved parameters
population_size = db.get_param('population_size')
metals = db.get_param('metals')

# Specify the procreation operators for the algorithm
# Try and play with the mutation operators that move to nearby
# places in the periodic table
oclist = (
    [1, 1],
    [RandomElementMutation(metals), OnePointElementCrossover(metals)],
)
operation_selector = OperationSelector(*oclist)

# Pass parameters to the population instance
pop = Population(data_connection=db, population_size=population_size)

# We form generations in this algorithm run and can therefore set
# a convergence criteria based on generations
cc = GenerationRepetitionConvergence(pop, 3)

# Relax the starting population
while db.get_number_of_unrelaxed_candidates() > 0:
    a = db.get_an_unrelaxed_candidate()
    relax(a, ref_db)
    db.add_relaxed_step(a)
pop.update()

# Run the algorithm
for _ in range(num_gens):
    if cc.converged():
        print('converged')
        break
    for i in range(population_size):
        a1, a2 = pop.get_two_candidates(with_history=False)
        op = operation_selector.get_operator()
        a3, desc = op.get_new_individual([a1, a2])

        db.add_unrelaxed_candidate(a3, description=desc)

        relax(a3, ref_db)
        db.add_relaxed_step(a3)

    pop.update()

    # Print the current population to monitor the evolution
    print(['-'.join(p.get_chemical_symbols()) for p in pop.pop])

In this script we run a generational GA as opposed to the pool GA outlined in Optimization with a Genetic Algorithm. This is achieved by having two for-loops; the innermost loop runs the number of times specified by the population size it corresponds to one generation. The outermost loop runs as many generations as specified in num_gens. The function pop.update() is called after the innermost loop has finished thereby only adding individuals to the population after a whole generation is calculated.

After each generation is finished the population is printed to the screen so we can follow the evolution. The calculated individuals are continuously added to fcc_alloys.db, we can evaluate them directly by doing from the command line (in another shell instance if the GA is still running):

$ ase db fcc_alloys.db -c +atoms_string,raw_score,generation,hof -s raw_score

Note: When reading the database using ase db, it might be necessary to increase the number of shown entries, e.g. ase db fcc-alloys.db --limit N, where N is the number of entries to show (as default only the first 20 entries are shown, --limit 0 will show all. For further info use ase db --help, or consult the ase db manual).

To prevent clutter we import the relax function from the following script:

import numpy as np

from ase.calculators.emt import EMT
from ase.db import connect
from ase.eos import EquationOfState
from ase.lattice.cubic import FaceCenteredCubic


def relax(input_atoms, ref_db):
    atoms_string = input_atoms.get_chemical_symbols()

    # Open connection to the database with reference data
    db = connect(ref_db)

    # Load our model structure which is just FCC
    atoms = FaceCenteredCubic('X', latticeconstant=1.0)
    atoms.set_chemical_symbols(atoms_string)

    # Compute the average lattice constant of the metals in this individual
    # and the sum of energies of the constituent metals in the fcc lattice
    # we will need this for calculating the heat of formation
    a = 0
    ei = 0
    for m in set(atoms_string):
        dct = db.get(metal=m)
        count = atoms_string.count(m)
        a += count * dct.latticeconstant
        ei += count * dct.energy_per_atom
    a /= len(atoms_string)
    atoms.set_cell([a, a, a], scale_atoms=True)

    # Since calculations are extremely fast with EMT we can also do a volume
    # relaxation
    atoms.calc = EMT()
    eps = 0.05
    volumes = (a * np.linspace(1 - eps, 1 + eps, 9)) ** 3
    energies = []
    for v in volumes:
        atoms.set_cell([v ** (1.0 / 3)] * 3, scale_atoms=True)
        energies.append(atoms.get_potential_energy())

    eos = EquationOfState(volumes, energies)
    v1, ef, _B = eos.fit()
    latticeconstant = v1 ** (1.0 / 3)

    # Calculate the heat of formation by subtracting ef with ei
    hof = (ef - ei) / len(atoms)

    # Place the calculated parameters in the info dictionary of the
    # input_atoms object
    input_atoms.info['key_value_pairs']['hof'] = hof

    # Raw score must always be set
    # Use one of the following two; they are equivalent
    input_atoms.info['key_value_pairs']['raw_score'] = -hof
    # set_raw_score(input_atoms, -hof)

    input_atoms.info['key_value_pairs']['latticeconstant'] = latticeconstant

    # Setting the atoms_string directly for easier analysis
    atoms_string = ''.join(input_atoms.get_chemical_symbols())
    input_atoms.info['key_value_pairs']['atoms_string'] = atoms_string

The relaxation script is naturally similar to the script we used to calculate the references.

Note that the global optimum is PtNi₃ with a -0.12 eV heat of formation, whereas the second worst alloy is AlNi₃ heat of formation 0.26 eV. This result is in complete contrast to the conclusion obtained in [Johannesson], where AlNi₃ is the most stable alloy within the phase space chosen here. Obviously there is a limit to the predictive power of EMT!

Extending the algorithm #

There are different ways one can extend the algorithm and make it more complex and sophisticated, all employed in [Jensen]:

Extra mutation operators #

Instead of only using random operations we can include some that mutates elements to other elements nearby in the periodic table:

from ase.ga.element_mutations import RandomElementMutation
from ase.ga.element_mutations import MoveDownMutation
from ase.ga.element_mutations import MoveUpMutation
from ase.ga.element_mutations import MoveLeftMutation
from ase.ga.element_mutations import MoveRightMutation
from ase.ga.element_crossovers import OnePointElementCrossover

...

oclist = ([4,1,1,1,1,8], [RandomElementMutation([metals]),
                          MoveDownMutation([metals]),
                          MoveUpMutation([metals]),
                          MoveLeftMutation([metals]),
                          MoveRightMutation([metals]),
                          OnePointElementCrossover([metals])])
operation_selector = OperationSelector(*oclist)

These operators takes advantage of the fact that chemically like elements (close in the periodic table) exhibit similar properties and the substitution of one to a chemically similar elements could refine the properties of an alloy in the population. A natural extension of these operators would be to use a different ordering of the elements than the periodic table; e.g. Pettifor chemical scale, electronegativity, etc.

Note how we have set the probabilities for selecting operators differently. The probability for RandomElementMutation is equal to the sum of the move mutations. Similarly the probability of OnePointElementCrossover is equal to the sum of all the mutation operators. This is to prevent the search from being purely local.

Prevent identical calculations from being performed #

In the current main script there is no check to determine whether an identical calculation has been performed, this is easy to check in this regime where model structures are used and we can just use the atoms_string. We insert the following in the inner loop:

for i in range(population_size):
    dup = True
    while dup:
        a1, a2 = pop.get_two_candidates(with_history=False)
        op = operation_selector.get_operator()
        a3, desc = op.get_new_individual([a1, a2])

        dup = db.is_duplicate(atoms_string=''.join(a3.get_chemical_symbols()))

Since the fcc model structure is completely symmetric we could compare sorted versions of the atoms_string, thereby ruling out individuals containing the same elements in different order.

Reuse of calculations between algorithm runs #

Since genetic algorithms are inherently random in nature one can never be sure to obtain the global minimum with only one algorithm run, it is customary to perform more runs and check that the results agree. In this case it is vital to be able to reuse identical calculations between runs.

We do the following from the command line to create a new database file containing only the relaxed structures:

$ ase db fcc_alloys.db relaxed=1 -i all_relaxed.db

We subsequently add this to the relaxation script:

def relax(input_atoms, ref_db):
    atoms_string = input_atoms.get_chemical_symbols()
    relaxed_db = connect('all_relaxed.db')
    save_relax = True
    try:
        dct = relaxed_db.get(atoms_string=''.join(atoms_string))
    except KeyError:
        # Open connection to the database with reference data
        db = connect(ref_db)

    # Omitting lines up to the point where hof has been calculated
        ...

    else:
        hof = dct.hof
        latticeconstant = dct.latticeconstant
        save_relax = False
    # Place the calculated parameters in the info dictionary of the
    # input_atoms object

    ...

    # Put this at the very end
    if save_relax:
        relaxed_db.write(input_atoms,relaxed=1,
                         key_value_pairs=input_atoms.info['key_value_pairs'])

Before the actual calculation is performed all_relaxed.db is checked to see if it has been calculated before; if so we just collect the heat of formation, but if not we do the calculation and save it directly to all_relaxed.db. Note: this addition assumes that Prevent identical calculations from being performed.

[Johannesson] (1,2)

G. Jóhannesson, T. Bligaard, A. Ruban, H. Skriver, K. Jacobsen and J. Nørskov. Combined Electronic Structure and Evolutionary Search Approach to Materials Design, Phys. Rev. Lett., Vol 88, No. 25, pp. 1-5 (2002)

[Jensen]

P. B. Jensen, S. Lysgaard, U. J. Quaade and T. Vegge. Designing Mixed Metal Halide Ammines for Ammonia Storage Using Density Functional Theory and Genetic Algorithms Phys. Chem. Chem. Phys., Vol 16, No. 36, pp. 19732-19740, (2014)

Genetic algorithm Search for stable FCC alloys

Contents