Genetic algorithm Search for stable FCC alloys
In this tutorial we will emulate an older paper [Johannesson] and determine
the most stable FCC alloy using the genetic algorithm. Since the purpose is
only the tutorial we will limit the phase space to the elements supported by
the EMT potential
. The search is also equivalent
to the recent search for mixed metal ammines with superior properties for
ammonia storage described here:
P. B. Jensen, S. Lysgaard, U. J. Quaade and T. VeggePhysical Chemistry Chemical Physics, Vol 16, No. 36, pp. 19732-19740, (2014)
Basic outline of the search
Choose the phase space of your problem. Is the number of possible individuals large enough to prevent a full screening and is the fitness function too discontinuous for a traditional optimization by derivation? If so continue.
Choose model structures and calculate references in those structures. Put the results somewhere accesible for a script initiated by the genetic algorithm.
Choose suitable parameters like population size (general rule of thumb for the population size: \(log_2(N)\) < pop size < \(2log_2(N)\), where \(N\) is the size of the phase space), convergence criteria etc.
Create the initial population.
Choose procreation operators, i.e. how should offspring be produced. New operators can easily be created by modifying the existing operators.
Run the algorithm.
Here we would like to predict the most stable fcc alloys. In this tutorial we
only have the ase.calculators.emt
available thus we are limited to the
supported metal elements: Al, Ni, Cu, Pd, Ag, Pt and Au. We limit ourselves
to at most 4 different metals in one structure, thereby having only \(7^4 =
2401\) candidates in the phase space, symmetry would make this number even
lower but the number is fitting for this tutorial.
For a real application of the algorithm it is necessary to use a more sophisticated calculator, in that case each individual calculation is performed on a cluster by submitting to a queuing system. How this is achieved in the algorithm is covered in Optimization with a Genetic Algorithm.
defined for an alloy ABC2: A + B + 2C -> ABC2 as: \(\Delta H_f = E_{ABC2} - E_A - E_B - 2E_C\)
Setting up reference database
Now we need to set up a database in which reference calculations can be stored. This can either be in a central database server where keywords distinguish between different references or dedicated separate databases for each different type of reference calculations.
In the following script, ga_fcc_references.py
, we put the
references in the database file refs.db. Our model structure is fcc which
is loaded with ase.lattice.cubic.FaceCenteredCubic()
. We perform a
volume relaxation to find the optimal lattice constant and lowest energy,
which we save in the database as key-value pairs for quick retrieval.
import numpy as np
from ase.calculators.emt import EMT
from ase.db import connect
from ase.eos import EquationOfState
from ase.lattice.cubic import FaceCenteredCubic
db = connect('refs.db')
metals = ['Al', 'Au', 'Cu', 'Ag', 'Pd', 'Pt', 'Ni']
for m in metals:
atoms = FaceCenteredCubic(m)
atoms.calc = EMT()
e0 = atoms.get_potential_energy()
a = atoms.cell[0][0]
eps = 0.05
volumes = (a * np.linspace(1 - eps, 1 + eps, 9))**3
energies = []
for v in volumes:
atoms.set_cell([v**(1. / 3)] * 3, scale_atoms=True)
energies.append(atoms.get_potential_energy())
eos = EquationOfState(volumes, energies)
v1, e1, B = eos.fit()
atoms.set_cell([v1**(1. / 3)] * 3, scale_atoms=True)
ef = atoms.get_potential_energy()
db.write(atoms, metal=m,
latticeconstant=v1**(1. / 3),
energy_per_atom=ef / len(atoms))
Initial population
We choose a population size of 10 individuals and create the initial population by randomly selecting four elements for each starting individual.
import random
from ase import Atoms
from ase.ga.data import PrepareDB
metals = ['Al', 'Au', 'Cu', 'Ag', 'Pd', 'Pt', 'Ni']
population_size = 10
# Create database
db = PrepareDB('fcc_alloys.db',
population_size=population_size,
metals=metals)
# Create starting population
for i in range(population_size):
atoms_string = [random.choice(metals) for _ in range(4)]
db.add_unrelaxed_candidate(Atoms(atoms_string),
atoms_string=''.join(atoms_string))
Note how we add the population size and metals as extra key-value pairs when we create the database fcc_alloys.db. We can then retrieve these parameters later when running the main script to avoid having to input the same parameters twice.
We can study our initial population by doing (on the command-line):
$ ase db fcc_alloys.db -c +atoms_string
the term atoms_string
determines the order in which the elements are put
into the model structure. So it is possible to fully describe an individual
by just providing the atoms_string
.
Run the algorithm
The following script runs the algorithm, also find it here:
ga_fcc_alloys_main.py
. Note that the relaxation script is
imported from an external file ga_fcc_alloys_relax.py
.
from ga_fcc_alloys_relax import relax
from ase.ga.convergence import GenerationRepetitionConvergence
from ase.ga.data import DataConnection
from ase.ga.element_crossovers import OnePointElementCrossover
from ase.ga.element_mutations import RandomElementMutation
from ase.ga.offspring_creator import OperationSelector
from ase.ga.population import Population
# Specify the number of generations this script will run
num_gens = 40
db = DataConnection('fcc_alloys.db')
ref_db = 'refs.db'
# Retrieve saved parameters
population_size = db.get_param('population_size')
metals = db.get_param('metals')
# Specify the procreation operators for the algorithm
# Try and play with the mutation operators that move to nearby
# places in the periodic table
oclist = ([1, 1], [RandomElementMutation(metals),
OnePointElementCrossover(metals)])
operation_selector = OperationSelector(*oclist)
# Pass parameters to the population instance
pop = Population(data_connection=db,
population_size=population_size)
# We form generations in this algorithm run and can therefore set
# a convergence criteria based on generations
cc = GenerationRepetitionConvergence(pop, 3)
# Relax the starting population
while db.get_number_of_unrelaxed_candidates() > 0:
a = db.get_an_unrelaxed_candidate()
relax(a, ref_db)
db.add_relaxed_step(a)
pop.update()
# Run the algorithm
for _ in range(num_gens):
if cc.converged():
print('converged')
break
for i in range(population_size):
a1, a2 = pop.get_two_candidates(with_history=False)
op = operation_selector.get_operator()
a3, desc = op.get_new_individual([a1, a2])
db.add_unrelaxed_candidate(a3, description=desc)
relax(a3, ref_db)
db.add_relaxed_step(a3)
pop.update()
# Print the current population to monitor the evolution
print(['-'.join(p.get_chemical_symbols()) for p in pop.pop])
In this script we run a generational GA as opposed to the pool GA outlined in
Optimization with a Genetic Algorithm. This is achieved by having
two for-loops; the innermost loop runs the number of times specified by the
population size it corresponds to one generation. The outermost loop runs as
many generations as specified in num_gens
. The function
pop.update()
is called after the innermost loop has finished thereby
only adding individuals to the population after a whole generation is
calculated.
After each generation is finished the population is printed to the screen so
we can follow the evolution. The calculated individuals are continuously
added to fcc_alloys.db
, we can evaluate them directly by doing from the
command line (in another shell instance if the GA is still running):
$ ase db fcc_alloys.db -c +atoms_string,raw_score,generation,hof -s raw_score
Note: When reading the database using ase db
, it might be necessary to
increase the number of shown entries, e.g. ase db fcc-alloys.db --limit
N
, where N
is the number of entries to show (as default only the first 20
entries are shown, --limit 0
will show all. For further info use ase db
--help
, or consult the ase db manual).
To prevent clutter we import the relax function from the following script:
import numpy as np
from ase.calculators.emt import EMT
from ase.db import connect
from ase.eos import EquationOfState
from ase.lattice.cubic import FaceCenteredCubic
def relax(input_atoms, ref_db):
atoms_string = input_atoms.get_chemical_symbols()
# Open connection to the database with reference data
db = connect(ref_db)
# Load our model structure which is just FCC
atoms = FaceCenteredCubic('X', latticeconstant=1.)
atoms.set_chemical_symbols(atoms_string)
# Compute the average lattice constant of the metals in this individual
# and the sum of energies of the constituent metals in the fcc lattice
# we will need this for calculating the heat of formation
a = 0
ei = 0
for m in set(atoms_string):
dct = db.get(metal=m)
count = atoms_string.count(m)
a += count * dct.latticeconstant
ei += count * dct.energy_per_atom
a /= len(atoms_string)
atoms.set_cell([a, a, a], scale_atoms=True)
# Since calculations are extremely fast with EMT we can also do a volume
# relaxation
atoms.calc = EMT()
eps = 0.05
volumes = (a * np.linspace(1 - eps, 1 + eps, 9))**3
energies = []
for v in volumes:
atoms.set_cell([v**(1. / 3)] * 3, scale_atoms=True)
energies.append(atoms.get_potential_energy())
eos = EquationOfState(volumes, energies)
v1, ef, B = eos.fit()
latticeconstant = v1**(1. / 3)
# Calculate the heat of formation by subtracting ef with ei
hof = (ef - ei) / len(atoms)
# Place the calculated parameters in the info dictionary of the
# input_atoms object
input_atoms.info['key_value_pairs']['hof'] = hof
# Raw score must always be set
# Use one of the following two; they are equivalent
input_atoms.info['key_value_pairs']['raw_score'] = -hof
# set_raw_score(input_atoms, -hof)
input_atoms.info['key_value_pairs']['latticeconstant'] = latticeconstant
# Setting the atoms_string directly for easier analysis
atoms_string = ''.join(input_atoms.get_chemical_symbols())
input_atoms.info['key_value_pairs']['atoms_string'] = atoms_string
The relaxation script is naturally similar to the script we used to calculate the references.
Note that the global optimum is PtNi3 with a -0.12 eV heat of formation, whereas the second worst alloy is AlNi3 heat of formation 0.26 eV. This result is in complete contrast to the conclusion obtained in [Johannesson], where AlNi3 is the most stable alloy within the phase space chosen here. Obviously there is a limit to the predictive power of EMT!
Extending the algorithm
There are different ways one can extend the algorithm and make it more complex and sophisticated, all employed in [Jensen]:
Extra mutation operators
Instead of only using random operations we can include some that mutates elements to other elements nearby in the periodic table:
from ase.ga.element_mutations import RandomElementMutation
from ase.ga.element_mutations import MoveDownMutation
from ase.ga.element_mutations import MoveUpMutation
from ase.ga.element_mutations import MoveLeftMutation
from ase.ga.element_mutations import MoveRightMutation
from ase.ga.element_crossovers import OnePointElementCrossover
...
oclist = ([4,1,1,1,1,8], [RandomElementMutation([metals]),
MoveDownMutation([metals]),
MoveUpMutation([metals]),
MoveLeftMutation([metals]),
MoveRightMutation([metals]),
OnePointElementCrossover([metals])])
operation_selector = OperationSelector(*oclist)
These operators takes advantage of the fact that chemically like elements (close in the periodic table) exhibit similar properties and the substitution of one to a chemically similar elements could refine the properties of an alloy in the population. A natural extension of these operators would be to use a different ordering of the elements than the periodic table; e.g. Pettifor chemical scale, electronegativity, etc.
Note how we have set the probabilities for selecting operators differently.
The probability for RandomElementMutation
is equal to the sum of the
move mutations. Similarly the probability of OnePointElementCrossover
is equal to the sum of all the mutation operators. This is to prevent the
search from being purely local.
Prevent identical calculations from being performed
In the current main script there is no check to determine whether an
identical calculation has been performed, this is easy to check in this
regime where model structures are used and we can just use the
atoms_string
. We insert the following in the inner loop:
for i in range(population_size):
dup = True
while dup:
a1, a2 = pop.get_two_candidates(with_history=False)
op = operation_selector.get_operator()
a3, desc = op.get_new_individual([a1, a2])
dup = db.is_duplicate(atoms_string=''.join(a3.get_chemical_symbols()))
Since the fcc model structure is completely symmetric we could compare sorted
versions of the atoms_string
, thereby ruling out individuals containing
the same elements in different order.
Reuse of calculations between algorithm runs
Since genetic algorithms are inherently random in nature one can never be sure to obtain the global minimum with only one algorithm run, it is customary to perform more runs and check that the results agree. In this case it is vital to be able to reuse identical calculations between runs.
We do the following from the command line to create a new database file containing only the relaxed structures:
$ ase db fcc_alloys.db relaxed=1 -i all_relaxed.db
We subsequently add this to the relaxation script:
def relax(input_atoms, ref_db):
atoms_string = input_atoms.get_chemical_symbols()
relaxed_db = connect('all_relaxed.db')
save_relax = True
try:
dct = relaxed_db.get(atoms_string=''.join(atoms_string))
except KeyError:
# Open connection to the database with reference data
db = connect(ref_db)
# Omitting lines up to the point where hof has been calculated
...
else:
hof = dct.hof
latticeconstant = dct.latticeconstant
save_relax = False
# Place the calculated parameters in the info dictionary of the
# input_atoms object
...
# Put this at the very end
if save_relax:
relaxed_db.write(input_atoms,relaxed=1,
key_value_pairs=input_atoms.info['key_value_pairs'])
Before the actual calculation is performed all_relaxed.db
is checked to
see if it has been calculated before; if so we just collect the heat of
formation, but if not we do the calculation and save it directly to
all_relaxed.db
. Note: this addition assumes that Prevent identical
calculations from being performed.
G. Jóhannesson, T. Bligaard, A. Ruban, H. Skriver, K. Jacobsen and J. Nørskov. Combined Electronic Structure and Evolutionary Search Approach to Materials Design, Phys. Rev. Lett., Vol 88, No. 25, pp. 1-5 (2002)
P. B. Jensen, S. Lysgaard, U. J. Quaade and T. Vegge. Designing Mixed Metal Halide Ammines for Ammonia Storage Using Density Functional Theory and Genetic Algorithms Phys. Chem. Chem. Phys., Vol 16, No. 36, pp. 19732-19740, (2014)