MACE¶
Higher order equivariant message passing graph neural network [1].
Implementation¶
The architecture is a thin wrapper around the official MACE implementation, which is hosted here. Additional heads are added on top of MACE to predict an arbitrary number of targets with arbitrary symmetry properties. These heads take the output node features of MACE as input and pass them through a linear layer (or an MLP for invariant targets) to obtain the final predictions.
One important feature is that the architecture is ready to take a pretrained MACE model
file as input. The heads required to predict the targets will be added on top of the
MACE model, so one can continue training for arbitrary targets. See the
mace_model hyperparameter for more details. For simply exporting a foundation MACE
model to use as a metatomic model (e.g. in ASE or LAMMPS), see exporting a
foundation MACE model.
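For example, a minimal options.yaml along the following lines would continue training from a pretrained MACE file. This is only a sketch: the model path, dataset file and number of epochs are placeholders to adapt to your own setup.

architecture:
  name: experimental.mace
  model:
    # placeholder path to the pretrained MACE model file
    mace_model: path/to/pretrained/mace/model.model
    # metatrain target corresponding to MACE's own head
    mace_head_target: energy
  training:
    num_epochs: 100

# placeholder dataset files
training_set: my_dataset.xyz
validation_set: my_dataset.xyz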
Installation¶
To install this architecture along with the metatrain package, run:
pip install metatrain[mace]
where the square brackets indicate that you want to install the optional
dependencies required for mace.
Default Hyperparameters¶
The description of all the hyperparameters used in mace is provided
further down this page. However, here we provide you with a yaml file containing all
the default hyperparameters, which might be convenient as a starting point to
create your own hyperparameter files:
architecture:
  name: experimental.mace
  model:
    mace_model: null
    mace_head_target: energy
    r_max: 5.0
    num_radial_basis: 8
    radial_type: bessel
    num_cutoff_basis: 5
    max_ell: 3
    interaction: RealAgnosticResidualInteractionBlock
    num_interactions: 2
    hidden_irreps: 128x0e + 128x1o + 128x2e
    edge_irreps: null
    apply_cutoff: true
    avg_num_neighbors: 1
    pair_repulsion: false
    distance_transform: null
    correlation: 3
    gate: silu
    interaction_first: RealAgnosticResidualInteractionBlock
    MLP_irreps: 16x0e
    radial_MLP:
    - 64
    - 64
    - 64
    use_embedding_readout: false
    use_last_readout_only: false
    use_agnostic_product: false
  training:
    optimizer: adam
    learning_rate: 0.01
    weight_decay: 5.0e-07
    amsgrad: true
    beta: 0.9
    lr_scheduler: ReduceLROnPlateau
    lr_scheduler_gamma: 0.9993
    lr_factor: 0.8
    lr_scheduler_patience: 50
    distributed: false
    distributed_port: 39591
    batch_size: 16
    num_epochs: 1000
    log_interval: 1
    checkpoint_interval: 100
    scale_targets: true
    atomic_baseline: {}
    fixed_scaling_weights: {}
    per_structure_targets: []
    num_workers: null
    log_mae: true
    log_separate_blocks: false
    best_model_metric: mae_prod
    grad_clip_norm: 1.0
    loss: mse
Tuning hyperparameters¶
The default hyperparameters above will work well in most cases, but they may not be optimal for your specific dataset. There is a good number of parameters to tune, both for the model and the trainer. Since seeing them for the first time might be overwhelming, here we provide a list of the parameters that are in general the most important, followed by a minimal example overriding some of them:
- ModelHypers.mace_model: str | None = None
Path to a pretrained MACE model file.
For example, this can be a foundation MACE model. If not provided, a new MACE model will be initialized from scratch using the rest of the hyperparameters of the architecture.
- ModelHypers.r_max: float = 5.0
Cutoff radius for neighbor search.
This should be set to a value after which most of the interactions between atoms are expected to be negligible. A lower cutoff will lead to faster models.
- TrainerHypers.learning_rate: float = 0.01
Learning rate of the optimizer.
- TrainerHypers.batch_size: int = 16
The number of samples to use in each batch of training. This hyperparameter controls the tradeoff between training speed and memory usage. In general, larger batch sizes will lead to faster training, but might require more memory.
- ModelHypers.hidden_irreps: str = '128x0e + 128x1o + 128x2e'
Irreps for hidden node features.
This defines the shape of the node features at each layer of the MACE model (except for the last layer, which only contains scalars). The notation for the irreps is e3nn's standard notation. Essentially, the irreps string is a sum of terms of the form {multiplicity}x{ell}{parity}, where {multiplicity} is the number of channels with angular momentum {ell} and parity {parity} (e for even, o for odd). For example, 16x0e + 32x1o means that there are 16 scalar channels (\(\ell=0\)) and 32 vector channels (\(\ell=1\)) at each layer.
Increasing the multiplicities makes the network wider, which generally leads to better accuracy at the cost of increased training and evaluation time.
Increasing the maximum \(\ell\) included in the irreps allows the network to capture more complex angular dependencies. However, its effect might be heavily dependent on your dataset and target. The hidden irreps should include at least up to the maximum \(\ell\) of the target you are training on. For example, if you are training on dipole moments (\(\ell=1\)), the hidden irreps should include at least \(\ell=1\) channels.
Note
At the time of writing, MACE enforces that all channels of
hidden_irreps should have the same multiplicity.
- ModelHypers.correlation: int = 3
Correlation order at each layer.
After computing pair-wise (2-body) messages between atoms, MACE applies products that construct higher-order correlations between messages. This hyperparameter controls the number of products applied. For example, correlation=1 means that the interactions are purely 2-body, while correlation=2 would roughly equate to including 3-body interactions, and so on.
This hyperparameter, together with max_ell, determines the maximum angular momentum that will be non-zero in hidden_irreps, which is max_ell * correlation.
- ModelHypers.num_interactions: int = 2
Number of message passing steps.
MACE’s last message passing step only outputs scalar features, so if you are training on a target that is not scalar (e.g. a vector or some spherical tensor with higher order), the effective number of message passing steps for that target will be
num_interactions - 1.
- TrainerHypers.loss: str | dict[str, LossSpecification] = 'mse'
The loss function to be used. See the Loss functions section for more details.
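As an illustrative starting point only (the values below are arbitrary and must be adapted to your dataset), these parameters can be overridden in the architecture section of your options file:

architecture:
  name: experimental.mace
  model:
    r_max: 4.5
    hidden_irreps: 64x0e + 64x1o
  training:
    learning_rate: 0.005
    batch_size: 32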
Model hyperparameters¶
The parameters that go under the architecture.model section of the config file
are the following:
- ModelHypers.mace_model: str | None = None¶
Path to a pretrained MACE model file.
For example, this can be a foundation MACE model. If not provided, a new MACE model will be initialized from scratch using the rest of the hyperparameters of the architecture.
- ModelHypers.mace_head_target: str = 'energy'¶
Target to which the MACE head is related.
metatrain adds arbitrary heads on top of MACE to predict arbitrary targets. However, MACE models themselves have a head. This hyperparameter specifies which metatrain target corresponds to the MACE head. For this target, no new head will be added, and the output of MACE's head will be used directly.
Note
MACE models with multiple heads also exist, but metatrain only supports using this hyperparameter to deal with single-head MACE models for now.
- ModelHypers.r_max: float = 5.0¶
Cutoff radius for neighbor search.
This should be set to a value after which most of the interactions between atoms are expected to be negligible. A lower cutoff will lead to faster models.
- ModelHypers.radial_type: Literal['bessel', 'gaussian', 'chebyshev'] = 'bessel'¶
Type of radial basis functions to use in the radial embedding.
- ModelHypers.max_ell: int = 3¶
Highest \(\ell\) of spherical harmonics used in the interactions.
Note that this is not the maximum \(\ell\) in hidden_irreps, since hidden_irreps can contain \(\ell\) values as high as max_ell * correlation.
- ModelHypers.interaction: Literal['RealAgnosticResidualInteractionBlock', 'RealAgnosticAttResidualInteractionBlock', 'RealAgnosticInteractionBlock', 'RealAgnosticDensityInteractionBlock', 'RealAgnosticDensityResidualInteractionBlock', 'RealAgnosticResidualNonLinearInteractionBlock'] = 'RealAgnosticResidualInteractionBlock'¶
Name of interaction block.
Class that will be used to compute interactions between atoms at each layer.
- ModelHypers.num_interactions: int = 2¶
Number of message passing steps.
MACE’s last message passing step only outputs scalar features, so if you are training on a target that is not scalar (e.g. a vector or some spherical tensor with higher order), the effective number of message passing steps for that target will be
num_interactions - 1.
- ModelHypers.hidden_irreps: str = '128x0e + 128x1o + 128x2e'¶
Irreps for hidden node features.
This defines the shape of the node features at each layer of the MACE model (except for the last layer, which only contains scalars). The notation for the irreps is e3nn's standard notation. Essentially, the irreps string is a sum of terms of the form {multiplicity}x{ell}{parity}, where {multiplicity} is the number of channels with angular momentum {ell} and parity {parity} (e for even, o for odd). For example, 16x0e + 32x1o means that there are 16 scalar channels (\(\ell=0\)) and 32 vector channels (\(\ell=1\)) at each layer.
Increasing the multiplicities makes the network wider, which generally leads to better accuracy at the cost of increased training and evaluation time.
Increasing the maximum \(\ell\) included in the irreps allows the network to capture more complex angular dependencies. However, its effect might be heavily dependent on your dataset and target. The hidden irreps should include at least up to the maximum \(\ell\) of the target you are training on. For example, if you are training on dipole moments (\(\ell=1\)), the hidden irreps should include at least \(\ell=1\) channels (see the sketch after this list).
Note
At the time of writing, MACE enforces that all channels of
hidden_irreps should have the same multiplicity.
- ModelHypers.distance_transform: Literal['Agnesi', 'Soft'] | None = None¶
Use a distance transform for the radial basis functions.
- ModelHypers.correlation: int = 3¶
Correlation order at each layer.
After computing pair-wise (2-body) messages between atoms, MACE applies products that construct higher-order correlations between messages. This hyperparameter controls the number of products applied. For example, correlation=1 means that the interactions are purely 2-body, while correlation=2 would roughly equate to including 3-body interactions, and so on.
This hyperparameter, together with max_ell, determines the maximum angular momentum that will be non-zero in hidden_irreps, which is max_ell * correlation.
- ModelHypers.gate: Literal['silu', 'tanh', 'abs'] | None = 'silu'¶
Non-linearity used for the non-linear readouts.
This determines which kind of non-linearity is applied in the non-linear readouts. The non-linear readouts are: MACE's internal MLP readout (applied only at the last layer) and arbitrary MLP heads added on top of MACE by metatrain.
The non-linearity is applied only to scalar channels, therefore it won't have any effect on non-scalar targets.
- ModelHypers.interaction_first: Literal['RealAgnosticResidualInteractionBlock', 'RealAgnosticInteractionBlock', 'RealAgnosticDensityInteractionBlock', 'RealAgnosticDensityResidualInteractionBlock', 'RealAgnosticResidualNonLinearInteractionBlock'] = 'RealAgnosticResidualInteractionBlock'¶
Name of interaction block for the first interaction layer.
Class that will be used to compute interactions between atoms at the first layer.
- ModelHypers.MLP_irreps: str = '16x0e'¶
Hidden irreps of the MLP readouts.
The MLP readouts are: MACE's internal MLP readout (applied only at the last layer) and arbitrary MLP heads added on top of MACE by metatrain.
The non-linearity is applied only to scalar channels, therefore these irreps should only contain scalar channels.
- ModelHypers.radial_MLP: list[int] = [64, 64, 64]¶
Width of the radial MLP.
Only used for MACE’s internal MLP.
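For instance, a sketch of a model section suited to a rank-2 spherical tensor target (the values are illustrative, not a recommendation) could raise the angular content of the hidden features and add one extra message passing step:

architecture:
  name: experimental.mace
  model:
    max_ell: 3
    correlation: 3
    # hidden features include channels up to ell = 2, matching the
    # maximum angular momentum of the target
    hidden_irreps: 64x0e + 64x1o + 64x2e
    # one extra step, since the last step only outputs scalar features
    num_interactions: 3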
Trainer hyperparameters¶
The parameters that go under the architecture.training section of the config file
are the following:
- TrainerHypers.optimizer: Literal['adam', 'adamw', 'schedulefree'] = 'adam'¶
Optimizer to use for parameter optimization.
- TrainerHypers.batch_size: int = 16¶
The number of samples to use in each batch of training. This hyperparameter controls the tradeoff between training speed and memory usage. In general, larger batch sizes will lead to faster training, but might require more memory.
- TrainerHypers.atomic_baseline: dict[str, float | dict[int, float]] = {}¶
The baselines for each target.
By default,
metatrain will fit a linear model (CompositionModel) to compute the least-squares baseline for each atomic species for each target.
However, this hyperparameter allows you to provide your own baselines. The value of the hyperparameter should be a dictionary where the keys are the target names, and the values are either (1) a single baseline to be used for all atomic types, or (2) a dictionary mapping atomic types to their baselines. For example:
atomic_baseline: {"energy": {1: -0.5, 6: -10.0}} will fix the energy baseline for hydrogen (Z=1) to -0.5 and for carbon (Z=6) to -10.0, while fitting the baselines for the energy of all other atomic types, as well as fitting the baselines for all other targets.
atomic_baseline: {"energy": -5.0} will fix the energy baseline for all atomic types to -5.0.
atomic_baseline: {"mtt:dos": 0.0} sets the baseline for the "mtt:dos" target to 0.0, effectively disabling the atomic baseline for that target.
This atomic baseline is subtracted from the targets during training, which avoids the main model needing to learn atomic contributions, and likely makes training easier. When the model is used in evaluation mode, the atomic baseline is added on top of the model predictions automatically.
Note
This atomic baseline is a per-atom contribution. Therefore, if the property you are predicting is a sum over all atoms (e.g., total energy), the contribution of the atomic baseline to the total property will be the atomic baseline multiplied by the number of atoms of that type in the structure.
Note
If a MACE model is loaded through the mace_model hyperparameter, the atomic baselines in the MACE model are used by default for the target indicated in mace_head_target. If you want to override them, you need to explicitly set the baselines for that target in this hyperparameter.
- TrainerHypers.fixed_scaling_weights: dict[str, float | dict[int, float]] = {}¶
Weights for target scaling.
This is passed to the fixed_weights argument of Scaler.train_model; see its documentation to understand exactly what to pass here.
Note
If a MACE model is loaded through the mace_model hyperparameter, the scales in the MACE model are used by default for the target indicated in mace_head_target. If you want to override them, you need to explicitly set the scaling weights for that target in this hyperparameter.
- TrainerHypers.num_workers: int | None = None¶
Number of workers for data loading. If not provided, it is set automatically.
- TrainerHypers.best_model_metric: Literal['rmse_prod', 'mae_prod', 'loss'] = 'mae_prod'¶
Metric used to select the best checkpoint (e.g., rmse_prod).
- TrainerHypers.grad_clip_norm: float = 1.0¶
Maximum gradient norm used for gradient clipping (set to inf to disable clipping).
- TrainerHypers.loss: str | dict[str, LossSpecification] = 'mse'¶
The loss function to be used. See the Loss functions section for more details; a sketch of the per-target dictionary form is shown below.
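The simplest choice is a single string such as mse, as in the defaults above. A dictionary mapping target names to per-target specifications is also accepted; the sketch below only shows the general shape, and the inner keys (type, weight) are an assumption for illustration, so refer to the Loss functions documentation for the exact schema:

architecture:
  training:
    # sketch only: the inner keys are illustrative assumptions
    loss:
      energy:
        type: mse
        weight: 1.0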
Exporting a foundation MACE model¶
At the moment, exporting a foundation MACE model from one of their provided model files involves running mtt train with 0 epochs. To do so, use the following options.yaml file:
architecture:
  name: experimental.mace
  model:
    # Replace mace_model with the path to your file
    mace_model: path/to/foundation/mace/model.model
    mace_head_target: energy
  training:
    num_epochs: 0
    batch_size: 1

training_set: dummy_dataset.xyz
validation_set: dummy_dataset.xyz
with dummy_dataset.xyz being any dataset containing at least one structure
with just the energy property. For example, you can use:
2
Properties=species:S:1:pos:R:3:forces:R:3 energy=-2.1
H 0.0 0.0 0.0 0.0 0.0 0.0
H 1.0 0.0 0.0 0.0 0.0 0.0
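Assuming the options file above is saved as options.yaml, you can then run

mtt train options.yaml

which goes through training with 0 epochs and exports the foundation model as a metatomic model.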