MACE

Higher order equivariant message passing graph neural network [1].

Implementation

The architecture is a thin wrapper around the official MACE implementation, hosted at https://github.com/ACEsuit/mace. Additional heads are added on top of MACE to predict an arbitrary number of targets with arbitrary symmetry properties. These heads take the output node features of MACE and pass them through a linear layer (or an MLP for invariant targets) to obtain the final predictions.

One important feature is that the architecture can take a pretrained MACE model file as input. The heads required to predict the requested targets are then added on top of the MACE model, so one can continue training for arbitrary targets. See the mace_model hyperparameter for more details. For simply exporting a foundation MACE model to use as a metatomic model (e.g. in ASE or LAMMPS), see Exporting a foundation MACE model below.
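
For example, a minimal sketch of the architecture section for fine-tuning from a pretrained MACE file could look like this (the model path is a placeholder):

architecture:
  name: experimental.mace
  model:
    # Placeholder path to the pretrained MACE model file
    mace_model: path/to/pretrained/mace.model
    # Target predicted by the existing MACE head
    mace_head_target: energy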

Installation

To install this architecture along with the metatrain package, run:

pip install metatrain[mace]

where the square brackets indicate that you want to install the optional dependencies required for mace.

Default Hyperparameters

The description of all the hyperparameters used by this architecture is provided further down this page. However, here we provide a YAML file containing all the default hyperparameters, which can be a convenient starting point for creating your own hyperparameter files:

architecture:
  name: experimental.mace
  model:
    mace_model: null
    mace_head_target: energy
    r_max: 5.0
    num_radial_basis: 8
    radial_type: bessel
    num_cutoff_basis: 5
    max_ell: 3
    interaction: RealAgnosticResidualInteractionBlock
    num_interactions: 2
    hidden_irreps: 128x0e + 128x1o + 128x2e
    edge_irreps: null
    apply_cutoff: true
    avg_num_neighbors: 1
    pair_repulsion: false
    distance_transform: null
    correlation: 3
    gate: silu
    interaction_first: RealAgnosticResidualInteractionBlock
    MLP_irreps: 16x0e
    radial_MLP:
    - 64
    - 64
    - 64
    use_embedding_readout: false
    use_last_readout_only: false
    use_agnostic_product: false
  training:
    optimizer: adam
    learning_rate: 0.01
    weight_decay: 5.0e-07
    amsgrad: true
    beta: 0.9
    lr_scheduler: ReduceLROnPlateau
    lr_scheduler_gamma: 0.9993
    lr_factor: 0.8
    lr_scheduler_patience: 50
    distributed: false
    distributed_port: 39591
    batch_size: 16
    num_epochs: 1000
    log_interval: 1
    checkpoint_interval: 100
    scale_targets: true
    atomic_baseline: {}
    fixed_scaling_weights: {}
    per_structure_targets: []
    num_workers: null
    log_mae: true
    log_separate_blocks: false
    best_model_metric: mae_prod
    grad_clip_norm: 1.0
    loss: mse

Tuning hyperparameters

The default hyperparameters above will work well in most cases, but they may not be optimal for your specific dataset. There is a good number of parameters to tune, both for the model and the trainer. Since seeing them all for the first time might be overwhelming, here we list the parameters that are in general the most important:

ModelHypers.mace_model: str | None = None

Path to a pretrained MACE model file.

For example, this can be a foundation MACE model. If not provided, a new MACE model will be initialized from scratch using the rest of the hyperparameters of the architecture.

ModelHypers.r_max: float = 5.0

Cutoff radius for neighbor search.

This should be set to a value beyond which most of the interactions between atoms are expected to be negligible. A lower cutoff will lead to faster models.

TrainerHypers.learning_rate: float = 0.01

Learning rate of the optimizer.

TrainerHypers.batch_size: int = 16

The number of samples to use in each batch of training. This hyperparameter controls the tradeoff between training speed and memory usage. In general, larger batch sizes will lead to faster training, but might require more memory.

ModelHypers.hidden_irreps: str = '128x0e + 128x1o + 128x2e'

Irreps for hidden node features.

This defines the shape of the node features at each layer of the MACE model (except for the last layer, which only contains scalars). The notation for the irreps is e3nn’s standard notation. Essentially, the irreps string is a sum of terms of the form {multiplicity}x{ell}{parity}, where {multiplicity} is the number of channels with angular momentum {ell} and parity {parity} (e for even, o for odd). For example, 16x0e + 32x1o means that there are 16 scalar channels (\(\ell=0\)) and 32 vector channels (\(\ell=1\)) at each layer.

Increasing the multiplicities makes the network wider, which generally leads to better accuracy at the cost of increased training and evaluation time.

Increasing the maximum \(\ell\) included in the irreps allows the network to capture more complex angular dependencies. However, its effect might be heavily dependent on your dataset and target. The hidden irreps should include at least up to the maximum \(\ell\) of the target you are training on. For example, if you are training on dipole moments (\(\ell=1\)), the hidden irreps should include at least \(\ell=1\) channels.

Note

At the time of writing, MACE enforces that all channels of hidden_irreps should have the same multiplicity.
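
As an illustration, a model aimed at a vector (\(\ell=1\)) target could use irreps such as the following (the multiplicities are only an example):

architecture:
  name: experimental.mace
  model:
    # 64 scalar (l=0) and 64 vector (l=1) channels at each layer;
    # all terms use the same multiplicity, as required by MACE
    hidden_irreps: 64x0e + 64x1o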

ModelHypers.correlation: int = 3

Correlation order at each layer.

After computing pair-wise (2-body) messages between atoms, MACE applies products that construct higher-order correlations between messages. This hyperparameter controls the number of products applied. For example, correlation=1 means that the interactions are purely 2-body, while correlation=2 would roughly equate to including 3-body interactions, and so on.

This hyperparameter, together with max_ell, determines the maximum angular momentum that can be non-zero in hidden_irreps, which is max_ell * correlation. For example, with the defaults max_ell=3 and correlation=3, angular momenta up to \(\ell = 3 \times 3 = 9\) can be generated.

ModelHypers.num_interactions: int = 2

Number of message passing steps.

MACE’s last message passing step only outputs scalar features, so if you are training on a target that is not scalar (e.g. a vector or a higher-order spherical tensor), the effective number of message passing steps for that target will be num_interactions - 1.

TrainerHypers.loss: str | dict[str, LossSpecification] = 'mse'

This section describes the loss function to be used. See the Loss functions section for more details.

Model hyperparameters

The parameters that go under the architecture.model section of the config file are the following:

ModelHypers.mace_model: str | None = None

Path to a pretrained MACE model file.

For example, this can be a foundation MACE model. If not provided, a new MACE model will be initialized from scratch using the rest of the hyperparameters of the architecture.

ModelHypers.mace_head_target: str = 'energy'

Target to which the MACE head is related.

metatrain adds arbitrary heads on top of MACE to predict arbitrary targets. However, MACE models themselves have a head. This hyperparameter specifies which metatrain target corresponds to the MACE head. For this target, no new head will be added, and the output of MACE’s head will be used directly.

Note

MACE models with multiple heads also exist, but for now metatrain only supports using this hyperparameter with single-head MACE models.

ModelHypers.r_max: float = 5.0

Cutoff radius for neighbor search.

This should be set to a value beyond which most of the interactions between atoms are expected to be negligible. A lower cutoff will lead to faster models.

ModelHypers.num_radial_basis: int = 8

Number of radial basis functions for the radial embedding.

ModelHypers.radial_type: Literal['bessel', 'gaussian', 'chebyshev'] = 'bessel'

Type of radial basis functions to use in the radial embedding.

ModelHypers.num_cutoff_basis: int = 5

Number of basis functions for the smooth cutoff.

ModelHypers.max_ell: int = 3

Highest \(\ell\) of spherical harmonics used in the interactions.

Note that this is not the maximum \(\ell\) in hidden_irreps, since hidden_irreps can contain \(\ell\) values as high as max_ell*correlation.

ModelHypers.interaction: Literal['RealAgnosticResidualInteractionBlock', 'RealAgnosticAttResidualInteractionBlock', 'RealAgnosticInteractionBlock', 'RealAgnosticDensityInteractionBlock', 'RealAgnosticDensityResidualInteractionBlock', 'RealAgnosticResidualNonLinearInteractionBlock'] = 'RealAgnosticResidualInteractionBlock'

Name of interaction block.

Class that will be used to compute interactions between atoms at each layer.

ModelHypers.num_interactions: int = 2

Number of message passing steps.

MACE’s last message passing step only outputs scalar features, so if you are training on a target that is not scalar (e.g. a vector or a higher-order spherical tensor), the effective number of message passing steps for that target will be num_interactions - 1.
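
For example, if you train on a non-scalar target and want two effective message passing steps for it, you could set (following the reasoning above):

architecture:
  name: experimental.mace
  model:
    # The last step outputs scalars only, so a non-scalar target sees 3 - 1 = 2 steps
    num_interactions: 3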

ModelHypers.hidden_irreps: str = '128x0e + 128x1o + 128x2e'

Irreps for hidden node features.

This defines the shape of the node features at each layer of the MACE model (except for the last layer, which only contains scalars). The notation for the irreps is e3nn’s standard notation. Essentially, the irreps string is a sum of terms of the form {multiplicity}x{ell}{parity}, where {multiplicity} is the number of channels with angular momentum {ell} and parity {parity} (e for even, o for odd). For example, 16x0e + 32x1o means that there are 16 scalar channels (\(\ell=0\)) and 32 vector channels (\(\ell=1\)) at each layer.

Increasing the multiplicities makes the network wider, which generally leads to better accuracy at the cost of increased training and evaluation time.

Increasing the maximum \(\ell\) included in the irreps allows the network to capture more complex angular dependencies. However, its effect might be heavily dependent on your dataset and target. The hidden irreps should include at least up to the maximum \(\ell\) of the target you are training on. For example, if you are training on dipole moments (\(\ell=1\)), the hidden irreps should include at least \(\ell=1\) channels.

Note

At the time of writing, MACE enforces that all channels of hidden_irreps should have the same multiplicity.

ModelHypers.edge_irreps: str | None = None

Irreps for edge features.

ModelHypers.apply_cutoff: bool = True

Apply the cutoff to the radial basis functions before the MLP.

ModelHypers.avg_num_neighbors: float = 1

Normalization factor for the messages.

ModelHypers.pair_repulsion: bool = False

Use a pair repulsion term with the ZBL potential.

ModelHypers.distance_transform: Literal['Agnesi', 'Soft'] | None = None

Use a distance transform for the radial basis functions.

ModelHypers.correlation: int = 3

Correlation order at each layer.

After computing pair-wise (2-body) messages between atoms, MACE applies products that construct higher-order correlations between messages. This hyperparameter controls the number of products applied. For example, correlation=1 means that the interactions are purely 2-body, while correlation=2 would roughly equate to including 3-body interactions, and so on.

This hyperparameter, together with max_ell, determines the maximum angular momentum that can be non-zero in hidden_irreps, which is max_ell * correlation.

ModelHypers.gate: Literal['silu', 'tanh', 'abs'] | None = 'silu'

Non-linearity used for the non-linear readouts.

This determines which kind of non-linearity is applied in the non-linear readouts. The non-linear readouts are MACE’s internal MLP readout (applied only at the last layer) and the arbitrary MLP heads added on top of MACE by metatrain.

The non-linearity is applied only to scalar channels; therefore, it won’t have any effect for non-scalar targets.

ModelHypers.interaction_first: Literal['RealAgnosticResidualInteractionBlock', 'RealAgnosticInteractionBlock', 'RealAgnosticDensityInteractionBlock', 'RealAgnosticDensityResidualInteractionBlock', 'RealAgnosticResidualNonLinearInteractionBlock'] = 'RealAgnosticResidualInteractionBlock'

Name of interaction block for the first interaction layer.

Class that will be used to compute interactions between atoms at the first layer.

ModelHypers.MLP_irreps: str = '16x0e'

Hidden irreps of the MLP readouts.

The MLP readouts are MACE’s internal MLP readout (applied only at the last layer) and the arbitrary MLP heads added on top of MACE by metatrain.

The non-linearity is applied only to scalar channels; therefore, these irreps should only contain scalar channels.

ModelHypers.radial_MLP: list[int] = [64, 64, 64]

Width of the radial MLP.

Only used for MACE’s internal MLP.

ModelHypers.use_embedding_readout: bool = False

Use an embedding readout for the final output.

ModelHypers.use_last_readout_only: bool = False

Use only the last readout for the final output.

This is only used by the internal MACE readout; the arbitrary heads added by metatrain always use a concatenation of the node features from all layers as input.

ModelHypers.use_agnostic_product: bool = False

Use an element-agnostic product.

Trainer hyperparameters

The parameters that go under the architecture.trainer section of the config file are the following:

TrainerHypers.optimizer: Literal['adam', 'adamw', 'schedulefree'] = 'adam'

Optimizer for parameter optimization.

TrainerHypers.learning_rate: float = 0.01

Learning rate of the optimizer.

TrainerHypers.weight_decay: float = 5e-07

Weight decay (L2 penalty).

TrainerHypers.amsgrad: bool = True

Use the AMSGrad variant of the optimizer.

TrainerHypers.beta: float = 0.9

Beta parameter for the optimizer.

TrainerHypers.lr_scheduler: str = 'ReduceLROnPlateau'

Type of learning rate scheduler.

TrainerHypers.lr_scheduler_gamma: float = 0.9993

Gamma parameter for the learning rate scheduler.

TrainerHypers.lr_factor: float = 0.8

Learning rate factor.

TrainerHypers.lr_scheduler_patience: int = 50

Scheduler patience.

TrainerHypers.distributed: bool = False

Whether to use distributed training.

TrainerHypers.distributed_port: int = 39591

Port for DDP communication.

TrainerHypers.batch_size: int = 16

The number of samples to use in each batch of training. This hyperparameter controls the tradeoff between training speed and memory usage. In general, larger batch sizes will lead to faster training, but might require more memory.

TrainerHypers.num_epochs: int = 1000

Number of epochs.

TrainerHypers.log_interval: int = 1

Interval to log metrics.

TrainerHypers.checkpoint_interval: int = 100

Interval to save checkpoints.

TrainerHypers.scale_targets: bool = True

Normalize targets to unit std during training.

TrainerHypers.atomic_baseline: dict[str, float | dict[int, float]] = {}

The baselines for each target.

By default, metatrain will fit a linear model (CompositionModel) to compute the least-squares baseline for each atomic species and each target.

However, this hyperparameter allows you to provide your own baselines. The value of the hyperparameter should be a dictionary where the keys are the target names, and the values are either (1) a single baseline to be used for all atomic types, or (2) a dictionary mapping atomic types to their baselines. For example:

  • atomic_baseline: {"energy": {1: -0.5, 6: -10.0}} will fix the energy baseline for hydrogen (Z=1) to -0.5 and for carbon (Z=6) to -10.0, while fitting the baselines for the energy of all other atomic types, as well as fitting the baselines for all other targets.

  • atomic_baseline: {"energy": -5.0} will fix the energy baseline for all atomic types to -5.0.

  • atomic_baseline: {"mtt:dos": 0.0} sets the baseline for the “mtt:dos” target to 0.0, effectively disabling the atomic baseline for that target.

This atomic baseline is subtracted from the targets during training, which avoids the need for the main model to learn atomic contributions and likely makes training easier. When the model is used in evaluation mode, the atomic baseline is automatically added on top of the model predictions.
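
As an illustration, the first example above would appear in the options file as follows (the baseline values are just placeholders):

architecture:
  name: experimental.mace
  training:
    atomic_baseline:
      # Fix the energy baselines for H (Z=1) and C (Z=6);
      # baselines for other atomic types and targets are still fitted
      energy:
        1: -0.5
        6: -10.0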

Note

This atomic baseline is a per-atom contribution. Therefore, if the property you are predicting is a sum over all atoms (e.g., total energy), the contribution of the atomic baseline to the total property will be the atomic baseline multiplied by the number of atoms of that type in the structure.

Note

If a MACE model is loaded through the mace_model hyperparameter, the atomic baselines in the MACE model are used by default for the target indicated in mace_head_target. If you want to override them, you need to explicitly set the baselines for that target in this hyperparameter.

TrainerHypers.fixed_scaling_weights: dict[str, float | dict[int, float]] = {}

Weights for target scaling.

This is passed to the fixed_weights argument of Scaler.train_model; see its documentation to understand exactly what to pass here.
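
Based on the type of this hyperparameter, a minimal sketch could look like the following (the target name and value are placeholders; see the Scaler documentation for the exact semantics):

architecture:
  name: experimental.mace
  training:
    # Fix the scaling weight used for the "energy" target (placeholder value)
    fixed_scaling_weights:
      energy: 2.0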

Note

If a MACE model is loaded through the mace_model hyperparameter, the scales in the MACE model are used by default for the target indicated in mace_head_target. If you want to override them, you need to explicitly set the scaling weights for that target in this hyperparameter.

TrainerHypers.per_structure_targets: list[str] = []

Targets for which to calculate per-structure losses.

TrainerHypers.num_workers: int | None = None

Number of workers for data loading. If not provided, it is set automatically.

TrainerHypers.log_mae: bool = True

Log MAE alongside RMSE.

TrainerHypers.log_separate_blocks: bool = False

Log per-block error.

TrainerHypers.best_model_metric: Literal['rmse_prod', 'mae_prod', 'loss'] = 'mae_prod'

Metric used to select the best checkpoint (e.g., rmse_prod).

TrainerHypers.grad_clip_norm: float = 1.0

Maximum gradient norm used for gradient clipping; set it to inf to disable clipping.

TrainerHypers.loss: str | dict[str, LossSpecification] = 'mse'

This section describes the loss function to be used. See the Loss functions section for more details.

Exporting a foundation MACE model

Currently, exporting a foundation MACE model from one of the officially provided model files involves running mtt train with 0 epochs. To do so, use the following options.yaml file:

architecture:
    name: experimental.mace
    model:
        # Replace mace_model with the path to your file
        mace_model: path/to/foundation/mace/model.model
        mace_head_target: energy
    training:
        num_epochs: 0
        batch_size: 1

training_set: dummy_dataset.xyz
validation_set: dummy_dataset.xyz

where dummy_dataset.xyz is any dataset containing at least one structure with just the energy property. For example, you can use:

2
Properties=species:S:1:pos:R:3:forces:R:3 energy=-2.1
H 0.0 0.0 0.0 0.0 0.0 0.0
H 1.0 0.0 0.0 0.0 0.0 0.0
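
Running mtt train on this options.yaml then performs no training epochs and exports the foundation model as a metatomic model, ready to be used, e.g., in ASE or LAMMPS.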

References