# Machine-learning interatomic potential training data for nitric acid (HNO3)

This dataset provides the training data used to develop
machine-learning interatomic potentials (MLIPs) that describe
nitric acid solutions. The data support the publication:

"Modeling the Behavior of Concentrated Aqueous HNO3 Using Machine
Learning Interatomic Potentials"

Mohammadhasan Dinpajooh, Michael D. Lacount, Scott E. Muller, Neil J. Henson,
Daniel Mejia Rodriguez, Axel Gomez, Christopher J. Mundy, and Andrew M. Ritzmann

Corresponding authors:
Mohammadhasan Dinpajooh (hadi.dinpajooh@pnnl.gov)
Axel Gomez (ag3199@princeton.edu)
Christopher J. Mundy (chris.mundy@pnnl.gov)
Andrew M. Ritzmann (andrew.ritzmann@pnnl.gov)

## Dataset description

- List of Files
  1. hno3_structures_blypd2.xyz    (86MB, text format)
  2. hno3_structures_pbed3.xyz     (86MB, text format)
  3. example-dataset.pdf           (303KB, pdf format)

The first two files contain the structures used to train the
machine-learning potentials, together with energies and forces computed
using the Becke-Lee-Yang-Parr exchange-correlation (XC) functional with D2
dispersion corrections (hno3_structures_blypd2.xyz) and the
Perdew-Burke-Ernzerhof XC functional with D3 dispersion corrections
(hno3_structures_pbed3.xyz). A truncated version of the xyz files with
additional simulation details is provided in the file example-dataset.pdf.

### Data source

The initial structures used in training were taken from "Dissociation
of Strong Acid Revisited: X-ray Photoelectron Spectroscopy and
Molecular Dynamics Simulations of HNO<sub>3</sub> in Water" by
T. Lewis, B. Winter, A. C. Stern, M. D. Baer,
C. J. Mundy, D. J. Tobias, and J. C. Hemminger, <i>Journal of
Physical Chemistry B</i>, 2011, 115(30): 9445-9451 (https://doi.org/10.1021/jp205510q).

These initial structures account for approximately 800 of the more than
2900 structures included in the xyz files.

The file hno3_structures_pbed3.xyz contains 11 structures not
present in hno3_structures_blypd2.xyz. These are the first 11
structures in hno3_structures_pbed3.xyz. Five structures
from hno3_structures_blypd2.xyz are not present in hno3_structures_pbed3.xyz
because the PBE-D3 calculations failed to converge. These are the
structures at indices 596, 1460, 1922, 1949, and 2209 (using 1-based
indexing).
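
The differences listed above can be used to trim both files to the structures they share. The following is a minimal sketch, not part of the dataset tooling: the helper names `drop_one_based` and `common_subsets` are hypothetical, and the sketch assumes that, apart from the documented differences, both files list the structures in the same order. That assumption should be checked (for example, by comparing atomic positions) before relying on the alignment.

```python
# 1-based positions in hno3_structures_blypd2.xyz with no PBE-D3 counterpart.
BLYP_ONLY = (596, 1460, 1922, 1949, 2209)
# Number of leading structures found only in hno3_structures_pbed3.xyz.
PBE_ONLY_LEADING = 11


def drop_one_based(items, one_based_indices):
    """Return a copy of items without the entries at the given 1-based positions."""
    skip = {i - 1 for i in one_based_indices}
    return [item for pos, item in enumerate(items) if pos not in skip]


def common_subsets(blyp_frames, pbe_frames):
    """Hypothetical helper: trim both lists to the structures they share.

    ASSUMPTION: outside the documented differences, the two files
    store the structures in the same order.
    """
    blyp_common = drop_one_based(blyp_frames, BLYP_ONLY)
    pbe_common = pbe_frames[PBE_ONLY_LEADING:]
    return blyp_common, pbe_common
```

If the ordering assumption holds, the two returned lists have the same length and can be compared element by element.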


## Interpreting the data files

The xyz files are in "extended xyz format", which supports
storing metadata along with each structure. It builds on the
traditional xyz format:

```
Num_Atoms
<Comment>
AtomType1 X1 Y1 Z1
AtomType2 X2 Y2 Z2
...
```
where Num_Atoms is the number of atoms in the structure, \<Comment\> is
a comment line describing the structure, AtomType1 is the atomic
symbol of the first atom, and X1, Y1, and Z1 are the coordinates
of the first atom (continuing in the same manner for atom 2 and so on).

In the extended xyz format, the comment line explicitly describes
the data columns present on each atom line, the lattice
vectors for the structure, the calculated energy, and the periodic
boundary conditions.

For example, consider the following comment line:

```
Lattice="13.745143556437252 0.0 0.0 0.0 13.745143556437252 0.0 0.0 0.0 13.745143556437252" Properties=species:S:1:pos:R:3:forces:R:3 energy=-43210.327253523166 pbc="T T T"
```

In this line, the lattice is specified by nine entries:
Lattice="ax ay az bx by bz cx cy cz" which is a cubic cell with lattice
constant 13.745143556437252 angstroms in this example. The entry
"Properties=species:S:1:pos:R:3:forces:R:3" specifies that each line
will contain the species in the first column (as a string), the position
in the second, third, and fourth columns (as real numbers), and the forces
in the fifth, sixth, and seventh columns (as real numbers). Finally, the
calculated energy is specified using the energy tag and then the
periodic boundary conditions are noted using pbc="T T T" to indicate
that the structure is periodic in all directions.
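
To make the layout of the comment line concrete, the sketch below parses the example above using only the Python standard library. It is an illustration rather than a full extended xyz parser: the function names `parse_comment` and `column_layout` are hypothetical, and the code assumes every entry has the simple `key=value` or `key="quoted value"` form shown in the example.

```python
import shlex


def parse_comment(line):
    """Split an extended xyz comment line into a dict of raw strings."""
    fields = {}
    for token in shlex.split(line):  # shlex keeps quoted values together
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def column_layout(properties):
    """Map 'species:S:1:pos:R:3:...' to (name, type, first_col, last_col)."""
    parts = properties.split(":")
    layout, col = [], 1
    for name, kind, width in zip(parts[0::3], parts[1::3], parts[2::3]):
        n = int(width)
        layout.append((name, kind, col, col + n - 1))
        col += n
    return layout


comment = ('Lattice="13.745143556437252 0.0 0.0 0.0 13.745143556437252 0.0 '
           '0.0 0.0 13.745143556437252" '
           'Properties=species:S:1:pos:R:3:forces:R:3 '
           'energy=-43210.327253523166 pbc="T T T"')

fields = parse_comment(comment)
lattice = [float(x) for x in fields["Lattice"].split()]
cell = [lattice[0:3], lattice[3:6], lattice[6:9]]  # rows are vectors a, b, c
energy = float(fields["energy"])
pbc = [flag == "T" for flag in fields["pbc"].split()]
layout = column_layout(fields["Properties"])
```

Here `layout` lists the species in column 1, the positions in columns 2 to 4, and the forces in columns 5 to 7, matching the description above.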

## Reading the xyz files

The xyz files may be loaded using the Atomic Simulation Environment
(ASE) library in the Python programming language. For example, all
structures could be loaded using the following code:

```
import ase.io

# Parser will default to filetype='extxyz' for this file.
hno3_structures = ase.io.read('hno3_structures_blypd2.xyz', index=':')
```
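
Once loaded, each structure is an ASE `Atoms` object. The sketch below shows one possible way to summarize the energies. The helper names (`load_structures`, `energy_per_atom`, `summarize`) are hypothetical, and the sketch assumes that ASE exposes the stored energy and forces through `get_potential_energy()` and `get_forces()`, as its extended xyz reader typically does by attaching a single-point calculator. Energies are reported in whatever units the files use.

```python
def load_structures(path):
    """Read every frame of an extended xyz file as a list of ase.Atoms."""
    import ase.io  # imported here so the helpers below do not require ASE
    return ase.io.read(path, index=":")


def energy_per_atom(atoms):
    """Stored total energy divided by the number of atoms."""
    return atoms.get_potential_energy() / len(atoms)


def summarize(structures):
    """Return (number of frames, lowest and highest energy per atom)."""
    values = [energy_per_atom(atoms) for atoms in structures]
    return len(values), min(values), max(values)


# Example usage (requires ASE and the dataset file):
# frames = load_structures("hno3_structures_blypd2.xyz")
# print(summarize(frames))
# forces = frames[0].get_forces()  # one force vector per atom
# cell = frames[0].cell            # lattice vectors of the first frame
```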
