Input and Output Routines¶
The jungle of file formats¶
There are a lot of file formats used and produced by chemistry applications. Each program has his way to store geometries, trajectories, energies and properties etc. chemlab tries to encompass all of those different properties by using a lightweight way to handle such differences.
Reading and writing data¶
The classes responsible for the I/O are subclasses of
chemlab.io.handlers.IOHandler
. These handlers take a
file-like object as the first argument and they work all in the same
way, here is an example of GroHandler:
from chemlab.io.handlers import GromacsIO
fd = open('waterbox.gro', 'rb')
infile = GromacsIO(fd)
system = infile.read('system')
# Modify system as you wish...
fd = open('waterbox_out.gro', 'w')
outfile = GromacsIO(fd)
outfile.write('system', system)
You first create the handler instance for a certain format and then you can read a certain feature provided by the handler. In this example we read and write the system feature.
Some file formats may have some extra data for each atom, molecule or
system. For example the ”.gro” file formats have his own way to call
the atoms in a water molecule: OW
, HW1
, HW2
. To handle
such issues, you can write this information in the export arrays
contained in the data structures, such as Atom.export
,
Molecule.export
, and their array-based counterparts
Molecule.atom_export_array
, System.mol_export
and
System.atom_export_array
.
Those attributes are especially important where you write in some data format, since you may have to provide those attribute when you initialize your Atom, Molecule and System.
You can easily open a data file without even having to search his format
handler by using the utility function chemlab.io.datafile()
this is
the recommended way for automatically opening a file:
from chemlab.io import datafile
# For reading
sys = datafile('waterbox.gro').read('system')
t, coords = datafile('traj.xtc').read('trajectory')
# For writing
datafile("output.gro", "w").write("system", sys)
Implementing your own IOHandler¶
Implementing or improving an existing IOHandler is a great way to participate in chemlab development. Fortunately, it’s extremely easy to set one up.
It boils down to a few steps:
- Subclass
IOHandler
; - Define the class attributes can_read and can_write;
- Implement the write and read methods for the features that you added in can_read and can_write;
- Write the documentation for each feature.
Here is an example of the xyz handler:
import numpy as np
from chemlab.io.handlers import IOHandler
from chemlab.core import Molecule
class XyzIO(IOHandler):
'''The XYZ format is described in this wikipedia article
http://en.wikipedia.org/wiki/XYZ_file_format.
**Features**
.. method:: read("molecule")
Read the coordinates in a :py:class:`~chemlab.core.Molecule` instance.
.. method:: write("molecule", mol)
Writes a :py:class:`~chemlab.core.Molecule` instance in the XYZ format.
'''
can_read = ['molecule']
can_write = ['molecule']
def read(self, feature):
self.check_feature(feature, "read")
lines = self.fd.readlines()
num = int(lines[0])
title = lines[1]
if feature == 'title':
return title
if feature == 'molecule':
type_array = []
r_array = []
for l in lines[2:]:
type, x, y, z = l.split()
r_array.append([float(x),float(y),float(z)])
type_array.append(type)
r_array = np.array(r_array)/10 # To nm
type_array = np.array(type_array)
return Molecule.from_arrays(r_array=r_array, type_array=type_array)
def write(self, feature, mol):
self.check_feature(feature, "write")
lines = []
if feature == 'molecule':
lines.append(str(mol.n_atoms))
lines.append('Generated by chemlab')
for t, (x, y, z) in zip(mol.type_array, mol.r_array):
lines.append(' %s %.6f %.6f %.6f' %
(t, x*10, y*10, z*10))
self.fd.write('\n'.join(lines))
A few remarks:
- It is recommended to use the method
check_feature()
before performing read/write. This will check that the feature is present in the can_read/can_write list;
- If you want to squeeze out performance you should use
Molecule.from_arrays()
andSystem.from_arrays()
;
- You can read whatever data you wish, for example the
EdrIO
handler does not read Molecule or System at all;
You can definitely take inspiration from the handlers included in chemlab, Supported File Formats.