Input and Output Routines¶
The jungle of file formats¶
There are a lot of file formats used and produced by chemistry applications. Each program has his way to store geometries, trajectories, energies and properties etc. chemlab tries to encompass all of those different properties by using a lightweight way to handle such differences.
Reading and writing data¶
The classes responsible for the I/O are subclasses of
chemlab.io.handlers.IOHandler. These handlers take a
file-like object as the first argument and they work all in the same
way, here is an example of GroHandler:
from chemlab.io.handlers import GromacsIO fd = open('waterbox.gro', 'rb') infile = GromacsIO(fd) system = infile.read('system') # Modify system as you wish... fd = open('waterbox_out.gro', 'w') outfile = GromacsIO(fd) outfile.write('system', system)
You first create the handler instance for a certain format and then you can read a certain feature provided by the handler. In this example we read and write the system feature.
Some file formats may have some extra data for each atom, molecule or
system. For example the ”.gro” file formats have his own way to call
the atoms in a water molecule:
HW2. To handle
such issues, you can write this information in the export arrays
contained in the data structures, such as
Molecule.export, and their array-based counterparts
Those attributes are especially important where you write in some data format, since you may have to provide those attribute when you initialize your Atom, Molecule and System.
You can easily open a data file without even having to search his format
handler by using the utility function
chemlab.io.datafile() this is
the recommended way for automatically opening a file:
from chemlab.io import datafile # For reading sys = datafile('waterbox.gro').read('system') t, coords = datafile('traj.xtc').read('trajectory') # For writing datafile("output.gro", "w").write("system", sys)
Implementing your own IOHandler¶
Implementing or improving an existing IOHandler is a great way to participate in chemlab development. Fortunately, it’s extremely easy to set one up.
It boils down to a few steps:
- Define the class attributes can_read and can_write;
- Implement the write and read methods for the features that you added in can_read and can_write;
- Write the documentation for each feature.
Here is an example of the xyz handler:
import numpy as np from chemlab.io.handlers import IOHandler from chemlab.core import Molecule class XyzIO(IOHandler): '''The XYZ format is described in this wikipedia article http://en.wikipedia.org/wiki/XYZ_file_format. **Features** .. method:: read("molecule") Read the coordinates in a :py:class:`~chemlab.core.Molecule` instance. .. method:: write("molecule", mol) Writes a :py:class:`~chemlab.core.Molecule` instance in the XYZ format. ''' can_read = ['molecule'] can_write = ['molecule'] def read(self, feature): self.check_feature(feature, "read") lines = self.fd.readlines() num = int(lines) title = lines if feature == 'title': return title if feature == 'molecule': type_array =  r_array =  for l in lines[2:]: type, x, y, z = l.split() r_array.append([float(x),float(y),float(z)]) type_array.append(type) r_array = np.array(r_array)/10 # To nm type_array = np.array(type_array) return Molecule.from_arrays(r_array=r_array, type_array=type_array) def write(self, feature, mol): self.check_feature(feature, "write") lines =  if feature == 'molecule': lines.append(str(mol.n_atoms)) lines.append('Generated by chemlab') for t, (x, y, z) in zip(mol.type_array, mol.r_array): lines.append(' %s %.6f %.6f %.6f' % (t, x*10, y*10, z*10)) self.fd.write('\n'.join(lines))
A few remarks:
- It is recommended to use the method
check_feature()before performing read/write. This will check that the feature is present in the can_read/can_write list;
- If you want to squeeze out performance you should use
- You can read whatever data you wish, for example the
EdrIOhandler does not read Molecule or System at all;
You can definitely take inspiration from the handlers included in chemlab, Supported File Formats.