See Principles of data package for a more general discussion of design issues.
When developing or using nipy, many data files can be useful. We divide the data files nipy uses into at least three categories: test data, template data and example data.
Files used for routine testing are typically very small data files. They are shipped with the software, and live in the code repository. For example, in the case of nipy itself, there are some test files that live in the module path nipy.testing.data. Nibabel ships data files in nibabel.tests.data. See Adding test data for discussion.
Template data and example data are examples of data packages. What follows is a discussion of the design and use of data packages.
The programmer can use the data like this:
from nibabel.data import make_datasource
templates = make_datasource(dict(relpath='nipy/templates'))
fname = templates.get_filename('ICBM152', '2mm', 'T1.nii.gz')
where fname will be the absolute path to the template image ICBM152/2mm/T1.nii.gz.
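To make the path arithmetic concrete, here is a minimal sketch of what get_filename is assumed to do: join its arguments onto the datasource's base directory. SketchDatasource is a hypothetical stand-in written for illustration, not nibabel's datasource class, and the base path /usr/share/nipy/nipy/templates is an assumed install location.

```python
import os


class SketchDatasource:
    """Hypothetical stand-in for a nibabel datasource (not the real class)."""

    def __init__(self, base_path):
        # Store the absolute base directory of the data package.
        self.base_path = os.path.abspath(base_path)

    def get_filename(self, *path_parts):
        # Join the path components onto the package's base directory.
        return os.path.join(self.base_path, *path_parts)


templates = SketchDatasource('/usr/share/nipy/nipy/templates')
fname = templates.get_filename('ICBM152', '2mm', 'T1.nii.gz')
print(fname)  # /usr/share/nipy/nipy/templates/ICBM152/2mm/T1.nii.gz on Unix
```

Because the path components are passed separately, the caller never hard-codes a '/' or '\' separator.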
The programmer can insist on a particular version of a datasource:
>>> if templates.version < '0.4':
...     raise ValueError('Need datasource version at least 0.4')
Traceback (most recent call last):
...
ValueError: Need datasource version at least 0.4
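Comparing version strings with < is lexicographic, so '0.10' < '0.4' is true even though 0.10 is the later release. A minimal sketch of a numeric check, assuming versions are plain dot-separated integers such as 0.2; version_tuple and require_version are hypothetical helpers, not part of nibabel.

```python
def version_tuple(version):
    """Convert a dotted version string such as '0.10' into (0, 10).

    Hypothetical helper: assumes every component is a plain integer.
    """
    return tuple(int(part) for part in version.split('.'))


def require_version(actual, minimum):
    # Compare numerically, component by component, not character by character.
    if version_tuple(actual) < version_tuple(minimum):
        raise ValueError('Need datasource version at least %s' % minimum)


print('0.10' < '0.4')                                # True (string comparison)
print(version_tuple('0.10') < version_tuple('0.4'))  # False (numeric comparison)
require_version('0.10', '0.4')                       # passes silently
```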
If make_datasource cannot find the data, then:
>>> make_datasource(dict(relpath='nipy/implausible'))
Traceback (most recent call last):
...
nibabel.data.DataError: ...
where DataError gives a helpful warning about why the data was not found, and how it should be installed.
The example data and template data may be important, and so we want to warn the user if NIPY cannot find either of the two sets of data when installing the package. Thus:
python setup.py install
will import nipy after installation to check whether these raise an error:
>>> from nibabel.data import make_datasource
>>> templates = make_datasource(dict(relpath='nipy/templates'))
>>> example_data = make_datasource(dict(relpath='nipy/data'))
and warn the user accordingly, with some basic instructions for how to install the data.
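The install-time check can be sketched as a loop that tries each expected data package and collects the ones that fail. To keep the sketch self-contained, it defines a stand-in DataError and a fake factory; in real code the exception and factory would come from nibabel.data. check_data_packages and fake_make_datasource are hypothetical names.

```python
import warnings


class DataError(Exception):
    """Stand-in for nibabel.data.DataError so the sketch runs on its own."""


def check_data_packages(factory, relpaths):
    """Warn about every data package that `factory` cannot locate.

    `factory` plays the role of make_datasource; this helper is
    hypothetical and not part of nipy or nibabel.
    """
    missing = []
    for relpath in relpaths:
        try:
            factory(dict(relpath=relpath))
        except DataError as err:
            missing.append(relpath)
            warnings.warn('Could not find data package %s: %s' % (relpath, err))
    return missing


def fake_make_datasource(spec):
    # Pretend that only the templates package is installed.
    if spec['relpath'] != 'nipy/templates':
        raise DataError('see the install instructions for ' + spec['relpath'])
    return object()


missing = check_data_packages(fake_make_datasource, ['nipy/templates', 'nipy/data'])
print(missing)  # ['nipy/data']
```

Collecting all failures before warning lets the installer report every missing package at once rather than stopping at the first.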
The routine make_datasource will look for data packages that have been installed. For the following call:
>>> templates = make_datasource(dict(relpath='nipy/templates'))
the code will:
The paths collected by nibabel.data.get_data_path() are constructed from ‘:’-separated (Unix) or ‘;’-separated (Windows) strings. The sources of the strings (in the order in which they will be used in the search above) are:
To be a valid NIPY project data package, you need to satisfy:
We recommend that:
Remember that there is a distinction between the NIPY project (the umbrella of neuroimaging in Python) and the NIPY package (the main code package in the NIPY project). Thus, if you want to install data under the NIPY package umbrella, your data might go to /usr/share/nipy/nipy/packagename (on Unix). Note nipy twice: once for the project, once for the package. If you want to install data under, say, the pbrain package umbrella, that would go in /usr/share/nipy/pbrain/packagename.
The following tree is an example of the kind of pattern we would expect in a data directory, where the nipy-data and nipy-templates packages have been installed:
<ROOT>
`-- nipy
    |-- data
    |   |-- config.ini
    |   `-- placeholder.txt
    `-- templates
        |-- ICBM152
        |   `-- 2mm
        |       `-- T1.nii.gz
        |-- colin27
        |   `-- 2mm
        |       `-- T1.nii.gz
        `-- config.ini
The <ROOT> directory is the directory that will appear somewhere in the list from nibabel.data.get_data_path(). The nipy subdirectory signifies data for the nipy package (as opposed to other NIPY-related packages such as pbrain). The data subdirectory of nipy contains files from the nipy-data package, and the templates subdirectory contains files from the nipy-templates package. Each of the nipy/data and nipy/templates directories contains a config.ini file with at least an entry like this:
[DEFAULT]
version = 0.2
giving the version of the data package.
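Reading that version back needs only the standard library. A minimal sketch, assuming the config.ini sits directly in the data package directory as in the tree above; read_package_version is a hypothetical helper, and the temporary directory merely stands in for an installed nipy/templates.

```python
import configparser
import os
import tempfile


def read_package_version(package_dir):
    """Return the [DEFAULT] version entry of <package_dir>/config.ini."""
    parser = configparser.ConfigParser()
    config_path = os.path.join(package_dir, 'config.ini')
    # read() returns the list of files it parsed; empty means not found.
    if not parser.read(config_path):
        raise IOError('No config.ini found in %s' % package_dir)
    return parser['DEFAULT']['version']


# Build a throwaway package directory that mirrors nipy/templates.
with tempfile.TemporaryDirectory() as package_dir:
    with open(os.path.join(package_dir, 'config.ini'), 'w') as fobj:
        fobj.write('[DEFAULT]\nversion = 0.2\n')
    version = read_package_version(package_dir)

print(version)  # 0.2
```

Note that the value comes back as the string '0.2', so any version comparison needs to convert it to numbers first.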
We use Python distutils, and in particular its data_files mechanism, to install data packages. On Unix, with the following command:
python setup.py install --prefix=/my/prefix
data will go to:
/my/prefix/share/nipy
For the example above this will result in these subdirectories:
/my/prefix/share/nipy/nipy/data
/my/prefix/share/nipy/nipy/templates
because nipy is both the project and the package to which the data relates.
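A data package's setup.py has to list every file with its target directory. A minimal sketch of building that list by walking the package directory, assuming the share/nipy/<package>/<data name> layout described above; make_data_files is a hypothetical helper. Each entry is an (install_dir, [source files]) pair, the shape that setup(data_files=...) in distutils and setuptools accepts, with install_dir relative to the installation prefix.

```python
import os


def make_data_files(source_dir, package='nipy',
                    share_root=os.path.join('share', 'nipy')):
    """Build a data_files list for one data package directory.

    Hypothetical helper: with --prefix=/my/prefix, files under
    source_dir 'templates' are meant to land in
    /my/prefix/share/nipy/nipy/templates.
    """
    data_files = []
    parent = os.path.dirname(os.path.abspath(source_dir))
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        if not filenames:
            continue  # skip directories that contain no files
        # e.g. 'templates/ICBM152/2mm' relative to the package root
        rel_dir = os.path.relpath(dirpath, parent)
        install_dir = os.path.join(share_root, package, rel_dir)
        sources = [os.path.join(dirpath, name) for name in sorted(filenames)]
        data_files.append((install_dir, sources))
    return data_files
```

In a setup.py this might be used as setup(..., data_files=make_data_files('templates')).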
If you install to a particular location, you will need to add that location to the output of nibabel.data.get_data_path() using one of the mechanisms above, for example, in your system configuration:
export NIPY_DATA_PATH=/my/prefix/share/nipy
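A minimal sketch of how such a path string might be split and searched, assuming a simple first-match search over <root>/<relpath>. The real order of sources is defined by nibabel.data.get_data_path(); here only NIPY_DATA_PATH plus an assumed <sys.prefix>/share/nipy default are used, and split_path_string and find_data_dir are hypothetical helpers.

```python
import os
import sys


def split_path_string(path_string):
    """Split a ':'-separated (Unix) or ';'-separated (Windows) string.

    os.pathsep is ':' on Unix and ';' on Windows; empty entries are dropped.
    """
    return [p for p in path_string.split(os.pathsep) if p]


def find_data_dir(search_paths, relpath):
    """Return the first existing <root>/<relpath> directory, or None."""
    for root in search_paths:
        candidate = os.path.join(root, *relpath.split('/'))
        if os.path.isdir(candidate):
            return candidate
    return None


# Assumed search list: NIPY_DATA_PATH entries first, then the default
# <sys.prefix>/share/nipy location.
search_paths = split_path_string(os.environ.get('NIPY_DATA_PATH', ''))
search_paths.append(os.path.join(sys.prefix, 'share', 'nipy'))
print(find_data_dir(search_paths, 'nipy/templates'))  # None if not installed
```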
For a particular data package, say nipy-templates, distributions will want to:
For the latter, the most obvious route is to copy an .ini file named for the data package into the NIPY etc_dir. In this case, on Unix, we will want a file called /etc/nipy/nipy_templates.ini with contents:
[DATA]
path = /usr/share/nipy
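The etc_dir mechanism can be sketched as scanning a directory for .ini files and collecting any [DATA] path entries. This is a sketch under assumptions: paths_from_ini_dir is a hypothetical helper, each path value is assumed to follow the same ':' or ';' separated convention described earlier, and a temporary directory stands in for /etc/nipy.

```python
import configparser
import glob
import os
import tempfile


def paths_from_ini_dir(etc_dir):
    """Collect [DATA] path entries from every *.ini file in etc_dir."""
    paths = []
    for ini_file in sorted(glob.glob(os.path.join(etc_dir, '*.ini'))):
        parser = configparser.ConfigParser()
        parser.read(ini_file)
        # has_option is False when the DATA section itself is absent.
        if parser.has_option('DATA', 'path'):
            value = parser.get('DATA', 'path')
            paths.extend(p for p in value.split(os.pathsep) if p)
    return paths


with tempfile.TemporaryDirectory() as etc_dir:
    with open(os.path.join(etc_dir, 'nipy_templates.ini'), 'w') as fobj:
        fobj.write('[DATA]\npath = /usr/share/nipy\n')
    with open(os.path.join(etc_dir, 'unrelated.ini'), 'w') as fobj:
        fobj.write('[OTHER]\nkey = value\n')
    data_paths = paths_from_ini_dir(etc_dir)

print(data_paths)  # ['/usr/share/nipy']
```

One file per data package keeps installation and removal independent: a distribution can add or delete nipy_templates.ini without editing a shared configuration file.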
This section describes how we (the nipy community) implement data packages at the moment.
The data in the data packages will not usually be under source control. This is because images don’t compress very well, and any change in the data will result in a large extra storage cost in the repository. If you are confident that the data files aren’t going to change, then a repository could work well enough.
The data packages will be available at a central release location. For now this will be: http://nipy.org/data-packages/ .
A package such as nipy-templates-0.2.tar.gz will have the following sort of structure:
<ROOT>
|-- setup.py
|-- README.txt
|-- MANIFEST.in
`-- templates
    |-- ICBM152
    |   |-- 1mm
    |   |   `-- T1_brain.nii.gz
    |   `-- 2mm
    |       `-- T1.nii.gz
    |-- colin27
    |   `-- 2mm
    |       `-- T1.nii.gz
    `-- config.ini
There should be only one nipy/packagename directory delivered by a particular package. For example, this package installs nipy/templates, but does not contain nipy/data.
Making a new package tarball is simply:
The process of making a release should be:
There is an example nipy data package nipy-examplepkg in the examples directory of the NIPY repository.
The machinery for creating and maintaining data packages is available at https://github.com/nipy/data-packaging.
See the README.txt file there for more information.