So, we have two different ways of adding test data.
Small files are around 50K or less when compressed. By “compressed”, we mean compressed with zlib, which is what git uses when storing the file in the repository. You can check the exact compressed size, in kilobytes, with a Python script like:
import sys
import zlib

for fname in sys.argv[1:]:
    with open(fname, 'rb') as fobj:
        contents = fobj.read()
    compressed = zlib.compress(contents)
    print(fname, len(compressed) / 1024.)
One way of making files smaller when compressed is to set uninteresting values to zero or some other number so that the compression algorithm can be more effective.
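As a rough illustration of this effect, the sketch below compresses a buffer of random bytes with zlib, then compresses it again after zeroing most of it. The buffer, its size, and the `compressed_kb` helper are made up for the example; real image data will compress differently.

```python
import os
import zlib


def compressed_kb(data):
    """Return the zlib-compressed size of `data` in kilobytes."""
    return len(zlib.compress(data)) / 1024.


# Hypothetical stand-in for image data: 100K of incompressible noise.
noisy = os.urandom(100 * 1024)

# Keep the first 10K as the "interesting" part and zero the rest.
keep = 10 * 1024
zeroed = noisy[:keep] + bytes(len(noisy) - keep)

print('original: %.1fK' % compressed_kb(noisy))
print('zeroed:   %.1fK' % compressed_kb(zeroed))
```

The zeroed buffer has the same length as the original, so the file format and shape are unaffected; only the compressed size changes.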
Please don’t compress the file yourself before committing to a git repo unless there’s a really good reason; git will do this for you when adding to the repository, and it’s a shame to make git compress a compressed file.
We very much prefer files with completely open licenses such as the PDDL 1.0 or the CC0 license.
The files in nibabel/tests/data are distributed with the nibabel source code, which can easily be installed without the user having an opportunity to review the full license. We don’t think this is compatible with extra license terms such as agreeing to cite the people who provided the data, or agreeing not to try to work out the identity of the person who was scanned, because it would be too easy to miss these requirements when using nibabel. It is fine to use files with these kinds of licenses, but they should go in their own repository, used as a submodule, so that they do not need to be distributed with nibabel.
If the file is less than about 50K compressed, and the license is open, then you might want to commit the file under nibabel/tests/data.
Put the license for any new files in the COPYING file at the top level of the nibabel repo. You’ll see some examples in that file already.
Make a new git repository with the data.
There are example repos at
Although both of the examples are on GitHub, Bitbucket is a good choice for repos like this because it doesn’t enforce repository size limits.
Don’t forget to include a LICENSE and README file in the repo.
When all is done, and the repository is safely on the internet and accessible, add the repo as a submodule to the nibabel-data directory, with something like this:
git submodule add https://bitbucket.org/nipy/rosetta-samples.git nibabel-data/rosetta-samples
You should now have a checked out copy of the rosetta-samples repository in the nibabel-data/rosetta-samples directory. Commit the submodule that is now in your git staging area.
If you are writing tests using files from this repository, you should use the needs_nibabel_data decorator to skip the tests if the data has not been checked out into the submodules. See nibabel/tests/test_parrec_data.py for an example. For our example repository above it might look something like:
from os.path import join as pjoin

from .nibabel_data import get_nibabel_data, needs_nibabel_data

ROSETTA_DATA = pjoin(get_nibabel_data(), 'rosetta-samples')

@needs_nibabel_data('rosetta-samples')
def test_something():
    # Some test using the data
    pass
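To show the idea behind such a decorator, here is a minimal sketch of a skip-if-missing decorator built on `unittest.skipUnless`. It is not nibabel’s implementation: `DATA_ROOT` and `needs_data` are hypothetical names, and the check used (the subdirectory exists and is not empty) is only an assumption about what “checked out” means.

```python
import os
import tempfile
import unittest
from os.path import isdir, join as pjoin

# Hypothetical data root; stands in for the directory holding the submodules.
DATA_ROOT = tempfile.mkdtemp()


def needs_data(subdir):
    """Skip the decorated test unless DATA_ROOT/subdir exists and is non-empty."""
    path = pjoin(DATA_ROOT, subdir)
    have_data = isdir(path) and len(os.listdir(path)) > 0
    return unittest.skipUnless(have_data, 'Need data in ' + path)


@needs_data('rosetta-samples')
def test_something():
    # Would use files under pjoin(DATA_ROOT, 'rosetta-samples')
    pass
```

Because the check runs when the decorator is applied, a test runner reports the test as skipped rather than failed when the data directory is absent.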
Tests run via nibabel on travis start with an automatic checkout of all submodules in the project, so all test data submodules get checked out by default.
If you are running the tests locally, you may well want to do:
git submodule update --init
from the root nibabel directory. This will check out all the test data repositories.
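If you want to check from Python whether the data submodules have been initialized, one option is to parse the output of `git submodule status`, which prefixes each uninitialized submodule with `-`. This is a sketch and not part of nibabel; `parse_status` and `uninitialized_submodules` are hypothetical helper names.

```python
import subprocess


def parse_status(output):
    """Return paths of uninitialized submodules from `git submodule status` text.

    Each status line starts with a one-character flag, then the commit hash
    and the path. The flag is '-' for a submodule that is not initialized.
    """
    missing = []
    for line in output.splitlines():
        if line.startswith('-'):
            # Drop the flag, then split the hash from the path.
            missing.append(line[1:].split(' ', 1)[1].strip())
    return missing


def uninitialized_submodules(repo_dir='.'):
    """Run `git submodule status` in `repo_dir` and list uninitialized paths."""
    out = subprocess.check_output(
        ['git', 'submodule', 'status'], cwd=repo_dir, text=True)
    return parse_status(out)
```

An empty list from `uninitialized_submodules()` suggests that all submodules have already been initialized.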
The limiting factor is how long it takes travis-ci to check out the data for the tests. Up to a hundred megabytes in one repository should be OK. The joy of submodules is that we can always drop a submodule, split the repository into two and add only one back, so you aren’t committing us to anything awful if you accidentally put some very large files into your own data repository.
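To get a rough idea of whether a data repository is within that range before pushing it, you could total the file sizes in the working tree. This sketch gives only an estimate: `tree_size_mb` is a hypothetical helper, it skips the `.git` directory, and it measures uncompressed working-tree size rather than what a checkout actually transfers.

```python
import os
from os.path import getsize, islink, join as pjoin


def tree_size_mb(root):
    """Return the total size in megabytes of regular files under `root`."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != '.git']
        for fname in filenames:
            path = pjoin(dirpath, fname)
            if not islink(path):
                total += getsize(path)
    return total / (1024. * 1024.)
```

For example, `tree_size_mb('rosetta-samples')` would report the size of a local copy of the example repository, assuming it sits in the current directory.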
If you are not sure, try us with a pull request to nibabel on github, or ask on the nipy mailing list; we will try to help.