Utilities for getting array slices out of file-like objects
calc_slicedefs(sliceobj, in_shape, itemsize, ...) | Return parameters for slicing array with sliceobj given memory layout |
canonical_slicers(sliceobj, shape[, check_inds]) | Return canonical version of sliceobj for array shape shape |
fileslice(fileobj, sliceobj, shape, dtype[, ...]) | Slice array in fileobj using sliceobj slicer and array definitions |
fill_slicer(slicer, in_len) | Return slice object with Nones filled out to match in_len |
is_fancy(sliceobj) | Returns True if sliceobj is attempting fancy indexing |
optimize_read_slicers(sliceobj, in_shape, ...) | Calculates slices to read from disk, and apply after reading |
optimize_slicer(slicer, dim_len, all_full, ...) | Return maybe modified slice and post-slice slicing for slicer |
predict_shape(sliceobj, in_shape) | Predict shape of array from slicing array shape shape with sliceobj |
read_segments(fileobj, segments, n_bytes) | Read n_bytes byte data implied by segments from fileobj |
slice2len(slicer, in_len) | Output length after slicing original length in_len with slicer |
slice2outax(ndim, sliceobj) | Matching output axes for input array ndim ndim and slice sliceobj |
slicers2segments(read_slicers, in_shape, ...) | Get segments from read_slicers given input in_shape and memory steps |
strided_scalar(shape[, scalar]) | Return array shape shape where all entries point to value scalar |
threshold_heuristic(slicer, dim_len, stride) | Whether to force full axis read or contiguous read of stepped slice |
Return parameters for slicing array with sliceobj given memory layout
Calculate the best combination of skips / (read + discard) to use for reading the data from disk / memory, then generate corresponding segments, the disk offsets and read lengths to read the memory. If we have chosen some (read + discard) optimization, then we need to discard the surplus values from the read array using post_slicers, a slicing tuple that takes the array as read from a file-like object, and returns the array we want.
Parameters: | sliceobj : object
in_shape : sequence
itemsize : int
offset : int
order : {‘C’, ‘F’}
heuristic : callable, optional
|
---|---|
Returns: | segments : list
read_shape : tuple
post_slicers : tuple
|
Return canonical version of sliceobj for array shape shape
sliceobj is a slicer for an array A implied by shape.
Does not handle fancy indexing (indexing with arrays or array-like indices)
Parameters: | sliceobj : object
shape : sequence
check_inds : {True, False}, optional
|
---|---|
Returns: | can_slicers : tuple
|
Slice array in fileobj using sliceobj slicer and array definitions
fileobj contains the contiguous binary data for an array A with shape shape, data type dtype and memory layout order, with the binary data starting at file offset offset.
Our job is to return the sliced array A[sliceobj] in the most efficient way in terms of memory and time.
Sometimes it will be quicker to read memory that we will later throw away, to save time we might lose doing short seeks on fileobj. Call these alternatives: (read + discard); and skip. This routine guesses when to (read+discard) or skip using the callable heuristic, with a default using a hard threshold for the memory gap large enough to prefer a skip.
Parameters: | fileobj : file-like object
sliceobj : object
shape : sequence
dtype : dtype object
offset : int, optional
order : {‘C’, ‘F’}, optional
heuristic : callable, optional
|
---|---|
Returns: | sliced_arr : array
|
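The idea of slicing straight from a file-like object can be illustrated with a minimal standard-library sketch. It is not the fileslice implementation: the helper name read_row_sketch is hypothetical, and it assumes a C-ordered 2D array of little-endian int32 values, from which it reads a single row by seeking past the rows before it.

```python
import io
import struct


def read_row_sketch(fileobj, row, shape, offset=0):
    """Read one row of a C-ordered 2D little-endian int32 array (illustrative only)."""
    n_rows, n_cols = shape
    if not 0 <= row < n_rows:
        raise IndexError("row out of range")
    itemsize = struct.calcsize("<i")  # 4 bytes per element
    # Skip the header and all earlier rows with a single seek
    fileobj.seek(offset + row * n_cols * itemsize)
    raw = fileobj.read(n_cols * itemsize)
    return list(struct.unpack(f"<{n_cols}i", raw))


# An 8-byte header followed by a (4, 3) array holding the values 0..11
header = b"\x00" * 8
payload = struct.pack("<12i", *range(12))
fobj = io.BytesIO(header + payload)
row = read_row_sketch(fobj, 2, (4, 3), offset=8)  # [6, 7, 8]
```

Only the 12 bytes of the requested row are read; the rest of the array never enters memory.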
Return slice object with Nones filled out to match in_len
Also fixes too large stop / start values according to slice() slicing rules.
The returned slicer can have a None as slicer.stop if slicer.step is negative and the input slicer.stop is None. This is because we can’t represent the stop as an integer, because -1 has a different meaning.
Parameters: | slicer : slice object
in_len : int
|
---|---|
Returns: | can_slicer : slice object
|
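The behaviour described for fill_slicer can be approximated with the built-in slice.indices method. The following is a sketch under that assumption, not the library's code, and the name fill_slicer_sketch is hypothetical. For a negative step, slice.indices reports a stop of -1 when the slice runs past the first element; the sketch maps that back to None, because an integer -1 would mean "the last element" if used as a stop value.

```python
def fill_slicer_sketch(slicer, in_len):
    """Fill in the Nones of `slicer` for a sequence of length `in_len`."""
    start, stop, step = slicer.indices(in_len)
    if step < 0 and stop == -1:
        # "Run past element 0" has no integer representation as a stop value
        stop = None
    return slice(start, stop, step)


filled_forward = fill_slicer_sketch(slice(None), 10)            # slice(0, 10, 1)
filled_clipped = fill_slicer_sketch(slice(2, 100), 10)          # slice(2, 10, 1)
filled_reverse = fill_slicer_sketch(slice(None, None, -1), 5)   # slice(4, None, -1)
```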
Returns True if sliceobj is attempting fancy indexing
Parameters: | sliceobj : object
|
---|---|
Returns: | tf : bool
|
Calculates slices to read from disk, and apply after reading
Parameters: | sliceobj : object
in_shape : sequence
itemsize : int
heuristic : callable
|
---|---|
Returns: | read_slicers : tuple
post_slicers : tuple
|
Return maybe modified slice and post-slice slicing for slicer
Parameters: | slicer : slice object or int
dim_len : int
all_full : bool
is_slowest : bool
stride : int
heuristic : callable, optional
|
---|---|
Returns: | to_read : slice object or int
post_slice : slice object
|
Notes
This is the heart of the algorithm for making segments from slice objects.
A contiguous slice is a slice with slice.step in (1, -1)
A full slice is a contiguous slice returning all elements.
The main question we have to ask is whether we should transform to_read, post_slice to prefer a full read and partial slice. We only do this in the case of all_full==True. In this case we might benefit from reading a contiguous chunk of data even if the slice is not contiguous, or reading all the data even if the slice is not full. Apply the heuristic callable to decide whether to do this, and adapt to_read and post_slice accordingly.
Otherwise, return to_read unaltered (apart from the constraint that it be positive) and post_slice as slice(None).
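The transformation from a stepped slice to a contiguous read plus a post-slice can be sketched as follows. This is an illustration only: the name contiguous_read_sketch is hypothetical, it handles positive steps only, and it does not consult any heuristic.

```python
def contiguous_read_sketch(slicer, dim_len):
    """Return (to_read, post_slice) so that seq[to_read][post_slice] == seq[slicer]."""
    start, stop, step = slicer.indices(dim_len)
    if step <= 0:
        raise ValueError("this sketch handles positive steps only")
    n_out = len(range(start, stop, step))
    if n_out == 0:
        return slice(0, 0, 1), slice(None)
    last = start + (n_out - 1) * step  # index of the last element wanted
    # Read everything from the first to the last wanted element in one block,
    # then discard the surplus by stepping through what was read.
    return slice(start, last + 1, 1), slice(None, None, step)


data = list(range(10))
to_read, post_slice = contiguous_read_sketch(slice(1, 9, 3), 10)
recovered = data[to_read][post_slice]  # same as data[1:9:3]
```

Here to_read covers indices 1 to 7 inclusive, and the post-slice keeps every third value of that block.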
Predict shape of array from slicing array shape shape with sliceobj
Parameters: | sliceobj : object
in_shape : sequence
|
---|---|
Returns: | out_shape : tuple
|
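Shape prediction can be sketched without creating any array, because the length of a sliced axis depends only on the axis length and the slice. The helper below is hypothetical and deliberately limited: it assumes sliceobj contains only slice objects and integers (no Ellipsis, None or fancy indices).

```python
def predict_shape_sketch(sliceobj, in_shape):
    """Predict the shape after slicing, for slice and int indices only."""
    if not isinstance(sliceobj, tuple):
        sliceobj = (sliceobj,)
    out_shape = []
    for axis, dim_len in enumerate(in_shape):
        if axis >= len(sliceobj):
            out_shape.append(dim_len)  # axes not mentioned are kept whole
            continue
        slicer = sliceobj[axis]
        if isinstance(slicer, int):
            continue  # integer indexing drops the axis
        out_shape.append(len(range(*slicer.indices(dim_len))))
    return tuple(out_shape)


shape_a = predict_shape_sketch((slice(None), 1), (10, 5, 3))   # (10, 3)
shape_b = predict_shape_sketch(slice(0, 10, 3), (10, 5))       # (4, 5)
```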
Read n_bytes byte data implied by segments from fileobj
Parameters: | fileobj : file-like object
segments : sequence
n_bytes : int
|
---|---|
Returns: | buffer : buffer object
|
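Reading segments amounts to a seek followed by a read for each segment. The sketch below assumes each segment is an (offset, length) pair, in line with the description of segments as disk offsets and read lengths; the name read_segments_sketch and the error handling are illustrative, not the library's.

```python
import io


def read_segments_sketch(fileobj, segments, n_bytes):
    """Concatenate the bytes of each (offset, length) segment read from `fileobj`."""
    buffer = bytearray()
    for offset, length in segments:
        fileobj.seek(offset)
        buffer += fileobj.read(length)
    if len(buffer) != n_bytes:
        raise ValueError("segments do not add up to n_bytes")
    return bytes(buffer)


fobj_segments = io.BytesIO(bytes(range(20)))
segment_data = read_segments_sketch(fobj_segments, [(2, 3), (10, 2)], 5)
```

With this input, segment_data holds the byte values 2, 3, 4, 10 and 11.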
Output length after slicing original length in_len with slicer
Parameters: | slicer : slice object
in_len : int
|
---|---|
Returns: | out_len : int
|
Notes
Returns same as len(np.arange(in_len)[slicer])
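The same quantity can be computed without NumPy, since a range object has the same length as the corresponding sliced array. A minimal sketch, with the hypothetical name slice2len_sketch:

```python
def slice2len_sketch(slicer, in_len):
    """Length of a sequence of length `in_len` after applying `slicer`."""
    return len(range(*slicer.indices(in_len)))


len_stepped = slice2len_sketch(slice(None, None, 2), 10)   # 5
len_reverse = slice2len_sketch(slice(8, 2, -2), 10)        # 3
len_empty = slice2len_sketch(slice(20, 30), 10)            # 0
```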
Matching output axes for input array ndim ndim and slice sliceobj
Parameters: | ndim : int
sliceobj : object
|
---|---|
Returns: | out_ax_inds : tuple
|
Get segments from read_slicers given input in_shape and memory steps
Parameters: | read_slicers : object
in_shape : sequence
offset : int
itemsize : int
|
---|---|
Returns: | segments : list
|
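For a one-dimensional array the mapping from a slice to segments is easy to write down, and the sketch below does only that. It assumes segments are (offset, length) pairs in bytes and positive steps; the name segments_1d_sketch is hypothetical, and the real function also handles multiple axes and memory order.

```python
def segments_1d_sketch(slicer, in_len, offset, itemsize):
    """Byte segments needed to read `slicer` from a 1D array stored at `offset`."""
    start, stop, step = slicer.indices(in_len)
    if step <= 0:
        raise ValueError("this sketch handles positive steps only")
    n_out = len(range(start, stop, step))
    if n_out == 0:
        return []
    if step == 1:
        # Contiguous slice: a single segment covers every element
        return [(offset + start * itemsize, n_out * itemsize)]
    # Stepped slice: one short segment per element (the pure "skip" strategy)
    return [(offset + i * itemsize, itemsize) for i in range(start, stop, step)]


seg_contiguous = segments_1d_sketch(slice(2, 5), 10, offset=16, itemsize=4)
seg_stepped = segments_1d_sketch(slice(0, 6, 2), 10, offset=16, itemsize=4)
```

The contiguous slice yields one 12-byte segment starting at byte 24, while the stepped slice yields three 4-byte segments at bytes 16, 24 and 32.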
Return array shape shape where all entries point to value scalar
Parameters: | shape : sequence
scalar : scalar
|
---|---|
Returns: | strided_arr : array
|
Whether to force full axis read or contiguous read of stepped slice
Allows fileslice() to sometimes read memory that it will throw away in order to get maximum speed. In other words, trade memory for fewer disk reads.
Parameters: | slicer : slice object, or int
dim_len : int
stride : int
skip_thresh : int, optional
|
---|---|
Returns: | action : {‘full’, ‘contiguous’, None}
|
Notes
Let’s say we are in the middle of reading a file, at the start of a stretch of memory \(B\) bytes long. We don’t need this memory, and we are considering whether to read it anyway and then throw it away (READ), or to stop reading, skip \(B\) bytes and restart reading from there (SKIP).
After trying some more fancy algorithms, a hard threshold (skip_thresh) for the maximum skip distance seemed to work well, as measured by times on nibabel.benchmarks.bench_fileslice
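The hard-threshold decision between READ and SKIP can be sketched as a single comparison. This is a simplification of the idea rather than the threshold_heuristic function itself: the name read_or_skip_sketch is hypothetical, and the default of 256 bytes is an assumed value chosen for illustration.

```python
def read_or_skip_sketch(gap_bytes, skip_thresh=256):
    """Choose READ (read and discard) for small gaps, SKIP (seek) for large ones."""
    if gap_bytes < 0:
        raise ValueError("gap_bytes must be non-negative")
    return "READ" if gap_bytes <= skip_thresh else "SKIP"


small_gap = read_or_skip_sketch(64)      # "READ": cheaper to read and discard
large_gap = read_or_skip_sketch(4096)    # "SKIP": the gap is worth a seek
```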