Chapter 4: Library Reference
PyTables implements several classes to represent
the different nodes in the object tree. They are named
File, Group, Leaf,
Table, Array, EArray,
VLArray and UnImplemented. Another
class, AttributeSet, lets the user attach
additional information to these objects. Finally, an
important class called IsDescription lets you build a
Table record description by declaring a subclass
of it. Many other classes are defined in
PyTables, but they can be regarded as helpers
whose main goal is to declare the data type
properties of the first-class objects; they are
described at the end of this chapter.
An important function, called openFile, is
responsible for creating, opening or appending to files. In
addition, a few utility functions are defined to check whether
a user-supplied file is a PyTables or HDF5
file. These are called isPyTablesFile and
isHDF5, respectively. Finally, a
function called whichLibVersion reports
the versions of the underlying C libraries (for example,
HDF5 or Zlib).
Let's start discussing the first-level variables and
functions available to the user, then the different classes
defined in PyTables.
4.1 tables variables and functions
4.1.1 Global variables
- __version__
- The PyTables
version number.
- ExtVersion
- The version of the Pyrex
extension module. This might be useful when reporting
bugs.
- HDF5Version
- The underlying HDF5 library version number.
4.1.2 Global functions
- copyFile(srcFilename=None, dstFilename=None, title=None,
filters=None, copyuserattrs=1, overwrite=0)
- Copy a closed PyTables (or generic
HDF5) file specified by
srcFilename to dstFilename. Returns a
tuple in the form (ngroups, nleaves,
nbytes) specifying the number of groups, leaves
and bytes copied.
- title
- The title for the new
file. If not specified, the source file title will
be copied.
- filters
- A Filters instance (see
4.12.1). If
specified, it will override the original filter
properties in all
source nodes.
- copyuserattrs
- You can prevent the
user attributes from being copied by setting this
parameter to 0. The default is to copy them.
- overwrite
- If the
dstFilename file already exists and
overwrite is 1, it will be silently overwritten. The
default is not to overwrite it.
- isHDF5(filename)
- Determines whether
filename is in the HDF5 format. When successful,
returns a positive value for TRUE, or 0 (zero) for
FALSE. Otherwise, returns a negative value. For this
function to work, the file must be closed.
- isPyTablesFile(filename)
- Determines
whether a file is in the PyTables format.
When successful, returns the format version string for
TRUE, or 0 (zero) for FALSE. Otherwise, returns a
negative value. For this function to work, the file
must be closed.
- openFile(filename, mode='r', title='', trMap={},
rootUEP="/", filters=None)
- Open a PyTables (or generic
HDF5) file and return a File
object.
- filename
- The name of the file
(supports environment variable expansion). It is
suggested that it should have any of
".h5", ".hdf" or
".hdf5" extensions, although this is
not mandatory.
- mode
- The mode to open the file. It
can be one of the following:
- 'r'
- read-only; no data can be
modified.
- 'w'
- write; a new file is
created (an existing file with the same name
would be deleted).
- 'a'
- append; an existing file is
opened for reading and writing, and if the file
does not exist it is created.
- 'r+'
- is similar to 'a', but the
file must already exist.
- title
- If filename is new, this will
set a title for the root group in this file. If
filename is not new, the title will be read from
disk, and this will not have any effect.
- trMap
- A dictionary that maps names in
the Python namespace of the object tree to different HDF5
names in the file namespace. The keys are the Python
names, while the values are the HDF5 names. This is
useful when HDF5 node names are invalid identifiers or
reserved words in Python.
- rootUEP
- The root User Entry
Point. This is a group in the HDF5 hierarchy which
will be taken as the starting point to create the
object tree. The group must be given by its
HDF5 name and can be a path. If it does not exist, a
RuntimeError exception is raised. Use
this if you do not want to build the entire object tree, but
only a subtree of it.
- filters
- An instance of the
Filters class (see section 4.12.1) that provides
information about the desired I/O filters applicable
to the leaves that hang directly from root
(unless other filter properties are specified for
these leaves). Besides, if you do not specify filter
properties for its child groups, they will inherit
these ones. So, if you open a new file with this
parameter set, all the leaves created
in the file will recursively inherit these filter
properties (again, unless you prevent that
by specifying other filters on the child
groups or leaves).
- whichLibVersion(libname)
- Returns information
about the versions of the underlying C libraries. libname can be one of
"hdf5", "zlib",
"lzo" or "ucl". It always
returns a tuple of 3 elements. The
first element of this tuple is positive when the library is
available, and 0 (zero) when it is not (for example LZO
or UCL). If the library is available, the second
element of the tuple contains the library version and the
third element the date (if available) of that version.
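Because isHDF5 and isPyTablesFile report a failed check with a negative value, and negative numbers are truthy in Python, a plain truth test on the result would treat an error as a positive answer. The following is a minimal sketch of one way to classify such a result. The helper interpret_check is hypothetical (it is not part of PyTables) and only inspects the returned value, so it runs without PyTables installed.

```python
def interpret_check(result):
    """Classify a value returned by isHDF5() or isPyTablesFile().

    Hypothetical helper, not part of PyTables.
    """
    if isinstance(result, str):
        # isPyTablesFile() reports success with a format version string.
        return "yes"
    if result > 0:
        return "yes"
    if result == 0:
        return "no"
    # A negative value signals that the check itself failed.
    return "error"
```

With this helper, a caller can distinguish "not an HDF5 file" from "the check could not be performed" instead of relying on truthiness.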
4.2 The File class
An instance of this class is returned when a PyTables file is
opened with the openFile function. It has
methods to flush and close files. Also, the
File class offers methods to create, rename and
delete nodes, as well as to traverse the object tree. One
of its attributes (rootUEP) represents the
user entry point to the object tree attached to the
file.
Next, we discuss the attributes and methods of the File
class.
4.2.1 File instance variables
- filename
- Filename opened.
- format_version
- The
PyTables version number of this file.
- isopen
- Takes the value 1 if the
underlying file is open, 0 otherwise.
- mode
- Mode in which the filename was
opened.
- root
- The root of the object
tree hierarchy. It is a Group instance.
- rootUEP
- The UEP (User Entry Point)
group in file (see ??).
- title
- The title of the root group in
file.
- trMap
- A dictionary that maps
node names between the Python and HDF5 domains. Its
initial values are set from the trMap parameter
passed to the openFile function. You can
change its contents after a file is opened, and
the new map will take effect for any new object added
to the tree.
- filters
- Container for filter properties
associated to this file. See section 4.12.1 for more
information on this object.
- objects
- Dictionary with all objects
(groups or leaves) on tree.
- groups
- Dictionary with all object
groups on tree.
- leaves
- Dictionary with all object
leaves on tree.
4.2.2 File methods
copyChildren(whereSrc, whereDst, recursive=0,
filters=None, copyuserattrs=1, start=0, stop=None,
step=1, overwrite=0)
Copy the children of a group (recursively, if requested) into another
location. Returns a tuple in the form (ngroups,
nleaves, nbytes) specifying the number of
groups, leaves and bytes copied.
- whereSrc
- The parent group whose
children are to be copied. This parameter
can be a path string (for example
"/level1/group5"), or a
Group instance.
- whereDst
- The parent group where the
source children will be copied to. This group must exist
or a LookupError will be raised. This
parameter can be a path string (for example
"/level1/group6"), or a
Group instance.
- recursive
- Specifies whether the copy
should recurse into subgroups or not. The default is
not to recurse.
- filters
- If specified, overrides the
original filter properties of the source nodes.
This parameter must be an instance of the
Filters class (see section 4.12.1). The default is to
copy the filter properties from the source children.
- copyuserattrs
- You can prevent the
user attributes from being copied by setting this
parameter to 0. The default is to copy them.
- start, stop, step
- Specifies the range
of rows in child leaves to be copied; the default is
to copy all the rows.
- overwrite
- Whether existing
children of whereDst
that have the same names as the whereSrc
children should be overwritten or
not.
copyFile(dstFilename=None, title=None,
filters=None, copyuserattrs=1, overwrite=0)
Copy the contents of this file to dstFilename.
If the destination file already exists, it won't be overwritten
unless overwrite is set to true (see below).
Returns a tuple in the form (ngroups, nleaves,
nbytes) specifying the number of groups, leaves
and bytes copied.
- title
- The title for the new file. If
not specified, the source file title will be copied.
- filters
- If specified, overrides the
original filter properties of the source nodes.
This parameter must be an instance of the
Filters class (see section 4.12.1). The default is to
copy the filter properties from the source children.
- copyuserattrs
- You can prevent the
user attributes from being copied by setting this
parameter to 0. The default is to copy them.
- overwrite
- Whether to overwrite the
dstFilename file if it already exists. The
default is not to overwrite it.
createGroup(where, name, title='', filters=None)
Create a new Group instance with name name in
where location.
- where
- The parent group from which the new
group will hang. The where parameter can be
a path string (for example
"/level1/group5"), or a Group
instance.
- name
- The name of the new group.
- title
- A description for this
group.
- filters
- An instance of the
Filters class (see section 4.12.1) that provides
information about the desired I/O filters applicable
to the leaves that hang directly from this new group
(unless other filter properties are specified for
these leaves). Besides, if you do not specify filter
properties for its child groups, they will inherit
these ones.
createTable(where, name,
description, title='', filters=None,
expectedrows=10000)
Create a new Table instance with name
name in where location.
- where
- The parent group from which the new
table will hang. The where parameter can be
a path string (for example
"/level1/leaf5"), or a Group instance.
- name
- The name of the new table.
- description
- An instance of a
user-defined class (derived from the
IsDescription class) where table fields
are defined. In certain situations, however, it is
handier to supply this description as
a dictionary (for example, when you do not know
beforehand which structure your table will have). See
section 3.3 for an example of
use. Finally, a RecArray object from the
numarray package is also accepted, and
all the information about columns and other metadata
is used as a basis to create the Table
object. Moreover, if the RecArray has
actual data, this is also injected into the newly created
Table object.
- title
- A description for this object.
- filters
- An instance of the
Filters class (see section 4.12.1) that provides
information about the desired I/O filters to be
applied during the life of this object.
- expectedrows
- A user estimate of the
number of records that will be in the table. If not
provided, the default value is appropriate for tables
up to about 1 MB in size (more or less, depending on the
record size). If you plan to save bigger tables you
should provide a guess; this will optimize the HDF5
B-Tree creation and management process time and memory
used. See section 5.4
for a discussion of that issue.
createArray(where, name,
object, title='')
Create a new Array instance with name
name in where location.
- object
- The regular array to be
saved. Currently accepted values are: lists, tuples,
scalars (int and float), strings and
(multidimensional) Numeric and
NumArray arrays (including
CharArray string arrays). However, these
objects must be regular (i.e. they cannot be like, for
example, [[1,2],2]). Also, objects that
have any dimension equal to zero are not
supported (this will be solved when unlimited arrays
are implemented).
See createTable
description ?? for more information on the
where, name and title
parameters.
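The regularity requirement on the object parameter can be illustrated with a small pure-Python check. This is only a sketch of the idea for nested lists and tuples; regular_shape is a hypothetical helper, not a PyTables function, and it does not reproduce how the library actually validates its input.

```python
def regular_shape(obj):
    """Return the shape of a regular nested list/tuple as a tuple.

    Hypothetical helper for illustration; raises ValueError for
    irregular objects or for dimensions of length zero.
    """
    if not isinstance(obj, (list, tuple)):
        # Scalars (and strings) are treated as having an empty shape.
        return ()
    if len(obj) == 0:
        raise ValueError("zero-length dimensions are not supported")
    shapes = [regular_shape(item) for item in obj]
    if any(shape != shapes[0] for shape in shapes):
        raise ValueError("object is not regular")
    return (len(obj),) + shapes[0]
```

Under this sketch, [[1, 2], [3, 4]] has shape (2, 2), while the irregular example [[1, 2], 2] is rejected because its items do not all share the same shape.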
createEArray(where, name,
atom, title='', filters=None, expectedrows=1000)
Create a new EArray instance with name
name in where location.
- atom
- An Atom instance
representing the shape, type and
flavor of the atomic objects to be saved.
One (and only one) of the shape dimensions must be 0. The dimension being 0
means that the resulting EArray object
can be extended along it. Multiple enlargeable
dimensions are not supported right now. See section 4.11.3 for the supported
set of Atom class descendants.
- expectedrows
- In the case of
enlargeable arrays, this represents a user estimate
of the number of row elements that will be added to
the growable dimension in the EArray object. If not
provided, the default value is 1000 rows. If you plan
to create either much smaller or much bigger EArrays, try
providing a guess; this will optimize the HDF5 B-Tree
creation and management process time and the amount of
memory used.
See createTable
description ?? for more information on the
where, name, title,
and filters parameters.
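The rule that exactly one dimension of the atom shape must be 0 can be expressed as a short check. The sketch below is a pure-Python illustration operating on a plain shape tuple; extendable_axis is a hypothetical name and is not part of PyTables.

```python
def extendable_axis(shape):
    """Return the index of the single zero-length dimension in shape.

    Hypothetical helper: raises ValueError unless exactly one
    dimension is 0, mirroring the rule stated for EArray atoms.
    """
    zero_axes = [axis for axis, dim in enumerate(shape) if dim == 0]
    if len(zero_axes) != 1:
        raise ValueError(
            "exactly one dimension must be 0, found %d" % len(zero_axes))
    return zero_axes[0]
```

For example, a shape of (0, 3) would be enlargeable along its first dimension, whereas (2, 3) and (0, 0) would both be rejected by this check.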
createVLArray(where,
name, atom=None, title='', filters=None,
expectedsizeinMB=1.0)
Create a new VLArray instance with name
name in where location. See section 4.8 for a
description of the VLArray class.
- atom
- An Atom instance
representing the shape, type and flavor of the atomic
object to be saved. See section 4.11.3 for the supported set
of Atom class descendants.
- expectedsizeinMB
- A user estimate
of the size (in MB) of the final
VLArray object. If not provided, the
default value is 1 MB. If you plan to create either
much smaller or much bigger VLArrays, try providing a
guess; this will optimize the HDF5 B-Tree creation and
management process time and the amount of memory used.
See createTable
description ?? for more information on the
where, name, title, and
filters parameters.
getNode(where, name='',
classname='')
Returns the object node name under
where location.
- where
- Can be a path string or
Group instance. If where
doesn't exist or doesn't have a child called
name, a ValueError error is
raised.
- name
- The object name desired. If
name is a null string (''), or not
supplied, this method assumes that the object is
where itself.
- classname
- If supplied, returns only
an instance of this class name. Possible values are:
'Group', 'Leaf',
'Table', 'Array',
'EArray', 'VLArray' and
'UnImplemented'. Note that these values
are strings.
getAttrNode(where,
attrname, name='' )
Returns the attribute attrname under
where.name location.
- where
- Can be a path string or
Group instance. If where
doesn't exist or doesn't have a child called
name, a ValueError error is
raised.
- attrname
- The name of the attribute
to get.
- name
- The node name desired. If
name is a null string (''), or not
supplied, this method assumes that the object is
where itself.
setAttrNode(where,
attrname, attrvalue, name='')
Sets the attribute attrname with value
attrvalue under where.name location.
- where
- Can be a path string or
Group instance. If where
doesn't exist or doesn't have a child called
name, a ValueError error is
raised.
- attrname
- The name of the attribute
to set on disk.
- attrvalue
- The value of the
attribute to set. Only string attributes are
supported natively right now. However, you can
always use (c)Pickle to serialize
any object you want to save therein.
- name
- The node name desired. If
name is a null string (''), or not
supplied, this method assumes that the object is
where itself.
listNodes(where,
classname='')
Returns a list with all the object nodes (Group or
Leaf) hanging from where. The list is
alpha-numerically sorted by node name.
- where
- The parent group. Can be a
path string or Group instance.
- classname
- If a classname
parameter is supplied, the list will contain only
instances of this class (or subclasses of it). The
only supported classes in classname are
'Group', 'Leaf',
'Table', 'Array',
'EArray', 'VLArray' and
'UnImplemented'. Note that these values
are strings.
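The combination of name-sorted output and filtering by class name (including subclasses) can be modelled with a few toy classes. Everything in the sketch below is hypothetical: the Toy* classes and list_nodes are stand-ins written for illustration and are not the PyTables classes or the library's implementation. Sorting uses Python's default string ordering, which is assumed here to approximate the alpha-numerical order mentioned above.

```python
class ToyNode(object):
    def __init__(self, name):
        self.name = name

class ToyGroup(ToyNode):
    pass

class ToyLeaf(ToyNode):
    pass

class ToyTable(ToyLeaf):
    pass

class ToyArray(ToyLeaf):
    pass

# Map the class-name strings to the toy classes.
TOY_CLASSES = {"Group": ToyGroup, "Leaf": ToyLeaf,
               "Table": ToyTable, "Array": ToyArray}

def list_nodes(children, classname=""):
    """Return the child nodes sorted by name, optionally filtered."""
    nodes = [children[name] for name in sorted(children)]
    if not classname:
        return nodes
    wanted = TOY_CLASSES[classname]
    # isinstance() also accepts subclasses of the requested class.
    return [node for node in nodes if isinstance(node, wanted)]

children = {
    "table1": ToyTable("table1"),
    "array1": ToyArray("array1"),
    "group1": ToyGroup("group1"),
}
leaf_names = [node.name for node in list_nodes(children, "Leaf")]
```

In this model, asking for 'Leaf' returns both the table and the array, because each is an instance of a subclass of the toy leaf class, while the group is left out.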
removeNode(where, name="",
recursive=0)
Removes the object node
name under where location.
- where
- Can be a path string or
Group instance. If where
doesn't exist or doesn't have a child called
name, a LookupError error is
raised.
- name
- The name of the node to be
removed. If not provided, the where node is
removed.
- recursive
- If not supplied, the
object will be removed only if it has no
children. If supplied with a true value, the object
and all its descendants will be completely
removed.
renameNode(where, newname,
name)
Rename the object node name under
where location.
- where
- Can be a path string or
Group instance. If where
doesn't exist or doesn't have a child called
name, a LookupError error is
raised.
- newname
- Is the new name to be
assigned to the node.
- name
- The name of the node to be
changed. If not provided, the where node is
changed.
walkGroups(where='/')
Iterator that returns the Groups (not
Leaves) hanging from where. If where
is not supplied, the root object is taken as origin. The
Groups are returned in top-down order, and
alpha-numerically sorted when they are at the same level.
- where
- The origin group. Can be a
path string or Group instance.
flush()
Flush all the leaves in the object tree.
close()
Flush all the leaves in object tree and close the file.
4.2.3 File special methods
The methods described below are automatically
triggered when a File instance is
accessed in a special way (e.g.,
fileh("/detector") will cause a call to
fileh.__call__("/detector")).
__call__(where="/",
classname="")
Recursively iterate over the children in the
File instance. It takes two parameters:
- where
- If supplied, the iteration
starts from this group.
- classname
- (String) If
supplied, only instances of this class are
returned.
Example of use:
# Recursively print all the nodes hanging from '/detector'
print("Nodes hanging from group '/detector':")
for node in h5file("/detector"):
    print(node)
__iter__()
Iterate over the children on the File
instance. However, this does not accept parameters. This
iterator is recursive.
Example of use:
# Recursively list all the nodes in the object tree
import tables
h5file = tables.openFile("vlarray1.h5")
print("All nodes in the object tree:")
for node in h5file:
    print(node)
4.3 The Group class
Instances of this class are a grouping structure containing
instances of zero or more groups or leaves, together with
supporting metadata.
Working with groups and leaves is similar in many ways to
working with directories and files, respectively, in a Unix
filesystem. As with Unix directories and files, objects in
the object tree are often described by giving their full (or
absolute) path names. This full path can be specified either
as a string (as in '/group1/group2') or as a
complete object path written in the natural naming schema
(as in
file.root.group1.group2), as
discussed in section 1.2.
A side effect of the natural naming schema
is that, when assigning a new attribute
to a Group object, you must take care not to collide with existing
child node names. For this reason, and to avoid polluting the
children namespace, it is explicitly forbidden to assign
"normal" attributes to Group instances; the only ones
allowed must start with a reserved prefix, such as
"_f_" (for methods) or "_v_" (for
instance variables). Any attempt to assign a new
attribute that does not start with one of these prefixes will
raise a NameError exception.
Another effect is that you cannot use reserved Python names
or other invalid Python names (for example "$a" or
"44") as node names. You can, however, make use of the
trMap (translation map dictionary) parameter of
the openFile function (see section ??) in order to use names that are not valid
Python names as node names in the file.
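To make the naming restriction concrete, the following sketch checks whether a string could serve as a natural name and shows how a translation map in the documented direction (Python name as key, HDF5 name as value) would be consulted. Both functions are hypothetical illustrations rather than PyTables code, and str.isidentifier assumes a Python 3 interpreter.

```python
import keyword

def is_natural_name(name):
    """True if name is a valid, non-reserved Python identifier."""
    return name.isidentifier() and not keyword.iskeyword(name)

def to_hdf5_name(pyname, trMap):
    """Translate a Python-side name to its HDF5 name, if mapped.

    Names without an entry in trMap are returned unchanged.
    """
    return trMap.get(pyname, pyname)

# Hypothetical map: the HDF5 node is called "class", a reserved word.
example_map = {"class_": "class"}
```

Here "$a", "44" and "class" all fail the check, while a Python-side name such as "class_" can stand in for the HDF5 node named "class" through the map.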
4.3.1 Group instance variables
- _v_title
- A description for this group.
- _v_name
- The name of this group.
- _v_hdf5name
- The name of this group in
HDF5 file namespace.
- _v_pathname
- A string representation of the group location
in tree.
- _v_parent
- The parent Group instance.
- _v_rootgroup
- Pointer to the root group object.
- _v_file
- Pointer to the associated File object.
- _v_depth
- The depth level in tree for
this group.
- _v_nchildren
- The number of children
(groups or leaves) hanging from this instance.
- _v_children
- Dictionary with all nodes
(groups or leaves) hanging from this instance.
- _v_groups
- Dictionary with all node
groups hanging from this instance.
- _v_leaves
- Dictionary with all node
leaves hanging from this instance.
- _v_attrs
- The associated
AttributeSet instance (see 4.10).
- _v_filters
- Container for filter
properties. See section 4.12.1 for more
information on this object.
4.3.2 Group methods
This class defines the
__setattr__,
__getattr__ and
__delattr__ methods, and
they work as normally intended. So, you can access, assign
or delete children of a group by just using the following
constructs:
# Add a Table child instance under group with name "tablename"
group.tablename = Table(recordDict, "Record instance")
table = group.tablename # Get the table child instance
del group.tablename # Delete the table child instance
Caveat: The following
methods are documented for completeness, and they can be
used without any problem. However, you should prefer the
high-level counterpart methods in the File
class, because these are the ones most used in documentation and
examples, and they are a bit more powerful than those exposed
here.
_f_join(name)
Helper method to correctly concatenate the name of a child object
with the pathname of this group.
_f_rename(newname)
Change the name of this group to newname.
_f_remove(recursive=0)
Remove this object. If recursive is true,
force the removal even if this group has children.
_f_getAttr(attrname)
Gets the HDF5 attribute attrname of this
group.
_f_setAttr(attrname, attrvalue)
Sets the attribute attrname of this group to
the value attrvalue. Only string values are
allowed.
_f_listNodes(classname='')
Returns a list with all the object nodes
hanging from this instance. The list is
alpha-numerically sorted by node name. If a
classname parameter is supplied, it will only
return instances of this class (or subclasses of
it). The supported classes in classname are
'Group', 'Leaf',
'Table', 'Array',
'EArray', 'VLArray' and
'UnImplemented'.
_f_walkGroups()
Iterator that returns the Groups (not Leaves)
hanging from self. The Groups are returned
in top-down order, and alpha-numerically sorted when
they are at the same level.
_f_close()
Close this group, making it and its children
inaccessible in the object tree.
_f_copyChildren(where, recursive=0, filters=None,
copyuserattrs=1, start=0, stop=None, step=1,
overwrite=0)
Copy the children of this group (recursively, if requested) into
another location specified by where (it can be
a path string or a Group object). Returns a
tuple in the form (ngroups, nleaves,
nbytes) specifying the number of groups, leaves
and bytes copied.
- recursive
- Specifies whether the copy
should recurse into subgroups or not. The default is
not to recurse.
- filters
- If specified, overrides the
original filter properties of the source nodes.
This parameter must be an instance of the
Filters class (see section 4.12.1). The default is to
copy the filter properties from the source children.
- copyuserattrs
- You can prevent the
user attributes from being copied by setting this
parameter to 0. The default is to copy them.
- start, stop, step
- Specifies the range
of rows in child leaves to be copied; the default is
to copy all the rows.
- overwrite
- Whether existing
children of where that have
the same names as the children of this group should
be overwritten or not.
4.3.3 Group special methods
The methods described below are automatically
triggered when a Group instance is
accessed in a special way (e.g.,
group("Table") will be equivalent to a call
to group.__call__("Table")).
__call__(classname="",
recursive=0)
Iterate over the children in the Group
instance. It takes two parameters:
- classname
- (String) If
supplied, only instances of this class are
returned.
- recursive
- (Integer) If
false, only children hanging directly from the group
are returned. If true, a recursion over all the groups
hanging from it is performed.
Example of use:
# Recursively print all the arrays hanging from '/'
print("Arrays in the object tree '/':")
for array in h5file.root(classname="Array", recursive=1):
    print(array)
__iter__()
Iterate over the children on the group instance. However,
this does not accept parameters. This iterator is
not recursive.
Example of use:
# Non-recursively list all the nodes hanging from '/detector'
print("Nodes in '/detector' group:")
for node in h5file.root.detector:
    print(node)
4.4 The Leaf class
The goal of this class is to provide a place to put the common
functionality of all its descendants, as well as to provide a
way to help classify objects in the tree. A
Leaf object is an end-node, that is, a node
that can hang directly from a group object, but that is not
a group itself and, thus, cannot have descendants. Right
now, the set of end-nodes is composed of Table,
Array, EArray,
VLArray and UnImplemented class
instances. In fact, all of these classes inherit from
the Leaf class.
4.4.1 Leaf instance variables
The public variables and methods that descendant classes
inherit from Leaf are listed below.
- name
- The Leaf node name in Python
namespace.
- hdf5name
- The Leaf node name in HDF5
namespace.
- objectID
- The HDF5 object ID of the Leaf
node.
- title
- The Leaf title (actually a
property rather than a plain attribute).
- shape
- The shape of the associated data
in the Leaf.
- byteorder
- The byteorder of
the associated data of the Leaf.
- attrs
- The associated
AttributeSet instance (see 4.10).
- filters
- Container for filter
properties. See section 4.12.1 for more
information on this object.
Besides, the following instance variables are also defined and
have a similar meaning to their counterparts in the
Group class:
- _v_hdf5name
- The name of this leaf in
HDF5 file namespace.
- _v_pathname
- A string representation of the leaf location
in tree.
- _v_parent
- The parent Group instance.
- _v_rootgroup
- Pointer to the root
Group object.
- _v_file
- Pointer to the associated
File object.
- _v_depth
- The depth level in tree for
this leaf.
4.4.2 Leaf methods
copy(where, name, title=None,
filters=None, copyuserattrs=1, start=0, stop=None,
step=1, overwrite=0)
Copy this leaf into another location. It returns a
tuple
(object, nbytes) where
object is the newly created object and
nbytes is the number of bytes copied. The
meaning of the parameters is explained below:
- where
- Can be a path string or
Group instance. If where
doesn't exist or doesn't have a child called
name, a LookupError error is
raised.
- name
- The name of the destination
node.
- title
- The new title for
destination. If None, the original title is copied.
- filters
- An instance of the
Filters class (see section 4.12.1). A
None value means that the source properties are
copied as is.
- copyuserattrs
- Whether to copy the user
attributes of the source leaf to the destination or
not. The default is to copy them.
- start, stop, step
- Specifies the
range of rows to be copied; the default is to copy
all the rows.
- overwrite
- If the destination node
name already exists, this specifies whether
it should be overwritten or not. The default is not
to overwrite it.
remove()
Remove this leaf.
rename(newname)
Change the name of this leaf to newname.
getAttr(attrname)
Gets the HDF5 attribute attrname of this leaf.
setAttr(attrname,
attrvalue)
Sets the attribute attrname of this leaf to
the value attrvalue.
flush()
Flush the leaf buffers (if any).
close()
Flush
the leaf buffers (if any) and close the dataset.
4.5 The Table class
Instances of this class represent table objects in the
object tree. It provides methods to read data from and
write data to table objects in the file.
Data can be read from or written to tables by accessing
a special object that hangs from Table. This
object is an instance of the Row class (see
4.5.4). See the tutorial
in chapter 3 on how to use the
Row interface. The columns of the tables can
also be easily accessed (more specifically, they can be
read but not written) by making use of the
Column class, through an
extension of the natural naming schema applied
inside the tables. See section 4.5.6 for some examples of
use of this capability.
Note that this object inherits all the public attributes
and methods that Leaf already has.
4.5.1 Table instance variables
- description
- The metaobject describing
this table.
- row
- The Row instance for
this table (see 4.5.4).
- nrows
- The number of rows in this table.
- rowsize
- The size, in bytes, of each
row.
- cols
- A Cols (see section 4.5.5) instance that
serves as accessor to Column (see section 4.5.6) objects.
- colnames
- The field names for the table
(list).
- coltypes
- The data types for the table
fields (dictionary).
- colshapes
- The shapes for the table
fields (dictionary).
4.5.2 Table methods
append(rows=None)
Append a series of rows to this Table
instance. rows is an object that can hold the
rows to be appended in several formats, like a
RecArray, a list of tuples, a list of
Numeric/NumArray/CharArray
objects, a string, a Python buffer or None (nothing is
appended). Of course, this rows object has to be
compliant with the underlying format of the
Table instance or a ValueError
will be raised.
Example of use:
from tables import *

class Particle(IsDescription):
    name = StringCol(16, pos=1)    # 16-character String
    lati = IntCol(pos=2)           # integer
    longi = IntCol(pos=3)          # integer
    pressure = Float32Col(pos=4)   # float (single-precision)
    temperature = FloatCol(pos=5)  # double (double-precision)

fileh = openFile("test4.h5", mode="w")
table = fileh.createTable(fileh.root, 'table', Particle, "A table")
# Append several rows in only one call
table.append([("Particle: 10", 10, 0, 10*10, 10**2),
              ("Particle: 11", 11, -1, 11*11, 11**2),
              ("Particle: 12", 12, -2, 12*12, 12**2)])
fileh.close()
iterrows(start=None,
stop=None, step=1)
Returns an iterator yielding Row (see section 4.5.4) instances built
from rows in the table. If a range is supplied (i.e. some of
the start, stop or step
parameters are passed), only the appropriate rows are
returned. Otherwise, all the rows are returned. See also the
__call__() and __iter__()
special methods in section 4.5.3 for shorter
ways to call this iterator.
The meaning of the start, stop and
step parameters is the same as in the
range() Python function, except that
negative values of step are not
allowed. Moreover, if only start is
specified, then stop will be set to
start+1. If you specify neither
start nor stop, then all the rows in the object are
selected.
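The range rules just described can be sketched as a small pure-Python function that returns the selected row indices. This is an illustration of the documented rules only: select_rows is a hypothetical helper, it assumes non-negative start and stop values, and its treatment of the case where only stop is given (start defaults to 0, as in range()) is an assumption rather than documented behaviour.

```python
def select_rows(nrows, start=None, stop=None, step=1):
    """Return the row indices selected by start, stop and step.

    Hypothetical helper; assumes non-negative start and stop.
    """
    if step < 1:
        raise ValueError("negative or zero step values are not allowed")
    if start is None and stop is None:
        # Neither bound given: select every row.
        start, stop = 0, nrows
    elif stop is None:
        # Only start given: select that single row.
        stop = start + 1
    elif start is None:
        # Assumption: behave like range(stop).
        start = 0
    return range(start, min(stop, nrows), step)
```

For a table of 10 rows, this sketch selects all rows when called with no bounds, only row 3 for start=3, and rows 2 and 5 for start=2, stop=8, step=3.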
read(start=None, stop=None,
step=1, field=None, flavor="numarray")
Returns the actual data in Table. If
field is not supplied, it returns the data as a
RecArray object.
The meaning of the start, stop and
step parameters is the same as in the
range() Python function, except that
negative values of step are not
allowed. Moreover, if only start is
specified, then stop will be set to
start+1. If you specify neither
start nor stop, then all the rows in
the object are selected.
The rest of the parameters are described next:
- field
- If specified, only the column
field is returned as a NumArray
object. If this is not supplied, all the fields are
selected and a RecArray is returned.
- flavor
- When a field in the table is
selected, passing a flavor parameter causes an
additional conversion of the default
"numarray" returned
object. flavor must have one of the following
values: "numarray" (i.e. no conversion is
made), "Numeric", "Tuple" or
"List".
removeRows(start=None,
stop=None)
Removes a range of rows in the table. If only
start is supplied, only that row is
deleted. If a range is supplied, i.e. both the
start and stop parameters are passed,
all the rows in the range are removed. A step
parameter is not supported, and there are no plans to
implement it anytime soon.
- start
- Sets the starting row to
be removed. It accepts negative values meaning that
the count starts from the end. A value of 0 means
the first row.
- stop
- Sets the last row to be
removed to stop - 1, i.e. the end point is
omitted (in the Python range
tradition). Like start, it accepts
negative values. A special value of
None means the last row.
4.5.3 Table special
methods
The methods described below are triggered automatically
when a Table instance is
accessed in a special way (e.g.,
table["var2"] is equivalent to a call to
table.__getitem__("var2")).
__call__(start=None,
stop=None, step=1)
It returns the same iterator as
Table.iterrows(start, stop, step). It is,
therefore, a shorter way to call it.
Example of use:
result = [ row['var2'] for row in table(step=4)
if row['var1'] <= 20 ]
Which is equivalent to:
result = [ row['var2'] for row in table.iterrows(step=4)
if row['var1'] <= 20 ]
__iter__()
It returns the same iterator as
Table.iterrows(0,0,1). However, this does not
accept parameters.
Example of use:
result = [ row['var2'] for row in table
if row['var1'] <= 20 ]
Which is equivalent to:
result = [ row['var2'] for row in table.iterrows()
if row['var1'] <= 20 ]
__getitem__(key)
It takes different actions depending on the
type of the key parameter:
- key is an
Integer
- The corresponding
table row is returned as a
RecArray.Record object.
- key is a
Slice
- The row slice
determined by key is returned as a
RecArray object.
- key is a
String
- The key
is interpreted as a column name of the table,
and, if it exists, it is read and returned as a
NumArray or CharArray object
(whatever is appropriate).
Example of use:
record = table[4]
recarray = table[4:1000:2]
narray = table["var2"]
Which is equivalent to:
record = table.read(start=4)[0]
recarray = table.read(start=4, stop=1000, step=2)
narray = table.read(field="var2")
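The way these special methods map onto ordinary method calls can be sketched with a toy class. MiniTable below is hypothetical: it keeps its rows in an in-memory list of dictionaries and is not the PyTables Table. Its iterrows() uses plain Python slicing, so it does not reproduce the "only start" rule of Table.iterrows(). The sketch is written in current Python 3 syntax.

```python
class MiniTable:
    """Toy in-memory table: a list of dict rows (not the PyTables Table)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def iterrows(self, start=None, stop=None, step=1):
        # Assumption: ordinary slice semantics stand in for the range rules.
        return iter(self._rows[slice(start, stop, step)])

    def __call__(self, start=None, stop=None, step=1):
        # table(...) is a shorter spelling of table.iterrows(...)
        return self.iterrows(start, stop, step)

    def __iter__(self):
        # "for row in table" walks over every row
        return self.iterrows()

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._rows[key]                     # one row
        if isinstance(key, slice):
            return self._rows[key]                     # a range of rows
        if isinstance(key, str):
            return [row[key] for row in self._rows]    # a whole column
        raise TypeError("unsupported key type: %r" % type(key))


table = MiniTable({"var1": i, "var2": i * i} for i in range(8))
short = [row["var2"] for row in table(step=4) if row["var1"] <= 20]
full = [row["var2"] for row in table.iterrows(step=4) if row["var1"] <= 20]
print(short == full)        # the two spellings select the same rows
print(table[4])             # integer key: a single row
print(table[1:6:2])         # slice key: several rows
print(table["var2"])        # string key: one column
```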
4.5.4 The Row class
This class is used to fetch and set values on the table
fields. It works very much like a dictionary, where the keys
are the field names of the associated table and the values
are the values of those fields in a specific row.
This object is actually an extension type, so
you won't be able to access its documentation
interactively. Nor will you be able to access its
internal attributes (they are not directly accessible from
Python), although accessors (i.e. methods that
return an internal attribute) have been defined for the most
important variables.
Row
methods
- append()
- Once you
have filled the proper fields for the current row, calling
this method commits the data to disk
(strictly speaking, the data is written to the output buffer).
- nrow()
- Accessor that returns the current
row in the table. It is useful to know which row is being
dealt with in the middle of a loop.
4.5.5 The Cols class
This class is used as an accessor to the table
columns following the natural naming convention: it has
one attribute per column, named after that column, whose
value is the associated
Column instance. Besides, and like the
Row class, it works similarly to a dictionary,
where the keys are the column names of the associated
table and the values are Column
instances. See section 4.5.6
for examples of use.
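The combination of attribute access and dictionary-style access can be sketched as follows. MiniCols and MiniColumn are hypothetical stand-ins that hold their data in memory; they only illustrate the natural naming idea and are not the PyTables classes. The sketch uses current Python 3 syntax.

```python
class MiniColumn:
    """Toy read-only column backed by a Python list."""

    def __init__(self, name, values):
        self.name = name
        self._values = list(values)

    def __getitem__(self, key):
        # Accepts an integer index or a slice, like a regular list.
        return self._values[key]


class MiniCols:
    """Toy accessor: one attribute (and one key) per column name."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getattr__(self, name):
        # Only called when the normal attribute lookup fails.
        columns = self.__dict__.get("_columns", {})
        try:
            return columns[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name):
        return self._columns[name]


cols = MiniCols({
    "lati": MiniColumn("lati", [10, 11, 12]),
    "pressure": MiniColumn("pressure", [100.0, 121.0, 144.0]),
})
print(cols.lati[1])              # natural naming
print(cols["pressure"][1:3])     # dictionary-style access
```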
4.5.6 The Column class
Each instance of this class is associated with one column
of every table. These instances are used to fetch (but not
set) actual data from the table columns. The access
interface is like a regular list, and you can select
individual values or slices.
Column instance
variables
- table
- The parent Table
instance.
- name
- The name of the associated
column.
Column
methods
- __getitem__(key)
- Returns a column
element or slice. It takes different actions depending
on the type of the key parameter:
If key is an integer, the corresponding
element in the column is returned as a scalar object
or as a NumArray/CharArray
object, depending on its shape. If key is a
slice, the row range determined by this slice is
returned as a NumArray or
CharArray object (whichever is
appropriate).
Example of use:
print "Column handlers:"
for name in table.colnames:
    print table.cols[name]
print
print "Some selections:"
print "Select table.cols.name[1]-->", table.cols.name[1]
print "Select table.cols.name[1:2]-->", table.cols.name[1:2]
print "Select table.cols.lati[1:3]-->", table.cols.lati[1:3]
print "Select table.cols.pressure[:]-->", table.cols.pressure[:]
print "Select table.cols['temperature'][:]-->", table.cols['temperature'][:]
and the output of this for a certain arbitrary table is:
Column handlers:
/table.cols.name (Column(1,), CharType)
/table.cols.lati (Column(2,), Int32)
/table.cols.longi (Column(1,), Int32)
/table.cols.pressure (Column(1,), Float32)
/table.cols.temperature (Column(1,), Float64)
Some selections:
Select table.cols.name[1]--> Particle: 11
Select table.cols.name[1:2]--> ['Particle: 11']
Select table.cols.lati[1:3]--> [[11 12]
[12 13]]
Select table.cols.pressure[:]--> [ 90. 110. 132.]
Select table.cols['temperature'][:]--> [ 100. 121. 144.]
See
examples/table2.py for a more
complete example.
4.6 The Array
class
Represents an array on file. It provides methods to
write/read data to/from array objects in the file. This
class does not allow enlarging the datasets on disk; see
the EArray descendant in section 4.7 if you want
enlargeable dataset support and/or compression features.
Caveat: All
Numeric and numarray data types
are supported except those that correspond to complex data
types5). See the numarray manual to learn more about the supported
data types, or see appendix A.
Note that this object inherits all the public attributes
and methods that Leaf already provides.
4.6.1 Array instance
variables
- flavor
- The object representation for
this array. It can be any of the "NumArray",
"CharArray", "Numeric",
"List", "Tuple", "String",
"Int" or "Float" values.
- nrows
- The length of the first dimension
of Array.
- nrow
- On iterators, this is the index of
the current row.
- type
- The type class of the represented
array.
- itemsize
- The size of the base
items. Especially useful for CharArray
objects.
4.6.2 Array
methods
Note that, as this object has no internal I/O buffers, it
is not necessary to use the flush() method inherited from
Leaf.
iterrows(start=None,
stop=None, step=1)
Returns an iterator yielding numarray
instances built from rows in array. The returned rows are
taken from the first dimension in case of an
Array instance and the enlargeable
dimension in case of an EArray instance. If
a range is supplied (i.e. some of the start,
stop or step parameters are passed),
only the appropriate rows are returned. Else, all the
rows are returned. See also the __call__()
and __iter__() special methods in section 4.6.3 for shorter
ways to call this iterator.
The meaning of the start, stop and
step parameters is the same as in the
range() python function, except that
negative values of step are not
allowed. Moreover, if only start is
specified, then stop will be set to
start+1. If you specify neither
start nor stop, then all the rows in
the object are selected.
read(start=None, stop=None, step=1)
Read the array from disk and return it as a
numarray (default) object, or as an object
with the same flavor it had when it was
saved. It accepts start, stop and step parameters to
select rows (the first dimension in the case of an
Array instance and the enlargeable
dimension in case of an EArray) for
reading.
The meaning of the start, stop and
step parameters is the same as in the
range() python function, except that
negative values of step are not
allowed. Moreover, if only start is
specified, then stop will be set to
start+1. If you specify neither
start nor stop, then all the rows in
the object are selected.
4.6.3 Array special
methods
The methods described below are triggered automatically
when an Array instance is
accessed in a special way (e.g.,
array[2:3,...,::2] is equivalent to a
call to
array.__getitem__((slice(2,3, None),
Ellipsis, slice(None, None, 2)))).
__call__(start=None,
stop=None, step=1)
It returns the same iterator as
Array.iterrows(start, stop, step). It is,
therefore, a shorter way to call it.
Example of use:
result = [ row for row in arrayInstance(step=4) ]
Which is equivalent to:
result = [ row for row in arrayInstance.iterrows(step=4) ]
__iter__()
It returns the same iterator as
Array.iterrows(0,0,1). However, this does not
accept parameters.
Example of use:
result = [ row[2] for row in array ]
Which is equivalent to:
result = [ row[2] for row in array.iterrows(0, 0, 1) ]
__getitem__(key)
It returns a numarray (default) object (or
an object with the same flavor it had when it
was saved) containing the slice of rows stated in the
key parameter. The set of allowed tokens in
key is the same as for extended slicing in
Python (the Ellipsis token included).
Example of use:
array1 = array[4] # array1.shape == array.shape[1:]
array2 = array[4:1000:2] # len(array2.shape) == len(array.shape)
array3 = array[::2, 1:4, :]
array4 = array[1, ..., ::2, 1:4, 4:] # General slice selection
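To see what key actually contains for each of these forms, a tiny class that simply hands back whatever __getitem__ receives is enough. KeyEcho is a hypothetical helper written for this illustration in current Python 3 syntax; it involves no PyTables object at all.

```python
class KeyEcho:
    """Return the key exactly as Python passes it to __getitem__."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()
print(echo[4])                  # a plain integer
print(echo[4:1000:2])           # a single slice object
print(echo[2:3, ..., ::2])      # a tuple of slices and Ellipsis
```

An extended slice with several comma-separated tokens therefore arrives as one tuple, which is the form a __getitem__ implementation has to unpack.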
4.7 The EArray class
This is a child of the Array class (see 4.6) and as such,
EArray represents an array on the file. The
difference is that EArray allows you to enlarge
datasets along any single dimension6) you select. Another important difference is
that it also supports compression.
So, in addition to the attributes and methods that
EArray inherits from Array, it
supports a few more that provide a way to enlarge the
arrays on disk. The new variables and methods are
described below, as well as some that already exist in
Array but differ somewhat in meaning
and/or functionality in the EArray context.
4.7.1 EArray instance
variables
- atom
- The class instance chosen for the
atom object (see section 4.11.3).
- extdim
- The enlargeable dimension.
- nrows
- The length of the enlargeable
dimension.
4.7.2 EArray
methods
append(object)
Appends an object to the underlying
dataset. Obviously, this object has to have the same
type as the EArray instance; if not, a
TypeError is raised. In the same way, the
dimensions of the object have to conform to
those of EArray, that is, all the
dimensions have to be the same except, of course, that of
the enlargeable dimension, which can be of any length
(even 0!).
Example of use (code available in
examples/earray1.py):
import tables
from numarray import strings
fileh = tables.openFile("earray1.h5", mode = "w")
a = tables.StringAtom(shape=(0,), length=8)
# Use 'a' as the object type for the enlargeable array
array_c = fileh.createEArray(fileh.root, 'array_c', a, "Chars")
array_c.append(strings.array(['a'*2, 'b'*4], itemsize=8))
array_c.append(strings.array(['a'*6, 'b'*8, 'c'*10], itemsize=8))
# Read the string EArray we have created on disk
for s in array_c:
    print "array_c[%s] => '%s'" % (array_c.nrow, s)
# Close the file
fileh.close()
and the output is:
array_c[0] => 'aa'
array_c[1] => 'bbbb'
array_c[2] => 'aaaaaa'
array_c[3] => 'bbbbbbbb'
array_c[4] => 'cccccccc'
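The shape conformity rule for append() can be written down as a small check. The function below is a hypothetical sketch in current Python 3 syntax, not PyTables code. Raising ValueError on a shape mismatch is an assumption, since the text only names the exception used for a type mismatch.

```python
def check_appendable(earray_shape, extdim, obj_shape):
    """Return how many rows obj would add along the enlargeable dimension.

    Hypothetical helper: every dimension except extdim must match.
    """
    if len(obj_shape) != len(earray_shape):
        raise ValueError("object must have %d dimensions" % len(earray_shape))
    for axis, (have, want) in enumerate(zip(obj_shape, earray_shape)):
        if axis != extdim and have != want:
            raise ValueError("dimension %d is %d, expected %d"
                             % (axis, have, want))
    # The enlargeable dimension may have any length, including 0.
    return obj_shape[extdim]


print(check_appendable((0, 2), 0, (3, 2)))   # three new rows
print(check_appendable((0, 2), 0, (0, 2)))   # an empty append is allowed
```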
4.8 The VLArray class
Instances of this class represent array objects in the
object tree with the property that their rows can have a
variable number of
(homogeneous) elements (called atomic objects, or
just atoms). Variable length arrays (or
VLA's for short), similarly to Table
instances, can have only one dimension, and as with
Table, the compound elements (the
atoms) of the rows of VLArrays can be
fully multidimensional objects.
VLArray provides methods to read/write data
from/to variable length array objects residing on disk.
Also, note that this object inherits all the public
attributes and methods that Leaf already has.
4.8.1 VLArray instance
variables
- atom
- The class instance chosen for the
atom object (see section 4.11.3).
- nrow
- On iterators, this is the index of
the current row.
- nrows
- The total number of rows.
4.8.2 VLArray methods
append(object1, object2, ...)
Append the objects passed as parameters to
a single row in the VLArray instance. The
type of the objects has to be compliant with the
VLArray.atom instance type.
Example of use (code available in
examples/vlarray1.py):
import tables
from Numeric import * # or, from numarray import *
# Create a VLArray:
fileh = tables.openFile("vlarray1.h5", mode = "w")
vlarray = fileh.createVLArray(fileh.root, 'vlarray1',
                              tables.Int32Atom(flavor="Numeric"),
                              "ragged array of ints",
                              tables.Filters(complevel=1))
# Append some (variable length) rows
# All these different parameter specifications are accepted:
vlarray.append(array([5, 6]))
vlarray.append(array([5, 6, 7]))
vlarray.append([5, 6, 9, 8])
vlarray.append(5, 6, 9, 10, 12)
# Now, read it through an iterator
for x in vlarray:
    print vlarray.name+"["+str(vlarray.nrow)+"]-->", x
# Close the file
fileh.close()
And the output for this looks like:
vlarray1[0]--> [5 6]
vlarray1[1]--> [5 6 7]
vlarray1[2]--> [5 6 9 8]
vlarray1[3]--> [ 5 6 9 10 12]
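The two calling styles accepted by append() in the example above (one sequence, or several separate values) can be modelled with a plain Python list of lists. The helper as_row is hypothetical and the rule it applies is an assumption; the text does not state how PyTables itself tells the two styles apart. Current Python 3 syntax is used.

```python
def as_row(*objects):
    """Turn the arguments of an append() call into one row (a list).

    Assumption: a single list or tuple argument is taken as the row
    itself; otherwise all the arguments together form the row.
    """
    if len(objects) == 1 and isinstance(objects[0], (list, tuple)):
        return list(objects[0])
    return list(objects)


ragged = []                              # stands in for the VLArray
ragged.append(as_row([5, 6]))
ragged.append(as_row([5, 6, 7]))
ragged.append(as_row([5, 6, 9, 8]))
ragged.append(as_row(5, 6, 9, 10, 12))

for nrow, row in enumerate(ragged):
    print("ragged[%d]--> %s" % (nrow, row))
```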
iterrows(start=None,
stop=None, step=1)
Returns an iterator yielding one row per iteration. If
a range is supplied (i.e. some of the start,
stop or step parameters are passed),
only the appropriate rows are returned. Else, all the
rows are returned. See also the __call__()
and __iter__() special methods in section 4.8.3 for shorter
ways to call this iterator.
The meaning of the start, stop and
step parameters is the same as in the
range() python function, except that
negative values of step are not
allowed. Moreover, if only start is
specified, then stop will be set to
start+1. If you specify neither
start nor stop, then all the rows in
the object are selected.
read(start=None, stop=None,
step=1)
Returns the actual data in VLArray. As the
lengths of the different rows are variable, the returned
value is a python list, with as many entries as
specified rows in the range parameters.
The meaning of the start, stop and
step parameters is the same as in the
range() python function, except that
negative values of step are not
allowed. Moreover, if only start is
specified, then stop will be set to
start+1. If you specify neither
start nor stop, then all the rows in
the object are selected.
4.8.3 VLArray special
methods
The methods described below are triggered automatically
when a VLArray instance is
accessed in a special way (e.g., vlarray[2:5]
is equivalent to a call to
vlarray.__getitem__(slice(2,5,None))).
__call__(start=None,
stop=None, step=1)
It returns the same iterator as
VLArray.iterrows(start, stop, step). It is,
therefore, a shorter way to call it.
Example of use:
for row in vlarray(step=4):
    print vlarray.name+"["+str(vlarray.nrow)+"]-->", row
Which is equivalent to:
for row in vlarray.iterrows(step=4):
    print vlarray.name+"["+str(vlarray.nrow)+"]-->", row
__iter__()
It returns the same iterator as
VLArray.iterrows(0,0,1). However, this does
not accept parameters.
Example of use:
result = [ row for row in vlarray ]
Which is equivalent to:
result = [ row for row in vlarray.iterrows() ]
__getitem__(key)
It returns the slice of rows determined by
key, which can be an integer index or an
extended slice. The returned value is a list of objects
of type array.atom.type.
Example of use:
list1 = vlarray[4]
list2 = vlarray[4:1000:2]
4.9 The UnImplemented class
Instances of this class represent an unimplemented dataset
in a generic HDF5 file. When reading such a file (i.e. one
that has not been created with PyTables, but
with some other HDF5 library based tool), chances are that
the specific combination of datatypes and/or
dataspaces in some dataset might not be supported
by PyTables yet. In such a case, this dataset
will be mapped into the UnImplemented class and
hence, the user will still be able to build the complete
object tree of this generic HDF5 file, as well as
access (both read and write) the attributes
of this dataset and some metadata. Of course, the user won't
be able to read the actual data in it.
This is an elegant way to allow users to work with generic
HDF5 files even though some of their datasets may
not be supported by PyTables. However, if you
are really interested in having access to an unimplemented
dataset, please get in contact with the developer team.
This class does not have any public instance variables,
except those inherited from the Leaf class
(see 4.4).
4.10 The AttributeSet
class
Represents the set of attributes of a node (Leaf or
Group). It provides methods to create new attributes, open,
rename or delete existing ones.
As in Group instances,
AttributeSet instances make use of the
natural naming convention, i.e. you can access the
attributes on disk as if they were normal
AttributeSet attributes. This offers the user
a very convenient way to access (but also to set and
delete) node attributes by simply specifying them like
normal class attributes.
Caveat: All Python data types
are supported. The scalar ones (i.e. String, Int and Float)
are mapped directly to the HDF5 counterparts, so you can
correctly visualize them with any HDF5 tool. However, the
rest of the data types and more general objects are
serialized using cPickle, so you will be able
to correctly retrieve them only from a Python-aware HDF5
library. Hopefully, the list of supported native attributes
will be extended to fully multidimensional arrays sometime
in the future.
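The split between natively stored scalars and pickled objects can be sketched as below. The functions encode_attr and decode_attr are hypothetical, and the standard pickle module of current Python 3 stands in for the cPickle module named in the text. The sketch only shows why a pickled attribute is opaque to tools that do not understand Python pickles.

```python
import pickle

# Scalar types that, per the text, map directly to HDF5 counterparts.
NATIVE_TYPES = (str, int, float)


def encode_attr(value):
    """Return a (kind, payload) pair for an attribute value."""
    if isinstance(value, NATIVE_TYPES):
        return ("native", value)
    # Anything else is serialized into an opaque byte string.
    return ("pickled", pickle.dumps(value))


def decode_attr(kind, payload):
    """Invert encode_attr()."""
    return payload if kind == "native" else pickle.loads(payload)


print(encode_attr("string attr")[0])
kind, payload = encode_attr({"units": "K", "range": (0, 300)})
print(kind, type(payload).__name__)
print(decode_attr(kind, payload))
```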
4.10.1 AttributeSet instance
variables
- _v_node
- The parent node instance.
- _v_attrnames
- List with all attribute
names.
- _v_attrnamessys
- List with system attribute
names.
- _v_attrnamesuser
- List with user attribute
names.
4.10.2 AttributeSet
methods
Note that this class defines the
__setattr__,
__getattr__ and
__delattr__ methods, and
they work as normally intended. So, you can access, assign
or delete attributes on disk by just using the following
constructs:
leaf.attrs.myattr = "string attr" # Set the attribute myattr
attrib = leaf.attrs.myattr # Get the attribute myattr
del leaf.attrs.myattr # Delete the attribute myattr
- _f_copy(where)
- Copy
the user attributes to where
object. where has to be a Group or
Leaf instance.
- _f_list(attrset = "user")
- Return the list of attributes of the parent
node. attrset selects the attribute set to be
returned. A "user" value returns only the
user attributes and this is the
default. "sys" returns only the system
attributes (some of which are read-only).
"readonly" returns the system
read-only attributes. "all" returns both the
system and user attributes.
- _f_rename(oldattrname,
newattrname)
- Rename an attribute.
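How a class can route attribute syntax to a backing store is sketched below. MiniAttributeSet is hypothetical: it keeps attributes in a dictionary instead of an HDF5 file and ignores the distinction between system and user attributes. Its _f_list() and _f_rename() only imitate the names of the methods listed above. Current Python 3 syntax is used.

```python
class MiniAttributeSet:
    """Toy attribute set backed by a dictionary (not the PyTables class)."""

    def __init__(self):
        # Bypass our own __setattr__ when creating the internal store.
        object.__setattr__(self, "_store", {})

    def __setattr__(self, name, value):
        self.__dict__["_store"][name] = value

    def __getattr__(self, name):
        # Only called when the normal attribute lookup fails.
        try:
            return self.__dict__["_store"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __delattr__(self, name):
        try:
            del self.__dict__["_store"][name]
        except KeyError:
            raise AttributeError(name) from None

    def _f_list(self):
        return sorted(self.__dict__["_store"])

    def _f_rename(self, oldattrname, newattrname):
        store = self.__dict__["_store"]
        store[newattrname] = store.pop(oldattrname)


attrs = MiniAttributeSet()
attrs.myattr = "string attr"          # set
print(attrs.myattr)                   # get
attrs._f_rename("myattr", "title")    # rename
print(attrs._f_list())
del attrs.title                       # delete
print(attrs._f_list())
```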
4.11 Declarative classes
This section describes a series of classes that are meant to
declare the datatypes required by primary
PyTables objects (like Table or
VLArray).
4.11.1 The IsDescription
class
This class is in fact a so-called metaclass
object. There is nothing special about this fact, except that
the attributes of its subclasses are transformed during their
instantiation phase, and new methods for instances are
defined based on the values of the class attributes.
It is designed to be used as an easy, yet meaningful way
to describe the properties of Table objects
through the use of classes that inherit properties from
it. In order to define such a special class, you have to
declare it as a descendant of IsDescription, with
as many attributes as columns you want in your table. The
names of these attributes will become the names of the
columns, while their values are the properties of the
columns, which are obtained through the use of the
Col class constructor. See section 4.11.2 for instructions on
how to define the properties of the table columns.
Then, you can pass an instance of this object to the
Table constructor, where all the information it
contains will be used to define the table structure. See
the section 3.3 for an example
on how that works.
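The general technique of collecting class attributes into a column description at class creation time can be sketched with a small metaclass. Everything here (MiniCol, DescriptionMeta, MiniDescription, and the columns and colnames attributes) is hypothetical and written in current Python 3 syntax. It shows the idea only and is not how IsDescription is implemented.

```python
class MiniCol:
    """Toy column declaration: just a type name and a shape."""

    def __init__(self, dtype="Float64", shape=1):
        self.dtype = dtype
        self.shape = shape


class DescriptionMeta(type):
    """Collect every MiniCol class attribute when a subclass is created."""

    def __new__(mcls, name, bases, namespace):
        columns = {key: value for key, value in namespace.items()
                   if isinstance(value, MiniCol)}
        cls = super().__new__(mcls, name, bases, namespace)
        cls.columns = columns
        # Alphabetical order of the column names, the default the manual
        # describes for column arrangement.
        cls.colnames = sorted(columns)
        return cls


class MiniDescription(metaclass=DescriptionMeta):
    pass


class Particle(MiniDescription):
    name = MiniCol("CharType", shape=16)
    lati = MiniCol("Int32")
    pressure = MiniCol("Float32")


print(Particle.colnames)
print(Particle.columns["lati"].dtype)
```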
4.11.2 The Col class and its descendants
The Col class is used as a means to declare
the different properties of a table column. In addition, a
series of descendant classes are offered in order to make
these column descriptions easier for the user. In general,
it is recommended to use these descendant classes, as they
are more meaningful when found in the middle of the code.
Note that the only public method accessible in these
classes is the constructor itself.
- Col(dtype="Float64", shape=1, dflt=None, pos=None)
- Declare the properties of a Table
column.
- dtype
- The data type for the
column. See appendix A for a
list of data types supported in an IsDescription class
declaration. The type description is accepted both in
string format and as a numarray data type.
- shape
- An integer or a tuple that
specifies the number of dtype items for
each element (or shape, for multidimensional
elements) of this column. For CharType
columns, the last dimension is used as the length
of the character strings. However, for this kind of
object, the use of the StringCol subclass
is strongly recommended.
- dflt
- The default value for elements
of this column. If the user does not supply a value
for an element while filling a table, this default
value will be written to disk. If the user supplies a
scalar value for a multidimensional column, this value
is automatically broadcast to all the
elements in the column cell. If dflt is not
supplied, an appropriate zero value (or null
string) will be chosen by default.
- pos
- By default, columns are arranged
in memory following an alpha-numerical order of the
column names. In some situations, however, it is
convenient to impose a user-defined
ordering. The pos parameter allows the user to
force the desired ordering.
- StringCol(length=None, dflt=None, shape=1, pos=None)
- Declare a column to be of type
CharType. The length parameter
sets the length of the strings. The meaning of the other
parameters is the same as in the Col class.
- BoolCol(dflt=0, shape=1, pos=None)
- Define a column to be of type Bool.
The meaning of the parameters is the same as in
the Col class.
- IntCol(dflt=0, shape=1, itemsize=4, sign=1, pos=None)
- Declare a column to be of type IntXX,
depending on the value of the itemsize parameter,
which sets the number of bytes of the integers in the
column. sign determines whether the integers
are signed or not. The meaning of the other parameters
is the same as in the Col class.
This class has several descendants:
- Int8Col(dflt=0, shape=1, pos=None)
- Define a column of type Int8.
- UInt8Col(dflt=0, shape=1, pos=None)
- Define a column of type UInt8.
- Int16Col(dflt=0, shape=1, pos=None)
- Define a column of type Int16.
- UInt16Col(dflt=0, shape=1, pos=None)
- Define a column of type UInt16.
- Int32Col(dflt=0, shape=1, pos=None)
- Define a column of type Int32.
- UInt32Col(dflt=0, shape=1, pos=None)
- Define a column of type UInt32.
- Int64Col(dflt=0, shape=1, pos=None)
- Define a column of type Int64.
- UInt64Col(dflt=0, shape=1, pos=None)
- Define a column of type UInt64.
- FloatCol(dflt=0, shape=1, itemsize=8, pos=None)
- Define a column to be of type FloatXX,
depending on the value of itemsize. The
itemsize parameter sets the number of bytes
of the floats in the column and the default is 8 bytes
(double precision). The meaning of the other parameters
is the same as in the Col class.
This class has two descendants:
- Float32Col(dflt=0.0, shape=1, pos=None)
- Define a column of type Float32.
- Float64Col(dflt=0.0, shape=1, pos=None)
- Define a column of type Float64.
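The relation between itemsize, sign and the resulting type names in the list above can be sketched as follows. The two functions are hypothetical helpers written in current Python 3 syntax. They only reproduce the naming pattern visible in this section (bytes times 8, with a U prefix for unsigned integers) and make no claim about how PyTables resolves the type internally.

```python
def int_type_name(itemsize=4, sign=1):
    """Name of the integer type for a given size in bytes and signedness."""
    if itemsize not in (1, 2, 4, 8):
        raise ValueError("itemsize must be 1, 2, 4 or 8 bytes")
    prefix = "Int" if sign else "UInt"
    return "%s%d" % (prefix, itemsize * 8)


def float_type_name(itemsize=8):
    """Name of the float type for a given size in bytes."""
    if itemsize not in (4, 8):
        raise ValueError("itemsize must be 4 or 8 bytes")
    return "Float%d" % (itemsize * 8)


print(int_type_name())               # defaults: 4 bytes, signed
print(int_type_name(2, sign=0))      # 2 bytes, unsigned
print(float_type_name())             # default: 8 bytes
```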
4.11.3 The Atom class and its
descendants
The Atom class is meant to declare the
different properties of the base element (also
known as atom) of EArray and
VLArray objects. Atom
instances have the property that their length is always the
same. However, you can grow objects along the extendable
dimension in the case of EArray or put a
variable number of them on a VLArray
row. Moreover, the atoms are not restricted to scalar
values, and they can be fully multidimensional objects.
A series of descendant classes are offered in order to
make the use of these element descriptions easier. In
general, it is recommended to use these descendant
classes, as they are more meaningful when found in the
middle of the code. Note that the only public methods
accessible in these classes are the
atomsize() method and the constructor
itself. The atomsize() method returns the
total length, in bytes, of the element base atom.
A description of the different constructors with their
parameters follows:
- Atom(dtype="Float64", shape=1, flavor="NumArray")
- Define properties for the base elements of
EArray and VLArray objects.
- dtype
- The data type for the base
element. See appendix A for a
list of the data types supported. The type
description is accepted both in string format and as a
numarray data type.
- shape
- In an EArray
context, it is a tuple
specifying the shape of the object, and one (and only
one) of its dimensions must be 0, meaning that the
EArray object will be enlarged along
this axis. In the case of a VLArray, it
can be an integer with a value of 1 (one) or a
tuple, which specifies whether the atom is a scalar
(in the case of a 1) or has multiple dimensions (in
the case of a tuple). For
CharType elements, the last dimension
is used as the length of the character
strings. However, for this kind of object, the use
of the StringAtom subclass is strongly
recommended.
- flavor
- The object representation for this atom. It
can be either "CharArray" or
"String" for the CharType type
and "NumArray", "Numeric",
"List" or "Tuple" for the rest of
the types. If the specified value differs from
the CharArray or NumArray values, the
read atoms will be converted to that specific
flavor. If not specified, the atoms will remain in
their native format (i.e. CharArray or
NumArray).
- StringAtom(shape=1, length=None,
flavor="CharArray")
- Define an atom to be of
CharType type. The meaning of the
shape parameter is the same as in the
Atom class. length sets the length
of the string atoms. flavor can be either
"CharArray" or
"String". Unicode strings are not supported
by this type; see the VLStringAtom class if
you want Unicode support (only available for
VLArray objects).
- BoolAtom(shape=1, flavor="NumArray")
- Define an atom to be of type Bool.
The meaning of the parameters is the same as in
the Atom class.
- IntAtom(shape=1, itemsize=4, sign=1,
flavor="NumArray")
- Define an atom to be of
type IntXX, depending on the value of the
itemsize parameter, which sets the number of
bytes of the integers that make up the
atom. sign determines whether the integers are
signed or not. The meaning of the other parameters is
the same as in the Atom class.
This class has several descendants:
- Int8Atom(shape=1, flavor="NumArray")
- Define an atom of type Int8.
- UInt8Atom(shape=1, flavor="NumArray")
- Define an atom of type UInt8.
- Int16Atom(shape=1, flavor="NumArray")
- Define an atom of type Int16.
- UInt16Atom(shape=1, flavor="NumArray")
- Define an atom of type UInt16.
- Int32Atom(shape=1, flavor="NumArray")
- Define an atom of type Int32.
- UInt32Atom(shape=1, flavor="NumArray")
- Define an atom of type UInt32.
- Int64Atom(shape=1, flavor="NumArray")
- Define an atom of type Int64.
- UInt64Atom(shape=1, flavor="NumArray")
- Define an atom of type UInt64.
- FloatAtom(shape=1, itemsize=8, flavor="NumArray")
- Define an atom to be of FloatXX
type, depending on the value of itemsize. The
itemsize parameter sets the number of bytes
of the floats in the atom and the default is 8 bytes
(double precision). The meaning of the other parameters
is the same as in the Atom class.
This class has two descendants:
- Float32Atom(shape=1, flavor="NumArray")
- Define an atom of type Float32.
- Float64Atom(shape=1, flavor="NumArray")
- Define an atom of type Float64.
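The rule that an EArray atom shape must contain exactly one 0 can be checked with a few lines of Python. The function enlargeable_dim is a hypothetical helper in current Python 3 syntax; raising ValueError for an invalid shape is an assumption made for this sketch.

```python
def enlargeable_dim(shape):
    """Return the index of the single 0 entry in an EArray atom shape."""
    zeros = [axis for axis, length in enumerate(shape) if length == 0]
    if len(zeros) != 1:
        raise ValueError("one (and only one) dimension must be 0, got %r"
                         % (shape,))
    return zeros[0]


print(enlargeable_dim((0, 2)))      # rows are added along the first axis
print(enlargeable_dim((3, 0, 5)))   # rows are added along the second axis
```

A shape of (0, 2) gives an enlargeable dimension of 0, the value reported as extdim in the example output of section 4.12.1.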
Now, there come two special classes,
ObjectAtom and VLStringAtom, that
actually do not descend from Atom, but whose
goal is so similar that they should be described here. The
difference between them and the Atom class and its
descendants is that these special classes do not
allow multidimensional atoms, nor multiple values per
row. A flavor can't be specified either, as it is
immutable (see below).
Caveat emptor: You are only
allowed to use these classes to create
VLArray objects, not EArray
objects.
- ObjectAtom()
- This class is meant to fit
any kind of object in a row of a
VLArray instance by using
cPickle behind the scenes. Because
you cannot foresee how long the output of
the cPickle serialization will be (i.e. the atom
already has a variable length), you can only
fit one object of this kind per row. However, you can still
pass several parameters to the
VLArray.append() method, as they will be
regarded as a tuple of compound objects (the
parameters), so that we still have only one object to be
saved in a single row. It does not accept parameters and
its flavor is automatically set to
"Object", so reads of rows always
return an arbitrary Python object.
You can regard ObjectAtom types as an easy
way to save an arbitrary number of generic Python
objects in a VLArray object.
- VLStringAtom()
- This class describes a
row of the VLArray class, rather
than an atom. It differs from the
StringAtom class in that you can only add
one instance of it to one specific row, i.e. the
VLArray.append() method only accepts one
object when the base atom is of this type. Besides, it
supports Unicode strings (unlike
StringAtom) because it uses the UTF-8
encoding (this is why its atomsize()
method always returns 1) when serializing to disk. It
does not accept any parameter and, because its
flavor is automatically set to
"VLString", reads of rows always
return a Python string. See appendix B.3.4 if you are
curious about how this is implemented at the low level.
You can regard VLStringAtom types as an
easy way to save generic variable length strings.
See examples/vlarray1.py and
examples/vlarray2.py for further examples on
VLArrays, including object serialization and
Unicode string management.
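Why both special atoms produce rows of unpredictable length can be seen with the standard library alone. The two functions below are hypothetical and use current Python 3 syntax, with the pickle module standing in for cPickle. They illustrate the serialization step described above, not the actual on-disk layout.

```python
import pickle


def object_row(*objects):
    """Serialize the arguments of one append() call into a single row.

    Several arguments are regarded as one tuple, so there is still
    exactly one pickled object per row.
    """
    payload = objects[0] if len(objects) == 1 else objects
    return pickle.dumps(payload)


def vlstring_row(text):
    """Encode a (possibly non-ASCII) string as UTF-8 bytes."""
    return text.encode("utf-8")


print(pickle.loads(object_row({"a": 1})))
print(pickle.loads(object_row(1, "two", [3])))

word = "\u00f1and\u00fa"           # five characters, two of them non-ASCII
encoded = vlstring_row(word)
print(len(word), len(encoded))     # the byte count exceeds the character count
```

Because the base unit of the encoded string is a single byte, a per-element size of 1 is consistent with the atomsize() value mentioned above.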
4.12 Helper classes
This section lists classes that do not fit in any
other section and that mainly serve ancillary
purposes.
4.12.1 The Filters class
This class is meant to serve as a container that keeps
information about the filter properties associated with
the enlargeable leaves, that is Table,
EArray and VLArray.
The public variables of Filters are listed
below:
- complevel
- The compression level (0
means no compression).
- complib
- The compression filter used (in
case of compressed dataset).
- shuffle
- Whether the shuffle filter is
active or not.
- fletcher32
- Whether the fletcher32
filter is active or not.
There are no Filters public methods with the
exception of the constructor itself that is described
next.
Filters(complevel=0,
complib="zlib", shuffle=1, fletcher32=0)
The parameters that can be passed to the
Filters class constructor are:
- complevel
- Specifies a compression level
for data. The allowed range is 0-9. A value of 0
disables compression, and this is the default.
- complib
- Specifies the compression
library to be used. Right now, "zlib"
(default), "lzo" and "ucl"
values are supported. See section 5.2 for some advice
on which library is better suited to your needs.
- shuffle
- Whether or not to use the
shuffle filter present in the
HDF5 library. This is normally used to
improve the compression ratio (at the cost of
consuming a little bit more CPU time). A value of 0
disables shuffling and 1 makes it active. The default
value depends on whether compression is enabled or
not; if compression is enabled, shuffling defaults to
be active, else shuffling is disabled.
- fletcher32
- Whether or not to use the
fletcher32 filter in the HDF5 library. This
is used to add a checksum on each data chunk. A value
of 0 disables the checksum and it is the default.
Of course, you can also create an instance and then
assign new values to the attributes you want to change. For example:
import numarray as na
from tables import *
fileh = openFile("test5.h5", mode = "w")
atom = Float32Atom(shape=(0,2))
filters = Filters(complevel=1, complib = "lzo")
filters.fletcher32 = 1
arr = fileh.createEArray(fileh.root, 'earray', atom, "A growable array",
filters = filters)
# Append several rows in only one call
arr.append(na.array([[1., 2.],
[2., 3.],
[3., 4.]], type=na.Float32))
# Print information on that enlargeable array
print "Result Array:"
print repr(arr)
fileh.close()
This enforces the use of the
LZO library, a
compression level of 1 and a fletcher32 checksum filter
as well. See the output of this example:
Result Array:
/earray (EArray(3L, 2), fletcher32, shuffle, lzo(1)) 'A growable array'
type = Float32
shape = (3L, 2)
itemsize = 4
nrows = 3
extdim = 0
flavor = 'NumArray'
byteorder = 'little'
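The idea behind the shuffle filter and the compression level can be tried out with the standard library. The sketch below, in current Python 3 syntax, implements a byte shuffle by hand (grouping the first byte of every item, then the second, and so on) and compresses the data with zlib at level 1. The functions shuffle and unshuffle are hypothetical helpers written for this illustration; the HDF5 library applies its own filter implementation per chunk.

```python
import struct
import zlib


def shuffle(data, itemsize):
    """Group byte 0 of every item, then byte 1 of every item, and so on."""
    if len(data) % itemsize:
        raise ValueError("data length must be a multiple of itemsize")
    return bytes(data[j]
                 for i in range(itemsize)
                 for j in range(i, len(data), itemsize))


def unshuffle(data, itemsize):
    """Invert shuffle()."""
    nitems = len(data) // itemsize
    return bytes(data[i * nitems + j]
                 for j in range(nitems)
                 for i in range(itemsize))


# 1000 single-precision floats, 4 bytes each, little-endian.
values = struct.pack("<1000f", *[float(i) for i in range(1000)])

plain = zlib.compress(values, 1)                  # like complevel=1, shuffle=0
shuffled = zlib.compress(shuffle(values, 4), 1)   # like complevel=1, shuffle=1

print("raw bytes:          ", len(values))
print("zlib only:          ", len(plain))
print("shuffle then zlib:  ", len(shuffled))
```

Reading the data back means decompressing first and then applying unshuffle() with the same itemsize.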