Working with groups
Zarr supports hierarchical organization of arrays via groups. As with arrays, groups can be stored in memory, on disk, or via other storage systems that support a similar interface.
To create a group, use the zarr.group function:
Groups have a similar API to the Group class from h5py. For example, groups can contain other groups:
Groups can also contain arrays, e.g.:
z1 = bar.create_array(name='baz', shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')
print(z1)
Members of a group can be accessed with square-bracket (subscript) notation, e.g.:
The '/' character can be used to access multiple levels of the hierarchy in one call, e.g.:
The zarr.Group.tree method can be used to print a tree representation of the hierarchy, e.g.:
The zarr.open_group function provides a convenient way to create or re-open a group stored in a directory on the file system, with sub-groups stored in sub-directories, e.g.:
z = root.create_array(name='foo/bar/baz', shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')
print(z)
For more information on groups see the zarr.Group API docs.
Batch Group Creation
You can also create multiple groups concurrently with a single function call. zarr.create_hierarchy takes a zarr Storage instance and a dict of key : metadata pairs, parses that dict, and writes metadata documents to storage:
from zarr import create_hierarchy
from zarr.core.group import GroupMetadata
from zarr.storage import LocalStore
from pprint import pprint
import io
node_spec = {'a/b/c': GroupMetadata()}
nodes_created = dict(create_hierarchy(store=LocalStore(root='data'), nodes=node_spec))
# Report nodes (pprint is used for cleaner rendering in the docs)
output = io.StringIO()
pprint(nodes_created, stream=output, width=60)
print(output.getvalue())
{'': <Group file://data>,
'a': <Group file://data/a>,
'a/b': <Group file://data/a/b>,
'a/b/c': <Group file://data/a/b/c>}
Note that we only specified a single group named a/b/c, but 4 groups were created. These additional groups were created to ensure that the desired node a/b/c is connected to the root group '' by a sequence of intermediate groups. zarr.create_hierarchy normalizes the nodes keyword argument to ensure that the resulting hierarchy is complete, i.e. all groups or arrays are connected to the root of the hierarchy via intermediate groups.
Because zarr.create_hierarchy concurrently creates metadata documents, it's more efficient than repeated calls to create_group or create_array, provided you can statically define the metadata for the groups and arrays you want to create.
Array and group diagnostics
Diagnostic information about arrays and groups is available via the info property, e.g.:
store = zarr.storage.MemoryStore()
root = zarr.group(store=store)
foo = root.create_group('foo')
bar = foo.create_array(name='bar', shape=1000000, chunks=100000, dtype='int64')
bar[:] = 42
baz = foo.create_array(name='baz', shape=(1000, 1000), chunks=(100, 100), dtype='float32')
baz[:] = 4.2
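print(root.info)  # group summary; output omitted
print(bar.info_complete())  # array diagnostics, including stored size and initialized chunks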
Type : Array
Zarr format : 3
Data type : Int64(endianness='little')
Fill value : 0
Shape : (1000000,)
Chunk shape : (100000,)
Order : F
Read-only : False
Store type : MemoryStore
Filters : ()
Serializer : BytesCodec(endian=<Endian.little: 'little'>)
Compressors : (ZstdCodec(level=0, checksum=False),)
No. bytes : 8000000 (7.6M)
No. bytes stored : 1614 (1.6K)
Storage ratio : 4956.6
Chunks Initialized : 10
print(baz.info)
Type : Array
Zarr format : 3
Data type : Float32(endianness='little')
Fill value : 0.0
Shape : (1000, 1000)
Chunk shape : (100, 100)
Order : F
Read-only : False
Store type : MemoryStore
Filters : ()
Serializer : BytesCodec(endian=<Endian.little: 'little'>)
Compressors : (ZstdCodec(level=0, checksum=False),)
No. bytes : 4000000 (3.8M)
Groups also have the zarr.Group.tree method, e.g.:
Note
zarr.Group.tree requires the optional rich dependency. It can be installed with the [tree] extra.