Quick start

This section will help you get up and running with the Zarr library in Python to efficiently manage and analyze multi-dimensional arrays.

Creating an Array

To get started, you can create a simple Zarr array:

import zarr
import numpy as np

# Create a 2D Zarr array
z = zarr.create_array(
    store="data/example-1.zarr",
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4"
)

# Assign data to the array
z[:, :] = np.random.random((100, 100))
print(z.info)
Type               : Array
Zarr format        : 3
Data type          : Float32(endianness='little')
Fill value         : 0.0
Shape              : (100, 100)
Chunk shape        : (10, 10)
Order              : C
Read-only          : False
Store type         : LocalStore
Filters            : ()
Serializer         : BytesCodec(endian=<Endian.little: 'little'>)
Compressors        : (ZstdCodec(level=0, checksum=False),)
No. bytes          : 40000 (39.1K)

Here, we created a 2D array of shape (100, 100), chunked into blocks of (10, 10), and filled it with random floating-point data. This array was written to a LocalStore in the data/example-1.zarr directory.
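The figures reported by z.info follow directly from the shape, chunk shape, and data type. As a quick check, this plain-Python sketch (it does not use Zarr, and the variable names are illustrative) recomputes the chunk grid and the uncompressed size:

```python
import math

shape = (100, 100)   # array shape from the example above
chunks = (10, 10)    # chunk shape from the example above
itemsize = 4         # bytes per element for dtype "f4" (float32)

# Number of chunks along each dimension, rounding up for partial chunks.
chunk_grid = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
n_chunks = math.prod(chunk_grid)

# Uncompressed size of one chunk and of the whole array, in bytes.
chunk_nbytes = math.prod(chunks) * itemsize
total_nbytes = math.prod(shape) * itemsize

print(chunk_grid, n_chunks, chunk_nbytes, total_nbytes)
# (10, 10) 100 400 40000
```

The total of 40000 bytes matches the "No. bytes" line in the output above.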

Compression and Filters

Zarr supports data compression and filters. For example, to use Blosc compression:

# Create a 2D Zarr array with Blosc compression
z = zarr.create_array(
    store="data/example-2.zarr",
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4",
    compressors=zarr.codecs.BloscCodec(
        cname="zstd",
        clevel=3,
        shuffle=zarr.codecs.BloscShuffle.shuffle
    )
)

# Assign data to the array
z[:, :] = np.random.random((100, 100))
print(z.info)
Type               : Array
Zarr format        : 3
Data type          : Float32(endianness='little')
Fill value         : 0.0
Shape              : (100, 100)
Chunk shape        : (10, 10)
Order              : C
Read-only          : False
Store type         : LocalStore
Filters            : ()
Serializer         : BytesCodec(endian=<Endian.little: 'little'>)
Compressors        : (BloscCodec(typesize=4, cname=<BloscCname.zstd: 'zstd'>, clevel=3, shuffle=<BloscShuffle.shuffle: 'shuffle'>, blocksize=0),)
No. bytes          : 40000 (39.1K)

This compresses each chunk with the Blosc codec, using Zstandard at compression level 3 with byte shuffling enabled, which typically improves the compression ratio for numeric data.
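To see why shuffling can help, the following standard-library sketch applies a byte shuffle to a buffer of float32 values. It illustrates the idea only and is not Blosc's implementation: the helper names byte_shuffle and byte_unshuffle are hypothetical, and zlib stands in for the compressor.

```python
import struct
import zlib


def byte_shuffle(buf: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on."""
    return b"".join(buf[i::itemsize] for i in range(itemsize))


def byte_unshuffle(buf: bytes, itemsize: int) -> bytes:
    """Invert byte_shuffle, restoring the original element layout."""
    n = len(buf) // itemsize
    out = bytearray(len(buf))
    for i in range(itemsize):
        out[i::itemsize] = buf[i * n:(i + 1) * n]
    return bytes(out)


# Slowly varying float32 values: neighbouring elements share their high bytes.
values = [1.0 + i * 1e-3 for i in range(1000)]
raw = struct.pack(f"<{len(values)}f", *values)

shuffled = byte_shuffle(raw, 4)
assert byte_unshuffle(shuffled, 4) == raw  # the transform is lossless

# Compare compressed sizes with and without the shuffle step.
print(len(zlib.compress(raw)), len(zlib.compress(shuffled)))
```

Shuffling places similar bytes next to each other, which tends to give a general-purpose compressor longer repeated runs to work with. The actual gain depends on the data.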

Hierarchical Groups

Zarr allows you to create hierarchical groups, similar to directories:

# Create nested groups and add arrays
root = zarr.group("data/example-3.zarr")
foo = root.create_group(name="foo")
bar = root.create_array(
    name="bar", shape=(100, 10), chunks=(10, 10), dtype="f4"
)
spam = foo.create_array(name="spam", shape=(10,), dtype="i4")

# Assign values
bar[:, :] = np.random.random((100, 10))
spam[:] = np.arange(10)

# Print the hierarchy
print(root.tree())
/
├── bar (100, 10) float32
└── foo
    └── spam (10,) int32

This creates a root group with two members: a subgroup foo, which contains the array spam, and an array bar.

Batch Hierarchy Creation

Zarr provides tools for creating a collection of arrays and groups with a single function call. Suppose we want to copy existing groups and arrays into a new storage backend:

import io
from pprint import pprint

# Create a source hierarchy: a root group, a subgroup, and an array
root = zarr.group("data/example-4.zarr", attributes={'name': 'root'})
foo = root.create_group(name="foo")
bar = root.create_array(
    name="bar", shape=(100, 10), chunks=(10, 10), dtype="f4"
)

# Collect the metadata of the root and its direct children, keyed by path
nodes = {'': root.metadata} | {k: v.metadata for k, v in root.members()}

# Report nodes
output = io.StringIO()
pprint(nodes, stream=output, width=60, depth=3)
result = output.getvalue()
print(result)

# Create new hierarchy from nodes
new_nodes = dict(zarr.create_hierarchy(store=zarr.storage.MemoryStore(), nodes=nodes))
new_root = new_nodes['']
assert new_root.attrs == root.attrs
{'': GroupMetadata(attributes={'name': 'root'},
                   zarr_format=3,
                   consolidated_metadata=None,
                   node_type='group'),
 'bar': ArrayV3Metadata(shape=(100, 10),
                        data_type=Float32(endianness='little'),
                        chunk_grid=RegularChunkGrid(chunk_shape=(10,
                                                                 10)),
                        chunk_key_encoding=DefaultChunkKeyEncoding(separator='/'),
                        fill_value=np.float32(0.0),
                        codecs=(BytesCodec(endian=<Endian.little: 'little'>),
                                ZstdCodec(level=0,
                                          checksum=False)),
                        attributes={},
                        dimension_names=None,
                        zarr_format=3,
                        node_type='array',
                        storage_transformers=()),
 'foo': GroupMetadata(attributes={},
                      zarr_format=3,
                      consolidated_metadata=None,
                      node_type='group')}

Note that zarr.create_hierarchy will only initialize arrays and groups -- copying array data must be done in a separate step.

Persistent Storage

Zarr supports persistent storage to disk or cloud-compatible backends. The examples above used a zarr.storage.LocalStore, but several other storage options are available.

Zarr can read from and write to cloud object storage such as Amazon S3 and Google Cloud Storage through external libraries such as s3fs or gcsfs:

import s3fs  # must be installed so that s3:// URLs can be resolved

z = zarr.create_array(
    "s3://example-bucket/foo",
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4",
    overwrite=True,
)
z[:, :] = np.random.random((100, 100))

A single-file store can also be created using the zarr.storage.ZipStore:

# Store the array in a ZIP file
store = zarr.storage.ZipStore("data/example-5.zip", mode="w")

z = zarr.create_array(
    store=store,
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4"
)

# Write to the array
z[:, :] = np.random.random((100, 100))

# The ZipStore must be explicitly closed
store.close()

To open an existing array from a ZIP file:

# Open the ZipStore in read-only mode
store = zarr.storage.ZipStore("data/example-5.zip", read_only=True)

z = zarr.open_array(store, mode='r')

# Read the data as a NumPy array
print(z[:])
[[0.66734236 0.15667458 0.98720884 ... 0.36229587 0.67443246 0.34315267]
 [0.65787303 0.9544212  0.4830079  ... 0.33097172 0.60423803 0.45621237]
 [0.27632037 0.9947008  0.42434934 ... 0.94860053 0.6226942  0.6386924 ]
 ...
 [0.12854576 0.934397   0.19524333 ... 0.11838563 0.4967675  0.43074256]
 [0.82029045 0.4671437  0.8090906  ... 0.7814118  0.42650765 0.95929915]
 [0.4335856  0.7565437  0.7828931  ... 0.48119593 0.66220033 0.6652362 ]]

Read more about Zarr's storage options in the User Guide.