ak.to_buffers
Defined in awkward.operations.convert on line 3982.
- ak.to_buffers(array, container=None, partition_start=0, form_key='node{id}', key_format='part{partition}-{form_key}-{attribute}', virtual='materialize')
- Parameters
array – Data to decompose into named buffers.
container (None or MutableMapping) – The mapping from str to NumPy arrays (or Python buffers) that represents the decomposed Awkward Array. This container is only assumed to have a __setitem__ method that accepts strings as keys.
partition_start (non-negative int) – If array is not partitioned, this is the partition number that will be used as part of the container key. If array is partitioned, this is the first partition number.
form_key (str or callable) – Python format string containing "{id}" or a function that takes a non-negative integer as a string and the current layout as keyword arguments and returns a string, for use as a form_key on each Form node and in key_format (below).
key_format (str or callable) – Python format string containing "{partition}", "{form_key}", and/or "{attribute}" or a function that takes these as keyword arguments and returns a string to use as keys for buffers in the container. The partition is a partition number (non-negative integer, passed as a string), the form_key is the result of applying form_key (above), and the attribute is a hard-coded string representing the buffer’s function (e.g. "data", "offsets", "index").
virtual (str) – If "materialize", any virtual arrays will be materialized and the materialized data will be included in the container. If "pass", a virtual array’s Form is passed through as an ak.forms.VirtualForm, assuming that it contains form_keys that can be found in the container (e.g. by a previous pass through this function). No other values are allowed for this function argument.
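As a rough illustration of how the default form_key and key_format strings combine into container keys, the sketch below applies them with plain str.format. The helper make_key and the callable by_directory are hypothetical and not part of Awkward Array; they only mimic the substitution of "{id}", "{partition}", "{form_key}", and "{attribute}" described above.

```python
def make_key(partition, node_id, attribute,
             form_key="node{id}",
             key_format="part{partition}-{form_key}-{attribute}"):
    """Hypothetical helper: mimic the documented key substitution."""
    # Step 1: fill "{id}" in form_key; the id is passed as a string.
    node_key = form_key.format(id=str(node_id))
    # Step 2: build the container key from a format string, or from a
    # callable that accepts partition, form_key, and attribute as keywords.
    kwargs = dict(partition=str(partition), form_key=node_key,
                  attribute=attribute)
    if callable(key_format):
        return key_format(**kwargs)
    return key_format.format(**kwargs)


def by_directory(partition, form_key, attribute):
    """Hypothetical callable key_format producing path-like keys."""
    return f"{partition}/{form_key}/{attribute}"


print(make_key(0, 0, "offsets"))                        # part0-node0-offsets
print(make_key(3, 1, "data"))                           # part3-node1-data
print(make_key(2, 1, "data", key_format=by_directory))  # 2/node1/data
```

Because the partition number is part of every key, buffers written with different partition_start values do not collide in a shared container.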
Decomposes an Awkward Array into a Form and a collection of memory buffers, so that data can be losslessly written to file formats and storage devices that only map names to binary blobs (such as a filesystem directory).
This function returns a 3-tuple:
(form, length, container)
where the form is an ak.forms.Form (which can be converted to JSON with tojson), the length is either an integer (len(array)) or a list of the lengths of each partition in array, and the container is either the MutableMapping you passed in or a new dict containing the buffers (as NumPy arrays).
These are also the first three arguments of ak.from_buffers, so a full round-trip is
>>> reconstituted = ak.from_buffers(*ak.to_buffers(original))
The container argument lets you specify your own MutableMapping, which might be an interface to some storage format or device (e.g. h5py). It’s okay if the container drops NumPy’s dtype and shape information, leaving raw bytes, since dtype and shape can be reconstituted from the ak.forms.NumpyForm.
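A minimal sketch of such a custom container, assuming one file of raw bytes per key in a directory, might look like the following. The class name DirectoryContainer is hypothetical, and the sketch assumes that every key is a valid file name (as the keys produced by the default key_format are). It deliberately discards dtype and shape, keeping only the bytes.

```python
import os
import tempfile
from collections.abc import MutableMapping


class DirectoryContainer(MutableMapping):
    """Hypothetical container: one file of raw bytes per buffer key."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.directory, key)

    def __setitem__(self, key, value):
        # Any object supporting the buffer protocol (e.g. a NumPy array)
        # is flattened to raw bytes; dtype and shape are not stored.
        with open(self._path(key), "wb") as f:
            f.write(memoryview(value).tobytes())

    def __getitem__(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key) from None

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self):
        return iter(sorted(os.listdir(self.directory)))

    def __len__(self):
        return len(os.listdir(self.directory))


container = DirectoryContainer(tempfile.mkdtemp())
container["part0-node1-data"] = bytes(range(8))  # stand-in for a real buffer
print(list(container))  # ['part0-node1-data']
```

An instance of this class could then be passed as the container argument. Only __setitem__ is needed for writing; the other methods are included so that the same object can be read back from.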
The partition_start argument lets you fill the container gradually or in parallel. If the array is not partitioned, the partition_start argument sets its partition number (for the container keys, through key_format). If the array is partitioned, the first partition is numbered partition_start and as many are filled as there are in array. See ak.partitions to get the number of partitions in array.
Here is a simple example:
>>> original = ak.Array([[1, 2, 3], [], [4, 5]])
>>> form, length, container = ak.to_buffers(original)
>>> form
{
"class": "ListOffsetArray64",
"offsets": "i64",
"content": {
"class": "NumpyArray",
"itemsize": 8,
"format": "l",
"primitive": "int64",
"form_key": "node1"
},
"form_key": "node0"
}
>>> length
3
>>> container
{'part0-node0-offsets': array([0, 3, 3, 5], dtype=int64),
'part0-node1-data': array([1, 2, 3, 4, 5])}
which may be read back with
>>> ak.from_buffers(form, length, container)
<Array [[1, 2, 3], [], [4, 5]] type='3 * var * int64'>
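To make the link between the Form and the container keys more concrete, the sketch below walks the Form JSON from this example and lists the keys it implies. The helper expected_keys and the ATTRIBUTES table are hypothetical; the table is an assumption that covers only the two node classes shown here, with the attribute names taken from the container output above.

```python
import json

# Form JSON copied from the example above.
form_json = """
{
    "class": "ListOffsetArray64",
    "offsets": "i64",
    "content": {
        "class": "NumpyArray",
        "itemsize": 8,
        "format": "l",
        "primitive": "int64",
        "form_key": "node1"
    },
    "form_key": "node0"
}
"""

# Assumed mapping from node class to buffer attributes, limited to the
# two classes that appear in this example.
ATTRIBUTES = {"ListOffsetArray64": ["offsets"], "NumpyArray": ["data"]}


def expected_keys(node, partition=0):
    """Hypothetical helper: list the container keys implied by a Form."""
    keys = [f"part{partition}-{node['form_key']}-{attr}"
            for attr in ATTRIBUTES[node["class"]]]
    # Recurse into the nested content node, if there is one.
    if isinstance(node.get("content"), dict):
        keys.extend(expected_keys(node["content"], partition))
    return keys


print(expected_keys(json.loads(form_json)))
# ['part0-node0-offsets', 'part0-node1-data']
```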
Here is an example that builds up a partitioned array:
>>> container = {}
>>> lengths = []
>>> form, length, _ = ak.to_buffers(ak.Array([[1, 2, 3], [], [4, 5]]), container, 0)
>>> lengths.append(length)
>>> form, length, _ = ak.to_buffers(ak.Array([[6, 7, 8, 9]]), container, 1)
>>> lengths.append(length)
>>> form, length, _ = ak.to_buffers(ak.Array([[], [], []]), container, 2)
>>> lengths.append(length)
>>> form, length, _ = ak.to_buffers(ak.Array([[10]]), container, 3)
>>> lengths.append(length)
>>> form
{
"class": "ListOffsetArray64",
"offsets": "i64",
"content": {
"class": "NumpyArray",
"itemsize": 8,
"format": "l",
"primitive": "int64",
"form_key": "node1"
},
"form_key": "node0"
}
>>> lengths
[3, 1, 3, 1]
>>> container
{'part0-node0-offsets': array([0, 3, 3, 5], dtype=int64),
'part0-node1-data': array([1, 2, 3, 4, 5]),
'part1-node0-offsets': array([0, 4], dtype=int64),
'part1-node1-data': array([6, 7, 8, 9]),
'part2-node0-offsets': array([0, 0, 0, 0], dtype=int64),
'part2-node1-data': array([], dtype=float64),
'part3-node0-offsets': array([0, 1], dtype=int64),
'part3-node1-data': array([10])}
The object returned by ak.from_buffers is now a partitioned array:
>>> reconstituted = ak.from_buffers(form, lengths, container)
>>> reconstituted
<Array [[1, 2, 3], [], [4, ... [], [], [10]] type='8 * var * int64'>
>>> ak.partitions(reconstituted)
[3, 1, 3, 1]
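As a small worked check, the per-partition lengths can be accumulated to find where each partition begins and ends in the combined array. This is plain Python arithmetic on the lengths list from the example, not an Awkward Array call.

```python
from itertools import accumulate

lengths = [3, 1, 3, 1]  # per-partition lengths collected above

# Running totals give the global index at which each partition stops;
# each partition starts where the previous one stopped.
stops = list(accumulate(lengths))
starts = [0] + stops[:-1]

print(list(zip(starts, stops)))  # [(0, 3), (3, 4), (4, 7), (7, 8)]
print(stops[-1])                 # 8, the total length in '8 * var * int64'
```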
If you intend to use this function for saving data, you may want to pack it first with ak.packed.
See also ak.from_buffers and ak.packed.