ak.ArrayBuilder¶
Defined in awkward.highlevel on line 2095.
- class ak.ArrayBuilder(self, behavior=None, initial=1024, resize=1.5)¶
- Parameters
behavior (None or dict) – Custom ak.behavior for arrays built by this ArrayBuilder.
initial (int) – Initial size (in bytes) of buffers used by ak.layout.ArrayBuilder (see ak.layout.ArrayBuilderOptions).
resize (float) – Resize multiplier for buffers used by ak.layout.ArrayBuilder (see ak.layout.ArrayBuilderOptions); should be strictly greater than 1.
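The interplay of initial and resize can be pictured with a simplified model in which a buffer starts at initial bytes and is multiplied by resize each time it fills up. The sketch below is an illustrative assumption, not the library's allocator; count_resizes is a hypothetical helper that does not exist in Awkward Array. It also suggests why resize should be strictly greater than 1: with a multiplier of 1 or less, the buffer would never grow.

```python
import math


def count_resizes(needed_bytes, initial=1024, resize=1.5):
    """Hypothetical model of geometric buffer growth (not Awkward's own code)."""
    if initial <= 0 or resize <= 1:
        raise ValueError("initial must be positive and resize strictly greater than 1")
    capacity = initial
    resizes = 0
    # Grow the modelled buffer until the requested number of bytes fits.
    while capacity < needed_bytes:
        capacity = math.ceil(capacity * resize)
        resizes += 1
    return resizes, capacity


print(count_resizes(1024))   # fits in the initial buffer, so no resize is needed
print(count_resizes(10**6))  # the number of resizes grows only logarithmically
```

Under this model, a larger resize trades memory overhead for fewer reallocations.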
General tool for building arrays of nested data structures from a sequence of commands. Most data types can be constructed by calling commands in the right order, similar to printing tokens to construct JSON output.
To illustrate how this works, consider the following example.
b = ak.ArrayBuilder()
# fill commands   # as JSON    # current array type
##########################################################################################
b.begin_list()    # [          # 0 * var * unknown (initially, the type is unknown)
b.integer(1)      #   1,       # 0 * var * int64
b.integer(2)      #   2,       # 0 * var * int64
b.real(3)         #   3.0      # 0 * var * float64 (all the integers have become floats)
b.end_list()      # ],         # 1 * var * float64
b.begin_list()    # [          # 1 * var * float64
b.end_list()      # ],         # 2 * var * float64
b.begin_list()    # [          # 2 * var * float64
b.integer(4)      #   4,       # 2 * var * float64
b.null()          #   null,    # 2 * var * ?float64 (now the floats are nullable)
b.integer(5)      #   5        # 2 * var * ?float64
b.end_list()      # ],         # 3 * var * ?float64
b.begin_list()    # [          # 3 * var * ?float64
b.begin_record()  #   {        # 3 * var * ?union[float64, {}]
b.field("x")      #     "x":   # 3 * var * ?union[float64, {"x": unknown}]
b.integer(1)      #       1,   # 3 * var * ?union[float64, {"x": int64}]
b.field("y")      #     "y":   # 3 * var * ?union[float64, {"x": int64, "y": unknown}]
b.begin_list()    #       [    # 3 * var * ?union[float64, {"x": int64, "y": var * unknown}]
b.integer(2)      #         2, # 3 * var * ?union[float64, {"x": int64, "y": var * int64}]
b.integer(3)      #         3  # 3 * var * ?union[float64, {"x": int64, "y": var * int64}]
b.end_list()      #       ]    # 3 * var * ?union[float64, {"x": int64, "y": var * int64}]
b.end_record()    #   }        # 3 * var * ?union[float64, {"x": int64, "y": var * int64}]
b.end_list()      # ]          # 4 * var * ?union[float64, {"x": int64, "y": var * int64}]
To get an array, we take a snapshot of the ArrayBuilder’s current state.
>>> ak.to_list(b.snapshot())
[[1.0, 2.0, 3.0], [], [4.0, None, 5.0], [{'x': 1, 'y': [2, 3]}]]
The full set of filling commands is the following.
- null: appends a None value.
- boolean: appends True or False.
- integer: appends an integer.
- real: appends a floating-point value.
- complex: appends a complex value.
- datetime: appends a datetime value.
- timedelta: appends a timedelta value.
- bytestring: appends an unencoded string (raw bytes).
- string: appends a UTF-8 encoded string.
- begin_list: begins filling a list; must be closed with end_list.
- end_list: ends a list.
- begin_tuple: begins filling a tuple; must be closed with end_tuple.
- index: selects a tuple slot to fill; must be followed by a command that actually fills that slot.
- end_tuple: ends a tuple.
- begin_record: begins filling a record; must be closed with end_record.
- field: selects a record field to fill; must be followed by a command that actually fills that field.
- end_record: ends a record.
- append: generic method for filling null, boolean, integer, real, bytestring, string, ak.Array, ak.Record, or arbitrary Python data. When filling from ak.Array or ak.Record, the output holds references to the original data, rather than copying.
- list: context manager for begin_list and end_list.
- tuple: context manager for begin_tuple and end_tuple.
- record: context manager for begin_record and end_record.
ArrayBuilders can be used in Numba: they can be passed as arguments to a Numba-compiled function or returned as return values. (Since ArrayBuilder works by accumulating side-effects, it’s not strictly necessary to return the object.)
The primary limitation is that ArrayBuilders cannot be created and snapshot cannot be called inside the Numba-compiled function. Awkward Array uses Numba as a transformer: ak.Array and an empty ak.ArrayBuilder go in and a filled ak.ArrayBuilder is the result; snapshot can be called outside of the compiled function.
Also, context managers (Python’s with statement) are not supported in Numba yet, so the list, tuple, and record methods are not available in Numba-compiled functions.
Here is an example of filling an ArrayBuilder in Numba, which makes a tree of dynamic depth.
>>> import awkward as ak
>>> import numba as nb
>>> import numpy as np
>>> @nb.njit
... def deepnesting(builder, probability):
...     if np.random.uniform(0, 1) > probability:
...         # leaf: append a single random number
...         builder.append(np.random.normal())
...     else:
...         # branch: open a list and recurse with a smaller branching probability
...         builder.begin_list()
...         for i in range(np.random.poisson(3)):
...             deepnesting(builder, probability**2)
...         builder.end_list()
...
>>> builder = ak.ArrayBuilder()
>>> deepnesting(builder, 0.9)
>>> builder.snapshot()
<Array [... 1.23, -0.498, 0.272], -0.0519]]]] type='1 * var * var * union[var * ...'>
>>> ak.to_list(builder)
[[[[2.05, 0.95], [[[0.25], 1.86, 0.89, 0.31], 0.38, -1.62, [[0.18], 0.46, 0.39], [-0.57, 1.39, -0.15, -0.20]], [[[-0.74, -0.34], -0.84], [-0.81, -0.72, -0.42, [1.04, 1.69, -0.18, 1.07]]], [[0.51]]], [[-1.97, 0.57], [-1.24, -2.14, -0.54, [[0.24, -2.31, [-0.68, 0.08], 1.80, 0.16], -0.63, [0.01, [-1.28, 0.38, 1.40, -0.26, -0.48]]], -0.62, -2.53], [-1.66, 0.58]], [0.62, [[-0.76, -0.67, -1.15], -0.50, [0.36, 0.48, -0.80, [1.15, -1.09], -1.39, 1.28]], 0.93, [1.35, [0.36, 1.09, -0.27, -0.79], [-0.41], [0.67, 0.89, 0.79]], [], [0.67, [-0.48, -0.39], 1.06, 0.80, -0.34], [[1.56, -1.60, [-0.69], -0.42], 0.33, -0.73, 0.50, -1.25, -1.15], [[0.64], [-0.01], -0.95], [[0.41, -0.68, 0.79], 0.51]], [[0.62, [0.58, -0.75]], [1.61, 0.52, 0.24], -1.09, [-1.11], 0.22], [-0.41, [[0.42], 0.78, [1.22, -0.49, 0.27], -0.05]]]]
>>> ak.type(builder.snapshot())
1 * var * var * union[var * union[float64, var * union[var * union[float64, var * float64], float64]], float64]
Note that this is a general method for building arrays; if the type is known in advance, more specialized procedures can be faster. This should be considered the “least effort” approach.
ak.ArrayBuilder._wrap¶
- ak.ArrayBuilder._wrap(cls, layout, behavior=None)¶
- Parameters
layout (ak.layout.ArrayBuilder) – Low-level builder to wrap.
behavior (None or dict) – Custom ak.behavior for arrays built by this ArrayBuilder.
Wraps a low-level ak.layout.ArrayBuilder as a high-level ak.ArrayBuilder.
The ak.ArrayBuilder constructor creates a new ak.layout.ArrayBuilder with no accumulated data, but Numba needs to wrap existing data when returning from a lowered function.
ak.ArrayBuilder.behavior¶
- ak.ArrayBuilder.behavior¶
The behavior parameter passed into this ArrayBuilder’s constructor.
- If a dict, this behavior overrides the global ak.behavior. Any keys in the global ak.behavior but not this behavior are still valid, but any keys in both are overridden by this behavior. Keys with a None value are equivalent to missing keys, so this behavior can effectively remove keys from the global ak.behavior.
- If None, arrays built by this ArrayBuilder default to the global ak.behavior.
See ak.behavior for a list of recognized key patterns and their meanings.
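The override rules above can be modelled with plain dictionaries. This is only a sketch of the described lookup semantics, not the library's implementation: effective_behavior is a hypothetical helper, and the string values stand in for the classes or functions a real behavior dict would hold.

```python
def effective_behavior(global_behavior, local_behavior=None):
    """Hypothetical helper modelling the override rules described above."""
    merged = dict(global_behavior)
    if local_behavior is not None:
        merged.update(local_behavior)  # keys present in both: the local value wins
    # A key whose value is None is treated as if it were missing.
    return {key: value for key, value in merged.items() if value is not None}


# Placeholder strings stand in for real behavior values (an assumption).
global_behavior = {"point": "GlobalPoint", "vector": "GlobalVector"}
local_behavior = {"point": "LocalPoint", "vector": None, "extra": "LocalExtra"}

print(effective_behavior(global_behavior, local_behavior))
# {'point': 'LocalPoint', 'extra': 'LocalExtra'}
print(effective_behavior(global_behavior))
# {'point': 'GlobalPoint', 'vector': 'GlobalVector'}
```

In this model, "vector" disappears because the local dict maps it to None, which is how a local behavior can remove a key from the global one.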
ak.ArrayBuilder.type¶
- ak.ArrayBuilder.type¶
The high-level type of the accumulated array; same as ak.type.
Note that the outermost element of an Array’s type is always an ak.types.ArrayType, which specifies the number of elements in the array.
The type of an ak.layout.Content (from ak.Array.layout) is not wrapped by an ak.types.ArrayType.
ak.ArrayBuilder.__len__¶
- ak.ArrayBuilder.__len__(self)¶
The current length of the accumulated array.
ak.ArrayBuilder.__getitem__¶
- ak.ArrayBuilder.__getitem__(self, where)¶
- Parameters
where (many types supported; see below) – Index of positions to select from the array.
Takes a snapshot and selects items from the array.
See ak.Array.__getitem__ for a more complete description.
ak.ArrayBuilder.__iter__¶
- ak.ArrayBuilder.__iter__(self)¶
Iterates over a snapshot of the array in Python.
See ak.Array.__iter__ for performance considerations.
ak.ArrayBuilder.__str__¶
- ak.ArrayBuilder.__str__(self)¶
- Parameters
limit_value (int) – Maximum number of characters to use when presenting the ArrayBuilder as a string.
Presents this ArrayBuilder as a string without type or "<ArrayBuilder ...>".
See ak.Array.__str__ for a more complete description.
ak.ArrayBuilder.__repr__¶
- ak.ArrayBuilder.__repr__(self)¶
- Parameters
limit_value (int) – Maximum number of characters to use when presenting the data of the ArrayBuilder.
limit_total (int) – Maximum number of characters to use for the whole string (should be larger than limit_value).
Presents this ArrayBuilder as a string with its type and "<ArrayBuilder ...>".
See ak.Array.__repr__ for a more complete description.
ak.ArrayBuilder.__array__¶
- ak.ArrayBuilder.__array__(self)¶
Intercepts attempts to convert a snapshot of this array into a NumPy array and either performs a zero-copy conversion or raises an error.
See ak.Array.__array__ for a more complete description.
ak.ArrayBuilder.__array_ufunc__¶
- ak.ArrayBuilder.__array_ufunc__(self, ufunc, method)¶
Intercepts attempts to pass this ArrayBuilder to NumPy universal functions (ufuncs) and passes it through the structure of the array’s snapshot.
See ak.Array.__array_ufunc__ for a more complete description.
ak.ArrayBuilder.__array_function__¶
- ak.ArrayBuilder.__array_function__(self, func, types, args, kwargs)¶
Intercepts attempts to pass this ArrayBuilder to those NumPy functions other than universal functions that have an Awkward equivalent.
See ak.ArrayBuilder.__array_ufunc__ for a more complete description.
ak.ArrayBuilder.numba_type¶
- ak.ArrayBuilder.numba_type¶
The type of this ArrayBuilder when it is used in Numba. It contains enough information to generate low-level code for accessing any element, down to the leaves.
See Numba documentation on types and signatures.
ak.ArrayBuilder.snapshot¶
- ak.ArrayBuilder.snapshot(self)¶
Converts the currently accumulated data into an ak.Array.
This is almost always an O(1) operation (does not scale with the size of the accumulated data, and therefore safe to call relatively often).
The resulting ak.Array shares memory with the accumulated data (it is a zero-copy operation), but it is safe to continue filling the ArrayBuilder because its append-only operations only affect data outside the range viewed by old snapshots. If ArrayBuilder reallocates an internal buffer, the data are no longer shared, but they’re reference-counted by the ak.Array and the ak.ArrayBuilder, so all buffers are deleted exactly once.
ak.ArrayBuilder.null¶
- ak.ArrayBuilder.null(self)¶
Appends a None value at the current position in the accumulated array.
ak.ArrayBuilder.boolean¶
- ak.ArrayBuilder.boolean(self, x)¶
Appends a boolean value x at the current position in the accumulated array.
ak.ArrayBuilder.integer¶
- ak.ArrayBuilder.integer(self, x)¶
Appends an integer x at the current position in the accumulated array.
ak.ArrayBuilder.real¶
- ak.ArrayBuilder.real(self, x)¶
Appends a floating-point number x at the current position in the accumulated array.
ak.ArrayBuilder.complex¶
- ak.ArrayBuilder.complex(self, x)¶
Appends a complex number x at the current position in the accumulated array.
ak.ArrayBuilder.datetime¶
- ak.ArrayBuilder.datetime(self, x)¶
Appends a datetime value x at the current position in the accumulated array.
ak.ArrayBuilder.timedelta¶
- ak.ArrayBuilder.timedelta(self, x)¶
Appends a timedelta value x at the current position in the accumulated array.
ak.ArrayBuilder.bytestring¶
- ak.ArrayBuilder.bytestring(self, x)¶
Appends an unencoded string (raw bytes) x at the current position in the accumulated array.
ak.ArrayBuilder.string¶
- ak.ArrayBuilder.string(self, x)¶
Appends a UTF-8 encoded string x at the current position in the accumulated array.
ak.ArrayBuilder.begin_list¶
- ak.ArrayBuilder.begin_list(self)¶
Begins filling a list; must be closed with end_list.
For example,
builder.begin_list()
builder.real(1.1)
builder.real(2.2)
builder.real(3.3)
builder.end_list()
builder.begin_list()
builder.end_list()
builder.begin_list()
builder.real(4.4)
builder.real(5.5)
builder.end_list()
produces
[[1.1, 2.2, 3.3], [], [4.4, 5.5]]
ak.ArrayBuilder.begin_tuple¶
- ak.ArrayBuilder.begin_tuple(self, numfields)¶
Begins filling a tuple with numfields fields; must be closed with end_tuple.
For example,
builder.begin_tuple(3)
builder.index(0).integer(1)
builder.index(1).real(1.1)
builder.index(2).string("one")
builder.end_tuple()
builder.begin_tuple(3)
builder.index(0).integer(2)
builder.index(1).real(2.2)
builder.index(2).string("two")
builder.end_tuple()
produces
[(1, 1.1, "one"), (2, 2.2, "two")]
ak.ArrayBuilder.index¶
- ak.ArrayBuilder.index(self, i)¶
- Parameters
i (int) – The tuple slot to fill.
This method also returns the ak.ArrayBuilder, so that it can be chained with the value that fills the slot.
Prepares to fill a tuple slot; see begin_tuple for an example.
ak.ArrayBuilder.begin_record¶
- ak.ArrayBuilder.begin_record(self, name=None)¶
Begins filling a record with an optional name; must be closed with end_record.
For example,
>>> builder = ak.ArrayBuilder()
>>> builder.begin_record("points")
>>> builder.field("x").real(1)
>>> builder.field("y").real(1.1)
>>> builder.end_record()
>>> builder.begin_record("points")
>>> builder.field("x").real(2)
>>> builder.field("y").real(2.2)
>>> builder.end_record()
produces
>>> ak.to_list(builder.snapshot())
[{"x": 1.0, "y": 1.1}, {"x": 2.0, "y": 2.2}]
with type
>>> ak.type(builder.snapshot())
2 * points["x": float64, "y": float64]
The record type is named "points" because its "__record__" parameter is set to that value:
>>> builder.snapshot().layout.parameters
{'__record__': 'points'}
The "__record__" parameter can be used to add behavior to the records in the array, as described in ak.Array, ak.Record, and ak.behavior.
ak.ArrayBuilder.field¶
- ak.ArrayBuilder.field(self, key)¶
- Parameters
key (str) – The field key to fill.
This method also returns the ak.ArrayBuilder, so that it can be chained with the value that fills the field.
Prepares to fill a field; see begin_record for an example.
ak.ArrayBuilder.append¶
- ak.ArrayBuilder.append(self, obj)¶
- Parameters
obj (anything ak.from_iter recognizes) – The object to append.
at (None or int) – Which value to select from obj if obj is an ak.Array.
Appends a Python object. This method can be used as a shorthand for null, boolean, integer, real, bytestring, or string.
ak.ArrayBuilder.list¶
- ak.ArrayBuilder.list(self)¶
Context manager to prevent unpaired begin_list and end_list. The example in the begin_list documentation can be rewritten as
with builder.list():
builder.real(1.1)
builder.real(2.2)
builder.real(3.3)
with builder.list():
pass
with builder.list():
builder.real(4.4)
builder.real(5.5)
to produce the same result.
[[1.1, 2.2, 3.3], [], [4.4, 5.5]]
Since context managers aren’t yet supported by Numba, this method can’t be used in Numba.
ak.ArrayBuilder.tuple¶
- ak.ArrayBuilder.tuple(self, numfields)¶
Context manager to prevent unpaired begin_tuple and end_tuple. The example in the begin_tuple documentation can be rewritten as
with builder.tuple(3):
builder.index(0).integer(1)
builder.index(1).real(1.1)
builder.index(2).string("one")
with builder.tuple(3):
builder.index(0).integer(2)
builder.index(1).real(2.2)
builder.index(2).string("two")
to produce the same result.
[(1, 1.1, "one"), (2, 2.2, "two")]
Since context managers aren’t yet supported by Numba, this method can’t be used in Numba.
ak.ArrayBuilder.record¶
- ak.ArrayBuilder.record(self, name=None)¶
Context manager to prevent unpaired begin_record and end_record. The example in the begin_record documentation can be rewritten as
with builder.record("points"):
builder.field("x").real(1)
builder.field("y").real(1.1)
with builder.record("points"):
builder.field("x").real(2)
builder.field("y").real(2.2)
to produce the same result.
[{"x": 1.0, "y": 1.1}, {"x": 2.0, "y": 2.2}]
Since context managers aren’t yet supported by Numba, this method can’t be used in Numba.