ak.ArrayBuilder

Defined in awkward.highlevel on line 2110.

class ak.ArrayBuilder(self, behavior=None, initial=1024, resize=1.5)
Parameters
  • behavior (None or dict) – Custom ak.behavior for arrays built by this ArrayBuilder.

  • initial (int) – Initial size (in bytes) of buffers used by ak.layout.ArrayBuilder (see ak.layout.ArrayBuilderOptions).

  • resize (float) – Resize multiplier for buffers used by ak.layout.ArrayBuilder (see ak.layout.ArrayBuilderOptions); should be strictly greater than 1.
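To make the interplay of initial and resize concrete, the following toy model counts how many times a buffer would have to grow before it can hold a given number of units. It is only a sketch: it assumes plain geometric growth (capacity multiplied by resize and rounded up whenever the buffer is full), and growth_steps is a hypothetical helper, not part of Awkward Array. The library's actual rounding and units may differ.

```python
import math


def growth_steps(needed, initial=1024, resize=1.5):
    """Toy model (hypothetical helper): count reallocations under geometric growth."""
    if resize <= 1:
        # Mirrors the documented requirement that resize be strictly greater than 1.
        raise ValueError("resize must be strictly greater than 1")
    capacity = initial
    steps = 0
    while capacity < needed:
        capacity = math.ceil(capacity * resize)  # assumed rounding rule
        steps += 1
    return steps, capacity


print(growth_steps(2000))  # (2, 2304)
```

Under this assumption the number of reallocations grows only logarithmically with the amount of data, which is why a multiplier greater than 1 is required.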

General tool for building arrays of nested data structures from a sequence of commands. Most data types can be constructed by calling commands in the right order, similar to printing tokens to construct JSON output.

To illustrate how this works, consider the following example.

import awkward as ak

b = ak.ArrayBuilder()

# fill commands   # as JSON   # current array type
##########################################################################################
b.begin_list()    # [         # 0 * var * unknown     (initially, the type is unknown)
b.integer(1)      #   1,      # 0 * var * int64
b.integer(2)      #   2,      # 0 * var * int64
b.real(3)         #   3.0     # 0 * var * float64     (all the integers have become floats)
b.end_list()      # ],        # 1 * var * float64
b.begin_list()    # [         # 1 * var * float64
b.end_list()      # ],        # 2 * var * float64
b.begin_list()    # [         # 2 * var * float64
b.integer(4)      #   4,      # 2 * var * float64
b.null()          #   null,   # 2 * var * ?float64    (now the floats are nullable)
b.integer(5)      #   5       # 2 * var * ?float64
b.end_list()      # ],        # 3 * var * ?float64
b.begin_list()    # [         # 3 * var * ?float64
b.begin_record()  #   {       # 3 * var * ?union[float64, {}]
b.field("x")      #     "x":  # 3 * var * ?union[float64, {"x": unknown}]
b.integer(1)      #      1,   # 3 * var * ?union[float64, {"x": int64}]
b.field("y")      #      "y": # 3 * var * ?union[float64, {"x": int64, "y": unknown}]
b.begin_list()    #      [    # 3 * var * ?union[float64, {"x": int64, "y": var * unknown}]
b.integer(2)      #        2, # 3 * var * ?union[float64, {"x": int64, "y": var * int64}]
b.integer(3)      #        3  # 3 * var * ?union[float64, {"x": int64, "y": var * int64}]
b.end_list()      #      ]    # 3 * var * ?union[float64, {"x": int64, "y": var * int64}]
b.end_record()    #   }       # 3 * var * ?union[float64, {"x": int64, "y": var * int64}]
b.end_list()      # ]         # 4 * var * ?union[float64, {"x": int64, "y": var * int64}]

To get an array, we take a snapshot of the ArrayBuilder’s current state.

>>> ak.to_list(b.snapshot())
[[1.0, 2.0, 3.0], [], [4.0, None, 5.0], [{'x': 1, 'y': [2, 3]}]]

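The full set of filling commands, each documented below, is the following.

  • null, boolean, integer, real, complex, datetime, timedelta, bytestring, and string append a single value.

  • begin_list and end_list delimit a list.

  • begin_tuple, index, and end_tuple delimit a tuple and select its slots.

  • begin_record, field, and end_record delimit a record and select its fields.

  • append and extend add arbitrary Python objects or existing arrays.

  • list, tuple, and record are context managers that pair the begin and end commands automatically.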

ArrayBuilders can be used in Numba: they can be passed as arguments to a Numba-compiled function or returned as return values. (Since ArrayBuilder works by accumulating side-effects, it’s not strictly necessary to return the object.)

The primary limitation is that ArrayBuilders cannot be created and snapshot cannot be called inside the Numba-compiled function. Awkward Array uses Numba as a transformer: ak.Array and an empty ak.ArrayBuilder go in and a filled ak.ArrayBuilder is the result; snapshot can be called outside of the compiled function.

Also, context managers (Python’s with statement) are not supported in Numba yet, so the list, tuple, and record methods are not available in Numba-compiled functions.

Here is an example of filling an ArrayBuilder in Numba, which makes a tree of dynamic depth.

>>> import awkward as ak
>>> import numpy as np
>>> import numba as nb
>>> @nb.njit
... def deepnesting(builder, probability):
...     if np.random.uniform(0, 1) > probability:
...         builder.append(np.random.normal())
...     else:
...         builder.begin_list()
...         for i in range(np.random.poisson(3)):
...             deepnesting(builder, probability**2)
...         builder.end_list()
...
>>> builder = ak.ArrayBuilder()
>>> deepnesting(builder, 0.9)
>>> builder.snapshot()
<Array [... 1.23, -0.498, 0.272], -0.0519]]]] type='1 * var * var * union[var * ...'>
>>> ak.to_list(builder)
[[[[2.05, 0.95], [[[0.25], 1.86, 0.89, 0.31], 0.38, -1.62, [[0.18], 0.46, 0.39], [-0.57, 1.39, -0.15, -0.20]], [[[-0.74, -0.34], -0.84], [-0.81, -0.72, -0.42, [1.04, 1.69, -0.18, 1.07]]], [[0.51]]], [[-1.97, 0.57], [-1.24, -2.14, -0.54, [[0.24, -2.31, [-0.68, 0.08], 1.80, 0.16], -0.63, [0.01, [-1.28, 0.38, 1.40, -0.26, -0.48]]], -0.62, -2.53], [-1.66, 0.58]], [0.62, [[-0.76, -0.67, -1.15], -0.50, [0.36, 0.48, -0.80, [1.15, -1.09], -1.39, 1.28]], 0.93, [1.35, [0.36, 1.09, -0.27, -0.79], [-0.41], [0.67, 0.89, 0.79]], [], [0.67, [-0.48, -0.39], 1.06, 0.80, -0.34], [[1.56, -1.60, [-0.69], -0.42], 0.33, -0.73, 0.50, -1.25, -1.15], [[0.64], [-0.01], -0.95], [[0.41, -0.68, 0.79], 0.51]], [[0.62, [0.58, -0.75]], [1.61, 0.52, 0.24], -1.09, [-1.11], 0.22], [-0.41, [[0.42], 0.78, [1.22, -0.49, 0.27], -0.05]]]]
>>> ak.type(builder.snapshot())
1 * var * var * union[var * union[float64, var * union[var * union[float64, var * float64], float64]], float64]

Note that this is a general method for building arrays; if the type is known in advance, more specialized procedures can be faster. This should be considered the “least effort” approach.

ak.ArrayBuilder._wrap

ak.ArrayBuilder._wrap(cls, layout, behavior=None)
Parameters
  • layout (ak.layout.ArrayBuilder) – Low-level builder to wrap.

  • behavior (None or dict) – Custom ak.behavior for arrays built by this ArrayBuilder.

Wraps a low-level ak.layout.ArrayBuilder as a high-level ak.ArrayBuilder.

The ak.ArrayBuilder constructor creates a new ak.layout.ArrayBuilder with no accumulated data, but Numba needs to wrap existing data when returning from a lowered function.

ak.ArrayBuilder.behavior

ak.ArrayBuilder.behavior

The behavior parameter passed into this ArrayBuilder’s constructor.

  • If a dict, this behavior overrides the global ak.behavior. Any keys in the global ak.behavior but not this behavior are still valid, but any keys in both are overridden by this behavior. Keys with a None value are equivalent to missing keys, so this behavior can effectively remove keys from the global ak.behavior.

  • If None, arrays built by this ArrayBuilder default to the global ak.behavior.

See ak.behavior for a list of recognized key patterns and their meanings.
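The override rules above can be modelled with ordinary dicts. The following is a toy model only: effective_behavior is a hypothetical helper written for illustration, not a function provided by Awkward Array, and the keys are placeholders rather than real behavior key patterns.

```python
def effective_behavior(global_behavior, local_behavior=None):
    """Toy model (hypothetical helper) of how a local behavior overlays the global one."""
    merged = dict(global_behavior)
    if local_behavior is not None:
        merged.update(local_behavior)  # keys present in both: the local value wins
    # A None value is treated like a missing key, so it removes the entry.
    return {key: value for key, value in merged.items() if value is not None}


global_behavior = {"a": "global-a", "b": "global-b"}
local_behavior = {"b": "local-b", "a": None}

print(effective_behavior(global_behavior, local_behavior))  # {'b': 'local-b'}
print(effective_behavior(global_behavior))  # {'a': 'global-a', 'b': 'global-b'}
```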

ak.ArrayBuilder.type

ak.ArrayBuilder.type

The high-level type of the accumulated array; same as ak.type.

Note that the outermost element of an Array’s type is always an ak.types.ArrayType, which specifies the number of elements in the array.

The type of an ak.layout.Content (from ak.Array.layout) is not wrapped by an ak.types.ArrayType.

ak.ArrayBuilder.__len__

ak.ArrayBuilder.__len__(self)

The current length of the accumulated array.

ak.ArrayBuilder.__getitem__

ak.ArrayBuilder.__getitem__(self, where)
Parameters
  • where (many types supported; see below) – Index of positions to select from the array.

Takes a snapshot and selects items from the array.

See ak.Array.__getitem__ for a more complete description.

ak.ArrayBuilder.__iter__

ak.ArrayBuilder.__iter__(self)

Iterates over a snapshot of the array in Python.

See ak.Array.__iter__ for performance considerations.

ak.ArrayBuilder.__str__

ak.ArrayBuilder.__str__(self)
Parameters
  • limit_value (int) – Maximum number of characters to use when presenting the ArrayBuilder as a string.

Presents this ArrayBuilder as a string without type or "<ArrayBuilder ...>".

See ak.Array.__str__ for a more complete description.

ak.ArrayBuilder.__repr__

ak.ArrayBuilder.__repr__(self)
Parameters
  • limit_value (int) – Maximum number of characters to use when presenting the data of the ArrayBuilder.

  • limit_total (int) – Maximum number of characters to use for the whole string (should be larger than limit_value).

Presents this ArrayBuilder as a string with its type and "<ArrayBuilder ...>".

See ak.Array.__repr__ for a more complete description.

ak.ArrayBuilder._str

ak.ArrayBuilder._str(self, limit_value=85, snapshot=None)

ak.ArrayBuilder._repr

ak.ArrayBuilder._repr(self, limit_value=40, limit_total=85)

ak.ArrayBuilder.__array__

ak.ArrayBuilder.__array__(self)

Intercepts attempts to convert a snapshot of this array into a NumPy array and either performs a zero-copy conversion or raises an error.

See ak.Array.__array__ for a more complete description.

ak.ArrayBuilder.__array_ufunc__

ak.ArrayBuilder.__array_ufunc__(self, ufunc, method)

Intercepts attempts to pass this ArrayBuilder to NumPy universal functions (ufuncs) and passes it through the structure of the array’s snapshot.

See ak.Array.__array_ufunc__ for a more complete description.

ak.ArrayBuilder.__array_function__

ak.ArrayBuilder.__array_function__(self, func, types, args, kwargs)

Intercepts attempts to pass this ArrayBuilder to those NumPy functions other than universal functions that have an Awkward equivalent.

See ak.ArrayBuilder.__array_ufunc__ for a more complete description.

ak.ArrayBuilder.numba_type

ak.ArrayBuilder.numba_type

The type of this ArrayBuilder when it is used in Numba. It contains enough information to generate low-level code for accessing any element, down to the leaves.

See Numba documentation on types and signatures.

ak.ArrayBuilder.__bool__

ak.ArrayBuilder.__bool__(self)

ak.ArrayBuilder.snapshot

ak.ArrayBuilder.snapshot(self)

Converts the currently accumulated data into an ak.Array.

This is almost always an O(1) operation: it does not scale with the size of the accumulated data, so it is safe to call relatively often.

The resulting ak.Array shares memory with the accumulated data (it is a zero-copy operation), but it is safe to continue filling the ArrayBuilder because its append-only operations only affect data outside the range viewed by old snapshots. If ArrayBuilder reallocates an internal buffer, the data are no longer shared, but they’re reference-counted by the ak.Array and the ak.ArrayBuilder, so all buffers are deleted exactly once.

ak.ArrayBuilder.null

ak.ArrayBuilder.null(self)

Appends a None value at the current position in the accumulated array.

ak.ArrayBuilder.boolean

ak.ArrayBuilder.boolean(self, x)

Appends a boolean value x at the current position in the accumulated array.

ak.ArrayBuilder.integer

ak.ArrayBuilder.integer(self, x)

Appends an integer x at the current position in the accumulated array.

ak.ArrayBuilder.real

ak.ArrayBuilder.real(self, x)

Appends a floating point number x at the current position in the accumulated array.

ak.ArrayBuilder.complex

ak.ArrayBuilder.complex(self, x)

Appends a complex number x at the current position in the accumulated array.

ak.ArrayBuilder.datetime

ak.ArrayBuilder.datetime(self, x)

Appends a datetime value x at the current position in the accumulated array.

ak.ArrayBuilder.timedelta

ak.ArrayBuilder.timedelta(self, x)

Appends a timedelta value x at the current position in the accumulated array.

ak.ArrayBuilder.bytestring

ak.ArrayBuilder.bytestring(self, x)

Appends an unencoded string (raw bytes) x at the current position in the accumulated array.

ak.ArrayBuilder.string

ak.ArrayBuilder.string(self, x)

Appends a UTF-8 encoded string x at the current position in the accumulated array.

ak.ArrayBuilder.begin_list

ak.ArrayBuilder.begin_list(self)

Begins filling a list; must be closed with end_list.

For example,

builder = ak.ArrayBuilder()
builder.begin_list()
builder.real(1.1)
builder.real(2.2)
builder.real(3.3)
builder.end_list()
builder.begin_list()
builder.end_list()
builder.begin_list()
builder.real(4.4)
builder.real(5.5)
builder.end_list()

produces

[[1.1, 2.2, 3.3], [], [4.4, 5.5]]

ak.ArrayBuilder.end_list

ak.ArrayBuilder.end_list(self)

Ends a list.

ak.ArrayBuilder.begin_tuple

ak.ArrayBuilder.begin_tuple(self, numfields)

Begins filling a tuple with numfields fields; must be closed with end_tuple.

For example,

builder = ak.ArrayBuilder()
builder.begin_tuple(3)
builder.index(0).integer(1)
builder.index(1).real(1.1)
builder.index(2).string("one")
builder.end_tuple()
builder.begin_tuple(3)
builder.index(0).integer(2)
builder.index(1).real(2.2)
builder.index(2).string("two")
builder.end_tuple()

produces

[(1, 1.1, "one"), (2, 2.2, "two")]

ak.ArrayBuilder.index

ak.ArrayBuilder.index(self, i)
Parameters
  • i (int) – The tuple slot to fill.

Prepares to fill a tuple slot; see begin_tuple for an example.

This method returns the ak.ArrayBuilder itself, so the call can be chained with the command that fills the slot.

ak.ArrayBuilder.end_tuple

ak.ArrayBuilder.end_tuple(self)

Ends a tuple.

ak.ArrayBuilder.begin_record

ak.ArrayBuilder.begin_record(self, name=None)

Begins filling a record with an optional name; must be closed with end_record.

For example,

>>> builder = ak.ArrayBuilder()
>>> builder.begin_record("points")
>>> builder.field("x").real(1)
>>> builder.field("y").real(1.1)
>>> builder.end_record()
>>> builder.begin_record("points")
>>> builder.field("x").real(2)
>>> builder.field("y").real(2.2)
>>> builder.end_record()

produces

>>> ak.to_list(builder.snapshot())
[{"x": 1.0, "y": 1.1}, {"x": 2.0, "y": 2.2}]

with type

>>> ak.type(builder.snapshot())
2 * points["x": float64, "y": float64]

The record type is named "points" because its "__record__" parameter is set to that value:

>>> builder.snapshot().layout.parameters
{'__record__': 'points'}

The "__record__" parameter can be used to add behavior to the records in the array, as described in ak.Array, ak.Record, and ak.behavior.

ak.ArrayBuilder.field

ak.ArrayBuilder.field(self, key)
Parameters
  • key (str) – The field key to fill.

Prepares to fill a field; see begin_record for an example.

This method returns the ak.ArrayBuilder itself, so the call can be chained with the command that fills the field.

ak.ArrayBuilder.end_record

ak.ArrayBuilder.end_record(self)

Ends a record.

ak.ArrayBuilder.append

ak.ArrayBuilder.append(self, obj, at=None)
Parameters
  • obj – The object to append.

  • at (None or int) – Which value to select from obj if obj is an ak.Array.

Appends any type of object, which can be a shorthand for null, boolean, integer, real, bytestring, or string, but also an ak.Array or ak.Record to reference values from an existing dataset, or any Python object to convert to Awkward Array.

If obj is an ak.Array or ak.Record, the output will be an ak.layout.IndexedArray64 (or ak.layout.IndexedOptionArray64 if there are any None values) that references the existing data. This can be a more time- and memory-efficient way to put old data in a new structure, since it avoids copying and even walking over the old data structure (which matters more when the structures are large).

If obj is an arbitrary Python object, this is equivalent to ak.from_iter except that it fills an existing ak.ArrayBuilder, rather than creating a new one.

If obj is an ak.Array and at is an int, this method fills the ArrayBuilder with a reference to obj[at] instead of obj.

ak.ArrayBuilder.extend

ak.ArrayBuilder.extend(self, obj)
Parameters
  • obj (ak.Array) – The Array to concatenate with the data in this ArrayBuilder.

Appends every value from obj, by reference (see append).

ak.ArrayBuilder.list

ak.ArrayBuilder.list(self)

Context manager to prevent unpaired begin_list and end_list. The example in the begin_list documentation can be rewritten as

with builder.list():
    builder.real(1.1)
    builder.real(2.2)
    builder.real(3.3)
with builder.list():
    pass
with builder.list():
    builder.real(4.4)
    builder.real(5.5)

to produce the same result.

[[1.1, 2.2, 3.3], [], [4.4, 5.5]]

Since context managers aren’t yet supported by Numba, this method can’t be used in Numba.

ak.ArrayBuilder.tuple

ak.ArrayBuilder.tuple(self, numfields)

Context manager to prevent unpaired begin_tuple and end_tuple. The example in the begin_tuple documentation can be rewritten as

with builder.tuple(3):
    builder.index(0).integer(1)
    builder.index(1).real(1.1)
    builder.index(2).string("one")
with builder.tuple(3):
    builder.index(0).integer(2)
    builder.index(1).real(2.2)
    builder.index(2).string("two")

to produce the same result.

[(1, 1.1, "one"), (2, 2.2, "two")]

Since context managers aren’t yet supported by Numba, this method can’t be used in Numba.

ak.ArrayBuilder.record

ak.ArrayBuilder.record(self, name=None)

Context manager to prevent unpaired begin_record and end_record. The example in the begin_record documentation can be rewritten as

with builder.record("points"):
    builder.field("x").real(1)
    builder.field("y").real(1.1)
with builder.record("points"):
    builder.field("x").real(2)
    builder.field("y").real(2.2)

to produce the same result.

[{"x": 1.0, "y": 1.1}, {"x": 2.0, "y": 2.2}]

Since context managers aren’t yet supported by Numba, this method can’t be used in Numba.