ak.Array

Defined in awkward.highlevel on line 31.

class ak.Array(self, data, behavior=None, with_name=None, check_valid=False, cache=None, kernels=None)
Parameters
  • data (ak.layout.Content, ak.partition.PartitionedArray, ak.Array, np.ndarray, cp.ndarray, pyarrow.*, str, dict, or iterable) –

    Data to wrap or convert into an array.
    • If a NumPy array, the regularity of its dimensions is preserved and the data are viewed, not copied.

    • CuPy arrays are treated the same way as NumPy arrays except that they default to kernels="cuda", rather than kernels="cpu".

    • If a pyarrow object, calls ak.from_arrow, preserving as much metadata as possible, usually zero-copy.

    • If a dict of str → columns, combines the columns into an array of records (like Pandas’s DataFrame constructor).

    • If a string, the data are assumed to be JSON.

    • If an iterable, calls ak.from_iter, which assumes all dimensions have irregular lengths.

  • behavior (None or dict) – Custom ak.behavior for this Array only.

  • with_name (None or str) – Gives tuples and records a name that can be used to override their behavior (see below).

  • check_valid (bool) – If True, verify that the layout is valid.

  • kernels (None, "cpu", or "cuda") – If "cpu", the Array will be placed in main memory for use with other "cpu" Arrays and Records; if "cuda", the Array will be placed in GPU global memory using CUDA; if None, the data are left untouched. For "cuda", awkward-cuda-kernels must be installed, which can be done with pip install awkward[cuda] --upgrade.

High-level array that can contain data of any type.

For most users, this is the only class in Awkward Array that matters: it is the entry point for data analysis with an emphasis on usability. It intentionally has a minimum of methods, preferring standalone functions like

ak.num(array1)
ak.combinations(array1)
ak.cartesian([array1, array2])
ak.zip({"x": array1, "y": array2, "z": array3})

instead of bound methods like

array1.num()
array1.combinations()
array1.cartesian([array2, array3])
array1.zip(...)   # ?

because its namespace is valuable for domain-specific parameters and functionality. For example, if records contain a field named "num", they can be accessed as

array1.num

instead of

array1["num"]

without any confusion or interference from ak.num. The same is true for domain-specific methods that have been attached to the data. For instance, an analysis of mailing addresses might have a function that computes zip codes, which can be attached to the data with a method like

latlon.zip()

without any confusion or interference from ak.zip. Custom methods like this can be added with ak.behavior, and so the namespace of Array attributes must be kept clear for such applications.

See also ak.Record.

Interfaces to other libraries

NumPy

When NumPy universal functions (ufuncs) are applied to an ak.Array, they are passed through the Awkward data structure, applied to the numerical data at its leaves, and the output maintains the original structure.

For example,

>>> array = ak.Array([[1, 4, 9], [], [16, 25]])
>>> np.sqrt(array)
<Array [[1, 2, 3], [], [4, 5]] type='3 * var * float64'>

See also ak.Array.__array_ufunc__.

Some NumPy functions other than ufuncs are also handled properly, provided that NumPy is version 1.17 or later (see NEP 18) and an Awkward override exists. That is,

np.concatenate

can be used on an Awkward Array because

ak.concatenate

exists. If your NumPy is older than 1.17, use ak.concatenate directly.

Pandas

Ragged arrays (list type) can be converted into Pandas MultiIndex rows and nested records can be converted into MultiIndex columns. If the Awkward Array has only one “branch” of nested lists (i.e. different record fields do not have different-length lists, but a single chain of lists-of-lists is okay), then it can be losslessly converted into a single DataFrame. Otherwise, multiple DataFrames are needed, though they can be merged (with a loss of information).

The ak.to_pandas function performs this conversion; if how=None, it returns a list of DataFrames; otherwise, how is passed to pd.merge when merging the resultant DataFrames.

Numba

Arrays can be used in Numba: they can be passed as arguments to a Numba-compiled function or returned as return values. The only limitation is that Awkward Arrays cannot be created inside the Numba-compiled function; to make outputs, consider ak.ArrayBuilder.

Arrow

Arrays are convertible to and from Apache Arrow, a standard for representing nested data structures in columnar arrays. See ak.to_arrow and ak.from_arrow.

NumExpr

NumExpr can calculate expressions on a set of ak.Arrays, but only if the functions in ak.numexpr are used, not the functions in the numexpr library directly.

Like NumPy ufuncs, the expression is evaluated on the numeric leaves of the data structure, maintaining structure in the output.

See ak.numexpr.evaluate to calculate an expression.

See ak.numexpr.re_evaluate to recalculate an expression without rebuilding its virtual machine.

Autograd

Derivatives of a calculation on a set of ak.Arrays can be calculated with Autograd, but only if the functions in ak.autograd are used, not the functions in the autograd library directly.

Like NumPy ufuncs, the function and its derivatives are evaluated on the numeric leaves of the data structure, maintaining structure in the output.

See ak.autograd.elementwise_grad to calculate a function and its derivatives elementwise on each numeric value in an ak.Array.

ak.Array.layout

ak.Array.layout

The composable ak.layout.Content elements that determine how this Array is structured.

This may be considered a “low-level” view, as it distinguishes between arrays that have the same logical meaning (i.e. same JSON output and high-level type) but different

  • node types, such as ak.layout.ListArray64 and ak.layout.ListOffsetArray64,

  • integer type specialization, such as ak.layout.ListArray64 and ak.layout.ListArray32,

  • or specific values, such as gaps in an ak.layout.ListArray64.

The ak.layout.Content elements are fully composable, whereas an Array is not; the high-level Array is a single-layer “shell” around its layout.

Layouts are rendered as XML instead of a nested list. For example, the following array

ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])

is presented as

<Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * float64'>

but array.layout is presented as

<ListOffsetArray64>
    <offsets>
        <Index64 i="[0 3 3 5]" offset="0" length="4" at="0x55a26df62590"/>
    </offsets>
    <content>
        <NumpyArray format="d" shape="5" data="1.1 2.2 3.3 4.4 5.5" at="0x55a26e0c5f50"/>
    </content>
</ListOffsetArray64>

(with truncation for large arrays).
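To see how the offsets and content in this rendering encode the nested lists, here is a plain-Python sketch of the idea (not the library's implementation): list i spans the content between offsets[i] and offsets[i + 1].

```python
# Values copied from the XML rendering of the layout.
offsets = [0, 3, 3, 5]
content = [1.1, 2.2, 3.3, 4.4, 5.5]

# Each consecutive pair of offsets delimits one nested list.
nested = [content[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]

print(nested)  # [[1.1, 2.2, 3.3], [], [4.4, 5.5]]
```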

ak.Array.behavior

ak.Array.behavior

The behavior parameter passed into this Array’s constructor.

  • If a dict, this behavior overrides the global ak.behavior. Any keys in the global ak.behavior but not this behavior are still valid, but any keys in both are overridden by this behavior. Keys with a None value are equivalent to missing keys, so this behavior can effectively remove keys from the global ak.behavior.

  • If None, the Array defaults to the global ak.behavior.

See ak.behavior for a list of recognized key patterns and their meanings.

ak.Array._internal_for_jax

ak.Array._internal_for_jax(cls, layout, jaxtracers, isscalar=False)

ak.Array.caches

ak.Array.caches

ak.Array.mask

ak.Array.mask

Whereas

array[array_of_booleans]

removes elements from array in which array_of_booleans is False,

array.mask[array_of_booleans]

returns data with the same length as the original array but False values in array_of_booleans are mapped to None. Such an output can be used in mathematical expressions with the original array because they are still aligned.

See filtering and ak.mask.

ak.Array.tolist

ak.Array.tolist(self)

Converts this Array into Python objects; same as ak.to_list (but without the underscore, like NumPy’s tolist).

ak.Array.to_list

ak.Array.to_list(self)

Converts this Array into Python objects; same as ak.to_list.

ak.Array.to_numpy

ak.Array.to_numpy(self, allow_missing=True)

Converts this Array into a NumPy array, if possible; same as ak.to_numpy.

ak.Array.nbytes

ak.Array.nbytes

The total number of bytes in all the ak.layout.Index, ak.layout.Identities, and ak.layout.NumpyArray buffers in this array tree.

Note: this calculation takes overlapping buffers into account, to the extent that overlaps are not double-counted, but overlaps are currently assumed to be complete subsets of one another, and so it is theoretically possible (though unlikely) that this number is an underestimate of the true usage.

It also does not count buffers that must be kept in memory because of ownership, but are not directly used in the array. Nor does it count the (small) C++ nodes or Python objects that reference the (large) array buffers.

ak.Array.ndim

ak.Array.ndim

Number of dimensions (nested variable-length lists and/or regular arrays) before reaching a numeric type or a record.

There may be nested lists within the record, as field values, but this number of dimensions does not count those.

(Some fields may have different depths than others, which is why they are not counted.)

ak.Array.fields

ak.Array.fields

List of field names or tuple slot numbers (as strings) of the outermost record or tuple in this array.

If the array contains nested records, only the fields of the outermost record are shown. If it contains tuples instead of records, its fields are string representations of integers, such as "0", "1", "2", etc. The records or tuples may be within multiple layers of nested lists.

If the array contains neither tuples nor records, it is an empty list.

See also ak.fields.

ak.Array._ipython_key_completions_

ak.Array._ipython_key_completions_(self)

ak.Array.type

ak.Array.type

The high-level type of this Array; same as ak.type.

Note that the outermost element of an Array’s type is always an ak.types.ArrayType, which specifies the number of elements in the array.

The type of an ak.layout.Content (from ak.Array.layout) is not wrapped by an ak.types.ArrayType.

ak.Array.__len__

ak.Array.__len__(self)

The length of this Array, only counting the outermost structure.

For example, the length of

ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])

is 3, not 5.

ak.Array.__iter__

ak.Array.__iter__(self)

Iterates over this Array in Python.

Note that this is the slowest way to access data (even slower than native Python objects, like lists and dicts). Usually, you should express your problems in array-at-a-time operations.

In other words, do this:

>>> print(np.sqrt(ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])))
[[1.05, 1.48, 1.82], [], [2.1, 2.35]]

not this:

>>> for outer in ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]]):
...     for inner in outer:
...         print(np.sqrt(inner))
...
1.0488088481701516
1.4832396974191326
1.816590212458495
2.0976176963403033
2.345207879911715

Iteration over Arrays exists so that they can be more easily inspected as Python objects.

See also ak.to_list.

ak.Array.__getitem__

ak.Array.__getitem__(self, where)
Parameters

where (many types supported; see below) – Index of positions to select from this Array.

Select items from the Array using an extension of NumPy’s (already quite extensive) rules.

All methods of selecting items described in NumPy indexing are supported, with one exception: combining advanced and basic indexing with basic indexes between two advanced indexes. The definition NumPy chose for that result does not generalize beyond rectilinear arrays.

The where parameter can be any of the following or a tuple of the following.

  • An integer selects one element. Like Python/NumPy, it is zero-indexed: 0 is the first item, 1 is the second, etc. Negative indexes count from the end of the list: -1 is the last, -2 is the second-to-last, etc. Indexes beyond the size of the array, either because they’re too large or because they’re too negative, raise errors. In particular, some nested lists might contain a desired element while others don’t; this would raise an error.

  • A slice (either a Python slice object or the start:stop:step syntax) selects a range of elements. The start and stop values are zero-indexed; start is inclusive and stop is exclusive, like Python/NumPy. Negative step values are allowed, but a step of 0 is an error. Slices beyond the size of the array are not errors but are truncated, like Python/NumPy.

  • A string selects a tuple or record field, even if its position in the tuple is to the left of the dimension where the tuple/record is defined. (See projection below.) This is similar to NumPy’s field access, except that strings are allowed in the same tuple with other slice types. While record fields have names, tuple fields are integer strings, such as "0", "1", "2" (always non-negative). Be careful to distinguish these from non-string integers.

  • An iterable of strings (not the top-level tuple) selects multiple tuple/record fields.

  • An ellipsis (either the Python Ellipsis object or the ... syntax) skips as many dimensions as needed to put the rest of the slice items to the innermost dimensions.

  • A np.newaxis or its equivalent, None, does not select items but introduces a new regular dimension in the output with size 1. This is a convenient way to explicitly choose a dimension for broadcasting.

  • A boolean array with the same length as the current dimension (or any iterable, other than the top-level tuple) selects elements corresponding to each True value in the array, dropping those that correspond to each False. The behavior is similar to NumPy’s compress function.

  • An integer array (or any iterable, other than the top-level tuple) selects elements like a single integer, but produces a regular dimension of as many as are desired. The array can have any length, any order, and it can have duplicates and incomplete coverage. The behavior is similar to NumPy’s take function.

  • An integer Array with missing (None) items selects multiple values by index, as above, but None values are passed through to the output. This behavior matches pyarrow’s Array.take which also manages arrays with missing values. See option indexing below.

  • An Array of nested lists, ultimately containing booleans or integers and having the same lengths of lists at each level as the Array to which they’re applied, selects by boolean or by integer at the deeply nested level. Missing items at any level above the deepest level must broadcast. See nested indexing below.

A tuple of the above applies each slice item to a dimension of the data, which can be very expressive. More than one flat boolean/integer array are “iterated as one” as described in the NumPy documentation.

Filtering

A common use of selection by boolean arrays is to filter a dataset by some property. For instance, to get the odd values of the array

ak.Array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

one can put an array expression with True for each odd value inside square brackets:

>>> array[array % 2 == 1]
<Array [1, 3, 5, 7, 9] type='5 * int64'>

This technique is so common in NumPy and Pandas data analysis that it is often read as a syntax, rather than a consequence of array slicing.

The extension to nested arrays like

ak.Array([[[0, 1, 2], [], [3, 4], [5]], [[6, 7, 8], [9]]])

allows us to use the same syntax more generally.

>>> array[array % 2 == 1]
<Array [[[1], [], [3], [5]], [[7], [9]]] type='2 * var * var * int64'>

In this example, the boolean array is itself nested (see nested indexing below).

>>> array % 2 == 1
<Array [[[False, True, False], ... [True]]] type='2 * var * var * bool'>

This also applies to data with record structures.

For nested data, we often need to select the first or first two elements from variable-length lists. That can be a problem if some lists are empty. A function like ak.num can be useful for first selecting by the lengths of lists.

>>> array = ak.Array([[1.1, 2.2, 3.3],
...                   [],
...                   [4.4, 5.5],
...                   [6.6],
...                   [],
...                   [7.7, 8.8, 9.9]])
...
>>> array[ak.num(array) > 0, 0]
<Array [1.1, 4.4, 6.6, 7.7] type='4 * float64'>
>>> array[ak.num(array) > 1, 1]
<Array [2.2, 5.5, 8.8] type='3 * float64'>

It’s sometimes also a problem that “cleaning” the dataset by dropping empty lists changes its alignment, so that it can no longer be used in calculations with “uncleaned” data. For this, ak.mask can be useful because it inserts None in positions that fail the filter, rather than removing them.

>>> print(ak.mask(array, ak.num(array) > 1))
[[1.1, 2.2, 3.3], None, [4.4, 5.5], None, None, [7.7, 8.8, 9.9]]

Note, however, that the 0 or 1 to pick the first or second item of each nested list is in the second dimension, so the first dimension of the slice must be a :.

>>> ak.mask(array, ak.num(array) > 1)[:, 0]
<Array [1.1, None, 4.4, None, None, 7.7] type='6 * ?float64'>
>>> ak.mask(array, ak.num(array) > 1)[:, 1]
<Array [2.2, None, 5.5, None, None, 8.8] type='6 * ?float64'>

Another syntax for

ak.mask(array, array_of_booleans)

is

array.mask[array_of_booleans]

(which is 5 characters away from simply filtering the array).

Projection

The following array

ak.Array([[{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [2, 2]}],
          [{"x": 3.3, "y": [3, 3, 3]}],
          [{"x": 0, "y": []}, {"x": 1.1, "y": [1, 1, 1]}]])

has records inside of nested lists:

>>> ak.type(array)
3 * var * {"x": float64, "y": var * int64}

In principle, one should select nested lists before record fields,

>>> array[2, :, "x"]
<Array [0, 1.1] type='2 * float64'>
>>> array[::2, :, "x"]
<Array [[1.1, 2.2], [0, 1.1]] type='2 * var * float64'>

but it’s also possible to select record fields first.

>>> array["x"]
<Array [[1.1, 2.2], [3.3], [0, 1.1]] type='3 * var * float64'>

The string can “commute” to the left through integers and slices to get the same result as it would in its “natural” position.

>>> array[2, :, "x"]
<Array [0, 1.1] type='2 * float64'>
>>> array[2, "x", :]
<Array [0, 1.1] type='2 * float64'>
>>> array["x", 2, :]
<Array [0, 1.1] type='2 * float64'>

This is analogous to selecting rows (integer indexes) before columns (string names) or columns before rows, except that the rows are more complex (like a Pandas MultiIndex). This would be an expensive operation in a typical object-oriented environment, in which the records with fields "x" and "y" are akin to C structs, but for columnar Awkward Arrays, projecting through all records to produce an array of nested lists of "x" values just changes the metadata (no loop over data, and therefore fast).

Thus, data analysts should think of records as fluid objects that can be easily projected apart and zipped back together with ak.zip.

Note, however, that while a column string can “commute” with row indexes to the left of its position in the tree, it can’t commute to the right. For example, it’s possible to use slices inside "y" because "y" is a list:

>>> array[0, :, "y"]
<Array [[1], [2, 2]] type='2 * var * int64'>
>>> array[0, :, "y", 0]
<Array [1, 2] type='2 * int64'>

but it’s not possible to move "y" to the right

>>> array[0, :, 0, "y"]
ValueError: in NumpyArray, too many dimensions in slice

because the array[0, :, 0, ...] slice applies to both "x" and "y" before "y" is selected, and "x" is a one-dimensional NumpyArray that can’t take more than its share of slices.

Finally, note that the dot (__getattr__) syntax is equivalent to a single string in a slice (__getitem__) if the field name is a valid Python identifier and doesn’t conflict with ak.Array methods or properties.

>>> array.x
<Array [[1.1, 2.2], [3.3], [0, 1.1]] type='3 * var * float64'>
>>> array.y
<Array [[[1], [2, 2]], ... [[], [1, 1, 1]]] type='3 * var * var * int64'>

Nested Projection

If records are nested within records, you can use a series of strings in the selector to drill down. For instance, with the following array,

ak.Array([
    {"a": {"x": 1, "y": 2}, "b": {"x": 10, "y": 20}, "c": {"x": 1.1, "y": 2.2}},
    {"a": {"x": 1, "y": 2}, "b": {"x": 10, "y": 20}, "c": {"x": 1.1, "y": 2.2}},
    {"a": {"x": 1, "y": 2}, "b": {"x": 10, "y": 20}, "c": {"x": 1.1, "y": 2.2}}])

we can go directly to the numerical data by specifying a string for the outer field and a string for the inner field.

>>> array["a", "x"]
<Array [1, 1, 1] type='3 * int64'>
>>> array["a", "y"]
<Array [2, 2, 2] type='3 * int64'>
>>> array["b", "y"]
<Array [20, 20, 20] type='3 * int64'>
>>> array["c", "y"]
<Array [2.2, 2.2, 2.2] type='3 * float64'>

As with single projections, the dot (__getattr__) syntax is equivalent to a single string in a slice (__getitem__) if the field name is a valid Python identifier and doesn’t conflict with ak.Array methods or properties.

>>> array.a.x
<Array [1, 1, 1] type='3 * int64'>

You can even get every field of the same name within an outer record using a list of field names for the outer record. The following selects the "x" field of "a", "b", and "c" records:

>>> array[["a", "b", "c"], "x"].tolist()
[{'a': 1, 'b': 10, 'c': 1.1},
 {'a': 1, 'b': 10, 'c': 1.1},
 {'a': 1, 'b': 10, 'c': 1.1}]

You don’t need to get all fields:

>>> array[["a", "b"], "x"].tolist()
[{'a': 1, 'b': 10},
 {'a': 1, 'b': 10},
 {'a': 1, 'b': 10}]

And you can select lists of field names at all levels:

>>> array[["a", "b"], ["x", "y"]].tolist()
[{'a': {'x': 1, 'y': 2}, 'b': {'x': 10, 'y': 20}},
 {'a': {'x': 1, 'y': 2}, 'b': {'x': 10, 'y': 20}},
 {'a': {'x': 1, 'y': 2}, 'b': {'x': 10, 'y': 20}}]

Option indexing

NumPy arrays can be sliced by all of the above slice types except arrays with missing values and arrays with nested lists, both of which are inexpressible in NumPy. Missing values, represented by None in Python, are called option types (ak.types.OptionType) in Awkward Array and can be used as a slice.

For example, an array like

ak.Array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9])

can be sliced with a boolean array

>>> array[[False, False, False, False, True, False, True, False, True]]
<Array [5.5, 7.7, 9.9] type='3 * float64'>

or a boolean array containing None values:

>>> array[[False, False, False, False, True, None, True, None, True]]
<Array [5.5, None, 7.7, None, 9.9] type='5 * ?float64'>

Similarly for arrays of integers and None:

>>> array[[0, 1, None, None, 7, 8]]
<Array [1.1, 2.2, None, None, 8.8, 9.9] type='6 * ?float64'>

This is the same behavior as pyarrow’s Array.take, which establishes a convention for how to interpret slice arrays with option type:

>>> import pyarrow as pa
>>> array = pa.array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9])
>>> array.take(pa.array([0, 1, None, None, 7, 8]))
<pyarrow.lib.DoubleArray object at 0x7efc7f060210>
[
  1.1,
  2.2,
  null,
  null,
  8.8,
  9.9
]

Nested indexing

Awkward Array’s nested lists can be used as slices as well, as long as the type at the deepest level of nesting is boolean or integer.

For example, the array

ak.Array([[[0.0, 1.1, 2.2], [], [3.3, 4.4]], [], [[5.5]]])

can be sliced at the top level with one-dimensional arrays:

>>> array[[False, True, True]]
<Array [[], [[5.5]]] type='2 * var * var * float64'>
>>> array[[1, 2]]
<Array [[], [[5.5]]] type='2 * var * var * float64'>

with singly nested lists:

>>> array[[[False, True, True], [], [True]]]
<Array [[[], [3.3, 4.4]], [], [[5.5]]] type='3 * var * var * float64'>
>>> array[[[1, 2], [], [0]]]
<Array [[[], [3.3, 4.4]], [], [[5.5]]] type='3 * var * var * float64'>

and with doubly nested lists:

>>> array[[[[False, True, False], [], [True, False]], [], [[False]]]]
<Array [[[1.1], [], [3.3]], [], [[]]] type='3 * var * var * float64'>
>>> array[[[[1], [], [0]], [], [[]]]]
<Array [[[1.1], [], [3.3]], [], [[]]] type='3 * var * var * float64'>

The key thing is that the nested slice has the same number of elements as the array it’s slicing at every level of nesting that it reproduces. This is similar to the requirement that boolean arrays have the same length as the array they’re filtering.

This kind of slicing is useful because NumPy’s universal functions produce arrays with the same structure as the original array, which can then be used as filters.

>>> print((array * 10) % 2 == 1)
[[[False, True, False], [], [True, False]], [], [[True]]]
>>> print(array[(array * 10) % 2 == 1])
[[[1.1], [], [3.3]], [], [[5.5]]]

Functions whose names start with “arg” return index positions, which can be used with the integer form.

>>> print(np.argmax(array, axis=-1))
[[2, None, 1], [], [0]]
>>> print(array[np.argmax(array, axis=-1)])
[[[3.3, 4.4], None, []], [], [[5.5]]]

Here, the np.argmax returns the integer position of the maximum element or None for empty arrays. It’s a nice example of option indexing with nested indexing.

When applying a nested index with missing (None) entries at levels higher than the last level, the indexer must have the same dimension as the array being indexed, and the resulting output will have missing entries at the corresponding locations, e.g. for

>>> print(array[ [[[0, None, 2, None, None], None, [1]], None, [[0]]] ])
[[[0, None, 2.2, None, None], None, [4.4]], None, [[5.5]]]

the sub-list at entry 0,0 is extended as the masked entries are acting at the last level, while the higher levels of the indexer all have the same dimension as the array being indexed.

ak.Array.__setitem__

ak.Array.__setitem__(self, where, what)
Parameters
  • where (str) – Field name to add to records in the array.

  • what (ak.Array) – Array to add as the new field.

Unlike __getitem__, which allows a wide variety of slice types, only single field-slicing is supported for assignment. (ak.layout.Content arrays are immutable; field assignment replaces the layout with an array that has the new field using ak.with_field.)

However, a field can be assigned deeply into a nested record, e.g.

>>> nested = ak.zip({"a" : ak.zip({"x" : [1, 2, 3]})})
>>> nested["a", "y"] = 2 * nested.a.x
>>> ak.to_list(nested)
[{'a': {'x': 1, 'y': 2}}, {'a': {'x': 2, 'y': 4}}, {'a': {'x': 3, 'y': 6}}]

Note that the following does not work:

>>> nested["a"]["y"] = 2 * nested.a.x # does not work, nested["a"] is a copy!

Always assign by passing the whole path to the top level

>>> nested["a", "y"] = 2 * nested.a.x

If necessary, the new field will be broadcast to fit the array. For example, given an array like

ak.Array([[{"x": 1.1}, {"x": 2.2}, {"x": 3.3}], [], [{"x": 4.4}, {"x": 5.5}]])

which has three elements with nested data in each, assigning

>>> array["y"] = [100, 200, 300]

will result in

>>> ak.to_list(array)
[[{'x': 1.1, 'y': 100}, {'x': 2.2, 'y': 100}, {'x': 3.3, 'y': 100}],
 [],
 [{'x': 4.4, 'y': 300}, {'x': 5.5, 'y': 300}]]

because the 100 in what[0] is broadcast to all three nested elements of array[0], the 200 in what[1] is broadcast to the empty list array[1] (and so does not appear in the output), and the 300 in what[2] is broadcast to both elements of array[2].

See ak.with_field for a variant that does not change the ak.Array in-place. (Internally, this method uses ak.with_field, so performance is not a factor in choosing one over the other.)

ak.Array.__getattr__

ak.Array.__getattr__(self, where)

Whenever possible, fields can be accessed as attributes.

For example, the fields of an array like

ak.Array([[{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [2, 2]}, {"x": 3.3, "y": [3, 3, 3]}],
          [],
          [{"x": 4.4, "y": [4, 4, 4, 4]}, {"x": 5.5, "y": [5, 5, 5, 5, 5]}]])

can be accessed as

>>> array.x
<Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * float64'>
>>> array.y
<Array [[[1], [2, 2], ... [5, 5, 5, 5, 5]]] type='3 * var * var * int64'>

which are equivalent to array["x"] and array["y"]. (See projection.)

Fields can’t be accessed as attributes when

  • ak.Array methods or properties take precedence,

  • a domain-specific behavior has methods or properties that take precedence, or

  • the field name is not a valid Python identifier or is a Python keyword.

Note that while fields can be accessed as attributes, they cannot be assigned as attributes: the following doesn’t work.

array.z = new_field

Always use

array["z"] = new_field

to add a field.

ak.Array.__dir__

ak.Array.__dir__(self)

Lists all methods, properties, and field names (see __getattr__) that can be accessed as attributes.

ak.Array.slot0

ak.Array.slot0

Equivalent to __getitem__ with "0", which selects slot 0 from all tuples.

Record fields can be accessed from __getitem__ with strings (see projection), but tuples only have slot positions, which are 0-indexed integers. However, they must also be quoted as strings to avoid confusion with integers as array indexes. Sometimes, though, interleaving integers in strings and integers outside of strings can be confusing in analysis code.

Record fields can also be accessed as attributes (with limitations), and the distinction between attributes (__getattr__) and subscripts (__getitem__) shows up more clearly in dense code. But integers would not be valid attribute names, so they’re named slot0 through slot9.

(Tuples with more than 10 slots are rare and can defer to __getitem__.)

ak.Array.slot1

ak.Array.slot1

Equivalent to __getitem__ with "1". See slot0.

ak.Array.slot2

ak.Array.slot2

Equivalent to __getitem__ with "2". See slot0.

ak.Array.slot3

ak.Array.slot3

Equivalent to __getitem__ with "3". See slot0.

ak.Array.slot4

ak.Array.slot4

Equivalent to __getitem__ with "4". See slot0.

ak.Array.slot5

ak.Array.slot5

Equivalent to __getitem__ with "5". See slot0.

ak.Array.slot6

ak.Array.slot6

Equivalent to __getitem__ with "6". See slot0.

ak.Array.slot7

ak.Array.slot7

Equivalent to __getitem__ with "7". See slot0.

ak.Array.slot8

ak.Array.slot8

Equivalent to __getitem__ with "8". See slot0.

ak.Array.slot9

ak.Array.slot9

Equivalent to __getitem__ with "9". See slot0.

ak.Array.__str__

ak.Array.__str__(self)
Parameters

limit_value (int) – Maximum number of characters to use when presenting the Array as a string.

Presents this Array as a string without type or "<Array ...>".

Large Arrays are truncated to the first few elements and the last few elements to fit within limit_value characters, using ellipsis to indicate the break. For example, an array like

ak.Array([[1.1, 2.2, 3.3],
          [],
          [4.4, 5.5, 6.6],
          [7.7, 8.8, 9.9, 10.0],
          [],
          [],
          [],
          [11.1, 12.2]])

is shown as

[[1.1, 2.2, 3.3], [], [4.4, 5.5, 6.6], [7.7, 8.8, 9.9, ... [], [], [], [11.1, 12.2]]

The algorithm does not split tokens; it will not show half a number (which can be very misleading), but it can lose structural elements like the ] that closes [7.7, 8.8, 9.9, 10.0].

The algorithm also avoids reading data unnecessarily: most of the data in the ellipsis are not even read. This can be particularly important for datasets that contain ak.layout.VirtualArray nodes that might be expensive to read.

Note that the string also does not quote field names. An array like

ak.Array([[{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [2, 2]}, {"x": 3.3, "y": [3, 3, 3]}],
          [],
          [{"x": 4.4, "y": [4, 4, 4, 4]}]])

is presented as

[[{x: 1.1, y: [1]}, {x: 2.2, y: [2, 2]}, ... [], [{x: 4.4, y: [4, 4, 4, 4]}]]

Floating point numbers are presented in .3g format (3 digits using exponential notation if necessary).

The string representation cannot be read as JSON or as an ak.Array constructor.

See ak.to_list and ak.to_json to convert whole Arrays into Python data or JSON strings without loss (except for type).

ak.Array.__repr__

ak.Array.__repr__(self)
Parameters
  • limit_value (int) – Maximum number of characters to use when presenting the data of the Array.

  • limit_total (int) – Maximum number of characters to use for the whole string (should be larger than limit_value).

Presents this Array as a string with its type and "<Array ...>".

See __str__ for details of the string truncation algorithm.

The type is truncated as well, but only the left side of its string (the outermost data structures) is shown.
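
One way the two limits could interact is sketched below in plain Python. This is a sketch under stated assumptions, not the library's code: `sketch_repr` is a hypothetical helper, the value string is assumed to have been truncated to limit_value characters already, and the type is assumed to receive whatever room is left within limit_total.

```python
def sketch_repr(value, typestr, name="Array", limit_total=85):
    """Assemble '<Array VALUE type=TYPE>' within limit_total characters.

    Hypothetical helper for illustration; not part of Awkward Array.
    """
    quoted = repr(typestr)
    # Room left for the type after the value, the name, and the fixed punctuation.
    limit_type = limit_total - (len(value) + len(name) + len("<  type=>"))
    if len(quoted) > limit_type:
        # Keep the left side (outermost structures), then "..." and the closing quote.
        quoted = quoted[: limit_type - 4] + "..." + quoted[-1]
    return "<{0} {1} type={2}>".format(name, value, quoted)


short_repr = sketch_repr("[1, 2, 3]", "3 * int64")
print(short_repr)  # <Array [1, 2, 3] type='3 * int64'>

long_type = '3 * var * {"x": float64, "y": var * int64, "z": option[var * float64]}'
long_repr = sketch_repr("[[{x: 1.1, y: [1]}], [], [{x: 4.4, y: [4]}]]", long_type)
print(long_repr)
```

Truncating on the right keeps the outermost part of the type (the array length and the first levels of nesting), which is usually the most informative part.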

ak.Array._str

ak.Array._str(self, limit_value=85)

ak.Array._repr

ak.Array._repr(self, limit_value=40, limit_total=85)

ak.Array.__array__

ak.Array.__array__(self)

Intercepts attempts to convert this Array into a NumPy array and either performs a zero-copy conversion or raises an error.

This function is also called by the np.asarray family of functions, which have copy=False by default.

>>> np.asarray(ak.Array([[1.1, 2.2, 3.3], [4.4, 5.5, 6.6]]))
array([[1.1, 2.2, 3.3],
       [4.4, 5.5, 6.6]])

If the data are numerical and regular (nested lists have equal lengths in each dimension, as described by the type), they can be losslessly converted to a NumPy array and this function returns without an error.

Otherwise, the function raises an error. It does not create a NumPy array with dtype "O" for np.object_ (see the note on object_ type). Silent conversions to dtype "O" arrays would be a significant performance hit and would also break functionality, since nested lists in a NumPy "O" array are severed from the array and cannot be sliced as dimensions.
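
The problem with dtype "O" can be seen using NumPy alone. The snippet below is a small illustration with made-up data: regular data become a true two-dimensional array, whereas irregular data can only be stored as a one-dimensional array of Python list objects.

```python
import numpy as np

# Regular (rectangular) data convert to a true two-dimensional array.
regular = np.asarray([[1.1, 2.2, 3.3], [4.4, 5.5, 6.6]])
print(regular.shape)   # (2, 3)
print(regular[:, 0])   # the first column can be sliced out

# Irregular data only fit as a one-dimensional array of Python lists.
irregular = np.empty(2, dtype=object)
irregular[0] = [1.1, 2.2, 3.3]
irregular[1] = [4.4, 5.5]
print(irregular.shape)  # (2,)

# The inner lists are opaque objects, not a second dimension, so slicing fails.
try:
    irregular[:, 0]
    sliced = True
except IndexError:
    sliced = False
print(sliced)  # False
```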

ak.Array.__array_ufunc__

ak.Array.__array_ufunc__(self, ufunc, method, *inputs, **kwargs)

Intercepts attempts to pass this Array to a NumPy universal function (ufunc) and passes it through the Array’s structure.

This method conforms to NumPy’s NEP 13 for overriding ufuncs, which has been available since NumPy 1.13 (and thus NumPy 1.13 is the minimum allowed version).

When any ufunc is applied to an Awkward Array, it applies to the innermost level of structure and preserves the structure through the operation.

For example, with an array like

array = ak.Array([[{"x": 0.0, "y": []}, {"x": 1.1, "y": [1]}], [], [{"x": 2.2, "y": [2, 2]}]])

applying np.sqrt would yield

>>> print(np.sqrt(array))
[[{x: 0, y: []}, {x: 1.05, y: [1]}], [], [{x: 1.48, y: [1.41, 1.41]}]]

In addition, many unary and binary operators implicitly call ufuncs, such as np.power in

>>> print(array**2)
[[{x: 0, y: []}, {x: 1.21, y: [1]}], [], [{x: 4.84, y: [4, 4]}]]

In the above example, array is a nested list of records and 2 is a scalar. Awkward Array applies the same broadcasting rules as NumPy plus a few more to deal with nested structures. In addition to broadcasting a scalar, as above, it is possible to broadcast arrays with less depth into arrays with more depth, such as

>>> print(array + ak.Array([10, 20, 30]))
[[{x: 10, y: []}, {x: 11.1, y: [11]}], [], [{x: 32.2, y: [32, 32]}]]

See ak.broadcast_arrays for details about broadcasting and the generalized set of broadcasting rules.
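
The NEP 13 mechanism described above can be sketched with a toy class. Everything here is an assumption made for illustration: `NestedList` and `_apply` are hypothetical names, the class wraps nested Python lists of numbers only (no records), and it is not how Awkward Array is implemented. It shows a ufunc reaching the innermost numbers, operators being routed through ufuncs, and a scalar or a shallower array being broadcast into deeper structure.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class NestedList(NDArrayOperatorsMixin):
    """Hypothetical wrapper around nested Python lists of numbers."""

    def __init__(self, data):
        self.data = data

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__" or kwargs:
            return NotImplemented  # reductions, out=, etc. are outside this sketch
        # Unwrap NestedList arguments; plain scalars pass through unchanged.
        unwrapped = [x.data if isinstance(x, NestedList) else x for x in inputs]
        return NestedList(_apply(ufunc, unwrapped))


def _apply(ufunc, args):
    """Recurse through the list structure and call the ufunc on the innermost numbers."""
    lists = [a for a in args if isinstance(a, list)]
    if not lists:
        return float(ufunc(*args))  # leaf level: every argument is a number
    # Lists are walked in lockstep; numbers are repeated for every element.
    return [
        _apply(ufunc, [a[i] if isinstance(a, list) else a for a in args])
        for i in range(len(lists[0]))
    ]


array = NestedList([[0.0, 1.1], [], [2.2]])
roots = np.sqrt(array)                     # dispatched through __array_ufunc__
squares = array ** 2                       # the mixin turns ** into np.power
summed = array + NestedList([10, 20, 30])  # shallower array broadcast into deeper one
print(roots.data, squares.data, summed.data)
```

In this sketch, each number in the shallower argument is paired with a whole sublist of the deeper one and then treated as a scalar for that sublist, so the list structure of the deeper argument is preserved in every result.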

Third-party libraries, not just NumPy, can create ufuncs, so any library that “plays well” with the NumPy ecosystem can be used with Awkward Arrays:

>>> import numba as nb
>>> @nb.vectorize([nb.float64(nb.float64)])
... def sqr(x):
...     return x * x
...
>>> print(sqr(array))
[[{x: 0, y: []}, {x: 1.21, y: [1]}], [], [{x: 4.84, y: [4, 4]}]]

See also __array_function__.

ak.Array.__array_function__

ak.Array.__array_function__(self, func, types, args, kwargs)

Intercepts attempts to pass this Array to NumPy functions that are not universal functions and that have an Awkward equivalent.

This method conforms to NumPy’s NEP 18 for overriding functions, which has been available since NumPy 1.17 (and NumPy 1.16 with an experimental flag set). This is not crucial for Awkward Array to work correctly, as NumPy functions like np.concatenate can be manually replaced with ak.concatenate in earlier versions of NumPy.
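
The NEP 18 protocol itself can be sketched with a toy class. The names `HANDLED`, `implements`, and `NestedList` are hypothetical and exist only for this example; the sketch assumes NumPy 1.17 or later and is not a description of Awkward Array's internals. A NumPy function called on the object is redirected to a registered replacement, much as np.concatenate would be redirected to an equivalent such as ak.concatenate.

```python
import numpy as np

# Registry mapping NumPy functions to replacements (hypothetical, for this sketch).
HANDLED = {}


def implements(numpy_function):
    """Decorator that registers a replacement for a NumPy function."""
    def decorator(func):
        HANDLED[numpy_function] = func
        return func
    return decorator


class NestedList:
    """Hypothetical wrapper around nested Python lists."""

    def __init__(self, data):
        self.data = data

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED:
            return NotImplemented  # NumPy then raises TypeError
        return HANDLED[func](*args, **kwargs)


@implements(np.concatenate)
def concatenate(arrays, axis=0):
    """Stand-in for an equivalent function; only axis=0 is sketched."""
    if axis != 0:
        raise NotImplementedError("only axis=0 is handled in this sketch")
    out = []
    for a in arrays:
        out.extend(a.data)
    return NestedList(out)


joined = np.concatenate([NestedList([[1.1, 2.2], []]), NestedList([[3.3]])])
print(joined.data)  # [[1.1, 2.2], [], [3.3]]
```

Functions without a registered replacement return NotImplemented, which NumPy reports to the caller as a TypeError instead of silently converting the data.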

See also __array_ufunc__.

ak.Array.numba_type

ak.Array.numba_type

The type of this Array when it is used in Numba. It contains enough information to generate low-level code for accessing any element, down to the leaves.

See Numba documentation on types and signatures.

ak.Array.__getstate__

ak.Array.__getstate__(self)

ak.Array.__setstate__

ak.Array.__setstate__(self, state)

ak.Array.__copy__

ak.Array.__copy__(self)

ak.Array.__deepcopy__

ak.Array.__deepcopy__(self, memo)

ak.Array.__bool__

ak.Array.__bool__(self)

ak.Array.__contains__

ak.Array.__contains__(self, element)