ak.run_lengths

Defined in awkward.operations.structure on line 265.

ak.run_lengths(array, highlevel=True, behavior=None)
Parameters
  • array – Data containing runs of identical values (numbers or strings) to count.

  • highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.layout.Content subclass.

  • behavior (None or dict) – Custom ak.behavior for the output array, if high-level.

Computes the lengths of sequences of identical values at the deepest level of nesting, returning an array with the same structure but with int64 type.

For example,

>>> array = ak.Array([1.1, 1.1, 1.1, 2.2, 3.3, 3.3, 4.4, 4.4, 5.5])
>>> ak.run_lengths(array)
<Array [3, 1, 2, 2, 1] type='5 * int64'>

There are 3 instances of 1.1, followed by 1 instance of 2.2, 2 instances of 3.3, 2 instances of 4.4, and 1 instance of 5.5.

The order and uniqueness of the input data don’t matter,

>>> array = ak.Array([1.1, 1.1, 1.1, 5.5, 4.4, 4.4, 1.1, 1.1, 5.5])
>>> ak.run_lengths(array)
<Array [3, 1, 2, 2, 1] type='5 * int64'>

only whether each value differs from its immediate neighbors.
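For intuition, the flat case can be mimicked in pure Python with itertools.groupby. This is only a sketch of the idea, not how ak.run_lengths is implemented, and run_lengths_flat is a hypothetical helper name that is not part of Awkward Array.

```python
from itertools import groupby

def run_lengths_flat(values):
    """Hypothetical helper: length of each run of consecutive equal values."""
    # groupby starts a new group every time the value changes, so the
    # size of each group is the length of one run.
    return [sum(1 for _ in group) for _, group in groupby(values)]

first = run_lengths_flat([1.1, 1.1, 1.1, 2.2, 3.3, 3.3, 4.4, 4.4, 5.5])
second = run_lengths_flat([1.1, 1.1, 1.1, 5.5, 4.4, 4.4, 1.1, 1.1, 5.5])

print(first)   # [3, 1, 2, 2, 1]
print(second)  # [3, 1, 2, 2, 1]
```

Both inputs give the same result because only the positions where the value changes are relevant; 1.1 appearing again later simply starts a new run.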

The data can be nested, but runs don’t cross list boundaries.

>>> array = ak.Array([[1.1, 1.1, 1.1, 2.2, 3.3], [3.3, 4.4], [4.4, 5.5]])
>>> ak.run_lengths(array)
<Array [[3, 1, 1], [1, 1], [1, 1]] type='3 * var * int64'>
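The list-boundary rule can be illustrated with a small pure-Python recursion over nested lists. This is a sketch under the assumption that the data has uniform nesting depth; run_lengths_nested is a hypothetical name, and the real function operates on Awkward layouts rather than Python lists.

```python
from itertools import groupby

def run_lengths_nested(data):
    """Hypothetical helper: count runs at the deepest level of nested lists."""
    if any(isinstance(item, list) for item in data):
        # Not yet at the deepest level: handle each sublist separately,
        # so a run can never continue from one sublist into the next.
        return [run_lengths_nested(item) for item in data]
    # Deepest level: count consecutive equal values.
    return [sum(1 for _ in group) for _, group in groupby(data)]

nested = [[1.1, 1.1, 1.1, 2.2, 3.3], [3.3, 4.4], [4.4, 5.5]]
result = run_lengths_nested(nested)
print(result)  # [[3, 1, 1], [1, 1], [1, 1]]
```

The 3.3 that ends the first sublist and the 3.3 that starts the second are counted as two runs of length 1, because each sublist is processed on its own.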

This function recognizes strings as distinguishable values.

>>> array = ak.Array([["one", "one"], ["one", "two", "two"], ["three", "two", "two"]])
>>> ak.run_lengths(array)
<Array [[2], [1, 2], [1, 2]] type='3 * var * int64'>

Note that this can be combined with ak.argsort and ak.unflatten to compute a “group by” operation:

>>> array = ak.Array([{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1},
...                   {"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}])
>>> sorted = array[ak.argsort(array.x)]
>>> sorted.x
<Array [1, 1, 1, 2, 2, 3] type='6 * int64'>
>>> ak.run_lengths(sorted.x)
<Array [3, 2, 1] type='3 * int64'>
>>> ak.unflatten(sorted, ak.run_lengths(sorted.x)).tolist()
[[{'x': 1, 'y': 1.1}, {'x': 1, 'y': 1.1}, {'x': 1, 'y': 1.1}],
 [{'x': 2, 'y': 2.2}, {'x': 2, 'y': 2.2}],
 [{'x': 3, 'y': 3.3}]]
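The same sort, count, and split steps can be sketched with the Python standard library alone, which may help clarify what each Awkward call contributes. The variable names here (records, ordered, counts, groups) are illustrative assumptions, and the code works on plain lists of dicts rather than ak.Array.

```python
from itertools import groupby, islice

records = [
    {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1},
    {"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2},
]

# Step 1 (role of ak.argsort plus indexing): sort so equal keys are adjacent.
ordered = sorted(records, key=lambda record: record["x"])

# Step 2 (role of ak.run_lengths): count each run of equal keys.
keys = (record["x"] for record in ordered)
counts = [sum(1 for _ in group) for _, group in groupby(keys)]

# Step 3 (role of ak.unflatten): cut the sorted records into pieces
# whose sizes are given by the counts.
remaining = iter(ordered)
groups = [list(islice(remaining, n)) for n in counts]

print(counts)  # [3, 2, 1]
print([[record["x"] for record in group] for group in groups])
# [[1, 1, 1], [2, 2], [3]]
```

Sorting first is essential: run lengths only merge neighboring equal values, so without it the three records with x equal to 1 would end up in separate groups.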

Unlike a database “group by,” this operation can be applied in bulk to many sublists. In that case, the run lengths must be fully flattened (ak.flatten with axis=None) before they are used as counts for ak.unflatten, and ak.unflatten must be called with axis=-1 so that it splits at the deepest level.

>>> array = ak.Array([[{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1}],
...                   [{"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}]])
>>> sorted = array[ak.argsort(array.x)]
>>> sorted.x
<Array [[1, 1, 2], [1, 2, 3]] type='2 * var * int64'>
>>> ak.run_lengths(sorted.x)
<Array [[2, 1], [1, 1, 1]] type='2 * var * int64'>
>>> counts = ak.flatten(ak.run_lengths(sorted.x), axis=None)
>>> ak.unflatten(sorted, counts, axis=-1).tolist()
[[[{'x': 1, 'y': 1.1}, {'x': 1, 'y': 1.1}],
  [{'x': 2, 'y': 2.2}]],
 [[{'x': 1, 'y': 1.1}],
  [{'x': 2, 'y': 2.2}],
  [{'x': 3, 'y': 3.3}]]]

See also ak.num, ak.argsort, ak.unflatten.