ak.run_lengths
Defined in awkward.operations.structure on line 255.
- ak.run_lengths(array, highlevel=True, behavior=None)
- Parameters
array – Data containing runs of numbers to count.
highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.layout.Content subclass.
behavior (None or dict) – Custom ak.behavior for the output array, if high-level.
Computes the lengths of runs of consecutive identical values at the deepest level
of nesting. The result has the same nesting structure as the input, but each
innermost list holds one int64 count per run.
For example,
>>> array = ak.Array([1.1, 1.1, 1.1, 2.2, 3.3, 3.3, 4.4, 4.4, 5.5])
>>> ak.run_lengths(array)
<Array [3, 1, 2, 2, 1] type='5 * int64'>
There are 3 instances of 1.1, followed by 1 instance of 2.2, 2 instances of 3.3, 2 instances of 4.4, and 1 instance of 5.5.
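The counting rule can be illustrated in plain Python for a flat list of values. This is only a sketch of what is being computed, not how Awkward Array implements run_lengths. It assumes a plain Python list as input and uses the standard library's itertools.groupby, which groups consecutive equal items. The helper name run_lengths_flat is hypothetical.

```python
from itertools import groupby

def run_lengths_flat(values):
    """Return the length of each run of consecutive equal values (illustrative sketch).

    run_lengths_flat is a hypothetical helper, not part of Awkward Array.
    """
    # groupby yields one (key, group) pair per run of consecutive equal items
    return [sum(1 for _ in group) for _, group in groupby(values)]

print(run_lengths_flat([1.1, 1.1, 1.1, 2.2, 3.3, 3.3, 4.4, 4.4, 5.5]))
# [3, 1, 2, 2, 1]
```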
The order and uniqueness of the input data don't matter; only whether each value differs from its neighbors does. Here, 1.1 and 5.5 each appear in two separate runs:
>>> array = ak.Array([1.1, 1.1, 1.1, 5.5, 4.4, 4.4, 1.1, 1.1, 5.5])
>>> ak.run_lengths(array)
<Array [3, 1, 2, 2, 1] type='5 * int64'>
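Because only neighboring values are compared, the run lengths can also be derived from the positions where a value differs from its predecessor. The following plain-Python sketch assumes a flat list as input. run_lengths_by_boundaries is a hypothetical helper, not part of Awkward Array, and this is not a description of its internals.

```python
def run_lengths_by_boundaries(values):
    """Run lengths from the indices where each new run starts (illustrative sketch).

    run_lengths_by_boundaries is a hypothetical helper, not part of Awkward Array.
    """
    n = len(values)
    if n == 0:
        return []
    # A run starts at index 0 and wherever a value differs from its left neighbor
    starts = [0] + [i for i in range(1, n) if values[i] != values[i - 1]]
    # Each run ends where the next one starts (or at the end of the list)
    ends = starts[1:] + [n]
    return [end - start for start, end in zip(starts, ends)]

# 1.1 and 5.5 each appear in two separate runs, so each is counted twice
print(run_lengths_by_boundaries([1.1, 1.1, 1.1, 5.5, 4.4, 4.4, 1.1, 1.1, 5.5]))
# [3, 1, 2, 2, 1]
```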
The data can be nested, but runs don’t cross list boundaries.
>>> array = ak.Array([[1.1, 1.1, 1.1, 2.2, 3.3], [3.3, 4.4], [4.4, 5.5]])
>>> ak.run_lengths(array)
<Array [[3, 1, 1], [1, 1], [1, 1]] type='3 * var * int64'>
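With one level of nesting, the same rule is applied to each sublist independently. As a result, the values 3.3 and 4.4, which continue across the end of one sublist and the start of the next, are split into separate runs. The following plain-Python sketch assumes a list of lists as input. run_lengths_per_sublist is a hypothetical helper name.

```python
from itertools import groupby

def run_lengths_per_sublist(lists):
    """Count runs within each sublist separately (illustrative sketch).

    run_lengths_per_sublist is a hypothetical helper, not part of Awkward Array.
    """
    # groupby restarts for every sublist, so runs never cross list boundaries
    return [[sum(1 for _ in group) for _, group in groupby(sub)] for sub in lists]

print(run_lengths_per_sublist([[1.1, 1.1, 1.1, 2.2, 3.3], [3.3, 4.4], [4.4, 5.5]]))
# [[3, 1, 1], [1, 1], [1, 1]]
```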
Strings are treated as single, distinguishable values (not as lists of characters), so runs of identical strings are counted.
>>> array = ak.Array([["one", "one"], ["one", "two", "two"], ["three", "two", "two"]])
>>> ak.run_lengths(array)
<Array [[2], [1, 2], [1, 2]] type='3 * var * int64'>
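A recursive sketch combines both rules. It descends through lists to the deepest level but treats each Python str as a single value rather than as a sequence of characters. The sketch assumes plain nested Python lists of uniform depth whose innermost lists hold scalars or strings. run_lengths_nested is a hypothetical name, and this is not Awkward's implementation.

```python
from itertools import groupby

def run_lengths_nested(data):
    """Run lengths at the deepest level of a nested list (illustrative sketch).

    Assumes uniform nesting depth. Strings are atomic values, not character lists.
    run_lengths_nested is a hypothetical helper, not part of Awkward Array.
    """
    # If this list contains further lists, recurse into each one
    if data and isinstance(data[0], list):
        return [run_lengths_nested(sub) for sub in data]
    # Otherwise this is an innermost list: count runs of equal values (str included)
    return [sum(1 for _ in group) for _, group in groupby(data)]

print(run_lengths_nested([["one", "one"], ["one", "two", "two"], ["three", "two", "two"]]))
# [[2], [1, 2], [1, 2]]
```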
Note that this can be combined with ak.argsort and ak.unflatten to compute a “group by” operation:
>>> array = ak.Array([{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1},
... {"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}])
>>> sorted = array[ak.argsort(array.x)]
>>> sorted.x
<Array [1, 1, 1, 2, 2, 3] type='6 * int64'>
>>> ak.run_lengths(sorted.x)
<Array [3, 2, 1] type='3 * int64'>
>>> ak.unflatten(sorted, ak.run_lengths(sorted.x)).tolist()
[[{'x': 1, 'y': 1.1}, {'x': 1, 'y': 1.1}, {'x': 1, 'y': 1.1}],
[{'x': 2, 'y': 2.2}, {'x': 2, 'y': 2.2}],
[{'x': 3, 'y': 3.3}]]
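The same sort, count, and split pattern can be sketched in plain Python on a list of dicts. In this analogy, Python's built-in sorted with a key plays the role of ak.argsort plus indexing, the run lengths of the sorted keys play the role of ak.run_lengths, and slicing by consecutive counts plays the role of ak.unflatten. The sketch assumes records are dicts with an "x" field. group_by_x is a hypothetical helper, not part of Awkward Array.

```python
from itertools import groupby

def group_by_x(records):
    """Group records by their "x" field via sort + run lengths + split (illustrative sketch).

    group_by_x is a hypothetical helper, not part of Awkward Array.
    """
    # Step 1: sort so that equal keys become neighbors (like ak.argsort + indexing)
    ordered = sorted(records, key=lambda r: r["x"])
    # Step 2: count runs of equal keys (like ak.run_lengths)
    counts = [sum(1 for _ in g) for _, g in groupby(r["x"] for r in ordered)]
    # Step 3: cut the sorted list into consecutive groups of those sizes (like ak.unflatten)
    groups, start = [], 0
    for n in counts:
        groups.append(ordered[start:start + n])
        start += n
    return groups

records = [{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1},
           {"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}]
for group in group_by_x(records):
    print(group)
```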
Unlike a database “group by,” this operation can be applied in bulk to many sublists (though the run lengths need to be fully flattened to be used as counts for ak.unflatten, and you need to specify axis=-1 as the depth).
>>> array = ak.Array([[{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1}],
... [{"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}]])
>>> sorted = array[ak.argsort(array.x)]
>>> sorted.x
<Array [[1, 1, 2], [1, 2, 3]] type='2 * var * int64'>
>>> ak.run_lengths(sorted.x)
<Array [[2, 1], [1, 1, 1]] type='2 * var * int64'>
>>> counts = ak.flatten(ak.run_lengths(sorted.x), axis=None)
>>> ak.unflatten(sorted, counts, axis=-1).tolist()
[[[{'x': 1, 'y': 1.1}, {'x': 1, 'y': 1.1}],
[{'x': 2, 'y': 2.2}]],
[[{'x': 1, 'y': 1.1}],
[{'x': 2, 'y': 2.2}],
[{'x': 3, 'y': 3.3}]]]
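The following plain-Python sketch shows why flattened counts work with axis=-1. The counts from all sublists are concatenated into one list, and each sublist consumes as many counts as it needs, in order. Because each sublist's run lengths sum to that sublist's length, no group spans two sublists. The sketch mimics the result shown above, assuming a list of lists of dicts with an "x" field. bulk_group_by_x is a hypothetical helper and is not how ak.unflatten is implemented.

```python
from itertools import groupby

def bulk_group_by_x(lists):
    """Group each sublist by "x" using one flat list of counts (illustrative sketch).

    bulk_group_by_x is a hypothetical helper, not part of Awkward Array.
    """
    # Sort each sublist independently (like ak.argsort on the innermost axis)
    ordered = [sorted(sub, key=lambda r: r["x"]) for sub in lists]
    # Run lengths per sublist, flattened into a single list (like axis=None flattening)
    flat_counts = [sum(1 for _ in g)
                   for sub in ordered
                   for _, g in groupby(r["x"] for r in sub)]
    # Split the innermost lists with the flat counts, consuming them in order;
    # each sublist's counts sum to its length, so groups stay inside their sublist
    result, k = [], 0
    for sub in ordered:
        groups, start = [], 0
        while start < len(sub):
            n = flat_counts[k]
            groups.append(sub[start:start + n])
            start += n
            k += 1
        result.append(groups)
    return result

array = [[{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 1, "y": 1.1}],
         [{"x": 3, "y": 3.3}, {"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}]]
print(bulk_group_by_x(array))
```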
See also ak.num, ak.argsort, ak.unflatten.