ak.layout.ListArray¶
The ListArray concept is implemented in 3 specialized classes:
ak.layout.ListArray64
:starts
andstops
values are 64-bit signed integers.
ak.layout.ListArray32
:starts
andstops
values are 32-bit signed integers.
ak.layout.ListArrayU32
:starts
andstops
values are 32-bit unsigned integers.
The ListArray class generalizes ak.layout.ListOffsetArray by not
requiring its content
to be in increasing order and by allowing it to
have unreachable elements between lists. Instead of a single offsets
array,
ListArray has
starts
: The starting index of each list.
stops
: The stopping index of each list.
ak.layout.ListOffsetArray offsets
may be related to starts
and
stops
:
starts = offsets[:-1]
stops = offsets[1:]
ListArrays are a common by-product of structure manipulation: as a result of
some operation, we might want to view slices or permutations of the content
without copying it to make a contiguous version of it. For that reason,
ListArrays are more useful in a data-manipulation library like Awkward Array
than in a data-representation library like Apache Arrow.
There is not equivalent of ListArray in Apache Arrow.
Below is a simplified implementation of a ListArray class in pure Python
that exhaustively checks validity in its constructor (see
ak.is_valid) and can generate random valid arrays. The
random_number()
function returns a random float and the
random_length(minlen)
function returns a random int that is at least
minlen
. The RawArray
class represents simple, one-dimensional data.
class ListArray(Content):
def __init__(self, starts, stops, content):
assert isinstance(starts, list)
assert isinstance(stops, list)
assert isinstance(content, Content)
assert len(stops) >= len(starts) # usually equal
for i in range(len(starts)):
start = starts[i]
stop = stops[i]
assert isinstance(start, int)
assert isinstance(stop, int)
if start != stop:
assert start < stop # i.e. start <= stop
assert start >= 0
assert stop <= len(content)
self.starts = starts
self.stops = stops
self.content = content
@staticmethod
def random(minlen, choices):
content = random.choice(choices).random(0, choices)
length = random_length(minlen)
if len(content) == 0:
starts = [random.randint(0, 10) for i in range(length)]
stops = list(starts)
else:
starts = [random.randint(0, len(content) - 1) for i in range(length)]
stops = [x + min(random_length(), len(content) - x) for x in starts]
return ListArray(starts, stops, content)
def __len__(self):
return len(self.starts)
def __getitem__(self, where):
if isinstance(where, int):
assert 0 <= where < len(self)
return self.content[self.starts[where]:self.stops[where]]
elif isinstance(where, slice) and where.step is None:
starts = self.starts[where.start:where.stop]
stops = self.stops[where.start:where.stop]
return ListArray(starts, stops, self.content)
elif isinstance(where, str):
return ListArray(self.starts, self.stops, self.content[where])
else:
raise AssertionError(where)
def __repr__(self):
return ("ListArray(" + repr(self.starts) + ", " + repr(self.stops) + ", "
+ repr(self.content) + ")")
def xml(self, indent="", pre="", post=""):
out = indent + pre + "<ListArray>\n"
out += indent + " <starts>" + " ".join(str(x) for x in self.starts)
out += "</starts>\n"
out += indent + " <stops>" + " ".join(str(x) for x in self.stops) + "</stops>\n"
out += self.content.xml(indent + " ", "<content>", "</content>\n")
out += indent + "</ListArray>" + post
return out
Here is an example:
ListArray([5, 1, 4, 1, 1, 1, 0, 0, 4, 3, 5],
[6, 2, 5, 6, 6, 1, 6, 6, 6, 3, 6],
RawArray([13.3, 3.8, 5.9, 5.9, 9.2, 9.3]))
<ListArray>
<starts>5 1 4 1 1 1 0 0 4 3 5</starts>
<stops>6 2 5 6 6 1 6 6 6 3 6</stops>
<content><RawArray>
<ptr>13.3 3.8 5.9 5.9 9.2 9.3</ptr>
</RawArray></content>
</ListArray>
which represents the following logical data.
[[9.3],
[3.8],
[9.2],
[3.8, 5.9, 5.9, 9.2, 9.3],
[3.8, 5.9, 5.9, 9.2, 9.3],
[],
[13.3, 3.8, 5.9, 5.9, 9.2, 9.3],
[13.3, 3.8, 5.9, 5.9, 9.2, 9.3],
[9.2, 9.3],
[],
[9.3]]
In addition to the properties and methods described in ak.layout.Content, a ListArray has the following.
ak.layout.ListArray.__init__¶
-
ak.layout.ListArray.
__init__
(starts, stops, content, identities=None, parameters=None)¶
ak.layout.ListArray.compact_offsets64¶
-
ak.layout.ListArray.
compact_offsets64
(start_at_zero=True)¶
Returns a 64-bit ak.layout.Index of offsets
that represent the same lengths
of this array’s starts
and stops
(though not the physical order in memory).
ak.layout.ListArray.broadcast_tooffsets64¶
-
ak.layout.ListArray.
broadcast_tooffsets64
(offsets)¶
Reorders contents
to match a given set of offsets
(if possible) and
returns a ak.layout.ListOffsetArray with the results. This is used in
broadcasting because a set of ak.types.ListType and ak.types.RegularType
arrays have to be reordered to a common offsets
before they can be directly
operated upon.
ak.layout.ListArray.toRegularArray¶
-
ak.layout.ListArray.
toRegularArray
()¶
Converts this ak.types.ListType into a ak.types.RegularType array if possible.