ak.layout.UnionArray
The UnionArray concept is implemented in 3 specialized classes:
- ak.layout.UnionArray8_64: tags values are 8-bit signed integers and index values are 64-bit signed integers.
- ak.layout.UnionArray8_32: tags values are 8-bit signed integers and index values are 32-bit signed integers.
- ak.layout.UnionArray8_U32: tags values are 8-bit signed integers and index values are 32-bit unsigned integers.
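As a rough illustration of what these widths allow, the arithmetic below computes the largest value each integer type can hold. Because the constructor in the simplified implementation on this page requires every tag and index to be non-negative, signed 8-bit tags can name at most 128 contents (0 through 127). The names max_value and SPECIALIZATIONS are hypothetical and exist only for this sketch.

```python
def max_value(bits, signed):
    """Largest value representable in an integer of the given width (hypothetical helper)."""
    return 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1


# (tags width, index width, index is signed) for each specialization
SPECIALIZATIONS = {
    "UnionArray8_64": (8, 64, True),
    "UnionArray8_32": (8, 32, True),
    "UnionArray8_U32": (8, 32, False),
}

for name, (tag_bits, index_bits, index_signed) in SPECIALIZATIONS.items():
    max_contents = max_value(tag_bits, True) + 1  # non-negative tags 0..127
    max_index = max_value(index_bits, index_signed)
    print(f"{name}: up to {max_contents} contents, index values up to {max_index}")
```

The only practical difference between the three classes in this view is how far into a content the index can reach.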
The UnionArray class represents data drawn from an ordered list of contents, which can have different types, using two arrays:
- tags: array of integers indicating which content each array element draws from.
- index: array of integers indicating which element from the content to draw from.
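A minimal sketch of how tags and index combine: element i of the union is contents[tags[i]][index[i]]. Plain Python lists stand in for the real buffers, and union_getitem is a hypothetical helper, not part of the ak.layout API.

```python
# Plain lists standing in for the contents of a union
contents = [
    [[1.1, 2.2], [], [3.3]],  # content 0: lists of floats
    [10, 20, 30, 40],         # content 1: integers
]
tags = [1, 0, 1, 0]   # which content each element comes from
index = [3, 2, 0, 0]  # which element within that content


def union_getitem(tags, index, contents, i):
    """Element i of the union: look up its content by tag, then its position by index."""
    return contents[tags[i]][index[i]]


logical = [union_getitem(tags, index, contents, i) for i in range(len(tags))]
print(logical)  # [40, [3.3], 10, [1.1, 2.2]]
```

Note that content 0's empty list and the values 20 and 30 are never selected; a union does not have to use every element of its contents.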
UnionArrays correspond to Apache Arrow’s dense union type. Awkward Array has no direct equivalent for Apache Arrow’s sparse union type, but an appropriate index may be generated with sparse_index.
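In Arrow's sparse layout, every child has the same length as the union and element i is read from position i of its selected child, so the index that lets such data be read as a UnionArray is simply 0, 1, ..., n-1. The sketch below assumes that is what sparse_index produces; sparse_to_dense_index is a hypothetical stand-in, not the library function.

```python
def sparse_to_dense_index(length):
    """Hypothetical stand-in for sparse_index: element i reads position i of its child."""
    return list(range(length))


# Sparse layout: every child is as long as the union itself
children = [
    [1, 2, 3, 4],          # child 0
    ["a", "b", "c", "d"],  # child 1
]
tags = [0, 1, 1, 0]
index = sparse_to_dense_index(len(tags))

values = [children[t][j] for t, j in zip(tags, index)]
print(values)  # [1, 'b', 'c', 4]
```

The slots that no tag selects (for example children[1][0]) are never read; that wasted space is the trade-off of the sparse layout.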
Below is a simplified implementation of a UnionArray class in pure Python that exhaustively checks validity in its constructor (see ak.is_valid) and can generate random valid arrays. The random_number() function returns a random float and the random_length(minlen) function returns a random int that is at least minlen. The RawArray class represents simple, one-dimensional data.
```python
import numbers
import random

# Content (the base class), RawArray, and random_length are assumed to be
# defined elsewhere in this simplified implementation.


class UnionArray(Content):
    def __init__(self, tags, index, contents):
        assert isinstance(tags, list)
        assert isinstance(index, list)
        assert isinstance(contents, list)
        assert len(index) >= len(tags)  # usually equal
        # every tag must select one of the contents
        for x in tags:
            assert isinstance(x, int)
            assert 0 <= x < len(contents)
        # every index must point inside the content its tag selects
        for i, x in enumerate(tags):
            assert isinstance(index[i], numbers.Integral)
            assert 0 <= index[i] < len(contents[x])
        self.tags = tags
        self.index = index
        self.contents = contents

    @staticmethod
    def random(minlen, choices):
        contents = []
        unshuffled_tags = []
        unshuffled_index = []
        # build 1 to 3 random contents; each non-empty one contributes
        # random_length(minlen) randomly chosen elements
        for i in range(random.randint(1, 3)):
            if minlen == 0:
                contents.append(random.choice(choices).random(0, choices))
            else:
                contents.append(random.choice(choices).random(1, choices))
            if len(contents[-1]) != 0:
                thisindex = [random.randint(0, len(contents[-1]) - 1)
                             for _ in range(random_length(minlen))]
                unshuffled_tags.extend([i] * len(thisindex))
                unshuffled_index.extend(thisindex)
        # shuffle tags and index together so the contents are interleaved
        permutation = list(range(len(unshuffled_tags)))
        random.shuffle(permutation)
        tags = [unshuffled_tags[i] for i in permutation]
        index = [unshuffled_index[i] for i in permutation]
        return UnionArray(tags, index, contents)

    def __len__(self):
        return len(self.tags)

    def __getitem__(self, where):
        if isinstance(where, int):
            assert 0 <= where < len(self)
            return self.contents[self.tags[where]][self.index[where]]
        elif isinstance(where, slice) and where.step is None:
            # slicing tags and index leaves the contents untouched
            return UnionArray(self.tags[where], self.index[where], self.contents)
        elif isinstance(where, str):
            # field selection is passed down to every content
            return UnionArray(self.tags, self.index, [x[where] for x in self.contents])
        else:
            raise AssertionError(where)

    def __repr__(self):
        return ("UnionArray(" + repr(self.tags) + ", " + repr(self.index)
                + ", [" + ", ".join(repr(x) for x in self.contents) + "])")

    def xml(self, indent="", pre="", post=""):
        out = indent + pre + "<UnionArray>\n"
        out += indent + "    <tags>" + " ".join(str(x) for x in self.tags) + "</tags>\n"
        out += indent + "    <index>" + " ".join(str(x) for x in self.index) + "</index>\n"
        for i, content in enumerate(self.contents):
            out += content.xml(indent + "    ", "<content i=\"" + str(i) + "\">",
                               "</content>\n")
        out += indent + "</UnionArray>" + post
        return out
```
Here is an example:
```python
UnionArray([0, 1, 2, 0, 2, 2, 1],
           [0, 16, 9, 0, 10, 0, 13],
           [ListOffsetArray([10, 21, 22, 50, 54, 55, 59, 89, 92, 101, 111, 119, 120, 131,
                             138, 158, 165, 171, 173],
                            RawArray([0.5, 4.8, 8.6, -1.3, 4.0, 2.5, 5.0, 3.3, 5.0, 1.5, 9.3, 2.5, 5.4, 2.1,
                                      7.1, 5.3, 10.8, -2.1, 6.4, 7.6, 5.6, 6.2, 4.9, 8.0, 6.2, 4.1, 6.6,
                                      -1.3, 4.0, 3.8, 0.3, 5.7, 9.9, 5.6, 9.9, 9.4, 1.4, 3.9, 6.2, 6.3, 3.4,
                                      6.2, 10.1, 3.7, 8.3, -0.6, 2.8, 9.7, 3.3, 6.5, 6.5, 2.1, 4.9, 5.8, 1.0,
                                      6.8, 2.7, 3.2, 6.0, 6.4, 1.9, 8.1, 5.5, 6.3, 4.8, 5.5, 1.1, 0.1, 4.0,
                                      1.8, 10.0, 3.8, 3.9, 2.5, 1.8, 6.0, 5.2, 6.0, 9.6, 11.7, 6.4, 7.9, 4.3,
                                      5.3, 4.4, 7.0, 8.6, 6.1, 11.2, 4.7, 5.9, 9.3, 7.0, 5.1, 8.0, 6.9, 8.4,
                                      3.7, 5.8, 4.8, 1.6, -1.5, -0.9, 6.0, 2.8, -0.2, 8.1, 2.9, 7.6, 5.7,
                                      8.3, 8.1, 5.5, 7.1, 6.5, 0.8, 4.3, 1.9, 0.2, 7.7, 5.6, -0.5, 2.1, 6.1,
                                      7.1, 4.5, 4.5, 4.2, 9.1, 5.7, 2.2, 9.0, 2.6, 3.8, 7.2, 3.2, 5.1, 6.6,
                                      3.0, 6.6, 6.3, 4.8, 2.6, 3.7, 7.0, 5.2, 1.8, 4.2, 5.9, 2.2, 7.1, 6.1,
                                      1.8, 4.2, 3.6, 3.0, 5.7, 2.1, 7.7, 1.5, 3.8, 6.4, 5.1, 7.4, 2.8, 3.3,
                                      10.1, 8.0, 2.3, 4.5, 5.9, 6.0, 4.2, 2.6, 1.1, 2.5, 12.2])),
            RawArray([3.8, 5.3, 2.2, 4.9, 6.9, 5.6, -0.6, 3.2, 2.5, 2.6, 3.6, 6.9, 7.7, 4.7,
                      4.0, 5.1, 0.5, 4.0]),
            RawArray([6.2, 7.6, 7.6, -1.2, 5.0, 6.3, 6.8, 6.0, 3.2, 5.6, 2.3, 9.4, 1.6, 5.2,
                      6.1, 1.2])])
```
```xml
<UnionArray>
    <tags>0 1 2 0 2 2 1</tags>
    <index>0 16 9 0 10 0 13</index>
    <content i="0"><ListOffsetArray>
        <offsets>10 21 22 50 54 55 59 89 92 101 111 119 120 131 138 158 165 171 173</offsets>
        <content><RawArray>
            <ptr>0.5 4.8 8.6 -1.3 4.0 2.5 5.0 3.3 5.0 1.5 9.3 2.5 5.4 2.1 7.1 5.3 10.8 -2.1
                6.4 7.6 5.6 6.2 4.9 8.0 6.2 4.1 6.6 -1.3 4.0 3.8 0.3 5.7 9.9 5.6 9.9 9.4 1.4
                3.9 6.2 6.3 3.4 6.2 10.1 3.7 8.3 -0.6 2.8 9.7 3.3 6.5 6.5 2.1 4.9 5.8 1.0
                6.8 2.7 3.2 6.0 6.4 1.9 8.1 5.5 6.3 4.8 5.5 1.1 0.1 4.0 1.8 10.0 3.8 3.9 2.5
                1.8 6.0 5.2 6.0 9.6 11.7 6.4 7.9 4.3 5.3 4.4 7.0 8.6 6.1 11.2 4.7 5.9 9.3
                7.0 5.1 8.0 6.9 8.4 3.7 5.8 4.8 1.6 -1.5 -0.9 6.0 2.8 -0.2 8.1 2.9 7.6 5.7
                8.3 8.1 5.5 7.1 6.5 0.8 4.3 1.9 0.2 7.7 5.6 -0.5 2.1 6.1 7.1 4.5 4.5 4.2 9.1
                5.7 2.2 9.0 2.6 3.8 7.2 3.2 5.1 6.6 3.0 6.6 6.3 4.8 2.6 3.7 7.0 5.2 1.8 4.2
                5.9 2.2 7.1 6.1 1.8 4.2 3.6 3.0 5.7 2.1 7.7 1.5 3.8 6.4 5.1 7.4 2.8 3.3 10.1
                8.0 2.3 4.5 5.9 6.0 4.2 2.6 1.1 2.5 12.2</ptr>
        </RawArray></content>
    </ListOffsetArray></content>
    <content i="1"><RawArray>
        <ptr>3.8 5.3 2.2 4.9 6.9 5.6 -0.6 3.2 2.5 2.6 3.6 6.9 7.7 4.7 4.0 5.1 0.5 4.0</ptr>
    </RawArray></content>
    <content i="2"><RawArray>
        <ptr>6.2 7.6 7.6 -1.2 5.0 6.3 6.8 6.0 3.2 5.6 2.3 9.4 1.6 5.2 6.1 1.2</ptr>
    </RawArray></content>
</UnionArray>
```
which represents the following logical data.
```python
[[9.3, 2.5, 5.4, 2.1, 7.1, 5.3, 10.8, -2.1, 6.4, 7.6, 5.6],
 0.5,
 5.6,
 [9.3, 2.5, 5.4, 2.1, 7.1, 5.3, 10.8, -2.1, 6.4, 7.6, 5.6],
 2.3,
 6.2,
 4.7]
```
In addition to the properties and methods described in ak.layout.Content, a UnionArray has the following.
ak.layout.UnionArray.__init__
- ak.layout.UnionArray.__init__(tags, index, contents, identities=None, parameters=None)
ak.layout.UnionArray.content
- ak.layout.UnionArray.content(i)
Returns content number i from the contents list, as stored (without applying tags or index).
ak.layout.UnionArray.project
- ak.layout.UnionArray.project(i)
Returns an array of only one of the possibilities, like selecting union_array[union_array.tags == i]. Note that this is different from the content(i) method because it reads the selected elements through index, presenting the result in its logical order rather than its physical order.
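The difference between content(i) and project(i) can be sketched with plain lists. Here project is a hypothetical pure-Python helper modeled on the description above, not the library's code: it keeps the elements whose tag is i and reads them through index, so items of the stored content may be dropped, repeated, or reordered.

```python
def project(tags, index, contents, i):
    """Hypothetical sketch: elements with tag i, in the union's logical order."""
    return [contents[i][j] for t, j in zip(tags, index) if t == i]


contents = [
    ["a", "b", "c"],  # content 0
    [1.1, 2.2],       # content 1
]
tags = [0, 1, 0, 0, 1]
index = [2, 1, 0, 2, 0]

print(contents[0])                        # like content(0): ['a', 'b', 'c']
print(project(tags, index, contents, 0))  # like project(0): ['c', 'a', 'c']
print(project(tags, index, contents, 1))  # like project(1): [2.2, 1.1]
```

In this example "b" is stored in content 0 but never appears in the projection, while "c" appears twice.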
ak.layout.UnionArray.simplify
- ak.layout.UnionArray.simplify(mergebool=False)
If any of the contents have ak.types.UnionType and/or any of the contents are ak.layout.Content.mergeable with one another, they are combined to return the simplest possible node structure. This method only operates one level deep.
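The nested-union part of simplify can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not Awkward Array's implementation: Union is a hypothetical stand-in for UnionArray, flatten_one_level is a hypothetical helper, and merging of mergeable contents is not attempted. Only unions that appear directly as contents are hoisted, mirroring the one-level-deep behavior described above.

```python
class Union:
    """Hypothetical minimal union: element k is contents[tags[k]][index[k]]."""

    def __init__(self, tags, index, contents):
        self.tags = tags
        self.index = index
        self.contents = contents

    def __len__(self):
        return len(self.tags)

    def __getitem__(self, k):
        return self.contents[self.tags[k]][self.index[k]]


def flatten_one_level(outer):
    """Hoist the contents of directly nested unions into the outer union."""
    new_contents = []
    start = []  # where each outer content's entries begin in new_contents
    for content in outer.contents:
        start.append(len(new_contents))
        if isinstance(content, Union):
            new_contents.extend(content.contents)  # splice in the inner contents
        else:
            new_contents.append(content)

    new_tags, new_index = [], []
    for t, j in zip(outer.tags, outer.index):
        content = outer.contents[t]
        if isinstance(content, Union):
            # route through the inner union's own tags and index
            new_tags.append(start[t] + content.tags[j])
            new_index.append(content.index[j])
        else:
            new_tags.append(start[t])
            new_index.append(j)
    return Union(new_tags, new_index, new_contents)


inner = Union([0, 1, 0], [1, 0, 0], [["x", "y"], [True]])
outer = Union([1, 0, 1, 1], [2, 0, 0, 1], [[1.5, 2.5], inner])
flat = flatten_one_level(outer)
print(flat.tags, flat.index)                # [1, 0, 1, 2] [0, 0, 1, 0]
print([flat[k] for k in range(len(flat))])  # ['x', 1.5, 'y', True]
```

The flattened union has one fewer level of indirection but yields the same logical elements as the nested one.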