ak.layout.ListOffsetArray

The ListOffsetArray concept is implemented in 3 specialized classes:

  • ak.layout.ListOffsetArray64: offsets values are 64-bit signed integers.

  • ak.layout.ListOffsetArray32: offsets values are 32-bit signed integers.

  • ak.layout.ListOffsetArrayU32: offsets values are 32-bit unsigned integers.

The ListOffsetArray class describes unequal-length lists (often called a “jagged” or “ragged” array). Like ak.layout.RegularArray, the underlying data for all lists are in a contiguous content. It is subdivided into lists according to an offsets array, which specifies the starting and stopping index of each list.

The offsets must have at least length 1 (corresponding to an empty array), but it need not start with 0 or include all of the content. Just as ak.layout.RegularArray can have unreachable content if it is not an integer multiple of size, a ListOffsetArray can have unreachable content before the first list and after the last list.
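The relationship between offsets, content, and unreachable content can be sketched in plain Python. This is only an illustration: ordinary lists stand in for the offsets and content buffers, and the values are made up for the example.

```python
# Plain-Python sketch: lists stand in for the offsets and content buffers.
content = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]
offsets = [2, 5, 5, 8]  # does not start at 0 or end at len(content)

# List i spans content[offsets[i]:offsets[i + 1]].
lists = [content[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]
print(lists)  # [[2.2, 3.3, 4.4], [], [5.5, 6.6, 7.7]]

# Content outside offsets[0]:offsets[-1] belongs to no list.
unreachable = content[:offsets[0]] + content[offsets[-1]:]
print(unreachable)  # [0.0, 1.1, 8.8, 9.9]
```

Four offsets describe three lists, and the repeated offset 5 produces an empty list.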

ListOffsetArray corresponds to the Apache Arrow List type.

Below is a simplified implementation of a ListOffsetArray class in pure Python that exhaustively checks validity in its constructor (see ak.is_valid) and can generate random valid arrays. The random_number() function returns a random float and the random_length(minlen) function returns a random int that is at least minlen. The RawArray class represents simple, one-dimensional data.

class ListOffsetArray(Content):
    def __init__(self, offsets, content):
        assert isinstance(offsets, list)
        assert isinstance(content, Content)
        assert len(offsets) != 0
        for i in range(len(offsets) - 1):
            start = offsets[i]
            stop = offsets[i + 1]
            assert isinstance(start, int)
            assert isinstance(stop, int)
            if start != stop:
                assert start < stop   # i.e. start <= stop
                assert start >= 0
                assert stop <= len(content)
        self.offsets = offsets
        self.content = content

    @staticmethod
    def random(minlen, choices):
        counts = [random_length() for i in range(random_length(minlen))]
        offsets = [random_length()]
        for x in counts:
            offsets.append(offsets[-1] + x)
        return ListOffsetArray(offsets, random.choice(choices).random(offsets[-1], choices))

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, where):
        if isinstance(where, int):
            assert 0 <= where < len(self)
            return self.content[self.offsets[where]:self.offsets[where + 1]]
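        elif isinstance(where, slice) and where.step is None:
            # Resolve None and negative bounds against the number of lists.
            start, stop, _ = where.indices(len(self))
            # N lists need N + 1 offsets, hence stop + 1.
            offsets = self.offsets[start : stop + 1]
            if len(offsets) == 0:
                offsets = [0]
            return ListOffsetArray(offsets, self.content)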
        elif isinstance(where, str):
            return ListOffsetArray(self.offsets, self.content[where])
        else:
            raise AssertionError(where)

    def __repr__(self):
        return "ListOffsetArray(" + repr(self.offsets) + ", " + repr(self.content) + ")"

    def xml(self, indent="", pre="", post=""):
        out = indent + pre + "<ListOffsetArray>\n"
        out += indent + "    <offsets>" + " ".join(str(x) for x in self.offsets)
        out += "</offsets>\n"
        out += self.content.xml(indent + "    ", "<content>", "</content>\n")
        out += indent + "</ListOffsetArray>" + post
        return out

Here is an example:

ListOffsetArray([0, 2, 4, 11, 19],
                RawArray([5.9, 3.5, 2.2, 5.8, 7.4, 3.4, 2.7, 7.2, 6.6, 8.6, 8.2, 5.5, 3.8,
                          3.0, 8.4, 5.1, 1.2, -0.9, 3.7, 4.2, 0.8, 9.5, 4.0, 4.2, 4.2]))
<ListOffsetArray>
    <offsets>0 2 4 11 19</offsets>
    <content><RawArray>
        <ptr>5.9 3.5 2.2 5.8 7.4 3.4 2.7 7.2 6.6 8.6 8.2 5.5 3.8 3.0 8.4 5.1 1.2 -0.9 3.7
             4.2 0.8 9.5 4.0 4.2 4.2</ptr>
    </RawArray></content>
</ListOffsetArray>

which represents the following logical data.

[[5.9, 3.5],
 [2.2, 5.8],
 [7.4, 3.4, 2.7, 7.2, 6.6, 8.6, 8.2],
 [5.5, 3.8, 3.0, 8.4, 5.1, 1.2, -0.9, 3.7]]

In addition to the properties and methods described in ak.layout.Content, a ListOffsetArray has the following.

ak.layout.ListOffsetArray.__init__

ak.layout.ListOffsetArray.__init__(offsets, content, identities=None, parameters=None)

ak.layout.ListOffsetArray.offsets

ak.layout.ListOffsetArray.offsets

ak.layout.ListOffsetArray.content

ak.layout.ListOffsetArray.content

ak.layout.ListOffsetArray.starts

ak.layout.ListOffsetArray.starts

Derives starts as a view of offsets:

starts = offsets[:-1]

ak.layout.ListOffsetArray.stops

ak.layout.ListOffsetArray.stops

Derives stops as a view of offsets:

stops = offsets[1:]
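As a rough illustration of how starts and stops relate to offsets, here is a sketch that uses a plain Python list for the offsets. Unlike the properties described here, which are views, slicing a Python list makes copies. The counts variable is introduced only for this example.

```python
offsets = [0, 2, 4, 11, 19]

starts = offsets[:-1]  # first index of each list
stops = offsets[1:]    # one past the last index of each list

# The length of list i is stops[i] - starts[i].
counts = [stop - start for start, stop in zip(starts, stops)]
print(starts)  # [0, 2, 4, 11]
print(stops)   # [2, 4, 11, 19]
print(counts)  # [2, 2, 7, 8]
```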

ak.layout.ListOffsetArray.compact_offsets64

ak.layout.ListOffsetArray.compact_offsets64(start_at_zero=True)

Returns a 64-bit ak.layout.Index of offsets that represent the same list lengths as this array’s offsets. If offsets[0] == 0 or start_at_zero is False, the return value is a view of this array’s offsets.
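The idea of compacting offsets can be sketched in plain Python. The compact_offsets function below is a hypothetical helper written for this illustration, not the library's implementation; it works on Python lists and ignores the integer width of the result.

```python
def compact_offsets(offsets, start_at_zero=True):
    """Hypothetical helper: offsets describing the same list lengths."""
    if not start_at_zero or offsets[0] == 0:
        return offsets  # nothing to shift; returned unchanged
    first = offsets[0]
    # Subtracting the first offset preserves every difference (list length).
    return [x - first for x in offsets]

print(compact_offsets([3, 5, 5, 9]))  # [0, 2, 2, 6]
print(compact_offsets([3, 5, 5, 9], start_at_zero=False))  # [3, 5, 5, 9]
```

Both results describe lists of lengths 2, 0, and 4; only the starting position differs.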

ak.layout.ListOffsetArray.broadcast_tooffsets64

ak.layout.ListOffsetArray.broadcast_tooffsets64(offsets)

Shifts contents to match a given set of offsets (if possible) and returns an ak.layout.ListOffsetArray with the results. This is used in broadcasting because a set of ak.types.ListType and ak.types.RegularType arrays must be rearranged to share common offsets before they can be directly operated upon.
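A minimal sketch of the idea, in plain Python: broadcast_to_offsets is a hypothetical helper, not the library's code, and it assumes for simplicity that the target offsets start at 0. It succeeds only when both sets of offsets describe the same list lengths.

```python
def broadcast_to_offsets(offsets, content, target):
    """Hypothetical helper: rearrange content to line up with target offsets."""
    if len(target) != len(offsets):
        raise ValueError("different number of lists")
    if target[0] != 0:
        raise ValueError("this sketch assumes target offsets start at 0")
    new_content = []
    for i in range(len(offsets) - 1):
        start, stop = offsets[i], offsets[i + 1]
        # Each list must have the length that the target offsets prescribe.
        if stop - start != target[i + 1] - target[i]:
            raise ValueError("list lengths do not match at index %d" % i)
        new_content.extend(content[start:stop])
    return target, new_content

new_offsets, new_content = broadcast_to_offsets(
    [2, 4, 4, 7], [9, 9, 1, 2, 3, 4, 5, 8], [0, 2, 2, 5]
)
print(new_offsets)  # [0, 2, 2, 5]
print(new_content)  # [1, 2, 3, 4, 5]
```

The unreachable values (9, 9, and 8) are dropped because the rearranged content holds only what the lists reference.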

ak.layout.ListOffsetArray.toRegularArray

ak.layout.ListOffsetArray.toRegularArray()

Converts this ak.types.ListType array into an ak.types.RegularType array, if possible.
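The "if possible" condition can be illustrated with a plain-Python sketch. The to_regular function is a hypothetical helper for this example only: it returns a size and the reachable content when every list has the same length, and raises an error otherwise. How the library handles an array with zero lists is not covered here.

```python
def to_regular(offsets, content):
    """Hypothetical helper: return (size, content) if all lists have equal length."""
    counts = [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]
    if len(counts) == 0:
        raise ValueError("this sketch does not define a size for zero lists")
    size = counts[0]
    if any(c != size for c in counts):
        raise ValueError("lists have unequal lengths; cannot be regular")
    # Keep only the content that the lists actually reference.
    return size, content[offsets[0]:offsets[-1]]

result = to_regular([1, 3, 5, 7], [0, 10, 20, 30, 40, 50, 60, 70])
print(result)  # (2, [10, 20, 30, 40, 50, 60])
```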

ak.layout.ListOffsetArray.simplify

ak.layout.ListOffsetArray.simplify()

Pass-through; returns the original array.