ak.layout.BitMaskedArray

Like ak.layout.ByteMaskedArray, BitMaskedArray implements an ak.types.OptionType with two arrays, mask and content. However, the boolean mask values are packed into a bitmap.

BitMaskedArray has an additional parameter, lsb_order; if True, the position of each bit is in Least-Significant Bit order (LSB):

is_valid[j] = bool(mask[j // 8] & (1 << (j % 8))) == valid_when

If False, the position of each bit is in Most-Significant Bit order (MSB):

is_valid[j] = bool(mask[j // 8] & (128 >> (j % 8))) == valid_when

Note that NumPy’s unpackbits function before version 1.17 has MSB order, but now it is configurable (bitorder="little" for LSB, bitorder="big" for MSB).

If the logical size of the array is not a multiple of 8, the mask has to be padded. Thus, an explicit length is also part of the class’s definition.

Below is a simplified implementation of a BitMaskedArray class in pure Python that exhaustively checks validity in its constructor (see ak.is_valid) and can generate random valid arrays. The random_number() function returns a random float and the random_length(minlen) function returns a random int that is at least minlen. The RawArray class represents simple, one-dimensional data.

class BitMaskedArray(Content):
    def __init__(self, mask, content, valid_when, length, lsb_order):
        assert isinstance(mask, np.ndarray)
        assert isinstance(content, Content)
        assert isinstance(valid_when, bool)
        assert isinstance(length, int) and length >= 0
        assert isinstance(lsb_order, bool)
        assert len(mask) <= len(content)
        self.mask = mask
        self.content = content
        self.valid_when = valid_when
        self.length = length
        self.lsb_order = lsb_order

    @staticmethod
    def random(minlen, choices):
        mask = []
        for i in range(random_length(minlen)):
            mask.append(bool(random.randint(0, 1)))
        lsb_order = bool(random.randint(0, 1))
        bitmask = np.packbits(np.array(mask, dtype=np.uint8),
                              bitorder=("little" if lsb_order else "big"))
        content = random.choice(choices).random(len(mask), choices)
        return BitMaskedArray(bitmask, content, bool(random.randint(0, 1)),
                              len(mask), lsb_order)

    def __len__(self):
        return self.length

    def __getitem__(self, where):
        if isinstance(where, int):
            assert 0 <= where < len(self)
            if self.lsb_order:
                bit = bool(self.mask[where // 8] & (1 << (where % 8)))
            else:
                bit = bool(self.mask[where // 8] & (128 >> (where % 8)))
            if bit == self.valid_when:
                return self.content[where]
            else:
                return None
        elif isinstance(where, slice) and where.step is None:
            # In general, slices must convert BitMaskedArray to ByteMaskedArray.
            bytemask = np.unpackbits(self.mask,
                           bitorder=("little" if self.lsb_order else "big")).view(bool)
            return ByteMaskedArray(bytemask[where.start:where.stop],
                                   self.content[where.start:where.stop],
                                   valid_when=self.valid_when)
        elif isinstance(where, str):
            return BitMaskedArray(self.mask,
                                  self.content[where],
                                  valid_when=self.valid_when,
                                  length=self.length,
                                  lsb_order=self.lsb_order)
        else:
            raise AssertionError(where)

    def __repr__(self):
        return ("BitMaskedArray(" + repr(self.mask) + ", " + repr(self.content)
                + ", " + repr(self.valid_when) + ", " + repr(self.length)
                + ", " + repr(self.lsb_order) + ")")

    def xml(self, indent="", pre="", post=""):
        out = indent + pre + "<BitMaskedArray>\n"
        out += indent + "    <valid_when>" + repr(self.valid_when) + "<valid_when>\n"
        out += indent + "    <length>" + repr(self.length) + "<length>\n"
        out += indent + "    <lsb_order>" + repr(self.lsb_order) + "<lsb_order>\n"
        out += indent + "    <mask>" + " ".join(str(x) for x in self.mask) + "</mask>\n"
        out += self.content.xml(indent + "    ", "<content>", "</content>\n")
        out += indent + "</BitMaskedArray>\n"
        return out

Here is an example:

BitMaskedArray(np.array([ 40, 173,  59, 104, 182, 116], dtype=np.uint8),
               RawArray([5.5, 6.6, 1.5, 3.2, 9.8, 0.4, 5.7, 1.5, 0.2, 6.1, 5.4, 4.3, 5.9,
                         10.1, -2.3, 5.8, 3.4, 5.6, 6.2, 8.8, 3.1, 7.0, 1.2, 7.3, 5.8, 8.3,
                         9.7, 5.2, 3.4, 5.8, 1.7, 4.3, 5.8, 1.2, 1.7, 3.6, 4.4, 9.7, 5.0,
                         4.3, 7.8, 6.1, 3.3, 7.9, 7.1, 6.5, -0.6, 8.2, 3.7, 4.6, 3.9, 7.5]),
               False,
               46,
               False)
<BitMaskedArray>
    <valid_when>False<valid_when>
    <length>46<length>
    <lsb_order>False<lsb_order>
    <mask>40 173 59 104 182 116</mask>
    <content><RawArray>
        <ptr>5.5 6.6 1.5 3.2 9.8 0.4 5.7 1.5 0.2 6.1 5.4 4.3 5.9 10.1 -2.3 5.8 3.4 5.6 6.2
             8.8 3.1 7.0 1.2 7.3 5.8 8.3 9.7 5.2 3.4 5.8 1.7 4.3 5.8 1.2 1.7 3.6 4.4 9.7 5.0
             4.3 7.8 6.1 3.3 7.9 7.1 6.5 -0.6 8.2 3.7 4.6 3.9 7.5</ptr>
    </RawArray></content>
</BitMaskedArray>

which represents the following logical data.

[5.5, 6.6, None, 3.2, None, 0.4, 5.7, 1.5, None, 6.1, None, 4.3, None, None, -2.3, None, 3.4,
 5.6, None, None, None, 7.0, None, None, 5.8, None, None, 5.2, None, 5.8, 1.7, 4.3, None,
 1.2, None, None, 4.4, None, None, 4.3, 7.8, None, None, None, 7.1, None]

This is equivalent to all of Apache Arrow’s array types because they all use bitmaps to mask their data, with valid_when=True and lsb_order=True.

In addition to the properties and methods described in ak.layout.Content, a BitMaskedArray has the following.

ak.layout.BitMaskedArray.__init__

ak.layout.BitMaskedArray.__init__(mask, content, valid_when, length, lsb_order, identities=None, parameters=None)

ak.layout.BitMaskedArray.mask

ak.layout.BitMaskedArray.mask

ak.layout.BitMaskedArray.content

ak.layout.BitMaskedArray.content

ak.layout.BitMaskedArray.valid_when

ak.layout.BitMaskedArray.valid_when

ak.layout.BitMaskedArray.lsb_order

ak.layout.BitMaskedArray.lsb_order

ak.layout.BitMaskedArray.project

ak.layout.BitMaskedArray.project(mask=None)

Returns a non-ak.types.OptionType array containing only the valid elements. If mask is a signed 8-bit ak.layout.Index in which 0 means valid and 1 means missing, this mask is unioned with the BitMaskedArray’s mask (after converting to 8-bit and to valid_when=False to match this mask).

ak.layout.BitMaskedArray.bytemask

ak.layout.BitMaskedArray.bytemask()

Returns an array of 8-bit values in which 0 means valid and 1 means missing.

ak.layout.BitMaskedArray.simplify

ak.layout.BitMaskedArray.simplify()

Combines this node with its content if the content also has ak.types.OptionType; otherwise, this is a pass-through. In all cases, the output has the same logical meaning as the input.

This method only operates one level deep.

ak.layout.BitMaskedArray.toByteMaskedArray

ak.layout.BitMaskedArray.toByteMaskedArray()

Converts to the equivalent ak.layout.ByteMaskedArray.

ak.layout.BitMaskedArray.toIndexedOptionArray

ak.layout.BitMaskedArray.toIndexedOptionArray()

Converts to the equivalent ak.layout.IndexedOptionArray.