How to concatenate and interleave arrays

import awkward as ak
import numpy as np
import pandas as pd

Simple concatenation

ak.concatenate() is the analog of np.concatenate (in fact, you can call np.concatenate on Awkward Arrays wherever you mean ak.concatenate()). Unlike the NumPy function, it applies to data with arbitrary structure, such as variable-length lists and records:

array1 = ak.Array([
    [{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}, {"x": 3.3, "y": [1, 2, 3]}],
    [],
    [{"x": 4.4, "y": [1, 2, 3, 4]}, {"x": 5.5, "y": [1, 2, 3, 4, 5]}],
])
array2 = ak.Array([
    [{"x": 6.6, "y": [1, 2, 3, 4, 5, 6]}],
    [{"x": 7.7, "y": [1, 2, 3, 4, 5, 6, 7]}],
])
ak.concatenate([array1, array2])
[[{x: 1.1, y: [1]}, {x: 2.2, y: [...]}, {x: 3.3, y: [1, 2, 3]}],
 [],
 [{x: 4.4, y: [1, 2, 3, 4]}, {x: 5.5, y: [1, ..., 5]}],
 [{x: 6.6, y: [1, 2, 3, 4, 5, 6]}],
 [{x: 7.7, y: [1, 2, 3, 4, 5, 6, 7]}]]
----------------------------------------------------------------
type: 5 * var * {
    x: float64,
    y: var * int64
}

The arrays can even have different data types, in which case the output has a union type.

array3 = ak.Array([{"z": None}, {"z": 0}, {"z": 123}])
ak.concatenate([array1, array2, array3])
[[{x: 1.1, y: [1]}, {x: 2.2, y: [...]}, {x: 3.3, y: [1, 2, 3]}],
 [],
 [{x: 4.4, y: [1, 2, 3, 4]}, {x: 5.5, y: [1, ..., 5]}],
 [{x: 6.6, y: [1, 2, 3, 4, 5, 6]}],
 [{x: 7.7, y: [1, 2, 3, 4, 5, 6, 7]}],
 {z: None},
 {z: 0},
 {z: 123}]
----------------------------------------------------------------
type: 8 * union[
    var * {
        x: float64,
        y: var * int64
    },
    {
        z: ?int64
    }
]

Keep in mind, however, that some operations can’t handle union types (heterogeneous data), so you may want to avoid creating them.

Interleaving lists with axis > 0

With the default axis=0, the output is an array whose length is the sum of the lengths of the input arrays.

Other axis values concatenate the lists within the arrays, which requires the input arrays to have the same length.

array1 = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
array2 = ak.Array([[10, 20], [30], [40, 50, 60, 70]])
len(array1), len(array2)
(3, 3)
ak.concatenate([array1, array2], axis=1)
[[1.1, 2.2, 3.3, 10, 20],
 [30],
 [4.4, 5.5, 40, 50, 60, 70]]
----------------------------
type: 3 * var * float64

This can be used in some non-trivial ways: sometimes a problem that doesn’t seem to have anything to do with concatenation can be solved this way.

For instance, suppose you need to pad each list so that it starts and ends with 0 (for a window-averaging procedure, perhaps). You can build the padding as a new array:

pad = np.zeros(len(array1))[:, np.newaxis]
pad
array([[0.],
       [0.],
       [0.]])

and concatenate it with axis=1 to get the desired effect:

ak.concatenate([pad, array1, pad], axis=1)
[[0, 1.1, 2.2, 3.3, 0],
 [0, 0],
 [0, 4.4, 5.5, 0]]
-----------------------
type: 3 * var * float64

Similarly, to duplicate the first and last values of each list (leaving empty lists unchanged):

ak.concatenate([array1[:, :1], array1, array1[:, -1:]], axis=1)
[[1.1, 1.1, 2.2, 3.3, 3.3],
 [],
 [4.4, 4.4, 5.5, 5.5]]
---------------------------
type: 3 * var * float64

The same applies to more deeply nested lists with axis > 1. Remember that negative axis values count from the innermost dimension outward, so axis=-1 is the deepest level.

Emulating NumPy’s “stack” functions

np.stack, np.hstack, np.vstack, and np.dstack can all be expressed as concatenations, using np.newaxis (reshaping to add a dimension of length 1) wherever a new dimension is needed.

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.stack([a, b])
array([[1, 2, 3],
       [4, 5, 6]])
np.concatenate([a[np.newaxis], b[np.newaxis]], axis=0)
array([[1, 2, 3],
       [4, 5, 6]])
np.stack([a, b], axis=1)
array([[1, 4],
       [2, 5],
       [3, 6]])
np.concatenate([a[:, np.newaxis], b[:, np.newaxis]], axis=1)
array([[1, 4],
       [2, 5],
       [3, 6]])

Since ak.concatenate() has the same interface as np.concatenate and Awkward Arrays can also be sliced with np.newaxis, they can be stacked the same way, even when they contain arbitrary data structures.

a = ak.Array([[1], [1, 2], [1, 2, 3]])
b = ak.Array([[4], [4, 5], [4, 5, 6]])
ak.concatenate([a[np.newaxis], b[np.newaxis]], axis=0)
[[[1], [1, 2], [1, 2, 3]],
 [[4], [4, 5], [4, 5, 6]]]
--------------------------
type: 2 * 3 * var * int64
ak.concatenate([a[:, np.newaxis], b[:, np.newaxis]], axis=1)
[[[1], [4]],
 [[1, 2], [4, 5]],
 [[1, 2, 3], [4, 5, 6]]]
-------------------------
type: 3 * 2 * var * int64

Differences from Pandas

Concatenation in Awkward Array combines arrays lengthwise, either by adding the lengths of the arrays or by adding the lengths of lists within an array. It does not add fields to a record (that is, it does not “add columns to a table”). To add fields to a record, see ak.zip() or ak.Array.__setitem__() in “how to zip/unzip and project” and “how to add fields”. The distinction matters because pandas.concat does both, depending on its axis argument (and NumPy has no equivalent).

Here’s a table-like example of concatenation in Awkward Array:

array1 = ak.Array({"column": [[1, 2, 3], [], [4, 5]]})
array2 = ak.Array({"column": [[1.1, 2.2, 3.3], [], [4.4, 5.5]]})
array1
[{column: [1, 2, 3]},
 {column: []},
 {column: [4, 5]}]
-----------------------
type: 3 * {
    column: var * int64
}
array2
[{column: [1.1, 2.2, 3.3]},
 {column: []},
 {column: [4.4, 5.5]}]
---------------------------
type: 3 * {
    column: var * float64
}
ak.concatenate([array1, array2], axis=0)
[{column: [1, 2, 3]},
 {column: []},
 {column: [4, 5]},
 {column: [1.1, 2.2, 3.3]},
 {column: []},
 {column: [4.4, 5.5]}]
---------------------------
type: 6 * {
    column: var * float64
}

This matches what Pandas does for axis=0:

df1 = pd.DataFrame({"column": [[1, 2, 3], [], [4, 5]]})
df2 = pd.DataFrame({"column": [[1.1, 2.2, 3.3], [], [4.4, 5.5]]})
df1
      column
0  [1, 2, 3]
1         []
2     [4, 5]
df2
            column
0  [1.1, 2.2, 3.3]
1               []
2       [4.4, 5.5]
pd.concat([df1, df2], axis=0)
            column
0        [1, 2, 3]
1               []
2           [4, 5]
0  [1.1, 2.2, 3.3]
1               []
2       [4.4, 5.5]

But for axis=1, they’re quite different:

ak.concatenate([array1, array2], axis=1)
[{column: [1, 2, 3, 1.1, 2.2, 3.3]},
 {column: []},
 {column: [4, 5, 4.4, 5.5]}]
------------------------------------
type: 3 * {
    column: var * float64
}
pd.concat([df1, df2], axis=1)
      column           column
0  [1, 2, 3]  [1.1, 2.2, 3.3]
1         []               []
2     [4, 5]       [4.4, 5.5]

ak.concatenate() accepts any axis less than the number of dimensions in the arrays, but Pandas has only two choices, axis=0 and axis=1.

Fields (“columns”) of an Awkward Array are unrelated to array dimensions. If you want what pandas.concat does with axis=1, you would use ak.zip():

ak.zip({"column1": array1.column, "column2": array2.column}, depth_limit=1)
[{column1: [1, 2, 3], column2: [1.1, 2.2, 3.3]},
 {column1: [], column2: []},
 {column1: [4, 5], column2: [4.4, 5.5]}]
------------------------------------------------
type: 3 * {
    column1: var * int64,
    column2: var * float64
}

The depth_limit=1 argument prevents ak.zip() from interleaving the lists further. Without it, the lists themselves are zipped element by element:

ak.zip({"column1": array1.column, "column2": array2.column})
[[{column1: 1, column2: 1.1}, {...}, {column1: 3, column2: 3.3}],
 [],
 [{column1: 4, column2: 4.4}, {column1: 5, column2: 5.5}]]
-----------------------------------------------------------------
type: 3 * var * {
    column1: int64,
    column2: float64
}

Pandas doesn’t do this because lists in Pandas cells are Python objects that it doesn’t modify.