ak.virtual

Defined in awkward.operations.structure on line 3893.

ak.virtual(generate, args=(), kwargs=None, form=None, length=None, cache='new', cache_key=None, parameters=None, highlevel=True, behavior=None)
Parameters
  • generate (callable) – Function that makes an array from args and kwargs.

  • args (tuple) – Positional arguments to pass to generate.

  • kwargs (dict) – Keyword arguments to pass to generate.

  • form (None, Form, or JSON) – If None, the layout of the generated array is unknown until it is generated, which might require it to be generated earlier than intended; if a Form, use this Form to predict the layout and verify that the generated array complies; if a JSON string, convert the JSON into a Form and use it.

  • length (None or int) – If None or negative, the length of the generated array is unknown until it is generated, which might require it to be generated earlier than intended; if a non-negative int, use this to predict the length and verify that the generated array complies.

  • cache (None, "new", or MutableMapping) – If “new”, a new dict (keep-forever cache) is created. If None, no cache is used.

  • cache_key (None or str) – If None, a unique string is generated for this virtual array for use with the cache (unique per Python process); otherwise, the explicitly provided key is used (which ought to ensure global uniqueness for the scope in which these arrays are used).

  • parameters (None or dict) – Parameters for the new ak.layout.VirtualArray node that is created by this operation.

  • highlevel (bool) – If True, return an ak.Array; otherwise, return a low-level ak.layout.Content subclass.

  • behavior (None or dict) – Custom ak.behavior for the output array, if high-level.

Creates a virtual array, an array that is created on demand.

For example, the following array is only created when we try to look at its values:

>>> def generate():
...     print("generating")
...     return ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
...
>>> array = ak.virtual(generate)
>>> array
generating
<Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * float64'>

However, just about any kind of query about this array would cause it to be “materialized.”

>>> array = ak.virtual(generate)
>>> len(array)
generating
3
>>> array = ak.virtual(generate)
>>> ak.type(array)
generating
3 * var * float64
>>> array = ak.virtual(generate)
>>> array[2]
generating
<Array [4.4, 5.5] type='2 * float64'>

Since this “materialization” is probably some expensive disk-read, we want delay it as much as possible. This can be done by giving the ak.layout.VirtualArray more information, such as the length. Knowing the length makes it possible to slice a virtual array without materializing it.

>>> array = ak.virtual(generate)
>>> sliced = array[1:]   # don't know the length; have to materialize
generating
>>> array = ak.virtual(generate, length=3)
>>> sliced = array[1:]   # the length is known; no materialization
>>> len(sliced)          # even the length of the sliced array is known
2

However, anything that needs more information than just the length will cause the virtual array to be materialized.

>>> ak.type(sliced)
generating
2 * var * float64

To prevent this, we can give it detailed type information, the ak.forms.Form.

>>> array = ak.virtual(generate, length=3, form={
...     "class": "ListOffsetArray64",
...     "offsets": "i64",
...     "content": "float64"})
>>> sliced = array[1:]
>>> ak.type(sliced)
2 * var * float64

Of course, _at some point_ the array has to be materialized if we need any data values.

>>> selected = sliced[1]
generating
>>> selected
<Array [4.4, 5.5] type='2 * float64'>

Note that you can make arrays of records (ak.layout.RecordArray) in which a field is virtual.

>>> form = {
...     "class": "ListOffsetArray64",
...     "offsets": "i64",
...     "content": "float64"
... }
>>> records = ak.Array({
...     "x": ak.virtual(generate, length=3, form=form),
...     "y": [10, 20, 30]})

You can do simple field slicing without materializing the array.

>>> x = records.x
>>> y = records.y

But, of course, any operation that looks at values of that field are going to materialize it.

>>> x
generating
<Array [[1.1, 2.2, 3.3], [], [4.4, 5.5]] type='3 * var * float64'>
>>> y
<Array [10, 20, 30] type='3 * int64'>

The advantage is that you can make a table of data, most of which resides on disk, and only read the values you’re interested in. Like all Awkward Arrays, the table need not be rectangular.

If you’re going to try this trick with ak.zip, note that you need to set depth_limit=1 to avoid materializing the array when it’s constructed, since ak.zip (unlike the dict form of the ak.Array constructor) broadcasts its arguments together (hence the name “zip”).

>>> records = ak.zip({
...     "x": ak.virtual(generate, length=3, form=form),
...     "y": [10, 20, 30]})
generating
>>> records = ak.zip({
...     "x": ak.virtual(generate, length=3, form=form),
...     "y": [10, 20, 30]}, depth_limit=1)

Functions with a lazy option, such as ak.from_parquet and ak.from_buffers, construct ak.layout.RecordArray of ak.layout.VirtualArray in this way.

See also ak.materialized.