Using the Nested-Dask nest Accessor


The nest accessor provides an additional API layer for working with the nested columns of a NestedFrame.

NOTE: The nest accessor in Nested-Dask has a more limited implementation than the one in Nested-Pandas.

[1]:
from nested_dask.datasets import generate_data

# generate_data creates some toy data
ndf = generate_data(10, 5)  # 10 rows, 5 nested rows per row
ndf
[1]:
Nested-Dask NestedFrame Structure:
a b nested
npartitions=1
0 float64 float64 nested<t: [double], flux: [double], band: [string]>
9 ... ... ...
Dask Name: repartition, 3 expressions

The nest accessor is available when selecting a nested column of a NestedFrame. For example:

[2]:
ndf["nested"].nest
[2]:
<nested_dask.accessor.DaskNestSeriesAccessor at 0x7822f785c520>

Nested column labels can be viewed using the fields property:

[3]:
ndf["nested"].nest.fields
[3]:
['t', 'flux', 'band']

Nested data can be viewed in different formats using the nest accessor's methods.

to_flat converts the nested data into a single flat Dask DataFrame:

[4]:
flat_nested = ndf["nested"].nest.to_flat()
flat_nested
[4]:
Dask DataFrame Structure:
t flux band
npartitions=1
0 double[pyarrow] double[pyarrow] string[pyarrow]
9 ... ... ...
Dask Name: lambda, 5 expressions
[5]:
flat_nested.head(20)
[5]:
t flux band
0 13.711731 71.476915 g
0 19.646864 22.472936 r
0 4.086222 42.14007 r
0 9.380488 83.892451 r
0 3.606885 73.735441 r
1 6.662741 68.587997 g
1 0.863353 45.430948 r
1 9.931626 12.307377 r
1 3.341144 68.13682 r
1 2.764886 79.613518 r
2 19.416213 13.42851 r
2 2.152549 61.223836 r
2 16.9995 72.596087 r
2 1.230658 69.028779 g
2 18.759409 75.03683 g
3 18.743661 74.212016 g
3 16.573969 71.930643 g
3 6.595293 87.443438 r
3 19.51212 45.933739 r
3 2.412554 47.07309 g

The index of the resulting flat DataFrame repeats, once per nested row, and maps directly to the index of the original NestedFrame.

Alternatively, to_lists packs the data into one numpy array per field for each row:
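
As a rough illustration of why the repeated index is useful, the sketch below uses plain pandas rather than Nested-Dask. The `flat` and `base` frames are hypothetical stand-ins (not produced by generate_data) for a computed to_flat result and the non-nested columns of the original frame. Because the flat index repeats the original index values, a per-row reduction can be aligned back to the base columns without an explicit key column.

```python
import pandas as pd

# Hypothetical stand-in for a computed to_flat() result: the index value
# repeats once for every nested row that belonged to that original row.
flat = pd.DataFrame(
    {
        "t": [1.0, 2.0, 3.0, 4.0, 5.0],
        "flux": [10.0, 20.0, 30.0, 40.0, 50.0],
        "band": ["g", "r", "r", "g", "g"],
    },
    index=[0, 0, 0, 1, 1],
)

# Hypothetical stand-in for the base (non-nested) columns "a" and "b".
base = pd.DataFrame({"a": [0.5, 0.7], "b": [1.5, 1.7]}, index=[0, 1])

# Grouping on the repeated index reduces the flat rows to one value
# per original index value.
mean_flux = flat.groupby(level=0)["flux"].mean()

# The shared index aligns the per-row result with the base columns.
summary = base.assign(mean_flux=mean_flux)
print(summary)
```

The same idea should carry over to the Dask output once it is computed, although the exact dtypes (for example double[pyarrow]) will differ from this toy data.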

[6]:
list_nested = ndf["nested"].nest.to_lists()
list_nested.compute()
[6]:
t flux band
0 [13.71173073 19.64686441 4.08622174 9.380487... [71.47691499 22.47293575 42.14007041 83.892451... ['g' 'r' 'r' 'r' 'r']
1 [6.66274146 0.86335343 9.93162611 3.34114423 2... [68.58799684 45.43094845 12.30737719 68.136820... ['g' 'r' 'r' 'r' 'r']
2 [19.4162129 2.15254879 16.99949958 1.230658... [13.42851021 61.22383622 72.59608673 69.028779... ['r' 'r' 'r' 'g' 'g']
3 [18.74366091 16.57396943 6.59529319 19.512119... [74.21201647 71.93064291 87.44343816 45.933738... ['g' 'g' 'r' 'r' 'g']
4 [ 4.31522926 11.4071466 13.41783397 17.352829... [35.061291 25.25394453 8.82945313 86.934486... ['r' 'g' 'r' 'r' 'g']
5 [12.95368282 19.02354827 9.21831458 17.645428... [34.57259548 56.05801588 48.53754425 42.677585... ['r' 'g' 'g' 'g' 'r']
6 [ 0.30429609 7.5430631 4.70260335 18.441218... [25.5503642 7.58175325 78.04896612 48.944451... ['g' 'g' 'g' 'r' 'r']
7 [ 9.52754983 17.38793206 9.81360713 8.433635... [64.2084655 27.5351413 8.26659244 65.406996... ['g' 'g' 'r' 'g' 'r']
8 [7.91438537 9.01064832 0.4526323 0.07085433 2... [ 4.30168316 52.11574135 66.21503827 91.898719... ['g' 'g' 'g' 'r' 'g']
9 [17.56579535 18.77163653 8.42277394 9.179682... [27.61609175 69.71224732 34.93181629 98.215357... ['r' 'r' 'r' 'r' 'g']
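
The flat and list layouts shown above hold the same values in different shapes. The following sketch, written in plain pandas with hypothetical toy data, shows how one maps to the other. It is an analogy for the two views, not a description of how Nested-Dask implements to_flat or to_lists, and it packs values into Python lists rather than numpy arrays.

```python
import pandas as pd

# Hypothetical flat table with a repeated index (the to_flat-style layout).
flat = pd.DataFrame(
    {
        "t": [1.0, 2.0, 3.0, 4.0, 5.0],
        "flux": [10.0, 20.0, 30.0, 40.0, 50.0],
        "band": ["g", "r", "r", "g", "g"],
    },
    index=[0, 0, 0, 1, 1],
)

# Pack each field into one list per index value (the to_lists-style layout).
lists = flat.groupby(level=0).agg(list)
print(lists)

# Exploding all three columns together unpacks the lists again, restoring
# one row per nested element and the repeated index.
roundtrip = lists.explode(["t", "flux", "band"])
print(roundtrip)
```

Exploding several columns at once requires pandas 1.3 or later, and the exploded columns come back with object dtype, so a real round trip would need a cast to recover the numeric types.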