Using the Nested-Dask nest Accessor
The nest accessor provides an additional API layer for working with the nested columns of a NestedFrame.
NOTE: The nest accessor in Nested-Dask is more limited than the one in Nested-Pandas.
[1]:
from nested_dask.datasets import generate_data
# generate_data creates some toy data
ndf = generate_data(10, 5) # 10 rows, 5 nested rows per row
ndf
[1]:
Nested-Dask NestedFrame Structure:
| | a | b | nested |
|---|---|---|---|
| npartitions=1 | | | |
| 0 | float64 | float64 | nested<t: [double], flux: [double], band: [string]> |
| 9 | ... | ... | ... |
Dask Name: repartition, 3 expressions
The nest accessor is available when selecting a nested column of a NestedFrame. For example:
[2]:
ndf["nested"].nest
[2]:
<nested_dask.accessor.DaskNestSeriesAccessor at 0x7822f785c520>
Nested column labels can be viewed using the fields property:
[3]:
ndf["nested"].nest.fields
[3]:
['t', 'flux', 'band']
The nest accessor provides methods for viewing nested data in different layouts.
to_flat unpacks the nested data into a single flat DataFrame:
[4]:
flat_nested = ndf["nested"].nest.to_flat()
flat_nested
[4]:
Dask DataFrame Structure:
| | t | flux | band |
|---|---|---|---|
| npartitions=1 | | | |
| 0 | double[pyarrow] | double[pyarrow] | string[pyarrow] |
| 9 | ... | ... | ... |
Dask Name: lambda, 5 expressions
[5]:
flat_nested.head(20)
[5]:
| | t | flux | band |
|---|---|---|---|
| 0 | 13.711731 | 71.476915 | g |
| 0 | 19.646864 | 22.472936 | r |
| 0 | 4.086222 | 42.14007 | r |
| 0 | 9.380488 | 83.892451 | r |
| 0 | 3.606885 | 73.735441 | r |
| 1 | 6.662741 | 68.587997 | g |
| 1 | 0.863353 | 45.430948 | r |
| 1 | 9.931626 | 12.307377 | r |
| 1 | 3.341144 | 68.13682 | r |
| 1 | 2.764886 | 79.613518 | r |
| 2 | 19.416213 | 13.42851 | r |
| 2 | 2.152549 | 61.223836 | r |
| 2 | 16.9995 | 72.596087 | r |
| 2 | 1.230658 | 69.028779 | g |
| 2 | 18.759409 | 75.03683 | g |
| 3 | 18.743661 | 74.212016 | g |
| 3 | 16.573969 | 71.930643 | g |
| 3 | 6.595293 | 87.443438 | r |
| 3 | 19.51212 | 45.933739 | r |
| 3 | 2.412554 | 47.07309 | g |
The index of the resulting flat DataFrame repeats once per nested row, and each index value maps directly to a row of the original NestedFrame.
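Because the flat index repeats the parent row label, ordinary index operations can relate nested rows back to their parent rows. The sketch below illustrates this with plain pandas rather than Nested-Dask itself: `flat` is a small hand-built stand-in for the computed output of to_flat (values copied from the first rows shown above), and `base` with its `a` values is a hypothetical frame invented for the example.

```python
import pandas as pd

# Hand-built stand-in for flat_nested.compute(): the index repeats the
# parent row label once per nested row (values copied from the output above).
flat = pd.DataFrame(
    {
        "t": [13.711731, 19.646864, 6.662741, 0.863353, 9.931626],
        "flux": [71.476915, 22.472936, 68.587997, 45.430948, 12.307377],
        "band": ["g", "r", "g", "r", "r"],
    },
    index=[0, 0, 1, 1, 1],
)

# Hypothetical base columns, one row per parent; these values are invented.
base = pd.DataFrame({"a": [0.5, 0.7]}, index=[0, 1])

# Count nested rows per parent row by grouping on the repeated index.
rows_per_parent = flat.groupby(level=0).size()

# Attach the base columns to every nested row with an index-aligned join.
joined = flat.join(base)

print(rows_per_parent.to_dict())
print(joined.loc[1, "a"].tolist())
```

The same idea should carry over to a computed Nested-Dask result, since the join relies only on the shared index labels.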
Alternatively, to_lists packs each row's nested data into NumPy arrays, one per field:
[6]:
list_nested = ndf["nested"].nest.to_lists()
list_nested.compute()
[6]:
| | t | flux | band |
|---|---|---|---|
| 0 | [13.71173073 19.64686441 4.08622174 9.380487... | [71.47691499 22.47293575 42.14007041 83.892451... | ['g' 'r' 'r' 'r' 'r'] |
| 1 | [6.66274146 0.86335343 9.93162611 3.34114423 2... | [68.58799684 45.43094845 12.30737719 68.136820... | ['g' 'r' 'r' 'r' 'r'] |
| 2 | [19.4162129 2.15254879 16.99949958 1.230658... | [13.42851021 61.22383622 72.59608673 69.028779... | ['r' 'r' 'r' 'g' 'g'] |
| 3 | [18.74366091 16.57396943 6.59529319 19.512119... | [74.21201647 71.93064291 87.44343816 45.933738... | ['g' 'g' 'r' 'r' 'g'] |
| 4 | [ 4.31522926 11.4071466 13.41783397 17.352829... | [35.061291 25.25394453 8.82945313 86.934486... | ['r' 'g' 'r' 'r' 'g'] |
| 5 | [12.95368282 19.02354827 9.21831458 17.645428... | [34.57259548 56.05801588 48.53754425 42.677585... | ['r' 'g' 'g' 'g' 'r'] |
| 6 | [ 0.30429609 7.5430631 4.70260335 18.441218... | [25.5503642 7.58175325 78.04896612 48.944451... | ['g' 'g' 'g' 'r' 'r'] |
| 7 | [ 9.52754983 17.38793206 9.81360713 8.433635... | [64.2084655 27.5351413 8.26659244 65.406996... | ['g' 'g' 'r' 'g' 'r'] |
| 8 | [7.91438537 9.01064832 0.4526323 0.07085433 2... | [ 4.30168316 52.11574135 66.21503827 91.898719... | ['g' 'g' 'g' 'r' 'g'] |
| 9 | [17.56579535 18.77163653 8.42277394 9.179682... | [27.61609175 69.71224732 34.93181629 98.215357... | ['r' 'r' 'r' 'r' 'g'] |
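The list layout and the flat layout hold the same values, organized differently. As a rough illustration of that relationship, the following sketch converts between the two using plain pandas only. It does not call the nest accessor, the toy values are made up, and it produces Python lists where to_lists produces NumPy arrays.

```python
import pandas as pd

# Made-up flat data: parent row 0 has two nested rows, parent row 1 has one.
flat = pd.DataFrame(
    {"t": [1.0, 2.0, 3.0], "flux": [10.0, 20.0, 30.0], "band": ["g", "r", "r"]},
    index=[0, 0, 1],
)

# Flat -> list layout: gather each parent's nested rows into one list per column.
lists = flat.groupby(level=0).agg(list)

# List -> flat layout: explode all list columns together, which restores
# the repeated index (the exploded columns come back with object dtype).
roundtrip = lists.explode(["t", "flux", "band"])

print(lists)
print(roundtrip)
```

Exploding several columns at once requires pandas 1.3 or later.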