Using the Nested-Dask nest Accessor
The nest accessor implements an additional API layer to support working with nested columns of a NestedFrame.
NOTE: The nest accessor in Nested-Dask has a more limited implementation than the one in Nested-Pandas.
[1]:
from nested_dask.datasets import generate_data
# generate_data creates some toy data
ndf = generate_data(10, 5) # 10 rows, 5 nested rows per row
ndf
[1]:
Nested-Dask NestedFrame Structure:
| | a | b | nested |
|---|---|---|---|
| npartitions=1 | | | |
| 0 | float64 | float64 | nested<t: [double], flux: [double], band: [string]> |
| 9 | ... | ... | ... |
Dask Name: repartition, 3 expressions
The nest accessor is available when selecting a nested column of a NestedFrame. For example:
[2]:
ndf["nested"].nest
[2]:
<nested_dask.accessor.DaskNestSeriesAccessor at 0x77bd546e1420>
Nested column labels can be viewed using the fields property:
[3]:
ndf["nested"].nest.fields
[3]:
['t', 'flux', 'band']
Nested data can be viewed in different formats using nest accessor functions.
to_flat unpacks the nested data into a single flat DataFrame, with one row per nested row:
[4]:
flat_nested = ndf["nested"].nest.to_flat()
flat_nested
[4]:
Dask DataFrame Structure:
| | t | flux | band |
|---|---|---|---|
| npartitions=1 | | | |
| 0 | double[pyarrow] | double[pyarrow] | string[pyarrow] |
| 9 | ... | ... | ... |
Dask Name: lambda, 5 expressions
[5]:
flat_nested.head(20)
[5]:
| | t | flux | band |
|---|---|---|---|
| 0 | 1.325903 | 61.153573 | r |
| 0 | 6.033141 | 72.901042 | r |
| 0 | 12.220836 | 43.097841 | r |
| 0 | 0.091875 | 38.133159 | r |
| 0 | 4.400336 | 61.888155 | r |
| 1 | 16.782632 | 1.105326 | g |
| 1 | 17.605098 | 14.738311 | r |
| 1 | 1.457988 | 49.323012 | r |
| 1 | 10.796511 | 16.801297 | g |
| 1 | 13.16986 | 68.882152 | r |
| 2 | 8.908296 | 42.003535 | g |
| 2 | 5.024717 | 51.351708 | g |
| 2 | 4.938491 | 40.938804 | g |
| 2 | 16.43555 | 37.173199 | g |
| 2 | 12.318743 | 72.938537 | r |
| 3 | 10.552214 | 20.061313 | g |
| 3 | 19.585625 | 44.054317 | r |
| 3 | 13.707732 | 3.786506 | g |
| 3 | 16.863835 | 21.510408 | g |
| 3 | 1.644076 | 92.095755 | g |
The index values of the resulting flat DataFrame repeat, once per nested row, and map directly to the index of the original NestedFrame.
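The role of the repeated index can be sketched in plain Python. The example below makes no Nested-Dask calls: it copies the first ten rows of the to_flat output above into tuples and groups them by index value, recovering the nested rows that belong to each original row. The names `flat_rows`, `bands_by_index`, and `rows_per_index` are illustrative only and not part of any library.

```python
from collections import defaultdict

# First ten rows of the to_flat output shown above: (index, t, flux, band).
flat_rows = [
    (0, 1.325903, 61.153573, "r"),
    (0, 6.033141, 72.901042, "r"),
    (0, 12.220836, 43.097841, "r"),
    (0, 0.091875, 38.133159, "r"),
    (0, 4.400336, 61.888155, "r"),
    (1, 16.782632, 1.105326, "g"),
    (1, 17.605098, 14.738311, "r"),
    (1, 1.457988, 49.323012, "r"),
    (1, 10.796511, 16.801297, "g"),
    (1, 13.16986, 68.882152, "r"),
]

# Group the band values by the repeated index value.
bands_by_index = defaultdict(list)
for idx, t, flux, band in flat_rows:
    bands_by_index[idx].append(band)

# Count how many nested rows belong to each original row.
rows_per_index = {idx: len(bands) for idx, bands in bands_by_index.items()}

print(rows_per_index)     # {0: 5, 1: 5}
print(bands_by_index[1])  # ['g', 'r', 'r', 'g', 'r']
```

Each index value appears five times because the toy data was generated with five nested rows per row of the NestedFrame.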
Alternatively, to_lists packages the data into numpy arrays, with one array per nested field for each row of the original NestedFrame:
[6]:
list_nested = ndf["nested"].nest.to_lists()
list_nested.compute()
[6]:
| | t | flux | band |
|---|---|---|---|
| 0 | [ 1.32590268 6.03314089 12.22083592 0.091874... | [61.15357327 72.9010419 43.09784106 38.133159... | ['r' 'r' 'r' 'r' 'r'] |
| 1 | [16.78263201 17.60509845 1.45798796 10.796510... | [ 1.10532635 14.73831052 49.32301171 16.801296... | ['g' 'r' 'r' 'g' 'r'] |
| 2 | [ 8.90829613 5.02471726 4.93849145 16.435550... | [42.00353481 51.35170808 40.93880372 37.173198... | ['g' 'g' 'g' 'g' 'r'] |
| 3 | [10.55221447 19.58562539 13.70773219 16.863835... | [20.06131347 44.05431747 3.78650584 21.510407... | ['g' 'r' 'g' 'g' 'g'] |
| 4 | [14.33024211 9.54222321 4.15897558 9.475802... | [64.34397867 72.18977285 87.4422239 41.103670... | ['r' 'g' 'r' 'r' 'g'] |
| 5 | [10.15234846 14.96664897 2.81065012 13.992988... | [37.81706109 38.28169835 1.1854475 88.244109... | ['r' 'r' 'g' 'r' 'g'] |
| 6 | [10.12681637 5.03381176 11.52850486 6.838267... | [44.87793133 24.3005897 42.6610081 23.490047... | ['r' 'r' 'r' 'g' 'g'] |
| 7 | [7.57029852 9.57362512 5.9748561 0.14549093 9... | [28.17126776 86.51496141 88.49730501 23.257712... | ['r' 'g' 'g' 'g' 'g'] |
| 8 | [ 9.25392718 7.50260614 12.75698421 8.255835... | [ 0.68343356 67.82475979 11.49455173 45.731410... | ['g' 'r' 'r' 'r' 'r'] |
| 9 | [14.14808928 7.02496436 7.50543544 11.957956... | [31.83960187 97.7125154 58.38161137 20.607829... | ['r' 'r' 'r' 'r' 'g'] |
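The list layout and the flat layout hold the same data, which a small plain-Python sketch can illustrate. It makes no Nested-Dask calls and uses only the band values shown above for rows 2 and 3, since the t and flux arrays are truncated in the displayed output. The names `band_lists` and `flat_bands` are illustrative only.

```python
# The band column of the to_lists output above, for original rows 2 and 3.
band_lists = {
    2: ["g", "g", "g", "g", "r"],
    3: ["g", "r", "g", "g", "g"],
}

# Expand each list into one (index, band) pair per element,
# repeating the index value just as the flat layout does.
flat_bands = [
    (idx, band)
    for idx, bands in band_lists.items()
    for band in bands
]

print(len(flat_bands))  # 10
print(flat_bands[:5])   # [(2, 'g'), (2, 'g'), (2, 'g'), (2, 'g'), (2, 'r')]
```

The expanded pairs line up with the band column of rows 2 and 3 in the flat_nested.head(20) output.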