Using the Nested-Dask `nest` Accessor

The nest accessor implements an additional API layer to support working with nested columns of a NestedFrame.

NOTE: The nest accessor in Nested-Dask is more limited than its counterpart in Nested-Pandas.

[1]:

from nested_dask.datasets import generate_data

# generate_data creates some toy data
ndf = generate_data(10, 5)  # 10 rows, 5 nested rows per row
ndf

[1]:

Nested-Dask NestedFrame Structure:

	a	b	nested
npartitions=1
0	float64	float64	nested<t: [double], flux: [double], band: [string]>
9	...	...	...

Dask Name: repartition, 3 expressions

The nest accessor is available when selecting a nested column of a NestedFrame. For example:

[2]:

ndf["nested"].nest

[2]:

<nested_dask.accessor.DaskNestSeriesAccessor at 0x77bd546e1420>

Nested column labels can be viewed using the fields property:

[3]:

ndf["nested"].nest.fields

[3]:

['t', 'flux', 'band']

Nested data can be viewed in different formats using nest accessor functions.

to_flat unpacks the nested data into a single flat DataFrame:

[4]:

flat_nested = ndf["nested"].nest.to_flat()
flat_nested

[4]:

Dask DataFrame Structure:

	t	flux	band
npartitions=1
0	double[pyarrow]	double[pyarrow]	string[pyarrow]
9	...	...	...

Dask Name: lambda, 5 expressions

[5]:

flat_nested.head(20)

[5]:

	t	flux	band
0	1.325903	61.153573	r
0	6.033141	72.901042	r
0	12.220836	43.097841	r
0	0.091875	38.133159	r
0	4.400336	61.888155	r
1	16.782632	1.105326	g
1	17.605098	14.738311	r
1	1.457988	49.323012	r
1	10.796511	16.801297	g
1	13.16986	68.882152	r
2	8.908296	42.003535	g
2	5.024717	51.351708	g
2	4.938491	40.938804	g
2	16.43555	37.173199	g
2	12.318743	72.938537	r
3	10.552214	20.061313	g
3	19.585625	44.054317	r
3	13.707732	3.786506	g
3	16.863835	21.510408	g
3	1.644076	92.095755	g

The index values of the resulting flat DataFrame repeat, once per nested row, and map directly to the index of the original NestedFrame.
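Because the flat index repeats the label of the parent row, per-row summaries can be computed by grouping on that index. The sketch below is plain pandas, not Nested-Dask: it uses a small hand-built frame (values rounded from the first two entries of rows 0 and 1 above) as a stand-in for a computed flat frame, and the column names `n_obs` and `mean_flux` are illustrative.

```python
import pandas as pd

# Hand-built stand-in for a computed flat frame (an assumption for
# illustration): the index repeats the label of the parent row.
flat = pd.DataFrame(
    {
        "t": [1.33, 6.03, 16.78, 17.61],
        "flux": [61.15, 72.90, 1.11, 14.74],
        "band": ["r", "r", "g", "r"],
    },
    index=[0, 0, 1, 1],
)

# Group on the repeated index to get one summary row per parent row.
per_row = flat.groupby(level=0).agg(
    n_obs=("flux", "size"),      # number of nested rows per parent row
    mean_flux=("flux", "mean"),  # mean flux per parent row
)
print(per_row)
```

The result has one row per original index value, so it lines up with the base columns of the frame it came from.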

Alternatively, to_lists packs the values of each nested field into one numpy array per row:

[6]:

list_nested = ndf["nested"].nest.to_lists()
list_nested.compute()

[6]:

	t	flux	band
0	[ 1.32590268 6.03314089 12.22083592 0.091874...	[61.15357327 72.9010419 43.09784106 38.133159...	['r' 'r' 'r' 'r' 'r']
1	[16.78263201 17.60509845 1.45798796 10.796510...	[ 1.10532635 14.73831052 49.32301171 16.801296...	['g' 'r' 'r' 'g' 'r']
2	[ 8.90829613 5.02471726 4.93849145 16.435550...	[42.00353481 51.35170808 40.93880372 37.173198...	['g' 'g' 'g' 'g' 'r']
3	[10.55221447 19.58562539 13.70773219 16.863835...	[20.06131347 44.05431747 3.78650584 21.510407...	['g' 'r' 'g' 'g' 'g']
4	[14.33024211 9.54222321 4.15897558 9.475802...	[64.34397867 72.18977285 87.4422239 41.103670...	['r' 'g' 'r' 'r' 'g']
5	[10.15234846 14.96664897 2.81065012 13.992988...	[37.81706109 38.28169835 1.1854475 88.244109...	['r' 'r' 'g' 'r' 'g']
6	[10.12681637 5.03381176 11.52850486 6.838267...	[44.87793133 24.3005897 42.6610081 23.490047...	['r' 'r' 'r' 'g' 'g']
7	[7.57029852 9.57362512 5.9748561 0.14549093 9...	[28.17126776 86.51496141 88.49730501 23.257712...	['r' 'g' 'g' 'g' 'g']
8	[ 9.25392718 7.50260614 12.75698421 8.255835...	[ 0.68343356 67.82475979 11.49455173 45.731410...	['g' 'r' 'r' 'r' 'r']
9	[14.14808928 7.02496436 7.50543544 11.957956...	[31.83960187 97.7125154 58.38161137 20.607829...	['r' 'r' 'r' 'r' 'g']
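The list and flat layouts hold the same values in different shapes. As a rough illustration of how they relate, the sketch below uses plain pandas `DataFrame.explode` on a small hand-built list-format frame. Python lists stand in for the numpy arrays, the values are rounded from rows 0 and 1 above, and this is not how Nested-Dask performs the conversion internally.

```python
import pandas as pd

# Hand-built stand-in for a list-format frame (an assumption for
# illustration): one list per row and per nested field.
lists = pd.DataFrame(
    {
        "t": [[1.33, 6.03], [16.78, 17.61]],
        "flux": [[61.15, 72.90], [1.11, 14.74]],
        "band": [["r", "r"], ["g", "r"]],
    },
    index=[0, 1],
)

# Exploding all list columns together yields one row per nested element;
# the index label of each parent row is repeated, as in the flat layout.
flat = lists.explode(["t", "flux", "band"])
print(flat)
```

Exploding several columns at once requires the lists in each row to have equal lengths, which holds here because every nested row carries all three fields.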
