Using the Nested-Dask `nest` Accessor

The nest accessor provides an additional API layer for working with the nested columns of a NestedFrame.

NOTE: The nest accessor in Nested-Dask has a more limited implementation than its Nested-Pandas counterpart.

[1]:

from nested_dask.datasets import generate_data

# generate_data creates some toy data
ndf = generate_data(10, 5)  # 10 rows, 5 nested rows per row
ndf

[1]:

Nested-Dask NestedFrame Structure:

	a	b	nested
npartitions=1
0	float64	float64	nested<t: [double], flux: [double], band: [string]>
9	...	...	...

Dask Name: repartition, 3 expressions

The nest accessor is available when selecting a nested column of a NestedFrame. For example:

[2]:

ndf["nested"].nest

[2]:

<nested_dask.accessor.DaskNestSeriesAccessor at 0x7822f785c520>

Nested column labels can be viewed using the fields property:

[3]:

ndf["nested"].nest.fields

[3]:

['t', 'flux', 'band']

Nested data can be viewed in different formats using nest accessor functions.

to_flat unpacks the nested data into a single flat DataFrame, with one row per nested row:

[4]:

flat_nested = ndf["nested"].nest.to_flat()
flat_nested

[4]:

Dask DataFrame Structure:

	t	flux	band
npartitions=1
0	double[pyarrow]	double[pyarrow]	string[pyarrow]
9	...	...	...

Dask Name: lambda, 5 expressions

[5]:

flat_nested.head(20)

[5]:

	t	flux	band
0	13.711731	71.476915	g
0	19.646864	22.472936	r
0	4.086222	42.14007	r
0	9.380488	83.892451	r
0	3.606885	73.735441	r
1	6.662741	68.587997	g
1	0.863353	45.430948	r
1	9.931626	12.307377	r
1	3.341144	68.13682	r
1	2.764886	79.613518	r
2	19.416213	13.42851	r
2	2.152549	61.223836	r
2	16.9995	72.596087	r
2	1.230658	69.028779	g
2	18.759409	75.03683	g
3	18.743661	74.212016	g
3	16.573969	71.930643	g
3	6.595293	87.443438	r
3	19.51212	45.933739	r
3	2.412554	47.07309	g

The index of the resulting flat DataFrame is repeated, once per nested row, and maps directly to the index of the original NestedFrame.
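To illustrate why the repeated index is useful, here is a minimal sketch in plain pandas. The `flat` frame is a hand-built stand-in for the computed result of to_flat (the first two rows of the output above, rounded to two decimals), and the variable names are illustrative only. It assumes the computed flat result behaves like an ordinary pandas DataFrame for these operations.

```python
import pandas as pd

# Hand-built stand-in for the computed flat output: the nested rows of
# original rows 0 and 1, with values rounded to two decimals.
flat = pd.DataFrame(
    {
        "t": [13.71, 19.65, 4.09, 9.38, 3.61, 6.66, 0.86, 9.93, 3.34, 2.76],
        "flux": [71.48, 22.47, 42.14, 83.89, 73.74, 68.59, 45.43, 12.31, 68.14, 79.61],
        "band": ["g", "r", "r", "r", "r", "g", "r", "r", "r", "r"],
    },
    index=[0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
)

# Grouping on the repeated index collapses the flat rows back to one
# entry per row of the original frame.
rows_per_object = flat.groupby(level=0).size()
max_flux = flat.groupby(level=0)["flux"].max()

# Label-based selection returns every nested row of original row 1.
row_one = flat.loc[1]
```

Because the index labels are shared with the original frame, per-row summaries such as `max_flux` line up with the original rows without any extra key column.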

Alternatively, to_lists packages the data into numpy arrays, with one array per row for each nested field:

[6]:

list_nested = ndf["nested"].nest.to_lists()
list_nested.compute()

[6]:

	t	flux	band
0	[13.71173073 19.64686441 4.08622174 9.380487...	[71.47691499 22.47293575 42.14007041 83.892451...	['g' 'r' 'r' 'r' 'r']
1	[6.66274146 0.86335343 9.93162611 3.34114423 2...	[68.58799684 45.43094845 12.30737719 68.136820...	['g' 'r' 'r' 'r' 'r']
2	[19.4162129 2.15254879 16.99949958 1.230658...	[13.42851021 61.22383622 72.59608673 69.028779...	['r' 'r' 'r' 'g' 'g']
3	[18.74366091 16.57396943 6.59529319 19.512119...	[74.21201647 71.93064291 87.44343816 45.933738...	['g' 'g' 'r' 'r' 'g']
4	[ 4.31522926 11.4071466 13.41783397 17.352829...	[35.061291 25.25394453 8.82945313 86.934486...	['r' 'g' 'r' 'r' 'g']
5	[12.95368282 19.02354827 9.21831458 17.645428...	[34.57259548 56.05801588 48.53754425 42.677585...	['r' 'g' 'g' 'g' 'r']
6	[ 0.30429609 7.5430631 4.70260335 18.441218...	[25.5503642 7.58175325 78.04896612 48.944451...	['g' 'g' 'g' 'r' 'r']
7	[ 9.52754983 17.38793206 9.81360713 8.433635...	[64.2084655 27.5351413 8.26659244 65.406996...	['g' 'g' 'r' 'g' 'r']
8	[7.91438537 9.01064832 0.4526323 0.07085433 2...	[ 4.30168316 52.11574135 66.21503827 91.898719...	['g' 'g' 'g' 'r' 'g']
9	[17.56579535 18.77163653 8.42277394 9.179682...	[27.61609175 69.71224732 34.93181629 98.215357...	['r' 'r' 'r' 'r' 'g']
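The list layout and the flat layout carry the same information. The following sketch shows the relationship in plain pandas. The `lists` frame is a hand-built, shortened stand-in for the output above: it uses plain Python lists rather than numpy arrays, keeps only two fields, and uses hypothetical row lengths of 3 and 2 to keep the example small.

```python
import pandas as pd

# Hand-built stand-in for the list-style output: one sequence per row
# and field. Plain Python lists are used here in place of numpy arrays.
lists = pd.DataFrame(
    {
        "t": [[13.71, 19.65, 4.09], [6.66, 0.86]],
        "band": [["g", "r", "r"], ["g", "r"]],
    },
    index=[0, 1],
)

# Each cell holds a whole sequence, so per-row operations use apply.
n_obs = lists["t"].apply(len)

# Exploding all list columns together recovers the flat layout, with
# the index repeated once per nested element.
flat_again = lists.explode(["t", "band"])
```

The list layout is convenient when a function needs all values of a row at once, while the flat layout suits column-wise filtering and grouping.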
