Quickstart

With a working Python environment, nested-dask and its dependencies can be installed with the pip package manager:

[1]:
# % pip install nested-dask

Nested-Dask is a package that enables parallelized computation of nested associated datasets.

Usage of Nested-Dask very closely follows the usage of Nested-Pandas, but with a layer of Dask concepts introduced on top. This quickstart guide will step through a basic example that mirrors the quickstart guide of Nested-Pandas. First, let’s load some toy data:

[2]:
from nested_dask.datasets import generate_data

# generate_data creates some toy data
ndf = generate_data(10, 100)  # 10 rows, 100 nested rows per row
ndf
[2]:
Nested-Dask NestedFrame Structure:
a b nested
npartitions=1
0 float64 float64 nested<t: [double], flux: [double], band: [string]>
9 ... ... ...
Dask Name: repartition, 3 expressions

The above is a Nested-Dask NestedFrame object. It’s currently a “lazy” representation of the data, meaning that no data has actually been brought into memory yet. This lazy view gives us some useful information on the structure of the data, with notable pieces of information being:

  • The column names in the dataset and their respective dtypes.

  • npartitions=1 indicates how many partitions the dataset has been split into.

  • The 0 and 9 are the “divisions” of the partitions. When the dataset is sorted by its index, the divisions mark the range of index values that reside in each partition.

We can peek at the first n rows using ndf.head(n) (or the last few with ndf.tail(n)).

[3]:
ndf.head(3)
[3]:
  a b nested
0 0.497448 1.186517
t flux band
5.95973 53.840049 r

100 rows × 3 columns

1 0.914472 1.226739
1.380994 80.248678 r

100 rows × 3 columns

2 0.153702 1.748005
6.161197 35.818547 g

100 rows × 3 columns

3 rows x 3 columns

We can signal to Dask that we’d like to actually obtain all of the data as a nested_pandas.NestedFrame by using compute.

[4]:
ndf.compute()
[4]:
  a b nested
0 0.497448 1.186517
t flux band
5.95973 53.840049 r

100 rows × 3 columns

1 0.914472 1.226739
1.380994 80.248678 r

100 rows × 3 columns

2 0.153702 1.748005
6.161197 35.818547 g

100 rows × 3 columns

3 0.660518 0.894475
19.712416 23.060196 r

100 rows × 3 columns

4 0.311996 0.386149
0.444581 1.324795 r

100 rows × 3 columns

5 0.236855 0.423388
0.510952 11.243248 r

100 rows × 3 columns

6 0.205673 0.182522
9.357901 44.790462 r

100 rows × 3 columns

7 0.758041 0.216532
17.134305 62.495211 g

100 rows × 3 columns

8 0.557032 1.081138
3.417226 13.34516 g

100 rows × 3 columns

9 0.745377 0.810245
12.302421 28.15279 r

100 rows × 3 columns

10 rows x 3 columns

As with Nested-Pandas, this NestedFrame holds special nested columns in addition to normal Pandas columns. In this case, we have the top level dataframe with 10 rows and 2 typical columns, “a” and “b”. The “nested” column contains a dataframe in each row. We can inspect the contents of the “nested” column using the standard Pandas/Dask API.

[5]:
ndf.nested.compute()[0]
[5]:
t flux band
0 5.959730 53.840049 r
1 7.675038 30.741834 r
... ... ... ...
98 19.596919 20.755134 r
99 17.213989 60.974674 g

100 rows × 3 columns

Here we see that within the “nested” column there are Nested-Pandas NestedFrame objects with their own data. In this case we have 3 columns (“t”, “flux”, and “band”).

Nested-Dask functionality mirrors Nested-Pandas, as we can see via the query function. In this case, we use a Nested-Pandas specific feature to query nested layers using a hierarchical column name (“nested.t” queries the “t” sub-column from the “nested” column of ndf).

[6]:
# Applies the query to "nested", filtering based on "t > 17.0"
result = ndf.query("nested.t > 17.0")
result
[6]:
Nested-Dask NestedFrame Structure:
a b nested
npartitions=1
0 float64 float64 nested<t: [double], flux: [double], band: [string]>
9 ... ... ...
Dask Name: lambda, 4 expressions

Once again, the result is lazy and no work has been performed. We can kick off some computation using compute as above or this time using head to just peek at the result:

[7]:
result.head(5)
[7]:
  a b nested
0 0.497448 1.186517
t flux band
19.853928 6.709146 r

18 rows × 3 columns

1 0.914472 1.226739
18.596419 86.832778 g

10 rows × 3 columns

2 0.153702 1.748005
19.96159 73.659794 r

21 rows × 3 columns

3 0.660518 0.894475
19.712416 23.060196 r

24 rows × 3 columns

4 0.311996 0.386149
17.590407 86.730045 g

16 rows × 3 columns

5 rows x 3 columns

In this case, the query has filtered the rows within the “nested” column: each nested dataframe now holds only the rows where t is greater than 17.0.

[8]:
result.head(5).nested[0]  # no `t` value is lower than 17.0
[8]:
t flux band
0 19.853928 6.709146 r
1 19.481351 54.279580 g
2 18.870711 35.388579 r
3 17.587139 67.321154 r
4 18.281895 95.456750 g
5 17.660319 75.219559 g
6 19.502393 92.471635 g
7 17.125941 1.876946 r
8 19.747156 75.255061 r
9 18.916546 86.355152 g
10 19.182886 83.928470 g
11 18.227857 8.684140 r
12 18.405449 64.282790 g
13 19.830650 21.137759 r
14 17.142630 82.857088 g
15 19.337413 7.482233 g
16 19.596919 20.755134 r
17 17.213989 60.974674 g
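The row-level effect of the query above can be mimicked in plain pandas. The sketch below is not how Nested-Pandas implements nested queries; it uses a hypothetical flat table with a "parent" column standing in for the top-level index, and made-up values of t.

```python
import pandas as pd

# Flat stand-in for the nested layer: "parent" marks the top-level row
# that each nested row belongs to (hypothetical values).
flat = pd.DataFrame(
    {
        "parent": [0, 0, 0, 1, 1, 2],
        "t": [5.9, 19.8, 17.5, 1.3, 18.5, 6.1],
    }
)

# Keep only the nested rows that satisfy the condition.
kept = flat[flat["t"] > 17.0]

# Count the surviving nested rows per top-level row.
counts = kept.groupby("parent").size()
print(counts)
```

The per-parent counts shrink in the same way that the nested dataframes in the query result report fewer than 100 rows each.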

The Nested-Dask reduce function works almost identically to the Nested-Pandas reduce, providing a way to call custom functions on NestedFrame data. The one additional concern is that Dask requires, in almost every case, a meta= argument that describes the shape and type of the output data. Dask provides a make_meta function, to which you can pass a dummy output value.

[9]:
import numpy as np
import pandas as pd
from dask.dataframe.utils import make_meta

# Use a hierarchical column name to pass the flux column
# to np.mean as an array.
#
# Compute a single sample row (that is what .head(1) does)
# and generate the meta from its output.
meta = make_meta(ndf.head(1).reduce(np.mean, "nested.flux"))

means = ndf.reduce(np.mean, "nested.flux", meta=meta)
means.compute()
[9]:
0
0 52.482559
1 47.794941
2 47.865195
3 51.835951
4 50.223569
5 55.425361
6 53.614796
7 54.838310
8 52.924436
9 47.718718

10 rows × 1 columns

The reduce function can also be used to apply any row-based calculation, even if the dimension stays the same. Observe that we can use a similar pattern to produce, say, the square of the flux. It is still a “reduction” in that the result is no longer within the original NestedFrame structure, but the cardinality of each output row is now the same as the cardinality of each input row.

[10]:
meta = make_meta(ndf.head(1).reduce(np.square, "nested.flux"))

flux_sq = ndf.reduce(np.square, "nested.flux", meta=meta)
flux_sq.compute()
[10]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
0 2898.750867 945.060352 2535.489891 358.939152 45.012637 2946.272820 1252.351552 74.528862 49.327149 9977.402281 2786.508365 4532.137800 3629.438701 3327.600062 7000.403782 111.324127 5574.710703 808.845527 9111.991213 8998.841672 10.221304 5353.812903 5090.786645 3166.654251 2305.047403 1597.694383 65.117453 5657.982077 5231.721221 1757.393582 2303.075480 268.717392 8551.003289 3.522925 4457.036164 1083.328315 4644.419674 1338.790917 4050.105423 2874.521095 2781.962211 5663.324199 61.900470 7457.212246 5303.077160 720.545034 6606.375831 5049.579245 5336.682930 815.055630 103.147826 9598.553879 3386.848982 7043.988135 9006.878159 5753.864060 4520.654386 1.821413 5521.581626 2704.983373 7681.064332 8994.740163 75.414296 2288.011417 4132.277111 1347.126668 5448.853543 9033.727717 2187.498363 85.763629 2086.033499 3712.941136 3800.237520 8974.280397 1045.481162 68.593740 9797.293690 244.286053 3723.193377 40.066035 9529.248005 402.895381 446.804857 9032.278770 1710.576372 9447.073225 2873.041829 2307.762019 1317.850703 644.293828 6865.297024 2628.867825 2325.925782 6066.684683 55.983811 88.670368 4332.229606 7095.440170 430.775586 3717.910878
1 6439.850378 7539.931266 10.322434 9796.572579 2909.133835 162.822911 8230.586377 397.497409 6272.601795 241.105158 8115.744501 4338.448150 5052.430526 8836.325651 933.221084 890.540307 1685.515203 928.101187 108.109938 22.210879 4572.361349 1084.804097 6354.589998 2748.857225 3654.498930 5857.285379 1213.305969 856.354925 281.649063 2879.475583 6865.859943 43.508409 1293.894863 4809.374602 9758.082585 18.339122 3427.624766 723.467608 892.603189 11.961522 1158.360664 862.471549 115.606959 2435.640191 4936.532406 129.278548 9249.612384 9185.823254 759.019692 3552.634008 1204.792673 7158.402825 1482.238875 5671.721938 3818.881619 2654.160377 5382.578654 14.541461 238.815535 406.252222 1786.428295 372.917084 4754.241415 1641.934982 1979.714735 3880.321772 44.392719 1.999521 3585.411802 22.982001 4808.966116 9708.086384 3638.396397 16.364485 133.690494 7.302322 8786.353354 709.552071 3660.836955 1023.954757 3857.984493 9912.664116 8924.562939 45.276146 5052.364645 8297.318551 4575.846883 1357.820430 3723.688695 1252.086683 153.757518 2080.105796 762.162742 686.513811 8956.402824 728.208702 8264.566676 803.559089 9630.852350 359.038511
2 1282.968293 5701.280271 1811.715077 5425.765206 1.188302 2468.460167 13.787400 1358.416755 1136.648858 3552.431995 8967.114310 1713.297641 736.935703 4145.713654 8288.361141 5083.817824 1257.784531 51.642072 947.848089 3283.536898 1063.247020 880.324462 6182.612832 1254.240709 3138.155653 7394.281389 259.529392 164.665258 196.009851 408.366328 4127.585570 5114.195780 373.262644 2606.661986 168.794225 2546.147667 3158.785187 4104.720856 6877.287468 353.368761 1165.229814 980.687315 3968.052528 5317.189049 1491.832821 2694.590178 6803.339512 5377.253891 580.554163 9859.057797 2988.105835 3875.283419 4899.164939 9574.203594 212.730281 61.306603 5020.304713 5749.896347 1852.510811 2529.158408 874.341336 2990.808088 4.904154 586.285240 352.210667 96.704079 12.626353 6257.325912 1191.476518 296.546606 1116.633420 6584.076360 4278.081259 648.436386 2765.255072 323.037824 1294.224383 110.222444 2632.326709 8546.496153 9622.473056 3104.313469 8350.985133 574.065121 1431.776114 184.671426 532.888836 1355.081157 1816.377183 3775.299976 3252.340461 7958.937129 2181.541813 282.883995 3801.895841 9620.721739 5062.855844 9243.895630 3477.708163 1766.931502
3 531.772629 6258.074776 0.198393 2542.876900 1517.643577 3.167819 638.230947 4019.983285 1164.355840 9828.137072 5248.880524 3334.499342 5759.300074 5202.214794 6871.058629 5010.515672 3579.704928 5750.331020 312.031696 9702.168923 4489.680978 7571.587105 9497.154623 117.813997 258.683587 3112.143572 173.712572 3858.218089 3133.768729 5364.629182 10.241666 2513.209054 360.570379 4498.947910 201.381387 9775.174444 4733.513518 9346.332740 8264.253781 9391.370945 5841.652552 282.632976 1635.184185 934.385403 67.788905 4689.625582 2542.025319 3527.182958 2597.270923 8787.592697 266.772909 1837.707781 8218.381889 4562.401995 1381.458017 9965.419338 9321.450725 335.212193 2070.312305 4370.635771 5.070729 2362.077861 687.229164 356.564906 8695.545464 4490.030423 2504.954936 2870.861312 5279.222399 4932.763729 3381.975600 1405.940045 157.318524 41.485955 4056.973167 2213.989819 5579.711548 4162.217806 1934.893197 1038.480307 16.007811 4304.976101 2.054687 762.859474 2.735400 3970.061136 3337.679755 2200.035028 2173.730927 5520.908360 3574.797285 7963.325671 8211.052068 968.068839 2742.674559 70.626145 2649.860924 2526.279612 8291.143230 43.254469
4 1.755081 8179.893264 448.043713 8506.999541 383.274667 7522.100778 9568.801534 119.549064 216.897962 5319.669869 3231.280743 7102.511658 1720.775901 8541.280719 5043.203615 927.347896 494.115779 4294.385150 6702.540107 2796.721836 1063.090146 989.187159 6896.231449 241.892335 223.346630 2517.424879 372.692924 4836.305907 519.365610 6623.602952 476.118161 9426.410396 6815.868681 8949.563263 4950.830686 407.138089 6116.662277 2352.330675 1133.030588 3819.796745 15.209294 4028.262275 1180.864973 2770.161188 7563.422247 5097.097157 1969.822118 6980.512580 5690.748391 89.767408 3406.514584 4410.036793 108.948939 1.766407 1473.089469 9143.530234 1767.541856 5906.304519 3697.372404 7155.589644 184.900524 20.511915 5286.614365 1666.793548 997.494985 1596.842234 277.723928 3428.235262 6765.062740 6613.378960 3452.165009 6322.242267 8040.228881 1343.055807 1159.164151 1783.481446 2899.421879 17.548901 962.422109 4921.165848 628.993091 750.378615 2691.880805 5643.455825 8902.603369 1255.992250 2986.947110 1662.877835 4.732480 2888.244438 817.426507 219.595200 0.815902 2228.406181 9117.725566 5560.712676 4230.729615 183.459852 964.011633 2660.079168
5 126.410618 8504.673287 9978.972284 4471.594037 2834.433464 6026.308752 4340.608615 262.645973 697.733064 7298.374669 92.808731 3521.513432 4315.723261 9635.989847 539.424655 37.541825 4572.452096 2305.274835 8409.251622 8836.769069 70.391171 7865.452681 120.810957 190.542233 5049.425815 59.726306 983.784856 3609.288509 2147.848472 2960.020686 4025.025950 2035.989046 298.522918 5440.488691 8541.941099 66.933583 3499.214755 4607.721898 397.224382 1772.865906 248.614329 3196.739351 9431.513194 3743.873403 6622.299540 1694.615296 2453.973156 3308.629208 3901.144239 2261.161209 1997.536091 3405.605400 6087.510802 8732.537445 2214.512675 5603.990047 2177.704805 2879.259870 388.294077 8238.689561 7099.596765 7626.826802 681.412566 3871.129587 35.535615 5392.928488 711.867306 4216.711464 4380.965027 6562.470767 4735.034426 9076.231722 1404.399583 3050.995359 5.469268 2667.156931 7646.374869 3849.670930 3028.912292 5178.815974 1039.906349 2100.240280 7992.366547 149.775172 767.157358 6411.266660 9572.910494 5078.166615 8521.631751 2889.455925 38.700475 9185.646102 9111.552082 5059.613676 5548.306852 2298.977768 702.013483 7172.058635 249.201027 1466.488953
6 2006.185492 42.199495 3060.325682 7692.939872 1588.880318 5325.049461 7503.899164 8869.627846 408.176761 4980.695347 1311.092994 1191.555112 7554.336523 6896.725541 3061.370548 2430.423935 5.339171 157.209248 5229.934985 7505.453819 9634.946468 9038.413549 175.249982 5.489755 3556.101042 1309.821088 9486.366015 3757.948233 2165.512883 4651.639091 4288.159603 7035.187211 427.704224 6806.649580 5363.636831 1427.587395 7150.506009 2683.793051 3459.446636 2290.930248 202.953976 9830.797239 7248.522586 462.182907 810.445100 1053.766507 6239.007691 4441.696479 4321.942822 297.568186 3694.478759 4466.046291 966.127659 801.297771 2689.800168 832.440825 2212.768082 6749.419553 7.813270 3286.608686 2786.306896 204.760315 5723.594416 3811.078446 9053.674835 4621.407034 8642.968403 3503.726897 497.705135 5403.858121 4.012232 1318.812237 8531.026451 360.143744 7.040657 2953.771932 125.094694 527.160438 4903.401583 4800.135019 9173.113276 4508.847932 5958.468378 9174.071327 7754.734578 1995.997253 4935.137168 3037.578387 7394.052478 2832.680916 428.419702 1391.709784 7.171091 1557.202400 3814.545273 314.334704 1728.485483 396.024021 3154.336678 7418.241153
7 3905.651402 6704.391231 30.563789 1919.213580 650.725207 12.083223 41.981497 4196.631313 4975.443196 6421.348294 3865.137172 2695.387100 5220.730126 6252.916867 7622.039266 8373.692236 2035.887426 180.216778 357.161807 264.619466 718.639102 1344.952921 5023.776888 4953.006164 7616.042648 7029.342779 8490.873008 6239.263184 2399.996215 3478.973906 6888.387183 7282.131910 8299.438563 1568.179995 84.921680 959.435055 100.679543 2848.094220 683.196229 3828.680663 1675.259594 1605.102125 26.519837 6373.852560 7678.820508 1952.245123 6935.822502 8100.791580 5.584189 2602.230196 9073.968806 9389.279825 6439.709813 7222.889488 756.673059 5856.186469 5554.543649 373.144228 5285.819953 1125.655700 691.852803 3146.934603 745.776896 181.071469 3109.847506 6321.485778 1042.463443 1554.817622 2266.046825 7062.301551 952.136517 5651.653169 4060.985104 7204.573787 1928.185425 5250.569015 5043.818524 6125.069125 4309.461821 988.952679 5112.064405 6835.345837 6930.474586 8.351195 2281.477578 9692.043497 6998.061460 4401.754272 1551.678751 9349.472385 3238.363571 3419.502522 14.508628 3194.524529 8809.778453 1349.654976 218.839167 1926.996903 0.227661 2746.902865
8 178.093303 3957.136293 303.327670 1642.408106 6039.120851 2151.697080 5792.050126 1104.774041 501.116366 1045.764547 1130.724908 1582.584806 815.223101 3778.131101 1588.627483 5259.027646 7798.372252 8.659302 6594.855404 5864.465828 9587.395335 3946.406776 8286.464315 804.550475 4443.517250 19.113170 5610.666562 31.847340 7438.743489 3751.584405 5123.724988 1778.055660 469.149757 2785.472188 2770.105898 985.751008 6098.984219 4127.698244 9629.743713 487.883394 37.276982 7111.415393 1865.240096 202.018456 206.649449 5002.229921 2723.498509 8002.459125 6495.320091 6187.301391 6133.749792 6841.174466 6990.713348 1752.520989 9924.047067 9016.718869 2205.323402 9743.892146 2557.786736 885.156918 1036.472327 2565.727125 110.046218 6819.133285 1256.213151 514.246800 1278.048395 1534.497443 4160.971354 8776.781866 1373.094588 4099.352984 20.122698 3980.789918 2077.807423 5502.701599 2.428222 9717.496850 3731.900691 553.865529 3823.138233 211.825793 7803.407304 3300.790415 713.411892 3553.363186 8420.486420 3826.437040 645.222407 7574.839581 140.578826 9604.912838 993.913692 1982.012087 7.235474 2370.371505 916.555942 5134.344531 4756.478796 4526.952365
9 792.579601 332.140408 5193.402681 2006.970327 3868.638015 1909.309446 6287.859188 3870.066356 2115.936890 229.765580 541.439018 5499.946456 6306.011725 91.203091 4133.161009 2.006797 2812.657354 723.707314 767.358710 1174.102059 1592.068130 1468.526389 623.808690 355.360505 397.677802 1291.649315 4497.501740 1264.838218 2594.748016 3684.854799 5104.261668 1181.121787 574.933050 389.878141 1726.325795 1123.903243 6271.165916 3119.235225 2318.189758 3909.420173 2723.433708 9945.596301 725.865299 4902.365535 8178.112249 99.731721 163.410204 1046.310834 8805.404815 1760.280197 6798.606951 2898.814414 28.843851 7052.422358 3059.430857 1786.759332 8115.110677 7988.010784 241.454511 1631.676516 7202.306315 440.563839 1088.073230 6545.356799 1379.492671 3299.579213 4265.375001 1460.037825 4.293699 1846.468038 1168.373459 8102.932255 9879.957273 7874.280225 3446.181153 177.880507 582.484947 8922.479741 1052.591940 346.049929 6209.322303 2109.742338 239.379307 3892.882923 3532.798969 1569.376515 576.122619 3328.861815 6054.913584 3579.073604 27.995826 4886.828710 1695.315311 486.086949 5085.396978 4203.685449 351.821104 3.472752 2680.270174 4881.348965

10 rows × 100 columns
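The difference in output shape between the two reduce calls can be reproduced with NumPy and pandas alone. This is only a sketch of the shapes involved, not the Nested-Dask implementation; the toy sizes (4 rows with 5 nested values each) are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# One flux array per top-level row: 4 rows, 5 nested values each.
flux_per_row = [rng.uniform(0, 100, size=5) for _ in range(4)]

# A function returning a scalar yields one output column per row.
means = pd.DataFrame([np.mean(flux) for flux in flux_per_row])

# A function returning an array yields one output column per nested value.
squares = pd.DataFrame([np.square(flux) for flux in flux_per_row])

print(means.shape, squares.shape)  # (4, 1) (4, 5)
```

In the same way, np.mean over 100 nested values gives 10 rows × 1 column above, while np.square gives 10 rows × 100 columns.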
