Using dask distributed for single-machine parallel computing
This example shows the simplest usage of the dask distributed backend, on the local computer.
This is useful for prototyping a solution, to later be run on a truly distributed cluster, as the only change to be made is the address of the scheduler.
Another realistic usage scenario is combining dask code with joblib code, for instance using dask to preprocess data and scikit-learn for machine learning. In such a setting, it can be useful to use distributed as the backend scheduler for both dask and joblib, so that a single scheduler orchestrates the whole computation.
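As a rough illustration of that scenario, the sketch below runs a dask preprocessing step and a joblib step on the same distributed client. The functions `preprocess` and `score` are hypothetical stand-ins (plain arithmetic rather than real preprocessing or scikit-learn code), and the in-process client is an assumption made so the snippet can run on its own.

```python
import dask
import joblib
from dask.distributed import Client

# Assumption: a local, thread-based client; creating it makes it the
# default scheduler for dask computations in this process.
client = Client(processes=False)


@dask.delayed
def preprocess(x):
    # Hypothetical preprocessing step, built as a dask task.
    return x * 2


def score(x):
    # Hypothetical stand-in for a joblib-parallelised step.
    return x + 1


# Stage 1: dask tasks, executed through the distributed client.
cleaned = dask.compute(*[preprocess(x) for x in range(5)])

# Stage 2: joblib tasks, sent to the same client via the 'dask' backend.
with joblib.parallel_backend('dask'):
    scores = joblib.Parallel()(joblib.delayed(score)(x) for x in cleaned)
```

Both stages are submitted to the same scheduler, which is the point of sharing one backend.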
Set up the distributed client
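No code for this step appears in the text here, so the following is only a minimal sketch of what creating a client could look like. It assumes `dask.distributed` is installed and that an in-process, thread-based client is acceptable; the commented-out address is a placeholder, not a real endpoint.

```python
from dask.distributed import Client

# Assumption: run the scheduler and workers as threads inside this
# process, which needs no separate cluster.
client = Client(processes=False)

# To target an existing cluster instead, only the scheduler address
# changes (placeholder address shown):
# client = Client('tcp://scheduler-address:8786')
```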
Run parallel computation using dask.distributed
import time
import joblib
def long_running_function(i):
    time.sleep(.1)
    return i
The verbose messages below show that the backend in use is indeed the dask.distributed one.
with joblib.parallel_backend('dask'):
    joblib.Parallel(verbose=100)(
        joblib.delayed(long_running_function)(i)
        for i in range(10))
Out:
[Parallel(n_jobs=-1)]: Using backend DaskDistributedBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 0.1s
[Parallel(n_jobs=-1)]: Batch computation too fast (0.1437s.) Setting batch_size=2.
[Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 0.2s
[Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 0.2s
[Parallel(n_jobs=-1)]: Done 4 out of 10 | elapsed: 0.2s remaining: 0.3s
[Parallel(n_jobs=-1)]: Done 5 out of 10 | elapsed: 0.2s remaining: 0.2s
[Parallel(n_jobs=-1)]: Done 6 out of 10 | elapsed: 0.3s remaining: 0.2s
[Parallel(n_jobs=-1)]: Done 7 out of 10 | elapsed: 0.3s remaining: 0.1s
[Parallel(n_jobs=-1)]: Done 8 out of 10 | elapsed: 0.3s remaining: 0.1s
[Parallel(n_jobs=-1)]: Done 10 out of 10 | elapsed: 0.3s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 10 out of 10 | elapsed: 0.3s finished
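The call above discards its return value. As a small sketch of keeping the results, the variant below assigns them to a variable named `results` (a name chosen here for illustration). It creates its own in-process client so that it can run alone, which is an assumption rather than part of the example above.

```python
import time

import joblib
from dask.distributed import Client


def long_running_function(i):
    time.sleep(.1)
    return i


# Assumption: a local, thread-based client, as in a single-machine setup.
client = Client(processes=False)

with joblib.parallel_backend('dask'):
    # Parallel returns one result per task, in the order of the inputs.
    results = joblib.Parallel()(
        joblib.delayed(long_running_function)(i)
        for i in range(10))

client.close()
```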
Progress of the computation can be followed on the distributed web interface; see https://dask.pydata.org/en/latest/diagnostics-distributed.html
Total running time of the script: ( 0 minutes 1.045 seconds)