Databricks Runtime 7.0 ML (Unsupported)
Databricks released this image in June 2020.
Databricks Runtime 7.0 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 7.0 (Unsupported). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. It also supports distributed deep learning training using Horovod.
For more information, including instructions for creating a Databricks Runtime ML cluster, see Introduction to Databricks Runtime for Machine Learning.
New features and major changes
Databricks Runtime 7.0 ML is built on top of Databricks Runtime 7.0. For information on what's new in Databricks Runtime 7.0, including Apache Spark MLlib and SparkR, see the Databricks Runtime 7.0 (Unsupported) release notes.
GPU-aware scheduling
Databricks Runtime 7.0 ML supports GPU-aware scheduling from Apache Spark 3.0. Databricks automatically configures it for you. See GPU scheduling.
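As a rough illustration of what GPU-aware scheduling makes visible to user code, the sketch below reads the GPU addresses that Spark assigned to a task. It relies on the Apache Spark 3.0 PySpark call TaskContext.get().resources(). The helper names assigned_gpus and gpus_for_partition are hypothetical, and the resource name "gpu" is assumed to match the cluster's configuration. The stand-in mapping at the end exists only so the helper can be tried without a cluster.

```python
from types import SimpleNamespace


def assigned_gpus(resources):
    """Return the GPU addresses in a task's resource mapping, or [] if none."""
    # "gpu" is assumed to be the resource name configured on the cluster.
    info = resources.get("gpu")
    return list(info.addresses) if info is not None else []


def gpus_for_partition(_rows):
    """mapPartitions function: yields the GPU addresses assigned to this task."""
    # Imported lazily so assigned_gpus can be tried without a Spark installation.
    from pyspark import TaskContext

    yield assigned_gpus(TaskContext.get().resources())


# Stand-in for the mapping Spark hands to a task; used only to exercise the helper.
fake_resources = {"gpu": SimpleNamespace(name="gpu", addresses=["0", "1"])}
print(assigned_gpus(fake_resources))
```

On a GPU cluster, a call such as sc.parallelize(range(2), 2).mapPartitions(gpus_for_partition).collect() would be expected to return one list of addresses per task, though the exact values depend on the cluster.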
Major changes to ML Python environment
This section describes the major changes to the pre-installed ML Python environment compared to Databricks Runtime 6.6 ML (Unsupported). You should also review the major changes to the base Python environment in Databricks Runtime 7.0 (Unsupported). For a full list of installed Python packages and their versions, see Python libraries.
Python packages upgraded
tensorflow 1.15.0 -> 2.2.0
tensorboard 1.15.0 -> 2.2.2
pytorch 1.4.0 -> 1.5.0
xgboost 0.90 -> 1.1.1
sparkdl 1.6.0-db1 -> 2.1.0-db1
hyperopt 0.2.2.db1 -> 0.2.4.db1
Python packages removed
argparse
boto (use boto3 instead)
colorama
deprecated
et-xmlfile
fusepy
html5lib
jdcal
keras (use tensorflow.keras instead)
keras-applications (use tensorflow.keras.applications instead)
llvmlite
lxml
nose
nose-exclude
numba
openpyxl
pathlib2
ply
pymongo
singledispatch
tensorboardX (use torch.utils.tensorboard instead)
virtualenv
webencodings
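When migrating notebooks from Databricks Runtime 6.6 ML, it can help to scan code for imports of the removed packages that have a named replacement. The following is a minimal sketch using Python's standard ast module. The function name find_removed_imports and the REPLACEMENTS mapping are hypothetical helpers built from the list above, and the import name keras_applications for the keras-applications package is an assumption.

```python
import ast

# Replacements named in the list above, keyed by the module name used in imports.
# "keras_applications" is assumed to be the import name of keras-applications.
REPLACEMENTS = {
    "boto": "boto3",
    "keras": "tensorflow.keras",
    "keras_applications": "tensorflow.keras.applications",
    "tensorboardX": "torch.utils.tensorboard",
}


def find_removed_imports(source):
    """Return sorted (line, module, suggestion) tuples for removed modules."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Match on the top-level package, so "keras.layers" is also reported.
            root = name.split(".")[0]
            if root in REPLACEMENTS:
                hits.append((node.lineno, name, REPLACEMENTS[root]))
    return sorted(hits)


example = "import keras\nfrom tensorboardX import SummaryWriter\nimport boto3\n"
for line, module, replacement in find_removed_imports(example):
    print(f"line {line}: {module} -> consider {replacement}")
```

The suggestions are starting points only. A replacement is not necessarily a drop-in substitute; boto3, for example, has a different API from boto.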
Major changes to ML R environment
Databricks Runtime 7.0 ML includes an unmodified version of RStudio Server Open Source v1.2.5033, for which the source code can be found in GitHub. Read more about RStudio Server on Databricks.
Changes to ML Spark packages, Java and Scala libraries
The following packages are upgraded. Some are upgraded to SNAPSHOT releases that are compatible with Apache Spark 3.0:
graphframes: 0.7.0-db1-spark2.4 -> 0.8.0-db2-spark3.0
spark-tensorflow-connector: 1.15.0 (Scala 2.11) -> 1.15.0 (Scala 2.12)
xgboost4j and xgboost4j-spark: 0.90 -> 1.0.0
mleap-databricks-runtime: 0.17.0-4882dc3 (SNAPSHOT)
The following packages are removed:
TensorFlow (Java)
TensorFrames
Deep Learning Pipelines for Apache Spark (HorovodRunner is available in Python)
Added conda and pip commands to support notebook-scoped Python libraries (public preview)
Starting with Databricks Runtime 7.0 ML, you can use %pip and %conda commands to manage Python libraries installed in a notebook session. You can also use these commands to create a custom environment for a notebook and to reproduce this environment between notebooks. To enable this feature, in cluster settings, set the Spark configuration spark.databricks.conda.condaMagic.enabled true. For more information, see Notebook-scoped Python libraries.
Deprecations and unsupported features
Databricks Runtime 7.0 ML does not support table access control. If you need table access control, we recommend that you use Databricks Runtime 7.0.
Known issues
Passing the sample_input argument to mlflow.spark.log_model in order to log an MLlib model in mleap format fails with an AttributeError due to an mleap API change. Upgrade to MLflow 1.9.0 as a workaround. You can install MLflow 1.9.0 using Notebook-scoped Python libraries or Workspace Libraries.
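One way to tell whether a session is affected is to compare the installed MLflow version against 1.9.0 before calling mlflow.spark.log_model with sample_input. The sketch below is illustrative only: needs_mlflow_upgrade is a hypothetical helper, and it assumes a plain numeric version string such as "1.8.0" (pre-release suffixes are not handled).

```python
def needs_mlflow_upgrade(installed_version, minimum=(1, 9, 0)):
    """Return True if a plain 'X.Y.Z' version string is lower than the minimum."""
    parts = tuple(int(p) for p in installed_version.split(".")[:3])
    # Pad short versions such as "1.9" so they compare as (1, 9, 0).
    parts = parts + (0,) * (3 - len(parts))
    return parts < minimum


# Databricks Runtime 7.0 ML ships MLflow 1.8.0, so the workaround applies.
print(needs_mlflow_upgrade("1.8.0"))
```

In a notebook, the string would typically come from mlflow.__version__. If the check returns True, running %pip install mlflow==1.9.0 in a notebook cell is one way to apply the workaround with notebook-scoped libraries.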
System environment
The system environment in Databricks Runtime 7.0 ML differs from Databricks Runtime 7.0 as follows:
DBUtils: Databricks Runtime ML does not contain Library utility (dbutils.library). You can use %pip and %conda commands instead. See Notebook-scoped Python libraries.
For GPU clusters, the following NVIDIA GPU libraries:
CUDA 10.1 Update 2
cuDNN 7.6.5
NCCL 2.7.3
TensorRT 6.0.1
Libraries
The following sections list the libraries included in Databricks Runtime 7.0 ML that differ from those included in Databricks Runtime 7.0.
Top-tier libraries
Databricks Runtime 7.0 ML includes the following top-tier libraries:
Python libraries
Databricks Runtime 7.0 ML uses Conda for Python package management and includes many popular ML packages. The following section describes the Conda environment for Databricks Runtime 7.0 ML.
Python on CPU clusters
name: databricks-ml
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - absl-py=0.9.0=py37_0
  - asn1crypto=1.3.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.7=py37h7b6447c_1
  - blas=1.0=mkl
  - blinker=1.4=py37_0
  - boto3=1.12.0=py_0
  - botocore=1.15.0=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2020.1.1=0
  - cachetools=4.1.0=py_1
  - certifi=2020.4.5.1=py37_0
  - cffi=1.14.0=py37h2e261b9_0
  - chardet=3.0.4=py37_1003
  - click=7.0=py37_0
  - cloudpickle=1.3.0=py_0
  - configparser=3.7.4=py37_0
  - cpuonly=1.0=0
  - cryptography=2.8=py37h1ba5d50_0
  - cycler=0.10.0=py37_0
  - cython=0.29.15=py37he6710b0_0
  - decorator=4.4.1=py_0
  - dill=0.3.1.1=py37_1
  - docutils=0.15.2=py37_0
  - entrypoints=0.3=py37_0
  - flask=1.1.1=py_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.18.2=py37_1
  - gast=0.3.3=py_0
  - gitdb2=2.0.6=py_0
  - gitpython=3.0.5=py_0
  - google-auth=1.11.2=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.2.0=py_0
  - grpcio=1.27.2=py37hf8bcb03_0
  - gunicorn=20.0.4=py37_0
  - h5py=2.10.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - icu=58.2=he6710b0_3
  - idna=2.8=py37_0
  - intel-openmp=2020.0=166
  - ipykernel=5.1.4=py37h39e3cac_0
  - ipython=7.12.0=py37h5ca1d4c_0
  - ipython_genutils=0.2.0=py37_0
  - itsdangerous=1.1.0=py37_0
  - jedi=0.14.1=py37_0
  - jinja2=2.11.1=py_0
  - jmespath=0.9.4=py_0
  - joblib=0.14.1=py_0
  - jpeg=9b=h024ee3a_2
  - jupyter_client=5.3.4=py37_0
  - jupyter_core=4.6.1=py37_0
  - kiwisolver=1.1.0=py37he6710b0_0
  - krb5=1.16.4=h173b8e3_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.11.4=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.1.0=h2733197_0
  - lightgbm=2.3.0=py37he6710b0_0
  - lz4-c=1.8.1.2=h14c3975_0
  - mako=1.1.2=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - matplotlib-base=3.1.3=py37hef1b27d_0
  - mkl=2020.0=166
  - mkl-service=2.3.0=py37he904b0f_0
  - mkl_fft=1.0.15=py37ha843d7b_0
  - mkl_random=1.1.0=py37hd6b4f25_0
  - ncurses=6.2=he6710b0_1
  - networkx=2.4=py_0
  - ninja=1.9.0=py37hfd86e86_0
  - nltk=3.4.5=py37_0
  - numpy=1.18.1=py37h4f9e942_0
  - numpy-base=1.18.1=py37hde5b4d6_1
  - oauthlib=3.1.0=py_0
  - olefile=0.46=py37_0
  - openssl=1.1.1g=h7b6447c_0
  - packaging=20.1=py_0
  - pandas=1.0.1=py37h0573a6f_0
  - paramiko=2.7.1=py_0
  - parso=0.5.2=py_0
  - patsy=0.5.1=py37_0
  - pexpect=4.8.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=7.0.0=py37hb39fc2d_0
  - pip=20.0.2=py37_3
  - plotly=4.5.2=py_0
  - prompt_toolkit=3.0.3=py_0
  - protobuf=3.11.4=py37he6710b0_0
  - psutil=5.6.7=py37h7b6447c_0
  - psycopg2=2.8.4=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.19=py37_0
  - pygments=2.5.2=py_0
  - pyjwt=1.7.1=py37_0
  - pynacl=1.3.0=py37h7b6447c_0
  - pyodbc=4.0.30=py37he6710b0_0
  - pyopenssl=19.1.0=py37_0
  - pyparsing=2.4.6=py_0
  - pysocks=1.7.1=py37_0
  - python=3.7.6=h0371630_2
  - python-dateutil=2.8.1=py_0
  - python-editor=1.0.4=py_0
  - pytorch=1.5.0=py3.7_cpu_0
  - pytz=2019.3=py_0
  - pyzmq=18.1.1=py37he6710b0_0
  - readline=7.0=h7b6447c_5
  - requests=2.22.0=py37_1
  - requests-oauthlib=1.3.0=py_0
  - retrying=1.3.3=py37_2
  - rsa=4.0=py_0
  - s3transfer=0.3.3=py37_0
  - scikit-learn=0.22.1=py37hd81dba3_0
  - scipy=1.4.1=py37h0b6359f_0
  - setuptools=45.2.0=py37_0
  - simplejson=3.17.0=py37h7b6447c_0
  - six=1.14.0=py37_0
  - smmap2=2.0.5=py37_0
  - sqlite=3.31.1=h62c20be_1
  - sqlparse=0.3.0=py_0
  - statsmodels=0.11.0=py37h7b6447c_0
  - tabulate=0.8.3=py37_0
  - tk=8.6.8=hbc83047_0
  - torchvision=0.6.0=py37_cpu
  - tornado=6.0.3=py37h7b6447c_3
  - tqdm=4.42.1=py_0
  - traitlets=4.3.3=py37_0
  - unixodbc=2.3.7=h14c3975_0
  - urllib3=1.25.8=py37_0
  - wcwidth=0.1.8=py_0
  - websocket-client=0.56.0=py37_0
  - werkzeug=1.0.0=py_0
  - wheel=0.34.2=py37_0
  - wrapt=1.11.2=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - zeromq=4.3.1=he6710b0_3
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - astunparse==1.6.3
    - databricks-cli==0.11.0
    - diskcache==4.1.0
    - docker==4.2.1
    - gorilla==0.3.0
    - horovod==0.19.1
    - hyperopt==0.2.4.db1
    - keras-preprocessing==1.1.2
    - mleap==0.16.0
    - mlflow==1.8.0
    - opt-einsum==3.2.1
    - petastorm==0.9.2
    - pyarrow==0.15.1
    - pyyaml==5.3.1
    - querystring-parser==1.2.4
    - seaborn==0.10.0
    - sparkdl==2.1.0-db1
    - tensorboard==2.2.2
    - tensorboard-plugin-wit==1.6.0.post3
    - tensorflow-cpu==2.2.0
    - tensorflow-estimator==2.2.0
    - termcolor==1.1.0
    - xgboost==1.1.1
prefix: /databricks/conda/envs/databricks-ml
Python on GPU clusters
name: databricks-ml-gpu
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - absl-py=0.9.0=py37_0
  - asn1crypto=1.3.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.7=py37h7b6447c_1
  - blas=1.0=mkl
  - blinker=1.4=py37_0
  - boto3=1.12.0=py_0
  - botocore=1.15.0=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2020.1.1=0
  - cachetools=4.1.0=py_1
  - certifi=2020.4.5.2=py37_0
  - cffi=1.14.0=py37h2e261b9_0
  - chardet=3.0.4=py37_1003
  - click=7.0=py37_0
  - cloudpickle=1.3.0=py_0
  - configparser=3.7.4=py37_0
  - cryptography=2.8=py37h1ba5d50_0
  - cudatoolkit=10.1.243=h6bb024c_0
  - cycler=0.10.0=py37_0
  - cython=0.29.15=py37he6710b0_0
  - decorator=4.4.1=py_0
  - dill=0.3.1.1=py37_1
  - docutils=0.15.2=py37_0
  - entrypoints=0.3=py37_0
  - flask=1.1.1=py_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.18.2=py37_1
  - gast=0.3.3=py_0
  - gitdb2=2.0.6=py_0
  - gitpython=3.0.5=py_0
  - google-auth=1.11.2=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.2.0=py_0
  - grpcio=1.27.2=py37hf8bcb03_0
  - gunicorn=20.0.4=py37_0
  - h5py=2.10.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - icu=58.2=he6710b0_3
  - idna=2.8=py37_0
  - intel-openmp=2020.0=166
  - ipykernel=5.1.4=py37h39e3cac_0
  - ipython=7.12.0=py37h5ca1d4c_0
  - ipython_genutils=0.2.0=py37_0
  - itsdangerous=1.1.0=py37_0
  - jedi=0.14.1=py37_0
  - jinja2=2.11.1=py_0
  - jmespath=0.9.4=py_0
  - joblib=0.14.1=py_0
  - jpeg=9b=h024ee3a_2
  - jupyter_client=5.3.4=py37_0
  - jupyter_core=4.6.1=py37_0
  - kiwisolver=1.1.0=py37he6710b0_0
  - krb5=1.16.4=h173b8e3_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.11.4=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.1.0=h2733197_0
  - lightgbm=2.3.0=py37he6710b0_0
  - lz4-c=1.8.1.2=h14c3975_0
  - mako=1.1.2=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - matplotlib-base=3.1.3=py37hef1b27d_0
  - mkl=2020.0=166
  - mkl-service=2.3.0=py37he904b0f_0
  - mkl_fft=1.0.15=py37ha843d7b_0
  - mkl_random=1.1.0=py37hd6b4f25_0
  - ncurses=6.2=he6710b0_1
  - networkx=2.4=py_0
  - ninja=1.9.0=py37hfd86e86_0
  - nltk=3.4.5=py37_0
  - numpy=1.18.1=py37h4f9e942_0
  - numpy-base=1.18.1=py37hde5b4d6_1
  - oauthlib=3.1.0=py_0
  - olefile=0.46=py37_0
  - openssl=1.1.1g=h7b6447c_0
  - packaging=20.1=py_0
  - pandas=1.0.1=py37h0573a6f_0
  - paramiko=2.7.1=py_0
  - parso=0.5.2=py_0
  - patsy=0.5.1=py37_0
  - pexpect=4.8.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=7.0.0=py37hb39fc2d_0
  - pip=20.0.2=py37_3
  - plotly=4.5.2=py_0
  - prompt_toolkit=3.0.3=py_0
  - protobuf=3.11.4=py37he6710b0_0
  - psutil=5.6.7=py37h7b6447c_0
  - psycopg2=2.8.4=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.19=py37_0
  - pygments=2.5.2=py_0
  - pyjwt=1.7.1=py37_0
  - pynacl=1.3.0=py37h7b6447c_0
  - pyodbc=4.0.30=py37he6710b0_0
  - pyopenssl=19.1.0=py37_0
  - pyparsing=2.4.6=py_0
  - pysocks=1.7.1=py37_0
  - python=3.7.6=h0371630_2
  - python-dateutil=2.8.1=py_0
  - python-editor=1.0.4=py_0
  - pytorch=1.5.0=py3.7_cuda10.1.243_cudnn7.6.3_0
  - pytz=2019.3=py_0
  - pyzmq=18.1.1=py37he6710b0_0
  - readline=7.0=h7b6447c_5
  - requests=2.22.0=py37_1
  - requests-oauthlib=1.3.0=py_0
  - retrying=1.3.3=py37_2
  - rsa=4.0=py_0
  - s3transfer=0.3.3=py37_0
  - scikit-learn=0.22.1=py37hd81dba3_0
  - scipy=1.4.1=py37h0b6359f_0
  - setuptools=45.2.0=py37_0
  - simplejson=3.17.0=py37h7b6447c_0
  - six=1.14.0=py37_0
  - smmap2=2.0.5=py37_0
  - sqlite=3.31.1=h62c20be_1
  - sqlparse=0.3.0=py_0
  - statsmodels=0.11.0=py37h7b6447c_0
  - tabulate=0.8.3=py37_0
  - tk=8.6.8=hbc83047_0
  - torchvision=0.6.0=py37_cu101
  - tornado=6.0.3=py37h7b6447c_3
  - tqdm=4.42.1=py_0
  - traitlets=4.3.3=py37_0
  - unixodbc=2.3.7=h14c3975_0
  - urllib3=1.25.8=py37_0
  - wcwidth=0.1.8=py_0
  - websocket-client=0.56.0=py37_0
  - werkzeug=1.0.0=py_0
  - wheel=0.34.2=py37_0
  - wrapt=1.11.2=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - zeromq=4.3.1=he6710b0_3
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - astunparse==1.6.3
    - databricks-cli==0.11.0
    - diskcache==4.1.0
    - docker==4.2.1
    - gorilla==0.3.0
    - horovod==0.19.1
    - hyperopt==0.2.4.db1
    - keras-preprocessing==1.1.2
    - mleap==0.16.0
    - mlflow==1.8.0
    - opt-einsum==3.2.1
    - petastorm==0.9.2
    - pyarrow==0.15.1
    - pyyaml==5.3.1
    - querystring-parser==1.2.4
    - seaborn==0.10.0
    - sparkdl==2.1.0-db1
    - tensorboard==2.2.2
    - tensorboard-plugin-wit==1.6.0.post3
    - tensorflow-estimator==2.2.0
    - tensorflow-gpu==2.2.0
    - termcolor==1.1.0
    - xgboost==1.1.1
prefix: /databricks/conda/envs/databricks-ml-gpu
R libraries
The R libraries are identical to the R Libraries in Databricks Runtime 7.0 Beta.
Java and Scala libraries (Scala 2.12 cluster)
In addition to Java and Scala libraries in Databricks Runtime 7.0, Databricks Runtime 7.0 ML contains the following JARs:
Group ID | Artifact ID | Version |
---|---|---|
com.typesafe.akka | akka-actor_2.12 | 2.5.23 |
ml.combust.mleap | mleap-databricks-runtime_2.12 | 0.17.0-4882dc3 |
ml.dmlc | xgboost4j-spark_2.12 | 1.0.0 |
ml.dmlc | xgboost4j_2.12 | 1.0.0 |
org.mlflow | mlflow-client | 1.8.0 |
org.scala-lang.modules | scala-java8-compat_2.12 | 0.8.0 |
org.tensorflow | spark-tensorflow-connector_2.12 | 1.15.0 |