Databricks Runtime 7.0 ML (Unsupported)

Databricks released this image in June 2020.

Databricks Runtime 7.0 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 7.0 (Unsupported). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. It also supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see Introduction to Databricks Runtime for Machine Learning.

New features and major changes

Databricks Runtime 7.0 ML is built on top of Databricks Runtime 7.0. For information on what’s new in Databricks Runtime 7.0, including Apache Spark MLlib and SparkR, see the Databricks Runtime 7.0 (Unsupported) release notes.

GPU-aware scheduling

Databricks Runtime 7.0 ML supports GPU-aware scheduling from Apache Spark 3.0. Databricks automatically configures it for you. See GPU scheduling.

Major changes to ML Python environment

This section describes the major changes to the pre-installed ML Python environment compared to Databricks Runtime 6.6 ML (Unsupported). You should also review the major changes to the base Python environment in Databricks Runtime 7.0 (Unsupported). For a full list of installed Python packages and their versions, see Python libraries.

Python packages upgraded

  • tensorflow 1.15.0 -> 2.2.0

  • tensorboard 1.15.0 -> 2.2.2

  • pytorch 1.4.0 -> 1.5.0

  • xgboost 0.90 -> 1.1.1

  • sparkdl 1.6.0-db1 -> 2.1.0-db1

  • hyperopt 0.2.2.db1 -> 0.2.4.db1
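
To confirm that a cluster actually carries the upgraded versions listed above, one option is to compare the output of a package listing against them. The following is a minimal sketch, not a Databricks tool: it assumes the listing is in `name==version` form (as produced by `pip freeze`), and the helper names `parse_freeze` and `find_mismatches` are hypothetical. The keys are assumed distribution names (PyTorch is published as `torch`); `sparkdl` and `hyperopt` are left out because their Databricks-specific version strings may be reported differently.

```python
# Versions copied from the "Python packages upgraded" list above.
# Keys are assumed distribution names; they may differ on a given cluster.
EXPECTED_VERSIONS = {
    "tensorflow": "2.2.0",
    "tensorboard": "2.2.2",
    "torch": "1.5.0",
    "xgboost": "1.1.1",
}


def parse_freeze(text):
    """Parse 'name==version' lines into a {name: version} dict."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not pinned with '=='.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        versions[name.strip().lower()] = version.strip()
    return versions


def find_mismatches(installed, expected=EXPECTED_VERSIONS):
    """Return {name: (expected, found)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in expected.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

An empty result from `find_mismatches(parse_freeze(listing))` would indicate that the listed packages match these release notes; a missing package is reported with `None` as the found version.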

Python packages added

  • lightgbm: 2.3.0

  • nltk: 3.4.5

  • petastorm: 0.9.2

  • plotly: 4.5.2

Python packages removed

  • argparse

  • boto (use boto3 instead)

  • colorama

  • deprecated

  • et-xmlfile

  • fusepy

  • html5lib

  • jdcal

  • keras (use tensorflow.keras instead)

  • keras-applications (use tensorflow.keras.applications instead)

  • llvmlite

  • lxml

  • nose

  • nose-exclude

  • numba

  • openpyxl

  • pathlib2

  • ply

  • pymongo

  • singledispatch

  • tensorboardX (use torch.utils.tensorboard instead)

  • virtualenv

  • webencodings
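
When migrating notebooks from Databricks Runtime 6.6 ML, it can help to locate imports of the removed packages that have a named replacement in the list above. The sketch below uses only the Python standard library; the function name `find_removed_imports` is hypothetical, and `keras_applications` is assumed to be the import name of the keras-applications package. It reports the replacement package only and does not rewrite any code.

```python
import ast

# Replacements named in the list above, keyed by top-level import name.
# "keras_applications" is an assumed import name for keras-applications.
REPLACEMENTS = {
    "boto": "boto3",
    "keras": "tensorflow.keras",
    "keras_applications": "tensorflow.keras.applications",
    "tensorboardX": "torch.utils.tensorboard",
}


def find_removed_imports(source):
    """Return sorted (line, imported_module, suggested_replacement) tuples."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x import y" statements only; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in REPLACEMENTS:
                hits.append((node.lineno, name, REPLACEMENTS[top_level]))
    return sorted(hits)
```

Passing the text of a notebook cell or script to `find_removed_imports` yields one entry per affected import statement; an import such as `import tensorflow.keras` is not flagged, because its top-level package is `tensorflow`.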

Major changes to ML R environment

Databricks Runtime 7.0 ML includes an unmodified version of RStudio Server Open Source v1.2.5033, whose source code can be found in GitHub. Read more about RStudio Server on Databricks.

Changes to ML Spark packages, Java and Scala libraries

The following packages are upgraded. Some are upgraded to SNAPSHOT releases that are compatible with Apache Spark 3.0:

  • graphframes: 0.7.0-db1-spark2.4 -> 0.8.0-db2-spark3.0

  • spark-tensorflow-connector: 1.15.0 (Scala 2.11) -> 1.15.0 (Scala 2.12)

  • xgboost4j and xgboost4j-spark: 0.90 -> 1.0.0

  • mleap-databricks-runtime: 0.17.0-4882dc3 (SNAPSHOT)

The following packages are removed:

  • TensorFlow (Java)

  • TensorFrames

  • Deep Learning Pipelines for Apache Spark (HorovodRunner is available in Python)

Added conda and pip commands to support notebook-scoped Python libraries (public preview)

Starting with Databricks Runtime 7.0 ML, you can use %pip and %conda commands to manage Python libraries installed in a notebook session. You can also use these commands to create a custom environment for a notebook and to reproduce this environment between notebooks. To enable this feature, in cluster settings, set the Spark configuration spark.databricks.conda.condaMagic.enabled to true. For more information, see Notebook-scoped Python libraries.
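
As a rough illustration of the two steps described above, the cluster's Spark configuration entry might look like the following. The key and value come from the text; writing them on one line separated by a space is the usual form for the Spark configuration field and is stated here as an assumption.

```text
spark.databricks.conda.condaMagic.enabled true
```

With the feature enabled, a notebook cell could then install a library for that notebook session. The package and version shown are only an example:

```text
%pip install mlflow==1.9.0
```

A %conda command would be used in the same way, as the first line of its own cell.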

Deprecations and unsupported features

Databricks Runtime 7.0 ML does not support table access control. If you need table access control, we recommend that you use Databricks Runtime 7.0.

Known issues

  • Passing the sample_input argument to mlflow.spark.log_model to log an MLlib model in mleap format fails with an AttributeError due to an mleap API change. Upgrade to MLflow 1.9.0 as a workaround. You can install MLflow 1.9.0 using Notebook-scoped Python libraries or Workspace Libraries.
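
Before relying on the workaround above, a notebook can check whether the MLflow version in use is older than 1.9.0. The following is a minimal sketch with hypothetical helper names; it assumes that every version below 1.9.0 is affected, which the note above implies but does not state, and it compares only the leading numeric components of the version string.

```python
def parse_version(version):
    """Turn a string such as '1.8.0' into a tuple of its leading integer parts."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component (for example 'dev0').
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_mlflow_upgrade(installed, minimum="1.9.0"):
    """Return True if the installed version is older than the minimum."""
    have, want = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '1.9' and '1.9.0' compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have < want
```

In a notebook, the installed version is available as `mlflow.__version__`, so `needs_mlflow_upgrade(mlflow.__version__)` indicates whether to install MLflow 1.9.0 before logging a model with `sample_input`.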

System environment

The system environment in Databricks Runtime 7.0 ML differs from Databricks Runtime 7.0 as follows:


The following sections list the libraries included in Databricks Runtime 7.0 ML that differ from those included in Databricks Runtime 7.0.

Python libraries

Databricks Runtime 7.0 ML uses Conda for Python package management and includes many popular ML packages. The following section describes the Conda environment for Databricks Runtime 7.0 ML.

Python on CPU clusters

Python on GPU clusters

Spark packages containing Python modules

R libraries

The R libraries are identical to the R Libraries in Databricks Runtime 7.0 Beta.

Java and Scala libraries (Scala 2.12 cluster)

In addition to Java and Scala libraries in Databricks Runtime 7.0, Databricks Runtime 7.0 ML contains the following JARs:

