Managing Python dependencies for Spark workloads in Cloudera Data Engineering
Cloudera
APRIL 30, 2021
Apache Spark is now widely used in many enterprises for building high-performance ETL and Machine Learning pipelines. For users who are already familiar with Python, PySpark provides a Python API for Apache Spark. PySpark jobs typically depend on additional Python modules and packages that must be available wherever the job runs, and Apache Spark provides several options to manage these dependencies.
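As an illustration of one such option, `spark-submit` accepts a `--py-files` flag that ships `.py`, `.zip`, or `.egg` files alongside a job. The sketch below is a minimal example, not the method prescribed by this article: it assumes a hypothetical local package directory (for example `mylib`) and a hypothetical job script `job.py`, and the helper names `build_deps_zip` and `spark_submit_command` are invented for illustration. It only packages the code and assembles the command; it does not launch Spark.

```python
import zipfile
from pathlib import Path
from typing import List


def build_deps_zip(package_dir: str, zip_path: str) -> List[str]:
    """Zip every .py file under package_dir.

    The package directory name is kept as the top-level folder inside the
    archive so that `import <package>` can resolve once the zip is on the
    Python path. Returns the archive member names that were written.
    """
    root = Path(package_dir).resolve()
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*.py")):
            # Store paths relative to the parent, e.g. "mylib/helpers.py".
            arcname = path.relative_to(root.parent).as_posix()
            zf.write(path, arcname)
            added.append(arcname)
    return added


def spark_submit_command(job_file: str, deps_zip: str) -> List[str]:
    """Assemble a spark-submit invocation that ships deps_zip with the job."""
    # --py-files is a standard spark-submit option; the file names are
    # whatever the caller passes in (hypothetical in this sketch).
    return ["spark-submit", "--py-files", deps_zip, job_file]
```

The resulting list could be passed to `subprocess.run` on a machine where `spark-submit` is installed. This approach suits pure-Python code; packages with native extensions generally call for one of the other options, such as a packaged virtual environment.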