
Delivering Modern Enterprise Data Engineering with Cloudera Data Engineering on Azure

Cloudera

After the launch of CDP Data Engineering (CDE) on AWS a few months ago, we are thrilled to announce that CDE, the only cloud-native service purpose-built for enterprise data engineers, is now available on Microsoft Azure. Prerequisites for deploying CDP Data Engineering on Azure can be found here.


Running unsupported Azure Python SDK on my brand new M2 Mac

Xebia

This worked out great until I tried to follow a tutorial written by a colleague that used the Azure Python SDK to create a dataset and upload it to an Azure storage account: brew install azure-cli, brew install poetry, etc. For example, docker commands stopped working. pip install azureml-dataset-runtime==1.40.0



DNS Zone Setup Best Practices on Azure

Cloudera

In this blog, we’ll take you through our tried and tested best practices for setting up your DNS for use with Cloudera on Azure. Most Azure users use a hub-spoke network topology. DNS servers are usually deployed in the hub virtual network or an on-prem data center instead of in the Cloudera VNET.


Make the leap to Hybrid with Cloudera Data Engineering

Cloudera

When we introduced Cloudera Data Engineering (CDE) in the Public Cloud in 2020, it was the culmination of many years of working alongside companies as they deployed Apache Spark based ETL workloads at scale. Each unlocks value in the data engineering workflows that enterprises can start taking advantage of. Usage Patterns.


Managing Python dependencies for Spark workloads in Cloudera Data Engineering

Cloudera

Cloudera Data Engineering (CDE) is a cloud-native service purpose-built for enterprise data engineering teams. CDE is already available in CDP Public Cloud (AWS & Azure) and will soon be available in CDP Private Cloud Experiences. Here is an example showing a simple PySpark program querying an ACID table.


How to use Multiple Databricks Workspaces with one dbt Cloud Project

Xebia

Set up the Azure Service Principal: we want to avoid personal access tokens associated with a specific user as much as possible, so we will use a service principal to authenticate dbt with Databricks. For this project, we will use Azure as our cloud provider. We will call them data-platform-udev and data-platform-uprod.
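A minimal sketch of how one dbt profile can point at two Databricks workspaces, one target per workspace, rendered here as a Python dict mirroring profiles.yml. The workspace names data-platform-udev and data-platform-uprod come from the article; using them as target names, and all hostnames, http_path values, and the token environment variable, are my assumptions:

```python
# Sketch of a profiles.yml with one output per Databricks workspace.
# Hostnames, http_path values, and the env var name are hypothetical;
# the service-principal token is read via dbt's env_var() at runtime.
profiles = {
    "data_platform": {
        "target": "data-platform-udev",  # default target
        "outputs": {
            "data-platform-udev": {
                "type": "databricks",
                "host": "adb-1111111111111111.11.azuredatabricks.net",  # hypothetical
                "http_path": "/sql/1.0/warehouses/dev",                 # hypothetical
                "token": "{{ env_var('DBT_DATABRICKS_TOKEN') }}",
                "schema": "dev",
            },
            "data-platform-uprod": {
                "type": "databricks",
                "host": "adb-2222222222222222.22.azuredatabricks.net",  # hypothetical
                "http_path": "/sql/1.0/warehouses/prod",                # hypothetical
                "token": "{{ env_var('DBT_DATABRICKS_TOKEN') }}",
                "schema": "prod",
            },
        },
    }
}

# Switching workspaces is then a matter of picking a target at run time,
# e.g. dbt run --target data-platform-uprod
print(sorted(profiles["data_platform"]["outputs"]))
```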


Use Terraform to create ADF pipelines

Xebia

Most online resources suggest using Azure Data Factory (ADF) in Git mode instead of Live mode because of its advantages: for example, the ability to work on resources collaboratively as a team, or the ability to revert changes that introduced bugs. When they do, the null_resource part should no longer be necessary.