
Optimizing Cloudera Data Engineering Autoscaling Performance

Cloudera

Normally, on-premises, one of the key challenges was how to allocate resources within a finite set of resources. When building CDE, we integrated with Apache YuniKorn, which offers rich scheduling capabilities on Kubernetes. We tested the scaling capabilities of CDE with the following job runs to mimic a real-world scenario:


Migrate Hive data from CDH to CDP public cloud

Cloudera

Many Cloudera customers are making the transition from being completely on-prem to the cloud, either by backing up their data in the cloud or by running multi-functional analytics on CDP Public Cloud in AWS or Azure. The Replication Manager service facilitates both disaster recovery and data migration across different environments.



Fraud Detection With Cloudera Stream Processing Part 2: Real-Time Streaming Analytics

Cloudera

In part 1 of this blog we discussed how Cloudera DataFlow for the Public Cloud (CDF-PC), the universal data distribution service powered by Apache NiFi, can make it easy to acquire data from wherever it originates and move it efficiently to make it available to other applications in a streaming fashion. Data decays! Use case recap.


Admission Control Architecture for Cloudera Data Platform

Cloudera

Apache Impala is a massively parallel, in-memory SQL engine supported by Cloudera and designed for analytics and ad hoc queries against data stored in Apache Hive, Apache HBase, and Apache Kudu tables. Anatomy of Impala Query Execution. Introduction.


Fine-Grained Authorization with Apache Kudu and Apache Ranger

Cloudera

which made it possible to restrict access only to Apache Impala, where Apache Sentry policies could be applied, enabling a lot more use cases. How it works.


Cost Conscious Data Warehousing with Cloudera Data Platform

Cloudera

Continuous resource consumption in the cloud (billable on-demand by a running clock) makes no sense today because a better option is available: resource consumption that starts when you need it and stops when you don’t. If not, before adopting a cloud data warehouse, consider the true costs of a cloud-native data warehouse.


Materialized Views in Hive for Iceberg Table Format

Cloudera

Apache Iceberg is a high-performance open table format for petabyte-scale analytic datasets. It brings the reliability and simplicity of SQL tables to big data while enabling engines like Hive, Impala, Spark, Trino, Flink, and Presto to work with the same tables at the same time. Such a query pattern is quite common in BI queries.
