Data Forest overview


Available in VPC

Data Forest is a large-scale, multi-tenant big data processing cluster based on Apache Hadoop. It supports a variety of big data frameworks to simplify data storage, processing, and serving. Security features are built in, and large-scale data is kept in distributed storage, so you can use the service safely.

[Figure: Data Forest storage overview (VPC)]

Data Forest features

  • Quick and easy analysis environment setup
In a container-based serverless environment, you can launch apps quickly and create the Hadoop ecosystem components you need as Apps to build your analysis environment. Data Forest provides an integrated, multi-tenant platform designed to handle large data volumes and large numbers of users. Depending on your analysis purpose, you can run both batch analysis and long-lived analysis in the multi-tenant environment.

  • Flexible scalability
Even after creating an app, you can scale the number of containers up or down to match your needs and respond flexibly to traffic. Because it is container-based, Data Forest can scale dynamically online and adapt quickly when needed.

  • Enhanced security
Data Forest is a secure Hadoop cluster that supports Kerberos/LDAP authentication. Because Kerberos relies on secret-key cryptography, credentials are not transmitted over the network, providing a strong security environment.

  • Guarantee of high network and disk performance
    Data Forest uses app‑based computing nodes and Hadoop Distributed File System (HDFS) storage on the local disks of physical servers, and it guarantees smooth network and disk performance.

  • Diverse components
Data Forest consists of components for storing, analyzing, and visualizing data, and you can create and use the components suited to each purpose. HDFS, HBase, Kafka, and OpenTSDB are provided for data storage; Spark, Hive, Hive LLAP, Elasticsearch, Grafana, Hue, Trino, and Phoenix for data analysis and processing; and Kibana and Zeppelin for data visualization.
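Since HDFS is among the storage components above, one common way to reach it programmatically is the standard WebHDFS REST API that ships with Hadoop. The sketch below only builds request URLs; the gateway host, port, and user are hypothetical placeholders, and the actual Data Forest endpoints may differ.

```python
# Minimal sketch: building WebHDFS v1 request URLs (a standard Hadoop
# REST API). The host, default port (9870, the usual Hadoop 3 NameNode
# HTTP port), and user below are hypothetical placeholders.

def webhdfs_url(host, path, op, port=9870, user=None):
    """Return a WebHDFS v1 URL for the given HDFS path and operation."""
    url = f"http://{host}:{port}/webhdfs/v1{path}?op={op}"
    if user:
        url += f"&user.name={user}"  # pseudo-authentication query parameter
    return url

# List a user's home directory, then read a file (hypothetical host).
print(webhdfs_url("df-gateway.example.com", "/user/example", "LISTSTATUS"))
print(webhdfs_url("df-gateway.example.com", "/user/example/data.csv", "OPEN", user="example"))
```

On a Kerberos-secured cluster such as Data Forest, real requests would authenticate via SPNEGO rather than the `user.name` parameter, but the URL structure itself is unchanged.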

Data Forest user guide

Data Forest is available in Korea Region. Use this guide to get the most out of Data Forest.

Data Forest related resources

Beyond the user guide, these resources provide additional context and support for Data Forest. Whether you're considering Data Forest or need in-depth information for development, marketing, and other purposes, these resources can help:

FAQs

You can quickly resolve your questions by checking the FAQs. If you don't find the answer you need in the FAQs, check the user guide for more details.

Q. Cloud Hadoop and Data Forest appear to be similar services, what are the differences?
A. The key difference is that Cloud Hadoop is server-based while Data Forest is serverless.

  • Cloud Hadoop uses dedicated customer resources to build and provide a Hadoop cluster.
    • It is a self‑managed offering in which you manage Hadoop yourself.
    • It provides an open‑source web management tool (Apache Ambari) that you can manage directly.
  • Data Forest is a serverless offering: you use it by submitting analysis Jobs (DL Jobs), and for Hadoop ecosystem components that must run long-lived, you create Apps so you can analyze data easily.
    • It is a managed offering in which high availability is ensured at the product level rather than you managing Hadoop directly.
    • It provides more Apps than Cloud Hadoop, and you can submit GPU‑based deep learning Jobs.

Comparison

| Feature | Cloud Hadoop | Data Forest |
| --- | --- | --- |
| Scalability | You decide the Hadoop cluster size yourself. | Managed by the service. |
| Cost | Cluster maintenance fees apply. | Charges apply for running jobs and storage. |
| Maintenance | Managed directly by the user; a management tool (Apache Ambari) is provided. | Managed by the service. |
| Characteristics | You can configure the environment freely. | Provides diverse Apps; GPU-based deep learning jobs can be submitted. |

Q. What features are provided to collect and process real-time data or configure ETL environments?
A. Data Forest does not directly provide real-time data collection and processing, but you can build such an environment by combining the various services of NAVER Cloud Platform with the Hadoop ecosystem Apps that Data Forest provides. A dedicated ETL product is scheduled to be released separately in the future.