Available in VPC
Data Forest is a large-scale, multi-tenant big data processing cluster based on Apache Hadoop. It supports various big data frameworks to simplify data storage, processing, and serving. Security technology is applied throughout, and large-scale data is kept in distributed storage, so you can use the service safely.

Data Forest features
- Quick and easy analysis environment setup
  In a container-based serverless environment, you can launch apps quickly and create the required Hadoop ecosystem components as Apps to build your analysis environment. Data Forest provides an integrated, multi-tenant platform designed to handle large data volumes and large numbers of users. Depending on your analysis purpose, you can run both batch analysis and long-lived analysis in a multi-tenant environment.
- Flexible scalability
  Even after creating an app, you can scale the number of containers up or down as needed to respond flexibly to traffic. Because the service is container-based, it can scale dynamically online and change quickly when needed.
- Enhanced security
  Data Forest is a secure Hadoop cluster with enhanced security that supports Kerberos/LDAP authentication. Secret-key encryption is used so that credentials are not transferred over the network in plain form, providing a strong security environment.
- Guarantee of high network and disk performance
  Data Forest runs app-based computing nodes and Hadoop Distributed File System (HDFS) storage on the local disks of physical servers, guaranteeing smooth network and disk performance.
- Diverse components
  Data Forest consists of components for storing, analyzing, and visualizing data, and you can create and use the components suitable for each purpose. HDFS, HBase, Kafka, and OpenTSDB are provided for data storage; Spark, Hive, Hive LLAP, Elasticsearch, Grafana, Hue, Trino, and Phoenix for data analysis and processing; and Kibana and Zeppelin for data visualization.
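As a hedged sketch of the Kerberos-based workflow mentioned above, a typical client session might look like the following. The principal, realm, and HDFS path are placeholders; use the values issued for your Data Forest account.

```shell
# Illustrative only: authenticate with Kerberos before using Hadoop components.
# "df-user" and "EXAMPLE.REALM" are placeholders for your account's principal.
kinit df-user@EXAMPLE.REALM     # obtain a ticket-granting ticket (prompts for password)
klist                           # verify the ticket cache

# Once authenticated, standard Hadoop clients work against the secure cluster:
hdfs dfs -ls /user/df-user      # list your HDFS home directory (placeholder path)

kdestroy                        # discard the ticket when finished
```

These are the standard MIT Kerberos and Hadoop client commands; the exact realm and directory layout depend on how your account was provisioned.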
Data Forest user guide
Data Forest is available in Korea Region. Use this guide to get the most out of Data Forest.
- Data Forest overview: Learn about the service and its strengths, and find helpful resources and FAQs.
- Data Forest quickstart: Guides you through the entire process step-by-step.
- Data Forest prerequisites: View supported specifications.
- Getting started with Data Forest: Learn how to configure the client environment to access Data Forest and the Data Forest app.
- Using Data Forest
- Create and manage account: Learn how to create and manage a Data Forest account and how to authenticate it.
- Create and manage apps: Learn how to create and manage Data Forest apps.
- Using Data Forest apps
- Access Quick links: Learn about the types of Quick links and how to access them.
- Using Dev: Dev App details and how to use it.
- Using Elasticsearch: Elasticsearch details and precautions.
- Using Grafana: Grafana details, how to add data sources, and how to back up databases.
- Using HBase: HBase details and precautions.
- Using Hive: Hive details, how to connect, and precautions.
- Using Hue: Hue details.
- Using Kafka: Kafka details, how to use Kafka Manager, and precautions.
- Using Kibana: Kibana details.
- Using OpenTSDB: OpenTSDB details.
- Using Phoenix: Phoenix details.
- Using Spark History Server: Spark History Server details and how to view jobs.
- Using Trino: Trino details.
- Using Zeppelin: Zeppelin details, interpreter settings, and backup.
- Using ZooKeeper: ZooKeeper details, how to integrate with other Apps, and precautions.
- Monitoring: Learn how to monitor submitted batch jobs and Apps.
- Utilizing the Data Forest ecosystem
- Using HDFS: Learn how to upload and download files to and from HDFS.
- Using Public Hive: Learn how to create Hive databases and tables.
- Using Oozie: Learn how to write workflows.
- Using Spark: Learn how to submit Spark jobs.
- Data Forest use cases
- Copy HDFS data to Object Storage: Learn how to copy HDFS data to Object Storage.
- Register Spark batch jobs with Oozie scheduler: Learn how to register Spark batch jobs with the Oozie scheduler.
- Data processing with Spark and Hive: Learn how to process Spark and Hive data with the Zeppelin App and the Dev App.
- Data Forest resource management: Manage resources using NAVER Cloud Platform’s Resource Manager and Cloud Activity Tracer.
- Data Forest permissions management: Learn how to manage access and policies for Data Forest.
- Data Forest troubleshooting: Troubleshoot issues with Data Forest.
- Data Forest release notes: See documentation updates.
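As a quick illustration of the ecosystem tasks listed above (uploading files to HDFS and submitting Spark jobs), a hedged sketch of the everyday client commands follows. All paths, file names, and the example application class are placeholders, not values from this guide.

```shell
# Illustrative only: common HDFS and Spark client usage on a Hadoop cluster.

# Upload a local file to HDFS, then download it back
hdfs dfs -put ./access.log /user/df-user/logs/access.log   # placeholder paths
hdfs dfs -get /user/df-user/logs/access.log ./access-copy.log

# Submit a Spark application to YARN in cluster mode
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.LogAnalyzer \
  log-analyzer.jar /user/df-user/logs   # placeholder jar, class, and input path
```

The corresponding guide pages cover the Data Forest specifics, such as account directories and queue settings.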
Data Forest related resources
Beyond the user guide, these resources provide additional context and support for Data Forest. Whether you're considering Data Forest or need in-depth information for development, marketing, and other purposes, these resources can help:
- Pricing and features: View pricing details and key capabilities.
- Data Forest easy start guide.
- Analyze big data with Data Forest: Basic usage of Data Forest.
- Build a big data analysis environment with Data Forest: Learn how to build the development environment required for big data analysis through a Notebook node and easily integrate with external systems.
- Latest announcements: Stay informed about service updates and news.
- FAQs: Get answers to common Data Forest questions.
- Customer Support: Get help if you can't find what you need in the user guide.
FAQs
You can quickly resolve your questions by checking the FAQs. If you don't find the answer you need in the FAQs, check the user guide for more details.
Q. Cloud Hadoop and Data Forest appear to be similar services. What are the differences?
A. The difference between the two services is whether they are server-based or serverless.
- Cloud Hadoop builds and provides a Hadoop cluster on dedicated customer resources.
  - It is a self-managed offering in which you manage Hadoop yourself.
  - It provides an open-source web management tool (Apache Ambari) that you can manage directly.
- Data Forest is a serverless offering: you submit analysis jobs (DL Jobs) to use it, and for Hadoop ecosystem components that must run long-lived, you create Apps for easy analysis.
  - It is a managed offering in which high availability is ensured at the product level, so you do not manage Hadoop directly.
  - It provides more Apps than Cloud Hadoop, and you can submit GPU-based deep learning jobs.
Comparison
| Feature | Cloud Hadoop | Data Forest |
|---|---|---|
| Scalability | You decide the Hadoop cluster size yourself. | Managed by the service. |
| Cost | Cluster maintenance fees apply. | Charges apply for running jobs and storage. |
| Maintenance | Managed directly by the user; User management tool (Apache Ambari) supported. | Managed by the service. |
| Characteristics | You can configure the environment freely. | Provides diverse Apps; GPU-based deep learning jobs can be submitted. |
Q. What features are provided to collect and process real-time data or configure ETL environments?
A. Data Forest does not directly provide real-time data collection and processing, but you can configure such an environment by combining various NAVER Cloud Platform services with the Hadoop ecosystem apps that Data Forest provides. A dedicated ETL product is scheduled to be released separately in the future.
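For example, a simple real-time collection pipeline can be sketched with the Kafka app's standard console tools. The broker endpoint and topic name below are placeholders for the values of your own Kafka app.

```shell
# Illustrative only: standard Kafka console tools, assuming a running Kafka app.
BROKER="kafka-broker.example:9092"   # placeholder broker endpoint

# Create a topic for incoming events
kafka-topics.sh --bootstrap-server "$BROKER" \
  --create --topic events --partitions 3 --replication-factor 2

# Produce a test record
echo '{"ts": 1700000000, "value": 42}' | \
  kafka-console-producer.sh --bootstrap-server "$BROKER" --topic events

# Consume from the beginning to verify the pipeline
kafka-console-consumer.sh --bootstrap-server "$BROKER" \
  --topic events --from-beginning --max-messages 1
```

Downstream processing (for example with Spark or Hive) can then consume from the topic, which is one way to assemble an ETL-style flow from the available apps.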