Prerequisites for using Data Forest
Available in VPC
This document describes the prerequisites and pricing information you need to know to use Data Forest smoothly.
Data Forest components
Data Forest consists of components for storing, analyzing, and visualizing data. Users can create and use the components suited to each purpose.
Purpose | Component |
---|---|
Data storage | - HDFS - HBase - OpenTSDB |
Data access and processing | - Hive - Spark - Phoenix - Elasticsearch - Kafka |
Data management | - Oozie - Zookeeper |
Data visualization | - Kibana - Zeppelin - Grafana - Hue |
Data Forest application types
Applications that you can use in Data Forest are as follows:
Application | Description |
---|---|
DEV-1.0.0 | - Plays the role of a client for all services provided in Data Forest - Runs HDFS commands or submits Spark Jobs - Builds client environment for HBase and Kafka |
ELASTICSEARCH-7.3.2 | - Creates Elasticsearch cluster - Provides OSS version |
GRAFANA-7.5.10 | - Provides Grafana servers - Can be integrated with OpenTSDB and used as a monitoring page |
HBASE-2.0.0 | - Provides Apache HBase clusters - Kerberos authentication applied |
HBASE-2.2.3 | - Provides Apache HBase clusters - Kerberos authentication not applied |
HIVESERVER2-LDAP-3.1.0 | - Provides Apache HiveServer2 - Provides authentication using the LDAP method |
HUE-4.7.0 | - Apache Hue server - Provides an interface where you can browse files, edit code, and submit jobs |
KAFKA-2.4.0 | - Provides Apache Kafka clusters - Can be used to build streaming platforms |
KIBANA-7.3.2 | - Provides Kibana servers - Provides OSS version - Can be used as a visualization tool for Elasticsearch |
OPENTSDB-2.4.1 | - Provides OpenTSDB servers - Stores time series data - Uses HBase as storage |
PHOENIX-5.0.0 | - Provides Apache Phoenix servers - Queries can be run directly with the provided Phoenix CLI |
SPARK-HISTORYSERVER-3.1.2 | - Provides Spark History Server - Lets you view only the jobs you executed |
TRINO-367 | - Provides Trino servers - Queries can be run directly with the provided Trino CLI |
ZEPPELIN-0.10.1 | - Provides Apache Zeppelin servers - Provides an interface that enables code editing |
ZOOKEEPER-3.4.13 | - Provides Apache Zookeeper ensembles - Required for running and using the HBase and Kafka apps |
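As an illustration of how an app such as OPENTSDB above is typically used, the sketch below builds one data point in the JSON shape that OpenTSDB's REST `/api/put` endpoint accepts. The host name, metric, and tag names are assumptions made for this example, not values provided by Data Forest:

```python
import json
import time

def build_datapoint(metric, value, tags, timestamp=None):
    """Build one data point in the JSON shape used by OpenTSDB's /api/put."""
    return {
        "metric": metric,
        "timestamp": int(timestamp if timestamp is not None else time.time()),
        "value": value,
        "tags": tags,  # OpenTSDB requires at least one tag per data point
    }

# Hypothetical metric and tags, for illustration only.
point = build_datapoint("sys.cpu.user", 42.5, {"host": "web01"}, timestamp=1700000000)
payload = json.dumps(point)
# POSTing `payload` to http://<your-opentsdb-app-host>:4242/api/put would store it,
# with HBase as the underlying storage (see the table above).
```

Because the endpoint and host depend on your own app, the example stops at building the payload rather than sending it.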
Applications that you can use in Notebooks are as follows:
Application | Description |
---|---|
JUPYTERLAB | - Provides JupyterLab, a web interface based on Jupyter Notebook - Provides Object Storage integration and runs code for data analysis |
Inter-application dependency information
The following describes inter-application dependency in Data Forest.
Direction of inter-application dependencies
In Data Forest, some apps depend on each other, and this affects the order in which the apps should be created. Create apps according to each app's dependency direction.
- OpenTSDB relies on HBase, and HBase relies on Zookeeper. Therefore, create apps in the order Zookeeper > HBase > OpenTSDB.
- Since Kafka depends on Zookeeper, create apps in the order Zookeeper > Kafka.
- Likewise, create apps in the order Elasticsearch > Kibana.
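A valid creation order can be derived mechanically from these dependency pairs. The sketch below is a plain Python illustration (not part of Data Forest) that topologically sorts the dependencies listed above:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each app maps to the set of apps it depends on,
# mirroring the dependency directions listed above.
dependencies = {
    "OpenTSDB": {"HBase"},
    "HBase": {"Zookeeper"},
    "Kafka": {"Zookeeper"},
    "Kibana": {"Elasticsearch"},
}

# static_order() yields every app only after all of its dependencies,
# so the resulting list is a valid app creation order.
creation_order = list(TopologicalSorter(dependencies).static_order())
print(creation_order)
```

In any order it produces, Zookeeper precedes HBase and Kafka, HBase precedes OpenTSDB, and Elasticsearch precedes Kibana.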
Direction of integration between apps
Some apps can be integrated with each other, but such integration does not impose any restrictions on app creation.
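For example, the GRAFANA app can use an OPENTSDB app as a monitoring data source, as noted in the application table above. A minimal Grafana data-source provisioning sketch might look like the following; the data-source name and URL are assumptions for illustration, and in practice you would point it at your own OpenTSDB app's endpoint:

```yaml
apiVersion: 1
datasources:
  - name: DataForest-OpenTSDB      # hypothetical name
    type: opentsdb
    access: proxy
    url: http://opentsdb-app-host:4242   # assumed OpenTSDB endpoint
    jsonData:
      tsdbVersion: 3               # Grafana's setting for OpenTSDB 2.3+
```

Since integration imposes no creation-order restriction, either app can be created first and connected later.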
Application version information
The version information for applications provided by Data Forest is as follows:
Application | Version |
---|---|
DEV | 1.0.0 |
ELASTICSEARCH | 7.3.2 |
GRAFANA | 7.5.10 |
HBASE | 2.0.0, 2.2.3 |
HIVESERVER2-LDAP | 3.1.0 |
HUE | 4.7.0 |
KAFKA | 2.4.0 |
KIBANA | 7.3.2 |
OPENTSDB | 2.4.1 |
PHOENIX | 5.0.0 |
SPARK-HISTORYSERVER | 3.1.2 |
TRINO | 367 |
ZEPPELIN | 0.10.1 |
ZOOKEEPER | 3.4.13 |
The version information for applications provided by Notebooks is as follows:
Application | Version |
---|---|
JUPYTERLAB | 3.6.3 |
Supported app versions may change depending on inter-app dependencies or integration availability.
Server specifications for Notebooks
CPU | Memory | Disk (HDD) |
---|---|---|
4 vCPUs | 16GB | 50GB |
4 vCPUs | 32GB | 50GB |
8 vCPUs | 16GB | 50GB |
8 vCPUs | 32GB | 50GB |
8 vCPUs | 64GB | 50GB |
Usage fees
For specific information regarding Data Forest usage fees, see the Portal > Service > Analytics > Data Forest menu or the Portal > Pricing menu.