Available in VPC
Use Load Balancer access log
You can analyze access logs collected from NCP Load Balancer using Data Query.
This example shows how to refine access logs with the Data Query service, analyze them with the provided analysis template queries, and visualize them with business intelligence (BI) tools.
Feature summary
You can use the following features to refine and use the data:
- Create a table with a scanner: use data as a table via Data Catalog
- Refine the raw data table: a simple ETL task via Data Query (or Data Flow)
- Visualize through BI integration via Data Query
- Set to apply changed data
  - Update partition values and schedule the scanner via Data Catalog
  - Parameter queries with the run scheduling feature via Data Query
1. Configure LB access log collection
You can configure Load Balancer access log collection and set up automatic export to store the logs saved in Cloud Log Analytics (CLA).
Configure Load Balancer access log collection
- You can activate the Load Balancer access log collection feature to save the logs in CLA.
- Activate the access log collection feature for the Load Balancer whose access logs you need to analyze.
- For more information, see the Load Balancer user guide.
Configure automatic export in CLA
- To store the access logs saved in CLA permanently, set up automatic export to Object Storage.
- Example of automatic export settings
| Item | Set value |
| --- | --- |
| Export method | Type-based save method |
| Select bucket | User bucket where the automatic export data is to be stored |
| Select log | application_loadbalancer_access |

- For more information, see the CLA user guide.
2. Create data table to be analyzed
To create source data tables and refined data tables:
Create source data table (use data as table)
- Use the scanner feature of Data Catalog to create a source data table that refers to data saved in Object Storage.
- Refer to the following settings example and create and run a scanner:
- Scanner settings example
| Item | Set value |
| --- | --- |
| Data type | Object Storage |
| Path | s3a://[User bucket with CLA automatic export set]/CLA/AutoExport/application_loadbalancer_access/ |
| Scan range | Partial, 10 items |
| Execution cycle | On-demand |
| Pattern | Not used |
| Partition setting | All partitioning form |
| Collection option | Update table definition |
| Table merging | True |
| Table number limit | 1 |

- For more information about the scanner, see the Data Catalog user guide.
Create refined data table (ETL task)
- Refer to the following example query provided by the Data Query service to create a refined data table:
- For large data sets, you can use the Data Flow service to process ETL tasks.
CREATE TABLE data_catalog.default.alb_access_log
WITH (
format='parquet'
) AS
WITH parsed AS (
SELECT
split(message, ' ') AS parts,
regexp_extract_all(message, '\"([^\"]*)\"') AS quoted_fields
FROM data_catalog.default.application_loadbalancer_access
)
SELECT
parts[1] AS http_version,
parts[2] AS client_ip_port,
CAST(parts[3] AS bigint) AS request_processing_time_ms,
CAST(parts[4] AS bigint) AS target_connection_time_ms,
CAST(parts[5] AS bigint) AS target_response_time_ms,
CAST(parts[6] AS bigint) AS response_processing_time_ms,
parts[7] AS target_status_code,
CAST(parts[8] AS bigint) AS received_bytes,
CAST(parts[9] AS bigint) AS sent_bytes,
parts[10] AS http_method,
quoted_fields[1] AS request,
quoted_fields[2] AS user_agent,
parts[ cardinality(parts) - 2 ] AS ssl_cipher,
parts[ cardinality(parts) - 1 ] AS ssl_version,
CAST(parse_datetime(parts[ cardinality(parts)], 'dd/MMM/yyyy:HH:mm:ss.SSS') AS timestamp) AS request_creation_time
FROM parsed
- A table is a unit that defines the metadata required to read and manage data, such as the schema and the location of the data.
- You can access the files saved in the storage through the created table resources.
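As a sanity check, the field mapping used by the CTAS query above can be reproduced locally. The following Python sketch (the sample log line is hypothetical and only illustrates the assumed format) splits a message the same way the query does: positional fields counted from the left, SSL fields and timestamp counted from the end, and the quoted request/user agent extracted separately. Note that SQL arrays are 1-indexed while Python lists are 0-indexed, and the Python regex group extracts the text inside the quotes.

```python
import re

# Hypothetical access log line in the format the CTAS query assumes:
# space-separated fields, request and user agent wrapped in double quotes.
message = ('HTTP/1.1 198.51.100.7:51234 3 1 12 0 200 345 1024 GET '
           '"GET /index.html HTTP/1.1" "Mozilla/5.0" '
           'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 15/Aug/2024:10:00:00.123')

parts = message.split(' ')                  # mirrors split(message, ' ')
quoted = re.findall(r'"([^"]*)"', message)  # mirrors the quoted-field extraction

record = {
    'http_version': parts[0],               # SQL parts[1]
    'client_ip_port': parts[1],             # SQL parts[2]
    'request_processing_time_ms': int(parts[2]),
    'target_connection_time_ms': int(parts[3]),
    'target_response_time_ms': int(parts[4]),
    'response_processing_time_ms': int(parts[5]),
    'target_status_code': parts[6],
    'received_bytes': int(parts[7]),
    'sent_bytes': int(parts[8]),
    'http_method': parts[9],                # SQL parts[10]
    'request': quoted[0],
    'user_agent': quoted[1],
    'ssl_cipher': parts[-3],                # SQL parts[cardinality(parts) - 2]
    'ssl_version': parts[-2],               # SQL parts[cardinality(parts) - 1]
    'request_creation_time': parts[-1],     # SQL parts[cardinality(parts)]
}
print(record['client_ip_port'], record['request'])
```

The quoted request contains spaces, so a plain space split scatters it across several tokens; this is why the query addresses the trailing fields relative to the end of the array rather than by fixed positions.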
3. Analyze traffic using template query
You can analyze traffic using the provided template queries. Queries can be run through the console editor, BI tools, or the CLI, or integrated with your application's business logic using JDBC or the SDK.
- CLI user guide: how to connect and use the CLI.
- JDBC user guide: supports DB tool and BI connections.
- SDK user guide: integrates with user applications.
- External BI connection guide: integrates with BI tools such as Superset and Zeppelin.
Provided template query
- Traffic analysis
-- 1-1. Number of requests by user agent (user_agent)
SELECT
user_agent,
count(*) AS count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY count DESC
LIMIT 5;
-- 1-2. Total number of requests by client IP
SELECT
split_part(client_ip_port, ':', 1) AS client_ip,
count(*) AS request_count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY request_count DESC
LIMIT 10;
-- 1-3. Most requested path (based on the request field)
SELECT
regexp_extract(request, '/[^ ]*') AS uri_path,
count(*) AS count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY count DESC
LIMIT 10;
-- [Backup] Analysis of number of requests by time period (minutes/hours)
SELECT
date_trunc('minute', request_creation_time) AS minute,
count(*) AS request_count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY 1
LIMIT 50;
-- [Backup] Distribution by response code
SELECT
target_status_code,
count(*) AS count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY count DESC;
- Security analysis
-- 2-1. Detection of abnormal HTTP status code request (e.g. 4xx, 5xx)
SELECT
split_part(client_ip_port, ':', 1) AS client_ip,
target_status_code,
count(*) AS error_count
FROM data_catalog.default.alb_access_log
WHERE target_status_code LIKE '4%' OR target_status_code LIKE '5%'
GROUP BY 1, 2
ORDER BY error_count DESC
LIMIT 50;
-- 2-2. Excessive requests in a short time (for DoS detection)
SELECT
split_part(client_ip_port, ':', 1) AS client_ip,
date_trunc('minute', request_creation_time) AS minute,
count(*) AS request_count
FROM data_catalog.default.alb_access_log
GROUP BY 1, 2
HAVING count(*) > 100 -- Modify Threshold Here
ORDER BY request_count DESC
LIMIT 10;
-- 2-3. Filtering abnormally long request URL (e.g. attack vector)
SELECT
split_part(client_ip_port, ':', 1) AS client_ip,
regexp_extract(request, '/[^ ]*') AS uri_path,
length(request) AS request_length
FROM data_catalog.default.alb_access_log
WHERE length(regexp_extract(request, '/[^ ]*')) > 500 -- Modify Threshold Here
ORDER BY request_length DESC
LIMIT 20;
- Performance analysis
-- 3-1. Average response speed by client IP
SELECT
split_part(client_ip_port, ':', 1) AS client_ip,
round(avg(cast(request_processing_time_ms AS double)), 2) AS request_processing_time_ms,
round(avg(cast(target_connection_time_ms AS double)), 2) AS target_connection_time_ms,
round(avg(cast(target_response_time_ms AS double)), 2) AS target_response_time_ms,
round(avg(cast(response_processing_time_ms AS double)), 2) AS response_processing_time_ms
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY request_processing_time_ms DESC
LIMIT 10;
-- 3-2. Average response speed by request URI (top 10 slow URI)
SELECT
regexp_extract(request, '/[^ ]*') AS uri_path,
round(avg(cast(request_processing_time_ms AS double)), 2) AS request_processing_time_ms,
round(avg(cast(target_connection_time_ms AS double)), 2) AS target_connection_time_ms,
round(avg(cast(target_response_time_ms AS double)), 2) AS target_response_time_ms,
round(avg(cast(response_processing_time_ms AS double)), 2) AS response_processing_time_ms
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY request_processing_time_ms DESC
LIMIT 10;
-- 3-3. Trend of average processing time in minutes
SELECT
date_trunc('minute', request_creation_time) AS minute,
round(avg(cast(request_processing_time_ms AS double)), 2) AS request_processing_time_ms,
round(avg(cast(response_processing_time_ms AS double)), 2) AS response_processing_time_ms
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY 1
LIMIT 20;
-- 3-4. Average response time analysis by status code
SELECT
target_status_code,
round(avg(cast(request_processing_time_ms AS double)), 2) AS request_processing_time_ms,
count(*) AS request_count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY request_processing_time_ms DESC;
BI use examples
You can visualize the refined data tables you created and the template queries with BI tools. For more information about connecting an external BI, see the Data Query user guide.
- Example: Visualization dashboard using Superset BI

4. Set to apply changed data (automatic update)
The access log data created by the LB is saved in CLA, and the CLA data is stored in Object Storage once a day through the automatic export feature. To update the source data table and the refined data table with the newly added data, set the Data Catalog scanner's execution cycle and schedule the parameter query run.
- If no update is needed for the daily added data, you can skip this setting.
Set Data Catalog scanner's execution cycle
By setting the Data Catalog scanner's execution cycle, the metadata (partitions) of the source data table can be updated automatically. Set the scanner execution cycle with the operation time of CLA's automatic export feature in mind.
- Example of scanner execution cycle setting
| Item | Set value |
| --- | --- |
| Data type | Object Storage |
| Path | s3a://[User bucket with CLA automatic export set]/CLA/AutoExport/application_loadbalancer_access/ |
| Scan range | Partial, 10 items |
| Execution cycle | Every day at 03:30 |
| Pattern | Not used |
| Partition setting | All partitioning form |
| Collection option | Update table definition |
| Table merging | True |
| Table number limit | 1 |

- For more information about the Data Catalog scanner, see the Data Catalog user guide.
Schedule Data Query parameter query run
With Data Query's parameter query feature and query run scheduling, you can run queries periodically and bind parameters dynamically at run time. Save the following example query in My queries, then set the parameters and schedule the query run. For more information, see the Data Query user guide.
- Guide to saving query statements
- Guide to running queries using parameters
- Guide to scheduling saved query's execution
INSERT INTO data_catalog.default.alb_access_log
WITH parsed AS (
SELECT
split(message, ' ') AS parts,
regexp_extract_all(message, '\"([^\"]*)\"') AS quoted_fields
FROM data_catalog.default.application_loadbalancer_access
WHERE partition_0 = ?
AND partition_1 = ?
AND partition_2 = ?
)
SELECT
parts[1] AS http_version,
parts[2] AS client_ip_port,
CAST(parts[3] AS bigint) AS request_processing_time_ms,
CAST(parts[4] AS bigint) AS target_connection_time_ms,
CAST(parts[5] AS bigint) AS target_response_time_ms,
CAST(parts[6] AS bigint) AS response_processing_time_ms,
parts[7] AS target_status_code,
CAST(parts[8] AS bigint) AS received_bytes,
CAST(parts[9] AS bigint) AS sent_bytes,
parts[10] AS http_method,
quoted_fields[1] AS request,
quoted_fields[2] AS user_agent,
parts[ cardinality(parts) - 2 ] AS ssl_cipher,
parts[ cardinality(parts) - 1 ] AS ssl_version,
CAST(parse_datetime(parts[ cardinality(parts)], 'dd/MMM/yyyy:HH:mm:ss.SSS') AS timestamp) AS request_creation_time
FROM parsed
- Parameter settings
| Item | Set value |
| --- | --- |
| Parameter 1 | date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%Y') |
| Parameter 2 | date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%m') |
| Parameter 3 | date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%d') |

- Example of query execution schedule settings

| Item | Set value |
| --- | --- |
| Scheduling | ON |
| Execution type | Execute cycle |
| Execution cycle | Every day |
| Execution time | 03:45 |
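The three parameter expressions resolve to yesterday's year, month, and day in the Asia/Seoul time zone, matching the partition_0/partition_1/partition_2 filters in the INSERT query. A minimal Python sketch of the same calculation (assuming, as the parameter expressions imply, that the partitions are zero-padded year/month/day strings):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Mirrors date_format((current_timestamp AT TIME ZONE 'Asia/Seoul')
#                     - interval '1' day, '%Y' / '%m' / '%d')
yesterday = datetime.now(ZoneInfo('Asia/Seoul')) - timedelta(days=1)
partition_0 = yesterday.strftime('%Y')  # four-digit year, e.g. '2024'
partition_1 = yesterday.strftime('%m')  # zero-padded month, e.g. '08'
partition_2 = yesterday.strftime('%d')  # zero-padded day, e.g. '15'
print(partition_0, partition_1, partition_2)
```

Because the export runs once a day and the schedule fires at 03:45, binding yesterday's date ensures the scheduled INSERT only processes the partition that the most recent automatic export produced.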