Use Load Balancer access log

Available in VPC

You can analyze access logs collected in NCP Load Balancer using Data Query.
This example shows the process of refining access logs with the Data Query service, analyzing them with the provided template queries, and visualizing them with business intelligence (BI) tools.

Feature summary

You can use the following features to refine and analyze the data:


1. Configure LB access log collection

Set up Load Balancer access log collection, and configure automatic export so that logs saved in Cloud Log Analytics (CLA) are stored in Object Storage.

Configure Load Balancer access log collection

  • You can activate the Load Balancer access log collection feature to save the logs in CLA.
  • Activate access log collection for the load balancers whose logs you need to analyze.
  • For more information, see the Load Balancer user guide.

Configure automatic export in CLA

  • To store the access logs saved in CLA permanently, set up automatic export to Object Storage.
  • Example of automatic export settings
    - Export method: Type-based save method
    - Select bucket: User bucket where automatic export data is to be stored
    - Select log: application_loadbalancer_access
  • For more information, see the CLA user guide.

2. Create data table to be analyzed

Create the source data table and the refined data table as follows:

Create source data table (use data as table)

  • Use the scanner feature of Data Catalog to create a source data table that refers to data saved in Object Storage.
  • Create and run a scanner based on the following settings example.
  • Scanner settings example
    - Data type: Object Storage
    - Path: s3a://[User bucket with CLA automatic export set]/CLA/AutoExport/application_loadbalancer_access/
    - Scan range: Partial, 10 items
    - Execution cycle: On-demand
    - Pattern: Not used
    - Partition setting: All partitioning form
    - Collection option: Update table definition
    - Table merging: True
    - Table number limit: 1
  • For more information about the scanner, see the Data Catalog user guide.
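After the scanner has run, you can check that the source table is queryable. The following is a minimal sanity-check query; it assumes the scanner created the table data_catalog.default.application_loadbalancer_access with the partition columns used by the ETL queries in this guide.

```sql
-- Preview raw log lines and the partition columns created by the scanner
SELECT message, partition_0, partition_1, partition_2
FROM data_catalog.default.application_loadbalancer_access
LIMIT 5;
```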

Create refined data table (ETL task)

  • Create a refined data table by running the following example query provided by the Data Query service.
  • For large data volumes, you can use the Dataflow service to process the ETL task instead.
CREATE TABLE data_catalog.default.alb_access_log
WITH (
    format='parquet'
) AS
WITH parsed AS (
SELECT
    split(message, ' ') AS parts,
    regexp_extract_all(message, '"([^"]*)"', 1) AS quoted_fields -- group 1: field value without the surrounding quotes
FROM data_catalog.default.application_loadbalancer_access
)
SELECT
    parts[1]  AS http_version,
    parts[2]  AS client_ip_port,
    CAST(parts[3] AS bigint)  AS request_processing_time_ms,
    CAST(parts[4] AS bigint)  AS target_connection_time_ms,
    CAST(parts[5] AS bigint)  AS target_response_time_ms,
    CAST(parts[6] AS bigint)  AS response_processing_time_ms,
    parts[7]                  AS target_status_code,
    CAST(parts[8] AS bigint)  AS received_bytes,
    CAST(parts[9] AS bigint)  AS sent_bytes,
    parts[10] AS http_method,
    quoted_fields[1] AS request,
    quoted_fields[2] AS user_agent,
    parts[ cardinality(parts) - 2 ] AS ssl_cipher,
    parts[ cardinality(parts) - 1 ] AS ssl_version,
    CAST(parse_datetime(parts[ cardinality(parts)], 'dd/MMM/yyyy:HH:mm:ss.SSS') AS timestamp) AS request_creation_time
FROM parsed;

Note
  • Table is a unit that defines metadata required to read and manage data, such as schema and location of data.
  • You can access the files saved in the storage with the created table resources.
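Once the CREATE TABLE query completes, a quick query such as the following (a minimal sketch; the column names follow the refined table definition above) confirms that the log lines were parsed and the timestamp cast succeeded:

```sql
-- Preview a few parsed rows from the refined table
SELECT request_creation_time, client_ip_port, http_method, target_status_code
FROM data_catalog.default.alb_access_log
ORDER BY request_creation_time DESC
LIMIT 10;
```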

3. Analyze traffic using template query

You can analyze traffic using the provided template queries. Run queries from the console editor, a BI tool, or the CLI, or integrate them into your application's business logic using JDBC or the SDK.

Provided template query

  • Traffic analysis
-- 1-1. Number of requests by user agent (user_agent)
SELECT
  user_agent,
  count(*) AS count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY count DESC
LIMIT 5;

-- 1-2. Total number of requests by client IP
SELECT
  split_part(client_ip_port, ':', 1) AS client_ip,
  count(*) AS request_count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY request_count DESC
LIMIT 10;

-- 1-3. Most requested path (based on the request field)
SELECT
  regexp_extract(request, '/[^ ]*') AS uri_path,
  count(*) AS count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY count DESC
LIMIT 10;

-- [Additional] Number of requests by time period (per minute)
SELECT
  date_trunc('minute', request_creation_time) AS minute,
  count(*) AS request_count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY 1
LIMIT 50;

-- [Additional] Distribution of requests by response code
SELECT
  target_status_code,
  count(*) AS count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY count DESC;
  • Security analysis
-- 2-1. Detect abnormal HTTP status code requests (e.g., 4xx, 5xx)
SELECT
  split_part(client_ip_port, ':', 1) AS client_ip,
  target_status_code,
  count(*) AS error_count
FROM data_catalog.default.alb_access_log
WHERE target_status_code LIKE '4%' OR target_status_code LIKE '5%'
GROUP BY 1, 2
ORDER BY error_count DESC
LIMIT 50;

-- 2-2. Too many requests in a short time (for DoS detection)
SELECT
  split_part(client_ip_port, ':', 1) AS client_ip,
  date_trunc('minute', request_creation_time) AS minute,
  count(*) AS request_count
FROM data_catalog.default.alb_access_log
GROUP BY 1, 2
HAVING count(*) > 100 -- Modify Threshold Here
ORDER BY request_count DESC
LIMIT 10;

-- 2-3. Filter abnormally long request URLs (possible attack vector)
SELECT
  split_part(client_ip_port, ':', 1) AS client_ip,
  regexp_extract(request, '/[^ ]*') AS uri_path,
  length(request) AS request_length
FROM data_catalog.default.alb_access_log
WHERE length(regexp_extract(request, '/[^ ]*')) > 500 -- Modify Threshold Here
ORDER BY request_length DESC
LIMIT 20;
  • Performance analysis
-- 3-1. Average response speed by client IP
SELECT
  split_part(client_ip_port, ':', 1) AS client_ip,
  round(avg(cast(request_processing_time_ms AS double)), 2) AS request_processing_time_ms,
  round(avg(cast(target_connection_time_ms AS double)), 2) AS target_connection_time_ms,
  round(avg(cast(target_response_time_ms AS double)), 2) AS target_response_time_ms,
  round(avg(cast(response_processing_time_ms AS double)), 2) AS response_processing_time_ms
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY request_processing_time_ms DESC
LIMIT 10;

-- 3-2. Average response speed by request URI (top 10 slow URI)
SELECT
  regexp_extract(request, '/[^ ]*') AS uri_path,
  round(avg(cast(request_processing_time_ms AS double)), 2) AS request_processing_time_ms,
  round(avg(cast(target_connection_time_ms AS double)), 2) AS target_connection_time_ms,
  round(avg(cast(target_response_time_ms AS double)), 2) AS target_response_time_ms,
  round(avg(cast(response_processing_time_ms AS double)), 2) AS response_processing_time_ms
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY request_processing_time_ms DESC
LIMIT 10;

-- 3-3. Trend of average processing time in minutes
SELECT
  date_trunc('minute', request_creation_time) AS minute,
  round(avg(cast(request_processing_time_ms AS double)), 2) AS request_processing_time_ms,
  round(avg(cast(response_processing_time_ms AS double)), 2) AS response_processing_time_ms
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY 1
LIMIT 20;

-- 3-4. Average response time analysis by status code
SELECT
  target_status_code,
  round(avg(cast(request_processing_time_ms AS double)), 2) AS request_processing_time_ms,
  count(*) AS request_count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY request_processing_time_ms DESC;

BI use examples

You can visualize the refined data tables and template query results with BI tools. For more information about connecting an external BI tool, see the Data Query user guide.

  • Example: visualization dashboard using Superset BI
    (Image: dataquery-example-lb-accesslog-bi)

4. Apply changed data automatically (automatic update)

The access log data generated by the Load Balancer is saved in CLA, and the CLA data is exported to Object Storage once a day through the automatic export feature. To update the source and refined data tables with the newly added data, set the Data Catalog scanner's execution cycle and schedule a parameter query run, as described in the following subsections.

  • If you do not need to reflect the daily added data, you can skip this step.

Set Data Catalog scanner's execution cycle

Setting the Data Catalog scanner's execution cycle automatically updates the source data table's metadata (partitions). Choose an execution cycle that runs after CLA's automatic export has completed.

  • Example of scanner execution cycle settings
    - Data type: Object Storage
    - Path: s3a://[User bucket with CLA automatic export set]/CLA/AutoExport/application_loadbalancer_access/
    - Scan range: Partial, 10 items
    - Execution cycle: 03:30 every day
    - Pattern: Not used
    - Partition setting: All partitioning form
    - Collection option: Update table definition
    - Table merging: True
    - Table number limit: 1
  • For more information about the Data Catalog scanner, see the Data Catalog user guide.

Schedule Data Query parameter query run

With Data Query's parameter query and query run scheduling features, you can run queries periodically and bind parameter values dynamically at run time. Save the following example query to My queries, then configure the parameters and the query run schedule. For more information, see the Data Query user guide.

INSERT INTO data_catalog.default.alb_access_log
WITH parsed AS (
SELECT
    split(message, ' ') AS parts,
    regexp_extract_all(message, '"([^"]*)"', 1) AS quoted_fields -- group 1: field value without the surrounding quotes
FROM data_catalog.default.application_loadbalancer_access
WHERE partition_0 = ?
  AND partition_1 = ?
  AND partition_2 = ?
)
SELECT
    parts[1]  AS http_version,
    parts[2]  AS client_ip_port,
    CAST(parts[3] AS bigint)  AS request_processing_time_ms,
    CAST(parts[4] AS bigint)  AS target_connection_time_ms,
    CAST(parts[5] AS bigint)  AS target_response_time_ms,
    CAST(parts[6] AS bigint)  AS response_processing_time_ms,
    parts[7]                  AS target_status_code,
    CAST(parts[8] AS bigint)  AS received_bytes,
    CAST(parts[9] AS bigint)  AS sent_bytes,
    parts[10] AS http_method,
    quoted_fields[1] AS request,
    quoted_fields[2] AS user_agent,
    parts[ cardinality(parts) - 2 ] AS ssl_cipher,
    parts[ cardinality(parts) - 1 ] AS ssl_version,
    CAST(parse_datetime(parts[ cardinality(parts)], 'dd/MMM/yyyy:HH:mm:ss.SSS') AS timestamp) AS request_creation_time
FROM parsed;

  • Parameter settings
    - Parameter 1: date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%Y')
    - Parameter 2: date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%m')
    - Parameter 3: date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%d')
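Before scheduling, you can preview the values these parameter expressions will bind by running them as a standalone query in the editor; the expressions below are the same ones listed in the parameter settings above and resolve to yesterday's date parts in KST.

```sql
-- Preview yesterday's partition values (Asia/Seoul) that will be
-- bound to the three parameters of the INSERT query
SELECT
  date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%Y') AS partition_0,
  date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%m') AS partition_1,
  date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%d') AS partition_2;
```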
  • Example of query execution schedule settings
    - Scheduling: ON
    - Execution type: Execute cycle
    - Execution cycle: Every day
    - Execution time: 03:45