Use Load Balancer access log

Available in VPC

You can analyze access logs collected in NCP Load Balancer using Data Query.
This example shows the process of refining access logs with the Data Query service, analyzing them with the provided template queries, and visualizing them with business intelligence (BI) tools.

Feature summary

You can use the following features to refine and analyze the data:


1. Configure LB access log collection

Set up Load Balancer access log collection, and configure automatic export so that logs saved in Cloud Log Analytics (CLA) are stored in Object Storage.

Configure Load Balancer access log collection

  • You can activate the Load Balancer access log collection feature to save the logs in CLA.
  • Activate access log collection for the load balancers whose logs you need to analyze.
  • For more information, see the Load Balancer user guide.

Configure automatic export in CLA

  • To store the access logs saved in CLA permanently, set up automatic export to Object Storage.
  • Example of automatic export settings
    - Export method: Type-based save method
    - Select bucket: User bucket where automatic export data is to be stored
    - Select log: application_loadbalancer_access
  • For more information, see the CLA user guide.

2. Create data table to be analyzed

Create the source data table and the refined data table as follows:

Create source data table (use data as table)

  • Use the scanner feature of Data Catalog to create a source data table that refers to data saved in Object Storage.
  • Create and run a scanner based on the following settings example.
  • Scanner settings example
    - Data type: Object Storage
    - Path: s3a://[User bucket with CLA automatic export set]/CLA/AutoExport/application_loadbalancer_access/
    - Scan range: Partial, 10 items
    - Execution cycle: On-demand
    - Pattern: Not used
    - Partition setting: All partitioning form
    - Collection option: Update table definition
    - Table merging: True
    - Table number limit: 1
  • For more information about the scanner, see the Data Catalog user guide.
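After the scanner has run, you can check that the source table is queryable. The following is a minimal sanity-check query; it assumes the scanner created the table data_catalog.default.application_loadbalancer_access with the partition columns used by the ETL queries in this guide.

```sql
-- Preview raw log lines and the partition columns created by the scanner
SELECT message, partition_0, partition_1, partition_2
FROM data_catalog.default.application_loadbalancer_access
LIMIT 5;
```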

Create refined data table (ETL task)

  • Create a refined data table by running the following example query provided by the Data Query service.
  • For large data volumes, you can use the Dataflow service to process the ETL task instead.
CREATE TABLE data_catalog.default.alb_access_log
WITH (
    format='parquet'
) AS
WITH parsed AS (
SELECT
    split(message, ' ') AS parts,
    regexp_extract_all(message, '"([^"]*)"', 1) AS quoted_fields -- group 1: field value without the surrounding quotes
FROM data_catalog.default.application_loadbalancer_access
)
SELECT
    parts[1]  AS http_version,
    parts[2]  AS client_ip_port,
    CAST(parts[3] AS bigint)  AS request_processing_time_ms,
    CAST(parts[4] AS bigint)  AS target_connection_time_ms,
    CAST(parts[5] AS bigint)  AS target_response_time_ms,
    CAST(parts[6] AS bigint)  AS response_processing_time_ms,
    parts[7]                  AS target_status_code,
    CAST(parts[8] AS bigint)  AS received_bytes,
    CAST(parts[9] AS bigint)  AS sent_bytes,
    parts[10] AS http_method,
    quoted_fields[1] AS request,
    quoted_fields[2] AS user_agent,
    parts[ cardinality(parts) - 2 ] AS ssl_cipher,
    parts[ cardinality(parts) - 1 ] AS ssl_version,
    CAST(parse_datetime(parts[ cardinality(parts)], 'dd/MMM/yyyy:HH:mm:ss.SSS') AS timestamp) AS request_creation_time
FROM parsed;

Note
  • Table is a unit that defines metadata required to read and manage data, such as schema and location of data.
  • You can access the files saved in the storage with the created table resources.
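Once the CREATE TABLE query completes, a quick query such as the following (a minimal sketch; the column names follow the refined table definition above) confirms that the log lines were parsed and the timestamp cast succeeded:

```sql
-- Preview a few parsed rows from the refined table
SELECT request_creation_time, client_ip_port, http_method, target_status_code
FROM data_catalog.default.alb_access_log
ORDER BY request_creation_time DESC
LIMIT 10;
```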

3. Analyze traffic using template query

You can analyze traffic using the provided template queries. Run queries from the console editor, a BI tool, or the CLI, or integrate them into your application's business logic using JDBC or the SDK.

Provided template query

  • Traffic analysis
-- 1-1. Number of requests by user agent (user_agent)
SELECT
  user_agent,
  count(*) AS count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY count DESC
LIMIT 5;

-- 1-2. Total number of requests by client IP
SELECT
  split_part(client_ip_port, ':', 1) AS client_ip,
  count(*) AS request_count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY request_count DESC
LIMIT 10;

-- 1-3. Most requested path (based on the request field)
SELECT
  regexp_extract(request, '/[^ ]*') AS uri_path,
  count(*) AS count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY count DESC
LIMIT 10;

-- [Additional] Number of requests by time period (per minute)
SELECT
  date_trunc('minute', request_creation_time) AS minute,
  count(*) AS request_count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY 1
LIMIT 50;

-- [Additional] Distribution of requests by response code
SELECT
  target_status_code,
  count(*) AS count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY count DESC;
  • Security analysis
-- 2-1. Detect abnormal HTTP status code requests (e.g., 4xx, 5xx)
SELECT
  split_part(client_ip_port, ':', 1) AS client_ip,
  target_status_code,
  count(*) AS error_count
FROM data_catalog.default.alb_access_log
WHERE target_status_code LIKE '4%' OR target_status_code LIKE '5%'
GROUP BY 1, 2
ORDER BY error_count DESC
LIMIT 50;

-- 2-2. Too many requests in a short time (for DoS detection)
SELECT
  split_part(client_ip_port, ':', 1) AS client_ip,
  date_trunc('minute', request_creation_time) AS minute,
  count(*) AS request_count
FROM data_catalog.default.alb_access_log
GROUP BY 1, 2
HAVING count(*) > 100 -- Modify Threshold Here
ORDER BY request_count DESC
LIMIT 10;

-- 2-3. Filter abnormally long request URLs (possible attack vector)
SELECT
  split_part(client_ip_port, ':', 1) AS client_ip,
  regexp_extract(request, '/[^ ]*') AS uri_path,
  length(request) AS request_length
FROM data_catalog.default.alb_access_log
WHERE length(regexp_extract(request, '/[^ ]*')) > 500 -- Modify Threshold Here
ORDER BY request_length DESC
LIMIT 20;
  • Performance analysis
-- 3-1. Average response speed by client IP
SELECT
  split_part(client_ip_port, ':', 1) AS client_ip,
  round(avg(cast(request_processing_time_ms AS double)), 2) AS request_processing_time_ms,
  round(avg(cast(target_connection_time_ms AS double)), 2) AS target_connection_time_ms,
  round(avg(cast(target_response_time_ms AS double)), 2) AS target_response_time_ms,
  round(avg(cast(response_processing_time_ms AS double)), 2) AS response_processing_time_ms
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY request_processing_time_ms DESC
LIMIT 10;

-- 3-2. Average response speed by request URI (top 10 slow URI)
SELECT
  regexp_extract(request, '/[^ ]*') AS uri_path,
  round(avg(cast(request_processing_time_ms AS double)), 2) AS request_processing_time_ms,
  round(avg(cast(target_connection_time_ms AS double)), 2) AS target_connection_time_ms,
  round(avg(cast(target_response_time_ms AS double)), 2) AS target_response_time_ms,
  round(avg(cast(response_processing_time_ms AS double)), 2) AS response_processing_time_ms
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY request_processing_time_ms DESC
LIMIT 10;

-- 3-3. Trend of average processing time in minutes
SELECT
  date_trunc('minute', request_creation_time) AS minute,
  round(avg(cast(request_processing_time_ms AS double)), 2) AS request_processing_time_ms,
  round(avg(cast(response_processing_time_ms AS double)), 2) AS response_processing_time_ms
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY 1
LIMIT 20;

-- 3-4. Average response time analysis by status code
SELECT
  target_status_code,
  round(avg(cast(request_processing_time_ms AS double)), 2) AS request_processing_time_ms,
  count(*) AS request_count
FROM data_catalog.default.alb_access_log
GROUP BY 1
ORDER BY request_processing_time_ms DESC;

BI use examples

You can visualize the refined data tables and template query results with BI tools. For more information about connecting an external BI tool, see the Data Query user guide.

  • Example: visualization dashboard using Superset BI
    (Image: dataquery-example-lb-accesslog-bi)

4. Apply changed data automatically (automatic update)

The access log data generated by the Load Balancer is saved in CLA, and the CLA data is exported to Object Storage once a day through the automatic export feature. To update the source and refined data tables with the newly added data, set the Data Catalog scanner's execution cycle and schedule a parameter query run, as described in the following subsections.

  • If you do not need to reflect the daily added data, you can skip this step.

Set Data Catalog scanner's execution cycle

Setting the Data Catalog scanner's execution cycle automatically updates the source data table's metadata (partitions). Choose an execution cycle that runs after CLA's automatic export has completed.

  • Example of scanner execution cycle settings
    - Data type: Object Storage
    - Path: s3a://[User bucket with CLA automatic export set]/CLA/AutoExport/application_loadbalancer_access/
    - Scan range: Partial, 10 items
    - Execution cycle: 03:30 every day
    - Pattern: Not used
    - Partition setting: All partitioning form
    - Collection option: Update table definition
    - Table merging: True
    - Table number limit: 1
  • For more information about the Data Catalog scanner, see the Data Catalog user guide.

Schedule Data Query parameter query run

With Data Query's parameter query and query run scheduling features, you can run queries periodically and bind parameter values dynamically at run time. Save the following example query to My queries, then configure the parameters and the query run schedule. For more information, see the Data Query user guide.

INSERT INTO data_catalog.default.alb_access_log
WITH parsed AS (
SELECT
    split(message, ' ') AS parts,
    regexp_extract_all(message, '"([^"]*)"', 1) AS quoted_fields -- group 1: field value without the surrounding quotes
FROM data_catalog.default.application_loadbalancer_access
WHERE partition_0 = ?
  AND partition_1 = ?
  AND partition_2 = ?
)
SELECT
    parts[1]  AS http_version,
    parts[2]  AS client_ip_port,
    CAST(parts[3] AS bigint)  AS request_processing_time_ms,
    CAST(parts[4] AS bigint)  AS target_connection_time_ms,
    CAST(parts[5] AS bigint)  AS target_response_time_ms,
    CAST(parts[6] AS bigint)  AS response_processing_time_ms,
    parts[7]                  AS target_status_code,
    CAST(parts[8] AS bigint)  AS received_bytes,
    CAST(parts[9] AS bigint)  AS sent_bytes,
    parts[10] AS http_method,
    quoted_fields[1] AS request,
    quoted_fields[2] AS user_agent,
    parts[ cardinality(parts) - 2 ] AS ssl_cipher,
    parts[ cardinality(parts) - 1 ] AS ssl_version,
    CAST(parse_datetime(parts[ cardinality(parts)], 'dd/MMM/yyyy:HH:mm:ss.SSS') AS timestamp) AS request_creation_time
FROM parsed;

  • Parameter settings
    - Parameter 1: date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%Y')
    - Parameter 2: date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%m')
    - Parameter 3: date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%d')
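Before scheduling, you can preview the values these parameter expressions will bind by running them as a standalone query in the editor; the expressions below are the same ones listed in the parameter settings above and resolve to yesterday's date parts in KST.

```sql
-- Preview yesterday's partition values (Asia/Seoul) that will be
-- bound to the three parameters of the INSERT query
SELECT
  date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%Y') AS partition_0,
  date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%m') AS partition_1,
  date_format((current_timestamp AT TIME ZONE 'Asia/Seoul') - interval '1' day, '%d') AS partition_2;
```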
  • Example of query execution schedule settings
    - Scheduling: ON
    - Execution type: Execute cycle
    - Execution cycle: Every day
    - Execution time: 03:45