HEaaN Homomorphic Analytics use cases

    Available in VPC

    The following examples show how to carry out various statistical analyses and machine learning tasks with NAVER Cloud HEaaN Homomorphic Analytics, using the personal information of virtual customers.

    Note

    All information in example data is fictional and unrelated to real people or organizations.

    Prepare example data

    The example data contains the personal information of 100 virtual customers of a virtual company, with each customer's details divided into 12 columns.

    Download the example data below and open the sample_data_raw.csv file.

    sample_data.zip

    The following explains each column in the sample_data_raw.csv file.

    heaan-example_sampleRaw_vpc_ko

    Column no.  Column name     Type              Description
    1           name            Text              Virtual customer name
    2           gender          Text              Virtual customer's gender
    3           phone_number    Text              Virtual customer's mobile phone number
    4           age             Real number type  Virtual customer's age
    5           address         Text              Virtual customer's address
    6           housing         Text              Virtual customer's housing type
    7           card_expense    Real number type  Virtual customer's monthly credit card bill amount
    8           income          Real number type  Virtual customer's annual income
    9           credit_ranking  Category type     Virtual customer's credit ranking, classified into 7 categories from grade 1 to grade 7
    10          bank            Text              Virtual customer's main bank
    11          loan            Real number type  Virtual customer's loan amount
    12          asset           Real number type  Virtual customer's personal asset

    Plaintext data preprocessing example

    To get meaningful results from homomorphic encryption computation, the prepared plaintext data first needs to be processed into a format suitable for the computation. In this example, we'll delete unnecessary information and convert each column to contain real number type or category type data, so that the computation and machine learning tasks can be conducted in the NAVER Cloud Platform console.

    The following describes how to preprocess the example data. In this example, the suffix "_encoded" is added to the end of a column's name after conversion to indicate that the column has been converted.

    Note

    To skip the preprocessing step, use the sample_data_preprocessed.csv file.

    1. Delete the name and phone_number columns, which are irrelevant to the data analysis.

      heaan-example_1deleteColumns_vpc_en.png

    2. The address column contains 20 gu (districts) of Seoul. Assign a number from 0 to 19 to each gu to convert the column into a category type column.

      heaan-example_2convertStrings_vpc_en.png

    3. The gender, housing, and bank columns can be categorized into 2, 3, and 5 categories respectively. Assign a number to each category in each column to convert them into category type columns.

      • Leave the credit_ranking column as is, because it is already a category type column. The values in the credit_ranking column can also be treated as real number data for analysis purposes.

      heaan-example_3labelCategories_vpc_en.png

    4. Group the values in the age column into age groups from the 20s to the 60s, and assign a number to each group to convert the column into a category type column. This makes it possible to understand the characteristics of each age group.

      • Assign the numbers 20, 30, 40, 50, and 60 to the age groups for intuitive classification.

      heaan-example_4categorizeNumbers_vpc_en.png

    5. Take the base-10 logarithm of the card_expense, income, loan, and asset columns. Create a new column next to each of these columns and enter the calculated values.

      • Financial data often consists of large numbers with a skewed distribution, so taking the logarithm compresses the range of values and makes the distribution more balanced.

      heaan-example_5takeLog_vpc_en.png

    6. Check that the preprocessed data matches the table below, then upload it to the NAVER Cloud Platform console with Upload to cloud and encrypt.

      Column no.  Column name       Type              Description
      1           gender_encoded    Category type     2 categories converted from gender: 0, 1
      2           age_encoded       Category type     5 categories by age group: 20, 30, 40, 50, 60
      3           address_encoded   Category type     20 categories by gu: 0 to 19
      4           housing_encoded   Category type     3 categories converted from housing types: 0, 1, 2
      5           card_expense      Real number type  Number that indicates the virtual customer's monthly credit card billing amount
      6           card_expense_log  Real number type  Value calculated by taking a log with base 10 of the monthly credit card billing amount
      7           income            Real number type  Number that indicates the virtual customer's annual income
      8           income_log        Real number type  Value calculated by taking a log with base 10 of the annual income
      9           credit_ranking    Category type     Virtual customer's credit ranking, classified into 7 categories from grade 1 to grade 7
      10          bank_encoded      Category type     5 categories converted from the virtual customers' main bank data: 0, 1, 2, 3, 4
      11          loan              Real number type  Number that indicates the virtual customer's loan amount
      12          loan_log          Real number type  Value calculated by taking a log with base 10 of the loan amount
      13          asset             Real number type  Number that indicates the virtual customer's personal asset
      14          asset_log         Real number type  Value calculated by taking a log with base 10 of the personal asset
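
    The preprocessing steps above can also be scripted rather than done by hand in a spreadsheet. The following is a minimal sketch using only the Python standard library, not an official tool. It assumes the raw column names from the first table; the helper names (age_group, preprocess), the sample rows, and their category values are made up for illustration. The source does not say which number each category receives, so the sketch simply numbers categories in sorted order, and it clamps ages into the 20 to 60 range as a further assumption.

```python
import math

DROP = ("name", "phone_number")                    # step 1
LABEL = ("gender", "address", "housing", "bank")   # steps 2 and 3
LOG = ("card_expense", "income", "loan", "asset")  # step 5


def age_group(age):
    """Step 4: map an age to its decade (assumption: clamp into 20..60)."""
    return min(max(int(float(age)) // 10 * 10, 20), 60)


def preprocess(rows):
    """Return preprocessed copies of rows (dicts keyed by raw column name)."""
    # Assumption: category numbers follow the sorted order of distinct values.
    codes = {
        col: {v: i for i, v in enumerate(sorted({r[col] for r in rows}))}
        for col in LABEL
    }
    result = []
    for row in rows:
        new = {}
        for col, value in row.items():
            if col in DROP:
                continue
            if col in LABEL:
                new[col + "_encoded"] = codes[col][value]
            elif col == "age":
                new["age_encoded"] = age_group(value)
            elif col in LOG:
                new[col] = float(value)
                # log10 needs a positive input; zero amounts need separate handling.
                new[col + "_log"] = math.log10(float(value))
            else:
                new[col] = value  # credit_ranking is kept as is
        result.append(new)
    return result


# Made-up rows in the raw 12-column layout (all values are placeholders).
sample = [
    {"name": "Customer A", "gender": "F", "phone_number": "000-0000-0000",
     "age": "34", "address": "Gangnam-gu", "housing": "Jeonse",
     "card_expense": "1000", "income": "50000", "credit_ranking": "3",
     "bank": "Bank A", "loan": "100", "asset": "200000"},
    {"name": "Customer B", "gender": "M", "phone_number": "000-0000-0001",
     "age": "61", "address": "Mapo-gu", "housing": "Wolse",
     "card_expense": "500", "income": "30000", "credit_ranking": "5",
     "bank": "Bank B", "loan": "10", "asset": "1000"},
]
preprocessed = preprocess(sample)
```

    Because the output columns are produced in the order of the raw columns, the result follows the 14-column layout in the table above. To process the real file, the rows could be loaded with csv.DictReader, provided the header row of sample_data_raw.csv uses the column names listed earlier.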

    Encrypted computation examples

    This section describes how to perform various encrypted computations and analyze the results, using the example data uploaded to the NAVER Cloud Platform console.

    In these encrypted computation examples, we'll delete a column and perform computations including subtraction between columns, average, standard deviation, and average by category, in order to obtain statistical analysis results such as the personal net asset amount, the average loan amount, and the annual income in each age group.

    Caution

    Because computations on encrypted data are approximate, the results may differ from the plaintext results within a margin of error of 3.28938×10^-6.
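
    When checking a decrypted result against a value computed on the plaintext, an exact equality test is therefore too strict. The following is a minimal sketch of a tolerance-based comparison. The source does not say whether the quoted margin is absolute or relative; treating it as an absolute tolerance is an assumption of this sketch, and the names HE_ABS_TOL and matches_plaintext are hypothetical.

```python
import math

# Margin of error quoted in the caution above. Using it as an absolute
# tolerance is an assumption; the source does not specify this.
HE_ABS_TOL = 3.28938e-6


def matches_plaintext(decrypted, expected, abs_tol=HE_ABS_TOL):
    """Return True if the decrypted value agrees with the plaintext value."""
    return math.isclose(decrypted, expected, rel_tol=0.0, abs_tol=abs_tol)
```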

    Note

    For general information about creating, performing, and checking results of computation tasks, see Data computation.

    Example 1. Delete columns - remove unnecessary information

    The main bank information is not necessary in the statistics analysis we're about to perform in the examples. Let us delete this column.

    The following describes how to delete the main bank information from the example data.

    1. From the NAVER Cloud Platform's HEaaN console, enter the settings as below and create a job.

      heaan-example_extask1_vpc_en.png

      • Type: Column management
      • Computation name: Delete column
      • Computation data: preprocessed data uploaded
      • Target column: bank_encoded
      • Decrypt result data (appears after adding the computation): enable
    2. Run the job, and check the decryption result after the computation is finished.

      • The main bank information is now deleted from the data.

    Example 2. Subtraction between columns - personal net asset amount

    Let us subtract the loan amount column from the personal asset column, using subtraction between columns, to view each virtual customer's net asset amount.

    The following describes how to subtract the loan amount column from the personal asset column in the example data.

    1. From the NAVER Cloud Platform's HEaaN console, enter the settings as below and create a job.

      heaan-example_extask2_vpc_en.png

      • Type: Statistical analysis
      • Computation name: Subtraction between columns
      • Computation data: preprocessed data uploaded
      • Column 1: asset
      • Column 2: loan
      • Decrypt result data (appears after adding the computation): enable
    2. Run the job, and check the decryption result after the computation is finished.

      • The net asset amount of each virtual customer, which is their personal asset excluding the loan amount, appears in the newly created result column.

      heaan-example_extask2result_vpc_en.png
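
    To sanity-check the decrypted result column, the same subtraction can be reproduced on the plaintext data. This is a minimal plaintext sketch, not the encrypted computation itself; the function name and the two sample rows are made up.

```python
def subtract_columns(rows, col1, col2):
    """Return col1 - col2 for every row (rows are dicts of numeric values)."""
    return [float(r[col1]) - float(r[col2]) for r in rows]


# Made-up values; a negative result means the loan exceeds the assets.
customers = [
    {"asset": 500.0, "loan": 120.0},
    {"asset": 80.0, "loan": 100.0},
]
net_assets = subtract_columns(customers, "asset", "loan")
```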

    Example 3. Average - average loan amount per person

    Let us run an average computation on the loan amount column to obtain the average loan amount per person.

    The following describes how to calculate the average value of the loan amount column in the example data.

    1. From the NAVER Cloud Platform's HEaaN console, enter the settings as below and create a job.

      heaan-example_extask3_vpc_en.png

      • Type: Statistical analysis
      • Computation name: Average
      • Computation data: preprocessed data uploaded
      • Target column: loan
      • Decrypt result data (appears after adding the computation): enable
    2. Run the job, and check the decryption result after the computation is finished.

      • The average of loan amounts per person appears in the newly created result column.

      heaan-example_extask3result_vpc_en.png

    Example 4. Standard deviation - standard deviation of annual income

    Let us run a standard deviation computation on the annual income column to obtain the standard deviation of annual income.

    The following describes how to calculate the standard deviation of the annual income column in the example data.

    1. From the NAVER Cloud Platform's HEaaN console, enter the settings as below and create a job.

      heaan-example_extask4_vpc_en.png

      • Type: Statistical analysis
      • Computation name: Standard deviation
      • Computation data: preprocessed data uploaded
      • Target column: income
      • Decrypt result data (appears after adding the computation): enable
    2. Run the job, and check the decryption result after the computation is finished.

      • The annual income's standard deviation appears in the newly created result column. If annual income is roughly normally distributed, about 68% of the values fall within ±1 standard deviation of the average annual income, which can be obtained by running the average computation from example 3 with income as the target column.

      heaan-example_extask4_vpc_en.png
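
    The 68% figure is a property of the normal distribution, so how closely it holds depends on the data. The following plaintext sketch computes the mean, the standard deviation, and the actual share of values within one standard deviation of the mean. The source does not state whether the console computes the population or the sample standard deviation; the population variant is used here as an assumption, and the income values are made up.

```python
import statistics


def within_one_std(values):
    """Return (mean, std, share of values within one std of the mean)."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= std)
    return mean, std, inside / len(values)


incomes = [30.0, 40.0, 50.0, 60.0, 70.0]  # made-up values
mean, std, share = within_one_std(incomes)
```

    For a small or skewed data set the share can differ noticeably from 68%, which is one reason to look at the log-transformed columns as well.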

    Example 5. Average by category - various averages by customer type

    Let us use the average by category computation, which selects a category and averages the relevant values, to calculate the average monthly credit card bill amount by gender and the average annual income by age group.

    The following describes how to select a category from the example data and calculate the averages of the monthly credit card bill amount and the annual income.

    Caution

    When you create a job to compute an average by category, add the computation for each category as its own individual job. The average by category won't be calculated correctly if computations for multiple categories are added to a single job.



    Average of monthly credit card bill amount by gender

    1. From the NAVER Cloud Platform's HEaaN console, enter the settings as below and create a job.
      heaan-example_extask5-1_vpc_en.png

      • Type: Statistical analysis
      • Computation name: Average by category
      • Computation data: preprocessed data uploaded
      • Target column: card_expense
      • Category column: gender_encoded
      • Category value: 0, 1 (Add each as separate computation)
      • Decrypt result data (appears after adding the computation): enable
    2. Run the job, and check the decryption result after the computation is finished.

      • The average monthly credit card bill amount for each gender appears in its own newly created result column. You can see that, in general, credit card expenses are slightly larger for females than for males.

      heaan-example_extask5-1genderResult_vpc_en.png



    Average of annual income by age group

    1. From the NAVER Cloud Platform's HEaaN console, enter the settings as below and create a job.

      heaan-example_extask5-2_vpc_en.png

      • Type: Statistical analysis
      • Computation name: Average by category
      • Computation data: preprocessed data uploaded
      • Target column: income
      • Category column: age_encoded
      • Category value: 20, 30, 40, 50, 60
      • Decrypt result data (appears after adding the computation): enable
    2. Run the job, and check the decryption result after the computation is finished.

      • The average annual income for each age group appears in its own newly created result column. You can see that, in general, customers in their 60s have the lowest annual income.
        heaan-example_extask5-2_vpc_en.png
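
    For reference, the average by category computation corresponds to a group-by mean on the plaintext data. The following is a minimal plaintext sketch using the preprocessed column names; the function name and the sample rows are made up, and it says nothing about how the encrypted computation is implemented.

```python
from collections import defaultdict


def average_by_category(rows, target, category):
    """Return {category value: mean of target} over rows (list of dicts)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        sums[r[category]] += float(r[target])
        counts[r[category]] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# Made-up rows using the preprocessed column names.
rows = [
    {"age_encoded": 20, "income": 30.0},
    {"age_encoded": 20, "income": 50.0},
    {"age_encoded": 60, "income": 20.0},
]
income_by_age = average_by_category(rows, "income", "age_encoded")
```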

    Machine learning examples

    This section explains how to process and upload the example data and conduct machine learning through HEaaN Homomorphic Analytics.

    In these machine learning examples, we'll perform logistic regression learning, inference, and result decryption to predict each virtual customer's housing type, that is, whether they are renting or owning, based on their personal information.

    Machine learning data preparation example

    To perform machine learning, the data for learning and the data for inference need to be prepared separately. In this example, we'll split the preprocessed example data and use part of it as the data for learning and the rest as the data for inference.

    To predict the chance of each virtual customer renting or owning, reclassify the housing information in the learning data into two categories, renting and owning, and delete the housing type information from the data for inference.

    The following describes the process of preparing the data for logistic regression learning and for inference.

    Note

    For details on preprocessing of example data, see Plaintext data preprocessing example.

    1. Copy the 70 rows from the top in the preprocessed example data, and save them into a new .csv file.

      • It will be used as the data for learning.
    2. Copy the remaining 30 rows in the preprocessed example data, and save them into a new .csv file.

      • It will be used as the data for inference.
    3. Reclassify the categories in the learning data's housing_encoded column into binary format as shown below, then rename the column and move it to the rightmost position.

      heaan-example_MLdataprep1_vpc_en.png

      • New column name: my_house_encoded
      • 0, 1: convert the categories of Wolse (monthly rent) and Jeonse (2-year-lease) to 0, which indicates "renting"
      • 2: convert the owning category to 1, which indicates "owning"
    4. Delete the housing_encoded column in the data for inference.

    5. Upload the data for learning and for inference individually to the NAVER Cloud Platform console as shown below.

      • Data for learning: Select For machine learning, and enter "2" in the number of classes field.
      • Data for inference: Select For machine inference
      • When the data for learning is uploaded, two data sets are created: "_train", the data used for learning, and "_model", the initialized inference model data.
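
    Steps 1 to 4 above can be sketched in a few lines of Python. This is only an illustration on made-up rows, not an official tool: the function name is hypothetical, and the code takes 2 as the owning category, as stated in step 3.

```python
def split_and_binarize(rows, n_train=70, owning_code=2):
    """Split rows into learning/inference sets and binarize the housing label."""
    train = [dict(r) for r in rows[:n_train]]   # steps 1 and 2: copy the rows
    infer = [dict(r) for r in rows[n_train:]]
    for r in train:
        # Step 3: popping and re-adding puts the label in the rightmost column.
        housing = r.pop("housing_encoded")
        r["my_house_encoded"] = 1 if housing == owning_code else 0
    for r in infer:
        del r["housing_encoded"]                # step 4
    return train, infer


# 100 made-up rows with one feature column and the housing category.
rows = [{"housing_encoded": i % 3, "income": 100 + i} for i in range(100)]
train, infer = split_and_binarize(rows)
```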

    Example 1. Learning - logistic regression

    Use the learning data where the housing types are binary-classified into renting and owning to perform logistic regression learning in the NAVER Cloud Platform console.

    To perform the logistic regression learning, create and run the job in NAVER Cloud Platform's HEaaN console as below.

    heaan-example_ML1_vpc_en.png

    • Type: Machine learning
    • Computation name: Learning - logistic regression
    • Data for learning: "_train" data
    • Learning rate: 0.1
    • Number of learning epochs: 10
    • Mini batch: 1

    NAVER Cloud Platform's HEaaN will learn how each virtual customer's other personal information affects their housing type and reflect it in the "_model" inference model data.
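
    To give a feel for what these settings mean, the following is a plaintext sketch of logistic regression trained by stochastic gradient descent with the same learning rate (0.1), number of epochs (10), and mini-batch size (1). It is not HEaaN's encrypted implementation, whose update rule the source does not describe; the function names and the one-feature toy data are made up.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, learning_rate=0.1, epochs=10):
    """SGD with mini-batch size 1; returns (weights, bias)."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - learning_rate * error * v
                       for w, v in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict_proba(weights, bias, x):
    """Return (chance of class 0 "renting", chance of class 1 "owning")."""
    p_owning = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    return 1.0 - p_owning, p_owning


# Toy data: one feature, label 1 ("owning") for positive values.
features = [[-2.0], [-1.0], [1.0], [2.0]]
labels = [0, 0, 1, 1]
weights, bias = train_logistic(features, labels)
```

    With real data the feature columns have very different scales, which is one reason the log-transformed columns are useful as inputs.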

    Example 2. Inference - logistic regression

    Apply the inference model trained by logistic regression learning to the data for inference to perform the inference task.

    To perform logistic regression inference, create and run a job as shown below in the HEaaN console of NAVER Cloud Platform.

    heaan-example_ML2_vpc_en.png

    • Type: Machine learning
    • Computation name: Inference - logistic regression
    • Inference model: The learning-completed "_model" data
    • Data for inference: Uploaded data for inference

    NAVER Cloud Platform's HEaaN will infer the chance of each virtual customer in the data for inference renting or owning, and save the results in the "_predict" data. This result data can be decrypted through the machine learning decryption computation.

    Example 3. Inference result decryption and interpretation

    Decrypt the inference-completed data and interpret it.

    The following describes how to decrypt and interpret the inference result data.

    1. From the NAVER Cloud Platform's HEaaN console, enter the settings as below and create a job.

      heaan-example_ML3_vpc_en.png

      • Type: Machine learning
      • Computation name: Decryption
      • Computation data: Inference result data
    2. Run the job, and check the decryption result.

      • The chances of the customer's housing type being renting or owning, inferred from various personal information including the monthly credit card bill amount, annual income, credit ranking, and personal asset, are shown in columns 0 and 1 respectively.

      heaan-example_ML3result_vpc_en.png
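
    One simple way to turn the two decrypted columns into a prediction is to pick the class with the larger chance. The sketch below assumes that column 0 holds the chance of renting and column 1 the chance of owning, as described above; the function name and the values are made up.

```python
def predicted_label(p_renting, p_owning):
    """Return "owning" if column 1 holds the larger value, else "renting"."""
    return "owning" if p_owning > p_renting else "renting"


# Made-up (column 0, column 1) pairs standing in for decrypted rows.
decrypted = [(0.81, 0.19), (0.35, 0.65)]
predictions = [predicted_label(p0, p1) for p0, p1 in decrypted]
```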

    Calculation time

    The time taken for each calculation is shown below.

    • Generating a bootstrappable key: 37 secs (6.6 GB)

    • Encrypting data: 6.27 secs (8 KB -> 654.3 MB, preprocessed data)

    • Statistical analysis: 141.81 secs

      • Statistics computation

        Calculation                  Data           Execution time (sec)
        Delete column                Branch         1.08
        Subtraction between columns  Asset-loan     6.34
        Column mean                  Loan           15.37
        Standard deviation           Annual Salary  29.25
      • Categorical calculation

        Calculation          Data                    Execution time (sec)
        Average by category  Card spending - Female  12.00
        Average by category  Card spending - Male    9.63
        Average by category  Card spending - 20s     13.62
        Average by category  Card spending - 30s     13.51
        Average by category  Card spending - 40s     13.86
        Average by category  Card spending - 50s     13.45
        Average by category  Card spending - 60s     13.70
    • Machine learning (logistic regression): 89 secs

      Calculation                 Execution time (sec)
      Learning                    74
      Inference                   15
      Decoding inference results  1
