
Using Google Cloud services with API_CALL

Guide objective

This guide explains how to interact with Google Cloud services using the Agile Data Engine load language API_CALL.


Examples

Invoking Google Cloud Function

Required IAM Role for ADE Service Account: roles/run.invoker

In this example, a Google Cloud Function is invoked with an API call. For demonstration purposes, the Cloud Function serves as a simple file extractor that exports a table into a single file; a hypothetical sketch of such a function is shown after the load step below.

  • For the Cloud Functions API, <bearer_id_token_from_bq_service_account> is used for authentication.

  • The content block contains a JSON payload to be sent to the Cloud Function.

  • Variables used: <gcp_project_id>, <bearer_id_token_from_bq_service_account>, <target_schema>, <target_entity_name>

  1. LOAD STEP NAME: invoke_cloud_function

YAML
type: HTTP
request:
  url: https://europe-west1-<gcp_project_id>.cloudfunctions.net/gcf-file-exporter-dev
  method: POST
  headers:
    Authorization: <bearer_id_token_from_bq_service_account>
  content: |
    {
      "calls": [
        ["exports/test_data", "<target_schema>.<target_entity_name>"]
      ]
    }
retries:
  total: 4
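
For context, here is a minimal sketch of what such a file-exporter function might look like on the receiving end, assuming the payload follows the BigQuery remote function convention ("calls" as a list of argument lists). The bucket placeholder, framework choice, and file format are assumptions for illustration, not part of ADE:

PYTHON
# Hypothetical receiving function (not part of ADE): exports each requested
# table into a single CSV file in Cloud Storage using a BigQuery extract job.
import functions_framework
from google.cloud import bigquery


@functions_framework.http
def gcf_file_exporter(request):
    payload = request.get_json()
    client = bigquery.Client()
    replies = []
    for destination_prefix, table in payload["calls"]:
        # e.g. destination_prefix = "exports/test_data",
        #      table = "<target_schema>.<target_entity_name>"
        uri = f"gs://<gcp_bucket_name>/{destination_prefix}.csv"
        client.extract_table(table, uri).result()  # waits for the extract job
        replies.append(uri)
    return {"replies": replies}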

Invoking Dataplex Data Profiling

Required IAM Role for ADE Service Account: roles/dataplex.dataScanEditor

In this example, a Dataplex data profiling data scan is created and executed. Data profiling is added as a separate load step to an existing entity. When added to a separate load, data profiling can be scheduled independently from the orchestration of data transformations.

  • Create a data scan with API_CALL.

  • If the data scan already exists (HTTP code 409), it is considered a success.

  • The target dataset (schema) monitor must be created beforehand; the results table DATA_PROFILE_RESULTS is created automatically by Dataplex.

  • The data scan is named pub-f-sales-scan. This name must be unique within a Google Cloud project.

  • Variables used: <gcp_project_id>, <bearer_access_token_from_bq_service_account>, <target_schema>, <target_entity_name>

  1. LOAD STEP NAME: create_data_profile_datascan_if_not_exists

YAML
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans?dataScanId=pub-f-sales-scan
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  content: |
      {
          "type": "DATA_PROFILE",
          "description": "Data profile scan for <target_schema>.<target_entity_name>",
          "data": {
              "resource": "//bigquery.googleapis.com/projects/<gcp_project_id>/datasets/<target_schema>/tables/<target_entity_name>"
          },
          "dataProfileSpec": {
              "samplingPercent": 100,
              "postScanActions": {
                "bigqueryExport": {
                  "resultsTable": "projects/<gcp_project_id>/datasets/monitor/tables/DATA_PROFILE_RESULTS"
                }
              }
          }
      }
retries:
  total: 3
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success if 200, 201, or 409 (already exists) is returned.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
            - 409
  2. LOAD STEP NAME: run_data_profile_datascan

YAML
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans/pub-f-sales-scan:run
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
retries:
  total: 3
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [400, 401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success in case of 200 or 201.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
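
Note that the :run request only triggers the scan asynchronously and responds with a DataScanJob; also, unlike Cloud Functions, Dataplex expects an OAuth access token rather than an ID token. If the scan outcome needs to be verified outside ADE, a minimal Python sketch along these lines could poll the returned job (the polling interval is arbitrary and the project id is a placeholder):

PYTHON
# Sketch: trigger the scan and poll the returned DataScanJob until it finishes.
import time

import google.auth
import requests
from google.auth.transport.requests import Request

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())
headers = {"Authorization": f"Bearer {credentials.token}"}

scan = "projects/<gcp_project_id>/locations/europe-west1/dataScans/pub-f-sales-scan"
run = requests.post(f"https://dataplex.googleapis.com/v1/{scan}:run", headers=headers)
run.raise_for_status()
job_name = run.json()["job"]["name"]

while True:
    job = requests.get(f"https://dataplex.googleapis.com/v1/{job_name}", headers=headers).json()
    if job["state"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(10)  # polling interval is an arbitrary choice
print(job["state"])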

Invoking Dataplex Data Quality Scan

Required IAM Role for ADE Service Account: roles/dataplex.dataScanEditor

In this example, a Dataplex data quality scan is created and executed. As in the previous data profiling example, the data quality scan is added as a separate load step to an existing entity.

  • Create a data scan with API_CALL.

  • If the data scan already exists (HTTP code 409), it is considered a success.

  • The target dataset (schema) monitor must be created beforehand; the results table DATA_QUALITY_RESULTS is created automatically by Dataplex.

  • The data scan is named pub-f-sales-dq-scan. This name must be unique within a Google Cloud project.

  • Variables used: <gcp_project_id>, <bearer_access_token_from_bq_service_account>, <target_schema>, <target_entity_name>

  1. LOAD STEP NAME: create_data_quality_scan_if_not_exists

YAML
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans?dataScanId=pub-f-sales-dq-scan
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  content: |
      {
          "type": "DATA_QUALITY",
          "description": "Data quality scan for <target_schema>.<target_entity_name>",
          "data": {
              "resource": "//bigquery.googleapis.com/projects/<gcp_project_id>/datasets/<target_schema>/tables/<target_entity_name>"
          },
          "dataQualitySpec": {
            "rules": [
              {
                "column": "sales_id",
                "dimension": "UNIQUENESS",
                "uniquenessExpectation": {},
                "name": "unique-sales-id",
                "description": "Each sale should have a unique ID."
              },
              {
                "column": "quantity",
                "dimension": "VALIDITY",
                "rangeExpectation": {
                  "minValue": "1"
                },
                "name": "positive-quantity",
                "description": "Quantity must be at least 1."
              },
              {
                "column": "price",
                "dimension": "VALIDITY",
                "rangeExpectation": {
                  "minValue": "0"
                },
                "name": "non-negative-price",
                "description": "Price must be non-negative."
              }
            ],
            "postScanActions": {
              "bigqueryExport": {
                "resultsTable": "projects/<gcp_project_id>/datasets/monitor/tables/DATA_QUALITY_RESULTS"
              }
            }
          }
      }
retries:
  total: 3
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success if 200, 201, or 409 (already exists) is returned.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
            - 409
  2. LOAD STEP NAME: run_data_quality_datascan

YAML
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans/pub-f-sales-dq-scan:run
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
retries:
  total: 3
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [400, 401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success in case of 200 or 201.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
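
The response transformation in the create step above treats HTTP 409 as success so that the step stays idempotent across reruns. For reference, the same create-if-not-exists logic can be expressed outside ADE as a minimal Python sketch (the payload file name and project id are placeholders; the payload is the DATA_QUALITY body shown above):

PYTHON
# Sketch: create the scan, treating "already exists" (HTTP 409) as success.
import json

import google.auth
import requests
from google.auth.transport.requests import Request

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())

with open("pub_f_sales_dq_scan.json") as f:  # the DATA_QUALITY payload shown above
    body = json.load(f)

response = requests.post(
    "https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans",
    params={"dataScanId": "pub-f-sales-dq-scan"},
    headers={"Authorization": f"Bearer {credentials.token}"},
    json=body,
)
if response.status_code in (200, 201, 409):  # 409: the scan already exists
    print("scan created or already present")
else:
    response.raise_for_status()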

Invoking Sensitive Data Protection Inspection Jobs

Required IAM Role for ADE Service Account: roles/dlp.jobsEditor

In this example, a Google Sensitive Data Protection (Cloud DLP) inspection job is created. The Cloud Data Loss Prevention API (DLP API) can be used to inspect sensitive data elements in Google Cloud Storage or BigQuery tables.

In this example, three different infoTypes are used to scan a Cloud Storage bucket recursively.

  • An inspection job configuration is created with API_CALL.

  • API calls are made in the europe-west1 region. Please note that the URLs differ depending on the selected region.

  • Variables used: <gcp_project_id>, <bearer_access_token_from_bq_service_account>, <gcp_bucket_name>

  1. LOAD STEP NAME: dlp_api_call

YAML
type: HTTP
request:
  url: https://dlp.europe-west1.rep.googleapis.com/v2/projects/<gcp_project_id>/locations/europe-west1/dlpJobs
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  content: |
    {
      "inspectJob": {
        "inspectConfig": {
          "infoTypes": [
            {
              "name": "PERSON_NAME"
            },
            {
              "name": "EMAIL_ADDRESS"
            },
            {
              "name": "CREDIT_CARD_NUMBER"
            }
          ],
          "limits": {},
          "includeQuote": true
        },
        "storageConfig": {
          "cloudStorageOptions": {
            "fileSet": {
              "url": "gs://<gcp_bucket_name>/**"
            },
            "fileTypes": [
              "FILE_TYPE_UNSPECIFIED"
            ],
            "filesLimitPercent": 50
          }
        }
      }
    }
retries:
  total: 3
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [401, 404, 429, 500, 502, 503, 504]
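
The create call responds with the DlpJob resource, including its name and state. If job progress or finding counts need to be checked outside ADE, a minimal Python sketch could read the job back through the same regional endpoint (the job id below is a placeholder taken from the create response):

PYTHON
# Sketch: fetch the DLP job and print per-infoType finding counts.
import google.auth
import requests
from google.auth.transport.requests import Request

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())
headers = {"Authorization": f"Bearer {credentials.token}"}

job_name = "projects/<gcp_project_id>/locations/europe-west1/dlpJobs/<dlp_job_id>"
job = requests.get(f"https://dlp.europe-west1.rep.googleapis.com/v2/{job_name}", headers=headers).json()
print(job["state"])  # e.g. PENDING, RUNNING, DONE
for stat in job.get("inspectDetails", {}).get("result", {}).get("infoTypeStats", []):
    print(stat["infoType"]["name"], stat.get("count", "0"))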
