Using Google Cloud services with API_CALL

Guide objective

This guide explains how to interact with Google Cloud services using the Agile Data Engine load language API_CALL.



Examples

Invoking Google Cloud Function

Required IAM Role for ADE Service Account: roles/run.invoker

In this example, a Google Cloud Function is invoked using an API call. For demonstration purposes, the Cloud Function serves as a simple file extractor that exports a table into a single file.

  • Invoking a Cloud Function requires an ID token, so the variable <bearer_id_token_from_bq_service_account> is used for authentication.

  • The content block contains a JSON payload to be sent to the Cloud Function.

  • Variables used: <gcp_project_id>, <bearer_id_token_from_bq_service_account>, <target_schema>, <target_entity_name>

YAML
type: HTTP
request:
  url: https://europe-west1-<gcp_project_id>.cloudfunctions.net/gcf-file-exporter-dev
  method: POST
  headers:
    Authorization: <bearer_id_token_from_bq_service_account>
  content: |
    {
      "calls": [
        ["exports/test_data", "<target_schema>.<target_entity_name>"]
      ]
    }
retries:
  total: 4
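
The payload shape above (a calls array holding pairs of export path and table) can be produced and unpacked as in the following Python sketch. The helper names build_export_payload and parse_export_calls are hypothetical, the values pub and f_sales are assumed examples, and the meaning of the two elements in each call is inferred from the example values; the actual Cloud Function is not shown in this guide.

```python
import json


def build_export_payload(export_path: str, schema: str, entity: str) -> str:
    """Build the JSON body sent in the content block (hypothetical helper)."""
    return json.dumps({"calls": [[export_path, f"{schema}.{entity}"]]})


def parse_export_calls(body: str) -> list[tuple[str, str]]:
    """Unpack each call into (export_path, table); the element meaning is assumed."""
    calls = json.loads(body).get("calls", [])
    parsed = []
    for call in calls:
        if len(call) != 2:
            raise ValueError(f"expected [export_path, table], got {call!r}")
        parsed.append((call[0], call[1]))
    return parsed


# Assumed example values standing in for <target_schema> and <target_entity_name>.
payload = build_export_payload("exports/test_data", "pub", "f_sales")
print(parse_export_calls(payload))
```

Validating the pair length on the receiving side keeps a malformed call from being silently exported to the wrong path.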

Invoking Dataplex Data Profiling

Required IAM Role for ADE Service Account: roles/dataplex.dataScanEditor

In this example, a Dataplex data profiling data scan is created and executed. Data profiling is added as a separate load step to an existing entity. When added to a separate load, data profiling can be scheduled independently from the orchestration of data transformations.

  • Create a data scan with API_CALL.

  • If the data scan already exists (HTTP code 409), it is considered a success.

  • The target dataset (schema) named monitor must be created beforehand; Dataplex creates the results table DATA_PROFILE_RESULTS automatically.

  • The data scan is named pub-f-sales-scan. This name must be unique within the Google Cloud project and location.

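  • Variables used: <gcp_project_id>, <bearer_access_token_from_bq_service_account>, <target_schema>, <target_entity_name>, <http_status_code>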

  1. LOAD STEP NAME: create_data_profile_datascan_if_not_exists

YAML
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans?dataScanId=pub-f-sales-scan
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  content: |
      {
          "type": "DATA_PROFILE",
          "description": "Data profile scan for <target_schema>.<target_entity_name>",
          "data": {
              "resource": "//bigquery.googleapis.com/projects/<gcp_project_id>/datasets/<target_schema>/tables/<target_entity_name>"
          },
          "dataProfileSpec": {
              "samplingPercent": 100,
              "postScanActions": {
                "bigqueryExport": {
                  "resultsTable": "projects/<gcp_project_id>/datasets/monitor/tables/DATA_PROFILE_RESULTS"
                }
              }
          }
      }
retries:
  total: 3
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Treat 200, 201, or 409 (already exists) as success.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
            - 409

  2. LOAD STEP NAME: run_data_profile_datascan

YAML
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans/pub-f-sales-scan:run
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
retries:
  total: 3
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [400, 401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success in case of 200 or 201.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
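
The retry rules and response transformations in the two steps above can be read as a decision on the HTTP status code. The Python sketch below models that reading. It is an interpretation of the configuration for illustration, not Agile Data Engine's implementation: the function name classify_status is hypothetical, and treating codes that appear in neither list as failures is an assumption.

```python
# Status code sets copied from the two load steps above.
CREATE_SUCCESS = {200, 201, 409}
CREATE_RETRY = {401, 404, 429, 500, 502, 503, 504}
RUN_SUCCESS = {200, 201}
RUN_RETRY = {400, 401, 404, 429, 500, 502, 503, 504}


def classify_status(status: int, success: set[int], retry: set[int],
                    attempts_left: int) -> str:
    """Return 'success', 'retry', or 'failure' for one HTTP response."""
    if status in success:
        return "success"
    if status in retry and attempts_left > 0:
        return "retry"
    # Assumption: anything else, or a retryable code with no attempts left, fails.
    return "failure"


print(classify_status(409, CREATE_SUCCESS, CREATE_RETRY, attempts_left=3))  # success
print(classify_status(409, RUN_SUCCESS, RUN_RETRY, attempts_left=3))        # failure
print(classify_status(400, RUN_SUCCESS, RUN_RETRY, attempts_left=3))        # retry
print(classify_status(503, RUN_SUCCESS, RUN_RETRY, attempts_left=0))        # failure
```

The same 409 response is a success for the create step, because the scan already exists, but not for the run step, where it appears in neither list.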

Invoking Dataplex Data Quality scan

Required IAM Role for ADE Service Account: roles/dataplex.dataScanEditor

In this example, a Dataplex data quality scan is created and executed. As in the previous data profiling example, data quality is added as a separate load step to an existing entity.

  • Create a data scan with API_CALL.

  • If the data scan already exists (HTTP code 409), it is considered a success.

  • The target dataset (schema) named monitor must be created beforehand; Dataplex creates the results table DATA_QUALITY_RESULTS automatically.

  • The data scan is named pub-f-sales-dq-scan. This name must be unique within the Google Cloud project and location.

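  • Variables used: <gcp_project_id>, <bearer_access_token_from_bq_service_account>, <target_schema>, <target_entity_name>, <http_status_code>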

  1. LOAD STEP NAME: create_data_quality_scan_if_not_exists

YAML
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans?dataScanId=pub-f-sales-dq-scan
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  content: |
      {
          "type": "DATA_QUALITY",
          "description": "Data quality scan for <target_schema>.<target_entity_name>",
          "data": {
              "resource": "//bigquery.googleapis.com/projects/<gcp_project_id>/datasets/<target_schema>/tables/<target_entity_name>"
          },
          "dataQualitySpec": {
            "rules": [
              {
                "column": "sales_id",
                "dimension": "UNIQUENESS",
                "uniquenessExpectation": {},
                "name": "unique-sales-id",
                "description": "Each sale should have a unique ID."
              },
              {
                "column": "quantity",
                "dimension": "VALIDITY",
                "rangeExpectation": {
                  "minValue": "1"
                },
                "name": "positive-quantity",
                "description": "Quantity must be at least 1."
              },
              {
                "column": "price",
                "dimension": "VALIDITY",
                "rangeExpectation": {
                  "minValue": "0"
                },
                "name": "non-negative-price",
                "description": "Price must be non-negative."
              }
            ],
            "postScanActions": {
              "bigqueryExport": {
                "resultsTable": "projects/<gcp_project_id>/datasets/monitor/tables/DATA_QUALITY_RESULTS"
              }
            }
          }
      }
retries:
  total: 3
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Treat 200, 201, or 409 (already exists) as success.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
            - 409

  2. LOAD STEP NAME: run_data_quality_datascan

YAML
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans/pub-f-sales-dq-scan:run
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
retries:
  total: 3
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [400, 401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success in case of 200 or 201.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
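
Because the rule list in dataQualitySpec grows with each check, it can be convenient to generate the JSON for the content block rather than edit it by hand. The Python sketch below rebuilds the three rules from the create step above. The helper names uniqueness_rule and min_value_rule are hypothetical, and only fields that already appear in the example payload are used.

```python
import json


def uniqueness_rule(column: str, name: str, description: str) -> dict:
    """Rule asserting that every value in the column is unique."""
    return {
        "column": column,
        "dimension": "UNIQUENESS",
        "uniquenessExpectation": {},
        "name": name,
        "description": description,
    }


def min_value_rule(column: str, min_value: int, name: str, description: str) -> dict:
    """Rule asserting a lower bound on the column's values."""
    return {
        "column": column,
        "dimension": "VALIDITY",
        # The example payload passes minValue as a string, so convert here.
        "rangeExpectation": {"minValue": str(min_value)},
        "name": name,
        "description": description,
    }


rules = [
    uniqueness_rule("sales_id", "unique-sales-id", "Each sale should have a unique ID."),
    min_value_rule("quantity", 1, "positive-quantity", "Quantity must be at least 1."),
    min_value_rule("price", 0, "non-negative-price", "Price must be non-negative."),
]

# Print the fragment to paste into the dataQualitySpec of the content block.
print(json.dumps({"dataQualitySpec": {"rules": rules}}, indent=2))
```

The printed fragment covers only the rules; the postScanActions block from the example would still be added alongside them.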