Using Google Cloud services with API_CALL
Guide objective
This guide explains how to interact with Google Cloud services using the Agile Data Engine load language API_CALL.
Examples
Invoking Google Cloud Function
Required IAM Role for ADE Service Account: roles/run.invoker
In this example, a Google Cloud Function is invoked using an API call. For demonstration purposes, the Cloud Function serves as a simple file extractor that exports a table into a single file.
For the Cloud Functions API, the bearer_id_token_from_bq_service_account variable is used for authentication.
The content block contains a JSON payload to be sent to the Cloud Function.
Variables used:
gcp_project_id: Defined beforehand. For more information, refer to CONFIG_ENVIRONMENT_VARIABLES.
target_schema: For more information, refer to target_schema.
target_entity_name: For more information, refer to target_entity_name.
LOAD STEP NAME: invoke_cloud_function
type: HTTP
request:
  url: https://europe-west1-<gcp_project_id>.cloudfunctions.net/gcf-file-exporter-dev
  method: POST
  headers:
    Authorization: <bearer_id_token_from_bq_service_account>
  content: |
    {
      "calls": [
        ["exports/test_data", "<target_schema>.<target_entity_name>"]
      ]
    }
  retries:
    total: 4
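For testing the function outside ADE, it can help to see the load step as a plain HTTP request. The following is a minimal sketch, not ADE's implementation: the helper name build_invoke_request and the sample values are hypothetical, and it assumes the token variable expands to a complete Authorization header value. The request is only built, never sent.

```python
import json
import urllib.request


def build_invoke_request(project_id, schema, entity, bearer_token):
    """Build (but do not send) a POST request mirroring the load step above."""
    url = f"https://europe-west1-{project_id}.cloudfunctions.net/gcf-file-exporter-dev"
    # Same JSON body as the content block, with the variables filled in.
    payload = {"calls": [["exports/test_data", f"{schema}.{entity}"]]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": bearer_token},
        method="POST",
    )


# Sample values only; replace them with your own project, schema and entity.
req = build_invoke_request("my-project", "pub", "f_sales", "Bearer <id-token>")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Printing the request before sending it makes it easy to compare the resolved URL and body with what the load step is expected to produce.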
Invoking Dataplex Data Profiling
Required IAM Role for ADE Service Account: roles/dataplex.dataScanEditor
In this example, a Dataplex data profiling data scan is created and executed. Data profiling is added as a separate load step to an existing entity. When added to a separate load, data profiling can be scheduled independently from the orchestration of data transformations.
Create a data scan with API_CALL.
If the data scan already exists (HTTP code 409), it is considered a success.
The target dataset/schema monitor is created beforehand, but the results table DATA_PROFILE_RESULTS is automatically created by Dataplex.
The data scan is named pub-f-sales-scan. This name must be unique within a Google Cloud project.
Variables used:
gcp_project_id: Defined beforehand. For more information, refer to CONFIG_ENVIRONMENT_VARIABLES.
target_schema: For more information, refer to target_schema.
target_entity_name: For more information, refer to target_entity_name.
LOAD STEP NAME: create_data_profile_datascan_if_not_exists
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans?dataScanId=pub-f-sales-scan
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  content: |
    {
      "type": "DATA_PROFILE",
      "description": "Data profile scan for <target_schema>.<target_entity_name>",
      "data": {
        "resource": "//bigquery.googleapis.com/projects/<gcp_project_id>/datasets/<target_schema>/tables/<target_entity_name>"
      },
      "dataProfileSpec": {
        "samplingPercent": 100,
        "postScanActions": {
          "bigqueryExport": {
            "resultsTable": "projects/<gcp_project_id>/datasets/monitor/tables/DATA_PROFILE_RESULTS"
          }
        }
      }
    }
  retries:
    total: 3
    rules:
      - conditions:
          - type: VALUE_MATCHER
            source: <http_status_code>
            values: [401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success if 200, 201 or 409 (already exists) is returned.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
            - 409
LOAD STEP NAME: run_data_profile_datascan
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans/pub-f-sales-scan:run
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  retries:
    total: 3
    rules:
      - conditions:
          - type: VALUE_MATCHER
            source: <http_status_code>
            values: [400, 401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success in case of 200 or 201.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
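The two load steps above list different status codes for retries and for success. The following is a simplified model of how those lists partition the possible responses, not ADE's actual evaluation logic: the function classify is hypothetical, and both the precedence of success over retry and the handling of unlisted codes as failures are assumptions. What happens once retries are exhausted is not modeled.

```python
# Status codes copied from the create and run load steps above.
CREATE_RETRY = {401, 404, 429, 500, 502, 503, 504}
CREATE_SUCCESS = {200, 201, 409}
RUN_RETRY = {400, 401, 404, 429, 500, 502, 503, 504}
RUN_SUCCESS = {200, 201}


def classify(status, retry_codes, success_codes):
    """Return 'success', 'retry' or 'failure' for one HTTP status code."""
    if status in success_codes:
        return "success"
    if status in retry_codes:
        return "retry"
    # Assumption: codes in neither list count as failures.
    return "failure"


for code in (200, 400, 409, 503):
    print(
        code,
        "create:", classify(code, CREATE_RETRY, CREATE_SUCCESS),
        "run:", classify(code, RUN_RETRY, RUN_SUCCESS),
    )
```

The notable differences are 409, which counts as success only when creating the scan, and 400, which is retried only when running it.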
Invoking Dataplex Data Quality scan
Required IAM Role for ADE Service Account: roles/dataplex.dataScanEditor
In this example, a Dataplex data quality scan is created and executed. Data quality is added as a separate load step to an existing entity, just like the previous example with data profiling.
Create a data scan with API_CALL.
If the data scan already exists (HTTP code 409), it is considered a success.
The target dataset/schema monitor is created beforehand, but the results table DATA_QUALITY_RESULTS is automatically created by Dataplex.
The data scan is named pub-f-sales-dq-scan. This name must be unique within a Google Cloud project.
Variables used:
gcp_project_id: Defined beforehand. For more information, refer to CONFIG_ENVIRONMENT_VARIABLES.
target_schema: For more information, refer to target_schema.
target_entity_name: For more information, refer to target_entity_name.
LOAD STEP NAME: create_data_quality_scan_if_not_exists
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans?dataScanId=pub-f-sales-dq-scan
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  content: |
    {
      "type": "DATA_QUALITY",
      "description": "Data quality scan for <target_schema>.<target_entity_name>",
      "data": {
        "resource": "//bigquery.googleapis.com/projects/<gcp_project_id>/datasets/<target_schema>/tables/<target_entity_name>"
      },
      "dataQualitySpec": {
        "rules": [
          {
            "column": "sales_id",
            "dimension": "UNIQUENESS",
            "uniquenessExpectation": {},
            "name": "unique-sales-id",
            "description": "Each sale should have a unique ID."
          },
          {
            "column": "quantity",
            "dimension": "VALIDITY",
            "rangeExpectation": {
              "minValue": "1"
            },
            "name": "positive-quantity",
            "description": "Quantity must be at least 1."
          },
          {
            "column": "price",
            "dimension": "VALIDITY",
            "rangeExpectation": {
              "minValue": "0"
            },
            "name": "non-negative-price",
            "description": "Price must be non-negative."
          }
        ],
        "postScanActions": {
          "bigqueryExport": {
            "resultsTable": "projects/<gcp_project_id>/datasets/monitor/tables/DATA_QUALITY_RESULTS"
          }
        }
      }
    }
  retries:
    total: 3
    rules:
      - conditions:
          - type: VALUE_MATCHER
            source: <http_status_code>
            values: [401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success if 200, 201 or 409 (already exists) is returned.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
            - 409
LOAD STEP NAME: run_data_quality_datascan
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans/pub-f-sales-dq-scan:run
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  retries:
    total: 3
    rules:
      - conditions:
          - type: VALUE_MATCHER
            source: <http_status_code>
            values: [400, 401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success in case of 200 or 201.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
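Because the content block is JSON embedded in a load step, a stray comma or an unresolved variable is easy to miss. The following is a minimal offline check, a sketch that does not reflect how ADE itself resolves variables: the helpers render and check_content, the shortened template and the sample values are all hypothetical. It replaces each <variable> placeholder with a sample value and then parses the result as JSON.

```python
import json
import re

# Shortened copy of the data quality body above, for illustration only.
TEMPLATE = """
{
  "type": "DATA_QUALITY",
  "data": {
    "resource": "//bigquery.googleapis.com/projects/<gcp_project_id>/datasets/<target_schema>/tables/<target_entity_name>"
  },
  "dataQualitySpec": {
    "rules": [
      {"column": "sales_id", "dimension": "UNIQUENESS", "uniquenessExpectation": {}},
      {"column": "quantity", "dimension": "VALIDITY", "rangeExpectation": {"minValue": "1"}}
    ]
  }
}
"""


def render(template, variables):
    """Replace every <name> placeholder; fail loudly if a value is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no sample value for <{name}>")
        return variables[name]

    return re.sub(r"<(\w+)>", substitute, template)


def check_content(template, variables):
    """Render the template and parse it, raising an error on invalid JSON."""
    return json.loads(render(template, variables))


# Sample values only; ADE supplies the real ones at run time.
SAMPLE = {
    "gcp_project_id": "my-project",
    "target_schema": "pub",
    "target_entity_name": "f_sales",
}
body = check_content(TEMPLATE, SAMPLE)
print(body["data"]["resource"])
print([rule["column"] for rule in body["dataQualitySpec"]["rules"]])
```

Running the check before deploying the load step catches malformed JSON locally instead of through a failed API call.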
Invoking Sensitive Data Protection Inspection Jobs
Required IAM Role for ADE Service Account: roles/dlp.jobsEditor
In this example, a Google Sensitive Data Protection (Cloud DLP) inspection job is created. The Cloud Data Loss Prevention API (DLP API) can be used to find sensitive data elements in Cloud Storage buckets or BigQuery tables.
Three different infoTypes are used to scan a Cloud Storage bucket recursively.
An inspection job configuration is created with API_CALL.
API calls are made in the europe-west1 region. Note that the URLs differ depending on the selected region.
Variables used:
gcp_project_id: Defined beforehand. For more information, refer to CONFIG_ENVIRONMENT_VARIABLES.
gcp_bucket_name: Defined beforehand. For more information, refer to CONFIG_ENVIRONMENT_VARIABLES.
LOAD STEP NAME: dlp_api_call
type: HTTP
request:
  url: https://dlp.europe-west1.rep.googleapis.com/v2/projects/<gcp_project_id>/locations/europe-west1/dlpJobs
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  content: |
    {
      "inspectJob": {
        "inspectConfig": {
          "infoTypes": [
            {
              "name": "PERSON_NAME"
            },
            {
              "name": "EMAIL_ADDRESS"
            },
            {
              "name": "CREDIT_CARD_NUMBER"
            }
          ],
          "limits": {},
          "includeQuote": true
        },
        "storageConfig": {
          "cloudStorageOptions": {
            "fileSet": {
              "url": "gs://<gcp_bucket_name>/**"
            },
            "fileTypes": [
              "FILE_TYPE_UNSPECIFIED"
            ],
            "filesLimitPercent": 50
          }
        }
      }
    }
  retries:
    total: 3
    rules:
      - conditions:
          - type: VALUE_MATCHER
            source: <http_status_code>
            values: [401, 404, 429, 500, 502, 503, 504]
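Since the URL and the request body both depend on the region, bucket and infoTypes, it can be convenient to generate them from parameters when preparing load steps for several environments. The following is a sketch under stated assumptions: the helpers dlp_jobs_url and build_inspect_job are hypothetical, and the URL pattern is extrapolated from the europe-west1 example above, so check Google's documentation to confirm that a regional endpoint exists for the region you need.

```python
import json


def dlp_jobs_url(project_id, region):
    """Build the dlpJobs URL, assuming the europe-west1 pattern applies to other regions."""
    return (
        f"https://dlp.{region}.rep.googleapis.com"
        f"/v2/projects/{project_id}/locations/{region}/dlpJobs"
    )


def build_inspect_job(bucket, info_types, files_limit_percent=50):
    """Build the same inspectJob body as the load step above."""
    return {
        "inspectJob": {
            "inspectConfig": {
                "infoTypes": [{"name": name} for name in info_types],
                "limits": {},
                "includeQuote": True,
            },
            "storageConfig": {
                "cloudStorageOptions": {
                    # The trailing /** makes the scan recursive.
                    "fileSet": {"url": f"gs://{bucket}/**"},
                    "fileTypes": ["FILE_TYPE_UNSPECIFIED"],
                    "filesLimitPercent": files_limit_percent,
                }
            },
        }
    }


# Sample project and bucket names only.
print(dlp_jobs_url("my-project", "europe-west1"))
job = build_inspect_job("my-bucket", ["PERSON_NAME", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER"])
print(json.dumps(job, indent=2))
```

The printed JSON can be pasted into the content block, with the sample names replaced by the <gcp_project_id> and <gcp_bucket_name> variables.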