Using Google Cloud services with API_CALL
Guide objective
This guide explains how to interact with Google Cloud services using the Agile Data Engine load language API_CALL.
Examples
Invoking Google Cloud Function
Required IAM Role for ADE Service Account: roles/run.invoker
In this example, a Google Cloud Function is invoked using an API call. For demonstration purposes, the Cloud Function serves as a simple file extractor that exports a table into a single file.
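For the Cloud Functions API, the bearer_id_token_from_bq_service_account variable is used for authentication, because a Cloud Function that requires authenticated invocation expects a Google-signed ID token rather than an OAuth 2.0 access token. The content block contains the JSON payload sent to the Cloud Function.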
Variables used:
gcp_project_id: Defined beforehand. For more information, refer to CONFIG_ENVIRONMENT_VARIABLES.
target_schema: For more information, refer to target_schema.
target_entity_name: For more information, refer to target_entity_name.
type: HTTP
request:
  url: https://europe-west1-<gcp_project_id>.cloudfunctions.net/gcf-file-exporter-dev
  method: POST
  headers:
    Authorization: <bearer_id_token_from_bq_service_account>
  content: |
    {
      "calls": [
        ["exports/test_data", "<target_schema>.<target_entity_name>"]
      ]
    }
  retries:
    total: 4
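To see what this load step sends, the request can be reproduced locally. The following Python sketch substitutes the placeholders and builds, but does not send, an equivalent request using only the standard library. The sample values, the render and build_export_request helpers, and the assumption that <bearer_id_token_from_bq_service_account> expands to a complete "Bearer <token>" header value are all hypothetical; ADE performs the substitution itself and its internals may differ.

```python
import json
import re
import urllib.request

# Hypothetical sample values. In ADE, gcp_project_id comes from an environment
# variable and target_schema/target_entity_name from the entity metadata.
VARIABLES = {
    "gcp_project_id": "my-project",
    "target_schema": "publish",
    "target_entity_name": "F_SALES",
    # Assumption: the token variable expands to a full "Bearer <ID token>" value.
    "bearer_id_token_from_bq_service_account": "Bearer dummy-id-token",
}

PLACEHOLDER = re.compile(r"<([a-z_]+)>")


def render(template: str, variables: dict) -> str:
    """Replace every <name> placeholder; raises KeyError for undefined names."""
    return PLACEHOLDER.sub(lambda m: variables[m.group(1)], template)


def build_export_request(variables: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request made by the load step."""
    url = render(
        "https://europe-west1-<gcp_project_id>.cloudfunctions.net/gcf-file-exporter-dev",
        variables,
    )
    payload = {
        "calls": [
            ["exports/test_data", render("<target_schema>.<target_entity_name>", variables)]
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": render("<bearer_id_token_from_bq_service_account>", variables),
            # Not set in the YAML above; added here so urllib does not
            # fall back to form encoding if the request were actually sent.
            "Content-Type": "application/json",
        },
    )


request = build_export_request(VARIABLES)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Rendering the template before sending makes it easy to spot a misspelled or undefined variable, because render fails loudly instead of passing a literal placeholder through.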
Invoking Dataplex Data Profiling
Required IAM Role for ADE Service Account: roles/dataplex.dataScanEditor
In this example, a Dataplex data profiling scan is created and executed. The data profiling steps are added to an existing entity in a separate load, so data profiling can be scheduled independently of the orchestration of data transformations.
The first load step creates the data scan with API_CALL. If the data scan already exists (HTTP status code 409), the step is still considered successful. The second load step then runs the scan.
The target dataset (schema) monitor is created beforehand, but the results table DATA_PROFILE_RESULTS is created automatically by Dataplex. The data scan is named pub-f-sales-scan. Dataplex requires this ID to be unique within the Google Cloud project and location.
Variables used:
gcp_project_id: Defined beforehand. For more information, refer to CONFIG_ENVIRONMENT_VARIABLES.
target_schema: For more information, refer to target_schema.
target_entity_name: For more information, refer to target_entity_name.
LOAD STEP NAME: create_data_profile_datascan_if_not_exists
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans?dataScanId=pub-f-sales-scan
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  content: |
    {
      "type": "DATA_PROFILE",
      "description": "Data profile scan for <target_schema>.<target_entity_name>",
      "data": {
        "resource": "//bigquery.googleapis.com/projects/<gcp_project_id>/datasets/<target_schema>/tables/<target_entity_name>"
      },
      "dataProfileSpec": {
        "samplingPercent": 100,
        "postScanActions": {
          "bigqueryExport": {
            "resultsTable": "projects/<gcp_project_id>/datasets/monitor/tables/DATA_PROFILE_RESULTS"
          }
        }
      }
    }
  retries:
    total: 3
    rules:
      - conditions:
          - type: VALUE_MATCHER
            source: <http_status_code>
            values: [401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success if 200, 201 or 409 (already exists) is returned.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
            - 409
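Because a malformed dataScanId is rejected by the API, it can be useful to check the ID before putting it into the create URL. The following Python sketch validates the format and builds the dataScans.create URL used above. The format rules are taken from the dataScanId description in the Dataplex REST reference and should be verified there; the create_scan_url helper and the sample project ID are hypothetical. Uniqueness cannot be checked locally: a duplicate ID is reported by the API as 409, which the load step above treats as success.

```python
import re
from urllib.parse import urlencode

# Assumed dataScanId format (per the Dataplex REST reference): lowercase
# letters, digits and hyphens only, starting with a letter, ending with a
# letter or digit, 1-63 characters in total.
DATA_SCAN_ID = re.compile(r"[a-z](?:[a-z0-9-]{0,61}[a-z0-9])?")


def create_scan_url(project_id: str, location: str, scan_id: str) -> str:
    """Return the dataScans.create URL, rejecting malformed scan IDs early."""
    if DATA_SCAN_ID.fullmatch(scan_id) is None:
        raise ValueError(f"invalid data scan ID: {scan_id!r}")
    base = (
        "https://dataplex.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/dataScans"
    )
    # urlencode escapes the value, although valid IDs need no escaping.
    query = urlencode({"dataScanId": scan_id})
    return f"{base}?{query}"


print(create_scan_url("my-project", "europe-west1", "pub-f-sales-scan"))
```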
LOAD STEP NAME: run_data_profile_datascan
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans/pub-f-sales-scan:run
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  retries:
    total: 3
    rules:
      - conditions:
          - type: VALUE_MATCHER
            source: <http_status_code>
            values: [400, 401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success if 200 or 201 is returned.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
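The retries and response settings of the two load steps can be read as a small decision procedure: stop on a success code, retry on a listed code while retries remain, and fail on anything else. The following Python sketch is one interpretation of that procedure, not ADE's implementation. It assumes, as urllib3's Retry does, that total counts retries after the first attempt, and that a status code outside both lists fails the step immediately; the call_with_retries helper is hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def call_with_retries(
    send: Callable[[], int],
    success_codes: Iterable[int],
    retry_codes: Iterable[int],
    total: int,
) -> Tuple[bool, List[int]]:
    """Call send() until a success code, a non-retryable code, or retries run out.

    Returns (succeeded, status codes observed in order).
    """
    success = set(success_codes)
    retryable = set(retry_codes)
    observed: List[int] = []
    # Assumption: "total" counts retries after the first attempt (urllib3 style).
    for _ in range(total + 1):
        status = send()
        observed.append(status)
        if status in success:
            return True, observed
        if status not in retryable:
            break  # e.g. 403: retrying would not help
    return False, observed


# Codes from the run_data_profile_datascan step above.
RUN_SUCCESS = [200, 201]
RUN_RETRY = [400, 401, 404, 429, 500, 502, 503, 504]

# Simulate a transient 503 followed by success.
responses = iter([503, 200])
print(call_with_retries(lambda: next(responses), RUN_SUCCESS, RUN_RETRY, total=3))
```

For the create step, passing [200, 201, 409] as the success codes shows why an existing scan does not fail the load: 409 ends the loop as a success on the first attempt.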
Invoking Dataplex Data Quality scan
Required IAM Role for ADE Service Account: roles/dataplex.dataScanEditor
In this example, a Dataplex data quality scan is created and executed. As in the data profiling example, the data quality steps are added to an existing entity in a separate load.
The first load step creates the data scan with API_CALL. If the data scan already exists (HTTP status code 409), the step is still considered successful. The second load step then runs the scan.
The target dataset (schema) monitor is created beforehand, but the results table DATA_QUALITY_RESULTS is created automatically by Dataplex. The data scan is named pub-f-sales-dq-scan. Dataplex requires this ID to be unique within the Google Cloud project and location.
Variables used:
gcp_project_id: Defined beforehand. For more information, refer to CONFIG_ENVIRONMENT_VARIABLES.
target_schema: For more information, refer to target_schema.
target_entity_name: For more information, refer to target_entity_name.
LOAD STEP NAME: create_data_quality_scan_if_not_exists
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans?dataScanId=pub-f-sales-dq-scan
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  content: |
    {
      "type": "DATA_QUALITY",
      "description": "Data quality scan for <target_schema>.<target_entity_name>",
      "data": {
        "resource": "//bigquery.googleapis.com/projects/<gcp_project_id>/datasets/<target_schema>/tables/<target_entity_name>"
      },
      "dataQualitySpec": {
        "rules": [
          {
            "column": "sales_id",
            "dimension": "UNIQUENESS",
            "uniquenessExpectation": {},
            "name": "unique-sales-id",
            "description": "Each sale should have a unique ID."
          },
          {
            "column": "quantity",
            "dimension": "VALIDITY",
            "rangeExpectation": {
              "minValue": "1"
            },
            "name": "positive-quantity",
            "description": "Quantity must be at least 1."
          },
          {
            "column": "price",
            "dimension": "VALIDITY",
            "rangeExpectation": {
              "minValue": "0"
            },
            "name": "non-negative-price",
            "description": "Price must be non-negative."
          }
        ],
        "postScanActions": {
          "bigqueryExport": {
            "resultsTable": "projects/<gcp_project_id>/datasets/monitor/tables/DATA_QUALITY_RESULTS"
          }
        }
      }
    }
  retries:
    total: 3
    rules:
      - conditions:
          - type: VALUE_MATCHER
            source: <http_status_code>
            values: [401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success if 200, 201 or 409 (already exists) is returned.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
            - 409
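The three rules in the dataQualitySpec above can be hard to read in JSON form. The following Python sketch approximates, on a few in-memory rows, what each rule checks. It is only an illustration: Dataplex evaluates the rules in BigQuery, and the defaults assumed here (a rule passes only if every row passes, a minValue bound is inclusive, and null values fail unless ignoreNull is set) are taken from the Dataplex DataQualityRule reference and should be verified there. The evaluate_rules helper and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def evaluate_rules(rows: List[Dict[str, Any]]) -> Dict[str, bool]:
    """Approximate the three dataQualitySpec rules on in-memory rows.

    Assumed Dataplex defaults: threshold 1.0 (every row must pass),
    inclusive minValue, and null values fail unless ignoreNull is set.
    """
    ids = [row.get("sales_id") for row in rows]
    counts = Counter(ids)
    quantities = [row.get("quantity") for row in rows]
    prices = [row.get("price") for row in rows]
    return {
        # UNIQUENESS on sales_id: every value present and seen exactly once.
        "unique-sales-id": all(i is not None and counts[i] == 1 for i in ids),
        # VALIDITY, rangeExpectation minValue "1".
        "positive-quantity": all(q is not None and q >= 1 for q in quantities),
        # VALIDITY, rangeExpectation minValue "0".
        "non-negative-price": all(p is not None and p >= 0 for p in prices),
    }


sales = [
    {"sales_id": 1, "quantity": 2, "price": 9.5},
    {"sales_id": 2, "quantity": 1, "price": 0},
    {"sales_id": 2, "quantity": 0, "price": 4.0},  # duplicate ID and zero quantity
]
print(evaluate_rules(sales))
```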
LOAD STEP NAME: run_data_quality_datascan
type: HTTP
request:
  url: https://dataplex.googleapis.com/v1/projects/<gcp_project_id>/locations/europe-west1/dataScans/pub-f-sales-dq-scan:run
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_access_token_from_bq_service_account>
  retries:
    total: 3
    rules:
      - conditions:
          - type: VALUE_MATCHER
            source: <http_status_code>
            values: [400, 401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: Success if 200 or 201 is returned.
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201