Target cloud: Google Cloud
API_CALL is a load language used to define load steps that make API calls to external systems. It enables triggering external processes or polling external system states as part of a load process.
API_CALL load steps are defined using YAML with a structured schema that supports various features.
Usage
An API_CALL load step is created by selecting API_CALL as the load step language. API_CALL can be used in OVERRIDE, PRE and POST load steps.
Supported Functionalities
API_CALL supports the following functionalities:
- HTTP Requests
  - URL: Only HTTPS is allowed.
  - Methods: GET, POST, PUT, DELETE.
  - Headers
  - Query Parameters: Automatically URL-encoded.
  - Timeout Definition: Configurable timeout settings.
  - Content: Ability to send JSON payloads.
- Retry Logic
  - Retry count
  - Backoff factor, min and max delays etc
  - Rules based on
    - HTTP response codes
    - Content extraction
- Response transformations by defining
  - Conditions used to select wanted transformation based on
    - HTTP response codes
    - Content extraction
  - Result (SUCCESS, FAILED)
  - Affected rows
  - Variables (output which can be used in later steps etc)
  - Transformations results and variables can be based on
    - HTTP response codes
    - Content extraction
    - Constants
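Automatic URL-encoding of query parameters means that reserved characters in parameter values are escaped before they become part of the URL. The Python snippet below only illustrates that general mechanism with the standard library. It is not API_CALL's implementation, the parameter names and values are invented, and the exact encoding API_CALL produces (for example `+` versus `%20` for spaces) is not stated in this document.

```python
from urllib.parse import urlencode

# Hypothetical query parameters; names and values are invented for illustration.
params = {"name": "test 1", "filter": "a&b=c"}

# urlencode escapes reserved characters (space, '&', '=') so they cannot be
# mistaken for separators inside the query string.
query_string = urlencode(params)
print(query_string)
```

Without encoding, the `&` and `=` inside the second value would be read as an extra parameter by the receiving service.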
Use of variables
API_CALL supports:
- Environment Variables: Variables defined in CONFIG_ENVIRONMENT_VARIABLES.
- Workflow Variables: Variables defined in CONFIG_LOAD_SCHEDULES.
- Variables documented in Variables.
- Special Variables:
  - <bearer_id_token_from_bq_service_account>: Uses Service Account credentials with a bearer ID token, primarily used when invoking Google Cloud Functions.
  - <bearer_token_from_bq_service_account>: Alternative to <bearer_id_token_from_bq_service_account>.
  - <bearer_access_token_from_bq_service_account>: An alternative to the ID token. It can be used with other Google Cloud APIs besides Cloud Functions.
Examples
Example 1: Simple API_CALL Load Step to Google Cloud functions
- POST request to a specified URL.
- Default timeout: 300 seconds.
- Authorization: Bearer token authentication.
  - <bearer_token_from_bq_service_account> is a special variable that is supported only when the target database is BigQuery. It lets Google API calls use the same credentials that are used for the target database.
- Request Content: Simple JSON payload.
- Retry Mechanism: Up to 4 retries with default logic.
- Response Handling: Default response transformation, where 200/201 are SUCCESS and all other cases are FAILED.
type: HTTP
request:
  url: https://europe-north1-test-ade-gcp-ci.cloudfunctions.net/ade-ci-test-function
  method: POST
  headers:
    Authorization: <bearer_token_from_bq_service_account>
  content: |
    {"name": "test1"}
retries:
  total: 4
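Because Example 1 defines no response transformations, the default handling described above applies. As a minimal sketch of that mapping only (the function name `default_result` is hypothetical, and this is not how API_CALL is implemented internally):

```python
def default_result(http_status_code: int) -> str:
    """Map an HTTP status code to a step result as described above:
    200 and 201 are SUCCESS, every other code is FAILED."""
    return "SUCCESS" if http_status_code in (200, 201) else "FAILED"


# Print the result for a few representative status codes.
for code in (200, 201, 204, 404, 500):
    print(code, default_result(code))
```

Note that under this default even other 2xx codes, such as 204, count as FAILED, so a custom transformation is needed when the called service answers with them.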
Example 2: API_CALL with Polling
Load step 1: Start External Process and Capture Process Name
- POST request with JSON payload.
- Authorization: Bearer token authentication.
- Retries: Retry on 401, 404, 429, 500, 502, 503 and 504 error codes, or on timeouts (10 seconds).
- Response Handling:
  - Parse the process_name variable from a 200/201 response by using a response transformation definition.
  - Fail if any other response is received.
type: HTTP
request:
  url: https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_token_from_bq_service_account>
  content: |
    {"process_time":"30","future_result":"SUCCESS","info":"Test call 1"}
retries:
  total: 3
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: In case of success we extract the returned process_name value, which is needed for polling in the next step
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
      variables:
        process_name:
          type: REGEXP_VALUE
          source: <http_response_content>
          regexp: '"process_name"\s*:\s*"([^"]+)"'
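To see what the regular expression in the transformation above captures, it can be tried against a sample response body. The sketch below uses Python's `re` module purely for illustration. The response body is hypothetical, and it assumes that REGEXP_VALUE takes the first capture group, which the pattern suggests but this document does not state.

```python
import re

# Same pattern as in the response transformation above.
PROCESS_NAME_PATTERN = r'"process_name"\s*:\s*"([^"]+)"'

# Hypothetical response body; the real function's payload may differ.
sample_content = '{"process_name": "proc-123", "status": "RUNNING"}'

match = re.search(PROCESS_NAME_PATTERN, sample_content)
# Group 1 is the text between the quotes that follow "process_name":
process_name = match.group(1) if match else None
print(process_name)
```

The `\s*` parts make the pattern tolerant of optional whitespace around the colon, so both compact and pretty-printed JSON would match.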
Load step 2: Poll Until External Process Completes in external system
- GET request to a URL containing the <process_name> variable extracted in the previous load step.
- Authorization: Bearer token authentication.
- Retries: Retry until a 200 response is received whose status is not RUNNING.
  - The backoff minimum and maximum are set to the same value so that polling happens at a fixed interval (every 5 seconds).
  - total is set to 12, so polling lasts up to 12 * 5 s = 60 s (the external process is expected to finish within 60 seconds).
- Response Handling:
  - Check that the response is 200 and that it returns the status SUCCESS.
  - If anything else is returned, this step fails. This indicates that the external process ran and finished, but with a failed status.
type: HTTP
request:
  # The URL uses the variable containing the process name from the previous step
  url: https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor/<process_name>/status
  timeout_seconds: 10
  method: GET
  headers:
    # The token audience must be set explicitly: otherwise it is taken from the URL, which does not work for sub-resources of this cloud function
    Authorization: <bearer_token_from_bq_service_account:audience=https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor>
retries:
  # Status is polled every 5 seconds and the process is assumed to be ready within 60 seconds -> 60/5=12
  total: 12
  backoff_min_seconds: 5
  backoff_max_seconds: 5
  # Polling is retried as long as the response is not 200 or the status endpoint returns the RUNNING state (or until the retry total is reached)
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          negate: true
          values: [200]
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [200]
        - type: REGEXP_VALUE_MATCHER
          source: <http_response_content>
          regexp: 'RUNNING'
          values:
            - RUNNING
response:
  transformations:
    - description: Step is successful if 200 is returned and the state is SUCCESS (the process has finished successfully within the expected time span)
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
        - type: VALUE_MATCHER
          source: <http_response_content>
          values:
            - SUCCESS
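The retry rules above can be read as "retry if any rule matches, where a rule matches only if all of its conditions match". That reading follows from the comment in the YAML, but the sketch below is only a simulation in Python, not API_CALL's engine. The function name, the response sequence and the interpretation of REGEXP_VALUE_MATCHER as a simple regexp search are assumptions.

```python
import re

TOTAL_RETRIES = 12     # mirrors retries.total
INTERVAL_SECONDS = 5   # backoff_min_seconds == backoff_max_seconds


def should_retry(status_code: int, content: str) -> bool:
    """Return True if either retry rule matches."""
    # Rule 1: any status code other than 200 (VALUE_MATCHER with negate).
    rule_non_200 = status_code != 200
    # Rule 2: status 200 AND the content still reports RUNNING.
    rule_running = status_code == 200 and re.search("RUNNING", content) is not None
    return rule_non_200 or rule_running


# Hypothetical sequence of (status code, body) pairs from the status endpoint.
responses = [
    (503, ""),
    (200, '{"status": "RUNNING"}'),
    (200, '{"status": "SUCCESS"}'),
]

attempts = 0
for status_code, content in responses:
    attempts += 1
    if not should_retry(status_code, content):
        break  # polling stops; the response transformation decides the result

# Upper bound on the polling time with a fixed interval.
max_wait_seconds = TOTAL_RETRIES * INTERVAL_SECONDS
print(attempts, max_wait_seconds)
```

With the invented responses, polling stops on the third call. If the external process routinely needs longer than 60 seconds, `total` or the backoff values would need to be raised accordingly.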
Load step 3: Retrieve Results from Completed Process
Retrieve data from the external process's result endpoint after the process has finished.
- GET request to a URL containing the <process_name> variable extracted in load step 1.
- Authorization: Bearer token authentication.
- Retries: Default retry logic.
- Response Handling:
  - Check that the response is 200 and parse result_info with a regular expression.
  - If anything else is returned, this step fails.
type: HTTP
request:
  url: https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor/<process_name>
  timeout_seconds: 10
  method: GET
  headers:
    Content-Type: application/json
    Authorization: <bearer_token_from_bq_service_account:audience=https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor>
retries:
  total: 3
response:
  transformations:
    - description: In case of success, extract result_info from the response
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
      variables:
        result_info:
          type: REGEXP_VALUE
          source: <http_response_content>
          regexp: '"generation_time"\s*:\s*"([^"]+)"'