API_CALL
Target cloud: Google Cloud
API_CALL is a load language used to define load steps that make API calls to external systems. It enables triggering external processes or polling external system states as part of a load process.
API_CALL load steps are defined using YAML with a structured schema that supports various features.
Usage
An API_CALL load step is created in a load by selecting API_CALL as the load step language. API_CALL can be used in OVERRIDE, PRE, and POST load steps.
Supported Functionalities
API_CALL supports the following functionality:
HTTP Requests
URL: Only HTTPS is allowed.
Methods: GET, POST, PUT, DELETE.
Headers
Query Parameters: Automatically URL-encoded.
Timeout Definition: Configurable timeout settings.
Content: Ability to send JSON payloads.
Retry Logic
Retry count
Backoff factor, minimum and maximum delays, etc.
Rules based on
HTTP response codes
Content extraction
Response transformations, defined with
Conditions that select which transformation applies, based on
HTTP response codes
Content extraction
Result (SUCCESS, FAILED)
Affected rows
Variables (output that can be used in later steps)
Transformation results and variables can be based on
HTTP response codes
Content extraction
Constants
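To illustrate what automatic URL encoding of query parameters means in practice, the following Python sketch builds a query string with the standard library. The endpoint and parameter names are hypothetical, and the exact encoding rules applied by API_CALL are not described here; the sketch only shows standard percent-encoding.

```python
from urllib.parse import urlencode


def build_url(base_url: str, params: dict) -> str:
    """Append percent-encoded query parameters to base_url."""
    if not params:
        return base_url
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and parameters, for illustration only.
url = build_url(
    "https://example.com/api/jobs",
    {"name": "daily load", "filter": "status=RUNNING&owner=ade"},
)
print(url)
```

Characters such as spaces, `=` and `&` inside a parameter value are escaped, so they cannot be mistaken for query string separators.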
Use of variables
API_CALL supports:
Environment Variables: Variables defined in CONFIG_ENVIRONMENT_VARIABLES.
Workflow Variables: Variables defined in CONFIG_LOAD_SCHEDULES.
Variables documented in Variables.
Special Variables:
<bearer_id_token_from_bq_service_account>: Uses Service Account credentials with a bearer ID token, primarily used when invoking Google Cloud Functions.
<bearer_token_from_bq_service_account>: Alternative to bearer_id_token_from_bq_service_account.
<bearer_access_token_from_bq_service_account>: An alternative to the ID token; it can be used with other Google Cloud APIs besides Cloud Functions.
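As a conceptual illustration of variable use, the sketch below replaces `<name>` placeholders in a string with values from a dictionary. How API_CALL resolves variables internally is not described in this document; the function name, the placeholder pattern, and the example URL are assumptions made for illustration only.

```python
import re

# Assumed placeholder syntax: <name> with letters, digits and underscores.
PLACEHOLDER = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def substitute(template: str, variables: dict) -> str:
    """Replace <name> placeholders with values; unknown names are left untouched."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        return str(variables.get(name, match.group(0)))

    return PLACEHOLDER.sub(repl, template)


# Hypothetical URL template and variable value.
resolved = substitute(
    "https://example.com/processor/<process_name>/status",
    {"process_name": "proc-123"},
)
print(resolved)
```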
Examples
Example 1: Simple API_CALL Load Step Calling Google Cloud Functions
POST request to a specified URL.
Default timeout: 300 seconds.
Authorization: Bearer token authentication. <bearer_token_from_bq_service_account> is a special variable that is supported only when the target database is BigQuery; it lets Google API calls use the same credentials that are used for the target database.
Request Content: Simple JSON payload.
Retry Mechanism: Up to 4 retries with default logic.
Response Handling: Default response transformation, where 200/201 are SUCCESS and all other cases are FAILED.
type: HTTP
request:
  url: https://europe-north1-test-ade-gcp-ci.cloudfunctions.net/ade-ci-test-function
  method: POST
  headers:
    Authorization: <bearer_token_from_bq_service_account>
  content: |
    {"name": "test1"}
retries:
  total: 4
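The default response handling described for this example can be sketched as a small Python function. This is a minimal illustration of the stated rule (200/201 are SUCCESS, everything else is FAILED), not the product's implementation; the function name is hypothetical.

```python
def default_result(http_status_code: int) -> str:
    """Map an HTTP status code to a load step result using the default rule."""
    return "SUCCESS" if http_status_code in (200, 201) else "FAILED"


for code in (200, 201, 404, 500):
    print(code, default_result(code))
```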
Example 2: API_CALL with Polling
Load step 1: Start External Process and Capture Process Name
POST request with JSON payload
Authorization: Bearer token authentication
Retries: Retry on HTTP status codes 401, 404, 429, 500, 502, 503, 504, or when the request times out (timeout of 10 seconds).
Response Handling:
Parse the process_name variable from a 200/201 response by using a response transformation definition.
The step fails if any other response is received.
type: HTTP
request:
  url: https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_token_from_bq_service_account>
  content: |
    {"process_time":"30","future_result":"SUCCESS","info":"Test call 1"}
retries:
  total: 3
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: In case of success, extract the returned process_name value, which is needed for polling in the next step
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
      variables:
        process_name:
          type: REGEXP_VALUE
          source: <http_response_content>
          regexp: '"process_name"\s*:\s*"([^"]+)"'
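To show what the regular expression in this step matches, the Python sketch below applies the same pattern to a sample response body. The sample JSON is hypothetical, and the assumption that the first capture group becomes the variable value is an interpretation of the REGEXP_VALUE definition above rather than documented behavior.

```python
import re
from typing import Optional

# Same pattern as in the load step definition above.
PROCESS_NAME = re.compile(r'"process_name"\s*:\s*"([^"]+)"')


def extract_process_name(content: str) -> Optional[str]:
    """Return the first captured process name, or None if the pattern does not match."""
    match = PROCESS_NAME.search(content)
    return match.group(1) if match else None


# Hypothetical response body, for illustration only.
sample = '{"process_name": "proc-123", "status": "RUNNING"}'
print(extract_process_name(sample))
```

The pattern tolerates whitespace around the colon, so both `"process_name":"x"` and `"process_name" : "x"` match.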
Load step 2: Poll Until the External Process Completes
GET request to a URL containing the <process_name> variable extracted in the previous load step
Authorization: Bearer token authentication
Retries: Retry until a 200 response is received in which the status is not RUNNING
Backoff min and max are set to the same value to get a fixed polling interval (every 5 seconds)
total is set to 12 → 12 * 5s = 60s (the external process is expected to finish within 60 seconds)
Response Handling:
Check that a 200 response is received and that it returns the status SUCCESS
Any other response fails this step, which indicates that the external process ran and finished, but with a failed status
type: HTTP
request:
  # The URL uses the variable containing the process name from the previous step
  url: https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor/<process_name>/status
  timeout_seconds: 10
  method: GET
  headers:
    # The token audience must be set explicitly: by default it is taken from the URL, which does not work for sub-resources of this Cloud Function
    Authorization: <bearer_token_from_bq_service_account:audience=https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor>
retries:
  # Status is polled every 5 seconds and the process is assumed to be ready within 60 seconds -> 60/5 = 12
  total: 12
  backoff_min_seconds: 5
  backoff_max_seconds: 5
  # Retry polling while the response is not 200 or the status endpoint returns RUNNING (or until the retry total is reached)
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          negate: true
          values: [200]
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [200]
        - type: REGEXP_VALUE_MATCHER
          source: <http_response_content>
          regexp: 'RUNNING'
          values:
            - RUNNING
response:
  transformations:
    - description: Step is successful if 200 is returned and the state is SUCCESS (the process has finished successfully within the expected time span)
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
        - type: VALUE_MATCHER
          source: <http_response_content>
          values:
            - SUCCESS
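The polling behavior configured in this step can be sketched in Python as a retry decision plus a bounded loop. This is an illustrative model only: it assumes that `total` counts retries after the first attempt and that identical backoff minimum and maximum values yield a fixed wait between attempts. The function names and sample responses are hypothetical, and waiting is simulated by summing seconds instead of sleeping.

```python
import re
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, response content)


def should_retry(status: int, content: str) -> bool:
    """Mirror the two retry rules: non-200 status, or 200 with RUNNING in the content."""
    if status != 200:
        return True
    return re.search(r"RUNNING", content) is not None


def poll(fetch: Callable[[], Response], total: int = 12,
         interval_seconds: int = 5) -> Tuple[int, str, int]:
    """Call fetch until no retry rule matches or the retry budget is used up."""
    waited = 0
    retries = 0
    status, content = fetch()
    while should_retry(status, content) and retries < total:
        waited += interval_seconds  # a real client would sleep here
        retries += 1
        status, content = fetch()
    return status, content, waited


# Hypothetical sequence of responses from the status endpoint.
responses = iter([
    (503, ""),
    (200, '{"status": "RUNNING"}'),
    (200, '{"status": "SUCCESS"}'),
])
result = poll(lambda: next(responses))
print(result)
```

Under these assumptions, a process that never leaves the RUNNING state uses all 12 retries, for a total wait of 12 * 5 = 60 seconds.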
Load step 3: Retrieve Results from Completed Process
Retrieves data from the external process result endpoint after the process has finished.
GET request to a URL containing the <process_name> variable extracted in load step 1
Authorization: Bearer token authentication
Retries: Default retries
Response Handling:
Check that a 200 response is received and parse result_info with a regular expression
Any other response fails this step
type: HTTP
request:
  url: https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor/<process_name>
  timeout_seconds: 10
  method: GET
  headers:
    Content-Type: application/json
    Authorization: <bearer_token_from_bq_service_account:audience=https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor>
retries:
  total: 3
response:
  transformations:
    - description: In case of success, extract result_info (generation_time) from the response
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
      variables:
        result_info:
          type: REGEXP_VALUE
          source: <http_response_content>
          regexp: '"generation_time"\s*:\s*"([^"]+)"'