API_CALL
Target cloud: Google Cloud
API_CALL is a load language for defining load steps that make API calls to external systems. Use it when a load process needs to trigger something externally, or to wait for and poll the state of an external system.
API_CALL load steps are defined in YAML, following a schema that exposes the features listed below.
Technically, API_CALL supports the following functionality:
- Make any HTTP call by defining
  - URL (only https allowed)
  - Method: GET, POST, PUT, DELETE
  - Headers
  - Query parameters (automatically URL-encoded)
  - Timeout definition
  - Content
- Retry logic by defining
  - Retry count
  - Backoff factor, min and max delays, etc.
  - Rules based on
    - HTTP response codes
    - Content extraction
- Response transformations by defining
  - Conditions used to select the wanted transformation, based on
    - HTTP response codes
    - Content extraction
  - Result (SUCCESS, FAILED)
  - Affected rows
  - Variables (output which can be used in later steps, etc.)
  - Transformation results and variables can be based on
    - HTTP response codes
    - Content extraction
    - Constants
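The "automatically URL-encoded" behaviour of query parameters can be illustrated with a short Python sketch. This is only an analogy built on the standard library, not the API_CALL implementation: the function name `build_url` and the `example.com` URL are hypothetical, and the exact encoding API_CALL applies (for example `+` versus `%20` for spaces) is not stated in this document.

```python
from urllib.parse import urlencode


def build_url(base_url: str, params: dict) -> str:
    """Append URL-encoded query parameters to an https URL (illustrative only)."""
    # The feature list above states that only https URLs are allowed.
    if not base_url.startswith("https://"):
        raise ValueError("only https URLs are allowed")
    if not params:
        return base_url
    # urlencode escapes reserved characters such as '&', '=' and spaces.
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and parameters, used only to show the encoding.
url = build_url("https://example.com/api/run", {"name": "test 1", "filter": "a&b=c"})
print(url)
```

Encoding matters because a raw value such as `a&b=c` would otherwise be read as two separate parameters by the receiving system.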
Variables (environment variables, workflow variables, etc.) are supported in API_CALL load steps.
Examples
Minimal example which makes a simple HTTP call to Google Cloud Functions:
- POST to the given URL
- Default timeout of 300 seconds
- Authorization with a bearer token
  - <bearer_token_from_bq_service_account> is a special variable that is supported only when the target database is BigQuery; it lets the Cloud Functions call reuse the credentials that are used for the target database
- Simple JSON content
- Maximum of 4 retries with the default retry logic
- Default response transformation, where 200/201 are SUCCESS and all other cases are FAILED
type: HTTP
request:
  url: https://europe-north1-test-ade-gcp-ci.cloudfunctions.net/ade-ci-test-function
  method: POST
  headers:
    Authorization: <bearer_token_from_bq_service_account>
  content: |
    {"name": "test1"}
retries:
  total: 4
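The default response transformation described for this example (200/201 are SUCCESS, everything else is FAILED) amounts to a very small mapping. The Python sketch below only restates that rule; the function name `default_result` is hypothetical and does not come from API_CALL itself.

```python
def default_result(status_code: int) -> str:
    """Map an HTTP status code to a step result using the default rule."""
    # Per the example description: 200 and 201 count as success, all else fails.
    return "SUCCESS" if status_code in (200, 201) else "FAILED"


for code in (200, 201, 204, 404, 500):
    print(code, default_result(code))
```

Note that under this rule other 2xx codes, such as 204, would also be treated as FAILED unless a custom response transformation is defined.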
More extensive example with an API that uses polling:
Step 1: Initiating a process in an external system
- POST endpoint with JSON content
- Authorization with a bearer token
- Retry on
  - error codes 401, 404, 429, 500, 502, 503, 504
  - timeouts of 10 seconds
- Parse the process_name variable from a 200/201 response by using a response transformation definition
- If anything else is returned, this step fails
type: HTTP
request:
  url: https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_token_from_bq_service_account>
  content: |
    {"process_time":"30","future_result":"SUCCESS","info":"Test call 1"}
retries:
  total: 3
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: In case of success we extract the returned process_name value, which is needed for polling in the next step
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
      variables:
        process_name:
          type: REGEXP_VALUE
          source: <http_response_content>
          regexp: '"process_name"\s*:\s*"([^"]+)"'
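To see what the regular expression in this transformation captures, it can be tried against a sample response body in Python. This is a sketch under assumptions: the sample JSON and the value `proc-123` are invented for illustration, and it assumes REGEXP_VALUE returns the first capture group of the first match, which this document does not state explicitly.

```python
import re
from typing import Optional

# Same pattern as in the YAML above.
PROCESS_NAME_RE = re.compile(r'"process_name"\s*:\s*"([^"]+)"')


def extract_process_name(content: str) -> Optional[str]:
    """Return the captured process name, or None if the pattern does not match."""
    match = PROCESS_NAME_RE.search(content)
    return match.group(1) if match else None


# Hypothetical response body from the external system.
sample = '{"process_name": "proc-123", "status": "RUNNING"}'
print(extract_process_name(sample))
```

The `\s*` parts make the pattern tolerant of whitespace around the colon, so both compact and pretty-printed JSON would match.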
Step 2: Polling until the process finishes in the external system
- GET endpoint with a URL containing the <process_name> variable extracted in the previous load step
- Authorization with a bearer token
- Retry until we get a 200 response where the status is not RUNNING
  - backoff min and max are set to the same value to get a constant polling interval (every 5 seconds)
  - total is set to 12 → 12 * 5s = 60s (we expect the external process to finish within 60 seconds)
- Check that we get a 200 response which returns the status SUCCESS
- If anything else is returned, this step fails, which indicates that the external process was executed and finished, but with a failed status
type: HTTP
request:
  # in the URL we use the variable containing the process name from the previous step
  url: https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor/<process_name>/status
  timeout_seconds: 10
  method: GET
  headers:
    # We need to set the token audience explicitly: otherwise it is taken from the URL, which does not work for sub-resources of this cloud function
    Authorization: <bearer_token_from_bq_service_account:audience=https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor>
retries:
  # we poll the status every 5 seconds and assume the process should be ready in 60 seconds -> 60/5=12
  total: 12
  backoff_min_seconds: 5
  backoff_max_seconds: 5
  # we keep polling as long as we get a non-200 response or the status endpoint returns the RUNNING state (or until we hit the retry total)
  rules:
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          negate: true
          values: [200]
    - conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values: [200]
        - type: REGEXP_VALUE_MATCHER
          source: <http_response_content>
          regexp: 'RUNNING'
          values:
            - RUNNING
response:
  transformations:
    - description: Step is successful if 200 is returned and state is SUCCESS (process has finished successfully in expected time span)
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
        - type: VALUE_MATCHER
          source: <http_response_content>
          values:
            - SUCCESS
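The two retry rules of this step can be read as "retry if the status is not 200, or if the status is 200 and the content still says RUNNING". The Python sketch below restates that decision and the polling-budget arithmetic; the function names are hypothetical, and it does not model how API_CALL counts attempts against `total`.

```python
import re


def should_retry(status_code: int, content: str) -> bool:
    """Return True if either retry rule from the polling step matches."""
    # Rule 1: any non-200 response is retried (VALUE_MATCHER with negate: true).
    if status_code != 200:
        return True
    # Rule 2: a 200 response whose content still matches 'RUNNING' is retried.
    return re.search(r"RUNNING", content) is not None


def polling_budget_seconds(total: int, interval_seconds: int) -> int:
    """Approximate time covered when min and max backoff are equal."""
    return total * interval_seconds


print(should_retry(503, ""))                      # non-200: keep polling
print(should_retry(200, '{"status": "RUNNING"}'))  # still running: keep polling
print(should_retry(200, "SUCCESS"))                # finished: stop polling
print(polling_budget_seconds(12, 5))
```

Once polling stops, the response transformation decides the step result, so a finished process with a status other than SUCCESS ends the polling but still fails the step.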
Step 3: Example of retrieving data from the external process result endpoint after the process has finished
- GET endpoint with a URL containing the <process_name> variable extracted in the previous load step
- Authorization with a bearer token
- Default retries
- Check that we get a 200 response and parse result_info with a regular expression
- If anything else is returned, this step fails
type: HTTP
request:
  url: https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor/<process_name>
  timeout_seconds: 10
  method: GET
  headers:
    Content-Type: application/json
    Authorization: <bearer_token_from_bq_service_account:audience=https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor>
retries:
  total: 3
response:
  transformations:
    - description: In case of success, extract result_info from the response
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
      variables:
        result_info:
          type: REGEXP_VALUE
          source: <http_response_content>
          regexp: '"generation_time"\s*:\s*"([^"]+)"'
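Steps 2 and 3 depend on the <process_name> variable from step 1 being placed into the URL. The Python sketch below shows one way such placeholder substitution could work; it is an illustration only, not the API_CALL implementation. The function name `substitute` and the value `proc-123` are hypothetical, and special variables with arguments (such as the bearer token variable) are deliberately left untouched here.

```python
import re

# Matches simple placeholders such as <process_name>; names containing ':'
# (for example the bearer token variable with an audience) do not match.
PLACEHOLDER_RE = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def substitute(template: str, variables: dict) -> str:
    """Replace <name> placeholders with values; unknown names stay as-is."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        return variables.get(name, match.group(0))

    return PLACEHOLDER_RE.sub(replace, template)


template = "https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor/<process_name>"
resolved = substitute(template, {"process_name": "proc-123"})
print(resolved)
```

Leaving unknown placeholders unchanged makes a missing variable visible in the resulting URL instead of silently producing an empty path segment.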