API_CALL

Target cloud: Google Cloud

API_CALL is a load language used to define load steps that make API calls to external systems. It enables triggering external processes or polling external system states as part of a load process.

API_CALL load steps are defined in YAML using a structured schema. The supported features are listed under Supported Functionalities.


Usage

An API_CALL load step is created by adding a load step to a load and selecting API_CALL as the load step language. API_CALL load steps can be defined as OVERRIDE, PRE or POST steps.

Supported Functionalities

API_CALL supports the following functionality:

  • HTTP Requests

    • URL: Only HTTPS is allowed.

    • Methods: GET, POST, PUT, DELETE.

    • Headers

    • Query Parameters: Automatically URL-encoded.

    • Timeout Definition: Configurable timeout settings.

    • Content: Ability to send JSON payloads.

  • Retry Logic

    • Retry count

    • Backoff factor, minimum and maximum delays, etc.

    • Rules based on

      • HTTP response codes

      • Content extraction

  • Response transformations by defining

    • Conditions that select which transformation to apply, based on

      • HTTP response codes

      • Content extraction

    • Result (SUCCESS, FAILED)

    • Affected rows

    • Variables (output that can be used in later steps)

    • Transformation results and variables can be based on

      • HTTP response codes

      • Content extraction

      • Constants
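
The list above notes that query parameters are URL-encoded automatically. As a rough illustration of what URL encoding does to parameter values, the following sketch uses Python's standard library. The endpoint and parameter names are hypothetical, and the exact encoding API_CALL applies (for example `+` versus `%20` for spaces) is not stated in this document.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, used only for illustration.
base_url = "https://example.com/api/jobs"
params = {"name": "daily load", "filter": "status=RUNNING&owner=ade"}

# urlencode percent-encodes reserved characters such as "=" and "&"
# inside values, and encodes spaces as "+".
query = urlencode(params)
full_url = f"{base_url}?{query}"

print(full_url)
# https://example.com/api/jobs?name=daily+load&filter=status%3DRUNNING%26owner%3Dade
```

Because the encoding is applied automatically, parameter values would be written unencoded in the load step definition.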

Use of variables

API_CALL supports:

  • Environment Variables: Variables defined in CONFIG_ENVIRONMENT_VARIABLES.

  • Workflow Variables: Variables defined in CONFIG_LOAD_SCHEDULES.

  • Variables documented in Variables.

  • Special Variables:

    • <bearer_id_token_from_bq_service_account>: Uses Service Account credentials with a bearer ID token, primarily used when invoking Google Cloud Functions.

    • <bearer_token_from_bq_service_account>: Alternative to bearer_id_token_from_bq_service_account.

    • <bearer_access_token_from_bq_service_account>: Uses a bearer access token instead of an ID token. It can be used with Google Cloud APIs other than Cloud Functions.

Examples

Example 1: Simple API_CALL Load Step to Google Cloud Functions

  • POST request to a specified URL.

  • Default timeout: 300 seconds.

  • Authorization: Bearer token authentication

    • <bearer_token_from_bq_service_account> is a special variable that is supported only when the target database is BigQuery. It lets Google API calls use the same credentials that are used for the target database.

  • Request Content: Simple JSON payload.

  • Retry Mechanism: Up to 4 retries with default logic.

  • Response Handling: Default response transformation, where 200 and 201 are SUCCESS and all other responses are FAILED.

YAML
type: HTTP
request:
  url: https://europe-north1-test-ade-gcp-ci.cloudfunctions.net/ade-ci-test-function
  method: POST
  headers:
    Authorization: <bearer_token_from_bq_service_account>
  content: |
    {"name": "test1"}
retries:
  total: 4

Example 2: API_CALL with Polling

Load step 1: Start External Process and Capture Process Name

  • POST request with JSON payload

  • Authorization: Bearer token authentication

  • Retries: Up to 3 retries on HTTP status codes 401, 404, 429, 500, 502, 503 and 504, or when the request times out after 10 seconds

  • Response Handling:

    • Extract the process_name variable from a 200 or 201 response by using a response transformation definition.

    • The step fails if any other response is received.

YAML
type: HTTP
request:
  url: https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor
  timeout_seconds: 10
  method: POST
  headers:
    Content-Type: application/json
    Authorization: <bearer_token_from_bq_service_account>
  content: |
    {"process_time":"30","future_result":"SUCCESS","info":"Test call 1"}
retries:
    total: 3
    rules:
      - conditions:
          - type: VALUE_MATCHER
            source: <http_status_code>
            values: [401, 404, 429, 500, 502, 503, 504]
response:
  transformations:
    - description: In case of success, extract the returned process_name value, which is needed for polling in the next step
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
            - 201
      variables:
        process_name:
          type: REGEXP_VALUE
          source: <http_response_content>
          regexp: '"process_name"\s*:\s*"([^"]+)"'
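
To see what the `regexp` in the transformation above captures, the pattern can be tried against a sample response. The sketch below uses Python's `re` module with a hypothetical response body. How API_CALL applies the pattern internally, and how it substitutes `<process_name>` into later steps, is assumed here rather than taken from this document.

```python
import re

# Same pattern as in the response transformation above.
PROCESS_NAME_RE = re.compile(r'"process_name"\s*:\s*"([^"]+)"')

# Hypothetical response body; the real service may return other fields.
response_content = '{"process_name": "proc-123", "status": "RUNNING"}'

# The first capture group holds the value between the quotes.
match = PROCESS_NAME_RE.search(response_content)
process_name = match.group(1) if match else None

# Assumed substitution: the placeholder is replaced with the captured value,
# as in the status URL used by load step 2.
url_template = (
    "https://europe-north1-ade-gcp-ci.cloudfunctions.net"
    "/ade-ci-data-processor/<process_name>/status"
)
status_url = url_template.replace("<process_name>", process_name or "")

print(process_name)  # proc-123
print(status_url)
```

The pattern tolerates optional whitespace around the colon, so it matches both compact and pretty-printed JSON as long as the value is a quoted string.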

Load step 2: Poll Until the External Process Completes

  • GET request to a URL containing the <process_name> variable extracted in the previous load step

  • Authorization: Bearer token authentication

  • Retries: Retry until a 200 response is received whose status is not RUNNING

    • backoff_min_seconds and backoff_max_seconds are set to the same value so that polling happens at a fixed interval (every 5 seconds)

    • total is set to 12 → 12 * 5s = 60s (the external process is expected to finish within 60 seconds)

  • Response Handling:

    • Check that the response is 200 and that it returns status SUCCESS.

    • Any other response fails this step. This indicates that the external process either finished with a failure status or did not finish within the expected time.

YAML
type: HTTP
request:
  # The URL uses the process_name variable extracted in the previous step
  url: https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor/<process_name>/status
  timeout_seconds: 10
  method: GET
  headers:
    # Set the token audience explicitly: by default it is taken from the URL, which does not work for sub-resources of this Cloud Function
    Authorization: <bearer_token_from_bq_service_account:audience=https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor>
retries:
    # Poll the status every 5 seconds and assume the process is ready within 60 seconds -> 60/5 = 12
    total: 12
    backoff_min_seconds: 5
    backoff_max_seconds: 5
    # Retry polling while the response is not 200 or the status endpoint returns RUNNING (or until the retry total is reached)
    rules:
      - conditions:
          - type: VALUE_MATCHER
            source: <http_status_code>
            negate: true
            values: [200]
      - conditions:
          - type: VALUE_MATCHER
            source: <http_status_code>
            values: [200]
          - type: REGEXP_VALUE_MATCHER
            source: <http_response_content>
            regexp: 'RUNNING'
            values:
              - RUNNING
response:
  transformations:
    - description: Step is successful if 200 is returned and the state is SUCCESS (the process has finished successfully within the expected time span)
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
        - type: VALUE_MATCHER
          source: <http_response_content>
          values:
           - SUCCESS
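
The interaction between the retry rules and the response transformation above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not API_CALL's implementation: the function names are hypothetical, responses are canned plain strings rather than real HTTP calls, and it assumes each retry is preceded by a fixed 5-second delay.

```python
import re
from typing import List, Tuple


def should_retry(status_code: int, content: str) -> bool:
    """Mirror the two retry rules: any non-200, or 200 while still RUNNING."""
    if status_code != 200:
        return True
    return re.search("RUNNING", content) is not None


def poll(responses: List[Tuple[int, str]],
         total: int = 12,
         delay_seconds: int = 5) -> Tuple[str, int]:
    """Walk through canned responses instead of calling a real endpoint.

    Returns the step result and the simulated number of seconds waited.
    """
    waited = 0
    retries_used = 0
    status_code, content = responses[0]
    while should_retry(status_code, content) and retries_used < total:
        waited += delay_seconds  # simulated delay, no real sleep
        retries_used += 1
        # Reuse the last canned response once the list is exhausted.
        status_code, content = responses[min(retries_used, len(responses) - 1)]
    # Response transformation: SUCCESS only for 200 with content SUCCESS.
    result = "SUCCESS" if status_code == 200 and content == "SUCCESS" else "FAILED"
    return result, waited


print(poll([(200, "RUNNING"), (200, "RUNNING"), (200, "SUCCESS")]))  # ('SUCCESS', 10)
print(poll([(200, "RUNNING")]))                                      # ('FAILED', 60)
```

In the second call the process never leaves the RUNNING state, so all 12 retries are used (12 * 5s = 60s) and the step ends as FAILED because the final response does not satisfy the transformation conditions.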

Load step 3: Retrieve Results from Completed Process

Retrieve data from the result endpoint of the external process after the process has finished.

  • GET request to a URL containing the <process_name> variable extracted in load step 1

  • Authorization: Bearer token authentication

  • Retries: Up to 3 retries with default logic

  • Response Handling:

    • Check that the response is 200 and extract result_info with a regular expression.

    • Any other response fails this step.

YAML
type: HTTP
request:
  url: https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor/<process_name>
  timeout_seconds: 10
  method: GET
  headers:
    Content-Type: application/json
    Authorization: <bearer_token_from_bq_service_account:audience=https://europe-north1-ade-gcp-ci.cloudfunctions.net/ade-ci-data-processor>
retries:
    total: 3
response:
  transformations:
    - description: In case of success, extract result_info from the response
      conditions:
        - type: VALUE_MATCHER
          source: <http_status_code>
          values:
            - 200
      variables:
        result_info:
          type: REGEXP_VALUE
          source: <http_response_content>
          regexp: '"generation_time"\s*:\s*"([^"]+)"'