Load steps with user-defined functions
Guide objective
This guide focuses on how to use target database user-defined functions (UDFs) in Agile Data Engine loads.
Snowflake
See also:
Examples
Creating or replacing a function in a load step
In this example, data is generated to target table STG_GENERATED_DATA with a user-defined function written in Python. Note that this example uses built-in ADE variables target_schema and target_entity_name.
The following load steps are defined:
Truncate the target table in the first load step:
1. LOAD STEP NAME: truncate table
With SQL:
truncate table <target_schema>.<target_entity_name>
Create or replace temporary function:
2. LOAD STEP NAME: create_or_replace_temporary_function
With SQL:
create or replace temporary function <target_schema>.fake(
locale VARCHAR, seed INTEGER)
returns variant
language python
runtime_version = '3.8'
packages = ('faker','simplejson')
handler = 'fake'
as
$$
import simplejson as json
from faker import Faker
def fake(locale, seed):
fake = Faker(locale)
Faker.seed(seed)
results = {}
results['name'] = fake.name()
results['phone'] = fake.phone_number()
results['comp_name'] = fake.company()
data = json.dumps(results)
return json.loads(data)
$$;
Inserting generated data by calling the created UDF:
3. LOAD STEP NAME: insert_fake_data
With SQL:
insert into <target_schema>.<target_entity_name>
with faker as (
select
seq4() as rn,
<target_schema>.fake('fi_FI', rn) as rs
from table(generator(rowcount => 10000)))
select
date_part(epoch_millisecond, current_timestamp()) as stg_batch_id,
current_timestamp() as stg_create_time,
'faker' as stg_source_system,
'generated_data' as stg_source_entity,
'N/A' as stg_file_name,
rn,
rs:name::string as name,
rs:phone::string as phone,
rs:comp_name::string as comp_name
from faker;
Entity attributes in ADE:
Load steps in ADE:
Example package in Github
This example is available as an ADE package in GitHub. Packages can be imported with Import Package.
Using EXECUTE IMMEDIATE
With EXECUTE IMMEDIATE, you can execute SQL and Snowflake Scripting statements without creating a function object.
Use EXECUTE IMMEDIATE in a load step as follows:
LOAD STEP NAME: execute_immediate
With SQL:
EXECUTE IMMEDIATE $$
DECLARE
...
BEGIN
...
END;
$$
;
BigQuery
See also:
Examples
Using temporary functions in load steps
With BigQuery temporary functions in ADE load steps, you can use logic which could be difficult to implement with SQL.
In this example, a temporary function is created to generate test data. Javascript library faker is used to generate the data. Generated data is inserted into target entity STG_TEST_DATA.
Note that this example uses built-in ADE variables target_schema and target_entity_name.
The following load steps are added:
1. LOAD STEP NAME: truncate_table
With SQL:
truncate table <target_schema>.<target_entity_name>
Load step with temporary function and insert statement are added to same load step:
2. LOAD STEP NAME: generate_and_insert_dummy_data
With SQL:
CREATE TEMP FUNCTION entity(seed INT64)
RETURNS STRUCT<name String, phone String, ipv6 String, comp_name String>
LANGUAGE js
AS """
var f = getFaker()
f.seed(Number(seed))
var t = {};
t.name = f.name.findName();
t.phone = f.phone.phoneNumber('358-###-###-###')
t.ipv6 = f.internet.ipv6();
t.comp_name = f.company.companyName()
return t;
"""
OPTIONS (
library=["gs://<bq_bucket>/bq_fake.js"]
);
INSERT INTO <target_schema>.<target_entity_name>
SELECT
UNIX_MILLIS(current_timestamp()) as stg_batch_id,
current_timestamp() as stg_create_time,
'system' as stg_source_system,
'test_data' as stg_source_entity,
'N/A' as stg_file_name,
row_num,
entity(row_num).*
FROM UNNEST(GENERATE_ARRAY(1, 10000)) AS row_num;
If you want to use this function, be sure to place bq_fake.js in a GCS bucket.
Target entity attributes in ADE:
Load steps in ADE:
Example package in Github
This example is available as an ADE package in GitHub. Packages can be imported with Import Package.