Configuring Target Databases
Guide objective
This guide helps you set up a target database for Agile Data Engine.
Agile Data Engine supports the following database products as a target database:

Amazon Redshift
Azure SQL Database
Azure Synapse SQL (dedicated SQL pool, formerly Azure SQL DW)
Databricks SQL
Google BigQuery
Microsoft Fabric
Snowflake

Please refer to the database product specific instructions below to set up a target database for an Agile Data Engine Runtime environment.
Amazon Redshift
Please contact our Professional Services via the support portal for examples.
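As a rough, hedged sketch of the kind of objects typically set up for a Redshift target (the user name, database name, and password are placeholders; confirm the details with Professional Services):

```sql
-- Create a dedicated user for the ADE Runtime
CREATE USER ade_user PASSWORD '***';

-- Allow the user to create schemas and temporary tables in the target database
GRANT CREATE, TEMP ON DATABASE clouddw_dev TO ade_user;
```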
Azure SQL Database
Set up a separate Azure SQL Database for each Agile Data Engine Runtime. Within the databases, you need to set up the following objects:
LOGIN (master database)
MASTER KEY
ROLE
USER
EXTERNAL DATA SOURCE (optional)
Repeat the steps below in each environment. Feel free to name objects according to your naming conventions.
Execute step 1 in the SQL server master database.
Execute steps 2-7 in the target database.
Create a login:
CREATE LOGIN ADE_LOGIN WITH PASSWORD = '***';
Generate strong, environment-specific passwords and store them securely.
Create a master key:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '***';
Refer to the Microsoft documentation for more information about the master key.
Create a role:
CREATE ROLE ADE_ROLE;
Grant permissions to the role:
GRANT CONTROL ON DATABASE::[CLOUDDW-DEV] TO ADE_ROLE;
GRANT ALTER ANY SCHEMA TO ADE_ROLE;
GRANT CREATE TABLE TO ADE_ROLE;
GRANT CREATE VIEW TO ADE_ROLE;
Replace CLOUDDW-DEV with the target database name.
Create a user for the login:
CREATE USER [ADE_USER] FOR LOGIN [ADE_LOGIN] WITH DEFAULT_SCHEMA = dbo;
Assign the role to the user:
EXEC sp_addrolemember 'ADE_ROLE', 'ADE_USER';
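To verify the user and role setup, you can query the catalog views in the target database (a standard T-SQL check, not specific to Agile Data Engine):

```sql
-- List role memberships in the target database
SELECT r.name AS role_name, m.name AS member_name
FROM sys.database_role_members AS rm
JOIN sys.database_principals AS r ON rm.role_principal_id = r.principal_id
JOIN sys.database_principals AS m ON rm.member_principal_id = m.principal_id
WHERE r.name = 'ADE_ROLE';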
Optionally, create an EXTERNAL DATA SOURCE by following the Microsoft documentation. For more information, please refer to Loading Source Data.
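As a hedged sketch, an external data source pointing at Azure Blob Storage might look like the following; the credential name, storage account, and container are placeholders, so check the Microsoft documentation for the options relevant to your setup:

```sql
-- Credential for accessing the storage account (the SAS token is a placeholder)
CREATE DATABASE SCOPED CREDENTIAL ADE_BLOB_CREDENTIAL
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '***';

-- External data source referencing the source data container
CREATE EXTERNAL DATA SOURCE ADE_SOURCE_DATA
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://<storageaccount>.blob.core.windows.net/<container>',
    CREDENTIAL = ADE_BLOB_CREDENTIAL
);
```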
Azure Synapse SQL
Set up a separate database for each Agile Data Engine Runtime. Within the databases, you need to set up the following objects:
LOGIN (master database)
MASTER KEY
ROLE
USER
Repeat the steps below in each environment. Feel free to name objects according to your naming conventions.
Execute step 1 in the master database.
Execute steps 2-6 in the target database.
Create a login:
CREATE LOGIN ADE_LOGIN WITH PASSWORD = '***';
Generate strong, environment-specific passwords and store them securely.
Create a master key:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '***';
Refer to the Microsoft documentation for more information about the master key.
Create a role:
CREATE ROLE ADE_ROLE;
Grant permissions to the role:
GRANT CONTROL ON DATABASE::[CLOUDDW-DEV] TO ADE_ROLE;
GRANT ALTER ANY SCHEMA TO ADE_ROLE;
GRANT CREATE TABLE TO ADE_ROLE;
GRANT CREATE VIEW TO ADE_ROLE;
Replace CLOUDDW-DEV with the target database name.
Create a user for the login:
CREATE USER [ADE_USER] FOR LOGIN [ADE_LOGIN] WITH DEFAULT_SCHEMA = dbo;
Assign the role to the user:
EXEC sp_addrolemember 'ADE_ROLE', 'ADE_USER';
Databricks SQL
Currently, Agile Data Engine is tested to work with Databricks SQL on Azure. It requires Unity Catalog to be set up, and Agile Data Engine creates all tables as Delta tables with the following properties:
'delta.columnMapping.mode' = 'name'
'delta.minReaderVersion' = '2'
'delta.minWriterVersion' = '5'
'delta.feature.allowColumnDefaults' = 'supported'
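For illustration, a manually created table carrying the same properties would look roughly like this (the catalog, schema, table, and column names are hypothetical):

```sql
CREATE TABLE my_catalog.my_schema.example_table (
    id BIGINT,
    name STRING
)
USING DELTA
TBLPROPERTIES (
    'delta.columnMapping.mode' = 'name',
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5',
    'delta.feature.allowColumnDefaults' = 'supported'
);
```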
Data loading is supported with a managed identity, and the data location needs to be defined as an external location: https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-external-locations-and-credentials#manage-storage-credentials
For further examples and Terraform templates, please contact our Professional Services via the support portal.
Google BigQuery
Set up a separate Google Cloud Project for each Agile Data Engine Runtime. Within the projects, you need to set up the following:
Service account
Custom IAM role assigned to the service account
This IAM role needs access to BigQuery jobs, datasets, tables, routines, and connections
GCP data bucket read access to the service account
It is recommended to use Terraform to set up these GCP resources.
The following Terraform template uses Terraform workspaces:
# Service account
resource "google_service_account" "ade_account" {
  project      = var.project
  account_id   = "sa-${var.bigquery_app}-${terraform.workspace}"
  display_name = "Service account for ${var.bigquery_app}-${terraform.workspace}"
}

# Role grants
resource "google_project_iam_custom_role" "ade_role" {
  role_id     = "adeRole"
  title       = "ADE role"
  description = "Role for ADE usage"
  permissions = [
    "bigquery.jobs.create",
    "bigquery.jobs.get",
    "bigquery.jobs.list",
    "bigquery.jobs.listAll",
    "bigquery.jobs.delete",
    "bigquery.jobs.update",
    "bigquery.datasets.create",
    "bigquery.datasets.createTagBinding",
    "bigquery.datasets.delete",
    "bigquery.datasets.deleteTagBinding",
    "bigquery.datasets.get",
    "bigquery.datasets.getIamPolicy",
    "bigquery.datasets.link",
    "bigquery.datasets.listTagBindings",
    "bigquery.datasets.setIamPolicy",
    "bigquery.datasets.update",
    "bigquery.tables.create",
    "bigquery.tables.createIndex",
    "bigquery.tables.createSnapshot",
    "bigquery.tables.delete",
    "bigquery.tables.deleteIndex",
    "bigquery.tables.deleteSnapshot",
    "bigquery.tables.export",
    "bigquery.tables.get",
    "bigquery.tables.getData",
    "bigquery.tables.getIamPolicy",
    "bigquery.tables.list",
    "bigquery.tables.restoreSnapshot",
    "bigquery.tables.setCategory",
    "bigquery.tables.setIamPolicy",
    "bigquery.tables.update",
    "bigquery.tables.updateData",
    "bigquery.connections.get",
    "bigquery.connections.getIamPolicy",
    "bigquery.connections.list",
    "bigquery.connections.use",
    "bigquery.routines.create",
    "bigquery.routines.delete",
    "bigquery.routines.get",
    "bigquery.routines.list",
    "bigquery.routines.update",
    "bigquery.routines.updateTag"
  ]
}

resource "google_project_iam_binding" "ade_role_to_user" {
  project = var.project
  role    = "projects/${var.project}/roles/${google_project_iam_custom_role.ade_role.role_id}"
  members = [
    "serviceAccount:${google_service_account.ade_account.email}"
  ]
}

# Bucket IAM binding
# Note: google_storage_bucket.bucket refers to the data bucket defined elsewhere in your configuration
resource "google_storage_bucket_iam_binding" "databucket_binding" {
  bucket  = google_storage_bucket.bucket.name
  role    = "roles/storage.objectViewer"
  members = ["serviceAccount:${google_service_account.ade_account.email}"]
}
Microsoft Fabric
Please contact our Professional Services via the support portal for examples.
Snowflake
To connect an Agile Data Engine Runtime to Snowflake, you need to set up the following objects on your Snowflake account:
USER
ROLE
DATABASE
WAREHOUSE
STORAGE INTEGRATION (optional)
The examples below presume that the default role setup in Snowflake is used. If your Snowflake account uses a custom role setup, please adapt the examples accordingly. Also, feel free to name objects according to your naming conventions.
Create a database for each ADE Runtime environment:
USE ROLE SYSADMIN;

CREATE DATABASE CLOUDDW_DEV;
CREATE DATABASE CLOUDDW_TEST;
CREATE DATABASE CLOUDDW_PROD;
Create a warehouse for each ADE Runtime environment:
USE ROLE SYSADMIN;

CREATE WAREHOUSE WH_ADE_DEV WITH
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 1
    SCALING_POLICY = 'STANDARD'
    COMMENT = 'Agile Data Engine DEV';

CREATE WAREHOUSE WH_ADE_TEST WITH
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 1
    SCALING_POLICY = 'STANDARD'
    COMMENT = 'Agile Data Engine TEST';

CREATE WAREHOUSE WH_ADE_PROD WITH
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 1
    SCALING_POLICY = 'STANDARD'
    COMMENT = 'Agile Data Engine PROD';
Customize warehouse settings as needed.
Note that it is optional to create separate warehouses for each environment; you could also use just one warehouse. However, environment-specific warehouses enable more control, easier scaling, and better traceability of costs. It is also possible to configure additional warehouses, e.g. use case specific ones, in addition to the default environment-specific warehouses; see CONFIG_LOAD_SCHEDULES and PRE load steps for more information.
Create a role for each Runtime environment:
USE ROLE USERADMIN;

CREATE ROLE ROLE_ADE_DEV COMMENT = 'Agile Data Engine DEV';
CREATE ROLE ROLE_ADE_TEST COMMENT = 'Agile Data Engine TEST';
CREATE ROLE ROLE_ADE_PROD COMMENT = 'Agile Data Engine PROD';
It is good practice to have environment-specific roles.
Grant the created roles to a higher role to ensure a continuous role hierarchy:
USE ROLE USERADMIN;

GRANT ROLE ROLE_ADE_DEV TO ROLE SYSADMIN;
GRANT ROLE ROLE_ADE_TEST TO ROLE SYSADMIN;
GRANT ROLE ROLE_ADE_PROD TO ROLE SYSADMIN;
Create a user for each Runtime environment with defaults referring to the previously created objects:
USE ROLE USERADMIN;

CREATE USER S_ADE_DEV
    PASSWORD = '***'
    COMMENT = 'Service account for Agile Data Engine DEV.'
    DEFAULT_ROLE = ROLE_ADE_DEV
    DEFAULT_WAREHOUSE = WH_ADE_DEV;

CREATE USER S_ADE_TEST
    PASSWORD = '***'
    COMMENT = 'Service account for Agile Data Engine TEST.'
    DEFAULT_ROLE = ROLE_ADE_TEST
    DEFAULT_WAREHOUSE = WH_ADE_TEST;

CREATE USER S_ADE_PROD
    PASSWORD = '***'
    COMMENT = 'Service account for Agile Data Engine PROD.'
    DEFAULT_ROLE = ROLE_ADE_PROD
    DEFAULT_WAREHOUSE = WH_ADE_PROD;

Generate strong, environment-specific passwords and store them securely.
Grant the roles to the users:
USE ROLE USERADMIN;

GRANT ROLE ROLE_ADE_DEV TO USER S_ADE_DEV;
GRANT ROLE ROLE_ADE_TEST TO USER S_ADE_TEST;
GRANT ROLE ROLE_ADE_PROD TO USER S_ADE_PROD;
Grant privileges on the created objects to the ADE roles:
USE ROLE SYSADMIN;

-- Warehouses
GRANT MODIFY, OPERATE, USAGE ON WAREHOUSE WH_ADE_DEV TO ROLE ROLE_ADE_DEV;
GRANT MODIFY, OPERATE, USAGE ON WAREHOUSE WH_ADE_TEST TO ROLE ROLE_ADE_TEST;
GRANT MODIFY, OPERATE, USAGE ON WAREHOUSE WH_ADE_PROD TO ROLE ROLE_ADE_PROD;

-- Databases
GRANT USAGE, CREATE SCHEMA ON DATABASE CLOUDDW_DEV TO ROLE ROLE_ADE_DEV;
GRANT USAGE, CREATE SCHEMA ON DATABASE CLOUDDW_TEST TO ROLE ROLE_ADE_TEST;
GRANT USAGE, CREATE SCHEMA ON DATABASE CLOUDDW_PROD TO ROLE ROLE_ADE_PROD;
If you are using a STORAGE INTEGRATION to load source data into Snowflake, grant privileges to the ADE roles to use the storage integration:
USE ROLE ACCOUNTADMIN;

GRANT USAGE ON INTEGRATION ADE_SOURCE_DATA TO ROLE ROLE_ADE_DEV;
GRANT USAGE ON INTEGRATION ADE_SOURCE_DATA TO ROLE ROLE_ADE_TEST;
GRANT USAGE ON INTEGRATION ADE_SOURCE_DATA TO ROLE ROLE_ADE_PROD;
Replace ADE_SOURCE_DATA with the name of your storage integration(s). For more information, please refer to Loading Source Data.
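For reference, a storage integration for an S3 source data bucket could be created roughly as follows; the role ARN and bucket name are placeholders, and creating an integration requires ACCOUNTADMIN or a role with the CREATE INTEGRATION privilege:

```sql
USE ROLE ACCOUNTADMIN;

CREATE STORAGE INTEGRATION ADE_SOURCE_DATA
    TYPE = EXTERNAL_STAGE
    STORAGE_PROVIDER = 'S3'
    ENABLED = TRUE
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account-id>:role/<role-name>'
    STORAGE_ALLOWED_LOCATIONS = ('s3://<source-data-bucket>/');
```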
Here is a summary of the privileges granted to Agile Data Engine in Snowflake:
| Object | Required privileges | Optional privileges | Notes |
|---|---|---|---|
| DATABASE | USAGE, CREATE SCHEMA | - | By default, ADE creates target schemas into the database, thereby getting schema ownership. If schemas already exist, sufficient schema privileges need to be granted to the ADE role. |
| WAREHOUSE | OPERATE, USAGE | MODIFY | The MODIFY privilege enables warehouse scaling within workflows. |
| STORAGE INTEGRATION | USAGE | - | - |
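To verify the setup, you can inspect the grants with standard Snowflake commands, for example:

```sql
SHOW GRANTS TO ROLE ROLE_ADE_DEV;
SHOW GRANTS TO USER S_ADE_DEV;
SHOW GRANTS ON DATABASE CLOUDDW_DEV;
```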