Required connection details
The following details are required for each target instance configuration:
| Detail | Example | Description |
|---|---|---|
| Host | `https://www.googleapis.com/bigquery/v2` | BigQuery API endpoint. |
| Port | 443 | Default HTTPS port for BigQuery connections. |
| Project ID | `project_id` | Google Cloud project ID containing the BigQuery datasets. |
| Default schema | PUBLIC | Default dataset (schema). |
| Workload identity pool details (recommended) | `//iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/IDENTITY_POOL/providers/PROVIDER`<br>For example: `//iam.googleapis.com/projects/123456789012/locations/global/workloadIdentityPools/wip-ade/providers/aws-ade` | Workload Identity Federation (WIF) is the recommended authentication method for ADE when connecting to BigQuery. `PROJECT_NUMBER`: Google Cloud project number. `IDENTITY_POOL`: workload identity pool ID. `PROVIDER`: workload identity provider ID. |
| Service account key (alternative) | See the key file format below this table. | Using a service account key is an alternative to WIF, but it is less secure and generally not recommended for production environments. |
| Service account email | `ade-service-account@my-google-project.iam.gserviceaccount.com` | Service account email used for service account impersonation. |

A service account key file has the following JSON format:

```json
{
  "type": "service_account",
  "project_id": "PROJECT_ID",
  "private_key_id": "KEY_ID",
  "private_key": "-----BEGIN PRIVATE KEY-----\nPRIVATE_KEY\n-----END PRIVATE KEY-----\n",
  "client_email": "SERVICE_ACCOUNT_EMAIL",
  "client_id": "CLIENT_ID",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/SERVICE_ACCOUNT_EMAIL"
}
```
Workload identity federation
Workload Identity Federation (WIF) is a Google Cloud authentication mechanism that allows external identities (e.g. from AWS, Azure) to access Google Cloud resources without creating or managing long-lived service account keys.
Instead of using JSON key files, WIF uses short-lived, securely issued tokens exchanged from trusted identity providers (like AWS IAM roles).
Agile Data Engine uses WIF to securely access Google BigQuery:

1. ADE assumes an AWS IAM role within its runtime environment.
2. That role is federated to a Google Workload Identity Pool, which trusts the AWS IAM role as an external identity.
3. Google allows that federated identity to impersonate a specific Google Cloud service account.
4. ADE then uses the short-lived token to access BigQuery on behalf of that service account, securely and without static credentials.
Token lifetime extension
When ADE connects to BigQuery using Workload Identity Federation (WIF), it impersonates a service account in your Google Cloud project to request short-lived access tokens. By default, these tokens have a lifetime of 1 hour.
For long-running loads or other operations, tokens with a longer lifetime (up to 12 hours) may be necessary. To allow ADE to request longer-lived tokens, you must configure the following Organization Policy constraint in your Google Cloud environment:

```
constraints/iam.allowServiceAccountCredentialLifetimeExtension
```
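As a sketch, the constraint can also be managed with Terraform. The example below is illustrative rather than a verified part of the ADE setup: it assumes the `google_service_account.ade_account` resource from the Terraform example further down, that the constraint accepts service account emails as list values, and that the caller can administer organization policies on the project.

```hcl
# Illustrative sketch: allow the ADE service account to request
# extended-lifetime (up to 12 h) access tokens by listing it under the
# iam.allowServiceAccountCredentialLifetimeExtension list constraint.
resource "google_org_policy_policy" "sa_token_lifetime" {
  name   = "projects/${var.project}/policies/iam.allowServiceAccountCredentialLifetimeExtension"
  parent = "projects/${var.project}"

  spec {
    rules {
      values {
        # Service accounts permitted to request tokens longer than 1 hour
        allowed_values = [google_service_account.ade_account.email]
      }
    }
  }
}
```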
Configuration
For each ADE Runtime environment, it is recommended to create a separate Google Cloud Project. This separation improves security, simplifies access control, and supports environment-level isolation (e.g. DEV, TEST, PROD).
Within each project, configure the following resources:
Service account
Create a dedicated service account for ADE to authenticate with Google Cloud services via impersonation.
Custom IAM role
Define and assign a custom IAM role to the service account.
This role must include the permissions required by ADE (see the full list in the Terraform example below).
GCS data bucket access
Grant read access to the service account for any Google Cloud Storage (GCS) buckets used as data sources.
Workload identity federation
Configure a workload identity pool and provider to trust AWS IAM roles.
Grant the service account impersonation rights (`roles/iam.workloadIdentityUser`) for identities from the pool.
It is recommended to use Terraform to provision and manage these resources:
Terraform example
```hcl
#######################
# VARIABLES (defined in tfvars or CLI)
#######################
# project          = "my-gcp-project-id"
# project_number   = "123456789012"       # Required for Workload Identity Federation principal binding
# bigquery_app     = "ade"                # Prefix for SA and WIF naming
# environment      = "dev"                # e.g. dev, test, prod
# data_bucket_name = "my-ade-dev-bucket"  # GCS bucket name used by ADE
# aws_account_id   = "111122223333"       # AWS account ID of the calling IAM role (request this from ADE Support)

#######################
# SERVICE ACCOUNT
#######################
resource "google_service_account" "ade_account" {
  project      = var.project
  account_id   = "sa-${var.bigquery_app}-${var.environment}"
  display_name = "Service account for ${var.bigquery_app}-${var.environment}"
}

#######################
# CUSTOM IAM ROLE
#######################
resource "google_project_iam_custom_role" "ade_role" {
  role_id     = "adeRole"
  title       = "ADE role"
  description = "Role for ADE usage"
  project     = var.project
  permissions = [
    # BigQuery Job Management
    "bigquery.jobs.create",                   // Submit new jobs (queries, loads, etc.)
    "bigquery.jobs.get",                      // View metadata for individual jobs
    "bigquery.jobs.list",                     // List jobs created by the user
    "bigquery.jobs.listAll",                  // List all jobs in the project
    "bigquery.jobs.delete",                   // Cancel/delete jobs
    "bigquery.jobs.update",                   // Update job metadata

    # BigQuery Dataset Management
    "bigquery.datasets.create",               // Create new datasets
    "bigquery.datasets.delete",               // Delete datasets
    "bigquery.datasets.get",                  // Read dataset metadata
    "bigquery.datasets.update",               // Modify dataset settings (labels, default table expiration, etc.)
    "bigquery.datasets.getIamPolicy",         // View dataset IAM policy
    "bigquery.datasets.setIamPolicy",         // Set dataset IAM policy
    "bigquery.datasets.link",                 // Link datasets across projects
    "bigquery.datasets.createTagBinding",     // Attach tags to datasets
    "bigquery.datasets.deleteTagBinding",     // Remove tags from datasets
    "bigquery.datasets.listTagBindings",      // List attached tags

    # BigQuery Table Management
    "bigquery.tables.create",                 // Create new tables
    "bigquery.tables.delete",                 // Delete tables
    "bigquery.tables.get",                    // View table metadata
    "bigquery.tables.getData",                // Read table data
    "bigquery.tables.list",                   // List tables in a dataset
    "bigquery.tables.update",                 // Modify table metadata/schema
    "bigquery.tables.updateData",             // Insert/update data via jobs
    "bigquery.tables.export",                 // Export data from tables (e.g., to GCS)
    "bigquery.tables.restoreSnapshot",        // Restore table from snapshot
    "bigquery.tables.createSnapshot",         // Create snapshot of a table
    "bigquery.tables.deleteSnapshot",         // Delete snapshot
    "bigquery.tables.createIndex",            // Create search indexes
    "bigquery.tables.deleteIndex",            // Delete search indexes
    "bigquery.tables.setIamPolicy",           // Set IAM policy on tables
    "bigquery.tables.getIamPolicy",           // Get IAM policy on tables
    "bigquery.tables.setCategory",            // Apply Data Catalog category (e.g. policy tag)

    # BigQuery Routines (User-Defined Functions / Stored Procedures)
    "bigquery.routines.create",               // Create new routines
    "bigquery.routines.delete",               // Delete routines
    "bigquery.routines.get",                  // Read routine definitions
    "bigquery.routines.list",                 // List routines in a dataset
    "bigquery.routines.update",               // Modify routine definitions
    "bigquery.routines.updateTag",            // Modify associated tags on routines

    # BigQuery Connections
    "bigquery.connections.get",               // View connection metadata
    "bigquery.connections.list",              // List connections
    "bigquery.connections.getIamPolicy",      // View IAM policy for connections
    "bigquery.connections.use",               // Use authorized external connections

    # Data Catalog & Tagging (for classification, masking, lineage)
    "datacatalog.taxonomies.get",             // Read taxonomy definitions
    "datacatalog.categories.fineGrainedGet",  // Read fine-grained policy tags
    "resourcemanager.tagValueBindings.create", // Attach resource tags
    "resourcemanager.tagValueBindings.delete", // Detach resource tags
    "bigquery.dataPolicies.maskedGet",        // View data masking policies

    # Required for Workload Identity Federation
    "iam.serviceAccounts.getAccessToken"      // Allow impersonation using federated identity
  ]
}

#######################
# ROLE BINDING TO SERVICE ACCOUNT
#######################
resource "google_project_iam_binding" "ade_role_to_user" {
  project = var.project
  role    = "projects/${var.project}/roles/${google_project_iam_custom_role.ade_role.role_id}"
  members = [
    "serviceAccount:${google_service_account.ade_account.email}"
  ]
}

#######################
# STORAGE BUCKET ACCESS
#######################
resource "google_storage_bucket_iam_binding" "databucket_binding" {
  bucket = var.data_bucket_name
  role   = "roles/storage.objectViewer"
  members = [
    "serviceAccount:${google_service_account.ade_account.email}"
  ]
}

#######################
# WORKLOAD IDENTITY FEDERATION
#######################
resource "google_iam_workload_identity_pool" "pool" {
  workload_identity_pool_id = "wip-${var.bigquery_app}-${var.environment}"
  description               = "Identity Pool for Agile Data Engine AWS"
  project                   = var.project
}

resource "google_iam_workload_identity_pool_provider" "aws_ade_provider" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.pool.workload_identity_pool_id
  workload_identity_pool_provider_id = "aws-ade-provider"
  project                            = var.project
  disabled                           = false

  attribute_mapping = {
    "google.subject"        = "assertion.arn.contains(\"-ManagerService\") ? assertion.arn.split(\"-ManagerService\")[0] : assertion.arn.split(\"-DaggerService\")[0]"
    "attribute.environment" = "assertion.arn.contains(\"-ManagerService\") ? assertion.arn.split(\":assumed-role/\")[1].split(\"-ManagerService\")[0] : assertion.arn.split(\":assumed-role/\")[1].split(\"-DaggerService\")[0]"
  }

  aws {
    account_id = var.aws_account_id
  }
}

resource "google_service_account_iam_member" "ade_sa_impersonation" {
  service_account_id = google_service_account.ade_account.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/projects/${var.project_number}/locations/global/workloadIdentityPools/${google_iam_workload_identity_pool.pool.workload_identity_pool_id}/attribute.environment/${upper(var.environment)}"
}
```
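To surface the exact value needed for the "Workload identity pool details" connection setting, an output can be added to the Terraform configuration above. This is an optional convenience sketch; the output name is illustrative, and it assumes the pool and provider resources defined in the example:

```hcl
# Optional: emit the workload identity pool/provider resource name in the
# format expected by the ADE connection settings, e.g.
# //iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/IDENTITY_POOL/providers/PROVIDER
output "ade_workload_identity_pool_details" {
  value = "//iam.googleapis.com/projects/${var.project_number}/locations/global/workloadIdentityPools/${google_iam_workload_identity_pool.pool.workload_identity_pool_id}/providers/${google_iam_workload_identity_pool_provider.aws_ade_provider.workload_identity_pool_provider_id}"
}
```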
Restrict access by IP address
It is strongly recommended to restrict BigQuery access to trusted IP addresses only. This can be achieved in Google Cloud using VPC Service Controls combined with Context-Aware Access, which enforces network-level security at the project or service level. See details in Google documentation.
Agile Data Engine uses a tenant-specific outbound IP address, which you can request via the ADE Support Portal.
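As an illustrative sketch (not a complete VPC Service Controls setup), an access level restricted to the ADE outbound IP can be attached to a service perimeter around BigQuery. The access policy variable, resource names, and the IP address below are placeholders; use your organization's access policy and the tenant-specific IP obtained from ADE Support:

```hcl
# Illustrative sketch: admit only the ADE outbound IP to BigQuery via
# VPC Service Controls. var.access_policy_id and the IP are placeholders.
resource "google_access_context_manager_access_level" "ade_ip" {
  parent = "accessPolicies/${var.access_policy_id}"
  name   = "accessPolicies/${var.access_policy_id}/accessLevels/ade_ip"
  title  = "ade_ip"

  basic {
    conditions {
      ip_subnetworks = ["203.0.113.10/32"] # replace with the ADE tenant-specific IP
    }
  }
}

resource "google_access_context_manager_service_perimeter" "bigquery_perimeter" {
  parent = "accessPolicies/${var.access_policy_id}"
  name   = "accessPolicies/${var.access_policy_id}/servicePerimeters/bigquery_perimeter"
  title  = "bigquery_perimeter"

  status {
    resources           = ["projects/${var.project_number}"]
    restricted_services = ["bigquery.googleapis.com"]
    access_levels       = [google_access_context_manager_access_level.ade_ip.name]
  }
}
```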
If your organization requires private connectivity, see Private Connectivity and VPN.