Authentication
The ETL Watcher SDK handles authentication automatically when deployed in cloud environments. The Watcher framework itself doesn’t require authentication; instead, the cloud provider (GCP, Azure, or AWS) handles it when your application runs in one of its managed services.
Overview
The SDK automatically detects your cloud environment and configures the appropriate authentication method. You can also explicitly provide authentication credentials if needed.
Supported Cloud Environments
Google Cloud Platform (GCP): For GKE and Cloud Run deployments
Microsoft Azure: For AKS and Container Instances
Amazon Web Services (AWS): For EKS and ECS deployments
Local Development: No authentication required
Basic Usage
The SDK automatically handles authentication when deployed in cloud environments:
from watcher import OrchestratedETL, Pipeline, PipelineConfig, Watcher

# Auto-detect cloud environment and configure authentication
watcher = Watcher("https://api.watcher.com")

# Use with OrchestratedETL
pipeline_config = PipelineConfig(
    pipeline=Pipeline(name="my-pipeline", pipeline_type_name="etl"),
    default_watermark="2024-01-01"
)
etl = OrchestratedETL("https://api.watcher.com", pipeline_config)
Explicit Authentication Configuration
For local development or custom deployments, you can provide explicit credentials:
from watcher import Watcher
# Bearer token for custom authentication
watcher = Watcher("https://api.watcher.com", "your-bearer-token")
# GCP service account file for local development
watcher = Watcher("https://api.watcher.com", "/path/to/service-account.json")
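Because the second argument can be either a token or a file path, it can help to check which one you are about to pass. The helper below is a hypothetical sketch and not part of the SDK. The rule it applies (an existing file with a .json suffix counts as a service account key, anything else is treated as a bearer token) is an assumption about how the two could be told apart, not a description of the SDK's internals.

```python
from pathlib import Path


def classify_credential(credential: str) -> str:
    """Guess what kind of credential a string is (hypothetical helper, not SDK API)."""
    path = Path(credential)
    # Assumption: a service account key is an existing file with a .json suffix.
    if path.suffix == ".json" and path.is_file():
        return "service_account_file"
    # A .json path that does not exist is more likely a typo than a token.
    if path.suffix == ".json":
        return "missing_file"
    return "bearer_token"
```

Running a check like this before constructing the client can surface a mistyped key path early, instead of having it sent as if it were a token.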
Cloud-Specific Configuration
The SDK automatically detects and handles authentication for each cloud provider:
Google Cloud Platform (GCP):
- Uses the metadata server for GKE/Cloud Run deployments
- Falls back to service account key files when provided
Microsoft Azure:
- Uses managed identity for AKS/Container Instances
- Falls back to service principal credentials
Amazon Web Services (AWS):
- Uses IAM roles for EKS/ECS deployments
- Falls back to environment variables
Local Development:
- No authentication required
- Can provide a bearer token or service account file for testing
Environment Variables
The SDK can automatically detect cloud environments based on environment variables:
GCP:
- GOOGLE_APPLICATION_CREDENTIALS: Path to service account key file
Azure:
- AZURE_TENANT_ID: Azure tenant ID
- AZURE_CLIENT_ID: Azure client ID
- AZURE_CLIENT_SECRET: Azure client secret
AWS:
- AWS_ACCESS_KEY_ID: AWS access key ID
- AWS_SECRET_ACCESS_KEY: AWS secret access key
- AWS_SESSION_TOKEN: AWS session token (optional)
- AWS_REGION: AWS region
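When debugging a deployment, it can be useful to see which provider these variables point to. The following is a minimal sketch, assuming a provider counts as configured only when all of its credential variables from the list above are set. The function name, the required-variable grouping, and the GCP, Azure, AWS precedence order are illustrative assumptions; the SDK's own detection logic may differ.

```python
import os
from typing import Mapping, Optional

# Variables from the list above that must all be set for each provider.
# AWS_SESSION_TOKEN and AWS_REGION are left out because they are not credentials
# on their own (the session token is optional).
REQUIRED_VARS = {
    "gcp": ("GOOGLE_APPLICATION_CREDENTIALS",),
    "azure": ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET"),
    "aws": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
}


def guess_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first provider whose variables are all set, else None.

    Hypothetical debugging helper; not part of the SDK.
    """
    env = os.environ if env is None else env
    for provider, names in REQUIRED_VARS.items():
        if all(env.get(name) for name in names):
            return provider
    return None


if __name__ == "__main__":
    print(f"Provider suggested by environment variables: {guess_provider()}")
```

Passing a plain dictionary instead of relying on os.environ keeps the helper easy to test without touching the real environment.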
Installation with Cloud Dependencies
To use cloud-specific authentication, install the appropriate optional dependencies:
# Install all cloud dependencies
pip install etl-watcher-sdk[cloud]
# Install specific cloud provider dependencies
pip install etl-watcher-sdk[gcp] # For Google Cloud Platform
pip install etl-watcher-sdk[azure] # For Microsoft Azure
pip install etl-watcher-sdk[aws] # For Amazon Web Services
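To confirm that the optional dependencies are importable in the current environment, a check along these lines may help. It is only a sketch: the module names below (google.auth, azure.identity, boto3) are assumptions about what each extra installs, and the helper functions are hypothetical rather than part of the SDK.

```python
import importlib.util

# Assumed module names; the packages each extra actually installs may differ.
PROVIDER_MODULES = {
    "gcp": "google.auth",
    "azure": "azure.identity",
    "aws": "boto3",
}


def module_available(name: str) -> bool:
    """Return True if the module can be located in the current environment."""
    try:
        # For dotted names, find_spec imports the parent package first.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (for example "google") is missing.
        return False


def installed_providers() -> dict:
    """Map each provider to whether its assumed module is importable."""
    return {
        provider: module_available(module)
        for provider, module in PROVIDER_MODULES.items()
    }


if __name__ == "__main__":
    for provider, available in installed_providers().items():
        print(f"{provider}: {'installed' if available else 'missing'}")
```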
Orchestration Integration
The same authentication options apply to OrchestratedETL, which auto-detects credentials by default and accepts explicit credentials as an optional third argument:
from watcher import OrchestratedETL, PipelineConfig, Pipeline
# Configure pipeline
pipeline_config = PipelineConfig(
    pipeline=Pipeline(name="cloud-etl", pipeline_type_name="etl"),
    default_watermark="2024-01-01"
)
# Auto-detect authentication
etl = OrchestratedETL("https://api.watcher.com", pipeline_config)
# Or provide explicit credentials
etl = OrchestratedETL("https://api.watcher.com", pipeline_config, "/path/to/service-account.json")
Error Handling
Authentication errors are raised as AuthenticationError exceptions:
from watcher import AuthenticationError, Watcher
try:
    watcher = Watcher("https://api.watcher.com", "/path/to/service-account.json")
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
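Some authentication failures are transient, for example when a metadata server is briefly unreachable at startup. One way to handle those is a small retry wrapper with exponential backoff. This is a generic sketch, not an SDK feature: call_with_retry and its parameters are hypothetical, and retrying will not help when the credentials themselves are wrong.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Possible usage with the SDK (assumes Watcher and AuthenticationError
# are imported from watcher as in the example above):
#   watcher = call_with_retry(
#       lambda: Watcher("https://api.watcher.com"),
#       retry_on=(AuthenticationError,),
#   )
```

The sleep parameter is injectable so the backoff schedule can be checked in tests without real waiting.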
Common Issues
Missing Dependencies: Ensure you have installed the appropriate cloud dependencies.
Incorrect Credentials: Verify that your service account or credentials have the necessary permissions.
Network Access: Ensure your deployment can access the cloud metadata servers or authentication endpoints.
Token Expiry: The SDK automatically refreshes tokens, but ensure your credentials are valid.
Best Practices
Use Auto-Detection: Let the SDK auto-detect your cloud environment when possible.
Secure Credentials: Store sensitive credentials in environment variables or secure key management systems.
Minimal Permissions: Use service accounts with minimal required permissions.
Token Refresh: The SDK handles token refresh automatically, but monitor for authentication failures.
Error Handling: Always handle AuthenticationError exceptions in your code.
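As an illustration of keeping credentials out of source code, the sketch below reads an explicit credential from an environment variable and returns None when it is absent, so the caller can fall back to auto-detection. The variable name WATCHER_AUTH_TOKEN and the helper itself are hypothetical; the SDK does not define them.

```python
import os
from typing import Mapping, Optional


def credential_from_env(
    var_name: str = "WATCHER_AUTH_TOKEN",  # hypothetical variable name
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the credential stored in var_name, or None if unset or blank."""
    env = os.environ if env is None else env
    value = env.get(var_name, "").strip()
    return value or None


# Possible usage with the SDK (assumes Watcher is imported from watcher):
#   token = credential_from_env()
#   url = "https://api.watcher.com"
#   watcher = Watcher(url, token) if token else Watcher(url)
```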
Example: Complete Cloud Deployment
Here’s a complete example for a cloud deployment:
from watcher import ETLResult, OrchestratedETL, Pipeline, PipelineConfig

def my_etl_function(watcher_context):
    # Your ETL logic here
    return ETLResult(completed_successfully=True, inserts=100)

# Configure pipeline
pipeline_config = PipelineConfig(
    pipeline=Pipeline(name="cloud-etl", pipeline_type_name="etl"),
    default_watermark="2024-01-01"
)
# Auto-detect and configure authentication
etl = OrchestratedETL("https://api.watcher.com", pipeline_config)
# Execute ETL with automatic authentication
result = etl.execute_etl(my_etl_function)
print(f"Execution completed: {result.result.completed_successfully}")
This example automatically detects whether you’re running on GCP, Azure, or AWS and configures the appropriate authentication method.