
How does Fivetran integrate with AWS Lambda?

What is AWS Lambda used for?
AWS Lambda is Amazon's event-driven, serverless compute service. It is used to run code in response to an event or a series of events (event-driven) without procuring and managing servers or containers (serverless).
AWS Lambda lets you attach custom functions or connectors to AWS resources such as DynamoDB tables and S3 buckets, so you can process data efficiently as it moves into or through the cloud.
AWS Lambda supports several languages through its runtimes, and it handles capacity provisioning, automatic scaling, and logging for you. It is also straightforward to configure and use.
What is Fivetran?
Fivetran is a reliable, automated, production-grade ELT solution. It is popular in data warehousing for moving data into, out of, or across cloud platforms such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure, and it is widely used for loading data into destinations such as Google BigQuery, Snowflake, PostgreSQL, and SQL Server.
How does Fivetran integrate with AWS Lambda?
Fivetran can integrate with AWS Lambda to allow you to set up custom functions that can respond to Lambda triggers in your Fivetran data pipeline. This integration enables you to perform custom validations, transformations, and manipulations on data before you load it into your data warehouse.
You can write Lambda functions in any language with a supported runtime, including Java, Node.js, and Python, and customize them to respond to triggers such as a new item being added to a table. A minimal sketch of such a function follows.
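For illustration, here is a minimal Python sketch of a function written for Fivetran's Lambda-based connector. The state/insert/schema/hasMore response shape follows Fivetran's documented contract for function connectors, but verify it against the current docs; `fetch_records` and the `orders` table are hypothetical stand-ins.

```python
from datetime import datetime, timezone

def lambda_handler(event, context):
    """Entry point Fivetran invokes on each sync.

    `event` carries the cursor Fivetran saved from the previous run
    (under "state") plus any secrets configured on the connector.
    """
    state = event.get("state", {})
    cursor = state.get("cursor", "1970-01-01T00:00:00+00:00")

    # Fetch only records newer than the cursor. `fetch_records` is a
    # hypothetical helper standing in for your real source API call.
    rows = fetch_records(since=cursor)

    new_cursor = datetime.now(timezone.utc).isoformat()
    return {
        "state": {"cursor": new_cursor},   # saved and passed back on the next sync
        "insert": {"orders": rows},        # table name -> list of row dicts
        "schema": {"orders": {"primary_key": ["order_id"]}},
        "hasMore": False,                  # True would trigger an immediate re-invoke
    }

def fetch_records(since):
    # Placeholder: replace with a call to your source system.
    return [{"order_id": 1, "updated_at": since}]
```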
Does Fivetran run on AWS?
Yes. Fivetran runs on AWS.
It uses Amazon Web Services as its infrastructure vendor, relying on an array of AWS services, such as Lambda, EC2, and S3, to build a scalable and secure data integration platform. Moreover, you can integrate it with Snowflake and Redshift to load and store data in those warehouses.
Using AWS Lambda to Create a Custom Data Source
Create a Lambda function
The first step is to write your function code in a language with a supported Lambda runtime, such as Python, Java, or Node.js. A minimal handler is sketched below.
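A minimal Python handler might look like the following; the returned records are placeholders for whatever your real source produces.

```python
import json

def lambda_handler(event, context):
    # `event` is the JSON payload the caller sends when invoking the function.
    records = [{"id": 1, "value": "example"}]  # placeholder data
    return {
        "statusCode": 200,
        "body": json.dumps(records),
    }
```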
Create an S3 bucket
You need an Amazon S3 bucket to store your data. So, create one.
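If you prefer to script this step, here is a boto3 sketch; the bucket name is hypothetical, and bucket names must be globally unique.

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# In us-east-1 no location constraint is needed; in any other region,
# also pass CreateBucketConfiguration={"LocationConstraint": region}.
s3.create_bucket(Bucket="my-fivetran-lambda-data")
```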
Create an IAM role
Don’t use your primary AWS account credentials for custom data sources. Instead, create an Identity and Access Management (IAM) role and configure it with sufficient permissions: it needs to read data from the Amazon S3 bucket and execute Lambda functions.
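A boto3 sketch of this step, attaching AWS managed policies for S3 read access and basic Lambda execution; the role name is hypothetical.

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy letting the Lambda service assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="lambda-custom-source-role",  # hypothetical name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Read-only S3 access plus basic Lambda execution (CloudWatch logging).
iam.attach_role_policy(
    RoleName="lambda-custom-source-role",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)
iam.attach_role_policy(
    RoleName="lambda-custom-source-role",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
)
```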
Create and configure an Amazon QuickSight dataset
This step uses the Lambda function and the IAM role you created in the previous steps; together they act as the data source for the Amazon QuickSight dataset.
Create a visual
Finally, once the Lambda connector has successfully loaded your data, create your visualizations. You can try out the integration with sample functions.
Building your own Custom Connector with Fivetran
Fivetran lets you build custom connectors to integrate data from sources the platform does not natively support.
Define the data structure
- Define the data structure of the source you want to connect to and map it to the desired schema in your data warehouse or data lake, as sketched below.
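As a sketch, the mapping can be as simple as a dictionary from source fields to warehouse columns and types; every name here is hypothetical.

```python
# Hypothetical mapping from a source record to the warehouse schema.
SOURCE_TO_WAREHOUSE = {
    "table": "orders",
    "primary_key": ["order_id"],
    "columns": {
        # source field -> (warehouse column, warehouse type)
        "id":        ("order_id", "INTEGER"),
        "created":   ("created_at", "TIMESTAMP"),
        "total_usd": ("order_total", "NUMERIC(12,2)"),
    },
}

def map_record(raw: dict) -> dict:
    """Rename source fields to the warehouse column names."""
    return {dest: raw[src]
            for src, (dest, _type) in SOURCE_TO_WAREHOUSE["columns"].items()}
```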
Write the connector code
- Write the code that extracts data from the source and transforms it into the desired format. You can use any programming language supported by the source API, such as Python or Java, and host the code in a repository such as GitHub. A pagination-aware extraction sketch follows.
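A Python sketch of a cursor-paginated extraction loop; the endpoint, parameters, and response fields are hypothetical placeholders for your real source API.

```python
import requests

API_URL = "https://api.example.com/v1/orders"  # hypothetical source endpoint

def extract(since: str, api_key: str) -> list[dict]:
    """Pull records updated after `since`, following cursor-based pagination."""
    rows, cursor = [], None
    while True:
        resp = requests.get(
            API_URL,
            params={"updated_after": since, "cursor": cursor},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30,
        )
        resp.raise_for_status()
        payload = resp.json()
        # Apply the schema mapping from the previous step to each raw record here.
        rows.extend(payload["data"])
        cursor = payload.get("next_cursor")
        if not cursor:
            return rows
```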
Package the connector
- Package the connector code and its dependencies into a deployable artifact, for example a Docker image or a zip archive (see the sketch below).
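As a lighter-weight alternative to a Docker image, you can build a zip artifact; this sketch assumes the dependencies were already installed into a build/ directory.

```python
import shutil

# Assumes dependencies were vendored alongside the handler, e.g.:
#   pip install -r requirements.txt --target build && cp connector.py build/
# Produces connector.zip, suitable for a zip-based Lambda deployment.
shutil.make_archive("connector", "zip", root_dir="build")
```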
Deploy the connector
- Deploy the connector to a hosting environment that Fivetran can reach. You can use a cloud environment, such as Amazon Web Services (AWS) or Google Cloud Platform (GCP), or an on-premises environment. A Lambda-based deployment sketch follows.
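If you deploy the connector as a Lambda function, a boto3 sketch might look like this; the function name, role ARN, and handler path are hypothetical, and the zip comes from the packaging step.

```python
import boto3

lam = boto3.client("lambda", region_name="us-east-1")

with open("connector.zip", "rb") as f:
    lam.create_function(
        FunctionName="fivetran-custom-connector",  # hypothetical name
        Runtime="python3.12",
        Role="arn:aws:iam::123456789012:role/lambda-custom-source-role",  # role from earlier
        Handler="connector.lambda_handler",        # module.function inside the zip
        Code={"ZipFile": f.read()},
        Timeout=300,                               # syncs can run for minutes
    )
```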
Configure the connector
- Configure the connector in Fivetran and test it to ensure it works correctly; one programmatic route is sketched below.
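Connectors can also be created through Fivetran's REST API. The endpoint and basic-auth style below follow the public API, but the service id and config keys are assumptions to verify against Fivetran's documentation; the credentials and group id are placeholders.

```python
import requests

resp = requests.post(
    "https://api.fivetran.com/v1/connectors",
    auth=("FIVETRAN_API_KEY", "FIVETRAN_API_SECRET"),  # placeholder credentials
    json={
        "service": "aws_lambda",                # connector type (assumed id)
        "group_id": "destination_group_id",     # placeholder
        "config": {"schema": "lambda_source"},  # plus Lambda ARN/region per the docs
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["data"]["id"])  # connector id, used in later API calls
```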
Schedule the sync
- Integrate the connector into your data pipeline and schedule data syncs to run at the desired frequency, as in the sketch below.
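Scheduling can likewise be set through the REST API; `sync_frequency` is expressed in minutes. The connector id and credentials below are placeholders.

```python
import requests

resp = requests.patch(
    "https://api.fivetran.com/v1/connectors/connector_id",  # placeholder id
    auth=("FIVETRAN_API_KEY", "FIVETRAN_API_SECRET"),       # placeholder credentials
    json={"sync_frequency": 60, "paused": False},           # hourly syncs, enabled
    timeout=30,
)
resp.raise_for_status()
```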