Loading data from Amazon Redshift
On this page, you'll learn how to add a new Amazon Redshift data source to SlicingDice and create a new loading job using it to synchronize your source with your databases.
Whitelist SlicingDice's IPs on Redshift
Before loading your data from your Redshift source you'll need to whitelist our IP addresses, allowing the Data Loading & Preparation Module to access your data.
How to whitelist the Data Loading IPs in Redshift
First, open your Amazon EC2 service panel and go to the Security Groups section in the menu. Then follow the steps below to configure a Redshift security group.
Create or edit a security group
- Select the VPC that Redshift is actually using
- In the Inbound section, add 4 new rules (one rule for each IP address)
- For each rule, set the type option to Redshift
- For each rule, enter one of the following IPs (one IP per rule):
The following image shows what your Security Group will look like:
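If you prefer to script the inbound rules instead of using the console, the steps above can be sketched in Python. This is a minimal sketch, not an official SlicingDice tool: the helper name `build_ingress_rules` is hypothetical, and the addresses are placeholders from the reserved documentation range, not SlicingDice's real IPs. It assumes your cluster listens on the default port 5439.

```python
from typing import Dict, List

REDSHIFT_PORT = 5439  # Redshift's default port; adjust if your cluster uses another one

# Placeholder addresses (reserved documentation range), NOT SlicingDice's real IPs.
# Replace them with the four IPs listed on this page.
PLACEHOLDER_IPS = ["203.0.113.10", "203.0.113.11", "203.0.113.12", "203.0.113.13"]


def build_ingress_rules(ip_addresses: List[str], port: int = REDSHIFT_PORT) -> List[Dict]:
    """Build one inbound TCP rule per IP address.

    The result has the shape that boto3's EC2 client expects in the
    IpPermissions argument of authorize_security_group_ingress.
    """
    rules = []
    for ip in ip_addresses:
        rules.append({
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # /32 restricts the rule to exactly one address
            "IpRanges": [{"CidrIp": f"{ip}/32", "Description": "SlicingDice data loading"}],
        })
    return rules


rules = build_ingress_rules(PLACEHOLDER_IPS)
print(len(rules), "inbound rules prepared")

# To apply the rules (requires boto3 and AWS credentials; the group ID is yours to fill in):
# import boto3
# boto3.client("ec2").authorize_security_group_ingress(
#     GroupId="<your-security-group-id>", IpPermissions=rules)
```

Building the rules separately from the AWS call lets you review them before anything changes in your account.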
Update your VPC security groups in your Redshift cluster
If you created a new security group, go to the Cluster section, then click on Cluster. Select the Modify cluster option and then select the new security group on VPC Security Groups. Finally, click on Modify.
Now SlicingDice's Data Loading & Preparation Module will be able to access your Redshift clusters as soon as you create a data source on SlicingDice. This is what we'll do next!
Add an Amazon Redshift Data Source
Before adding your Amazon Redshift data source on SlicingDice, you need to be logged in to our Control Panel. Then go to the Data Sources page to start this tutorial.
How to add a Redshift data source on SlicingDice
Before creating your data source, you'll need the following information, which can be found on Amazon:
- Your Amazon Redshift cluster endpoint
- The port on which Redshift is running (default is 5439)
- Your Redshift username and password
- Your database name
Data Source setup
The first step is the configuration of your data source identification on SlicingDice. The following screen shows step 1.
Three fields will appear. Each field's function is described below.
- Data Source Name (mandatory): The name of your data source. Can be edited at any time.
- Data Source Labels/Tags (optional): Labels/tags you might want to associate with a source in order to organize your sources. Can be edited at any time.
- Data Source Description (optional): The description of your data source. Can be edited at any time.
When ready, click on Save & Continue to go to Step 2.
Data Source Details
Below you can see an example of the information and credentials that you should provide so that SlicingDice can connect to your Amazon Redshift source.
- Data Source Type: The type of data source. In this case we're using Amazon Redshift.
- Server: The Amazon Redshift cluster endpoint.
- Port: The port where Redshift is running. The default port of Redshift is 5439.
- Username: The username used to log in to Redshift.
- Password: The password used to log in to Redshift.
- Database: The name of your Redshift database.
You can test the connection by clicking on Test Connection. If the connection succeeds, you'll see a success message.
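If the test fails, it can help to check the same details from your own machine before retrying. The following Python sketch validates the fields described above and checks whether the endpoint accepts TCP connections. The names `RedshiftSource` and `is_reachable` are hypothetical, and the endpoint and credentials are made-up examples. Note that this only checks network reachability from where you run it. It does not verify your credentials, and it does not prove that SlicingDice's IPs are whitelisted.

```python
import socket
from dataclasses import dataclass


@dataclass
class RedshiftSource:
    """The connection details requested in the Data Source Details step."""
    server: str
    database: str
    username: str
    password: str
    port: int = 5439  # Redshift's default port

    def validate(self) -> None:
        """Raise ValueError if a required field is empty or the port is invalid."""
        required = ("server", "database", "username", "password")
        missing = [name for name in required if not getattr(self, name)]
        if missing:
            raise ValueError("Missing required fields: " + ", ".join(missing))
        if not 1 <= self.port <= 65535:
            raise ValueError(f"Invalid port: {self.port}")


def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


# Example values only; replace them with your own cluster details.
source = RedshiftSource(
    server="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
    username="loader",
    password="not-a-real-password",
)
source.validate()
# Uncomment to run the network check against your own endpoint:
# print(is_reachable(source.server, source.port))
```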
Now you can go to the next step by clicking on Save & Continue.
Here you'll see a summary of the configurations defined for this data source before you finally create it.
The following image shows an example of a confirmation screen, which the name of the data source is
If everything is correct, click on Submit and you'll receive a message confirming that the data source was created.
Now you'll be able to find your new data source in the data sources list, as you can see in the following image.
That's it! The next step is to load your Amazon Redshift data into SlicingDice by creating a loading job.
Add a new loading job using Redshift source
Now that the connection to your Amazon Redshift source is fully configured, the next step is to create and execute a loading job using the Redshift Data Source you've configured on SlicingDice.
Remember that while setting up your loading job, you should select the Redshift Data Source you've created as the Data Source to be used by the job.
Here are the creation guides for each type of loading job. Choose the one that best fits your use case:
- One-time loading job: The one-time loading job loads your data once and must be started manually. This loading job type is useful if you don't update your data frequently.
- Manual incremental loading job: The manual incremental loading job loads your data on demand when new data needs to be inserted in SlicingDice. You need to manually start this loading job. Unlike a one-time loading job, only new rows will be inserted in the database. Your dataset needs to have a timestamp column in order to use this loading job type.
- Automatic loading job: The automatic loading job loads your data frequently, at a predetermined time interval. You don't need to manually start this loading job, as it executes automatically. Your dataset needs to have a timestamp column in order to use this loading job type.
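To see why the incremental and automatic job types require a timestamp column, the following Python sketch illustrates the general idea of loading only new rows. It is a conceptual illustration, not SlicingDice's actual implementation. The table `events`, its columns, and the function `load_incremental` are hypothetical, and in-memory SQLite databases stand in for both the source and the destination.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical events table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (id INTEGER, payload TEXT, updated_at TEXT)")
    return db


def load_incremental(source: sqlite3.Connection,
                     target: sqlite3.Connection,
                     last_loaded_at: str) -> str:
    """Copy only the rows newer than last_loaded_at.

    Returns the timestamp to use as the starting point of the next run.
    ISO 8601 strings are used so that text comparison matches time order.
    """
    rows = source.execute(
        "SELECT id, payload, updated_at FROM events "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_loaded_at,),
    ).fetchall()
    target.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    target.commit()
    # If nothing new arrived, keep the previous marker unchanged
    return rows[-1][2] if rows else last_loaded_at


source, target = make_db(), make_db()
source.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "a", "2024-01-01T00:00:00"), (2, "b", "2024-01-02T00:00:00")],
)
marker = load_incremental(source, target, "")      # first run copies everything
source.execute("INSERT INTO events VALUES (3, 'c', '2024-01-03T00:00:00')")
marker = load_incremental(source, target, marker)  # second run copies only row 3
```

Without a timestamp column there would be no marker to compare against, so every run would have to reload the whole dataset, which is what a one-time job does.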