Transforming your data

This guide shows you how to add fields, aggregate instances, and join, merge, and order datasets.

Adding fields to a dataset

If you need to create new fields (i.e., feature engineering), the ML module lets you do so either by applying common operations to your existing data or by writing custom operations with Flatline formulas.

To start, access the configuration option menu and select ADD FIELDS.

This opens a configuration panel for adding fields, where you can name the new field, choose the operation to apply, and select the existing field used to generate the new one.

For details on all the available options for adding fields, such as discretization, replacing missing values, normalizing, and many others, see Subsection 8.1 of this documentation. It describes each of the operations you can apply to an existing field to create a new one.
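To make the operations named above more concrete, here is a minimal Python sketch of what replacing missing values, normalizing, and discretizing a field amount to conceptually. It is not the module's implementation and does not use Flatline; the function `add_fields`, the field name `age`, and the suffixes `_filled`, `_z`, and `_bin` are hypothetical names chosen for illustration.

```python
from statistics import mean, pstdev

# Hypothetical sample dataset: one numeric field with a missing value.
rows = [
    {"age": 20.0},
    {"age": None},
    {"age": 40.0},
]

def add_fields(rows, field):
    """Return copies of the rows with three derived fields added."""
    known = [r[field] for r in rows if r[field] is not None]
    fill_value = mean(known)  # replace missing values with the mean
    filled = [r[field] if r[field] is not None else fill_value for r in rows]
    mu, sigma = mean(filled), pstdev(filled)

    result = []
    for row, value in zip(rows, filled):
        new_row = dict(row)
        new_row[field + "_filled"] = value
        # Normalize to a z-score; guard against a constant field.
        new_row[field + "_z"] = (value - mu) / sigma if sigma else 0.0
        # Discretize into two bins around the mean.
        new_row[field + "_bin"] = "low" if value < mu else "high"
        result.append(new_row)
    return result

result = add_fields(rows, "age")
```

The original rows are left unchanged and each derived value lands in a new field, which mirrors the idea of generating a new field from an existing one.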

Aggregating instances

The aggregating instances option allows you to group the rows of a dataset by a given field.

You can aggregate instances in the Data Intelligence module by following these steps:

Find the AGGREGATE INSTANCES option in the dataset configuration menu.

Once the configuration panel is displayed, select the field by which to aggregate your instances. You can select any type of field (numeric, categorical, text, or datetime), and your instances will be grouped by the unique values of this field. For example, select “CustomerID” if you want a dataset with one row per customer.

The next steps are described in Subsection 8.2 of this documentation.
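As a rough illustration of grouping by the unique values of a field, the following Python sketch collapses a dataset to one row per “CustomerID”. It is only a conceptual sketch, not the module's behavior: the function `aggregate`, the `Amount` field, and the output field names `count` and `sum_Amount` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical purchases dataset: several rows per customer.
rows = [
    {"CustomerID": "C1", "Amount": 10},
    {"CustomerID": "C2", "Amount": 5},
    {"CustomerID": "C1", "Amount": 7},
]

def aggregate(rows, group_field, value_field):
    """Group rows by the unique values of group_field, one output row per value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    return [
        {group_field: key, "count": len(values), "sum_" + value_field: sum(values)}
        for key, values in groups.items()
    ]

per_customer = aggregate(rows, "CustomerID", "Amount")
```

Here each output row summarizes the grouped instances with a count and a sum; which summaries are actually available is covered in Subsection 8.2.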

Joining datasets

It is very common to have data scattered across two or more datasets. Our module allows you to join several datasets, combining their fields and instances based on one or more related fields.

First of all, you need to find both sources in the Data Intelligence module and create datasets from each source. When the datasets are created, find the JOIN DATASETS option in the dataset configuration menu.

This option displays the join configuration panel, where you enter the join parameters.
To learn how to configure these parameters, follow the next steps in Subsection 8.3 of this documentation.
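To illustrate what combining fields based on related fields means, here is a minimal Python sketch of a join on one or more key fields. It assumes an inner join (only matching instances are kept), which is just one possible configuration and not necessarily the module's default; `inner_join` and the sample field names are hypothetical.

```python
# Hypothetical datasets that share the related field "CustomerID".
customers = [
    {"CustomerID": "C1", "Country": "ES"},
    {"CustomerID": "C2", "Country": "FR"},
]
orders = [
    {"CustomerID": "C1", "Amount": 10},
    {"CustomerID": "C1", "Amount": 7},
    {"CustomerID": "C3", "Amount": 4},
]

def inner_join(left, right, keys):
    """Combine the fields of rows whose values match on every key field."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    joined = []
    for left_row in left:
        for right_row in index.get(tuple(left_row[k] for k in keys), []):
            joined.append({**left_row, **right_row})
    return joined

joined = inner_join(customers, orders, ["CustomerID"])
```

In this sketch, customer C2 and the order for C3 have no match and are left out of the result; other join types would treat unmatched instances differently.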

Merging datasets

If you have instances spread across different datasets and want to combine them into a single dataset, use the merging datasets option.

From one of the datasets, open the CONFIGURE DATASET menu. By convention, this first dataset defines the fields of the final dataset. All datasets should have the same field names and IDs.
If the first dataset has fields that are not found in the other datasets, the merge will fail with an error. However, if the other datasets have fields that are not found in the first dataset, you can still execute the merge; these fields will be dropped from the final dataset.

When the datasets are created, find the MERGE DATASETS option in the dataset configuration menu.

Select the datasets you want to merge.

See Subsection 8.4 for the next steps to complete the merge.
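The field rules described in this section can be sketched in Python as follows. This is a conceptual sketch only: `merge_datasets` is a hypothetical helper, it matches fields by name alone (field IDs are not modeled), and it assumes the first dataset is not empty.

```python
def merge_datasets(first, *others):
    """Append the rows of the other datasets to the first one.

    The first dataset defines the final fields. A field missing from
    another dataset is an error; extra fields in other datasets are dropped.
    """
    fields = list(first[0].keys())  # assumes the first dataset has rows
    merged = [dict(row) for row in first]
    for position, dataset in enumerate(others, start=2):
        for row in dataset:
            missing = [f for f in fields if f not in row]
            if missing:
                raise ValueError(f"dataset {position} lacks fields: {missing}")
            merged.append({f: row[f] for f in fields})
    return merged

# Hypothetical datasets: the second one has an extra field, "note".
january = [{"id": 1, "amount": 10}]
february = [{"id": 2, "amount": 7, "note": "gift"}]

merged = merge_datasets(january, february)
```

Merging in the opposite order, with `february` first, would raise an error in this sketch because `january` has no `note` field.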

Ordering datasets

The ordering instances option allows you to sort the rows of a dataset by one or more selected fields in ascending or descending order. The instances are sorted by the first selected field, then by the second field, and so on. You can select up to 8 different sorting fields.

This option is very useful for time series, when you have a dataset containing a date field and you need to sort your instances chronologically. Follow these steps:

From the dataset view, click the ORDER INSTANCES menu option.

A new dataset will be created with the sorted instances. A confirmation message appears in blue at the top of the dataset view.
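The multi-field ordering described in this section can be sketched in Python as below. It is an illustration under assumptions, not the module's implementation: `order_instances` is a hypothetical helper, the sample fields `date` and `amount` are invented, and dates are stored as ISO-formatted strings so that text order matches chronological order.

```python
MAX_SORT_FIELDS = 8  # limit stated in this section

def order_instances(rows, sort_fields):
    """Return a new list sorted by each (field, descending) pair in turn."""
    if len(sort_fields) > MAX_SORT_FIELDS:
        raise ValueError("at most 8 sorting fields are allowed")
    ordered = list(rows)  # leave the original dataset untouched
    # Python's sort is stable, so sorting by the last field first and the
    # first field last gives "first field, then second field, and so on".
    for field, descending in reversed(sort_fields):
        ordered.sort(key=lambda row: row[field], reverse=descending)
    return ordered

# Hypothetical time-series dataset.
rows = [
    {"date": "2021-03-01", "amount": 5},
    {"date": "2021-01-15", "amount": 7},
    {"date": "2021-03-01", "amount": 9},
]

# Chronological order; ties on the date are broken by amount, descending.
ordered = order_instances(rows, [("date", False), ("amount", True)])
```

Returning a new list rather than sorting in place mirrors the way a new dataset is created with the sorted instances.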