Standard Data IN flows (aka ETLs)
In ACTITO eco-system we call ETL our standard flat files exchange flows that aim to feed ACTITO Standard Data model.
At ACTITO, ETL stands for Extract -> Transfer -> Load, the E represents the data extraction from your own system, the T stands for the extracted file deposit on a file server shared between you and ACTITO, and the L is the flat file data import into your ACTITO licence data model.
An ACTITO ETL is :
- standardized, in order to ease mapping and validation between your system’s data and ACTITO data model.
- automated, so as to say that you have no manual operation to launch.
- scheduled, stating that ACTITO will run the ETL at the frequency and time you specify.
- monitored, enforcing that you can be notified of any execution’s result.
For each table you wish to feed, you should provide a single compressed CSV flat file in which each line represents a row to import into an ACTITO destination table, and each column is mapped to an attribute of this table.
You should always take care to extract only delta data. As it has huge impact on import performance, avoid to make full table extracts except for initial setup. Imported files are anyway limited to 100 Mb.
The file transfer must be done by using a remote file server that should be declared by defining a file transfer server configuration and at least one location (i.e. a folder on the file server) via our DATA API V5.
That location will have to be referenced within the ETL definition, and ACTITO will try to find the specified CSV file on that server and location each time the ETL is executed.
The loading process to your ACTITO Data Model is composed of two steps :
The validation steps will first check that the provided file is correctly formated (check CSV flat file RFC).
If correct, the validation will then check the format integrity of the provided data.
For example, to be able to write to the
birthDate attribute of a profile table, ACTITO ensures the mapped CSV column contains a valid date with pattern
yyyy-MM-dd; to be able to write a monetary amount to the
total attribute of your Sales custom table, ACTITO ensures that the column contains a valid number with
. decimal separator etc.
For each ETL, you can define the data operation that ACTITO should apply on your ACTITO Data Model. You can therefore declare that rows should be created and/or updated, or deleted.
Automate and schedule
Once the file retrieving, parsing and loading protocol is defined, you can activate your ETL.
ACTITO enables you to specify a frequency (at most once a day) and at specified fire time, an activated ETL will run and load the data from your file.
At any time, you can deactivate an ETL, what would result in no execution during deactivation period (for instance, if you have maintenance or upgrades to apply on your extract process and that result in a few days delay before being able to load data again).
Each time the ETL runs, it creates an execution. The result of an execution indicates :
- when the execution occured
- what is the global status of the execution
- what is the detail of the validation and write steps
An execution can globally fail when the file could not be retrieved (inaccessible file server, expected file not found) or when retrieved file is not a valid CSV file, contains bad headers etc.
If file correctly retrieved and format is valid, you should anyway have rows that are invalid : bad format of cell value regarding the corresponding table attribute (bad format, or wrong value if attribute has restrictions on length, or possible values etc.)
In the end you can have errors while writing the data to the table : duplicate key if a file row contains a key that already exists in the table (
CREATE_ONLY mode), if a file row does not match an existing table row (
DELETE modes), if the number of lines in the table reached the limit etc.
The behavior of the ETL is that only valid rows are written to ACTITO. All other rows are rejects (check below for rejects management).
The execution result contains the counters for each step of the read/rejected/written rows.
An optional execution result report can be sent to a list of recipients.
An e-mail with all the information needed to check the status of the execution will be then sent to each of these recipients :
For each execution, when data validation and write steps detect wrong values or key integrity issues, ACTITO ETLs produces an error file that contains all rejected rows.
The rows in this error file are the original ones with two more columns referencing the invalid column name and the reject reason.
When defining your ETL you have to specify the file transfer server location where ACTITO will upload these reject files so as you can retrieve, process them and possibly correct the data for further re-integration (in next day file for instance).