Input files format and data validation

Input files#

Actito ETLs integrate flat CSV files that feed the Data Model of your Actito licence.

Compression#

The files you provide should be compressed with ZIP or GZIP. If that is not possible, they can be provided uncompressed.
A single ZIP archive can contain multiple CSV files, whereas GZIP can only compress one file at a time.
The names of the compressed file and of the CSV files can be defined separately, and there is no constraint on file extensions, which gives you full flexibility in naming the files to retrieve.
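As an illustration of the two compression options, here is a minimal Python sketch that packs several CSV files into one ZIP archive and compresses a single CSV file with GZIP. The file names and contents are hypothetical, and everything is built in memory for brevity.

```python
import gzip
import io
import zipfile

# Hypothetical file names and contents, for illustration only.
profiles_csv = "email;firstName\njohn@example.com;John\n"
orders_csv = "orderId;amount\n1001;49.90\n"


def build_zip(files: dict[str, str]) -> bytes:
    """Pack several CSV files into a single ZIP archive (in memory)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content.encode("utf-8"))
    return buffer.getvalue()


def build_gzip(content: str) -> bytes:
    """Compress exactly one CSV file with GZIP."""
    return gzip.compress(content.encode("utf-8"))


# The CSV names inside the archive are independent of the archive name,
# and their extension does not have to be .csv.
zip_bytes = build_zip({"profiles.txt": profiles_csv, "orders.csv": orders_csv})
gzip_bytes = build_gzip(profiles_csv)

# List the files contained in the ZIP archive.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    zip_members = archive.namelist()
```

Writing `zip_bytes` or `gzip_bytes` to disk under any name you choose is then enough to produce the compressed file.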

CSV format#

CSV files are flat files that follow the CSV specification (RFC 4180). However, in Actito ETLs, as in most CSV integration systems, the comma separator can be replaced by any single character.
Take great care with the formatting of these CSV files. Badly quoted or enclosed column data is a common issue: it corrupts the file and makes it impossible to parse.
Make sure that every column that may contain the separator character is correctly enclosed, and that separator characters inside a cell are correctly quoted.
Pay particular attention to free-text columns, which often contain carriage return or line feed characters. These must also be enclosed so that the parser does not treat them as CSV line separators.
A valid CSV file is also one in which every line contains exactly the same number of columns as the first line, which is expected to contain the column headers.
Actito ETLs always expect to find the column headers in this first line, so that the data can be mapped to the data model table attributes by header name rather than by column order.

The definition of an ETL allows you to define those separator/enclosing/quoting characters.
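The formatting rules above can be sketched with Python's standard `csv` module, which encloses cells containing the separator, line breaks or the enclosing character. The `;` separator and `"` enclosing character are assumptions made for this example (your ETL definition may declare others), and Python's default of doubling an enclosing character inside a cell is only one possible quoting convention.

```python
import csv
import io

# Assumed ETL settings for this sketch: ";" as separator, '"' as enclosing character.
SEPARATOR = ";"
ENCLOSING = '"'

rows = [
    ["customerId", "comment"],             # header line
    ["1", "Likes tea; dislikes coffee"],   # cell contains the separator
    ["2", "First line\nSecond line"],      # cell contains a line feed
    ["3", 'Said "hello"'],                 # cell contains the enclosing character
]

buffer = io.StringIO(newline="")
writer = csv.writer(
    buffer,
    delimiter=SEPARATOR,
    quotechar=ENCLOSING,
    quoting=csv.QUOTE_MINIMAL,  # enclose only the cells that need it
    lineterminator="\r\n",
)
writer.writerows(rows)
csv_text = buffer.getvalue()


def check_column_counts(text: str) -> list[int]:
    """Return the 1-based record numbers whose column count differs from the header's."""
    reader = csv.reader(
        io.StringIO(text, newline=""), delimiter=SEPARATOR, quotechar=ENCLOSING
    )
    header = next(reader)
    return [
        index
        for index, record in enumerate(reader, start=2)
        if len(record) != len(header)
    ]


# An empty list means every record has as many columns as the header line.
bad_records = check_column_counts(csv_text)
```

Running a check such as `check_column_counts` before sending a file is a cheap way to catch a badly enclosed cell, since it usually shows up as a record with too many or too few columns.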

File format vs file name extension

Note that there is no constraint on the extension of the file to retrieve.
Even though Actito ETLs only deal with CSV-formatted files, a file can be named myfile.txt, for instance.

Encoding#

Actito supports the following encodings:

  • UTF-8
  • ISO-8859-1

You must declare the encoding in every ETL definition and, above all, ensure that the provided files are actually compatible with the declared encoding.
A common issue is the presence of a leading BOM character in the file. UTF-8 with BOM and UTF-16 with BOM are not allowed in Actito ETLs.
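A file can be checked for a leading BOM, and for compatibility with its declared encoding, before it is sent. The sketch below is one possible pre-flight check in Python. The function names are hypothetical and are not part of any Actito tooling.

```python
import codecs
from typing import Optional

# Byte order marks that are not allowed at the start of a file.
BOMS = {
    "UTF-8 with BOM": codecs.BOM_UTF8,
    "UTF-16 LE with BOM": codecs.BOM_UTF16_LE,
    "UTF-16 BE with BOM": codecs.BOM_UTF16_BE,
}


def detect_bom(data: bytes) -> Optional[str]:
    """Return a label for the leading BOM, or None if the data has none."""
    for label, bom in BOMS.items():
        if data.startswith(bom):
            return label
    return None


def check_encoding(data: bytes, declared: str) -> bool:
    """True if the data has no BOM and decodes cleanly with the declared encoding."""
    if detect_bom(data) is not None:
        return False
    try:
        data.decode(declared)
    except UnicodeDecodeError:
        return False
    return True
```

Keep in mind that any byte sequence decodes without error as ISO-8859-1, so this check can detect a non-UTF-8 file declared as UTF-8, but not a UTF-8 file wrongly declared as ISO-8859-1.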


Data formatting#

When mapping a CSV file column to an attribute of an Actito Data Model table, you must provide data in a format that is compatible with the type defined in the table structure definition. For example, 'johnsmith' cannot be integrated into an INTEGER attribute.


The following representation patterns apply to each type of attribute found in a profile or custom table:

Raw data#

  • String

    • Max 255 characters (unless specifically defined in the attribute definition)
    • Should fit the optional REGEXP (can be defined in the attribute definition)
  • Numeric

    • Should contain integers, longs or decimals.
    • Negative numbers should be prefixed with the dash - char.
    • Positive numbers should not be prefixed with the + char.
    • Decimal separator should be the . char.
  • Boolean

    • Should be true or false.
  • Date

    • Should be formatted with the yyyy-MM-dd pattern.
  • Date-time

    • Should be formatted with the yyyy-MM-dd hh:mm:ss pattern.
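The raw data patterns above can be approximated by a few Python validators, useful for checking an export before it is sent. This is a sketch under stated assumptions: booleans are matched as the exact lowercase strings, the hour part of a date-time is read on a 24-hour clock, and the optional per-attribute REGEXP is not modelled. None of the function names come from Actito.

```python
import re
from datetime import datetime

# Optional leading "-", digits, optional "." decimals; no "+" prefix, no "," separator.
NUMERIC_PATTERN = re.compile(r"-?[0-9]+(\.[0-9]+)?")
DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
DATETIME_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}")


def is_valid_string(value: str, max_length: int = 255) -> bool:
    """255 is the default limit; an attribute definition may set another one."""
    return len(value) <= max_length


def is_valid_numeric(value: str) -> bool:
    return NUMERIC_PATTERN.fullmatch(value) is not None


def is_valid_boolean(value: str) -> bool:
    # Assumption: only the exact lowercase strings are accepted.
    return value in ("true", "false")


def _parses(value: str, shape: re.Pattern, fmt: str) -> bool:
    """Check the zero-padded shape first, then that it is a real calendar value."""
    if shape.fullmatch(value) is None:
        return False
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def is_valid_date(value: str) -> bool:
    return _parses(value, DATE_SHAPE, "%Y-%m-%d")


def is_valid_datetime(value: str) -> bool:
    # Assumption: hours are expressed on a 24-hour clock.
    return _parses(value, DATETIME_SHAPE, "%Y-%m-%d %H:%M:%S")
```

For instance, `is_valid_numeric("johnsmith")` returns `False`, which corresponds to the INTEGER example given earlier.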

Empty values

When UPDATING existing records, an empty value in the CSV file means that the value currently stored in the database will be cleared, unless the ignoreEmptyValues parameter of the ETL is set to true (in which case the empty value is ignored).
However, providing an empty value for a boolean field (including subscriptions) will not remove the current value, as it must always be true or false.
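The update rule for empty values can be modelled with a small simulation. This is only an illustration of the behaviour described above, not how Actito implements it: `apply_update` and its arguments are hypothetical, and a cleared value is represented by `None`.

```python
from typing import Optional


def apply_update(
    current: dict[str, Optional[str]],
    incoming: dict[str, str],
    boolean_fields: set[str],
    ignore_empty_values: bool = False,
) -> dict[str, Optional[str]]:
    """Simulate how one CSV row updates an existing record (illustrative only)."""
    updated = dict(current)
    for name, value in incoming.items():
        if value != "":
            updated[name] = value      # a provided value always overwrites
        elif name in boolean_fields or ignore_empty_values:
            continue                   # empty value ignored, current value kept
        else:
            updated[name] = None       # empty value clears the current one
    return updated


record = {"firstName": "John", "city": "Brussels", "newsletter": "true"}
csv_row = {"firstName": "Johnny", "city": "", "newsletter": ""}

# Default behaviour: "city" is cleared, the boolean "newsletter" is kept.
cleared = apply_update(record, csv_row, boolean_fields={"newsletter"})

# With ignoreEmptyValues set to true: "city" is kept as well.
kept = apply_update(
    record, csv_row, boolean_fields={"newsletter"}, ignore_empty_values=True
)
```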

Standard Actito attributes#

  • E-mail address :

  • Phone number :

    • Should only contain the following characters : +().- /0123456789.
    • Should be provided with the international country prefix.
    • Ex: +3210458514
  • Mother language :

    • ISO 639-1 format (2 chars).
    • Ex: FR
  • Country :

    • ISO 3166-1 alpha-2 format (2 chars).
    • Ex: BE
  • Sex :

    • Should be one of M (male) or F (female).
  • Person title :

    • Should be one of Mr, Mrs or Ms.
  • UTM Coordinates :

    • WGS 84 format. The separator between X (latitude) and Y (longitude) should be |, and the decimal separator should be the . char.
    • Ex : 4.610927|50.675338
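The constraints listed above for standard attributes can be expressed as simple shape checks. The Python sketch below uses hypothetical helper names. It only verifies the shape of each value: it does not look up language or country codes in the ISO lists, and it does not verify that a phone number really starts with a country prefix.

```python
import re

# Allowed phone characters: + ( ) . - space / and digits.
PHONE_PATTERN = re.compile(r"[+().\- /0-9]+")
TWO_LETTERS = re.compile(r"[A-Za-z]{2}")
# Two decimal numbers separated by "|", each using "." as decimal separator.
COORDINATES_PATTERN = re.compile(r"-?[0-9]+(\.[0-9]+)?\|-?[0-9]+(\.[0-9]+)?")


def is_valid_phone(value: str) -> bool:
    """Character check only; the international prefix is not verified."""
    return PHONE_PATTERN.fullmatch(value) is not None


def is_valid_language(value: str) -> bool:
    """Two letters, as in ISO 639-1; the code itself is not looked up."""
    return TWO_LETTERS.fullmatch(value) is not None


def is_valid_country(value: str) -> bool:
    """Two letters, as in ISO 3166-1 alpha-2; the code itself is not looked up."""
    return TWO_LETTERS.fullmatch(value) is not None


def is_valid_sex(value: str) -> bool:
    return value in ("M", "F")


def is_valid_title(value: str) -> bool:
    return value in ("Mr", "Mrs", "Ms")


def is_valid_coordinates(value: str) -> bool:
    return COORDINATES_PATTERN.fullmatch(value) is not None
```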
tip

If your system cannot extract data in the formats above, you can define data transformations that the ETL applies to your extracted raw data before integrating it into the Actito data model.
Check the Transform section for more information on the available transformations.

Subscriptions#

Subscriptions are a specific type of attribute in a Profile table; they represent the opt-in preferences of an individual. The CSV file needs one column for each subscription.

  • The expected value for subscription is a Boolean with "true" and "false" as possible values.
  • In the "attributesMapping" parameter of the ETL, the "attributeName" value should be subscriptions#xxxx where xxxx is the name of the subscription.
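As a sketch of the naming rule above, the following Python helpers build mapping entries for subscription columns. The shape of an entry is an assumption: only the "attributesMapping" and "attributeName" names appear in the text, while the "headerColumn" key, the column names and the subscription names are hypothetical placeholders.

```python
def subscription_mapping(column: str, subscription: str) -> dict[str, str]:
    """Build one attributesMapping entry for a subscription column.

    "headerColumn" is an assumed key name for the CSV column header.
    """
    return {
        "headerColumn": column,
        "attributeName": f"subscriptions#{subscription}",
    }


def subscription_value(opted_in: bool) -> str:
    """CSV cell value for a subscription: the string "true" or "false"."""
    return "true" if opted_in else "false"


# One CSV column, and one mapping entry, per subscription.
attributes_mapping = [
    subscription_mapping("optin_newsletter", "Newsletter"),
    subscription_mapping("optin_promotions", "Promotions"),
]
```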

Segmentations#

Segmentations are a specific type of attribute in a Profile table; they represent the business categories an individual can be part of. A segmentation can be 'simple' or 'exclusive'.
The CSV file needs one column for each segmentation.

  • For simple segmentations, the expected value is "Member" if the profile belongs to the segmentation and an empty value if not.
  • For exclusive segmentations, the expected value is the name of the segment sub-category in which the profile must be inserted. If the segmentation is not mandatory, an empty field indicates that the profile should not be put in any category of the segmentation.
  • In the "attributesMapping" parameter of the ETL, the "attributeName" value should be S_xxxx where xxxx is the name of the segmentation.
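The expected cell values for both kinds of segmentation can be summarised in a few Python helpers. The function names and the segmentation name are hypothetical. Rejecting an empty value for a mandatory exclusive segmentation is an assumption of this sketch, since the text only describes the non-mandatory case.

```python
from typing import Optional


def simple_segmentation_value(is_member: bool) -> str:
    """Return "Member" if the profile belongs to the segmentation, else an empty value."""
    return "Member" if is_member else ""


def exclusive_segmentation_value(segment: Optional[str], mandatory: bool = False) -> str:
    """Return the segment name, or an empty value when the profile is in no segment."""
    if segment is None:
        if mandatory:
            # Assumption: a mandatory segmentation cannot be left empty.
            raise ValueError("a mandatory segmentation needs a segment name")
        return ""
    return segment


def segmentation_attribute_name(segmentation: str) -> str:
    """Value to use as "attributeName" for a segmentation column."""
    return f"S_{segmentation}"
```

For a hypothetical exclusive segmentation named LoyaltyTier, the column would be mapped to `S_LoyaltyTier` and each cell would hold a segment name or be left empty.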

Multi-value attributes#

Multi-value attributes are fields of a Profile table that can hold several distinct values at once. For example, "hobbies" : "football, hockey, cycling".

  • The separator , must always be used between the different values. It cannot be customized.
  • If there are multi-value attributes in your ETL, make sure to use another character as main separator between columns.
  • , is never allowed in the values of the multi-value attributes (even when escaped).
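The three rules above can be put together in a short Python sketch that writes a file with a multi-value column. The `;` column separator and the column names are assumptions chosen for the example. Any single character other than `,` would satisfy the second rule.

```python
import csv
import io


def join_multi_value(values: list[str]) -> str:
    """Join the values with "," after checking that none of them contains one."""
    for value in values:
        if "," in value:
            raise ValueError(f"',' is not allowed in a multi-value item: {value!r}")
    return ",".join(values)


buffer = io.StringIO(newline="")
# ";" separates the columns, so the "," inside a cell needs no enclosing.
writer = csv.writer(buffer, delimiter=";", lineterminator="\r\n")
writer.writerow(["email", "hobbies"])
writer.writerow(
    ["john@example.com", join_multi_value(["football", "hockey", "cycling"])]
)
csv_text = buffer.getvalue()
```

A value such as `"rock, paper"` is rejected by `join_multi_value` instead of being silently split into two values.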