Why have we developed antFarm?

Most vendors focus on cloud-to-cloud data synchronization. Yet most enterprise data workloads (90%) still run on-premises. Extracting data from on-premises sources comes with many challenges, such as:

  • Limited and short time frames in which data can be accessed and extracted, because operational systems are performance-sensitive.
  • On-premises applications generate enormous quantities of data on a daily or even hourly basis.
  • Each data source has its own specifics, e.g. IBM DB2 on the mainframe, SAP HANA, Microsoft SQL Server.
  • Security-related challenges in accessing the data; VPN requirements, for example, are very common.

Architecture

antFarm is composed of two main components:

An execution engine installed in the local environment. This is where the ant colony resides and works to process data as fast as possible.

A central metadata repository created on the target destination. This is what we call the Queen Ant's residence, where she can find everything she needs to successfully manage the worker ants.

High-level architecture: the execution engine runs in the local environment, while the central metadata repository resides in the cloud environment on the target destination. antFarm supports different target destinations.

For larger enterprises or groups of companies, distributed execution engines can run against a single central repository.

Scalable parallel execution

How does antFarm work?

Environment setup

After you have installed antFarm, the first step is to establish connections to your data sources and to the target destination. antFarm uses named connection strings to databases and filesystems.
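Connection details depend on your installation; purely as an illustrative sketch (every name and key below is hypothetical, not antFarm's actual configuration syntax), named connections can be thought of like this:

    # Hypothetical sketch of named connections: a source database, a target
    # destination and a filesystem buffer. All names and keys are illustrative;
    # they are not antFarm's actual configuration format.
    CONNECTIONS = {
        "src_erp": "Hostname=db2.internal;Port=50000;Database=ERP;UID=etl;PWD=...",
        "tgt_dwh": "account=acme;user=etl;warehouse=LOAD_WH;database=STAGING",
        "buffer_fs": "/data/antfarm/buffer",  # local filesystem for CSV buffers
    }

    def connection_string(name: str) -> str:
        """Look up a connection string by its logical (named) reference."""
        return CONNECTIONS[name]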

Data load preparation

antFarm automatically:

  • Retrieves the list of tables and their definitions from the data source catalogue.
  • Creates a metadata repository that stores the definitions of application sources, table lists, and optimization rules such as partitions. With this information the Queen Ant can successfully manage the worker ants.
  • Creates target tables according to the source table definitions.
  • Converts data types between the source and target databases, if necessary.
  • Generates the SQL queries that lift 'n' shift the data.
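To make the preparation step concrete, here is a minimal sketch of the kind of work being automated: turning a source table definition into target DDL with converted data types. The type map and the shape of the catalogue data are assumptions for the example, not antFarm's published internals.

    # Minimal sketch of automated target-table generation. The type mapping
    # and the shape of the catalogue data are assumptions for this example.
    TYPE_MAP = {"DECIMAL": "NUMBER", "VARCHAR": "VARCHAR", "TIMESTAMP": "TIMESTAMP_NTZ"}

    def generate_target_ddl(table: str, columns: list[tuple[str, str]]) -> str:
        """Build a CREATE TABLE statement from source column definitions."""
        cols = ",\n  ".join(f"{name} {TYPE_MAP.get(dtype, dtype)}"
                            for name, dtype in columns)
        return f"CREATE TABLE {table} (\n  {cols}\n);"

    # A table definition as it might be read from the source catalogue:
    print(generate_target_ddl("STG_ORDERS",
                              [("ORDER_ID", "DECIMAL"), ("CREATED_AT", "TIMESTAMP")]))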

Configurable execution

To define the data load execution logic, you first configure the data processing flow. A data processing flow is composed of different steps, e.g. extract, truncate, load and process. You can define as many steps as you like.

In addition, a parametrized workflow takes care of process standardization and dictates the behaviour of the data processing flow.

Within the workflow you assign operations to each step. In general, there are two types of operations:

  • extract operations, which read data from the source and write it to a buffer (a CSV file), and
  • database processing operations, such as truncate, delete, copy, put and process.

For example, a load step could be composed of put and copy operations.
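Purely as an illustration of the idea (this is not antFarm's actual workflow syntax), a flow with steps and their assigned operations could be modelled like this:

    # Illustrative model of a data processing flow: a sequence of steps,
    # each with assigned operations. The structure is an assumption for
    # the example, not antFarm's actual workflow format.
    FLOW = [
        {"step": "extract",  "operations": ["extract"]},      # source -> CSV buffer
        {"step": "truncate", "operations": ["truncate"]},     # clear the staging table
        {"step": "load",     "operations": ["put", "copy"]},  # buffer -> target database
        {"step": "process",  "operations": ["process"]},      # post-load SQL processing
    ]

    for step in FLOW:
        print(f"step '{step['step']}' -> operations: {', '.join(step['operations'])}")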


During the data processing flow, each step is assigned one or more workers. Workers, which we call ants, are tied to hardware resources: the more resources you have, the more workers can be activated, resulting in faster data processing.

Based on the metadata defined in the central repository, a separate queue of tasks is populated for each data processing flow step. Each task is then processed by a worker ant according to the workflow settings.
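The queue-and-worker pattern described above can be sketched in a few lines of Python; the task contents and worker count below are illustrative assumptions, not antFarm's internals:

    # Sketch of per-step task queues processed by a pool of worker "ants".
    # Task contents and the worker count are assumptions for the example.
    from concurrent.futures import ThreadPoolExecutor

    def run_task(task: str) -> str:
        """Stand-in for a worker ant executing one queued task."""
        return f"done: {task}"

    # One task queue per flow step, populated from the repository metadata.
    queues = {
        "extract": [f"extract ORDERS partition {p}" for p in range(4)],
        "load": ["put ORDERS", "copy ORDERS"],
    }

    # More hardware resources -> more worker ants -> faster processing.
    with ThreadPoolExecutor(max_workers=4) as ants:
        for step, tasks in queues.items():
            for result in ants.map(run_task, tasks):
                print(f"[{step}] {result}")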


Optimization and table partitioning

antFarm updates queued tasks with start and end times. The gathered data is available in predefined reports, one of which displays load execution times.

If you are not satisfied with the execution times, one option is table partitioning. antFarm then generates multiple tasks and extracted CSV files for a single table in order to achieve the best data load performance through parallel execution. Another option is, as mentioned, to scale the hardware.
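To illustrate how partitioning turns one table into several parallel extract tasks, here is a minimal sketch; the partition column, ranges and file names are assumptions for the example:

    # Sketch: splitting one table into range partitions so that several
    # extract tasks (and CSV files) can run in parallel. The column,
    # ranges and file naming are assumptions, not antFarm's internals.
    def partition_tasks(table: str, column: str, lo: int, hi: int, parts: int):
        """Yield one extract query and one CSV target per partition."""
        width = (hi - lo) // parts
        for i in range(parts):
            start = lo + i * width
            end = hi + 1 if i == parts - 1 else start + width
            yield (f"SELECT * FROM {table} WHERE {column} >= {start} AND {column} < {end}",
                   f"{table.lower()}_{i}.csv")

    for sql, csv_file in partition_tasks("ORDERS", "ORDER_ID", 0, 1_000_000, 4):
        print(csv_file, "<-", sql)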

Benefits and Key Features

Easy to use

The whole data movement process is defined with standard SQL syntax.

Bulk loads

Data is imported into the destination in batches.

Parallel execution

antFarm was developed for speed and efficiency.

Scalability

The more hardware resources you add, the faster execution you get.

Logging

Detailed, configurable logging and reporting are available out of the box.

Serial execution

Streaming for processing and synchronization.

Automation

Target tables, data types conversion and ETL queries are automatically generated.

Completely open solution

antFarm can be integrated into any data integration tool.

Any custom process

You can run any kind of SQL or Python processes (operations).

Out-of-the-box support

We’re constantly growing the list of supported data sources and target destinations. If you need to access data from a source which isn’t currently supported, please get in touch. antFarm can be easily extended.

Supported data sources

Supported data warehouse destinations

References

Retail, Energy

Petrol, a sustainable energy company, is one of the largest and best-known Slovenian companies. Petrol’s retail network comprises 500 modern service stations across south-eastern Europe.

Petrol uses antFarm to move data from various data sources to the Snowflake staging database, where it is further transformed by DataMerlin, the data warehouse automation platform.

Insurance

Triglav Group is the leading insurance and financial group in the Adria region and one of the leading groups in South-East Europe.

Triglav Group uses antFarm to extract data from their group companies, which run different operational systems, and to load it into the IBM Netezza staging area that serves as the appliance for their central data warehouse (IBM Insurance Information Warehouse).

Insurance

Modra zavarovalnica is one of the most important providers of supplementary pension insurance in Slovenia.

Modra zavarovalnica wanted a single solution to extract data from both on-premises and cloud data sources into the staging area of their corporate data warehouse.

Banking

Nova KBM is a universal bank with the longest banking tradition in Slovenia, spanning more than 150 years, and is owned by the international financial fund Apollo Global Management, L.L.C.

Nova KBM uses antFarm to synchronise their enterprise atomic data warehouse to IBM Netezza, providing an accelerated reporting database for all their ad-hoc users and for advanced risk and regulatory analysis.

Do you need more than a lift 'n' shift data migration solution?

Check out our data warehouse automation solution DataMerlin, which makes data warehouse implementation 10 times faster and the related TCO 10 times lower.