antFarm is composed of two main components:
An execution engine installed in a local environment. This is where the ant colony lives and works to process data as fast as possible.
A central metadata repository created on the target destination. This is what we call the Queen Ant's residence, where she finds everything she needs to manage the worker ants successfully.
High-level architecture
antFarm supports different target destinations
For larger enterprises or groups of companies, it is possible to run distributed execution engines with a single central repository.
Scalable parallel execution
How does antFarm work?
After you have installed antFarm, the first step is to establish connections to your data sources and to the target destination. antFarm uses named connection strings to databases and filesystems.
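To illustrate the idea of named connections, here is a minimal Python sketch in which the rest of the configuration refers to a connection only by its name. The registry, the names (`src_erp`, `landing_fs`, `target_dwh`) and the URL-style connection strings are hypothetical assumptions, not antFarm's actual configuration format.

```python
from urllib.parse import urlparse

# Hypothetical registry of named connections (databases and filesystems).
# antFarm's real configuration format may look quite different.
CONNECTIONS = {
    "src_erp": "oracle://etl_user@erp-db:1521/ERP",
    "landing_fs": "file:///data/antfarm/landing",
    "target_dwh": "snowflake://loader@myaccount/STAGING",
}

def resolve(name: str) -> dict:
    """Look up a named connection and split its string into parts."""
    try:
        url = urlparse(CONNECTIONS[name])
    except KeyError:
        raise KeyError(f"unknown connection name: {name!r}") from None
    # hostname is None for connections without a host (e.g. local files)
    return {"kind": url.scheme, "host": url.hostname, "path": url.path}

print(resolve("src_erp"))
```

Keeping each connection string in one place means a step definition only needs the name, so pointing a flow at a different source or target is a one-line change.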
To define the data load execution logic, you first need to configure the data processing flow. A data processing flow is composed of steps such as extract, truncate, load and process. You can define as many steps as you like.
In addition, a parametrized workflow standardizes the process and dictates how the data processing flow behaves.
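A data processing flow can be pictured as an ordered list of steps that every table passes through. The following Python sketch models that idea; the `Flow` class and its `add_step` and `plan` methods are hypothetical illustrations, not part of antFarm.

```python
from dataclasses import dataclass, field

@dataclass
class Flow:
    """Hypothetical data processing flow: an ordered list of step names."""
    name: str
    steps: list[str] = field(default_factory=list)

    def add_step(self, step: str) -> "Flow":
        # Any number of steps is allowed, but each name must be unique.
        if step in self.steps:
            raise ValueError(f"step {step!r} already defined in flow {self.name!r}")
        self.steps.append(step)
        return self  # allow chaining

    def plan(self, tables: list[str]) -> list[tuple[str, str]]:
        """Each table passes through every step, in the order the steps were defined."""
        return [(table, step) for table in tables for step in self.steps]

flow = Flow("nightly").add_step("extract").add_step("truncate").add_step("load").add_step("process")
print(flow.plan(["customers"]))
```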
Within the workflow you assign various operations to each step. In general, there are two types of operations:
For example, a load step could be composed of put and copy operations.
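The parametrized workflow can be sketched as a mapping from each step to a list of templated operations. In this hypothetical Python example, the `load` step consists of a put and a copy operation; the statement texts mimic Snowflake's `PUT` and `COPY INTO` commands and, like the `WORKFLOW` dictionary and the `operations_for` function, are illustrative assumptions.

```python
from string import Template

# Hypothetical workflow: each step maps to an ordered list of parametrized
# operations. Statement texts imitate Snowflake syntax for illustration only.
WORKFLOW = {
    "truncate": [Template("TRUNCATE TABLE ${schema}.${table}")],
    "load": [
        Template("PUT file://${path}/${table}.csv @${stage}"),
        Template("COPY INTO ${schema}.${table} FROM @${stage}/${table}.csv FILE_FORMAT = (TYPE = CSV)"),
    ],
}

def operations_for(step: str, **params: str) -> list[str]:
    """Render every operation assigned to a step with the workflow parameters."""
    # substitute() raises KeyError if a required parameter is missing
    return [op.substitute(params) for op in WORKFLOW[step]]

for sql in operations_for("load", schema="STG", table="CUSTOMERS",
                          path="/data/landing", stage="antfarm_stage"):
    print(sql)
```

Because the parameters are filled in at run time, the same step definition serves every table in the flow.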
During the data processing flow, each step is assigned one or more workers. Workers, which we call ants, are bound to hardware resources: the more resources you have, the more workers can be activated, and the faster the data is processed.
Based on the metadata defined in the central repository for each data processing flow step, separate queues are populated with tasks. Each task is then processed by a worker ant according to the workflow settings.
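The queue-and-worker model described above can be approximated with Python's standard `queue` and `threading` modules: each step gets its own queue of tasks, and a configurable number of ants (threads) drain it in parallel. This is a simplified sketch; `run_step`, the per-step worker counts and tying the extract workers to `os.cpu_count()` are hypothetical choices and do not reflect antFarm's internal implementation.

```python
import os
import queue
import threading

def run_step(step: str, tasks: list[str], workers: int) -> list[tuple[str, str]]:
    """Fill one queue for a step and let `workers` ants drain it concurrently."""
    q: "queue.Queue[str]" = queue.Queue()
    for task in tasks:
        q.put(task)  # all tasks are queued before any ant starts
    done: list[tuple[str, str]] = []
    lock = threading.Lock()

    def ant(ant_id: int) -> None:
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return  # queue drained: this ant stops
            # The step's operations (e.g. an extract query) would run here.
            with lock:
                done.append((f"{step}-ant{ant_id}", task))

    threads = [threading.Thread(target=ant, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

# Hypothetical sizing: extract gets one ant per CPU core, load a fixed two.
STEP_WORKERS = {"extract": os.cpu_count() or 1, "load": 2}
TABLES = ["customers", "orders", "invoices", "products", "suppliers"]

for step, n in STEP_WORKERS.items():
    results = run_step(step, TABLES, n)
    ants_used = len({ant for ant, _ in results})
    print(f"{step}: {len(results)} tasks processed by {ants_used} of {n} ants")
```

Threads suit this kind of work because the ants mostly wait on databases and networks. Since the queue is filled before the ants start, an ant can simply stop when the queue is empty.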
antFarm updates queued tasks with their start and end times. The gathered data is available in predefined reports, one of which displays load execution times.
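Because every queued task carries a start and an end time, a load execution time report boils down to computing durations and sorting them. The `TaskRun` record, the `slowest_loads` function and the sample timings below are hypothetical, chosen only to illustrate the calculation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TaskRun:
    """Hypothetical queued-task record with its start and end times."""
    table: str
    step: str
    started: datetime
    ended: datetime

    @property
    def duration(self) -> timedelta:
        return self.ended - self.started

def slowest_loads(runs: list[TaskRun], top: int = 3) -> list[tuple[str, float]]:
    """Load execution time report: the slowest load tasks, in seconds."""
    loads = [r for r in runs if r.step == "load"]
    loads.sort(key=lambda r: r.duration, reverse=True)
    return [(r.table, r.duration.total_seconds()) for r in loads[:top]]

# Illustrative sample data, not real measurements.
t0 = datetime(2024, 1, 1, 2, 0, 0)
runs = [
    TaskRun("customers", "load", t0, t0 + timedelta(seconds=42)),
    TaskRun("orders", "load", t0, t0 + timedelta(minutes=7)),
    TaskRun("orders", "extract", t0, t0 + timedelta(minutes=3)),
    TaskRun("products", "load", t0, t0 + timedelta(seconds=5)),
]
print(slowest_loads(runs, top=2))  # → [('orders', 420.0), ('customers', 42.0)]
```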
If you are not satisfied with the execution times, one option is table partitioning. antFarm then generates multiple tasks and multiple extracted CSV files for a single table, so that they can be processed in parallel for the best data load performance.
Another option, as mentioned above, is to scale up the hardware.
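One common way to partition a table is to split a numeric key into contiguous ranges, producing one extract query and one CSV file per range. The Python sketch below assumes such range partitioning on an integer key; the function name, query shape and file naming scheme are hypothetical, and antFarm's own partitioning options may differ.

```python
def partition_tasks(table: str, key: str, lo: int, hi: int, parts: int) -> list[tuple[str, str]]:
    """Split the key range [lo, hi] into at most `parts` contiguous ranges.

    Returns one (extract query, CSV file name) pair per partition; each pair
    is an independent task that a separate ant can process in parallel.
    """
    if parts < 1 or hi < lo:
        raise ValueError("need parts >= 1 and hi >= lo")
    size = (hi - lo + 1 + parts - 1) // parts  # ceiling division so every key is covered
    tasks = []
    for i in range(parts):
        start = lo + i * size
        if start > hi:
            break  # fewer keys than requested partitions
        end = min(start + size - 1, hi)
        query = f"SELECT * FROM {table} WHERE {key} BETWEEN {start} AND {end}"
        tasks.append((query, f"{table}_part{i + 1:03d}.csv"))
    return tasks

for query, csv_file in partition_tasks("orders", "order_id", 1, 1_000_000, 4):
    print(csv_file, "<-", query)
```

Each returned pair is an independent task, so four partitions can be extracted and loaded by four ants at once instead of one.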
IN A SINGLE DAY YOU CAN
Lift 'n' shift data from 100+ on-premise tables to the target destination.
Take remote training and gain knowledge of how antFarm works.
Install and configure antFarm.
Benefits and Key Features
Easy to use
All data movement processing is defined using standard SQL syntax.
Data is imported into the destination in batches.
antFarm was developed for speed and efficiency.
The more hardware resources you add, the faster the execution.
Detailed, configurable logging and reporting are available out of the box.
Streaming for processing and synchronization.
Target tables, data type conversions and ETL queries are generated automatically.
Completely open solution
antFarm can be integrated into any data integration tool.
Any custom process
You can run any kind of SQL or Python process (operation).
Out-of-the-box support
We’re constantly growing the list of supported data sources and target destinations. If you need to access data from a source which isn’t currently supported, please get in touch. antFarm can be easily extended.
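The automatic generation of target tables and data type conversions listed among the key features can be illustrated with a small type-mapping sketch. The `TYPE_MAP` below (Oracle-style source types to Snowflake-style target types) and the `target_ddl` function are assumptions for illustration only; antFarm's real mappings are not documented here.

```python
# Hypothetical mapping from source (Oracle-style) to target (Snowflake-style) types.
TYPE_MAP = {
    "VARCHAR2": "VARCHAR",
    "NUMBER": "NUMBER",
    "DATE": "TIMESTAMP_NTZ",  # Oracle DATE also stores a time of day
    "CLOB": "VARCHAR",
}

def target_ddl(table: str, columns: list[tuple[str, str]]) -> str:
    """Generate a CREATE TABLE statement for the target from source column metadata."""
    cols = []
    for name, src_type in columns:
        # Split e.g. "VARCHAR2(100)" into "VARCHAR2", "(", "100)" to keep the length.
        base, paren, rest = src_type.partition("(")
        target = TYPE_MAP.get(base.strip().upper())
        if target is None:
            raise ValueError(f"no mapping for source type {src_type!r}")
        cols.append(f"    {name} {target}{paren}{rest}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n)"

print(target_ddl("STG.CUSTOMERS", [("ID", "NUMBER(10)"), ("NAME", "VARCHAR2(100)"), ("CREATED", "DATE")]))
```

Unmapped types raise an error instead of being guessed, which is where a real tool would need an extensible mapping per source and target.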
data warehouse destinations
Implementing the extract/staging area of a data warehouse
Replicating data from a legacy data warehouse to the cloud to reduce on-premises infrastructure costs and benefit from the strong performance of complex analytical queries
Replicating data from operational systems to analyse it in near real time without degrading the performance of the transactional application
Updating a development environment with data from a testing or production environment
Importing data from CSV files into a database
Using it as a backbone for Snowflake POC projects
Petrol uses antFarm to move data from various data sources to the Snowflake staging database, where it is further transformed by DataMerlin, the data warehouse automation platform.
Triglav Group uses antFarm to extract data from their group companies, which use different operational systems, and to load it into the IBM Netezza staging area that serves as the appliance for their central data warehouse (IBM Insurance Information Warehouse).
Modra Zavarovalnica wanted to use the same solution to extract data from on-premises and cloud data sources and load it into the staging area of their corporate data warehouse.
Nova KBM uses antFarm to synchronise their enterprise atomic data warehouse with IBM Netezza, providing an accelerated reporting database for all their ad-hoc users and supporting advanced analysis for risk and regulatory reporting.