ETL or ESB – which tool should you choose for your next migration?

There have always been use cases where data must be transported from one system to one or more others. The data usually has to be transformed along the way and moved over various protocols. Sometimes we need real-time integration; in other cases, we need heavy data transformations and calculations.

There are two kinds of tools that solve these problems, each from its own perspective and with its own set of advantages and disadvantages: ETL (extract, transform, load) and ESB (enterprise service bus). A real-life analogy illustrates the differences between the two.

DOMINO’S PIZZA

So how does Domino’s work?
They prepare their dough, veggies, spices, etc. in their factories. They then deliver these materials to their restaurants once or twice a day using giant trucks. The restaurants usually have their own delivery bikes or cars to deliver orders in smaller chunks to the end customers who called in to order pizza.

The daily truck delivery is what ETL does: scheduled movement of large data volumes from source to destination. When moving data we must also cater to different protocols, integration methods, and data transformations, but the core operation of moving data in large, scheduled loads is the same.

The bike and car pizza delivery represents ESB: delivery of small chunks of data to the system that needs them, in near real time.

It would not make sense to use a large truck to deliver individual pizzas, or to use small cars or bikes to deliver bulk supplies once or twice per day. That is why it is important to pick the right tool for the job.
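The difference between the two delivery styles can be sketched in a few lines of Python. This is only an illustration of the analogy, not how any particular ETL or ESB product works; the function names, the `Record` alias, and the batch size are assumptions made up for the example.

```python
from typing import Callable, Iterable

Record = dict  # hypothetical unit of data, e.g. one row or one order


def run_batch_load(source: Iterable[Record],
                   write_many: Callable[[list], None],
                   batch_size: int = 1000) -> int:
    """ETL-style 'truck': move everything from the source in large batches."""
    batch = []
    moved = 0
    for record in source:
        batch.append(record)
        if len(batch) == batch_size:
            write_many(batch)
            moved += len(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        write_many(batch)
        moved += len(batch)
    return moved


def on_event(record: Record, write_one: Callable[[Record], None]) -> None:
    """ESB-style 'bike': forward a single record as soon as it appears."""
    write_one(record)


# The truck: 2500 orders moved in three scheduled loads.
warehouse = []
orders = [{"id": i} for i in range(2500)]
moved = run_batch_load(orders, warehouse.append, batch_size=1000)

# The bike: one order forwarded the moment it arrives.
delivered = []
on_event({"id": 42}, delivered.append)
```

In the batch case the destination sees a few large writes per run; in the event case it sees one small write per event, which is what makes near real-time delivery possible.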

Common functions both ETL and ESB provide

Regardless of your requirements and which tool is preferable for the job, there are common functions that both need to fulfill in order to be used effectively. When considering which tool to choose, make sure the available ones can fulfill these.

Function | Description

Orchestration | Composing several existing fine-grained components into a single higher-order composite service. This can be done to achieve appropriate “granularity” of services, to sequence data transfers, and to promote reuse and manageability of the underlying components.

Transformation | Transforming data between the specific formats required, for example between CSV, XML, and JSON.

Transportation | Negotiating between multiple transport protocols (such as HTTP, JMS, JDBC).

Mediation | Providing multiple interfaces to the same component to allow for multiple channels or for backwards compatibility.

Non-functional consistency | This includes security, error handling, monitoring policy, etc.
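As a small illustration of the transformation function, here is a minimal sketch that converts CSV text into JSON using only the Python standard library. The function name and the sample data are assumptions for the example; real integration tools offer this as a configurable component rather than hand-written code.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Transform CSV text (with a header row) into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)


sample = "id,name\n1,Ada\n2,Linus\n"
print(csv_to_json(sample))
# prints: [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Linus"}]
```

Note that every value stays a string, because CSV carries no type information. Converting `"1"` to a number is exactly the kind of mapping rule a transformation step has to define explicitly.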

Apart from these common functions, each tool also has specific traits and characteristics that make it the preferred choice in certain situations.

ETL | ESB

Scheduled or on-demand data transfer | Real-time data transfer

Operates on batches of operations | Operates at the level of a single business transaction

Cannot time out, delay, or issue transactions to front-office applications during transformation processes | Can time and delay data in queues, escalating information to the right receiver for that piece of content

Can transfer historical records | Deals only with current records

 

The simplest way to make a decision on which tool to use can be summarized in the diagram below.

 

Representative tools in each category 

In some cases, though, the decision is not as simple as the infographic suggests. That is why I am going to dive deeper into the available tools and show their strengths and weaknesses.

Both the ETL and ESB categories have a large number of competing tools, but I am going to focus on one tool in each, describing their features and the ways they can compete with and complement one another.

 

ETL – Talend

Talend is an open-source ETL data integration tool which anyone can download and use. No license is needed. It runs on Java and has a large number of pre-built components which can connect to SAP, Oracle, Salesforce, XML files, and others. The data can then be processed and transformed using mapping, filtering, sorting, and duplicate removal, and users can also build their own custom components in Java. It also has an App Store where users have contributed components.

Talend projects are split into jobs. Each job can consist of one or more sub-jobs, which are built from components connected to represent the data flowing from one end to the other. The most common sub-job consists of three component types: a data reader component, data mapping and transformation components, and a data writer component. There are also helper components for error handling, logging, proxy connections, and so on.

One thing Talend does not support is real-time integration. It is purely an ETL tool, so it cannot react to events in the source system. On the other hand, it can handle very large volumes of data.
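The reader, mapping, and writer structure of a typical sub-job can be sketched as three chained steps. This is a plain Python illustration of the pattern, not Talend's API and not the Java code Talend generates; the function names, column names, and sample data are all assumptions for the example.

```python
import csv
import io
from typing import Iterable, Iterator


def read_rows(csv_text: str) -> Iterator[dict]:
    """Reader component: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(csv_text))


def map_rows(rows: Iterable[dict]) -> Iterator[dict]:
    """Mapping and transformation component: clean, filter, de-duplicate."""
    seen = set()
    for row in rows:
        email = row["email"].strip().lower()
        if not email or email in seen:  # drop empty and duplicate emails
            continue
        seen.add(email)
        yield {"full_name": row["name"].strip(), "email": email}


def write_rows(rows: Iterable[dict], target: list) -> int:
    """Writer component: append rows to the target and return the row count."""
    count = 0
    for row in rows:
        target.append(row)
        count += 1
    return count


source = "name,email\nAda , ADA@example.com\nAda,ada@example.com\nLinus,\n"
target = []
written = write_rows(map_rows(read_rows(source)), target)
# target now holds a single cleaned row for ada@example.com
```

Because each step consumes the previous one lazily, rows stream through the pipeline one at a time, which is one common way to keep memory use flat when the volume is large.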

Image 1. Talend UI

 

Main features

Talend is an open-source data integration tool (the full suite also covers ESB, MDM, BPM, and DQ).

It uses a code-generating approach, with a GUI built on Eclipse RCP that is intuitive to use.

It has a very large community and more than 800 connectors (the biggest connector library).

It has the biggest ETL community, with many finance companies and investors supporting it.

It generates Java code which you later run on your server, deploying it manually or automatically.

It has data quality features, available from its own GUI or through custom SQL queries and Java.

Jobs can run remotely or locally, and can be used independently as executable Java JARs.

It has on-premises and cloud versions.

It is mature and up to date on big data technologies (e.g. Spark, Hive, AWS).

It is fairly priced, with a subscription model that is independent of your project size.

 

ESB – Mulesoft

Mulesoft supports integration between multiple systems by creating flows of data. A flow starts when an event is observed in the source system. The event can be a new record in the database, a new user, or some other action. The event is processed by reader components and forwarded to processing components, which can map, filter, or otherwise transform the data, and in the end forward it to a writer component.

The data flows between the systems in messages. Each Mule message has the same structure, shown below.

Image 2. Mulesoft message structure
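The event-driven flow described above can be sketched as a message passing through a chain of processors. This is a minimal sketch, not Mule's API: it assumes only that a message carries a payload plus some metadata about it, and the `Message` fields, the event names, and the processor functions are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Message:
    payload: Any                                     # the data being moved
    attributes: dict = field(default_factory=dict)   # metadata about the event


# A processor returns the (possibly new) message, or None to drop it.
Processor = Callable[[Message], Optional[Message]]


def run_flow(message: Message, processors: list) -> Optional[Message]:
    """Pass one message through each processor in order."""
    for process in processors:
        result = process(message)
        if result is None:  # a filter rejected the message
            return None
        message = result
    return message


def only_new_customers(msg: Message) -> Optional[Message]:
    """Filter: let through only 'customer.created' events."""
    return msg if msg.attributes.get("event") == "customer.created" else None


def to_target_format(msg: Message) -> Message:
    """Mapping: reshape the payload for the target system."""
    return Message({"Name": msg.payload["name"].title()}, msg.attributes)


outbox = []


def write_to_target(msg: Message) -> Message:
    """Writer: hand the payload to the target system (here, a list)."""
    outbox.append(msg.payload)
    return msg


flow = [only_new_customers, to_target_format, write_to_target]
run_flow(Message({"name": "ada lovelace"}, {"event": "customer.created"}), flow)
run_flow(Message({"name": "ignored"}, {"event": "order.shipped"}), flow)
# outbox now contains only the reshaped customer record
```

Each call to `run_flow` handles exactly one event, which mirrors the single-business-transaction level at which an ESB operates.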

Mulesoft also supports data transformation and can handle various protocols. Additional features that make Mulesoft stand out from other systems are the abilities to create and host ESB Services and integrate with continuous build using Maven. The fact that Salesforce bought the company also means that there might be new features added that could improve it even further.

Image 3. Mulesoft UI

 

Main features

Lightweight Java-based ESB platform

Enables easy integration of systems, regardless of the different technologies the applications use

Loosely coupled, highly scalable, and robust

Provides a large library of components, which enables significant component reuse, and has a component store

Supports creation and hosting of reusable services using an ESB service container

Has an open-source (Community Edition) license and an enterprise edition license

Can respond to events such as a consumer request from a mobile device, a data change in a database, or the creation of a new customer ID in a SaaS application

Can integrate with Maven for continuous build processes

Has a graphical UI to design flows

Supported and developed by Salesforce

 

Future direction

In the past, the lines between these two middleware functions were quite clear. Organizations would simply have one tool for application integration and another for bulk data loading. This can be frustrating and confusing, however, because ETL and ESB share so many of the same basic data movement requirements, namely the ability to deliver data through a structured process. In the data world we are not talking about trucks and bicycles with physical limitations; we are talking about data and software, so in the back of everybody’s mind we have always known that these two functions could reside in the same place.

As you can see, Mulesoft and Talend, as representatives of their groups, share many features. Mulesoft and other modern ESB tools have started to catch up with Talend and other ETL tools, and they are now being built to fulfill the same requirements as well.

The market is approaching the point of delivering true multi-latency and multi-volume integration with tools that can cover both functions. Organizations should not rush to replace their current environments just yet, but when defining a new project, latency flexibility and transformation flexibility in the same platform should be a priority.

Author: Vedran Pavlic, Senior Salesforce Technical Architect at PolSource