In the current past with the boost of tools like Tableau, QlikView or PowerBI, I see a lot of questions about Talend ETL on StackExchange and how to use it to map to dimensions and fact tables. Is that really feasible?
Difference in layers
First of all, if you have worked with Tableau, PowerBI or QlikView before, chances are good you are used to this very straightforward approach:
- Get access to data
- Load data
- Connect data in visual designer on-the-fly
- Use this to present the data (in charts, reports, etc.)
Meaning, you process the data as you go along.
This is simply not possible in Talend (Data Integration, Community Edition, and so on) and not the focus of the tool. Talend provides very strong and flexible components to extract, transform and load (say: ETL) data into other databases, tables or files.
But this always means that when working with Talend you will materialize data. Meaning: With the Self-Service tools mentioned above you will always dive directly into the data. But when using Talend (or Pentaho Kettle, Microsoft SSIS etc.), you always have to store the data first and then use a presentation tool “above” this data (for charts, reports, etc.).
This is not a part of Talend ETL. It is not even the idea behind it. Talend is very strong for data preparation and integration.
If you use Talend trying to achieve the same results like Tableau etc., this won’t be possible.
There is a new product on the market called Talend Data Preparation, which can handle data transformation on the fly. This might be closer to the use case to be able to inspect data on-the-fly than the more traditional way involving some kind of storage with Talend ETL. This might be a lot more like the other presentation layer software solutions on the market.
But to “view” a star schema with Talend, that is not possible. Creating all the tables (fact tables, dimension tables) for it is exactly what this tool is about, though.
I am not involved in any way with Talend, just an ordinary user.