One thing that catches my attention again and again is date handling in ETL jobs, especially in Talend when reading file input. It seems to me that the concepts of date fields and date formatting are not always clearly understood. This post tries to clear things up.
Have you ever worked with an internal or external customer who said they just need this simple Excel sheet turned into a report? When you investigated the process, did you find that it contained many undiscovered steps? And did you tell them that the implementation might take a few days, which in turn didn’t make them happy? Could this be avoided?
It could have been so easy. I used Eclipse for a long time, back when I was still developing Java applications. Now I wanted to set up an IDE for Python and, to challenge myself, try a different one, namely Eric. Oh boy, this was a bit more complicated. It actually took me a few hours to figure out.
A few tutorials I tried used a very specific dataset: the NYC 311 calls dataset. 311 calls in NYC are a kind of support call meant to ease the call volume on the emergency number 911.
A lot of people call this number over time. The city of New York provides this anonymized call data via its web page. I tried a few things with it and liked the idea of working with “real” data. Also, the dataset covering 2010 to 2017 is about 9.5 GB in size, enough to play around with. In fact, I had to take a smaller one covering only the year 2015, which is still about 1.5 GB in size. All of this is provided as CSV files.
So I wanted to find out if the city I currently live in, Hamburg, has some kind of data to play around with as well.
Technologies and definitions in the Business Intelligence field are changing. Data Warehouses are traditionally the way to generate data for evaluations and reports. But is this changing, and if so, how?
I like patterns. One of the first patterns I learned about was MVC, or Model-View-Controller. For those who don’t know it, very roughly: it separates storage logic from presentation logic. When you use it, you inevitably end up with separate storage and business logic.
SQL does it all at once. Oh, boy.
You thought it would be hard to get through a Business Intelligence project in an organization. Maybe it is easier than you thought… when you change focus!
Recently I have seen a lot of questions about Talend ETL and how to use it to map data to dimension and fact tables. Is that really feasible?
To implement Data Vault and its modelling technique, a database is required. Can MySQL cut it? Which settings are important?
One question I read and talk about a lot is whether it is necessary at all to use a schema with data. So, is it important to model?
Ever wondered what Business Intelligence is? Is it important for your company? Find out here.