I like patterns. One of the first patterns I learned about was MVC, or Model-View-Controller. For those who don’t know, very roughly: it separates storage logic from presentation logic. When you use it, you have inevitably separate storage and business logic.
SQL does all at once. Oh, boy.
Say, a colleague wants to have a new version of a report. Since you don’t want to go directly to the sources anymore but on your centralized Data Warehouse, you need to re-implement the report logic.
Of course, he has the old version „you can take a look at to get the idea“. Obviously, to understand what is happening here, you need to analyze the SQL and the data model. If no model exists, you depend on your understanding of this said SQL.
Now imagine this. I did, it is just an example. It kinda orients itself on Oracle SQL.
select id, name as FullName, number, state as oldstate, limit, previ, upfront from (select d.id, d.name, decode(s.number,1,'Open',2,'Ready',3,'Finished'), s.state, f.limit, z.previ, z.upfront from department d, sales s, financial f, (select max(g.amount) from gross g where d.id=g.d_id) z where d.id = g.d_id and s.id = g.id (+) and (case(f.amount > 100, f.link_id = g.id, f.link_id = d.id)) )
First a short analysis:
- 1 outer select
- 1 inner select
- 1 inline select
- implicit joins
- case dependend joins
When looking at this statement, the following questions come up in my mind:
- Those joins, which ones do represent the data model?
- Which ones are joins which represent business rules?
- Why is there an (costly) outer join?
Which brings me directly to my point: Unlike the mentioned MVC pattern it is not easily possible to see which logic is neccessary to connect the tables from the data model. Even if a data model is present, no easy separation between joins which are to join the model and joins which are to represent some kind of business logic is possible.
I mean, if I want to represent a report in a Data Warehouse, I need to be able to separate the loading layer from the business logic layer. This also enables me to minimize complexity. To implement faster. To prototype faster. You name it.
SQL is no big help for analysis – for getting data it is obviously great because it gets the job done – because you can do it all at one place:
- Joining tables because the model says so
- Joining tables because the colleague / report says so
- Making sub-selects on outer joins to represent some kind of needed connections
- Using the magical decode to create textual representations of several states (hopefully the textual description never changes with all 136 scripts then…)
- Letting you create subselects in unknown depth to mankind
If you have a data model with in-depth descriptions – great. I think you won’t be held up as long as everyone else without it. Also, if you are already using layers in the SQL architecture – also great. Views can help a great deal. Still, sometimes the old stuff has to be analyzed and manually separated – and well documented afterwards.
But make no mistake: It is a great feeling when all this spaghetti code is being separated into different layers in your Data Warehouse. After a lot of analysis work. One model which does this great is in my opinion the Data Vault 2.0 model.
I am a big fan of KISS and separating load logic from business logic. This keeps down maintenance time and system complexity. But for someone who is very, very good in SQL, this can be a very different view of the world.
What do you think?