BB0011 – Business Central Meta5 WTDD


Hello and welcome to our latest blog post here on BIDA Brains.

I am pleased you have come back to see our latest blog post.

Thank you.

This blog post is technical.

It is really only intended for advanced Business Central developers.

The Meta5 software has a feature called the Workstation Tools Data Dictionary (WTDD).

This dictionary is used to support the ability of Meta5 to generate SQL.

It is not intended to be used to query large operational systems.

It is intended to be used to query dimensional data warehouses.

We have upgraded the WTDD we created for Business Central 21 to Business Central 23.

We have brought across the more than 5,000 joins we defined for Navision.

You can download an Excel workbook with the joins in it from the button below.

Download Workbook

This workbook will be handy for anyone trying to write SQL to query the tables in Business Central.

We have downloaded the joins in both directions, so there are more than 10,000 rows in the joins worksheet.

This means that every join is listed once for each of its two tables in the first table column of the workbook.

You can see which tables a table is joined to just by filtering on the first table column in the workbook.

Of course, actually inside the WTDD the join is only defined once because the join works in both directions.
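To make the idea concrete, here is a minimal sketch of how a single-row-per-join list can be expanded into both directions. The wtdd_joins table and its column names are hypothetical stand-ins, not the actual WTDD export format.

-- Hypothetical joins table; each join is stored only once.
select table_a as first_table, column_a as first_column,
       table_b as second_table, column_b as second_column
from   wtdd_joins
union all
-- The same joins flipped, so every join also appears from its other table.
select table_b, column_b, table_a, column_a
from   wtdd_joins
order by first_table, second_table;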

We have included a workbook to list the tables that have not had joins defined yet.

We will be working on these definitions in coming months.

If you want to document joins between tables and send them to us to add to this workbook?

Your time and effort would be much appreciated.

We hope that each WTDD will be released publicly so that the Business Central documentation and joins are more easily available.

Even if Meta5 can not be used to properly query Business Central directly, the query tool is a great way to write SQL and then update that SQL by hand.

That is what we are using this for in our work.

It has been very effective for this.

With that?

Thank you very much for your time and attention.

We really appreciate you dropping by to read our blogs.

Best Regards.

Mihai Neacsu.

Business Development Manager.

The BIDA Team.


BB0010 – Business Central Beginning Of Time GL Account Balances


Hello and welcome to our latest blog post here on BIDA Brains.

I am pleased you have come back to see our latest blog post.

Thank you.

This blog post is very technical.

It is really only intended for advanced Business Central developers.

We will have a blog post on our main site for Business Users.

In this post we want to show how to do something quite complex as a direct query into Business Central.

Basically, this is an example to prove that direct query can deliver reports at quite a complicated level.

As we have emphasized, direct query against Business Central is not a replacement for a data warehouse.

But with SSDs and columnstore indexes, there is a lot more that can be done using direct query to Business Central than ever could have been done just five years ago.

So, without further ado.

You can click the button below to download the actual SQL and a demo spreadsheet for the BOT report.

Download Demo

For those of you who want to be convinced to press the button?

Here are a couple of screen shots of the spreadsheet that you can download.

In this first screen shot you can see that the GL Accounts are listed down the left hand side.

You can see that when there are accounts with GL entries you can see both the BOT amount and the actual amount by year.

You can see the plus sign next to the year so you know you can drill down to the month.

On the following image we want to draw your attention to the Operating Equipment Total line.

We have drilled down to the months in 2024.

You can see that the opening amount is $220,608 in 2023. Of course, you can check the demo data and you will see that these transactions are posted in late December.

Then, in January 2024 you have the opening balance of $220,608 and debits of $55.

Then you can see the opening balance for February is $220,553 and so on.

You can see that for each document type, for each month, the opening balance is the opening balance of the previous month plus the transactions for the previous month.

This is how “beginning of time” GL Balances work.
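For readers who prefer to see the idea in plain SQL, here is a minimal sketch of that calculation using a window function. The table and column names are hypothetical stand-ins for the renamed G/L Entry table; the downloadable demo SQL is the authoritative version.

-- Minimal sketch only; not the SQL shipped in the demo download.
with monthly as
(
    select  gl_account_no,
            document_type,
            datefromparts(year(posting_date), month(posting_date), 1) as posting_month,
            sum(amount) as month_amount
    from    gl_entry
    group by gl_account_no,
             document_type,
             datefromparts(year(posting_date), month(posting_date), 1)
)
select  gl_account_no,
        document_type,
        posting_month,
        month_amount,
        -- Everything posted before this month, i.e. the beginning of time opening balance.
        coalesce(sum(month_amount) over
                 (partition by gl_account_no, document_type
                  order by posting_month
                  rows between unbounded preceding and 1 preceding), 0) as opening_balance_bot
from    monthly
order by gl_account_no, document_type, posting_month;

The opening balance for each month is simply the sum of everything posted before that month, which is exactly the rule described above.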

As we have said, these are very popular among accountants in finance departments of companies.

It gives them the ability to immediately see the relative impact / size of the monthly change versus the full lifetime balance for the account by document type.

The capsule that creates this spreadsheet is as follows.

We will just go through what is in this demonstration capsule as well.

The first line fetches the SQL from the hosting master database.

One statement is the SQL for the query against the GL Entries.

The other is the SQL for the query to get the GL Account table itself.

The second line runs the SQL to fetch the GL Entries producing the temporary vf_gl_bot_balances table and sending it into a spreadsheet.

The third line fetches the vm_month_keys so that the posting date can be linked to these keys.

The join icons put the integer keys from the lookup tables onto the data stream for the vf_gl_bot_balances.

The fourth line fetches the vm_gl_document_type_keys to be able to join the gl document types to the vf_gl_bot_balances data stream.
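Expressed as SQL, the key lookups the join icons perform are equivalent to something like this sketch. The key and date column names here are hypothetical; in the capsule these joins happen on the data stream inside Meta5, not in the database.

-- Minimal sketch of what the join icons do to the vf_gl_bot_balances data stream.
select  b.*,
        m.pk_vm_month            as dk_vm_month,
        d.pk_vm_gl_document_type as dk_vm_gl_document_type
from    vf_gl_bot_balances b
left join vm_month_keys            m on m.calendar_date = b.posting_date
left join vm_gl_document_type_keys d on d.document_type = b.document_type;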

The fifth line runs SQL002 which fetches the GL Account table and puts it into vm_gl_account as a dimension table.

You don’t need to worry about the column select, remove header, and compress icons on the fifth line.

You can just take it to be a small amount of data manipulation.

On the sixth line you can see we fetch vm_month from the host master database.

And on the seventh and last line you can see we fetch the vm_gl_document_type rows from the host master database.

You, as a customer, would have the choice of what other data you put into your own hosting master database.

Each customer would get one of these databases for themselves to set up their installation.

So adding some tables for small amounts of descriptive information would also make sense.

To the bottom right you can see that all the data that is processed is collected into the collect data spreadsheet.

From there it is sent into Excel for the purpose of this demonstration.

To give you an idea we have also copied the capsule run log below.

You can see that it took about 70 seconds total to run this capsule.

This is without the refresh of the data from the workbooks to the Excel PowerPivot Model.

You can see that it took about 16 seconds to run the main SQL query to get the GL Bot Balance rows.

That is running on a VM that has a copy of the Business Central 23 demonstration database on it.

Obviously that would take longer running on your production Business Central system.

This concludes this very technical blog post.

We really hope you liked this because it was a lot of work to develop the sample code for the BOT balances.

You are welcome to download the Business Central renaming script from our main site and run this SQL on your own Business Central system.

We would be very interested to see if the boundary at 36 months works properly.

You can now see and understand that any SQL CTE query could be run and the data returned to Excel.

The limitations are the ability of SQL Server to process the query and the ability to send the data into Excel.

You can see from this example that it is possible to create a very wide range of reports in Excel, or PowerBI, using this direct query technique.

If you are a Business Central on-premises installed account and you have a report you want that no one has been able to write so far?

Please feel free to contact us and tell us what you want.

We would be interested in developing such reports for free as long as we can resell them to other Business Central Installed accounts.

With that?

We hope you liked this example.

Thank you very much for your time and attention.

We really appreciate you dropping by to read our blogs.

Best Regards.

Mihai Neacsu.

Business Development Manager.

The BIDA Team.


BB0009 – Sending Emails Using O365


Hello and welcome to our latest blog post here on BIDA Brains.

I am pleased you have come back to see our latest blog post.

Thank you.

This is one for the “techies” in our audience. It will only be of great interest to people who are responsible for running schedules of ETL processing for their companies or customers.

One of the things that has to happen is that the ETL support “techies” have to be informed when the ETL fails so they can go and fix it in the middle of the night while the business users are fast asleep in their nice warm beds!

The standard way of informing an ETL support “techie” that a batch has failed, or has completed, is by email.

Of course, BIDA has the option of sending an email on failure.

However, this requires that the email message be put on every line of the schedule.

So in practice what we do is put emails into long schedules at important sync points and put an email at the end of the schedule.

Then the absence of the email is the prompt to go and check the schedule.

Of course, we could have error emails being sent as well, we just don’t really do that in practice.

With the advent of Office 365 some of our customers wanted to move the email sending over to Office 365.

This is a little different from a standard SMTP email server, and so we had to work out how to make it work.

Since we had to work out how to make it work we thought it was worthy of a blog post so others could benefit from our piece of work.

Our scheduler is very reliable.

We have had customers run more than 6 months without an ETL failure.

With Office 365 the first thing you need to do is convert the email password into an encrypted text string and place it in a file.

This string will then be used for authentication with the O365 smtp mail server.

The way you do this is to create a PowerShell script. So let’s call it:

run-this-as-a-power-shell-command.ps1

In this PowerShell script you put an entry something like this.

You do not need your email address here; you will be prompted for your password, and it will be encrypted but not checked by this process.

Obviously the “Out-File” path is where the encrypted password will be written:

Read-Host -Prompt "Enter your email password" -AsSecureString | ConvertFrom-SecureString | Out-File "M:\A_BIDA_Hosting\CUSTOMER\S01-Schedule\m-neacsu-email-password.txt"

Having put that command into the .ps1 file, you just right-click it and run it with PowerShell.

The PowerShell window will open with the

"Enter your email password"

prompt. You type in your password and it will write the encrypted password to that file.

The contents of that file will be a very long string. The first characters will look something like this.

"01000000d08c9ddf0115d1118c7a00c04fc297eb010000009b79e308860dc44fbcbdb8f4e16ba3490000000002000000000003660000c0000000100000..."

This is what an encoded password looks like for Office 365.

Now you want to create the PowerShell script that will actually send your email.

You can have one such script for each different email you want to send.

As I said, for long-running ETL we usually send emails at important sync points so we know that progress is being made. And then we have one email at the end.

So next you create a PowerShell script that will be the actual command executed at run time in the scheduler. For example you could call it:

run-m01-send-end-etl-email.ps1

And the contents would be something like this:

# Account used to authenticate against the Office 365 SMTP server.
$AdminName = "mihai@somedomainname.com"
# Read the encrypted password created earlier and convert it back to a SecureString.
$Pass = Get-Content "M:\A_BIDA_Hosting\CUSTOMER\S01-Schedule\m-neacsu-email-password.txt" | ConvertTo-SecureString
# Build the credential object from the account name and the SecureString password.
$cred = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $AdminName, $Pass
# Send the notification email through the Office 365 SMTP server over SSL.
Send-MailMessage -From mihai@somedomainname.com -To mihai@somedomainname.com -Subject "BIDAWeekDayBatch001 - Batch Completed" -Body "BIDAWeekDayBatch001 - Batch Completed" -SmtpServer smtp.office365.com -Credential $cred -UseSsl -Port 25

You usually store this script in the same directory as the password file and the other PowerShell scripts.

Now, in order to run that PowerShell script from inside the BIDA Scheduler you need to create a mechanism by which it can be run.

BIDA executes commands using the “command” command inside the scheduler program.

So the command that is passed to the shell of the external operating system has to be a command that is acceptable to the external operating system.

It cannot be a PowerShell command.

So we create a command in a file named something like this:

runPowerShellCommand.cmd

And inside the command file we put a statement something like this:

PowerShell.exe -Command %1

So now we have to have a way of putting

runPowerShellCommand.cmd

into the scheduler to be run when we want it to run. In this case, we want it to run at the end of the ETL to send an email notifying the support person that the ETL has ended.

So, in the scheduler in the process commands section you create an entry like this:

Process Name = bidapowershellexe01
Process Command = ####Scheduler_Signal_Files_Folder####\runPowerShellCommand.cmd ?1
Comment = Run a powershell command

Notice you must use a “?” here and not a “%” sign.

You can use variable substitution in the Process Command field as we have done here. And you invoke the

runPowerShellCommand.cmd

passing it one parameter.

Now this parameter is resolved by BIDA, not the operating system, and to avoid confusion BIDA uses “?” for parameters.

So this Process Name can be called from any line inside the schedule itself and passed a PowerShell script as a parameter, and it will execute that PowerShell script.

So, then, in the schedule itself you can put:

Process Command = bidapowershellexe01
Process Command = bidapowershellexe01
Parameter = "####Scheduler_Signal_Files_Folder####\run-m01-send-end-etl-email.ps1"

Yes, we do now repeat the Process Command.

We used to have the ability to have two different values in the batch command line of the scheduler.

Now we tend to use the same value for such commands as this because there was no real value in having them be different.

What this command in the actual schedule will do is to execute the file

run-m01-send-end-etl-email.ps1

by using PowerShell.

So this will then be the step that actually sends the email at the end of processing.

So there you have it.

This post has shown you how it is that we send emails via O365 accounts from inside the BIDA scheduler.

Of course, SMTP emails are quite simple and it is quite well understood how to send them via a command line.

With the use of O365 it’s a little trickier because we have to use PowerShell and we have to provide the encrypted password.

This is why I thought it was worth my while to put this blog post up for you to show you how this works!

Of course, our scheduler has turned out to be the backbone of our run time processing.

It is more than 20 years old now and has passed the test of time.

Over the years it has proven itself to be very reliable.

We start it up at system startup time and it just runs.

We recommend putting the scheduler on the same machine as the database for the schedule.

This has proven very, very reliable.

The only real issue left with the scheduler is that if the machine crashes so hard in the middle of a batch that the scheduler did not have time to write out the fact that it was going down?

Then manual intervention is needed to get the scheduler restarted because it did not record the fact it was crashing.

If the DBA performs a shut down of the database properly then BIDA will notice that the database is being shut down and record those processes that have not been completed and the restart will find them and run them.

We also provide a way to stop the scheduler properly using a signal file.

We recommend that when the host machine is being taken down for maintenance the scheduler is stopped properly.

Today, servers crash so rarely this is not an issue.

BIDA just runs and runs like the energiser bunny.

It has parallelisation features that mean we can take a DataStage schedule of jobs and actually run it faster than DataStage can.

That’s why we have the DataStage interface, to do exactly that!

The BIDA scheduler also comes with SQL Server Reporting Services reports that expose the schedules and the schedules in progress so that ETL support “techies” can see where any specific schedule is up to.

With schedules commonly having more than 5,000 steps, reports showing where things are up to are quite useful.

I thought it would be polite and helpful to write a short blog post to tell you how we invoke PowerShell and how we can send commands, such as sending an email via O365, using PowerShell.

And with that?

We hope you liked this blog post.

If you have any questions or comments please feel free to get in touch with us.

My details are on the contacts page.

Thank you very much for your time and attention.

We really appreciate you dropping by to read our blogs.

 

Best Regards.

Mihai Neacsu

Business Development Manager.

The BIDA Team.


BB0008 – The Importance Of Short Descriptions


Hello and welcome to our latest blog post here on BIDA Brains.

I am pleased you have come back to see our latest blog post.

Thank you.

In this blog post we just want to make an observation because we tripped up on this just now working on our own dashboards.

When creating Dashboards the “Stephen Few” way you cram as much information as you can onto the dashboard.

So use of ink and real estate is something to be careful about.

We were working on a dashboard where the City name for stores was appearing.

And, surprise, surprise, we forgot to create a short description for cities.

Luckily, the test data had some long city names in it and we noticed.

So we have created a city_sdesc to put city names on the dashboard.

The short story is this.

When putting text fields onto reports we long ago settled on a 15-character maximum for text fields.

So you can see in the codes tables in our models that we have a “code” which is varchar(10).

Then we have a short description of the code which is varchar(15).

This varchar(15) is intended to be used in all places where a short description of the code is preferable to the 10-character code itself.

Then we have the long description which is varchar(255).

Lastly we put a help description on the codes tables which is also varchar(255).

These are particularly useful for tools like Business Objects, which is where the idea came from.

Business Objects allows you to put a “help text” into a universe and it will pop up in Business Objects when the mouse is placed over the field.
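As a minimal sketch, the codes table pattern we are describing looks something like this. The table and column names are illustrative only; our actual models use their own naming conventions.

-- Illustrative only; not the actual DDL from our models.
create table codes_example
(
    code        varchar(10)  not null,  -- the code itself
    code_sdesc  varchar(15)  not null,  -- short description, safe to put on dashboards
    code_ldesc  varchar(255) not null,  -- long description
    code_help   varchar(255) null       -- help text, for example for Business Objects pop ups
);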

But in doing the mapping for our retail outlets prototype, the name of the city for the outlet is in the table.

And we did not create a second field, city_sdesc, to put onto the dashboard.

Lesson learned!

So we are just going to go back through and add short descriptions for all those things that might be put onto a dashboard as text.

I thought it would be polite and helpful to write a short blog post to mention this to our readers.

We hope you liked this blog post.

If you have any questions or comments please feel free to get in touch with us.

My details are on the contacts page.

Thank you very much for your time and attention.

We really appreciate you dropping by to read our blogs.

 

Best Regards.

Mihai Neacsu

Business Development Manager.

The BIDA Team.


BB0007 – Example Plan vs Actual Meta5 vs SQL


Hello and welcome to our latest blog post here on BIDA Brains.

I am pleased you have come back to see our latest blog post.

Thank you.

In our last blog post we talked about why Meta5 is so easy to use and why BIDA is promoting Meta5.

In this blog post we just want to give you a small example.

In this blog post we will show you how it is possible for a business person to create a plan vs actual mini fact table for an Excel PowerPivot model in Meta5.

Creating a plan vs actual mini fact table is a very common process in Excel dashboarding.

One of the most basic dashboards is “Plan vs Actual”.

In our case we are adopting and promoting the Stephen Few Bullet charts.

So as well as “Plan vs Actual” we also use “lower limit” and “upper limit” to give more context to the target value.

Of course, plans are usually stored in spreadsheets and are held at monthly level.

They are usually held at a fairly high level on the product hierarchy as well, such as product category.

We propose to help customers create sales plans at much lower levels so that product order processing can be optimized to reduce the amount of product that is held in stock.

So, let us consider the following case in our example.

Plans are held at monthly level.

Actual sales are held at detailed level and summarized along many dimensions to both daily and monthly levels.

Therefore, historically, actual sales are available at common levels such as day, week, month.

Products are available at common levels such as Product Group and Product Category.

Any business person who is proficient with Excel can build the following capsule given a relatively modest amount of training.

The only component a business user might need help with is the “Function” icon.

So please review the capsule and we will explain what each icon is doing below.

This capsule would be passed a parameter called “Reporting Date”.

This is the date that the report is considered to be for.

Usually it will be yesterday’s date since we will usually have data complete at close of business on the prior day.

In this example we are using 2017-08-15 as the Reporting Date because we are using old customer data in testing.

Planned Sales LTM is querying the database to fetch the planned sales for the “Last Twelve Months” prior to the current month of the “Reporting Date”.

Actual Sales LTM is querying the database to fetch the actual sales for the Last Twelve Months prior to the current month of the “Reporting Date”.

We join the two queries and we send the result to a spreadsheet.

The join is an outer join.

This spreadsheet will have all plan records and all sales records for the Last Twelve Months.

Then we need the plans vs actuals for this month.

However, the plan is at the monthly level and the actuals are at the daily level.

So what we propose is that we will pro-rate the plan by the number of days that have passed in the month in relation to the number of days in the month.

Now, we all know this is not the greatest way to pro-rate a plan, but it is the simplest way and makes for a good demonstration.

We retrieve the Planned Sales This Month and send them into the join.

We retrieve the Report Date This Month.

This will include retrieving the month level key, the number of days elapsed so far in the month as per the reporting date, and the number of days in the month.

The join will produce a spreadsheet with the planned sales, the number of the day in the month, and total number of days in the month.

This data will flow through the Function icon.

Now, we are prepared to accept that maybe a proficient Excel user would not be able to use the Function icon.

But please allow us to show you what s/he has to put into this icon and you can be the judge.

Here are the controls for the Function icon.

 

The business user has to enter the formulas you see separated by commas.

$A, $B, $C, $D means that these columns will go forward unchanged.

The next 8 formulas separated by commas means that each column will be multiplied by the number of the day in the month then divided by the number of days in the month.

We would argue that good Excel users can enter a formula like e * k / l .

That is not too much of a stretch if you ask us.

Even so, you can see that there would be “someone around” who could help out with such formulas.

After the data flows out of the function icon the plans have now been pro-rated by the number of the day in month for the reporting date divided by the number of days in the month.
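For readers who think in SQL, the pro-rating the Function icon performs is equivalent to something like the sketch below. The table and column names are hypothetical; in the capsule this calculation happens on the data stream, not in the database.

-- Minimal sketch of the pro-rating performed by the Function icon.
select  month_key,
        product_category_key,
        sale_extended_amount_plan,
        -- Same calculation as the Function icon formula e * k / l.
        sale_extended_amount_plan * day_number_in_month / days_in_month as sale_extended_amount_plan_prorated
from    plan_this_month;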

You can see an icon to fetch the Actual Sales for this month.

It will fetch these sales at the daily level and summarise them up to the monthly level.

The plans and actuals for this month are then joined.

And then they are appended to the plans vs actuals for the prior 12 months.

The result is put into a spreadsheet and sent “out” of the capsule.

If you do not want to take our word that a business user proficient in Excel can build such a capsule?

We have a free evaluation service for qualified prospective customers.

You are welcome to try this out for yourself if you wish.

Now, the magic of Meta5 is that the business users can perform their own analytics.

We take the position that empowering business users to answer their own questions is “better” than requiring them to go to IT to ask their questions.

There are many reasons why we take this position.

So given that a business user can use Meta5 to answer his/her own questions?

We claim that is inherently better than having to get anyone from IT involved in a piece of analysis work.

The SQL that is generated by each of the query icons is very simple.

The trick is that this data can be joined in Meta5, can have calculations performed on it such as pro-rating, and can be forwarded to Excel very easily.

Of course, SQL programmers would say:

“But I can do this in SQL!”

And they can.

Please click the button below and you will be able to download the SQL equivalent of this capsule as a zip file.

Download SQL

 

You can unzip it and open it in your favorite text editor.

The spacing is likely to be out because we use tabs in the writing of such code.

In the SQL file you can see we have included quite detailed documentation.

If you can read SQL you can see that it is performing 4 sub queries and then summarizing those sub queries.

It does not need to perform a separate query to get the day number in month and number of days in the month as this can be done with just a left join.

You can see that yes, indeed, this SQL delivers the exact same functionality as the capsule above.

We can load this SQL into the bida_sql_statements table.

We can fetch it from that table using Meta5.

We can then send it into the data warehouse to execute.

It will return the exact same results as the capsule above.

And yes, we did test it to prove that it did.

However, and it is a BIG HOWEVER.

No business person can write the SQL you are reading.

In fact it took our Professional Services Manager about 4 hours to get that fragment of SQL to work exactly the same as the capsule.

It took him less time to build the capsule in the first place.

So even if the person creating the capsule CAN create the SQL equivalent?

It is faster to create the capsule.

Business users can not create SQL like this.

But they CAN create capsules like this.

And that is the magic and beauty of Meta5.

Business users can create their own capsules, do their own analysis, and get data directly from wherever they are authorized to get it.

They don’t need to go to IT and explain what it is they want.

As long as they are authorized to query the data they need they can get it for themselves.

No matter what companies say about “self service data analysis”.

Meta5 remains the king of the “self service data analysis” products in our humble opinion.

But don’t take my word for it.

You can take Meta5 for a free evaluation spin yourself if you work for a qualified prospective customer.

We have created two very detailed training videos to give prospective customers the chance to learn how to use Meta5 and to take it for a detailed “spin”.

These are the:

  1. The Adventure Works Demo Dashboard
  2. The Demographics Dashboard

If you are an “Excel Person” who wants to evaluate Meta5?

These two Dashboards take 2 days and 3 days to create respectively.

You are welcome to try your hand.

You can review these two video training courses by clicking the button below.

GoTo How Do I...

In Summary

In this blog post we wanted to show you the difference between using Meta5 to create a plan vs actual mini fact table to be sent into Excel’s Power Pivot model versus using SQL.

We wanted to give you a very clear example of how Meta5 can empower business users to create their own analysis versus needing to go to IT and ask someone to please get the data from the data warehouse into Excel to do the same.

The plan vs actual is a good example because it is such a typical example of comparing data that comes from two different places across a set of common dimensions.

This is very standard work for business users who are doing data analysis.

We hope you liked this example.

We hope you liked this blog post.

If you have any questions or comments please feel free to get in touch with us.

My details are on the contacts page.

Thank you very much for your time and attention.

We really appreciate you dropping by to read our blogs.

 

Best Regards.

Mihai Neacsu

Business Development Manager.

The BIDA Team.


BB0006 – Why Is Meta5 So Easy To Use?


Hello and welcome to our latest blog post here on BIDA Brains.

I am Mihai Neacsu, and I am the Business Development Manager here at BIDA.

I am pleased you have come back to see our latest blog post.

Thank you.

This is quite a long and detailed post.

So if you would like to get a cup of tea or coffee now would be a good time to do that.

A lot of people are asking me why BIDA is promoting Meta5.

The short answer is that we know if we can persuade our prospective customers to give Meta5 to the five smartest business analysts in their company they will sustainably grow their profits.

We have seen the results that Meta5 can achieve for other companies and we want Romanian companies to avail themselves of the same profit improvements Meta5 can deliver.

Meta5 was first designed for business people to use.

At one point Meta5 was the world’s number 2 data analysis software, behind only SAS Software.

The difference being that business people could use Meta5, while it took IT people to use SAS Software.

Even today, Meta5 still stands head and shoulders above other tools purporting to provide “self service data analysis”.

At least, that is our opinion and we are sticking to it.

Today I will show you one small example of why Meta5 is so easy to use and so useful for business analysts.

Consider the case that you want some head office sales dashboard that shows you the following items.

Plans versus actuals using sparklines and Stephen Few’s bullet charts.

Actuals this month to date, quarter to date, year to date, along with all the variances against prior periods.

And you want this by the product categories, by region manager, city, outlet and currency over the last 12 months.

That is about as standard a Head Office Dashboard as you are ever going to be asked for.

But this Head Office Dashboard will take most people a surprisingly long time to create because there are some little “gotchas” in there.

The only tool available on the marketplace today that does this sort of Dashboard very easily is MicroStrategy.

We are also very familiar with the features of MicroStrategy and we quite like it.

But it is expensive and not easily sold in Romania.

Not to mention that one of our partners already sells MicroStrategy here in Romania.

Calculating measures such as the variance of this month to date versus last month to date, or this month to date versus this month last year, requires tools such as SSAS or Essbase when working in Excel.

These sorts of measures are not generally created in Excel in the Power Pivot models.

So in this blog post we are going to show you how Meta5 gets the data for such a dashboard into the Excel Power Pivot model.

We will show you some of how it is so simple and easy to use that business users can create such dashboards.

We will also show you the SQL that is generated.

We will remove customer identifying information from all images.

So, the first picture is to show you a portion of the overall capsule.

 

This is our very first Business Central capsule so it is numbered BC01001 and called “Head Office Report”.

You can see from the image that we are going to retrieve the following data for dimension tables that will be put into the power pivot models.

  1. Period.
  2. Outlet.
  3. Product Category.
  4. Currency.

These are very standard dimensions for a head office dashboard.

Then you can see we are going to retrieve the following information for the fact tables that will be put into the power pivot model.

  1. Plans versus Actuals By Month
  2. Sales for Month, Quarter and Year to Date.
  3. Sales for Last Periods.

This last one is the one that is most interesting and will be the subject of most comments.

Now, we won’t show you that much about the dimensions.

You are assumed to be fairly technical because this post is on BIDA Brains.

So let’s start with Plan Versus Actual by Month.

Here is the picture of what is in the capsule.

You can see there is a query icon to retrieve the planned sales.

The SQL it generates is as follows:

Planned Sales

select
dbo.vm_sale_plan_period_ml.pk_vm_sale_plan_period_ml
,dbo.vm_currency.pk_vm_currency
,dbo.vm_ret_outlet_rep.pk_vm_ret_outlet_rep
,dbo.vm_product_ml.dk_vm_product_category
,sum(dbo.vf_ret_outlet_sale_plan_ml.sale_extended_amount_plan)
,sum(dbo.vf_ret_outlet_sale_plan_ml.sale_units_plan)
,sum(dbo.vf_ret_outlet_sale_plan_ml.sale_extended_amount_lower)
,sum(dbo.vf_ret_outlet_sale_plan_ml.sale_units_lower)
,sum(dbo.vf_ret_outlet_sale_plan_ml.sale_extended_amount_upper)
,sum(dbo.vf_ret_outlet_sale_plan_ml.sale_units_upper)
from dbo.vm_sale_plan_period_ml
,dbo.vf_ret_outlet_sale_plan_ml
,dbo.vm_currency
,dbo.vm_product_ml
,dbo.vm_ret_outlet_rep
where
dbo.vm_currency.pk_vm_currency=dbo.vf_ret_outlet_sale_plan_ml.dk_vm_currency
and dbo.vm_ret_outlet_rep.pk_vm_ret_outlet_rep=dbo.vf_ret_outlet_sale_plan_ml.dk_vm_ret_outlet
and dbo.vm_product_ml.pk_vm_product_ml=dbo.vf_ret_outlet_sale_plan_ml.dk_vm_product
and dbo.vf_ret_outlet_sale_plan_ml.dk_vm_sale_period=dbo.vm_sale_plan_period_ml.pk_vm_sale_plan_period_ml
and (dbo.vm_sale_plan_period_ml.sale_plan_period BETWEEN '2016-08-01' and '2017-08-01')
and (dbo.vm_sale_plan_period_ml.level_col_sdesc = '2-Month')
and (dbo.vf_ret_outlet_sale_plan_ml.current_plan_flag = 'Y')
and (dbo.vm_product_ml.level_col_sdesc = '1-ProdGroup')
and (dbo.vf_ret_outlet_sale_plan_ml.dk_vm_fact_table_level = 609)
group by
dbo.vm_sale_plan_period_ml.pk_vm_sale_plan_period_ml
, dbo.vm_currency.pk_vm_currency
, dbo.vm_ret_outlet_rep.pk_vm_ret_outlet_rep
, dbo.vm_product_ml.dk_vm_product_category
order by
dbo.vm_sale_plan_period_ml.pk_vm_sale_plan_period_ml
,dbo.vm_currency.pk_vm_currency
,dbo.vm_ret_outlet_rep.pk_vm_ret_outlet_rep
,dbo.vm_product_ml.dk_vm_product_category

You can see that we are using data from 2016 and 2017.

This is actual old customer production data we are using for testing.

So we have to remove all references to anything in the data.

The SQL is easy enough to understand.

We design multi-level models as made popular by Ralph Kimball.

So you can see this query is at the month level and the product group level.

Where you see a table with the suffix “ml” that means this table is a “multi-level” table and some constraint on the level of the table must be supplied in the query.

Because we are querying at the outlet reporting level we don’t need to mention the level of the outlets.

The code

dbo.vf_ret_outlet_sale_plan_ml.dk_vm_fact_table_level = 609

means that the multi level fact table for outlet plans will be queried at level number 609.

In our data models every summary level of every table is assigned a unique number.

This is part of the design process that we use for multi-level summary fact tables.

Please note that we not only have plans defined, we also have “lower” and “upper” amounts defined for extended sales (total revenue) and units.

These “lower” and “upper” amounts are needed to create the Stephen Few style bullet charts.

 

Actual Sales

The next icon retrieves actual sales.

In this case we are retrieving actual sales at the month level, so if we have set the “report date” in the past then we will get the monthly level plans and sales for the whole month of the report date.

We believe this is fair and reasonable because in another portion of the capsule we will get “month to date” using the maximum date of the report date.

select
dbo.vm_sale_meas_period_ml.pk_vm_sale_meas_period_ml
,dbo.vm_currency.pk_vm_currency
,dbo.vm_ret_outlet_rep.pk_vm_ret_outlet_rep
,dbo.vm_product_ml.dk_vm_product_category
,sum(dbo.vf_ret_outlet_sale_meas_ml.sale_extended_amount)
,sum(dbo.vf_ret_outlet_sale_meas_ml.sale_units)
from dbo.vm_sale_meas_period_ml
,dbo.vf_ret_outlet_sale_meas_ml
,dbo.vm_product_ml
,dbo.vm_ret_outlet_rep
,dbo.vm_currency
where
dbo.vm_currency.pk_vm_currency=dbo.vf_ret_outlet_sale_meas_ml.dk_vm_currency
and dbo.vm_sale_meas_period_ml.pk_vm_sale_meas_period_ml=dbo.vf_ret_outlet_sale_meas_ml.dk_vm_sale_period
and dbo.vm_ret_outlet_rep.pk_vm_ret_outlet_rep=dbo.vf_ret_outlet_sale_meas_ml.dk_vm_ret_outlet
and dbo.vm_product_ml.pk_vm_product_ml=dbo.vf_ret_outlet_sale_meas_ml.dk_vm_product
and (dbo.vf_ret_outlet_sale_meas_ml.dk_vm_fact_table_level = 509)
and (dbo.vm_product_ml.level_col_sdesc = '1-ProdGroup')
and (dbo.vm_sale_meas_period_ml.level_col_sdesc = '2-Month')
and (dbo.vm_sale_meas_period_ml.sale_meas_period BETWEEN '2016-08-01' and '2017-08-15')
group by dbo.vm_sale_meas_period_ml.pk_vm_sale_meas_period_ml
, dbo.vm_currency.pk_vm_currency
, dbo.vm_ret_outlet_rep.pk_vm_ret_outlet_rep
, dbo.vm_product_ml.dk_vm_product_category
order by
dbo.vm_sale_meas_period_ml.pk_vm_sale_meas_period_ml
,dbo.vm_ret_outlet_rep.pk_vm_ret_outlet_rep
,dbo.vm_product_ml.dk_vm_product_category

It is simple SQL.

It is retrieving the sales amount at month level between ‘2016-08-01’ and ‘2017-08-15’.

As you can see in the above capsule the two sets of data are then joined in the join icon.

This is an “outer join”, and any row that is not matched left or right will have its values set to zero.

The join icon supports all the standard join types and there are options for how to set values that are not present.

I don’t want to go into the details on those options.

What I want to focus on is that it is that simple to retrieve plans from one table and actuals from another table and to create a mini-fact table which has plans vs actuals next to each other for the month, currency, outlet, product category.

Just getting plans versus actuals next to each other in one table in a power pivot model is much more difficult than you have just seen.

Meta5’s ability to perform joins, including up to 5 way joins, means the business users can join data without any knowledge of SQL.
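For comparison, here is a minimal sketch of what that join icon is doing, expressed as SQL. It assumes the two query results have been landed as hypothetical tables named plan_ltm and actual_ltm, keyed on the same dimension keys.

-- Minimal sketch of the join icon behaviour; table and column names are hypothetical.
-- Rows unmatched on either side get their measures set to zero, as the join icon does.
select  coalesce(p.month_key, a.month_key)                       as month_key,
        coalesce(p.currency_key, a.currency_key)                 as currency_key,
        coalesce(p.outlet_key, a.outlet_key)                     as outlet_key,
        coalesce(p.product_category_key, a.product_category_key) as product_category_key,
        coalesce(p.sale_extended_amount_plan, 0)                 as sale_extended_amount_plan,
        coalesce(a.sale_extended_amount, 0)                      as sale_extended_amount
from    plan_ltm p
full outer join actual_ltm a
    on  a.month_key            = p.month_key
    and a.currency_key         = p.currency_key
    and a.outlet_key           = p.outlet_key
    and a.product_category_key = p.product_category_key;

In the capsule, the business user just picks the join type from a menu; the SQL above is what they are spared from writing.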

BB0005 – Example Project For Data Vault Strengths


Hello and welcome to our latest blog post here on BIDA Brains.

I am Mihai Neacsu, and I am the Business Development Manager here at BIDA.

I am pleased you have come back to see our latest blog post.

Thank you.

In our prior blog post I talked about the men we consider to be the most influential in our industry.

I also talked about how a suite of dimensional models were developed by the Data Warehouse Network.

They were called the Vertical Packaged Solutions, and they were able to implement archives as well as Operational Data Stores under certain conditions.

As a result Sybase was selling copies of the IWS Models like hot cakes.

When SAP bought Sybase there was no role for the IWS models and they were put in mothballs.

At some point in time SAP published these models.

They were still published as late as 2018.

You can go and read the blog post, which now has broken links, via the button below.

SAP IWS Models

By using the one modeling style for the end user query interface, the archive and the operational data store, the overall costs of implementing IWS were vastly lower than the third normal form models that Teradata were selling at the time.

At the same time, the benefits of a Sybase IWS data warehouse were approximately the same as using different modeling techniques.

Although most people will tell you that it’s not possible to properly archive data in a dimensional model, this was perfected twenty five years ago, and it has been implemented into many companies.


 

So, you might ask the question, if dimensional models can archive data effectively why would you ever use a Data Vault or more complex archiving mechanisms?

Well, there are situations where this sort of dimensional modeling is not well suited for the archive.

So let us consider this example.

Just so you know, this is a real example.

Consider that most western countries have a government department that is responsible for the payment of welfare payments to citizens.

I will use Australia as an example.

Australia has a government department now called “Services Australia”.

This department is responsible for payments such as child care, child support, unemployment benefits, medical insurance and many others.

It makes payments in the order of billions of dollars per year using the taxes collected in the country.

These sorts of departments have some very complex historical data requirements.


 

Changing Systems

It is very common for government departments to be merged and split up.

Services Australia is the combination of what used to be many different government departments.

So the computer systems that these departments had would have evolved over long periods of time and been developed by different people.

They would be profoundly different for each department.

Then, as departments are merged, data would be migrated from some departments to other departments’ computer systems, but the need for historical data is there for reasons I will mention next.

These sorts of large government departments have a far greater need for support of changing systems than your average large company.


 

Changing Legislation

Governments love to change legislation with respect to welfare payments.

This legislation can change quite dramatically with a change of government, from left of politics, to right of politics.

Australia is also a two party country for all intents and purposes.

There is a right leaning party and a left leaning party.

When the left-leaning party comes into power they spend money like a drunken sailor, and these alterations to the legislation governing payments have to be made, and reflected, in the data warehouse.

When the right-leaning party comes into power they try to reduce the over-spending, and this requires that another large batch of legislation changes be applied to systems, including the data warehouse.

However, both parties need to version the legislation because during the election season they need to have the department staff run calculations, and forecasts, to present proposed legislation to the electorate in an effort to win votes.

So not only does the data warehouse need to support the changing of legislation.

It needs to support both past and future proposed legislation changes, along with being able to calculate the effect of these changes on groups as small as approximately 200 households.


 

Combining Complex Data With Other Complex Data

Not only does a data warehouse for such a department need to integrate all the data it has, along with past, present, and proposed future versions of legislation, and calculation scenarios for different proposed future legislation.

It also needs to integrate data from other departments, with all of their similar complexities.

For example, Services Australia needs to be able to integrate data from the Australian Tax Office and their rules for taxation and their taxation records.

After all, there is not much point promising people more money in welfare payments with the left hand, if the government is only going to take that money back in increased taxes with the right hand.

Though, to be honest, that is pretty much what the Australian government does.


 

When you have such a complex environment that is so rapidly changing and so many versions of everything need to be kept?

That is a tough job for a dimensional model.

The way we archive data in a dimensional model is not well suited to this level of change and volatility.

This very complex, and relatively rare, set of requirements is far better suited to a modeling style that is meant to “archive data”.

The data also needs to be archived without much concern to how it will be queried or the costs of the overheads required to maintain different modeling styles.

In short, such a situation is better suited to Data Vault than it is suited to dimensional models.

Just like a telco or retailer or bank is better suited to using a dimensional model than a Data Vault model for the archive portion of the data warehouse.

Now sure.

Given this above example, some people will argue that a Data Vault Model offers some benefits over archiving data in a dimensional model.

While this is true, those benefits come at a cost.

That extra cost has to be justified.

In the areas of high transaction volume consumer businesses?

That extra cost is tough to justify.

In the area of complex government departments paying out billions of dollars in welfare payments?

The extra cost is relatively simple to justify.

Indeed, those people who build dimensional models won’t go near such government departments because they have been there before and they don’t want to go there again. LOL!


 

In Summary

In this blog post I just wanted to point out that when it comes to data modeling there are horses for courses.

There is a class of problems that is well suited to dimensional models, including archiving in a dimensional model.

There is a class of problems that is well suited to archival models.

And there is a class of problems that sits in the area overlapping these two modeling types.

It is the job of the Data Warehouse Architect to decide which modeling technique should be selected.

So, ladies and gentlemen, when you are proposing data warehousing modeling styles?

Please choose carefully.

Your choice will be reviewed down the track, and you will want to be able to defend your choice.

Blindly implementing just one or the other modeling style because that’s what you know?

Such an approach will not stand the test of time.

So consider yourself warned, that your choices will be reviewed in the future.

This is inevitable.

Ok?

Thank you very much for your time and attention.

We really appreciate you dropping by to read our blogs.

Best Regards.

Mihai Neacsu.

Business Development Manager.

The BIDA Team.


BB0004 – Some History Of The Data Warehousing Area


Hello and welcome to our latest blog post here on BIDA Brains.

I am Mihai Neacsu and I am the Business Development Manager here at BIDA.

I am pleased you have come back to read or listen to our latest blog post.

Thank you.

As I have said in previous blog posts, there has been quite a lot of discussion about Data Vault in public spaces such as LinkedIn.

As a result, we decided to do a series of blog posts about Data Vault, putting it into perspective among the other ways of building data warehouses.

My comments are all things that are well known in the data warehousing area.

There is nothing “controversial” about what I am saying.

So.

Let’s go.

Firstly, I wanted to touch on the positioning of Data Vault, and Dan Linstedt, in the data warehousing area.

As Kent Graziano has mentioned in public to us, Dan did not go public about Data Vault until around 2001.

Those of us who were in the data warehousing business in the 90s, like Peter, knew of Dan and his ideas.

Peter personally knew about the forerunner to Data Vault around 1996 or 1997.

Something like that.

Peter was involved with the Australian reseller of Bill Inmon’s software from 1996 onwards.

Somewhere along the line Peter heard about Dan and what he was doing.

A number of people he knew told him about Dan’s ideas.

Many people who knew about Dan looked into whether it made sense to implement Dan’s ideas using Bill’s Prism ETL software and, later, DataStage.

Obviously, Ardent were also aware of Dan since they bought Prism Solutions.

Most people, including Peter, decided to stick with Bill’s proposed Time Variance plus Stability Analysis modeling techniques.

There were many reasons for this.

One of the main reasons was that it was hard enough to sell a data warehouse already.

They did not need to introduce a much more esoteric, and abstract, modeling technique to make their sales process harder.

Even today, you see people “complain” about how hard it is to persuade customers to use Data Vault.

Well?

Imagine what it would have been like in the late 90s.

People were still having to explain the use of dimensional models in each sale in the late 90s.

To try and sell Data Vault would have killed the sale in most cases, because IBM and Teradata were selling third normal form data warehouses.

It was hard enough to sell an Archive Layer that was designed to Bill’s proposed TV plus SA technique against IBM and Teradata.

They didn’t need to introduce something that was unheard of and esoteric at that time.

So, I want to dispel the idea, in public, that those who were the leaders in data warehousing in the late 90s did not know who Dan was or did not know what Data Vault was.

People like Peter knew who Dan was.

People like Peter knew what Data Vault was.

It was not a sellable idea in the marketplace at that point in time, because the people they had to beat were IBM, Oracle and Teradata.

IBM was not hard to beat in Australia.

But they were hard to beat in other places.

It’s that simple.

Next.

I want to talk about who are the top five men who have influenced our industry segment the most, balanced with who was the best in this business, even if their industry influence was not as great.

This is a kind of stroll down memory lane for the old guys.

It will be a history lesson for you younger guys.

Of course, these are Peter’s opinions since I am new to this space.


Number One.

Bill Inmon.

I think it goes without saying that Bill Inmon is the most influential figure in the data warehousing area.

Bill came up with the idea of archiving data when most of us were struggling to simply record the data that was being generated via applications.

You have to remember how expensive disk and processing was for mainframe computers at the time.

When put in that context you realize what a breakthrough idea it was to archive that operational data.

It was such a radical idea that for many years people rejected the idea out of hand.

I would like to mention that Bill Inmon invented a new idea.

He did not take an existing idea and improve on it as Ralph did.

It is one thousand times harder to invent a new idea, that has never existed before, than to take an already existing idea and improve it.

At least one thousand times harder.

So, Bill has to go at the top of the list for inventing the idea of data warehousing in the first place.

 


Number Two.

Ralph Kimball.

Ralph is a personal friend of Peter’s and he has the greatest of respect for Ralph.

That said, Ralph was an engineer.

And what engineers do is take existing ideas and make them better.

That is the specialty of an engineer.

The first commercial multi-dimensional database was released by IRI Software, some time in the 70s.

They sold the Nielsen data using their software.

People in the marketing departments of massive retailers, and CPG companies, were using their software to improve the return on their marketing dollar investment.

Ralph moved from Xerox Parc to Metaphor Computer Systems, along with David Liddle, Don Massario and Charles Irby.

You can read about Metaphor Computer Systems on the button on the web page.

Metaphor Wiki

So, Ralph was far from “on his own” in getting started at Metaphor.

David Liddle and Charles Irby are very well known from their work at Xerox Parc.

Metaphor took aim at IRI and wanted to “eat their lunch”.

One of the questions was “what sort of database should we use”?

Ralph, and others, proposed that they should develop their own relational database.

This was based on the idea that relational databases will scale better, over time, than multi-dimensional databases.

The “star schema” was simply the relational implementation of the multi-dimensional models that were already in use and made popular by IRI.

Sure, it was a big step forward from a multi-dimensional database, but it was not the invention of something new.

The Metaphor item that was a new invention was the idea of the capsule.

Ralph left Metaphor after IBM became involved, and he went on to create a company called “Red Brick”, which was a dimensional model based database company.

The Metaphor experience created a whole series of people who went on to make their mark on the data warehousing industry.

 


Number Three.

Sean Kelly.

Sean Kelly is arguably the number three person who influenced the data warehousing industry.

He was younger and came after Bill Inmon and Ralph Kimball.

He wrote a series of books and created the “Data Warehouse Network”.

He pioneered the idea of “Vertical Packaged Solutions”.

He was able to sell, and implement, these models to many large companies very successfully.

In the end, the Data Warehouse Network, and Vertical Packaged Solutions, were sold to Sybase for an undisclosed amount.

Sean Kelly then went on to be the VP Business Intelligence for Sybase for EMEA.

Sean Kelly’s main area of focus was improving the profitability of the companies that bought the data models.

Those people who, like Peter, worked with Sean know the caliber of the man.

So Peter’s personal opinion is that Sean Kelly sits at number three.

 


Number Four.

John Doe number one.

He does not wish to be a public figure so I won’t mention his name.

This is the man that many call “the worlds greatest data modeler”.

He was the man who actually invented the “Vertical Packaged Solutions”, while working on projects won by Sean Kelly, as part of the Data Warehouse Network.

He came up with the design techniques that are embedded into the Sybase Industry Warehouse Studio models, without the benefit of Ralph Kimball’s book being available.

He was the man who figured out how to archive data in a dimensional model.

This was called the “entity profile” in the Sybase Industry Warehouse Studio data models.

Everyone who saw the IWS models, back in those days, realized that they had seen the future of how data warehouse models will be developed.

Especially for large business to consumer companies that have high transaction volumes.

That the models he invented were commonly sold for one hundred and fifty thousand US dollars, tells you how advanced those models were.

This is because it is very hard to sell data models into IT shops, because the people in the IT shop generally believe they can do a better job.

Some of those people who were trained on the IWS models went on to develop similar data warehouses after Sybase was purchased by SAP in 2006.

For inventing all the modeling techniques needed to build large, complex dimensional models that allow archiving of data, he makes our list as number four.

 


Number Five.

John Doe number two.

Again, this man does not wish to be a public figure.

He is very well known among those of you who have been around for a long time.

In the 80s, he joined Metaphor as a database modeler.

He rose up the ranks in Metaphor, and when Ralph Kimball left in 1989, he took over the role of the number one data modeler for Metaphor.

So, he was trained directly by Ralph Kimball, and took over Ralph Kimball’s job when Ralph moved on.

He led the Metaphor Data Modeling team from 1989 to 1994.

When IBM retrenched all the consultants from Metaphor in 1994, he moved to Price Waterhouse where he took over the leadership of the development of the Price Waterhouse Data Warehousing Practice.

By 1999 Price Waterhouse had merged with Coopers and Lybrand to become Price Waterhouse Coopers.

It was, by far, the largest data warehouse and business intelligence consulting firm in the world, with more than 1,500 people in the data warehouse practice area.

It was later part of the Price Waterhouse Coopers business that was sold back to IBM for three and a half billion US dollars in 2002.

Presumably a nice portion of that sale price made it into the pocket of this man who was made redundant by IBM just 8 years earlier.

So, this man, as much as any, was instrumental in creating the largest, and most successful, data warehousing consulting practice in the world.

He, quite literally, wrote the Price Waterhouse Coopers, data warehousing methodology.

In 1996 Price Waterhouse, by way of John Doe two, adopted the idea of combining Bill’s Archival Models with Ralph’s Dimensional Models to create a comprehensive, and expensive, end to end data warehousing design solution.

Price Waterhouse Coopers remained unchallenged as the leading data warehousing consultancy from 1997 through to the sale of this area of the business to IBM in 2002.

 


 

For my money, those are the top 5 most influential people in the data warehousing area.

I can tell you that Peter is very proud to say that they are all his personal friends.

Each of them has made a great contribution to his skills over the years.

Each of them has put money in Peter’s pocket from the lessons they were willing to teach him.

We have brought all that knowledge into BIDA.

Next.

One could argue that Michael Saylor should be on the list.

However, I am not putting him on the list because MicroStrategy is much more of a software company than a data warehousing company.

Also, the ideas of the data models and the aggregate navigation built into MicroStrategy came from Metaphor people in the first place.

So those are our proposed top 5 most influential people in the data warehousing area of our market.

What do you think of our list?

Do you agree or disagree?

Please feel free to put your comments in the comments section below.

The point of listing them is that Dan Linstedt and Data Vault are not on this list.

It would be the topic of some debate as to whether Dan belongs in the group that is six to ten.


 

As I mentioned, Kent Graziano notes that Dan did not go public with Data Vault until around 2001.

There were a vast number of data warehouses built in the 90s.

All the issues of building data warehouses were resolved by the five men I have named, plus the men who worked with them on their projects.

One of those men also no longer wishes to be a public figure.

He has recently taken down his web sites and made it clear he is no longer a public figure.

So we will call him John Doe three.

John Doe three is also a personal friend of ours here at BIDA.

However, in 2011 he was one of the very early victims of “cancel culture”.

John Doe three worked for IBM Australia in their international Software Development laboratory in Australia in the 80s.

He worked in the area of Billing and Pricing Systems for IBM products.

As such, he was very well trained in how to develop systems that were balanced to the cent.

Systems that had very high levels of security and auditability associated with them.

IBM had an internal software development standard called Application Systems Controls and Auditability.

This standard was intended to reduce the possibility of fraud inside IBM.

There were many well known cases of fraud inside IBM because of the vast sums of money that were changing hands.

Those who wanted to commit fraud could do it most easily, with least risk of being caught, through the new computer systems that were being deployed.

Fraud prevention in computer systems was in its infancy in the 80s.

So, when this man came into the data warehouse area, he brought with him all the IBM ideas on how to implement controls and auditability into data warehouses and the ETL systems that populated them.

He also brought the idea that using the data warehouse for fraud detection was a very good use of this sort of archival technology.

Indeed, the richest man in Australia in the 90s, Kerry Packer, bought many copies of John Doe three’s ETL software, and had data warehouses built, for the specific reason of detecting and preventing fraud.

His son, James Packer, and business partner, Lachlan Murdoch, had just lost seven hundred million US dollars because of claimed fraud at the company they had backed, One.Tel.

John Doe three had helped Kerry Packer turn one of his companies around in 2000.

In that data warehouse project he had implemented fraud prevention and detection.

Having been told of this, Kerry Packer instructed all his remaining companies to implement something similar, to help prevent any further future fraud.

John Doe three sold a number of copies of his ETL software to these companies.

I wanted to make the point that by the year 2000, all such issues as security, auditability, fraud prevention, fraud detection, and rapid generation of ETL systems, were well and truly solved.

The question of how to archive data in a dimensional model was also solved.

However, because the Sybase IWS models were being sold for one hundred and fifty thousand dollars per copy, the technique for archiving data inside the dimensional model remained a closely guarded secret.

This was similar to how Metaphor had guarded the secrets of building dimensional models, until IBM retrenched them all in 1994.

Next.

By 2000, people were also building lots of Operational Data Stores.

Some people were accidentally building Operational Data Stores in the mid 90s, but they didn’t have a name for them.

These were the subject oriented and integrated databases we were implementing for call centers.

Call centers had become a big thing by the mid 90s, and they needed a database to access.

It looked “sort of like a data warehouse” only it ran transactions from the call center.

Bill Inmon finally gave this phenomenon a name, which was, the Operational Data Store.

So.

By 2000, people were building Operational Data Stores using third normal form.

People were building archival data stores using Bill’s time variance plus stability analysis models.

And people were building dimensional models on top, because that’s what was best for the business users of data warehouses.

By 2000, the matter was settled.

Dimensional models had won the day, and dimensional models would be the standard for the presentation of data to business users, who queried data directly.

The three different modeling forms meant that the development of these different models was very expensive.

Price Waterhouse Coopers was loving all this because they could justify the expense.

A lot of those 1,500 consultants, at Price Waterhouse Coopers, were working on building these three different model types.

It was well understood that there was a very high level of overlap in the data held in these models.

Consider that, in high transaction volume consumer businesses, the transactions formed the vast bulk of the data.

This data existed in the three different model types in the three different places.

It was obvious to everyone that, if some way could be found to use dimensional modeling in the archive layer and the operational data store, then the cost of deployment would dramatically decrease.

One more point before we go on.

The man listed as John Doe number two actually went to Australia in 1994, at the invitation of the man listed as John Doe number three.

This was to work for an IBM Australia customer who had Metaphor installed.

The work was to design the prototype for the “next generation” dimensional data model for the IBM customer.

While there, John Doe number two trained John Doe number three on the basics of dimensional models and the ETL development for dimensional models.

John Doe number three was then left to figure the rest out for himself.

As a result, John Does two and three worked together for years to figure out whether there was some way to archive data in dimensional models.

They never found one.

It was not until John Doe number three joined Sean Kelly’s team at Sybase in 2001, and did the class taught by John Doe number one, that he discovered that the problem he had been trying to solve for five years had, indeed, been solved by John Doe number one.

Indeed, it had been solved about five years prior.

John Doe number three told me that when John Doe number one explained the “entity profile” table in the class, he immediately understood that he had been presented with the solution he had been looking for, for five years.

On seeing this solution, John Doe three discussed the feature and it was revealed that, yes indeed, the Sybase IWS models could be used to implement operational data stores and archival layers of data warehouses.

By doing so, the overall cost of implementing an Operational Data Store, an archival layer, and dimensional layers, would be reduced by at least 50% when using the Sybase IWS models.

Further, the cost of maintenance and the costs of running these three sorts of processing would also be greatly reduced.

It was explained in the class that this ability was one of the main drivers of sales to large companies, especially telcos.

John Doe number three, being a salesman, realized that he was going to make a great deal of money selling, and then implementing, the Sybase IWS models.

He went on to do so.

Indeed, he later worked very closely with Sean Kelly at Sean Kelly and Associates when the “next generation” of data models was developed starting in 2006.

So, I have added the above section to make this point.

That point being.

By the time Dan went public with the ideas of Data Vault, two of the top five people in our business had already worked out how to build operational data stores, archival layers, and dimensional models, all with one modeling technique.

When John Doe three joined the Sybase team, he brought with him all the IBM standards and experience on how to implement security, auditability, and fraud detection into the IWS models.

John Doe three later updated the ETL software he had written in Cobol in the 90s to C++ in the 2000s.

He adapted his ETL software to support the Sybase IWS models to make implementing them even faster and cheaper.

So, by 2003, Sybase had, by some margin, the best data model offering in the marketplace.

They were selling them like hot cakes.

They were selling them to very large companies like telcos.

In fact, because Sean Kelly had gotten his start at the Irish telecoms company, telecoms was the number one segment for the Sybase IWS models.

There are those online who say such nonsense as “Data Vault is the best way to implement a data warehouse with security, controls and auditability.”

This is nonsense.

My question back would be as follows.

“What do you think the rest of the people were doing in the 90s and early 00s?

Implementing data warehouses without security, controls and auditability?”

The track record of the Sybase IWS models speaks for itself.

Their track record also speaks for Sean Kelly and John Doe number one.

These models were very successful.

This was despite the fact that IT people mostly refuse to buy models because they think they can develop better models themselves.

The whole idea of Data Vault being so popular in IT circles, while there are no industry standard Data Vault data model offerings, makes the point that it is IT people who are excited about Data Vault.

Not the business people.

Next.

After Sybase was purchased by SAP in 2006, Sean Kelly and Associates embarked on developing Vertical Packaged Solutions 2.0.

They focused on telco models and sold two such models to telcos in the UK on Netezza.

Having proven the 2.0 models worked on Netezza, and having gained two great reference accounts for the next generation models, they partnered closely with Netezza.

In partnership with Netezza, Sean Kelly was talking with all the major telcos in Europe about the next generation of telco models.

This new generation of models would deliver greatly increased functionality at a cost that IBM, Teradata and Oracle could not match.

Unfortunately for Sean Kelly and Associates, IBM bought Netezza in 2010.

Not long after this, Sean Kelly became ill and he passed away in 2012.

John Doe number three had been “cancelled” in 2011.

So these 2.0 models were no longer being purchased by companies, even though they would have benefited from them.

 


 

Next.

I recently found out that the old IWS Models were slightly updated by SAP and made public around 2014.

They were still public around 2018.

However, the links on the public post in 2018 are now dead.

I can’t find anyone at SAP who has any comment on those old models.

Even so, the next generation of models that were developed by Sean Kelly and Associates contained many features which were a vast improvement on the Sybase IWS models.

Not the least of which is that they are maintained in spreadsheets and not a data modeling tool.

So.

Even though there is a lot of enthusiasm and excitement for Data Vault among IT people, those who are so enthusiastic might just wonder what the rest of the industry has been doing all this time, if they have not been doing Data Vault.

They might wonder how some people have been able to so consistently deliver very large projects, for very large companies, at such low prices, and to give their customers such great return on investment.

All without a Data Vault in sight.

Our view is that Data Vault has its place in the world.

We will publish another blog post about that later.

Our view is that dimensional models will continue to dominate the data warehousing area of large business to consumer companies, that have very high transaction volumes, like telcos, retailers and banks.

In these areas, we will continue to see IBM and Teradata offer third normal form archives, plus dimensional models on top of these archives.

This modeling form has its benefits, and its issues, which we all fully understand.

BIDA will continue to rely on the design points laid down by “the world’s best data modeler” in the 90s.

If we see a better idea, we will integrate it into what we are doing.

But, and this is a very big but.

We have only seen one very small design improvement in 25 years.

That was the design improvement of putting an integer key on the front of the “entity profile” in 2002.

We haven’t seen any new design technique improvement in these models since then.

We haven’t seen any dimensional model design technique improvement suggested by anyone else in that time either.

So we think we are on pretty solid ground.

In Summary.

There is a great deal of enthusiasm and excitement in the IT community about Data Vault today.

It is our opinion that this is somewhat misplaced, given that in the high volume, business to consumer segment, dimensional models will remain dominant for the foreseeable future.

Further, and we will cover this more in another blog post, the future of data warehousing does not include IT people developing data warehousing data models for their own companies.

That will end.

You will not have data warehouse data modelers working inside companies building their data warehouse in the future, any more than you have IT people developing ERPs for their companies today.

The days of IT people developing data warehouses and BI solutions, especially the data model portion, internally for their companies, are numbered.

In the future, data warehouse data models, and end to end products, will be developed and implemented by companies like BIDA.

That is our prediction.

We are putting our money where our mouth is.

We will see if we are right in about five years’ time.

In finishing.

Thank you very much for your time and attention.

We really appreciate you dropping by to read our blogs.

 

Best Regards.

Mihai Neacsu.

Business Development Manager.

The BIDA Team.

Ask Us Anything

You are welcome to ask us any question you would like below on our contact form.

We will be emailing our subscribers useful and
valuable information regarding business intelligence.
Please subscribe here to be on our general emailing list.

BB0003 – How Much Does A DWH Cost?

0

Hello and welcome to our latest blog post here on BIDA Brains.

I am pleased you have come back to see our latest blog post.

Thank you.

Just to be clear in this opening.

This blog post was written by Peter Jones.

Peter is the Professional Services Manager here at BIDA.

In the past we put these blog posts through voice generation software.

Now we have decided that I will read the blog posts into our videos because I am the Business Development Manager here at BIDA.

I would love to get to know you if you are interested in what we have to say.

One of the reasons Peter wrote this series of blog posts is that he was talking with people on LinkedIn about Data Vault.

He had personally known about Data Vault since about 1996 or 1997.

Something like that.

However, Data Vault has “suddenly” become “popular”.

So, he thought he would ask some people about Data Vault, and read some books about Data Vault 2.0.

Peter decided to do a number of blog posts on the topic of Data Vault.

He is creating these blog posts just to set the public record straight.

His comment is that he saw a lot of LinkedIn comments about Data Vault that are just ridiculous.

It’s almost like some people have taken on Data Vault as a religion.

By setting the public record straight, some people who are such enthusiasts about Data Vault might stop to wonder if they are barking up the right tree, as Australians say.

But in this first blog post he wanted to talk about the costs and benefits of building data warehouses.

The LinkedIn discussions are woefully inadequate on the benefits of data warehousing.

He will address some of the benefits of data warehousing in later blog posts.

In this blog post he just wanted to comment on the costs of building data warehouses.

So.

On with the subject of the blog post which is the cost of building a data warehouse.

While he was still on LinkedIn, Peter put up a post outlining the publicly known cost of building a data warehouse based on a version of data models similar to the Sybase Industry Warehouse Studio models.

Peter invited those who are Data Vault enthusiasts to publish a publicly known case study of similar size and complexity.

There was no one willing to publish such numbers.

That should concern anyone considering a Data Vault implementation.

If your vendor is telling you.

“We really don’t know how much it costs to implement a data warehouse”.

Then you should be very concerned about buying from them.

Why?

If your vendor has experience in your industry segment, and has the appropriate suite of tools, they should be able to tell you pretty much exactly how much your data warehouse is going to cost you.

Don’t believe me?

I am aware Sean Kelly and Associates provided proposals that were priced as “variable capped price” projects.

Meaning, Sean would provide a maximum price the project would cost.

If he ran over?

He would eat the difference.

If he ran under?

He would give back the money saved.
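
To make the mechanics of a variable capped price concrete, here is a minimal sketch in Python; the function name and the figures are hypothetical, not taken from any actual Sean Kelly proposal.

# A minimal sketch of a "variable capped price" engagement (hypothetical figures).
def client_invoice(actual_cost: float, capped_price: float) -> float:
    # The client pays the actual cost, but never more than the agreed cap.
    return min(actual_cost, capped_price)

# If the project runs over the cap, the vendor eats the difference.
print(client_invoice(actual_cost=230_000, capped_price=200_000))  # 200000
# If the project runs under the cap, the saving goes back to the client.
print(client_invoice(actual_cost=170_000, capped_price=200_000))  # 170000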

Now, how could Sean Kelly do that more than 12 years ago?

Because Sean Kelly knew what he was doing.

That’s how.

If Sean Kelly was giving “variable capped price” projects for multi-billion dollar telcos 12 years ago and your vendor is not doing so today?

That should be a red flag to you.

Those people, like Peter, who have been around thirty plus years, know how much it costs to build a data warehouse.

We can tell you exactly how much we will charge you.

And we will stick to it.

We have this down to an art form now.

Sure, we will include some risk if you want a fixed price.

But we won’t charge you one cent more than the fixed price.

Now, having said all that?

Please allow me to give you an example.

This is a public example that was released many years ago before Sean Kelly passed away.

The customer was Talk Talk in the UK.

It was owned by Carphone Warehouse at the time.

It was a landline telco with four million subscribers.

Talk Talk and Sean Kelly released a promotional video with Netezza.

The numbers for the project were presented at the Netezza Users Conference in 2009.

To set the scene, the billing system was Single View.

A widely used Telco Billing System even today.

The CRM was Chordiant.

A widely used CRM at the time.

The proposal was to move every field from Single View, Chordiant, and the Network Management System, into the Telco Data Warehouse Models that Sean Kelly was promoting.

This was reported to be about 4,000 fields.

The implementation date was set about a year out.

Interested vendors were asked to provide proposals.

Sean Kelly and Associates were less than 50% of the cost of the next least expensive vendor.

The overall project took 8 months, from SKA arriving on site, to going into production.

The piece I want to isolate is the piece where the data models provided by Sean Kelly were customized and the ETL to populate the models was written.

These two areas are the areas that bear best comparison between using dimensional models and using Data Vault models.

So that you can do an “apples to apples” comparison, I want to be clear about what is being discussed:

  1. Building the Staging to SKA Data Models ETL.
  2. The customization of the Models as part of the development of the ETL.

In a Sean Kelly project this work is done by one person who usually has a DBA 50% of the time to make database changes.

In this case the database was Netezza, which was much easier to use, so only a small amount of DBA time was used to solve specific Netezza problems.

Netezza is much easier to use than Oracle.

According to Sean Kelly, there were 75 dimension tables and 55 fact tables, delivered in the project.

The elapsed time for the customization of the models and writing of the ETL was 4 work months.

Now, Sean did point out that the guy who did the work also worked weekends and long nights.

It was the habit of Sean Kelly and Associates consultants to work very long hours.

It was part of the deal.

So, maybe, we could call it 5 work months for more reasonable hours.

Something like that.

So, let’s call that one hundred thousand pounds for the consulting time.

Twenty thousand pounds per month consulting fees sounds about right for 2009.

The data models were sold to Talk Talk for eighty thousand pounds.

So.

The cost for a very advanced dimensional data warehouse, for a three billion pound sterling telco with 4 million subscribers, covering the Billing System, the CRM, and the Network Management System, for a total of 4,000 data fields mapped, was approximately one hundred and eighty thousand British pounds.
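
As a quick sanity check on that figure, here is the arithmetic as a minimal sketch; the monthly rate and the five work months are the estimates given above, not exact invoiced amounts.

# Rough reconstruction of the Talk Talk figure quoted above (estimates, not invoices).
consulting_months = 5        # roughly 4 months of very long hours, called 5 at normal hours
monthly_rate_gbp = 20_000    # estimated 2009 consulting rate per month
models_price_gbp = 80_000    # price of the data models sold to Talk Talk

consulting_cost = consulting_months * monthly_rate_gbp   # 100,000
total_cost = consulting_cost + models_price_gbp          # 180,000
print(f"Consulting: £{consulting_cost:,}  Models: £{models_price_gbp:,}  Total: £{total_cost:,}")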

Please remember this is not the total cost of the whole project.

There was more work done.

We are only talking about the specific portion described above.

Now.

Those of you who have been around data warehousing for a long time know that the number quoted is stunning.

Further, that was 13 years ago.

We can do that same amount of work much faster now.

Because we are based in Romania, we can also do it much cheaper than the twenty thousand pounds per month consulting rate we have estimated.

To be clear.

We are not selling projects as described above.

We don’t generally do custom build data warehouses any more.

But if we were to do such a project, it would be much cheaper than one hundred and eighty thousand pounds for this portion of the project.

The question that this post is putting to the public is this.

What is the quote from someone who is an “experienced Data Vault consultant”?

We don’t know.

We asked the question in public and we didn’t get an answer.

That, in and of itself, raises questions for us.

Any consulting company, who knows what they are doing, can tell you how much time they would ask to build a data warehouse of similar size and complexity.

In fact, as long ago as the late 90s Peter and his colleagues standardized on 1 work month per 1,000 fields being mapped to the data warehouse.

They were able to hit that mark pretty closely for 20 years.

They saw no reason to go any faster, because at 1,000 fields mapped per work month, they were already much faster than everyone else and so they could win deals.

The first project, that I am aware of, that hit that magic 1,000 fields mapped per work month, happened in 1997.

Today, we have achieved rates of up to 400 fields mapped per day on a good day.

And less than 100 fields mapped per day on a tougher day.

When we are working on complex, compound, measures tables, those numbers are not applicable of course.

But there are generally only a small number of very complex fact tables.

Today we are in the region of six thousand to eight thousand fields mapped per work month on a 1.0 data warehouse.
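
To put those throughput figures side by side, here is a minimal sketch; the field counts are the rough numbers quoted above, and the assumption of roughly 20 working days per work month is ours.

# Rough comparison of field-mapping throughput (figures from the text; 20 day month is our assumption).
WORKING_DAYS_PER_MONTH = 20

late_90s_rate = 1_000              # fields mapped per work month, the late 90s standard
good_day, tough_day = 400, 100     # fields mapped per day today
todays_monthly_low, todays_monthly_high = 6_000, 8_000

# A month of good days is about 400 * 20 = 8,000 fields, the top of the quoted range;
# a month of tough days is about 100 * 20 = 2,000 fields.
print("Month of good days:", good_day * WORKING_DAYS_PER_MONTH)
print("Month of tough days:", tough_day * WORKING_DAYS_PER_MONTH)

# A 4,000 field scope, like the Talk Talk example:
scope = 4_000
print("Work months at the late 90s standard:", scope / late_90s_rate)
print("Work months at today's 6,000 to 8,000 rate:",
      round(scope / todays_monthly_high, 2), "to", round(scope / todays_monthly_low, 2))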

Today the bigger problem is learning the database.

So, this is what we wanted to put into the public for the consideration of whoever wants to read the blog post.

We wanted to raise the warning.

If your proposed consultants are not very sure about how long the construction of the underlying data warehouse will take?

You should be very wary.

That includes consultants who are selling you dimensional models as well as data vault models.

If the consultant selling you the project knows his stuff?

He can tell you how much it will cost, how long it will take, and he will have a large amount of collateral, in tools and technologies, to prove to you that he knows what he is talking about.

If you buy from a consulting firm that is very hesitant about how much they might charge you?

Then at least you were warned.

 

One last point before I close this blog post.

If you have an approved budget for a data warehouse build, or replacement, and you have a proposal that you have decided you want to go with?

Then we would be pleased to review it for you, under a Non Disclosure Agreement, and give you our opinion.

We will do this for free because we would rather see successful projects than unsuccessful projects.

It will not be an in depth review.

It will be a review where we will tell you if the project is doomed or likely to succeed.

As simple as that.

From our experience we can usually tell if a project will fail with less than 10 minutes spent reading the proposal.

A doomed project can usually be spotted just from the proposed project plan.

If the project plan makes sense, then it can take an hour or two to confirm the person who wrote the proposal knows what they are doing.

But it is never necessary to read a proposal for 3 hours to know if the person writing it knows what they are doing.

So, for free, we are willing to give you our opinion on whether the proposal is likely to be successful or not.

What you do with our opinion is up to you.

We can’t do much more than that.

As I said.

We are not in the business of building custom data warehouses any more.

We might do some under some circumstances.

But we are building data warehouses as a product now.

 

In finishing.

Thank you very much for your time and attention.

We really appreciate you dropping by to read our blogs.

 

Best Regards.

Mihai Neacsu.

Business Development Manager.

The BIDA Team.

Ask Us Anything

You are welcome to ask us any question you would like below on our contact form.

We will be emailing our subscribers useful and
valuable information regarding business intelligence.
Please subscribe here to be on our general emailing list.

BB0002 – Introducing The BIDA Brains Blogging Series

0

Hello and welcome to our latest blog post here on BIDA Brains.

We are pleased you have come back to see our latest blog post.

Thank you.

We would like to provide a small introduction to our more technical blog posts here on BIDA Brains.

In recent times, our Professional Services Manager, Peter Jones, created a LinkedIn profile.

On browsing around LinkedIn, he felt he could not ignore some of the comments being made.

So, he made some public comments on LinkedIn with respect to business intelligence.

Peter has not used LinkedIn much because he feels it is not a suitable platform for promoting our services.

He much prefers email, and blog posts, as the best way of presenting our white papers and videos to the public.

Even so.

As usual, his profile was blocked.

One of the points he made on his new profile was that honesty is punished in business today.

LinkedIn blocking him is just one more example of honesty being punished in business today.

His profile was blocked despite the fact he used his BIDA Office 365 account to create his LinkedIn profile.

Apparently, the marketing geniuses over at LinkedIn think it is a good idea to block the profile of a man who is a paying customer of their parent company.

So, Peter will write some blog posts more directed to the LinkedIn technical audience.

We will post those blog posts here on BIDA Brains.

Basically, there is so much bad information out on LinkedIn that Peter will use the BIDA Brains blog posts to set the record straight on some of the comments being made there.

This is just the introduction blog of the following blog posts.

Under each blog post you will find 3 things.

  1. A Contact Us Button
  2. An Ask Us Anything Form – We will respond to questions asked if we can
  3. A Newsletter form if you would like to be emailed links to new blog posts

The best way to make sure you hear our latest news is to be on our Newsletter email list.

Thank you very much for your time and attention.

Best Regards

Mihai Neacsu
BIDA Business Development Manager

Ask Us Anything

You are welcome to ask us any question you would like below on our contact form.

We will be emailing our subscribers useful and
valuable information regarding business intelligence.
Please subscribe here to be on our general emailing list.