BI From the Trenches – Real World solutions for Real World Problems

Dejan Sarka & Davide Mauri

Pre-conference day seminar: Monday, September 26th 2011, 09-17h

Register for the seminar

Seminar level: 400

Description:

Basic knowledge about Business Intelligence is nowadays quite spread around the world. However, there are still many advanced problems that might lead to an unsuccessful project. In this seminar, we are going to present solutions to some common problems. For some attendees, the seminar could give ideas for the areas that they did not expect BI could be used in. For some, the seminar could just help them in their concrete work. All of the modules of the seminar are based on real-world projects that the speakers conducted. Basic understanding of Microsoft SQL Server BI suite is welcome.

Module 1: Fraud Detection
One of the most popular uses of Data Mining is fraud detection. Card issuers, banks, insurances and other financial institutes are always interested to detect frauds in advance, before customers do. In this session, we are going to discuss how to implement a fraud detection solution. You will learn how to prepare data, which algorithms to use, and how to measure results. In addition, you will see how with help of Data Mining Add-Ins, even Excel 2010 can become a powerful tool for fraud detection.

Module 2: Temporal Snapshots
You are designing a BI Solution and your customer ask you to keep a snapshot of the status of all their documents (orders, insurances, contracts, bills…whatever the word “document” may mean) for all the days of the year. They have millions of documents and they want to have in their Data Warehouse all the data they have gathered right from the very first operating day.
If you have 1 million of documents (on average) and you have to keep a snapshot of them for each one of the 365 days in a year, and you have 10 year of history, you’re going to have a 3 billion table just to start with. That’s a very big and challenging number, and you may have not the option to buy a Parallel Data Warehouse.
In this session, we’ll see how we can turn the usual snapshot tables into temporal table so that we can store time intervals in order to avoid data duplication, while keeping the Data Warehouse design usable by Analysis Services (that doesn’t know what an interval is) and optimizing it to have very good performance even on standard hardware.
The explained technique is a result of several month of research and has been applied to the Data Warehouse of an insurance company where we had to deal with two times the number said before.

Module 3: Logging and Monitoring
Have you ever been struck by the lack of detailed logging available within SSIS packages? If the answer to this question is yes, then you are probably a seasoned Business Intelligence solutions builder. The packages that make up your solutions probably contain extra code to facilitate effective monitoring, control and profileing. However, increasingly, SSIS is being used to unify data found on disparate yet related transactional systems. The batch processes that perform this activity thus become a vital part of the information processing ecosystem. These need to be monitored and profiled in the same way as formal BI solutions but often the resources available are much more meager. What do you do when something goes wrong and the functionality of the packages is not transparent via internal state reporting. This session will introduce you to an improved version of DTExec that can reveal the internal activity of packages without having to make any changes to the packages themselves.

Module 4: Row-Level Security
Row-level security is always a problem. In OLTP applications, a typical solution involves views. However, what can you do in a data warehouse, with even more complex problem, when the schema changes? How do you create dozens of SSRS reports which enforce row-level security per user? And how can you do row-level filtering in SSAS cube?

Module 5: Data Profiling
The data is the key asset of a company. Companies want to gather information from the data; therefore, they start BI projects. However, most of BI projects have to deal with problems with data quality. Data quality can be a huge obstacle for a successful BI project. In this session, we are going to do a brief theoretical introduction to data quality issues. After that, we are going to talk how you can use tools you get with SQL Server 2008 in order to measure data quality. We are going to show some advanced T-SQL queries, usage of Integration Services, UDM cubes and Data Mining for checking the accuracy, completeness, consistency and descriptive statistics of your data.

Register for the seminar

Comments are closed.