Monthly Archives: March 2005

Open Source OLAP

Every month I review the web traffic reports for my blog, and I’ve always found something rather interesting. Even though I post more information about Oracle and OWB than any other subject, Google seems to send me more traffic from queries like “open source ETL” and “open source OLAP.” You know what they say, customer is king and you gotta give ’em what they want!

In other words, all I needed was just a teentsee weentsee bit of an excuse to take some time to really kick the tires of the open source OLAP server Mondrian.

Some basics… Mondrian is an open source OLAP server, written in Java. It implements an MDX engine, and also exposes an XML/A interface to clients. Mondrian uses a ROLAP architecture, and ends up issuing SQL statements to a JDBC data source to retrieve and calculate. Mondrian works with Access, MySQL, postgres, and Oracle. Refer to the Mondrian architecture pages to get some more information about the architecture.

My overall impressions were positive; it’s a good core set of functionality and performs rather well. Like any Open Source project it is an alphabet soup of supporting libraries, environment variables, generators, frameworks, and takes more than the usual 10 minute commercial product install. We’re not building a kernel here, but it’s not trivial to get the examples up and running.

Mondrian works closely (and appropriately) with an open source implementation of a JSP based Pivot and Charting project, JPivot. The demo for Mondrian includes an example with JPivot querying Mondrian and uses the well known Food Mart demo. I was expecting a bit less from JPivot and was pleasantly surprised that it’s actually rather functional (not commercial product easy to use, but really quite commendable).

JPivot allows for drilling down hierarchies (I don’t think you can use multiple hierarchies) and Pivoting and exhanging the columns and rows. It has a “CUBE” editor that allows you to edit the report. It’s not drag and drop, but definitely works if you “grok” the interface.

Also pleasantly surprising was some pretty decent charting capabilities.

There are some decent selections of charts

In order to provide an OLAP view of your data you have to define some metadata about your Dimensional model (Cubes, Measures, Dimensions) and how they map to your underlying Relational Schema. Check out the samples on the Mondrian site to see how to write your own schema.

Couple of interesting things to point out, Mondrian implements a cache of “relations” used to increase performance. This is interesting because of consistency questions (some fragments are cached, but others are current) but also because it is WICKED fast once it’s loaded into memory. There are some interesting possibilities here, including some work with some distributed P2P OLAP distributed caching research.

DISCOVERER MEETS DUKE NUKEM

Is there any better way to spend a Monday than to build a 3D Dashboard (using some sample reports from my upcoming OWB course)?

You can obviously see I’m not describing 3D report effects in the context of a web based reporting application, but rather a 3D environment that has the actual dashboard content displayed in 3D space. I had been giving a lot of thought recently to the constant craving of executives for BI dashboards. In fact I think some organizations might consider implementing such a solution because of the following two realities:

  1. Executives love web based dashboards, but what they really want is to have the NASDAQ MarketSite at their office. Screen upon screen of everything they need to know to make great decisions. Building out hundreds of LCD panels might not fly with shareholders though, and I don’t blame them as they are very expensive. With this sort of interface, one can still get the “control room” metaphor without the expense.
  2. The generation of “video game playing and we’ve been using computers since 2nd grade” are now being tapped into the highest C-level positions. It’s not uncommon for an executive to know how to play a little DOOM (even if it was more than a few years back). In other words, there are a significant number of executives that would not feel “uncomfortable” in front of a multiplayer game interface.

However, there are SIGNIFICANT limitations to a 3D space, not the least of which is the ability to handle 2D information quickly and effectively (a la web pages/actual applications). When I began the exercise I was certain that there would be no use in the project unless it could integrate back with EXISTING systems. There is no way one would be able to work effectively entirely in a 3D work, so it would have to play nicely with the current applications. I’ve done a very small integration with Oracle Discoverer where by clicking on one of the dashboard 3D objects you see Discoverer Viewer drilled to the sheet in question alongside the 3D world. Without something like this, where people could get to the “real work”, I think a 3D dashboard is pretty much a nonstarter.

One could take the metaphor further, and have virtual meetings in the control room. Here I’ve pictured the dashboard from the “3rd” person view instead of the “1st” person view. Consider a situation where the District Sales Manager calls the Senior VP of Sales and asks to “show him” some area of concern. He could literally take the SVP/Sales to the dashboard area and they could sift through the data together.

I sort of haphazardly found this 3D software, as I have a customer that used it (ActiveWorlds) as a prototype for a 3D community so their web site visitors could explore articles on their site, chat with each other, play 3D games, etc. While I was at that customer site I helped them integrate some “external media” into the 3D product, and found it rather straightforward and easy. I don’t mean to offend but I found the communities who play 3D games a bit strange and having little to no interest in the external application of their technologies. I found a myriad of resources on the legend and lore of ficticious 3D worlds (and rivalries between them) but few resources on how to “drill through with URL parameters.”

I’ve attached some screenshots in this article, and there is also a 12MB AVI file that demos the solution. I built it on a corporate system so I can’t offer up access to the outside world at this time. However, I greatly value the community traffic and opinions so if there’s enough interest perhaps I’ll see about “putting it up.” Email me if you’d like to actually play with it and I’ll see what I can do.
UPDATE: There were some issues with the CODEC for the AVI. If you had problems previously, feel free to try again.

ORACLE WAREHOUSE BUILDER : WORKSHOP I

UPDATE : The next workshop will be held July 19th ! Classes are limited to 8 seats (very hands on) Email me to get registered…

Some exerpts from workshop participants:

From my viewpoint, (considerable prior training and preliminary knowledge of data warehousing concepts in general) I found the class to be exactly what I needed. I needed a step-by-step, how-do-I-accomplish-the-tasks-in-Oracle kind of course. Without this course, it would have taken me weeks of time I don’t have, to be able to put together a prototype to show the other teammembers I work with the value of the tools. Now I feel confident I can put something together fairly rapidly, using the project database and real problems we are facing. I needed the hands-on approach to gain the confidence in the toolset. Now, if I can get it loaded on the server in short order, I should be fairly competent with the tool by the time the next release comes out, which will have the full set of capabilities the project needs.


I learned so much in a day that I never found on the OWB documentation. I never thought we can do so much in mapping.

I’ve developed a one day workshop to help those interested in how to get up and running QUICKLY with OWB. If any blog readers are in the greater Seattle area (or have always need an excuse for a weekend here) consider taking this course. Email me directly if you’re interested, and I can help get you registered for the workshop at the PSOUG lab on Mercer Island. Note: the PSOUG has not posted the event on calendar yet, but it’s most certainly scheduled for MARCH 26, 2005

ORACLE WAREHOUSE BUILDER : WORKSHOP I

This course teaches students how to install OWB into their database, and begin building Data Objects and ETL Mappings using OWB. Students will learn the basics of OWB architecture, ETL development, and executing their ETL. The students are required to install their own software and configure their own database to ensure they can leave the workshop with all the requisite skills to continue their OWB education at home, or in Workshop II. Students will be able to WOW their colleagues with their new found OWB skills building quick, optimized solutions to common ETL problems.

Course Objectives
After completing the course, students should:

  • Understand OWB architecture, including the distinction between Design Objects and Runtime Objects (and repositories!)
  • Be able to install OWB onto a database server
  • Install and configure design repositories, runtime repositories, and target schemas
  • Be able to retrieve metadata (table definitions, constraints, etc) and actually source data from remote Oracle databases
  • Be able to create and deploy Tables, Sequences, Cubes, and Dimensions.
  • Be able to create and deploy ETL mappings using multi table joins and custom transformation logic
  • Be able to sample, and load data from flat files for use in OWB
  • Be able to execute and troubleshoot ETL mappings built in OWB
  • Load a star-schema that demonstrates the immense benefit of ETL/dimensional modeling

Text and Materials
All materials are supplied to students at the beginning of each class. The class textbook will be optionally available for purchase from PSOUG at a substantial discount.

Class Format
Classes consist of approximately 1.5 hours of lecture supported by slides and whiteboarding. The balance of the day, approximately 6.5 hours, is hands-on. Each students works on their own Linux server.

Prerequisites
To be successful students must be comfortable with basic Oracle administration. The following is a short list of the suggested but not required skills:

  • Understand what schemas are, how they relate to users
  • Understand basic DDL (create table, alter table, create sequence, create or replace package) and DML (select, insert, update)
  • Know how to connect to Oracle using both Localnames (tnsnames.ora) and JDBC (host/port/SID)
  • Students will benefit most when they have a basic understanding of dimensional modeling (Cubes/Facts/StarSchemas)

RITTMAN ON OWB TUNING

It is likely that most readers of the bayon blog are also readers of rittman.net. However, just in case someone missed his posts on tuning OWB Performance Tuning Framework I highly recommend that you read/bookmark them. For the casual OWB user this might not be necessary; developers who deal either with a great many mappings or mappings that process significant data volumes this is a “tool” to keep in your toolbox.

Mark is on to a great method; the application of battle tested tuning techniques to an often used but rarely fully understood tool. I’m not sure how much time I’ll have over the next few weeks, but if I do perhaps I’ll throw my oar in as well. Things I can think of to contribute to the community dovetailing on Mark’s work include:

  • Unify the approach of monitoring the Clock Time and drill down to the performance data Mark has retrieved. ie, build a set of reports (or even a small data mart!) that consumes the public runtime performance views in the OWB runtime repository. I’m a big fan of quantifying the benefit of a tuning excercise. Perhaps a way to track which processes/mappings are slowing your loads, elapsed/time versus volume (ETL throughput), etc. Consume and understand the basic metrics, then drill down on a particular mapping into the information that Mark has developed.
  • An OMBPlus implementation of the “on/off” trace flag Mark refers to :The way of enabling tracing should either be built in by default, or easily added by the OWB developer. Ideally it should be simple to switch it on or off (perhaps levels of event 10046 tracing?). I bet one can create a script that adds/removes the pre/most mapping process that Mark has added manually to the mapping. In this method we could programmatically turn tracing on for a defined list (from one to entire repository) of OWB mappings. The logic to set the key for the process flow operators using OMBPlus might be a wee bit more difficult (would have to iterate through operators in a process and possibly make changes). The biggest drawback, and I don’t see a clear solution at this point, is how to change tracing at RUNTIME instead of deployment. Making the changes via OMB requires a redeployment but it would be most beneficial if tracing was in the code, and changed at runtime.

A great start of the conversation that’s been long overdue… Perhaps a small OWB coming of age? The technology is now used widely enough to warrant some of it’s own methods, patterns, and strategies for not just building but rather building well.