Monthly Archives: October 2006

Trend Lines in Mondrian

I often mouth off on the importance and power of getting your data into a star-schema and Mondrian; the power you have to respond to time variant and analytic needs of your users is immense.  In the next few weeks I’ll cover more about these powers in a more concrete form, showing specific examples instead of just alluding to them.

Starting with a relatively straightforward implementation of a Trend Line.  Traditionally a trend line is built using a good old fashion linear regression on a set of data and then used to calculated current and future X and Y coordinates.  This usually involves some knowledge about building the linear regression formula, and then calculating points based on it.  Fortunately for us, we can skip most of this tedious process and just use an MDX function, LinRegPoint, to sort out most of that difficulty and we’ll just enjoy a beautiful trend line on our graph.

Let’s start with the output, so it’s clear what we’re talking about:

The RED is the data set, and the BLUE is the trend line we’ve built using MDX.

The only thing you need to run through this tip is the Pentaho Demo download, available at http://www.pentaho.org/download/latest.  I used 1.2RC2 for this example, but it should work on versions more recent than that.  It’s zero installation and starts up with everything you need for this tip.

Start up pentaho (start-pentaho.bat) and hop into your web browser (http://localhost:8080).  Navigate to the “Samples” section and into “Steel Wheels.”  Steel Wheels is an example we’re shipping with the demo installation now which provides some great time variant data examples (needed to do interesting things with OLAP).  Steel Wheels data is the sample data provided by the BIRT folks at Eclipse, actually.

Navigate to the Analysis folder, and then to “01. Territory Analysis by Year.”  It doesn’t really matter which one, we just need to get into JPivot on our Steel Wheels cube. 

Click on the MDX button and paste the following MDX fragment to get a base “sales view” and hit Apply:

select {[Measures].[Sales]} ON COLUMNS,
  {[Time].[Months].Members} ON ROWS
from [SteelWheelsSales]
where [Customers].[All Customers]

You should get a result that looks like this:

Ok… Now it’s time to build our Calculated Member.  This is kind of hairy: it requires some technical prowess to get the MDX calculation correct.  Just remember, once you’ve got the calcuation working properly you can include it as part of the Cube so your business users (using JPivot or Pentaho Spreadsheet Services) don’t see that complexity.

We’re going to use an MDX function, named LinRegPoint (reference link). I think the best online tutorial was done by Mosha Pasumanksy in his blog entitled “Using Linear Regression MDX functions for forecasting”  I used his tutorial to help build the regression below!  I won’t get into the details of linear regressions; you can read the reference or do some other googling for Linear Regressions. 

Basically, you rank Time to get straight numbers (X coordinates: 1,2,3,4,5), use your measure Sales as your value to regress (Y coordinates: 129754, 140836, …) and then you get it the ranked time as INPUT to your Linear Regression (which time is this) and it CALCULATES the Y output based on the Linear Regression it’s built.

Our LinRegPoint MDX formula comes down to:

LinRegPoint(
  Rank(
     [Time].CurrentMember,
     [Time].CurrentMember.Level.Members),
  {[Time].CurrentMember.Level.Members},
   [Measures].[Sales],
   Rank(
       [Time].CurrentMember,
       [Time].CurrentMember.Level.Members)
)

Enter the following MDX Fragment into the MDX editor to see the results of the Linear regression on Steel Wheels.

with member [Measures].[Line] as
‘LinRegPoint(Rank([Time].CurrentMember,
[Time].CurrentMember.Level.Members),
{[Time].CurrentMember.Level.Members}, [Measures].[Sales],
Rank([Time].CurrentMember, [Time].CurrentMember.Level.Members))’
select Crossjoin({[Markets].[All Markets]}, {[Measures].[Sales], [Measures].[Line]}) ON COLUMNS,
  {[Time].[Months].Members} ON ROWS
from [SteelWheelsSales]
where [Customers].[All Customers]

And you should see the following output:

Note: if you want to see the graph on the right, change the chart settings (icons at top of page) to be a Horizontal Line chart, Width = 300 and Height = 600.

The steel wheels only has data extending to [2005].[May].  If we had “time” members extending beyond our data set the line would extend to the future.  Careful; a simple linear regression is not best practice for doing forecasting on MANY things.  However, business users like to see the overall trend, and slope.

Was this helpful?  What would you like to see next?  Rolling Averages?  It’s VERY IMPORTANT to note that in most circumstances “MDX examples” for Microsoft Analysis Services works with Pentaho.  There’s a dirth BUNCH of articles about MDX on MSAS… That’s a wealth of tutorials that apply to your work with Pentaho.

BI Documenter looks cool

One of the things I love to do, is connect with fellow bloggers and chat over mutual interests.  It sounds silly, since there are like 100 million blogs or something, but there’s a certain camaraderie and shared identification amoungst bloggers.  I’ve always had fantastic meetups from the UK to OZ to the good ole USA.

While in Sydney I had the good fortune to connect with the folks behind the product, BI Documenter.  John, Richard, and Cyril are the principals involved in building what looks to be a sweet little product.  It’s a great concept:  collect and process the metadata from SQL Server, MSAS, Integration Services, Reporting Services and build some useful documentation about your BI Solution.

They have a live set of these docs here: http://www.bidocumenter.com/Sample/Index.htm
 but here’s a snapshot of the main table of contents once its done its slurping.

How many times have we wanted shoot ourselves when the business users ask some simple, straightforward questions about their reports because we’ve explained it about one hundred times.  Does net sales take into account product returns?   If you’re a MSFT (grumble) solution, you don’t have to answer these questions again and again and again and again.  You can provide documentation that will provide these answers for your users.  From what I saw, the navigation will be very familiar and similar to a “Help” system.

I can’t tell you how many more times the $$ of their license I would pay to have this for past Oracle engagements.  If you’re a MSFT shop, definitely check them out.  They even have a FREE VERSION which works on just one data source (SQL Server).

Disclosure:  I haven’t used it, and probably won’t because my day to day life keeps me in non-MSFT BI land.

Sales Percent increase month to month, qtr to qtr

This is a common situation:  Don’t show me what my total sales figures were month after month, show me something that describes something important to my business.  ie, Sales Growth

Chris Webb, who runs a wildly popular MSFT blog in addition to being an in demand independent consultant, wrote an article on Previous Period Growth using Pentaho.  Mondrian (Pentaho Analysis Server) uses MDX, a powerful expressive multidimensional query language which Chris is one of the leading experts on its practical use and applications.

Chris outlines how to build a “custom” calculated measure that displays the Sales Previous Period Growth:

All you need is the zero install pentaho demo installation to run through his tech tip, available at http://www.pentaho.org/download/latest.php

Remember, this isn’t trivial (ie, writing MDX fragments) but it’s VERY VERY powerful.  Check out the Mondrian MDX reference here for some of the powerful analytic calculations available.  Remember, once you’ve got your MDX member working properly HIDE that complexity from your users by adding it to the Mondrian OLAP schema definition.

Sydney Training and Community Feedback

I had the recent good fortune of traveling to Sydney to deliver a “much sought after” scheduling of our “Building Analytic Solutions with Pentaho” class.  We did little advertising but it was packed (12 people, the maximum we ever do for public classes).

I love doing training courses for more advanced topics, like the Analytic solutions course.  I love it because it’s a chance to converse with other practitioners and share knowledge, experience, and war stories.  These experiences, and the camaraderie is invaluable when one tends to be the “lesser known” topics at an organization.  It’s GREAT to hear about open source adoption in the enterprise; stories of countless millions being saved, people feeling empowered to make their infrastructure and applications what THEY want instead of what their VENDORS want.  It’s just nice to connect with people of similar interests.

It’s also a chance to hear some validation for strong points and deficiencies in Pentaho’s open source strategy.  I have my own opinions, as someone who uses the software day in day out on real customer problems.  It’s great to hear that others either feel the same way or disagree; because that’s the nature of this community driven process.  It doesn’t really matter what I think the product should be like (I work for the vendor right?) it matters what customers and community want.  I think feature X is awful, doesn’t work properly and is total crap.  OK.  If community members find it entirely suitable for their needs, and say “Go work on feature Y” then that’s PERFECT.

This is the most effecient part of open source:  The closer you are to your customer, the closer you are to your market, the closer you are to the pain or joy, the more likely you are to make better product.  Cutting out the middle men (in many cases, account managers and product managers and development managers, etc).

Thank you, Sydney trainees for sharing your praises and criticisms.  I’ll bring them to those that can actually do something about it (ie, Java Jockeys). 

PS – Based on the training people like more of our product than dislike AND I was right about Feature X.  🙂

Web Analytics and Maturing Partner Offerings

Our good friends at BreadboardBI have just released a solution to provide a common web analytics and reports.  While that, in and of itself, isn’t that earth shattering because there’s several FOSS projects that do this, what IS compelling is that it’s a Pentaho solution.  This picks up where the others leave off; the ability to build your own custom reports, extend the solution with another dimension or fact.  Add some of your own views, deliver the reports via email, etc. 

Check out the project and some of the features at BreadboardBI and sourceforge.  Here are some screenshots from their application; there’s some cool stuff in there!

Including a bunch of “OLAP views” so that users can filter, slice and dice, and search for information on their own.

I think we’re going to see more of these “solutions” pop up over time.  We just released the Software Quality for Bugzilla two weeks back, BreadBoardBI just released their Web Analytics project, OpenBI have a set of templates “OpenQuick Suite” they use for consulting gigs, Proratio with their SAP Connector, etc.  I’ll venture to say that our growing partner base is maturing in the sophistication of their services; with services partners exceeding “pure play” consulting and including some solutions and rapid starts that help deliver even MORE value on top of Pentaho. 

Software Quality Reports for Bugzilla

I’ve been working, on and off, for the past few months on a solution that really pulls together most of the major functions of our platform into an entire solution.  The SQR uses a little bit of the entire Pentaho Platform including Action Sequences, Kettle ETL, database structure initialization, Mondrian OLAP definitions, summary tables, JFreeReports, Pentaho Analysis views, user prompting, custom report rollups in Excel, etc.  It looks, feels, and operates as an entire solution, soup to nuts, running on Pentaho.

The SQR doesn’t aim to replace reports provided with Bugzilla.  Bugzilla is a good database schema for running an application (ie, Bugzilla) but it’s sometimes difficult if not impossible to ask some important analytic questions.  Questions such as:

and “Open vs Closed with a trend over time”

and the ability to build some of your own dashboards

The solution comes with sample data, provided graciously by landfill.bugzilla.org, and a bunch of sample reports, etc.  Over the coming months I’ll cover bits and pieces of the solution of which there are some great “how to” gems in there on how to roll out an entire, integrated solution on Pentaho.

If you use Bugzilla, definitely download and check out the solution; you may find some very useful reports and insight into your engineering process and software quality.

If you use Pentaho, stay tuned to this blog.  I’ll cover some of the “Pentaho” specific stuff using it as a standard downloadable reference.

Let me know what you think!