Archive

Archive for the ‘Pentaho’ Category

Mondrian: OLAP power at your fingertips

June 8th, 2006

Pentaho Analysis (aka Mondrian) is a powerful OLAP tool that can wrap just about any JDBC datasource and present it multidimensionally.  There are many resources that cover how to get Mondrian connected to a database, some of the ins and outs of the the Mondrian.xml schema definition, etc.  Comment below if you’re also interested in seeing a tutorial here in that regard.

Connecting from MDX to usefullness isn’t always the easiest… How powerful is MDX?  Why would I do something in MDX versus SQL?  Why not build another measure in my fact table to support that query instead of MDX?  Well, I won’t belabor the benefits of MDX and the expressiveness of the language - there are plenty of great books and resources in that regard.

Suffice to say that it makes certain analytic queries (Year to Date, % contribution to total, this versus period prior) easy… sometimes embarrassingly easy compared to what we used to do in pure "star schema relational and SQL group by" land.

Let’s build a custom MDX measure and then in a followup I’ll show how provide this to your end users in Mondrian without this complexity.

  1. Download the latest pre configured installation (aka DEMO) of Pentaho here:
    http://www.pentaho.org/download/latest.html
  2. Unzip it into a directory on your local machine (we’ll refer to this as $PROOT from here on out).
  3. Double click on "$PROOT/pentaho-demo/start-pentaho.bat."  When you see the message "Pentaho BI Platform Ready" the platform is ready, about 2 minutes.
  4. Open up firefox (or IE) and go to the following URL
    http://localhost:8080, and then click on Samples and Examples
  5. Find Analysis examples, and click on that

    and then click on
  6. You are now viewing the JPivot application that allows you to navigate and build OLAP reports in a web browser.
  7. (Optional Step) Feel free to explore the JPivot interface at your own leisure, a great place to start is the CUBE NAVIGATOR() which gives a pretty good graphical way of navigating OLAP cubes:  Make changes, and then click on OK to make the crosstab report below change.
  8. Click on MDX to open up the MDX Editor Window
  9. Paste the following MDX into the text area and then hit APPLY:

    select NON EMPTY {[Measures].[Actual]} ON COLUMNS,
    NON EMPTY Hierarchize(Union({[Positions].[All Positions]}, [Positions].[All Positions].Children)) ON ROWS
    from [Quadrant Analysis]

    You should see the report look like this:

  10. What we’re seeing here is the rollup of all of our head count (ie, salary expenditures) by position.  The aggregate figure is nice, for the bean counters, but of interest to the analyst and executives is the "that is that in proportion to my entire enterprise?"  In other words, 1,211,073.00 on HR Training personnel is not as interesting as what % I spend on HR Training personnel as part of the total.  Let’s open paste the following calculated member into the MDX window ABOVE the current MDX statement:
    with member [Measures].[Position Percent of Total] as ‘([Measures].[Actual] / ([Measures].[Actual], [Positions].CurrentMember.Parent))’, format_string = "|#.00%|"

    Click APPLY and then Close the MDX editor (red X or the MDX button again).

    What we’ve done here is built a calculated measure that using MDX.  MDX is very powerful so I suggest you check out some MDX books or resources on the net to explore its capabilities.

  11. Now we need to add our brand new calculated measure to our report.  Click on the cube navigator, then "Measures" then highlight the new Measure we just created:

    You have to click OK twice here to get back to the report.
  12. If all has gone well, you should see the following report with your custom MDX measure:

Well, that’s great for the techies and hard core business analysts, but we’d never want to have to have an end user to have to figure all that out.  How do we buffer the user from that complexity?  That will be tomorrows blog entry on how to make a CalculatedMember in Mondrian. 

Bonus Screenshot of a pie chart in two clicks now based on our percentage:

How To

Reporting in 60 seconds with Oracle XE

June 1st, 2006

Ok, well maybe not 60 seconds but more like 5 minutes.

Some great news today from Pentaho.  Pentaho has purchased proprietary software for visually building reports and donated it to the open source community.  The full text of the announcement is here:

 More on this in a later post… Let’s get to the bits.

Much of what I’ve been hearing in the marketplace is the perception that Open Source has hidden costs in terms of usability and the technical savvy needed of users. 

Let us try and challenge this assumption and see how "productive" we can be building a simple database driven report.

Download the Report Designer here and unzip it to a directory of your choosing.  There’s zero installation required (you’ll need Java on your computer, but that’s pretty prevalent these days).  Double click on the bat file (or .sh if you’re using linux).

When the product launches you’ll notice a panel that allows you drag and drop elements onto the canvas in an easy to use fashion.  I’m sure additional tutorials will come on this blog and Pentaho.org but suffice to say, this is an interface which is easy enough to just "figure out" by playing with it.

The "RED" circle is our palette and the "BLUE" area is our canvas.

Let’s get going and start our report!

  1. Launch the Report Wizard
  2. Select the option that indicates that you will use your own dataset, a JDBC dataset
  3. Enter in the connection information and driver location for your database.  I keep the latest Oracle Express edition (a Free version of the Oracle database) around.
    Note:  You don’t have to do anything special with the JDBC driver (put it in a lib directory or anything).  All you have to do is show the wizard where the .jar file is.

    I’m using the HR schema which comes with the Oracle database for testing/sample purposes.
  4. Navigate to find the "EMPLOYEES" and "DEPARTMENT" tables.  Enter in a simple SQL query that contains the data of interest and use the very handy Preview Option to make sure you’re getting the results desired.

    Feel free to just copy and paste if you are using XE as well:
    select
    d.department_name "Department Name",
    e.first_name "First Name",
    e.last_name "Last Name",
    e.phone_number "Phone Number"
    from
    employees e, departments d
    order by d.department_name
  5. You can just click Next on the visible feilds screen:
  6. Highlight "Department Name" and add it to the grouping and hit "Finish"
  7. It should return you to the original canvas where you can see the design of your report.  From here you can start to add images, lines, change fonts, etc.  Have fun exploring the richness of the options available here.
  8. Let’s see what it looks like!  Click Preview and page through your report!

That’s it!  You’re done!  You’ve created a report that can be parameterized, hosted, bursted, distributed in just a few minutes.  In a follow on blog I’ll show you how to publish this into the Pentaho server (it’s only about 2 more steps).

Happy Trails!

How To, Oracle, Pentaho

OLAP Survey

May 12th, 2006

I noticed some other bloggers posted but I’ll encourage people as well. The more people that respond the more accurate the survey represents reality!
“We would very much welcome your participation in The OLAP Survey 6. This is the largest independent survey of business intelligence users worldwide. The Survey will obtain input from a large number of users to better understand their buying decisions, the implementation cycle and the business success achieved. Both business and technical respondents are welcome.

The OLAP Survey is strictly independent. While Oracle, Microsoft, Cognos and other vendors assist by inviting users to participate in the Survey, the vendors do not sponsor the survey, nor influence the questionnaire design or survey results. As a participant, you will not only have the opportunity to ensure your experiences are included in the analyses, but you will also receive a summary of the results from the full survey. You will also have a chance of winning one of ten $50 Amazon vouchers. Click here to complete the survey on-line.”

Oracle, Pentaho, Technology Industry

UPSERTS using Kettle/MySQL

May 4th, 2006

Oracle Readers: Think MERGE Statement. :)

It’s common to need to generate surrogate keys as part of a Data Warehouse / Business Intelligence solution. Protecting your users from souce system changes, or using time variant data are just a couple of the reasons why one needs to generate a surrogate key when loading data into a DW.

Here is a simple example. I’m loading Product data from an XML file into a MySQL table. The transformation in Kettle looks like this:

upsert_kettle_transform.png

The output coming out of the XML file has been flattened using XQuery so that it looks like a flat table struture. Here’s a screen capture of a nifty feature of Kettle where you can preview the output of a step:

upsert_xml_data.png

Ok, this is our source data. The ID you’re seeing is the OLTP key. We’ll refer to it as the Natural Key. This is the ORDER_ID, or CUSTOMER_ID, or well, you get the idea. This is the key that your source application uses in it’s relational structures.

What we want is to INSERT or UPDATE (aka UPSERT) this record into our DW. We’ll INSERT if we don’t already have this NATURAL key in the DW and generate a surrogate primary key. We’ll UPDATE if we already have this NATURAL key in the warehouse.
Our table looks like:
upsert_product_erd.png

The DDL to create the TABLE is important:

CREATE TABLE INT_PRODUCT (
PRODUCT_ID INTEGER NOT NULL AUTO_INCREMENT,
PRODUCT_NAT_ID INTEGER NOT NULL,
TITLE TINYTEXT,
CONSTRAINT INT_PRODUCT_PK PRIMARY KEY (PRODUCT_ID),
UNIQUE INT_PRODUCT_NK (PRODUCT_NAT_ID)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Notice that our PRIMARY KEY is the Surrogate we’ve set up to be an AUTO_INCREMENTING column. This means that MySQL will generate a surrogate when we INSERT a record. All we have to do is get Kettle to match our unique contrainst (INT_PRODUCT_NK) and either INSERT/UPDATE.

I’ve used the INSERT / UPDATE operator in Kettle. The configuration looks like this:
upsert_upsert_config.png
What we’re doing is configuring Kettle to do a lookup on the table to see if the natural key already exists. If it does, we’ll perform DML (insert or update) on the fields in the lower window.

After running the transformation my INT_PRODUCT table looks like this.
upsert_surrogate.png
Now, I realized I didn’t choose very illustrative natural keys (all are 1,2,3) but the PRODUCT_ID comes from MySQL as a surrogate.

Subsequent executions of this Kettle transformation will INSERT records as they arrive or UPDATE. MySQL and Kettle in combination are making quick work out of generating these surrogates!

I invite Kettle experts to chime in on any optimizations/quirks.

Data Integration (Kettle), Pentaho

Connect to your own Database

May 4th, 2006

I get asked frequently by people who evaluate Pentaho: How do I get started? Many people (tens of thousands on a monthly basis) download the Pre Configured application and immediately get a sense for the benefits of the Pentaho open source project.

What people almost always want to do next, is get started with their OWN data. You the reader of this blog might find a pre canned demo interesting, but you’ll want to reserve judgement on the leading open source BI Project until you’ve seen your own data flowing into beautiful PDFs/Excel/OLAP Pivot Tables. I totally understand…

This is a very SIMPLE, BASIC, LIGHTWEIGHT guide on how to get the downloadable DEMO connected to YOUR DATABASE and executing YOUR SQL Query using the lynchpin of the Pentaho platform, an Action Sequence (refer to this blog for more on what Action Sequences are).

Disclaimer: Pentaho has development tools that once installed makes the development of action sequences more productive then editing xml as below. In other words, this is just rough and dirty to get you going and connect… I highly suggest downloading the Development Workbench to actually build the Action Sequences.

Step 1:
Collect the following information for your database.

  • Vendor (Oracle/MySQL) provided JDBC Jar file (classes12.zip, mysql-connector-java-3.1.12-bin.jar)
  • JDBC Connection URL (jdbc:mysql://localhost/database)
  • JDBC User Name (scott)
  • JDBC Password (tiger)
  • A SQL query that you are certain works in your database (select id, title from status)

This is the information needed for ANY Java Application to connect to a database, and is not really Pentaho specific.

Step 2:
Create (using notepad or vi) a file named myfirst-ds.xml in $DEMO_BASE/jboss/server/default/deploy/

Put the following into that file to create a JNDI datasource with the JNDI-NAME “myfirst”

myfirst_001.png
Note: You need to replace the above items with the corresponding JDBC information you collected in the first step.

Step 3:
Edit (using notepad or vi) the file $DEMO_BASE/jboss/server/default/deploy/pentaho.war/WEB-INF/web.xml.
Add the following right below the Shark Connection entry

myfirst_002.png

Step 4:
Edit (using notepad or vi) the file $DEMO_BASE/jboss/server/default/deploy/pentaho.war/WEB-INF/jboss-web.xml.
Add the following right below the Shark entry
myfirst_003.png

Step 5:
Copy your Vendor provided jdbc driver library (classes12.zip) in the following directory:
$DEMO_BASE/jboss/server/default/lib

Congratulations, you’ve setup your JNDI datasource! Whew…. Almost there.

Step 6:
Let’s create your first action sequence!
Create (using notepad or vi) a file named MyFirst.xaction in $DEMO_BASE/pentaho-solutions/samples/datasources
myfirst_004.png
Make sure and replace the select I have above with a select statement that you KNOW works in your database.

Step 7:
Restart your server using the stop-pentaho/start-pentaho commands in $DEMO_BASE.

Step 8:
Open up Firefox (cough IE) and hit the following URL:
http://localhost:8080/pentaho/ViewAction?&solution=samples&path=datasources&action=MyFirst.xaction

You should see the basic output of the Action Sequence which, in this case, is the results of the SQL you provided.

Action Successful

id title
1 Open

That’s it! You’re connected to your own datasource and now you can start to explore the immense power of Action Sequences with your data… There are approximately 50 example .xactions in the demo so use them as a reference when trying to exploring the features of the platform. I HIGHLY SUGGEST downloading the workbench as well. It’s a visual environment for authoring .xaction files. Trust me, you’ll appreciate the GUI!

How To, Pentaho