Halloween Easter Egg on Pentaho homepage
Tuesday, October 31st, 2006Pentaho, being a family orientated company, is big on kids holidays. The design team dropped an easter egg on our homepage today.
Can you find it?
Pentaho, being a family orientated company, is big on kids holidays. The design team dropped an easter egg on our homepage today.
Can you find it?
This is a common situation: Don’t show me what my total sales figures were month after month, show me something that describes something important to my business. ie, Sales Growth
Chris Webb, who runs a wildly popular MSFT blog in addition to being an in demand independent consultant, wrote an article on Previous Period Growth using Pentaho. Mondrian (Pentaho Analysis Server) uses MDX, a powerful expressive multidimensional query language which Chris is one of the leading experts on its practical use and applications.
Chris outlines how to build a “custom” calculated measure that displays the Sales Previous Period Growth:
All you need is the zero install pentaho demo installation to run through his tech tip, available at http://www.pentaho.org/download/latest.php
Remember, this isn’t trivial (ie, writing MDX fragments) but it’s VERY VERY powerful. Check out the Mondrian MDX reference here for some of the powerful analytic calculations available. Remember, once you’ve got your MDX member working properly HIDE that complexity from your users by adding it to the Mondrian OLAP schema definition.
I had the recent good fortune of traveling to Sydney to deliver a “much sought after” scheduling of our “Building Analytic Solutions with Pentaho” class. We did little advertising but it was packed (12 people, the maximum we ever do for public classes).
I love doing training courses for more advanced topics, like the Analytic solutions course. I love it because it’s a chance to converse with other practitioners and share knowledge, experience, and war stories. These experiences, and the camaraderie is invaluable when one tends to be the “lesser known” topics at an organization. It’s GREAT to hear about open source adoption in the enterprise; stories of countless millions being saved, people feeling empowered to make their infrastructure and applications what THEY want instead of what their VENDORS want. It’s just nice to connect with people of similar interests.
It’s also a chance to hear some validation for strong points and deficiencies in Pentaho’s open source strategy. I have my own opinions, as someone who uses the software day in day out on real customer problems. It’s great to hear that others either feel the same way or disagree; because that’s the nature of this community driven process. It doesn’t really matter what I think the product should be like (I work for the vendor right?) it matters what customers and community want. I think feature X is awful, doesn’t work properly and is total crap. OK. If community members find it entirely suitable for their needs, and say “Go work on feature Y” then that’s PERFECT.
This is the most effecient part of open source: The closer you are to your customer, the closer you are to your market, the closer you are to the pain or joy, the more likely you are to make better product. Cutting out the middle men (in many cases, account managers and product managers and development managers, etc).
Thank you, Sydney trainees for sharing your praises and criticisms. I’ll bring them to those that can actually do something about it (ie, Java Jockeys).
PS - Based on the training people like more of our product than dislike AND I was right about Feature X.
Someone pointed out the following image turned up by Google when searching for Pentaho.

It comes from an article about my joining Pentaho and seems to indicate that I’m a “feather in Pentahos cap.” I think a graphic designer just has too much time on their hands. Definitely.
Small little tip:
The pentaho build process doesn’t currently manage the permissions on .sh files properly. When you download the daily builds or other demo installations you may get some errors (bash command not founds, etc). You need to change to executable all .sh files in the installation. Use the following command in the “pentaho-demo” directory.
for x in `find . -name ‘*.sh’`; do chmod +x $x; done
Hope you find this helpful!
Like all great open source products, Pentaho Data Integration (Kettle) is a functional product in and of itself. It has a very productive UI and delivers exceptional value as a tool in and of itself. Most pieces of the Pentaho platform reflect a desire to keep the large communities around the original projects (Mondrian, JFree, etc) engaged; they are complete components in and of themselves.
When used together their value, as it relates to building solutions increases and exceeds their use independently. I’ll be the first to admit that Pentaho is still fairly technical, but we’re rapidly building more and more graphical interfaces and usability features on top of the platform (many in the open source edition, but much is in the professional edition). Much of this work involves making the "whole" (Pentaho) work together to exceed the value of the pieces (Mondrian, Kettle, JFree, …).
A few things immediately come to mind of why Pentaho and Kettle together provide exceptional value as compared to used individually or with another open source reporting library:
There are some things that are tough, if not downright impossible to do in SQL. Ever do an HTTP retrieval of an XML doc, slurp in a custom lookup from Excel, do a few database joins and analytical calculations in a SQL statement? I bet not. Report developers are smart data dudes; having access to a tool that allows them to sort/pivot/group/aggregate/lookup/iterate/list goes on and on/etc empowers report developers in a way that a simple "JDBC" or "CSV" or "XQuery" alone can accomplish.
How is this made possible?
Pentaho abstracts (optionally, it isn’t forced on customers) the data retrievals to lookup components. This allows BI developers to use either a SQL lookup (DB), XQuery lookup(XML), MDXLookup (OLAP), or Kettle lookup (EII) to populate a "ResultSet." Here’s the beauty; reports are generated off a result set instead of directly accessing the sources. This means that a user can use the same reporting templates, framework, designer, etc and feed/calculate data from wherever they desire. Truly opens a world of possibiliy where before there was "just SQL" or "ETL into DB tables."
Pentaho has invested greatly in the idea of the solution being a set of "things" that make up your BI, reporting, DW solution. This means you don’t have ETL in one repository, reports managed somewhere else, scheduling managed by a third party, etc. It’s open source so that’s obviously a choice, but we can add much value by ensuring that someone who has to transform data, schedule that, email and monitor, secure, build reports, administer email bursting, etc can do some from one "solution repository." Managing an entire BI solution from one CVS repository? Now that’s COOL (merge diff/patch anyone?).
Kettle is quite flexible; the 2.3.0 release extends the scope and locations where you can use variable substitution. From a practical standpoint this means that an entire Chef job can be parameterized and called from a Pentaho action sequence. For instance, because you can do your DW load from inside Pentaho action sequences that means you can secure it, schedule it, monitor it, initiate it from an outside workflow via web service, etc. In one of my recent Kettle solutions ALL OF THE PHYSICAL database, file, and security information was managed by Pentaho so the Kettle mappings can literally be moved from place to place and work inside of Pentaho.
Pentaho is investing in making the tools more seamless. In practice (this is not a roadmap or product direction statement) this means being able to interact with tables, connections, business views inside of Kettle in an identical (at least similar way) in the report designer. For example, if you’ve defined the business name for a column to be "Actual Sales" Kettle and the Report Designer can now key off that same metadata and present a "consistent" view to the report/ETL developer instead of knowing that "ACT_SL_STD_CURR" is actual sales.
Another example is the plans to do some additional Mondrian/Kettle integration to make the building of Dimensions, Cubes, and Aggregates easier.
Tomorrow MySQL and Pentaho are presenting on how MySQL and Pentaho can work together to deliver exceptional value when used in combination to solve Business Intelligence and Reporting business challenges.
I’ve been working more and more with MySQL over the past couple of months since joining Pentaho and I’m pleasantly surprised. On the good side of the 80/20 rule (ie, 80% of users want 20% of the features) it’s exceptionally "good enough" for things that I want to do.
Back to the tagline.
Tomorrow, Pentaho is highlighting our desire to be as easy to MySQL users as MySQL is. We want to understand how to make it increasingly easy to use Pentaho with MySQL. In return for providing Pentaho with much needed feedback on ease of use and the user experience for installation/configuration Pentaho is giving away a Mac Mini. It’s no iPod, thank heavens, as everyone is giving those away these days.

10am PT, 1pm ET in the US. Register and dial in here. Read the press release here.
I just got on to Roland Boumans blog.
He has an excellent write up on how to get started with Pentaho Data Integration (aka Kettle):
Kettle is a free, open source (LGPL) ETL (Extraction, Transformation and Loading) tool. The product name should actually be spelled as K.E.T.T.L.E, which is a recursive acronym for "Kettle Extraction, Transport, Transformation and Loading Environment".
….
An interesting feature of Kettle is that it is model-driven. Both Spoon and Chef offer a graphical user interface to define the ETL processes on a high level. Typically, this involves no actual programming at all - rather, it’s a purely declarative task which results in a model.
He wonders at the end about how Kettle is deployed inside of Pentaho. That’s a great question and something Pentaho/Matt have been working at over the past few months. Perhaps I can shed a bit of light on this.
In the latest build of Pentaho (I used 1.1.6 for the below screenshots) we ship an example of a Kettle mapping returning a result set which demonstrates the great architectures of both Kettle and Pentaho. Kettle provides an easy way for creating plugins and interfaces for steps allowing Pentaho to access data at the "end" of a transformation. Pentaho has a multipurpose result set object which allows for reports to key off of "data" instead of a SQL Result or an MDX Result, or in this case, a real time data slurp (EII result?!?).
The transformation in spoon looks like this:
It reads data from the Pentaho sample database (QUADRANT_ACTUALS) filters, and does some calculcations and places then in an operator XML Output. This output operator is superflous, the KettleComponent (from Pentaho) sources the data DIRECTLY from the in memory XML Output object in Kettle.
The ETL example is basic; it doesn’t even place the data slurped from Kettle into a nicely formatted report.

Just to be clear on what you’re seeing…
Pentaho users gain the full power of the Pentaho Data Ingegration tool (proper ETL sorts, groups, XML, csv, xml, filters, calculations, database lookups, aggregations, etc) when used as a SOURCE for their reports. A full ETL tool where you don’t HAVE to persist the data to tables or files or anything.
If it suits your needs, you can simply run your ETL everytime your report is run and you never have to build a staging area, data warehouse, or OLAP cubes. Your mileage may vary and I don’t think this is wholesale replacement for a data warehouse at all! Just a great way to provide additional options for BI developers.
Going beyond a single transformation in Pentaho we are currently augmenting this component to allow you to execute Chef Jobs from inside Pentaho. I have no idea when that code is included in a release but I’ll be sure and blog about it here. That means that you can schedule your "DW load process" or "Data Integration Routines" to run from inside Pentaho. I’ll post an example of that shortly…
Again, check out Rolands article! It’s a great addition to the pentaho blogosphere.
As promised, todays followup on how to limit the complexity of calculated MDX measures from your users using Mondrian.
If you haven’t already, start with yesterdays blog, entitled Mondrian: OLAP power at your fingertips. We leveraged the power of MDX to quickly build a calculated MDX member. While in and of itself not all that powerful (% of total is not that tough) it provides with a quick and easy way for users to start exploring the power of MDX without having to do any XML editing, etc. You can just type it into the web browser and “try before you buy.”
Ok, so let’s say you’re sold. You’ve got your calculated measure exactly how you want; great. There’s no WAY that you’d want your users to have to see that and copy and paste that in themselves. It also leaves room for people changing formulas, which in the day and age of SOX is just a bad idea. What you want to do is make this MDX part of your Cube so that a user can just reference it, and disregard the complexity of the expression.



That’s it! This is really useful for providing your users a bunch of reusable measures (YTD, Versus Period Prior, etc) without having them have to hack it all day in Excel. Enjoy!
Pentaho Analysis (aka Mondrian) is a powerful OLAP tool that can wrap just about any JDBC datasource and present it multidimensionally. There are many resources that cover how to get Mondrian connected to a database, some of the ins and outs of the the Mondrian.xml schema definition, etc. Comment below if you’re also interested in seeing a tutorial here in that regard.
Connecting from MDX to usefullness isn’t always the easiest… How powerful is MDX? Why would I do something in MDX versus SQL? Why not build another measure in my fact table to support that query instead of MDX? Well, I won’t belabor the benefits of MDX and the expressiveness of the language - there are plenty of great books and resources in that regard.
Suffice to say that it makes certain analytic queries (Year to Date, % contribution to total, this versus period prior) easy… sometimes embarrassingly easy compared to what we used to do in pure "star schema relational and SQL group by" land.
Let’s build a custom MDX measure and then in a followup I’ll show how provide this to your end users in Mondrian without this complexity.






select NON EMPTY {[Measures].[Actual]} ON COLUMNS,
NON EMPTY Hierarchize(Union({[Positions].[All Positions]}, [Positions].[All Positions].Children)) ON ROWS
from [Quadrant Analysis]
You should see the report look like this:

What we’ve done here is built a calculated measure that using MDX. MDX is very powerful so I suggest you check out some MDX books or resources on the net to explore its capabilities.


Well, that’s great for the techies and hard core business analysts, but we’d never want to have to have an end user to have to figure all that out. How do we buffer the user from that complexity? That will be tomorrows blog entry on how to make a CalculatedMember in Mondrian.
Bonus Screenshot of a pie chart in two clicks now based on our percentage: