Those who regularly visit the site know that I focus my professional consulting hours on Data Warehousing, specifically with Oracle and Oracle Warehouse Builder. However, I spend much of my R&D time researching, downloading, and kicking the tires of Open Source projects I find cool and interesting. Pentaho clearly sits at the intersection between what puts bread on the table and what stimulates the mind, so I accepted their invitation for a week of training in Orlando, FL.
Day 1, like many first days, was mostly introductory: architecture blueprints, team, background, etc. We will get into technical details in the subsequent days, but today was mostly an overview. I am continually impressed by this company; not the product per se, because it is a 1.0 product in a very mature market and can only now start to be compared with the “big boys.” However, the company has the mojo to pull this off, I think. They are on a first-name basis with CEOs of other key open source companies. Their ranks are filled with former Business Objects and Hyperion recruits. Their board members were senior VPs at Oracle, and on and on. They are building a solid, healthy company that, from what I can see, can deliver on its vision of an open source BI stack.
What is Pentaho? Pentaho is an open source BI suite that provides the full stack of BI components: Reports, Bursting, OLAP analysis, Dashboards, BAM, etc. A lofty goal indeed… CEO Richard Daley puts it quite simply (paraphrased): We don’t want to be a disruptive technology in just Open Source BI, we want to disrupt the entire BI marketplace with our technology… It’s a lot of fun… The platform is process-centric (workflow-driven) and concedes that it won’t be a silo sitting at the center of the universe. It pragmatically embraces the idea that BI should be part of an overall business process, and that if it is not, then you’re not getting the full value of your BI assets.
This makes profound sense, yes? If your business process for analyzing order fulfillment efficiency ends at the analysis itself (ok, looks like the warehouse in Toronto is 3 times slower than anyone else), then you’re hosed. The process must continue: notify someone of the result and, if the intelligence is actionable, collaborate on a solution.
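To make the idea concrete, here is a rough sketch in Python of analysis as one step in a process rather than the end of it. This is not Pentaho's API or anything shown in the training; the function names, the sample fulfillment times, and the 2x threshold are all assumptions I made up for illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, Dict, List


@dataclass
class Finding:
    """One warehouse flagged as slow (hypothetical structure)."""
    warehouse: str
    avg_hours: float        # this warehouse's average fulfillment time
    baseline_hours: float   # median average time of the other warehouses

    @property
    def slowdown(self) -> float:
        return self.avg_hours / self.baseline_hours


def analyze(avg_hours_by_warehouse: Dict[str, float],
            threshold: float = 2.0) -> List[Finding]:
    """Flag warehouses at least `threshold` times slower than the median of the rest."""
    findings = []
    for name, hours in avg_hours_by_warehouse.items():
        others = [h for n, h in avg_hours_by_warehouse.items() if n != name]
        if not others:
            continue  # nothing to compare a single warehouse against
        baseline = median(others)
        if baseline > 0 and hours / baseline >= threshold:
            findings.append(Finding(name, hours, baseline))
    return findings


def run_process(avg_hours_by_warehouse: Dict[str, float],
                notify: Callable[[Finding], None],
                threshold: float = 2.0) -> List[Finding]:
    """The analysis is only step one; every finding is handed to a notification step."""
    findings = analyze(avg_hours_by_warehouse, threshold)
    for finding in findings:
        notify(finding)  # e.g. email, ticket, workflow task (assumed hook)
    return findings


if __name__ == "__main__":
    # Made-up sample data: average hours from order to shipment.
    sample = {"Toronto": 72.0, "Chicago": 24.0, "Dallas": 22.0, "Denver": 26.0}
    run_process(sample, lambda f: print(
        f"{f.warehouse} is {f.slowdown:.1f}x slower than its peers "
        f"({f.avg_hours}h vs {f.baseline_hours}h)"))
```

The point of passing `notify` in as a parameter is that the analysis does not care how the result gets acted on; the surrounding process decides whether that means an email, a task in a workflow engine, or something else.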
I continue to be critical of their regard for ETL/Data Warehousing as secondary to their platform. I think they have BI covered, and are comprehensive in this regard. What I see, like others, is that the other key piece of “doing BI” is information integration, cleansing, and transformation. If the data is unintegrated, or the business context is difficult to infuse from a “straight SQL query,” then you’re in the ETL and Data Warehousing business. That being said, the architecture I’ve seen put forth (I haven’t been in the details yet) allows for this relatively easily. Perhaps this will prove more robust than it first appears as we run through the details of the product this week.