Open Source BI – I like Pentaho

Business Intelligence software, databases, and their supporting hardware are expensive. I mean really, really expensive (hundreds of thousands to millions of dollars). Many people working in the Business Intelligence/Data Warehousing fields have seen their “operational application” colleagues adopting open source solutions (Linux, JBoss, Eclipse, Apache, etc.) but have seen little attention paid to the software required to build and deliver Business Intelligence. That is beginning to change.

I’ve blogged about this before, specifically about my experiences downloading and testing Mondrian, an open source ROLAP server written in Java. Projects suitable for BI appear to be gaining momentum and maturity in the Open Source (OS) world. I’ve felt for some time that the open source community had not embraced BI in quite the same way it has embraced other applications of technology. BI is, in essence, a technology stack to make bigger companies bigger and smart companies smarter. While these precepts aren’t opposed to open source ideals, they aren’t what typically motivates communities of developers to band together and make software for free (i.e., change the world, provide a framework used by 10,000 websites, etc.).

Not too long ago, the state of open source BI was relatively slim. There were a variety of tool sets one could use for ETL (Clover, Enhydra Octopus), some initial OLAP components (Mondrian, JPivot), some portal frameworks for dashboards (JetSpeed, JBoss Portal), and some databases mature enough for DW situations with smaller volumes (MySQL, Postgres). Things have been heating up this past year, so it’s worth reviewing what’s going on in the Open Source BI realm. I’ve buried the lead, so make sure you check out Pentaho at the bottom.

CA’s Open Source release of Ingres
Although it ships under an odd OSI-approved license, the CA Trusted Open Source License (CATOSL), Ingres is officially open source and free. The license has many provisions that will scare away the OS purists and make others at least think twice about including Ingres in their products or services. Ingres has some pretty significant “enterprise” features, including replication, partitioning, and “in the works” Linux clustering (a la RAC). This is great news because Ingres is a rather mature database and is better suited for large DW volumes than MySQL and Postgres. It is noticeably (and perhaps critically) lacking the vibrant community required to increase uptake. At this point it feels like CA is still the only one “interested” in Ingres. This might change, but I believe the CATOSL has hindered acceptance from open source communities.

Netezza/DATAllegro are using open source
These two providers of DW appliances are using open source databases as part of their solutions. It’s a mixed technology stack, which means that unless you purchase the appliances you will benefit from none of the work these two companies have put into their implementations. One uses Postgres, the other uses Ingres. There must be quite a bit of technology surrounding those databases to make them actually work for corporate DW environments. I believe Netezza is doing rather well, and some of the bigger vendors are starting to “see them on the radar” as a player in the space.

GreenPlum (aka Metapa) takes another shot
When Metapa wasn’t getting traction marketing its inexpensive, proprietary clustered DB implementation, it decided it needed a new angle. Open Source is powerful enough that even a few years into the hype it still attracts attention. The company relaunched itself as an Open Source solution and is sponsoring the BizGres project (a few extensions to Postgres that are useful for BI environments), along with allowing the single-instance version of its product to be used for free. I don’t think GreenPlum will get the OS community embrace it desires, because people are discerning these days: the only interesting work GreenPlum is doing is related to its MPP and shared-nothing clustering technology, which is very much NOT open source. They are only opening their kimono an inch, not even halfway.

Mondrian/JPivot releases
Both projects had new releases this year that gave the most visible part of an open source DW/BI system its legs. While not comparable to commercial OLAP interfaces, they are certainly suited for ISVs and developers to embed in their applications. These are great components to include in a project, and if your report consumers don’t care to write their own reports (a la a graphical report builder) and just want to pivot and page, this could be an excellent, inexpensive solution.

BIRT and JasperReports are actually pretty good
These two commercially backed projects (one by Actuate, the other by JasperSoft) are building the basis for business-quality reports. Don’t turn off your Crystal installation yet, because both have a way to go, but they’re improving at a steady pace.

Pentaho Nation
This is truly the most exciting thing I’ve found in the Open Source BI space, and they’ve only just begun their work, so I’m running on faith at this point. Industry veterans who are passionate about BI and open source have pooled their minds and money (they’ve made $$ from previous entrepreneurial activities) to build a pure, 100% open source distribution for BI. They are collecting various open source projects, building their own components, and releasing the whole thing as open source. A partial list of the projects they plan to include (no official distro yet): Mondrian OLAP server, JPivot, Firebird RDBMS, Enhydra ETL, Shark and JaWE, JBoss, Hibernate, JBoss Portal, Weka Data Mining, Eclipse, BIRT, JOSSO, Mozilla Rhino.
The company will follow in Red Hat’s footsteps and make money on support, training, and consulting. Their plans are ambitious, but they are focused on assembling and configuring all these disparate projects into a comprehensive platform that will be at least comparable to the “big boys” at Hyperion, Cognos, MicroStrategy, etc.

They are engaging the community, clearly understand the need in the space, and are committed to the ideal of getting paid for solutions instead of software. They are certainly strong at the presentation, dashboard, BPM/workflow, and OLAP end of the spectrum, but don’t appear to be including much at the ETL/DW end (there is some, but it appears to be for data movement and loading as opposed to building a DW). I’m not sure whether that’s strategic, but it might make sense: most people adopting an open source BI platform for their reporting users will feel comfortable rolling their own ETL/DW for the backroom. It should also be noted that they haven’t made any releases yet, so everything we’re seeing is conceptual for now, but they’ll be rolling something out sometime in 2005. The founders appear to have a track record of “doing what they say they’ll do.”

What does this all mean?
There are three things that will happen as the Open Source and BI worlds start dating.

  1. Little will change for your current BI projects and technologies. Open source BI is still emerging and is only now being used by early adopters.
  2. Cost pressure on the “big boys” will grow as these components mature into at least comparable options. The small number of vendors, combined with their constantly increasing prices, will make BI software stand out as an area to be trimmed (ironically, probably in a financial report produced by the very software in question). I don’t expect a significant impact, but I do expect a small one over the next 3-5 years. It will also affect OEM pricing for BI and encourage the inclusion of BI capabilities in vertical applications (more BI in existing products).
  3. Increased adoption of BI by small and mid-sized businesses that can now afford to enter the BI space. Previously held back by exorbitant software costs, these businesses can now spend a few thousand dollars to start their foray into BI.