Pentaho goes GPL: A non-event

Pentaho announced last week that their BI Platform, version 2.x and onward, would be released under the GPLv2 license. I’m an outspoken critic of GPL for a lot of use cases, and I personally lean toward Apache/MIT/BSD-style licenses myself. However, for nearly everyone involved in Pentaho this is a non-event: not that big of a deal, and good for Pentaho.

By now, if you’ve ever read anything I’ve written about GPL for “business-eee” type projects, you’re probably wondering, “Has Nicholas completely sold out?” Well, I’ll leave that conclusion for another venue/time, wink wink, but there are some very clear reasons why GPL is not a bad thing for most people involved in Pentaho.

First and foremost, it’s important to understand what is moving to GPL, because that makes a huge difference to the impact. Only the BI Platform technologies are going GPL; the core libraries (Reporting, Kettle, Mondrian, …) are remaining under their original (i.e., somewhat permissive) licensing. The things being GPL’ed are the things that end users use: for instance, the ability to navigate through a set of reports, run reports with parameters, etc. This is the code that makes the Pentaho core technologies (OLAP/ETL/Reporting) look and feel like a full product, with login screens, UIs, run scheduling, etc.

The other piece to mention is that GPL only really affects ISV/OEMs.

For end users (even SaaS providers) it makes no difference whether the license is GPLv2 or MPL. So, if you’re considering downloading Pentaho to start a project at your company for your own intranet, extranet, BI, dashboards, etc., this will have NO effect on you.

One of my beefs with the GPL has always been that it stunts adoption and the ability for multiple parties to work on the project, and to embed and utilize it in a commercial venture. The core libraries remain intact in this regard – Mondrian can be embedded just as easily as it was originally, because its license remains unaffected. Kettle can be as well (LGPL). Pentaho Reporting is good to go too. The Platform, as a set of UIs (and productized versions of the core libraries), will, in my opinion, be cast aside by anyone wanting to embed these technologies into their own product.

The license will now be a big contributor to this decision, but to be truthful, if you want to “just use” Mondrian then you’re BETTER OFF “just using” Mondrian. If you want Mondrian in conjunction with Reporting, you’ll now want to consider the Platform, but my experience shows that if you’re using these technologies in your own application, using the core applications/interfaces is preferable. The platform makes the projects work for end customers, but it is kind of “a lot” for someone who just wants to execute some ETL jobs or use JPivot/Mondrian in their application. That’s not to say that ISV/OEMs shouldn’t still reach out to Pentaho for OEM support on embedding “just Mondrian” into their application. Pentaho’s subscription and services are quite valuable in this regard – I can think of no better group of people to help make a project successful than the people who wrote it.

It’s not clear to me whether or not Pentaho Metadata will be GPL. When I was working at Pentaho I advocated strongly against GPL for it, because I believed that, done correctly, the project could become *the* metadata editor/infrastructure for just about any new Open Source or proprietary project. For a variety of reasons, this hasn’t happened. GPL, in my opinion, ensures that Pentaho’s Metadata project will remain solely and simply that: Pentaho’s Metadata project. I don’t think there’ll be any other salient, significant contributor if it goes GPL. However, it’s not a big loss to Pentaho, since there have been hardly any (have there been any?) contributions to that project to date anyhow.

If GPL gives Pentaho more “protection” on the Platform code, so that it cannot be ISV/OEM’ed without payment, it could end up benefiting almost everyone. Why? Because if Pentaho feels it’s able to monetize the open source edition consistently, there is less need to keep more features exclusive to the professional edition. If GPL provides that additional cover, I’d hope to see more code flying into the Open Source (GPL) edition of the product. However, I’ve not heard anything about this from Pentaho, and only time will tell. 🙂

There you have it.

GPL makes pretty much no difference to end users, customers, SaaS providers, etc. It makes little difference to ISV/OEMs either, because they’ll want to embed the core libraries, not necessarily the entire platform. Pentaho remains a strong choice in every regard: customers are signing up in droves, and the value is immense.

It is, for all intents and purposes, a non-event.

How to Generate a GUID in an XAction

I needed to uniquely identify a request to Pentaho (one particular action sequence request), and found a pretty darn easy way to do it with help from the Java RMI classes.

– Insert a JavaScript data source

– Enter the following script

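function getGUID() {
  // java.rmi.dgc.VMID is an identifier documented as unique across Java VMs;
  // Rhino exposes Java classes through the Packages object
  var VMID = new Packages.java.rmi.dgc.VMID();
  return VMID.toString();
}
// the value of this last expression is the script's result
getGUID();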

– Set return type as “string” for a new value

– Add it to your response

– Enjoy your GUIDs!
cef9372c035a42ed:-b0917ee:119a6d47d72:-7ff4
cef9372c035a42ed:-b0917ee:119a6d47d72:-7ff3
cef9372c035a42ed:-b0917ee:119a6d47d72:-7ff2

PS – I personally hate GUIDs when stored in the database. 🙂 However, for matching up with a particular request, yippeee!!
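Outside of an action sequence, the same trick is just plain Java. Below is a minimal sketch, assuming you only want to see what `java.rmi.dgc.VMID` produces. For comparison, it also shows `java.util.UUID.randomUUID()`, which is an alternative I’m swapping in here and not what the script above uses. The `RequestIds` class and its method names are hypothetical.

```java
import java.rmi.dgc.VMID;
import java.util.UUID;

// Hypothetical helper class; not part of Pentaho.
class RequestIds {

    // Same approach as the XAction script: java.rmi.dgc.VMID is documented
    // as unique across Java VMs, and toString() gives a printable form
    // like the sample output above (hex fields separated by colons).
    static String vmidGuid() {
        return new VMID().toString();
    }

    // Swapped-in alternative: a random (version 4) UUID in the familiar
    // 8-4-4-4-12 hexadecimal format.
    static String uuidGuid() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(vmidGuid() + "   " + uuidGuid());
        }
    }
}
```

Either works for tagging a single request. UUIDs are more familiar to other tools, while VMID strings match what the action sequence above emits.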

Subreports Example Zip

There have been some questions floating through the Pentahosphere (I think I’m the first person to use that word, btw) about how to use Subreports. I think there’s a good description on the wiki that covers the basics, but I don’t think there’s a good working example shipping with Pentaho open source yet.

I don’t really have time to delve into all of it, so in the spirit of “early and often” I’ll just post the zip file with a working example (for the 1.6 GA Designer and Server).

Here ’tis:
http://nicholasgoodman.com/entry_images/pentaho_subreport_example.zip

2007 was a desert of Blogging

2007 was an off year for me when it comes to blogging. Not a surprise, since my first post after returning from my trip to Argentina was entitled “Am I done blogging?”

Seeing that I’m a self-proclaimed “Data Dynamo and BI Geek” (and even Google agrees: the search term “BI Geek” yields me at the top), it seems only fitting to see how bad 2007 was… You know, by the numbers.

First, pop the top on the WordPress database schema. (5 minutes)
Second, write a simple SQL-query-based cube (blogmart.mondrian.xml) on top of my blog data (posts, categories). (15 minutes)
Third, do some analysis in JPivot to see how 2007 really shaped up against previous years. (instant)
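The cube definition (blogmart.mondrian.xml) isn’t included in this post. As a rough plain-Java stand-in for the kind of question it answers (posts per year, and one category’s share of each year), here is a sketch. It assumes each post has a publish year and a single category, which simplifies WordPress, where a post can carry several categories. The `BlogMart` and `Post` classes and the sample rows in `main` are hypothetical and are not real data from this blog.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a "posts by year / by category" cube query.
class BlogMart {

    static final class Post {
        final int year;
        final String category;

        Post(int year, String category) {
            this.year = year;
            this.category = category;
        }
    }

    // Count of posts per year: a "Posts" measure sliced by a Year dimension.
    static Map<Integer, Integer> postsByYear(List<Post> posts) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Post p : posts) {
            counts.merge(p.year, 1, Integer::sum);
        }
        return counts;
    }

    // Percentage of each year's posts that fall in the given category.
    static Map<Integer, Double> categoryShareByYear(List<Post> posts, String category) {
        Map<Integer, Integer> totals = postsByYear(posts);
        Map<Integer, Integer> matches = new TreeMap<>();
        for (Post p : posts) {
            if (p.category.equals(category)) {
                matches.merge(p.year, 1, Integer::sum);
            }
        }
        Map<Integer, Double> share = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : totals.entrySet()) {
            int hits = matches.getOrDefault(e.getKey(), 0);
            share.put(e.getKey(), 100.0 * hits / e.getValue());
        }
        return share;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<Post> sample = List.of(
                new Post(2005, "Oracle"), new Post(2005, "Oracle"),
                new Post(2005, "Oracle"), new Post(2005, "Open Source"),
                new Post(2007, "Pentaho"));
        System.out.println(postsByYear(sample));                   // {2005=4, 2007=1}
        System.out.println(categoryShareByYear(sample, "Oracle")); // {2005=75.0, 2007=0.0}
    }
}
```

A cube does the same grouping declaratively and lets JPivot pivot on any dimension. The hand-rolled version only makes the arithmetic visible.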

2007 was my worst blogging year ever, including 2004, when I started blogging mid-year! Ouch!

Another interesting data point: I’ve blogged a bunch about Oracle and about Open Source. When I first started blogging, Oracle was the focus of my consultancy. At the end of 2005 I began investigating Open Source BI in earnest, and I even jumped onto the Pentaho ship for the better part of 2006 and 2007. How did this change in professional life affect my blog content? A bunch!

In 2004 and 2005 my blog content was, give or take, 75% Oracle and 25% open source. In 2006 that proportion flipped, and in the desert of 2007 I’ve written ZERO Oracle posts.

What about comments? What topics yield the most number of comments and discussion?

Open Source, Pentaho, Personal, and General Topics. Oracle was one of the least “commented” categories, even though I know from Google keyword analytics it’s one of the things that drives traffic to my blog.

Well, there you have it: a year in review of my (lacking) blog. Perhaps my New Year’s resolution should be to blog more?

Why I don't have a .sig on email

One of my pet peeves is an email thread that grows 100 lines with every “Sounds good to me” reply. You know what I’m talking about.

10 screens of text, with about 1 screen of actual content/conversation.

All these logos and titles, fax numbers, clever slogans and sayings, etc. AHHHH….

It’s a networked world; it doesn’t have to be on EVERY SINGLE EMAIL RESPONSE. If you want to get in touch with me, you can Google me and immediately find my site, etc.

I’m on Twitter, LinkedIn, Yahoo! Messenger, AIM, MSN, Skype, etc. I’m easy to get hold of; you don’t need 10 copies of ALL MY CONTACT INFO in an email.

Passionate Career Change

One of my professional mentors, and my “boss” during my time at Matchlogic, Inc., recently took a leap from Software Development to Solar Energy. Steve is an exceptional architect, developer, and all-around skilled software engineer. He’s built systems that are exceptionally functional and well designed.

While the Java world will mourn the loss of an exceptional technologist, the Solar Energy industry will benefit greatly from his talent.  I know that Steve will be successful in his new business; he’s smart, capable, and more than anything else he’s passionate about Solar Energy.

Check out his blog if you’re interested in Solar Energy.

Web Analytics and Maturing Partner Offerings

Our good friends at BreadboardBI have just released a solution that provides common web analytics reports. While that, in and of itself, isn’t earth-shattering (there are several FOSS projects that do this), what IS compelling is that it’s a Pentaho solution. This picks up where the others leave off: the ability to build your own custom reports, extend the solution with another dimension or fact, add some of your own views, deliver the reports via email, etc.

Check out the project and some of the features at BreadboardBI and SourceForge. Here are some screenshots from their application; there’s some cool stuff in there!

That includes a bunch of “OLAP views” so that users can filter, slice and dice, and search for information on their own.

I think we’re going to see more of these “solutions” pop up over time. We just released the Software Quality Reports for Bugzilla two weeks back, BreadboardBI just released their Web Analytics project, OpenBI have a set of templates (“OpenQuick Suite”) they use for consulting gigs, Proratio has their SAP Connector, etc. I’ll venture to say that our growing partner base is maturing in the sophistication of their services, with services partners going beyond “pure play” consulting to include solutions and rapid starts that help deliver even MORE value on top of Pentaho.

Software Quality Reports for Bugzilla

I’ve been working, on and off, for the past few months on a solution that pulls together most of the major functions of our platform. The SQR (Software Quality Reports) uses a little bit of the entire Pentaho Platform, including Action Sequences, Kettle ETL, database structure initialization, Mondrian OLAP definitions, summary tables, JFreeReports, Pentaho Analysis views, user prompting, custom report rollups in Excel, etc. It looks, feels, and operates as an entire solution, soup to nuts, running on Pentaho.

The SQR doesn’t aim to replace the reports provided with Bugzilla. Bugzilla is a good database schema for running an application (i.e., Bugzilla), but it’s sometimes difficult, if not impossible, to ask it some important analytic questions, such as “Open vs Closed with a trend over time.”

The SQR also gives you the ability to build some of your own dashboards.
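As a rough illustration of the “Open vs Closed with a trend over time” question, here is a plain-Java sketch that counts, for each day in a range, how many bugs were still open at the end of that day and how many had been closed by then. It is not how the SQR computes it. The `OpenClosedTrend`, `Bug`, and `Point` classes are hypothetical, and reducing a bug to an opened date plus an optional closed date is a simplifying assumption, not Bugzilla’s actual schema.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an "open vs closed over time" series.
class OpenClosedTrend {

    // Minimal bug record: when it was opened and, if resolved, when it was closed.
    static final class Bug {
        final LocalDate opened;
        final LocalDate closed; // null while the bug is still open

        Bug(LocalDate opened, LocalDate closed) {
            this.opened = opened;
            this.closed = closed;
        }
    }

    // One point in the trend: counts as of the end of a given day.
    static final class Point {
        final LocalDate day;
        final int open;
        final int closed;

        Point(LocalDate day, int open, int closed) {
            this.day = day;
            this.open = open;
            this.closed = closed;
        }
    }

    // Walks each day from 'from' to 'to' (inclusive) and classifies every bug.
    static List<Point> trend(List<Bug> bugs, LocalDate from, LocalDate to) {
        List<Point> series = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            int open = 0;
            int closed = 0;
            for (Bug b : bugs) {
                if (b.opened.isAfter(d)) {
                    continue; // not filed yet on this day
                }
                if (b.closed != null && !b.closed.isAfter(d)) {
                    closed++;
                } else {
                    open++;
                }
            }
            series.add(new Point(d, open, closed));
        }
        return series;
    }

    public static void main(String[] args) {
        List<Bug> bugs = List.of(
                new Bug(LocalDate.of(2007, 6, 1), LocalDate.of(2007, 6, 3)),
                new Bug(LocalDate.of(2007, 6, 2), null));
        for (Point p : trend(bugs, LocalDate.of(2007, 6, 1), LocalDate.of(2007, 6, 4))) {
            System.out.println(p.day + " open=" + p.open + " closed=" + p.closed);
        }
    }
}
```

With the two sample bugs, the series runs from 1 open on June 1, to 2 open on June 2, to 1 open and 1 closed from June 3 on. The nested loop keeps the idea obvious. A real solution would precompute this into summary tables against a date dimension rather than rescanning every bug for every day.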

The solution comes with sample data, provided graciously by landfill.bugzilla.org, and a bunch of sample reports, etc. Over the coming months I’ll cover bits and pieces of the solution; there are some great “how to” gems in there on rolling out an entire, integrated solution on Pentaho.

If you use Bugzilla, definitely download and check out the solution; you may find some very useful reports and insight into your engineering process and software quality.

If you use Pentaho, stay tuned to this blog. I’ll cover some of the Pentaho-specific stuff, using the SQR as a standard downloadable reference.

Let me know what you think!

ETL for MySQL using Pentaho Kettle Webinar

Our good friends at MySQL and Pentaho are hosting a webinar about ETL for MySQL using Pentaho Kettle. 

ETL is a multipurpose technology, used for everything from straight data integration to data migration to reporting systems. Kettle is primarily used in building data warehouses/data marts, but it can also be used for other useful MySQL admin tasks.

Matt Casters and Lance Walter will give you the low down on Kettle:
http://www.mysql.com/news-and-events/web-seminars/etl-using-pentaho-kettle.php

I recommend it even if you’re not interested in MySQL. You may be surprised that an Open Source ETL tool is so visual and easy to use; you may find yourself considering it for all kinds of tasks.

Overview of Business Intelligence / DW

Back in May, Dan Morgan was kind enough to invite me to give a guest lecture at the University of Washington on “Data Warehousing Basics.” After emailing these slides to a few customers lately as a decent overview, I realized they’d probably be useful online. They are obviously a little light on content (they’re just slides), but they do provide some good “high level views” of dimensional modeling/DW/BI in general. My employer, Pentaho, was generous enough to allow me time to build this presentation for the students at UW, for which they and I are grateful. THANK YOU!

The online version: University of Washington Guest Lecture May 9
The PDF of the presentation:  University_of_Washington_Guest_Lecture_May_9.pdf

My cliffs notes:
 - If you’re doing a BI or DW project, find “that guy” with the MS Access database, or the Excel jockey who sits outside the COO’s office. Make him your BEST BEST friend.
 - Facts are “What” / Dimensions are “How” (the slides include some good graphics that drive the point home).

Thanks to Matt Casters for letting me pillage some of his graphics and slides.
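To make “Facts are What / Dimensions are How” a bit more concrete, here is a tiny in-memory star schema sketched in Java. A fact row holds a numeric measure plus keys into dimension tables, and a query rolls the measure up by dimension attributes. The sales example, the `TinyStar` class and its nested types, and the data are all hypothetical and are not taken from the slides.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical star schema: one fact table, two dimension tables.
class TinyStar {

    // Dimension rows describe context: "how" you slice the facts.
    static final class Product {
        final int key;
        final String category;
        Product(int key, String category) { this.key = key; this.category = category; }
    }

    static final class DateDim {
        final int key;
        final int year;
        DateDim(int key, int year) { this.key = key; this.year = year; }
    }

    // A fact row holds the measure ("what" happened) plus foreign keys to dimensions.
    static final class SaleFact {
        final int productKey;
        final int dateKey;
        final double amount;
        SaleFact(int productKey, int dateKey, double amount) {
            this.productKey = productKey;
            this.dateKey = dateKey;
            this.amount = amount;
        }
    }

    // Roughly what a star join with GROUP BY category, year computes:
    // look up each fact's dimension rows, then sum the measure per group.
    static Map<String, Double> amountByCategoryAndYear(
            List<SaleFact> facts, Map<Integer, Product> products, Map<Integer, DateDim> dates) {
        Map<String, Double> totals = new TreeMap<>();
        for (SaleFact f : facts) {
            String key = products.get(f.productKey).category + "/" + dates.get(f.dateKey).year;
            totals.merge(key, f.amount, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<Integer, Product> products = Map.of(1, new Product(1, "Books"), 2, new Product(2, "Music"));
        Map<Integer, DateDim> dates = Map.of(10, new DateDim(10, 2006), 11, new DateDim(11, 2007));
        List<SaleFact> facts = List.of(
                new SaleFact(1, 10, 20.0), new SaleFact(1, 10, 5.0),
                new SaleFact(2, 11, 12.5));
        System.out.println(amountByCategoryAndYear(facts, products, dates));
        // {Books/2006=25.0, Music/2007=12.5}
    }
}
```

Facts stay narrow (keys plus numbers) and dimensions carry the descriptive attributes. That split is what lets one fact table be sliced many different ways.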