Kettle and Pentaho: 1+1=3

Like all great open source products, Pentaho Data Integration (Kettle) is a functional product in and of itself.  It has a very productive UI and delivers exceptional value on its own.  Most pieces of the Pentaho platform reflect a desire to keep the large communities around the original projects (Mondrian, JFree, etc.) engaged; they are complete components in their own right.

When used together, their value for building solutions exceeds what each offers independently.  I’ll be the first to admit that Pentaho is still fairly technical, but we’re rapidly building more and more graphical interfaces and usability features on top of the platform (many in the open source edition, but much is in the professional edition).  Much of this work involves making the "whole" (Pentaho) work together to exceed the value of the pieces (Mondrian, Kettle, JFree, …).

A few reasons immediately come to mind for why Pentaho and Kettle together provide exceptional value compared to using them individually or with another open source reporting library:

  1. Pentaho abstracts data access (optionally) from report generation, which gives report developers the full POWER of Kettle for building reports.

    There are some things that are tough, if not downright impossible to do in SQL.  Ever do an HTTP retrieval of an XML doc, slurp in a custom lookup from Excel, do a few database joins and analytical calculations in a SQL statement?  I bet not.  Report developers are smart data dudes; having access to a tool that allows them to sort, pivot, group, aggregate, look up, iterate (the list goes on and on) empowers them in a way that a simple "JDBC" or "CSV" or "XQuery" source alone cannot.
    How is this made possible?
    Pentaho abstracts (optionally, it isn’t forced on customers) the data retrievals into lookup components.  This allows BI developers to use a SQL lookup (DB), XQuery lookup (XML), MDXLookup (OLAP), or Kettle lookup (EII) to populate a "ResultSet."  Here’s the beauty: reports are generated off a result set instead of directly accessing the sources.  This means that a user can use the same reporting templates, framework, designer, etc. and feed/calculate data from wherever they desire.  This truly opens a world of possibility where before there was "just SQL" or "ETL into DB tables."

  2. Ability to manage the entire solution in one place

    Pentaho has invested greatly in the idea of the solution being a set of "things" that make up your BI, reporting, and DW solution.  This means you don’t have ETL in one repository, reports managed somewhere else, scheduling managed by a third party, etc.  It’s open source so that’s obviously a choice, but we can add much value by ensuring that someone who has to transform data, schedule that, email and monitor, secure, build reports, administer email bursting, etc. can do so from one "solution repository."  Managing an entire BI solution from one CVS repository?  Now that’s COOL (merge diff/patch anyone?).

  3. Configuration Management

    Kettle is quite flexible; the 2.3.0 release extends the scope and locations where you can use variable substitution.  From a practical standpoint this means that an entire Chef job can be parameterized and called from a Pentaho action sequence.  For instance, because you can do your DW load from inside a Pentaho action sequence, you can secure it, schedule it, monitor it, initiate it from an outside workflow via web service, etc.  In one of my recent Kettle solutions ALL OF THE PHYSICAL database, file, and security information was managed by Pentaho, so the Kettle mappings can literally be moved from place to place and still work inside of Pentaho.

  4. Metadata and Additional Integration

    Pentaho is investing in making the tools more seamless.  In practice (this is not a roadmap or product direction statement) this means being able to interact with tables, connections, and business views inside of Kettle in an identical (or at least similar) way to the report designer.  For example, if you’ve defined the business name for a column to be "Actual Sales", Kettle and the Report Designer can now key off that same metadata and present a "consistent" view to the report/ETL developer, instead of requiring them to know that "ACT_SL_STD_CURR" is actual sales.
    Another example is the plan to do some additional Mondrian/Kettle integration to make the building of Dimensions, Cubes, and Aggregates easier.
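
To make points 1 and 3 a bit more concrete, here is a rough sketch of the idea in plain Python. It is not Pentaho or Kettle code and does not use their APIs; every name in it (`sql_lookup`, `csv_lookup`, `render_report`, `SETTINGS`, `SALES_TABLE`) is made up for illustration. It assumes a "result set" is simply a list of row dictionaries, and it borrows the `${NAME}` placeholder style to stand in for variable substitution.

```python
import csv
import io
import sqlite3
from string import Template

# Hypothetical physical settings owned by the platform, not by the lookup itself.
SETTINGS = {"SALES_TABLE": "sales"}


def substitute(text, variables):
    """Expand ${NAME} placeholders from a dictionary of variables."""
    return Template(text).substitute(variables)


def sql_lookup(conn, query_template, variables):
    """Lookup backed by a SQL query; returns a list of row dicts."""
    cursor = conn.execute(substitute(query_template, variables))
    columns = [d[0] for d in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def csv_lookup(text):
    """Lookup backed by CSV text; returns the same row-dict shape."""
    return [
        {"region": r["region"], "amount": float(r["amount"])}
        for r in csv.DictReader(io.StringIO(text))
    ]


def render_report(rows):
    """The 'report' only sees a result set, never the source behind it."""
    lines = [f"{row['region']}: {row['amount']:.2f}" for row in rows]
    total = sum(row["amount"] for row in rows)
    return "\n".join(lines + [f"TOTAL: {total:.2f}"])


# In-memory database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("East", 100.0), ("West", 250.5)]
)

from_sql = sql_lookup(
    conn, "SELECT region, amount FROM ${SALES_TABLE} ORDER BY region", SETTINGS
)
from_csv = csv_lookup("region,amount\nEast,100\nWest,250.5\n")

report_sql = render_report(from_sql)
report_csv = render_report(from_csv)
print(report_sql)
```

The same `render_report` function produces the same text from either source, which is the whole point of reporting off a result set. Likewise, the table name lives in `SETTINGS` rather than in the query, so the query could move to another environment by changing only the settings.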

9 thoughts on “Kettle and Pentaho: 1+1=3”

  1. James

    First of all, great blog, thanks. And Pentaho is shaping up to be a great project. I just found this one statement a bit surprising.

    “but we’re rapidly building more and more graphical interfaces and usability features on top of the platform (many in the open source edition, but much is in the professional edition)”

    I was always under the impression that the professional edition of Pentaho followed similar practices that many open source projects follow, which is that the commercial edition’s $ value comes from support, indemnification and maybe enterprise scalability. In other words, the free edition usually is not a half-baked version.
    Whereas it now sounds like Pentaho might be taking a more hawkish approach by enticing people to purchase by withholding some functionality like GUI tools from the standard version.
    Of course, it is entirely Pentaho’s prerogative to come up with whatever project/revenue model they wish. But, from my perspective, I find it a little discouraging that if I go the Pentaho route, I may eventually be forced to upgrade to the Professional edition before I’m financially ready to, when the Pentaho team decides they can get more revenue by withholding some key functions from the free edition.
    Again, that’s their right, but I’d like to know what I’m getting into before I get into it.

  2. John Sichi

    Regarding “There are some things that are tough, if not downright impossible to do in SQL. Ever do an HTTP retrieval of an XML doc, slurp in a custom lookup from Excel, do a few database joins and analytical calculations in a SQL statement? I bet not.”:

    That’s just because you’ve been limited by the available SQL technology 🙂

  3. Matt Casters

    Hi James,

    certainly you have a valid concern there, and I think it does apply to certain other open source companies, but I really don’t think it applies to Pentaho. The thing we always do at Pentaho is provide all functionality as open source. It’s as simple as that. The pro edition will have a shorter turnaround time because it has better GUIs, wizards perhaps, but the functionality will still be in the open source edition.
    Let me just say that this approach is a lot better for you than the competitors’ offerings that reserve certain functionalities for the “real/pro” version.
    Of course, we build frameworks that other people, companies and Pentaho itself can hook into, but certainly you’re not going to hold it against us that, for example, an SAP/R3 or JDE connector is not (F|f)ree.

    BTW, nothing is free of charge. When you have a professional BI consultant working on a project for 50 days, how much does that cost? Maybe you would benefit from purchasing the Pro edition because it would get work done faster. If you are a student, working for free and don’t care how long you spend on a certain task, then the open source version would probably do everything you need. Then again, Pentaho as a company wouldn’t really make money or want to make money from selling that student anything, right?

    One other thing: if you look at other “pure” open source projects like Postgres, Eclipse, etc., they all have pro versions too, offering “improved” editions through companies like EnterpriseDB, MyEclipse, etc. So how is it OK for these companies to sell that software and not for Pentaho to do it themselves?
    That just doesn’t make sense to me.

    You know what this looks like? It’s like getting 9 bars of gold and complaining you can’t have the 10th as well. 🙂

    Take care,


  4. James

    Hi Matt,
    You started off good, then got a little bitter at the end. I tried to prevent the defensive reaction with several qualifications in my statement. I thought I made it clear that I felt Pentaho should do whatever they need to be successful or profitable, whatever their intent is. I was simply stating things from my perspective, as I’m obviously more concerned with my own well-being than Pentaho’s (and vice versa). I’m not an expert in the matter; I’m sure you’re much more knowledgeable about revenue models from other ‘open source’ projects than I am. I brought that element up thinking that a comparison to others would help me have a better understanding.
    The final result though, is that you did give me some more clarity, and that’s what I was looking for, so thanks for the info.

  5. ngoodman Post author

    James and John –

    Thanks for your feedback!

    James, I think your feedback regarding the mix of features in Pentaho’s two editions is valuable. The guiding principle at Pentaho has been “reducing TCO for enterprise customers” as a litmus test for Pro versus Open Source contribution. That being said, even amongst Pentaho staff and partners we debate what features go into the Pro. The open source version of Pentaho is a fully functional product by itself, i.e., there shouldn’t be anything you “can’t do” with enough desire in the open source version. In the open source version, some more advanced features (building of aggregate tables) might require building of ETL, configuring Mondrian’s XML, etc. In the Pro edition, there is a GUI that automates this (a wizard of sorts).

    I’ll pass along your feedback to others at Pentaho. Pentaho values their community and is always willing to listen. Case in point: I joined Pentaho from the community when they made a “90 degree” turn based on collective community feedback. Pentaho really does listen! 🙂

    John – Touché. I look forward to playing with that SQL technology!


  6. James

    Those are encouraging words. Half of picking a solution is understanding the functional/technical aspects, but the other half is making a best guess at where it is going to take you in the future. So when I throw my concerns out there for people to respond to, it’s good that you don’t interpret that as ‘anger’ or ‘complaining’.
    I think Pentaho is going places. I’m looking at it as a potentially cost-effective way of enabling my own proprietary commercial solution by utilizing the ‘free’ license to help get my feet off the ground and compete with the big dogs. If I am successful, then there will be a need to ‘go pro’ and likely a need for consulting and maybe development contributions. Then we will have both been successful in our ventures. At least I think so. Correct me if I’m wrong in thinking that this is at least one type of ‘customer’ that Pentaho was hoping to attract.

  7. Pingback: Matt Casters on Data Integration » Independence

  8. Amit Gupta (Analyticsworks)

    Hi Nick,

    Kettle rocks !!!

    We are developing Analytics Studio, which offers various BI functionalities like basic reports, charts, OLAP reports, dashboards, alerts, widgets, a data uploader and a wiki. Although we are not using Pentaho’s products completely to offer our services, we are using Kettle & Mondrian/JPivot within our product suite. I must say that ETL has been the biggest pain point in most of our implementations due to the diversity of input data types. Kettle is a great initiative in solving this pain. Thanks once again for this masterpiece. I look forward to further innovation in this direction.


  9. Tin

    Well, I’m not really a fan of open source software and still haven’t used Pentaho personally. But based on your review, it sounds pretty cool!

