Pentaho Tech Tips: Call for prioritization

Open source is democratic, open, real.

While I have a good sense for which Tech Tips would be useful, I’d also like to ask the community for what tips they’d like to see written up:

  • Mondrian: Star Schema to OLAP cubes
    Take a very basic star schema with a fact table and two dimensions, show how it is built into a Mondrian cube, and show how to build a “Pivot view” Pentaho report.
  • Mondrian: Advanced MDX
    Sets, top, running totals, etc
  • Kettle: Portable ETL
    Showing how to use parameter injection to make your Kettle solution (Jobs and Transforms) executable inside of Pentaho.
  • Kettle: Custom rollups using Excel
    Showing how to build a dimension, reporting table, etc using a very easy to use interface for business users.
  • Reporting: List of Values
    Show how to use the most unfortunately named Secure Filter component to do list of values (even though you are not REQUIRED to do any security).  Not very eloquent but the suggestion has been to call it a “Prompt For” component (see below).  Think “parameter page” driven by “select distinct name from my_reporting_table.”
  • Report Designer: How to build reports with Charts
    The latest release included the charting expressions so now one can build reports with lovely looking charts.
  • Report Designer: How to pass “Pentaho” parameters to reports
    This allows the building of drill thru parameters, titles, and other “context” from the server
  • Pentaho Spreadsheet Services: Your data looking sexy in Excel
    A quick how-to on getting an instant Excel analytic interface into ANY database.  Example with Oracle XE.

Comments are ON… vote, have your say.  I WANT to do all of these, and will, eventually.  What do YOU want to see?

Microsoft doing good things with their money!

I’ll pay some praise to the gorilla from Redmond:

Brilliant and Hilarious shorts featuring Ricky Gervais of Office fame:

David Brent rules!

Great use of those profits!  🙂

Pentaho Linux .sh files

Small little tip:

The Pentaho build process doesn’t currently manage the permissions on .sh files properly.  When you download the daily builds or other demo installations you may get some errors (bash “command not found”, etc).  You need to make all .sh files in the installation executable.  Use the following command in the “pentaho-demo” directory.

find . -name '*.sh' -exec chmod +x {} +

Hope you find this helpful!
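
If you want to check the result, here is a minimal sketch that lists any .sh files still missing the owner execute bit and then fixes them.  The function names are made up for this example (they are not part of Pentaho), and it assumes a POSIX shell with a standard find.

```shell
# Hypothetical helper: list .sh files under a directory (default: current
# directory) whose owner execute bit is NOT set.
list_non_executable_sh() {
    find "${1:-.}" -type f -name '*.sh' ! -perm -100
}

# Hypothetical helper: add the execute bit to every .sh file under a directory.
# Using -exec ... {} + keeps paths containing spaces intact.
fix_sh_permissions() {
    find "${1:-.}" -type f -name '*.sh' -exec chmod +x {} +
}
```

Run list_non_executable_sh from the “pentaho-demo” directory before and after the fix; an empty listing afterwards means every script is executable.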

Open Source is agile

I’m not talking about the methodology in particular; I’m just saying open source is agile compared to traditional software engineering practices, with customer advisory boards vetting major features, rounds of marketing approvals of features, etc.

For instance, I submitted a Jira case to the Pentaho development staff for including a jar in our demo application that is needed to run certain Pentaho Data Integration mappings.  In 20 hrs the jar had been included (already vetted for license since it’s part of another project) and is now part of the daily builds.  This is the oil that makes the open source machine great: the ability for software (Pentaho as a project) to respond to real customer needs (from me).  It’s awesome!

Now that reminds me, I hadn’t highlighted some of the cool new “open source — eee” things at Pentaho yet:

  • Public Issue/Feature Roadmap:
    We have launched Jira as a place to track new feature requests, bug submissions, etc.  I greatly encourage you to register and begin using it to submit bugs / suggestions.  Can’t always say they’ll get fixed in 20 hours but they have a MUCH GREATER chance of being fixed if they’re in Jira in addition to the forums.
  • Public Source Control:
    While we’ve always published our source with every release, that source repository wasn’t available to anyone on an anonymous basis.  We’re now hosting a Subversion repository that allows easier access and contribution from our always valued community.  Consider this an open invitation to dig in, build a cool plugin, etc.

I’m glad these two things have happened; I think they make communication easier, more effective, and more transparent.  What do you think?

Finally, no longer on a lame-oh, music-devoid desktop

I’ve recently made the switch to Linux, as many of you have read in my previous blogs on the matter.

One of the things that I missed dearly, but was not a critical priority, was getting streaming MP3 (shoutcast) on my headphones.  Too many higher priority things on my plate, but I finally got XMMS and the MP3 codecs.  What a pain those pesky patents have caused for end users like me. 

977 the Kickin Country Channel never sounded so good!

Windows never looked so GOOD!

In my last blog entry I was clear: Windows had crashed on me for the last time. I was through with the operating system from Redmond…

Except…

It’s a Microsoft world and I’m pragmatic enough to understand that there are simply SOME things that can not be done from Linux (device drivers for my all in one printer/scanner/fax are non existent for example). VMWare is invaluable in this regard and while I’ve raved about it before, I’ll say it again. It’s about the best 150 USD you can spend if you’re a developer.

So… Here’s how I’m using Windows in a way that suits me just fine, because a) it’s in VMWare so I only fire it up when need be and b) I’m using XGL and even Windows looks cool on the side of a 3D cube desktop.

Windows Looks Good

Last time Windows crashes on me

End of last week, Windows was kind enough to give me the annual “Blue Screen of Somehow I Screwed Up My Own Internals I Hope You Weren’t Doing Any Real Important Work Because You’ll Have to Reinstall the Operating System of Death.”  Gasp.

We’ve all been there.  What really bugged me is that when it happened, I sighed and just thought to myself that this is “the price of computing.”  This had become normal and acceptable to me… Then I shook myself a bit and became determined to rid myself, as much as possible, of the OS from Redmond.  No offense; I love Excel, think there’s some great usability in there, but it’s just not my cup of tea.

Eventually I’ll end up with a Macbook Pro; I feel the call of the siren as much as anybody.  Until then, I’m on Suse 10.1 desktop and so far I’m quite pleased. 

I’ll blog again later on the specifics of the setup, but I’ll just say that the XGL desktop is both wicked COOL and very functional.

Donating to Open Source: Gratitude

A while back I blogged about gratitude and generosity, which was mostly about how it made ME feel when I was experiencing those feelings in times of change and growth.  What’s the flip side of that coin, or the other end of that stick, or whatever metaphor you want to use?  How does expressing gratitude to others for what they do feel?

Apparently pretty good; or at least good enough to respond with some very kind, personal notes of thanks.  A few weeks back I realized that I use two open source projects that provide exceptional products.  Truly, they’ve transcended the open source motto of "the code is the documentation and RTFM if there were one" and have created wonderful, easy to use products.  I realized that I had not given these people anything in return (I never encountered any bugs/etc to submit patches for!).

I donated, via their website instructions, to a CYGWIN developer and Gallery.  I received personal notes of thanks, expressing real gratitude.  It wasn’t for the money either (I donated 25 USD to each developer) but more for the recognition of their contribution.  I get this.  If someone (me) is willing to pay someone whom they’ve never met before, willing to seek out the method (donation pages and paypal hoops), and willing to part with real money, while under absolutely NO OBLIGATION or expectation to do so, it means that I think they did a great job.

Well they have! 

Have you ever considered donating to an open source project?  What open source projects do you get value from?  Consider dropping them $20 and see how good it makes them AND you feel!  I bet you’ll feel better giving $20 to the Apache foundation than paying your next enterprise software bill.

Kettle and Pentaho: 1+1=3

Like all great open source products, Pentaho Data Integration (Kettle) is a functional product in and of itself.  It has a very productive UI and delivers exceptional value as a tool in and of itself.  Most pieces of the Pentaho platform reflect a desire to keep the large communities around the original projects (Mondrian, JFree, etc) engaged; they are complete components in and of themselves.

When used together, their value as it relates to building solutions increases and exceeds that of their independent use.  I’ll be the first to admit that Pentaho is still fairly technical, but we’re rapidly building more and more graphical interfaces and usability features on top of the platform (many in the open source edition, but much is in the professional edition).  Much of this work involves making the "whole" (Pentaho) work together to exceed the value of the pieces (Mondrian, Kettle, JFree, …).

A few reasons immediately come to mind why Pentaho and Kettle together provide exceptional value compared to being used individually or with another open source reporting library:

  1. Pentaho abstracts data access (optionally) from report generation which gives report developers the full POWER of Kettle for building reports.

    There are some things that are tough, if not downright impossible, to do in SQL.  Ever do an HTTP retrieval of an XML doc, slurp in a custom lookup from Excel, do a few database joins and analytical calculations in a SQL statement?  I bet not.  Report developers are smart data dudes; having access to a tool that allows them to sort/pivot/group/aggregate/lookup/iterate (the list goes on and on) empowers report developers in a way that a simple "JDBC" or "CSV" or "XQuery" alone cannot.
    How is this made possible?
    Pentaho abstracts (optionally, it isn’t forced on customers) the data retrievals to lookup components.  This allows BI developers to use either a SQL lookup (DB), XQuery lookup (XML), MDXLookup (OLAP), or Kettle lookup (EII) to populate a "ResultSet."  Here’s the beauty: reports are generated off a result set instead of directly accessing the sources.  This means that a user can use the same reporting templates, framework, designer, etc and feed/calculate data from wherever they desire.  It truly opens a world of possibility where before there was "just SQL" or "ETL into DB tables."

  2. Ability to manage the entire solution in one place

    Pentaho has invested greatly in the idea of the solution being a set of "things" that make up your BI, reporting, DW solution.  This means you don’t have ETL in one repository, reports managed somewhere else, scheduling managed by a third party, etc.  It’s open source so that’s obviously a choice, but we can add much value by ensuring that someone who has to transform data, schedule that, email and monitor, secure, build reports, administer email bursting, etc can do so from one "solution repository." Managing an entire BI solution from one CVS repository?  Now that’s COOL (merge diff/patch anyone?).

  3. Configuration Management

    Kettle is quite flexible; the 2.3.0 release extends the scope and locations where you can use variable substitution.  From a practical standpoint this means that an entire Chef job can be parameterized and called from a Pentaho action sequence.  For instance, because you can do your DW load from inside Pentaho action sequences that means you can secure it, schedule it, monitor it, initiate it from an outside workflow via web service, etc.  In one of my recent Kettle solutions ALL OF THE PHYSICAL database, file, and security information was managed by Pentaho so the Kettle mappings can literally be moved from place to place and work inside of Pentaho. 

  4. Metadata and Additional Integration

    Pentaho is investing in making the tools more seamless.  In practice (this is not a roadmap or product direction statement) this means being able to interact with tables, connections, and business views inside of Kettle in a way identical (or at least similar) to the report designer.  For example, if you’ve defined the business name for a column to be "Actual Sales", Kettle and the Report Designer can now key off that same metadata and present a "consistent" view to the report/ETL developer instead of requiring them to know that "ACT_SL_STD_CURR" is actual sales.
    Another example is the plans to do some additional Mondrian/Kettle integration to make the building of Dimensions, Cubes, and Aggregates easier.
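
To make the Configuration Management point (item 3) a bit more concrete, here is a rough sketch of the general idea: keep the physical details out of the job definition and substitute them in at run time.  This is NOT how Kettle or Pentaho implement it; it is only a plain shell illustration.  The function name substitute_vars, the variable names, and the example host are all made up, and the ${NAME} placeholder style is just meant to resemble Kettle's variable syntax.

```shell
# Hypothetical illustration only: replace literal ${NAME} placeholders in a
# template with values given as NAME=value arguments. Unknown placeholders
# are left untouched. The body runs in a subshell so no variables leak.
substitute_vars() (
    result=$1
    shift
    for pair in "$@"; do
        name=${pair%%=*}     # text before the first '='
        value=${pair#*=}     # text after the first '='
        # Literal (non-regex) replacement of every ${name} occurrence.
        result=$(printf '%s\n' "$result" |
            PLACEHOLDER="\${$name}" VALUE="$value" awk '
                {
                    n = ENVIRON["PLACEHOLDER"]; v = ENVIRON["VALUE"]; out = ""
                    while ((i = index($0, n)) > 0) {
                        out = out substr($0, 1, i - 1) v
                        $0 = substr($0, i + length(n))
                    }
                    print out $0
                }')
    done
    printf '%s\n' "$result"
)

# The "portable" definition carries only placeholders (single quotes keep
# the shell from expanding them)...
template='jdbc:postgresql://${DB_HOST}:${DB_PORT}/${DB_NAME}'

# ...and each environment supplies its own physical values (made-up here).
substitute_vars "$template" DB_HOST=dwh.example.com DB_PORT=5432 DB_NAME=sales
```

The same template can then move from a laptop to a test box to production untouched; only the values handed in by the surrounding environment change, which is the role Pentaho plays for the Kettle mappings described above.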