Twitter is what blogging was 4 years ago?

I had an email from a good friend, whom I actually got to know through mutual open source interests, but also by simply discussing topics on my blog.  He writes:

Whether you like it or not, twitter is now what blogs were 3 or 4 years ago, and you need to get involved in it, Nick.

Twitter seems like a medium that, while easy and fast, is ill-suited to the topics I like to blog about.  Technical articles and tips, with samples and screenshots, seem like they would be tough to get across on Twitter.  I tried Twitter ages ago but deleted my account almost immediately, because I just thought it was downright “silly” to “text the world” 140 characters at a time.

What am I missing?  My friend quoted above is highly regarded, certainly by me.  Why would I want to communicate to the world in a medium without images, italics, attachments, etc.?  Phrasing it more positively, what is the best part about Twitter that makes it worth forgoing time spent on this blog for tweeting instead?

Amazon's Pre Ordering of books sucks!

I pre-ordered a copy of the new (first, only, best, and original) Pentaho book, “Pentaho Solutions” by Roland Bouman and Jos van Dongen, two weeks back.  I saw from a tweet that the book was shipping from Amazon.  Cool – had a look at the page.  Sure enough, they can ship it today if I get my order in on time, so I know they have it on hand.

How about my pre-order, which I would assume would go out before regular orders?  It won’t ship until next week?  Delivered by 9/11/2009?  Lesson learned – don’t pre-order from Amazon.  🙂

Good Riddance, Ecto

I’d been using Ecto, a Mac OS X blog client, for the past year or so.  Overall I really DIDN’T like it, except that it let me “paste” images into a post straight from the clipboard.  You’d be surprised how many blog clients still force you to save a file to your computer, browse to it, upload it, and then include it in your post.  A real drag when you’re trying to do a few simple software screenshots.

I upgraded to a new Mac recently and the Ecto license key I had didn’t come over from the old Mac.  According to the Ecto website, just email them and they’ll send it along.  I emailed on 8/11/2009 and got no response.  Not good customer service at all.

But I really wanted to blog about the encrypted variables, so I downloaded ScribeFire, gave it a try, and I’m switching.  It can paste images just like Ecto, and it’s built right into Firefox.  Excellent!

So… Ecto customer service really goofed on this one.  You had a customer who wasn’t happy, you gave me an opportunity to find something I like better, and you handed me a chance to blog about the whole thing!

Encrypted Variables in PDI

Every once in a while, I get to sound like a royal arse in front of a customer by saying something “I know” to be true about Pentaho that isn’t.  Usually this is a REALLY good thing, because it’s some limitation or gotcha that existed in the product and has magically disappeared with the latest release.  The danger of open source is that these things can change underneath you quickly, without any official fanfare, and leave you looking like a total dolt at a customer site.  Bad for consultants like me, who constantly have to keep up with extraordinarily fast product development.  Good for customers, because they get extraordinarily fast product development.

One of these experiences, which I was absolutely THRILLED to look like a dolt about, was:

“If you use variables for database connection information, the password will be stored in clear text.”

A huge issue for many security-conscious institutions.  Customers were faced with a choice: use variables, which centrally manage the connection information for a database (good thing), but then the password is clear text (bad thing).  No longer!

Our good friend Sven quietly committed this little gem nearly 18 months ago, and it’s been in the product since 3.0.2!  It allows encrypted variables to be decrypted in the password field of database connections.

Let’s test it out.  Our goal here is to take a string like “Encrypted jasiodfjasodifjaosdifjaodfj” (a simple encrypted version of the password), set it as a regular ole variable, and then use it as the “password” of a database connection.

We have a transformation that will set the variables, and then we’ll use that variable in the next transformation.

The first one sets the variable ${ENCRYPTED_PASSWORD} from a text file.  The string itself would be “lifted” from a saved .ktr file that contains the encrypted password.

The next transformation then uses that variable to select from a database, and outputs the list of tables in the database to a text file.

Output – works like a charm! 

Customers can now have the best of both worlds: centralize their connection information (host/user/password) using variables, and keep those passwords away from casual hackers.  I say casual because PDI is open source, so in order to decrypt a password someone only needs to know Java and where to find the PDI SVN repository.  🙂
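To make the “casual” point concrete, here’s a minimal Python sketch of a fixed-seed obfuscation scheme in the same spirit.  This is an illustrative analogue, not Kettle’s actual algorithm: the seed, the hex encoding, and the prefix handling are all invented for this example.

```python
# Illustrative analogue of "Encrypted ..." password obfuscation.
# NOT Kettle's actual algorithm: the seed and encoding below are
# invented for this sketch. The point: a fixed seed sitting in open
# source code only deters casual inspection, because anyone who can
# read the source can recover the seed and reverse the process.
SEED = 0x5DEECE66D  # hypothetical fixed seed baked into the source


def obfuscate(password: str) -> str:
    """Turn a plain-text password into an 'Encrypted xxxx' token."""
    as_int = int.from_bytes(password.encode("utf-8"), "big")
    return "Encrypted " + format(as_int ^ SEED, "x")


def deobfuscate(token: str) -> str:
    """Reverse obfuscate(): strip the prefix, XOR the seed back out."""
    as_int = int(token[len("Encrypted "):], 16) ^ SEED
    size = (as_int.bit_length() + 7) // 8
    return as_int.to_bytes(size, "big").decode("utf-8")
```

The win of storing the token in a variable is keeping plain text out of config files and away from shoulder-surfers, not cryptographic secrecy.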

As always, example attached:

CDF Tutorials

The folks at webdetails have posted their Pentaho Community Dashboard Framework tutorials, and they look great!  They run you through building CDF dashboards, which are usually a crucial, user-facing part of any BI implementation.  While much of the work is the back-end ETL/OLAP configuration, tuning, etc., most users think of Pentaho as the dashboards and reports they interact with, not the data munching for the data warehouse.

I’ve implemented more than 20 CDF dashboards at four customers already, but I still bought the tutorials to learn even more ins and outs.  You should too!  There’s no better way to learn something than from the source of the technology, which in this case is Pedro and team @ webdetails.

Off Topic: My First Airplane Solo

I’ve wanted to fly since I was 14.  I started training in 2004, but cut it short when I moved from Boston to Seattle.  The wife was kind enough to give me a swift kick in the butt, telling me to go get the license now; doing so later in life would be more difficult.

In May I logged 15 hours of dual instruction at Boeing Field (a short 15-minute drive from my house), and yesterday I soloed the 172 Skyhawk N52139 for the first time.

I was expecting the solo to be intense: thinking hard, hearing my instructor’s voice in my head, juggling all the things that have to happen at the same time to get the plane safely on the ground.  Instead, I found that after the training I knew how to simply “fly the plane,” and I was just doing what I knew the plane needed to land softly, without much thinking.  After two touch-and-gos and a full-stop landing, my instructor and I both concluded I’ve acquired the “walking and chewing gum” skills when it comes to flying.

I’ll be continuing training hours over the next couple of months, aiming to get my license in September so I can take my wife places with me in the plane.  After all, her encouragement is what got me going on this old aspiration; the least I can do is fly her to dinner in the San Juans.

MDX Humor from Portugal

Pedro Alves, the very talented lead developer behind the Pentaho Community Dashboard Framework gave me a good chuckle with his high opinion of MDX as a language:

MDX is God’s gift to business language; when God created Adam and Eve he just spoke [Humanity].[All Members].Children.  That’s how powerful MDX is.  And Julian Hyde allowed us to use it without being bound to Microsoft.

If you haven’t checked out Pedro’s blog, definitely get over there. It’s a recent start but he’s already getting some great stuff posted.

PDI Scale Out Whitepaper

I’ve worked with several customers over the past year, helping them scale out their data processing using Pentaho Data Integration. These customers have some big challenges: one was expecting 1 billion rows per day to be processed in their ETL environment. Some were rolling their own solutions; others had very expensive proprietary solutions (Ab Initio, I’m pretty sure, though they couldn’t say, since Ab Initio contracts are bizarre). One thing was common: they all had billions of records, a batch window that stayed the same, and software costs that were out of control.

None of these customer specifics are public, and they likely won’t be, which is difficult for Bayon / Pentaho because sharing these top-level metrics would be helpful for anyone using or evaluating PDI. The key questions when evaluating a scale-out ETL tool: Does it scale with more nodes? Does it scale with more data?

I figured it was time to share some of my research and findings on how PDI scales out, and that takes the form of a whitepaper. Bayon is pleased to present this free whitepaper, Pentaho Data Integration: Scaling Out Large Data Volume Processing in the Cloud or on Premise. In the paper we cover a wide range of topics, including results from running transformations with up to 40 nodes and 1.8 billion rows.

Another interesting set of findings in the paper relates to a very pragmatic constraint on my research: I don’t have a spare $200k to simply buy 40 servers to run these tests. I’ve been using EC2 for quite a while now, and figured it was the perfect environment to see how PDI could scale on the cheapest of cheap servers ($0.10 / hour). The other interesting cloud ETL metric is the top-level benchmark of a utility compute cost of 6 USD per billion rows processed, with zero long-term infrastructure commitments.
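The arithmetic behind a per-billion-rows figure like that is easy to sketch. The node count and hourly price below match the post; the aggregate throughput is an assumed number chosen to land near the 6 USD mark, not a measurement from the paper:

```python
def cost_per_billion_rows(nodes: int, usd_per_node_hour: float,
                          rows_per_hour: float) -> float:
    """Utility-compute cost of ETL per billion rows processed."""
    cluster_usd_per_hour = nodes * usd_per_node_hour
    billions_per_hour = rows_per_hour / 1e9
    return cluster_usd_per_hour / billions_per_hour


# 40 of the cheapest EC2 nodes at $0.10/hour; the throughput below
# is an illustrative assumption, not a result from the whitepaper.
estimate = cost_per_billion_rows(nodes=40, usd_per_node_hour=0.10,
                                 rows_per_hour=0.667e9)
print(f"~{estimate:.0f} USD per billion rows")
```

Note the property this framing makes visible: if doubling the nodes doubles the throughput, the per-row cost stays flat, which is exactly the linear-scaling question the 40-node runs in the paper are probing.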

Matt Casters, Lance Walter, and I will also be presenting a free online webinar to go over the top-level results and have a discussion on large data volume processing in the cloud:

High Performance ETL using Cloud- and Cluster-based Deployment
Tuesday, May 26, 2009 2:00 pm
Eastern Daylight Time (GMT -04:00, New York)

If you’re interested in processing lots of data with PDI, or wanting to deploy PDI to the cloud, please register for the webinar or contact me.

NYC 4/9 and 4/10

I’ll be roaming Manhattan this Thursday and Friday. I have some plans, but also some bits of free time.

Anyone want to shoot the breeze about Pentaho / BI / Open Source? Email me:

Pentaho Partner Summit

I’m at the Westin close to the event space for the summit…

I’m around tonight – meeting Bryan Senseman from OpenBI a bit later (7:30 or 8:00pm).  Anyone else around and want to meet up for dinner?  Email me