Subreports Example Zip

There have been some questions floating through the Pentahosphere (I think I’m the first person to use that word, btw) about how to use Subreports. There’s a good description on the wiki that covers the basics, but I don’t think a good working example ships with Pentaho open source yet.

I don’t really have time to delve into all of it, so in the spirit of “early and often” I’ll just post the zip file with a working example (built on the 1.6 GA Designer and Server).

Here ’tis:
http://nicholasgoodman.com/entry_images/pentaho_subreport_example.zip

2007 was a desert of Blogging

2007 was an off year for me when it comes to blogging. Not a surprise, since my first post after returning from my trip to Argentina was entitled “Am I done blogging?”

Seeing that I’m a self-proclaimed “Data Dynamo and BI Geek” and even Google agrees (the search term “BI Geek” yields me at the top), it seems only fitting to see just how bad 2007 was… you know, by the numbers.

First, pop the top on the WordPress database schema. (5 minutes)
Second, write a simple SQL-query-based cube (blogmart.mondrian.xml) on top of my blog data (posts, categories); the flavor of query it sits on is sketched below. (15 minutes)
Third, do some analysis in JPivot to see how 2007 really shaped up against previous years. (instant)
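For the curious, the heart of that cube is nothing fancier than a count of posts over time pulled straight from the WordPress tables. A minimal sketch of the kind of query involved, assuming the stock wp_posts table (the actual SQL behind blogmart.mondrian.xml may differ):

    -- Published posts per year, straight out of the stock WordPress schema.
    SELECT YEAR(post_date) AS post_year
         , COUNT(*)        AS post_count
    FROM   wp_posts
    WHERE  post_status = 'publish'
    GROUP  BY YEAR(post_date)
    ORDER  BY post_year;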

2007 was my worst blogging year, ever. Including 2004, and I didn’t even start blogging until the middle of that year! Ouch!

[Screenshot: JPivot analysis of post counts by year]

Another interesting data point. I’ve blogged a bunch about Oracle and about open source. When I first started blogging, Oracle was the focus of my consultancy. At the end of 2005 I began investigating open source BI in earnest, and even jumped onto the Pentaho ship for the better part of 2006 and 2007. How did this change in professional life affect my blog content? A bunch!

[Screenshot: JPivot analysis of post counts by category (Oracle vs. open source) over time]

In 2004 and 2005 my blog content was, give or take, 75% Oracle and 25% open source. In 2006 that proportion flipped, and in the desert of 2007 I wrote ZERO Oracle posts.

What about comments? Which topics yield the most comments and discussion?
[Screenshot: JPivot analysis of comment counts by category]

Open Source, Pentaho, Personal, General Topics. Oracle was one of the least “commented” sections, even though I know from Google keyword analytics it’s one of the things that drives traffic to my blog.

Well, there you have it. A year in review of my (lacking) blog. Perhaps my new year’s resolution should be to blog more?

Zipcar buys Flexcar WOO WOO

Zipcar and Flexcar are both car sharing companies. For background on the concept, read about it here.

My wife and I share one car for a variety of reasons and it works out really well. We end up walking a bunch more. We take the bus downtown. I work from a home office, so I don’t need a car for a daily commute. We forgo the expense of a second car that would only get used once or twice a week. Lots of reasons. Car sharing has been a great way to have the “extra car” when we need it. If I need a car in the middle of the day, I reserve one online for a couple of hours, run my errands, come back, and drop it off. My wife and I LOVE the concept for so many reasons.

Now that I’ve buried the lead:

I’m THRILLED that Zipcar is acquiring Flexcar. Why? Completely selfish reasons.

I lived in Boston so my first car sharing experience was with Zipcar. I thought their “execution” on the operations side was excellent. Great website. Great card/entry system. Great refueling. Great phone interface. Great billing system (can see when I used what). Everything was great. Then I moved to Seattle.

Zipcar wasn’t in Seattle, but Flexcar was. I signed up with Flexcar and have been solidly underwhelmed with their operation. Sure, it’s the same sort of thing and it’s been quite similar, but hands down Zipcar was much, much better. Flexcar had weird lock boxes and keypads. Refueling numbers on your reservation. The Flexcar website doesn’t do positional location of cars (you have to know your neighborhood). Flexcar requires you to use your member ID (2039884) to log in instead of a username (ngoodman). I’ve used Flexcar for the past three years and, well, it was adequate.

The news that Flexcar has been acquired (the release says merged, but Zipcar is roughly 3x the size of Flexcar) by Zipcar is AWESOME. As a customer of BOTH companies I’m absolutely thrilled that I’ll be able to get the same Boston experience I’ve been missing here in Seattle.
Woo Woo Zipcar.

bayon is back

If you’ve been reading since the early days of this blog (the bayon blog), you’ll know what I’m talking about. If you’re a reader who has joined in the past year and a half, you’re probably wondering, “What is bayon?”

bayon is a boutique consulting firm specializing in Business Intelligence implementations; it’s my company that I’ve operated since 2002. I put it on the back burner when I put on a Pentaho jersey and played a few games on the Pentaho team. I’m leaving (actually, left) Pentaho. My time at Pentaho was great. The Pentaho tribe is a great group of kind, honest, smart people. Rare to find the intersection of good people and good technologists.

I’ve felt the siren call of helping customers in a more entrenched way. Consulting does that, I think. So, not like it’s a big announcement, but it is belated, as my last day at Pentaho was nearly two months ago:

I’m now working at bayon full time building a dedicated practice around Open Source BI technologies in the enterprise. Bayon has joined the Pentaho partner program as a Certified Systems Integrator.

So there you have it. Shingle is out.

If you are interested in Pentaho, open source ETL, open source BI, etc., don’t hesitate to get in touch.

PS – It’s also worth noting that my leaving is no reflection on the progress of the business. Quite the opposite, really; some would consider me foolish for leaving when the company is doing as well as it is!

A company that doesn't get "it" – ParAccel homepage

UPDATE: ParAccel has updated its website and no longer has the silly teaser link described below.

I ran into someone from ParAccel at a TDWI conference this year and she alluded to the fact that I might find their product interesting. I saw another blog post that mentioned them, so I figured it was time to go and find some information.

I’m a consultant. I’m not into vague marketing statements like “We scale linearly and process data 100x faster than everyone else.” Everyone has that on a website, next to a dashboard or some whizbang graph. What is going to get a company into my sales channel (I have customers)? Good information about their product. What it is. Specs. Diagrams showing their approach to scaling. The things people like me need to understand what a product is.

So – here’s a dumb move. Put a link on your homepage (http://www.paraccel.com) that invites people to learn about your architecture, and then serve up a self-important fake popup about an architecture that is SOOOO secret you can’t even show a big diagram with bubbles and concepts.

[Screenshot: ParAccel’s “architecture” teaser popup]

In a day and age where open source and user-generated content, tips, and ideas are progressing technology at impressive rates, there are clearly some people who get “it” and some who don’t. Getting customers jazzed about your product (with that big diagram with bubbles and concepts) is more important than whatever you think your competitors will glean from it (that same big diagram with bubbles and concepts).

Yawn. Moving on. 🙂

Irony: Good Service to Bad Service to Good Service

Or why Speakeasy now sucks.

Let’s set this up (and tune out if you’re not in for a customer service rant).

Cable companies have historically done an exceptionally poor job of treating customers well. Chalk it up to a condoned monopoly, a stagnant business model, etc. We’d all heard the horror stories about missed cable appointments, “you must be there between 8am and 5pm three weeks from today,” and so on.

Speakeasy, a rather hip DSL provider (VOIP, data solutions, and DSL service), has always had grand service. Call them and they are helpful and smart. The people on the other end of the line aren’t script readers taking orders; these are people who know what they’re doing. I first became a Speakeasy customer in 2000 and have recommended them to several people.

Speakeasy, or more specifically COVAD (their subcontractor for installations), muffed up my DSL installation order when I recently moved. Suffice to say that the COVAD dispatcher who berated me for not being available for an appointment I never knew about was, unfortunately, the deciding factor in leaving a company I had otherwise had a good experience with. So, having spent approximately 5,000 USD over my lifetime as a customer (I get their top-shelf DSL with all the bells and whistles), I had to say no more, I’ll make other arrangements. Called and cancelled the move.

I was surprised, then, to find a 49.00 “order charge” on my last invoice for the cancelled order. Surely they’re not going to charge me for the awful customer experience that already caused me to cancel my service with them? Yup! Called to ask to have it removed. Nope. Pointed out that I’ve been one of their top-shelf DSL clients. No dice. So here’s a company that used to be savvy, hip, and customer focused, now trading a 50 USD administrative fee for the satisfaction of a happy, big-spending (5,000 USD) customer.

Speakeasy: you had so much potential to be different. But, like other phone/utility providers, you’ve crossed over and you pretty much blow. Sold your soul to the nickel-and-dime “concession this and administration that” fees.

The irony is that when I called the cable company, which has historically been exceptionally poor on responsiveness and customer support, they responded exactly the way I would have expected the new, hip company to. I called two days before my installation and asked if they could do Internet in addition to cable. No problem. At 10am (when they said they’d be here) they showed up, polite, courteous, helpful, and 30 minutes later cable + Internet were up and running.

There you have it. Old companies can renew their service focus and end up wowing a customer. New companies can let their service slide and lose customers over silly stuff. Goes to show that a focus on the customer matters more than the size or age of the provider. It’s timeless, for new and old alike.

Using Kettle for EII

Pentaho Data Integration (aka Kettle) can be used for ETL, but it can also be used in EII scenarios. For instance, say you have a report, run from a customer service application, that lets the customer service agent see the current issues/calls up to the minute (CRM database) but also gives a strategic snapshot of the customer from the customer profitability and value data mart (data warehouse). You’d like to look at this on a single report, with data coming from two different systems on different operating systems and databases.

Kettle can make short work of this using the integration Pentaho provides and the ability to SLURP data from an ETL transform into a report without the need to persist to some temporary or staging table. The thing that Pentaho has NOT made short work of is using the visual report authoring tools (Report Designer and Report Design Wizard) with a Kettle transform as the report’s source at design time. That’s an important point worth repeating.

As of Pentaho 1.6, Pentaho provides EII functionality at RUNTIME but NOT at DESIGNTIME.

So, you can use an ETL transform as the source of a report, and there are two examples of that: one in the samples/etl directory that ships with the Pentaho BI Suite demo, and another in an earlier blog post entitled “Simple Chart from CSV“.

What is the best method for building reports that are going to use this functionality?

I, like others who use the Pentaho product suite, would like to use the Report Designer to build my report visually but have the data actually come from an EII transformation. This post walks through those steps.

Step 1. Create your data set

Build an ETL transformation that ends with the data you want to get on your report. Use several databases, lookups, calculations, Excel files, whatever you want. Just get your data ready (use the Preview functionality in Kettle). You’d do this with Kettle 2.5.x if you want to deploy into Pentaho 1.6. I’ve created an ETL transformation that does something absurdly simple: generate summary sales figures by product.
[Screenshot: Kettle transformation generating summary sales figures by product]
Step 2. Add a table output step to the transformation

What we’re going to do now is create a table that we’ll use ONLY during design time to build our report. Just use any database that you have access to while designing the report (MySQL or Oracle XE on your local machine?). Add a table output step to the transformation and click the “SQL” button to have it generate the DDL for the table. Go ahead and execute the DDL to create the temporary table we’ll use for designing our report. Name the table something silly like MYTEMPTABLE.
[Screenshots: table output step and the generated DDL for MYTEMPTABLE]
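Just to make that concrete, the generated DDL for my summary-sales example would look something like the sketch below. The column names and types here are hypothetical; the SQL button builds the real DDL from your own transformation’s output fields.

    -- Hypothetical design-time table for the summary-sales example.
    -- Your generated DDL will reflect your transformation's actual fields.
    CREATE TABLE MYTEMPTABLE
    (
      PRODUCT     VARCHAR(50)
    , TOTAL_SALES DECIMAL(15, 2)
    );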

Step 3. Execute the mapping and populate the temporary table

Hit run and get data into that table. Now we have a table, MYTEMPTABLE, that has the format and a snapshot of the data we want to use for building our report.

Step 4. Write your report using the temporary table as a source

Open up Report Designer. Run through the wizard (or the Report Designer itself) as usual and build your report (with groupings, logos, images, totals, etc.) just like you normally would. You will use MYTEMPTABLE in your temporary database as the source for this report.
[Screenshot: Report Designer using MYTEMPTABLE as the data source]
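The design-time query behind the report is then nothing more than a SELECT against that table; something along these lines, reusing the hypothetical columns from the sketch above:

    -- Design-time report query against the temporary table.
    SELECT PRODUCT
         , TOTAL_SALES
    FROM   MYTEMPTABLE
    ORDER  BY PRODUCT;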

Nothing spectacular yet. All we’ve done is write a basic report against a basic table.

Step 5. Publish your report to Pentaho server and test

Using Publish (or Publish to Server) in the Pentaho Report Designer, publish the report to the server so you can execute it from the web using Pentaho. In this example I published the report to samples/etl so it’s alongside the example that we shipped with the Pentaho demo server.
[Screenshot: publishing the report to samples/etl]

Let’s make sure that report showed up.
[Screenshot: the report listed in the Pentaho solution browser]

Great. Let’s click on it to make sure the report runs.
[Screenshot: the report running in the browser]

Ok. Our report (etlexample.xaction) runs inside of Pentaho. Again, at this point we’ve not done anything spectacular; this is just a basic (read: ugly, basic gray/white) report that simply selects data from MYTEMPTABLE.

Step 6. Copy transformation so it’s beside the report

It’s not REQUIRED, but it’s a very good idea to DISABLE the hop between the for_pentaho step and the table output step. When we run this report now, we don’t actually want to do any INSERTs into a table. If we disable the hop after for_pentaho, the transformation does ZERO DML.

The ETL transformation can really be anywhere, but it’s good practice to put the transformation (.ktr file) alongside the report. Copy the kettleexample.ktr file (from Step 1) to samples/etl so that it is sitting alongside etlexample.xaction.

Step 7. Swap from Relational to Pentaho Data Integration.

You could make the change directly to the .xaction to get it to source data from the Kettle transform. However, I’m going to copy etlexample.xaction to etlexample2.xaction just so that I can see both running side by side.

In Design Studio, copy etlexample.xaction to a new action sequence etlexample2.xaction.

Open up etlexample2.xaction and make the following changes.

First, change the name of the action sequence from ETL Transformation to ETL Transformation – NO TABLE
[Screenshot: action sequence renamed to ETL Transformation – NO TABLE]

Second, remove the “relational” step that is producing the data for the report by highlighting the step named “rule” and then hitting the RED X to remove it.
[Screenshot: removing the “rule” relational step]
Third, add a Get Data From Pentaho Data Integration step ABOVE the report step.

[Screenshot: adding the Get Data From Pentaho Data Integration step]

Fourth, configure the Pentaho Data Integration step as follows.

[Screenshot: Pentaho Data Integration step configuration]

Some notes about what we’ve just done there. We’ve told the component that the Kettle transformation we’d like to use to get our data is kettleexample.ktr. There are two other important pieces of information we’ve filled in on that screen as well. We’ve told the component that we’ll get our data (rows) from the step named “for_pentaho”; the component will SLURP data from that step and stream it into the result. The other piece of information we’ve given to the component is what we want to name the result set, so that the report knows where to get the results. Name the result set “rule_result.”

Finally, highlight the report step and make sure that the report is getting its data from “rule_result.” We shouldn’t have to change anything else about the report, just where it gets its data.
[Screenshot: report step reading its data from “rule_result”]

Step 8. Test your EII version of your same report

Navigate to the new report you created, the one that uses the Kettle ETL transformation INSTEAD of the table.
[Screenshot: the new report in the solution browser]

Click on ETL Example – NO TABLE and you should see the same data/report.
[Screenshot: the same report output, now driven by the Kettle transformation]

This report is NOT using MYTEMPTABLE; it is instead peering inside kettleexample.ktr, getting its data from “for_pentaho,” and generating the report from that.

Congratulations! You now have a method you can use to create EII reports with the same visual tools you’d use when developing against a relational source. Imagine the possibilities… what you can do in Kettle (pivot, unpivot, lookup, calculate, JavaScript, Excel, flat files, web services, XML streaming, calling database procedures, and on and on and on) you can now do for your reports.

Feedback welcome. The zip file for this example is here. I built this example on the 1.2 GA demo server, but it should work on 1.6 as well. All you need to do is unzip the file above into pentaho-demo/pentaho-solutions/samples/etl and you should have another working example.

Well, I noticed

My workday goes much more smoothly because I listen to a variety of online radio stations. Today they all went silent or played public awareness campaigns from SaveNetRadio.org. People have been wondering whether anyone will notice. Well, like the title says: I noticed!

I don’t know all the mechanics, but it comes down to this: a lot of these small, boutique-ish online radio stations will shut down because the royalty cost structure will be, in their opinion, unsuitable.

I, for one, being a proponent of open content, software, and standards, think there must be some underlying disconnect between the “Copyright Royalty Board” and broadcasters.

These small, hobbyist online broadcasters are part of a larger shift in broadcasting and its economics: Web 2.0-ish user-generated content and participatory systems of consumer and producer.

The CRB probably needs to take another look at what it’s doing to see if it’s just trying to hang on to old ideologies in a new world.

I support Net Radio. 🙂 You should too.
SaveNetRadio.org

Pet Peeve: EST != EDT

I work with people all over the country and the world. What that means is that we often schedule meetings, calls, WebEx sessions, remote consulting sessions, etc. Lacking some great shared calendar in the cloud that we can use to do this ad hoc (I’m sure there’s some Web 2.0 startup that does this, so please comment if you know of something GOOD), people end up emailing suggested and adjusted times back and forth.

For instance, just yesterday, I received the following email:

The regular 10am EST XYZ meeting tomorrow is cancelled until further notice.

What’s the issue with this email? Well, we don’t have a 10am EST meeting. We have a meeting scheduled at 10am Eastern (i.e., when the clock in the eastern time zone hits 10am, even during the summer months).

EDT and EST have VERY SPECIFIC timezone offsets.

EDT = UTC – 4
EST = UTC – 5

I generally use, and I think many others also use, “Eastern” to refer ONLY to local time, i.e., what the clock on the wall says in New York, regardless of EDT/EST.

Let’s take the above example:

  • 10:00 EST on June 22 (someone sends an email requesting a meeting)
  • 10:00 EST = 15:00 UTC (given the definition of EST, an offset of -5 hours)
  • 15:00 UTC = 11:00 EDT (makes sense, right? 10:00 EST = 11:00 EDT)
  • 11:00 EDT is 11:00 on the wall clock in New York on June 22 (when New York is on EDT), a full hour later than the 10:00 local time the sender almost certainly meant

Obviously you assume that someone requesting a meeting for 10am EST on a day that falls in EDT was ACTUALLY requesting a meeting at 10am EDT. However, why make everyone do that mental correction?

My suggestion for people who can’t keep it all straight:

Use Eastern/Pacific instead of EDT/EST/PST/PDT. “Eastern” or “Pacific” makes it clear you mean local time, without muddying things by naming a technically incorrect offset.

Kettle’s secret in-memory database

Kettle’s secret in-memory database is

  1. Not actually secret
  2. Not actually Kettle’s

There. I said it, and I feel much better. 🙂

In most circumstances, Kettle is used in conjunction with a database. You are typically doing something with a database: INSERTs, UPDATEs, DELETEs, UPSERTs, DIMENSION UPDATEs, etc. While I do know of some people who are using Kettle without a database (think log munching and summarization), a database is something that a Kettle developer almost always has at their disposal.

Sometimes there isn’t a database. Sometimes you don’t want the slowdown of persisting to a database. Sometimes you just want Kettle to have an in-memory blackboard across transformations. Sometimes you want to ship an example to a customer using database operations but don’t want to fuss with a database install, dump files, etc.

Kettle ships with a Hypersonic (HSQLDB) driver, and therefore has the ability to create an in-memory database that does (most) everything you need database-wise.

For instance, I’ve created two sample transformations that use this in-memory database.

The first one, kettle_inprocess_database.ktr, loads data into a simple table:
[Screenshot: the kettle_inprocess_database.ktr transformation]

The second one, kettle_inprocess_database_read.ktr, reads the data back from that simple table:
[Screenshot: the kettle_inprocess_database_read.ktr transformation]

To set up the database used in both of these transformations, which has no files and is only valid for the lifetime of the JVM, I’ve used the following Kettle database connection setup.

I set up a connection named example_db using the Generic option. This is so that I have full control over the JDBC URL.
[Screenshot: Generic database connection named example_db]

I then head to the Generic tab and enter my URL and driver. Nothing special with the driver class; org.hsqldb.jdbcDriver is just the regular HSQLDB driver name. The URL is a little different than usual: it tells Hypersonic to use an in-memory database with no persistence and no data file (something along the lines of jdbc:hsqldb:mem:example_db).
[Screenshot: Generic tab with the HSQLDB driver class and in-memory URL]

Ok, that means the database “example_db” should be set up for the transformations.

Remember, there is NOTHING persistent about this database. That means every time I start Kettle, the database will have no tables, no rows, no nothing. Some steps to run through this example:

  1. Open kettle_inprocess_database. “Test” the example_db connection to ensure that you have set up the in-memory database correctly.
  2. Remember, there is nothing in the database, so we have to create our table. In the table output step (the one that writes to testingtable), hit the SQL button at the bottom of the editor to generate the DDL for this simple table (a sketch of what that generates is below).
  3. Run kettle_inprocess_database and verify that it loaded 10 rows into testingtable.
  4. Run kettle_inprocess_database_read and verify that it is reading 10 rows from the in-memory table testingtable.
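For reference, the generated DDL and a quick sanity check might look something like the sketch below. The columns are purely illustrative; the SQL button derives the real DDL from whatever the first transformation streams into the table.

    -- Hypothetical DDL for the in-memory table; the SQL button in the
    -- table output step generates the real thing from the stream's fields.
    CREATE TABLE testingtable
    (
      id   INTEGER
    , name VARCHAR(35)
    );

    -- After running kettle_inprocess_database, this should return 10.
    SELECT COUNT(*) FROM testingtable;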

I should note that using this approach isn’t always a good idea. In particular, there are issues with memory management and thread safety, and it definitely won’t work with Kettle’s clustering features. However, it’s a simple, easy solution for some circumstances. Your mileage may vary, but ENJOY!