Archive

Archive for the ‘Uncategorized’ Category

SaaS or On Site? Who cares with Pentaho On Demand

June 8th, 2010

Pentaho launched their On Demand initiative today: Press Release.

While the launch is new, I know that Pentaho has already onboarded some customers in a quiet soft launch and the response has been very positive.  Why wouldn’t it be?  This offering is the best of both worlds, and makes purchasing a business department driven BI project easy.

SaaS BI’s key selling point (there are many small, nice to haves, but the thing that gets people to reach for their wallet) is the ability to get a solution with an almost total lack of IT involvement.  Throw in non cap-ex expenditures for the solution (4k USD / mo instead of a 30k license) and it’s a huge win.

Business users have their data (feeds, dumps, extracts, or connections to DBs), have the budget but then need the tools/expertise to get their BI system “up and running.”  SaaS and On Demand BI is a perfect fit for these customers - up and running quickly.  Where it breaks down, is that with a SaaS offering, once you go SaaS you can’t EVER go back.  You’re stuck with a solution built entirely upon a proprietary, vendor controlled software and infrastructure.

Recap:

Biggest draw of SaaS is quick easy startup without IT, and smaller monthly pay as you go
Biggest drawback of SaaS is lock in like you’ve never seen before.  Not just software, but operations as well.

That’s why I find the Pentaho On Demand BI initiative to hit the middle of the sweet (suite?) spot.  It is absolutely the best of both worlds.

Their On Demand offering allows

  • Business sponsors to get a complete BI suite up and running quickly (72 hr challenge is wicked cool) without IT involvement (or minimal).
  • Incremental, pay as you go billing.  This is huge - not sure your BI project will generate a return?  Spin up Pentaho On Demand, do the 72 hr challenge, and shop it around the demo to the users for a month or two.
  • This is what knocks it out of the park though; Once you’re done with your eval/build out you have so many options whereas with pure SaaS you don’t.  Not what the users wanted? No problem: throw it away.  Is what they want?  Keep going On Demand?  On Demand not something your IT likes -> bring it On Site.  That’s right, Pentaho and their hosting partners have designed their On Demand offering to permit shutdown and transport of the machine VMWare images in house.  Even without that, just getting your Pentaho Solution you can install Pentaho locally and run it in house with your reports/cubes/dashboards.

On Demand BI won’t be for everyone - it’s still geared towards people wanting a complete suite of BI tools, and are willing to pay for a private, secure Pentaho instance in the cloud.  SaaS BI will still be better for people who actually prefer ZERO infrastructure and for those that don’t have the budget for a private infrastructure (if you have $100 / mo for BI, this isn’t right for you).

Great news for “frustrated with IT but still wanting to build out a real, long term BI solution” analysts everywhere.  :)

Uncategorized

DynamoDB: Time Dimension table with MERGE

November 20th, 2009

So, even with my disclaimer note on the last blog, the DynamoDB developers slapped me around a bit for suggesting using a view for a Time Dimension. The Time Dimension is the most important dimension table and should be an actual table, not a view. Creating the table allows us to perform all kinds of optimizations like star joins, bitmap indexes on attributes, etc. Probably wouldn’t be that big of a deal for a tiny fact table (< 5million records) but you’ll want CREATE TABLE if you want good performance.

Good news is, that we can use our exact same table function (with fiscal year offset)

select * from table(applib.fiscal_time_dimension (2000, 1, 1, 2009, 12, 31, 3))

to populate and keep our Time Dimension TABLE up to date.

If you use a TABLE, it’s 2 steps:

  1. CREATE TABLE : “dim_time”
  2. POPULATE TABLE : “merge into dim_time”

We’ll be using another great tool in the DynamoDB / LucidDB toolkit, the MERGE statement. The MERGE statement is a logical UPSERT. It checks to see if key is already present. If it is, we UPDATE the table. If it isn’t, we INSERT it into the table. I’ll go into more detail at some point in the future as MERGE is crucial for keeping dimensions up to date.

Let’s create our Time Dimension table:

create table dim_time (
FISCAL_YEAR_END_DATE DATE
, FISCAL_YEAR_START_DATE DATE
... ABBREVIATED ...
, TIME_KEY DATE
, TIME_KEY_SEQ INTEGER
,constraint dim_time_pk primary key (day_from_julian));

NOTE: We’ve abbreviated the statements, but all the columns are used in the actual scripts. We also should add bitmap indexes on YR, MONTH, etc columns.

We’ve now got a TABLE that matches the VIEW we created in the previous blog. We’ve made day_from_julian as our PK, and we’ll use this date as our key for the MERGE statement. We can run this query as many times as we like and it will always just keep our “dim_time” table up to date.

merge into dim_time using (select * from
      table(applib.fiscal_time_dimension (2000, 1, 1, 2010, 12, 31, 3))) src
on dim_time.day_from_julian = src.day_from_julian
when matched then UPDATE set
FISCAL_YEAR_END_DATE=src.FISCAL_YEAR_END_DATE
,FISCAL_YEAR_START_DATE=src.FISCAL_YEAR_START_DATE
... ABBREVIATED ...
,TIME_KEY=src.TIME_KEY
,TIME_KEY_SEQ=src.TIME_KEY_SEQ
when not matched then INSERT
(FISCAL_YEAR_END_DATE
 , FISCAL_YEAR_START_DATE
... ABBREVIATED ...
 , TIME_KEY
 , TIME_KEY_SEQ)
values(
src.FISCAL_YEAR_END_DATE
 , src.FISCAL_YEAR_START_DATE
... ABBREVIATED ...
 , src.TIME_KEY
 , src.TIME_KEY_SEQ);

Uncategorized

What if Hot Dog vendors sold you Power Tools?

October 13th, 2009

Well, since it’s been topical, and I can never resist an urge to discuss open source licensing…

What do you buy from a Hot Dog Vendor? Hot Dogs, duh!

What do you buy from an Office Supplies Vendor? Office Supplies, duh!

What do you buy from (most) Open Source Vendors? Proprietary Software, … huh?

Let’s do this another way…

What do you get if you buy Yellow Products? Products that are Yellow, duh.

What do you get if you buy Enterprise Software Products? Products that are fit for use by Enterprises, duh!

What do you get if you buy (most) Open Source Products? A proprietary product built on an open source project, … huh?

If you want to be called a “insert term here” vendor should sell “insert term here.” Otherwise you aren’t really “vending” it, you’re just using it as part of a strategy, marketing, development method, etc. Which, in my opinion, is what open source is: A way to develop and distribute software, not what you are selling. Very few of the open source companies actually sell an “Open Source Product.” They sell a proprietary one and services built on top of a great open source project, aka Open Core.

Most “Open Core” companies should simply be defined as “Software Companies with exceptional Open Source development models.” You can not purchase an “open source product” from an Open Core company. You can purchase their proprietary product on top of the open source project, but there is no product you can buy that is “open source” from most Open Core companies.

See the difference? Product and Project are not the interchangable. Vendor and “Model/Company” are not interchangable.

Don’t get me wrong - I’m no enemy of Pentaho by any means; quite the opposite. Just last week I wholeheartedly recommended to a customer they renew their EE subscription based ONLY on the new Pentaho Analyzer (which is GREAT, btw)!  It doesn’t negate the value they sell to customers, Open Core companies still deliver exceptional value.  I don’t call in to question the validity of the Open Core model and it’s mutual benefit for those involved (as James points out consistently in his BeeKeeper).

Open Core Companies just shouldn’t be surprised when people experience cognitive dissonance when they buy a proprietary product without an open source license from an “open source vendor.”

Uncategorized

Pentaho Solutions Book

September 3rd, 2009

I wonder how many people can we get looking like this?

In all seriousness, the book looks great.  Will bring it on my next extended plane ride - already saw a brilliant diagram (Figure 9-2 on page 234) that perfectly shows PDIs “streaming” architecture.  So, Amazon apparently was trying to exceed expectations (you’ll get by 9/11) but here I have it in hand.

Will put together a more thoughtful review and post it here for those that might be on the fence about buying it.  Based on my initial look, I’m sure the book is worth well beyond its price as a practical guide for using Pentaho.

Uncategorized

Twitter is what blogging was 4 years ago?

September 2nd, 2009

I had an email from a good friend, whom I actually got to know through mutual open source interests, but also by simply discussing topics on my blog.  He writes:

Whether you like it or not, twitter is now what blogs were 3 or 4 years ago, and you need to get involved in it, Nick.

Twitter seems like a medium that while easy and fast, is ill suited to the topics I like to blog about.  Technical articles, tips, with samples and screenshots seems like it would be tough to get across on Twitter.  I tried twitter - actually, ages ago but then deleted my account almost immediately because I just thought it was downright “silly” to “text the world” 150 characters at a time.

What am I missing?  My friend quoted above is highly regarded, certainly by me.  What am I missing?  Why would I want to communicate to the world in medium without image, italics, attachments, etc?  Phrasing it more positively, what is the best part about Twitter that makes it work foregoing time spent on this blog for Tweeting instead?

Uncategorized

Encrypted Variables in PDI

August 18th, 2009

Every once in a while, I get to sound like a royal arse in front of a customer by saying something “I know” to be true about Pentaho that isn’t.  Usually, this is a REALLY good thing because it’s usually some limitation, or Gotcha that existed in the product that has magically disappeared with the latest release.  The danger of open source is that these things can change underneath you quickly, without any official fan fare and leave you looking like a total dolt at a customer site.  Bad for consultants like me who are constantly having to keep up with extraordinarily fast product development.  Good for customers because they get extraordinarily fast product development.

One of these experiences, which I was absolutely THRILLED to look like a dolt about, was

“If you use variables for database connection information, the password will be clear text in kettle.properties.” 

A huge issue for many security conscious institutions.  Customers were faced with a choice: use variables which centrally manages the connection information to a database (good thing) but then the password is clear text (bad thing).  No longer!

Our good friend Sven quietly committed this little gem nearly 18 months ago. It’s been in the product since 3.0.2!  It allows encrypted variables to be decrypted in the password field for database connections.

Let’s test it out… our goal here is to make sure we can get a string “Encrypted jasiodfjasodifjaosdifjaodfj” which is a simple encrypted version of the password to be set as a regular ole variable but then be used as the “password” of a database connection.

We have a transformation that will set the variables, and then we’ll use that variable in the next transformation.

The first one sets the variable ${ENCRYPTED_PASSWORD} from a text file.  This string would be “lifted” from a .ktr after having been saved that represents the encrypted password.

Then we use it in the next transformation and select from a database, and outputs the list of tables in the database to a text file.

Output - works like a charm! 

Customers can now have the best of both worlds.   Centralize their variables for host/user/password using variables (including, kettle.properties) and keep those passwords away from casual hackers.  I say casual because PDI is open source so in order for someone to decrypted a password they only need know Java, and know where to find PDI SVN.  :)

As always, example attached: encrypted_variables.zip

Uncategorized

PentahoFlashChart: Basics

January 8th, 2009

PentahoFlashChart is a component built for Pentaho 2.0.stable (or Enterprise Edition) that allows you to render flash charts in the User Console and Dashboards. It uses the flash library “Open Flash Chart 2.0” which is an LGPL licensed SWF. The charts are NOT available for rendering in Pentaho Reporting or Pentaho Report Designer.

Let’s first understand what we’re talking about: Consider the following example run from Bayon’s ‘demo’ server.

NOTE: This demo server is not a 100% uptime server - if you don’t see anything above don’t worry. Check back later.

A chart can be rendered via a URL:

200901081053

via the Pentaho User Console:

200901081054

via Dashboards (JSP/AJAX/CDF):

200901081055

The PentahoFlashChart component is intended to be used, like any other component, from an .xaction which is created in the Pentaho Design Studio.

You can follow download this example here: ofc_linechart.xaction

Just like any chart or report, you have to get your data for reporting. In this case, we’re getting a Relational data set from the Pentaho SampleData data source.

200901081059

We also have a “chart template” which is an XML document VERY similar (same same but different) to the pentaho chart xml definition that defines how to build the chart. It contains things like chart tile, chart type, etc. The “chart template” is really where most of the “interesting” work occurs.

<chart><chart-type>LineChart</chart-type><title>Line Chart–Product “1948 Porsche 356-A Roadster”</title><x-steps>1000</x-steps><x-max>20000</x-max><domain-title>Day</domain-title><range-title>Sold Price</range-title><y-axis><labels></labels><y-steps>1000</y-steps><y-max>10000</y-max><color>#909090</color></y-axis></chart>

Notice that this chart template has a <chart-type>LineChart</chart-type> which means that the component will output a LineChart in Open Flash Chart. The full scope of the tags, and all their functioning is beyond the scope of this intro article and it should also change. The next release of PentahoFlashChart should be 100% compatible with the Pentaho Chart XML definition. We’ll see. :)

This renders the flash fragment to the page, with a “callback” for the data file that has been added to the content repository.

Consider installing it into Pentaho 2.0.stable and playing with it. The sample solution comes with 11 Examples so it should be relatively easy to get going, even though it lacks good documentation.

200901081110

It’s great for dashboards, and produces some great looking (if not limited) charts.

Uncategorized

Beautiful Flash Charts: Part II

June 13th, 2008

So, it appears as if there was some pent up demand for great looking flash charts. The brief couple of days that my initial post on my rough integration work with Open Flash Charts I’ve had:
- 2 Pentaho Partners ask for the solution so they can start using it
- 3 Community members ask about it (including one who started but never finished a similar task)
- An existing customer decide to implement it

Cool! As an open source guy, I believe in early and often, so I’m posting my .xactions for this stuff here.

Installation Steps

  1. Have a working Sample BI Server
  2. Drop open-flash-chart-.swf into pentaho-demo/jboss/server/default/deploy/pentaho-style.war pentaho-demo/jboss/server/default/deploy/pentaho-style.war/images (nice catch in comments below)
  3. Drop flash_chart_example_bar.xaction and flash_chart_example.xaction into pentaho-solutions/samples/charts

That should you get two the sample bar chart and the sample pie chart working.

These action sequences are kind of fancy. They do a fair bit of string replacements, result set walking, etc. So, they aren’t for the casual user but if you’ve done some Pentaho stuff before you’ll be able to work your way through it.

The interesting part is really the “datacall=true” branch. The first time the action sequence is called it returns a fragment of code that contains the flash object.

<object classid=”clsid:d27cdb6e-ae6d-11cf-96b8-444553540000″ codebase=”http://fpdownload.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=8,0,0,0″ width=”600″ height=”500″ id=”graph-2″ align=”middle”><param name=”allowScriptAccess” value=”sameDomain” /> <param name=”movie” value=”/pentaho-style/images/open-flash-chart.swf?width=600&height=500&data=http%3A//localhost%3A8080/pentaho/ViewAction%3Fsolution%3Dsamples%26path%3Dcharts%26action%3Dflash_chart_example_bar.xaction%26datacall%3Dtrue” /> <param name=”quality” value=”high” /><param name=”bgcolor” value=”#FFFFFF” /> <embed src=”/pentaho-style/images/open-flash-chart.swf?width=600&height=500&data=http%3A//localhost%3A8080/pentaho/ViewAction%3Fsolution%3Dsamples%26path%3Dcharts%26action%3Dflash_chart_example_bar.xaction%26datacall%3Dtrue” quality=”high” bgcolor=”#FFFFFF” width=”600″ height=”500″ name=”open-flash-chart” align=”middle” allowScriptAccess=”sameDomain” type=”application/x-shockwave-flash” pluginspage=”http://www.macromedia.com/go/getflashplayer” /> </object>

In this fragment, the flash object is given a “datafile” location which is the same action sequence but with a datacall=true.

The datacall=”true” basically returns a text file that looks like this:

&y_min=0& &y_max=40000000& &y_steps=4& &title=Actual vs Budget by Region,{font-size:20px; color: #bcd6ff; margin:10px; background-color: #5E83BF; padding: 5px 15px 5px 15px;}& &y_legend=USD,12,#736AFF& &x_labels=Central,Eastern,Southern,Western& &x_axis_colour=#909090& &x_grid_colour=#D2D2FB& &y_axis_colour=#909090& &y_grid_colour=#D2D2FB& &bar_glass=55,#D54C78,#C31812,Actuals,12& &values=37893162,35248940,35248940,35248940& &bar_glass_2=55,#5E83BF,#424581,Budget,12& &values_2=38397600,35487861,34803861,34510067&

This text file is really what gives the flash chart it’s form, labels, and data.

Again, this is quick and dirty implementation but it’s a life saver if you need something more than the charting in the platform.

Uncategorized

Beautiful Flash Charts for Pentaho

June 11th, 2008

I’ve worked on several customer dashboards and found the charting in Pentaho to be pretty good in a lot of circumstances, but lacking for a lot of circumstances. In particular, certain shading, animations, etc aren’t supported in Pentaho charts (based on JFreeChart).

There are a bunch of Flash charting libraries, and I recently worked with a customer that was using “Open Flash Charts.” I helped them get Mondrian data streaming through to this flash charting engine. I was surprised to find that the library is open source, and is moving to LGPL (away from GPL) to ensure that people feel comfortable embedding it in their applications.

I started integrating these charting capabilities with Pentaho to see how the charts look. I was seeing some really great results. The integration was done via a fair amount of fancy Javascript/Xaction sequence stuff but this integration did not require any custom Java application work. Just Pentaho .xactions and the basic open-flash-chart.swf. I might start looking at building a small little JSP library to help with some of this.

The first one I built was a little pie chart, that has a nice animation (copied and pasted here without the dynamic .xaction stuff)

The second one I built was this beautiful bar chart, comparing actuals and budgets.

In all cases, if you’re needing some “more” from Pentaho in terms of data visualization, don’t hesitate to be in touch. This flash chart is the latest in a series of dashboards that Bayon has been building for customers.

UPDATE: I built another one for a new customer, and changed the data labels for presentation here. Having a grand time with open flash chart. This chart below is the output of an MDX query to Mondrian. The one is the metric ( a base measure ) the other is a running total.

Uncategorized

Pentaho goes GPL: A non-event

June 10th, 2008

Pentaho announced last week that their BI Platform version 2.x and onward would be released under the GPLv2 license. I’m an outspoken critic of GPL for a lot of use cases, and personally lean toward an Apache/MIT/BSD myself. However, for nearly everyone involved in Pentaho this is a non event, not that big of a deal, and good for Pentaho.

By now, if you’ve ever read anything I’ve written before about GPL for “business-eee” type projects you’re probably wondering “Has Nicholas completely sold out?” Well, I’ll leave that conclusion for another venue/time, wink wink, but there’s some very clear reasons why GPL is not a bad thing for most people involved in Pentaho.

First and foremost, is to understand what is moving to GPL. That makes a huge difference in understanding the impact. It is only the BI Platform technologies that are going GPL and the core libraries (Reporting, Kettle, Mondrian, …) are remaining under their original (ie, somewhat permissive) licensing. The things that are being GPL’ed are the things that end users are using. For instance, the ability to navigate through a set of reports. Run reports with parameters, etc. This is the code that makes the Pentaho core technologies (OLAP/ETL/Reporting) look and feel like a full product with login screens, UIs, run scheduling, etc.

The other piece to mention is that GPL only really affects ISV/OEMs.

For end users (even SaaS providers) it makes no difference GPLv2 vs MPL. So, if you’re considering downloading Pentaho to start a project at your company for your own intranet, extranet, BI, dashboards, etc this will have NO affect on you.

One of my beefs with the GPL has always been that it stunts adoption and the ability for multiple parties to work on the project, embed and utilize it in a commercial venture. The core libraries remain in tact in this regard - Mondrian can be embedded just as easily as it was originally because it’s license remains unaffected. Kettle can as well (LGPL). Pentaho Reporting - good to go too. The Platform as a set of UI (and productized versions of the core libraries) will be, in my opinion, cast aside for anyone wanting to embed these technologies into their own product.

The license will now be a big contributor to this decision, but to be truthful, if you want to “just use” Mondrian then you’re BETTER OFF by “just using” Mondrian. If you want Mondrian in conjunction with Reporting now you’ll want to consider the Platform but my experience shows that if you’re using these technologies in your application using the core applications/interfaces is preferable. The platform makes the projects work for end customers, but the platform is kind of “a lot” for someone who just wants to execute some ETL jobs or use JPivot/Mondrian in their application. That’s not to say that ISV/OEMs shouldn’t reach out to Pentaho to still get OEM support on embedding “just Mondrian” into their application. Pentaho’s subscription and services are quite valuable in this regard - I can think of no better group of people to help make a project successful then the people who wrote it.

It’s not clear to me whether or not Pentaho Metadata will be GPL. When I was working at Pentaho I advocated strongly against GPL for it, because I believed that done correctly the project could become *the* metadata editor/infrastructure for just about any new Open Source or proprietary project. For a variety of reasons, this hasn’t happened. GPL, in my opinion, ensures that Pentaho’s Metadata project will remain solely and simply that: Pentahos Metadata project. I don’t think they’ll be any other salient, significant contributor if it goes GPL. However, it’s not a big loss to Pentaho since there has been hardly any (have there been any?) contributions to that project to date anyhow.

GPL, should it provide Pentaho more “protection” on the Platform code so that it can not be ISV/OEM’ed without payments, could end up benefitting most everyone. Why? Because should Pentaho feel like it’s able to monetize the open source edition consistently, there is less need to keep more in the professional edition. If GPL provides additional cover, I’d hope to see more code flying into the Open Source (GPL) edition of the product. However, I’ve not heard anything about this from Pentaho and only time will tell. :)
There you have it.

GPL makes pretty much no difference to end users, customers, SaaS providers, etc. It pretty much makes no difference to ISV/OEMs because they’ll want to embed the core libraries, not necessarily the entire platform. Pentaho remains a strong choice in every regard; customers are signing up in droves, the value is immense.

It is, for all intensive purposes, a non event.

Uncategorized