Category Archives: Open Source

New open source project: OWBScripts

I hadn’t had a chance to post yet, but Mark made mention of it on his blog so I figure it’s about time to post about it.

OMB is the Tcl-based scripting language that comes with Oracle Warehouse Builder; it allows you to do OWB “things” programmatically (ie, without the GUI).  It is very useful for doing ETL generation, mass updates, deploying mappings, etc.  Basically, anything that you are doing repetitively is a good candidate for making into an OMB script.  OMB is a cure for “tennis elbow” from clicking for hours on end in the OWB GUI.

I’ve released a handful of OMB scripts that I used on consulting gigs, presentations, articles, etc.  There is nothing spectacular here, but hey, they’re not doing me any good!  If just one or two people find them useful it was worth the time to slap the Apache 2.0 license and upload them to http://sourceforge.net.

The release (initial and only, unless someone else out there wishes to take on the management/augmentation) includes scripts to:

a) Generate base SOURCE to STAGING Truncate/Staging mappings and tables.
b) Generate base STAGING to WAREHOUSE Insert/Update mappings, tables, and sequences.
c) Install repository and the standard CIF targets (Staging, Warehouse, AreaMart).

Let me know what you think and I do hope someone, somewhere finds it useful!
PS – I haven’t used OWB for nearly 9 months.  For something I used day in day out for YEARS that’s a long time to have not even touched it!

Pentaho tops 4.4 Million USD in open source code

Startups are interesting: Some days you love your job, other days you want to throw yourself out the window.  Yesterday I hated my job, for a variety of reasons.  One of the things that cheers me up when I’m in the midst of some tough stuff is looking at the big picture.

To date, Pentaho has built more than 313,981 lines of open source code.  That’s an estimated 81 person years.  At 55,000 USD / year for a developer, that equates to roughly 4,400,000 USD of “code” built and released under a business friendly, OSI approved, open source license (MPL).
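For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The 81 person years, the 55,000 USD rate, and the line count are the figures quoted above; the function and constant names are mine and purely illustrative. The exact product is 4,455,000 USD, which is where the "about 4.4 million" comes from.

```python
# Figures quoted in the post; the yearly rate is the post's assumed developer cost.
PERSON_YEARS = 81
USD_PER_DEVELOPER_YEAR = 55_000
LINES_OF_CODE = 313_981


def estimated_cost_usd(person_years: int, usd_per_year: int) -> int:
    """Estimated cost of the code: effort multiplied by yearly developer cost."""
    return person_years * usd_per_year


total = estimated_cost_usd(PERSON_YEARS, USD_PER_DEVELOPER_YEAR)
cost_per_line = total / LINES_OF_CODE

print(f"Estimated value: {total:,} USD")
print(f"Cost per line of code: {cost_per_line:.2f} USD")
```

The per-line figure is only a derived curiosity; it inherits every assumption baked into the person-year estimate.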

WOW!

We put the vast majority of our stuff into the open source project (more than 80%); it’s a complete product in and of itself and that’s something I personally am proud of.  I’ve added the “OHLOH” badge for Pentaho to the upper right hand corner so there’s a ticker on this page to keep track of the breadth, size, and investment in the open source edition of Pentaho.

Incidentally, the metrics are calculated by a very cool upstart ohloh.  They slurp data from source control systems and display cool metrics about projects, like ours.  Check them out!

Open Source has a little secret: Exhibit B

UPDATE: I’ve submitted the WPL to OSI for approval. It’s a proxy for the Exhibit B licenses used by the companies listed below. We’ll find out soon enough if the OSI believes Exhibit B meets OSD.
UPDATE: OSI has refused to approve the WPL; not because it has actually been vetted to OSD but because the OSI does not want to consider a license from anyone except the original license author. That means, like now, the FOSS community won’t know if Exhibit B companies are actually releasing open source.
UPDATE: Someone told me that Mulesource is also using this. Added them to the list below.
UPDATE: It’s like a bad dream. Dimdim doesn’t care about “Open Source” either. Added them to the list below.
First of all, and no tongue in cheek, I’d like to attribute this as a follow on and continuation of the debate on attribution licenses started on “AC/OS.” In that spirit, I hope this is a “distribution” of that debate, and not a “fork.”

Second, I have nothing but respect for the principals at the companies that are using Exhibit B. I take issue with the substance of their license; nothing more. From my use/evaluation/understanding their products are excellent. I have a sincere desire for them to be successful; I just believe they need to do it while adhering to the same principles adhered to by the majority of the open source community.

What is Exhibit B?

It’s a clause appended to the Mozilla Public License by some open source startups (listed below). In this blog we’ll consider a fictitious version of this license from “WhizbangAppCompany.”

What Exactly is Exhibit B?

It’s the second clause of a two clause addition to the Mozilla Public License that basically states:
a) You must include on each UI screen a tagline or logo reading “Powered by WHIZBANGAPPCOMPANY.”
b) You have no right to use the trademark WHIZBANGAPPCOMPANY even if it’s included in the UI.

Here’s the Exhibit B text from our WPL (this is just a copy and replace on actual Exhibit Bs).

I’ve copied and pasted one here for reference:

WhizbangAppCompany Public License 1.0 – Exhibit B

Additional Terms applicable to the WhizbangAppCompany Public License.

I. Effect.

These additional terms described in this WhizbangAppCompany Public License – Additional Terms shall apply to the Covered Code under this License.

II. WhizbangAppCompany and logo.

This License does not grant any rights to use the trademarks “WhizbangAppCompany” and the “WhizbangAppCompany” logos even if such marks are included in the Original Code or Modifications.

However, in addition to the other notice obligations, all copies of the Covered Code in Executable and Source Code form distributed must, as a form of attribution of the original author, include on each user interface screen (i) the “WhizbangAppCompany Community” logo, (ii) the vendor disclaimer “Supplied free of charge with no support, no certification, no maintenance, no warranty and no indemnity by WhizbangAppCompany or its certified partners. Click here for support. And certified Versions” and (iii) the copyright notice in the same form as the latest version of the Covered Code distributed by WhizbangAppCompany at the time of distribution of such copy. In addition, the “WhizbangAppCompany Community” logo and vendor disclaimer must be visible to all users and be located at the very bottom left of each user interface screen. Notwithstanding the above, the dimensions of the “WhizbangAppCompany Community” logo must be at least 176 x 26 pixels. When users click on the “WhizbangAppCompany Community” logo it must direct them back to http://www.whizbangappcompany.com. When users click on the vendor disclaimer it must direct them to http://www.whizbangappcompany.com. In addition, the copyright notice must remain visible to all users at all times at the bottom of the user interface screen. When users click on the copyright notice, it must direct them back to http://www.whizbangappcompany.com.

What does that actually mean?

There are a lot of implications… Suffice it to say it means A LOT, because it’s the difference between meeting the definition of Open Source (OSI approved) and not meeting the definition of Open Source (not OSI approved). The Exhibit B license is currently being evaluated, so whether these companies are actually releasing open source code is still in question.

Implication One: What the fork?!?

A long term “litmus” test for user rights re: open source, is to not be bound to one company or organization. Open Source must be able to fork, even though it’s often undesirable.

I’ll reiterate a scenario I posted earlier on how this could turn out really badly for customers “thinking” they have the benefit of open source when they implement and purchase services:

2007 – WhizbangAppCompany (company and project) flourishes. Acquires 1000 customers on the premise of Open Source.
2007 – WhizbangAppCompany (company) bought by big mean company where products go to die.
2008 – WhizbangAppCompany users and customers are unhappy. Partners, users, developers, customers are relieved they are using “open source.”
2008 – Coalition of users, customers, and developers “fork” and a new company is formed “EmailRulez”
2008 – EmailRulez screwed, customers LOCKED IN. Can’t remove references to WhizbangAppCompany (can’t remove from UI), but are threatened by large big mean company for Trademark infringement for distributing a product with WhizbangAppCompany trademark.

Probably the primary reason this doesn’t meet the open source definition is that a royalty or other fee (trademark) can be enforced against anyone who uses or distributes this product. Would these companies actually do this? Probably not, but they CAN. Exhibit B was conceived (in part) to prevent a fork; damned if you do (break license to remove trademarks), damned if you don’t (use a Trademark you don’t have a license for).

Only an attorney could put those two terms (can’t remove trademark, and you can’t use trademark) next to each other and take them seriously. No offense to attorneys who would recognize these two opposing stipulations and ring the “common sense” bell.

Implication Two: I’ve got that Exhibit B thing going around.

Exhibit B is MORE VIRAL than GPL. This has profound implications. Take for instance, a scenario, again, outlined on the original blog:

WhizbangAppCompany code is used as an integration/data transport engine in another open source project that does data profiling (data quality). WhizbangAppCompany code makes up approximately 5% of the code of that project. According to the LICENSE, intentions matter NOT at all (they are tough to put into a license anyway; consider the long debate on derivative works). This project now has to SLAP WhizbangAppCompany on every UI on every screen. Now this data quality project must use the WhizbangAppCompany trademark and has no right to use the trademark.

Consider the implications: No matter what proportion of code you use, whether or not you even USE the project’s UI code (perhaps you used one of their libraries), you are now OBLIGED to place their “Powered by WhizbangAppCompany” on every UI screen in your application. You may not have to release your source (GPL) but now every product/project/mashup/integration/etc must have on EACH UI SCREEN the attribution.

Implication Three: Swing and a miss!

Right or wrong, this doesn’t even close the ASP Loophole.
The ASP loophole has long been discussed; smart web 2.0 and web companies use open source and benefit immensely, but don’t trigger the GPL terms that would force them to contribute back. Ok, fine. Its ramifications are, I believe, still being determined as part of GPLv3 (fact check, can someone add clarity to this?).

Developers have long opined about how they want the Googles and Yahoos and Web 2.0 companies using and modifying their code to contribute that code back. Exhibit B forces those companies to place a trademark on their screen but STILL DOESN’T FORCE THEM TO RELEASE THE CODE. These companies are taking a dig at ASPs to get the code but don’t actually “write something” to get the code; just money for trademark licensing.

Implication Four: What happened to that “freedom” stuff?

Customers don’t have the freedom to make the code their own, even in a good old fashioned, behind the firewall project: building an intranet by mashing up 5 open source projects into an internal “Asset Tracking System” or a “Conference Room Scheduling” system.

Again, a scenario outlined on the original blog:

Joe just implemented the “community” version of WhizbangAppCompany. His managers invested 6 months of his time to build out this project, and he’s ready to go roll it into the corporate intranet. The corporate intranet, in which this product will be embedded, has its own UI. Joe has to remove the trademarks to “deploy” his application but… Joe can’t deploy to his portal/intranet without getting code under a commercial license.

Exhibit B doesn’t “trigger” on some sort of distribution clause; it’s ALWAYS there. Is everyone listening? Customers (end users, not making any money off selling any services/products, good ole fashioned support-yourself community customers) are violating the license if they do not place the Attribution on EACH UI screen in their application.

The pragmatist in me knows these companies wouldn’t enforce this; communities are their lifeblood. I’m just saying that according to the LICENSE, if Joe doesn’t put Powered by WhizbangAppCompany on every UI screen on his portal application he’s violated the LICENSE.

Common enterprise intranets, portals, and applications are aggregations of several 10s if not 100s of open source projects. XML Parsers, security implementations, regex libraries, jsp libraries, etc etc. It’s part of how we work; do something defined, do it well, and play nicely with others. Where would open source be if all these companies and individuals believed they were special enough to get attribution on the UI? This is a little dramatic but it makes the point:

(speaking of attribution, this is a Web 2.0 logos image not FOSS logos done by stablio-boss. View more of his work here)

If all those that came *before* (apache, xerces, hibernate, jboss, etc etc) these companies believed they needed UI attribution OR if OSI allows this UI attribution this screen COULD ACTUALLY BE REQUIRED. What happens when you allow people to dictate the use of this so called “free software?” You lose some of that magic “freedom ingredient,” yes?

Implication Five: Errr…. Big difference!

Calling it Mozilla causes a HUGE credibility gap. Learning the many open source licenses is tough; the reason we reuse licenses is so that we can quickly understand implications. Projects (and the companies with services supporting those projects) are often selected based on their license. GPL, Apache, BSD, LGPL, Mozilla. All known quantities, vetted by pundits, attorneys, industry. Claiming to be one of these when you’re not is dressing a wolf in lamb’s clothing.

Customers don’t know the actual bits arrive with Exhibit B when the advertisements at Sourceforge explicitly say Mozilla Public License 1.1 (NOTE: there is an option for “Custom License” so it’s not because they couldn’t pick another license type; they CHOSE to say they are Mozilla).

Remedy:

Work within the open source framework instead of “protecting your IP” with a cleverly disguised license. Apache, Eclipse, IBM, HP, Redhat, Oracle, JBoss, Sun have much varied stakes in Open Source; they’ve found licenses that meet the definition of open source.

Remember the vision and value you sell to customers: it’s not about the software license, the bits. It’s about the value, innovation, and service that comes from it. Protect your brand, not your IP or code. Each of these companies is a clear leader in the space they’re building; embrace that. You can be the Redhat of whatever. Redhat does just fine, even with CentOS and WhiteBox and all the other variants.

It’s not the BITs that matter. Get over it.

Users of Exhibit B companies (listed below)

  • Read the license, interpret yourself.
  • Better yet, since it’s not MPL and isn’t “well known” have your attorneys review it.
  • Ask your organization if it is willing to accept the “Powered by XYZ” on every application which uses any portion of that code (or pay $$ to remove it).
  • Ask your organization if it is willing to accept a license that is not OSI certified and consider it open source.
  • Ask the COMPANY: Why use a license that doesn’t meet the definition of OS? Why isn’t your license OSI certified?
  • Ask the COMPANY: Why not use regular MPL?
  • Ask the COMPANY: Why they think they need to change the definition of OS for their business?
  • Ask the COMPANY to use an OSI approved license.
  • Tell the OSI you are concerned about the implications of Exhibit B on open source.
  • Tell your friends… They may not know.

What’s the Net Net:

Assuming these companies drop their Exhibit B’s and become OSI certified I should say we all should applaud them for being responsible, open source community members, and valuable economic factors in our movement. Until then we should be honest and say they are not open source; community source, shared source, available source, public source, whatevernameyouwant source. Call a spade a spade, and there are definitions for that reason.

References:
OSI = Open Source Initiative http://www.opensource.org
OSD = Open Source Definition http://www.opensource.org/docs/definition.php
MPL = Mozilla Public License http://www.opensource.org/licenses/mozilla1.1.php
WPL = wget -O - http://dev.alfresco.com/legal/licensing/apl.txt | sed -e 's/[Aa]lfresco/WhizbangApplicationCompany/g' > wpl.txt

Exhibit B companies and Exhibit B’s:
Alfresco, SugarCRM, Zimbra, Jitterbit, MuleSource, DimDim

DISCLAIMER: These words are ENTIRELY my own. They in no way reflect my employer’s beliefs, nor should they be construed to speak for my employer in any way!

Sales Percent increase month to month, qtr to qtr

This is a common situation:  Don’t show me what my total sales figures were month after month, show me something that describes something important to my business.  ie, Sales Growth

Chris Webb, who runs a wildly popular MSFT blog in addition to being an in demand independent consultant, wrote an article on Previous Period Growth using Pentaho.  Mondrian (Pentaho Analysis Server) uses MDX, a powerful, expressive multidimensional query language, and Chris is one of the leading experts on its practical use and applications.

Chris outlines how to build a “custom” calculated measure that displays the Sales Previous Period Growth:

All you need is the zero install pentaho demo installation to run through his tech tip, available at http://www.pentaho.org/download/latest.php

Remember, this isn’t trivial (ie, writing MDX fragments) but it’s VERY VERY powerful.  Check out the Mondrian MDX reference here for some of the powerful analytic calculations available.  Remember, once you’ve got your MDX member working properly HIDE that complexity from your users by adding it to the Mondrian OLAP schema definition.
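
The calculation behind a previous period growth measure is simple, even if the MDX is not. As a rough sketch in Python (this is not Chris's MDX and not anything Mondrian runs; the function name and the sales figures are made up for illustration), growth is the change from the prior period divided by the prior period:

```python
from typing import List, Optional


def previous_period_growth(sales: List[float]) -> List[Optional[float]]:
    """Percent growth of each period relative to the period before it.

    The first period has no predecessor, and a previous period of zero
    makes the ratio undefined, so both cases yield None.
    """
    growth: List[Optional[float]] = []
    for i, current in enumerate(sales):
        if i == 0 or sales[i - 1] == 0:
            growth.append(None)
        else:
            previous = sales[i - 1]
            growth.append((current - previous) * 100.0 / previous)
    return growth


# Made-up monthly sales totals, oldest first.
monthly_sales = [100.0, 110.0, 99.0]
monthly_growth = previous_period_growth(monthly_sales)
print(monthly_growth)
```

The same function works for quarter to quarter figures; only the granularity of the input list changes. In an OLAP tool the "previous period" lookup is done along the time dimension rather than by list index.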

Sydney Training and Community Feedback

I had the recent good fortune of traveling to Sydney to deliver a “much sought after” scheduling of our “Building Analytic Solutions with Pentaho” class.  We did little advertising but it was packed (12 people, the maximum we ever do for public classes).

I love doing training courses for more advanced topics, like the Analytic solutions course.  I love it because it’s a chance to converse with other practitioners and share knowledge, experience, and war stories.  These experiences and the camaraderie are invaluable when one tends to work on the “lesser known” topics at an organization.  It’s GREAT to hear about open source adoption in the enterprise; stories of countless millions being saved, people feeling empowered to make their infrastructure and applications what THEY want instead of what their VENDORS want.  It’s just nice to connect with people of similar interests.

It’s also a chance to hear some validation for strong points and deficiencies in Pentaho’s open source strategy.  I have my own opinions, as someone who uses the software day in day out on real customer problems.  It’s great to hear that others either feel the same way or disagree; because that’s the nature of this community driven process.  It doesn’t really matter what I think the product should be like (I work for the vendor right?) it matters what customers and community want.  I think feature X is awful, doesn’t work properly and is total crap.  OK.  If community members find it entirely suitable for their needs, and say “Go work on feature Y” then that’s PERFECT.

This is the most efficient part of open source:  The closer you are to your customer, the closer you are to your market, the closer you are to the pain or joy, the more likely you are to make a better product.  That means cutting out the middle men (in many cases, account managers and product managers and development managers, etc).

Thank you, Sydney trainees for sharing your praises and criticisms.  I’ll bring them to those that can actually do something about it (ie, Java Jockeys). 

PS – Based on the training, people like more of our product than they dislike AND I was right about Feature X.  🙂

Last time Windows crashes on me

End of last week, Windows was kind enough to give me the annual “Blue Screen of Somehow I Screwed Up My Own Internals I Hope You Weren’t Doing Any Real Important Work Because You’ll Have to Reinstall the Operating System of Death.”  Gasp.

We’ve all been there.  What really bugged me is that when it happened, I sighed and just thought to myself that this is “the price of computing.”  This had become normal and acceptable to me… Then I shook myself a bit and became determined to rid myself, as much as possible, of the OS from Redmond.  No offense; I love Excel, think there’s some great usability in there, but it’s just not my cup of tea.

Eventually I’ll end up with a Macbook Pro; I feel the call of the siren as much as anybody.  Until then, I’m on Suse 10.1 desktop and so far I’m quite pleased. 

I’ll blog again later on about the specifics of the setup, but I’ll just say that the XGL desktop is both wicked COOL and very functional.

Donating to Open Source: Gratitude

A while back I blogged about gratitude and generosity, which was mostly about how it made ME feel when I was experiencing those feelings in times of change and growth.  What’s the flip side of that coin, or the other end of that stick, or whatever metaphor you want to use?  How does expressing gratitude to others for what they do feel?

Apparently pretty good; or at least good enough to respond with some very kind, personal notes of thanks.  A few weeks back I realized that I use two open source projects that provide exceptional products.  Truly, they’ve transcended the open source motto of "the code is the documentation and RTFM if there were one" and have created wonderful, easy to use products.  I realized that I had not given these people anything in return (I never encountered any bugs/etc to submit patches for!).

I donated, via their website instructions, to a CYGWIN developer and Gallery.  I received personal notes of thanks, expressing real gratitude.  It wasn’t for the money either (I donated 25 USD to each developer) but more in recognition of their contribution.  I get this.  If someone (me) is willing to pay someone whom they’ve never met before, willing to seek out the method (donation pages and paypal hoops), and willing to part with real money while under absolutely NO OBLIGATION or expectation to do so, it means that I think they did a great job.

Well they have! 

Have you ever considered donating to an open source project?  What open source projects do you get value from?  Consider dropping them $20 and see how good it makes them AND you feel!  I bet you’ll feel better giving $20 to the Apache foundation than paying your next enterprise software bill.

Kettle and Pentaho: 1+1=3

Like all great open source products, Pentaho Data Integration (Kettle) is a functional product in and of itself.  It has a very productive UI and delivers exceptional value as a tool in and of itself.  Most pieces of the Pentaho platform reflect a desire to keep the large communities around the original projects (Mondrian, JFree, etc) engaged; they are complete components in and of themselves.

When used together, their value as it relates to building solutions increases and exceeds that of their use independently.  I’ll be the first to admit that Pentaho is still fairly technical, but we’re rapidly building more and more graphical interfaces and usability features on top of the platform (many in the open source edition, but much is in the professional edition).  Much of this work involves making the "whole" (Pentaho) work together to exceed the value of the pieces (Mondrian, Kettle, JFree, …).

A few reasons immediately come to mind why Pentaho and Kettle together provide exceptional value compared to being used individually or with another open source reporting library:

  1. Pentaho abstracts data access (optionally) from report generation which gives report developers the full POWER of Kettle for building reports.

    There are some things that are tough, if not downright impossible, to do in SQL.  Ever do an HTTP retrieval of an XML doc, slurp in a custom lookup from Excel, do a few database joins and analytical calculations in a SQL statement?  I bet not.  Report developers are smart data dudes; having access to a tool that allows them to sort/pivot/group/aggregate/lookup/iterate (the list goes on and on) empowers report developers in a way that a simple "JDBC" or "CSV" or "XQuery" alone cannot.
    How is this made possible?
    Pentaho abstracts (optionally, it isn’t forced on customers) the data retrievals to lookup components.  This allows BI developers to use either a SQL lookup (DB), XQuery lookup (XML), MDXLookup (OLAP), or Kettle lookup (EII) to populate a "ResultSet."  Here’s the beauty: reports are generated off a result set instead of directly accessing the sources.  This means that a user can use the same reporting templates, framework, designer, etc and feed/calculate data from wherever they desire.  It truly opens a world of possibility where before there was "just SQL" or "ETL into DB tables."

  2. Ability to manage the entire solution in one place

    Pentaho has invested greatly in the idea of the solution being a set of "things" that make up your BI, reporting, DW solution.  This means you don’t have ETL in one repository, reports managed somewhere else, scheduling managed by a third party, etc.  It’s open source so that’s obviously a choice, but we can add much value by ensuring that someone who has to transform data, schedule that, email and monitor, secure, build reports, administer email bursting, etc can do so from one "solution repository." Managing an entire BI solution from one CVS repository?  Now that’s COOL (merge diff/patch anyone?).

  3. Configuration Management

    Kettle is quite flexible; the 2.3.0 release extends the scope and locations where you can use variable substitution.  From a practical standpoint this means that an entire Chef job can be parameterized and called from a Pentaho action sequence.  For instance, because you can do your DW load from inside Pentaho action sequences that means you can secure it, schedule it, monitor it, initiate it from an outside workflow via web service, etc.  In one of my recent Kettle solutions ALL OF THE PHYSICAL database, file, and security information was managed by Pentaho so the Kettle mappings can literally be moved from place to place and work inside of Pentaho. 

  4. Metadata and Additional Integration

    Pentaho is investing in making the tools more seamless.  In practice (this is not a roadmap or product direction statement) this means being able to interact with tables, connections, and business views inside of Kettle in an identical (or at least similar) way to how you do in the report designer.  For example, if you’ve defined the business name for a column to be "Actual Sales" Kettle and the Report Designer can now key off that same metadata and present a "consistent" view to the report/ETL developer instead of requiring them to know that "ACT_SL_STD_CURR" is actual sales.
    Another example is the plan to do some additional Mondrian/Kettle integration to make the building of Dimensions, Cubes, and Aggregates easier.
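
To make point 1 a little more concrete, here is a loose illustration of the "reports are generated off a result set" idea. This is a Python sketch, not Pentaho's actual API or component model; every name in it (`sql_lookup`, `csv_lookup`, `render_report`, the `sales` table) is hypothetical, and the CSV lookup simply stands in for any non-SQL source such as a Kettle transformation.

```python
import csv
import io
import sqlite3
from typing import Dict, List

# A "result set" here is just a list of rows; each row maps column name to value.
ResultSet = List[Dict[str, object]]


def sql_lookup(conn: sqlite3.Connection, query: str) -> ResultSet:
    """Populate a result set from a SQL query."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def csv_lookup(text: str) -> ResultSet:
    """Populate a result set from CSV text (a stand-in for a non-SQL source)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def render_report(rows: ResultSet) -> str:
    """The 'report template': it only ever sees rows, never the source."""
    return "\n".join(f"{row['region']}: {row['sales']}" for row in rows)


# Same report, two different lookups feeding it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, sales INTEGER)")
conn.execute("INSERT INTO sales VALUES ('East', 100)")

from_sql = render_report(sql_lookup(conn, "SELECT region, sales FROM sales"))
from_csv = render_report(csv_lookup("region,sales\nEast,100\n"))
print(from_sql)
print(from_csv)
```

The design point is that `render_report` never changes when the data source does; swapping the lookup is the only edit needed.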

Use Open Source BI: Win a Mac Mini

Tomorrow MySQL and Pentaho are presenting on how MySQL and Pentaho can work together to deliver exceptional value when used in combination to solve Business Intelligence and Reporting business challenges.

I’ve been working more and more with MySQL over the past couple of months since joining Pentaho and I’m pleasantly surprised.  On the good side of the 80/20 rule (ie, 80% of users want 20% of the features) it’s exceptionally "good enough" for things that I want to do.

Back to the tagline.

Tomorrow, Pentaho is highlighting our desire to be as easy for MySQL users to use as MySQL is.  We want to understand how to make it increasingly easy to use Pentaho with MySQL.  In return for providing Pentaho with much needed feedback on ease of use and the user experience for installation/configuration, Pentaho is giving away a Mac Mini.  It’s no iPod, thank heavens, as everyone is giving those away these days.

10am PT, 1pm ET in the US.  Register and dial in here.  Read the press release here.

JBoss and Redhat officially wed

In a JBoss community email today:

I am writing to you today to announce that the Red Hat acquisition of JBoss has closed and we now are officially a part of the largest independent open source company. I am excited about this news and the great opportunity that it represents. We are entering a new era in the technology industry that puts customers back in charge of their destiny, where innovation and value replace lock-in and costly proprietary-vendor software licenses. Together, we believe we can change the economics of the industry, delivering unmatched value to our customers and partners by creating better software faster, systematically driving down costs and simplifying IT.

Some news on it here, here, and here as well.

That makes Redhat the largest independent Open Source company in the world. Cool.