Archive for the ‘Open Source’ Category

LucidDB has a new Logo/Mascot

September 2nd, 2010

At yesterday’s Eigenbase Developer Meetup at SQLstream’s offices in San Francisco we arrived at a new logo for LucidDB. DynamoBI is thrilled to have supported and funded the design contest to arrive at our new mascot. Over the coming months you’ll see the logo make its way out to the existing luciddb.org sites, wiki sites, etc. I’m really happy to have a logo that matches the nature of our database - BAD ASS!

DynamoBI, Open Source

DynamoDB: Built-in Time Dimension support!

November 20th, 2009

DynamoDB (aka LucidDB) is not just another column store database. Our goal is to be the best database for actually doing Business Intelligence; while that means being fast and handling large amounts of data, there are a lot of other things BI consultants/developers need. I’ll continue to post about some of the great BI features that DynamoDB has for the modern datasmith.

The first feature to cover, and it’s dead easy, is the built-in ability to generate a time dimension, including Fiscal Calendar attributes. If you’re using Mondrian (or, come to that, your own custom SQL on a star schema) you need to have a time dimension. Time is the most important dimension! Every OLAP model I’ve ever built uses one! It’s something that you, as a datasmith, will need to do with every project; that’s why we’ve built it right into our database.

Here’s a dead simple way to create a fully baked, ready to use Time Dimension to use with Mondrian.

-- Create a view that is our time dimension for 10 years, with our
-- Fiscal calendar starting in March (3)
create view dim_time as select * from
table(applib.fiscal_time_dimension (2000, 1, 1, 2009, 12, 31, 3));

OK, that’s it. You’ve created a Time Dimension! * see NOTE at end of post.

So, we’ve created our time dimension, complete with a Fiscal calendar for 10 years in a single statement! Awesome - but what does it contain?

-- Structure of new time dimension
select "TABLE_NAME", "COLUMN_NAME", "DATATYPE" from sys_root.dba_columns
where table_name = 'DIM_TIME';
+-------------+---------------------------------+-----------+
| TABLE_NAME  |           COLUMN_NAME           | DATATYPE  |
+-------------+---------------------------------+-----------+
| DIM_TIME    | FISCAL_YEAR_END_DATE            | DATE      |
| DIM_TIME    | FISCAL_YEAR_START_DATE          | DATE      |
| DIM_TIME    | FISCAL_QUARTER_NUMBER_IN_YEAR   | INTEGER   |
| DIM_TIME    | FISCAL_QUARTER_END_DATE         | DATE      |
| DIM_TIME    | FISCAL_QUARTER_START_DATE       | DATE      |
| DIM_TIME    | FISCAL_MONTH_NUMBER_IN_YEAR     | INTEGER   |
| DIM_TIME    | FISCAL_MONTH_NUMBER_IN_QUARTER  | INTEGER   |
| DIM_TIME    | FISCAL_MONTH_END_DATE           | DATE      |
| DIM_TIME    | FISCAL_MONTH_START_DATE         | DATE      |
| DIM_TIME    | FISCAL_WEEK_NUMBER_IN_YEAR      | INTEGER   |
| DIM_TIME    | FISCAL_WEEK_NUMBER_IN_QUARTER   | INTEGER   |
| DIM_TIME    | FISCAL_WEEK_NUMBER_IN_MONTH     | INTEGER   |
| DIM_TIME    | FISCAL_WEEK_END_DATE            | DATE      |
| DIM_TIME    | FISCAL_WEEK_START_DATE          | DATE      |
| DIM_TIME    | FISCAL_DAY_NUMBER_IN_YEAR       | INTEGER   |
| DIM_TIME    | FISCAL_DAY_NUMBER_IN_QUARTER    | INTEGER   |
| DIM_TIME    | FISCAL_YEAR                     | INTEGER   |
| DIM_TIME    | YEAR_END_DATE                   | DATE      |
| DIM_TIME    | YEAR_START_DATE                 | DATE      |
| DIM_TIME    | QUARTER_END_DATE                | DATE      |
| DIM_TIME    | QUARTER_START_DATE              | DATE      |
| DIM_TIME    | MONTH_END_DATE                  | DATE      |
| DIM_TIME    | MONTH_START_DATE                | DATE      |
| DIM_TIME    | WEEK_END_DATE                   | DATE      |
| DIM_TIME    | WEEK_START_DATE                 | DATE      |
| DIM_TIME    | CALENDAR_QUARTER                | VARCHAR   |
| DIM_TIME    | YR                              | INTEGER   |
| DIM_TIME    | QUARTER                         | INTEGER   |
| DIM_TIME    | MONTH_NUMBER_OVERALL            | INTEGER   |
| DIM_TIME    | MONTH_NUMBER_IN_YEAR            | INTEGER   |
| DIM_TIME    | MONTH_NUMBER_IN_QUARTER         | INTEGER   |
| DIM_TIME    | MONTH_NAME                      | VARCHAR   |
| DIM_TIME    | WEEK_NUMBER_OVERALL             | INTEGER   |
| DIM_TIME    | WEEK_NUMBER_IN_YEAR             | INTEGER   |
| DIM_TIME    | WEEK_NUMBER_IN_QUARTER          | INTEGER   |
| DIM_TIME    | WEEK_NUMBER_IN_MONTH            | INTEGER   |
| DIM_TIME    | DAY_FROM_JULIAN                 | INTEGER   |
| DIM_TIME    | DAY_NUMBER_OVERALL              | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_YEAR              | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_QUARTER           | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_MONTH             | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_WEEK              | INTEGER   |
| DIM_TIME    | WEEKEND                         | VARCHAR   |
| DIM_TIME    | DAY_OF_WEEK                     | VARCHAR   |
| DIM_TIME    | TIME_KEY                        | DATE      |
| DIM_TIME    | TIME_KEY_SEQ                    | INTEGER   |
+-------------+---------------------------------+-----------+

-- Let's look at a few rows
select time_key_seq, time_key, yr, month_number_in_year, fiscal_year
, fiscal_month_number_in_year from dim_time;
+---------------+-------------+-------+-----------------------+--------------+------------------------------+
| TIME_KEY_SEQ  |  TIME_KEY   |  YR   | MONTH_NUMBER_IN_YEAR  | FISCAL_YEAR  | FISCAL_MONTH_NUMBER_IN_YEAR  |
+---------------+-------------+-------+-----------------------+--------------+------------------------------+
| 1             | 2000-01-01  | 2000  | 1                     | 2000         | 11                           |
| 2             | 2000-01-02  | 2000  | 1                     | 2000         | 11                           |
| 3             | 2000-01-03  | 2000  | 1                     | 2000         | 11                           |
| 4             | 2000-01-04  | 2000  | 1                     | 2000         | 11                           |
| 5             | 2000-01-05  | 2000  | 1                     | 2000         | 11                           |
| 6             | 2000-01-06  | 2000  | 1                     | 2000         | 11                           |
| 7             | 2000-01-07  | 2000  | 1                     | 2000         | 11                           |
| 8             | 2000-01-08  | 2000  | 1                     | 2000         | 11                           |
| 9             | 2000-01-09  | 2000  | 1                     | 2000         | 11                           |
| 10            | 2000-01-10  | 2000  | 1                     | 2000         | 11                           |
+---------------+-------------+-------+-----------------------+--------------+------------------------------+
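
The sample rows show January 2000 landing in fiscal month 11 when the fiscal calendar starts in March. As a rough illustration of that mapping only (this is a Python sketch written for this post, not the applib implementation, and the function name is made up for the example), the fiscal month number can be derived with modular arithmetic:

```python
def fiscal_month_number_in_year(calendar_month: int, fiscal_start_month: int) -> int:
    """Return the 1-based position of a calendar month within the fiscal year.

    Hypothetical helper for illustration; it is not part of LucidDB or applib.
    """
    if not 1 <= calendar_month <= 12:
        raise ValueError("calendar_month must be in 1..12")
    if not 1 <= fiscal_start_month <= 12:
        raise ValueError("fiscal_start_month must be in 1..12")
    # Count how many months past the fiscal start we are, wrapping around
    # the end of the calendar year, then shift to a 1-based number.
    return (calendar_month - fiscal_start_month) % 12 + 1


if __name__ == "__main__":
    # Fiscal year starting in March (3), as in the view definition above.
    for month in range(1, 13):
        print(month, fiscal_month_number_in_year(month, 3))
```

With a March start, March maps to fiscal month 1, February to 12, and January to 11, which agrees with the FISCAL_MONTH_NUMBER_IN_YEAR column in the rows above.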

Generating the Time Dimension is accomplished using DynamoDB’s ability to include Java-based UDF Table Functions. Table functions are really powerful - they allow a BI developer to write custom functions that output a “table” that can be queried like ANY OTHER TABLE (mostly). Check out the wiki page FarragoUdx if you’re interested.

And of course: download LucidDB and give it a whirl!

NOTE: To be candid, doing it as a view isn’t the best approach. For anything beyond tiny (5 million +) we should actually create the table, and do an INSERT INTO SELECT * FROM TABLE(fiscal_time_dimension).

DynamoBI, Open Source

Book Review: Pentaho Reporting 3.5 for Java Developers

November 9th, 2009

I have two customers who, had they had access to Will Gorman’s book, Pentaho Reporting 3.5 for Java Developers, would not have needed me for their projects! That’s how good the book is for those who need to embed Pentaho Reporting into their Java application.

The book is certainly geared towards Java developers, and specifically, developers who are simply trying to use the Pentaho reporting library. I’d venture to say that MOST customers should be using Pentaho; in this case, the book is useful as a reference, but the HOWTO past Chapter 3 would probably be lost on many users, except for Chapter 11 (see below).

However, for people trying to embed Pentaho reporting, WOW: THIS IS THE DEFINITIVE RESOURCE. Buy it, RIGHT NOW! The information it contains was locked in just a few people’s minds (Thomas, a bunch of people sitting at the “citadel” in Orlando, aka Pentaho employees, and a handful of consultants). Will has unlocked it and I’m glad he did.

Will taught me something new in this book. In fact, I hope this is “new” in 3.5, which was released just a few weeks back. If it’s been around longer, then I’m a total dolt. Chapter 11 covers how to add your own custom Expressions/Formulas to Pentaho (including the PRD).

At customer engagements, or when I put on my Pentaho hat and teach their public courses or custom onsite training, I’m asked all the time: Can I make my own Reporting Functions and plug them into Pentaho Report Designer? Up until Will showed me how to do it on page 281, I thought this was only possible for Pentaho (the company). Will gives us a step-by-step guide to add our own “DoMyCustomThing” to the Pentaho Report Designer. Customers can now create their own corporate expressions/functions they can leverage across hundreds of reports.

I’ll keep several copies on my shelf, and give it away to any current/future “embedded Pentaho Reporting” customers. Thanks Will for such a great book!

Open Source

DynamoBI: website? bits?

October 30th, 2009

Well, what a soft launch it has been. :)
Some people have asked:

When are you going to get a website? Errr…. Soon! We soft launched a bit early, due to some “leaking information” but figured heck, it’s open source let’s let it all out. Soon enough, I swear!

Where can I download DynamoDB? Errr… you can’t yet cause we haven’t finished our build/QA/certification process.

However, since DynamoDB is the alter ego business suit wearing brother of LucidDB, just download the 0.9.2 release if you want to get a sense of what DynamoDB is.

There are 3 built binaries (Linux 32, Linux 64, and Windows 32): http://sourceforge.net/projects/luciddb/files/luciddb/luciddb-0.9.2/ and you can find installation instructions here.

DynamoDB will have the same core database, etc. So, from a raw feature/function perspective, what you download and see with LucidDB will be what you get in DynamoDB. DynamoDB will have an administration UI to make things like setting up foreign servers, managing users, etc. easier. And lots of other cool new features are on the longer-term roadmap, which, if/when we get a website, would be a great place for that to go!

Until then, use the open source project, LucidDB. I think you’ll like it!

DynamoBI, General BI, Open Source

OpenSQLCamp 2009 with LucidDB

October 26th, 2009

LucidDB is now a sponsor of OpenSQLCamp 2009, courtesy of DynamoBI, which doesn’t have its own logo yet! :)
OpenSQL Camp 2009 in Portland, OR

I’ll be attending representing the LucidDB camp, since Mr. John Sichi will be running a marathon (1/2?) that weekend. Let me know if you’ll be there!

DynamoBI, General BI, Open Source

LucidDB: DynamoBI is running with it

October 24th, 2009

I can think of no better analogy than that of a multi leg race. You know, the races where one sprinter runs as fast as they can, before passing the baton to the next sprinter.

First it was Broadbase.
Second it was LucidEra.
Third it was Eigenbase / LucidEra / SQLstream (joint development w/ Eigenbase).

Having purchased commercial rights from LucidEra it’s ours to run with now, alongside Eigenbase and SQLstream.

LucidDB has been described as the “best database no one ever told you about.” That stops today (the telling part, not the best part). Dynamo Business Intelligence Corp will take this great technology to a wider audience and we’ll be telling EVERYONE about it!

Over time, the exceptional features of this open source project will come to light (column store, bitmap indexes, drop-in Java-based user plugins, transparent remote JDBC data access, etc). I think it is important to acknowledge how LucidDB arrived at where it is today.

LucidDB is built by smart smart people (people wayyyy smarter than me!). People who’ve written parallel execution engines in Oracle. People who’ve developed Bitmap IDX implementations and helped file those patents. The heritage of LucidDB starts at Broadbase; LucidEra purchased it and brought it to Eigenbase. Eigenbase, and its sponsoring companies, have the most claim to its current state. Their stewardship and ongoing evolution of the project is a testament to their talents and commitment to open source development. When you pick up LucidDB/DynamoDB and get your first “Ahhhh Cool! 10x Faster than my current database” you have LucidEra/SQLstream/Eigenbase devs to thank. John V. Sichi (lead and main project sponsor), Tai Tran, Julian Hyde, Rushan Chen, Zelaine Fong, Sunny Choi, Steve, Marc, Richard, Hunter, Edan, Damian, Boris, Benny, Stephan, Oscar, …. and the list goes on and on and on. Some of these people will be helping (in small and big ways) with the new company, which is great for customers: they’ll know that the people who wrote this stuff will be helping them be successful!

What’s the plan?

  • Open Source.
    Lots of it. Any readers of this blog, or who know me in general, will know I’m a “burn the boats,” open source kind of guy. We’ll be creating some new projects to make using the features/functions already in LucidDB easier. We’ll also be adding new features, which will make their way back into the LucidDB mainline.
  • Commercial in Name Only.
    Mainline DBMS enhancements and development are, and will continue to be, in LucidDB (Eigenbase). New projects will be available under an OSI approved license. DynamoDB is the prepackaged, assembled, UI-included distribution built for customers/evaluators that we’ll offer support on. It should be as easy as we can possibly make it to evaluate, purchase, and use.
  • In Progress.
    We’ve let the announcement out ahead of having our website built, or having completed our own DynamoDB QA’ed build. Our open source roots guide us to an “early and often” approach and we’re taking that approach here. Be patient with us as we roll out the business bit by bit over the next few months. Our #1 priority: establish our support/build/QA infrastructure and get an already great piece of software into the hands of people who can benefit from it. Hint: If you’ve ever done a star schema on MySQL you need to talk to us!

One thing I am personally looking forward to is getting to work even more extensively with everyone involved at Eigenbase, including the very talented devs at SQLstream (who produce the best real time analytics/integration engine available).

Feel free to join up in taking LucidDB to a whole new level: Download LucidDB and give it a go yourself, since we just released a new version (0.9.2) yesterday! I believe, like others have already mentioned, adding a bit of commercial support behind an already great piece of software is a winning combination!

Drop a line on through to me if you’re interested in getting involved early on (as a charter customer, developer, user, etc). ngoodman at bayontechnologies (with the .COM).

DynamoBI, Open Source

Amazon’s Pre Ordering of books sucks!

August 26th, 2009

I pre-ordered a copy of the new, (first, only, best, and original) Pentaho book “Pentaho Solutions” by Roland Bouman and Jos van Dongen two weeks back.  Saw from a tweet that the book was shipping from Amazon.  Cool - had a look at the page.  Sure, they can ship today if I get my order in on time so I know they can ship it.

How about my pre order, which I would assume would go out before regular orders?  Won’t ship until next week?  Delivered by 9/11/2009?  Lesson learned - don’t pre order from Amazon.  :)

Open Source, Pentaho

CDF Tutorials

July 22nd, 2009

The folks at webdetails have posted their Pentaho Community Dashboard Framework tutorials, and they look great! They run you through building CDF dashboards, which are usually a crucial, user-facing part of any BI implementation. While much of the work is the ETL/OLAP configuration, tuning, etc. on the backend, most users think of Pentaho as the dashboards/reports they interact with, not the data munching for the Data Warehouse.

These tutorials look great; I’ve implemented more than 20 CDF dashboards at four customers already but I still bought them to learn even more ins and outs.  You should too! No better way to learn something than from the source of the technology which in this case is Pedro and team @ webdetails.

Open Source, Pentaho

MDX Humor from Portugal

April 21st, 2009

Pedro Alves, the very talented lead developer behind the Pentaho Community Dashboard Framework gave me a good chuckle with his high opinion of MDX as a language:

MDX is God’s gift to business language; When God created Adam and Eve he just spoke [Humanity].[All Members].Children . That’s how powerful MDX is. And Julian Hyde allowed to use it without being bound to microsoft.

If you haven’t checked out Pedro’s blog, definitely get over there. It’s a recent start but he’s already getting some great stuff posted.

General BI, Open Source, Pentaho

PDI Scale Out Whitepaper

April 21st, 2009

I’ve worked with several customers over the past year helping them scale out their data processing using Pentaho Data Integration. These customers have some big challenges - one customer was expecting 1 billion rows / day to be processed on their ETL environment. Some of these customers were rolling their own solutions; others had very expensive proprietary solutions (Ab Initio I’m pretty sure however they couldn’t say since Ab Initio contracts are bizarre). One thing was common: they all had billions of records, a batch window that remained the same, and software costs that were out of control.

None of these customer specifics are public; they likely won’t be which is difficult for Bayon / Pentaho because sharing these top level metrics would be helpful for anyone using or evaluating PDI. Key questions when evaluating a scale out ETL tool: Does it scale with more nodes? Does it scale with more data?

I figured it was time to share some of my research and findings on how PDI scales out, and this takes the form of a whitepaper. Bayon is pleased to present this free whitepaper, Pentaho Data Integration : Scaling Out Large Data Volume Processing in the Cloud or on Premise. In the paper we cover a wide range of topics, including results from running transformations with up to 40 nodes and 1.8 billion rows.

Another interesting set of findings in the paper relates to a very pragmatic approach in my research - I don’t have a spare 200k to simply buy 40 servers to run these tests. I have been using EC2 for quite a while now, and figured it was the perfect environment to see how PDI could scale on the cheapest of cheap servers ($0.10 / hour). Another interesting metric relating to cloud ETL is the top-level benchmark: a utility compute cost of 6 USD per billion rows processed, with zero long-term infrastructure commitments.
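
To put those two numbers side by side, here is a small back-of-the-envelope sketch in Python. It uses only the $0.10 per hour server price and the 6 USD per billion rows figure quoted above; the assumptions that the server price is the only cost and that the rate holds uniformly at any volume are mine for illustration and are not claims from the whitepaper. The function names are made up for the example.

```python
HOURLY_RATE_USD = 0.10            # price per server-hour quoted in the post
COST_PER_BILLION_ROWS_USD = 6.0   # benchmark quoted in the post
ROWS_PER_BILLION = 1_000_000_000


def node_hours_per_billion_rows(cost_per_billion: float, hourly_rate: float) -> float:
    """Server-hours implied by the benchmark, assuming servers are the only cost."""
    return cost_per_billion / hourly_rate


def estimated_cost_usd(rows: int, cost_per_billion: float = COST_PER_BILLION_ROWS_USD) -> float:
    """Scale the per-billion benchmark linearly to a given row count (an assumption)."""
    return rows / ROWS_PER_BILLION * cost_per_billion


if __name__ == "__main__":
    hours = node_hours_per_billion_rows(COST_PER_BILLION_ROWS_USD, HOURLY_RATE_USD)
    print(f"Implied server-hours per billion rows: {hours:.1f}")
    print(f"Estimated cost for 1.8 billion rows: ${estimated_cost_usd(1_800_000_000):.2f}")
```

Under those assumptions the benchmark works out to roughly 60 server-hours per billion rows, however those hours are spread across nodes.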

Matt Casters, Lance Walter, and I will also be presenting a free online webinar to go over the top-level results, and have a discussion on large data volume processing in the cloud:

High Performance ETL using Cloud- and Cluster-based Deployment
Tuesday, May 26, 2009 2:00 pm
Eastern Daylight Time (GMT -04:00, New York)

If you’re interested in processing lots of data with PDI, or wanting to deploy PDI to the cloud, please register for the webinar or contact me.

Data Integration (Kettle), General BI, Grid/Distributed Computing, Open Source, Pentaho