Category Archives: Open Source

DynamoBI is dead, long live LucidDB!

To our Partners, Employees, Customers, Friends, and Community:

It is my unfortunate duty to inform all of you that DynamoBI is ceasing commercial operations October 31, 2012; we are immensely grateful for all the support that you all have shown our company, in so many different ways, over the past 3 years and we hope to make this shutdown as painless as possible for all involved. We know that we are not the only people who are invested in LucidDB, so we wanted to explain our rationale for shutting down along with the implications for the entire LucidDB community (not just our customers).

We started DynamoBI 3 years ago when we saw our favorite open source project, LucidDB, finding limited prospects for adoption without the full commercial support that many (most!) companies need to be able to adopt open source software. We had been actively working with LucidDB for a long while, and knew that it is a fantastic piece of database/analytic software; to say that it’s a gem and provides some amazing capabilities in an open source package is an understatement.

However, markets and businesses are not quite as simple as having a great open source project and community. I may blog separately about the lessons learned from this startup (the entrepreneurial badge of honor #fail blog) but the community deserves to know that, for the most part, the failure to achieve success was more about the market and selling environment (and our own success within it) than any innate defects in LucidDB.

In short, we were not successful in the marketplace for two primary reasons:

1) In a crowded, loud market of 40+ analytic data storage solutions, raw single-query speed remains the singular priority. LucidDB was often a big improvement over MySQL/Oracle but was not as fast as our analytic peers. All of our other very interesting and compelling features (versioning of data, EII type connectors, pluggable/extensible systems) were often not even evaluated, as we were often eliminated from evaluation based on raw single-query speed alone. LucidDB performs as advertised (great BI database, much faster than what you’re currently using), but that wound up not being enough.

2) Open Source price points are compelling for customers, but work only if you can build a high volume business. It became clear earlier this year (even with building enough cash flow to pay full time staff, etc!) that the size of our “funnel” was not large enough to support a high growth, interesting business. We determined that if we had X number of downloads we ended up with Y prospects that converted to Z customers at price A. We experimented with price, offering, prospect development, etc. We improved our conversion rates over time, but ultimately found that unless we could find some way to increase the mouth of the funnel by more than 100x we wouldn’t have a growing business that would allow us to continue/further our investment in LucidDB.

There are other reasons as well, many of which are missteps or mistakes by me personally. That could fill an entire other blog (and likely will at some point).

We’ve been working with our customers over the past few months to help them prepare for a future in which we no longer provide customer support. We’ve been communicating this message to them, and now we’re bringing it to the greater community, along with what our future participation in LucidDB will look like.

DynamoBI will:
1) Host the git repositories and continue to provide a legal contribution framework so that the IP for the project remains clean for all. The Apache license means that LucidDB remains free and accessible for anyone/everyone wishing to use it (or parts of it).
2) Contribute any “interesting” pieces of the amazing framework to projects that can use it. In particular, we’re thrilled to see the Optiq project leveraging, as a starting point, some of the LucidDB components.
3) Host the forums, wiki, and issue tracking for the LucidDB community as we have been for the past few years (http://luciddb.org).
4) Continue to participate as active users in the community; we are still fond of LucidDB and hope to see the community/project be successful.

However, DynamoBI will no longer:
1) Provide releases or builds. We’ve shut down our continuous integration server and do not plan on making any release after 0.9.4.
2) Offer any commercial services for LucidDB (consulting, services, sponsored development, etc).
3) Provide active development on the core project, or ancillary projects.

Once again, thank you for your support over the past few years. We encourage you to continue to look at LucidDB, even though we were unable to make it a commercial success. It has some unique features that are a perfect fit for some use cases (Big Data access via BI tools, etc) and that make it a great open source project.

Kind Regards,
Nick
Former CEO of DynamoBI Corporation

LucidDB has left Eigenbase, moved to Apache License

This has been a long time in the making, but the LucidDB project is leaving the Eigenbase foundation to continue our development outside that organization’s IP sharing framework and governance.

Community members will notice (or have already):

We are no longer using Perforce (YAAY!) and are now doing our primary LucidDB, Farrago, Fennel, and relevant extensions/test/build development work at github: https://github.com/dynamobi/luciddb/
The Wiki is now hosted at http://www.luciddb.org/wiki. We will, over time, remove references to Eigenbase in that project documentation/etc.
Issue tracking is now ALSO over at github, and we have migrated all issues (historical and outstanding) over to the github project.

Part of the impetus for leaving Eigenbase was our desire for a more inclusive license, to permit additional use/collaboration by other companies in the spirit of open source. We initiated this process, in good company with like-minded individuals, early last year. Long story short, this plight and the political battles around it cost Eigenbase the resignations of its two most critical participants: Julian Hyde and John V. Sichi. I join them now, as I resigned from the Eigenbase Board on March 26.

Today I’m announcing that DynamoBI has released the entirety of the codebase under the Apache Software License 2.0. We welcome our community members’ ongoing contributions, and hope that companies looking to leverage such a great framework and technology take a look. We welcome, wholeheartedly, your participation in the project under its new permissive license.

We continue to serve our existing customers with annual subscriptions to DynamoDB, our QA’ed and prepackaged commercial version of LucidDB.

Happy LucidDB-ing!

NoSQL Now 2011: Review of AdHoc Analytic Architectures

For those who weren’t able to attend the fantastic NoSQL Now Conference in San Jose last week, but are still interested in the slides about how people are doing Ad Hoc analytics on top of NoSQL data systems, here are the slides from my presentation:

No sql now2011_review_of_adhoc_architectures

We obviously continue to hear from our community that LucidDB is a great solution sitting in front of a Big Data/NoSQL system. Allowing easy SQL access (including super fast, analytic database cached views) is a big win for reducing load *AND* increasing usability of data in NoSQL systems.

PDI Loading into LucidDB

By far, the most popular way for PDI users to load data into LucidDB is to use the PDI Streaming Loader. The streaming loader is a native PDI step that:

Enables high performance loading, directly over the network without the need for intermediate IO and shipping of data files.
Lets users choose more interesting (from a DW perspective) loading types into tables. In particular, in addition to simple INSERTs it allows for MERGE (aka UPSERT) and also UPDATE. All done in the same bulk loader.
Enables the metadata for the load to be managed, scheduled, and run in PDI.

However, we’ve had some known issues. In fact, until PDI 4.2 GA and LucidDB 0.9.4 GA it’s pretty problematic unless you run through the process of patching LucidDB outlined on this page: Known Issues.

In some ways, we have to admit that we released this piece of software too soon. Early and often comes with some risk, and many have felt the pain of some of the issues that have been discovered with the streaming loader.

In some ways, we’ve built an unnatural approach to loading for PDI: PDI wants to PUSH data into a database. LucidDB wants to PULL data from remote sources, with its integrated ELT and DML based approach (with connectors to databases, salesforce, etc). Our streaming loader “fakes” a pull data source, and allows PDI to “push” into it.

There are multiple threads involved, and when exceptions happen users have received cruddy error messages such as “Broken Pipe” that are unhelpful at best, frustrating at worst. Most of these contortions will have sorted themselves out, and by the time PDI 4.2 GA and LucidDB 0.9.4 GA are released the streaming loader should be working A-OK. Some users would just as soon avoid the patch instructions above and have posed the question: in a general sense, if not the streaming loader, how would I load data into LucidDB?

Again, LucidDB likes to “pull” data from remote sources. One of those is CSV files. Here’s a nice, easy, quick (30k r/s on my MacBook) method to load a million rows using PDI and LucidDB:

This transformation outputs 1 million rows to a Text File, waits for that to complete, then proceeds to load that data into a new table in LucidDB. Step by step, the LucidDB statements are:

-- Points LucidDB to the directory with the just generated flat file
-- LucidDB has some defaults, and we can "guess" the datatypes by scanning the file
CREATE or replace SERVER csv_file_server FOREIGN DATA WRAPPER SYS_FILE_WRAPPER OPTIONS ( DIRECTORY '?' );
-- Let's create a foreign table for the data file ("DATA.txt") that was output by PDI
create foreign table applib.data server csv_file_server;
-- Create a staging table, and load the data from the flat file (select * from applib.data)
CALL APPLIB.CREATE_TABLE_AS ('APPLIB', 'STAGING_TABLE', 'select * from applib.data', true);
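
Once the CALL completes, a quick row count is one way to confirm the load. This is only a sketch: it assumes the staging table landed in the APPLIB schema under the name used above, and that the generated file produced one row per line with nothing rejected.

```sql
-- Sketch: sanity-check the load by counting rows in the staging table.
-- For the transformation described above, roughly one million rows would be expected.
select count(*) from applib.staging_table;
```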

We hope to have the streaming loader ready to go in 0.9.4 (LucidDB) and 4.2 (PDI). Until then, consider this easy, straightforward method of loading data that’s high performance, proven, and stable for loading data from PDI into LucidDB.

Example file: csv_luciddb_load.ktr

Pushdown Query access to Hive/Hadoop data

0.9.4 did not hit the 1 year mark!

Our last LucidDB release was just over 12 months ago, on June 16, 2010. We were really, really trying to beat the 1 year mark for our 0.9.4 release but we just couldn’t. A tenet of good open source development is early and often, and we need to do better. Since the 0.9.3 release we’ve:

Built out an entire Web Services infrastructure
Developed a wicked cool Admin user interface
Developed cool connectors to Hive, CouchDB
Built a whole ton of extensions (auto indexing, DDL generation, improved load routines)
Added scriptable functions and procedures
Updated our connectors (JDBC, Salesforce, etc)

All in all, this is a VERY exciting release… I apologize it’s taken this long, but please bear with us. We’ll be releasing in the next couple of weeks!

Why OLAP4J 1.0 matters

LucidDB has a new Logo/Mascot

At yesterday’s Eigenbase Developer Meetup at SQLstream’s offices in San Francisco we arrived at a new logo for LucidDB. DynamoBI is thrilled to have supported and funded the design contest to arrive at our new mascot. Over the coming months you’ll see the logo make its way out to the existing luciddb.org sites, wiki sites, etc. I’m really happy to have a logo that matches the nature of our database – BAD ASS!

DynamoDB: Built in Time Dimension support!

DynamoDB (aka LucidDB) is not just another column store database. Our goal is to be the best database for actually doing Business Intelligence; while that means being fast and handling large amounts of data, there are a lot of other things BI consultants/developers need. I’ll continue to post about some of the great BI features that DynamoDB has for the modern datasmith.

The first feature to cover, and it’s dead easy, is the built-in ability to generate a time dimension, including Fiscal Calendar attributes. If you’re using Mondrian (or, come to that, your own custom SQL on a star schema) you need to have a time dimension. Time is the most important dimension! Every OLAP model I’ve ever built uses one! It’s something that you, as a datasmith, will need to do with every project; that’s why we’ve built it right into our database.

Here’s a dead simple way to create a fully baked, ready to use Time Dimension to use with Mondrian.

-- Create a view that is our time dimension for 10 years, with our
-- Fiscal calendar starting in March (3)
create view dim_time as select * from
table(applib.fiscal_time_dimension (2000, 1, 1, 2009, 12, 31, 3));

OK, that’s it. You’ve created a Time Dimension! * see NOTE at end of post.

So, we’ve created our time dimension, complete with a Fiscal calendar for 10 years in a single statement! Awesome – but what does it contain?

-- Structure of new time dimension
select "TABLE_NAME", "COLUMN_NAME", "DATATYPE" from sys_root.dba_columns
where table_name = 'DIM_TIME';
+-------------+---------------------------------+-----------+
| TABLE_NAME  |           COLUMN_NAME           | DATATYPE  |
+-------------+---------------------------------+-----------+
| DIM_TIME    | FISCAL_YEAR_END_DATE            | DATE      |
| DIM_TIME    | FISCAL_YEAR_START_DATE          | DATE      |
| DIM_TIME    | FISCAL_QUARTER_NUMBER_IN_YEAR   | INTEGER   |
| DIM_TIME    | FISCAL_QUARTER_END_DATE         | DATE      |
| DIM_TIME    | FISCAL_QUARTER_START_DATE       | DATE      |
| DIM_TIME    | FISCAL_MONTH_NUMBER_IN_YEAR     | INTEGER   |
| DIM_TIME    | FISCAL_MONTH_NUMBER_IN_QUARTER  | INTEGER   |
| DIM_TIME    | FISCAL_MONTH_END_DATE           | DATE      |
| DIM_TIME    | FISCAL_MONTH_START_DATE         | DATE      |
| DIM_TIME    | FISCAL_WEEK_NUMBER_IN_YEAR      | INTEGER   |
| DIM_TIME    | FISCAL_WEEK_NUMBER_IN_QUARTER   | INTEGER   |
| DIM_TIME    | FISCAL_WEEK_NUMBER_IN_MONTH     | INTEGER   |
| DIM_TIME    | FISCAL_WEEK_END_DATE            | DATE      |
| DIM_TIME    | FISCAL_WEEK_START_DATE          | DATE      |
| DIM_TIME    | FISCAL_DAY_NUMBER_IN_YEAR       | INTEGER   |
| DIM_TIME    | FISCAL_DAY_NUMBER_IN_QUARTER    | INTEGER   |
| DIM_TIME    | FISCAL_YEAR                     | INTEGER   |
| DIM_TIME    | YEAR_END_DATE                   | DATE      |
| DIM_TIME    | YEAR_START_DATE                 | DATE      |
| DIM_TIME    | QUARTER_END_DATE                | DATE      |
| DIM_TIME    | QUARTER_START_DATE              | DATE      |
| DIM_TIME    | MONTH_END_DATE                  | DATE      |
| DIM_TIME    | MONTH_START_DATE                | DATE      |
| DIM_TIME    | WEEK_END_DATE                   | DATE      |
| DIM_TIME    | WEEK_START_DATE                 | DATE      |
| DIM_TIME    | CALENDAR_QUARTER                | VARCHAR   |
| DIM_TIME    | YR                              | INTEGER   |
| DIM_TIME    | QUARTER                         | INTEGER   |
| DIM_TIME    | MONTH_NUMBER_OVERALL            | INTEGER   |
| DIM_TIME    | MONTH_NUMBER_IN_YEAR            | INTEGER   |
| DIM_TIME    | MONTH_NUMBER_IN_QUARTER         | INTEGER   |
| DIM_TIME    | MONTH_NAME                      | VARCHAR   |
| DIM_TIME    | WEEK_NUMBER_OVERALL             | INTEGER   |
| DIM_TIME    | WEEK_NUMBER_IN_YEAR             | INTEGER   |
| DIM_TIME    | WEEK_NUMBER_IN_QUARTER          | INTEGER   |
| DIM_TIME    | WEEK_NUMBER_IN_MONTH            | INTEGER   |
| DIM_TIME    | DAY_FROM_JULIAN                 | INTEGER   |
| DIM_TIME    | DAY_NUMBER_OVERALL              | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_YEAR              | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_QUARTER           | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_MONTH             | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_WEEK              | INTEGER   |
| DIM_TIME    | WEEKEND                         | VARCHAR   |
| DIM_TIME    | DAY_OF_WEEK                     | VARCHAR   |
| DIM_TIME    | TIME_KEY                        | DATE      |
| DIM_TIME    | TIME_KEY_SEQ                    | INTEGER   |
+-------------+---------------------------------+-----------+

-- Let's look at a few rows
select time_key_seq, time_key, yr, month_number_in_year, fiscal_year
, fiscal_month_number_in_year from dim_time;
+---------------+-------------+-------+-----------------------+--------------+------------------------------+
| TIME_KEY_SEQ  |  TIME_KEY   |  YR   | MONTH_NUMBER_IN_YEAR  | FISCAL_YEAR  | FISCAL_MONTH_NUMBER_IN_YEAR  |
+---------------+-------------+-------+-----------------------+--------------+------------------------------+
| 1             | 2000-01-01  | 2000  | 1                     | 2000         | 11                           |
| 2             | 2000-01-02  | 2000  | 1                     | 2000         | 11                           |
| 3             | 2000-01-03  | 2000  | 1                     | 2000         | 11                           |
| 4             | 2000-01-04  | 2000  | 1                     | 2000         | 11                           |
| 5             | 2000-01-05  | 2000  | 1                     | 2000         | 11                           |
| 6             | 2000-01-06  | 2000  | 1                     | 2000         | 11                           |
| 7             | 2000-01-07  | 2000  | 1                     | 2000         | 11                           |
| 8             | 2000-01-08  | 2000  | 1                     | 2000         | 11                           |
| 9             | 2000-01-09  | 2000  | 1                     | 2000         | 11                           |
| 10            | 2000-01-10  | 2000  | 1                     | 2000         | 11                           |
+---------------+-------------+-------+-----------------------+--------------+------------------------------+
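
To see how the fiscal calendar lines up with the calendar months, a query along these lines could be used. It is a minimal sketch that assumes the dim_time view created above exists and relies only on columns listed in the structure output.

```sql
-- Sketch: list each calendar month next to its fiscal month number.
-- Assumes the dim_time view defined earlier in this post.
select distinct month_number_in_year, fiscal_month_number_in_year
from dim_time
order by month_number_in_year;
```

With the fiscal year starting in March, January maps to fiscal month 11, as the sample rows above show; March would be expected to come back as fiscal month 1.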

Generating the Time Dimension is accomplished using DynamoDB’s ability to include Java based UDF Table Functions. Table functions are really powerful – they allow a BI developer to write custom functions that output a “table” that can be queried like ANY OTHER TABLE (mostly). Check out the wiki page FarragoUdx if you’re interested.

And of course: download LucidDB and give it a whirl!

NOTE: To be candid, doing it as a view isn’t the best approach. For anything beyond tiny (5 million +) we should actually create the table, and do an INSERT INTO SELECT * FROM TABLE(fiscal_time_dimension).
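
One way to materialize the dimension as a real table might be the APPLIB.CREATE_TABLE_AS procedure, which creates and loads a table in a single call. The following is an untested sketch: DIM_TIME_TABLE is a hypothetical name, the argument order (schema, table, query, load flag) is assumed from the way that procedure is called in the PDI loading post on this blog, and whether it accepts a table function in the query is also an assumption.

```sql
-- Sketch: create and populate a physical time dimension table in one step.
-- DIM_TIME_TABLE is a hypothetical table name; the arguments are assumed to be
-- (schema, table name, select statement, load data flag).
CALL APPLIB.CREATE_TABLE_AS (
  'APPLIB',
  'DIM_TIME_TABLE',
  'select * from table(applib.fiscal_time_dimension (2000, 1, 1, 2009, 12, 31, 3))',
  true);
```

If that route does not work for a table function, the explicit create table followed by INSERT INTO ... SELECT described in the note remains the fallback.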

Book Review: Pentaho Reporting 3.5 for Java Developers

I have two customers who, had they had access to Will Gorman’s book, Pentaho Reporting 3.5 for Java Developers, would not have needed me for their project! That’s how good the book is for those who need to embed Pentaho Reporting into their Java application.

The book is certainly geared towards Java developers, and specifically, developers who are trying to simply use the Pentaho reporting library. I’d venture to say that MOST customers should be using Pentaho; in this case, the book is useful as a reference, but the HOWTO past Chapter 3 would probably be lost on many users, except for Chapter 11 (see below).

However, for people trying to embed Pentaho reporting, WOW: THIS IS THE DEFINITIVE RESOURCE. Buy it, RIGHT NOW! The information it contains was locked in just a few people’s minds (Thomas, a bunch of people sitting at the “citadel” in Orlando aka Pentaho employees, a handful of consultants). Will has unlocked it and I’m glad he did.

Will taught me something new in this book. In fact, I hope this is “new” in 3.5, which was released just a few weeks back. If it’s been around longer, then I’m a total dolt. Chapter 11 covers how to add your own custom Expressions/Formulas to Pentaho (including the PRD).

At customer engagements, or when I put on my Pentaho hat and teach their public courses or custom onsite training, I’m asked all the time: Can I make my own Reporting Functions and plug them into Pentaho Report Designer? Up until Will showed me how to do it on page 281, I thought this was only possible for Pentaho (the company). Will gives us a step by step guide to add our own “DoMyCustomThing” to the Pentaho Report Designer. Customers can now create their own corporate expressions/functions they can leverage across hundreds of reports.

I’ll keep several copies on my shelf, and give them away to any current/future “embedded Pentaho Reporting” customers. Thanks, Will, for such a great book!

Goodman on BI

Thoughts on Open Source, Analytics