It this thing still on? Bueller, Bueller

My last post, was early 2013. That’s more than FIVE years ago. What, you may ask, has been keeping me away from technical musings here? Two things.

I haven’t felt like it. I started blogging in 2004. When I took a break, I had been blogging for almost 10 years. That’s a LONG time.
I’ve focused any additional time outside of my direct work to being a dad…. Yeah, my daughter turned 5 earllier this year. It’s been a busy 5 years. 😉

Now I’m just now getting the itch to publish some stuff. I have been DOING a ton. There’s some Machine Learning work, some ETL as a service stuff, some IoT things, etc. I am considering moving to medium though…

At first, it’s A then B testing not A/B testing

Lean Startup Method improves my showers!

1 Reply

I’ve been doing a little bit of reading, and opted for a quick weekend in LA to give myself a crash course in the Lean Startup method. Why? I wanted to develop some of my own abilities and instincts to move faster to a product/market fit. In my review of the last 3 years starting an open source DB company, I realized I was WILDLY under informed on what customers really wanted, needed. I am determined that, whatever I do next, I will spend MUCH more time working on customer development and product market fit than feature and solution development. The Lean Startup Method allows you work through, in a rough and tumble way, to QUICKLY disqualify your current business model/solution as viable.

That’s a pretty stock answer/elevator pitch for the Lean Startup method. What in the world does it have to do with my showers?

There are these people; inventors, entrepreneurs, creative types that get ideas ALL the time. They get an idea, they think about it’s place in the world. They think how’d they build this product. They think about what it would look like. How awesome it would be, etc. Many, myself included, typically see problems/solutions all the time. At the moment, I have no fewer than 15 ideas for new business that vary from Food Preparation, to Easy Re-ording of Groceries, to population/search analytics, to improving pace of play on golf courses, utility trading, etc.

These ideas percolate; I think about them in the shower. I think about them in the car. They pick at me; I think about how great they could be. I figure out a piece of the puzzle, put that into a mental cabinet and then go about my day. Days later, it pops back into my head, and I start working on another aspect of the idea. I call this my own Entrepreneurial Porn. It’s fluffy, stimulating but is a lazy time consuming alternative to actually building a business. I only stop thinking about an idea after a year or so has passed since I thought about it, or I find some critical problem with the SOLUTION that means it won’t work. This comes after a significant amount of mental power/effort is expended.

Here’s the real curse: I invent new ideas (into the queue) faster than they head out (solution won’t work). I turn over and spend many of my waking hours (probably sleeping ones too) thinking about, and creatively looking upon ideas that will go nowhere. Emotionally I feel like I’ve wasted this time and feel guilty about it, and it increases frustration internally knowing I’m not building a great business because I’m spending too much time on Entrepreneurial porn.

My hope now is that Lean Startup Method will provide me the ability to banish these ideas from my shower time, driving time, and other moments that I could be thinking about viable businesses, my loved ones and be a more present during the day. I now have the ability to spend a few hours (or day or two) to figure out why the idea won’t work; then I don’t have to spend the thoughtless and wasted hours of time mulling over the solution.

Point in case on the curse; while I write this blog from 38,000 ft I just got a new idea inspired by observing a woman using her Windows Tablet. One more idea to try and exorcise! Entrepreneurial Devils be Damned!

DynamoBI is dead, long live LucidDB!

5 Replies

To our Partners, Employees, Customers, Friends, and Community:

It is my unfortunate duty to inform all of you that DynamoBI is ceasing commercial operations October 31, 2012; we are immensely grateful for all the support that you all have shown our company, in so many different ways, over the past 3 years and we hope to make this shutdown as painless as possible for all involved. We know that we are not the only people who are invested in LucidDB, so we wanted to explain our rationale for shutting down along with the implications for the entire LucidDB community (not just our customers).

We started DynamoBI 3 years ago when we saw our most favorite open source project, LucidDB, finding limited prospects for adoption without a growth to full, commercial support which many (most!) companies need to be able to adopt open source software. We had been actively working with LucidDB for a long while, and knew that it is a fantastic piece of database/analytic software; to say that it’s a gem and provides some amazing capabilities in an open source package is an understatement.

However, markets and businesses are not quite as simple as having a great open source project and community. I think separately I may blog about the lessons learned from this startup (the entrepreneurial badge of honor #fail blog) but the community deserves to know that, for the most part, the failure to achieve success was about the market and selling environment (and our successes here) than any innate defects in LucidDB.

In short, we were not successful in the marketplace for two primary reasons:

1) In a crowded, loud market of more than 40+ Analytic data storage solutions, raw single query speed remains the singular priority. LucidDB often well improved over MySQL/Oracle but was not as fast as our Analytic peers. All of our other very interesting and compelling features (versioning of data, EII type connectors, pluggable/extensible systems) were often not even evaluated as we were often eliminated from evaluation based on the single raw query speed. LucidDB performs as advertised (great BI database, much faster than what you’re currently using), but that wound up not being enough.

2) Open Source price points are compelling for customers, but work only if you can build a high volume business. It became clear earlier this year (even with building enough cash flow to pay full time staff, etc!) that the size of our “funnel” was not large enough to support a high growth, interesting business. We determined that if we had X number of downloads we ended up with Y prospects that converted to Z customers at price A. We experimented with price, offering, prospect development, etc. We improved our conversion rates over time, but ultimately found that unless we could find some way to increase the mouth of the funnel by more than 100x we wouldn’t have a growing business that would allow us to continue/further our investment in LucidDB.

There are other reasons as well, many of which are missteps or mistakes by me personally. That could fill an entire other blog (and likely will at some point).

We’ve been working with our customers over the past few months to help them prepare for the future with us no longer providing the customer support. We’ve been communicating this message to them, and now we’re bringing it to the greater community about our future participation in LucidDB.

DynamoBI will:
1) Host the git repositories and continue to provide a legal contribution framework so that the IP for the project remains clean for all. The Apache license means that DynamoBI remains free and accessible for anyone/everyone wishing to use it (or parts of it).
2) Contribute any “interesting” pieces of the amazing framework to projects that can use it. In particular, we’re thrilled to see the Optiq project leveraging, as a starting point, some of the LucidDB components.
3) Host the forums, and wiki, and issue tracking for the LucidDB community as we have been for the past few years (http://luciddb.org).
4) Continue to participate as active users in the community; we are still fond of LucidDB and hope to see the community/project be successful.

However, DynamoBI will no longer:
1) Provide releases or builds. We’ve shut down our continuous integration server and do not plan on making any release after 0.9.4.
2) Offer any commercial services for LucidDB (consulting, services, sponsored development, etc).
3) Provide active development on the core project, or ancillary projects.

Once again, thank you for your support over the past few years and we encourage you to continue to look at LucidDB, even though we were unable to make it a commercial success. It has some very unique features that are a perfect fit for some use cases (Big Data access via BI tools, etc) that make it a great open source project.

Kind Regards,
Nick
Former CEO of DynamoBI Corporation

LucidDB has left Eigenbase moved to Apache License

3 Replies

This has been a long time in the making, but the LucidDB project is leaving the Eigenbase foundation to continue our development outside that organizations IP sharing, framework, and governance.

Community members will notice (or have already):

We are no longer using Perforce (YAAY!) and are now doing our primary LucidDB, Farrago, Fennel, and relevant extensions/test/build development work at github: https://github.com/dynamobi/luciddb/
The Wiki is now hosted at http://www.luciddb.org/wiki. We will, over time, remove references to Eigenbase in that project documentation/etc.
Issue tracking is now ALSO over at github, and we have migrated all issues (historical and outstanding) over to the github project.

Part of the impetus for leaving Eigenbase was our desire for a more inclusive license, to permit additional use/collaboration by other companies in the spirit of open source. We initiated this process, in good company and like minded individuals early last year. Long story short this plight and political battles cost Eigenbase the resignations of the two, most critical participants at Eigenbase: Julian Hyde and John V. Sichi. I join them now, as I resigned from the Eigenbase Board March 26.

Today I’m announcing that DynamoBI has released the entirety of the codebase, under the Apache Software License 2.0. We welcome our community members ongoing contributions, and hope that companies looking to leverage such a great framework and technology take a look. We welcome, wholeheartedly, your participation in the project under it’s new permissive license.

We continue to serve our existing customers with annual subscriptions to DynamoDB, our QA’ed and prepackaged commercial version of LucidDB.

Happy LucidDB-ing!

NoSQL Now 2011: Review of AdHoc Analytic Architectures

1 Reply

For those that weren’t able to attend the fantastic NoSQL Now Conference in San Jose last week, but are still interested in the slides about how people are doing Ad Hoc analytics on top of NoSQL data systems, here’s my slides from my presentation:

No sql now2011_review_of_adhoc_architectures

View more presentations from ngoodman

We obviously continue to hear from our community that LucidDB is a great solution sitting in front of a Big Data/NoSQL system. Allowing easy SQL access (including super fast, analytic database cached views) is a big win for reducing load *AND* increasing usability of data in NoSQL systems.

Splunk is NoSQL-eee and queryable via SQL

1 Reply

Last week at the Splunk user conference in San Francisco, Paul Sanford from the Splunk SDK team demoed a solution we helped assemble showing easy SQL access to data in Splunk. It was very well received and we look forward to helping Splunk users access their data via SQL and commodity BI tools!

While you won’t find it in their marketing materials, Splunk is a battle hardened, production grade NoSQL system (at least for analytics). They have a massively scalable, distributed compute engine (ie, Map-Reduce-eee) along with free form schema-less type data processing. So, while they don’t necessarily throw their ring into new NoSQL type projects (and let’s be honest, if you’re not Open Source it’d be hard to get selected for a new project) their thousands of customers have been very happy with their ingestion, alerting, and reporting on IT data and is very NoSQL-eee.

The SDK team has been working on opening up the underlying engine, framework so that additional developers can innovate and do some cool stuff on top of Splunk. Splunk developers from California (some of whom worked at LucidEra prior) kick started a project that gives LucidDB the ability to talk to Splunk hereby enabling SQL access to commodity BI tools. We’ve adopted the project, and built out some examples using Pentaho to show the power of SQL access to Splunk.

First off, our overall approach is similar to our existing ability to talk to CouchDB, Hive/Hadoop, etc remains the same.

We learn how to speak the remote language (in this case, Spunk search queries) generally. This means we can simply stream data across the wire in it’s entirety and then do all the SQL
We enable some rewrite rules so that if the remote system (Splunk) knows how to do things (such as simple filtering in the WHERE clause, or GROUP BY stats) we’ll rewrite our query and push more to the work down to the remote system.

Once we’ve done that, we can enable any BI tool (that can do things such as SQL, read catalogs/tables, enable metadata/etc) connect up and do cool things like drag and drop reports. Here’s some examples created using Pentaho’s Enterprise Edition 4.0 BI Suite (which looks great, btw!):

These dashboards were created, entirely within a Web Browser using Drag and Drop (via Pentaho’s modeling/report building capabilities). Total time to build these dashboards was less than 45 minutes including model definition and report building (caveat: I know Pentaho dashboards inside and out).

Splunk users can now access data in their Splunk system, including matching/mashing it with others simply and easily in everyday, inexpensive BI tools.

In fact, this project came about initially as a Splunk customer wanted to do some advanced visualization in Tableau. Using our experimental ODBC Connectivity the user was able to visualize their Splunk data in Tableau using some of their fantastic visualizations.

PDI Loading into LucidDB

1 Reply

By far, the most popular way for PDI users to load data into LucidDB is to use the PDI Streaming Loader. The streaming loader is a native PDI step that:

Enables high performance loading, directly over the network without the need for intermediate IO and shipping of data files.
Lets users choose more interesting (from a DW perspective) loading type into tables. In particular, in addition to simple INSERTs it allows for MERGE (aka UPSERT) and also UPDATE. All done, in the same, bulk loader.
Enables the metadata for the load to be managed, scheduled, and run in PDI.

However, we’ve had some known issues. In fact, until PDI 4.2 GA and LucidDB 0.9.4 GA it’s pretty problematic unless you run through the process of patching LucidDB outlined on this page: Known Issues.

In some ways, we have to admit, that we released this piece of software too soon. Early and often comes with some risk, and many have felt the pain of some of the issues that have been discovered with the streaming loader.

In some ways, we’ve built an unnatural approach to loading for PDI: PDI wants to PUSH data into a database. LucidDB wants to PULL data from remote sources, with it’s integrated ELT and DML based approach (with connectors to databases, salesforce, etc). Our streaming loader “fakes” a pull data source, and allows PDI to “push” into it.

There’s mutliple threads involved, when exceptions happen users have received cruddy error messages such as “Broken Pipe” that are unhelpful at best, frustrating at worse. Most all of these contortions will have sorted themselves out and by the time 4.2 GA PDI and 0.9.4 GA of LucidDB are released the streaming loader should be working A-OK. Some users would just assume avoid the patch instructions above and have posed the question: In a general sense, if not the streaming loader how would I load data into LucidDB?

Again, LucidDB likes to “pull” data from remote sources. One of those is CSV files. Here’s a nice, easy, quick (30k r/s on my MacBook) method to load a million rows using PDI and LucidDB:

This transformation outputs to a Text File 1 million rows, waits for that to complete then proceeds to the load that data into a new table in LucidDB. Step by Step the LucidDB statements

— Points LucidDB to the directory with the just generated flat file
— LucidDB has some defaults, and we can “guess” the datatypes by scanning the file
CREATE or replace SERVER csv_file_server FOREIGN DATA WRAPPER SYS_FILE_WRAPPER OPTIONS ( DIRECTORY ‘?’ );
— Let’s create a foreign table for the data file (“DATA.txt”) that was output by PDI
>create foreign table applib.data server csv_file_server;
— Create a staging, and load the data from the flat file (select * from applib.data)
CALL APPLIB.CREATE_TABLE_AS (‘APPLIB’, ‘STAGING_TABLE’, ‘select * from applib.data’, true);

We hope to have the streaming loader ready to go in 0.9.4 (LucidDB) and 4.2 (PDI). Until then, consider this easy, straight forward method of loading data that’s high performance, proven, and stable for loading data from PDI into LucidDB.

Example file: csv_luciddb_load.ktr