<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Goodman on BI</title>
	<atom:link href="http://www.nicholasgoodman.com/bt/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.nicholasgoodman.com/bt/blog</link>
	<description>Musings on reporting, OLAP, ETL, open source</description>
	<pubDate>Thu, 02 Sep 2010 17:53:19 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.5</generator>
	<language>en</language>
			<item>
		<title>LucidDB has a new Logo/Mascot</title>
		<link>http://www.nicholasgoodman.com/bt/blog/2010/09/02/luciddb-has-a-new-logomascot/</link>
		<comments>http://www.nicholasgoodman.com/bt/blog/2010/09/02/luciddb-has-a-new-logomascot/#comments</comments>
		<pubDate>Thu, 02 Sep 2010 17:53:19 +0000</pubDate>
		<dc:creator>Nicholas Goodman</dc:creator>
		
		<category><![CDATA[DynamoBI]]></category>

		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://www.nicholasgoodman.com/bt/blog/?p=496</guid>
		<description><![CDATA[At yesterday&#8217;s Eigenbase Developer Meetup at SQLstream&#8217;s offices in San Francisco we arrived at a new logo for LucidDB.  DynamoBI is thrilled to have supported and funded the design contest to arrive at our new mascot.  Over the coming months you&#8217;ll see the logo make its way out to the existing luciddb.org sites, wiki sites, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2010/09/5383087-smallcrop.gif"><img class="alignleft size-full wp-image-495" title="luciddb-small" src="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2010/09/5383087-smallcrop.gif" alt="" width="180" height="140" /></a>At yesterday&#8217;s <a href="http://www.meetup.com/San-Francisco-Eigenbase-Developers/calendar/14311008/">Eigenbase Developer Meetup</a> at <a href="http://sqlstream.com/">SQLstream</a>&#8217;s offices in San Francisco we arrived at a new logo for LucidDB.  DynamoBI is thrilled to have supported and funded the design contest to arrive at our new mascot.  Over the coming months you&#8217;ll see the logo make its way out to the existing <a href="http://www.luciddb.org">luciddb.org</a> sites, wiki sites, etc.  I&#8217;m really happy to have a logo that matches the nature of our database -<strong> BAD ASS!</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicholasgoodman.com/bt/blog/2010/09/02/luciddb-has-a-new-logomascot/feed/</wfw:commentRss>
		</item>
		<item>
		<title>SaaS or On Site?  Who cares with Pentaho On Demand</title>
		<link>http://www.nicholasgoodman.com/bt/blog/2010/06/08/saas-or-on-site-who-cares-with-pentaho-on-demand/</link>
		<comments>http://www.nicholasgoodman.com/bt/blog/2010/06/08/saas-or-on-site-who-cares-with-pentaho-on-demand/#comments</comments>
		<pubDate>Tue, 08 Jun 2010 15:54:04 +0000</pubDate>
		<dc:creator>Nicholas Goodman</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.nicholasgoodman.com/bt/blog/?p=490</guid>
		<description><![CDATA[Pentaho launched their On Demand initiative today: Press Release.
While the launch is new, I know that Pentaho has already onboarded some customers in a quiet soft launch and the response has been very positive.  Why wouldn&#8217;t it be?  This offering is the best of both worlds, and makes purchasing a business department driven BI project [...]]]></description>
			<content:encoded><![CDATA[<p>Pentaho launched their <a href="http://www.pentaho.com/services/on-demand/">On Demand</a> initiative today: <a href="http://www.pentaho.com/news/releases/20100608_pentaho_announces_on-demand_BI.php" target="_blank">Press Release</a>.</p>
<p>While the launch is new, I know that Pentaho has already onboarded some customers in a quiet soft launch and the response has been very positive.  Why wouldn&#8217;t it be?  This offering is the best of both worlds, and makes purchasing a business department driven BI project easy.</p>
<p>SaaS BI&#8217;s key selling point (there are many small, nice to haves, but the thing that gets people to reach for their wallet) is the ability to get a solution with an almost total lack of IT involvement.  Throw in non cap-ex expenditures for the solution (4k USD / mo instead of a 30k license) and it&#8217;s a huge win.</p>
<p>Business users have their data (feeds, dumps, extracts, or connections to DBs), have the budget but then need the tools/expertise to get their BI system &#8220;up and running.&#8221;  SaaS and On Demand BI is a perfect fit for these customers - up and running quickly.  Where it breaks down, is that with a SaaS offering, once you go SaaS you can&#8217;t EVER go back.  You&#8217;re stuck with a solution built entirely upon a proprietary, vendor controlled software and infrastructure.</p>
<p>Recap:</p>
<blockquote><p>Biggest draw of SaaS is quick easy startup without IT, and smaller monthly pay as you go<br />
Biggest drawback of SaaS is lock in like you&#8217;ve never seen before.  Not just software, but operations as well.</p></blockquote>
<p><em><strong>That&#8217;s why I find the Pentaho On Demand BI initiative to hit the middle of the sweet (suite?) spot.  It is absolutely the best of both worlds. </strong></em></p>
<p>Their On Demand offering allows</p>
<ul>
<li>Business sponsors to get a complete BI suite up and running quickly (72 hr challenge is wicked cool) without IT involvement (or minimal).</li>
<li>Incremental, pay as you go billing.  This is huge - not sure your BI project will generate a return?  Spin up Pentaho On Demand, do the 72 hr challenge, and shop the demo around to the users for a month or two.</li>
<li>This is what knocks it out of the park though: once you&#8217;re done with your eval/build out you have so many options, whereas with pure SaaS you don&#8217;t.  Not what the users wanted?  No problem: throw it away.  Is it what they want?  Keep going On Demand.  On Demand not something your IT likes?  Bring it On Site.  That&#8217;s right, Pentaho and their hosting partners have designed their On Demand offering to permit shutdown and transport of the machine VMware images in house.  Even without that, you can take just your Pentaho Solution, install Pentaho locally, and run it in house with your reports/cubes/dashboards.</li>
</ul>
<p>On Demand BI won&#8217;t be for everyone - it&#8217;s still geared towards people who want a complete suite of BI tools and are willing to pay for a private, secure Pentaho instance in the cloud.  SaaS BI will still be better for people who actually prefer ZERO infrastructure and for those that don&#8217;t have the budget for a private infrastructure (if you have $100 / mo for BI, this isn&#8217;t right for you).</p>
<p>Great news for &#8220;<em>frustrated with IT but still wanting to build out a real, long term BI solution</em>&#8221; analysts everywhere.  <img src='http://www.nicholasgoodman.com/bt/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicholasgoodman.com/bt/blog/2010/06/08/saas-or-on-site-who-cares-with-pentaho-on-demand/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Encrypt PDI passwords</title>
		<link>http://www.nicholasgoodman.com/bt/blog/2010/01/29/encrypt-pdi-passwords/</link>
		<comments>http://www.nicholasgoodman.com/bt/blog/2010/01/29/encrypt-pdi-passwords/#comments</comments>
		<pubDate>Fri, 29 Jan 2010 20:33:32 +0000</pubDate>
		<dc:creator>Nicholas Goodman</dc:creator>
		
		<category><![CDATA[Data Integration (Kettle)]]></category>

		<category><![CDATA[Pentaho]]></category>

		<guid isPermaLink="false">http://www.nicholasgoodman.com/bt/blog/2010/01/29/encrypt-pdi-passwords/</guid>
		<description><![CDATA[PDI has a basic obfuscation method for making it difficult for casual people to lift passwords for DB connections.  I have customers that maintain different versions of a &#8220;shared.xml&#8221; file that maintain different physical connections to databases (think development, QA/testing, and production).
In order to generate the different shared.xml, a user has to usually (per [...]]]></description>
			<content:encoded><![CDATA[<p>PDI has a basic obfuscation method for making it difficult for casual people to lift passwords for DB connections.  I have customers that maintain different versions of a &#8220;shared.xml&#8221; file that maintain different physical connections to databases (think development, QA/testing, and production).</p>
<p>In order to generate the different shared.xml, a user <span style="text-decoration: line-through;">has to</span> <em>usually (per Matt Casters&#8217; comment below there is a utility that allows users to do this outside of Spoon)</em> open up PDI, create the connections, save them, and then sometimes copy and paste the sections needed to create their &#8220;dev&#8221; version of shared.xml or their &#8220;production&#8221; version of shared.xml.  Many times this is just to generate the password, as they can hand edit the other pieces (hostname, schema, etc).</p>
<p>I just committed a <a href="http://source.pentaho.org/svnkettleroot/Kettle/trunk/samples/transformations/Encrypt%20Password.ktr">quick little PDI transformation</a> that gives you the PDI encrypted form of a password.</p>
<p><a onclick="window.open('http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2010/01/201001291332.jpg','popup','width=659,height=313,scrollbars=no,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=yes,left=0,top=0');return false" href="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2010/01/201001291332.jpg"><img src="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2010/01/201001291332-tm.jpg" border="1" alt="201001291332" hspace="4" vspace="4" width="210" height="100" /></a></p>
<p>Happy Password Encrypting!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicholasgoodman.com/bt/blog/2010/01/29/encrypt-pdi-passwords/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Asking this question means you don&#8217;t get BI market</title>
		<link>http://www.nicholasgoodman.com/bt/blog/2009/11/23/asking-this-question-means-you-dont-get-bi-market/</link>
		<comments>http://www.nicholasgoodman.com/bt/blog/2009/11/23/asking-this-question-means-you-dont-get-bi-market/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 03:43:09 +0000</pubDate>
		<dc:creator>Nicholas Goodman</dc:creator>
		
		<category><![CDATA[General BI]]></category>

		<guid isPermaLink="false">http://www.nicholasgoodman.com/bt/blog/?p=479</guid>
		<description><![CDATA[In almost every technology company, if you&#8217;re explaining your business model, savvy technology executives ask the question:
Who are you selling this to?  What&#8217;s his/her title, where does he work?  What&#8217;s the size of their company?
It&#8217;s a question that helps the questioner understand, and the responder clarify exactly who is buying the product.  This is critical [...]]]></description>
			<content:encoded><![CDATA[<p>In almost every technology company, if you&#8217;re explaining your business model, savvy technology executives ask the question:</p>
<blockquote><p>Who are you selling this to?  What&#8217;s his/her title, where does he work?  What&#8217;s the size of their company?</p></blockquote>
<p>It&#8217;s a question that helps the questioner understand, and the responder clarify <strong><em>exactly </em></strong>who is buying the product.  This is critical for a business!  Is it the System Administrator manager who is looking for his DBAs to coordinate their efforts (groupware for DBAs)?  Is it a CRM system that the business users are primarily evaluating for use (salesforce/sugarcrm) but requires huge IT investment for configuration/integration??  For a software or IT services provider, deciding WHO you sell to (Business Users or IT) is hugely important!</p>
<p><em><strong>Asking this question when the technology or service is in Business Intelligence is just plain useless though.</strong></em> Business Intelligence is always a mix of the two.  IT?  Sometimes they&#8217;re the ones buying, but never without HUGE amounts of time spent with the business side (casual report developers and business analysts).  Business Users buying Tableau, and coordinating parts of the purchase or data access with IT?  Yup.   Analysts embedded with business teams buying SAS/SPSS, through an IT purchasing process?  Sure thing.</p>
<p>Business Intelligence is always sold to two groups at once which makes it a tricky thing to sell.  Anyone reading this, consider how much tension you&#8217;ve observed between your IT/Business groups.  Trying to get dead in the middle on this is a tricky proposition.</p>
<p>Business Intelligence sales guys earn their money for sure!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicholasgoodman.com/bt/blog/2009/11/23/asking-this-question-means-you-dont-get-bi-market/feed/</wfw:commentRss>
		</item>
		<item>
		<title>DynamoDB: Time Dimension table with MERGE</title>
		<link>http://www.nicholasgoodman.com/bt/blog/2009/11/20/dynamodb-time-dimension-table-with-merge/</link>
		<comments>http://www.nicholasgoodman.com/bt/blog/2009/11/20/dynamodb-time-dimension-table-with-merge/#comments</comments>
		<pubDate>Sat, 21 Nov 2009 00:48:09 +0000</pubDate>
		<dc:creator>Nicholas Goodman</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.nicholasgoodman.com/bt/blog/?p=469</guid>
		<description><![CDATA[So, even with my disclaimer note on the last blog, the DynamoDB developers slapped me around a bit for suggesting using a view for a Time Dimension.  The Time Dimension is the most important dimension table and should be an actual table, not a view.  Creating the table allows us to perform all [...]]]></description>
			<content:encoded><![CDATA[<p>So, even with my disclaimer note on the last blog, the DynamoDB developers slapped me around a bit for suggesting using a <a href="http://www.nicholasgoodman.com/bt/blog/2009/11/20/dynamodb-built-in-time-dimension-support/">view for a Time Dimension</a>.  The Time Dimension is the most important dimension table and should be an actual table, not a view.  Creating the table allows us to perform all kinds of optimizations like star joins, bitmap indexes on attributes, etc.  Probably wouldn&#8217;t be that big of a deal for a tiny fact table (&lt; 5 million records) but you&#8217;ll want CREATE TABLE if you want good performance.</p>
<p>Good news is, that we can use our exact same table function (with fiscal year offset)</p>
<blockquote><p>select * from table(applib.fiscal_time_dimension (2000, 1, 1, 2009, 12, 31, 3))</p></blockquote>
<p>to populate and keep our Time Dimension TABLE up to date.</p>
<p>If you use a TABLE, it&#8217;s 2 steps:</p>
<ol>
<li>CREATE TABLE : &#8220;dim_time&#8221;</li>
<li>POPULATE TABLE : &#8220;merge into dim_time&#8221;</li>
</ol>
<p>We&#8217;ll be using another great tool in the DynamoDB / LucidDB toolkit, the <a href="http://pub.eigenbase.org/wiki/LucidDbUpsert">MERGE</a> statement.  The <a href="http://pub.eigenbase.org/wiki/LucidDbUpsert">MERGE</a> statement is a logical UPSERT.  It checks to see if the key is already present.  If it is, we UPDATE the existing row.  If it isn&#8217;t, we INSERT a new row into the table.  I&#8217;ll go into more detail at some point in the future as MERGE is crucial for keeping dimensions up to date.</p>
<p>Let&#8217;s create our Time Dimension table:</p>
<pre>create table dim_time (
FISCAL_YEAR_END_DATE DATE
, FISCAL_YEAR_START_DATE DATE
... ABBREVIATED ...
, TIME_KEY DATE
, TIME_KEY_SEQ INTEGER
,constraint dim_time_pk primary key (day_from_julian));</pre>
<p>NOTE: We&#8217;ve abbreviated the statements, but all the columns are used in the actual scripts.  We also should add bitmap indexes on YR, MONTH, etc columns.</p>
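<p>As a rough sketch of that last point: the index names below are made up for illustration, and this assumes a plain CREATE INDEX in LucidDB gives you a bitmap index, which is worth confirming against the docs for your version.</p>
<pre>-- Hypothetical index names; YR and MONTH_NUMBER_IN_YEAR are columns
-- of the time dimension
create index dim_time_yr_idx on dim_time(yr);
create index dim_time_month_idx on dim_time(month_number_in_year);</pre>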
<p>We&#8217;ve now got a TABLE that matches the VIEW we created in the previous blog.  We&#8217;ve made day_from_julian our PK, and we&#8217;ll use it as our key for the MERGE statement.  We can run this statement as many times as we like and it will always just keep our &#8220;dim_time&#8221; table up to date.</p>
<pre>merge into dim_time using (select * from
      table(applib.fiscal_time_dimension (2000, 1, 1, 2010, 12, 31, 3))) src
on dim_time.day_from_julian = src.day_from_julian
when matched then UPDATE set
FISCAL_YEAR_END_DATE=src.FISCAL_YEAR_END_DATE
,FISCAL_YEAR_START_DATE=src.FISCAL_YEAR_START_DATE
... ABBREVIATED ...
,TIME_KEY=src.TIME_KEY
,TIME_KEY_SEQ=src.TIME_KEY_SEQ
when not matched then INSERT
(FISCAL_YEAR_END_DATE
 , FISCAL_YEAR_START_DATE
... ABBREVIATED ...
 , TIME_KEY
 , TIME_KEY_SEQ)
values(
src.FISCAL_YEAR_END_DATE
 , src.FISCAL_YEAR_START_DATE
... ABBREVIATED ...
 , src.TIME_KEY
 , src.TIME_KEY_SEQ);</pre>
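<p>One quick way to convince yourself the MERGE is repeatable is to check the row count and date range, re-run the MERGE, and check again.  A minimal sketch, using only columns already shown above (the column aliases are just for readability):</p>
<pre>-- Run before and after repeating the MERGE; if it is behaving as an
-- upsert, the numbers should not change for the same date range
select count(*) as row_count
     , min(time_key) as first_day
     , max(time_key) as last_day
from dim_time;</pre>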
]]></content:encoded>
			<wfw:commentRss>http://www.nicholasgoodman.com/bt/blog/2009/11/20/dynamodb-time-dimension-table-with-merge/feed/</wfw:commentRss>
		</item>
		<item>
		<title>DynamoDB: Built in Time Dimension support!</title>
		<link>http://www.nicholasgoodman.com/bt/blog/2009/11/20/dynamodb-built-in-time-dimension-support/</link>
		<comments>http://www.nicholasgoodman.com/bt/blog/2009/11/20/dynamodb-built-in-time-dimension-support/#comments</comments>
		<pubDate>Fri, 20 Nov 2009 21:23:22 +0000</pubDate>
		<dc:creator>Nicholas Goodman</dc:creator>
		
		<category><![CDATA[DynamoBI]]></category>

		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://www.nicholasgoodman.com/bt/blog/?p=465</guid>
		<description><![CDATA[DynamoDB (aka LucidDB) is not just another column store database.  Our goal is being the best database for actually doing Business Intelligence; while that means being fast and handling large amounts of data there&#8217;s a lot of other things BI consultant/developers need.  I&#8217;ll continue to post about some of the great BI features [...]]]></description>
			<content:encoded><![CDATA[<p>DynamoDB (aka <a href="http://www.luciddb.org">LucidDB</a>) is not just another column store database.  Our goal is being the best database for actually <em>doing</em> Business Intelligence; while that means being fast and handling large amounts of data there&#8217;s a lot of other things BI consultant/developers need.  I&#8217;ll continue to post about some of the great BI features that DynamoDB has for the modern datasmiths.</p>
<p>The first feature to cover, and it&#8217;s dead easy, is the built in ability to <a href="http://pub.eigenbase.org/wiki/LucidDbAppLib_FISCAL_TIME_DIMENSION">generate a time dimension</a>, including Fiscal Calendar attributes.  If you&#8217;re using Mondrian (or come to that, your own custom SQL on a star schema) you need to have a time dimension.  <strong>Time is the most important dimension!</strong>  Every OLAP model I&#8217;ve ever built uses one!  It&#8217;s something that you, as a datasmith, will need to do with every project; <strong>that&#8217;s why we&#8217;ve built it right into our database</strong>.</p>
<p>Here&#8217;s a dead simple way to create a fully baked, ready to use Time Dimension to use with Mondrian.</p>
<pre>-- Create a view that is our time dimension for 10 years, with our
-- Fiscal calendar starting in March (3)
create view dim_time as select * from
table(applib.fiscal_time_dimension (2000, 1, 1, 2009, 12, 31, 3));
</pre>
<p><strong>OK, that&#8217;s it.  You&#8217;ve created a Time Dimension!  </strong><em>* see NOTE at end of post.</em></p>
<p>So, we&#8217;ve created our time dimension, complete with a Fiscal calendar for 10 years in a single statement!  Awesome - but what does it contain?</p>
<pre>
-- Structure of new time dimension
select "TABLE_NAME", "COLUMN_NAME", "DATATYPE" from sys_root.dba_columns
where table_name = 'DIM_TIME';
+-------------+---------------------------------+-----------+
| TABLE_NAME  |           COLUMN_NAME           | DATATYPE  |
+-------------+---------------------------------+-----------+
| DIM_TIME    | FISCAL_YEAR_END_DATE            | DATE      |
| DIM_TIME    | FISCAL_YEAR_START_DATE          | DATE      |
| DIM_TIME    | FISCAL_QUARTER_NUMBER_IN_YEAR   | INTEGER   |
| DIM_TIME    | FISCAL_QUARTER_END_DATE         | DATE      |
| DIM_TIME    | FISCAL_QUARTER_START_DATE       | DATE      |
| DIM_TIME    | FISCAL_MONTH_NUMBER_IN_YEAR     | INTEGER   |
| DIM_TIME    | FISCAL_MONTH_NUMBER_IN_QUARTER  | INTEGER   |
| DIM_TIME    | FISCAL_MONTH_END_DATE           | DATE      |
| DIM_TIME    | FISCAL_MONTH_START_DATE         | DATE      |
| DIM_TIME    | FISCAL_WEEK_NUMBER_IN_YEAR      | INTEGER   |
| DIM_TIME    | FISCAL_WEEK_NUMBER_IN_QUARTER   | INTEGER   |
| DIM_TIME    | FISCAL_WEEK_NUMBER_IN_MONTH     | INTEGER   |
| DIM_TIME    | FISCAL_WEEK_END_DATE            | DATE      |
| DIM_TIME    | FISCAL_WEEK_START_DATE          | DATE      |
| DIM_TIME    | FISCAL_DAY_NUMBER_IN_YEAR       | INTEGER   |
| DIM_TIME    | FISCAL_DAY_NUMBER_IN_QUARTER    | INTEGER   |
| DIM_TIME    | FISCAL_YEAR                     | INTEGER   |
| DIM_TIME    | YEAR_END_DATE                   | DATE      |
| DIM_TIME    | YEAR_START_DATE                 | DATE      |
| DIM_TIME    | QUARTER_END_DATE                | DATE      |
| DIM_TIME    | QUARTER_START_DATE              | DATE      |
| DIM_TIME    | MONTH_END_DATE                  | DATE      |
| DIM_TIME    | MONTH_START_DATE                | DATE      |
| DIM_TIME    | WEEK_END_DATE                   | DATE      |
| DIM_TIME    | WEEK_START_DATE                 | DATE      |
| DIM_TIME    | CALENDAR_QUARTER                | VARCHAR   |
| DIM_TIME    | YR                              | INTEGER   |
| DIM_TIME    | QUARTER                         | INTEGER   |
| DIM_TIME    | MONTH_NUMBER_OVERALL            | INTEGER   |
| DIM_TIME    | MONTH_NUMBER_IN_YEAR            | INTEGER   |
| DIM_TIME    | MONTH_NUMBER_IN_QUARTER         | INTEGER   |
| DIM_TIME    | MONTH_NAME                      | VARCHAR   |
| DIM_TIME    | WEEK_NUMBER_OVERALL             | INTEGER   |
| DIM_TIME    | WEEK_NUMBER_IN_YEAR             | INTEGER   |
| DIM_TIME    | WEEK_NUMBER_IN_QUARTER          | INTEGER   |
| DIM_TIME    | WEEK_NUMBER_IN_MONTH            | INTEGER   |
| DIM_TIME    | DAY_FROM_JULIAN                 | INTEGER   |
| DIM_TIME    | DAY_NUMBER_OVERALL              | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_YEAR              | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_QUARTER           | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_MONTH             | INTEGER   |
| DIM_TIME    | DAY_NUMBER_IN_WEEK              | INTEGER   |
| DIM_TIME    | WEEKEND                         | VARCHAR   |
| DIM_TIME    | DAY_OF_WEEK                     | VARCHAR   |
| DIM_TIME    | TIME_KEY                        | DATE      |
| DIM_TIME    | TIME_KEY_SEQ                    | INTEGER   |
+-------------+---------------------------------+-----------+

-- Let's look at a few rows
select time_key_seq, time_key, yr, month_number_in_year, fiscal_year
, fiscal_month_number_in_year from dim_time;
+---------------+-------------+-------+-----------------------+--------------+------------------------------+
| TIME_KEY_SEQ  |  TIME_KEY   |  YR   | MONTH_NUMBER_IN_YEAR  | FISCAL_YEAR  | FISCAL_MONTH_NUMBER_IN_YEAR  |
+---------------+-------------+-------+-----------------------+--------------+------------------------------+
| 1             | 2000-01-01  | 2000  | 1                     | 2000         | 11                           |
| 2             | 2000-01-02  | 2000  | 1                     | 2000         | 11                           |
| 3             | 2000-01-03  | 2000  | 1                     | 2000         | 11                           |
| 4             | 2000-01-04  | 2000  | 1                     | 2000         | 11                           |
| 5             | 2000-01-05  | 2000  | 1                     | 2000         | 11                           |
| 6             | 2000-01-06  | 2000  | 1                     | 2000         | 11                           |
| 7             | 2000-01-07  | 2000  | 1                     | 2000         | 11                           |
| 8             | 2000-01-08  | 2000  | 1                     | 2000         | 11                           |
| 9             | 2000-01-09  | 2000  | 1                     | 2000         | 11                           |
| 10            | 2000-01-10  | 2000  | 1                     | 2000         | 11                           |
+---------------+-------------+-------+-----------------------+--------------+------------------------------+
</pre>
<p>Generating the Time Dimension is accomplished using DynamoDB&#8217;s ability to include Java based UDF Table Functions.  Table functions are really powerful - they allow a BI developer to write custom functions that output a &#8220;table&#8221; that can be queried like ANY OTHER TABLE (<em>mostly</em>).  Check out the wiki page <a href="http://pub.eigenbase.org/wiki/FarragoUdx">FarragoUdx</a> if you&#8217;re interested.</p>
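<p>To illustrate the &#8220;queried like any other table&#8221; point, here is a small sketch that aggregates straight off the table function without a view in between.  The two year date range and the column alias are arbitrary choices for the example, and it assumes grouping over a table function is not one of the &#8220;mostly&#8221; exceptions.</p>
<pre>-- Count the generated days per calendar year, directly from the
-- table function
select yr, count(*) as days_in_year
from table(applib.fiscal_time_dimension (2000, 1, 1, 2001, 12, 31, 3))
group by yr
order by yr;</pre>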
<p>And of course: download LucidDB and give it a whirl!</p>
<p><strong><em>NOTE: </em></strong><em>To be candid, doing it as a view isn&#8217;t the best approach.  For anything beyond tiny (5 million +) we should actually create the table, and do an INSERT INTO SELECT * FROM TABLE(fiscal_time_dimension).</em></p>
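<p>A minimal sketch of that approach, assuming a table with the same columns in the same order as the function output already exists.  The name dim_time_table is hypothetical, picked only so it doesn&#8217;t collide with the dim_time view above.</p>
<pre>-- Assumes dim_time_table was created with the same column layout
-- that fiscal_time_dimension returns
insert into dim_time_table
select * from
table(applib.fiscal_time_dimension (2000, 1, 1, 2009, 12, 31, 3));</pre>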
]]></content:encoded>
			<wfw:commentRss>http://www.nicholasgoodman.com/bt/blog/2009/11/20/dynamodb-built-in-time-dimension-support/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Book Review: Pentaho Reporting 3.5 for Java Developers</title>
		<link>http://www.nicholasgoodman.com/bt/blog/2009/11/09/book-review-pentaho-reporting-35-for-java-developers/</link>
		<comments>http://www.nicholasgoodman.com/bt/blog/2009/11/09/book-review-pentaho-reporting-35-for-java-developers/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 03:28:27 +0000</pubDate>
		<dc:creator>Nicholas Goodman</dc:creator>
		
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://www.nicholasgoodman.com/bt/blog/2009/11/09/book-review-pentaho-reporting-35-for-java-developers/</guid>
		<description><![CDATA[I have two customers that if they had access to Will Gorman&#8217;s book, Pentaho Reporting 3.5 for Java Developers, they would not have needed me for their project!  That&#8217;s how good the book is for those who need to embed Pentaho Reporting into their Java application.
The book is certainly geared towards Java developers, and [...]]]></description>
			<content:encoded><![CDATA[<p>I have two customers that if they had access to <a href="http://www.willgorman.com/">Will Gorman&#8217;s</a> book, <a href="http://www.packtpub.com/pentaho-reporting-3-5-for-java-developers?utm_source=nicholasgoodman.com&amp;utm_medium=bookrev&amp;utm_content=blog&amp;utm_campaign=mdb_000903">Pentaho Reporting 3.5 for Java Developers</a>, they would not have needed me for their project!  That&#8217;s how good the book is for those who need to embed Pentaho Reporting into their Java application.</p>
<p>The book is certainly geared towards Java developers, and specifically, developers who are trying to simply use the Pentaho reporting library.  I&#8217;d venture to say that MOST customers should be using Pentaho; in this case, the book is useful as a reference, but the HOWTO past Chapter 3 would probably be lost on many users; except for Chapter 11 (see below).</p>
<p>However, for people trying to embed Pentaho reporting, <strong>WOW</strong>:  <strong>THIS IS THE DEFINITIVE RESOURCE</strong>.  Buy it, RIGHT NOW!  The information it contains <strong><em>was</em></strong> locked in just a few people&#8217;s minds (Thomas, Bunch of People sitting at the &#8220;citadel&#8221; in Orlando aka Pentaho Employees, a handful of consultants).  Will has unlocked it and I&#8217;m glad he did.</p>
<p>Will taught me something new in this book.  In fact, I hope this is &#8220;new&#8221; in 3.5 which was released just a few weeks back.  If it&#8217;s been around longer, then I&#8217;m a total dolt.  Chapter 11 covers how to add your own custom Expressions/Formulas to Pentaho (including the PRD).</p>
<p>At customer engagements, or when I put on my Pentaho hat and teach their public courses, or custom onsite training, I&#8217;m asked all the time: <strong>Can I make my own Reporting Functions and plug them into Pentaho Report Designer?  </strong>Up until Will showed me how to do it on page 281, I thought this was only possible for Pentaho (the company).  Will gives us a step by step guide to add our own &#8220;DoMyCustomThing&#8221; to the Pentaho Report Designer.  Customers can now create their own corporate expressions/functions they can leverage across hundreds of reports.</p>
<p>I&#8217;ll keep several copies on my shelf, and give it away to any current/future &#8220;embedded Pentaho Reporting&#8221; customers.  Thanks Will for such a great book!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicholasgoodman.com/bt/blog/2009/11/09/book-review-pentaho-reporting-35-for-java-developers/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Dreamhost Uptime Numbers are TERRIBLE!</title>
		<link>http://www.nicholasgoodman.com/bt/blog/2009/11/04/dreamhost-uptime-numbers-are-terrible/</link>
		<comments>http://www.nicholasgoodman.com/bt/blog/2009/11/04/dreamhost-uptime-numbers-are-terrible/#comments</comments>
		<pubDate>Thu, 05 Nov 2009 05:20:12 +0000</pubDate>
		<dc:creator>Nicholas Goodman</dc:creator>
		
		<category><![CDATA[Technology Industry]]></category>

		<guid isPermaLink="false">http://www.nicholasgoodman.com/bt/blog/2009/11/04/dreamhost-uptime-numbers-are-terrible/</guid>
		<description><![CDATA[I don&#8217;t care what their marketing stats say, I have my own independent verification.  I&#8217;ve been using Wormly for quite a while monitoring some of my demo sites, and other services that are part of Bayon and part of Dynamo.  Since I was already paying for it, I figured I&#8217;d turn it loose [...]]]></description>
			<content:encoded><![CDATA[<p>I don&#8217;t care what their marketing stats say, I have my own independent verification.  I&#8217;ve been using Wormly for quite a while monitoring some of my demo sites, and other services that are part of Bayon and part of Dynamo.  Since I was already paying for it, I figured I&#8217;d turn it loose on this blog (nicholasgoodman.com) and see what the uptime was like.</p>
<p>I always thought Dreamhost was a little skittish, and my email box finds approximately one email per day with a failure, but I figured they were small, single request failures.  Nope.  The independent measuring of the uptime of this blog is a CRUDDY, CRAPPY, 97.6%.</p>
<p><a href="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2009/11/200911042116.jpg" onclick="window.open('http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2009/11/200911042116.jpg','popup','width=867,height=299,scrollbars=no,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=yes,left=0,top=0');return false"><img src="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2009/11/200911042116-tm.jpg" height="100" width="289" border="1" hspace="4" vspace="4" alt="200911042116" /></a><br />
That&#8217;s pathetic!  My blog is nothing special, an out of the box WordPress installation backed by their MySQL.  I haven&#8217;t done any of my own installations, customizations (excepting a theme) and yet my blog uptime is awful.  I&#8217;ve liked the Dreamhost panel; it gives the &#8220;technical but uninterested in actually administering their own server&#8221; user a lot of power and I&#8217;d be willing to tolerate a little downtime (truthfully, anything above 99.5% is OK with me).  But 97% uptime?  Shyeah&#8230; Time to start looking.</p>
<p>Anyone have any suggestions for good WordPress / PHP / MySQL hosts?  Willing to pay top dollar and I&#8217;ll bring with me registrations for about 25 domains.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicholasgoodman.com/bt/blog/2009/11/04/dreamhost-uptime-numbers-are-terrible/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Instant Relief from MySQL Reporting Queries: Incremental Updates</title>
		<link>http://www.nicholasgoodman.com/bt/blog/2009/11/03/instant-relief-from-mysql-reporting-queries-incremental-updates/</link>
		<comments>http://www.nicholasgoodman.com/bt/blog/2009/11/03/instant-relief-from-mysql-reporting-queries-incremental-updates/#comments</comments>
		<pubDate>Wed, 04 Nov 2009 05:37:54 +0000</pubDate>
		<dc:creator>Nicholas Goodman</dc:creator>
		
		<category><![CDATA[DynamoBI]]></category>

		<guid isPermaLink="false">http://www.nicholasgoodman.com/bt/blog/2009/11/03/instant-relief-from-mysql-reporting-queries-incremental-updates/</guid>
		<description><![CDATA[Yesterday, I covered how you can do an initial &#8220;replication&#8221; of data from MySQL to DynamoDB and how this can improve performance, and save storage space.  The follow on question becomes:  
That&#8217;s Great Nick.  But how do I keep my data up to date?
We&#8217;ve got data in our Airline Performance dataset [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday, I covered how you can do an <a href="http://www.nicholasgoodman.com/bt/blog/2009/11/02/instant-relief-from-slow-mysql-reporting-queries-using-dynamodb/">initial &#8220;replication&#8221; of data from MySQL to DynamoDB</a> and how this can improve performance, and save storage space.  The follow on question becomes:  </p>
<p><strong>That&#8217;s Great Nick.  But how do I keep my data up to date?</strong></p>
<p>We&#8217;ve got data in our Airline Performance dataset through 31-DEC-2007.  I loaded 1 year, all of 2007, for the previous example.  What happens when the FAA publishes their January 2008 results, and we&#8217;ve loaded the new month&#8217;s worth of data into MySQL?</p>
<p>MySQL:</p>
<blockquote><p>select count(*) from otp.ontime; <strong>8061223</strong><br />
select count(*) from ontime where FlightDate &gt; '2007-12-31'; <strong>605765</strong><br />
select count(*) from ontime where FlightDate &lt;= '2007-12-31'; <strong>7455458</strong></p></blockquote>
<p>DynamoDB:</p>
<blockquote><p>select count(*) from FASTER."ontime"; <strong>7455458</strong> </p></blockquote>
<p>So, we&#8217;ve added approximately 600k new records to our source system that we don&#8217;t have in our reporting system.  How do we incrementally insert these records and get just the 600k new rows into our DynamoDB reporting instance?</p>
<p>Easy Easy Easy.</p>
<p>We&#8217;ve already done all the work; all we have to do is get the records we haven&#8217;t processed yet!  It should take just a few minutes to get our current table &#8220;up to date&#8221; with the one over in MySQL.</p>
<p>DynamoDB:</p>
<blockquote><p>select max("FlightDate") from FASTER."ontime";  <strong>2007-12-31</strong><br />
insert into FASTER."ontime" select * from MYSQL_SOURCE."ontime" where "FlightDate" &gt; DATE '2007-12-31'; <strong>605765</strong></p></blockquote>
<p>In other words, let&#8217;s select from MySQL any records whose date is beyond what we have currently (2007-12-31).</p>
<blockquote><p>select count(*) from FASTER."ontime";  <strong>8061223</strong><br />
select count(*) from FASTER."ontime" where "FlightDate" &gt; DATE '2007-12-31';  <strong>605765</strong></p></blockquote>
<p>MySQL:<br />
While the DynamoDB <strong>INSERT</strong> statement was running, the following SQL was being run on MySQL.</p>
<blockquote><p>show processlist shows a SQL session with the following SQL:<br />
SELECT * FROM `ontime` WHERE `FlightDate` &gt; DATE '2007-12-31';</p></blockquote>
<p>A single SQL statement (<strong>insert into select * from table where date &gt; last time</strong>) has you up to date for reporting!  Long term we may look to work with Tungsten to be able to keep our data up to date using replication bin log records, but for now this simple pull-based approach does the job.</p>
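<p>You also don&#8217;t have to paste the date in by hand.  Here&#8217;s an untested sketch that derives the high-water mark inside the same statement.  It assumes a scalar subquery is accepted in this position, and the fallback date is an arbitrary placeholder for an empty target table.  Whether the date filter still gets pushed down to MySQL in this form is worth checking with show processlist, as above.  Like the hard-coded version, it assumes new rows always carry a later FlightDate than anything already loaded.</p>
<blockquote><p>-- Sketch only: compute the high-water mark from the target table<br />
-- DATE '1900-01-01' is an assumed placeholder meaning &#8220;nothing loaded yet&#8221;<br />
insert into FASTER."ontime"<br />
select * from MYSQL_SOURCE."ontime"<br />
where "FlightDate" &gt; (<br />
select coalesce(max("FlightDate"), DATE '1900-01-01') from FASTER."ontime");</p></blockquote>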
]]></content:encoded>
			<wfw:commentRss>http://www.nicholasgoodman.com/bt/blog/2009/11/03/instant-relief-from-mysql-reporting-queries-incremental-updates/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Instant Relief from Slow MySQL Reporting Queries using DynamoDB</title>
		<link>http://www.nicholasgoodman.com/bt/blog/2009/11/02/instant-relief-from-slow-mysql-reporting-queries-using-dynamodb/</link>
		<comments>http://www.nicholasgoodman.com/bt/blog/2009/11/02/instant-relief-from-slow-mysql-reporting-queries-using-dynamodb/#comments</comments>
		<pubDate>Tue, 03 Nov 2009 06:49:27 +0000</pubDate>
		<dc:creator>Nicholas Goodman</dc:creator>
		
		<category><![CDATA[DynamoBI]]></category>

		<category><![CDATA[Pentaho]]></category>

		<guid isPermaLink="false">http://www.nicholasgoodman.com/bt/blog/?p=444</guid>
		<description><![CDATA[Here&#8217;s the scenario.  You&#8217;ve got a table in MySQL for reporting that has a few million rows, and is denormalized for reporting.  You&#8217;ve got a Pentaho Report that is querying this MySQL table.  You have two problems with the current report.

Your users are complaining that the query is slow, and they have [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s the scenario.  You&#8217;ve got a table in MySQL for reporting that has a few million rows, and is denormalized for reporting.  You&#8217;ve got a Pentaho Report that is querying this MySQL table.  You have two problems with the current report.</p>
<ol>
<li>Your users are complaining that the query is <strong>slow</strong>, and they have to wait around for longer than they&#8217;d like to see their report. (approx 40s)</li>
<li>Your DBAs are cranky because they see the <strong>size</strong> of this table is getting bigger.  (approx 1.8GB)</li>
</ol>
<p>MySQL is fundamentally designed to be an OLTP database and while it does a fantastic job at that, its data warehouse features were built as &#8220;bolt on&#8221; additions.  Can it be used for BI? Absolutely, I&#8217;ve used it at many customer sites.  Does DynamoDB provide a better set of features/capabilities for doing BI?  We think so!  Are they both 100% open source?  You bet; why not choose the right tool for the right job then?</p>
<p>DynamoDB (aka LucidDB) is a <em>&#8220;purpose built for BI&#8221;</em> database.  What does that mean?  Well, I&#8217;ll be blogging about a lot of features that speak to our philosophy of a complete &#8220;BI Database&#8221; not just a fast one.  One of the features that makes LucidDB complete, and not just a drag racer, is its ability to connect to remote data sources via JDBC and retrieve data.  If you&#8217;re doing simple table replications, you don&#8217;t have to use an ETL tool, or do export or imports, or LOAD DATA INFILEs, etc.  Our ability to connect to remote databases and access them as &#8220;remote tables&#8221; makes retrieving data into DynamoDB as easy as &#8220;insert into mytable select * from remote_table.&#8221;</p>
<p>Back to our original issue with our current MySQL setup.</p>
<p>Our report is <strong>slow</strong>, and our database is <strong>big</strong>.  How slow?  Well, not really that bad, but at about 40s per query run that&#8217;s enough to tempt your business analyst to go fetch a coffee instead of continuing his work.  How big?  Well, not really that big, but at about 1.8GB it&#8217;s starting to get non trivial in terms of tuning the I/O etc.</p>
<p><strong>Our goal is to improve both using DynamoDB</strong>; we&#8217;ll leave MySQL as our main OLTP application.  We&#8217;re not trying to replace it - in fact, we&#8217;ll embrace MySQL as the system of record and simply &#8220;slurp/report&#8221; off this table in a separate reporting environment.</p>
<p>It&#8217;s a two step process.</p>
<ol>
<li>Connect from DynamoDB to MySQL using a JDBC connector, access the remote table, and pull the data over using a simple INSERT statement.</li>
<li>Change our Pentaho Report to use the DynamoDB JDBC connector instead of MySQL.</li>
</ol>
<p><img src="/entry_images/2009-11-02_2301.png" hspace="4" vspace="4" /></p>
<p>Our Pentaho Report is based on the following SQL</p>
<blockquote><p>SELECT t.Carrier as "CARRIER",<br />
c as "C", c2 as "C2", c*1000/c2 as "C3" FROM<br />
(SELECT Carrier, count(Carrier) AS c FROM ontime<br />
WHERE DepDelay&gt;10 GROUP BY Carrier) t JOIN<br />
(SELECT Carrier, count(Carrier) AS c2 FROM ontime<br />
GROUP BY Carrier) t2 ON (t.Carrier=t2.Carrier) ORDER BY c3 DESC;</p></blockquote>
<p><a href="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2009/11/200911022216.jpg" onclick="window.open('http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2009/11/200911022216.jpg','popup','width=625,height=534,scrollbars=no,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=yes,left=0,top=0');return false"><img src="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2009/11/200911022216-tm.jpg" height="100" width="117" border="1" hspace="4" vspace="4" alt="200911022216" /></a><br />
This takes approximately 40s to run on a MySQL database running on the same machine.</p>
<p><strong>Step 1:  Connect, and load the data into our DynamoDB table.</strong></p>
<blockquote><p>-- Create DynamoDB reporting table first<br />
create schema faster;</p>
<p>create table faster."ontime" (<br />
"Year" int,<br />
"Quarter" tinyint ,<br />
"Month" tinyint ,<br />
&#8230;.. Abbreviated for Brevity &#8230;.<br />
"Div5TailNum" varchar(10)<br />
);</p>
<p>-- Get access to the MySQL table OnTime in the OTP schema on host localhost<br />
create schema MYSQL_SOURCE;<br />
set schema 'MYSQL_SOURCE';</p>
<p>CREATE SERVER MYSQL_REMOTE_SOURCE FOREIGN DATA WRAPPER<br />
sys_jdbc OPTIONS (<br />
driver_class 'com.mysql.jdbc.Driver',<br />
url 'jdbc:mysql://localhost/otp?useCursorFetch=true',<br />
user_name 'root',<br />
password 'easy',<br />
fetch_size '1000',<br />
table_types 'TABLE',<br />
schema_name 'otp');</p>
<p>import foreign schema OTP from server MYSQL_REMOTE_SOURCE into MYSQL_SOURCE;</p>
<p>-- Load DynamoDB table from MySQL database directly<br />
insert into FASTER."ontime" select * from MYSQL_SOURCE."ontime";</p>
<p>Notice that last statement.  You don&#8217;t have to export to intermediate files, or use an ETL tool (not that that&#8217;s bad, I&#8217;m a big fan of ETL tools!).  You can use good old fashioned SQL to get data from a remote database into DynamoDB.</p></blockquote>
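<p>A quick sanity check after the load is to compare row counts on both sides.  This is just a sketch using the schema and table names defined above; if the load completed and nothing changed in MySQL in the meantime, the two counts should match.</p>
<blockquote><p>-- Rows visible in MySQL through the foreign schema<br />
select count(*) from MYSQL_SOURCE."ontime";<br />
-- Rows now stored locally in DynamoDB<br />
select count(*) from FASTER."ontime";</p></blockquote>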
<p><strong>Step 2: Change the Pentaho Report to use the new connection.</strong></p>
<blockquote><p>We open up our report and change our connection from MySQL</p>
<p><a href="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2009/11/200911022234.jpg" onclick="window.open('http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2009/11/200911022234.jpg','popup','width=403,height=446,scrollbars=no,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=yes,left=0,top=0');return false"><img src="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2009/11/200911022234-tm.jpg" height="100" width="90" border="1" hspace="4" vspace="4" alt="200911022234" /></a><br />
to DynamoDB</p>
<p><a href="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2009/11/200911022236.jpg" onclick="window.open('http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2009/11/200911022236.jpg','popup','width=589,height=454,scrollbars=no,resizable=yes,toolbar=no,directories=no,location=no,menubar=no,status=yes,left=0,top=0');return false"><img src="http://www.nicholasgoodman.com/bt/blog/wp-content/uploads/2009/11/200911022236-tm.jpg" height="100" width="129" border="1" hspace="4" vspace="4" alt="200911022236" /></a><br />
NOTE: Until we finish our QA&#8217;ed builds we&#8217;re using the LucidDB driver instead of DynamoDB, but they are one and the same.</p></blockquote>
<p>We make some minor adjustments to the SQL (quoting some tables/etc) and rerun our query and voila, <strong>our report runs in 10s, down from 40s: 4x faster.</strong><br />
<br />
How about storage?  <strong>Our storage report shows that DynamoDB is using only 0.3GB to store the same 7 million records, compared to 1.8GB in MySQL: 1/6 of the storage.</strong></p>
<p>Not a bad investment of a few minutes of time, I&#8217;d say.  DynamoDB (<a href="http://www.luciddb.org">LucidDB</a>) takes just a few minutes to install, and because of its focus on BI you should find things like retrieving data from remote data sources easy and effective.  Let&#8217;s be truthful here as well; once you make a report 4x faster and cut its storage to 1/6, your boss will be calling <strong><em>you</em></strong> a dynamo.</p>
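<p>For reference, the &#8220;minor adjustments&#8221; to the report SQL are mostly a matter of quoting.  The table and columns were created with quoted, mixed-case names, so the query has to quote them the same way.  Here is a rough sketch of what that might look like; it assumes the abbreviated column list includes "Carrier" and "DepDelay", and it is a reconstruction rather than the exact query, so treat the scripts linked below as the authoritative version.</p>
<blockquote><p>-- Sketch only: the report query with identifiers quoted for DynamoDB<br />
SELECT t."Carrier" AS "CARRIER",<br />
c AS "C", c2 AS "C2", c*1000/c2 AS "C3" FROM<br />
(SELECT "Carrier", count("Carrier") AS c FROM FASTER."ontime"<br />
WHERE "DepDelay" &gt; 10 GROUP BY "Carrier") t JOIN<br />
(SELECT "Carrier", count("Carrier") AS c2 FROM FASTER."ontime"<br />
GROUP BY "Carrier") t2 ON (t."Carrier" = t2."Carrier") ORDER BY "C3" DESC;</p></blockquote>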
<p>Notes:  Full set of scripts posted here: <a href="/entry_images/mysql_relief.zip">mysql_relief.zip</a>.  Original queries and dataset from Vadim at <a href="http://www.mysqlperformanceblog.com">MySQLPerformanceBlog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicholasgoodman.com/bt/blog/2009/11/02/instant-relief-from-slow-mysql-reporting-queries-using-dynamodb/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
