{"id":16,"date":"2004-07-09T06:53:40","date_gmt":"2004-07-09T13:53:40","guid":{"rendered":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/?p=16"},"modified":"2004-07-09T06:53:40","modified_gmt":"2004-07-09T13:53:40","slug":"2gig-2gig-50gig","status":"publish","type":"post","link":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/2004\/07\/09\/2gig-2gig-50gig\/","title":{"rendered":"2gig + 2gig = 50gig"},"content":{"rendered":"<p>I was recently writing up a volume and performance specification for a customer project when a discussion arose with the current DBA staff about volume projections.  The intuitive thinking was that the volume requirements for the BI\/Data Warehouse would be the sum of the volumes of the systems from which it sourced data.  The group thought this would be a good way to approximate the required space for the system.  I asserted that this method is flawed, and had to explain why BI volume is related to source volume but not necessarily directly proportional to it.<br \/>\nBI systems require much greater storage than the sum of their sources because:<\/p>\n<ul>\n<li><strong>Data are denormalized<\/strong>.  Denormalizing data to increase query performance increases storage by anywhere from 1 to 10,000 times (it depends on the data; perhaps more).\n<li><strong>New Data are created<\/strong>.  There are many analytically significant items that never even show up in source systems.  For instance, a &#8220;Sales Fact&#8221; will be closely related in volume to &#8220;Order Line Items&#8221; in a source system.  However, many BI solutions have business events like &#8220;Customer Acquired&#8221; and &#8220;Customer Lost&#8221;.  These new business events are derived from the source system data but did not previously exist anywhere (they are newly created).\n<li><strong>Summaries\/Aggregates for query performance<\/strong>.  
Much of the data storage requirement is a function of how much performance is required for common data access patterns (i.e., user reports and ad-hoc analysis).  If the data sets are small and end user performance requirements are minimal, then summaries won&#8217;t require a great deal of space.  <\/ul>\n<p><strong>So, make a point!<\/strong>  A good way to estimate the actual storage requirements is to run sample datasets.  Load a month or two of data using the summary\/aggregate parameters you think will be required in your deployment.  Examine the database for its storage utilization and take measurements.  Measure it after one day, one week, one month, and one year (if possible).  Graph it.  See what the data looks like.  You could even build a function to project future volume based on the growth curve.<br \/>\n<strong><em>Most importantly, be ready for it to be different from what you expect!<\/em><\/strong>  A few reports will require a summary that will make your predictions seem way off base.  It&#8217;s to be expected; BI is a <em>system<\/em>, not an application!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was recently writing up a volume and performance specification for a customer project when a discussion arose with the current DBA staff about volume projections. The intuitive thinking was the volume requirements for the BI\/Data Warehouse would be the sum of the systems from which it sourced data. 
The group was thinking this would [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[6,10],"tags":[],"_links":{"self":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/posts\/16"}],"collection":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/comments?post=16"}],"version-history":[{"count":0,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/posts\/16\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/media?parent=16"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/categories?post=16"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/tags?post=16"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}