{"id":457,"date":"2009-11-03T22:37:54","date_gmt":"2009-11-04T05:37:54","guid":{"rendered":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/2009\/11\/03\/instant-relief-from-mysql-reporting-queries-incremental-updates\/"},"modified":"2009-11-03T22:37:54","modified_gmt":"2009-11-04T05:37:54","slug":"instant-relief-from-mysql-reporting-queries-incremental-updates","status":"publish","type":"post","link":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/2009\/11\/03\/instant-relief-from-mysql-reporting-queries-incremental-updates\/","title":{"rendered":"Instant Relief from MySQL Reporting Queries: Incremental Updates"},"content":{"rendered":"<p>Yesterday, I covered how you can do an <a href=\"http:\/\/www.nicholasgoodman.com\/bt\/blog\/2009\/11\/02\/instant-relief-from-slow-mysql-reporting-queries-using-dynamodb\/\">initial &#8220;replication&#8221; of data from MySQL to DynamoDB<\/a> and how this can improve performance, and save storage space.  The follow on question becomes:<\/p>\n<p><strong>That&#8217;s Great Nick.  But how do I do keep my data up to date?<\/strong><\/p>\n<p>We&#8217;ve got data in our Airline Performance dataset through 31-DEC-2007.  I loaded 1 year, all of 2007, for the previous example.  What happens when the FAA publishes their 2008 January results, and we&#8217;ve loaded the new months worth of data into MySQL?<\/p>\n<p>MySQL:<\/p>\n<blockquote><p>select count(*) from otp.ontime; <strong>8061223<\/strong><br \/>\nselect count(*) from ontime where FlightDate &gt; &#8216;2007-12-31&#8217;; <strong>605765<\/strong><br \/>\nselect count(*) from ontime where FlightDate &lt;= &#8216;2007-12-31&#8217;; <strong>7455458<\/strong><\/p><\/blockquote>\n<p>DynamoDB:<\/p>\n<blockquote><p>select count(*) from FASTER.&#8221;ontime&#8221;; <strong>7455458<\/strong> <\/p><\/blockquote>\n<p>So, we&#8217;ve added approximately 600k new records to our source system that we don&#8217;t have in our reporting system.  How do we incrementally insert these records and get just the 600k new rows into our DynamoDB reporting instance?<\/p>\n<p>Easy Easy Easy.<\/p>\n<p>We&#8217;ve already done all the work, all we have to do is simply get records we haven&#8217;t processed yet!  Should take just a few minutes to get our current table &#8220;up to date&#8221; with the one over in MySQL.<\/p>\n<p>DynamoDB:<\/p>\n<blockquote><p>select max(&#8220;FlightDate&#8221;) from FASTER.&#8221;ontime&#8221;;  <strong>2007-12-31<\/strong><br \/>\ninsert into FASTER.&#8221;ontime&#8221; select * from MYSQL_SOURCE.&#8221;ontime&#8221; where &#8220;FlightDate&#8221; &gt; DATE &#8216;2007-12-31&#8217;; <strong>605765<\/strong><\/p><\/blockquote>\n<p>In other words, let&#8217;s select from MySQL any records whose date is beyond what we have currently (2007-12-31).<\/p>\n<blockquote><p>select count(*) from FASTER.&#8221;ontime&#8221;;  <strong>8061223<br \/>\n<\/strong>select count(*) from FASTER.&#8221;ontime&#8221; where &#8220;FlightDate&#8221; &gt; DATE &#8216;2007-12-31&#8217;;  <strong>605765<\/strong><\/p><\/blockquote>\n<p>MySQL:<br \/>\nWhile the DynamoDB <strong>INSERT<\/strong> statement was running, the following SQL was being run on MySQL.<\/p>\n<blockquote><p>show processlist shows a SQL session with the following SQL:<br \/>\nSELECT * FROM `ontime` WHERE `FlightDate` &gt; DATE &#8216;2007-12-31&#8217;;<\/p><\/blockquote>\n<p>A single SQL statement (<strong>insert into select * from table where date &gt; last time<\/strong>) has you up to date for reporting!  Long term we may look to work with Tungsten to be able to keep our data up to date using replication bin log records but for now, this simple pull based approach.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Yesterday, I covered how you can do an initial &#8220;replication&#8221; of data from MySQL to DynamoDB and how this can improve performance, and save storage space. The follow on question becomes: That&#8217;s Great Nick. But how do I do keep my data up to date? We&#8217;ve got data in our Airline Performance dataset through 31-DEC-2007. [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[5],"tags":[],"_links":{"self":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/posts\/457"}],"collection":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/comments?post=457"}],"version-history":[{"count":0,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/posts\/457\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/media?parent=457"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/categories?post=457"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/tags?post=457"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}