Yesterday, I covered how you can do an initial “replication” of data from MySQL to DynamoDB and how this can improve performance, and save storage space. The follow on question becomes:
That’s Great Nick. But how do I do keep my data up to date?
We’ve got data in our Airline Performance dataset through 31-DEC-2007. I loaded 1 year, all of 2007, for the previous example. What happens when the FAA publishes their 2008 January results, and we’ve loaded the new months worth of data into MySQL?
select count(*) from otp.ontime; 8061223
select count(*) from ontime where FlightDate > ‘2007-12-31’; 605765
select count(*) from ontime where FlightDate <= ‘2007-12-31’; 7455458
select count(*) from FASTER.”ontime”; 7455458
So, we’ve added approximately 600k new records to our source system that we don’t have in our reporting system. How do we incrementally insert these records and get just the 600k new rows into our DynamoDB reporting instance?
Easy Easy Easy.
We’ve already done all the work, all we have to do is simply get records we haven’t processed yet! Should take just a few minutes to get our current table “up to date” with the one over in MySQL.
select max(“FlightDate”) from FASTER.”ontime”; 2007-12-31
insert into FASTER.”ontime” select * from MYSQL_SOURCE.”ontime” where “FlightDate” > DATE ‘2007-12-31’; 605765
In other words, let’s select from MySQL any records whose date is beyond what we have currently (2007-12-31).
select count(*) from FASTER.”ontime”; 8061223
select count(*) from FASTER.”ontime” where “FlightDate” > DATE ‘2007-12-31’; 605765
While the DynamoDB INSERT statement was running, the following SQL was being run on MySQL.
show processlist shows a SQL session with the following SQL:
SELECT * FROM `ontime` WHERE `FlightDate` > DATE ‘2007-12-31’;
A single SQL statement (insert into select * from table where date > last time) has you up to date for reporting! Long term we may look to work with Tungsten to be able to keep our data up to date using replication bin log records but for now, this simple pull based approach.