Oracle Readers: Think MERGE Statement.
It’s common to need to generate surrogate keys as part of a Data Warehouse / Business Intelligence solution. Protecting your users from souce system changes, or using time variant data are just a couple of the reasons why one needs to generate a surrogate key when loading data into a DW.
Here is a simple example. I’m loading Product data from an XML file into a MySQL table. The transformation in Kettle looks like this:

The output coming out of the XML file has been flattened using XQuery so that it looks like a flat table struture. Here’s a screen capture of a nifty feature of Kettle where you can preview the output of a step:

Ok, this is our source data. The ID you’re seeing is the OLTP key. We’ll refer to it as the Natural Key. This is the ORDER_ID, or CUSTOMER_ID, or well, you get the idea. This is the key that your source application uses in it’s relational structures.
What we want is to INSERT or UPDATE (aka UPSERT) this record into our DW. We’ll INSERT if we don’t already have this NATURAL key in the DW and generate a surrogate primary key. We’ll UPDATE if we already have this NATURAL key in the warehouse.
Our table looks like:

The DDL to create the TABLE is important:
CREATE TABLE INT_PRODUCT (
PRODUCT_ID INTEGER NOT NULL AUTO_INCREMENT,
PRODUCT_NAT_ID INTEGER NOT NULL,
TITLE TINYTEXT,
CONSTRAINT INT_PRODUCT_PK PRIMARY KEY (PRODUCT_ID),
UNIQUE INT_PRODUCT_NK (PRODUCT_NAT_ID)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Notice that our PRIMARY KEY is the Surrogate we’ve set up to be an AUTO_INCREMENTING column. This means that MySQL will generate a surrogate when we INSERT a record. All we have to do is get Kettle to match our unique contrainst (INT_PRODUCT_NK) and either INSERT/UPDATE.
I’ve used the INSERT / UPDATE operator in Kettle. The configuration looks like this:

What we’re doing is configuring Kettle to do a lookup on the table to see if the natural key already exists. If it does, we’ll perform DML (insert or update) on the fields in the lower window.
After running the transformation my INT_PRODUCT table looks like this.

Now, I realized I didn’t choose very illustrative natural keys (all are 1,2,3) but the PRODUCT_ID comes from MySQL as a surrogate.
Subsequent executions of this Kettle transformation will INSERT records as they arrive or UPDATE. MySQL and Kettle in combination are making quick work out of generating these surrogates!
I invite Kettle experts to chime in on any optimizations/quirks.
Nicholas,
thanks for this tantalizing insight into Kettle. On the basis of this article I invested some time in installing the Pentaho stack and trying to learn the various parts of the solution. I would welcome a complete worked example of building something like the foodmart DW, a la the MSAS 2000 tutorial shipped with the Microsoft product. I offered this as a suggestion on the Pentaho forum for Kettle and was treated to an amazingly sarcastic response. For those of us familiar with other BI platforms, it would help to see a complete example rather than isolated examples of features, or minimal examples such as the Hello ETL sample.
– Peter
Peter,
I’m glad that you are taking the time to use Pentaho. There’s a wealth of functionality in the product and I’m usually amazed at how many features are present.
I totally understand the desire for a real sample solution. ie, here’s your relational database and all the technologies (Chef and Spoon inside of Kettle) to move it from a relational to dimensional, the mondrian/pentaho configuration to go with it, etc.
I’m working on these reference application as we speak… I just read the thread @ pentaho as well… I’m working on these “solutions” matt refers to.
Feel free to use me/forums as a resource if you have specific feature questions. I’ll be sure and post here the samples which won’t be foodmart, but it will be illustrative in the same way. I bet you’ll find it very helpful moving from the other tools you mentioned.
Kind regards!
Hello, can I use PHP with Kettle?
Please mail me hunter_mchl@yahoo.com
That is interesting
This upsert step is very slow and inefficient. It does a query for every incoming row from the source, to find out if it exists on the target. Then based on the result it does an insert or update. This does not scale and it does not compare well with the Oracle merge function,
casino online slot gambling gambling casino online bingo
Hi Nicholas
My situation is like this. (1) whenever a record (already existing in dim table) with new info arrives, I need to insert it and date-invalidate the previous record and generate a new dim_key. How do I do it using Kettle?
When we use the Update/Insert option, it successfully updates, but it gives an error of ‘Duplicate entry 2 for key Primary’.
Please help to remove this error.
Thanks
i got addicted to online gambling in the last few months*’.
online gambing sites are growing because there is a growing number of online gamblers today’-:
I’d should test with you here. Which is not one thing I normally do! I enjoy reading a publish that can make folks think. Additionally, thanks for allowing me to remark!
Howdy very nice web site!! Guy .. Beautiful .. Wonderful .. I will bookmark your blog and take the feeds also…I am glad to find so many helpful info here within the post, we’d like work out more techniques on this regard, thanks for sharing.
Spot on with this write-up, I truly assume this website needs way more consideration. I’ll most likely be once more to learn way more, thanks for that info.
wonderful publish, very informative. I’m wondering why the other specialists of this sector do not realize this. You must proceed your writing. I am confident, you have a great readers’ base already!
It’s onerous to find knowledgeable folks on this topic, but you sound like you understand what you’re speaking about! Thanks
strongzz Hi there, I found your blog via Google while searching for a related topic, your website came up, it looks good. I have bookmarked it in my google bookmarks.
There is visibly a bunch to realize about this. I think you made some nice points in features also.
I simply could not leave your web site prior to suggesting that I extremely loved the standard info a person supply on your visitors? Is going to be back often to check up on new posts.
I must show my appreciation to the writer for bailing me out of this particular crisis. Just after exploring through the the web and meeting proposals which are not powerful, I was thinking my life was well over. Living without the strategies to the issues you’ve fixed by means of the posting is a serious case, as well as those which may have in a negative way damaged my career if I had not discovered your blog. That competence and kindness in maneuvering the whole lot was tremendous. I’m not sure what I would’ve done if I hadn’t come across such a step like this. I’m able to at this point look ahead to my future. Thanks a lot so much for your high quality and sensible help. I won’t hesitate to suggest your web site to any individual who needs to have care about this issue.
You made some first rate factors there. I appeared on the internet for the difficulty and found most people will go along with together with your website.
I am glad for commenting to make you be aware of of the incredible experience my wife’s daughter found viewing yuor web blog. She mastered a good number of details, most notably what it is like to possess a great giving style to have other individuals with no trouble learn specified tricky topics. You really did more than visitors’ expectations. Many thanks for rendering such great, trusted, informative and even cool tips about that topic to Sandra.
Great work! This is the kind of information that are supposed to be shared across the net. Shame on the seek engines for no longer positioning this publish upper! Come on over and consult with my website . Thank you =)
It’s exhausting to search out educated folks on this subject, but you sound like you know what you’re talking about! Thanks
I’ve recently started a blog, the info you offer on this website has helped me tremendously. Thanks for all of your time & work.
strongzz Hi, Neat post. There’s a problem with your web site in internet explorer, would check this… IE still is the market leader and a good portion of people will miss your fantastic writing due to this problem.
Can I simply say what a aid to find someone who truly is aware of what theyre talking about on the internet. You undoubtedly know how to convey a problem to light and make it important. Extra individuals must read this and understand this side of the story. I cant consider youre no more fashionable since you positively have the gift.
Great web site. Plenty of helpful information here. I’m sending it to a few friends ans also sharing in delicious. And obviously, thank you on your effort!
I was so confused about what to buy, but this makes it uendrtsandbale.
i39yzL xxxvteqwoecr
3X8yjm xjhktcmuggjw
The subsequent time I learn a blog, I hope that it doesnt disappoint me as a lot as this one. I mean, I know it was my choice to learn, however I actually thought youd have one thing attention-grabbing to say. All I hear is a bunch of whining about one thing that you would fix should you werent too busy on the lookout for attention.
great submit, very informative. I’m wondering why the other specialists of this sector do not realize this. You must proceed your writing. I’m confident, you have a great readers’ base already!
Sup there administrator, I truly desired to actually leave a short observation to declare that I respected your blog post. Thanks!