{"id":604,"date":"2011-06-29T23:07:04","date_gmt":"2011-06-30T06:07:04","guid":{"rendered":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/?p=604"},"modified":"2011-06-29T23:07:04","modified_gmt":"2011-06-30T06:07:04","slug":"pdi-loading-into-luciddb","status":"publish","type":"post","link":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/2011\/06\/29\/pdi-loading-into-luciddb\/","title":{"rendered":"PDI Loading into LucidDB"},"content":{"rendered":"<p>By far, the most popular way for PDI users to load data into LucidDB is to use the PDI Streaming Loader. The streaming loader is a native PDI step that:<\/p>\n<ul>\n<li>Enables high performance loading, directly over the network without the need for intermediate IO and shipping of data files.<\/li>\n<li>Lets users choose more interesting (from a DW perspective) loading type into tables. In particular, in addition to simple INSERTs it allows for MERGE (aka UPSERT) and also UPDATE. All done, in the same, bulk loader.<\/li>\n<li>Enables the metadata for the load to be managed, scheduled, and run in PDI.<\/li>\n<\/ul>\n<p>\n<a href=\"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-content\/uploads\/2011\/06\/201106292256.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-content\/uploads\/2011\/06\/201106292256-tm.jpg\" width=\"300\" height=\"217\" alt=\"201106292256.jpg\" \/><\/a><\/p>\n<p>However, we\u2019ve had some known issues. In fact, until PDI 4.2 GA and LucidDB 0.9.4 GA it\u2019s pretty problematic unless you run through the process of patching LucidDB outlined on this page: <a href=\"https:\/\/studio.dynamobi.com\/wiki\/display\/DOC\/Known+Issues\" title=\"Known Issues\">Known Issues<\/a>.<\/p>\n<p>In some ways, we have to admit, that we released this piece of software too soon. Early and often comes with some risk, and many have felt the pain of some of the issues that have been discovered with the streaming loader.<\/p>\n<p>In some ways, we\u2019ve built an unnatural approach to loading for PDI: <b>PDI wants to PUSH data into a database<\/b>. <b>LucidDB wants to PULL data from remote sources<\/b>, with it\u2019s integrated ELT and DML based approach (with connectors to databases, salesforce, etc).&nbsp;&nbsp; Our streaming loader \u201cfakes\u201d a pull data source, and allows PDI to \u201cpush\u201d into it.<\/p>\n<p>There\u2019s mutliple threads involved, when exceptions happen users have received cruddy error messages such as \u201cBroken Pipe\u201d that are unhelpful at best, frustrating at worse. Most all of these contortions will have sorted themselves out and by the time 4.2 GA PDI and 0.9.4 GA of LucidDB are released the streaming loader should be working A-OK. Some users would just assume avoid the patch instructions above and have posed the question: <i>In a general sense, if not the streaming loader how would I load data into LucidDB?<\/i><\/p>\n<p>Again, LucidDB likes to \u201cpull\u201d data from remote sources. One of those is CSV files. Here\u2019s a nice, easy, quick (30k r\/s on my MacBook) method to load a million rows using PDI and LucidDB:<\/p>\n<p>\n<a href=\"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-content\/uploads\/2011\/06\/201106292249.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-content\/uploads\/2011\/06\/201106292249-tm.jpg\" width=\"300\" height=\"121\" alt=\"201106292249.jpg\" \/><\/a><\/p>\n<p>This transformation outputs to a Text File 1 million rows, waits for that to complete then proceeds to the load that data into a new table in LucidDB. Step by Step the LucidDB statements<\/p>\n<blockquote><p>\n  \u2014 Points LucidDB to the directory with the just generated flat file<br \/>\n  \u2014 LucidDB has some defaults, and we can \u201cguess\u201d the datatypes by scanning the file<br \/>\n  CREATE or replace SERVER csv_file_server FOREIGN DATA WRAPPER SYS_FILE_WRAPPER OPTIONS ( DIRECTORY \u2018?\u2019 );<br \/>\n  \u2014 Let\u2019s create a foreign table for the data file (\u201cDATA.txt\u201d) that was output by PDI<br \/>\n  &gt;create foreign table applib.data server csv_file_server;<br \/>\n  \u2014 Create a staging, and load the data from the flat file (select * from applib.data)<br \/>\n  CALL APPLIB.CREATE_TABLE_AS (\u2018APPLIB\u2019, \u2018STAGING_TABLE\u2019, \u2018select * from applib.data\u2019, true);\n<\/p><\/blockquote>\n<p>We hope to have the streaming loader ready to go in 0.9.4 (LucidDB) and 4.2 (PDI). Until then, consider this easy, straight forward method of loading data that\u2019s high performance, proven, and stable for loading data from PDI into LucidDB.<\/p>\n<p>Example file: <a href=\"\/entry_images\/csv_luciddb_load.ktr\">csv_luciddb_load.ktr<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>By far, the most popular way for PDI users to load data into LucidDB is to use the PDI Streaming Loader. The streaming loader is a native PDI step that: Enables high performance loading, directly over the network without the need for intermediate IO and shipping of data files. Lets users choose more interesting (from [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[15,9,11],"tags":[],"_links":{"self":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/posts\/604"}],"collection":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/comments?post=604"}],"version-history":[{"count":0,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/posts\/604\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/media?parent=604"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/categories?post=604"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/tags?post=604"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}