{"id":281,"date":"2008-11-26T17:01:45","date_gmt":"2008-11-27T00:01:45","guid":{"rendered":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/2008\/11\/26\/an-arms-race-my-customers-dont-care-about\/"},"modified":"2008-11-26T17:01:45","modified_gmt":"2008-11-27T00:01:45","slug":"an-arms-race-my-customers-dont-care-about","status":"publish","type":"post","link":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/2008\/11\/26\/an-arms-race-my-customers-dont-care-about\/","title":{"rendered":"An arms race my customers don&#039;t care about"},"content":{"rendered":"<p>Perfect is the enemy of good enough.  This is why people choose the simpler, functional, cheaper open source cousins of proprietary feature-function behemoths.  Don&#8217;t get me wrong &#8211; with too few features or crappy performance you lose customers, because you&#8217;re not helping people solve their problems.<\/p>\n<p>Recently, I observed a thread at the blog of Goban Saor entitled &#8220;<a href=\"http:\/\/blog.gobansaor.com\/2008\/10\/30\/open-source-metrics\/\">Open Source Metrics<\/a>.&#8221;<\/p>\n<p>It has basically turned into a discussion that keeps cropping up about which tool is faster: <strong>Talend or Kettle<\/strong>.  Which leads me to ask the question: <strong>Who Friggin&#8217; Cares?<\/strong><\/p>\n<blockquote><p>I&#8217;m a Kettle Expert so I think Kettle is Wicked Fast.<br \/>\nIf I were a Talend Expert I&#8217;d think Talend is Wicked Fast.<\/p><\/blockquote>\n<p>Performance, for customers who are focused on results and aren&#8217;t technophiles, boils down to these two requirements:<\/p>\n<ol>\n<li><strong>It has to meet my performance requirements for my project<\/strong>.  If I have to load 1 million records per day and I have 10 minutes to do it, then the tool either does or does not meet that requirement.<\/li>\n<li><strong>It has to allow me to grow beyond my current performance requirements. 
<\/strong> I am loading 1 million records now, but in 3 years I may be loading 100 million records.  Given the right investment in tuning and scaling, I don&#8217;t want to have to change to a different tool when I go much bigger.<\/li>\n<\/ol>\n<p>For Kettle the answer is pretty simple:<\/p>\n<ol>\n<li>I do a few simple mappings, hit run, do very little tuning\/database optimization.  Wham-o.  20k records \/ second throughput.  Look and notice Kettle is simply sitting idle waiting for a database lookup.  Add an index.  Wham-o.  35k records \/ second throughput.  Have extra CPUs?  Fire up a few extra threads of a calculation step.  Wham-o.  40k records \/ second.  That comfortably fits the customer&#8217;s batch window; enough said.  Requirement met &#8211; whether 35k records per second is slower or faster than someone else is irrelevant.  Requirement met.<\/li>\n<li>This usually involves outside validation.  What are other people doing, and what are the proof points for performance?  I personally have worked on a Kettle scale-out cluster with 5 nodes that reads, sorts, aggregates, and summarizes a billion FAT (wide character) records in an HOUR and scales almost perfectly linearly (no tool scales perfectly linearly).  You can tell a customer that, using the exact same binary they already have, they can scale out and process hundreds of millions to billions of records per hour.  Requirement met &#8211; you can grow with your tool.<\/li>\n<\/ol>\n<p>I think Kettle performance is superb.  I&#8217;d welcome Talend folks to comment here and blog about their proof points for how Talend performance is superb. <strong> I believe that it is. <\/strong> Let&#8217;s all just consider the most important thing: open source ETL is about solving the ETL need well, not necessarily about incremental performance differences.<\/p>\n<p>It&#8217;s a debate with no winner.  I don&#8217;t care if your tool is 2.5% faster at reading character text files than mine.  
I do care if it can scale out (requirement 2) and solves customer problems (requirement 1).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Perfect is the enemy of good enough. This is fertile soil for why people choose to use the simpler, functional, cheaper open source cousins of proprietary feature function behemoths. Don&#8217;t get me wrong &#8211; too few features \/ crappy performance you lose customers because you&#8217;re not helping people solve problems if you lack too many [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/posts\/281"}],"collection":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/comments?post=281"}],"version-history":[{"count":0,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/posts\/281\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/media?parent=281"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/categories?post=281"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.nicholasgoodman.com\/bt\/blog\/wp-json\/wp\/v2\/tags?post=281"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}