An arms race my customers don't care about

Perfect is the enemy of good enough. That is fertile soil for open source: it is why people choose the simpler, functional, cheaper open source cousins of proprietary feature-and-function behemoths. Don’t get me wrong – with too few features or crappy performance you lose customers, because you’re not helping people solve their problems.

Recently, I observed a thread at the blog of Goban Saor entitled “Open Source Metrics.”

It has basically turned into a discussion that keeps cropping up about which tool is faster: Talend or Kettle. Which leads me to ask the question: Who Friggin’ Cares?

I’m a Kettle Expert so I think Kettle is Wicked Fast.
If I were a Talend Expert I’d think Talend is Wicked Fast.

Performance, for customers who are focused on results and aren’t technophiles, boils down to these two requirements:

  1. It has to meet my performance requirements for my project. If I have to load 1 million records per day and I have 10 minutes to do that then the tool either does or does not meet that performance requirement.
  2. It has to allow me to grow beyond my current performance requirements. I am loading 1 million records now, but in 3 years I may be loading 100 million records. Given the right investment in tuning and scaling, I don’t want to have to change to a different tool when I go much bigger.
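To make the two requirements concrete, here is a rough sketch of the arithmetic in Python. The function names are made up for illustration, and the 100 million record case assumes the same 10 minute window, which is my assumption for the example rather than a stated requirement.

```python
def required_throughput(records: int, window_seconds: float) -> float:
    """Records per second needed to finish `records` within the batch window."""
    return records / window_seconds


def meets_requirement(measured_rps: float, records: int, window_seconds: float) -> bool:
    """True if a measured throughput is enough to fit the batch window."""
    return measured_rps >= required_throughput(records, window_seconds)


WINDOW_SECONDS = 10 * 60  # the 10 minute window from requirement 1

# Requirement 1: 1 million records in 10 minutes.
today_rps = required_throughput(1_000_000, WINDOW_SECONDS)

# Requirement 2: 100 million records, ASSUMING the window stays at 10 minutes.
future_rps = required_throughput(100_000_000, WINDOW_SECONDS)

print(f"needed today:      {today_rps:,.0f} records/second")
print(f"needed in 3 years: {future_rps:,.0f} records/second")
```

The point is that the answer is a yes or a no against a number the customer already has, not a ranking against another tool.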

For Kettle the answer is pretty simple:

  1. I do a few simple mappings, hit run, do very little tuning/database optimization. Wham-o. 20k records / second throughput. Look and notice Kettle is simply sitting idle waiting for a database lookup. Add an index. Wham-o 35k records / second throughput. Have extra CPUs, fire up a few extra threads of a calculation step. Wham-o 40k / second. Surpasses customer batch window needs sufficiently; enough said. Requirement met – whether 35k records per second is slower or faster than someone else is irrelevant. Requirement met.
  2. This usually involves outside validation. What are other people doing, and what are the proof points about performance? I personally have worked on a Kettle scale-out cluster with 5 nodes that reads, sorts, aggregates, and summarizes a billion FAT (wide character) records in an HOUR and scales almost perfectly linearly (no tool scales perfectly linearly). That lets you tell a customer that, using the exact same binary they have there, they can scale out and process hundreds of millions into billions of records per hour. Requirement met – you can grow with your tool.
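For a sense of scale, the cluster figure above works out as follows. This is a back-of-the-envelope sketch, not a benchmark: it assumes the work splits evenly across the 5 nodes, and the `efficiency` factor is a made-up knob standing in for "almost, but not perfectly, linear".

```python
import math


def aggregate_throughput(records: int, seconds: float) -> float:
    """Cluster-wide records per second."""
    return records / seconds


def nodes_needed(target_rps: float, per_node_rps: float, efficiency: float = 1.0) -> int:
    """Nodes required for a target throughput.

    `efficiency` is a hypothetical scaling factor: 1.0 would be perfectly
    linear scaling, 0.9 means each node delivers 90% of its rated throughput.
    """
    return math.ceil(target_rps / (per_node_rps * efficiency))


# A billion records in an hour on 5 nodes (figures from the post).
cluster_rps = aggregate_throughput(1_000_000_000, 60 * 60)
per_node_rps = cluster_rps / 5  # assumes an even split across nodes

print(f"cluster:  {cluster_rps:,.0f} records/second")
print(f"per node: {per_node_rps:,.0f} records/second")

# Hypothetical growth: double the throughput at an assumed 90% efficiency.
print("nodes for 2x:", nodes_needed(2 * cluster_rps, per_node_rps, efficiency=0.9))
```

Under those assumptions, doubling the load takes a couple more nodes than the naive 10, which is the kind of answer a customer can plan around.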

I think Kettle performance is superb. I’d welcome Talend folks to comment here and blog about their proof points for how Talend performance is superb. I believe that it is. Let’s just all consider the most important thing: open source ETL is about solving the ETL need well, not necessarily incremental performance differences.

It’s a debate with no winner. I don’t care if your tool is 2.5% faster at reading character text files than mine. I do care if it can scale out (requirement 2) and solve customer problems (requirement 1).

One thought on “An arms race my customers don't care about”

  1. Patrick

    A customer who decides to utilize benchmarks to assist their decision making on which product to use is bound to fail. This is a legacy of the proprietary vendor sales strategy, where the customer has no option other than to rely on marketing material, benchmarks and feature checklists, and maybe word-of-mouth, to determine whether or not to use the software for their company. The reason is simply that you wouldn’t get to even try or install the software until you purchased it.

    This is where open source is a key differentiator as a business model, and as a solution model. You don’t need a team of 3-5 polished sales people to visit your office, regurgitate the product’s fact sheet and tell you all the great things it can do: 100% faster than their competitors, reduce TCO by 100%! Our experience with Pentaho is that it simply shows you that it can be done. We don’t need to boast about the capabilities of the system as long as it can meet the requirements. Customers want a solution, and if a screwdriver is what they want, give them a screwdriver. Too often proprietary vendors will sell customers the whole toolbox cabinet, and the customers end up with tools that simply aren’t needed.

