Liars, Money and Benchmarking
Updated: Dec 8, 2021
There is no shortage of money flowing into corporate data platform technologies. Look at the financial evidence:
The biggest IPO of 2020 was Snowflake, with a market cap that passed USD$100 billion
Hortonworks and Cloudera merged in 2018
Databricks, with its focus on open-source Apache Spark, was valued at USD$38 billion in August 2021
Couchbase raised over USD$250 million
PostgreSQL is ubiquitous as the go-to relational database for transactional workloads
But with money and valuations comes the need to get aggressive in landing customers. One way to bring a quantitative measure to technology is to benchmark competing products against each other. For databases, one such benchmarking organization is the Transaction Processing Performance Council (TPC). In the 1990s, there was no shortage of highway billboards and full-page ads in Computerworld magazine from the likes of Oracle, Informix, Sybase and Microsoft SQL Server comparing their transactional and analytic prowess. All came with the footnote "Your mileage may vary".
Fast forward to 2021, and the 'gaming' of benchmarks is back in vogue. I read this blog post from Databricks, which did not just report its TPC-DS results but immediately went on to badmouth Snowflake with language like "recently ran a different benchmark comparing Databricks SQL and Snowflake".
This author was personally put off by the tone of the Databricks post, which read as a weak attempt to build market credibility at the expense of Snowflake. The founders of Snowflake wrote a rebuttal blog post that clearly laid out the gaps and showed how anyone could run the test themselves. I did try it, and it worked as advertised. What was more impressive was how easy the whole process was with Snowflake, from end to end.
Legacy database vendors like Oracle, Teradata and IBM allowed their products to languish during the move to the cloud. That opened the door for open-source and 'born cloud first' products to take over. The best part of the cloud is how easily corporate IT departments can 'fail quickly or succeed quickly', unlike the days of big enterprise license agreements (ELAs) for licenses that sat on the shelf and tools that were never used.
What you should do:
Do not believe a vendor benchmark unless the vendor gives you full documentation.
If you are still running databases like IBM DB2, IBM Netezza, Teradata or Oracle, make the move to where real innovation is happening today.
Open source is perceived as 'free', but even the price line in the Databricks TPC-DS benchmark shows a large subscription cost.
Do the cloud math: everything costs money, so move only the data that makes sense. In short: (1) move what you should, (2) archive what you can and (3) delete what you shall.