> The legacy players like Teradata and Exadata (from Oracle) really don't scale.
I get why Teradata gets labelled "legacy", but one of Teradata's main differentiators is scale. Teradata engineers have been tackling incredibly interesting scale problems (on many dimensions of "scale") for 40 years. Teradata has many customers who routinely manage and perform analytics on many petabytes of data.
> Historically, only transactional data was dumped into the warehouse.
That was once true, because initially that was all the data that companies had. However, companies have long since used data warehouses for all kinds of data — sensor data, text, behavioral data, product info/BOMs, vendor info, contract info, etc. — whatever's necessary to run the business.
> Snowflake is selling storage at S3 price…
This is important, but not unique. For example, Teradata's current product has native support for S3 and S3-compatible object stores, and you can query them just like any other database table, join that data with data in high-performance native storage, etc.
Sorry, I didn't clarify well. I am sure it scales technically well but not on cost.
My experience with TD is from more than 10 years ago, and back then the multi-node version was substantially more expensive than the single-node version. Also, storage and compute were coupled, which meant I had to pay for nodes even if 99% of my data was cold. That's a problem with Redshift too, but not with Snowflake.
De-coupling storage and compute was a brilliant move by Snowflake. BigQuery completely abstracts compute: you don't provision compute and only pay for data scanned. However, it gives you a sense of insecurity around cost - A single bad cron job running a query every sec can blow up your cost (real-life experience). Snowflake provides the best cost/performance tradeoff I have seen.
> I am sure it scales technically well but not on cost.
The honest answer is "it depends". Because Teradata is a different beast, per-query pricing can be significantly cheaper than Snowflake with high-volume workloads. It's worth trying both to evaluate cost and performance.
> Also, storage and compute were coupled, which meant I had to pay for nodes even if 99% of my data was cold.
Yes, it used to be that everything had to go into Teradata's high-performance filesystem. These days, Vantage's native object storage support means that you can keep that cold data in S3.
> However, it gives you a sense of insecurity around cost - A single bad cron job running a query every sec can blow up your cost (real-life experience).
Only if you're using on-demand pricing. Instead, reserve some slots and pay a flat rate. The minimum quantity is very low, and the minimum time is 1 minute.
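To put rough numbers on the runaway cron job scenario, here is a back-of-the-envelope sketch. Every figure in it (price per TiB scanned, bytes scanned per query, slot count, price per slot-hour) is a made-up placeholder for illustration, not any vendor's actual rate, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TIB = 2 ** 40
GIB = 2 ** 30


def on_demand_cost(bytes_per_query, queries, price_per_tib):
    """Cost when you pay per byte scanned: grows with every query run."""
    tib_scanned = bytes_per_query * queries / TIB
    return tib_scanned * price_per_tib


def flat_rate_cost(slots, hours, price_per_slot_hour):
    """Cost when you reserve capacity: fixed, regardless of query count."""
    return slots * hours * price_per_slot_hour


# Assumed scenario: a cron job fires once per second for a full day,
# and each run scans 10 GiB. All prices below are placeholders.
runaway = on_demand_cost(
    bytes_per_query=10 * GIB,
    queries=SECONDS_PER_DAY,
    price_per_tib=5.0,
)
reserved = flat_rate_cost(slots=100, hours=24, price_per_slot_hour=0.04)

print(f"on-demand, runaway job: {runaway:,.2f} per day")
print(f"flat rate, same day:    {reserved:,.2f} per day")
```

With reserved capacity the bad job still hogs the slots and slows everything else down, but the bill stays capped at whatever was reserved.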
> Teradata's current product has native support for S3 and S3-compatible object stores too, and you can query them just like any other database table, join that data with data in high-performance native storage, etc.
Storage costs for S3 (or any cloud-provider object storage) are only one dimension of the price. The other is interaction costs, which can get prohibitively expensive, for example if you forget to provide a partition key in your query predicate. Snowflake absorbs this cost if you use internal storage (or just copy into tables).
Snowflake doesn't absorb the cost because there is no cost.
The benefit of native tables in all columnar databases is that they provide an optimized format with metadata for each column, which is then used to eliminate most of the data retrieval at query time. The more selective your query, the faster the results.
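As a toy illustration of that per-column metadata, here is a minimal sketch of min/max pruning. It assumes each storage block records the smallest and largest value it holds for a column. The `Block` class and both functions are hypothetical names for this example, not any database's real internals.

```python
from dataclasses import dataclass


@dataclass
class Block:
    min_value: int  # smallest value stored in this block
    max_value: int  # largest value stored in this block
    rows: list      # the column values themselves


def build_blocks(values, block_size):
    """Split a column into fixed-size blocks and record min/max for each."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(min(chunk), max(chunk), chunk))
    return blocks


def scan_equal(blocks, target):
    """Return matching values and how many blocks actually had to be read."""
    matches = []
    blocks_read = 0
    for block in blocks:
        # The metadata alone proves this block cannot contain the target.
        if target < block.min_value or target > block.max_value:
            continue
        blocks_read += 1
        matches.extend(v for v in block.rows if v == target)
    return matches, blocks_read


blocks = build_blocks(list(range(1000)), block_size=100)
matches, blocks_read = scan_equal(blocks, 742)
print(f"read {blocks_read} of {len(blocks)} blocks, found {matches}")
```

Pruning like this only pays off when the data is reasonably well clustered on the filtered column. If every block spans the full value range, nothing gets skipped.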