This looks *extremely* cool. This is basically incremental view maintenance in d...

jonmoore · on Sept 29, 2024

The VLDB paper mentioned is https://www.vldb.org/pvldb/vol16/p1601-budiu.pdf.

Abstract:

"Incremental view maintenance has been for a long time a central problem in database theory. Many solutions have been proposed for restricted classes of database languages, such as the relational algebra, or Datalog. These techniques do not naturally generalize to richer languages. In this paper we give a general solution to this problem in 3 steps: (1) we describe a simple but expressive language called DBSP for describing computations over data streams; (2) we give a general algorithm for solving the incremental view maintenance problem for arbitrary DBSP programs, and (3) we show how to model many rich database query languages (including the full relational queries, grouping and aggregation, monotonic and non-monotonic recursion, and streaming aggregation) using DBSP. As a consequence, we obtain efficient incremental view maintenance techniques for all these rich languages."

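For readers who want a feel for the idea in the abstract, here is a minimal sketch of one piece of it, not the paper's implementation. It assumes the usual Z-set representation (a map from row to integer weight, with negative weights for deletions) and shows that a filter can be maintained by applying it to each change alone. The helper names `zadd` and `zfilter` are made up for this illustration.

```python
from collections import Counter

def zadd(a, b):
    """Add two Z-sets (row -> integer weight), dropping rows whose weight becomes zero."""
    out = Counter(a)
    for row, w in b.items():
        out[row] += w
    return {row: w for row, w in out.items() if w != 0}

def zfilter(pred, z):
    """Keep only the rows satisfying pred; weights are left unchanged."""
    return {row: w for row, w in z.items() if pred(row)}

def is_even(x):
    return x % 2 == 0

table = {}  # accumulated input
view = {}   # incrementally maintained "SELECT x FROM table WHERE x is even"

# Each change is itself a Z-set: +1 inserts a row, -1 deletes it.
changes = [
    {1: +1, 2: +1, 4: +1},
    {2: -1, 6: +1},
    {1: -1, 4: -1, 8: +1},
]

for delta in changes:
    table = zadd(table, delta)
    # Filter is linear, so the change to the view is the filter of the change.
    view = zadd(view, zfilter(is_even, delta))
    # The incrementally maintained view matches a full recomputation.
    assert view == zfilter(is_even, table)
```

Operators such as joins and aggregates need more than this, so treat it only as an illustration of the simplest case.
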
lsuresh · on Sept 29, 2024

Thanks for the kind words! (Feldera's CEO here)

- We evaluate top-k queries incrementally and the nesting shouldn't be a problem for the engine (or it'd be a bug). If you have an example of a query, we can try it out at our end.

- Yes. It is internally consistent. We've verified with the experiment here: https://www.scattered-thoughts.net/writing/internal-consiste....

Our guarantee is that we always produce the same answer as if you'd run the queries in a batch system. All views update together. You can see the computation model here: https://www.feldera.com/blog/synchronous-streaming/
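
To make the "same answer as a batch system" property concrete, here is a rough sketch of how one might check it for a toy view. It is plain Python and says nothing about Feldera's internals; the view (a count per key), the `batch_counts` helper, and the update list are all assumptions made up for the example.

```python
from collections import Counter

def batch_counts(rows):
    """Batch reference: recompute the count per key from scratch."""
    return dict(Counter(key for key, _ in rows))

rows = []  # current table contents
view = {}  # incrementally maintained count per key

# Hypothetical stream of changes: (operation, (key, value)).
updates = [
    ("insert", ("a", 1)),
    ("insert", ("b", 2)),
    ("insert", ("a", 3)),
    ("delete", ("a", 1)),
    ("delete", ("b", 2)),
]

for op, row in updates:
    key = row[0]
    if op == "insert":
        rows.append(row)
        view[key] = view.get(key, 0) + 1
    else:
        rows.remove(row)
        view[key] -= 1
        if view[key] == 0:
            del view[key]  # a key with no rows disappears from the view
    # After every change, the incremental view must equal the batch answer.
    assert view == batch_counts(rows)
```

The assertion inside the loop is the whole point: the incremental result is compared with a full recomputation after each change, not just at the end.
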

And thanks for the catch about the broken paper link. This is the published version: https://www.vldb.org/pvldb/vol16/p1601-budiu.pdf

cube2222 · on Sept 29, 2024

Thanks for the response and clarifications!

I think this scenario would illustrate it.

Make a table with one column, x, and insert into it rows with values 1-5, and then 8-20.

Then query it using more or less `SELECT x FROM (SELECT x FROM xs ORDER BY x LIMIT 15) LIMIT 10`, and then insert 6 into the table. The output should be 1-6, 8-11. Of course, this only works as a test as long as the limits aren't merged together during optimisation; merging them would make the test case moot.
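
For reference, the expected batch answer can be sketched with Python's built-in `sqlite3` module. This only shows what a batch engine returns before and after the insert; it is not an incremental engine. One assumption on my part: I added an outer `ORDER BY x` so the result order is well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xs (x INTEGER NOT NULL PRIMARY KEY)")

# Rows 1-5 and 8-20, leaving a gap at 6 and 7.
values = list(range(1, 6)) + list(range(8, 21))
conn.executemany("INSERT INTO xs VALUES (?)", [(v,) for v in values])

# Nested limits; the outer ORDER BY is added here for a deterministic order.
QUERY = """
SELECT x FROM (SELECT x FROM xs ORDER BY x LIMIT 15)
ORDER BY x LIMIT 10
"""

def run():
    """Re-run the query from scratch and return the x values as a list."""
    return [row[0] for row in conn.execute(QUERY)]

before = run()
conn.execute("INSERT INTO xs VALUES (6)")
after = run()

print(before)  # [1, 2, 3, 4, 5, 8, 9, 10, 11, 12]
print(after)   # [1, 2, 3, 4, 5, 6, 8, 9, 10, 11]
```

An incremental engine should end up with the second list after seeing only the single inserted row.
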

Good luck with your product!

lsuresh · on Sept 29, 2024

Thanks! Looks like that works.

Here is the query I set up on try.feldera.com.

  CREATE TABLE foo (x INTEGER NOT NULL PRIMARY KEY) WITH ('materialized' = 'true') ;

  CREATE MATERIALIZED VIEW bar AS SELECT x FROM (SELECT x FROM foo ORDER BY x LIMIT 15) LIMIT 10;

I then used our CLI tool fda to insert some rows and inspect the states after starting the pipeline: https://docs.feldera.com/reference/cli

  try.feldera.com/foo> select * from foo;

  +----+
  | x  |
  +----+
  | 1  |
  | 2  |
  | 3  |
  | 4  |
  | 5  |
  | 8  |
  | 9  |
  | 10 |
  | 11 |
  | 12 |
  | 13 |
  | 14 |
  | 15 |
  | 16 |
  | 17 |
  | 18 |
  | 19 |
  | 20 |
  +----+

  try.feldera.com/foo> insert into foo values (6);

  +-------+
  | count |
  +-------+
  | 1     |
  +-------+

  try.feldera.com/foo> select * from bar;

  +----+
  | x  |
  +----+
  | 1  |
  | 2  |
  | 3  |
  | 4  |
  | 5  |
  | 6  |
  | 8  |
  | 9  |
  | 10 |
  | 11 |
  +----+

cube2222 · on Sept 29, 2024

Awesome, thanks for double-checking!

tveita · on Sept 29, 2024

I think Rama [1] (by Nathan Marz, the creator of Apache Storm) is interesting as a "NoSQL" solution for a similar problem space, as I understand it. It would be impressive if this can support similar scale using only SQL.

[1] https://redplanetlabs.com/

emmanueloga_ · on Sept 29, 2024

Also RisingWave.

---

https://risingwave.com/