
In practice, Git

Some CRDT purists would say "it's not perfectly conflict-free, so it's not a CRDT".

Sure[0], but for the rest of us who are pragmatic about best-effort conflict resolution, Git is likely the most successful CRDT application.

[0] https://en.wikipedia.org/wiki/No_true_Scotsman



This comes up every time, but there are three criteria for CRDTs and git fails two of them. Set aside the requirement for automatic conflict resolution (which git can’t meet, since automatic resolution fails as a matter of course) and the requirement that the conflict resolution algorithm be part of the data type (it’s not). Git still fails the requirement that separate copies must eventually converge once they are brought back online. That’s demonstrably not the case for git: people may use different conflict resolution mechanisms, AND the commit graph of a resolved conflict may itself differ, so the process of bringing git copies back online results in different histories.

This is because the commit graph itself isn’t a CRDT. If I commit A and then B but someone’s clone only has B applied, you can’t synchronize even automatically; different git histories don’t resolve in any way automatically at all and once you pick a solution manually your copy will not have the same history as anyone else that tries to redo your operations.

No true Scotsman doesn’t apply here because there is a very precise definition of what makes a CRDT that is a clear delineating line.


> 1. The application can update any replica independently, concurrently and without coordinating with other replicas.

> 2. An algorithm (itself part of the data type) automatically resolves any inconsistencies that might occur.

> 3. Although replicas may have different state at any particular point in time, they are guaranteed to eventually converge.

[0]
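For concreteness, here is a minimal sketch of a type that does meet all three points: a grow-only counter. The class and method names are my own illustration (an assumption, not taken from the quoted article), and it is a toy rather than a production implementation.

```python
from __future__ import annotations


class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # Point 1: a local update touches only this replica's own slot,
        # so it needs no coordination with any other replica.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: GCounter) -> None:
        # Point 2: the merge rule is part of the type and cannot fail.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)

# Point 3: after exchanging state in either order, both replicas agree.
a.merge(b)
b.merge(a)
assert a.counts == b.counts
print(a.value())  # 5
```

Merging again changes nothing, because element-wise max is idempotent, commutative and associative. That is what the convergence guarantee rests on in this sketch.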

Again, in theory it fails 2 and 3. However, in practice 3 is a normal part of working with git in a team, barring a hard fork in git, which is equivalent to a deep copy of a CRDT. Like any deep copy of a data type, a CRDT's deep copy can be used in non-conformant ways (a fork is VCS-specific jargon for a CRDT deep copy, or sometimes a shallow copy).

> If I commit A and then B but someone’s clone only has B applied, you can’t synchronize even automatically; different git histories don’t resolve in any way automatically

Maybe I don't understand your point specifically, but this example seems entirely solved by --rebase. In practice --rebase is typical, and best described as "do your best to automatically resolve histories; I'll handle any of the complex conflicts".

All that said, I already agreed: "academically Git is not a CRDT". However, and I'm happy to disagree with you, in practice Git is the most popular CRDT.

[0] https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...


given how easy it is to run into merge conflicts doing normal things with git, I can't say that I'd agree that in practice git is a CRDT either.


CRDT literally means Conflict-free Replicated Data Type. Expecting CRDTs to be conflict-free isn't purism, it's simple validation. Git is, inarguably, not a CRDT.


There are CRDTs that have "multiple versions", which look an awful lot like conflicts to me, e.g. the Multi-Value Register in this paper:

https://inria.hal.science/inria-00555588/


Multi-value registers are CRDTs for sure. Conflict-free doesn't mean that values can't have concurrent histories (or, as you say, "multiple versions") -- it means that the merge operation always succeeds.
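A rough sketch of what "the merge always succeeds" can look like for a multi-value register, using vector clocks. This is a simplified illustration with names I made up (`MVRegister`, `dominates`), not the exact specification from the linked paper.

```python
from __future__ import annotations


def dominates(x: dict, y: dict) -> bool:
    """True if vector clock x has seen everything y has, plus something more."""
    keys = set(x) | set(y)
    return (all(x.get(k, 0) >= y.get(k, 0) for k in keys)
            and any(x.get(k, 0) > y.get(k, 0) for k in keys))


class MVRegister:
    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.entries: list = []  # list of (vector clock, value) pairs

    def write(self, value: str) -> None:
        # The new clock covers every entry seen so far, then bumps our own slot,
        # so the write supersedes all values this replica currently knows about.
        clock: dict = {}
        for c, _ in self.entries:
            for k, n in c.items():
                clock[k] = max(clock.get(k, 0), n)
        clock[self.replica_id] = clock.get(self.replica_id, 0) + 1
        self.entries = [(clock, value)]

    def merge(self, other: MVRegister) -> None:
        # Keep every entry that no other entry supersedes. Concurrent writes
        # all survive, so there is no case in which merge has to give up.
        combined = self.entries + [e for e in other.entries if e not in self.entries]
        self.entries = [
            (c, v) for c, v in combined
            if not any(dominates(c2, c) for c2, _ in combined)
        ]

    def values(self) -> list:
        return sorted(v for _, v in self.entries)


a, b = MVRegister("a"), MVRegister("b")
a.write("x")
b.write("y")          # concurrent with a's write
a.merge(b)
b.merge(a)
print(a.values(), b.values())  # ['x', 'y'] ['x', 'y']

a.write("z")          # a has seen both values, so this supersedes them
b.merge(a)
print(b.values())     # ['z']
```

The register reports both concurrent values instead of refusing to merge. Deciding what to do with them is left to the application, but every replica ends up holding the same set.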


What's the definition of a conflict, then? Equations welcome.


Feel free to read up on CRDTs; I'm confident this will answer your question.

The short answer is, roughly, that a conflict is a discrepancy in state which cannot be mechanically resolved.


I've read a few CRDT papers. Perhaps you could name a specific one.


> CRDT literally means Conflict-free Replicated Data Type.

Git could be "conflict-free" with a simple `rand(ours, theirs)`.

It would be useless, but technically "conflict-free". Is the addition or removal of that rand function really, pragmatically the difference in the answer to "what is a CRDT?"


Adding extra rules on top of git to try to turn it into a CRDT doesn't make git one, even if you succeed (and rand would not succeed). You can do that with a pencil and paper, but that doesn't make paper a CRDT.


This is definitely not true.

Are you open to understanding why this isn't true? If so, I'm happy to help you come to the correct understanding; please ask whatever questions are necessary. Otherwise, then, well, okay!


`rand` wouldn’t work because all peers must reach the same state without coordination.
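A small sketch of the difference, under the assumption that each peer resolves the same pair of values on its own. The function names are hypothetical, and "lexicographically larger wins" is just one arbitrary deterministic rule.

```python
import random


def merge_random(ours: str, theirs: str, rng: random.Random) -> str:
    # Each peer draws from its own local randomness, so two peers merging
    # the same pair of values can pick different winners.
    return rng.choice([ours, theirs])


def merge_deterministic(ours: str, theirs: str) -> str:
    # Any fixed total order works; here the lexicographically larger value wins.
    return max(ours, theirs)


# Simulate 64 peers, each with an independent seed, merging the same two values.
random_results = {
    merge_random("ours", "theirs", random.Random(seed)) for seed in range(64)
}
deterministic_results = {merge_deterministic("ours", "theirs") for _ in range(64)}

print(len(random_results))         # more than one distinct outcome: peers diverge
print(len(deterministic_results))  # exactly one outcome: peers agree

# The deterministic rule also gives the same answer regardless of argument order.
assert merge_deterministic("ours", "theirs") == merge_deterministic("theirs", "ours")
```

Peers could only agree on the random pick by sharing the coin flip, and sharing it is coordination.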


I really wish someone would make a git-like tool on top of CRDTs. I want conflicts when merging commits like git does, but I also want the crdt features - like better cherry-picking and cleaner merging when merging several branches together. And of course, live pair programming.

CRDTs store a superset of the information git stores - so we should be able to use that information to emit git style conflicts when we want. But I think nobody has implemented that yet. (Pijul probably comes closest.)


I suspect a major reason why CRDTs haven't been a clear dominator in VCSes is that the "conflict free" decision is not necessarily the correct decision. It's merely a consistent one.

When editing is relatively "live", the changes are small enough that the automatic result is probably also correct. But adding your month of changes to a dozen other people's month of changes doesn't mean it's going to build correctly (or even look sane) when you change the same file. Manually seeing the issue and fixing it gives you a chance to correct that, at the time it's relevant, rather than "it all changed, good luck".

---

If you're interested in distributed-VCS systems that have some different semantics than git, https://pijul.org/ might be interesting. And Jujutsu is... complicated, as it abstracts across a few and brings in a few things from multiple VCSes, but it's mostly git-like afaict: https://github.com/martinvonz/jj

No doubt there are others too (if anyone has favorites, please post! I'm curious too)


Fossil (https://fossil-scm.org) is actually a proper grow-only set from what I can tell: merge conflicts upon synchronization are encoded in the DAG as forks of the history at the conflicting points in time. If no action is taken after a fork, all clones will eventually see the exact same view of the history.

The tooling will normally bail with a "would fork" message if a `fossil update` (and, hence, local resolution of those conflicts) wasn't undertaken before pushing, but this is not at all necessary to preserve useful operation.
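To illustrate the grow-only-set idea, here is a toy model. It is not Fossil's actual storage format or API, and all names are hypothetical. History is a set of (commit, parent) pairs, sync is set union, and a fork is simply two commits sharing a parent.

```python
from __future__ import annotations


class GSet:
    """Grow-only set: elements are only ever added; merge is set union."""

    def __init__(self) -> None:
        self.items: set = set()  # (commit_id, parent_id) pairs

    def add(self, commit_id: str, parent_id: str | None) -> None:
        self.items.add((commit_id, parent_id))

    def merge(self, other: GSet) -> None:
        # Union never fails and gives the same result in any order.
        self.items |= other.items

    def leaves(self) -> list:
        # Commits that are nobody's parent; more than one leaf means a fork.
        parents = {p for _, p in self.items}
        return sorted(c for c, _ in self.items if c not in parents)


alice, bob = GSet(), GSet()
for clone in (alice, bob):
    clone.add("c1", None)      # shared root commit

alice.add("c2", "c1")          # both commit on top of c1 without syncing
bob.add("c3", "c1")

alice.merge(bob)
bob.merge(alice)
assert alice.items == bob.items
print(alice.leaves())  # ['c2', 'c3']
```

Both clones converge on the same history, fork included. Resolving the fork would be a later commit with both leaves as parents, which this sketch leaves out.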


> the "conflict free" decision is not necessarily the correct decision. It's merely a consistent one.

Yep. But there's no reason you couldn't build a system that supports both approaches - conflict-free when live pair programming, but conflicts are still generated when merging long lived branches. As I say, text CRDTs store all the data needed to do this. Just, nobody (as far as I know) has built that.


CRDTs seem to give the best experience when they correctly model the "intent" of changes.

But a diff between two different states of raw text can't convey the intent of a code change (beyond very simple changes).

This is why I think CRDTs haven't caught on for VCSes and I'm not sure they _could_ without some kind of structured editing.


I remember jj, pijul or another CRDT-git website offering a JavaScript demo (I can't find it now). I tried having user 'A' remove a line while user 'B' modified that same line, and it converged to the modified line. I don't think that automatic conflict resolution is the future.


Yeah, I don't think so either. Conflicts are good - the knowledge that someone else also did something [here] is useful, and external context can completely change what the correct next step is. The info needed will never be fully encoded into the text.

That said, a clearer model for why it conflicts, and how things got there, can definitely help understand what happened and therefore what to do next. Git isn't great there, you're just given the final conflict pair.


I've been researching CRDTs for a reference manager that I'm building (https://getcahier.com), and thought that some hybrid of automatic and user-operated conflict resolution would be ideal, as you described. But current efforts are mostly directed to automatic resolution.


I would want the git-like tool to have semantic diffs, and so have much better conflict resolution as a result; not CRDTs and more obtuse & confused conflicts than it's already capable of.


(hi seph, hope all’s well) — i did exactly this with Redwood and showcased it multiple times during Braid meetings. alas, nobody was very interested in trying to push it forward


Ah sorry - I don’t know why redwood slips my mind! Please correct me if I’m wrong but I thought you were using redwood as a git backend, rather than replacing git’s internals with a text crdt?



