I'm not familiar with Pijul, and haven't finished watching this presentation, but IME the problem with modern version control tools is that they still rely on comparing lines of plain text, something we've been doing for decades. Merge conflicts are an issue because our tools are agnostic about the actual content they're tracking.
Instead, the tools should be smarter and work on the level of functions, classes, packages, sentences, paragraphs, or whatever primitive makes sense for the project and file that is being changed. In the case of code bases, they need to be aware of the language and the AST of the program. For binary files, they need to be aware of the file format and its binary structure. This would allow them to show actually meaningful diffs, and minimize the chances of conflicts, and of producing a corrupt file after an automatic merge.
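To make that concrete, here's a minimal sketch (my own toy example, not any existing tool) of what a language-aware diff could look like for Python, using the standard ast module: it reports which top-level functions were added, removed, or changed, and ignores formatting entirely.

    import ast

    def functions(source):
        """Map each top-level function name to a normalized dump of its AST."""
        tree = ast.parse(source)
        return {node.name: ast.dump(node)
                for node in tree.body
                if isinstance(node, ast.FunctionDef)}

    def semantic_diff(old_source, new_source):
        old, new = functions(old_source), functions(new_source)
        added = sorted(new.keys() - old.keys())
        removed = sorted(old.keys() - new.keys())
        changed = sorted(name for name in old.keys() & new.keys()
                         if old[name] != new[name])
        return added, removed, changed

    old = "def f(x):\n    return x + 1\n\ndef g():\n    pass\n"
    new = "def f(x):\n\n    return x + 1\ndef h():\n    pass\n"
    print(semantic_diff(old, new))  # (['h'], ['g'], []): f unchanged despite the whitespace edit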
There has been some research in this area, and there are a few semantic diffing tools[1,2,3], but I'm not aware of this being widely used in any VCS.
Nowadays, with all the machine learning advances, the ideal VCS should also use ML to understand the change at a deeper level, and maybe even suggest improvements. If AI can write code for me, it could surely understand what I'm trying to do, and help me so that version control is entirely hands-free, instead of having to fight with it, and be constantly aware of it, as I have to do now.
Or, since it's more than likely that humans won't be writing code or text in the near future, we'll skip the next revolution in VCS tools, and AI will be able to version its own software. /sigh
I just finished watching the presentation, and Pijul seems like an iterative improvement over Git. Nothing jumped out at me like a killer feature that would make me want to give it a try. It might be because the author focuses too much on technical details and fixing Git's shortcomings, instead of taking a step back and rethinking what a modern VCS tool should look like today.
Shameless plug: I've written difftastic[1], a tool that builds ASTs and then does a structural diff of them. You can use it with git too.
It's an incredibly hard problem though, both from a computational complexity point of view and in terms of building a comprehensible UI once you've done the structural AST diff.
I think part of the problem is that everyone seems to be trying to make a version control tool that is agnostic to all languages, both computationally and UI-wise. But C++ users expect to see different things than JavaScript users, and so forth.
I’m surprised at the lack of hyper-specific, per-language version control tools. I thought about making a side project for one in Julia a while back, but I'm not quite sure how it would look. Some random thoughts:
- info on type, name, constant changes
- let me checkout older revisions of individual functions / objects / whatever
- report unit test result changes for functions that have unit tests
- flag when changes are simply a refactor and are functionally the same
Most repositories I work on don't have only one language. They have at the very least two, like the main language and maybe markdown for README files, then configuration like .ini or .toml, json stuff, yml, xml, etc. And then you might have bash scripts, Dockerfiles, other build tool languages, etc. And those are only text files. You probably will also have images, maybe zipped stuff, office documents and more, all not the "core" repository content, but stored nearby and versioned alongside.
Building a hyper-focused tool won't be very useful unless it at least rudimentarily supports other file types.
This doesn’t really detract from my point - the “best” tool would use knowledge of python for python files, json for json files, and so forth. I think you’re just saying you’d want multiple of these rolled into a single tool as opposed to standalone, which is fair. I think any tool would have to be compatible with git / layer on top of it so it’s available as a fallback.
Every change is different in the same way every program is unique: the change of a couple of characters will alter the meaning. I think you have to try to write a diff UI to understand why it is hard.
Difftastic, Meld, diff -u, Word and other tools are amazing because they are useful in many scenarios. Getting the UI right has been a long process, and being able to grok the changes is still hard even with the best tooling. It is also a question of tool adoption: it takes a long time to understand how a tool works.
Ah, yes, I knew I was forgetting one project. difftastic is very cool, thanks for writing it!
How well do existing VCSs integrate with it? Did you feel restricted at any point by writing a diffing tool, instead of basing a new VCS around this concept? Do you think a deeper integration would allow supporting other functionality beyond diffing, like automatic merging, conflict resolution, etc.?
I agree that it's a very difficult problem. But as an industry, we have more than enough smart people and resources to work on it, which if solved would greatly improve our collective QoL. I can't imagine the amount of time and effort we've wasted fighting with version control tools over the years, and a tool that solved these issues in a smarter way would make our lives much easier.
Git supports external diffing tools really well with GIT_EXTERNAL_DIFF, which you can use with difftastic[1]. Other VCSs are less flexible. For example, I haven't found a nice way of getting a pager when using difftastic with mercurial.
> Did you feel restricted at any point by writing a diffing tool, instead of basing a new VCS around this concept?
Oh, that's an interesting question! Difftastic has been a really big project[2] despite its limited scope and I'm less interested in VCS implementation.
I think text works well as the backing store for a VCS. There are a few systems that have structured backends (e.g. monticello for smalltalk), but they're more constrained. You can only store structured content (e.g. monticello requires smalltalk code) and it must be well-formed (your VCS must understand any future syntax you use).
Unison[3] is a really interesting project in this space, it stores code by hash in a sqlite backend. This makes some code changes trivial, such as renames.
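As a hedged sketch of why hash-addressed storage makes renames trivial (this is just the general idea, not Unison's actual on-disk format), consider hashing a definition with its name abstracted away:

    import ast
    import hashlib

    def definition_hash(source):
        """Hash a function definition with its name stripped, so the content is
        identified independently of what it happens to be called."""
        node = ast.parse(source).body[0]
        node.name = "_"                     # the name is not part of the content
        return hashlib.sha256(ast.dump(node).encode()).hexdigest()

    store = {}   # content hash -> source   (the "database" of definitions)
    names = {}   # human-readable name -> content hash

    src = "def add_tax(price):\n    return price * 1.2\n"
    digest = definition_hash(src)
    store[digest] = src
    names["add_tax"] = digest

    # A rename only touches the name table; the stored content, and any
    # hash-based references to it, are untouched.
    names["with_vat"] = names.pop("add_tax")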
From the perspective of a text diff, an AST diff is lossy. If you add an extra blank line between two unchanged functions, difftastic ignores it. That's great for understanding changes, but not for storage.
I already use delta[1] as a diff viewer, but I suppose GIT_EXTERNAL_DIFF is a deeper integration than just a pager. I've been aware of your project for some time now, but haven't played around with it since I wasn't sure if it would help with automatic conflict resolution, and other issues Git often struggles with. But I'll give it a try soon, thanks again.
I wasn't familiar with Unison. It looks interesting. We definitely need more novel approaches to programming, especially since our field will radically change in a few years as AI becomes more capable.
For languages that have strong IDE refactoring support, and userbases that use it, a (future) solution would be for the IDE to autocommit along the way with metadata to explain what happened: "removed unused function based on suggestion", "extracted duplicate", "renamed public method taxed to isTaxed and updated usages across files x, y and z; developer comment: every other of these methods follows the pattern isSomething".
The last example also adds a new feature: an option for a developer to add a comment on an automated refactor.
Ordinary commits could exist on top of this as milestones.
I wouldn't be totally surprised if sooner or later Jetbrains does this. They are creating their own, often better, versions of everything, I feel, and version control could be an obvious next step.
As someone who often prefers other solutions to theirs, I'd prefer if someone else does it first so I end up with something I can use across NetBeans, VS Code, eclipse etc and not something like Kotlin which forces me to use IntelliJ. (Don't get me wrong, IntelliJ is great, I just have NetBeans as my personal favorite.)
I disagree. Merge conflicts are just a fact of life, and line-granularity has good usability properties (displaying and editing). `git` has issues, but I don't see merge conflict granularity being an issue, especially when projects enforce consistent, automatic formatting.
I agree however that while Pijul is technically very interesting, it doesn't seem to have any killer features that would overcome the cost of switching to a niche version control.
I was reviewing a PR today and have to disagree with you. There was a single value change in a JSONL test data file. This is nightmarish to read in regular git diffs: the change git thought was happening was (with text wrapping) a full page's worth of JSON, rather than identifying it as a single word change. And because it is JSONL, the file could not be split into different lines without altering its semantics.
I don’t think it’s unreasonable we could be a little smarter here.
JSONL and line-oriented version control are never going to play nicely together.
Imagine doing code review in a language where every function had to be written on one line!
The two techniques I use to dodge this:
1/ Switch from JSONL to a list of objects, then pretty print it to be line oriented (rough sketch below).
2/ Compress the test data to discourage viewing it altogether, and make people describe what’s being changed rather than leaving it up to the diff to show it.
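A rough sketch of technique 1, assuming the test data lives in a hypothetical data.jsonl file:

    import json

    # Read newline-delimited JSON (one object per line)...
    with open("data.jsonl") as f:
        records = [json.loads(line) for line in f if line.strip()]

    # ...and rewrite it as a pretty-printed JSON array, so a one-value change
    # shows up as a one-line change in an ordinary git diff.
    with open("data.json", "w") as f:
        json.dump(records, f, indent=2, sort_keys=True)
        f.write("\n")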
I can believe that they can come up with that... but `man diff` doesn't mention it at all, I can't find any description of the diff/patch format that mentions it, and I've literally never seen it. I do frequently get multi-megabyte single-line diffs though, which amount to just one or two characters changing.
Other tools exist, of course, but if you're going to be shipping around stuff that `git apply` or thousands of other tools can handle, it has to be in "the" standard format.
This is an area where it should be easier to build tools. I have tools for diffing k/v and structured formats, but they are local on my machine. Being able to publish those to GitLab, GitHub and Bitbucket would be great.
Edit: Worst is diffing yaml files; you really need a stable way to parse those.
These ideas have mileage but as long as the source code is plain text, the way we represent changes is always going to be text based too.
Wouldn’t a better starting point be to change the way we represent source code, then let the patch tools follow?
Your editor knows exactly what steps you took to make your change. A semantic VCS like you describe sounds very similar to reaching eventual consistency in a distributed data structure by sharing streams of edits between peers.
Personally, I’m a firm believer in text. Auto formatting code so that changes are line oriented helps a lot. So does a good culture of namespaces and separation of concerns. Conflicts happen when two people working on different things have to edit the same code. You can dodge that by more carefully structuring your project.
There’s a reason why software projects aren’t just single-file piles of symbols. Code is primarily supposed to be human readable, and the more legible the project the better shape it is in for good ole line oriented diffing and patching.
Plain text is only storage, I happily posit plain text as storage is not the issue.
It's the tooling above that's lacking. From editors to source control, it's all text/buffer/line/character oriented, which does have its benefits. Sure there's syntax highlighting, folding, symbol search and whatnot, but semantically these tools only superficially understand code itself and certainly don't operate on code; they only pretend to, and are fundamentally text editors. We're getting there with LSPs and error-tolerant parsers, but they still map back to text for us to interact with.
Tools like gofmt, black, ruby standard and such already kind of abstract away text as storage: you write code in whatever way and it gets transformed right under your feet. In some way as a dev you already don't care about the text, it gets handled for you, but it still maps back to text because editors can't handle anything else.
Similarly, LSPs are in my mind quite nerfed because they have to do a whole back-and-forth-to-text dance. Vim text objects kind of go in that direction as well, where you think about higher-level constituents than text (arguments, methods, etc). Imagine being able to bind the understanding of LSPs right into semantic Vim language objects without them having to go through text!
I dream of an editor where I can open a bunch of functions or classes or namespaces (not files) in buffers that have an understanding of the constituents, and it would all map back to files for storage behind the scenes. I believe it doesn't have to go full-tilt Smalltalk-like; the Clojure conf Overtone demo from years ago is almost there, although not quite.
I think it’s possible to go to a function level, but you basically need to stop using the file system. We come back to the question of storing code in some sort of db based storage, which can then contain all these tools built in. I can see this type of system being used more and more with the lambda / edge / micro service systems where it simplifies data synchronization. However git / nextBestThing will keep on being used as long as we write code in text files.
You can tell git to use a different executable as its diff tool. I agree, and I'm curious whether such a tool would satisfy my needs.
I think this problem is particularly hard since the diff tool needs to understand the coding language. We should have one diff tool per language IMO.
Not sure how I feel about “ML” that would likely change over time being used in a VCS. This would make commits, or whatever unit of work you want to save, non-deterministic. Also, as people we still care about file format and likely want to track it. If anything, what you are talking about would just be a different view in a VCS that would still want to track file-level changes if it was ever adopted.
For what you're talking about, though, I don’t think the fundamental VCS really matters. You can do everything you are talking about with a tool that uses the diff from git.
A while back I saw a paper[1] from someone who integrated semantic diff into a VCS. They said that it works well for top-level changes to the file (moving classes around, etc.), but it doesn't work as well for changes inside functions. For changes at the statement level, text diff worked better. [1] Unfortunately, I don't remember the name of the paper though :(
> If AI can write code for me, it could surely understand what I'm trying to do.
You are anthropomorphizing LLMs. Essentially, they are just conditional probability distributions over tokens. That does not require or imply understanding or reasoning skills.
We don't know. The nature of consciousness is an unsolved problem.
We do know that LLMs fall into a certain category of mistake that most educated humans look at and go "HA! What was it thinking??"
It's not that humans don't also make those types of errors - it's that we recognize them quickly when they're pointed out to us and usually describe the error as a "stupid mistake," "brain fart," or similar name intended to show explicitly "gosh, I totally failed to actually think before I did that."
The LLMs show no sign of such self-awareness or, well, "intelligence," loose and squishy as those words are.
Maybe GPT-5 will fix that, but so far it doesn't look that way.
For a step back, moving away from text into ASTs, there are a bunch of interesting projects:
* https://unisonweb.org: Unison, a programming language that abstracts names and builds a store of canonical functions
* https://lamdu.org: Lamdu, a programming language that's meant to be edited as a tree, and its accompanying editor.
I can't wait for these ideas to reshape how we deal with programs and build stuff.
Also, I wouldn't take too much credit away from projects like Pijul that, maybe more practically, slowly steer us where we want to go. I find it hard to believe that something new will suddenly replace everything given the sheer amount of things that would be left behind and can't rapidly be ported into new shiny technology for various reasons.
Merge conflicts don't go away when a diff tool understands syntax.
Semantic conflicts happen even when there are no textual conflicts. E.g. one developer removes a function and all calls to it. In parallel, another developer adds a new use of the removed function somewhere, in a file that the other developer didn't even touch. Cherry pick those changes and you have a broken program.
That's true, but those are logical conflicts, that could arguably also be taken into account by a (much) smarter AI-powered VCS.
The conflicts most people experience on a daily basis are with the tool being confused about changes to symbol names, function signatures, and with the context around the hunk changing, which has no relevance to the change itself. These can mostly be resolved by the tool having more awareness of the program structure, understanding the intention of the change, and knowing how to produce a valid result.
Version control would be much more useful if the tools kept track of semantic changes in a project, instead of line-based differences without any awareness of the content. Existing semantic diffing tools show that this is indeed a better approach, but as pointed out[0], it's a very difficult problem to solve.
> These can mostly be resolved by the tool having more awareness of the program structure, understanding the intention of the change, and knowing how to produce a valid result.
If you have a three-way merge tool which does all these things, you can use it with Git.
> Version control would be much more useful if the tools kept track of semantic changes in a project, instead of line-based differences without any awareness of the content.
So the good news for you is that Git doesn't track any differences at all. Every commit stores a complete snapshot of the files (a tree). There is no limitation on how smart your merge tooling can be, as long as it can work with three artifacts: the ancestor code, and the two parallel derivatives of it to be merged.
The idea that a version control system must track detailed differences is false, and a bad requirement. Additionally, if the internal tracking representation has to follow the different syntaxes of umpteen languages, that's a hyperbad requirement.
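To illustrate the three-artifact interface described above, here's a toy three-way merge in Python. The units here are lines, but nothing stops a smarter tool from using functions or AST nodes instead; this simplified sketch assumes the inputs are already aligned, which real merge tools handle for you.

    def three_way_merge(ancestor, ours, theirs):
        """Toy three-way merge over pre-aligned, equal-length sequences of units."""
        merged, conflicts = [], []
        for i, base in enumerate(ancestor):
            a, b = ours[i], theirs[i]
            if a == b:            # both sides agree (possibly both unchanged)
                merged.append(a)
            elif a == base:       # only "theirs" touched this unit
                merged.append(b)
            elif b == base:       # only "ours" touched this unit
                merged.append(a)
            else:                 # both changed it differently: conflict
                conflicts.append((i, a, b))
                merged.append(a)
        return merged, conflicts

    base   = ["def f():", "    return 1", ""]
    ours   = ["def f():", "    return 2", ""]
    theirs = ["def f():", "    return 1", "# new comment"]
    print(three_way_merge(base, ours, theirs))
    # (['def f():', '    return 2', '# new comment'], [])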
Can't find the source right now, but I think I've read a discussion on pijul's forum about its ability to change the tokenizer depending on the file type, for a more meaningful granularity level. I think someone was talking about plugging treesitter there to get an AST.
How about a few examples of how this semantic / AST approach is a game changer?
An extra parameter is added to an existing function: how does this look?
Similar functionality is extracted to a parameterised new function: how does this look?
I’m sure diff and code review tools will evolve, but it’s helpful for people to talk about what it looks like, to make it less of a nebulous Matrix / Second Life vision.
What's the difference between shipping AST with formatting stripped and shipping code that's been automatically formatted? I feel like the only difference is the configuration required to enforce the latter and different modes of failure. Adopting Prettier in my team was the best decision ever, so liberating. More languages should have a single, mandatory way to format code, without any ways to opt out.
> More languages should have a single, mandatory way to format code, without any ways to opt out.
Strongly disagree. Maybe if you're in a very domain-constrained environment, I could see this being valuable. But I write graphics and simulation code all day, which involves a lot of translating math expressions. A compiler insisting on me using PascalCase (like, for example, .NET uses) leads to very unreadable translations of formulas. And I'm not of the opinion that a system making me rewrite variable names to "meaningful names" helps understanding of the underlying math much, if you need to do symbol manipulation or read background papers anyway.
Trust your users. Give them the tools to enforce safety barriers for themselves. Give them sensible defaults, sure. But give them ways to opt-out if they know that they need to break the conventions.
Just one example: Empty lines are used to visually structure blocks of code within a function. Those can't be recovered from an AST. (Unless, of course, you make such formatting choices part of the abstract syntax.)
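This is easy to demonstrate with Python's own ast module (assuming Python 3.9+ for ast.unparse): round-tripping source through the AST silently drops the blank line, and the comment along with it.

    import ast

    src = """def f():
        x = 1

        return x  # the blank line above is deliberate visual structure
    """

    # Round-tripping through the AST loses the blank line (and the comment):
    print(ast.unparse(ast.parse(src)))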
> More languages should have a single, mandatory way to format code, without any ways to opt out.
Strong disagree, but I definitely agree that every project (or team) should have its own standards for formatting that can be automatically applied.
In Ruby land, Rubocop has been a win at the companies where I've worked. Greatly reduces grumbling about formatting. And VSCode/Sublime/etc can format code automatically.
Basically you'd lose the entire source representation of your code so you are essentially shipping binaries at that point. You could annotate the AST with hints to recover the original source, but once you have an AST you also have the option of transpiling to other languages/representations.
This is essentially what things like the JVM, .Net, wasm, and any sort of embedded virtual machine are. The AST is kind of just the byte-code that gets executed since the machine abstraction isn't really tied to physical architecture.
I think the problem with that is that it's a massive amount of work, after which you get a fragile system (what if the AST changes?) which doesn't really mean much less work. Merge conflicts will still happen if two people change the same thing.
Your "rethinking" is "let AI write and version control the code."
Very deep.
AI, for the moment, cannot write anything outside of what is already written. It just so happens that my interests are in the very strange problems where no code has been written thus far, except by me and for me. AI is not helpful there at all.
I am not even starting on the tasks that are formulated like "DUZ BY ... WITH KUMAJ is not supported" that has to be solved in code base that is a million LOC or so (50 million bytes) and is itself a part (and user of) of much, much larger code base.
Finally, git was a de-improvement on darcs, which predates git by two years, if I remember correctly. Darcs was way ahead of git in everything but speed, including an attempt to view text as structure (darcs has a Rename change where one identifier gets renamed into another). Pijul is a contemporary rethinking of what darcs offers.
So, pijul is not an iterative improvement upon git. It is an improvement upon darcs, which represents the start of a separate lineage of DVCSs that tried to include many of the things you mentioned.
> Your "rethinking" is "let AI write and version control the code."
It's not. There are several semantic diffing tools that do a much better job at showing relevant changes than simple line-based diffs. This is all done without AI.
My point is that a VCS written from the ground up with this knowledge would offer a much better UX than current tools.
My second point mentions AI as the next step in the progression, since it's clear that it will affect how humans write code in the very near future.
> Very deep.
I don't appreciate the snark.
This is not some deeply unique line of thinking.
> AI, for the moment, cannot write anything outside of what is already written.
You're underestimating the power of combining written chunks of code to produce a unique solution. Most software is written by glueing existing code together and using libraries. AI can do this reasonably well today. What do you think it will be capable of in 5 years? 10?
> It just so happens that my interests are in the very strange problems where no code has been written thus far, except by me and for me. AI is not helpful there at all.
I think you're overestimating the uniqueness of your code. Is it really all original? You're writing everything from scratch with novel, never before seen approaches? I doubt that very much.
There's a reason why design patterns exist. Many solutions can benefit from following existing patterns. AI today can help automate with writing common patterns, and also with any chore work like writing tests. It's helpful even if it's not writing those novel solutions you still have to do yourself.
Besides, none of this is relevant for a VCS. AI doesn't need to have superhuman programming skills to manage versions. It would just be foolish to not use its current capabilities to understand changes better, and help us take the chore out of dealing with version management.
>Most software is written by glueing existing code together and using libraries. AI can do this reasonably well today.
>I think you're overestimating the uniqueness of your code. Is it really all original? You're writing everything from scratch with novel, never before seen approaches? I doubt that very much.
You can doubt that. Yes, of course.
One kind gentleman here introduced me to worst-case optimal join algorithms and I designed one myself. It uses Bloom filters represented as binary decision diagrams, really nothing fancy.
You can try asking AI to write you a Bloom filter represented as binary decision diagrams. I doubt you will get anything interesting and/or useful.
>There's a reason why design patterns exist. Many solutions can benefit from following existing patterns.
This is history repeating itself.
Functional languages do not require that many design patterns. In fact, functional programming can hide complexity so well that uncovering it can produce an exponential blow-up in code size. And type systems greatly benefit the writing of complex systems. I know because I have programmed in almost every kind of language (typed, untyped, dynamic, static, Turing-complete, total, etc.) and programming paradigm in existence (including term rewriting systems).
I doubt you use Haskell for your work, which is functional and has a rich and powerful type system.
The same will be with AI. You can proclaim it will change the world, but similar tools readily available for quite some time did not.
Returning to AI and VCS, the presentation that started our discussion shows a principal problem with VCS, at 7:06 or so. We can make a changeset that will provoke any merging/conflict-resolution algorithms into making a silent mistake. Pijul postpones this problem, I think, not rids of it.
AI does not report a problem, it hides it, and lies. Nobody, to my knowledge, has been able to make any AI admit it does not have a clue about a thing it clearly has no clue about.
So if we will certainly have a mistake in our merging process, AI will certainly lie about it. The code after merging-with-AI has to be tested and reviewed; there are no short-cuts there.
I don’t know much about using Pijul, but one nice thing about it compared to Git is its implementation is mostly defined in a library while the command line executable is separate:
Git can be annoying to integrate into a larger system without resorting to shelling out to the Git executable. There are alternatives like libgit2 and jgit, but they only have a subset of functionality.
This causes a lot of grief for IDEs like Visual Studio, where updating Git breaks the IDE because there is no "API contract". The text output changes, VS fails to parse it, and just crashes out with inscrutable errors.
I've become very opinionated in my old age, and I now firmly believe that:
1. All command-line tools should be "library first, cli second".
2. All text-based formats should include a parser and formatter as a function. Never specify a text format you can't round-trip. In other words, always include an "escape" function and an "unescape" function, or better yet, a parser and serializer (rough sketch after this list). Random config files in Linux are notorious for not doing this. I want to be able to parse them, modify the object in memory, and then write them back out without having to worry about how strings are quoted or dates are formatted.
3. Protocols should always come with a non-executable and machine-readable spec. Think ANTLR grammar file, Open API spec, or something. Never use English only to describe a protocol. Make sure client code can be 100% automatically generated by a tool, in multiple languages.
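To illustrate point 2, here's a rough sketch of a round-trippable parse/modify/serialize cycle for a made-up "key = value" config format (a hypothetical example, not any real Linux config file):

    def parse(text):
        """Parse a hypothetical 'key = value' config into a round-trippable form:
        every line is kept, either as a (key, value) pair or as raw text."""
        doc = []
        for line in text.splitlines():
            stripped = line.strip()
            if stripped and not stripped.startswith("#") and "=" in stripped:
                key, _, value = stripped.partition("=")
                doc.append(("pair", key.strip(), value.strip()))
            else:
                doc.append(("raw", line))   # comments and blank lines survive as-is
        return doc

    def set_key(doc, key, value):
        return [("pair", key, value) if entry[0] == "pair" and entry[1] == key else entry
                for entry in doc]

    def serialize(doc):
        # A real implementation would also remember the original spacing/quoting;
        # this sketch normalizes pairs to "key = value".
        return "\n".join(f"{e[1]} = {e[2]}" if e[0] == "pair" else e[1] for e in doc) + "\n"

    original = "# database settings\nhost = localhost\nport=5432\n"
    print(serialize(set_key(parse(original), "port", "5433")), end="")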
Is that because they're parsing porcelain output? Or is git's plumbing machine-readable but not well specified?
But git users are more familiar with porcelain so I wouldn't be surprised if they parsed that for an initial implementation.
It sounds like plumbing shouldn't break as often as you imply:
> The interface (input, output, set of options and the semantics) to these low-level commands are meant to be a lot more stable than Porcelain level commands, because these commands are primarily for scripted use.
> Is that because they're parsing porcelain output? Or is git's plumbing machine-readable but not well specified?
From what I've seen, they're using the latter, but breaking changes are still introduced.
Either way, the output of UNIX-like command-line tools is inherently weakly typed and often completely unspecified.
PowerShell, for comparison, ships every module as both a user-interactive CLI command (with parameter tab-complete!) and as a programmatically usable dynamic library. They're inherently one and the same; there's only one interface that does both. The API returns .NET objects and is strongly typed. There is no parsing step at all. If you load a given version of a library, you'll always get the expected types in the results.
Speaking of which, PowerShell uses semantic versioning for modules and can have multiple running side-by-side.
The future is here, it's just not very evenly distributed.
Semantic versioning still breaks clients (so what do you mean by multiple versions?). I would say that if you are handed a new version of a binary that spits out text, it is usually much easier to fix that than to handle an API with objects.
I have failed to use PowerShell seriously many times, I am always let down by the opaqueness of it. (Which I agree is a common complaint against bash from inexperienced users)
I was actually looking into libpijul earlier this week, but unfortunately it seems like it's still suffering from some growing pains in terms of friendliness to external devs; after spending over half an hour delving into the API docs, I couldn't even figure out where the "entrypoint" was, much less how to use the large number of pieces that interacted with each other.
From looking at the implementation of the executable, I think the library could really use some higher-level constructions like `Repository` here[0], or at least some higher-level prose docs explaining how to put the pieces together manually, maybe with a disclaimer like ripgrep's backing library[1] has.
I really like what Pijul is doing from a design standpoint, but unfortunately it's far from the level of polish I would want to be able to consider it as a realistic alternative to git. If I'm going to have to put in effort to work around warts either way, I'm going to pick the tool with warts that I already know how to work around over the one where I'd have to learn from scratch and wouldn't have nearly as many resources to help me learn them.
I really like Pijul and its underlying libraries, both from a conceptual and implementation perspective.
However, over the 6 years of its existence I've lost hope of it ever becoming something that I'll use in my day-to-day, the same way I lost hope in rocket (the Rust web framework). After all those years it's still essentially a single-maintainer side project, whose maintainer is stretched too thin across 3 big projects (Pijul, Sanakirja, Nest).
I really hope that at some point a maintainer/team constellation emerges that can put all the lessons learned from Pijul into something that has a shot at being a successor to git.
Within the first 5 minutes, the presentation claims that "the tools are unusable without a global central server".
I really don't know what that is supposed to mean. Plenty of people do in fact use github, but I don't know why anyone would say that git in particular is "unusable" without a global central server.
Github (etc.) are conveniences for people who don't want to self-host. They provide some additional tools that can be useful in some situations. But nothing more.
Note that Linux, today, is far from the largest Git project. Oldest, yes. Biggest, not even close. FAANG monorepos are vastly bigger with much higher commit rates. The Linux kernel is quaint, small, and moves glacially slow in comparison. Even if we consider only public open source codebases, Chromium's repo is bigger than Linux by any reasonable metric.
Niche use cases are a great reason to make a bespoke tool! But that doesn’t make it not a niche use case. And it doesn’t mean that tool is optimal for the common use case.
In the sense that it was the first project to use git (has been using git the longest), not in the sense that it's the oldest project that switched to git.
IIRC, the oldest was git itself (it self-hosted early on its development history); git, linux, and sparse are IIRC the three projects that have been using git the longest.
We don't even do that. ardour.org self-hosts gitea, and that allows for PRs etc. We don't expose that to "the public", for now, just because github is familiar to more people. But we could.
I self-host Gitlab, which is pretty familiar to people, but I still mirror repos for which I'm hoping to get contributions to GitHub for the same reason
No-one should be applying patches from emails. I know several people will react with certainty that I'm wrong, but email-patch-based flows are just used because the old-timers using them are very accustomed to it.
It makes vastly more sense to transmit a patch attached to the specific code base state that it is intended to modify: in other words a patch doesn't exist in isolation. Git is great; I'm not saying everyone should use any particular Git-hosting or code review framework. But the fact that the inventor of Git continues to use email in one particular project with many expert and experienced contributors has more to do with their expertise and experience (aka age) than with the benefits of email. I mean look at the reality of these people's workflows: they're using things like Mutt and Gnus to read email locally and apply patches from local email. I get it, I've done it, I've even run a local dovecot for the purpose. It had its time in technological history. That time isn't now.
There are multiple open source projects that I would have contributed to were it not for the email-based flows (e.g. emacs).
I strongly disagree. I've recently contributed to a project that uses an email-based workflow, and find it considerably easier to use: after a surprisingly brief one-time setup, I can contribute patches using a single `git send-email` command from the same terminal that I make my commits, and can review and comment on patches from my email client of choice (thunderbird) – all without needing to create a new account.
Applying patches is, admittedly, a little more awkward, but it can still be done in two or three steps from thunderbird (including invoking `git am`) and, I understand, some newer email clients can handle this much much better.
> It makes vastly more sense to transmit a patch attached to the specific code base state that it is intended to modify:
This is no longer a problem. `git send-email` (and `git format-patch`, if you prefer to call it directly) has for a while supported `--base=auto` (or the `format.useAutoBase` configuration option) to attach the upstream commit ID to the patch. To my knowledge this is currently only to help the maintainer understand its context, but there is ongoing discussion of ways to automatically use this information.
The biggest difference is the need to have shared access and it therefore does not easily allow for contributions from people without write access to the repo, although a similar workflow is possible with multiple publicly-readable repos (esp. with the help of `git request-pull`). Another advantage a send-email–based workflow has (compared to the workflow you describe, at least) is that the presence of the patch in the email makes it trivial to respond to specific code changes inline, using standard email quoting.
ETA: Many forges reintroduce the latter (commenting directly on lines of code), in which case this may not be an advantage depending on if such a forge is used. (But then you're back to using a forge, and needing to open the website to review changes in addition to using git and email.)
Perhaps I should have been more explicit: sending "a commit sha and branch name" is functionally equivalent in that it contains both the information necessary to retrieve the commits and a method of responding to the new commits, but is considerably less convenient. There is an extra step in sending the commit (you must push the commit then send the relevant data using a separate email client), and the resulting email lacks both the commit message explaining what was changed and why, and the changes in question, requiring one to manually examine the commits out-of-band. This is in addition to the need for all contributors to have write-access to the repository, mentioned above.
It is, however, also functionally equivalent to a pull request, excepting again the need for write-access.
> But then you're back to using a forge, and needing to open the website to review changes in addition to using git and email.
It sounds to me, and I mean this quite objectively, not as a personal insult, that you're attracted to the email-based flow not because it is better, but because you just don't want to use a web UI. So since you define web UI as bad, you of course reach the conclusion that the email-based flow is better.
I'm not sure I phrased that correctly. The point I was trying to get across with that sentence was not so much that web UIs are bad, but that if you are using a forge with in-browser support for reviewing changes (generally in the context of pull requests), then there is little benefit to sending "a commit sha and branch name" separately. Similarly, send-email–based workflows have little need to manage the repository in the browser. Email-based workflows and pull-request-based workflows are parallel in that they provide the same features in different ways.
We can quibble about which is more convenient (admittedly I find the former better in this regard, although I recognize that this is a minority opinion), but they are IMO both more-or-less equally viable choices.
I have used git without a central server many, many times. Passing bundles by email or USB sticks, and pushing and pulling between workspaces. At work, it is because we often have different networks, isolated for security reasons. For personal use, I don't really have a central server. I may use GitHub, because it offers free hosting and serves as a backup, but it isn't central in the sense that it is not always the git-remote "origin".
It may not be the most typical workflow, and it requires some getting used to but it is 100% usable. In fact, I am not aware of a "central server" for the Linux kernel. For all I know, the closest thing to it could be Linus Torvalds's PC.
I really like the idea of a version control system based on commutative patches but I'm just not seeing how it's going to translate to a system that's intuitive and easy to use not just for programmers, but also for lawyers and artists.
Consider a patch like this one, titled "Solving conflicts":
"Solving an order conflict" and "Resurrecting zombie bytes" sound like the tricky parts of the patch-based workflow. I would very much like to understand how to work with these parts of Pijul. How are these actions performed? How does the user interact with the graph to perform these actions? Is the conflict state rendered into the file? It's not mentioned in the video, and I cannot find it in the documentation or in blog posts.
I cannot be the only one that's interested in learning about those things, and I think it would be more effective than anything else in raising interest for Pijul.
My best guess is that the workflow is still rough around the edges and requires the user to have a good understanding of the underlying graph, and therefore it's not ready to be presented yet. And my hot take is that it's still rough around the edges because there is no easy way, and maybe no way at all, to make it polished and intuitive for a new user. But I want to be proven wrong.
Is git conflict resolution "intuitive and easy to use not just for programmers, but also for lawyers and artists"? Several programmers I know can barely manage it. I think we might be setting our bar a little high here.
In which case dvcs will always be useless: this is a speed of light AND network partition issue. If you don't care about distributed vcs then the issue has been cracked for decades: the likes of TFS have file locking.
Pijul has one of the most reasonable approaches to conflicts: you solve them once. That is about as close to cracked as we can get for dvcs in this universe.
I think I really needed to add more context to this comment. This is on me, again sorry. I was really saying this in regards to the part of the comment regarding dvcs and general accessibility.
What I was trying to say is in regards to the artist/lawyers (let’s generalize this to anyone who does not know how to code or could not meaningfully interpret a plain text or binary diff of a certain document type), you can have a user friendly dvcs that has structured conflicts but you need that system to understand more than plain text/the file system. It needs to know the schema of the thing it is merging to be meaningfully diffable or perform three way merges.
What I mean by structured conflicts is conflicts that do not break the editor’s interpretation of the document state. When you’re writing source code in vim/whatever IDE you use, the editor state is entirely decoupled from the runtime of your merge conflicts (git and vim do not need to know the semantics of the programming language you are writing source code in to work), so a syntax error in a merge conflict doesn’t prevent you from fixing the syntax error or resolving the conflict. The trickiness to structured version control is that the merge conflicts must be resolvable within whatever is serving as a medium of editing the document. So in structured editors, you can have conflicts in a document but they have to still be semantically interpretable while in a conflicted state.
This is not possible in pijul/git or any generic file based vcs without an editor that can be provided a schema to interpret the document type it is merging. Sure you can write your own diff and merge drivers but this becomes cumbersome fast and is extremely uninteroperable.
However, this is entirely possible, I’ve been working on a system that does this very thing for a year. Happy to show you if you want to see.
I think the bigger question is: why on earth would you want a dvcs like this to exist when you have plenty of centralized version control systems to cater to people who aren’t technical?
Philosophically there are quite a few people who care about not having their work gated by subscription plans/centralized players. I’m relatively indifferent to that side of things but I do like having local access to all my documents.
The bigger thing to me is that it allows non-technical people to do work that can be used to generate semantically correct code. The document types themselves can also interoperate, which makes things like centralizing static asset generation very useful if you share assets across multiple client codebases. This also means you can let a copy translator do things like create rich text translations without needing them to edit things like a strings.xml or a .po file and worry about them botching syntax. Designers can put all icons into a document and generate themes and palette code for engineers to consume that work across any client type. Sure you can do all those things in isolation with centralized software providers but without a commit sha to point to, these tools become kind of unviable to engineers since they aren’t idempotent (or offline first). Moreover many code generation tasks end up becoming very product and organization specific, so the system needs to provide a generalized way for anyone to create their own diffable document type.
I think there’s a host of other benefits gained but my motivations for the tool I made came from problems that arose at my first company when dealing with translations, feature flags and building a good dark mode. I also think merging is the best way to handle collaborative editing and there is little reason aside from the syntax errors that arise in plain text conflicts for the general concept of dvcs to only be accessible to software engineers.
Sorry for ranting. Hope that clarifies what I meant.
I’d be happy to demo a work in progress if you want to see a structured dvcs. This is a surprisingly tricky problem. There’ve been quite a few attempts throughout the years at a structured/visual dvcs but it’s almost impossible to solve without also building a schema language, an offline browser, and a package manager/app registry. There are murmurs in some of the offline first/CRDT communities that suggest other solutions but I haven’t seen anything structured and viable emerge. Anyway, let me know if you want to chat. I’ll put my email in my bio
Imagine being able to "toggle" features or changes quickly and easily without worrying about which order they were implemented in. Patch commutation makes that easier.
Sure, but in any case where the patches were independent enough to automatically apply in any order, `git rebase --interactive` would have worked just as well and has done for about two decades. It can reorder, drop, squash, amend, etc. all in your $EDITOR. The much more common limitation that hits every VCS is that patches are rarely that independent and manual merges are tedious and error-prone, but when that's not a blocker, interactive rebase is hard to beat.
The biggest problem, in my experience, is that 99% of people using git have no idea just how powerful it actually is. That is a mark against git, but it is what it is, and it's a mark against engineers who refuse to learn the tools they use every day for years.
Weirdest case was when I asked someone to fix up a bad merge they did that resulted in duplicate commits they didn't even notice in the log. I said just do an interactive rebase, the dude said "that is outside my operating parameters", yes really, I couldn't make that up if I tried. Dude boasted 25+ years experience and didn't know about this incredible feature, and apart from the weird robot response, this is consistent with almost everyone I talk to about almost any semi-advanced git feature.
I use `git rebase -i` all the time and it's great but it involves rewriting history, which essentially means the result is almost guaranteed to give you conflicts in any case where you want to merge or rebase it with another branch that shares some of the commits in your rebase.
Whereas in Pijul history is preserved. In Pijul a "branch" (called a "channel") is just a set of applied patches. I'm still a little ignorant but if I'm not mistaken the way Pijul encodes patches, in a big graph, also means that they're more likely to be composable without conflicts in the first place. And when conflicts do arise their resolutions will be more reusable.
Just today I was performance profiling various subsets of a set of changes using "git cherry-pick" and "git revert", to understand how these changes interacted and which ones gave the best results. It was fine, but I had to do quite a bit of conflict resolution because some of them were based on one branch and some on another, and the resulting branches wouldn't even be easily mergeable/reusable anyway.
What if I could instead just pick a few patches I wanted to mix & match and run some benchmarks over every combination of them? Then I could pick the fastest combo without needing to worry about messing up my history? With Pijul I imagine this would be a breeze.
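For what it's worth, this can already be scripted in today's git. Here's a rough sketch, assuming a clean worktree, a base branch named main, a list of hypothetical commit SHAs, and a hypothetical ./run_benchmark.sh script; conflicting combinations are simply recorded and skipped rather than resolved.

    import itertools
    import subprocess

    BASE = "main"                                  # assumed base branch
    PATCHES = ["abc1234", "def5678", "0123abc"]    # hypothetical commit SHAs to mix & match

    def run(*cmd):
        subprocess.run(cmd, check=True)

    results = {}
    for r in range(1, len(PATCHES) + 1):
        for combo in itertools.combinations(PATCHES, r):
            run("git", "checkout", "-B", "bench-tmp", BASE)    # throwaway branch
            try:
                run("git", "cherry-pick", *combo)              # may stop on conflicts
            except subprocess.CalledProcessError:
                subprocess.run(["git", "cherry-pick", "--abort"])  # clean up, ignore result
                results[combo] = "conflict"
                continue
            bench = subprocess.run(["./run_benchmark.sh"],     # hypothetical benchmark script
                                   capture_output=True, text=True)
            results[combo] = bench.stdout.strip()

    for combo, outcome in results.items():
        print(combo, "->", outcome)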
I'm looking forward to trying it now that it's getting stable.
Those commits you were having issues with in git, have you tried putting them into pijul to be sure it doesn't have the same problem?
Because so far I have yet to understand why pijul's "set of applied patches" would do anything different than git rebase/cherry-pick. Those work by applying the diffs of each commit to the new tree. Sounds like the same thing to me, or at least close enough to result in the same conflicts.
Why is that? I work regularly with Mercurial and Git. The stage isn’t a huge problem or anything, but it’s super annoying and gives me zero value.
TBH I think the stage is kinda dirty. Because it means you’re making a commit that has never actually existed on your machine and thus can’t have possibly been tested.
Two things: To review the changes piece by piece before commit (git add -p) in case there's something temporary in there I forgot, and to easily exclude any such thing I never intended to commit (usually todo-style comments, sometimes hardcoding a value to trigger a bug or a popup, etc)
I remember a decade ago trying mercurial and git at the same time when subversion was all I knew, and that there was something about mercurial that I found really confusing. It's long enough ago I don't remember what it was, and it may be better now, but git clicked immediately. I wouldn't be surprised if it was the way they handle commits instead of having a staging area.
Also compared to subversion, the staging area is an explicit form of something that half-exists in how it tracks files for committing.
Interesting. I use Mercurial in my day job and use Araxis Merge. Before pushing to CI I use Araxis to diff every file. Can make changes to the file in the tool if needed. Often reverting temp changes or spurious whitespace. This generates new changes which I then amend.
The workaround is that I stash after commit and then test the newly made patch before pushing to the repo.
That describes git in a nutshell. It lives up to its name by providing stupid roundabout ways to get things done, but otherwise mostly doesn’t get in your way. Pijul is in theory better, but GitHub prevents any competition from being serious.
I think the Git problem that needs solution is, ahem, the problem people solve with extra tools when using Git? That is, forges (GitHub, GitLab, Gitea, …).
Issues, PR discussions, &c need to be managed with the code (questions like “is issue #769798 fixed in branch xxx?” need to be answerable, code review discussions are almost as valuable as commit messages). Code needs to be actually distributed (forges introduce centralization). Local development should be possible.
Patch theory is nice, but ultimately produces only a moderate improvement.
Git's lack of centralization, i.e. a "source of truth" repo at the tool level, is its most imaginary, impractical benefit. Times where I've had to skip over origin and push to someone directly are few. Those occasions are far outweighed by a dozen annoyances that consistently come up with having to tell git "no, trust me, origin is the remote I care about". Sure, you can fix it with aliases, but it's still misaligned with how most people work. I don't see the Linux kernel benefiting from it much, either.
Can you perhaps give a better example of when you need to say that origin is what matters for you? I seldom have these problems myself, but there is a nagging feeling that this is something I have thought about myself.
I recently found out about another project called jj: https://github.com/martinvonz/jj. It takes inspiration from Pijul and others but is git-compatible.
Came here to praise jj as well. More than just "git-compatible", its backing store is literally git, which means that you can transparently collab with existing git repos to your heart's desire.
Other interesting ideas it implements:
- Working copy always committed
- All repo operations are version controlled, which enables painless undo etc.
- First-class conflicts, just like pijul, so no merge/rebase/etc ops ever fail
- Auto-rebase of downstream dependent branches when doing rebase etc.
Also, the dev is really active and helpful. Apparently, Google let him make the project his full time job.
Nothing against the project or talk itself. But it's kinda funny when a talk starts with “I bet many of you thought X was a solved problem; well, I'm here to tell you it isn't.”
Git became popular not because it's somehow revolutionary; it's popular because the previous options were so incredibly shit. Any alternative to git is gonna have a hard time without that advantage.
I'm not disagreeing, though. I think you're right (for the "foreseeable future"). CVS, SVN, etc., really became serious obstacles once you moved to larger projects and more distributed teams etc. Hell, they caused problems even with small colocated teams.
I think it's clear that git (and its forerunners, esp. "BitKeeper") meets a solid "good enough" standard. Potential competitors from that era, including, in particular, "Mercurial", largely fall into categories of "trade-offs".
But, I am, personally, very happy to see the work that's been done with "Pijul", including completing a "theory of merging".
I don't think Pijul has much of a chance in even 10+ years of replacing git (and "GitHub", part of the success story of git). But, I do think it'll see some use, become a solid foundation for some projects, and, there's a reasonable chance it will influence or become the foundation of some "next git" and/or future versions of git.
With the caveat, of course, that forecasting anything on those kinds of timeframes is even more of a fool's errand now than it was even ~15+ years ago (about when git was first developed).
> Potential competitors from that era, including, in particular, "Mercurial", largely fall into categories of "trade-offs".
Which "trade-offs" are you referring to?
Mercurial was, and still is, a solid DVCS. It's not often used today because it lost the popularity contest, due to several reasons[0], but technically it's as good as Git, and functionally it's even better. It has a much saner and friendlier UI, which is the main complaint about Git.
Git itself might not have been revolutionary, but the concept of distributed version control certainly was. Git wasn't the only tool from that era to adopt this model, but it's fair to say that it has won the popularity contest, and is the modern standard in most projects.
Mercurial, Darcs and Fossil are also interesting, and in some ways better than Git, but Git won because it had the persona of Linus behind it, one of the most popular and influential OSS projects using it as proving ground, and a successful and user friendly commercial service built directly around it, that included it even in its name. All of this was enough for Git to gain traction and pull ahead of other DVCSs, even though in the early days Mercurial and Bitbucket were also solid and popular choices.
I used and preferred Mercurial for a long time, but ultimately Git was more prevalent, and it felt like swimming against the current. I feel like that also happened with Docker (Swarm) and Kubernetes, where k8s is now the de facto container orchestration standard, much to my own chagrin.
I entirely disagree. Git was not an okayish solution to a problem that previously didn't have any okayish solutions. Git was trying to solve a new problem, and, as it turns out, it's a problem that a majority of git users don't have, and don't intend to have in the future.
I use SVN to this day, because I seriously believe that it's a better solution to the centralized version control problem than GIT. After SVN, centralized version control was basically a solved problem (or, at least, we had an okayish solution), so the next generation of tools (GIT, BZR, HG, FOSSIL) tried to solve a different problem, namely distributed version control.
But they made a complete mess of it (at least git did, I don't know the other distributed ones particularly well). A majority of git projects use centralized workflow and are subsetting the use of git features to only the ones that straightforwardly correspond to things that svn can do as well, and can do more easily. And the cost was a much more complex/convoluted mental model that a majority of git users don't truly understand in full detail, which gets them in trouble if edge cases turn up. Hence this joke [1]. With things like [2], you're basically seeing git becoming a parody on itself.
Edit: I have heard this before but it is never specific. In what way is it easier? I never needed the big selling points of SVN: binary files (not good enough for me at the time), access control, etc.
> Edit: I have heard this before but it is never specific. In what way is it easier?
In SVN, any subdirectory of any repo looks pretty much exactly as if it was the root of its own repo.
Given that basic idea, you can accomplish things like branches, tags, links, and sparse checkouts through operations on directories, so there's no need for any special treatment of these concepts in the software (or your brain).
If your main branch is `myproject/trunk`, and you want to make a tag, just `svn cp myproject/trunk myproject/tags/v0_1`, done. If you never touch the copy, the tag will always keep "pointing" to what it is you want it to point to.
Want to start a feature branch? Just `svn cp myproject/trunk myproject/branches/my_feature`. Then check out that subdirectory and apply commits there instead of to `trunk`. When you're done, `svn merge myproject/branches/my_feature myproject/trunk`, done.
Want a special subdirectory `theirproject` that exists under `myproject` and points to a particular commit of `theirproject`? Assuming both are in the same repo just `svn cp theirproject/trunk myproject/trunk/theirproject`, done. Want to move the pointer to the latest version of `theirproject`? Just `svn merge theirproject/trunk myproject/trunk/theirproject`, done.
You can check out the directory tree at the point where you're actually working on it, and not have to consume bandwidth to transfer the rest of the repo, not even when you check it out for the first time. This is super-useful when repos get too large for their own good, or when you want a monorepo covering many different projects. In GIT, trying to do "sparse checkouts" to accomplish this has driven me crazy on numerous occasions; in SVN it's totally natural.
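To make that concrete, here's roughly what a sparse checkout looks like in SVN; the URL and subtree names are placeholders, not from any real repo:

    # check out only an empty shell of the project, no contents yet
    svn checkout --depth empty https://svn.example.org/repo/myproject myproject
    cd myproject

    # grow the working copy one named child at a time
    svn update --set-depth empty trunk
    svn update --set-depth infinity trunk/docs

    # shrink it again later if you no longer need that subtree
    svn update --set-depth exclude trunk/docs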
An SVN monorepo thus allows one to keep project structures a lot more fluid, adapting layouts as projects evolve. Permissioning is done at the directory level too. You can easily work out (and adapt, as needs change) your project layouts to fit your permissioning needs, or apply fine-grained (sub-repo-level) permissioning that respects your project layouts. Permissions aren't a big deal in open source projects, but they sure are when managing proprietary code.
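And for the permissioning point, a small sketch of the standard path-based authz file; the repo name, paths, and users are all invented for illustration:

    # conf/authz
    [groups]
    frontend = alice, bob
    release  = carol

    # everyone may read the whole project
    [myrepo:/myproject]
    * = r

    # only the frontend group may write to its subtree
    [myrepo:/myproject/trunk/frontend]
    @frontend = rw

    # release branches are writable only by release managers
    [myrepo:/myproject/branches/release]
    @release = rw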
I could go on, but the above is just an attempt to jot down some specifics that quickly come to mind.
Git works well enough for projects that are essentially similar to the Linux kernel, i.e. it's designed to be used by programmers working primarily on text files. For projects that have non-technical people collaborating with programmers, or frequently changing binary assets, there is plenty of room for improvement over git.
That was very in-depth! As far as Pijul as a tool goes, I'm not seeing a git compatibility layer? So I think it's a neat project, but I probably won't try it because nearly all code is rooted squarely in git. Even if Pijul were perfect, you'd need to convince everyone else to use it.
To the author: you should include a paragraph in the repo README explaining how pijul compares to git.
Learning git kinda sucks, but once you learn it there's very little incentive to invest time in learning other version control tools. The extra features rarely justify the additional effort. Like I thought Fossil was kinda neat, but I still stuck with git.
Pijul is definitely a nice research project, and maybe some of the technology behind it could be integrated into better git tooling. But I doubt that exposing users to the raw "everything is just a bucket of patches, you have no history of repo state" idea is good. People want to be able to bisect, or check out some older version at a given time point. There are benefits of pijul's model, but I don't think they are worth as much as losing that.
"bucket of patches" doesn't imply the patches couldn't be replayed to a point in time. Looking at the docs, you could probably use `log` and `unrecord` to go back to an earlier state. Or use `change` to view a particular change.
The issue is that `log` orders the patches by when they were made, not by when they were merged (in reality it might be a bit more nuanced; presumably a patch will always be ordered after the patches it depends on).
git log also has parameters to configure it in that fashion, and it's also what GitHub uses. However, importantly, git bisect does not use it; it uses the DAG of commits that git maintains.
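For reference, the flags I mean are standard git; the tag name below is made up:

    # order commits by author date (closer to "when they were made")
    git log --author-date-order

    # strict topological order (closer to the structure of how things were merged)
    git log --topo-order

    # bisect ignores log ordering and walks the commit DAG itself
    git bisect start
    git bisect bad HEAD
    git bisect good v1.2.0   # v1.2.0 is a made-up known-good tag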
Yes, and git also has only a weak notion of the history of a branch: it is somewhat encoded in the titles of merge commits, and merge commits also record an order of their parents, so often you can just walk the first parent and always get to commits that used to be the state of the master branch at a specific point in time. Sometimes it's even easier, when merged PRs get squashed. But yeah, it's a bit implicit, and if you e.g. rebase a PR then some of the commits in it have never been the tip of the master branch.
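The first-parent walk is just this (assuming merges were always made on the branch itself rather than fast-forwarded):

    # show only commits that were, at some point, the tip of master
    git log --first-parent master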
But very importantly, every commit in git depends on every single commit that has contributed to that commit's tree.
In pijul, they break up that property, and patches only depend on those patches which have directly modified the pieces of code that the change is editing. IIRC you can also manually introduce dependencies but those are rare. There isn't even a notion of a "patch's tree". You could construct a branch with only the patch and its dependencies but probably that wouldn't compile, and it will definitely not reflect the state of the branch at that point in time.
You can make new branches in pijul that you don't modify and use like tags, but it is an explicitly requested thing.
I mean, I see the elegance in pijul. Reverting a patch that has no conflicts and un-reverting it is trivial. But on the other hand, how often do you do that?
Some workflows need patch based thinking, others need snapshot based thinking, and VCS tools need to master both, whether they start out with a snapshot history (like git) and then build diff based workflows on it (diff, rebase, etc), or whether they start out with a patch tree, and then build snapshot based workflows on it. pijul has weak snapshot based workflows, this part is still missing.
The use cases define whether pijul is sufficient for your needs. For bisecting to find a bug, it seems you could easily bisect along a history of patches. Maybe there are patches that may alter the history that haven’t yet been merged, but that’s really an agreement between you and your collaborators that they should commit often. A similar constraint is required if you need to tag a commit for a release (where a snapshot can have meaningful implications for legal compliance, reproducible builds etc.)
> it seems you could easily bisect along a history of patches.
I've explained why this doesn't work in my comment above. Both a history of patches and bisect halting at the state of a specific patch are not things pijul supports: patches are only diffs so don't have "repo state", and are ordered only after their strict dependencies, i.e. those patches whose lines they modify.
You could still isolate the changes that cause a problem, and it would still be reversing state in chronological order, but there might not be a one-to-one mapping between builds and the history, because the history can change upon merge. It's perhaps possible to freeze a channel for each build version, though I'll concede that seems like a hack.
The problem with pijul is that the command line interface is not really better than what git has. I tried pijul several times over the years and last month I couldn't even figure out how to make a commit and push it. I think it should be far more intuitive.
I don't know if it's the server being slow, or inherent to the protocol, but `pijul clone https://nest.pijul.com/pijul/pijul` took an awfully long time (it doesn't say how much it downloaded, but the .pijul directory is 28MB -- which isn't a lot, assuming it contains things similar to .git; gathering from the talk, it should contain a diff + snapshot of each 'commit').
I feel that git is the DVD of source control systems: sccs / rcs / cvs are the equivalent of tapes, clearly inferior and crappy. Other, more modern systems are like Blu-ray: technically superior, but not by a margin wide enough to make me want to change.
I don't buy (as sold) the argument that we need associative and commutative merges, and I believe that this constraint, when applied by the software, makes resolving merges harder on the user rather than easier. It punts the slightly-more-solved part of the problem back onto the user, when it's one of the user's least favorite parts of the problem.
The solution that he writes up is basically "git merge" and part of the reason that we have such a bad time with git is that the industry largely standardized on "git rebase", which destroys many of the properties listed:
Changes are partially ordered: if you merge, this is true; if you rebase, it is not. Your change is strictly reordered atop the merge base, and you, dear user, are responsible for the hell of fixing the constant conflicts this produces, even if an alternative merge order would implicitly fix them. When pijul says "partial ordering" here, it means (in slightly abstract and minutely different terms) that it's doing what git merge does, because that's better. As it happens, one of the various reasons rebase became popular was that `git log` order shifts around for efficiency and correctness reasons, when people mostly want the output of `git log --topo-order` by default, which for the common case of "merge right onto left" presents "right first, then left" in linearized history, unless there's a content-defined order. Pijul dodges the presentation problem by just not presenting this information at all. Git's mistake was in many ways not hiding enough from users - but it came from kernel engineers, who don't want things hidden.
No git rerere: rerere is a hack to try to make rebase pretend that it doesn't drop usable merge information on the floor; if you stop imposing rebase on yourself, you don't need the rerere hack.
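For completeness, for anyone who does keep rebasing, the hack is just a config switch:

    # record conflict resolutions and replay them when the same conflict reappears
    git config --global rerere.enabled true

    # optionally stage files that rerere resolved cleanly
    git config --global rerere.autoUpdate true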
Partial clones & large files: these are storage-structure optimizations, and yes, a problem for git, though, for example, the git lfs approach is similar in solution to the described pijul solution: "we only need a description of operations" - that's how you would describe lfs changes. As it happens, that's also how you would describe subproject updates, and yes, sure, subproject UX isn't great, but now we're back to a UX problem, not a storage problem.
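Concretely, the git-side versions of "only describe the operations" look something like this (the URL and file pattern are placeholders):

    # partial clone: fetch commits and trees up front, blobs only on demand
    git clone --filter=blob:none https://example.org/big-repo.git

    # LFS: commit small pointer files, fetch the heavy binaries lazily
    git lfs install
    git lfs track "*.psd"
    git add .gitattributes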
Working with large graphs: cache snapshots and index deltas, um, yeah. You don't have to squint very hard to draw an equivalence here either.
I'm not apologizing for git, but there's a lot of "here are some non-ascii literals on a slide, it's science" presented, yet not actually a very substantial shift in trade-offs when applied in practice. Some pathologies are improved and some worsened, and more onus is put on the user in some cases and less in others. It's a very different (at the minute level) structure, sure, but I've not been sold that it makes a set of trade-offs that substantially change the game; it's just the other side of the field, not all that dissimilar from choosing between two different database engines.
An example of it not being that different in practice: can you conflict with yourself in pijul? Absolutely, and you end up with the pijul version of merge commits, e.g. https://nest.pijul.com/pijul/thrussh/changes/7S7FHFDVSSRB4DC..., more than that you can conflict with yourself multiple times (which is basically the rerere problem but actually distributed, which was somewhat erroneously claimed to be eradicated). It hasn't substantially moved the needle on merging history quality _in practice_. The manual is very incomplete on this, i.e. https://pijul.org/manual/conflicts.html and there are some nascent but very slow tools to unhide the totality of conflicts in a repo https://nest.pijul.com/laumann/pijul-conflicts which would bring the rerereal problem (couldn't resist, sorry) into the light.
What's the largest team concurrently working on one project full time using pijul today? I ask because small numbers of users don't create many interesting conflicts often, and don't provide good insight into the properties of a VCS at scale/in practice. You can use CVS with zero problems when you're on your own. There's an embedded theory here that you can impose only a very lazy conflict-resolution requirement and that it'll pay off. I'd like to see that theory tested at scale.
I don't really buy the commutative patch thing either (yet!), but I'm willing to believe that it's because I'm so happy to `git rebase` all the time, but also because I have a tool for fast rebasing across many commits[0].
> The solution that he writes up is basically "git merge" and part of the reason that we have such a bad time with git is that the industry largely standardized on "git rebase", [...]
I wish. So many people still use `git merge` workflows!
> No git rerere: rerere is a hack to try to make rebase pretend that it doesn't drop usable merge information on the floor; if you stop imposing rebase on yourself, you don't need the rerere hack.
Whether you cherry-pick (rebase) or merge, you still only commit the conflict-resolved state. Merge commits document which two HEADs you conflict-resolved, but so what? If you lose the merge commits at upstream integration time, no one loses anything useful.
I've worked on very large systems using rebase workflows. I have written about that many times here on HN in the comments. Rebase workflows are definitely far better than merge workflows.
Perhaps the Jujutsu approach to rebase (automate it!) is the right solution for all those who can't stand rebasing.
Thank you. I’ve lost months of my life to rebasing large patch sets against an actively maintained upstream. Your tool looks like it will save me a lot of gray hair.
I had to rebase a patch to PG across thousands of upstream commits after having left it alone for a year+, and it was very painful going until I talked to @vdukhovni about it, and he had this tool, which was `slowrebase` at the time, and we improved it to do a sort of bisection approach to rebasing. That made the rebase I had to do go real fast!
I never had to re-resolve the same conflicts using this script.
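In case it helps anyone picture the approach: this is not the actual script, just a hand-rolled sketch of rebasing across a big range in steps rather than in one go; the branch names and the 500-commit midpoint are made up:

    # rebase onto an intermediate upstream commit first, then onto the tip,
    # so each step only has to resolve a slice of the conflicts
    MERGE_BASE=$(git merge-base my-patch upstream/master)
    MIDPOINT=$(git rev-list --reverse "$MERGE_BASE"..upstream/master | sed -n '500p')

    git rebase --onto "$MIDPOINT" "$MERGE_BASE" my-patch
    git rebase upstream/master my-patch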