I know it's early days on this, but compilation speed is the downside to Rust IMO. Having worked in a Rust monorepo, my number one complaint was compilation speed. It made CI/CD more expensive and it could really slow down dev time if we needed to remove the cache (happened sometimes - not cargo's fault, actually it's a docker bug, but still).
It’s unlikely that it’ll ever get dramatically better. It’s already been heavily optimized, and the Rust compiler now has more parallelism than pretty much any other mainstream compiler. Language design choices make Rust more challenging to compile than a language (like Go) that is specifically designed for fast compilation.
I don't agree. There are a lot of things on the table, performance-wise.
1. The compiler could ship binary artifacts, which would avoid all compilation of build scripts / proc macros, and allow those to be compiled with performance optimizations enabled. This would be huge on its own.
2. Cranelift could potentially improve backend codegen compile times significantly as well.
3. Link times are still suboptimal, mold is promising here.
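On point 1, part of this is already available today via Cargo profile overrides: you can compile build scripts and proc macros (and their dependencies) with optimizations even in dev builds. A sketch, with opt-level 3 chosen as an example:

```toml
# Cargo.toml — optimize build scripts and proc macros in dev builds.
# Trades a slower first build for faster macro expansion on every
# build after it.
[profile.dev.build-override]
opt-level = 3
```

What's not available today is skipping their compilation entirely, which is the "ship binary artifacts" part.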
We can definitely still get significant wins out of the compiler.
Pretty sure compile times can get cut in half (or better) with those changes.
Maybe, but even twice as fast would still make it a “slow compiling language” in comparison to a “fast compiling language” like Go or Pascal.
This is not a knock on Rust—I doubt it’s possible to do what Rust does—including zero overhead abstractions—in a fast compiling language. Go certainly pays a performance penalty with things like boxed generics.
Twice as fast (or more) is just what I'm aware of in terms of "things that are possible to do today but aren't the default / would take work to hack in". I don't even know what other options there are beyond that.
But sure, twice as fast isn't fast, it's just faster. My point is that we're not at the point of serious diminishing returns, there's tons of stuff left to do.
If there were a magic pot of gold, it would be technically possible to precompile every crate version with every rustc version on every supported platform and distribute those prebuilt rlibs to users through cargo. That would help with first compile times when using the standard tooling, and not just for proc macros.
Different people with different use cases have different complaints. I haven't quantified it, but I've certainly seen complaints about both cases from different people.
Go’s generics aren’t boxed. At least, not in the sense that Java’s are. For example, you can write generic functions that operate over slices of unboxed values.
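A quick sketch of that (the type names here are mine): a generic function over slices of plain numeric values, with no per-element boxing:

```go
package main

import "fmt"

// Number constrains T to a few unboxed numeric types.
type Number interface{ ~int | ~int64 | ~float64 }

// Sum iterates over the slice elements directly; the values are not
// individually heap-allocated the way Java's boxed generics would be.
func Sum[T Number](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	fmt.Println(Sum([]int{1, 2, 3}))
	fmt.Println(Sum([]float64{1.5, 2.5}))
}
```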
Still worse than the true monomorphized generics in C#, which also has fast compile times (by nature of being JIT-compiled, though even its AOT target compiles faster than Rust once you've downloaded dependencies).
Go’s implementation strategy for generics is essentially monomorphization plus obvious code-size optimizations (e.g. don’t generate different code for different pointer types, given that they all have the same underlying representation). Do you have a specific scenario in mind where Go’s implementation strategy carries a significant performance penalty? I think there are possibly some misconceptions in this thread about how Go’s implementation actually works.
It is a knock on Rust. The circumstances of Rust's state of existence in 2023, as a language created in this millennium but not in the last decade, are absurd.
> I doubt it’s possible to do what Rust does—including zero overhead abstractions—in a fast compiling language
People packaging releases for software written in Rust, and passive consumers who find themselves downloading a project repo to compile from source for whatever reason (e.g. because the creators don't publish binary releases), don't need the things from the Rust toolchain that active contributors to a given project (who want type system diagnostics, etc.) need from it.
I'd call this oversight a massive lack of imagination on the part of TPTB, but that would be wrong, because there is no need to imagine the differences between these use cases. They exist. An adequate toolchain for dealing with projects written in Rust—despite the deliberate decisions made during language design that led to these problems—does not.
> 2. Cranelift could potentially improve backend codegen compile times significantly as well.
I've been told that the Cranelift team (at least for the time being) doesn't intend to focus on the optimizer to the degree where it would be competitive with LLVM's optimizers (which would also be a huge effort). So if you want faster compile times, you have to take significant performance hits (which, for a lot of code compiled in CI, is not a trade-off people are willing to make).
Yes, to be clear, Cranelift would be suitable for dev and test builds; you'd likely use LLVM for release builds. So in your CI builds you'll almost certainly stick with LLVM.
Beyond specific optimization and implementation details of a compiler, the three variables of "compilation speed", "generated code optimization" and "language expressiveness" are fundamentally in tension. In order to move one axis you have to affect one or both of the other two.
It would be great if people paid Rui to make mold versions for Windows and Mac, which ideally would be required before making it a part of the official Rust toolchain.
He did monetization the wrong way around, IMO. Most CI runs on Linux, but most developers are on Windows or macOS, so he should have made the Linux builds paid and the local developer builds on Windows and macOS free.
I doubt that anyone cares all that much about linking times in the CI. And even if someone does, it's probably an individual developer or team, ie, someone without decision power to pay for something as niche as a linker.
Also, mold was designed as an alternative to gold/lld, so it would need to be open source and free on its main platform: Linux.
I care deeply about linking times on CI. It's very frustrating having your code all build and run tests locally just to wait a long time for it to pass all of the CI barriers. Plus, CI builds often go stale much faster, so you're looking at much longer build times without caches.
Sure, but you're not really contradicting me unless you're able to get your company to pay for faster tooling. And if you can, why haven't you already?
Well, yes, that is the crux of this argument, that one can convince their employer to use mold. Otherwise, what is the point of using it? Desktop users by and large will not notice a small 3-5% improvement in compile times while those that pay for CI will.
Well, CI is where the costs are, and if the application is big enough, even a few percent reduction via faster linking times would equate to lower costs, while in contrast, developers won't really care or notice a few percent reduction on their local machine.
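For reference, wiring mold into a Rust build on Linux is a small config change. This sketch assumes clang and mold are installed on the build machine; adjust the target triple to match your runners:

```toml
# .cargo/config.toml — link with mold on Linux via clang.
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```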
It's AGPL on Linux now, and they sell commercial licenses for companies that won't touch that license, and they were contemplating earlier making mold only available under a non-free source available license like BSL, so there's no "requirement" as such that it be free and open source, even on Linux.
> Most CI is on Linux, but most developers are on Windows or macOS
Do you have any data on this? Maybe that's industry dependent, but I hardly know any Windows developers (not even talking about macOS, which is almost nil) outside of video games and web dev. To keep on the subject: 100% of the Rust devs I know use Linux.
Data that most people don't use Linux as their day-to-day desktop OS for development? I suppose you can just look at desktop Linux statistics, which shows <5% usage. In my experience, most use macOS, or Windows via WSL2, which does use Linux but I am not sure if that is actually reflected in any desktop OS statistics.
I agree with this assessment, despite the optimism of some others. C++ has had slow compile times since forever, and so will Rust. Rust does a lot more work at compile time than most other popular languages. And it's largely stuff that's fundamental to the language. For example, besides borrow checking, the de facto default way to do polymorphism/generic programming in Rust is at compile time via what is essentially code-gen. In Java if you write `void useFoo(Foo foo)`, it'll compile quickly and will use runtime polymorphism to make sure that the argument is a subtype of `Foo`; in Rust if you write `fn use_foo(foo: impl Foo)`, the compiler is going to spit out a `use_foo` definition for each concrete type that is passed to `use_foo`. That takes time.
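A minimal sketch of the Rust side of that comparison (the `Greet` trait and types here are hypothetical): because `impl Greet` in argument position is static dispatch, the compiler emits a separate copy of `use_greeter` for each concrete type it's called with, which is extra codegen work compared to a single runtime-polymorphic function.

```rust
trait Greet {
    fn greet(&self) -> String;
}

struct En;
struct Fr;

impl Greet for En {
    fn greet(&self) -> String { "hello".into() }
}

impl Greet for Fr {
    fn greet(&self) -> String { "bonjour".into() }
}

// Static dispatch: the compiler monomorphizes one `use_greeter` per
// concrete argument type (here, one for `En` and one for `Fr`).
fn use_greeter(g: impl Greet) -> String {
    g.greet()
}

fn main() {
    println!("{}", use_greeter(En));
    println!("{}", use_greeter(Fr));
}
```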
That being said, I definitely find the trade-off worth it. Though, I've never been the kind of programmer that desires the constant iteration and feedback of something like "REPL driven development".
> C++ has had slow compile times since forever, and so will Rust.
Rust has a massive advantage, which is having a 'sanctioned' package manager and build-time capabilities. A huge part of Rust's slowdown is due to:
a) Having to compile build scripts
b) Those build scripts being built without optimizations (100s of times slower at runtime)
If cargo + crates.io supported pre-built dependencies, that would be a massive optimization.
This isn't theoretical or optimistic, it's just a fact: we can already see this by compiling build and proc-macro crates with optimizations; it's just not the default, and they still have to be compiled once. If you remove that compilation time, again, it's not theoretical: it turns N time spent on those deps into zero time spent.
There is easily a 200% performance win available, just from the known optimizations that are on the table.
Rust has another advantage in the language itself- generic code can be type-checked and (partially) optimized before being instantiated.
When you export a generic function in C++, every file that pulls it in has to re-parse it, and every instantiation has to re-type-check it. C++20 modules should help with the first part, but they can't help with the second (and neither can concepts). Further, separate translation units can wind up duplicating the same instantiations, which the linker has to deduplicate.
When you export a generic function in Rust, by the time it gets pulled in somewhere else, it takes the form of pre-parsed, pre-type-checked MIR. It can also be pre-optimized, so type-independent optimization work is shared between instantiations. The compiler can also tell, before instantiation, which type parameters a function does not actually depend on, and essentially erase them ("polymorphization"). Further, Rust's compilation model reduces the redundant duplicate instantiations C++ does, both by using larger translation units and by automatically sharing any instantiations in dependencies with their dependents (though you can do this by hand in C++).
(Incidentally, these differences also apply to inline functions- in C++ you wind up putting their definitions in headers and recompiling them from scratch over and over; in Rust they are shared MIR form.)
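A small illustration of that difference (assuming nothing beyond std): Rust checks a generic body once, against its declared bounds, before any instantiation exists, whereas a C++ template body is only fully checked per instantiation.

```rust
use std::ops::Add;

// This body is type-checked once against the bounds on `T`. Writing,
// say, `a * b` here would be rejected at the definition site, even if
// `sum` were never instantiated anywhere. A C++ template would only
// surface such an error at each use.
fn sum<T: Add<Output = T> + Copy>(a: T, b: T) -> T {
    a + b
}

fn main() {
    println!("{}", sum(1, 2));
    println!("{}", sum(1.5, 2.5));
}
```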
> we can already see this by compiling build and proc macro crates with optimizations, it's just not the default and they still have to be compiled once.
I'm hopeful something like watt (https://github.com/dtolnay/watt) will land in Cargo that'll allow us to ship pre-compiled wasm blobs for proc-macros so we can just have sandboxed binaries.
I think the whole point is to prevent build scripts from doing arbitrary things. The sandbox should give access to the source code being built, record changes to these files (and/or new files generated in the same directories), and that's about it.
C++ compile times are awful inasmuch as you have to do the compiles multiple times, because the "template barf" makes finding root causes very challenging, especially with multiple problems.
Rust makes the problems easier to fix, IMHO. So, maybe even with same (or slightly longer) compile times, you'll hopefully have faster time to delivery.
In fact, in my experience, Rust has faster time to delivery than any other language I've used. It takes forever to compile, but I have so many fewer runtime bugs that have to be caught (hopefully) by testing, that it still comes out ahead, overall (again, for me and my various projects).
I also find write-time to not be as slow as others complain about, except when it comes to async/futures where it is, indeed, pretty rough. But, if I sit and think about how many times I have to flip back and forth between my code and some library code to try and guess what exceptions it may or may not throw in other languages or whether something could be null or not, I find that the dev times aren't so much better in these other languages as people sometimes claim.
Sure, if you're a fulltime JavaScript dev with 10 years of experience, you might remember things like that calling the Array constructor with 0 or >1 arguments creates an array with those values, but if you call it with exactly 1 number, it will create an empty array with that capacity. But, since I have to switch between many languages regularly, my time to delivery is significantly reduced by nonsense like that. Likewise, it's reduced by NPEs in Java, double-frees in C++, Kotlin's inane idea to use exceptions for errors and coroutine control-flow, etc, etc.
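For anyone who hasn't hit that particular JavaScript footgun, here's what it looks like:

```javascript
// Exactly one numeric argument: an empty (holey) array of that length.
const a = new Array(3);
console.log(a.length); // 3
console.log(a[0]);     // undefined

// Zero or more than one argument: an array of those values.
const b = new Array(1, 2);
console.log(b);        // [ 1, 2 ]
```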
I just want to note that I fully agree that Rust is, ultimately, an extremely productive language. In my considerable experience with Rust it is the most productive language I have ever written code with professionally.
The fact that my only complaint is that compile times are slower than I'd like should be seen as high praise.
It’s not really that Go is better designed for fast compilation; it’s just a plain language where the compiler can spit out vaguely optimized code and call it a day.
Rust’s unique feature itself fundamentally depends on extensive static analysis. It’s not a design choice, it is pretty much what Rust is - a low-level language without a GC that is still memory safe. The price for that is hefty compile times.
> It’s not really that go is better designed for fast compilation
One of the explicit goals, by Go's creators, was fast build times. I still remember Rob Pike introducing Go during an all-hands at Google, where he talked about the very long build times for C++ and Java in Google's monorepo, and then showed some promising demos. (Most of us rolled our eyes at it then, because it was just a "hello world", but it's quite impressive how the language has evolved and remained true to its goals.)
> - it is just a plain language where the compiler can just spit out vaguely optimized code, and call it a day.
It's a simple language, but I wouldn't call it plain, nor characterize the optimizers that way.
It is not faster at compilation than Java, which was not particularly designed for such.
Also, as can be seen, Go is not a well-designed language, having warts we’ve known about for 50 years. I would take the creators’ claims with a huge grain of salt.
But Java is inspired by Smalltalk, which is a late-binding language that defers most things to runtime. I believe in Java you can generate bytecode directly as you’re parsing the source file.
Java is compiled to bytecode, for later compilation to machine code at runtime (JIT). Go is compiled AOT, straight to machine code. It makes no sense to compare them.
Unless you meant that Java's AOT compilation is faster than Go's?
The parent comment explicitly mentioned that Java is slow at compilation, which is just false.
Also, there are single-pass compilers that produce machine code, they are not fundamentally slower than a byte code generator. Of course extensive optimizations will be more expensive.
I do think highly of Rob Pike and Ken Thompson for their IT work, but they are simply not good at language design, which just shows that PL design is quite unlike working on an OS.
Both statements, because unless otherwise qualified you're comparing apples to oranges when you say Java compiles as fast as Go. There's always going to be more overhead on running the Java bytecode on the JVM than there will be when running the native instructions generated by a compiler (even as "unoptimized" as Go is).
And someone that makes that assertion with a straight face without this caveat is not someone that should be dissing Rob Pike about language design.
Profiling the compilation process suggests that this isn't the case. Rust's higher level passes are rarely the dominant part of execution time.
Check out https://github.com/lqd/rustc-benchmarking-data/tree/main/res... and the other benchmarks in that repository for some data on how real world crates compilation times are spent. You'll find that backend code generation and optimization dominate most crates compile times. There are a few exceptions: particularly macro heavy crates, a couple crates with deeply nested types that hit some quadratic behavior in the compiler. But overall, the backend is still the largest piece.
The front end is time-consuming enough that replacing the backend with something lightweight like Go’s wouldn’t get you a 5-10x improvement, which is what I think you’d need to really move the needle on user perception. Moreover, a lot of the backend slowdown is due to front-end choices like monomorphization, which generates large amounts of intermediate code that must then be optimized away.
I doubt that a hypothetical version of Rust that avoided monomorphization would compile any faster. I remember doing experiments to that effect in the early days and found that monomorphization wasn't really slower. That's because all the runtime bookkeeping necessary to operate on value types generically adds up to a ton of code that has to be optimized away, and it ends up a wash in the end. As a point of comparison, Swift does all this bookkeeping, and it's not appreciably faster to compile than Rust; Swift goes this route for ABI stability reasons, not for compiler performance.
What you would need to go faster would be not only a non-monomorphizing compiler but also boxed types. That would be a very different language, one higher-level than even Go (which monomorphizes generics).
Just wanted to note that Go does only partial monomorphization: it monomorphizes per gcshape, not per type. This severely limits the optimization potential and adds a runtime cost to dispatch, at least in its initial implementation.
Then there is an open niche for a “development mode”, that outputs barely optimized binaries with proper error handling, fast. (I do know about debug, etc).
It already exists: it's called “debug” mode and it's what you get when you don't compile in release mode. The biggest problem with debug mode is how slow the unoptimized code is: for back-end stuff it doesn't matter, but for things like gamedev you want your dependencies compiled in release mode (fortunately cargo lets you specify that some deps should be compiled in release mode even when your project is compiled in debug mode).
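That per-dependency knob looks like this in Cargo.toml (opt-level 3 as an example):

```toml
# Cargo.toml — keep your own crate's dev builds fast to compile,
# but build all dependencies with optimizations so they run fast.
[profile.dev.package."*"]
opt-level = 3
```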
This is a HN thread about a blog post about how compile times have become dramatically better thanks to newly introduced parallelism in an area that was completely single threaded.
> However, at this point the compiler has been heavily optimized and new improvements are hard to find. There is no low-hanging fruit remaining. But there is one piece of large but high-hanging fruit: parallelism.
From discussions I've seen, there's not much high-hanging fruit left either, short of rewriting the entire compiler for better incremental compilation.
I think if you're talking about the compiler getting faster at what it does today, how it does it today, that's true. But that's a heavy constraint. If we got support for binary dependencies, that wouldn't be a compiler optimization in the same sense as parallelism is, but it would radically improve compile times for the average project.
Yeah, but binary dependencies or watt-style precompiled macros aren't going to improve the build times people really care about: incremental build times. The parallel frontend is plausibly the last major improvement we'll see on that front for years.
Incremental matters more than clean build times because (A) you're likely to do a lot more of them (B) they break developer flow more than waiting on CI does (C) at least in theory, you can always add more cores to your CI and get reasonable speedups, less so for incremental.
> Yeah, but binary dependencies or watt-style precompiled macros aren't going to improve the build times people really care about, incremental build times.
Why not? If I add a new struct with `#[derive(serde::Serialize)]` I'll benefit from serde being compiled with optimizations.
> they break developer flow more than waiting on CI does
It might not get 10x better, but 3x isn't outside the realm of possibility. Just swapping the LLVM backend for cranelift can cut compile times in half.
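For context, trying Cranelift today is nightly-only; the setup below is based on the cg_clif docs (after `rustup component add rustc-codegen-cranelift-preview`) and the exact knobs may change while the backend is experimental:

```toml
# .cargo/config.toml — enable the unstable codegen-backend option.
[unstable]
codegen-backend = true
```

```toml
# Cargo.toml — use Cranelift for dev builds only; release stays on LLVM.
[profile.dev]
codegen-backend = "cranelift"
```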
The low-hanging fruit is gone but there are lots of hard but likely-significant improvements left on the table.
Rust-analyzer is even more of a resource hog than rustc itself. Not sure how directly applicable this work might be, but hopefully we'll see big improvements there as well. It's something that's clearly needed for state-of-the-art IDE support.
That’s interesting/surprising. I remember this in the Eclipse days, and it was often attributed to Java’s allocation-heavy style, garbage collection, and lack of value types
Also Java style of tiny classes and tiny files.
It’s an issue on both the implementation side and the thing-being-implemented
I would have thought Rust would be better on both fronts.
How many lines are the Rust codebases and their dependencies?
To be fair, I don't care. RA is extremely valuable, and I am _not_ one of the people who think 16GB of RAM and 250GB of SSD is good for a programmer machine.
I really disagree here: I'm a maintainer of a medium sized open source Rust project [1] and I'm always surprised by Rust's (local) compilation speed. On a MacBook Pro, it's a matter of seconds, in debug. Release compilation and CI/CD are slower, but, since the beginning of my Rust journey (2 years ago), Rust compilation has just seemed very fast.
To balance / explain my point:
- my day job is Java / Kotlin with Gradle. Now, there we can talk about glacial compilation times
- on my open source Rust project, we try to minimise dependencies, don't use macros (apart from #[derive(Debug, Clone)] etc.), and have very moderate generics usage
If you take the time to `cargo build` my project, I'll be happy to get feedback on compilation times
- with `cargo tree`, I see that the project depends on ~600 crates
In my toy project:
- cloc shows that there are ~40,000 lines of Rust
- with `cargo tree`, I see ~40 crates
I don't know the scope of grapl, but 600 (transitive) crates seems like a lot to me. Maybe that explains why this particular build is so long. I haven't managed to build it (it seems to have prerequisites on protobuf stuff).
Yes, it'll require a protoc installation to actually compile, as well as some native dependencies.
Naturally more crates means more time on compilation. Grapl is a pretty large project, lots of services that do different things, so it isn't too surprising that it has a lot of dependencies relative to what I assume is a more tightly scoped project.
For example, Grapl talks to multiple different databases, AWS services, speaks HTTP + JSON and gRPC (with protobuf), has a cli, etc etc.
As someone who has only done small projects in Rust, I'm curious how many LoC we're talking about? And were you splitting your project into crates where it made sense?
I wouldn't be surprised if Rust/Cargo does more disk IO than other build tools, though. Rust does a lot of compile time code gen and caches a lot of stuff on disk.
You're right that some slowdown is expected, but for me personally I hadn't realized how bad this particular FS was, nor had I expected how much it impacted build times
We split each service into crates, plus some libraries. There were some native dependencies as well, which could really impact compile times, as well as some codegen for things like protobuf.
Depends on the code and whether you are doing a release or debug build. I work on a very large Rust project (~1M LoC) with a lot of dependencies. We've split it into multiple crates and the compile times don't really frustrate my dev workflow (incremental compilation works well, and debug builds are pretty fast anyway). But building in our CI pipeline, where we do a fully optimized build (single codegen unit, LTO enabled), takes a while (~30m), which is annoying when you're waiting for a hotfix to be ready. It's also incredibly resource intensive (mainly linking with LTO enabled), so we've been the bane of our platform team's existence, since we need something like 50GB of memory in our build container to do a full release build :)
It was over a year ago so it's a bit hard to recall... Maybe 20 minutes clean? 1 or 2 minutes for cached. Things have probably improved since then but idk. We did stuff like protobuf, we had a few proc-macros of our own, plenty of serde, and I think ~3 native dependencies (zstd, librdkafka, something else I don't recall). The native dependencies could be brutal, it caused long serial stalls, if I recall correctly. Linking took up a lot of time as well, but I forget why we didn't use mold, there was a reason at the time... but, again, over a year ago.
We did our builds in docker, for various reasons. So we relied on the docker buildx cache and some other tricks that I don't recall because I didn't work a ton on the build system.
I’m continuously amazed that this opinion is so prevalent. I maintain both a C++ and a Rust project, and incremental compile times on Rust are vastly better. It is, I believe, one of the fastest statically typed compiled languages. Go is faster, but I think that’s it.
async-stripe takes over two minutes to build due to codegen. We're considering switching to dolladollabills.
Our core API server takes a minute to build, and we have about a dozen services and command line apps, a bunch of little shared library crates, two desktop apps, and a Bevy app.
Our Github actions docker build takes ~10 minutes if you don't include the tests, but we're starting to shave off more time. (Our monorepo is 105589 Rust LOC total)
> async-stripe takes over two minutes to build due to codegen. We're considering switching to dolladollabills.
Ooohh interesting. We also use async-stripe, definitely going to have to check out dolladollabills though. Also in the Rust monorepo camp: our prod release takes ~5 mins from clean, tests are about 6 mins. We’ve invested a bit of effort getting our build time down. The biggest win: don’t build in a docker container, just copy the final artefact in; this shaved the most time off our builds. More parallel codegen units too.
I never understood the "gotta clean the cache every time" approach to CI/CD. I'm sure it makes sense sometimes, but you can make a compromise. I worked at two places where I only cleaned our C++ caches on the build systems on the weekend. We did that "just in case" caching was hiding a problem, but we'd only have a small one-week setback at most. We weren't on heavy release cycles, though, so we could afford that. We never had a single problem traced back to the cache or to the build hiding something. This was for internal company software. I'm not sure why people are willing to pay the cost of a full rebuild every time when that full rebuild takes a long time (Rust or C++). I'm sure there are cases for it, it just doesn't have to be done everywhere.
It's not that you have to, it's that you have many different builds that are going to stomp on each other's caches, plus your build services are often ephemeral - especially since I was at a small startup where we wanted to shut systems down overnight to save money.
Glad to see this progress.