I think the issues with concurrency are at this point greatly overblown. Is it hard? Maybe; so are a lot of things in our field. Is it PhD-level? Not at all, unless you are literally breaking new ground; most problems have well-known solutions.
Mutexes are a solution to the mutual exclusion problem, no more no less. Sometimes this problem can be solved by a queue, but that opens other large cans of worms like asynchronicity.
I write more lock-less [1] code than most, but I will often fall back to a mutex as the right solution for a problem given a complexity and performance budget.
[1] by that I mean code without standard use of mutexes, not necessarily non-blocking.
The nasty part of concurrency bugs is that normally you can expect the code to be almost correct and in need of a few tweaks to fix oversights, but concurrency bugs in incompletely thought-out code more often require throwing out the design and building it right. Additionally, concurrency issues usually arise from multiple components interacting in nonlocal ways, requiring global knowledge of the system to diagnose and fix. And finding or reproducing a bug is usually probabilistic, making concurrency heisenbugs hard to discover, locate, and prove fixed.
It can be made somewhat tractable by experienced concurrency wizards; I think Rust's "cross-thread shared XOR mutable" rule is a good starting point, but I have less experience with Go, JS, or Erlang-style approaches.
Aren’t there formal analysis techniques that can help ferret those out? My MSCS program had a course called “science of programming”. Its focus was proofs of program correctness, including concurrent programs.
On Intel processors, which have strong memory ordering, concurrency is fairly easy to reason about. On processors with weak memory ordering (ARM, notably), things get really treacherous.
I'm still settling into the horrors of memory ordering on ARM. The one thing I know for sure is that my oeuvre contains a trail of code that will work fine on Intel processors, but won't work on ARM (or any other processor with weak memory ordering). :-/
From my point of view, the issues with concurrency (specifically the shared memory "threadlike"-kind) are absolutely NOT overblown, because misuse of that mechanism can completely RUIN a codebase in a way that few other things can; "goto" abuse would be a good analogy, or massive amounts of partly redundant global state.
My perception is that other comparably dangerous mechanisms are taught with appropriate caveats-- you WILL typically be admonished to keep variables as local as possible, learn how to encapsulate state, and probably be instructed to ALWAYS use proper loops, conditionals and function calls instead of wild gotos-- it does not even matter if you learn programming in university or on your own.
But with threading this is not the case: you'll typically get handed threading primitives without much in the way of caveats, with predictable results.
Concurrency, like distributed systems, is only "hard" insofar as it's easier to sell a product that features it if you can convince others how arduous solving such a CS101 topic would otherwise be.
If I were younger you would have triggered my impostor syndrome, but as a man of wisdom I know that either (a) you have super-human skills[1] nobody else has, or (b) you’re talking about something different.
Note I’m not talking about an isolated case of analyzing a concurrent snippet of code for happens-before relationships, but rather how to write highly concurrent code within large scale applications, without runaway complexity.