macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt

simonw · 2025-12-12T21:34:39 1765575279

I follow the MLX team on Twitter and they sometimes post about using MLX on two or more Macs joined together to run models that need more than 512GB of RAM.

A couple of examples:

Kimi K2 Thinking (1 trillion parameters): https://x.com/awnihannun/status/1986601104130646266

DeepSeek R1 (671B): https://x.com/awnihannun/status/1881915166922863045 - that one came with setup instructions in a Gist: https://gist.github.com/awni/ec071fd27940698edd14a4191855bba...

awnihannun · 2025-12-12T22:23:55 1765578235

For a bit more context, those posts are using pipeline parallelism. For N machines put the first L/N layers on machine 1, next L/N layers on machine 2, etc. With pipeline parallelism you don't get a speedup over one machine - it just buys you the ability to use larger models than you can fit on a single machine.
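A minimal sketch of the layer split described above, assuming a hypothetical 61-layer model on 4 machines (both numbers are illustrative, and `assign_layers` is a made-up helper, not an MLX API):

```python
def assign_layers(num_layers: int, num_machines: int) -> list[range]:
    """Split layer indices 0..num_layers-1 into contiguous blocks, one per machine."""
    base, extra = divmod(num_layers, num_machines)
    stages = []
    start = 0
    for machine in range(num_machines):
        # The first `extra` machines take one additional layer each.
        size = base + (1 if machine < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


# Hypothetical sizes, chosen only for illustration.
stages = assign_layers(num_layers=61, num_machines=4)
for i, stage in enumerate(stages, start=1):
    print(f"machine {i}: layers {stage.start}..{stage.stop - 1}")
```

A token still has to pass through every stage in order, which is why this layout buys capacity rather than speed.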

The release in Tahoe 26.2 will enable us to do fast tensor parallelism in MLX. Each layer of the model is sharded across all machines. With this type of parallelism you can get close to N-times faster for N machines. The main challenge is latency since you have to do much more frequent communication.

dpe82 · 2025-12-13T00:24:22 1765585462

> The main challenge is latency since you have to do much more frequent communication.

Earlier this year I experimented with building a cluster to do tensor parallelism across large cache CPUs (AMD EPYC 7773X have 768MB of L3). My thought was to keep an entire model in SRAM and take advantage of the crazy memory bandwidth between CPU cores and their cache, and use Infiniband between nodes for the scatter/gather operations.

Turns out the sum of intra-core latency and PCIe latency absolutely dominates. The Infiniband fabric is damn fast once you get data to it, but getting it there quickly is a struggle. CXL would help but I didn't have the budget for newer hardware. Perhaps modern Apple hardware is better for this than x86 stuff.

wmf · 2025-12-13T01:50:00 1765590600

That's how Groq works. A cluster of LPUv2s would probably be faster and cheaper than an Infiniband cluster of Epycs.

dpe82 · 2025-12-13T09:33:16 1765618396

Yeah I'm familiar; I was hoping I could do something related on previous generation commodity(ish) hardware. It didn't work but I learned a ton.

fooblaster · 2025-12-13T04:36:43 1765600603

what is an lpuv2

wmf · 2025-12-13T04:51:47 1765601507

The chip that Groq makes.

liuliu · 2025-12-12T22:51:55 1765579915

But that's only for prefilling right? Or is it beneficial for decoding too (I guess you can do KV lookup on shards, not sure how much speed-up that will be though).

zackangelo · 2025-12-12T23:00:10 1765580410

No you use tensor parallelism in both cases.

The way it typically works in an attention block is: smaller portions of the Q, K and V linear layers are assigned to each node and are processed independently. Attention, RoPE, norm, etc. are run on the node-specific output of that. Then, when the output linear layer is applied, an "all reduce" is computed which combines the output of all the nodes.

EDIT: just realized it wasn't clear -- this means that each node ends up holding a portion of the KV cache specific to its KV tensor shards. This can change based on the specific style of attention (e.g., in GQA where there are fewer KV heads than ranks you end up having to do some replication etc)
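The shard-then-all-reduce pattern described above can be sketched in plain Python. This is a toy under stated assumptions: tiny integer matrices, 4 simulated nodes in one process, and the per-head attention step between the two projections left out. All helper names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists-of-rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(m, parts):
    """Shard column-wise: each node gets a slice of the output features."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]


def split_rows(m, parts):
    """Shard row-wise: each node gets the rows matching its input slice."""
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]


def all_reduce_sum(partials):
    """Element-wise sum of every node's partial output."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]


NODES = 4
x = rand_matrix(2, 8)      # 2 tokens, hidden size 8
w_in = rand_matrix(8, 8)   # stands in for a Q/K/V projection
w_out = rand_matrix(8, 8)  # output projection

# Unsharded reference result.
reference = matmul(matmul(x, w_in), w_out)

# Each simulated node works only on its own shards.
partials = []
for w_in_shard, w_out_shard in zip(split_columns(w_in, NODES), split_rows(w_out, NODES)):
    local = matmul(x, w_in_shard)                # node-specific activations
    partials.append(matmul(local, w_out_shard))  # partial output, full width

combined = all_reduce_sum(partials)
print(combined == reference)  # prints True
```

The only cross-node step is the final sum, which is the communication that has to happen once per sharded block for every token.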

liuliu · 2025-12-12T23:25:42 1765581942

I usually call it "head parallelism" (which is a type of tensor parallelism, but parallelized for small clusters, and specific to attention). That is what you described: sharding the input tensor by number of heads and sending each piece to its respective Q, K, V shard. Each shard can do the Q / K / V projection, RoPE, QK norm, whatever, and attention all inside that particular shard. The out projection is done in that shard too, but then the shards need an all-reduce sum to get the final out projection broadcast to every participating shard, which then carries on with whatever else it needs to do itself.

What I am asking, however, is whether that will speed up decoding as linearly as it would for prefilling.

awnihannun · 2025-12-13T00:33:43 1765586023

Right, my comment was mostly about decoding speed. For prefill you can get a speed up but there you are less latency bound.

In our benchmarks with MLX / mlx-lm it's as much as 3.5x for token generation (decoding) at batch size 1 over 4 machines. In that case you are memory bandwidth bound so sharding the model and KV cache 4-ways means each machine only needs to access 1/4th as much memory.
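A back-of-the-envelope version of that argument, assuming decoding is purely memory-bandwidth bound. The model size and bandwidth below are placeholder numbers, not measurements; only the 3.5x figure comes from the comment above.

```python
def ideal_decode_tokens_per_sec(model_bytes, bandwidth_bytes_per_sec, num_machines):
    """Upper bound on tokens/s if each token requires reading every local weight once."""
    bytes_per_machine = model_bytes / num_machines
    return bandwidth_bytes_per_sec / bytes_per_machine


MODEL_BYTES = 400e9  # hypothetical: 400 GB of weights plus KV cache
BANDWIDTH = 800e9    # hypothetical: 800 GB/s of memory bandwidth per machine

single = ideal_decode_tokens_per_sec(MODEL_BYTES, BANDWIDTH, num_machines=1)
four = ideal_decode_tokens_per_sec(MODEL_BYTES, BANDWIDTH, num_machines=4)

ideal_speedup = four / single
reported_speedup = 3.5  # figure quoted in the comment above
efficiency = reported_speedup / ideal_speedup

print(f"ideal speedup: {ideal_speedup:.1f}x, scaling efficiency: {efficiency:.1%}")
```

Under this simple model, the gap between the ideal 4x and the reported 3.5x is what communication latency and other overheads cost.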

liuliu · 2025-12-13T01:20:40 1765588840

Oh! That's great to hear. Congrats! Now, I want to get the all-to-all primitives ready in s4nnc...

monster_truck · 2025-12-12T23:25:30 1765581930

Even if it wasn't outright beneficial for decoding by itself, it would still allow you to connect a second machine running a smaller, more heavily quantized version of the model for speculative decoding which can net you >4x without quality loss
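For readers unfamiliar with the idea, here is a toy sketch of greedy speculative decoding. Both "models" are made-up deterministic functions, not real LLMs: the draft proposes a few tokens cheaply, and the target keeps the longest prefix it agrees with plus one token of its own. The output matches what the target would have produced alone, which is the "without quality loss" part; the actual speedup depends on how often the draft is right.

```python
def target_next(tokens):
    """Toy stand-in for the large model: deterministic next token."""
    return (tokens[-1] * 3 + len(tokens)) % 7


def draft_next(tokens):
    """Toy stand-in for the small model: wrong on every 4th position."""
    guess = target_next(tokens)
    return guess if len(tokens) % 4 else (guess + 1) % 7


def greedy(next_fn, prompt, n):
    out = list(prompt)
    for _ in range(n):
        out.append(next_fn(out))
    return out


def speculative(prompt, n, k=4):
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < n:
        # The draft model proposes k tokens on its own.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(out + proposal))
        # The target checks the proposal; a real system does this in one batched pass.
        target_passes += 1
        accepted = []
        for tok in proposal:
            expected = target_next(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch and stop
                break
        else:
            accepted.append(target_next(out + accepted))  # bonus token when all k match
        out.extend(accepted)
    return out[:len(prompt) + n], target_passes


PROMPT = [1]
baseline = greedy(target_next, PROMPT, 12)
result, passes = speculative(PROMPT, 12)
print(result == baseline, passes)
```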

anemll · 2025-12-13T03:51:59 1765597919

Tensor Parallel test with RDMA last week https://x.com/anemll/status/1996349871260107102

Note fast sync workaround

andy99 · 2025-12-12T22:25:11 1765578311

I’m hoping this isn’t as attractive as it sounds for non-hobbyists because the performance won’t scale well to parallel workloads or even context processing, where parallelism can be better used.

Hopefully this makes it really nice for people that want to experiment with LLMs and have a local model but means well funded companies won’t have any reason to grab them all vs GPUs.

codazoda · 2025-12-12T22:41:26 1765579286

I haven’t looked yet but I might be a candidate for something like this, maybe. I’m RAM constrained and, to a lesser extent, CPU constrained. It would be nice to offload some of that. That said, I don’t think I would buy a cluster of Macs for that. I’d probably buy a machine that can take a GPU.

ChrisMarshallNY · 2025-12-13T09:26:14 1765617974

I’m not particularly interested in training models, but it would be nice to have eGPUs again. When Apple Silicon came out, support for them dried up. I sold my old BlackMagic eGPU.

That said, the need for them also faded. The new chips have performance every bit as good as the eGPU-enhanced Intel chips.

willtemperley · 2025-12-13T02:44:24 1765593864

I think it’s going to be great for smaller shops that want on premise private cloud. I’m hoping this will be a win for in-memory analytics on macOS.

api · 2025-12-12T23:31:23 1765582283

No way buying a bunch of minis could be as efficient as much denser GPU racks. You have to consider all the logistics and power draw, and high end nVidia stuff and probably even AMD stuff is faster than M series GPUs.

What this does offer is a good alternative to GPUs for smaller scale use and research. At small scale it’s probably competitive.

Apple wants to dominate the pro and serious amateur niches. Feels like they’re realizing that local LLMs and AI research is part of that, is the kind of thing end users would want big machines to do.

gumboshoes · 2025-12-12T23:59:18 1765583958

Exactly: The AI appliance market. A new kind of home or small-business server.

jabbywocker · 2025-12-13T00:30:49 1765585849

I’m expecting Apple to release a new Mac Pro in the next couple years whose main marketing angle is exactly this

firecall · 2025-12-13T00:48:20 1765586900

Seems like it could be a thing.

Also, I’m curious and in case anyone that knows reads this comment:

Apple say they can’t get the performance they want out of discrete GPUs.

Fair enough. But yet nVidia becomes the most valuable company in the world selling GPUs.

So…

Now I get that Apple’s use case is essentially sealed consumer devices built with power consumption and performance tradeoffs in mind.

But could Apple use its Apple Silicon tech to build a Mac Pro with its own expandable GPU options?

Or even other brand GPUs knowing they would be used for AI research etc…. If Apple ever make friends with nVidia again of course :-/

What we know of Tim Cook’s Apple is that it doesn’t like to leave money on the table, and clearly they are right now!

jabbywocker · 2025-12-13T01:02:41 1765587761

There’s been rumors of Apple working on M-chips that have the GPU and CPU as discrete chiplets. The original rumor said this would happen with the M5 Pro, so it’s potentially on the roadmap.

Theoretically they could farm out the GPU to another company but it seems like they’re set on owning all of the hardware designs.

nntwozz · 2025-12-13T03:19:36 1765595976

Apple always strives for complete vertical integration.

SJ loved to quote Alan Kay:

"People who are really serious about software should make their own hardware."

Qualcomm are the latest on the chopping block, history repeating itself.

If I were a betting man I'd say Apple's never going back.

FuckButtons · 2025-12-13T04:36:46 1765600606

Power draw? An entire Mac Pro running flat out uses less power than 1 5090. If you have a workload that needs a huge memory footprint then the TCO of the Macs, even with their markup, may be lower.

bigyabai · 2025-12-12T23:21:11 1765581671

The lack of official Linux/BSD support is enough to make it DOA for any serious large-scale deployment. Until Apple figures out what they're doing on that front, you've got nothing to worry about.

Eggpants · 2025-12-13T00:18:59 1765585139

Not sure I understand, Mac OS is BSD based. https://en.wikipedia.org/wiki/Darwin_(operating_system)

bigyabai · 2025-12-13T00:47:34 1765586854

macOS is XNU-based. There is BSD code that runs in the microkernel level and BSD tools in the userland, but the kernel does not resemble BSD's architecture or adopt BSD's license.

This is an issue for some industry-standard software like CUDA, which does provide BSD drivers with ARM support that just never get adopted by Apple: https://www.nvidia.com/en-us/drivers/unix/

7e · 2025-12-13T02:25:35 1765592735

If there were TCO advantages with this setup, CUDA would not be a blocker.

bigyabai · 2025-12-13T04:39:53 1765600793

CUDA's just one example; there's a lot of hardware support on the BSDs that Apple doesn't want to inherit.

ngcc_hk · 2025-12-13T05:54:34 1765605274

Why maintain other people's stuff and carry the baggage?

CamperBob2 · 2025-12-13T02:09:56 1765591796

Almost the most impressive thing about that is the power consumption. ~50 watts for both of them? Am I reading it wrong?

wmf · 2025-12-13T04:07:06 1765598826

Yeah, two Mac Studios is going to be ~400 W.

m-s-y · 2025-12-13T05:22:42 1765603362

Can confirm. My M3 Ultra tops out at 210W when ComfyUI or ollama is running flat out. Confirmed via smart plug.

CamperBob2 · 2025-12-13T04:15:29 1765599329

What am I missing? https://i.imgur.com/YpcnlCH.png

(Edit: interesting, thanks. So the underlying OS APIs that supply the power-consumption figures reported by asitop are just outright broken. The discrepancy is far too large to chalk up to static power losses or die-specific calibration factors that the video talks about.)

wmf · 2025-12-13T04:51:05 1765601465

https://www.youtube.com/watch?v=zCkbVLqUedg

btown · 2025-12-12T21:52:47 1765576367

It would be incredibly ironic if, with Apple's relatively stable supply chain relative to the chaos of the RAM market these days (projected to last for years), Apple compute became known as a cost-effective way to build medium-sized clusters for inference.

andy99 · 2025-12-12T21:56:52 1765576612

It’s gonna suck if all the good Macs get gobbled up by commercial users.

icedchai · 2025-12-12T23:25:38 1765581938

Outside of YouTube influencers, I doubt many home users are buying a 512G RAM Mac Studio.

DrStartup · 2025-12-13T00:53:27 1765587207

I'm neither and have 2. 24/7 async inference against github issues. Free. (once you buy the macs that is)

madeofpalk · 2025-12-13T09:40:07 1765618807

I'm not sure who 'home users' are, but i doubt they're buying two $9,499 computers.

Waterluvian · 2025-12-13T01:25:21 1765589121

I wonder what the actual lifetime amortized cost will be.

oidar · 2025-12-13T03:27:02 1765596422

Every time I'm tempted to get one of these beefy mac studios, I just calculate how much inference I can buy for that amount and it's never a good deal.

embedding-shape · 2025-12-13T08:21:14 1765614074

Every time someone brings that up, it brings back memories of trying to frantically finish stuff as quickly as possible as either my quota slowly goes down with each API request, or the pay-as-you-go bill increases 0.1% with each request.

Nowadays I fire off async jobs that involve 1000s of requests, billions of tokens, yet it costs basically the same as if I didn't.

Maybe it takes a different type of person, than the one I am, but all these "pay-as-you-go"/tokens/credits platforms make me nervous to use, and I end up not using it or spending time trying to "optimize", while investing in hardware and infrastructure I can run at home and use that seems to be no problem for my head to just roll with.

noname120 · 2025-12-13T10:54:00 1765623240

But the downside is that you are stuck with inferior LLMs. None of the best models have open weights: Gemini 3.5, Claude Sonnet/Opus 4.5, ChatGPT 5.2. The best model with open weights performs an order of magnitude worse than those.

embedding-shape · 2025-12-13T11:44:25 1765626265

The best weights are the weights you can train yourself for specific use cases. As long as you have the data and the infrastructure to train/fine-tune your own small models, you'll get drastically better results.

And just because you're mostly using local models doesn't mean you can't use API hosted models in specific contexts. Of course, then the same dread sets in, but if you can do 90% of the tokens with local models and 10% with pay-per-usage API hosted models, you get the best of both worlds.

stingraycharles · 2025-12-13T10:51:44 1765623104

Nevermind the fact that there are a lot of high quality (the highest quality?) models that are not released as open source.

asimovDev · 2025-12-13T08:27:53 1765614473

anyone buying these is usually more concerned with just being able to run stuff on their own terms without handing their data off. otherwise it's probably always cheaper to rent compute for intense stuff like this

dontlaugh · 2025-12-13T09:43:30 1765619010

For now, while everything you can rent is sold at a loss.

bee_rider · 2025-12-13T03:52:08 1765597928

Are the inference providers profitable yet? Might be nice to be ready for the day when we see the real price of their services.

Nextgrid · 2025-12-13T10:13:07 1765620787

Isn't it then even better to enjoy cheap inference thanks to techbro philanthropy while it lasts? You can always buy the hardware once the free money runs out.

icedchai · 2025-12-13T04:28:00 1765600080

Heh. I'm jealous. I'm still running a first gen Mac Studio (M1 Max, 64 gigs RAM.) It seemed like a beast only 3 years ago.

kridsdale1 · 2025-12-13T05:04:51 1765602291

I did. Admittedly it was for video processing at 8k which uses more than 128gb of ram, but I am NOT a YouTuber.

mirekrusin · 2025-12-13T01:06:19 1765587979

Of course they're not. Everybody is waiting for next generation that will run LLMs faster to start buying.

7e · 2025-12-13T02:29:45 1765592985

That product can still steal fab slots from cheaper, more prosumer products.

FireBeyond · 2025-12-12T23:30:35 1765582235

I doubt many of them are, either.

When the 2019 Mac Pro came out, it was "amazing" how many still photography YouTubers all got launch day deliveries of the same BTO Mac Pro, with exactly the same spec:

18 core CPU, 384GB memory, Vega II Duo GPU and an 8TB SSD.

Or, more likely, Apple worked with them and made sure each of them had this Mac on launch day, while they waited for the model they actually ordered. Because they sure as hell didn't need an $18,000 computer for Lightroom.

lukeh · 2025-12-13T06:04:13 1765605853

Still rocking a 2019 Mac Pro with 192GB RAM for audio work, because I need the slots and I can’t justify the expense of a new one. But I’m sure a M4 Mini is faster.

mschuster91 · 2025-12-12T22:27:06 1765578426

it's not like regular people can afford this kind of Apple machine anyway.

teeray · 2025-12-12T23:20:20 1765581620

It’s just depressing that the “PC in every home” era is being rapidly pulled out from under our feet by all these supply shocks.

Aurornis · 2025-12-13T04:24:15 1765599855

You can get a Mac Mini for $600 with 16GB of RAM and it will be more powerful than the "PC in every home" people would need for any common software.

The personal computing situation is great right now. RAM is temporarily more expensive, but it's definitely not ending any eras.

m-s-y · 2025-12-13T05:27:08 1765603628

Not Apple’s ram.

jeroenhd · 2025-12-13T10:33:24 1765622004

RAM prices have exploded enough that Apple's RAM is now no longer a bad deal. At least until their next price hikes.

We're going back to the "consumer PCs have 8GB of RAM era" thanks to the AI bubble.

dghlsakjg · 2025-12-12T23:52:47 1765583567

Huh?

Home PCs are as cheap as they’ve ever been. Adjusted for inflation the same can be said about “home use” Macs. The list price of an entry level MacBook Air has been pretty much the same for more than a decade. Adjust for inflation, and you get a MacBook Air for less than half the real cost of the launch model that is massively better in every way.

A blip in high end RAM prices has no bearing on affordable home computing. Look at the last year or two and the proliferation of cheap, moderately to highly specced mini desktops.

I can get a Ryzen 7 system with 32gb of ddr5, and a 1tb drive delivered to my house before dinner tomorrow for $500 + tax.

That’s not depressing, that’s amazing!

jeroenhd · 2025-12-13T10:43:51 1765622631

> I can get a Ryzen 7 system with 32gb of ddr5, and a 1tb drive delivered to my house before dinner tomorrow for $500 + tax

That's an amazing price, but I'd like to see where you're getting it. 32GB of RAM alone costs €450 here (€250 if you're willing to trust Amazon's February 2026 delivery dates).

Getting a PC isn't that expensive, but after the blockchain hype and then the AI hype, prices have yet to come down. All estimations I've seen will have RAM prices increase further until the summer of next year, and the first dents in pricing coming the year after at the very earliest.

heavyset_go · 2025-12-13T00:12:14 1765584734

Home calculators are cheap as they've ever been, but this era of computing is out of reach for the majority of people.

The analogous PC for this era requires a large amount of high speed memory and specialized inference hardware.

dghlsakjg · 2025-12-13T01:27:46 1765589266

What regular home workload are you thinking of that the computer I described is incapable of?

You can call a computer a calculator, but that doesn’t make it a calculator.

Can they run SOTA LLMs? No. Can they run smaller, yet still capable LLMs? Yes.

However, I don’t think that the ability to run SOTA LLMs is a reasonable expectation for “a computer in every home” just a few years into that software category even existing.

buu700 · 2025-12-13T06:04:14 1765605854

It's kind of funny to see "a computer in every home" invoked when we're talking about the equivalent of ~$100 buying a non-trivial percentage of all computational power in existence at the time of the quote. By the standards of that time, we don't just have a computer in every home, we have a supercomputer in every pocket.

atonse · 2025-12-13T01:25:13 1765589113

You can have access to a supercomputer for pennies, internet access for very little money, and even an m4 Mac mini for $500. You can have a raspberry pi computer for even less. And buy a monitor for a couple hundred dollars.

I feel like you’re twisting the goalposts to make your point that it has to be local compute to have access to AI. Why does it need to be local?

Update: I take it back. You can get access to AI for free.

platevoltage · 2025-12-13T01:39:07 1765589947

No it doesn't. The majority of people aren't trying to run Ollama on their personal computers.

inferiorhuman · 2025-12-13T00:15:31 1765584931

  A blip in high end RAM prices

It's not a blip and it's not limited to high end machines and configurations. Altman gobbled up the lion's share of wafer production. Look at that Raspberry Pi article that made it to the front page, that's pretty far from a high end Mac and according to the article's author likely to be exported from China due to the RAM supply crisis.

  I can get a Ryzen 7 system with 32gb of ddr5, and a 1tb drive delivered to my house
  before dinner tomorrow for $500 + tax.

B&H is showing a 7700X at $250 with their cheapest 32GB DDR5 5200 sticks at $384. So you've already gone over budget for just the memory and CPU. No motherboard, no SSD.

Amazon is showing some no-name stuff at $298 as their cheapest memory and a Ryzen 7700X at $246.

Add another $100 for an NVMe drive and another $70–100 for the cheapest AM5 motherboards I could find on either of those sites.

sspiff · 2025-12-13T07:28:12 1765610892

Add to that a case, PSU and monitor and you're realistically over $1000

dghlsakjg · 2025-12-13T01:17:52 1765588672

People that can reliably predict the future, especially when it comes to rising markets, are almost always billionaires. It is a skill so rare that it can literally make you the richest man on earth. Why should I trust your prediction of future markets that this pricing is the new standard, and will never go down? Line doesn’t always go up, even if it feels like it is right now, and all the tech media darlings are saying so.

If everything remains the same, RAM pricing will also. I have never once found a period in known history where everything stays the same, and I would be willing to bet 5 figures that at some point in the future I will be able to buy DDR5 or better ram for cheaper than today. I can point out that in the long run, prices for computing equipment have always fallen. I would trust that trend a lot more than a shortage a few months old changing the very nature of commodity markets. Mind you, I’m not the richest man on earth either, so my pattern matched opinion should be judged the same.

> B&H is showing a 7700X at $250 with their cheapest 32GB DDR5 5200 sticks at $384. So you've already gone over budget for just the memory and CPU. No motherboard, no SSD.

I didn't say I could build one from parts. Instead I said buy a mini pc, and then went and looked up the specs and price point to be sure.

The PC that I was talking about is here: https://a.co/d/6c8Udbp. I live in Canada so translated the prices to USD. Remember that US stores are sometimes forced to hide a massive import tax in those parts prices. The rest of the world isn’t subject to that and pays less.

Edit: here’s an equivalent specced PC available in the US for $439 with a prime membership. So even with the cost of prime membership you can get a Ryzen 7 32gb 1tb for $455. https://www.amazon.com/BOSGAME-P3-Gigabit-Ethernet-Computer/...

SunlitCat · 2025-12-13T06:23:10 1765606990

Don’t forget that many of these manufacturers operate with long-term supply contracts for components like RAM, maintain existing inventory, or are selling systems that were produced some time ago. That helps explain why we are still seeing comparatively low prices at the moment.

If the current RAM supply crisis continues, it is very likely that these kinds of offers will disappear and that systems like this will become more expensive as well, not to mention all the other products that rely on DRAM components.

I also don’t believe RAM prices will drop again anytime soon, especially now that manufacturers have seen how high prices can go while demand still holds. Unlike something like graphics cards, RAM is not optional, it is a fundamental requirement for building any computer (or any device that contains one). People don’t buy it because they want to, but because they have to.

In the end, I suspect that some form of market-regulating mechanism may be required, potentially through government intervention. Otherwise, it’s hard for me to see what would bring prices down again, unless Chinese manufacturers manage to produce DRAM at scale, at significantly lower cost, and effectively flood the market.

inferiorhuman · 2025-12-13T02:30:56 1765593056

  People that can reliably predict the future

You don't need to be a genius or a billionaire to realize that when most of the global supply of a product becomes unavailable the remaining supply gets more expensive.

  here’s an equivalent speced pc available in the US for $439 with a prime membership.

So with prime that's $439+139 for $578 which is only slightly higher than the cost without prime of $549.99.

dghlsakjg · 2025-12-13T03:01:01 1765594861

> You don't need to be a genius or a billionaire to realize that when most of the global supply of a product becomes unavailable the remaining supply gets more expensive.

Yes. Absolutely correct if you are talking about the short term. I was talking about the long term, and said that. If you are so certain would you take this bet: any odds, any amount that within 1 month I can buy 32gb of new retail DDR5 in the US for at least 10% less than the $384 you cited. (think very hard on why I might offer you infinite upside so confidently. It's not because I know where the price of RAM is going in the short term)

> So with prime that's $439+139 for $578 which is only slightly higher than the cost without prime of $549.99.

At this point I can't tell if you are arguing in bad faith, or just unfamiliar with how prime works. Just in case: You have cited the cost of prime for a full year. You can buy just a month of prime for a maximum price of $14.99 (that's how I got $455) if you have already used your free trial, and don't qualify for any discounts. Prime also allows cancellation within 14 days of signing up for a paid option, which is more than enough time to order a computer, and have it delivered, and cancel for a full refund.

So really, if you use a trial or ask for a refund for your prime fees the price is $439. So we have actually gotten the price a full 10% lower than I originally cited.

Edit: to eliminate any arguments about Prime in the price of the PC, here's an identically specced mini PC for the same price from Newegg https://www.newegg.com/p/2SW-00BM-00002

behnamoh · 2025-12-13T00:53:49 1765587229

> Home PCs are as cheap as they’ve ever been.

just the 5090 GPU costs +$3k, what are you even talking about

dghlsakjg · 2025-12-13T01:11:14 1765588274

“A computer in every home” (from the original post I was replying to) does not mean “A computer with the highest priced version of the highest priced optional accessory for computers in every home”

I’m talking about the hundreds of affordable models that are perfectly suitable for everything up to and including AAA gaming.

The existence of expensive, and very much optional, high end computer parts does not mean that affordable computers are not more incredible than ever.

Just because cutting edge high end parts are out of reach to you, does not mean that perfectly usable computers are too, as I demonstrated with actual specs and prices in my post.

That’s what I’m talking about.

pests · 2025-12-13T01:07:26 1765588046

A home PC has to have a SOTA gpu?

morshu9001 · 2025-12-13T03:23:55 1765596235

Probably upset that the high-end video game "hobby" costs more than it used to. Used to be $1-2K for the very best gaming GPU of the time.

platevoltage · 2025-12-13T01:36:01 1765589761

Man you positively demolished that straw man.

How much has a base model MacBook Air changed in price over the last 15 years? With inflation, it's gotten cheaper.

dghlsakjg · 2025-12-13T01:43:56 1765590236

Some numbers to drive your point home:

The original base MacBook Air sold for $1799 in 2008. The inflation adjusted price is $2715.

The current base model is $999, and literally better in every way except thickness on one edge.

If we constrain ourselves to just 15 years. The $999 MBA was released 15 years ago ($1488 in real dollars). The list price has remained the same for the base model, with the exception of when they sold the discontinued 11” MBAs for $899.

It’s actually kind of wild how much better and cheaper computers have gotten.

morshu9001 · 2025-12-13T02:38:07 1765593487

It's also gotten cheaper nominally. I just got a new base MBA for $750. Kinda surprised, like there has to be some catch.

teaearlgraycold · 2025-12-13T07:45:10 1765611910

I feel bad for their competitors. We need good competition in the long run but over the last few years it's made less and less sense to get something other than an Apple laptop for most use cases.

teaearlgraycold · 2025-12-12T22:37:38 1765579058

It already is depending on your needs.

reilly3000 · 2025-12-12T22:51:29 1765579889

dang I wish I could share md tables.

Here’s a text edition: For $50k the inference hardware market forces a trade-off between capacity and throughput:

* Apple M3 Ultra Cluster ($50k): Maximizes capacity (3TB). It is the only option in this price class capable of running 3T+ parameter models (e.g., Kimi k2), albeit at low speeds (~15 t/s).

* NVIDIA RTX 6000 Workstation ($50k): Maximizes throughput (>80 t/s). It is superior for training and inference but is hard-capped at 384GB VRAM, restricting model size to <400B parameters.

To achieve both high capacity (3TB) and high throughput (>100 t/s) requires a ~$270,000 NVIDIA GH200 cluster and data center infrastructure. The Apple cluster provides 87% of that capacity for 18% of the cost.

mechagodzilla · 2025-12-12T22:59:29 1765580369

You can keep scaling down! I spent $2k on an old dual-socket xeon workstation with 768GB of RAM - I can run Deepseek-R1 at ~1-2 tokens/sec.

jacquesm · 2025-12-13T10:44:29 1765622669

I did the same, then put in 14 3090's. It's a little bit power hungry but fairly impressive performance wise. The hardest parts are power distribution and riser cards but I found good solutions for both.

Weryj · 2025-12-13T01:10:02 1765588202

Just keep going! 2TB of swap disk for 0.0000001 t/sec

kergonath · 2025-12-13T09:59:33 1765619973

Hang on, starting benchmarks on my Raspberry Pi.

ternus · 2025-12-13T00:45:18 1765586718

And if you get bored of that, you can flip the RAM for more than you spent on the whole system!

a012 · 2025-12-13T01:15:08 1765588508

And heat the whole house in parallel

3abiton · 2025-12-12T23:45:12 1765583112

What's the math on the $50k nvidia cluster? My understanding is that these things cost ~$8k and you can at least get 5 for $40k, that's around half a TB.

That being said, for inference Macs still remain the best, and the M5 Ultra will be an even better value with its better PP.

reilly3000 · 2025-12-13T00:23:07 1765585387

• GPUs: 4x NVIDIA RTX 6000 Blackwell (96GB VRAM each) • Cost: 4 × $9,000 = $36,000

• CPU: AMD Ryzen Threadripper PRO 7995WX (96-Core) • Cost: $10,000

• Motherboard: WRX90 Chipset (supports 7x PCIe Gen5 slots) • Cost: $1,200

• RAM: 512GB DDR5 ECC Registered • Cost: $2,000

• Chassis & Power: Supermicro or specialized Workstation case + 2x 1600W PSUs. • Cost: $1,500

• Total Cost: ~$50,700
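A quick check of the arithmetic in that parts list. The prices are the ones quoted above, not independently verified, and the dictionary keys are just labels:

```python
# Prices in USD as quoted in the list above.
parts = {
    "GPUs (4x RTX 6000 Blackwell)": 4 * 9_000,
    "CPU (Threadripper PRO 7995WX)": 10_000,
    "Motherboard (WRX90)": 1_200,
    "RAM (512GB DDR5 ECC)": 2_000,
    "Chassis and power": 1_500,
}

total_cost = sum(parts.values())
total_vram_gb = 4 * 96  # 96GB per GPU, as quoted

print(f"total: ${total_cost:,}, VRAM: {total_vram_gb}GB")
```

The total matches the ~$50,700 quoted, and the VRAM figure matches the 384GB cap mentioned earlier in the thread.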

It’s a bit maximalist, but if you had to spend $50k it’s going to be about as fast as you can make it.

icedchai · 2025-12-12T23:19:45 1765581585

For $50K, you could buy 25 Framework desktop motherboards (128G VRAM each w/Strix Halo, so over 3TB total) Not sure how you'll cluster all of them but it might be fun to try. ;)

sspiff · 2025-12-12T23:44:56 1765583096

There is no way to achieve a high throughput low latency connection between 25 Strix Halo systems. After accounting for storage and network, there are barely any PCIe lanes left to link two of them together.

You might be able to use USB4 but unsure how the latency is for that.

0manrho · 2025-12-13T05:52:14 1765605134

In general I agree with you, the IO options exposed by Strix Halo are pretty limited, but if we're getting technical you can tunnel PCIe over USB4v2 by the spec in a way that's functionally similar to Thunderbolt 5. That gives you essentially 3 sets of native PCIe4x4 from the chipset and an additional 2 sets tunnelled over USB4v2. TB5 and USB4 controllers are not made equal, so in practice YMMV. Regardless of USB4v2 or TB5, you'll take a minor latency hit.

Strix Halo IO topology: https://www.techpowerup.com/cpu-specs/ryzen-ai-max-395.c3994

Framework's mainboard implements 2 of those PCIe4x4 GPP interfaces as M.2 PHYs, to which you can connect a standard PCIe AIC (like a NIC or DPU) with a passive adapter, and also interestingly exposes that 3rd x4 GPP as a standard x4 length PCIe CEM slot, though the system/case isn't compatible with actually installing a standard PCIe add-in card in there without getting hacky with it, especially as it's not an open-ended slot.

You absolutely could slap 1x SSD in there for local storage, and then attach up to 4x RDMA-supporting NICs to a RoCE-enabled switch (or Infiniband if you're feeling special) to build out a Strix Halo cluster (and you could do similar with Mac Studios, to be fair). You could get really extra by using a DPU/SmartNIC that allows you to boot from an NVMeoF SAN to leverage all 5 sets of PCIe4x4 for connectivity without any local storage, but we're hitting a complexity/cost threshold with that which I doubt most people want to cross. Or if they are willing to cross that threshold, they'd also be looking at other solutions better suited to it that don't require as many workarounds.

Apple's solution is better for a small cluster, both in pure connectivity terms and also with respect to its memory advantages, but Strix Halo is doable. However, in both cases, scaling up beyond 3 or especially 4 nodes you rapidly enter complexity and cost territory that is better served by nodes that are less restrictive, unless you have some very niche reason to use either Macs (especially non-Pro) or Strix Halo specifically.

bee_rider · 2025-12-13T00:19:17 1765585157

Do they need fast storage, in this application? Their OS could be on some old SATA drive or whatever. The whole goal is to get them on a fast network together; the models could be stored on some network filesystem as well, right?

pests · 2025-12-13T01:06:09 1765587969

It's more than just the model weights. During inference there would be a lot of cross-talk as each node broadcasts its results and gathers up what it needs from the others for the next step.
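
To make the cross-talk concrete, here is a toy sketch of an all-gather step in plain Python. It is only a simulation under assumptions: the `all_gather` helper and the shard values are made up for illustration, the message count assumes a naive pattern where every node sends its shard to every other node, and real frameworks use more efficient collectives.

```python
def all_gather(shards):
    """Simulate an all-gather: every node ends up with every node's shard.

    Returns the per-node views and the number of point-to-point messages
    a naive implementation would send (each node sends to every other node).
    """
    n = len(shards)
    gathered = [list(shards) for _ in range(n)]  # each node now holds all shards
    messages = n * (n - 1)
    return gathered, messages


# Hypothetical partial results from 4 nodes after one layer.
shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
views, messages = all_gather(shards)
print(messages)  # 12
```

This exchange repeats for every step of inference, which is why the interconnect matters even if the weights never move.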

icedchai · 2025-12-12T23:58:20 1765583900

I figured, but it's good to have confirmation.

3abiton · 2025-12-12T23:42:55 1765582975

You could use llama.cpp rpc mode over "network" via usb4/thunderbolt connection

FuckButtons · 2025-12-12T23:38:10 1765582690

Are you factoring in the above comment about the as-yet-unimplemented parallel speedup? For on-prem inference without any kind of ASIC, this seems quite a bargain, relatively speaking.

conradev · 2025-12-12T23:45:11 1765583111

Apple deploys LPDDR5X for the energy efficiency and cost (lower is better), whereas NVIDIA will always prefer GDDR and HBM for performance and cost (higher is better).

_zoltan_ · 2025-12-13T00:23:31 1765585411

the GH/GB compute has LPDDR5X - a single or dual GPU shares 480GB, depending if it's GH or GB, in addition to the HBM memory, with NVLink C2C - it's not bad!

wtallis · 2025-12-13T00:46:17 1765586777

Essentially, the Grace CPU is a memory and IO expander that happens to have a bunch of ARM CPU cores filling in the interior of the die, while the perimeter is all PHYs for LPDDR5 and NVLink and PCIe.

_zoltan_ · 2025-12-13T02:23:50 1765592630

fully agree!

with MGX and CX8 we see PCIe root moving to the NIC, which is very exciting.

yieldcrv · 2025-12-13T04:18:03 1765599483

15 t/s is way too slow for anything but chatting, call and response, and you don't need a 3T parameter model for that.

Wake me up when the situation improves

geerlingguy · 2025-12-12T22:15:32 1765577732

This implies you'd run more than one Mac Studio in a cluster, and I have a few concerns regarding Mac clustering (as someone who's managed a number of tiny clusters, with various hardware):

1. The power button is in an awkward location, meaning rackmounting them (either 10" or 19" rack) is a bit cumbersome (at best)

2. Thunderbolt is great for peripherals, but as a semi-permanent interconnect, I have worries over the port's physical stability... wish they made a Mac with QSFP :)

3. Cabling will be important, as I've had tons of issues with TB4 and TB5 devices with anything but the most expensive Cable Matters and Apple cables I've tested (and even then...)

4. macOS remote management is not nearly as efficient as Linux, at least if you're using open source / built-in tooling

To that last point, I've been trying to figure out a way to, for example, upgrade to macOS 26.2 from 26.1 remotely, without a GUI, but it looks like you _have_ to use something like Screen Sharing or an IP KVM to log into the UI, to click the right buttons to initiate the upgrade.

Trying "sudo softwareupdate -i -a" will install minor updates, but not full OS upgrades, at least AFAICT.

wlesieutre · 2025-12-12T22:21:31 1765578091

For #2, OWC puts a screw hole above their dock's thunderbolt ports so that you can attach a stabilizer around the cord

https://www.owc.com/solutions/thunderbolt-dock

It's a poor imitation of old ports that had screws on the cables, but should help reduce inadvertent port stress.

The screw only works with limited devices (ie not the Mac Studio end of the cord) but it can also be adhesive mounted.

https://eshop.macsales.com/item/OWC/CLINGON1PK/

crote · 2025-12-12T22:50:36 1765579836

That screw hole is just the regular locking USB-C variant, is it not?

See for example:

https://www.startech.com/en-jp/cables/usb31cctlkv50cm

wlesieutre · 2025-12-12T23:00:53 1765580453

Looks like it! Thanks for pointing this out, I had no idea it was a standard.

Apparently since 2016 https://www.usb.org/sites/default/files/documents/usb_type-c...

So for any permanent Thunderbolt GPU setups, they should really be using this type of cable

wtallis · 2025-12-13T00:51:25 1765587085

Note that the locking connector OWC uses is a standard, not the standard. This is USB we're dealing with, so they made it messy: the spec defines two different mutually-incompatible locking mechanisms.

jamiek88 · 2025-12-13T08:37:11 1765615031

Of course they do.

TheJoeMan · 2025-12-12T23:26:14 1765581974

Now that’s one way to enforce not inserting a USB upside-down.

eurleif · 2025-12-12T22:20:42 1765578042

I have no experience with this, but for what it's worth, looks like there's a rack mounting enclosure available which mechanically extends the power switch: https://www.sonnetstore.com/products/rackmac-studio

geerlingguy · 2025-12-13T03:45:36 1765597536

I have something similar from MyElectronics, and it works, but it's a bit expensive, and still imprecise. At least the power button isn't in the back corner underneath!

rsync · 2025-12-13T01:42:26 1765590146

"... Thunderbolt is great for peripherals, but as a semi-permanent interconnect, I have worries over the port's physical stability ..."

Thunderbolt as a server interconnect displeases me aesthetically but my conclusion is the opposite of yours:

If the systems are locked into place as servers in a rack, the movements and stresses on the cable are much lower than when it is used as a peripheral interconnect for a desktop or laptop, yes?

827a · 2025-12-13T02:03:55 1765591435

This is a semi-solved problem e.g. https://www.sonnetstore.com/products/thunderlok-a

Apple’s chassis do not support it. But conceptually that’s not a Thunderbolt problem, it’s an Apple problem. You could probably drill into the Mac Studio chassis to create mount points.

827a · 2025-12-13T02:01:21 1765591281

They do still sell the Mac Pro in a rack mount configuration. But, it was never updated for M3 Ultra, and feels not long for this world.

ThomasBb · 2025-12-13T05:09:34 1765602574

With MDM solutions you can not only get software update management, but even full LOM for models that support this. There are free and open source MDM out there.

badc0ffee · 2025-12-13T01:50:19 1765590619

> To that last point, I've been trying to figure out a way to, for example, upgrade to macOS 26.2 from 26.1 remotely,

I think you can do this if you install a MDM profile on the Macs and use some kind of management software like Jamf.

timc3 · 2025-12-12T22:29:39 1765578579

It’s been terrible for years/forever. Even Xserves didn’t really meet the needs of a professional data centre. And it’s got worse as a server OS because it’s not a core focus. Don’t understand why anyone tries to bother - apart from this MLX use case or as a ProRes render farm.

crote · 2025-12-12T22:52:15 1765579935

iOS build runner. Good luck developing cross-platform apps without a Mac!

jeroenhd · 2025-12-13T10:47:59 1765622879

Practically, just run the macos-inside-kvm-inside-docker command. Not very fast, but you can compile the entire thing outside of the VM, all you need is the final incantations to get Apple's signatures on there.

Legally, you probably need a Mac. Or rent access to one, that's probably cheaper.

colechristensen · 2025-12-12T22:51:52 1765579912

There are open source MDM projects, I'm not familiar but https://github.com/micromdm/nanohub might do the job for OS upgrades.

int32_64 · 2025-12-12T23:54:25 1765583665

Apple should set up their own giant cloud of M chips with tons of VRAM, make Metal as good as possible for AI purposes, then market the cloud as allowing self-hosted models for companies and individuals that care about privacy. They would clean up in all kinds of sectors whose data can't touch the big LLM companies.

wmf · 2025-12-13T00:24:08 1765585448

That exists but it's only for iUsers running Apple models. https://security.apple.com/blog/private-cloud-compute/

make3 · 2025-12-13T00:25:34 1765585534

The advantages of having a single big memory per gpu are not as big in a data center where you can just shard things between machines and use the very fast interconnect, saturating the much faster compute cores of a non Apple GPU from Nvidia or AMD

irusensei · 2025-12-13T10:10:33 1765620633

I am waiting for the M5 Studio, but given current hardware prices I'm not sure it will be at a level that I would call affordable. Currently I'm watching the news, and if there is any announcement that prices will go up I'll probably settle for an M4 Max.

zeristor · 2025-12-13T08:53:13 1765615993

Will Apple be able to ramp up M3 Ultra Mac Studios if this becomes a big thing?

Is this part of Apple’s plan of building out server side AI support using their own hardware?

If so they would need more physical data centres.

I’m guessing they too would be constrained by RAM.

timsneath · 2025-12-12T21:55:48 1765576548

Also see https://www.engadget.com/ai/you-can-turn-a-cluster-of-macs-i...

pjmlp · 2025-12-13T07:47:19 1765612039

Maybe Apple should rethink bringing back Mac Pro desktops with pluggable GPUs, like that one in the corner still playing with its Intel and AMD toys, instead of a big box full of air and pro audio cards only.

FridgeSeal · 2025-12-13T05:57:12 1765605432

That’s great for AI people, but can we use this for other distributed workloads that aren’t ML?

geerlingguy · 2025-12-13T06:16:43 1765606603

I've been testing HPL and mpirun a little, not yet with this new RDMA capability (it seems like Ring is currently the supported method)... but it was a little rough around the edges.

See: https://ml-explore.github.io/mlx/build/html/usage/distribute...

dagmx · 2025-12-13T07:05:34 1765609534

Sure, there’s nothing about it that’s tied to ML. It’s a faster interconnect; use it for many kinds of shared compute scenarios.

storus · 2025-12-12T22:42:53 1765579373

Is there any way to connect DGX Sparks to this via USB4? Right now only 10GbE can be used despite both Spark and Mac Studio having vastly faster options.

zackangelo · 2025-12-12T23:05:09 1765580709

Sparks are built for this and actually have ConnectX-7 NICs built in! You just need to get the SFPs for them. This means you can natively cluster them at 200Gbps.

wtallis · 2025-12-12T23:18:33 1765581513

That doesn't answer the question, which was how to get a high-speed interconnect between a Mac and a DGX Spark. The most likely solution would be a Thunderbolt PCIe enclosure and a 100Gb+ NIC, and passive DAC cables. The tricky part would be macOS drivers for said NIC.

zackangelo · 2025-12-12T23:26:39 1765581999

You’re right I misunderstood.

I’m not sure if it would be of much utility because this would presumably be for tensor parallel workloads. In that case you want the ranks in your cluster to be uniform or else everything will be forced to run at the speed of the slowest rank.

You could run pipeline parallel but not sure it’d be that much better than what we already have.
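
A toy model of the "slowest rank" point, for illustration only. The function name and the timings are hypothetical; the only assumption baked in is that every rank must finish its slice of a layer before the results can be combined.

```python
def tensor_parallel_step_time(per_rank_seconds, sync_seconds=0.0):
    """Time for one synchronized step: every rank waits for the slowest one."""
    return max(per_rank_seconds) + sync_seconds


# Hypothetical per-layer compute times: uniform ranks vs. one slower rank.
uniform = [0.010, 0.010, 0.010, 0.010]
mixed = [0.010, 0.010, 0.010, 0.025]

print(tensor_parallel_step_time(uniform))  # 0.01
print(tensor_parallel_step_time(mixed))    # 0.025
```

In the mixed case the three fast ranks sit idle for most of each step, so the faster machines buy you very little.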

piskov · 2025-12-12T23:22:32 1765581752

George Hotz got NVIDIA GPUs running on Macs with his tinygrad via USB4

https://x.com/__tinygrad__/status/1980082660920918045

throawayonthe · 2025-12-13T00:05:00 1765584300

https://social.treehouse.systems/@janne/115509948515319437 nvidia on a 2023 Mac Pro running linux :p

piskov · 2025-12-13T00:14:49 1765584889

Geohotz stuff anyone can run today

yalogin · 2025-12-13T03:26:10 1765596370

As someone who is not familiar with RDMA, does it mean I can connect multiple Macs and run inference? If so it’s great!

wmf · 2025-12-13T04:08:45 1765598925

You've been able to run inference on multiple Macs for around a year but now it's much faster.

nottorp · 2025-12-13T08:11:12 1765613472

It's good to sell shovels :)

650REDHAIR · 2025-12-12T23:02:53 1765580573

Do we think TB4 is on the table or is there a technical limitation?

cluckindan · 2025-12-12T23:40:32 1765582832

This sounds like a plug’n’play physical attack vector.

guiand · 2025-12-12T23:50:16 1765583416

For security, the feature requires setting a special option with the recovery mode command line:

rdma_ctl enable

nickysielicki · 2025-12-13T06:47:34 1765608454

This is such a weird project. Like where is this running at scale? Where’s the realistic plan to ever run this at scale? What’s the end goal here?

Don’t get me wrong... It’s super cool, but I fail to understand why money is being spent on this.

aurareturn · 2025-12-13T08:07:37 1765613257

The end goal is that Macs become good local LLM inference machines and for AI devs to keep using Macs.

nickysielicki · 2025-12-13T08:11:25 1765613485

The former will never happen and the latter is a certainty.

aurareturn · 2025-12-13T08:28:53 1765614533

The former is already true and will become even more true when M5 Pro/Max/Ultra release.

daft_pink · 2025-12-12T22:44:06 1765579446

Hoping Apple has secured plentiful DDR5 to use in their machines so we can buy M5 chips with massive amounts of RAM soon.

colechristensen · 2025-12-12T22:48:41 1765579721

Apple tends to book its fab time / supplier capacity years in advance

lossolo · 2025-12-12T23:30:52 1765582252

I hope so, I want to replace my M1 Pro with a MacBook Pro with M5 Pro when they release it next year.

colechristensen · 2025-12-13T03:32:10 1765596730

I mostly want the M5 Pro because my choice of an M4 Air this year with 24 GB of RAM is turning out to be less than I want with the things I'm doing these days.

thatwasunusual · 2025-12-13T01:32:32 1765589552

Can someone do an ELI5 and explain why this is important?

wmf · 2025-12-13T01:59:16 1765591156

It's faster and lower latency than standard Thunderbolt networking. Low latency makes AI clusters faster.

novok · 2025-12-12T21:50:57 1765576257

Now we need some hardware that is rackmount friendly and an OS that is not fiddly as hell to manage in a data center or as a headless server, and we are off to the races! And no, custom racks are not 'rackmount friendly'.

joeframbach · 2025-12-12T22:07:45 1765577265

So, the Powerbook Duo Dock?

pstuart · 2025-12-12T21:39:11 1765575551

I imagine that M5 Ultra with Thunderbolt 5 could be a decent contender for building plug and play AI clusters. Not cheap, but neither is Nvidia.

baq · 2025-12-12T22:07:54 1765577274

at current memory prices today's cheap is yesterday's obscenely expensive - Apple's current RAM upgrade prices are cheap

whimsicalism · 2025-12-12T21:45:00 1765575900

nvidia is absolutely cheaper per flop

FlacksonFive · 2025-12-12T22:03:35 1765577015

To acquire, maybe, but to power?

whimsicalism · 2025-12-12T22:22:38 1765578158

machine capex currently dominates power

amazingman · 2025-12-12T22:44:53 1765579493

Sounds like an ecosystem ripe for horizontally scaling cheaper hardware.

crote · 2025-12-12T23:00:37 1765580437

If I understand correctly, a big problem is that the calculation isn't embarrassingly parallel: the various chunks are not independent, so you need to do a lot of IO to get the results from step N from your neighbours to calculate step N+1.

Using more smaller nodes means your cross-node IO is going to explode. You might save money on your compute hardware, but I wouldn't be surprised if you'd end up with an even greater cost increase on the network hardware side.
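
A rough sketch of why splitting across more, smaller nodes shifts the cost toward the network. This is a toy model, not a measurement: it assumes the nodes synchronize with a ring all-reduce (where each node sends 2(n-1)/n times the payload), that compute divides evenly, and it uses arbitrary units. The function names are made up for illustration.

```python
def ring_allreduce_bytes_per_node(payload_bytes, nodes):
    """Bytes each node sends in a ring all-reduce of `payload_bytes`."""
    if nodes < 2:
        return 0.0
    return 2 * (nodes - 1) / nodes * payload_bytes


def comm_to_compute_ratio(payload_bytes, nodes, total_compute_units):
    """Communication per node divided by that node's share of the compute."""
    compute_per_node = total_compute_units / nodes
    return ring_allreduce_bytes_per_node(payload_bytes, nodes) / compute_per_node


for n in (2, 4, 8, 16):
    print(n, comm_to_compute_ratio(payload_bytes=1.0, nodes=n, total_compute_units=1.0))
```

In this model each node's compute shrinks as 1/n while its traffic stays roughly constant, so the communication-to-compute ratio grows as 2(n-1).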

adastra22 · 2025-12-12T22:14:41 1765577681

FLOPS are not what matters here.

whimsicalism · 2025-12-12T22:21:50 1765578110

also cheaper memory bandwidth. where are you claiming that M5 wins?

Infernal · 2025-12-12T22:26:55 1765578415

I'm not sure where else you can get a half TB of 800GB/s memory for < $10k. (Though that's the M3 Ultra, don't know about the M5). Is there something competitive in the nvidia ecosystem?

whimsicalism · 2025-12-12T22:35:34 1765578934

I wasn't aware that M3 Ultra offered a half terabyte of unified memory, but an RTX5090 has double that bandwidth and that's before we even get into B200 (~8TB/s).

650REDHAIR · 2025-12-12T22:56:46 1765580206

You could get x1 M3 Ultra w/ 512gb of unified ram for the price of x2 RTX 5090 totaling 64gb of vram not including the cost of a rig capable of utilizing x2 RTX 5090.

bigyabai · 2025-12-13T00:55:22 1765587322

Which would almost be great, if the M3 Ultra's GPU wasn't ~3x weaker than a single 5090: https://browser.geekbench.com/opencl-benchmarks

I don't think I can recommend the Mac Studio for AI inference until the M5 comes out. And even then, it remains to be seen how fast those GPUs are or if we even get an Ultra chip at all.

adastra22 · 2025-12-13T04:41:24 1765600884

Again, memory bandwidth is pretty much all that matters here. During inference or training the CUDA cores of retail GPUs are like 15% utilized.
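
A back-of-the-envelope sketch of the bandwidth argument. It assumes decode speed is bounded by how fast the weights touched per token can be streamed from memory, and it ignores KV cache traffic, compute, and batching. The 800 GB/s figure is the M3 Ultra number quoted upthread and 1600 GB/s is "double that"; the 400 GB of weights read per token is a made-up example, not a real model (and would not fit in a single 5090's VRAM anyway).

```python
def decode_tokens_per_second_bound(bandwidth_gb_s, gb_read_per_token):
    """Upper bound on decode speed if each token must stream this much memory."""
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical workload: 400 GB of weights read for every generated token.
print(decode_tokens_per_second_bound(800, 400))   # 2.0
print(decode_tokens_per_second_bound(1600, 400))  # 4.0
```

Under these assumptions, doubling the bandwidth doubles the ceiling regardless of how many FLOPS are available.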

kjkjadksj · 2025-12-13T04:30:26 1765600226

Remember when they enabled egpu over thunderbolt and no one cared because the thunderbolt housing cost almost as much as your macbook outright? Yeah. Thunderbolt is a racket. It’s a god damned cord. Why is it $50.

wmf · 2025-12-13T05:13:31 1765602811

In this case Thunderbolt is much much cheaper than 100G Ethernet.

(The cord is $50 because it contains two active chips BTW.)

geerlingguy · 2025-12-13T06:19:45 1765606785

Yeah, even decent 40 Gbps QSFP+ DAC cables are usually $30+, and those don't have active electronics in them like Thunderbolt does.

The ability to also deliver 240W (IIRC?) over the same cable is also a bit different here, it's more like FireWire than a standard networking cable.

0manrho · 2025-12-13T03:59:06 1765598346

Just for reference:

Thunderbolt 5's stated "80Gbps" bandwidth comes with some caveats. That figure is either DisplayPort bandwidth by itself or, more often in practice, the result of combining the data channel (PCIe4x4 ~=64Gbps) with the display channels (<=80Gbps if used in concert with data channels), and it can potentially also do unidirectional 120Gbps of data for some display output scenarios.

If Apple's silicon follows spec, then that means you're most likely limited to PCIe4x4 ~=64Gbps bandwidth per TB port, with a slight latency hit due to the controller. That latency hit is ItDepends(TM), but if not using any other IO on that controller/cable (such as DisplayPort), it's likely to be less than 15% overhead vs native on average. Depending on drivers, firmware, configuration, use case, cable length, how Apple implemented TB5, etc., exact figures vary. And just like how a 60FPS average doesn't mean every frame is exactly 1/60th of a second long, it's entirely possible that individual packets or niche scenarios could see significantly more latency/overhead.
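
For reference, the ~64Gbps figure falls out of the PCIe 4.0 link parameters. This sketch only accounts for the 128b/130b line encoding; packet headers, flow control, and any Thunderbolt tunnelling overhead would reduce the usable number further, and nothing here is specific to how Apple implemented TB5.

```python
GT_PER_S_PER_LANE = 16           # PCIe 4.0 raw signalling rate per lane
LANES = 4                        # an x4 link
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding

raw_gbps = GT_PER_S_PER_LANE * LANES
usable_gbps = raw_gbps * ENCODING_EFFICIENCY
usable_gbytes = usable_gbps / 8

print(f"raw: {raw_gbps} Gbps")
print(f"after encoding: {usable_gbps:.1f} Gbps (~{usable_gbytes:.2f} GB/s)")
```

That works out to roughly 63 Gbps, or a little under 8 GB/s, per direction before protocol overhead.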

As a point of reference, Nvidia RTX Pro (formerly known as Quadro) workstation cards of Ada generation and older, along with most modern consumer graphics cards, are PCIe4 (or less, depending on how old we're talking), and the new RTX Pro Blackwell cards are PCIe5. Though comparing a Mac Studio M4 Max, for example, to an Nvidia GPU is akin to comparing Apples to Green Oranges.

However, I mention the GPUs not just to recognize the 800lb AI compute gorilla in the room, but also because, while it's possible to pool a pair of 24GB VRAM GPUs to achieve a 48GB VRAM pool between them (be it through a shared PCIe bus or over NVLink), the performance does not scale linearly due to PCIe/NVLink's limitations, to say nothing of the software, configuration and optimization side of things also being a challenge to realizing max throughput in practice.

The same is true of a pair of TB5-equipped Macs with 128GB of memory each: using TB5 to achieve a 256GB pool will take a substantial performance hit compared to an otherwise equivalent Mac with 256GB. (Capacities chosen are arbitrary, to illustrate the point.) The exact penalty really depends on the use case and how sensitive it is to the latency overhead of using TB5 as well as the bandwidth limitation.

It's also worth noting that it's entirely possible with RDMA solutions (no matter the specifics) to see worse performance than using a single machine if you haven't properly optimized and configured things. This is not hating on the technology, but a warning from experience for people who may have never dabbled: don't expect things to "2x", or even just beat 1x performance, simply by stringing a cable between two devices.

All that said, glad to see this from Apple. Long overdue in my opinion, as I doubt we'll see them implement an optical network port with anywhere near that bandwidth or RoCEv2 support, much less expose a native (not via TB) PCIe port on anything that's a non-Pro model.

EDIT: Note, many Mac SKUs have multiple TB5 ports, but it's unclear to me what the underlying architecture/topology is there, so I can't speculate on what kind of overhead or total capacity any given device supports when using multiple TB links for more bandwidth/parallelism. If anyone's got an SoC diagram or similar reference data that actually tells us how the TB controller(s) are uplinked to the rest of the SoC, I could go into more depth there. I'm not an Apple silicon/macOS expert. I do however have lots of experience with RDMA/RoCE/IB clusters, NVMeoF deployments, SXM/NVLink'd devices and generally engineering low latency/high performance network fabrics for distributed compute and storage (primarily on the infrastructure/hardware/ops side rather than the software side), so this is my general wheelhouse, but Apple has been a relative blind spot for me due to their ecosystem generally lacking features/support for things like this.

sebnukem2 · 2025-12-13T01:14:40 1765588480

I didn't know they skipped 10 version numbers.

badc0ffee · 2025-12-13T02:00:13 1765591213

They switched to using the year.

ComputerGuru · 2025-12-12T22:59:04 1765580344

Imagine if the Xserve was never killed off. Discontinued 14 years ago, now!

icedchai · 2025-12-13T00:07:53 1765584473

If it was still around, it would probably still be stuck on M2, just like the Mac Pro.

reaperducer · 2025-12-12T22:45:13 1765579513

As someone not involved in this space at all, is this similar to the old MacOS Xgrid?

https://en.wikipedia.org/wiki/Xgrid

wmf · 2025-12-12T23:27:16 1765582036

londons_explore · 2025-12-12T23:34:24 1765582464

Nobody's gonna take them seriously till they make something that's rack mounted and isn't made of titanium with pentalobe screws...

moralestapia · 2025-12-12T23:38:25 1765582705

You might ignore this but, for a while, Mac Mini clusters were a thing and they were capex and opex effective. That same setup is kind of making a comeback.

fennecbutt · 2025-12-13T01:17:43 1765588663

They were only a thing for CI/compilation related to Apple's OSes, because their walled garden locked other platforms out. You're building an iPhone or Mac app? Well, your CI needs to be on a cluster of Apple machines.

londons_explore · 2025-12-12T23:43:45 1765583025

It's in a similar vein to the PS2 Linux cluster or someone trying to use vape CPUs as web servers...

It might be cost effective, but the supplier is still saying "you get no support, and in fact we might even put roadblocks in your way because you aren't the target customer".

moralestapia · 2025-12-13T01:03:45 1765587825

True.

I'm sure Apple could make a killing on the server side, unfortunately their income from their other products is so big that even if that's a 10B/year opportunity they'll be like "yawn, yeah, whatever".

fennecbutt · 2025-12-13T01:19:17 1765588757

Doubt. A 10B idea is still a promotion. And if capitalism is shrinkflationing hard, which it is atm, then capitalists would not leave something like that on the table.

givemeethekeys · 2025-12-12T22:41:05 1765579265

Would this also work for gaming?

AndroTux · 2025-12-12T22:50:18 1765579818

jeffbee · 2025-12-12T21:50:37 1765576237

Very cool. It requires a fully-connected mesh so the scaling limit here would seem to be 6 Mac Studio M3 Ultra, up to 3TB of unified memory to work with.
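
A small sketch of the full-mesh arithmetic behind that estimate. The 6-node count and 512GB per node come from this thread; the helper name is hypothetical, and it does not check how many Thunderbolt ports a given Mac actually has.

```python
def full_mesh(nodes, memory_gb_per_node):
    """Ports per node, total cables, and pooled memory for a full mesh."""
    ports_per_node = nodes - 1           # one direct link to every other node
    cables = nodes * (nodes - 1) // 2    # each cable is shared by two nodes
    pooled_memory_gb = nodes * memory_gb_per_node
    return ports_per_node, cables, pooled_memory_gb


ports, cables, memory_gb = full_mesh(nodes=6, memory_gb_per_node=512)
print(ports, cables, memory_gb)  # 5 15 3072
```

So six nodes need 5 ports each and 15 cables in total, and the cable count grows quadratically if you try to go further.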

PunchyHamster · 2025-12-12T22:14:22 1765577662

I'm sure someone will figure out how to make a Thunderbolt switch/router

huslage · 2025-12-12T22:29:48 1765578588

I don't believe the standard supports such a thing. But I wonder if TB6 will.

kmeisthax · 2025-12-12T23:47:25 1765583245

RDMA is a networking standard, it's supposed to be switched. The reason why it's being done over Thunderbolt is that it's the only cheap/prosumer I/O standard with enough bandwidth to make this work. Like, 100Gbit Ethernet cards are several hundred dollars minimum, for two ports, and you have to deal with SFP+ cabling. Thunderbolt is just way nicer[0].

The way this capability is exposed in the OS is that the computers negotiate an Ethernet bridge on top of the TB link. I suspect they're actually exposing PCIe Ethernet NICs to each other, but I'm not sure. But either way, a "Thunderbolt router" would just be a computer with a shitton of USB-C ports (in the same way that an "Ethernet router" is just a computer with a shitton of Ethernet ports). I suspect the biggest hurdle would actually just be sourcing an SoC with a lot of switching fabric but not a lot of compute. Like, you'd need Threadripper levels of connectivity but with like, one or two actual CPU cores.

[0] Like, last time I had to swap work laptops, I just plugged a TB cable between them and did an `rsync`.

bleepblap · 2025-12-13T01:40:47 1765590047

I think you might be swapping RDMA with RoCE - RDMA can happen entirely within a single node. For example between an NVME and a GPU.

wmf · 2025-12-13T02:01:24 1765591284

Within a single node it's just called DMA. RDMA is DMA over a network and RoCE is RDMA over Ethernet.

bleepblap · 2025-12-13T02:56:41 1765594601

Sorry, but it certainly isn't--

https://docs.nvidia.com/cuda/gpudirect-rdma/index.html

The "R" in RDMA means there are multiple DMA controllers that can "transparently" share address spaces. You can certainly share address spaces across nodes with RoCE or Infiniband, but that's a layer on top.

wtallis · 2025-12-13T04:42:38 1765600958

I don't know why that NVIDIA document is wrong, but the established term for doing DMA from e.g. an NVMe SSD to a GPU within a single system, without the CPU initiating the transfer, is peer-to-peer DMA. RDMA is when your data leaves the local machine's PCIe fabric.

wmf · 2025-12-13T04:01:42 1765598502

I'm going to agree to disagree with Nvidia here.