Perhaps I've drunk too much of the Unix and Heroku 12-factor kool-aid, but I'm s...

twic · on April 13, 2014

I'm struggling to find a way to say this that doesn't come across as either bitchy or trolling, but none of this is exactly surprising coming from Puppet Labs. Puppet is hardly a poster child for simplicity. The people behind Puppet are smart, and very capable implementers, but simplicity and modularity have never been goals of theirs.

discreteevent · on April 13, 2014

Debugging: You can step from one service into another in the same debugging sesssion.

Profiling: Analyse one snapshot to find the main bottleneck across a number of services.

Performance, security etc. etc. Basically the ability to use a lot of the things that make the JVM powerful and save you time and money. (Not only that but the things that make clojure powerful like its concurrency mechanisms etc across services)

Besides one process is often simpler than many. Interprocess communication suffers from some of the "Fallacies of Distributed Computing."

If you like to keep your UNIX process mental model just think of JVM threads as processes.

(One other thing: Running services in multiple JVMs uses more heap than multiple services in one)

dsl · on April 13, 2014

> One other thing: Running services in multiple JVMs uses more heap than multiple services in one

That sounds like a failure of the JVM. You are also putting multiple independent services in the same fault domain.

Tuna-Fish · on April 14, 2014

> That sounds like a failure of the JVM.

That is a natural feature of garbage collection. The asymptotic complexity of copying garbage collection is O(complexity of live object graph)/(total available heap size). That is, having more heap available makes it consume less computational resources, up to an including making many real loads have near-zero garbage collection cost if the heap is large enough. Having three individual processes share a heap can make them actually do less work than if they each had a static third.

Modern operating systems are not set up to exploit this. There has actually been some work done on Linux user space reclaim for it but it's not yet usable. Until there is a way to tell the kernel that you want to use every last free byte on the machine if possible but can likely free most of it when any other process needs it, to make best use of the machine you will have to roll your own in userspace.

> You are also putting multiple independent services in the same fault domain.

In practice, unless you are doing a few stupid things (which most places don't let you do), it's extremely uncommon for whole JVM processes to go down. With a bit of work you can make almost all faults only take out individual tasks.

Not that I personally like JVM. I'd much prefer to do work on the actual OS, not wrap it all and pretend it doesn't exist. However, the people who build and use it are not idiots -- there is an actual method to their madness.

syjer · on April 13, 2014

Actually, there is already (proprietary) multitenancy support for the jvm that provide full isolation (minus JNI calls). See http://www.ibm.com/developerworks/library/j-multitenant-java... .

IIRC I saw some talks about introducing it in java9.

eikenberry · on April 13, 2014

I agree with your points to a certain extent, but if you understand things from a Java mindset the features make sense.

The thing to understand is that, much like Smalltalk, the JVM is its own world. The underlying OS is only there to bootstrap the JVM, there is no Unix underneath. So every time you think they would do better to use some feature of the underlying OS you are breaking with the Java way of doing things.

twic · on April 13, 2014

a Java mindset

I'd just like to point out that this is not the only Java mindset. It's certainly true that Java has a long and horrible tradition of people thinking that everything and the kitchen sink should go in the app server (which can then be managed with a SOAP interface!), but this is not intrinsic to Java, and it's not the only way to work with Java. Some of us are actually pretty keen on the 12 factor approach.

mwcampbell · on April 13, 2014

I appreciate that many JVM proponents treat the JVM as a world to itself. I guess the JVM community is big enough that it can afford to be that insular; that just makes it seem all the more foreign to the rest of us. Still, it seems to me that pragmatic developers can take advantage of the good parts of the JVM while rejecting the NIH syndrome. Puppet Labs, in particular, can rely on Unix being the OS underneath.

weavejester · on April 14, 2014

This is a Clojure framework though, so the Java mindset doesn't apply.

noelwelsh · on April 14, 2014

Heroku's architecture makes sense when you bill per process and primarily target a slow runtime like MRI. If you want low latency you want as little as possible between your HTTP endpoint and the actual processing. You don't need HAProxy, SSL termination daemons, and so on when the JVM has the grunt to do this itself, and you end up lower latency as a bonus. It is also to my mind a simpler system. All things being equal I would rather configure and monitor one system than a zillion.

cprice404 · on April 14, 2014

One minor point of clarification: this framework is not intended to prescribe that we (Puppet Labs) or anyone else should move all of their services into a giant monolothic process, a la JavaEE. In fact, that's one of the main reasons we decided to build the framework rather than use an existing solution.

Trapperkeeper is simply intended to give us the flexibility to decide how to bundle different services together. If we have 2 or 3 tiny web services that can benefit from some shared state and are lightweight enough that they don't really warrant running a separate web server for each one, now we can choose to deploy them that way. But, absolutely, we can still run more than one instance of this framework on the same machine, to reap the benefits of truly isolated OS processes in cases where that is appropriate.

Another minor point to highlight is that, at Puppet Labs, we're building on-premise software that our customers install and administer on their own networks. We're not building SaaS products that we administer ourselves. This means that we need to be able to support a wider range of deployment configurations, so that we can tailor a particular installation to the needs of the particular user who is installing it. For some users, they may have enough traffic to warrant peeling out every individual service to its own process and distributing them across many nodes. For other users, they may have a very tiny network where it is perfectly sufficient (and much simpler) to run everything in a small number of processes on a single node. The new framework is all about giving us the ability to make those choices more dynamically--not about prescribing a particular deployment model to all users.

cprice404 · on April 14, 2014

Also--we do indeed use the embedded Jetty model for serving web apps. It's entirely up to a user of the framework to decide how many apps they'd like to run in each embedded Jetty instance.