Hacker News

System Ops (at scale, large company, lots of teams) in a nutshell: you did not build the system or choose what kind it is; it was most likely installed with vendor defaults by the system owner, and you were called in because it's misbehaving. Most likely 50 other people have touched various parts of it, some with skill, some without.

"Give me an IP, username and password - what OS is it?" is about all you start with, and you go from there. It's probably a critical system to someone, and everyone swears on a stack of Bibles that nobody did anything, touched anything, or made a change. You have very specific domain knowledge (kernel, GRUB, SAN/storage, systemd, D-Bus, etc.) and typically ask the system owner a lot of questions while your fingers are flying, ruling out the common, low-hanging-fruit causes.
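For the "what OS is it?" step, here is a minimal sketch, assuming the box is a Linux host that ships a standard os-release file (usually /etc/os-release). The helper name `parse_os_release` is made up for illustration, and it takes the file's text rather than a path so it can be tried without touching a real machine.

```python
import shlex


def parse_os_release(text: str) -> dict:
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Values may be shell-quoted, e.g. PRETTY_NAME="Debian GNU/Linux 12".
        parts = shlex.split(raw)
        info[key] = parts[0] if parts else ""
    return info
```

On a live host you would feed it something like `open("/etc/os-release").read()` and look at `ID` and `VERSION_ID` first. Anything that is not Linux needs a different first question.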



OK, so complain to the ops department that they need to unify their logs. That's their problem, not yours. If the company is big, I would expect them to be doing that anyway: either they coalesce around a journald-type thing that aggregates the logs locally, or they use a centralized service like Datadog, Splunk, etc. Edit: If you are ops, then this is your entire wheelhouse; you should be able to solve it at scale without messing everything up.


It has been my career experience that companies with 5+ digit employee counts more closely resemble Chiba City than the USS Enterprise.


Sure, but that's entirely the problem those centralized logging services were made to solve. You make it really easy for everyone in the company to put their logs in the right place.
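To make "really easy" concrete, here is a rough sketch of one way it could look, not how any particular vendor does it: a shared helper that every team calls instead of hand-rolling its own logging setup. The names `get_logger` and `JsonLineFormatter` are hypothetical, and the assumption is that some collector (a journald-type thing, or a vendor agent) picks up one JSON object per line from the process's output stream.

```python
import json
import logging


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "service": record.name,
            "msg": record.getMessage(),
        })


def get_logger(service: str, stream=None) -> logging.Logger:
    """Hypothetical company-wide helper: same format and destination for everyone."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:  # configure each named logger only once
        handler = logging.StreamHandler(stream)  # stream=None means stderr
        handler.setFormatter(JsonLineFormatter())
        logger.addHandler(handler)
    return logger
```

A team would then write `log = get_logger("billing")` and `log.info("payment %s accepted", order_id)`, and never have to think about where the line ends up. Shipping it to the central service stays ops' job.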



