Ask HN: How do startups restrict employees from accessing private user data?

patio11 · on June 20, 2014

One of the first things your DevOps team is going to do as you grow is make sure that employees don't have unrestricted access to the production DB / console access / root on the production web tier, since down that path lies madness.

This comes down partly to policy and partly to tech. The policy, disclosed early and often, is that misuse of customer data is an instant firing offense. Google, Facebook, etc. have indeed terminated people over this, often literally count-the-minutes after the misuse became known to other people at the company.

Tech-wise, it's spiritually similar to other security measures. You lock down access on a need-to-have basis, you log the heck out of extraordinary requests for access, and you audit those requests.

E.g., many companies will eventually develop a Use The Software As User X feature. At some, this requires you to a) be logged in as a privileged employee, then b) click to activate the feature, c) write an explanation of why you need access to User #12345's account, and d) check a box stating that you have received #12345's consent for this. (I know some companies that skip D, largely in B2C.) When you hit submit, that logs it to the DB and fires an email to the audits@ email address, which goes out to 5 different people, or pipes "Patrick just logged in as #12345 because [chasing down display bug -- customer reports unescaped HTML in the message window, can't reproduce on staging or with own account]" into your team's HipChat/etc channel.
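A minimal sketch of that flow in Python, in case it helps to see the shape of it. Everything here is hypothetical: the `ImpersonationAudit` class, its field names, and the in-memory lists standing in for the DB table and the audits@ email / chat channel are assumptions for illustration, not anyone's real implementation.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class ImpersonationAudit:
    """Hypothetical gatekeeper for a 'use the software as user X' feature."""
    privileged: set                                     # employees allowed to use the feature
    rows: list = field(default_factory=list)            # stand-in for the audit table
    notifications: list = field(default_factory=list)   # stand-in for audits@ / chat

    def request(self, employee, user_id, reason, has_consent):
        # (a) only privileged employees may impersonate
        if employee not in self.privileged:
            raise PermissionError(f"{employee} is not a privileged employee")
        # (c) a written explanation is mandatory
        if not reason.strip():
            raise ValueError("an explanation is required")
        # (d) the consent checkbox must be ticked
        if not has_consent:
            raise PermissionError("customer consent has not been confirmed")
        # log to the "DB" first, then notify the auditors
        self.rows.append({
            "employee": employee,
            "user_id": user_id,
            "reason": reason,
            "at": datetime.datetime.now(datetime.timezone.utc),
        })
        message = f"{employee} just logged in as #{user_id} because [{reason}]"
        self.notifications.append(message)
        return message
```

In a real app the `rows.append` would be a DB insert and the notification would go to email or a chat webhook; the point is only that the request is refused unless every step is satisfied, and nothing happens without leaving a record.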

GuiA · on June 20, 2014

Thanks, that's super helpful.

While we're not big enough yet to have a dedicated dev ops team to implement the first things you've described, the logging approach is totally realistic. I especially like the idea of having a bot that monitors extraordinary admin actions and pipes them into the group chat.

Thanks again!

caw · on June 20, 2014

From a more technical basis (being a devops, and having worked in least privilege environments), the big thing you need is centralized logging, and from there you can do what you need. Whether it's syslog or logstash or something else, if you get the logs in one place you can then filter over them for instances that you'd need to alert via email or chatbot.
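As a rough illustration of "filter over them," here is a small Python sketch that scans collected log lines for sudo entries and turns them into alert messages. The regex assumes the common syslog layout that sudo writes (`user : TTY=... ; PWD=... ; USER=... ; COMMAND=...`); the exact format varies by distro and configuration, and the function name `sudo_alerts` is made up for this example.

```python
import re

# Assumed layout of a sudo syslog entry; adjust to whatever your hosts actually emit.
SUDO_LINE = re.compile(
    r"sudo:\s+(?P<who>\S+) : .*USER=(?P<target>\S+) ; COMMAND=(?P<command>.+)$"
)


def sudo_alerts(lines):
    """Return one human-readable alert per sudo entry found in the given lines."""
    alerts = []
    for line in lines:
        match = SUDO_LINE.search(line)
        if match:
            alerts.append(
                f"{match['who']} ran {match['command']} as {match['target']}"
            )
    return alerts
```

The returned strings could then be handed to whatever does the emailing or chat posting.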

That works great and all until you realize only "sudo" is logged and not root terminal actions, and even then root could delete any logs of its actions. That's why something immediately shipping off logs is nice. I like "rootsh" (available on SourceForge) for forcing any sudo users to either use "sudo" or "sudo rootsh" to get a root terminal. You're not preventing anyone from doing their jobs, you just have an audit trail. Someone asked me why you need an audit trail unless it's to fire people for doing something wrong -- no it's for root cause and preventing certain operator errors from happening again, and in case of maliciousness from either an employee or someone impersonating an employee.

The one other big thing to do is get rid of any shared accounts that can access data. If it's AWS, gen up some keys for each user/application or use IAM roles for the hosts. If it's Linux accounts, separate out the accounts. If you must have a singular account for something, only allow sudo access to switch to them. Going back to the previous paragraph, you'll at least get a log of who switched to the shared account.

If you want to chat more about this, my email is in my profile.

dennisgorelik · on June 20, 2014

Access to production environment is not the only issue here.

Proper testing in a development environment is not very practical without restoring a backup from the production environment.

So typically, even if developers cannot change production data, they can still read a few-weeks-old copy from the production environment.

Trimming production data after restoring it to development may improve security a little bit, but such trimming is a project in itself and requires constant maintenance of that trimming code. In addition to that, trimming production backup may introduce new bugs in development and make production bugs not reproducible in development environment.

patio11 · on June 20, 2014

I would strenuously, strenuously argue against ever, at any stage of the company, at any kind of company, with any data of any significance whatsoever, letting developers initialize dev environments with production backups.

That's not a disaster waiting to happen. It's a fractal of disasters -- no matter which part of the idea you examine, it has an infinite number of disasters in it. It is an Ice-9 of disasters: you introduce it to perfectly sensible engineering decisions and they become disasters, too, just by contact with it. ("Do we let our developers work on laptops?" "Sure." Great, now our production database is one car window away from causing catastrophic reputational damage to the company.)

dennisgorelik · on June 20, 2014

In a small team there should be a single development database restored from backup.

That development database should be hosted on a development server, not on developers' laptops.

Developers are welcome to VPN into corporate network from their laptops in order to work with database.

Any other issues?

collyw · on June 20, 2014

That doesn't make sense. What if people want to change the database schema, and 4 developers are working on it at once?

dennisgorelik · on June 21, 2014

Breaking database changes do not happen often. There could be multiple solutions to that:

1) Other developers wait.

2) Affected functionality is turned off while in transition.

3) Changes implemented in a non-breaking way.

4) Change is done quickly.

5) Separate database environment is set up in order to prepare change.

In any case, the harder problem is to deploy that database change into production when thousands of users are working on it at once.

Practicing that change in development environment helps to make production deployment smoother.

collyw · on June 22, 2014

Fair enough, I am guessing your work is at a different scale from mine. We are "super agile" (constantly changing) due to the changing nature of the sequencing technology we work with.

opendais · on June 20, 2014

What we do at work is every developer has their own VM on an on-premise server in a locked room that is rebuilt/destroyed as needed with the full stack.

So maybe he just wasn't being clear?

jlawer · on June 20, 2014

I've really only seen 2 core approaches to this:

1.) Free Access to everything. You trust everyone and hope it works out right. This is the simplest solution, but offers nothing to prevent someone violating that trust.

2.) Lock production down to a trusted team. Typically this is done with a full dev / ops separation. Dev build the site and test, Ops run it live and have access to the live database. You trust 3 people in ops fully and no one else. Locking down production often entrenches rivalries between dev and ops, and makes debugging performance issues a PITA as dev typically don't have access to the dataset that is exhibiting the problem.

I've seen both work and both fail, and neither protects you from a trusted employee screwing you over.

What I have seen work rather well though is audit logging. Logging all access to the key systems and periodically (and randomly) auditing access. I've seen this done at the db level, system level and app level before. Basically the story is not that you will be prevented from doing something you shouldn't, but that if you do, you will be held accountable for it (typically on-the-spot termination). However, to be effective the company needs to be able to take the high ground and be consistent. This won't work if someone (co-founder, manager) is doing a similar thing and getting away with it.
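The "randomly auditing" part can be as simple as pulling a random sample of logged accesses for a human to review. A small Python sketch, assuming the access events have already been collected into a list (the function name `pick_for_audit` and the event representation are hypothetical):

```python
import random


def pick_for_audit(access_events, sample_size, rng=None):
    """Pick up to sample_size distinct events, uniformly at random, for manual review."""
    rng = rng or random.Random()
    size = min(sample_size, len(access_events))
    return rng.sample(access_events, size)
```

Passing in a seeded `random.Random` makes a given audit run reproducible; leaving it out keeps the selection unpredictable to the people being audited.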

As long as the data isn't high value (Credit Cards, etc) enough to make the opportunity cost worth it, knowing you will lose your job provides enough to make most hesitant to break the rules.

donavanm · on June 20, 2014

Trust but verify, i.e. tamper-resistant auditing, works for low-sensitivity data. It's also an investment you'll never outgrow. When you get larger, or more sensitive, it's time to implement something like a Two Man Rule. Take your pick of implementations of Shamir's Secret Sharing. And lastly, my favorite, operators of the service cannot access customer data without explicit permission from the customer. A friend implemented this, Grendel, as an internal service at Wesabe.
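For anyone who hasn't seen it, here is a toy Python sketch of Shamir's Secret Sharing used as a Two Man Rule: the secret (say, a key that unlocks customer data) is split so that any two of three holders can reconstruct it, but one alone cannot. The function names and the choice of prime are mine, it needs Python 3.8+ for the modular inverse, and it is for illustration only; use a vetted implementation for anything real.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split_secret(secret, threshold, shares):
    """Split an integer secret into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        result = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `split_secret(key, threshold=2, shares=3)`, each of three operators holds one point, and `recover_secret` needs any two of them together.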

Personally I dislike identity impersonation schemes, even between internal services. It leads to poor visibility and accountability. A proper Auth/Authz/RBAC scheme where the customer explicitly grants specific privileges to your internal service, and delegates, works better long term.

eshvk · on June 21, 2014

Also, what if you are a data scientist? Yes, you could encrypt User IDs. However, when even publicly anonymized datasets can be reverse engineered, surely, a person in charge of feature selection and access to company datasets could wreak havoc even under the restriction?
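To make the "encrypt User IDs" option concrete, here is one common way it is approximated, sketched in Python: replacing each ID with a keyed hash (HMAC), so analysts see a stable pseudonym but cannot map it back without the key. Note this is keyed hashing rather than encryption proper, the `pseudonymize` name and 16-character truncation are my own choices, and it does nothing about re-identification from the other columns.

```python
import hashlib
import hmac


def pseudonymize(user_id, key):
    """Map a user ID to a stable pseudonym that cannot be reversed without `key` (bytes)."""
    digest = hmac.new(key, str(user_id).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability
```

The same ID always maps to the same pseudonym under one key, so joins across tables still work, while rotating the key breaks the linkage.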

I am specifically curious how Facebook/Google/NFLX which are companies with massive datascience teams handle this.

edoceo · on June 20, 2014

My early stage employees and partners are too busy interviewing customers, writing code and getting shit done (or commenting on HN) to waste the time looking at inane things posted by GP.