Ask HN: How do startups restrict employees from accessing private user data?
28 points by GuiA on June 20, 2014 | 14 comments
How do startups which hold private user data deal with restricting access to that data to employees?

I'm not talking about data that's sensitive enough that it falls under a specific jurisdiction (credit cards, SSNs, etc.) - more data like private messages between friends, photos, and so on, which users consider to be private.

In the very early days of a startup, you don't really worry about this because you don't have the time, and you trust your cofounders enough to not snoop around on messages/documents/etc. that the few users you have consider to be private.

But as the employee count hits double digits and keeps going up, you probably shouldn't trust everyone to that level. Yet in most early-stage startups, all employees have command-line access to the production application, admin panels, databases, etc.

How do you solve this problem? Do you add restrictions to which trusted employees can access production services? Do you encrypt the user data you store?

If there are any insights about how larger companies have handled this, I'd love to hear them. Surely at Instagram, OkCupid, Facebook, etc. the average employee can't read the private messages of their ex-partner?



One of the first things your DevOps team is going to do as you grow is make sure that employees don't have unrestricted access to the production DB / console access / root on the production web tier, since down that path lies madness.

This comes down partly to policy and partly to tech. The policy, disclosed early and often, is that misuse of customer data is an instant firing offense. Google, Facebook, etc. have indeed terminated people over this, often literally within minutes of the misuse becoming known to other people at the company.

Tech-wise, it's spiritually similar to other security measures. You lock down access on a need-to-have basis, you log the heck out of extraordinary requests for access, and you audit those requests.

For example, many companies will eventually develop a Use The Software As User X feature. At some, this requires you to a) be logged in as a privileged employee, b) click to activate the feature, c) write an explanation of why you need access to User #12345's account, and d) tick a checkbox confirming that you have received #12345's consent. (I know some companies that skip D, largely in B2C.) When you hit submit, that logs it to the DB and fires an email to the audits@ email address, which goes out to 5 different people, or pipes "Patrick just logged in as #12345 because [chasing down display bug -- customer reports unescaped HTML in the message window, can't reproduce on staging or with own account]" into your team's HipChat/etc channel.
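A rough sketch of that flow in Python, in case it helps. Everything here is made up for illustration: `AuditLog` is an in-memory stand-in for the DB table plus the audits@ email or chat hook, and `start_impersonation` and its parameters are hypothetical names, not any particular company's implementation.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class ImpersonationRequest:
    employee: str
    target_user_id: int
    reason: str
    has_consent: bool
    requested_at: datetime.datetime = field(
        default_factory=lambda: datetime.datetime.now(datetime.timezone.utc)
    )


class AuditLog:
    """In-memory stand-in for the audit table and the email/chat notification."""

    def __init__(self):
        self.rows = []           # what would be written to the DB
        self.notifications = []  # what would go to audits@ or the team channel

    def record(self, request):
        self.rows.append(request)
        self.notifications.append(
            f"{request.employee} just logged in as #{request.target_user_id} "
            f"because [{request.reason}]"
        )


def start_impersonation(audit_log, employee, is_privileged, target_user_id,
                        reason, has_consent, require_consent=True):
    """Validate the request, write the audit record, then return it."""
    if not is_privileged:
        raise PermissionError("only privileged employees may impersonate users")
    if not reason.strip():
        raise ValueError("a written explanation is required")
    if require_consent and not has_consent:
        raise ValueError("customer consent must be confirmed")
    request = ImpersonationRequest(employee, target_user_id,
                                   reason.strip(), has_consent)
    audit_log.record(request)  # the record exists before any access is granted
    return request
```

The point of the ordering is that the audit record and notification are produced before the caller gets anything back, so there is no code path that impersonates without leaving a trail. Companies that skip step D would pass `require_consent=False`.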


Thanks, that's super helpful.

While we're not big enough yet to have a dedicated dev ops team to implement the first things you've described, the logging approach is totally realistic. I especially like the idea of having a bot that monitors extraordinary admin actions and pipes them into the group chat.

Thanks again!


From a more technical basis (being a devops, and having worked in least privilege environments), the big thing you need is centralized logging, and from there you can do what you need. Whether it's syslog or logstash or something else, if you get the logs in one place you can then filter over them for instances that you'd need to alert via email or chatbot.
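As a minimal sketch of the "filter over them" step, assuming the logs have already been shipped to one place and you can iterate over them line by line. The patterns and the `find_alerts` name are hypothetical; you would tune them to your own log format and wire the result to email or a chatbot yourself.

```python
import re

# Hypothetical patterns, not a recommended rule set: adjust to your own logs.
ALERT_PATTERNS = [
    re.compile(r"sudo: .*COMMAND="),           # a command run through sudo
    re.compile(r"impersonat", re.IGNORECASE),  # app-level "log in as user" events
]


def find_alerts(lines, patterns=ALERT_PATTERNS):
    """Return the log lines that match at least one alert pattern."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in patterns)
    ]
```

Usage would be something like `find_alerts(open("/path/to/central.log"))`, with the path being whatever your log aggregator writes to.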

That works great and all until you realize only "sudo" is logged and not root terminal actions, and even then root could delete any logs of its actions. That's why something that immediately ships logs off the host is nice. I like "rootsh" (available on SourceForge) for forcing any sudo users to either use "sudo" or "sudo rootsh" to get a root terminal. You're not preventing anyone from doing their jobs, you just have an audit trail. Someone asked me why you need an audit trail unless it's to fire people for doing something wrong. No, it's for root-cause analysis and preventing certain operator errors from happening again, and for cases of maliciousness from either an employee or someone impersonating an employee.

The one other big thing to do is get rid of any shared accounts that can access data. If it's AWS, gen up some keys for each user/application or use IAM roles for the hosts. If it's Linux accounts, separate out the accounts. If you must have a singular account for something, only allow sudo access to switch to them. Going back to the previous paragraph, you'll at least get a log of who switched to the shared account.

If you want to chat more about this, my email is in my profile.


Access to production environment is not the only issue here.

Proper testing in development environment is not very practical without restoring a backup from production environment.

So typically, even if developers cannot change production data, they can still read a copy of production data that is a few weeks old.

Trimming production data after restoring it to development may improve security a little bit, but such trimming is a project in itself and requires constant maintenance of that trimming code. In addition to that, trimming production backup may introduce new bugs in development and make production bugs not reproducible in development environment.


I would strenuously, strenuously argue against ever, at any stage of the company, at any kind of company, with any data of any significance whatsoever, letting developers initialize dev environments with production backups.

That's not a disaster waiting to happen. It's a fractal of disasters -- no matter which part of the idea you examine, it has an infinite number of disasters in it. It is an Ice-9 of disasters: you introduce it to perfectly sensible engineering decisions and they become disasters, too, just by contact with it. ("Do we let our developers work on laptops?" "Sure." Great, now our production database is one car window away from causing catastrophic reputational damage to the company.)


In a small team there should be a single development database restored from backup.

That development database should be hosted on development server, not on developers' laptops.

Developers are welcome to VPN into corporate network from their laptops in order to work with database.

Any other issues?


That doesn't make sense. What if people want to change the database schema, and 4 developers are working on it at once?


Breaking database changes do not happen often. There could be multiple solutions to that:

1) Other developers wait.

2) Affected functionality is turned off while in transition.

3) Changes implemented in a non-breaking way.

4) Change is done quickly.

5) Separate database environment is set up in order to prepare change.

In any case, the harder problem is to deploy that database change into production when thousands of users are working on it at once.

Practicing that change in development environment helps to make production deployment smoother.


Fair enough, I am guessing your work is at a different scale from mine. We are "super agile" (constantly changing) due to the changing nature of the sequencing technology we work with.


What we do at work is every developer has their own VM on an on-premise server in a locked room that is rebuilt/destroyed as needed with the full stack.

So maybe he just wasn't being clear?


I've really only seen 2 core approaches to this:

1.) Free Access to everything. You trust everyone and hope it works out right. This is the simplest solution, but offers nothing to prevent someone violating that trust.

2.) Lock production down to a trusted team. Typically this is done with a full dev / ops separation. Dev build the site and test, Ops run it live and have access to the live database. You trust 3 people in ops fully and no one else. Locking down production often entrenches rivalries between dev and ops, and makes debugging performance issues a PITA as dev typically don't have access to the dataset that is exhibiting the problem.

I've seen both work and both fail, and neither protects you from a trusted employee screwing you over.

What I have seen work rather well though is audit logging. Logging all access to the key systems and periodically (and randomly) auditing access. I've seen this done at the db level, system level and app level before. Basically the story is not that you will be prevented from doing something you shouldn't, but that if you do, you will be held accountable for it (typically on-the-spot termination). However, to be effective the company needs to be able to take the high ground and be consistent. This won't work if someone (co-founder, manager) is doing a similar thing and getting away with it.
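The "periodically (and randomly) auditing" part can be as simple as pulling a random sample of access events for a human to review. A minimal sketch, assuming the events are already collected in a list; `sample_for_audit` and the 10% default are assumptions of mine, not a standard.

```python
import random


def sample_for_audit(access_events, fraction=0.1, rng=None):
    """Pick a random subset of access events for manual review."""
    if not access_events:
        return []
    rng = rng or random.Random()
    # Always review at least one event, and never more than exist.
    k = min(len(access_events), max(1, round(len(access_events) * fraction)))
    return rng.sample(access_events, k)
```

Because nobody knows in advance which accesses will be reviewed, even a small fraction keeps the accountability story credible.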

As long as the data isn't high value (Credit Cards, etc) enough to make the opportunity cost worth it, knowing you will lose your job is enough to make most people hesitant to break the rules.


Trust but verify, i.e. tamper-resistant auditing, works for low-sensitivity data. It's also an investment you'll never outgrow. When you get larger, or more sensitive, it's time to implement something like a Two Man Rule. Take your pick of implementations of Shamir's Secret Sharing. And lastly, my favorite: operators of the service cannot access customer data without explicit permission from the customer. A friend implemented this, Grendel, as an internal service at Wesabe.
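For anyone who hasn't seen Shamir's Secret Sharing, here is a toy sketch of the idea behind a two-man rule: split a key into shares so that any `threshold` of them recover it and fewer reveal nothing. This is for illustration only, with a field prime I picked myself and function names that are my own; for real use, take a vetted implementation rather than this.

```python
import secrets

# A Mersenne prime used as the field modulus (an assumption of this sketch).
# The secret must be an integer smaller than this.
PRIME = 2**127 - 1


def split_secret(secret, threshold, shares):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in range [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `split_secret(key, 2, 3)` you hand one share each to three people, and any two of them together can run `recover_secret` to unlock the data, while no single person can.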

Personally I dislike identity impersonation schemes, even between internal services. They lead to poor visibility and accountability. A proper Auth/Authz/RBAC scheme, where the customer explicitly grants specific privileges to your internal service and its delegates, works better long term.
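To make the explicit-grant idea concrete, here is a minimal sketch: access is denied unless the customer has granted a specific privilege to a specific service, and grants expire. `GrantStore` and its method names are hypothetical, and the in-memory dict stands in for whatever store you would really use.

```python
import time


class GrantStore:
    """Tracks privileges that customers have explicitly granted to services."""

    def __init__(self):
        # (customer_id, service, privilege) -> expiry timestamp in seconds
        self._grants = {}

    def grant(self, customer_id, service, privilege, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self._grants[(customer_id, service, privilege)] = now + ttl_seconds

    def revoke(self, customer_id, service, privilege):
        self._grants.pop((customer_id, service, privilege), None)

    def is_allowed(self, customer_id, service, privilege, now=None):
        """Deny by default: allowed only with a matching, unexpired grant."""
        now = time.time() if now is None else now
        expiry = self._grants.get((customer_id, service, privilege))
        return expiry is not None and now < expiry
```

Unlike impersonation, every check names the service and the privilege involved, which is what gives you the visibility and accountability.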


Also, what if you are a data scientist? Yes, you could encrypt User IDs. However, when even publicly anonymized datasets can be reverse engineered, surely, a person in charge of feature selection and access to company datasets could wreak havoc even under the restriction?

I am specifically curious how Facebook/Google/NFLX which are companies with massive datascience teams handle this.


My early stage employees and partners are too busy interviewing customers, writing code, and getting shit done (or commenting on HN) to waste the time looking at inane things posted by GP.



