“The site is slow” is an open-ended system design interview question. It is a collaborative problem-solving exercise that assesses the breadth of a candidate’s knowledge. At the same time, it lets the interviewer dive deep into specific topics and test how much depth a candidate has across a variety of problem domains, from networking to databases and distributed systems.
This mirrors what software engineers face on the job: it is a real-world problem for engineers at Amazon, Google, or any company that monitors the performance of its products. Even without proper monitoring in place, it is feedback that real customers may report. Countless situations relate to this problem.
The goal of this exercise is to surface the concepts you, as a candidate, need to read up on, so that you feel confident walking into your onsite interview.
Consider the following system: a site connected to a single relational database.
Customers complain that the site is slow. Your task is to investigate what’s wrong.
+------------+             +------------+
|    site    | ----------> |     db     |
+------------+             +------------+
Questions a candidate can explore include:
- Clarify what the product is about. What kind of data is expected to transit? How frequently? What is the expected TPS (transactions per second)? Could it be misuse of the service? If so, how can you mitigate the attack immediately?
- How many customers reported the problem? Is it localized? Region-specific? Account-specific?
- Could it be a UI-only problem, where rendering is slow? Is there even a UI, or is it a REST service?
- Are there metrics in place that track latency? Breakdown graphs (per endpoint, per dependency)?
- Is it reproducible? How?
- Any logs?
- Are the web servers receiving traffic? If not, could the problem be upstream, in a load balancer or reverse proxy? Any spillovers? SSL termination?
- What about the database? What does its traffic look like?
- Is database connection pooling in place? Is the pool large enough?
- Do we have to shard? Horizontal vs vertical? Relational vs NoSQL?
- Could it be coming from the SQL queries? Write queries vs read queries? Locking and concurrency?
- Are the DB indexes OK? What are indexes, anyway?
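For the latency metrics and breakdown graphs questions, a useful first step is to look at percentiles rather than averages, since a slow tail can hide behind a healthy mean. The sketch below is a minimal, illustrative Python example, assuming request logs can be reduced to hypothetical `(endpoint, latency_ms)` pairs. It uses the nearest-rank percentile definition; real monitoring systems often interpolate or use approximate data structures instead.

```python
import math
from collections import defaultdict


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("percentile of an empty sample set")
    ordered = sorted(samples)
    # Multiplying before dividing keeps the rank exact when p * n is a multiple of 100.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_breakdown(requests, percentiles=(50, 95, 99)):
    """Group (endpoint, latency_ms) pairs and report selected percentiles per endpoint."""
    by_endpoint = defaultdict(list)
    for endpoint, latency_ms in requests:
        by_endpoint[endpoint].append(latency_ms)
    return {
        endpoint: {p: percentile(latencies, p) for p in percentiles}
        for endpoint, latencies in by_endpoint.items()
    }
```

For example, `latency_breakdown([("/search", 120), ("/search", 900), ("/home", 15)])` returns `{"/search": {50: 120, 95: 900, 99: 900}, "/home": {50: 15, 95: 15, 99: 15}}`, the kind of per-endpoint view a breakdown graph plots over time.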
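On connection pooling: opening a fresh database connection per request adds connection setup and authentication cost, while an undersized pool makes requests queue for a free connection. Here is a minimal fixed-size pool sketch built on Python's standard `queue.Queue`, with in-memory `sqlite3` connections standing in for the real database only so the example runs on its own. The class and method names are hypothetical; production drivers and frameworks ship their own pools.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened up front and reused across requests."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=1.0):
        # Blocks for up to `timeout` seconds when every connection is checked out;
        # raises queue.Empty if none frees up, a hint that the pool is too small.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return the connection, even on errors

    def idle_count(self):
        return self._idle.qsize()


# sqlite3 in-memory connections stand in for a real database here.
pool = ConnectionPool(lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2)
with pool.connection() as conn:
    print(conn.execute("SELECT 1").fetchone())  # (1,)
```

If `connection()` frequently times out under load, that points either to a pool that is too small or to queries that hold connections for too long.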
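To make “What are indexes, anyway?” concrete: without an index, a lookup on a non-key column forces the database to scan every row, while with one it can search a B-tree instead. This sketch uses SQLite's `EXPLAIN QUERY PLAN` through Python's built-in `sqlite3` module to compare the plan before and after adding an index. The `users` table and `idx_users_email` index are hypothetical, and the exact wording of the plan output varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the human-readable detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
lookup = "SELECT * FROM users WHERE email = ?"

before = query_plan(conn, lookup, ("a@example.com",))  # expected: a full-table SCAN
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, lookup, ("a@example.com",))   # expected: a SEARCH via idx_users_email

print(before)
print(after)
```

Running the same check against a slow production query (in whatever `EXPLAIN` dialect the real database uses) is often the quickest way to spot a missing index.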
Possible follow-up questions are:
- The site was under attack. How do we prevent this in the future?
- Someone famous tweeted about our product and we got much more traffic than expected. How do we handle this?
- The interviewer can add constraints, such as a customer asking to keep costs low (you can no longer just throw 10 replicas at the problem).
- More caching?
- Move to cloud-based solutions? How would this help?
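For the attack and traffic-spike follow-ups, one common mitigation is rate limiting at the edge. The sketch below is a token-bucket limiter in Python: each client may burst up to `capacity` requests, then is held to `rate` requests per second. Keying buckets by client IP, the hypothetical `handle` function, and the injectable `clock` parameter are assumptions for illustration; in production this usually lives in a load balancer, API gateway, or shared store rather than in one process's memory.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, but never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical per-client limiter: one bucket per client IP address.
buckets = {}


def handle(client_ip):
    if client_ip not in buckets:
        buckets[client_ip] = TokenBucket(rate=5, capacity=10)
    return 200 if buckets[client_ip].allow() else 429  # 429 Too Many Requests
```

A token bucket tolerates short legitimate bursts (say, a page loading several resources) while still capping sustained abuse, which a fixed per-second counter handles less gracefully.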
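For “More caching?”, the cache-aside pattern keeps hot reads off the single database: check the cache first and query the database only on a miss. This is a minimal in-process sketch with a time-to-live, so cached entries eventually reflect database changes. The `loader` callback stands in for a hypothetical database query, and once there is more than one web server a shared cache such as Redis or Memcached would be the usual choice.

```python
import time


class TTLCache:
    """Cache-aside helper: serve fresh entries from memory, otherwise call the loader."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key, loader):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: no database round trip
        value = loader(key)  # cache miss or expired entry: go to the database
        self._entries[key] = (value, now + self.ttl)
        return value
```

The TTL is a trade-off: a longer one removes more load from the database but serves staler data, which matters when cost constraints rule out simply adding replicas.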
A candidate I recently mock-interviewed on mocki.co mentioned that he would store a password field in the database. When I challenged him on what exactly would be saved, we realized he was confusing encryption (AES, etc.) with hashing (SHA, salted hash, HMAC, etc.). We discussed the 2012 LinkedIn breach, in which passwords had been stored as unsalted SHA-1 hashes rather than salted hashes.
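To illustrate the distinction: encryption is reversible with a key, while a password store only needs to verify a password, never recover it. Below is a minimal sketch of salted password hashing using Python's standard-library `hashlib.pbkdf2_hmac`. The 16-byte salt, SHA-256, and the iteration count are illustrative assumptions rather than a prescription; dedicated schemes such as scrypt (`hashlib.scrypt`), bcrypt, or Argon2 are common alternatives.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption: tune to current guidance and your login latency budget


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). Store both; never store the plaintext password."""
    salt = os.urandom(16)  # a unique random salt per user defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)
```

Because each user gets a fresh random salt, identical passwords produce different stored digests, so a single precomputed table cannot crack them all at once, which is exactly what unsalted hashes allow.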
Another candidate said he would use HTTPS over HTTP, but claimed that when you log in to a website like amazon.com, you first authenticate over HTTP and only then does HTTPS kick in. This makes no sense once you consider the goal of HTTPS: credentials sent over plain HTTP travel unencrypted and can be intercepted before any secure connection is established.
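Since credentials must be encrypted from the first request that carries them, the login page and its form submission both need HTTPS. A common server-side safeguard is sketched below as hypothetical Python WSGI middleware: plain-HTTP requests are redirected to HTTPS before the wrapped app runs, and HTTPS responses carry a `Strict-Transport-Security` (HSTS) header so browsers go straight to HTTPS on later visits. The `https_only` name and the one-year `max-age` are assumptions; a redirect cannot protect data a client has already sent over plain HTTP, which is why HSTS matters.

```python
def https_only(app, hsts_max_age=31536000):
    """WSGI middleware: redirect plain HTTP to HTTPS and add HSTS to HTTPS responses."""

    def wrapper(environ, start_response):
        if environ.get("wsgi.url_scheme") != "https":
            host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "")
            location = "https://" + host + (environ.get("PATH_INFO") or "/")
            if environ.get("QUERY_STRING"):
                location += "?" + environ["QUERY_STRING"]
            # The wrapped app (e.g. a login form) is never served over plain HTTP.
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]

        def start_with_hsts(status, headers, exc_info=None):
            # Tell browsers to use HTTPS directly for the next `hsts_max_age` seconds.
            headers = list(headers) + [("Strict-Transport-Security", f"max-age={hsts_max_age}")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_hsts)

    return wrapper
```

Behind a load balancer that terminates SSL, the app may see every request as plain HTTP, so the scheme check would need to trust a forwarded header such as `X-Forwarded-Proto` set by that proxy instead.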
As stated earlier, this exercise helps identify gaps and areas for improvement before a system design interview.
Find someone to challenge you on every concept you mention. If you cannot find anyone, reach out to us on mocki.co and we will pair you with one of our engineers.