I have been doing software architecture work for a long time now, and as it turns out, the ‘right way’ of solving things is not always the best way. Below are two anecdotes from life in the trenches.
The Case of the Database Bottleneck
I was visiting a company where we discussed their current system design and the problems they were experiencing. Their system had a respectable peak of 7 000 concurrent users, and at those peak times they were starting to hit the limit of their database, which was running as a single instance.
We discussed the usual slew of database scaling solutions, such as sharding and dedicated reader nodes, and some pros and cons of each.
As it turned out, however, they solved it in their own way. “Yeah, we solved it,” they replied when I asked them about it. “We bought an SSD to replace the old hard drive. It is much faster now,” they said matter-of-factly.
Naturally I scoffed at this and thought to myself that they had only bought themselves a little bit of time. At best, they could grow by 50-100%, and then it would be the same issues all over again! Rookies! Surely they did not understand the beauty of unlimited, linear scaling with sharding?
Within less than a year, they had grown about 20% and were then bought up by a bigger player in the industry. As is customary, their system was erased from the face of the earth in favor of the larger one. They never hit the limit of the SSD.
Later on I also did some calculations: if they had grown by 100%, they would have become one of the top five actors in the market and their profits would have been through the roof. Their development budget would have been completely different by then.
So, looking back, it actually seems like the SSD solution was the right thing to do in this case. They only needed to buy some more time for the deal to come through.
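The numbers in this story can be put side by side in a few lines of Python. This is only a back-of-the-envelope sketch: the 7 000 peak and the 20% growth come from the story, while the 50% headroom is my own pessimistic guess from back then, not a measured limit of the SSD. The function and variable names are made up for illustration.

```python
def peak_after_growth(current_peak: int, growth_percent: int) -> float:
    """Return the peak number of concurrent users after growing by growth_percent."""
    return current_peak * (100 + growth_percent) / 100


CURRENT_PEAK = 7_000          # peak concurrent users, from the story
GUESSED_HEADROOM_PERCENT = 50  # assumption: the low end of my 50-100% guess
ACTUAL_GROWTH_PERCENT = 20     # growth before the acquisition, from the story

guessed_limit = peak_after_growth(CURRENT_PEAK, GUESSED_HEADROOM_PERCENT)
actual_peak = peak_after_growth(CURRENT_PEAK, ACTUAL_GROWTH_PERCENT)

# Share of the guessed headroom that was actually consumed.
headroom_used = (actual_peak - CURRENT_PEAK) / (guessed_limit - CURRENT_PEAK)

print(f"Guessed limit: {guessed_limit:.0f} users")
print(f"Actual peak at acquisition: {actual_peak:.0f} users")
print(f"Share of guessed headroom used: {headroom_used:.0%}")
```

Even with the pessimistic guess, they used less than half of the headroom the SSD bought them before the system was retired.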
The Case of the Missing Scheduler
Another case occurred when I was reviewing a large gaming network that was running cash games as well as tournaments. There were many tournaments running on a daily basis and most of them were recurring events, such as The Daily Lunch Tournament. Almost every gaming network I know has a scheduling option for tournaments. An administrator would enter a tournament template and then say something like ‘run every day at 12:00’. You would also be able to create a future tournament and say ‘start this tournament on October 10 at 18:00’. The system would then create and start the tournament as specified.
This network did not have that.
Instead, they had about 10 employees in Indonesia who would work in shifts and manually create each tournament and then manually click ‘start’ to start it. Nuts! This must surely be fixed!
So we started a discussion and I don’t remember my exact words, but they were something like: “This is insane! Surely we should be able to implement a simple scheduler in the system?”
To which they replied something like: “Sure. But we have made an estimate of the time it would take us, and the cost of developers on US salaries implementing this corresponds to about 7 years of the Indonesian guys doing this manually.”
Yikes.
“Besides, do you want to be the guy who calls them up and tells them and their families that they are losing their jobs? And for what? Saving a buck after 7 years? We have a chock-full backlog to work on anyway.”
Hmmm. Maybe it would not be worth cutting other features in order to prioritize a feature that would cost 10 people their jobs and not save any money for a long time. Could this be? What kind of socialist development company was this?
As you might have guessed by now, by dedicating their developers to other things rather than making the Indonesians redundant, they were able to dish out new features that actually attracted new players, which turned out to be very successful for the owners in the end.
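Their argument boils down to a break-even calculation, which can be sketched in a few lines of Python. I never saw their actual figures, so every amount below is hypothetical and chosen only so that the result matches the 7 years they quoted. The function name and the optional maintenance cost are my own additions for illustration.

```python
def break_even_years(build_cost: float,
                     yearly_manual_cost: float,
                     yearly_maintenance_cost: float = 0.0) -> float:
    """Years until the cost of building the automation is paid back.

    Returns infinity if the automation never saves money, i.e. when
    maintaining it costs at least as much as the manual work.
    """
    yearly_saving = yearly_manual_cost - yearly_maintenance_cost
    if yearly_saving <= 0:
        return float("inf")
    return build_cost / yearly_saving


# Hypothetical figures, NOT from the company in the story.
OPERATORS = 10                 # from the story
COST_PER_OPERATOR = 6_000      # assumed yearly cost per operator
SCHEDULER_BUILD_COST = 420_000  # assumed cost of US developers building it

yearly_manual_cost = OPERATORS * COST_PER_OPERATOR
years = break_even_years(SCHEDULER_BUILD_COST, yearly_manual_cost)
print(f"Break-even after {years:.1f} years")
```

Note that the sketch leaves out the part that mattered most in the end: what those developers could build instead during that time.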
Summary
Am I advocating that you shouldn’t care about scalability (just buy SSDs!) or never automate tasks because there is cheap labour to be found? Am I advocating quick hacks and avoiding solid engineering principles? Of course not.
But sometimes it is good to step back a bit and try to see what *actually* needs to be solved. As engineers we do sometimes get stuck on implementing the ‘right thing’ and lose sight of reality as it is. I know I do 😉
You can contact him at: fredrik.johansson(at)cubeia.com