RentLikeAChampion.com allows homeowners to rent out their college-area homes to travelers over a football weekend.
How to prepare for overnight exposure to 6 million TV viewers: An interview with the CIO of Shark Tank startup Rent Like A Champion
Q: Who are you?
Mike Hostetler, Entrepreneur & Engineer, based in Chicago. I work with early-stage startups in various roles, helping them scale their technology and business.
For Rent Like A Champion, I served as CIO and sysadmin, prepping the site and application to withstand the onslaught of traffic expected from the company’s appearance on ABC’s Shark Tank.
Q: How did you come to work with Rent Like A Champion?
Jordan Curnes, one of the founders of Rent Like A Champion, is a friend.
I originally joined the team to facilitate a responsive redesign of https://rentlikeachampion.com. We finished that project up in May 2015 and immediately learned about the Shark Tank opportunity. As it solidified, we realized that the technology stack needed some attention.
We had to be sure the application and infrastructure held up under the unusual circumstances of airing on a popular TV show with a national audience.
Q: What was Rent Like A Champion facing, technologically, heading into Shark Tank?
The Shark Tank opportunity was unique because of the potential for such incredibly high load over a short period of time. Preparing for this spike, or “traffic profile” as we called it, directed every planning step we took at that point.
The thing about the Shark Tank hit was that it was incredibly difficult to prepare for. The number of variables at play was huge.
We knew that the show reached 7-8 million people an episode, but we didn’t have a feel for how many of those people would go on to visit the website, so we didn’t know how much traffic to expect.
We knew the outcome of the deal, but didn’t know how the segment would be edited or how long it would last.
We didn’t know if our segment would air early in the hour or late in the hour, which would have an impact on traffic.
The last unknown element was that we didn’t receive the official air date until two weeks out.
This created a “hurry up and wait” situation. We had no idea what to expect or when.
This meant we had to prepare for the worst-case scenario and have it on the shelf ready to execute, potentially weeks in advance. There are very real costs associated with this that had to be understood and managed.
It was certainly not a low-stress time for the business or the technology.
We were fortunate in that we were able to call other Shark Tank companies and ask for their advice. They were all very generous and shared their numbers with us. However, Rent Like A Champion’s application is a modern, fully customized web application, whereas most of the other companies we spoke with were more straightforward marketing and pre-interested eCommerce sites. This is a HUGE difference in complexity due to the dynamic nature of our application.
We had one chance to get it right. I called it the Space Shuttle Project because there were no do-overs. You either nail it or you don’t, and it will be highly publicized if you don’t.
Every “i” had to be dotted, every “t” had to be crossed, and the entire environment had to be stress-tested many times over.
Q: What did you see as the biggest challenge to solve in preparation for this type of unpredictable scale?
There was just one chance to get it right.
I knew I was playing quarterback and surrounded myself with a team of people who could help me block and tackle at every level. Each layer of our infrastructure was carefully planned, configured, deployed, and monitored, with alerts tuned to what we expected to happen. We worked extensively with SCTG’s team to make sure everything was prepped for the challenge that we faced with this traffic profile.
A second challenge, or opportunity, really, was caching. Caching proved to be our secret weapon. The Rent Like A Champion application is built on Rails, which is notorious for its inability to scale as a platform. We quickly realized that if we ran Rails without caching, we would need over 100 servers to meet the risk profile we were comfortable with. This was unrealistic from a cost and a technology perspective.
Q: How did you address this challenge?
The team at SCTG suggested we look very closely at Varnish. Varnish made our application an order of magnitude more complex, but it was our only realistic option for handling such a high load. SCTG introduced us to some trusted resources and tools, and we were able to integrate Varnish into our technology stack.
We did run a few tests using nginx (our web server of choice) at the reverse proxy layer, but that proved to be less performant than Varnish. At this point we knew Varnish was going to be more complex, but we knew it was the right answer.
One thing we learned along the way: Varnish can’t terminate an SSL connection, so we had to leave nginx in place to do that.
With our horizontal scaling strategy, I wanted to keep our application servers all self-contained, meaning we didn’t need to coordinate another layer into our infrastructure. I specifically didn’t want to introduce another layer for two technical reasons:
- We were architecting for a specific event and planned to tear it down the next day. It made perfect sense to follow the KISS rule: keep it simple, stupid.
- We intentionally chose not to use a configuration management layer like Puppet or Chef because it was overkill for our situation. Ultimately this limited our ability to sustainably build a more reliable and robust infrastructure, but the project objective was to survive Shark Tank.
The solution was to make an nginx and Varnish sandwich on each box:
- Port 80 redirects traffic to Port 443
- Port 443 terminates the SSL connection and proxies the request to Varnish, which is running on Port 8080
- Varnish then reaches out to nginx running locally on Port 8888, where the Rails application sits
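The layering above can be sketched as a small Python model. The port numbers come from the interview; the `Listener` type, the `SANDWICH` table, and the `trace` function are illustrative names only. This is not the team's actual configuration, which would live in nginx and Varnish config files.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Listener:
    port: int
    service: str
    role: str
    next_port: Optional[int]  # None means the request has reached the application


# Hypothetical description of one self-contained box.
SANDWICH: Dict[int, Listener] = {
    80: Listener(80, "nginx", "redirect the client to HTTPS", 443),
    443: Listener(443, "nginx", "terminate SSL, proxy to Varnish", 8080),
    8080: Listener(8080, "varnish", "serve from cache, else ask the backend", 8888),
    8888: Listener(8888, "nginx", "front the Rails application", None),
}


def trace(entry_port: int) -> List[str]:
    """Return the listeners a request passes through on a cache miss.

    The hop from 80 to 443 is an HTTP redirect (the client reconnects),
    while the later hops are local proxying. On a cache hit the path
    would stop at Varnish.
    """
    hops = []
    port: Optional[int] = entry_port
    while port is not None:
        listener = SANDWICH[port]
        hops.append(f"{listener.service}:{listener.port}")
        port = listener.next_port
    return hops


if __name__ == "__main__":
    print(" -> ".join(trace(443)))
```

Because every box carries the whole chain, adding capacity in this model means adding another identical box rather than coordinating a separate caching tier.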
The sandwich approach proved to be robust and scalable. It was one of my biggest personal lessons coming out of the project. I will definitely use it again.
With our application organized into this “Lego brick,” we proved we could scale horizontally as far as necessary. This approach limited complexity, minimized potential points of failure, and was an all-around better economic decision.
Q: Once you open the proverbial can of worms, all kinds of things change. What was the biggest challenge you uncovered?
After we nailed down our plan for an nginx and Varnish sandwich, we installed the Rails application and began testing.
We immediately hit cache issues for logged-in users.
The problem was that, by default, Rails sets a session cookie for all users. This cookie wreaks havoc on caching, because Varnish’s default behavior is to bypass the cache for requests that carry a cookie. We had to spend a considerable amount of time testing and retesting our Varnish configuration to get the cookie handling right for our application.
This took pairing with our CTO, who was tweaking the Rails application and HTTP headers, while I was tweaking Varnish. In the end, we got it right and learned a lot about Varnish along the way.
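The kind of decision a cache has to make here can be sketched in Python. This is a hedged illustration, not the team's Varnish configuration: the interview does not say which rule they settled on, and the cookie names `_app_session` and `logged_in` are hypothetical. It assumes the application sets a separate marker cookie only for authenticated users, so anonymous requests can be served from cache even though they carry a session cookie.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name; assumed to be set only after a user logs in.
LOGIN_MARKER = "logged_in"


def is_cacheable(method: str, cookie_header: str) -> bool:
    """Decide whether a request may be answered from a shared cache.

    Only safe methods are considered, and any request that carries the
    login marker is treated as personalized and sent to the application.
    """
    if method not in ("GET", "HEAD"):
        return False
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    return LOGIN_MARKER not in cookies


if __name__ == "__main__":
    print(is_cacheable("GET", "_app_session=abc123"))               # anonymous visitor
    print(is_cacheable("GET", "_app_session=abc123; logged_in=1"))  # signed-in user
```

In a real deployment the same rule would be expressed in the cache's own configuration language, with the application's HTTP headers adjusted to match.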
After we sorted out the cookies, we did more load testing and observed something very interesting. I conducted most of my tests in groups of three, meaning I’d run the test three times and average the results. The first test after I restarted Varnish and flushed the cache delivered terrible results, while tests 2 and 3 would run without issue.
We did more digging and eventually realized that on the first run, Varnish was still populating the cache. While the server was doing the extra processing to render a page and store it in the cache, other requests queued up behind it so quickly that they all hit their timeouts. In production, this would have delivered a bad experience to users before the server could catch up.
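A back-of-the-envelope model shows why a cold cache hurts so much under a sudden spike. This is a simplified sketch with made-up numbers: it assumes requests for an uncached page arrive at a steady rate and all wait for a single render to finish. The function and parameter names are illustrative, and none of the figures come from the interview.

```python
from typing import Tuple


def cold_cache_backlog(
    arrivals_per_second: float,
    fill_seconds: float,
    timeout_seconds: float,
) -> Tuple[float, float]:
    """Estimate the pile-up behind one uncached page.

    Returns (requests that queue while the page is rendered,
             requests that wait longer than the timeout).
    """
    queued = arrivals_per_second * fill_seconds
    # Anyone arriving more than `timeout_seconds` before the render
    # finishes gives up before the cached copy exists.
    timed_out = arrivals_per_second * max(0.0, fill_seconds - timeout_seconds)
    return queued, timed_out


if __name__ == "__main__":
    # Hypothetical inputs: 100 requests/s, a 2 s render, a 0.5 s timeout.
    queued, timed_out = cold_cache_backlog(100, 2.0, 0.5)
    print(f"queued={queued:.0f} timed_out={timed_out:.0f}")
```

Under these assumptions the backlog grows linearly with the arrival rate, which is why the problem only showed up on the first run of a load test and would have been far worse at broadcast-scale traffic.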
We knew if we didn’t solve this traffic shockwave problem, we would fail.
The more we looked into it, the more we realized that an auto-scaling strategy would never have worked for us. Automated provisioning is nice, but it can’t respond fast enough to the kind of spike we were facing, or to the spikes many others face.
In a real-world situation, with real consequences for failure, there was no way we were going to trust auto-scaling scripts to handle the load.
After more research and discussions with SCTG, we realized that we needed to deal with the traffic shockwave head on.
The solution was to write a cache-population script.
We crawled our entire site and generated a list of URLs across the whole application, about 1,500 in all. We then fed this list to curl, making a single anonymous request to each URL and thereby populating the cache for it.
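A minimal sketch of such a cache-population script follows. The team used curl; this version swaps in Python's standard `urllib` to stay self-contained. The function names, the `cache-warmer` user agent, and the `urls.txt` file name are assumptions for illustration.

```python
import urllib.request
from typing import Callable, Dict, Iterable, Union


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Make one anonymous GET request and return the HTTP status code."""
    # No cookies or credentials are attached, so the request looks like
    # a first-time visitor and the response is eligible for caching.
    request = urllib.request.Request(url, headers={"User-Agent": "cache-warmer"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def warm_cache(
    urls: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, Union[int, Exception]]:
    """Request every URL once, recording a status code or the error raised."""
    results: Dict[str, Union[int, Exception]] = {}
    for line in urls:
        url = line.strip()
        if not url:
            continue  # skip blank lines in the URL list
        try:
            results[url] = fetch(url)
        except OSError as exc:  # urllib's URLError and HTTPError subclass OSError
            results[url] = exc
    return results


if __name__ == "__main__":
    # Assumes a crawler has already written one URL per line to urls.txt.
    with open("urls.txt") as url_file:
        for url, outcome in warm_cache(url_file).items():
            print(outcome, url)
```

With one application server per box, a script like this would need to run against each box after its Varnish instance starts, since each cache is populated independently.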
After writing the script, results from load testing proved we had solved this issue. We were ready for Shark Tank.
Q: What happened with Shark Tank?
The night of Shark Tank was a total blast. The company held a party in a club in downtown Chicago, so we holed up in the back and set up a mini-NOC around a bar table. We knew that once the episode came on TV, we just needed to hold on and enjoy the ride. We also set up a back channel where we could be online with the technology team from our investors and SCTG.
There was a limit on what we could observe, statistics-wise, so we focused on Google Analytics and New Relic (also an SCTG customer). The SCTG team was watching their infrastructure monitoring tools, so we knew we had our bases covered. We only had one glitch the whole night.
At T-minus 3 minutes, Evan at SCTG called out that he was seeing “500 errors” on one of our servers. Because we were all staring at our screens, I was able to instantly log into the problematic server, diagnose that Varnish had crashed, turn it back on, and bring it back into the swarm in about 30 seconds.
That was my second favorite moment of the night. The best part was seeing the traffic climb when Mike and Drew came on TV.
It was a total blast calling out the number of users hitting the site as it climbed.
At one point, we were adding 150 users per SECOND.
I kept calling out the numbers on the back channel so the team knew what I was seeing.
The noise from the crowd in the club was rising as it became obvious that the Rent Like A Champion team was about to receive a deal from the two billionaire sharks.
It was so loud that I don’t think the other people on the back channel could hear me anymore.
When we finally peaked, our simultaneous users increased by 10,612.5%.
It was absolute chaos, and it was great.
Through it all, New Relic was still reporting response times under 250 ms for our application. We were well within our target response range.
As the segment ended, traffic slowly began to decline. We knew we had nailed it.
It was an amazing feeling.
Q: Why did you choose to work with SCTG?
When I started on the project, we were hosting with a vendor that claims to provide fanatical support.
When the Shark Tank opportunity came along and presented us with a real challenge, we decided that fanatical support wasn’t good enough.
We needed instant response times and the ability to solve hard technical problems.
We landed at SCTG for a few reasons:
- SCTG has integrated the complete infrastructure stack. They own and manage everything down to the network backbone connections. This means there is someone who knows EXACTLY what is going on at every level. They’re watching, managing, and mitigating every layer beneath us. That’s serious peace of mind to have when selecting a partner, especially with an unknown traffic shock wave heading our way.
- If I’m building something, a foundation like this means I can start building on the 10th floor with confidence. It’s the knowledge that I don’t have to think about countless infrastructure elements because SCTG has it SOLVED. I spend more time solving the problems in my court, and don’t have to wonder or worry about things breaking outside of my cloud because I know SCTG is taking care of all of it.
- SCTG is relatable and accessible. They are here in Chicago, right down the street, and we knew they would have our backs as we encountered the unknown of the show’s airing and our post-airing “new normal.”
- We knew SCTG had faced similar (and much larger) challenges before. They would be a helpful and supportive partner in tackling our Shark Tank appearance and upcoming traffic profile.
Q: Any final thoughts?
It’s not difficult to select a quality partner. Just be sure you do it!