Metra is a commuter railroad in the Chicago metropolitan area. It is the largest and busiest commuter rail system outside New York City.
With the advent of new technologies and standards, Metra wanted to improve the rider experience with a new website that made pertinent information quick and easy to find. Inclement weather, track and station construction, heavy usage during peak hours, and special events all fed a growing frustration among Metra riders, who could not get accurate, real-time information on trains and their locations.
“Our goal was to create a customer-friendly website that presents information in a logical and intuitive way. We hope these changes will make using our website—and using our train service—an even more satisfying experience.”
Executive Director/CEO, Metra
“Metra debuts new and improved metrarail.com”
metrarail.com, June 29, 2016
A significant problem with the previous Metra website and its infrastructure was that GTFS data reached riders through a browser-based pull mechanism rather than a modern server-based push technology such as WebSockets. Alerts were also not presented in the context of their respective train or route. Instead, riders had to scroll through an often lengthy list of notices to find any that affected their particular train or route.
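To illustrate the contextual presentation that riders lacked, the sketch below groups a flat list of alerts by route, so that a rider looking at one line sees only the notices that affect it. This is a minimal Python illustration, not Metra’s implementation: the `Alert` fields, the sample alert text, and the route identifiers are all assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; the field names are assumptions."""
    header: str
    route_ids: tuple[str, ...]


def group_alerts_by_route(alerts: list[Alert]) -> dict[str, list[str]]:
    """Index alert headers by every route they affect."""
    by_route: dict[str, list[str]] = defaultdict(list)
    for alert in alerts:
        for route_id in alert.route_ids:
            by_route[route_id].append(alert.header)
    return dict(by_route)


# Illustrative data only; not taken from a real feed.
ALERTS = [
    Alert("Track work: expect delays", ("BNSF",)),
    Alert("Elevator out of service", ("UP-N", "UP-NW")),
    Alert("Extra trains for special event", ("BNSF", "UP-N")),
]

alerts_by_route = group_alerts_by_route(ALERTS)
```

With this index, a page for a single route can look up `alerts_by_route.get(route_id, [])` instead of rendering the whole list.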
The new Metra system needed to handle massive spikes in website traffic during special events and severe weather. Scaling for planned events could be scheduled in advance, but unforeseen events, such as inclement weather or mechanical failures, left the site vulnerable to unpredictable spikes. Given the elastic nature of both Amazon’s cloud offering and Metra’s ridership, the Metra project was a perfect opportunity to use much of the functionality already available in the cloud.
As an AWS Managed Services Partner (MSP), we were brought in for our expertise in the breadth of AWS functionality and in implementing and managing it. We needed to architect an environment that would expand and contract automatically with demand and load. Designing this type of environment can be challenging: developers must understand the elastic principles of the cloud and write software that tolerates auto-scaling, where server instances appear and disappear as the cluster ramps up or down.
Because a website crash would be disastrous, the system needed to be highly available and fault tolerant. It was also vital that development and deployment be straightforward, consistent, reliable, and non-disruptive to site availability.
Once the environment was complete, we needed to be able to hand it off to Metra, ensuring that all software and systems engineers involved could update code, content, and live GTFS feed data promptly to improve rider communication.
Finally, because the website would process credit cards for ticket sales, we collaborated with Clarity Partners to achieve Payment Card Industry (PCI) compliance. PCI compliance verifies, via audit, that Metra and its partners take the necessary precautions, applying security standards and multiple layers of defense, to protect sensitive consumer data.
Because websites and the infrastructures that run them do not scale to meet demand automatically and are not fault-tolerant by default, we had to take care when designing the environment. To accomplish these two critical goals, the environment was designed from the start to utilize multiple availability zones within the AWS cloud.
This meant hosting the site across multiple physical data centers.
The solution required the use of both CloudFront (AWS’ CDN offering) and Elastic Load Balancers (ELBs).
A well-architected auto-scaling policy was also critical, as the environment needed to grow and shrink based on utilization.
Using auto-scaling to cut costs
Auto-scaling helps maintain application availability and allows you to scale your Amazon EC2 capacity up or down automatically according to conditions you define. Auto-scaling can be used to help ensure that you are running a desired number of Amazon EC2 instances, and can also automatically increase the number of Amazon EC2 instances during demand spikes to maintain performance, and decrease capacity during lulls to reduce costs.
In addition to overall management of the Metra project, Clarity Partners was responsible for the site front-end design and architecture of the Drupal components, ticket sales e-commerce integration, and supporting the PCI compliance initiative.
Screenshots: detailed service alert data; Transit Tracker app preview; Find A Metra Train mobile app feature; live transit tracker view in mobile app; pop-up alert in mobile app.
As a DevOps-focused company, we guided the Clarity development team on how to deploy, run, and code Drupal for an auto-scaling environment, helped design and deploy the Continuous Integration (CI) jobs and processes, and led triage and management of all issues.
The hosting infrastructure now consists of development, staging, and production environments. Through CI tools and processes, Metra and Clarity can test new code and ideas in the development and staging environments before pushing them to production with confidence. The new DevOps-oriented process also allows multiple development teams, including Metra’s internal development team, to contribute to the code base.
To address the lack of real-time push capability of GTFS data, we built and continue to maintain the real-time data feed for alerts, train positions, and schedule data. Our system makes this data available via WebSockets and a standard JSON-based REST API.
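The difference between push and pull delivery can be sketched with a tiny in-process publish/subscribe broker: subscribers register once and then receive each JSON message as soon as it is published, rather than polling for it. This is only an illustration under stated assumptions. It stands in for a real WebSocket or hosted messaging layer, and the `Broker` class, the channel name, and the message fields are hypothetical.

```python
import json
from collections import defaultdict
from typing import Callable

Subscriber = Callable[[str], None]


class Broker:
    """Minimal in-memory stand-in for a push messaging service."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Subscriber]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Subscriber) -> None:
        """Register a callback that is invoked for every message on a channel."""
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: dict) -> int:
        """Serialize the message as JSON, push it to each subscriber,
        and return the number of subscribers reached."""
        payload = json.dumps(message, sort_keys=True)
        subscribers = self._subscribers.get(channel, [])
        for callback in subscribers:
            callback(payload)
        return len(subscribers)


broker = Broker()
received: list[str] = []

# Hypothetical channel name and payload, for illustration only.
broker.subscribe("alerts.BNSF", received.append)
delivered = broker.publish("alerts.BNSF", {"route": "BNSF", "delay_min": 10})
```

In a real deployment the callback would write to an open client connection. The point of the sketch is only that the server initiates delivery.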
Additionally, we provide access to the raw GTFS data feed through our platform. Routing requests this way protects Metra’s GTFS source from being overloaded by too many requests.
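One common way to shield a source system from request volume is a short-lived cache in front of it, so that many client requests result in a single upstream fetch per interval. The article does not say how this protection is implemented, so the sketch below is an assumption: the `CachedFeed` class, the 30-second lifetime, and the stand-in fetch function are all hypothetical.

```python
import time
from typing import Callable, Optional


class CachedFeed:
    """Serve a cached copy of a feed and refetch only after it expires."""

    def __init__(
        self,
        fetch: Callable[[], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[bytes] = None
        self._expires_at = 0.0

    def get(self) -> bytes:
        """Return the cached payload, hitting the source only when stale."""
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


# Demonstration with a fake clock and a counter in place of a real source.
calls = {"count": 0}
fake_now = [0.0]


def fetch_from_source() -> bytes:
    calls["count"] += 1
    return b"raw feed payload"


feed = CachedFeed(fetch_from_source, ttl_seconds=30, clock=lambda: fake_now[0])
for _ in range(1000):
    feed.get()
calls_within_ttl = calls["count"]    # one upstream fetch for 1000 requests

fake_now[0] = 31.0                   # advance past the cache lifetime
feed.get()
calls_after_expiry = calls["count"]  # a second fetch after expiry
```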
In addition to providing data to riders via the website, Metra provides GTFS data to major providers such as Google, Microsoft, Yahoo, and others for their mapping and routing applications.
The new real-time messaging platform, based on Node.js and PubNub, generates and delivers millions of messages a day to website users and other connected devices. To be useful, these messages need to be delivered in a reliable, scalable, and timely fashion.
Real-Time GTFS Messages – 3 Months Trailing
(Spike on 11/4 was for the Chicago Cubs World Series parade and rally)
Metra now enjoys a fully Managed AWS Cloud that covers all aspects of the environment, including automated code deployments throughout the development cycle.
Managing technology costs with automation
Amazon CloudFront’s CDN service accelerates delivery of websites, APIs, video content or other web assets. With Amazon CloudFront, you don’t need to worry about maintaining expensive web server capacity to meet the demand for content from potential traffic spikes.
The service responds automatically as demand increases or decreases, without any intervention. CloudFront lets us serve cached data to metrarail.com end users rather than passing every client request through to the web/application servers. This significantly increases the efficiency of the Metra site and ultimately lets us scale back the instances and resources needed to serve riders.
The positive impact of improved site performance was significant and immediate: during the month of August 2016, over 5.5 TB of data was pushed from CloudFront and only 61 GB of data had to be served from the web servers.
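As a rough worked calculation, those two figures imply that the web servers handled only about 1% of the bytes delivered. The sketch below assumes decimal units (1 TB = 1000 GB), treats "over 5.5 TB" as exactly 5.5 TB, and treats the 61 GB as part of that total. Treating it as additional to the 5.5 TB still gives about 1.1%.

```python
EDGE_GB = 5.5 * 1000   # data served by CloudFront, assuming 1 TB = 1000 GB
ORIGIN_GB = 61.0       # data served by the web servers


def origin_share_percent(edge_gb: float, origin_gb: float) -> float:
    """Percentage of delivered bytes that the origin servers had to serve."""
    return 100.0 * origin_gb / edge_gb


origin_pct = round(origin_share_percent(EDGE_GB, ORIGIN_GB), 1)
offload_pct = round(100.0 - origin_share_percent(EDGE_GB, ORIGIN_GB), 1)
# origin_pct is 1.1 and offload_pct is 98.9 under the assumptions above
```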
Running far fewer cloud server instances and other infrastructure to service end users helps Metra manage its server costs.
Amazon CloudFront passes on the benefits of Amazon’s scale. You pay only for the content that you deliver through the network, without minimum commitments or up-front fees. This applies to any type of delivered content: static, dynamic, streaming media, or a web application with any combination of these.
Benefits of CloudFront, Highlighted
(The spike to 500GB on 11/4/16 was in support of the Chicago Cubs World Series parade & rally)
The auto-scaling solution we developed with Clarity has allowed the site to scale and respond to demand in three fundamental ways.
First, we programmed scheduled scale-up and scale-down actions that increase the size of the cluster before rush hour, when the site is in highest demand, and decrease it again afterward. If full capacity is not required 24×7, it is more cost-effective to run cloud instances at peak capacity for only 8 hours a day.
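The effect of a schedule like this can be sketched in a few lines of Python. The rush-hour windows and instance counts below are hypothetical values chosen so that peak capacity runs for 8 hours a day. They are not Metra’s actual settings.

```python
# Hypothetical [start, end) rush-hour windows, in hours of the day.
PEAK_WINDOWS = [(5, 9), (15, 19)]
PEAK_INSTANCES = 10   # assumed cluster size during rush hour
BASE_INSTANCES = 2    # assumed cluster size the rest of the day


def desired_capacity(hour: int) -> int:
    """Return the scheduled cluster size for a given hour (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    in_peak = any(start <= hour < end for start, end in PEAK_WINDOWS)
    return PEAK_INSTANCES if in_peak else BASE_INSTANCES


def daily_instance_hours() -> int:
    """Total instance-hours consumed per day under the schedule."""
    return sum(desired_capacity(hour) for hour in range(24))


scheduled_hours = daily_instance_hours()   # 8 * 10 + 16 * 2 = 112
always_on_hours = 24 * PEAK_INSTANCES      # 240 if peak capacity ran all day
```

Under these assumed numbers, the schedule uses 112 instance-hours a day instead of 240.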
Second, the site can also scale itself automatically if the load on the servers or the page response times get too high. This ability to scale dynamically means that Metra runs servers only when they are needed, automatically adjusting to peak and off-peak times, which saves a considerable amount in hosting costs. Metra also does not need to plan for unforeseen spikes, since the site scales up to handle them as the need arises.
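A threshold-based policy of this kind can be sketched as a pure function that takes the current cluster size and two metrics and returns the next size. The thresholds, step sizes, and cluster limits below are assumptions made for illustration. The article does not state the values used in the real policy.

```python
def next_capacity(
    current: int,
    cpu_percent: float,
    response_ms: float,
    *,
    min_size: int = 2,
    max_size: int = 20,
) -> int:
    """Pick the next cluster size from load and page response time.

    All thresholds and step sizes are hypothetical.
    """
    if cpu_percent > 70 or response_ms > 800:
        target = current + 2      # scale out quickly under pressure
    elif cpu_percent < 30 and response_ms < 300:
        target = current - 1      # scale in slowly when idle
    else:
        target = current          # within the comfortable band
    # Clamp so the cluster never leaves its configured bounds.
    return max(min_size, min(max_size, target))
```

For example, `next_capacity(4, cpu_percent=85, response_ms=200)` returns 6, while `next_capacity(2, cpu_percent=10, response_ms=100)` stays at the minimum of 2. Scaling out faster than scaling in is a common design choice because under-provisioning during a spike costs more than a few extra minutes of idle capacity.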
Finally, we are able to proactively scale up the Metra AWS environment in anticipation of supporting traffic spikes related to Chicago events. The first tests of this capability were the Lollapalooza music festival and the annual Taste of Chicago festival in July 2016.
More recently, and most significantly, we managed the Metra cloud environment through the surges in ridership for the Chicago Cubs’ World Series games at Wrigley Field, and particularly on the day of the Cubs parade and rally in Grant Park.
Metra Executive Director/CEO Don Orseno accurately predicted that the day of the parade and rally would be the day of highest ridership in Metra history.
Though the streets, train stations, and sidewalks of Chicago struggled to accommodate the physical demands of nearly 5 million Cubs fans attending the celebration, we ensured the Metra AWS environment kept the site live with rapid data flows to provide all riders with accurate schedules and timely alerts.
Auto Scaling Graph for Two Weeks (# of active web servers in the cluster over time)
(Spikes on 11/4 and 11/5 were for the Chicago Cubs World Series parade and rally)
Our proprietary mix of tools and processes was used from the start, along with our DevOps philosophy and approach to environment architecture. From infrastructure build and deployment to application deployment and Continuous Integration (CI), we operate in a way that is consistent with current best practices in the DevOps space. This approach allows for much greater efficiency, consistency, and repeatability within the environments we manage, and the Metra project was no exception. We used tools such as Ansible, CloudFormation, Git, Jenkins, and others to make sure all changes to the environment were vetted, self-documenting, well-orchestrated, and easy to roll back in case of issues. As a result, Metra benefits from quicker, cleaner, and more seamless deployment of its applications and cloud infrastructure.
The new Metra environment has over 28 different CI jobs that support three different teams of developers. Typically, coordinating deployments and pushes would be complicated and error-prone. Because of how we implemented CI, each development group can take control of their own deployments in a consistent fashion.
What does all this mean for Metra and its riders? Rapid deployments, faster fixes, and accurate data reaching riders. There will also be more frequent feature releases and a more stable environment overall.
The DevOps approach also contributes to PCI compliance because the automation process documents releases and deployments, and helps account for changes to the environment. With traditional, old-school deployments, systems and software engineers may have had to log in to servers and manually deploy/install new versions of custom software. By using a DevOps approach and Continuous Integration tools, we eliminated the need for much of this, as Metra’s developers and engineers can simply rely on external processes to deploy new versions of software. Building solutions that keep these developers and engineers from logging into servers directly reduces the PCI burden and takes many of the access concerns out of scope.
Saving Metra 50% over their previous contract
As of the launch, the agency’s new web services provider is expected to save Metra 50 percent, or about $400,000 a year, over their previous contract.
Development costs are also lower, and the open-source platform means that Metra can perform both support and development in-house, saving money and ensuring timely updates to the site and its content.
“Metra is proud of our new site. We’ve included enhancements that directly improve our customers’ ability to make the best travel decisions for themselves. For instance, the schedule finder tool has been upgraded to provide more information: customers can decide whether to view the schedule between two stops or the whole schedule for the line, and the results will show if the train is running behind schedule or if there are any other service changes affecting that train, such as a decision to add or skip stops. Innovations like these are possible because of the technical choices we’ve made, and those same choices will allow us to continue to innovate for our customers.”