Muni Experts Troubleshoot Obsolete Control System to Keep Trains Running

By Dan Howard

Two weeks ago, we experienced yet another subway train control system failure caused by aging equipment. A failure like this is certain to impact everyone working on or riding Muni. What’s not widely known is that the ingenuity and skill of Muni’s technical staff make the difference between these failures crippling the system for weeks and crippling it for just a few hours.

On March 3, a control computer failed. It governs part of the underground network of tracks and switches between Embarcadero Station and the surface, where most Muni Metro trains turn around. When our Signal Maintenance team is called to address a problem like this, all they start out knowing is that there are a bunch of “disturbed” switches and track segments.

The Automatic Train Control System, or ATCS, constantly watches over the system’s track and switches, and reports them as “disturbed” when it gets a peculiar reading, or when a system error prevents it from knowing whether the area is safe or dangerous. When this happens, the technicians methodically go through troubleshooting procedures, step by step, ruling out different components and subsystems as the cause.
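For readers who think in code, the fail-safe idea behind a “disturbed” report can be sketched in a few lines of Python. This is only an illustration, not the actual ATCS logic: the function name `classify_segment`, the `Status` values other than “disturbed,” and the use of axle counts entering and leaving a segment are all assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    CLEAR = "clear"
    OCCUPIED = "occupied"
    DISTURBED = "disturbed"


def classify_segment(axles_in: Optional[int], axles_out: Optional[int]) -> Status:
    """Classify one track segment from hypothetical axle-counter readings.

    A value of None models a system error: the reading is unavailable.
    """
    # No reading at all: we cannot tell safe from dangerous, so report disturbed.
    if axles_in is None or axles_out is None:
        return Status.DISTURBED
    # A peculiar reading: negative counts, or more axles leaving than entered.
    if axles_in < 0 or axles_out < 0 or axles_out > axles_in:
        return Status.DISTURBED
    # Every axle that entered has left, so the segment is clear.
    return Status.CLEAR if axles_in == axles_out else Status.OCCUPIED
```

The point of the sketch is that anything the system cannot positively confirm falls through to “disturbed” rather than being treated as clear.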

View of the failure that occurred March 3, 2020, from the TMC control center. Disturbed track switches are circled and disturbed track segments shown in red.

To do this successfully, Muni’s technicians need a solid familiarity with which behaviors and indications are “normal.” That is not an easy task in a system that has some of its original equipment dating back to the 1990s, mixed with other parts that have been swapped and re-swapped as the years go on. Last week, it was a night-shift technician’s sharp eye that caught a split-second oddity on the Axle Counter Evaluator, or ACE, a computer that monitors the train detectors in the trackway.

The Signal Maintenance crew found that the ACE was in an unusual low-power mode. After the crew swapped out the power supply and brought the computer to full power, it still wouldn’t boot. After they changed some components it started up, but now one of the two redundant control computers, called Intersigs, failed whenever both were switched on together. Despite this, each worked fine individually.

On Thursday morning they thought they had found the culprit: a faulty connector that had been working faithfully since the 90s but was now allowing only one of the two Intersig computers to run at a time. But just as the crew was packing up their tools after replacing the faulty connector, both of the Intersigs failed again.

The local control center rack at the MMT, containing the Intersig computers

They had restarted troubleshooting when a member of the crew, watching the flashing lights of the equipment, noticed something unusual for a split second. When the two Intersigs failed, the ACE, the original piece of equipment that was having problems, had also failed very briefly but recovered on its own without declaring an error. Because it recovered so quickly and left no indications or logs of the failure, it had gone unnoticed.
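A brief failure that clears itself and leaves no log is exactly the kind of event a “latching” monitor is meant to catch. The Python sketch below is purely illustrative and does not describe how the ACE or any Muni equipment actually works: the `LatchingMonitor` class, its `sample` method and the timestamps are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LatchingMonitor:
    """Remember every dropout of a health signal, even if it recovers."""

    healthy: bool = True
    dropouts: List[Tuple[float, float]] = field(default_factory=list)
    _down_since: float = 0.0

    def sample(self, timestamp: float, ok: bool) -> None:
        """Record one health reading taken at the given time (in seconds)."""
        if self.healthy and not ok:
            # The signal just dropped: remember when the dropout began.
            self._down_since = timestamp
        elif not self.healthy and ok:
            # The signal recovered by itself: keep a record of the dropout anyway.
            self.dropouts.append((self._down_since, timestamp))
        self.healthy = ok

    @property
    def ever_failed(self) -> bool:
        """True if any dropout was seen, including one still in progress."""
        return bool(self.dropouts) or not self.healthy


# Hypothetical readings: one dropout lasting a tenth of a second.
monitor = LatchingMonitor()
for t, ok in [(0.0, True), (0.1, False), (0.2, True), (0.3, True)]:
    monitor.sample(t, ok)
```

After these four readings the monitor reports itself healthy again, yet `ever_failed` stays true and `dropouts` holds the start and end time of the blip, so the event is on record even though nobody was watching the lights at that instant.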

To address the new ACE failure, the team increased the power supply, and there were no more failures. The night shift team had finally found the root cause of the problem: The faulty power supply had damaged multiple pieces of equipment in the area, causing them to fail in different ways.

Without so many things going right (the sharp eye of the night crew, the dedicated systems knowledge of the technicians, the collaboration and turnover of information between work shifts, and the willingness to stick to the methodology), it’s likely that this problem wouldn’t have been discovered so quickly.

The culprit of the March 3 subway train control system failure, an old power supply

Our train control system is a challenge to manage because it is both a technology system and a piece of critical infrastructure. In the United States, this sort of infrastructure is updated once or twice a century, but technology systems become obsolete at a much faster pace.

Like every other transit system in the country, Muni has been managing the train control system on the same timescale as infrastructure. That has left us with situations like this when components become outdated and ultimately fail.

Today, with a subway train control system approaching 30 years old, our success depends entirely on the prowess and dedication of our maintenance team, who are holding the system together. While we celebrate their ability to get us through events like this, we must rely on more than just the heroics of our staff to provide more reliable train service for San Francisco.

We must change the paradigm of how we procure, manage and maintain our train control systems. Muni’s rail network demands a modern train control system that is always kept up to date with the latest service-proven technology, and our customers deserve it.



Published March 16, 2021 at 03:32AM
https://ift.tt/3qRCzcu
