Thursday, April 1, 2021

 

Weather Service Internet systems are crumbling as key platforms are taxed and failing

Most of the agency’s online systems went down Tuesday, and during last week’s tornado outbreak in the South, a vital resource for relaying information crashed

The National Weather Service experienced a major, systemwide Internet failure Tuesday morning, making its forecasts and warnings inaccessible to the public and limiting the data available to its meteorologists.

The outage highlights systemic, long-standing issues with its information technology infrastructure, which the agency has struggled to address as demands for its services have only increased.

In addition to Tuesday morning’s outage, the Weather Service has encountered numerous, repeated problems with its Internet services in recent months, including:

  • a bandwidth shortage that forced it to propose and implement limits on the amount of data its customers can download;
  • the launch of a radar website that functioned inadequately and enraged users;
  • a flood at its data center in Silver Spring, Md., that has stripped access to key ocean buoy observations; and
  • multiple outages to NWS Chat, its program for conveying critical information to broadcasters and emergency managers, relied upon during severe weather events.

Problems with the stability and reliability of the Weather Service’s information dissemination infrastructure date back to at least 2013, when Capital Weather Gang began reporting on the issue.

The Weather Service is working to evaluate and implement solutions to these problems, which in the meantime are impairing its ability to fulfill its mission of protecting life and property.

‘Major, national outage’ Tuesday

Tuesday morning’s outage meant the Weather Service’s flagship website, weather.gov, was down, cutting off access to its forecasts and warnings.

“There is a major, national outage impacting the distribution of NWS products,” tweeted the Weather Service’s Weather Prediction Center in College Park, Md.

The Weather Service’s central operations center issued a bulletin at 5:11 a.m. highlighting failures nationwide, which included its forecast offices losing contact with the agency’s networks “impacting product dissemination and data reception,” inoperable websites and no access to NWS Chat.

The outage limited the model data and observations that Weather Service meteorologists could use to make forecasts.

Meteorologists and Weather Service constituents took to Twitter to complain about the outage, many noting the chronic issues with its Internet services:

  • “Why do things like this keep happening? It’s inexcusable at this point. The folks at NWS are constantly dealing with IT hurdles to get their message out in recent months. The frequency and complications are about the absolute worst I’ve seen,” tweeted Matt Lanza, a Houston-based meteorologist in the energy industry.
  • “There are absolutely no words appropriate for twitter that can describe how maddening it is that in the year 2021, the richest and most powerful government on Earth cannot get lifesaving weather forecasting information to its citizens because of an internal internet outage,” tweeted Jack Sillin, a meteorology student at Cornell University.
  • “The perpetual tech issues that NWS has to deal with are completely unacceptable. The response capabilities of the entire country are undermined when this happens,” tweeted Samantha Montano, a disaster specialist.
  • “The @NWS outages are just part and parcel of our country’s massive infrastructure problems. It’s hard to imagine meaningful climate resilience without addressing our literally crumbling bridges, broken roads, and 1995 data services,” tweeted Kathie Dello, the state climatologist for North Carolina.
  • “A seven hour outage of the NWS heading into the peak of severe weather season.....so lucky that it was an extremely quiet evening. Fiber cut or not, this is not the beginning or end of IT issues in the NWS. I’d demand congressional investigation into this before the pimple pops,” tweeted Victor Gensini, a professor of meteorology at Northern Illinois University.

By midmorning Tuesday, the Internet problems appeared to be resolved, but they cast new light on numerous other information technology problems the Weather Service has faced in recent weeks and months.

‘Please do not use NWS Chat’

The Weather Service’s chat system has proved to be one of its more unreliable systems, failing in multiple instances, including during dangerous severe weather situations.

The issues have become so widespread and inveterate that one Weather Service office attempted to abandon NWS Chat for an external program, a move rebuked by Weather Service headquarters in Silver Spring.

On March 15, the Weather Service office in Birmingham, Ala., sent an email to media partners about its decision to switch to Slack, an instant messaging program, ahead of the tornado outbreak March 17 that unleashed nearly 50 twisters.

“In the interest of public safety and due to factors beyond our control, NWS [Birmingham] will be SWITCHING to Slack Chat as our PRIMARY means of realtime communication until such a time that NWS Chat is proven stable, reliable and has a reliable backup service in place,” read the email sent by warning coordination meteorologist John De Block.

The email, obtained by The Washington Post, noted that NWS Chat would become the office’s new backup service, and provided media partners and emergency managers a link to sign up for the Slack group, which would go live the next morning.

“We … believe this is the best option for all of us at this time,” wrote De Block.

The Weather Service in Birmingham declined to comment on its switch to Slack, but received instruction from higher-ups not to do it again.

“Offices were provided guidance not to procure their own alternative platforms. This function is the responsibility of NWS headquarters, not individual offices,” wrote Weather Service director of public affairs Susan Buchanan in an email.

On Thursday, a second “high risk” tornado outbreak was forecast, but this time the Weather Service in Birmingham was required to use NWS Chat instead of Slack.

Knowing heavy use would crash the chat program, Weather Service forecasters pleaded with partners outside affected areas to stay off it to conserve bandwidth.

“If you’re not in a severe wx [weather] risk area today, please do not use NWSChat,” tweeted Rick Smith, warning coordination meteorologist for the Weather Service’s office in Norman, Okla.

Many meteorologists on Twitter echoed Smith’s appeal to ease stress on the system. Nevertheless, the chat service still went dark for a time as a deadly tornadic storm was swirling across the South.

James Spann, a veteran meteorologist whose home was struck while he was covering the tornadoes live on the air, said, “I think NWS Chat is down again,” during his broadcast.

Josh Johnson, another well-respected Alabama meteorologist credited with saving lives during the Lee County tornado of 2019, tweeted, “the fact that we can’t use a reliable NWS chat platform is unbelievable — and dangerous to those we serve. It’s 2021.”

NWS Chat also went down for a time Saturday evening as tornadoes tore through Arkansas, Texas, Mississippi and Tennessee and deadly flash flooding engulfed Nashville.

“And down goes my beloved NWS Chat,” tweeted Daryl Herzmann, a systems analyst at Iowa State University who first helped develop and implement NWS Chat years ago. “I wish I could find some mechanism to help them fix it. [Tens] of unanswered emails so far.”

In a statement provided by Buchanan, the Weather Service described last week’s issues as “intermittent slowness and temporary outages,” acknowledging “we recognize the importance and need for this communication and coordination with local partners.”

Buchanan attributed the problems to a “combination of increased web traffic associated with the severe weather in the Southeast and the loss of one data center on March 9 due to a water pipe burst at NWS HQ in Silver Spring.”

But Troy Kimmel, a senior lecturer in meteorology at the University of Texas at Austin, said that the chat system’s troubles were not a temporary issue and that problems have plagued it for many months. “It should be up and operating at 100 percent efficiency. It is inexcusable,” he said. “This thing’s going to roll back with congressional inquiries if it hasn’t already.”

Rather than rectifying the issues with NWS Chat, the Weather Service, recognizing its unsteadiness, is opting to pursue other options.

“The good news is that this spring, we will officially launch a demonstration project to assess the viability of commercially available, off-the-shelf products as a long-term replacement for NWS Chat,” wrote Buchanan. “We’re working to improve stability of the system in the short-term.”

Systemic Weather Service Internet issues

Problems with the Weather Service’s Internet systems have persisted for years, in part because of increasing demand from users, which the agency has struggled to meet.

In December, because of an escalating bandwidth shortage, the Weather Service proposed limiting users to 60 connections per minute on a large number of its websites.

Constituents complained about the quota and, earlier this month, the Weather Service announced it would instead impose a limit of 120 requests per minute, and only on servers hosting model data, beginning April 20.

“With this solution, access to all other NWS websites will not be affected,” said an email from the Weather Service to its partners.

The email also said the agency intends to upgrade its data server and network architecture using congressionally appropriated funds.

Meanwhile, on March 9, the Weather Service’s headquarters in Silver Spring “experienced a ruptured water pipe, which caused significant and widespread flooding,” which affected a data center, the agency said in a statement.

“Some NWS data stopped flowing, including data from ocean buoys,” the statement said, noting some of the buoys are used “to detect and locate a seismic event that could cause a tsunami.”

Neil Jacobs, former acting head of the National Oceanic and Atmospheric Administration, which oversees the Weather Service, said many of the agency’s Internet infrastructure problems stem from its systems running on internal hardware rather than through cloud service providers such as Amazon Web Services, Microsoft and Google Cloud.

“I’ve demanded in writing that NWS transition these applications … to our Cloud partners. It’s part of an internal strategy I’ve laid out,” Jacobs, a Trump administration appointee, told the Capital Weather Gang in an email before he left office.

In July, NOAA released its Cloud Strategy, which stated, “the volume and velocity of our data are expected to increase exponentially with the advent of new observing system and data-acquisition capabilities, placing a premium on our capacity and wherewithal to scale the IT infrastructure and services to support this growth. Modernizing our infrastructure requires leveraging cloud services as a solution to meet future demand.”