AMS Washington Forum: Unleashing Big Data and Big Discussion

Today, in her keynote address to the AMS Washington Forum, U.S. Secretary of Commerce Penny Pritzker announced that NOAA is forming five new alliances to help bring its vast data resources to the public. The partnerships with Amazon Web Services, Microsoft Azure, IBM, Google, and the Open Cloud Consortium address the growing need for access to NOAA’s huge, and rapidly expanding, store of environmental data.
That Secretary Pritzker’s announcement came at the opening of this year’s Forum is a testament to the sustained focus of these annual AMS gatherings in Washington, D.C. The Forum revisits recurring themes to bring year-to-year continuity, and progress, to the discussions. Last year, for example, participants in the AMS Washington Forum focused on how data integration across disciplines and sectors drives the effectiveness of the weather, water, and climate enterprise. The Forum found that

Working across agencies and across sectors (e.g., health, energy) is becoming a new “normal” for solving problems. All agree the needs and demands for data, information and forecasts are continuing to change, so our enterprise must remain flexible and agile.

Though the context last year was more about the use of commercially provided data, this continuing Forum theme resonates with Secretary Pritzker’s announcement today. The new government-private sector partnerships are part of the overall movement toward “open government,” with accessible, consistent data practices that should enhance the flexibility and agility emphasized at last year’s Forum.
Forum participants also generally agreed last year that “while the private sector needs to take on a bigger role in the provision of weather data, the public and private sectors need more time to jointly determine the best path forward.” And indeed at that time NOAA was in an information-gathering phase preparing for the partnerships announced today. The agency issued a Request for Information (RFI) in February 2014 to see who might be able to help move NOAA data onto the cloud. Commercial partnerships would, according to the RFI, help pull together disparate NOAA sources and web sites and help people “find and integrate data from these sources for cross-domain analysis and decision-making.”
Data integration was not the only motivation. Being the main provider of its own data saddles government agencies with burgeoning information technology needs.
In a separate email newsletter today, NOAA Administrator Kathryn Sullivan elaborated on the scope of the Big Data challenge:

Of the 20 terabytes of data NOAA gathers each day — twice the data of the entire printed collection of the United States Library of Congress — only a small percentage is easily accessible to the public.

The cloud offers a way to relieve this bottleneck, as the RFI stated:

NOAA anticipates these partnerships will have the ability to rapidly scale and surge; thus, removing government infrastructure as a bottleneck to the pace of American innovation and enabling new value-added services and unimaginable integration into our daily lives.

Private-sector cloud services have a history of meeting such challenges. They can not only store the huge quantities of data NOAA produces each day but also host cloud-based applications. Data can therefore be processed remotely, so individual users do not need their own advanced infrastructure to move and manipulate vast troves of data. Working in parallel with traditional NOAA data distribution channels, cloud services are thus expected to enable widespread use of Big Data and to drive private-sector development of applications.
The AMS discussions continuing here in D.C. on Wednesday and Thursday will further amplify recurring themes such as Big Data, making the Forum an especially rewarding venue for participants who return year after year. For example, tomorrow’s sessions on “Rail and Trucking” and “Information Needs for Water Related Extremes” hinge in part on data dissemination. Surface transportation was also a panel topic last year, so repeat participants will have an opportunity to update their earlier impressions and find out how opportunities in that field are progressing.
By reaching out to the innovators of the cloud, NOAA stated it was

looking for partners to incite creative uses and innovative approaches that will tap the full potential of its data, spur economic growth, help more entrepreneurs launch businesses, and to create new jobs.

That’s pretty much the same reason leaders of the weather, water, and climate enterprise return year after year to the AMS Washington Forum.

For Data to Live Long and Prosper

On February 25, the AMS released its new policy on citations for data sources in journal articles. We were all set to tell authors about it when, sadly, far bigger news stole the attention of scientists everywhere. The great creator of Spock, actor Leonard Nimoy, had died. Within two days, the story of data policy had become the story of Star Trek.
“That’s not logical,” you say.
OK, we’re not Vulcan, but even a human can see this. Data. Spock. Now is the time to bring them together.
Nimoy made an improbable—some would say illogically great—impact on society masquerading as a half-Vulcan, half-human creature named Spock hurtling through space on both the small and big screens. The tributes following Nimoy’s death last week have spoken of his ability to transcend the seeming limitations of such a curious role. Nimoy embodied racial ambiguity in a time of prejudice, ennobled diplomacy and rationality in an age of war, and gave voice to those who feel alien in their own neighborhoods and schools.
Of all the dualities in Spock’s character (so brilliantly portrayed by an immigrant’s son who skipped college), arguably the most explicit was his role as science officer on the bridge of the “Enterprise.” His struggle to remain true to the Vulcan creed of logic without emotion was a perfect expression of science in its time. For nerds of the 1960s and ’70s, Spock’s reliance on logic echoed the haughty aloofness with which popular culture characterized scientists of the Cold War. But through his formidable devotion to knowledge, truth, and teamwork, working through all the pointy-eared social awkwardness he faced among his crewmates, Spock somehow made science a new kind of “cool” long before geeks made billions of bucks with computers.
The thing is, scientists are a duality, much as Spock and Captain Kirk were two sides of a coin. They get emotional about two things. One is logic. Scientists, like mathematicians, get dewy-eyed about beautiful theories, elegant proofs, and ingenious solutions. The other is data. Unlike Spock, they work themselves into a frenzy over data. The best way to make scientists swoon is to produce data that reveal secrets.
For science to live long and prosper, data need to be treasured like a home planet. For a long time, most scientific publishers thought it was good enough for journal authors to mention data archives casually in their Acknowledgments. In this age of computer models and constantly updating technology, that’s not good enough. Now authors must use carefully sourced and dated formal citations and references that in turn lead to safeguarded, easily accessible repositories. The online authors’ guide gives some helpful examples.
The new citation policy is just one step of many advancing data archive practices that were recommended in the AMS Statement on Full and Open Exchange of Data adopted in December 2013. That statement also calls on funding agencies to recognize the costs of managing data. It recognizes that data preservation and stewardship should be emphasized and discussed at meetings. It says AMS should promote conventions and standards for metadata to increase interoperability and usage, and that the Society should foster ways of deciding what data should be kept to improve preservation practices in the future.
AMS is not alone in this shift. Others in the chain of research, publication, and archiving are trying to do for data what Spock did for logic. Our Society is one of the original members of a year-old team of publishers, data facilities, and consortia called the Coalition on Publishing Data in the Earth and Space Sciences (COPDESS). COPDESS is working to ensure that data are preserved through proper, secure funding, and that careful decisions are made about what should be saved.
Most importantly, this international movement toward protecting and providing data is meant to preserve the scientific process. Science needs published studies to lead to more studies that can confirm or reject findings. According to the AMS Statement,

AMS should strongly encourage an environment in which scholarly papers published in scientific journals contain sufficient detail and references to data and methodology to permit others to test each paper’s scientific conclusions.

All that depends on data being available in the review process as well as in perpetuity, with published results closely aligned with open archives.
Logic and data: the duality of the scientific spirit. It is easy to celebrate one without the other, but it would not be proper. Spock would understand.

What to Do with Data in the Modern World

The last AMS statement related to data issues was written in 2002. Over the past decade, however, advances in information technology have revolutionized data services, including how data are provided, accessed, analyzed, managed, shared, and archived. In Monday’s Town Hall Meeting on Free and Open Sharing of Environmental Data, UCAR’s Mohan Ramamurthy introduced a new AMS statement on data policy that is currently being drafted. Ramamurthy pointed out that unrestricted access to data is fundamental to the advancement of science, and that access should be free as much as possible. But data issues can raise difficult questions, some of them fundamental, like “What does free even mean?” Does it refer to access, cost, or both? And when talking about cost, who ultimately bears that cost?
The process of creating an AMS statement involves multiple steps over several months, and development of the new data statement is still in its early stages. Thus, many questions like these are still being answered. And the subject of data has numerous angles to be considered in preparing the statement: curation/stewardship, metadata, timeliness, transparency, preservation, citation, and standards, to name a few. One of the more intriguing issues Ramamurthy mentioned is the potential for preplanned joint data collection partnerships between governmental and commercial entities during crises. He cited Superstorm Sandy as an instance when the private sector had an abundance of data that was particularly valuable to the government. He compared this situation with current arrangements between the defense sector and the aviation industry, in which the government uses aircraft from private airlines for various purposes and the companies are compensated for such use.
The statement’s writing team has made several preliminary recommendations, among them: design programs that reduce data-sharing barriers between the sectors of the AMS; ensure that all journal articles include sufficient details regarding information and methodology to verify the articles’ conclusions; and recognize data science as a career.
Ramamurthy emphasized that crafting the new statement is a process that should involve the entire AMS community. He invited members to comment on the statement by contacting him via email.

Getting Remote Data to Remote Regions

While Internet connections in more remote regions of the world have improved over the years, connectivity challenges still inhibit delivery of scientific data to people who need it. This past month the situation has gotten a little better, thanks to some international collaborations involving satellite data.
Remote places are often in developing countries that lack funding for the state-of-the-art connectivity that scientific information requires. Back in 2003, in a BAMS essay, “The ‘Information Divide’ in the Climate Sciences,” Andrew Gettelman addressed the struggles of scientists in developing countries to keep up with the rest of the world in increasingly technology-driven times. In visits to a number of countries around the world, Gettelman found that slow or nonexistent internet access, outdated operating systems, and other hurdles limited the ability of these scientists to keep up with the literature and access data, among other problems.

The information divide is not unique to the atmospheric and related sciences. However, because of the unique role that timely information plays in forecasting, and the need for data for climate studies, the divide may be especially critical in these disciplines.  Our science is global, affects people globally, and requires global information.

Five years later, Michel Verstraete of the European Commission Joint Research Centre Institute for Environment and Sustainability (JRC-IES) still found limited internet access when he participated in a 2008 field campaign to study the environment around Kruger National Park in South Africa. JRC-IES and South Africa’s Council for Scientific and Industrial Research (CSIR) joined forces to address the problem of accessing large satellite data files crucial to research on sustainable development and other environmental studies. NASA became involved the following year, when the problem of electronic access became obvious during a workshop in South Africa on the use of Multi-angle Imaging SpectroRadiometer (MISR) data.
The solution: NASA recently shipped 30 terabytes of MISR data directly to a distribution center in Africa. CSIR will manage the center and offer free access to researchers in the region. Verstraete, along with members of the other agencies, plans to upgrade connectivity and encourage participants to share data. Verstraete says he hopes this collaboration will strengthen academic and research institutions in southern Africa.
Adds Bob Scholes, CSIR research group leader for ecosystem processes and dynamics at NASA,

The data transfer can be seen as a birthday present from NASA to the newly formed South African Space Agency. It will kick start a new generation of high-quality land surface products, with applications in climate change and avoiding desertification.

Last month NASA also joined with the U.S. Agency for International Development to open a new node for accessing satellite and other environmental information through the web-based SERVIR system. This time the local collaboration is with the International Centre for Integrated Mountain Development (ICIMOD). ICIMOD analyzes geophysical monitoring and predictive information and can also disseminate that information through its relationships with the region’s decision makers. Remote sensing is critical for monitoring the sparsely populated, difficult-to-access mountainous areas of the Hindu-Kush-Himalaya region, which includes Afghanistan, Bangladesh, Bhutan, China, India, Nepal, Myanmar, and Pakistan. SERVIR addresses land cover change, air quality, glacial melt, adaptation to climate change, and other crucial issues in the mountainous region.
As Gettelman concluded in his article:

Perhaps the most important recommendation is that, as we restructure the model of scientific communication in the information age, we ensure that it benefits the maximum number of people. The greatest gains in terms of lives saved and mitigation of the impacts of weather extremes and changes in the climate can most likely come from not just improving the state of knowledge but improving the access to existing knowledge and information by scientists, forecasters, and policy makers around the world.