For Data to Live Long and Prosper

On February 25, the AMS released its new policy on citations for data sources in journal articles. We were all set to tell authors about it when, sadly, far bigger news stole the attention of scientists everywhere. The great creator of Spock, actor Leonard Nimoy, had died. Within two days, the story of data policy had become the story of Star Trek.
“That’s not logical,” you say.
OK, we’re not Vulcan, but even a human can see this. Data. Spock. Now is the time to bring them together.
Nimoy made an improbable—some would say illogically great—impact on society masquerading as a half-Vulcan, half-human creature named Spock hurtling through space on both the small and big screens. The tributes following Nimoy’s death last week have spoken of his ability to transcend the seeming limitations of such a curious role. Nimoy embodied racial ambiguity in a time of prejudice, ennobled diplomacy and rationality in an age of war, and gave voice to those who feel alien in their own neighborhoods and schools.
Of all the dualities in Spock’s character (so brilliantly portrayed by an immigrant’s son who skipped college), arguably the most explicit was his role as the science officer on the bridge of the “Enterprise.” His struggle to remain true to the Vulcan creed of logic without emotion was a perfect expression of science in its time. For nerds of the 1960s and ’70s, Spock’s reliance on logic echoed the haughty aloofness with which popular culture characterized scientists of the Cold War. But through his formidable devotion to knowledge, truth, and teamwork, working through all the pointy-eared social awkwardness he faced among his crewmates, Spock somehow made science a new kind of “cool” long before geeks made billions of bucks with computers.
The thing is, scientists are a duality, much as Spock and Captain Kirk were two sides of a coin. They get emotional about two things. One is logic. Scientists, like mathematicians, get dewy-eyed about beautiful theories, elegant proofs, and ingenious solutions. The other is data. Unlike Spock, they work themselves into a frenzy over data. The best way to make scientists swoon is to produce data that reveal secrets.
For science to live long and prosper, data need to be treasured like a home planet. For a long time, most scientific publishers thought it was good enough that journal authors would casually mention data archives in their Acknowledgments. In this age of computer models and constantly updating technology, that’s not good enough. Now authors must use carefully sourced and dated formal citations and references that in turn lead to safeguarded, easily accessible repositories. The author’s guide online gives some helpful examples.
The new citation policy is just one step of many advancing data archive practices that were recommended in the AMS Statement on Full and Open Exchange of Data adopted in December 2013. That statement also calls on funding agencies to recognize the costs of managing data. It recognizes that data preservation and stewardship should be emphasized and discussed at meetings. It says AMS should promote conventions and standards for metadata to increase interoperability and usage, and that the Society should foster ways of deciding what data should be kept to improve preservation practices in the future.
AMS is not alone in this shift. There are others in the chain of research, publication, and archiving trying to do for data what Spock did for logic. Our Society is one of the original members of a year-old team of publishers, data facilities, and consortia called the Coalition on Publishing Data in the Earth and Space Sciences. COPDESS is working to ensure that data are preserved through proper, secure funding, and that careful decisions are made about what should be saved.
Most importantly, this international movement toward protecting and providing data is meant to preserve the scientific process. Science needs published studies to lead to more studies that can confirm or reject findings. According to the AMS Statement,

AMS should strongly encourage an environment in which scholarly papers published in scientific journals contain sufficient detail and references to data and methodology to permit others to test each paper’s scientific conclusions.

All that depends on data being available in the review process as well as in perpetuity, with published results closely aligned with open archives.
Logic and Data: the duality of the scientific spirit. It is easy to celebrate one without the other, but it would not be proper. Spock would understand.

What to Do with Data in the Modern World

The last AMS statement related to data issues was written in 2002. But in the last 10 years, information technology advances have revolutionized data services, including how data are provided, accessed, analyzed, managed, shared, and archived. In Monday’s Town Hall Meeting on Free and Open Sharing of Environmental Data, UCAR’s Mohan Ramamurthy introduced a new AMS statement on data policy that is presently in production. Ramamurthy pointed out that unrestricted access to data is fundamental to the advancement of science, and that access should be free as much as possible. But issues of data can lead to difficult questions, some of them fundamental, like “What does free even mean?” Does it refer to access, cost, or both? And when talking about cost, who ultimately bears that cost?
The process of creating an AMS statement involves multiple steps over several months, and development of the new data statement is still in its early stages. Thus, in many cases, questions like the above are still being answered. And the subject of data has numerous angles to be considered in preparing the statement: curation/stewardship, metadata, timeliness, transparency, preservation, citation, and standards, to name a few. One of the more intriguing issues mentioned by Ramamurthy involves the potential for preplanned joint data collection partnerships between governmental and commercial entities during crisis situations. He cited Superstorm Sandy as an instance when the private sector had an abundance of data that was particularly valuable to the government. He compared this situation with what currently occurs between the defense sector and the aviation industry, when the government utilizes aircraft from private airlines for various purposes, and the companies are compensated for such use.
Among the preliminary recommendations made by the statement’s writing team are to design programs that reduce data-sharing barriers between the sectors of the AMS; ensure that all journal articles include sufficient details regarding information and methodology in order to verify the articles’ conclusions; and recognize data science as a career.
Ramamurthy emphasized that crafting the new statement is a process that should involve the entire AMS community. He invited members to send him their comments on the statement by email.

Data Stewardship: A Basis for Change

Today’s Town Hall on Data Stewardship promises to be a good opportunity to consider how the atmospheric science community might reshape itself at the most fundamental level: the data underlying science and services.
According to Unidata director Mohan Ramamurthy, chair of the AMS Ad Hoc Committee on Data Stewardship, which is presenting its prospectus at the session (12:15 p.m.-1:15 p.m.; B211), data stewardship is not just a topic for people who specialize in archiving. Technology has made this a task for all of us. In the following email exchange, Ramamurthy made a good case to The Front Page for how data stewardship is basic to how the entire community interacts and progresses.
Why did AMS form an Ad Hoc Committee on Data Stewardship?
The AMS STAC Commissioner Roger Wakimoto was fielding a number of questions related to data, so in the fall of 2008 he began highlighting the importance of Data Stewardship. According to Roger’s report to the AMS Council in 2009:

Our community is generating huge volumes of data from observing systems (especially remote sensors) and enormous outputs via numerical simulations. In addition, topics such as data archival, access (including free access between countries), maintenance, metadata, visualization, life expectancy are becoming critical at many institutions. In light of this background information, it was my suggestion that the AMS needed to propose a formal mechanism to recognize this important area that permeates our entire discipline.

How have stewardship needs changed in recent years?
Data has always been essential to the field. However, advances in computing, information and observational technologies have resulted in larger and larger volumes of diverse data being generated from many sources, and they are being used by more and more people. What used to be the purview of just data centers and providers has now become a responsibility for many more stakeholders. Also, there is increasing awareness of the importance of data and data stewardship. For example, the term “metadata” was not part of the scientific vernacular until recently, but now many more people understand it and recognize its importance. Similarly, until now there was no expectation that people would share datasets from their scientific studies with others or link them to publications. The best people did was give a URL where a reader of a scientific article could get a few additional plots or products. Today it is (theoretically) possible to link all of the data that went into a study (e.g., model output, model configuration, source code, derived analyses, etc.) in appropriate places in a paper. How to go about it? Who should be responsible for keeping those links and data sets alive in perpetuity?
How does AMS figure in this expansion of the possibilities, and responsibilities, of data stewardship?
We should understand that the AMS is not usually the producer of data sets. So the Society will have to work with data producers and data hosting/holding facilities. And how would AMS get authors to submit not just a manuscript but all of the data? And to where? Each area will have, among others, technical challenges, coordination and collaboration challenges, and organizational challenges.
Also, data stewardship is a vast area, so we will have to scope it properly (i.e., limit it), or else nothing will get done. Picking those key areas will be a challenge when the membership is diverse and you have a large number of stakeholders.
What might attendees learn from the Town Hall in Atlanta?
The purpose of the Town Hall is twofold: a) to inform attendees about what the Ad Hoc committee is thinking and our proposed plans/activities, and b) to gather feedback on our thoughts/plans/prospectus as well as get additional input on what AMS members would like to see happen in this area.
Who will most benefit from attending?
Almost anyone who is interested in data stewardship issues (scientists, data providers, editors and publication commission folks, librarians, educators, IT personnel) will benefit or can attend and contribute their thoughts. But realistically, I expect a smaller subset of those people who are most interested in this subject to come to the Town Hall.
Will feedback from attendees influence the committee’s task?
This is a brand new area for AMS. As such, everything is a work in progress. We want our mission and tasks to be shaped by the membership.