Fully Opening NASA Research Data To The Public (Update)
In 2013 the White House told NASA and other government agencies that they needed to make the results of their research more readily available to the public. In so doing, the White House said that agencies needed to make research publications that had been available only for a fee available for free within 12 months of their publication. The public paid for this science; the public should have access to it.
Update: this presentation was delivered at NASA Goddard Space Flight Center regarding NASA’s plans to collect and post research data. Download.
In February 2013 the White House Office of Science and Technology Policy issued a memo which stated: “The Office of Science and Technology Policy (OSTP) hereby directs each Federal agency with over $100 million in annual conduct of research and development expenditures to develop a plan to support increased public access to the results of research funded by the Federal Government. This includes any results published in peer-reviewed scholarly publications that are based on research that directly arises from Federal funds.” It also stated that agencies must “Ensure full public access to publications’ metadata without charge upon first publication in a data format that ensures interoperability with current and future search technology. Where possible, the metadata should provide a link to the location where the full text and associated supplemental materials will be made available after the embargo period”.
Agencies are also directed to deal with embargoes as follows. They “shall use a twelve-month post-publication embargo period as a guideline for making research papers publicly available; however, an agency may tailor its plan as necessary to address the objectives articulated in this memorandum, as well as the challenges and public interests that are unique to each field and mission combination, and ii) shall also provide a mechanism for stakeholders to petition for changing the embargo period for a specific field by presenting evidence demonstrating that the plan would be inconsistent with the objectives articulated in this memorandum.”
This memo was followed in May 2013 by a formal memorandum, M-13-13, asking all government agencies to provide their plans for publishing all of their funded research results in an open fashion, such that all taxpayers can access that data free of charge. Moreover, it calls for information collected by agencies to be in machine-readable format so as to facilitate this enhanced access. This project has been termed “Project Open Data”.
In December 2014 NASA replied to the White House’s request for a plan with “NASA Plan for Increasing Access to the Results of Scientific Research”. The plan deals with two major classes of information: data sets and peer-reviewed publications. NASA said that it would be implementing a system to provide open access to NASA research results to the public by October 2015. It is November 2015 and NASA has no such system in place – at least nothing that has been publicly mentioned. Nothing is mentioned on the NASA CIO webpage or the NASA Open Government page.
I sent an email to the NASA CIO (Chief Information Officer), NASA Chief Scientist, Science Mission Directorate, and NASA Public Affairs Office (PAO) asking “Can you provide me with a copy of the NASA plan and requirements whereby NASA research publications are made available to the public per OSTP guidance – the topic you will be discussing at NASA GSFC on 12 November such that I can share it with my readers? I certainly hope that you will not tell me to file a FOIA request since that would fly in the face of the intent of OSTP’s original guidance.”
NASA PAO promptly replied: “In response to the OSTP data sharing initiative, NASA developed a plan to ensure the public has access to the results of federally-funded scientific research. The plan was approved by OSTP in December 2014 and a NASA Policy Directive (NPD) is in review. A web portal is under development that will house the plan, FAQs, and web links to NASA scientific research data. It is anticipated that the plan will be fully implemented next year. In the meantime, NASA is discussing the plan’s status in public venues as well as visiting NASA centers to discuss and answer questions about it. The plan and other relevant information can be found here: http://www.nasa.gov/news/reports/index.html – http://science.nasa.gov/researchers/sara/faqs/dmp-faq-roses/”
There is an interesting and somewhat humorous warning in this FAQ NASA mentioned for those who do not comply: “However, if you don’t make an effort and or flagrantly flaunt your defiance of these requirements I will remind you that funded researchers, research institutions, and NASA centers are responsible for ensuring and demonstrating compliance with the DMPs approved as part of their awards. Remember, this is a directive from the white house and if you are really bad The President will call your dean and shame you. Just kidding, but awardees who do not fulfill the intent of their DMPs may have continuing funds withheld and this may be considered in the evaluation of future proposals, which may be even worse.” Let’s see if NASA leaves this page online …
The NASA plan cited by NASA PAO includes a timeline on page 20 that shows “implementation complete” in October 2015. The NASA Policy Directive work and internal planning that is still underway was supposed to have been completed a year ago according to this timeline. I sent several follow-up questions: “In reading the FAQ it only covers data/datasets – not published papers and, unless I am mistaken, this seems to be a HQ SMD webpage. Is there a separate effort for life science, microgravity science, aeronautics, IT, technology, and other non-SMD research results? Is there any connection between this effort and the NASA Spaceline summaries of life science research (today’s issue) that NASA pays to collect but does not post online? (I have a full archive going back to the 1990s online at my website) and NASA Astrophysics Data System, and NTRS? Or will these other activities just continue to function in an uncoordinated fashion? In essence, is NASA going to create its own version of PubMed so as to meet the OSTP requirement?”
NASA Deputy Chief Scientist Gale Allen was able to provide me with insight into this project. The intent is ambitious, but if they pull it off, NASA will have a substantially enhanced presence online, one that will allow a much broader audience to access and use research results from NASA. Based on my discussion with Allen, there is the intent to fully comply with the spirit and intent of what the White House has directed NASA to do. Delays in the implementation have had to do with some standard government procurement issues brought up by the FAR. These issues have more or less been worked out. The FAQ that was parked on an SMD website is actually applicable to the entire agency and will be followed by additional guidance in the near future.
A pre-beta, NASA-specific version of NIH’s PubMed is under development. Initial testing involving the uploading of actual NASA research papers has been successful. It is expected that a more finalized version of the NASA interface will be available for testing early in 2016. This is what NASA’s plan had suggested. In its plan submitted to OSTP NASA said “Based on the criteria listed in the OSTP memo dated February 22, 2013, and the need for flexibility in incorporating future upgrades, NASA has chosen the NIH PMC platform. NIH has led in information retrieval for many years, and the PMC is a capable, mature, and low-risk platform that has evolved over time. NASA will arrange, on a reimbursable basis, to acquire the necessary ingest, Extensible Markup Language (XML) conversion, and accessibility services, as well as other collateral support, for compliance with OSTP memo requirements. Also on a reimbursable basis, NIH PMC will provide a NASA-branded portal to access the full functionality of the PMC system.”
Allen said that work is underway to make certain that NASA researchers (intramural and extramural) are prepared to submit their research publications in a format that is compatible with the NIH PubMed format. A series of internal briefings is being conducted to inform NASA personnel about how this will all come together. According to an internal GSFC memo: “NASA Deputy Chief Scientist Gale Allen, accompanied by SMD Lead for Research Max Bernstein, will visit Goddard to brief us on two topics of great interest to proposers and researchers: NASA’s new policy on data management and publication archiving. The policy responds to direction from the Office of Science and Technology Policy (OSTP) in the Executive Office of the President and pertains to all federally funded research. In short, our proposals are required to include Data Management Plans, and our research papers must be accessible to the public. Come and learn how you can satisfy the new requirements. Please mark your calendar and join me for this important briefing on Thursday, November 12, 2 – 3 pm.”
It is important to note that neither “Nature” nor “Science” is listed as participating in the current NIH PMC system. So, any NASA-branded implementation is either going to be incomplete or require these (and other) journals to agree to allow their papers to be posted online within 12 months for free public access. Allen agreed and said that other journals not included in the current NIH PubMed system would need to be added so as to allow NASA-funded research to be adequately represented in this new database – and that full research papers and links to data must be available within the 12-month embargo period limit specified by OSTP.
In addition to posting all new NASA-funded scientific research in this new system, I would hope that additional work is done to help integrate older and historic information that is already in existing NASA data repositories such as the Planetary Data System, Spaceline, Astrophysics Abstracts, and community resources such as the arXiv.org e-Print archive. These days, data mining of prior research has become ever more popular. NASA has over half a century of research contained in numerous databases and archives – anything that facilitates enhanced access to all of these older resources serves to enhance the value of this half century of data.
In addition to the traditional forms of information that NASA research generates (i.e., datasets and published papers), a new form of information is emerging: genomic data. One bright new example is NASA’s GeneLab, which has begun to post genomic (DNA) sequences from organisms that have been used in ground-based and spaceflight experiments. Gale Allen told me that her team has already been in contact with GeneLab and that they are working to be certain that their data systems are compatible (they are). The audience for this information has traditional resonances within the space biology and medicine world – but also has resonances far outside of NASA in areas the agency has yet to fully tap.
Side by side with life science research on ISS is materials science – much of which has direct overlaps with life science. So far NASA has not expanded its Spaceline service to go beyond biology and medicine but Gale Allen tells me that this is now under discussion as part of this broader issue of complying with OSTP’s data initiative.
Finally, as NASA’s astrobiology and extrasolar planet research continues to expand, so will there be an increase in multi- and cross-disciplinary research papers and datasets – things that might not easily fit in the current, more focused discipline-oriented archives. Papers wherein astronomers, planetary geologists, biologists, and atmospheric chemists all join to discuss new worlds that have been discovered could easily find a home in half a dozen places online. NASA will be challenged as to how to bring these wide-ranging research results into its new database as well.
Ideally, there will be an app that publishes updates on everything NASA does in a daily email – or by Twitter. Anything that broadens the visibility of, and access to, NASA research serves to enhance the agency’s ability to participate in the nation’s broader research and commercial activities. Making the full datasets and research papers available at no charge to the citizenry further expands on that utility. All it takes is for NASA to use a proven system from NIH, adopt a common format (again, already proven), make everyone who gets NASA funding for research follow the rules, make certain that the service is easily accessible to anyone who seeks it, and then stand back.