Today's Data Are Part of Tomorrow's Research: Archival Issues in the Sciences

Scientific data are essential for training in science and for informed decision-making regarding health, the environment, and the economy. Cumulative data sets assist with understanding trends, frequencies, and patterns, and can form a baseline upon which we can develop predictions. This paper discusses the preservation of scientific data, providing an overview of the characteristics of scientific data and scientific-data portals from a variety of fields, with a focus on data quality, particularly accuracy, reliability, and authenticity, and how these are captured in metadata. These concepts are broadly defined from both scientific and archival perspectives. The paper draws on an extensive literature review of publications from national and international scientific organizations, government and research funding bodies, and on empirical evidence from a selection of InterPARES 2 Case Studies and General Study 10, which investigated thirty-two scientific-data portals. It includes a brief examination of machine-based “knowledge representation” (KR) and its potential implications for the preservation of scientific data, with a particular focus on formal ontologies. The paper also discusses the concept of record in the context of Web 2.0 environments, the paucity of scientific data archives, and the lack of funding priorities in this area. It is argued that archivists will have to work closely with scientific-data creators to understand their practices, that data portals are mechanisms that archivists can use to extend their preservation practices, and that it is not technology that is impeding progress regarding the preservation of scientific data; rather, a lack of funding, policy, prioritization, and vision is allowing our national scientific resources to be lost.



Tracey P. Lauriault is a doctoral student in the Department of Geography and Environmental Studies at Carleton University, and is a Canadian Graduate Scholar. She is part of the Project Management Team for the Cybercartography and the New Economy Project and is responsible for collaboration, archival research, transdisciplinary research, and olfactory cartography. She is lead researcher of the Cybercartographic Atlas of Antarctica Case Study for the International Research on Permanent Authentic Records in Electronic Systems (InterPARES 2) and leads the InterPARES 2 General Study of Archival Policies of Science Data Archives Repositories. She is the founder of an initiative that works towards making civic data available to citizens.
Barbara L. Craig is an associate professor of archives in the Faculty of Information Studies at the University of Toronto. She has a PhD in Archive Studies from the University of London, England. She has been vice-president and president of the Association of Canadian Archivists, senior associate and general editor of Archivaria, and the principal investigator in research projects that use survey and interview techniques to explore issues in the archival profession and to understand the views on archives that are held by user groups; this research has been published in The American Archivist, The Public Historian, and Archivaria. Her current research examines the impact of technologies on knowledge management in offices of the British Civil Service before 1960. In this area she has published research into the adoption of copying technologies before 1900 (in The Archival Imagination: Essays in Honour of Hugh A. Taylor) and on the rethinking of formal knowledge and its practices between 1900 and 1950 (in Archival Science, vol. 2, nos. 1–2). Professor Craig also continues to pursue her interests in the form and genre of records in public offices in the nineteenth century, organizational records management before World War II, and early office technologies and their work ecologies. In 2003 she worked with Phil Eppard and Heather MacNeil to organize and mount the first international conference on the history of records and archives, known ever since as I-CHORA, which was held in Toronto. The best papers from that conference were published in Archivaria 60 (Fall 2005).
D.R. Fraser Taylor is Distinguished Research Professor of International Affairs and Geography and Environmental Studies at Carleton University and director of the Geomatics and Cartographic Research Centre. His main research interests lie in the application of geomatics to the understanding of socio-economic issues. Current research includes a major SSHRC research project entitled “Cybercartography and the New Economy” which involves the creation of a Cybercartographic Atlas of Antarctica (which was a case study for the InterPARES 2 project), and a cybercartographic product, Canada’s Trade with the World. Among his numerous publications are a special issue of Cartographica on cybercartography (guest edited with Sébastien Caquard, April 2006), and Cybercartography: Theory and Practice (2005) (editor and contributor). Dr. Taylor is a member of the Canadian Committee for CODATA, a board member of the OGC (Open Geospatial Consortium) Interoperability Institute, and chairs the International Steering Committee for Global Mapping (ISCGM), an international body involving over 170 mapping agencies, which is producing a digital map of the world. He was secretary-treasurer of the Canadian Association of African Studies for fifteen years, president of the Canadian Cartographic Association, and president of the International Cartographic Association.
Peter Pulsifer is a research assistant and doctoral candidate with the Geomatics and Cartographic Research Centre, Department of Geography, Carleton University. In academic and industrial projects he has used and developed theory and tools in the domains of remote sensing, GIS, multimedia cartography, and geo-semantics. His current research is focused on the use of geographic information semantics and the link between scientific knowledge and environmental policy. He is an active member of the polar science geographic information and data management community.
Lauriault, Tracey P., Barbara Lazenby Craig, D.R. Fraser Taylor, and Peter L. Pulsifer. “Today’s Data Are Part of Tomorrow’s Research: Archival Issues in the Sciences.” Archivaria 64 (1), 123-79.
