CurateCamp Processing: Processing Data/Processing Collections

There's a new CURATEcamp in the works! From:

http://blogs.loc.gov/digitalpreservation/2012/06/curatecamp-processing-p...

Alongside this year’s NDSA/NDIIPP conference, DigitalPreservation 2012, we are excited to try out another kind of meeting: an unconference. In conjunction with the conference, we are going to play host to a CurateCamp. For those unfamiliar with unconferences, the key idea is that the participants define the agenda and that there are no spectators: everyone who comes should plan on actively participating in and helping to lead discussions. Everybody who participates should come ready to work.

We are focusing this camp on the idea of processing, bringing together the computational sense of the word with the archival sense. We are particularly excited about bringing together archivists and curators with software developers and engineers to do some creative thinking and tinkering. You can read up on the topic below. We will be opening up registration for the camp and posting information about exactly where in the DC metro area we will be hosting the event, but we wanted to make sure those interested could put it on their calendars now. The camp will take place on the last day of DigitalPreservation 2012, July 26th. It is being facilitated by Leslie Johnston and me from the Library of Congress; Meg Phillips, Electronic Records Lifecycle Coordinator at the National Archives and Records Administration; and Mark Matienzo, Digital Archivist at Yale University.

If you are interested in participating, or just have ideas for things you would love to see campers engage with, take a minute to comment on this post with an idea for a session. Consider posing some questions you would like the group to tackle in some of the sessions.

Processing Data/Processing Collections

Processing means different things to an archivist and a software developer. To the former, processing is about taking custody of collections, preserving context, and providing arrangement, description, and accessibility. Processing, in its analog archival sense, also includes a lot of preservation (stabilization, preliminary conservation assessment, and the dreaded “re-housing”). To the latter, processing is about computer processing and has to do with how one automates a range of tasks through computation. When a cultural heritage organization’s work is organized around processing digital objects, these two notions of processing intermingle. This CurateCamp unconference is intended to put these two notions of processing together in whatever ways can be imagined by the curators, archivists, librarians, scholars, software developers, computer engineers, and others who attend.

Potential topics and considerations could include:

Automated inventorying and file characterization
Computational determination of hierarchical arrangement
Format validation & migrations
Automated metadata extraction
Potential roles for entity extraction in subject cataloging
Dynamically generated description
Malware scanning
Pattern & fuzzy searching for PII, SSNs, etc.
Automated access restrictions
Generating visualizations and using them as access tools
Human computation’s potential role in cultural heritage collections
Machine learning and digital collections
Using name authority linked data
Processes for geo-referencing
Potential uses of facial recognition tools for identifying individuals in collection images
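
To give a flavor of where the two senses of processing meet, here is a minimal Python sketch touching two of the topics above: automated inventorying and pattern searching for SSNs. It is only an illustration under stated assumptions, not a tool anyone involved with the camp has proposed. The function names `inventory` and `find_ssn_candidates` are hypothetical, file types are guessed from the file extension alone (real file characterization would inspect the file contents), and the pattern only matches the hyphenated 123-45-6789 form, so it will both miss real SSNs and flag numbers that are not SSNs.

```python
import hashlib
import mimetypes
import re
from pathlib import Path

# Assumption: SSNs appear in the hyphenated form 123-45-6789.
# Matches are candidates for human review, not confirmed SSNs.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def inventory(root):
    """Return one record per file under root: path, size, guessed type, SHA-256."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()  # fine for a sketch; stream large files in practice
        mime, _ = mimetypes.guess_type(path.name)  # extension-based guess only
        records.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": len(data),
            "type": mime or "unknown",
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    return records


def find_ssn_candidates(text):
    """Return every substring of text that looks like a hyphenated SSN."""
    return SSN_PATTERN.findall(text)
```

Calling `inventory("some_accession_folder")` on a folder of your own would give a list of records that could be written out as a manifest, and `find_ssn_candidates` could be run over any extracted text to flag files for review before access is granted.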