Semester 1 Week 4: Data Mining and Data Curation

In the humanities and digital humanities, data, especially digital data, plays an increasingly important role. At the same time, the practice of making this data, information, and research available to the public has had to change significantly to keep pace with the emerging trends and needs of digital humanists. Christof Schöch’s “Big? Smart? Messy? Data in the Humanities” and Trevor Muñoz’s “Data Curation as Publishing for the Digital Humanities” offer an accessible way into two related concepts: data in the humanities and data curation-as-publishing.

Schöch argues that data in the humanities, while not always obvious, is nonetheless present, just as it is in any other research field. He surveys several definitions of data, from which a general notion emerges: data is a machine-actionable abstraction of the objects of humanities inquiry. He then contrasts two types of data, smart data and big data, and proposes a combination of the two, called smart big data, which draws on both automation and crowdsourcing.

Schöch’s discussion of data in the humanities was clear and digestible. I could readily relate to the colleagues he describes. At a glance, I would not classify paintings and films as data the way I would a spreadsheet of figures, because they do not fit the definition I have always attached to the term. I still cannot make that classification with complete confidence, but I now have a better understanding of the criteria involved. It was also fascinating to learn about the big data phenomenon within the humanities, and I am curious to see how humanists will develop and use smart big data.

Muñoz takes matters a step further by examining how this data is published in the changing environment of humanities scholarship. He argues that the idea of publishing must be extended to new dimensions, notably data curation. Data curation not only reflects current trends in knowledge production and dissemination within the digital humanities but also aligns with the values of librarians in ways that other methods do not. By referencing and critiquing proposed models, Muñoz also shows that data curation-as-publishing should be reimagined rather than fitted into pre-existing workflows, and that this will eventually require changes to how libraries interact with data collections.

Muñoz’s article shows clearly how applying digital technologies to a field of study brings new trends and, with them, a need to adjust existing practices. I therefore believe it is important for humanists and librarians to agree on how to usher in this change, since Muñoz makes it clear that many models still do not adequately account for the needs of both groups. I also agree with his recommendation that librarians take a more hands-on approach to data curation-as-publishing. I can imagine outsourcing becoming quite costly as projects and data sets grow more complex and demand for them increases. Finally, I believe librarians should not dismiss platforms like Zotero in their data curation work. Librarians stand to benefit from the digital information such tools capture in the course of research, which can then be shared with other scholars as input for further research.

All in all, Schöch and Muñoz demonstrate that as the digital humanities continue to change, so too must the ways in which data is understood and published within the field.