    Mining Art History: Bulk Converting Nonstandard PDFs to Text to Determine the Frequency of Citations and Key Terms in Humanities Articles

    Amanda Wasielewski, Anna Dahlgren

    Chapter from the book: Petersson, S. 2021. Digital Human Sciences: New Objects – New Approaches.


    Text mining in art history scholarship can tell us about the discipline itself, as well as about artistic concerns at any given moment. The aim of this study is to develop and test a strategy for mining text from PDFs of journal articles that have nonstandard formatting and/or use notes rather than full bibliographies for references. While articles in the natural and social sciences typically adhere to standard formats, art history journals employ a variety of formatting styles that make bulk capture of citation and other textual data from the articles challenging. This study outlines a method by which researchers can extract data from journal articles, using a sample set from art history. Once extracted, the data from the PDFs can be used to compare frequently used terms across samples and to determine which scholars are most cited in either the bibliographies or the main body text of articles. If the structure and layout of individual journals are carefully considered and the data is properly cleaned, a clear picture of the disciplinary influences and dependencies of the scholarship can be obtained through citations and key terms.
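
To make the counting step described in the abstract concrete, here is a minimal Python sketch. It is not the authors' pipeline: it assumes the PDF has already been converted to plain text by some other tool, and the function names (`term_frequencies`, `name_mentions`), the stopword set, and the sample sentence are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter


def term_frequencies(text, stopwords=frozenset()):
    """Count lowercase word tokens in already-extracted article text.

    Tokens are runs of letters, optionally joined by hyphens or apostrophes.
    """
    tokens = re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


def name_mentions(text, names):
    """Count whole-word, case-insensitive mentions of each surname."""
    return {
        name: len(re.findall(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE))
        for name in names
    }


# Made-up sample text standing in for one cleaned article (an assumption).
sample = (
    "Barthes on photography. See Barthes, Camera Lucida; "
    "cf. Sontag, On Photography."
)

freqs = term_frequencies(sample, stopwords={"on", "see", "cf"})
mentions = name_mentions(sample, ["Barthes", "Sontag", "Benjamin"])

print(freqs.most_common(3))
print(mentions)
```

A real run would likely need journal-specific cleaning first, such as removing running headers and separating notes from body text, which is the part the abstract flags as requiring careful attention to each journal's layout.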


    How to cite this chapter
    Wasielewski A. & Dahlgren A. 2021. Mining Art History: Bulk Converting Nonstandard PDFs to Text to Determine the Frequency of Citations and Key Terms in Humanities Articles. In: Petersson, S (ed.), Digital Human Sciences. Stockholm: Stockholm University Press. DOI: https://doi.org/10.16993/bbk.l
    License

    This is an Open Access chapter distributed under the terms of the Creative Commons Attribution 4.0 license (unless stated otherwise), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. Copyright is retained by the author(s).

    Peer Review Information

    This book has been peer reviewed. See our Peer Review Policies for more information.

    Additional Information

    Published on June 8, 2021

    DOI
    https://doi.org/10.16993/bbk.l

