Final Guidelines for the Assignment:

The Corpus Assignment, otherwise known as Assignment 1, will be completed in one step. It builds on work we did in the textual portion of the class, particularly with Voyant Tools and the RMarkdown files in posit.cloud. This assignment can be done alone or in pairs.

  • Format: Individual or pairs (maximum 2 people)
  • Length: Approximately 1500 words (about an 8-minute read), plus visuals
  • Due Date: Saturday, 28 February 2026.

This exercise has three main elements:

  1. Corpus Selection: Choose a corpus of five or more texts you would like to work with.
  2. Exploratory Analysis: Use digital textual analysis tools to see what kind of exploratory data analysis (EDA) you can do with that corpus.
  3. Written Synthesis: Assemble your evidence, analysis, and visuals in a web-published essay, in the form of a post, that tells a coherent story about your findings. Make sure that one of your Voyant Tools visualizations is a live widget embedded in your post.

Step 1: You will need to pick your corpus from the choices below.

Choice A. Five different books from the same category taken from Project Gutenberg (perhaps in different genres, or several texts by one author).

Here are some ideas in PG:

If you choose one of these categories or another from PG, it will be a blast from the past, given the date of the texts.

Choice B. Five science fiction books from Project Gutenberg.

If you use the search function and look for science fiction, you will find more than 4000 science fiction novel(la)s, many of which are written by the same authors. You could research any of the authors in Wikipedia or in the Internet Speculative Fiction Database and choose five texts by one author. If you read French, there is also a large selection of French science fiction.

For example, if you use the search function and look for artificial intelligence you will find 20 science fiction novel(la)s on the topic. You could also look for terms such as alien or abduction or monsters.

Choice C. Five books about a certain geographical place.

  • If you would like to look into books separated by geographical region, try those bookshelves.

  • The bookshelf for India, together with the date of the books, will give you an interesting vantage point for looking at colonial South Asia. You could combine that with a corpus created by a colleague found here.

Choice D. You could also choose five texts in a language other than English. NB: This will pose certain challenges in the analysis that you can write about in your assignment. Check here for books in Chinese, Dutch, Finnish, French, Italian, Japanese, Portuguese, Russian, Serbian, Spanish, Tagalog, and Telugu. There are other languages, but coverage is not uniform.

Choice E. A custom corpus made up of five different sets of articles from a single class or multiple classes in your major.

You have the option of using five article-length or longer files from your major, and you can even substitute one of the files with a genAI-generated one. NB: Generating a text of equivalent length with genAI may take considerable extra effort.

If you opt for Choice E, please consult with the instructor before beginning this process. The texts can be short stories or articles, but no shorter than 2000 words. If you choose your own files, include a copy of all five of them in your assets folder and create links to them for your readers to examine. If you are not working with texts from Project Gutenberg, then you will need to follow the instructions in the Colonial South Asian Corpus notebook for uploading your own files.

Step 2: Research Your Texts

Before you begin analysis, research the texts themselves: Who are the authors? What is the publication context? What are the general themes and contents? You will want to do background research using something like Wikipedia or other reliable web sources. This research may actually inform your choice of corpus.

By becoming familiar with your texts, you’ll be able to:

  • Justify your selection meaningfully
  • Contextualize your findings rather than studying the corpus in isolation
  • Make connections between distant reading insights and close reading knowledge
  • Recognize what makes your corpus interesting

The more you know about the texts, the more meaningful the “distant reading” will be.

Step 3: Conduct your Analysis

For this exercise you must use both of the following:

  • (1) Voyant Tools, and
  • (2) the Project Gutenberg Explorer RMarkdown notebook in posit.cloud. The Colonial South Asian Literature Explorer RMarkdown notebook in posit.cloud is useful if you want to provide your own text files in .txt format.
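If you bring your own .txt files, the general idea of loading them into R can be sketched as follows. This is not the code from the course notebook, which remains the reference for this assignment; it is a minimal base-R sketch. The function name `read_txt_corpus`, the folder name `my_corpus`, and the two demo files are all hypothetical.

```r
# Read every .txt file in a folder into one data frame (one row per line of text).
# Assumption: files are UTF-8 encoded and named something like essay_one.txt.
read_txt_corpus <- function(dir) {
  paths <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  docs <- lapply(paths, function(path) {
    lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
    data.frame(
      # file name without the extension identifies each document
      doc  = rep(tools::file_path_sans_ext(basename(path)), length(lines)),
      text = lines,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, docs)
}

# Demo with two tiny throwaway files in a temporary folder (hypothetical data).
demo_dir <- file.path(tempdir(), "my_corpus")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("First line.", "Second line."), file.path(demo_dir, "essay_one.txt"))
writeLines("Only line.", file.path(demo_dir, "essay_two.txt"))

my_corpus <- read_txt_corpus(demo_dir)
table(my_corpus$doc)  # number of lines read per document
```

Keeping a `doc` column next to the text is what later lets you compare documents rather than treating the corpus as one undifferentiated block.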

If you want to combine several texts from Project Gutenberg into one corpus, you can do so by passing a vector of ID numbers, built with c(), like this:

gutenbergbook <- gutenberg_download(c(textID1, textID2, textID3))  # replace with your numeric IDs; add more as needed
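As a slightly fuller sketch of the same idea, the snippet below asks `gutenberg_download()` to keep each book's title through its `meta_fields` argument, so the combined texts stay distinguishable. The five IDs are examples only (assumed to point to Jane Austen novels; verify each on Project Gutenberg before relying on it), and the helper names `download_corpus` and `lines_per_book` are hypothetical.

```r
# Example IDs only (assumed to be five Jane Austen novels on Project Gutenberg);
# check each ID on gutenberg.org and replace with your own.
text_ids <- c(1342, 158, 161, 105, 121)

# Wrap the download in a function so nothing is fetched until you call it.
download_corpus <- function(ids) {
  # meta_fields = "title" adds a title column alongside gutenberg_id and text
  gutenbergr::gutenberg_download(ids, meta_fields = "title")
}

# Lines of text per book: a quick check that every ID actually downloaded.
lines_per_book <- function(corpus) {
  as.data.frame(table(title = corpus$title), responseName = "lines")
}

# Usage (needs the gutenbergr package and an internet connection):
# gutenbergbook <- download_corpus(text_ids)
# lines_per_book(gutenbergbook)
```

If one of your titles is missing from the `lines_per_book()` table, that ID probably did not download, and it is worth checking before you start interpreting word counts.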

Step 4: Build a set of visualizations

Create at least two (2) screenshots showing the results of your exploratory analysis from Voyant and a selection of screenshots from ggplot in R. These might include:

  • Word frequency charts or tables
  • Wordcloud visualizations
  • Trend graphs showing word usage over time or across texts
  • Concordance results, etc.

Ensure each screenshot is well chosen to illustrate a point, and that each is clearly labeled and contextualized.
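To make the first item in that list concrete, here is a minimal sketch of a per-text word frequency table and chart. It is not the course notebook's code: the two toy texts, the tiny stopword list, and the function names `tokenize` and `count_words` are placeholders for illustration, and the counting uses base R so it runs without extra packages. The chart is drawn only if ggplot2 is installed.

```r
# Toy stand-in texts (hypothetical); replace with your own corpus.
texts <- c(
  text_a = "The ship left the harbour and the crew watched the sea.",
  text_b = "The robot watched the crew and the robot learned."
)

# Lowercase, split on anything that is not a letter or apostrophe, drop empties.
tokenize <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z']+"))
  words[nzchar(words)]
}

# A very small stopword list, for illustration only.
stopwords <- c("the", "and", "a", "of", "to", "in")

# Count the remaining words in one text, most frequent first.
count_words <- function(x) {
  words <- tokenize(x)
  words <- words[!words %in% stopwords]
  counts <- sort(table(words), decreasing = TRUE)
  data.frame(word = names(counts), n = as.integer(counts), stringsAsFactors = FALSE)
}

# One frequency table for the whole corpus, with a column naming each text.
freq <- do.call(rbind, lapply(names(texts), function(name) {
  cbind(text = name, count_words(texts[[name]]))
}))

# Draw one panel per text, only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(freq, aes(x = reorder(word, n), y = n)) +
    geom_col() +
    coord_flip() +
    facet_wrap(~ text, scales = "free_y") +
    labs(x = NULL, y = "Occurrences",
         title = "Most frequent words per text (toy data)")
  print(p)
}
```

The title and axis label set in `labs()` are part of what makes a screenshot "clearly labeled", so it is worth wording them for your reader rather than leaving the column names as defaults.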

Step 5: Include an Interactive Visualization

Use one iframe from Voyant Tools so that you have one interactive visualization.

You can obtain an iframe by:

  • Running your analysis in Voyant Tools
  • Going to the Export tab
  • Copying the HTML snippet, which will look similar to this:

<iframe style='width: 444px; height: 408px;' src='https://voyant-tools.org/tool/Cirrus/?corpus=8d8c7ce89087801d676ff4f77d5391fc'></iframe>

Step 6: Integrate Course Materials

Read the chapter “The Risks of Distant Reading” (pp. 143-169) from Ted Underwood’s Distant Horizons, available as an e-book, and refer to it in your assignment.

Reference at least two (2) other readings or resources (podcasts, articles) from this course in your essay. You may also draw on external sources as appropriate. Be sure to cite everything you use, including LLMs.

Consider the questions raised in class on making Markdown posts legible, and use Markdown Live Preview and Hemingway App as you compose your response. Keep the F-shape principle for web writing in mind too!

Guiding Questions (you do not need to answer all these questions):

  • Background & Expectations: What did you know about your subject before beginning analysis? What hypotheses did you have about the language contained in the text?

  • Computational Insights: What does computational analysis reveal that a linear read would not? Would reading all texts cover-to-cover have been feasible in your timeline? What interesting patterns emerged?

  • Comparative Insights: What did Voyant Tools allow you to do that the Rmd Notebooks did not? How was working with the two methods different, and how was it similar?

  • Trends & Surprises: What trends can you identify across your corpus? Were there unexpected findings? How do your results compare to your initial hypotheses?

  • Methodological Questions: If you ran the same analysis in both Voyant and the Rmd Notebooks, did you get consistent results? Why or why not? How do different visualization methods represent the data differently? Were there limitations in the tools or approaches you used? What risks are there in reading this way (draw on Underwood)?

  • Scope & Scale: How limiting (or enabling) was the constraint of comparing five or more texts? What would you analyze differently with more or fewer texts?

  • Transferability: How might you use this workflow in other courses, disciplines, or projects like a capstone?

  • For Choice D: If you worked on texts not in English, what challenges did you face?

Assessment

Your work will be assessed according to the criteria located here.

Tips for Success

Writing: Use tools like Markdown Live Preview and Hemingway App, not AI, to refine your prose for clarity and legibility. Keep the F-shape principle for web writing in mind: readers scan top-to-bottom and left-to-right, so structure your argument visibly.

Visualization: Make your screenshots speak. Use clear captions that explain what readers are seeing and why it matters to your argument. Your visualizations should support and enhance your analysis, not merely decorate it or fill space. Feel free to annotate the visuals, for example with arrows or circles.

Collaboration: If working in pairs, you may submit a single essay that links to both group members’ sites. Include a brief statement describing each person’s unique contribution to the work.

Publishing: Post your assignment to your course site as a post so instructors and classmates can read and engage with your work.

It is fine to publish your assignment iteratively, but when you finish the final version of your assignment, write at the bottom of it “READY FOR GRADING”.

Good luck with your analysis!