Review the Documents
Wikipedia article on "Elephants in Thailand" | Student-submitted essay on "Elephants in Thailand" |
---|---|
The elephant has been a contributor to Thai society and its icon for many centuries. The elephant has had a considerable impact on Thai culture. The Thai elephant is the official national animal of Thailand. The elephant found in Thailand is the Indian elephant, a subspecies of the Asian elephant. | The elephant is a contributor to Thai society. It has been an icon of Thai life for many centuries. The elephant, which it is possible to see found in every part of Thailand, is the Indian elephant, which is a subspecies of the Asian elephant. The Thai elephant has a considerable impact on culture. The elephant is the official national animal of Thailand. |
1 Do you think the student who submitted this essay on "Elephants in Thailand" committed plagiarism? Circle one: Yes No
Detect Identicality
Open the Plagiarism Detection Starter File and click "Run".
- The two paragraphs above are defined in the starter file as `wiki-article` and `student-essay`, respectively.
- The file also defines a function `simple-equality`, which takes in two Strings and produces a Boolean: `true` indicates that there is a match; `false` indicates that there is not a match.
2 Record if the output is `true` or `false` for each line of code:
simple-equality("hello", "goodbye")
simple-equality("hello", "HELLO")
simple-equality("hello", "helo")
simple-equality("hello,", " Hello.")
3 How does `simple-equality` deal with punctuation, capitalization, and spaces?
4 Compare the essays using `simple-equality(student-essay, wiki-article)`. What does the program return?
Building a Better Plagiarism Detector
Yara and Xola agree that there has to be a way to measure the similarity of the essays. They each propose a different solution.
Yara: I made a list of the four most unusual and unique words in each essay. Then I compared the lists. Because the two lists of unusual words are identical (elephant, Thai, Indian, and Asian), I decided that the student plagiarized!
Xola: I figured out the four most common words in each essay. In the wiki-article, they are: "the" (7), "elephant" (6), "Thai" (3), and "a" (3). In the student essay, they are: "the" (7), "elephant" (6), "is" (5), and "of" (4). Because the most common words are not the same, I don’t think the student plagiarized.
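Xola's counting step can be sketched in code. The example below is written in Python rather than the starter file's language, and the names `wiki_article`, `student_essay`, and `most_common_words` are illustrative assumptions, not definitions from the starter file. It also assumes that a "word" is any run of letters and that capitalization is ignored, which is only one of several reasonable choices.

```python
import re
from collections import Counter

# The two paragraphs from the table above, copied verbatim.
wiki_article = (
    "The elephant has been a contributor to Thai society and its icon for "
    "many centuries. The elephant has had a considerable impact on Thai "
    "culture. The Thai elephant is the official national animal of Thailand. "
    "The elephant found in Thailand is the Indian elephant, a subspecies of "
    "the Asian elephant."
)
student_essay = (
    "The elephant is a contributor to Thai society. It has been an icon of "
    "Thai life for many centuries. The elephant, which it is possible to see "
    "found in every part of Thailand, is the Indian elephant, which is a "
    "subspecies of the Asian elephant. The Thai elephant has a considerable "
    "impact on culture. The elephant is the official national animal of "
    "Thailand."
)


def most_common_words(text, n=4):
    """Return the n most frequent words in text, paired with their counts."""
    # Assumption: lowercase everything and treat each run of letters as a
    # word, so "The" and "the" are counted together and punctuation is dropped.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(n)


# Print the four most common words (with counts) for each paragraph.
print(most_common_words(wiki_article))
print(most_common_words(student_essay))
```

Changing `n`, or changing the rule for what counts as a word, can change which words make each list.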
5 What do you like about Yara’s method? What are its shortcomings?
6 What do you like about Xola’s method? What are its shortcomings?
7 Come up with your own method! What is a different way to measure the similarity between the two essays?
These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, 1738598, 2031479, and 1501927).
Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.