From Babel to Knowledge: An Inside Look at Data Mining

(Sorry for the cheesy clip art, I couldn’t help myself… still shedding some of the ingrained PowerPoint ‘skills’, I guess.)

This week we took a look at Professor Cohen’s article on data mining, which is linked here. In his article, Professor Cohen discusses two main data mining ventures he has undertaken with regard to digital history.

Document Classification: Syllabus Finder

The Syllabus Finder focuses on an aspect of data mining known as ‘keyword-in-context indexing’, or KWIC. This approach came about as a solution to the problem of finding similar documents, known as document classification. KWIC uses the concept of inverted indexing to generate a ‘dictionary of notions’ that groups documents by keywords, rather than by topic, in order to collect sets of documents that share similar traits, such as history syllabi. Professor Cohen used this tool to index online history syllabi, helping professors around the world to generate ideas and material for their potential courses. One benefit of this type of indexing is that the results are more specific, and the tool brings up more of them than, say, Google or Yahoo would. I found the idea of pairing this sort of tool with an API very interesting. While the tool is not perfect (9 out of 10 results are actually syllabi), pairing it with an API would help to further reduce that small margin of error.
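To make the idea of inverted indexing a little more concrete for myself, here is a minimal Python sketch. It is not Professor Cohen’s code, and the function names (`build_inverted_index`, `find_documents`, `kwic`) and the three tiny sample documents are my own made-up assumptions. It simply shows how a word-to-documents lookup can group documents by shared keywords, and how a keyword can be displayed with a few words of context around it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def find_documents(index, keywords):
    """Return the ids of documents that contain every keyword."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    if not sets:
        return set()
    return set.intersection(*sets)


def kwic(text, keyword, width=2):
    """Show each occurrence of keyword with `width` words on either side."""
    words = tokenize(text)
    return [
        " ".join(words[max(0, i - width): i + width + 1])
        for i, w in enumerate(words)
        if w == keyword
    ]


# Hypothetical sample documents, invented purely for illustration.
docs = {
    "a": "History 101 syllabus: required readings and weekly assignments",
    "b": "Course syllabus for modern history, with readings and grading policy",
    "c": "Recipe for a weeknight pasta dinner",
}

index = build_inverted_index(docs)
matches = find_documents(index, ["syllabus", "readings"])
print(sorted(matches))
print(kwic(docs["a"], "syllabus", width=1))
```

In this toy version, the documents that look like syllabi are the ones that share words such as ‘syllabus’ and ‘readings’, while the recipe is left out. A real tool would presumably need a much larger keyword list and some way of weighting the words.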

Question Answering: H-Bot

The ‘H-Bot’ was the other tool Professor Cohen created and discussed in his article. As noted in the article, Question Answering (QA) is a far greater challenge than document classification in the data mining world, because of the greater demands it places on the computing techniques involved. Not only do the documents need to be found, but the question being asked by a user must also be properly dissected and understood by the program. The H-Bot (an automated historical fact finder) is a tool that does just that. This tool in particular caught my attention for the potential role it could play in the future regarding ethical test taking. It’s no surprise that some students will do whatever it takes to pass an exam, including cheating when they don’t grasp the material well. While the cheating of the past involved glancing over at a classmate’s paper, this tool has the potential to take the place of the student completely, answering the exam for them question by question. Although the tool isn’t really open to the public in this way yet (according to the article), it wouldn’t take long for students to figure out this potential should it become open to the general public. This is definitely something that would have to be considered by professors as well as the H-Bot creators.

