Crowdsourcing Interpretative Labels from the Video

While expectations for computational systems that understand humans are ever increasing, many challenges lie ahead. Interpretative capability remains a difficult goal for computational systems, in part because of the absence of datasets whose labels capture multiple possible interpretations. Instead, previous datasets have focused on labels with a single clear answer or have ignored variability in interpretation, leading to computational systems that are inaccurate and inflexible. We ask how to leverage crowd workers to elicit diverse interpretative labels, and how to structure those labels, when annotating human emotions and intentions in a video dataset. Through this project, we aim to build video datasets with rich, structured labels collected from crowd workers, which can be used to train models that interpret human emotions and intentions.

Exprgram: A Video-based Language Learning Interface Powered by Learnersourced Video Annotations

Real-world conversations vary in expression depending on context, such as the relationship between speakers, location, or time. While there are many ways to greet, apologize, or compliment others, language learning materials often fail to present sufficiently diverse situations, focusing instead on the meaning of words, reading and listening comprehension, and grammar. This research addresses the challenge by exploring large-scale natural conversations through video mining. Unlike the inauthentic dialogues in existing materials, videos in the target language can expose learners to authentic and diverse language situations. We introduce Exprgram, a learnersourced, web-based interface for teaching diverse language expressions. This research project is primarily led by Kyungje Jo; I serve in a supporting role.