Crowdsourcing Interpretative Labels from the Video


Expectations for computational systems that understand humans keep rising, yet many challenges remain. Interpretative capability is one of the harder goals to achieve, in part because existing datasets do not capture the range of possible interpretations of a label. Previous datasets have instead focused on labels with a single clear answer or ignored variability in interpretation, which has led to computational systems that are inaccurate and inflexible. We ask how crowd workers can be leveraged to collect diverse interpretative labels, and how those labels should be structured, when annotating human emotions and intentions in video. Through this project, we aim to build video datasets with rich, structured labels from crowd workers, which can be used to train models that interpret human emotions and intentions.