Multimodal: Image & Video

🧩 Level 1: Conversational Environment (Basic)

At this level, media is used to support low-barrier interaction. The focus is on comfort, recognition, and exploratory dialogue.

  • Purpose: Spark curiosity, guide attention, activate prior knowledge.

  • Example Use: Avatar shows an image and asks, “What do you think is happening here?”

  • Prompt Type: Open-ended or yes/no, geared toward discussion.

  • Interaction Mode: Casual audio or short text input.

  • Use Case: Introductory scenes, warm-ups, soft skills, brainstorming.

🧠 Ideal for instructors who want to get started with minimal setup but still benefit from immersive engagement.

🧪 Level 2: Task-Integrated with Prompts and Assessment (Conversation and MCQ)

Here, media becomes instructionally targeted: it is aligned with learning outcomes and scaffolded with prompts and responses.

  • Purpose: Enable analysis, evaluation, and applied thinking.

  • Prompt Type: Task-driven (“Identify the mistake”, “Explain what will happen next”).

  • Interaction Mode: Multiple-choice, typed explanation, or audio narration.

  • Feedback: Scripted or AI-generated based on learner input.

  • Progression: Learner can retry or advance based on performance.

🧠 Ideal for instructors aiming to assess understanding or teach diagnostic and procedural skills.


🔄 Task Flow

Classlet’s multimodal task design follows a structured five-stage flow that turns static visual content into an interactive learning activity. The progression is intuitive for learners and grounded in principles of active learning and feedback-driven iteration.

  1. Media Presentation: The learner begins by viewing a carefully selected image or video embedded in the environment. This visual acts as the cognitive stimulus, sparking attention, curiosity, or analysis. It sets the scene without overwhelming the learner, allowing focus on key visual elements.

  2. Prompt Delivery: Immediately after the media, a targeted question or task appears. The prompt can be inferential (“What do you think happens next?”), procedural (“Which step is missing?”), or diagnostic (“Identify the error”). It is designed to direct attention and frame the learner’s cognitive task.

  3. Response Action: The learner responds using one of several modes: multiple-choice selection, typed text, or audio narration. The input format is chosen based on the complexity and intent of the task, so that the cognitive demand matches the instructional goal.

  4. Feedback: Classlet delivers feedback either through pre-scripted responses or dynamic AI-generated messages. Feedback clarifies reasoning, affirms correct choices, or prompts revision. Instructors can configure whether feedback is immediate, delayed, or adaptive based on task type.

  5. Advance or Retry: Depending on the learner’s performance and the chosen pacing model, the system either advances the learner to the next activity or enters a retry loop. This supports mastery-based learning by reinforcing concepts before moving on.
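The advance-or-retry logic of the five-stage flow can be pictured as a small state machine. The sketch below is a minimal illustration only, assuming each attempt produces a numeric score between 0 and 1, a mastery threshold, and a cap on attempts; `Stage`, `PacingPolicy`, `decide`, and `run_task` are hypothetical names, not part of Classlet, and the threshold values are placeholders. It also assumes that a retry loops back to the media so the learner can look again.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The five stages of the task flow, in order."""
    MEDIA_PRESENTATION = 1
    PROMPT_DELIVERY = 2
    RESPONSE_ACTION = 3
    FEEDBACK = 4
    ADVANCE_OR_RETRY = 5


@dataclass
class PacingPolicy:
    """Hypothetical pacing settings; not Classlet's actual configuration."""
    mastery_threshold: float = 0.8  # minimum score (0-1) required to advance
    max_attempts: int = 3           # assumed cap on attempts before moving on


def decide(score: float, attempt: int, policy: PacingPolicy) -> str:
    """Stage 5: return 'advance' or 'retry' for a 1-based attempt number."""
    if score >= policy.mastery_threshold:
        return "advance"
    if attempt < policy.max_attempts:
        return "retry"
    # Assumed fallback: once attempts are exhausted, the learner moves on.
    return "advance"


def run_task(scores: list[float], policy: PacingPolicy) -> list[str]:
    """Simulate attempts (one score per attempt) and return the stage trace."""
    trace = []
    for attempt, score in enumerate(scores, start=1):
        # Assumption: each attempt, including a retry, replays all five stages.
        for stage in Stage:
            trace.append(stage.name)
        decision = decide(score, attempt, policy)
        trace.append(decision.upper())
        if decision == "advance":
            break
    return trace
```

For example, `run_task([0.5, 0.9], PacingPolicy())` passes through the five stages twice: the first attempt falls below the threshold and ends in a retry, and the second attempt advances. Keeping the decision in a separate `decide` function makes it easy to swap in a different pacing model (for instance, one with no attempt cap) without touching the stage sequence.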


🎯 Design Tip for Instructors

When planning a multimodal task:

  1. Choose visual media that offers ambiguity, action, or contrast; this drives interpretive thinking.

  2. Craft prompts that demand observation-based reasoning (“What’s missing?”, “What’s wrong here?”).

  3. Align the task format with the learning goal, e.g., classification (MCQ), articulation (text/audio), or sequencing.

  4. Decide the interaction level: start with Level 1 for narrative flow, then scale up to Level 2 for targeted assessment.

This image shows a sequenced multimodal learning task within a VR environment that combines multiple pedagogical elements. The sequence begins with on-screen text explanations to build foundational understanding, followed by interactive avatar dialogue that personalizes the content and prompts learner reflection.

Multimodal Framework Summary

The infographic below illustrates the structured task progression used in Classlet’s multimodal learning activities. This five-stage sequence is designed to scaffold cognitive engagement and support mastery-based learning.
