** Exact topics and schedule are subject to change based on interest and time. **

Part Topics Readings
1 Introduction [slides]
  • What is Multimodal? Definitions, dimensions of heterogeneity and cross-modal interactions.
  • Historical view and multimodal research tasks.
  • Core technical challenges: representation, alignment, transference, reasoning, generation, and quantification.
2 Multimodal challenges
  • Why is multimodal hard? Introduction to core challenges.
  • Overview of multimodal representation, alignment, reasoning, transference, generation, and quantification.
  • Identifying recent solutions for practitioners.
3 Recent advances in multimodal AI
  • Multimodal transformers and foundation models
  • Multimodal generative models
  • Multimodal agents
4 Multimodal AI for Human Sensing [slides]
  • Sensor data synthesis: Video to Doppler, Video to IMU, Video to Audio, MoCap to IMU, MoCap to UWB
  • Data augmentation
  • Temporal data modeling
5 Ethics, interpretability and privacy
  • Privacy and fairness concerns
  • Handling errors and uncertainty
  • Bringing humans into the loop
6 Applications
  • Human activity recognition, pose estimation, gesture recognition
  • Infrastructure and environmental sensing
  • Wellness and fitness tracking, mobile health monitoring
7 Hardware and Sensors for Multimodal AI [slides]
  • Challenges and opportunities in hardware and sensors for multimodal AI
  • Importance of scalable, customizable hardware platforms
  • Key applications benefiting from advancements in multimodal sensing and feedback interfaces
8 Multimodal sensing and feedback interfaces
  • Advanced fabrication techniques for multimodal sensing hardware
  • Addressing scalability, adaptability, and customization in hardware development
  • Innovations in state-of-the-art data acquisition systems for multimodal interfaces
  • Integration of diverse sensing modalities into compact, flexible form factors
9 Multisensory data fusion
  • Approaches to synchronize and interpret multimodal data
  • Strategies for enhancing signal quality and accuracy
  • Future directions enabled by multisensory interfaces