Modeling with multimodal data in the wild poses similar challenges in human-computer interaction (HCI) and human-robot interaction (HRI). This workshop series therefore blends HCI and HRI to jointly address a broad range of current topics in multimodal modeling aimed at designing intelligent systems in the wild. From addressing data scarcity in multimodal user state recognition to predicting emotion from EEG during music listening, our third workshop in this series aims to further stimulate this important multidisciplinary exchange.