Media/file features on WhatsApp

Good evening everyone! I'd like to know whether it's possible to build a flow for WhatsApp in which the user can send text only, an image with text, or audio only, and the OpenAI node then processes whatever arrives and replies. For an image, it should recognize what is in the image; for audio, it should transcribe the message and reply in text.
I have a project to deploy on WhatsApp, and this multimodal capability is essential.
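
To make the idea concrete, here is a rough sketch of the routing step I have in mind, written in plain Python rather than as flow nodes. It assumes the incoming message looks like a WhatsApp Cloud API webhook message (a `type` field plus a `text`, `image`, or `audio` object); the names `Routed` and `route_message` are made up for illustration, and the actual calls for media download, image recognition, and transcription are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Routed:
    """Hypothetical result of the routing step (not part of any library)."""
    kind: str                        # "text", "image", "audio", or "unsupported"
    text: Optional[str] = None       # message body, or the caption sent with an image
    media_id: Optional[str] = None   # ID used to download the media, if any


def route_message(message: dict) -> Routed:
    """Decide which branch of the flow should handle one incoming message.

    Assumes `message` follows the WhatsApp Cloud API webhook shape:
    {"type": "text", "text": {"body": ...}},
    {"type": "image", "image": {"id": ..., "caption": ...}},
    {"type": "audio", "audio": {"id": ...}}.
    """
    msg_type = message.get("type")
    if msg_type == "text":
        # Text only: pass the body straight to the model.
        return Routed("text", text=message.get("text", {}).get("body"))
    if msg_type == "image":
        # Image, optionally with a caption: goes to the image-recognition branch.
        image = message.get("image", {})
        return Routed("image", text=image.get("caption"), media_id=image.get("id"))
    if msg_type == "audio":
        # Audio only: goes to the transcription branch, then answered in text.
        return Routed("audio", media_id=message.get("audio", {}).get("id"))
    # Anything else (video, stickers, documents, ...) is outside this flow.
    return Routed("unsupported")
```

Each branch would then hand its result to the OpenAI step: the text branch as-is, the image branch with the downloaded image plus the caption, and the audio branch with the transcript.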
