Good evening everyone! I'd like to know whether it's possible to build a WhatsApp flow in which the user can send only one of three things: plain text, an image with a caption, or an audio message. The OpenAI node would then process that input and reply. For an image, it should recognize what the image shows; for audio, it should transcribe the recording and respond in text. I have a project to deploy on WhatsApp, and this multimodal capability is essential.
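For reference, the core of such a flow is a routing step that decides which of the three cases an incoming message belongs to. Below is a minimal Python sketch of that step. It assumes the message follows the WhatsApp Cloud API webhook format (a `type` field plus `text.body`, `image.id`/`image.caption`, or `audio.id`). The function name `route_message` and the route labels are hypothetical. The actual media download, transcription, and OpenAI calls are left out because they depend on the platform running the flow.

```python
from typing import Any, Dict


def route_message(msg: Dict[str, Any]) -> Dict[str, Any]:
    """Classify one incoming WhatsApp message into the three allowed cases.

    Assumption: `msg` has the shape of a WhatsApp Cloud API webhook message
    object; adjust the field names for other providers.
    """
    kind = msg.get("type")
    if kind == "text":
        # Plain text: forward the body directly to the chat model.
        return {"route": "chat", "text": msg["text"]["body"]}
    if kind == "image":
        # Image, optionally with a caption: fetch the media by id, then send
        # the image and caption together to a vision-capable model.
        caption = msg["image"].get("caption", "")
        return {"route": "vision", "media_id": msg["image"]["id"], "text": caption}
    if kind == "audio":
        # Audio or voice note: fetch, transcribe, then answer the transcript in text.
        return {"route": "transcribe_then_chat", "media_id": msg["audio"]["id"]}
    # Anything else (video, documents, stickers, ...) is outside the allowed inputs.
    return {"route": "reject", "reason": f"unsupported message type: {kind}"}


# Example with a hypothetical media id:
incoming = {"type": "audio", "audio": {"id": "MEDIA_ID"}}
print(route_message(incoming))
# {'route': 'transcribe_then_chat', 'media_id': 'MEDIA_ID'}
```

Each route then maps to one branch of the flow: "chat" goes straight to the OpenAI node, "vision" sends the image plus caption, and "transcribe_then_chat" runs transcription first and passes the resulting text to the same chat step.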