Hello! I am building an AI agent in Typebot. If it only had to handle text messages, it would be very simple: collect the user's text input, send it to an OpenAI assistant, and send the response back.
But I need to build a whole other system behind the scenes so it can also handle voice messages and images from the user. My plan is to send the data from Typebot to n8n. If the message is audio, n8n would send it to OpenAI's Whisper to transcribe it and pass the transcript to the original OpenAI assistant. If it is an image, n8n would send it to ChatGPT to analyze the image and pass that result to the same assistant. That is the basic logic.
The problem is that when voice messages and images arrive through a text input from the user, Typebot saves them as URLs. These URLs point to files in .enc format, which I had never seen before. Apparently these files are also hard to convert to other formats (I would need MP3 and JPEG, for example).
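
To make the routing idea concrete, here is a rough sketch in Python. It only covers the branching (audio goes to a transcriber, image goes to an analyzer, text passes straight through). It does not solve the .enc problem. It assumes the workflow already knows a MIME type for each message and that the media has already been turned into something readable, which may not hold here. The names `classify`, `to_assistant_input`, and the handler functions are hypothetical placeholders, not Typebot, n8n, or OpenAI APIs.

```python
from typing import Callable, Dict

# Hypothetical handler type: takes a media URL and returns text
# that can be forwarded to the assistant.
Handler = Callable[[str], str]


def classify(mime_type: str) -> str:
    """Map a MIME type to 'audio', 'image', or 'text' (the fallback)."""
    main_type = mime_type.split("/", 1)[0].strip().lower()
    return main_type if main_type in ("audio", "image") else "text"


def to_assistant_input(payload: str, mime_type: str,
                       handlers: Dict[str, Handler]) -> str:
    """Return the text that should be sent to the assistant.

    Assumption: `payload` is the message text for text messages and a
    media URL for audio or image messages.
    """
    handler = handlers.get(classify(mime_type))
    if handler is None:
        # Plain text (or an unknown type): forward it unchanged.
        return payload
    return handler(payload)


# Stub handlers for illustration only. In a real workflow these would
# call a transcription service and an image-analysis model.
def fake_transcribe(url: str) -> str:
    return f"[transcript of {url}]"


def fake_describe(url: str) -> str:
    return f"[description of {url}]"


HANDLERS: Dict[str, Handler] = {
    "audio": fake_transcribe,
    "image": fake_describe,
}
```

For example, `to_assistant_input("hi there", "text/plain", HANDLERS)` returns the text unchanged, while an `audio/ogg` message is passed to the transcriber stub first. In n8n the same branching could presumably be done with a Switch or IF node instead of code.
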
Has anyone done something similar to let an AI agent integrated with Typebot/WhatsApp handle and answer voice messages and images from the user?
I know someone who has done it, but they used a PHP script running in the backend, and that is way more advanced than what I am currently capable of!