[Feature Request] Gpt-4o vision

Gpt-4o is described as having better vision capabilities. Is there a plan to add image upload feature to it?


I’m curious regarding your anticipated use case for this in an IDE.

It would help me to be able to point to how I want something in my frontend to look and let it then implement it in the code. Currently I’m mostly stuck with going back and forth through me explaining and the GPT taking stabs in the dark.

For example giving the AI a Figma screenshot with description to implement a UI component. Or attaching a screenshot of an issue in the UI.
With the upcoming video capabilities perhaps it would be possible to automate the debugging and development process by iteratively feeding the AI a video of the UI and it automatically editing the code with continuous feedback, where it has access to the console, then possibly Network/Elements tabs (possibly via screenshots/video as well if it’s good enough).


ah, right. sounds fun

When ‘gpt-4o’ model is selected you can use images in the chat. Are you referring to another use - or does it look like images being handled under the covers differently?

When I select gpt-4o, the ‘Image’ button disappears.

Ah, ok - I realized I don’t usually use that button and just copy-paste an image into the chat (or you can mention the @imagefilename and it will load in the chat). Not sure why the image button is missing, but it will let you add + process images, using gpt-4o presumably, those ways currently.


Wow, I didn’t know about that, thanks for the tip!

1 Like

I use it to do rapid Jupyter Notebook debugging. The amount of context you can give a vision model with a simple screenshot is amazing and so fast!