Ingen varer
Kassen / rediger kurv
Visual Question Answering (VQA) usually combines visual inputs like image and video with a natural language question concerning the input and generates a natural language answer as the output.