ViperGPT: Visual Inference via Python Execution for Reasoning
*Equal contribution
Columbia University
ViperGPT decomposes visual queries into interpretable steps.
Summary
Answering visual queries is a complex task that requires
both visual processing and reasoning. End-to-end models,
the dominant approach for this task, do not explicitly differentiate between the two, limiting interpretability and generalization. Learning modular programs presents a promising
alternative, but has proven challenging due to the difficulty
of learning both the programs and modules simultaneously.
We introduce ViperGPT, a framework that leverages code-generation models to compose vision-and-language models
into subroutines to produce a result for any query. ViperGPT
utilizes a provided API to access the available modules, and
composes them by generating Python code that is later executed. This simple approach requires no further training,
and achieves state-of-the-art results across various complex
visual tasks.
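The generate-then-execute loop described above can be illustrated with a minimal sketch. Everything here is a stand-in chosen for illustration and not the project's actual API: `API_SPEC`, `fake_code_model`, `find`, and `run_query` are hypothetical names, the "image" is a dictionary of pre-labelled objects, and the code-generation model is replaced by a function that returns a fixed program.

```python
from typing import Dict, List

# Hypothetical API description that would be shown to a code-generation model.
API_SPEC = '''
def find(image, name) -> list:
    """Return all objects in the image that match the given name."""
'''


def fake_code_model(api_spec: str, query: str) -> str:
    """Stand-in for a code-generation model: always returns the same program."""
    return (
        "def execute_command(image):\n"
        "    muffins = find(image, 'muffin')\n"
        "    return len(muffins)\n"
    )


def find(image: Dict[str, List[str]], name: str) -> List[str]:
    """Toy 'vision module': looks up pre-labelled objects instead of detecting them."""
    return image.get(name, [])


def run_query(image: Dict[str, List[str]], query: str):
    """Generate a program for the query, then execute it against the image."""
    source = fake_code_model(API_SPEC, query)
    namespace = {"find": find}   # expose only the provided API to the program
    exec(source, namespace)      # defines execute_command inside namespace
    return namespace["execute_command"](image)


toy_image = {"muffin": ["m1", "m2", "m3", "m4"], "kid": ["k1", "k2"]}
result = run_query(toy_image, "How many muffins are there?")
print(result)
```

In this sketch no component is trained: the program is plain Python text, so each intermediate step (the objects found, the final count) can be inspected directly.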
Logical Reasoning
ViperGPT can perform logic operations because it directly executes Python code.
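As a rough illustration of logic carried out by the interpreter, the sketch below combines the outputs of a toy existence check with Python's own boolean operators. The `exists` helper and the set-of-labels "image" are assumptions made for this example only.

```python
from typing import Set


def exists(image: Set[str], name: str) -> bool:
    """Toy existence check: the image is just a set of object labels."""
    return name in image


def both_present(image: Set[str], first: str, second: str) -> bool:
    """Logical AND over two existence checks."""
    return exists(image, first) and exists(image, second)


def exactly_one_present(image: Set[str], first: str, second: str) -> bool:
    """Logical XOR over two existence checks."""
    return exists(image, first) != exists(image, second)


toy_image = {"cat", "sofa"}
print(both_present(toy_image, "cat", "dog"))         # False
print(exactly_one_present(toy_image, "cat", "dog"))  # True
```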
Spatial Understanding
We show ViperGPT's spatial understanding.
Knowledge
ViperGPT can access the knowledge of large language models.
Consistency
ViperGPT answers similar questions with consistent reasoning.
Math
ViperGPT can count and divide, all using Python.
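A small sketch of what counting and dividing in Python might look like for a query such as "How many muffins can each kid have for it to be fair?". The `find` helper and the dictionary "image" are hypothetical stand-ins for real detection modules.

```python
from typing import Dict, List


def find(image: Dict[str, List[str]], name: str) -> List[str]:
    """Toy detector: returns pre-labelled objects of the given kind."""
    return image.get(name, [])


def muffins_per_kid(image: Dict[str, List[str]]) -> int:
    """Count both kinds of object, then divide with ordinary integer arithmetic."""
    muffins = find(image, "muffin")
    kids = find(image, "kid")
    if not kids:  # avoid dividing by zero when no kids are found
        return 0
    return len(muffins) // len(kids)


toy_image = {"muffin": [f"m{i}" for i in range(8)], "kid": ["k1", "k2"]}
share = muffins_per_kid(toy_image)
print(share)  # 4
```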
Attributes
We show some ViperGPT examples involving attributes.
Relational Reasoning
Reasoning about relations.
Negation
Negation is programmatic, not neural.
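To make "programmatic" concrete, the sketch below negates a property check with Python's `not` operator instead of asking a model to interpret the negated phrase. The `has_property` helper and the attribute dictionary are assumptions for illustration.

```python
from typing import Dict, List, Set


def has_property(scene: Dict[str, Set[str]], obj: str, prop: str) -> bool:
    """Toy attribute check: each object maps to a set of attribute labels."""
    return prop in scene.get(obj, set())


def objects_without(scene: Dict[str, Set[str]], prop: str) -> List[str]:
    """Negation is an ordinary `not` applied to the positive check."""
    return sorted(obj for obj in scene if not has_property(scene, obj, prop))


toy_scene = {"car": {"red", "parked"}, "bike": {"blue"}, "bus": {"red"}}
not_red = objects_without(toy_scene, "red")
print(not_red)  # ['bike']
```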
BibTeX
@article{surismenon2023vipergpt,
author = {Sur\'is, D\'idac and Menon, Sachit and Vondrick, Carl},
title = {ViperGPT: Visual Inference via Python Execution for Reasoning},
journal = {arXiv preprint arXiv:2303.08128},
year = {2023},
}