ViperGPT: Visual Inference via Python Execution for Reasoning

*Equal contribution

Columbia University

ViperGPT decomposes visual queries into interpretable steps.


Answering visual queries is a complex task that requires
both visual processing and reasoning. End-to-end models,
the dominant approach for this task, do not explicitly differentiate between the two, limiting interpretability and generalization. Learning modular programs presents a promising
alternative, but has proven challenging due to the difficulty
of learning both the programs and modules simultaneously.
We introduce ViperGPT, a framework that leverages code-generation models to compose vision-and-language models
into subroutines to produce a result for any query. ViperGPT
utilizes a provided API to access the available modules, and
composes them by generating Python code that is later executed. This simple approach requires no further training,
and achieves state-of-the-art results across various complex
visual tasks.
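The generate-then-execute loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `find` module, the `execute_command` entry point, the canned `generate_program` stub standing in for a code-generation model, and the toy image represented as a list of labelled detections. It is not taken from the ViperGPT codebase.

```python
from typing import Any, Callable, Dict, List

# Toy stand-in for an image: a list of labelled detections.
# A real system would run vision models on pixels instead.
Image = List[Dict[str, Any]]


def find(image: Image, name: str) -> Image:
    """Hypothetical module: return detections whose label equals `name`."""
    return [obj for obj in image if obj["label"] == name]


# Hypothetical API exposed to generated programs.
API: Dict[str, Callable[..., Any]] = {"find": find}


def generate_program(query: str) -> str:
    """Stand-in for a code-generation model; returns a canned program."""
    return (
        "def execute_command(image):\n"
        "    cups = find(image, 'cup')\n"
        "    return len(cups)\n"
    )


def answer(query: str, image: Image) -> Any:
    """Generate a program for the query, then execute it on the image."""
    namespace: Dict[str, Any] = dict(API)  # names the generated code may call
    exec(generate_program(query), namespace)  # defines execute_command
    return namespace["execute_command"](image)


image = [{"label": "cup"}, {"label": "plate"}, {"label": "cup"}]
result = answer("How many cups are there?", image)
print(result)
```

The point of the design is that the modules stay fixed while only the program text changes per query, which is why no further training is involved. Note that `exec` on model-generated code is unsafe without sandboxing, which this sketch leaves out.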

Logical Reasoning

ViperGPT can perform logic operations because it directly executes Python code.
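As a rough illustration of what this means, the sketch below combines the outputs of a perception call with ordinary Python boolean operators. The `exists` helper, the `execute_command` name, and the toy image format are hypothetical stand-ins, not the project's actual API.

```python
from typing import Dict, List

# Toy stand-in for an image: a list of labelled detections.
Image = List[Dict[str, str]]


def exists(image: Image, name: str) -> bool:
    """Hypothetical module: True if any detection carries the label `name`."""
    return any(obj["label"] == name for obj in image)


def execute_command(image: Image) -> str:
    """Query: 'Is there both a fork and a knife, or at least a spoon?'"""
    has_fork = exists(image, "fork")
    has_knife = exists(image, "knife")
    has_spoon = exists(image, "spoon")
    # The logical structure of the question lives in plain Python,
    # not inside a neural network.
    return "yes" if (has_fork and has_knife) or has_spoon else "no"


table = [{"label": "fork"}, {"label": "plate"}]
verdict = execute_command(table)
print(verdict)
```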

Spatial Understanding

We show ViperGPT's spatial understanding.


ViperGPT can access the knowledge of large language models.


ViperGPT answers similar questions with consistent reasoning.


ViperGPT can count and divide, all using Python.
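A minimal sketch of the idea, under assumed names: a hypothetical `count` module reports how many objects of each kind are present, and the arithmetic is left to Python. The query, labels, and image format are made up for illustration.

```python
from typing import Dict, List

# Toy stand-in for an image: a list of labelled detections.
Image = List[Dict[str, str]]


def count(image: Image, name: str) -> int:
    """Hypothetical module: number of detections labelled `name`."""
    return sum(1 for obj in image if obj["label"] == name)


def execute_command(image: Image) -> int:
    """Query: 'How many muffins can each kid have if they share fairly?'"""
    muffins = count(image, "muffin")
    kids = count(image, "kid")
    if kids == 0:
        return 0  # nobody to share with; avoid dividing by zero
    return muffins // kids  # integer division is done by Python itself


scene = [{"label": "muffin"}] * 8 + [{"label": "kid"}] * 2
per_kid = execute_command(scene)
print(per_kid)
```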


We show some ViperGPT examples involving attributes.

Relational Reasoning

Reasoning about relations.


Negation is programmatic, not neural.
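One way to picture this, as a sketch under stated assumptions: a positive attribute check (here a hypothetical `is_color` helper standing in for a learned classifier) answers "is this object red?", and the negation is applied afterwards with Python's `not`. The helper names and the toy image format are illustrative only.

```python
from typing import Dict, List

# Toy stand-in for an image: detections with a label and a color attribute.
Image = List[Dict[str, str]]


def find(image: Image, name: str) -> Image:
    """Hypothetical module: return detections labelled `name`."""
    return [obj for obj in image if obj["label"] == name]


def is_color(obj: Dict[str, str], color: str) -> bool:
    """Hypothetical stand-in for a learned attribute check."""
    return obj["color"] == color


def execute_command(image: Image) -> int:
    """Query: 'How many cars are not red?'"""
    cars = find(image, "car")
    # The model is only asked the positive question; `not` flips the answer.
    return len([car for car in cars if not is_color(car, "red")])


street = [
    {"label": "car", "color": "red"},
    {"label": "car", "color": "blue"},
    {"label": "car", "color": "white"},
    {"label": "bus", "color": "red"},
]
not_red = execute_command(street)
print(not_red)
```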


            author    = {Sur\'is, D\'idac and Menon, Sachit and Vondrick, Carl},
            title     = {ViperGPT: Visual Inference via Python Execution for Reasoning},
            journal   = {arXiv preprint arXiv:2303.08128},
            year      = {2023},
