Codebase indexing vs. chat with codebase

So I've got a question.
1. You can already chat with the codebase without indexing being enabled in the settings. Is that right?
2. What's the advantage of enabling it? Better answers? How big is the difference? Is it worth it?
3. Do they work differently? (My guess: codebase indexing vectorizes and embeds the code, while chat with codebase just searches through the files in the workspace.)
I'm asking because if we can still have a good chat with the codebase and the entire workspace, and the quality is good, then it might not be worth using the codebase indexing option in the settings.
So yeah, I'd like to know the difference between these features and the use case for each.

Yep! You can chat with codebase whether or not you index the codebase.

If you don’t use indexing, we fall back on a simpler, entirely-local, and worse method for figuring out what parts of the codebase to show GPT-4 to answer your codebase-wide question.
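
The reply above doesn't say what the local fallback actually is. Purely as an illustration of what a simple, entirely local retrieval method could look like (an assumption, not Cursor's implementation), here is a keyword-overlap ranking. The names `tokenize` and `rank_files` are hypothetical, and the `files` dict stands in for the workspace on disk.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct identifier-like words."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def rank_files(files: dict[str, str], question: str) -> list[str]:
    """Rank file paths by how many distinct question words each file contains.

    Hypothetical stand-in for a local, non-embedding retrieval step.
    Files with no overlap at all are dropped.
    """
    question_words = tokenize(question)
    scores = {path: len(question_words & tokenize(src)) for path, src in files.items()}
    hits = [path for path in files if scores[path] > 0]
    # Highest overlap first; ties broken alphabetically by path.
    return sorted(hits, key=lambda path: (-scores[path], path))


files = {
    "auth.py": "def login(user, password):\n    return check(user, password)\n",
    "db.py": "def connect(url):\n    return open_db(url)\n",
}
print(rank_files(files, "where is the login password checked"))
```

A method like this only matches literal words, so it misses code that is relevant but phrased differently. That is the kind of gap embedding-based indexing is meant to close.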

What about local embedding database tools/methods?
There are many out there.
What do you guys think about that approach?
That way even the vectorizing and embedding could happen locally.

We worry that entirely local embeddings would:

  1. be quite a resource hog (both from the model inference and from the vector store, especially for folks who are on older PCs)
  2. limit the quality of the vector embeddings we could ship, by limiting the size of our embeddings model

In general, with Cursor, our philosophy is to focus our limited engineering bandwidth on pushing the AI as far as possible, which does mean reducing the resources spent on things like an entirely local experience.

Hello, new user here. I'm currently comparing this new AI-first IDE, Cursor, with other options. How does Cursor actually index my codebase?

We split it into syntactically relevant chunks (using tree-sitter), then store the embeddings in our vector database, while never storing any of your code on our servers.

We use the local state of your codebase as the source of truth for the text corresponding to a given vector in the database.
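
To make the flow above concrete, here is a minimal sketch under stated assumptions. It is not Cursor's code: Python's built-in `ast` module stands in for tree-sitter, a toy bag-of-words vector stands in for a real embedding model, and a plain list stands in for the vector database. `IndexEntry`, `chunk_spans`, `toy_embed`, `build_index`, and `search` are all hypothetical names. The point it illustrates is that an index entry can hold only a location and a vector, with the chunk text read back from the local files at query time.

```python
import ast
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class IndexEntry:
    # Assumed shape of a stored record: a location plus a vector, no chunk text.
    path: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    vector: dict[str, float]


def chunk_spans(source: str) -> list[tuple[int, int]]:
    """Split on syntax: one (start, end) line span per top-level def or class."""
    tree = ast.parse(source)
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return [(n.lineno, n.end_lineno) for n in tree.body if isinstance(n, kinds)]


def toy_embed(text: str) -> dict[str, float]:
    """Toy stand-in for an embedding model: L2-normalized word counts.

    A real model would return a dense float vector instead.
    """
    counts = Counter(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two already-normalized sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def build_index(files: dict[str, str]) -> list[IndexEntry]:
    """Chunk every file, embed each chunk, and keep only location + vector."""
    entries = []
    for path, source in files.items():
        lines = source.splitlines()
        for start, end in chunk_spans(source):
            text = "\n".join(lines[start - 1:end])
            entries.append(IndexEntry(path, start, end, toy_embed(text)))
    return entries


def search(index: list[IndexEntry], files: dict[str, str], query: str, top_k: int = 1) -> list[str]:
    """Rank entries by similarity, then read the chunk text from the local files."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda e: cosine(q, e.vector), reverse=True)
    results = []
    for entry in ranked[:top_k]:
        lines = files[entry.path].splitlines()  # local state is the source of truth
        results.append("\n".join(lines[entry.start_line - 1:entry.end_line]))
    return results


files = {
    "utils.py": "def add(a, b):\n    return a + b\n\n\ndef greet(name):\n    return 'hello ' + name\n",
}
index = build_index(files)
print(search(index, files, "greet name hello")[0])
```

In this sketch, if a local file changes after indexing, the stored line spans can go stale. A real system would need some way to detect that and re-index, for example by comparing content hashes.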

Do you do this for the entire codebase when I open a project in Cursor and enable indexing? And when I close the folder or the Cursor window, is that vector database deleted? That seems really expensive to offer at cost to users who provide their own OpenAI API key. Or do those users not get the full experience of having their entire codebase indexed/vectorized into vector databases? Do you have any plans to let users see the vector databases and then choose a sharding scheme, or manually adjust parameters for better indexing?