Codebase indexing vs. chat with codebase

So I’ve got a few questions.
1. You can already chat with the codebase without indexing being enabled in the settings. Is that right?
2. What’s the advantage of enabling it? Better answers? How big is the difference, and is it worth it?
3. Do they work differently? (My understanding is that codebase indexing vectorizes and embeds the code, while chat with codebase just searches through the files in the workspace.)
I’m asking because if chat with codebase already gives good-quality answers across the entire workspace, it might not be worth enabling the codebase indexing option in the settings.
So I’d like to know the difference between these features and the use case for each.

Yep! You can chat with codebase whether or not you index the codebase.

If you don’t use indexing, we fall back on a simpler, entirely-local, and worse method for figuring out what parts of the codebase to show GPT-4 to answer your codebase-wide question.

What about local embedding database tools and methods?
There are many of them out there.
What do you think about that approach?
That way even the vectorizing and embedding could happen locally.

We worry that entirely local embeddings would:

  1. be quite a resource hog (both from the model inference and from the vector store, especially for folks who are on older PCs)
  2. limit the quality of the vector embeddings we could ship, by limiting the size of our embeddings model

In general, with Cursor, our philosophy is to focus our limited engineering bandwidth on pushing the AI as far as possible, which does mean reducing the resources spent on things like an entirely local experience.

Hello, new user here. I’m currently comparing this new AI-first IDE, Cursor, with other options. How does Cursor actually index my codebase?

We split it into syntactically relevant chunks (using tree-sitter), then store the embeddings in our vector database, while never storing any of your code on our servers.

We use the local state of your codebase as the source of truth for the text corresponding to a given vector in the database.
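
For readers who want a concrete picture, here is a rough sketch of that idea in Python. It is not Cursor's actual implementation: the chunker is a regex stand-in for tree-sitter, `embed` is a toy hashed bag-of-words rather than a real embeddings model, and the `Entry` fields (path plus line range) are only an assumption about what could be stored next to each vector. The point it illustrates is that the index holds vectors and locations but no code, and the text is read back from the local files at query time.

```python
import hashlib
import math
import re
from dataclasses import dataclass

DIM = 256  # toy embedding size, chosen arbitrarily for this sketch


@dataclass
class Entry:
    """What the hypothetical remote index keeps: a vector and a location, no code."""
    path: str
    start: int  # first line of the chunk (0-based)
    end: int    # one past the last line of the chunk
    vector: list[float]


def chunk_lines(text: str) -> list[tuple[int, int]]:
    """Stand-in for tree-sitter: start a new chunk at each top-level def/class."""
    lines = text.splitlines()
    starts = [i for i, line in enumerate(lines) if re.match(r"(def|class)\s", line)]
    if not starts or starts[0] != 0:
        starts = [0] + starts
    return [(s, e) for s, e in zip(starts, starts[1:] + [len(lines)]) if e > s]


def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words embedding, normalised to unit length."""
    vec = [0.0] * DIM
    for token in re.findall(r"[A-Za-z_]\w*", text.lower()):
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(files: dict[str, str]) -> list[Entry]:
    """Embed every chunk; only the vector and its location go into the index."""
    index = []
    for path, text in files.items():
        lines = text.splitlines()
        for start, end in chunk_lines(text):
            chunk = "\n".join(lines[start:end])
            index.append(Entry(path, start, end, embed(chunk)))
    return index


def search(index: list[Entry], files: dict[str, str], query: str) -> str:
    """Find the nearest vector, then read the chunk text from the local files."""
    q = embed(query)
    best = max(index, key=lambda e: sum(a * b for a, b in zip(q, e.vector)))
    lines = files[best.path].splitlines()
    return "\n".join(lines[best.start:best.end])


# The dict stands in for the files on your disk.
files = {
    "utils.py": "def add(a, b):\n    return a + b\n\n"
                "def greet(name):\n    return 'hello ' + name\n"
}
index = build_index(files)
print(search(index, files, "greet name hello"))
```

Because `search` reads from `files` rather than from the index, editing a file locally changes the text that comes back for a given vector, which is the "local state as source of truth" part. A real system would also need to re-embed chunks whose content changed; this sketch leaves that out.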
