How is the inverted index organized in Elasticsearch?

Question

Sigiloso · Accepted Answer

First, we have a dictionary of terms. This is essentially a sorted list of all the unique terms we've seen across the indexed documents, after analysis (tokenizing, lowercasing, and so on). Each term in this dictionary points to something called a 'posting list'.

Posting lists are where we store the IDs of the documents that contain each term. But we don't just store the ID: we can also keep track of how many times the term appears in the document (the term frequency, or TF), and the positions of the term within the text, which is what makes phrase queries possible. How much of this is stored depends on how the field is mapped.
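To make the term dictionary and posting lists concrete, here is a minimal sketch in Python. It is a toy model only: the function name `build_inverted_index` and the whitespace tokenizer are assumptions made for illustration, and it does not reflect how Lucene actually encodes or compresses these structures on disk.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Toy inverted index (hypothetical helper, not an Elasticsearch API).

    docs maps doc_id -> text. The result maps each term to its posting
    list, modeled here as {doc_id: [positions of the term in that doc]}.
    """
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        # Stand-in for analysis: lowercase and split on whitespace.
        for position, term in enumerate(text.lower().split()):
            postings[term].setdefault(doc_id, []).append(position)
    # Emit terms in sorted order to mimic a sorted term dictionary.
    return {term: postings[term] for term in sorted(postings)}


docs = {1: "quick brown fox", 2: "Quick quick dog"}
index = build_inverted_index(docs)

# Term frequency is the number of stored positions for that document.
tf_quick_doc2 = len(index["quick"][2])
```

Here `index["quick"]` is the posting list for the term `quick`: it lists both documents, and for document 2 it holds two positions, so the term frequency there is 2.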

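Then there are 'doc values'. Strictly speaking, these sit alongside the inverted index rather than inside it. They are column-based, on-disk storage that maps each document ID to its field value, which is the access pattern that sorting and aggregations need. The inverted index goes from term to documents; doc values go from document to value.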
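The document-to-value direction of doc values can be sketched the same way. Again this is a simplified illustration under assumptions: `build_doc_values` is a hypothetical helper, document IDs are taken to be small non-negative integers, and real doc values are compressed, per-segment structures rather than Python lists.

```python
def build_doc_values(docs, field):
    """Toy column for one field (hypothetical helper).

    docs maps doc_id -> dict of field values. Returns a list indexed by
    doc_id holding that field's value, or None where it is missing.
    """
    column = [None] * (max(docs) + 1)
    for doc_id, source in docs.items():
        column[doc_id] = source.get(field)
    return column


docs = {0: {"price": 30}, 1: {"price": 10}, 2: {"price": 20}}
price_column = build_doc_values(docs, "price")

# Sorting: order document IDs by their value in the column.
sorted_ids = sorted(range(len(price_column)), key=lambda d: price_column[d])

# Aggregation: scan the column once, no term lookups involved.
total_price = sum(price_column)
```

Both the sort and the sum read one contiguous column instead of walking posting lists term by term, which is the reason for keeping this second structure.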

Didi Chuxing

Interview question from Didi Chuxing
