Kelvin Tan - Solr/Elasticsearch Consultant - SolrCloud search request lifecycle and writing distributed Solr SearchComponents

This entry is part 4 of 4 in the Developing Custom Solr SearchComponents series

Recap

To recap, in a previous article, we saw that a SearchComponent comprises of a prepare() and process() method:

public class TestSearchComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
 
    }
 
    @Override
    public void process(ResponseBuilder rb) throws IOException {
 
    }
}

and that the standalone SolrCloud application flow of a search request looks more or less like:

for( SearchComponent c : components ) {
  c.prepare(rb);
}
for( SearchComponent c : components ) {
  c.process(rb);
}

SolrCloud high-level search request lifecycle

When a search request is received by a SolrCloud collection. the following sequence of events takes place:

1. A node receives the search request.
2. On this node, each SearchComponent's prepare() method is called.
3. A loop is entered where the request is handled in stages. The 'standard' stages, as defined in ResponseBuilder, are:

  public static int STAGE_START = 0;
  public static int STAGE_PARSE_QUERY = 1000;
  public static int STAGE_TOP_GROUPS = 1500;
  public static int STAGE_EXECUTE_QUERY = 2000;
  public static int STAGE_GET_FIELDS = 3000;
  public static int STAGE_DONE = Integer.MAX_VALUE;

In each stage, each SearchComponent has the ability to 'vote' on the next stage. The lowest stage from each SearchComponent is what determines the actual next stage.

The request may not go through every stage. It all depends on what SearchComponents are handling this request.

For illustrative purposes, this is what QueryComponent's stage handling code looks like:

  protected int regularDistributedProcess(ResponseBuilder rb) {
    if (rb.stage &lt; ResponseBuilder.STAGE_PARSE_QUERY)
      return ResponseBuilder.STAGE_PARSE_QUERY;
    if (rb.stage == ResponseBuilder.STAGE_PARSE_QUERY) {
      createDistributedStats(rb);
      return ResponseBuilder.STAGE_EXECUTE_QUERY;
    }
    if (rb.stage &lt; ResponseBuilder.STAGE_EXECUTE_QUERY) return ResponseBuilder.STAGE_EXECUTE_QUERY;
    if (rb.stage == ResponseBuilder.STAGE_EXECUTE_QUERY) {
      createMainQuery(rb);
      return ResponseBuilder.STAGE_GET_FIELDS;
    }
    if (rb.stage &lt; ResponseBuilder.STAGE_GET_FIELDS) return ResponseBuilder.STAGE_GET_FIELDS;
    if (rb.stage == ResponseBuilder.STAGE_GET_FIELDS &amp;&amp; !rb.onePassDistributedQuery) {
      createRetrieveDocs(rb);
      return ResponseBuilder.STAGE_DONE;
    }
    return ResponseBuilder.STAGE_DONE;
  }

It is beyond the scope of this article to fully go into each stage, but it is sufficient for you to know that search requests are handled in stages.

4. At each stage, SearchComponents may send out requests to the other shards, requesting data in some form. This is known as a ShardRequest. Every ShardRequest has an integer field called purpose. The ShardRequest purpose should not be confused with the ResponseBuilder stage. They are both integer fields, but that's where the similarity ends.

For example, in QueryComponent's STAGE_EXECUTE_QUERY, createMainQuery() is called, in which a ShardRequest is sent out to the other shards, requesting them to execute the query and return the top matching docs. This ShardRequest has the purpose of ShardRequest.PURPOSE_GET_TOP_IDS.

5. In a SearchComponent, the handling of the ShardRequest and its specific purpose occurs in 2 separate methods, both of which are contained in the SearchComponent: process() and handleResponses().

process() is executed on each shard which receives the ShardRequest.

handleResponses is executed on the shard that initiated the ShardRequest, where usually some process of merging and collation takes place. handleResponses() would be much better named handleShardResponses(). The signature of handleResponses() is

public void handleResponses(ResponseBuilder rb, ShardRequest sreq)

The responses of each shard to this ShardRequest is available from sreq.responses.

This step is potentially the most confusing conceptual bit of the entire lifecycle, so let's stop here to recap.

The key realization is that even though the ShardRequest creation, processing and ShardResponse handling takes place in a single SearchComponent, they are executed on different shards.

For example, given 2 shards, ShardA and ShardB. Suppose the request arrives at ShardB and we are at the STAGE_EXECUTE_QUERY in QueryComponent.

a. In ShardB's QueryComponent.createMainQuery(), a ShardRequest to all other shards, with ShardRequest.PURPOSE_GET_TOP_IDS
b. ShardA and ShardB's QueryComponents receive the ShardRequest and respond to it in process()
c. ShardB's QueryComponent.handleResponses() is called with the ShardRequest containing the 2 ShardResponses and collates the results.

5. After each stage is complete, finishStage() is then called. The signature of finishStage() is

public void finishStage(ResponseBuilder rb)

6. The above steps are repeated until we arrive at the ResponseBuilder stage of STAGE_DONE.

This concludes the core lifecycle of a search request in SolrCloud.

Summary

In summary, a SolrCloud collection is made of a number of shards. For a search request to be handled, each shard needs to provide data to the search request.

A distributed search request is handled in (ResponseBuilder) stages.

In any of the ResponseBuilder stages, a SearchComponent may initiate ShardRequests to the shards in a collection.

That same SearchComponent (though potentially residing on a different machine also responds to that ShardRequest (in process()), as well as collates the ShardResponses from the various shards to that ShardRequest (in handleResponses() or finishStage()).

In this fashion, the necessary computation required to build the response to the search request is executed.

Notes

1. For simplicity's sake, I opted to use the terminology of shards as opposed to replicas, since this is more of a conceptual overview. I am aware that it is actually replicas that initiate and respond to ShardRequests.

2. Similarly, I use the terminology of machines as opposed to nodes. Nodes is the 'proper' term used in Solr documentation and code.

Developing Custom Solr SearchComponents

Introducing this series on developing Solr SearchComponents
Developing a Solr SearchComponent for standalone Solr
Introducing SolrCloud
SolrCloud search request lifecycle and writing distributed Solr SearchComponents

No Comments »

Supermind Search Consulting Blog Solr - Elasticsearch - Big Data

SolrCloud search request lifecycle and writing distributed Solr SearchComponents

Recap

SolrCloud high-level search request lifecycle

Summary

Notes

Supermind Search Consulting Blog
Solr - Elasticsearch - Big Data