Elasticsearch and OpenSearch

OpenSearch was forked from Elasticsearch 7.10.2 by AWS after Elastic changed the Elasticsearch license. Most concepts and APIs are the same in both, but there are still some gotchas.

In both ES and OS, exactly one elected node is responsible for maintaining a consistent cluster state. ES calls it the master node, while OS calls it the cluster-manager node. If you run GET /_cat/nodes?v against both ES and OS, you can see the naming difference. The OpenSearch team appears to have changed this name deliberately; see the source file of class TransportClusterManagerNodeAction.
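To make the naming difference concrete, here is a minimal sketch that finds the elected node in the text output of GET /_cat/nodes?v. It assumes the marker column is headed `master` in ES and `cluster_manager` in OpenSearch 2.x, and that the elected node is flagged with `*`. The function name and both sample outputs are made up for illustration and are trimmed to a few columns.

```python
def elected_node(cat_nodes_output: str) -> str:
    """Return the name of the elected node from `GET /_cat/nodes?v` text output."""
    lines = [line.split() for line in cat_nodes_output.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    # Assumption: ES heads the column "master", OpenSearch 2.x "cluster_manager".
    column = next(name for name in ("master", "cluster_manager") if name in header)
    marker_idx = header.index(column)
    name_idx = header.index("name")
    for row in rows:
        if row[marker_idx] == "*":  # "*" marks the elected node
            return row[name_idx]
    raise ValueError("no elected node found in output")


# Illustrative samples, not captured from a real cluster.
ES_SAMPLE = """
ip        node.role master name
10.0.0.1  dim       *      es-node-0
10.0.0.2  dim       -      es-node-1
"""

OS_SAMPLE = """
ip        node.role cluster_manager name
10.0.0.1  dim       -               os-node-0
10.0.0.2  dim       *               os-node-1
"""

print(elected_node(ES_SAMPLE))
print(elected_node(OS_SAMPLE))
```

Splitting on whitespace is enough here because none of the selected columns can be empty.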

Read the OpenSearch developer guide to get started locally.

./gradlew localDistro
./gradlew run --debug-server-jvm

The --debug-server-jvm parameter above turns on remote debugging, so you can attach a debugger such as jdb.

Helm chart

The Elasticsearch Helm chart has a configuration value called replicas. It is the number of nodes, not index.number_of_replicas. The number of replicas of each shard is one by default unless you change it.
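The two settings interact: the chart's replicas decides how many nodes exist, while index.number_of_replicas decides how many copies of each primary shard need a home. A rough sketch of the arithmetic, with hypothetical function names, relying on the fact that two copies of the same shard are never allocated to the same node:

```python
def total_shard_copies(primaries: int, number_of_replicas: int) -> int:
    """Primaries plus all of their replica copies for one index."""
    return primaries * (1 + number_of_replicas)


def can_allocate_all_copies(nodes: int, number_of_replicas: int) -> bool:
    """Each copy of a shard needs its own node, so we need replicas + 1 nodes."""
    return nodes >= number_of_replicas + 1


# One primary with the default of one replica means two shard copies in total.
print(total_shard_copies(primaries=1, number_of_replicas=1))

# With the chart's replicas set to 1 (a single node), the replica cannot be
# assigned, which is why such an index stays yellow.
print(can_allocate_all_copies(nodes=1, number_of_replicas=1))
print(can_allocate_all_copies(nodes=3, number_of_replicas=1))
```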

Node discovery

Zen discovery

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/discovery-hosts-providers.html/

Snapshot-related logic lives mainly in class SnapshotsService. One thing to note: while a snapshot is in progress, you cannot delete an index. If you try, you will see SnapshotInProgressException.
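One way to avoid the exception is to check which indices are part of a running snapshot before deleting. The sketch below works on the JSON body of GET /_snapshot/_status, which lists only currently running snapshots when called without a repository. The function names are hypothetical, and the sample payload is hand-written and heavily trimmed, so treat the field layout as an assumption to verify against your cluster.

```python
def indices_in_running_snapshots(status: dict) -> set:
    """Collect index names from a `GET /_snapshot/_status` response body."""
    names = set()
    for snapshot in status.get("snapshots", []):
        # Assumption: "indices" is an object keyed by index name.
        names.update(snapshot.get("indices", {}).keys())
    return names


def safe_to_delete(index: str, status: dict) -> bool:
    """True when no running snapshot includes the index."""
    return index not in indices_in_running_snapshots(status)


# Hand-written, trimmed sample; a real response carries many more fields.
SAMPLE_STATUS = {
    "snapshots": [
        {
            "snapshot": "nightly-1",
            "repository": "backups",
            "state": "STARTED",
            "indices": {"filebeat-7.10.0-2023.02.19": {}},
        }
    ]
}

print(safe_to_delete("filebeat-7.10.0-2023.02.19", SAMPLE_STATUS))
print(safe_to_delete("filebeat-7.10.0-2023.02.20", SAMPLE_STATUS))
```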

Workflow

What is running inside an ES node?

elasticsearch@elasticsearch-master-0:~$ ps auxww
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
elastic+     1  0.0  0.0   2500   592 ?        Ss   13:43   0:00 /bin/tini -- /usr/local/bin/docker-entrypoint.sh eswrapper
elastic+     7  0.0  0.5 2599688 89504 ?       Sl   13:43   0:09 /usr/share/elasticsearch/jdk/bin/java -Xms4m -Xmx64m -XX:+UseSerialGC -Dcli.name=server -Dcli.script=/usr/share/elasticsearch/bin/elasticsearch -Dcli.libs=lib/tools/server-cli -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/usr/share/elasticsearch/config -Des.distribution.type=docker -cp /usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/cli-launcher/* org.elasticsearch.launcher.CliToolLauncher
elastic+   170 71.3 32.0 428683128 5197536 ?   Sl   13:43 128:31 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -Djava.security.manager=allow -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j2.formatMsgNoLookups=true -Djava.locale.providers=SPI,COMPAT --add-opens=java.base/java.io=ALL-UNNAMED -Des.cgroups.hierarchy.override=/ -XX:+UseG1GC -Djava.io.tmpdir=/tmp/elasticsearch-7239405004551249449 -XX:+HeapDumpOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Xmx4g -Xms4g -XX:MaxDirectMemorySize=2147483648 -XX:G1HeapRegionSize=4m -XX:InitiatingHeapOccupancyPercent=30 -XX:G1ReservePercent=15 -Des.distribution.type=docker --module-path /usr/share/elasticsearch/lib --add-modules=jdk.net -m org.elasticsearch.server/org.elasticsearch.bootstrap.Elasticsearch
elastic+   191  0.0  0.0 114336  6164 ?        Sl   13:43   0:00 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

There are two Java processes. The CliToolLauncher class is the entry point: it loads server-cli, which in turn starts the actual server process, whose main class is org.elasticsearch.bootstrap.Elasticsearch. Why this design? Because ES also ships other command-line tools that use CliToolLauncher as their entry point.

Another thing to note is the memory usage. ps reports VSZ and RSS in KiB, so the Elasticsearch process has roughly 409 GiB of virtual memory but only about 5 GiB of physical memory. The huge virtual size comes from Lucene, which uses mmap to make searching faster. This is why the official ES guideline is to give the JVM heap at most 50% of memory, so the rest can be used for Lucene's mmap. The contributors of Lucene are even more aggressive: reserve only 25% for the JVM heap. Here is relevant code for Lucene memory mapped indices.
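The numbers can be checked by hand. This small sketch converts the VSZ and RSS values from the ps output above (both in KiB) to GiB, and shows what the 50% and 25% heap rules would give on a machine with 16 GiB of RAM. The 16 GiB figure and the helper names are assumptions for illustration only.

```python
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib: int) -> float:
    """ps reports VSZ and RSS in KiB; convert to GiB."""
    return kib / KIB_PER_GIB


def heap_budget_gib(total_ram_gib: float, fraction: float) -> float:
    """Heap size under a 'fraction of RAM' rule; the rest is left for mmap."""
    return total_ram_gib * fraction


vsz_gib = kib_to_gib(428683128)  # VSZ of the server process (PID 170)
rss_gib = kib_to_gib(5197536)    # RSS of the same process

print(f"virtual: {vsz_gib:.1f} GiB, resident: {rss_gib:.2f} GiB")
# prints "virtual: 408.8 GiB, resident: 4.96 GiB"

# Hypothetical 16 GiB machine: ES guideline vs the stricter Lucene advice.
print(heap_budget_gib(16, 0.50))  # 8.0
print(heap_budget_gib(16, 0.25))  # 4.0
```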

Shards

According to this Elastic doc, ES runs a search on a single thread per shard, so making shards smaller leads to faster per-shard searches. In practice, though, more shards mean more overhead, so Elastic's guidance is to aim for shards of roughly 10GB to 50GB rather than many tiny ones. You can also use the request below to see the size of the search thread pool, which determines the maximum parallelism.

GET /_cat/thread_pool/search?v=true&h=node_name,name,core,largest,max,qs,size,type
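For a sense of what the size column should show: the ES documentation describes the search pool as a fixed pool sized int((allocated processors * 3) / 2) + 1. A tiny sketch of that formula follows. The function name is made up, and an explicit thread_pool.search.size setting on your nodes would override the default.

```python
def default_search_pool_size(allocated_processors: int) -> int:
    """Default size of the fixed `search` thread pool, per the ES docs formula."""
    return (allocated_processors * 3) // 2 + 1


# A node with 8 allocated processors can work on 13 shard-level searches
# at once; anything beyond that waits in the pool's queue.
for cpus in (2, 4, 8, 16):
    print(cpus, default_search_pool_size(cpus))
```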

An ES data stream is a good way to control shard size, because its backing indices can be rolled over before they grow too large.

Storage

The /_cat/indices endpoint shows basic information about ES indices. An example is below.

health status index                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   filebeat-7.10.0-2023.02.19   pCBMkxMzQ1CAwoOkkyLY3g   1   1   43446487            0    118.6gb         59.5gb
green  open   filebeat-7.10.0-2023.02.20   xdKlPEsiQqu4Bt6NimFifA   1   1   45494980            0    128.6gb         64.4gb

Take note of the uuid field above. The index files are stored on disk under data/indices/<index_uuid>. Inside this folder, you will see files with various extensions: .tid, .cfs, and so on. These are Lucene index files. Check out this description if you are interested.
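The /_cat/indices output shown earlier is easy to process in a script. The sketch below parses the two sample rows, pulls out each index's uuid, and computes the average primary shard size so it can be compared with the 50GB mark mentioned in the Shards section. The helper names are hypothetical, and the size parser only covers the b/kb/mb/gb/tb suffixes.

```python
import re

UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

CAT_INDICES = """
health status index                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   filebeat-7.10.0-2023.02.19   pCBMkxMzQ1CAwoOkkyLY3g   1   1   43446487            0    118.6gb         59.5gb
green  open   filebeat-7.10.0-2023.02.20   xdKlPEsiQqu4Bt6NimFifA   1   1   45494980            0    128.6gb         64.4gb
"""


def parse_size(text: str) -> float:
    """Convert a _cat size such as '59.5gb' into bytes."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb|tb)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def primary_shard_sizes(cat_output: str) -> list:
    """Return (index, uuid, average primary shard size in GiB) per row."""
    lines = cat_output.strip().splitlines()
    header = lines[0].split()
    result = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        per_shard = parse_size(row["pri.store.size"]) / int(row["pri"])
        result.append((row["index"], row["uuid"], per_shard / UNITS["gb"]))
    return result


for index, uuid, size_gib in primary_shard_sizes(CAT_INDICES):
    flag = "over 50GB" if size_gib > 50 else "ok"
    print(f"{index} ({uuid}): {size_gib:.1f} GiB per primary shard, {flag}")
```

Both sample indices have a single primary shard, so each one lands above 50GB.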

Misc

How to get ES version?

/usr/share/elasticsearch/bin/elasticsearch --version
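If you only have HTTP access, the root endpoint (GET /) also reports the version, and it can tell ES and OS apart. This sketch assumes that OpenSearch adds a version.distribution field set to "opensearch" while Elasticsearch has no such field. The function name, sample payloads and version numbers are illustrative and trimmed.

```python
def describe_version(root_response: dict) -> str:
    """Summarise the `GET /` response body as '<distribution> <version>'."""
    version = root_response["version"]
    # Assumption: only OpenSearch sets "distribution"; default to ES otherwise.
    distribution = version.get("distribution", "elasticsearch")
    return f"{distribution} {version['number']}"


# Trimmed, illustrative payloads; real responses contain more fields.
ES_ROOT = {"name": "elasticsearch-master-0", "version": {"number": "7.10.2"}}
OS_ROOT = {
    "name": "opensearch-node-0",
    "version": {"distribution": "opensearch", "number": "2.11.0"},
}

print(describe_version(ES_ROOT))
print(describe_version(OS_ROOT))
```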

This post is licensed under CC BY 4.0 by the author.