我的 2024 年 Elasticsearch 认证考试经验与真题回顾

凌虚 included in Elasticsearch

2024-03-17 3202 words 7 minutes

Contents

背景说明

大家好，我是凌虚。

我于 2024 年 3 月 14 日参加了 Elastic Certified Engineer（ECE）认证考试，并与 18 日收到了考试通过的邮件。本文将会回顾我的考试过程、考试真题、个人感受。

ECE 认证

一手资料请一定要阅读官方考试说明文档。

目前考试使用的是 Elasticsearch v8.1 版本。
考试费用 500 美元（涨价过了），需要用支持美元支付的信用卡购买，可以用别人的卡代付。
只有一次考试机会，没有补考，没有官方模拟考和模拟题。
考试内容是 10 个题目，都是实操题，可以使用 Kibana。

考试大纲

Data Management（数据管理）
1. Define an index that satisfies a given set of requirements（按要求定义 index）
2. Define and use an index template for a given pattern that satisfies a given set of requirements（按要求定义和使用 index template）
3. Define and use a dynamic template that satisfies a given set of requirements（按要求定义和使用 dynamic template）
4. Define an Index Lifecycle Management policy for a time-series index（为时序索引定义 ILM 策略）
5. Define an index template that creates a new data stream（定义一个 index template 让其创建一个新的 data stream）
Searching Data（搜索数据）
1. Write and execute a search query for terms and/or phrases in one or more fields of an index（为索引的一个或多个字段中的 terms 和/或 phrases 编写并执行搜索 query）
2. Write and execute a search query that is a Boolean combination of multiple queries and filters（编写并执行一个由多个 query 和 filter 进行 bool 组合而成的查询）
3. Write an asynchronous search（编写异步搜索）
4. Write and execute metric and bucket aggregations（编写并执行 metric 和 bucket 聚合）
5. Write and execute aggregations that contain sub-aggregations（编写并执行包含子聚合的聚合）
6. Write and execute a query that searches across multiple clusters（编写并执行跨集群搜索的查询）
7. Write and execute a search that utilizes a runtime field（编写并执行利用运行时字段的搜索）
Developing Search Applications（开发搜索应用）
1. Highlight the search terms in the response of a query（高亮查询响应中的搜索词）
2. Sort the results of a query by a given set of requirements（按要求对搜索结果进行排序）
3. Implement pagination of the results of a search query（实现搜索结果的分页）
4. Define and use index aliases（定义和使用索引别名）
5. Define and use a search template（定义和使用搜索模板）
Data Processing（数据处理）
1. Define a mapping that satisfies a given set of requirements（按要求定义 mapping）
2. Define and use a custom analyzer that satisfies a given set of requirements（按要求定义和使用 custom analyzer）
3. Define and use multi-fields with different data types and/or analyzers（定义和使用具有不同数据类型和/或 analyzer 的多字段）
4. Use the Reindex API and Update By Query API to reindex and/or update documents（使用 Reindex API 和 Update By Query API 重建索引和/或更新文档）
5. Define and use an ingest pipeline that satisfies a given set of requirements, including the use of Painless to modify documents（按要求定义和使用 ingest pipeline，包括使用 Painless 修改文档）
6. Define runtime fields to retries custom values using Painless scripting（使用 Painless 脚本定义运行时字段以检索自定义值）
Cluster Management（集群管理）
1. Diagnose shard issues and repair a cluster's health（诊断分片问题并修复集群健康）
2. Backup and restore a cluster and/or specific indices（备份和恢复集群和/或特定索引）
3. Configure a snapshot to be searchable（将快照配置为可搜索）
4. Configure a cluster for cross-cluster search（配置集群以进行跨集群搜索）
5. Implement cross-cluster replication（实现跨集群复制）

考试内容完完全全就是按照考试大纲里的考点来的，但是每个题目都会涉及到多个考点。

考试真题

以下是我这次考试的题目。

1. data stream + index template + ilm

按要求创建一个 ilm policy ，数据索引后 5 分钟内在 hot 节点，之后翻滚至 warm 节点，３分钟后转换到 cold 节点，翻滚之后 6 分钟删除。

然后创建一个 data stream 的 index template ：

按要求设置 index_patterns
关联上面的 ilm policy （我第一遍复制粘贴官方文档的代码然后忘了改 settings 里的 index.lifecycle.name ）
由于题目要求数据要先到 hot 节点上，所以按照我的理解 settings 中还应该加 “index.routing.allocation.include._tier_preference”: “data_hot”

最后复制粘贴题目给的请求写入一个文档，从而把这个 data stream 创建出来。

2. reindex + custom analyzer

给了某个 index 和搜索请求，用 the 去搜索 title 字段的时候会匹配很多文档，要求 reindex 为另外的 index（一般名称都是要求你使用 task2 这种跟题目编号一致的命名方式），然后在新的索引上用 the 搜索不到任务文档。需要注意的是他明确要求你保留原索引的数据结构和类型（也就是要先查原索引的 mappings 并复制粘贴过来），然后在 mappings 中的 title 字段中定义 analyzer 去处理这个 the（这道题 tokenizer 用 standard ，character filter 用 stop 就可以了）。

3. upadte_by_query + ingest pipeline

要求给某个索引增加一个新的字段，新字段是已有四个字段的值拼接而成，注意拼接的时候字段之间加空格（题目给的正确文档示例是有加空格的）。

看到 update 这种操作建议先 reindex 一下原索引然后测试一下，免得原索引改错了找不回来。

4. runtime field + aggregations

定义一个 runtime field ，值是已有两个字段的值相减，然后在这个 field 上面做 range aggregation 。

5. multi-match

要求搜索三个字段，其中一个字段权重乘2，最终得分为每个字段得分相加（也就是设置 type 为 most_fields）。

6. cross-cluster search 跨集群搜索

题目明确告诉你不需要配置 remote cluster，环境已经配好了，只要写一个跨集群的 query 就行了，query 的内容也很简单，里面会有一个 sort 排序。

7. aggregations

结果填空，不是填搜索请求。

要求找出来平均飞行里程最大的 airline 航班。

其实就是先按照 airline 航班做一遍 terms 分桶（bucket aggregation），然后在每个 bucket 里再用 avg（metrics aggregation）做一个子聚合求值，最后用 pipeline aggregation 里的 max bucket 取出来 avg 最大的这个 bucket。最后的答案是 AS 。

8. snapshot

要求先注册一个 shared file system repository 类型的 repository（用 Kibana 操作就行了），然后创建一个 snapshot （要求只包含特定的某个 index），去 rest API 文档下面看一下 snapshot API 就行了。

9. search template + highlight + sort

自己创建一个 search template（只需要单个 params 参数），要求查询里有 highlight 和 sort ，查询条件很简单，最后要求在 movie_data 这个索引上使用这个 search template 进行查询。题目只要求粘贴最后使用 search template 进行实际查询的请求（但是 search template 需要你先创建好）。

10. async search + aggregations

写一个异步搜索，针对航班数据索引进行聚合，填完整的请求。内容是查询每周的某个 metrics 指标（具体是啥我忘了，反正就是先做 date_histogram 然后再做 metrics），题目有另外要求 size 为 0 。

最后，你会发现我考试的这十道题除了集群管理里有几个考点没考，其余大纲里所有考点基本都覆盖到了。

我踩的坑

考前 15 分钟之内才能开始考试，太早了没用（我考 CKA 的时候提前半个小时就进了，但是 ECE 不行）。
一定要用大屏考试，不然考试体验会很痛苦（这是我之前考 CKA 的体会）。
我是笔记本电脑外接了显示器和摄像头，但是外接的摄像头看不清护照上的名字（考试结束后问了卖家才知道买的摄像头是定焦的，大家一定要买变焦的），然后我跟当时的印度监考官折腾了好久，她中间也是离开了一会儿，估计是咨询同事这种情况要怎么办，后来她让我把手机拿过来拍个照再放大了给摄像头看（其实还是有点不清晰，但是监考官没找我茬，让我继续考试了）。
复制键键位冲突。MacBook 都是 option+c，考试环境说是 ctrl + c ，但是我用 ctrl + c 在考试环境里却是唤起浏览器的调试栏，最后没办法只能用鼠标右击复制，这点会影响答题速度但不致命。
考试环境并不是很流畅，有时候会卡一下，这个其实影响也没有很大，保持一个好心态。我最后整个考试只花费了一个小时二十分钟，官方给的三个小时的考试时间是绰绰有余的。

个人感受

我最开始在备考的时候想的很简单，找几套真题做做就行了，但是后来发现行不通。一方面，考试涉及到的部分内容是 7.x 版本新增的特性，而我由于换公司的原因这几年基本没玩 ES 了，且我之前玩的 ES 还是 6.x 版本，也就是说其实我需要重新学一遍。另一方面，网上基本找不到 8.1 的考试真题，考试相关的资源被几个大佬搞成了付费增值服务的一部分，说到底还是 ES 的圈子太小了，没啥办法。

最后，考试难不难？一点都不难，做题的整个流程基本就是：1、理解题目内容，提炼考点；2、找到考点对应的官方文档，复制粘贴文档里的代码；3、按题目要求修改代码最后提交（某些步骤可以直接用 Kibana 可视化操作，代码都不用敲）。只要你理解了每个考点，能快速找到每个考点的文档位置，你一定能过。

总结

这种实操类的 IT 考试其实都不难，就是考察的基本功，像我这种几年不玩 ES 的复习两周也能过，大家不要害怕，对自己要有信心。