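Contour & Envoy Metrics
This document covers the metrics of Contour and Envoy. It was compiled from the official Contour and Envoy documentation, with reference to the source code and third-party blogs. Because Contour exposes only a few metrics, all of them are listed here, and I have subjectively and provisionally split them into required and optional. In Envoy, metrics are called stats and are exposed through the admin interface. Because there are so many of them, this document follows the official recommendation and covers only the Cluster and HCM (HTTP connection manager) stats, listing only those I consider important and needed at this stage. For details, follow the corresponding links.
Contour Metrics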
| Required | Name | Type | Description |
| --- | --- | --- | --- |
| YES | contour_build_info | Gauge | Build information (labels: branch, revision, version) |
| NO | contour_cachehandler_onupdate_duration_seconds | Summary | Histogram for the runtime of xDS cache regeneration. |
| YES | contour_dagrebuild_timestamp | Gauge | Timestamp of the last DAG rebuild. |
| NO | contour_dagrebuild_total | Counter | Total number of times DAG has been rebuilt since startup |
| YES | contour_eventhandler_operation_total | Counter | Total number of Kubernetes object changes Contour has received by operation and object kind. |
| YES | contour_httpproxy | Gauge | Total number of HTTPProxies that exist regardless of status. |
| NO | contour_httpproxy_invalid | Gauge | Total number of invalid HTTPProxies. |
| NO | contour_httpproxy_orphaned | Gauge | Total number of orphaned HTTPProxies which have no root delegating to them. |
| YES | contour_httpproxy_root | Gauge | Total number of root HTTPProxies. Note there will only be a single |
| YES | contour_httpproxy_valid | Gauge | Total number of valid HTTPProxies. |
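The Required column can be checked mechanically. The sketch below assumes Contour's metrics have already been fetched as Prometheus text exposition; the fetch itself is omitted, and the sample text and its values are invented for illustration. `REQUIRED`, `metric_names`, and `missing_required` are hypothetical names, not part of Contour.

```python
import re

# Metrics marked YES in the Required column of the table above.
REQUIRED = {
    "contour_build_info",
    "contour_dagrebuild_timestamp",
    "contour_eventhandler_operation_total",
    "contour_httpproxy",
    "contour_httpproxy_root",
    "contour_httpproxy_valid",
}

# Matches "name{labels} value" or "name value"; label contents are not parsed.
SAMPLE_LINE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)")


def metric_names(exposition):
    """Return the set of metric names found in Prometheus text exposition."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_required(exposition):
    """Return the required metric names that do not appear in the text."""
    return REQUIRED - metric_names(exposition)


# Invented sample; real output carries more labels and more metrics.
SAMPLE = """\
# TYPE contour_build_info gauge
contour_build_info{branch="main",revision="abc123",version="v0.0.0"} 1
contour_dagrebuild_timestamp 1.7e+09
contour_httpproxy 3
contour_httpproxy_valid 3
"""

print(sorted(missing_required(SAMPLE)))
```

A check like this only confirms that a metric name is present; it says nothing about whether the values are healthy.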
Envoy Metrics
General
| Name | Type | Description |
| --- | --- | --- |
| cluster_added/modified/removed/updated | Counter | Total clusters added / modified / removed / updated |
| active_clusters | Gauge | Number of currently active (warmed) clusters |
| upstream_cx_total | Counter | Total connections |
| upstream_cx_active | Gauge | Total active connections |
| upstream_cx_connect_fail | Counter | Total connection failures |
| upstream_cx_connect_timeout | Counter | Total connection connect timeouts |
| upstream_cx_connect_ms | Histogram | Connection establishment milliseconds |
| upstream_cx_rx_bytes_total | Counter | Total received connection bytes |
| upstream_cx_tx_bytes_total | Counter | Total sent connection bytes |
| upstream_rq_total | Counter | Total requests |
| upstream_rq_active | Gauge | Total active requests |
| upstream_rq_timeout | Counter | Total requests that timed out waiting for a response |
| upstream_rq_retry | Counter | Total request retries |
| upstream_rq_retry_success | Counter | Total request retry successes |
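The upstream counters in the table above are easier to read as ratios. The sketch below assumes the stats were dumped from Envoy's admin `/stats` endpoint in its plain-text `name: value` form, with cluster stats rooted at `cluster.<name>.`; the sample dump, the cluster name `backend`, and the helper names are invented for illustration.

```python
# Counters from the table above that the ratios below depend on.
WANTED = (
    "upstream_cx_total",
    "upstream_cx_connect_fail",
    "upstream_rq_total",
    "upstream_rq_timeout",
    "upstream_rq_retry",
    "upstream_rq_retry_success",
)


def parse_cluster_counters(stats_text):
    """Return {cluster_name: {stat: int}} for the WANTED cluster counters."""
    clusters = {}
    for line in stats_text.splitlines():
        name, sep, value = line.partition(": ")
        value = value.strip()
        if not sep or not name.startswith("cluster.") or not value.isdigit():
            continue  # not a cluster stat, or a histogram summary line
        for stat in WANTED:
            suffix = "." + stat
            if name.endswith(suffix):
                cluster = name[len("cluster."):-len(suffix)]
                clusters.setdefault(cluster, {})[stat] = int(value)
                break
    return clusters


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when there is no traffic."""
    return numerator / denominator if denominator else None


def upstream_ratios(counters):
    """Derive per-cluster ratios from cumulative counters since startup."""
    get = lambda key: counters.get(key, 0)
    return {
        "connect_fail_ratio": ratio(get("upstream_cx_connect_fail"), get("upstream_cx_total")),
        "timeout_ratio": ratio(get("upstream_rq_timeout"), get("upstream_rq_total")),
        "retry_success_ratio": ratio(get("upstream_rq_retry_success"), get("upstream_rq_retry")),
    }


# Invented sample dump.
SAMPLE = """\
cluster.backend.upstream_cx_total: 200
cluster.backend.upstream_cx_connect_fail: 10
cluster.backend.upstream_cx_connect_ms: P0(nan,0) P25(nan,0)
cluster.backend.upstream_rq_total: 1000
cluster.backend.upstream_rq_timeout: 25
cluster.backend.upstream_rq_retry: 40
cluster.backend.upstream_rq_retry_success: 30
"""

for cluster, counters in parse_cluster_counters(SAMPLE).items():
    print(cluster, upstream_ratios(counters))
```

These ratios cover the whole lifetime of the Envoy process, so they react slowly to recent problems; comparing two dumps taken some time apart gives a more current picture.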
Request Response Size statistics
| Name | Type | Description |
| --- | --- | --- |
| upstream_rq_headers_size | Histogram | Request headers size in bytes per upstream |
| upstream_rq_body_size | Histogram | Request body size in bytes per upstream |
| upstream_rs_headers_size | Histogram | Response headers size in bytes per upstream |
| upstream_rs_body_size | Histogram | Response body size in bytes per upstream |
HTTP connection manager (HCM) statistics
| Name | Type | Description |
| --- | --- | --- |
| downstream_cx_total | Counter | Total connections |
| downstream_cx_active | Gauge | Total active connections |
| downstream_cx_rx_bytes_total | Counter | Total bytes received |
| downstream_cx_tx_bytes_total | Counter | Total bytes sent |
| downstream_rq_total | Counter | Total requests |
| downstream_rq_active | Gauge | Total active requests |
| downstream_rq_1/2/3/4/5xx | Counter | Total 1/2/3/4/5xx responses |
| downstream_rq_completed | Counter | Total requests that resulted in a response (e.g. does not include aborted requests) |
| downstream_rq_time | Histogram | Total time for request and response (milliseconds) |
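Because the downstream counters above only ever grow, a rate over a time window comes from the difference between two samples. The sketch below computes a 5xx error ratio between two snapshots. The snapshot dictionaries, their values, and the helper names are invented for illustration, and `downstream_rq_5xx` stands for the 5xx member of the `downstream_rq_1/2/3/4/5xx` row.

```python
def counter_delta(previous, current):
    """Return the increase of a cumulative counter between two samples.

    A smaller current value is taken to mean the process restarted and the
    counter began again from zero, so the current value is the increase.
    """
    return current - previous if current >= previous else current


def error_ratio(previous, current):
    """Return the share of 5xx responses among responses in the window."""
    completed = counter_delta(
        previous["downstream_rq_completed"], current["downstream_rq_completed"]
    )
    errors = counter_delta(previous["downstream_rq_5xx"], current["downstream_rq_5xx"])
    return errors / completed if completed else None


# Invented snapshots taken some interval apart.
earlier = {"downstream_rq_completed": 1000, "downstream_rq_5xx": 10}
later = {"downstream_rq_completed": 1500, "downstream_rq_5xx": 35}

print(error_ratio(earlier, later))
```

Once the metrics are scraped by Prometheus, the same windowed calculation is normally left to the query layer rather than done by hand.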
Per user agent statistics
| Name | Type | Description |
| --- | --- | --- |
| downstream_cx_total | Counter | Total connections |
| downstream_rq_total | Counter | Total requests |
Per listener statistics
| Name | Type | Description |
| --- | --- | --- |
| downstream_rq_completed | Counter | Total responses |
| downstream_rq_1/2/3/4/5xx | Counter | Total 1/2/3/4/5xx responses |
Tracing statistics
| Name | Type | Description |
| --- | --- | --- |
| random_sampling | Counter | Total number of traceable decisions by random sampling |
| service_forced | Counter | Total number of traceable decisions by server runtime flag tracing.global_enabled |
| client_enabled | Counter | Total number of traceable decisions by request header x-envoy-force-trace |
| not_traceable | Counter | Total number of non-traceable decisions by request id |
| health_check | Counter | Total number of non-traceable decisions by health check |
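References
Open questions about the monitoring metrics:
- Real-time data: available continuously once the service has started
- Data over a time range
- Dimensions for aggregating the data
- contour httpproxy count
- envoy request count
- httpproxy -> route -> request info summary
- Prometheus (prom) is not connected at this stage, so we need to decide how to handle that. Once it is connected, the data will flow through to grafana. How should this part be handled together with insight?
- How should alerting be done? What do we do with the metrics once we have them?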
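One possible shape for the alerting question above, as a minimal sketch rather than a decided design: evaluate a small list of threshold rules against the latest metric values. The rule list, the thresholds, and the derived value `upstream_connect_fail_ratio` are hypothetical; only the two Contour metric names come from the tables in this document.

```python
# Hypothetical rules: (alert name, metric name, threshold).
# A rule fires when the metric value is strictly greater than the threshold.
RULES = [
    ("invalid HTTPProxy present", "contour_httpproxy_invalid", 0),
    ("orphaned HTTPProxy present", "contour_httpproxy_orphaned", 0),
    ("upstream connect failures too frequent", "upstream_connect_fail_ratio", 0.05),
]


def evaluate(values, rules=RULES):
    """Return one message per firing rule or per metric that has no data."""
    firing = []
    for alert, metric, threshold in rules:
        value = values.get(metric)
        if value is None:
            # Missing data is reported separately instead of being ignored.
            firing.append(f"{metric}: no data")
        elif value > threshold:
            firing.append(f"{alert}: {metric}={value} > {threshold}")
    return firing


# Invented values for illustration.
latest = {"contour_httpproxy_invalid": 2, "contour_httpproxy_orphaned": 0}
for message in evaluate(latest):
    print(message)
```

Treating missing data as its own condition keeps a broken collection path from looking like a healthy system.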
