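Contour & Envoy Metrics
This document covers the metrics of Contour and Envoy. It is compiled from the official Contour and Envoy documentation, with reference to the source code and third-party blogs. Because Contour exposes relatively few metrics, all of them are listed here, and I have subjectively and provisionally marked each one as required or optional. In Envoy, metrics are called stats and are exposed through the admin interface. Because there are so many of them, this document follows the official recommendation and covers only the Cluster and HCM (HTTP connection manager) stats, listing only those I consider important and needed at this stage. For details, follow the corresponding links.
Contour Metrics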
Required | Name | Type | Description |
---|---|---|---|
YES | contour_build_info | Gauge | Build information (labels: branch, revision, version) |
NO | contour_cachehandler_onupdate_duration_seconds | Summary | Histogram for the runtime of xDS cache regeneration. |
YES | contour_dagrebuild_timestamp | Gauge | Timestamp of the last DAG rebuild. |
NO | contour_dagrebuild_total | Counter | Total number of times DAG has been rebuilt since startup |
YES | contour_eventhandler_operation_total | Counter | Total number of Kubernetes object changes Contour has received by operation and object kind. |
YES | contour_httpproxy | Gauge | Total number of HTTPProxies that exist regardless of status. |
NO | contour_httpproxy_invalid | Gauge | Total number of invalid HTTPProxies. |
NO | contour_httpproxy_orphaned | Gauge | Total number of orphaned HTTPProxies which have no root delegating to them. |
YES | contour_httpproxy_root | Gauge | Total number of root HTTPProxies. Note there will only be a single |
YES | contour_httpproxy_valid | Gauge | Total number of valid HTTPProxies. |
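As a rough illustration of how these gauges might be consumed, the sketch below parses text in the Prometheus exposition format and sums each metric across its label sets. The sample payload, the `namespace` and `vhost` labels, and the helper name `parse_metrics` are assumptions made for illustration; in a real deployment the text would come from Contour's metrics endpoint. The regular expression is deliberately simplified and does not handle `}` inside label values.

```python
import re
from collections import defaultdict

# Hypothetical sample in Prometheus text exposition format (values are made up).
SAMPLE = """\
# HELP contour_httpproxy Total number of HTTPProxies that exist regardless of status.
# TYPE contour_httpproxy gauge
contour_httpproxy{namespace="default"} 3
contour_httpproxy{namespace="demo"} 2
contour_httpproxy_invalid{namespace="demo",vhost="a.example.com"} 1
contour_httpproxy_valid{namespace="default",vhost="b.example.com"} 3
contour_httpproxy_valid{namespace="demo",vhost="c.example.com"} 1
contour_dagrebuild_timestamp 1.7e+09
"""

# metric name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text):
    """Sum sample values per metric name, ignoring labels and comment lines."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match:
            totals[match.group(1)] += float(match.group(3))
    return dict(totals)


totals = parse_metrics(SAMPLE)
invalid = totals.get("contour_httpproxy_invalid", 0.0)
print(f"total={totals['contour_httpproxy']:.0f} invalid={invalid:.0f}")
```

Summing across label sets gives a cluster-wide view; keeping the labels instead would allow a per-namespace breakdown.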
Envoy Metrics
Cluster
General
Name | Type | Description |
---|---|---|
cluster_added/modified/removed/updated | Counter | Total clusters added / modified / removed / updated |
active_clusters | Gauge | Number of currently active (warmed) clusters |
upstream_cx_total | Counter | Total connections |
upstream_cx_active | Gauge | Total active connections |
upstream_cx_connect_fail | Counter | Total connection failures |
upstream_cx_connect_timeout | Counter | Total connection connect timeouts |
upstream_cx_connect_ms | Histogram | Connection establishment milliseconds |
upstream_cx_rx_bytes_total | Counter | Total received connection bytes |
upstream_cx_tx_bytes_total | Counter | Total sent connection bytes |
upstream_rq_total | Counter | Total requests |
upstream_rq_active | Gauge | Total active requests |
upstream_rq_timeout | Counter | Total requests that timed out waiting for a response |
upstream_rq_retry | Counter | Total request retries |
upstream_rq_retry_success | Counter | Total request retry successes |
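The counters above become more useful when combined into ratios. The following is a minimal sketch that groups lines of the form `cluster.<name>.<stat>: <value>`, as printed by the Envoy admin `/stats` endpoint, and derives a few ratios per cluster. The sample values, the cluster name `default_web_80`, and the helper names are hypothetical, and the parser assumes the cluster name itself contains no dots.

```python
from collections import defaultdict

# Hypothetical excerpt of `/stats` output; the cluster name and values are made up.
SAMPLE_STATS = """\
cluster.default_web_80.upstream_cx_total: 120
cluster.default_web_80.upstream_cx_connect_fail: 6
cluster.default_web_80.upstream_rq_total: 1000
cluster.default_web_80.upstream_rq_timeout: 10
cluster.default_web_80.upstream_rq_retry: 40
cluster.default_web_80.upstream_rq_retry_success: 30
cluster_manager.active_clusters: 1
"""


def cluster_stats(text):
    """Group integer-valued `cluster.<name>.<stat>` lines by cluster name."""
    clusters = defaultdict(dict)
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        parts = key.split(".", 2)
        if not sep or len(parts) != 3 or parts[0] != "cluster":
            continue
        try:
            clusters[parts[1]][parts[2]] = int(value)
        except ValueError:
            continue  # skip values that are not plain integers
    return dict(clusters)


def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


web = cluster_stats(SAMPLE_STATS)["default_web_80"]
connect_fail_ratio = ratio(web["upstream_cx_connect_fail"], web["upstream_cx_total"])
timeout_ratio = ratio(web["upstream_rq_timeout"], web["upstream_rq_total"])
retry_success_ratio = ratio(web["upstream_rq_retry_success"], web["upstream_rq_retry"])
print(connect_fail_ratio, timeout_ratio, retry_success_ratio)
```

Because these counters are cumulative since startup, ratios computed this way describe the whole process lifetime rather than a recent window.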
Request Response Size statistics
Name | Type | Description |
---|---|---|
upstream_rq_headers_size | Histogram | Request headers size in bytes per upstream |
upstream_rq_body_size | Histogram | Request body size in bytes per upstream |
upstream_rs_headers_size | Histogram | Response headers size in bytes per upstream |
upstream_rs_body_size | Histogram | Response body size in bytes per upstream |
HCM
General
Name | Type | Description |
---|---|---|
downstream_cx_total | Counter | Total connections |
downstream_cx_active | Gauge | Total active connections |
downstream_cx_rx_bytes_total | Counter | Total bytes received |
downstream_cx_tx_bytes_total | Counter | Total bytes sent |
downstream_rq_total | Counter | Total requests |
downstream_rq_active | Gauge | Total active requests |
downstream_rq_1/2/3/4/5xx | Counter | Total 1/2/3/4/5xx responses |
downstream_rq_completed | Counter | Total requests that resulted in a response (e.g. does not include aborted requests) |
downstream_rq_time | Histogram | Total time for request and response (milliseconds) |
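Since the request counters are cumulative, an error ratio over a time window can be obtained from the difference between two snapshots. The sketch below shows one way to do that; the `http.ingress_http.` stat prefix, the snapshot values, and the helper name `counter_delta` are assumptions for illustration only.

```python
# Assumed HCM stat prefix; the actual prefix depends on the listener configuration.
PREFIX = "http.ingress_http."

# Two hypothetical snapshots of the same counters, taken some interval apart.
before = {
    PREFIX + "downstream_rq_completed": 1000,
    PREFIX + "downstream_rq_5xx": 20,
}
after = {
    PREFIX + "downstream_rq_completed": 1500,
    PREFIX + "downstream_rq_5xx": 45,
}


def counter_delta(old_snapshot, new_snapshot, name):
    """Increase of a counter between snapshots, tolerating a counter reset."""
    old = old_snapshot.get(name, 0)
    new = new_snapshot.get(name, 0)
    # A smaller new value suggests the process restarted; count from zero.
    return new if new < old else new - old


completed = counter_delta(before, after, PREFIX + "downstream_rq_completed")
errors = counter_delta(before, after, PREFIX + "downstream_rq_5xx")
error_ratio = errors / completed if completed else 0.0
print(f"5xx ratio over the interval: {error_ratio:.2%}")
```

With the sample values this reports 25 errors out of 500 completed requests in the interval.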
Per user agent statistics
Name | Type | Description |
---|---|---|
downstream_cx_total | Counter | Total connections |
downstream_rq_total | Counter | Total requests |
HTTP per listener statistics
Name | Type | Description |
---|---|---|
downstream_rq_completed | Counter | Total responses |
downstream_rq_1/2/3/4/5xx | Counter | Total 1/2/3/4/5xx responses |
Tracing statistics
Name | Type | Description |
---|---|---|
random_sampling | Counter | Total number of traceable decisions by random sampling |
service_forced | Counter | Total number of traceable decisions by server runtime flag tracing.global_enabled |
client_enabled | Counter | Total number of traceable decisions by request header x-envoy-force-trace |
not_traceable | Counter | Total number of non-traceable decisions by request id |
health_check | Counter | Total number of non-traceable decisions by health check |
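Reference
Open questions about the monitoring metrics:
- Real-time data: metrics that are available continuously after startup
- Data over a time range
- Dimensions for aggregating the data
- contour httpproxy count
- envoy request count
- httpproxy -> route -> request info summary
- Prometheus is not integrated at this stage, so we need to decide how to handle that. Once it is integrated, the data will feed into Grafana; how should this part work with Insight?
- How should alerting be done? What do we do with the metrics once we have them?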
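As one possible starting point for the alerting question, and not a recommendation, the sketch below evaluates simple threshold rules against already-collected metric values. The rule names, the thresholds, and the derived value `envoy_5xx_ratio` are all hypothetical; a Prometheus-based setup would more likely express such rules in its own alerting configuration.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str         # hypothetical alert name
    metric: str       # key in the collected values
    threshold: float  # the rule fires when the value exceeds this


def evaluate(rules, values):
    """Return the names of rules whose metric value exceeds the threshold."""
    return [r.name for r in rules if values.get(r.metric, 0.0) > r.threshold]


RULES = [
    Rule("InvalidHTTPProxies", "contour_httpproxy_invalid", 0),
    Rule("High5xxRatio", "envoy_5xx_ratio", 0.01),
]

# Hypothetical collected values; `envoy_5xx_ratio` is a derived number, not a raw stat.
values = {"contour_httpproxy_invalid": 2, "envoy_5xx_ratio": 0.004}
firing = evaluate(RULES, values)
print(firing)
```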