Contour & Envoy Metrics

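This document covers the metrics of Contour and Envoy. It is compiled from the official Contour and Envoy documentation, with reference to the source code and third-party blogs. Because Contour exposes relatively few metrics, all of them are listed here, and I have provisionally and subjectively divided them into required and optional. In Envoy, metrics are called stats and are exposed through the admin interface. Because there are so many of them, this document follows the official recommendation and selects only the Cluster and HCM (HTTP connection manager) stats, listing only those I consider important and needed at this stage. For full details, follow the corresponding links.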

Contour Metrics

| Required | Name | Type | Description |
|---|---|---|---|
| YES | contour_build_info | Gauge | Build information (labels: branch, revision, version) |
| NO | contour_cachehandler_onupdate_duration_seconds | Summary | Histogram for the runtime of xDS cache regeneration. |
| YES | contour_dagrebuild_timestamp | Gauge | Timestamp of the last DAG rebuild. |
| NO | contour_dagrebuild_total | Counter | Total number of times DAG has been rebuilt since startup |
| YES | contour_eventhandler_operation_total | Counter | Total number of Kubernetes object changes Contour has received by operation and object kind. |
| YES | contour_httpproxy | Gauge | Total number of HTTPProxies that exist regardless of status. |
| NO | contour_httpproxy_invalid | Gauge | Total number of invalid HTTPProxies. |
| NO | contour_httpproxy_orphaned | Gauge | Total number of orphaned HTTPProxies which have no root delegating to them. |
| YES | contour_httpproxy_root | Gauge | Total number of root HTTPProxies. Note there will only be a single |
| YES | contour_httpproxy_valid | Gauge | Total number of valid HTTPProxies. |
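A quick way to confirm that the metrics marked required above are actually being exported is to scan the Prometheus text-format output for their names. The following is a minimal sketch under stated assumptions: the helper names (`metric_names`, `missing_required`) are hypothetical, and the sample text, including its label names and values, is made up for illustration rather than captured from a real Contour instance.

```python
# Metric names marked YES in the table above.
REQUIRED_CONTOUR_METRICS = {
    "contour_build_info",
    "contour_dagrebuild_timestamp",
    "contour_eventhandler_operation_total",
    "contour_httpproxy",
    "contour_httpproxy_root",
    "contour_httpproxy_valid",
}


def metric_names(exposition_text: str) -> set:
    """Collect the metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line is either `name{labels} value` or `name value`.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_required(exposition_text: str) -> set:
    """Return the required metric names that do not appear in the output."""
    return REQUIRED_CONTOUR_METRICS - metric_names(exposition_text)


# Illustrative sample only; labels and values are assumptions.
SAMPLE = """\
# HELP contour_httpproxy Total number of HTTPProxies that exist regardless of status.
# TYPE contour_httpproxy gauge
contour_httpproxy{namespace="default"} 3
contour_httpproxy_valid{namespace="default",vhost="example.com"} 2
contour_dagrebuild_timestamp 1.668873e+09
"""

print(sorted(missing_required(SAMPLE)))
# ['contour_build_info', 'contour_eventhandler_operation_total', 'contour_httpproxy_root']
```

The check only looks at metric names, so it says nothing about whether the values are sensible. It is meant as a smoke test before building dashboards on top of these series.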

Envoy Metrics

Cluster

General

| Name | Type | Description |
|---|---|---|
| cluster_added/modified/removed/updated | Counter | Total clusters added / modified / removed / updated |
| active_clusters | Gauge | Number of currently active (warmed) clusters |
| upstream_cx_total | Counter | Total connections |
| upstream_cx_active | Gauge | Total active connections |
| upstream_cx_connect_fail | Counter | Total connection failures |
| upstream_cx_connect_timeout | Counter | Total connection connect timeouts |
| upstream_cx_connect_ms | Histogram | Connection establishment milliseconds |
| upstream_cx_rx_bytes_total | Counter | Total received connection bytes |
| upstream_cx_tx_bytes_total | Counter | Total sent connection bytes |
| upstream_rq_total | Counter | Total requests |
| upstream_rq_active | Gauge | Total active requests |
| upstream_rq_timeout | Counter | Total requests that timed out waiting for a response |
| upstream_rq_retry | Counter | Total request retries |
| upstream_rq_retry_success | Counter | Total request retry successes |
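The counters above are cumulative totals, so they are usually easier to read as ratios. The sketch below derives three such ratios for one cluster. The function names and the `sample` values are hypothetical, and treating each pair of counters as numerator and denominator is an assumption made for illustration, not a formula taken from the Envoy documentation.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when nothing has been counted yet."""
    return numerator / denominator if denominator else 0.0


def cluster_health(stats: dict) -> dict:
    """Derive simple ratios from the cumulative counters of one cluster."""
    return {
        "connect_fail_ratio": ratio(
            stats["upstream_cx_connect_fail"], stats["upstream_cx_total"]
        ),
        "timeout_ratio": ratio(
            stats["upstream_rq_timeout"], stats["upstream_rq_total"]
        ),
        "retry_success_ratio": ratio(
            stats["upstream_rq_retry_success"], stats["upstream_rq_retry"]
        ),
    }


# Illustrative values, not taken from a real cluster.
sample = {
    "upstream_cx_total": 200,
    "upstream_cx_connect_fail": 4,
    "upstream_rq_total": 1000,
    "upstream_rq_timeout": 10,
    "upstream_rq_retry": 20,
    "upstream_rq_retry_success": 15,
}

print(cluster_health(sample))
```

With the sample values this gives a connection failure ratio of 4/200, a timeout ratio of 10/1000, and a retry success ratio of 15/20. Returning 0.0 for an empty denominator keeps a freshly started cluster from raising a division error.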

Request Response Size statistics

| Name | Type | Description |
|---|---|---|
| upstream_rq_headers_size | Histogram | Request headers size in bytes per upstream |
| upstream_rq_body_size | Histogram | Request body size in bytes per upstream |
| upstream_rs_headers_size | Histogram | Response headers size in bytes per upstream |
| upstream_rs_body_size | Histogram | Response body size in bytes per upstream |

HCM

General

| Name | Type | Description |
|---|---|---|
| downstream_cx_total | Counter | Total connections |
| downstream_cx_active | Gauge | Total active connections |
| downstream_cx_rx_bytes_total | Counter | Total bytes received |
| downstream_cx_tx_bytes_total | Counter | Total bytes sent |
| downstream_rq_total | Counter | Total requests |
| downstream_rq_active | Gauge | Total active requests |
| downstream_rq_1/2/3/4/5xx | Counter | Total 1/2/3/4/5xx responses |
| downstream_rq_completed | Counter | Total requests that resulted in a response (e.g. does not include aborted requests) |
| downstream_rq_time | Histogram | Total time for request and response (milliseconds) |
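The per-class response counters become more useful when expressed as a share of completed requests, for example to watch the 5xx share. The following is a minimal sketch: `status_class_share` is a hypothetical helper, the `sample` values are invented, and dividing by `downstream_rq_completed` is a choice made here for illustration.

```python
STATUS_CLASSES = ("1xx", "2xx", "3xx", "4xx", "5xx")


def status_class_share(stats: dict) -> dict:
    """Share of each response class among completed downstream requests."""
    completed = stats.get("downstream_rq_completed", 0)
    shares = {}
    for cls in STATUS_CLASSES:
        count = stats.get(f"downstream_rq_{cls}", 0)
        # Avoid dividing by zero before any request has completed.
        shares[cls] = count / completed if completed else 0.0
    return shares


# Illustrative values, not taken from a real listener.
sample = {
    "downstream_rq_completed": 1000,
    "downstream_rq_2xx": 900,
    "downstream_rq_3xx": 50,
    "downstream_rq_4xx": 40,
    "downstream_rq_5xx": 10,
}

shares = status_class_share(sample)
print(f"5xx share: {shares['5xx']:.1%}")  # 5xx share: 1.0%
```

Missing counters are treated as zero, so the sample above reports a 1xx share of 0.0 rather than failing.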

Per user agent statistics

| Name | Type | Description |
|---|---|---|
| downstream_cx_total | Counter | Total connections |
| downstream_rq_total | Counter | Total requests |

HTTP per listener statistics

| Name | Type | Description |
|---|---|---|
| downstream_rq_completed | Counter | Total responses |
| downstream_rq_1/2/3/4/5xx | Counter | Total 1/2/3/4/5xx responses |

Tracing statistics

| Name | Type | Description |
|---|---|---|
| random_sampling | Counter | Total number of traceable decisions by random sampling |
| service_forced | Counter | Total number of traceable decisions by server runtime flag tracing.global_enabled |
| client_enabled | Counter | Total number of traceable decisions by request header x-envoy-force-trace |
| not_traceable | Counter | Total number of non-traceable decisions by request id |
| health_check | Counter | Total number of non-traceable decisions by health check |

References


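Open questions about the monitoring metrics:

  • Real-time data: metrics that are available continuously from startup
  • Data over a time range
  • Dimensions for aggregating the data
    • contour httpproxy count
    • envoy request count
    • httpproxy -> route -> request info summary
  • Prometheus is not integrated at this stage, so we need to decide how to handle that. Once it is integrated, it will feed into Grafana. How should this part be handled together with Insight?
  • How should alerting be done? What do we do with the metrics once we have them?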
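On the question of time-range data: the Counter metrics listed in this document accumulate from startup, so a figure for a time window has to be derived from two samples. The sketch below shows one way to do that while Prometheus is not yet integrated. The function names are hypothetical, and treating a decreasing counter as a restart from zero is an assumption.

```python
def counter_increase(previous: float, current: float) -> float:
    """Increase of a cumulative counter between two samples.

    If the counter went backwards, assume the process restarted and the
    counter began again from zero, so the current value is the increase.
    """
    return current - previous if current >= previous else current


def per_second_rate(previous: float, current: float, interval_seconds: float) -> float:
    """Average per-second rate of a counter over the sampling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_increase(previous, current) / interval_seconds


# Illustrative samples of a request counter taken 60 seconds apart.
print(per_second_rate(12_000, 12_600, 60))  # 10.0
print(per_second_rate(12_600, 300, 60))     # 5.0 (counter restarted from zero)
```

Gauge metrics such as contour_httpproxy do not need this treatment, because each sample is already the current value.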
