A First Look at zabbix_agent2 Plugins
Overview
-
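zabbix_agent2 is a client that can fully replace zabbix_agent and is considerably more capable than its predecessor. It is written in Go and manages its monitoring capabilities through plugins. It works as a one-stop agent, and the official 5.2 release already ships with strong monitoring capabilities.
zabbix_agent2 metrics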
-
While the agent is running, we can execute zabbix_agent2 -R metrics to list the metrics the agent currently supports, together with their runtime status:
[Agent]
active: true
capacity: 0/100
tasks: 0
agent.hostname: Returns Hostname from agent configuration.
agent.ping: Returns agent availability check result.
agent.version: Version of Zabbix agent.
[Ceph]
active: false
capacity: 0/100
tasks: 0
ceph.df.details: Returns information about cluster’s data usage and distribution among pools.
ceph.osd.discovery: Returns a list of discovered OSDs.
ceph.osd.dump: Returns usage thresholds and statuses of OSDs.
ceph.osd.stats: Returns aggregated and per OSD statistics.
ceph.ping: Tests if a connection is alive or not.
ceph.pool.discovery: Returns a list of discovered pools.
ceph.status: Returns an overall clusters status.
[Cpu]
active: true
capacity: 0/100
tasks: 12
system.cpu.discovery: List of detected CPUs/CPU cores, used for low-level discovery.
system.cpu.num: Number of CPUs.
system.cpu.util: CPU utilisation percentage.
[Docker]
active: false
capacity: 0/100
tasks: 0
docker.container_info: Return low-level information about a container.
docker.container_stats: Returns near realtime stats for a given container.
docker.containers: Returns a list of containers.
docker.containers.discovery: Returns a list of containers, used for low-level discovery.
docker.data_usage: Returns information about current data usage.
docker.images: Returns a list of images.
docker.images.discovery: Returns a list of images, used for low-level discovery.
docker.info: Returns information about the docker server.
docker.ping: Pings the server and returns 0 or 1.
[File]
active: true
capacity: 0/100
tasks: 3
vfs.file.cksum: Returns File checksum, calculated by the UNIX cksum algorithm.
vfs.file.contents: Retrieves contents of the file.
vfs.file.exists: Returns if file exists or not.
vfs.file.md5sum: Returns MD5 checksum of file.
vfs.file.regexp: Find string in a file.
vfs.file.regmatch: Find string in a file.
vfs.file.size: Returns file size.
vfs.file.time: Returns file time information.
[Kernel]
active: true
capacity: 0/100
tasks: 2
kernel.maxfiles: Returns maximum number of opened files supported by OS.
kernel.maxproc: Returns maximum number of processes supported by OS.
[Log]
active: false
capacity: 0/100
tasks: 0
log: Log file monitoring.
log.count: Count of matched lines in log file monitoring.
logrt: Log file monitoring with log rotation support.
logrt.count: Count of matched lines in log file monitoring with log rotation support.
[MQTT]
active: false
capacity: 0/100
tasks: 0
mqtt.get: Subscribe to MQTT topics for published messages.
[Memcached]
active: false
capacity: 0/100
tasks: 0
memcached.ping: Test if connection is alive or not.
memcached.stats: Returns output of stats command.
[Memory]
active: true
capacity: 0/100
tasks: 3
vm.memory.size: Returns memory size in bytes or in percentage from total.
[Modbus]
active: false
capacity: 0/100
tasks: 0
modbus.get: Returns a JSON array of the requested values, usage: modbus.get[endpoint,<slave id>,<function>,<address>,<count>,<type>,<endianness>,<offset>].
[Mongo]
active: false
capacity: 0/100
tasks: 0
mongodb.cfg.discovery: Returns a list of discovered config servers.
mongodb.collection.stats: Returns a variety of storage statistics for a given collection.
mongodb.collections.discovery: Returns a list of discovered collections.
mongodb.collections.usage: Returns usage statistics for collections.
mongodb.connpool.stats: Returns information regarding the open outgoing connections from the current database instance to other members of the sharded cluster or replica set.
mongodb.db.discovery: Returns a list of discovered databases.
mongodb.db.stats: Returns statistics reflecting a given database system’s state.
mongodb.jumbo_chunks.count: Returns count of jumbo chunks.
mongodb.oplog.stats: Returns a status of the replica set, using data polled from the oplog.
mongodb.ping: Test if connection is alive or not.
mongodb.rs.config: Returns a current configuration of the replica set.
mongodb.rs.status: Returns a replica set status from the point of view of the member where the method is run.
mongodb.server.status: Returns a database’s state.
mongodb.sh.discovery: Returns a list of discovered shards present in the cluster.
[Mysql]
active: false
capacity: 0/100
tasks: 0
mysql.db.discovery: Returns list of databases in LLD format.
mysql.db.size: Returns size of given database in bytes.
mysql.get_status_variables: Returns values of global status variables.
mysql.ping: Tests if connection is alive or not.
mysql.replication.discovery: Returns replication information in LLD format.
mysql.replication.get_slave_status: Returns replication status.
mysql.version: Returns MySQL version.
[NetIf]
active: true
capacity: 0/100
tasks: 7
net.if.collisions: Returns number of out-of-window collisions.
net.if.discovery: Returns list of network interfaces. Used for low-level discovery.
net.if.in: Returns incoming traffic statistics on network interface.
net.if.out: Returns outgoing traffic statistics on network interface.
net.if.total: Returns sum of incoming and outgoing traffic statistics on network interface.
[Oracle]
active: false
capacity: 0/100
tasks: 0
oracle.archive.discovery: Returns list of archive logs in LLD format.
oracle.archive.info: Returns archive logs statistics.
oracle.cdb.info: Returns CDBs info.
oracle.custom.query: Returns result of a custom query.
oracle.datafiles.stats: Returns data files statistics.
oracle.db.discovery: Returns list of databases in LLD format.
oracle.diskgroups.discovery: Returns list of ASM disk groups in LLD format.
oracle.diskgroups.stats: Returns ASM disk groups statistics.
oracle.fra.stats: Returns FRA statistics.
oracle.instance.info: Returns instance stats.
oracle.pdb.discovery: Returns list of PDBs in LLD format.
oracle.pdb.info: Returns PDBs info.
oracle.pga.stats: Returns PGA statistics.
oracle.ping: Tests if connection is alive or not.
oracle.proc.stats: Returns processes statistics.
oracle.redolog.info: Returns log file information from the control file.
oracle.sessions.stats: Returns sessions statistics.
oracle.sga.stats: Returns SGA statistics.
oracle.sys.metrics: Returns a set of system metric values.
oracle.sys.params: Returns a set of system parameter values.
oracle.ts.discovery: Returns list of tablespaces in LLD format.
oracle.ts.stats: Returns tablespaces statistics.
oracle.user.info: Returns user information.
[Postgres]
active: false
capacity: 0/100
tasks: 0
pgsql.archive: Returns info about size of archive files.
pgsql.autovacuum.count: Returns count of autovacuum workers.
pgsql.bgwriter: Returns JSON for sum of each type of bgwriter statistic.
pgsql.cache.hit: Returns cache hit percent.
pgsql.connections: Returns JSON for sum of each type of connection.
pgsql.custom.query: Returns result of a custom query.
pgsql.db.age: Returns age for specific database.
pgsql.db.bloating_tables: Returns percent of bloating tables for each database.
pgsql.db.discovery: Returns JSON discovery rule with names of databases.
pgsql.db.size: Returns size in bytes for specific database.
pgsql.dbstat: Returns JSON for sum of each type of statistic.
pgsql.dbstat.sum: Returns JSON for sum of each type of statistic for all database.
pgsql.locks: Returns collect all metrics from pg_locks.
pgsql.oldest.xid: Returns age of oldest xid.
pgsql.ping: Tests if connection is alive or not.
pgsql.replication.count: Returns number of standby servers.
pgsql.replication.lag.b: Returns replication lag with Master in byte.
pgsql.replication.lag.sec: Returns replication lag with Master in seconds.
pgsql.replication.process: Returns flush lag, write lag and replay lag per each sender process.
pgsql.replication.process.discovery: Returns JSON with application name from pg_stat_replication.
pgsql.replication.recovery_role: Returns postgreSQL recovery role.
pgsql.replication.status: Returns postgreSQL replication status.
pgsql.uptime: Returns uptime.
pgsql.wal.stat: Returns JSON wal by type.
[Proc]
active: false
capacity: 0/100
tasks: 0
proc.cpu.util: Process CPU utilization percentage.
[ProcExporter]
active: false
capacity: 0/100
tasks: 0
proc.mem: Process memory utilization values.
[Redis]
active: false
capacity: 0/100
tasks: 0
redis.config: Returns configuration parameters of Redis server.
redis.info: Returns output of INFO command.
redis.ping: Test if connection is alive or not.
redis.slowlog.count: Returns the number of slow log entries since Redis has been started.
[Smart]
active: false
capacity: 0/100
tasks: 0
smart.attribute.discovery: Returns JSON array of smart device attributes.
smart.disk.discovery: Returns JSON array of smart devices.
smart.disk.get: Returns JSON data of smart device.
[Sw]
active: true
capacity: 0/100
tasks: 1
system.sw.packages: Lists installed packages whose name matches the given package regular expression.
[Swap]
active: true
capacity: 0/100
tasks: 3
system.swap.size: Returns Swap space size in bytes or in percentage from total.
[SystemRun]
active: false
capacity: 0/100
tasks: 0
system.run: Run specified command.
[Systemd]
active: false
capacity: 0/100
tasks: 0
systemd.unit.discovery: Returns JSON array of discovered units, usage: systemd.unit.discovery[<type>].
systemd.unit.get: Returns the bulked info, usage: systemd.unit.get[unit,<interface>].
systemd.unit.info: Returns the unit info, usage: systemd.unit.info[unit,<parameter>,<interface>].
[TCP]
active: false
capacity: 0/100
tasks: 0
net.tcp.port: Checks if it is possible to make TCP connection to specified port.
net.tcp.service: Checks if service is running and accepting TCP connections.
net.tcp.service.perf: Checks performance of TCP service.
[UDP]
active: false
capacity: 0/100
tasks: 0
net.udp.service: Checks if service is running and responding to UDP requests.
net.udp.service.perf: Checks performance of UDP service.
[Uname]
active: true
capacity: 0/100
tasks: 3
system.hostname: Returns system host name.
system.sw.arch: Software architecture information.
system.uname: Returns system uname.
[Uptime]
active: true
capacity: 0/100
tasks: 1
system.uptime: Returns system uptime in seconds.
[Users]
active: true
capacity: 0/100
tasks: 1
system.users.num: Returns number of useres logged in.
[VFSDev]
active: true
capacity: 0/100
tasks: 2
vfs.dev.discovery: List of block devices and their type. Used for low-level discovery.
vfs.dev.read: Disk read statistics.
vfs.dev.write: Disk write statistics.
[VfsFs]
active: true
capacity: 0/100
tasks: 13
vfs.fs.discovery: List of mounted filesystems. Used for low-level discovery.
vfs.fs.get: List of mounted filesystems with statistics.
vfs.fs.inode: Disk space in bytes or in percentage from total.
vfs.fs.size: Disk space in bytes or in percentage from total.
[Web]
active: false
capacity: 0/100
tasks: 0
web.page.get: Get content of a web page.
web.page.perf: Loading time of full web page (in seconds).
web.page.regexp: Find string on a web page.
[ZabbixAsync]
active: true
capacity: 0/100
tasks: 7
net.tcp.listen: Checks if this TCP port is in LISTEN state.
net.udp.listen: Checks if this UDP port is in LISTEN state.
sensor: Hardware sensor reading.
system.boottime: Returns system boot time.
system.cpu.intr: Device interrupts.
system.cpu.load: CPU load.
system.cpu.switches: Count of context switches.
system.hw.cpu: CPU information.
system.hw.macaddr: Listing of MAC addresses.
system.localtime: Returns system local time.
system.sw.os: Operating system information.
system.swap.in: Swap in (from device into memory) statistics.
system.swap.out: Swap out (from memory onto device) statistics.
[ZabbixStats]
active: false
capacity: 0/100
tasks: 0
zabbix.stats: Return a set of Zabbix server or proxy internal metrics or return number of monitored items in the queue which are delayed on Zabbix server or proxy.
[ZabbixSync]
active: true
capacity: 0/1
tasks: 2
net.dns: Checks if DNS service is up.
net.dns.record: Performs DNS query.
proc.num: The number of processes.
system.hw.chassis: Chassis information.
system.hw.devices: Listing of PCI or USB devices.
vfs.dir.count: Directory entry count.
vfs.dir.size: Directory size (in bytes).
-
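As the listing shows, zabbix_agent2 already supports many kinds of software, organized into groups and managed as plugins. A plugin that is not in use stays inactive and consumes no resources; it is activated only once an item from a linked template refers to it. More software monitoring solutions will probably be integrated into zabbix_agent2 as plugins in the future.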
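To see at a glance which plugin groups are active, the text printed by zabbix_agent2 -R metrics can be scanned for group headers and their active flags. The following is only a rough sketch: the names pluginStatus and parsePluginStatus are made up for this example, and the regular expression assumes that every group header is directly followed by its active field, as in the listing above.

```go
package main

import (
	"fmt"
	"regexp"
)

// pluginStatus holds the name of one plugin group and whether it is active.
// (Hypothetical type, introduced only for this example.)
type pluginStatus struct {
	Name   string
	Active bool
}

// groupRe matches a group header followed by its "active" flag, for example
// "[Cpu] active: true". \s+ also matches newlines, so the pattern works
// whether the fields share one line or sit on separate lines.
var groupRe = regexp.MustCompile(`\[(\w+)\]\s+active:\s+(true|false)`)

// parsePluginStatus extracts every plugin group and its active flag from the
// text printed by "zabbix_agent2 -R metrics".
func parsePluginStatus(output string) []pluginStatus {
	var result []pluginStatus
	for _, m := range groupRe.FindAllStringSubmatch(output, -1) {
		result = append(result, pluginStatus{Name: m[1], Active: m[2] == "true"})
	}
	return result
}

func main() {
	// Short excerpt copied from the listing above.
	sample := "[Agent] active: true capacity: 0/100 tasks: 0 " +
		"agent.ping: Returns agent availability check result. " +
		"[Ceph] active: false capacity: 0/100 tasks: 0 " +
		"ceph.ping: Tests if a connection is alive or not."
	for _, p := range parsePluginStatus(sample) {
		fmt.Printf("%-6s active=%v\n", p.Name, p.Active)
	}
}
```

In practice the sample string would be replaced by the captured output of the command, for instance read from standard input.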
Building a simple plugin
-
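The official documentation explains how to build a plugin of your own (see the related documentation link). Note that one example in the official documentation contains an obvious error: its return values do not match the function signature. A plugin must implement at least one of the plugin interfaces (Exporter, Collector, Runner, Watcher); we choose the simplest one, Exporter. Following the official documentation, we define a Plugin struct that embeds plugin.Base and implement the Export method, as follows: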
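package mydemo // assumed name, taken from the import path used when the plugin is added

import (
	"errors"
	"time"

	"zabbix.com/pkg/plugin"
)

// Plugin embeds plugin.Base, which provides helpers such as Debugf.
type Plugin struct {
	plugin.Base
}

// impl is the plugin instance that gets registered with the agent.
var impl Plugin

// Export handles one item request: key is the item key, params are its parameters.
func (p *Plugin) Export(key string, params []string, ctx plugin.ContextProvider) (result interface{}, err error) {
	switch key {
	case "system.mytime":
		if len(params) > 0 {
			p.Debugf("received %d parameters while expected none", len(params))
			return nil, errors.New("Too many parameters")
		}
		return time.Now().Format(time.RFC3339), nil
	case "system.echo":
		// No length check here: params[0] panics when no parameter is given.
		return params[0], nil
	default:
		return nil, plugin.UnsupportedMetricError
	}
}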
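To explain: key is the item key, and params holds the parameters passed to that item. In the code above, system.mytime returns an error if any parameter is supplied. system.echo echoes back its first parameter; the code does not validate the parameters, so calling it without any parameter triggers a runtime error (index out of range). In a real project, validate the parameters properly and document them.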
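The missing length check could be added along the following lines. This is a standalone sketch, not the plugin itself: exportMetric is a hypothetical plain function that mirrors the switch in Export, and errUnsupportedMetric is a local stand-in for plugin.UnsupportedMetricError so that the example compiles without the Zabbix source tree.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnsupportedMetric is a local stand-in for plugin.UnsupportedMetricError,
// used only so that this sketch builds with the standard library alone.
var errUnsupportedMetric = errors.New("unsupported metric")

// exportMetric mirrors the logic of the Export method shown above, but
// validates the number of parameters for both item keys.
func exportMetric(key string, params []string) (interface{}, error) {
	switch key {
	case "system.mytime":
		if len(params) > 0 {
			return nil, errors.New("too many parameters")
		}
		return time.Now().Format(time.RFC3339), nil
	case "system.echo":
		// Guard against an empty parameter list before indexing.
		if len(params) == 0 {
			return nil, errors.New("missing first parameter")
		}
		return params[0], nil
	default:
		return nil, errUnsupportedMetric
	}
}

func main() {
	value, err := exportMetric("system.echo", []string{"xx", "dd", "dd"})
	fmt.Println(value, err)

	// Without parameters the call now returns an error instead of panicking.
	value, err = exportMetric("system.echo", nil)
	fmt.Println(value, err)
}
```

Returning an error rather than panicking lets the agent report the item as unsupported with a readable message.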
-
Register the metrics. The package's init function registers them automatically when the agent starts, as follows:
func init() {
	// Arguments after the plugin name are key/description pairs.
	plugin.RegisterMetrics(&impl, "myTime",
		"system.mytime", "Returns time string in RFC 3339 format.",
		"system.echo", "Just for test.")
}
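impl is the plugin variable defined in the package, and myTime is the group name under which the metrics are listed. When registering metrics, item keys and descriptions must be passed in pairs, otherwise a runtime error occurs. Each description must also start with a capital letter and end with a period; otherwise the agent panics at startup and cannot run, with errors such as the following: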
panic: cannot register metric "system.echo" without dot at the end of description: "Just for test!"
panic: cannot register metric "system.echo" with description without capital first letter: "just for test!"
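Because these mistakes only surface as a panic when the agent starts, it can help to check the key/description pairs beforehand, for example in a unit test. The sketch below restates the rules described above as a plain function; checkMetrics is a hypothetical helper written for this article and is not part of the Zabbix plugin package.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// checkMetrics applies the registration rules described above to a list of
// key/description pairs: the strings must come in pairs, and each description
// must start with a capital letter and end with a period.
func checkMetrics(pairs ...string) error {
	if len(pairs)%2 != 0 {
		return fmt.Errorf("expected key/description pairs, got %d strings", len(pairs))
	}
	for i := 0; i < len(pairs); i += 2 {
		key, desc := pairs[i], pairs[i+1]
		if desc == "" {
			return fmt.Errorf("metric %q has an empty description", key)
		}
		if first := []rune(desc)[0]; !unicode.IsUpper(first) {
			return fmt.Errorf("metric %q: description must start with a capital letter: %q", key, desc)
		}
		if !strings.HasSuffix(desc, ".") {
			return fmt.Errorf("metric %q: description must end with a period: %q", key, desc)
		}
	}
	return nil
}

func main() {
	// A valid pair followed by the two failing variants from the panics above.
	fmt.Println(checkMetrics("system.echo", "Just for test."))
	fmt.Println(checkMetrics("system.echo", "Just for test!"))
	fmt.Println(checkMetrics("system.echo", "just for test."))
}
```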
-
Add the plugin. Open the plugin list file plugins_linux.go and add an import for the new plugin directory; that is all it takes to add the plugin:
_ "zabbix.com/plugins/yeqing/mydemo"
Build and install
-
The build process is the same as the official procedure for deploying from source, so it is not repeated here.
Verification
-
We can run zabbix_agent2 -R metrics to check whether our plugin is included:
[myTime]
active: true
capacity: 0/100
tasks: 1
system.echo: Just for test.
system.mytime: Returns time string in RFC 3339 format.
Our own plugin now appears in the list.
-
Verify with an item key
./zabbix_agent2 -t system.echo[xx,dd,dd]
system.echo[xx,dd,dd] [s|xx]
As expected, the item returns its first parameter.
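The test above relies on the item key system.echo[xx,dd,dd] being split into the key system.echo and the parameters xx, dd, dd before Export is called. The sketch below illustrates that split in a deliberately simplified form: splitItemKey is a hypothetical function, it is not the agent's own parser, and it ignores quoted parameters and nested brackets.

```go
package main

import (
	"fmt"
	"strings"
)

// splitItemKey splits an item such as "system.echo[xx,dd,dd]" into the key
// and its parameter list. Simplified on purpose: quoted parameters and nested
// brackets are not handled.
func splitItemKey(item string) (string, []string, error) {
	open := strings.Index(item, "[")
	if open < 0 {
		// No brackets: the whole string is the key and there are no parameters.
		return item, nil, nil
	}
	if !strings.HasSuffix(item, "]") {
		return "", nil, fmt.Errorf("missing closing bracket in %q", item)
	}
	inner := item[open+1 : len(item)-1]
	return item[:open], strings.Split(inner, ","), nil
}

func main() {
	key, params, err := splitItemKey("system.echo[xx,dd,dd]")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// params[0] is the value that system.echo returns.
	fmt.Printf("key=%s params=%q first=%s\n", key, params, params[0])
}
```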
-
Item verification: add an item in Zabbix and view the collected data in the Zabbix frontend.
Summary
-
zabbix_agent2 is indeed powerful. With Go, writing a plugin is straightforward, which makes monitoring more flexible and more capable.