Came across something quite nice: ServerCat.
It is an iOS app that needs zero agents on the server and reads server information directly. Kind of interesting.

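At a glance, this single page covers quite a lot.

Server name
- hostname

OS distribution

This one is complicated; supposedly there are 8 ways to get it. `lsb_release -a` is the only one I remember:

    lsb_release -a
    LSB Version:    :core-4.1-amd64:core-4.1-noarch
    Distributor ID: CentOS
    Description:    CentOS Linux release 7.6.1810 (Core)
    Release:        7.6.1810
    Codename:       Core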
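
Another of those ways needs nothing installed: on most current distributions `/etc/os-release` is a plain `KEY=value` file. A minimal sketch of parsing it, assuming the text has already been fetched somehow (for example by running `cat /etc/os-release` over SSH). The function name and the sample content are illustrative, and this is not necessarily what ServerCat does.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be quoted; this sketch strips quotes but ignores escapes.
        fields[key.strip()] = value.strip().strip("\"'")
    return fields


# Illustrative sample, typed by hand in the shape a CentOS 7 box returns.
SAMPLE_OS_RELEASE = '''NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
'''

os_info = parse_os_release(SAMPLE_OS_RELEASE)
print(os_info["PRETTY_NAME"])
```

Unlike `lsb_release`, this does not depend on an extra package being present, only on the file existing.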
95%: is that CPU usage?

Core count. There are n ways to get this, depending on the distribution.
- lscpu
- cat /proc/cpuinfo
- nproc
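
Of the three, `cat /proc/cpuinfo` is the one that needs no extra binary. A rough sketch of counting logical CPUs from its text, assuming the x86-style layout with one `processor : N` line per logical CPU. The helper name and sample text are made up for illustration.

```python
def count_logical_cpus(cpuinfo_text):
    """Count the 'processor : N' entries in /proc/cpuinfo text."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        # The key is everything before the first colon, e.g. "processor\t".
        if line.split(":")[0].strip() == "processor"
    )


# Hand-written sample with two logical CPUs; real files carry many more keys.
SAMPLE_CPUINFO = (
    "processor\t: 0\n"
    "model name\t: Example CPU\n"
    "\n"
    "processor\t: 1\n"
    "model name\t: Example CPU\n"
)

cores = count_logical_cpus(SAMPLE_CPUINFO)
print(cores)
```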
CPU idle
Uptime
Memory available
Memory used
Page cache
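
The three memory figures can all be derived from `/proc/meminfo` without any tool. The sketch below applies the `free` formula quoted later in this post (used = total - free - buffers - cache) and treats cache as `Buffers + Cached + SReclaimable`, which is my understanding of how procps-ng sums its buff/cache column. The totals in the sample match the `top` output in this post, but the split between buffers, cached and slab is invented.

```python
def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo text (values mostly in KiB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])  # drop the trailing "kB" unit
    return info


def memory_summary(info):
    """Compute used / cache / available the way the `free` man page describes."""
    cache = info["Buffers"] + info["Cached"] + info.get("SReclaimable", 0)
    used = info["MemTotal"] - info["MemFree"] - cache
    return {
        "total_kib": info["MemTotal"],
        "used_kib": used,
        "cache_kib": cache,
        # MemAvailable is missing on old kernels, hence .get().
        "available_kib": info.get("MemAvailable"),
    }


# Sample: totals taken from the post, the per-field split is made up.
SAMPLE_MEMINFO = """MemTotal:        3879880 kB
MemFree:          120400 kB
MemAvailable:     967944 kB
Buffers:           50000 kB
Cached:          1100000 kB
SReclaimable:     100000 kB
"""

summary = memory_summary(parse_meminfo(SAMPLE_MEMINFO))
print(summary)
```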
`top` covers these basics:

    top
    Tasks:  98 total,   1 running,  97 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  0.3 us,  0.2 sy,  0.0 ni, 99.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem :  3879880 total,   120400 free,  2509480 used,  1250000 buff/cache
    KiB Swap:        0 total,        0 free,        0 used.   967944 avail Mem

The main thing is to recognize these odd abbreviations:

    us, user    : time running un-niced user processes
    sy, system  : time running kernel processes
    ni, nice    : time running niced user processes
    id, idle    : time spent in the kernel idle handler
    wa, IO-wait : time waiting for I/O completion
    hi : time spent servicing hardware interrupts
    si : time spent servicing software interrupts
    st : time stolen from this vm by the hypervisor

The precision here is not good enough, though. Where does the single-digit figure in the app come from? Needs looking into. Same for the per-core load chart.
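
One plausible source for both the exact percentage and the per-core chart is `/proc/stat`: it has an aggregate `cpu` line plus one `cpuN` line per core, each holding cumulative tick counters in the same order as the abbreviations above. Sampling it twice and diffing gives usage over the interval. This is a sketch of that idea, not a claim about ServerCat's internals. Counting iowait as idle time is my own choice, and both samples are invented.

```python
def parse_proc_stat(text):
    """Return {cpu_name: (idle_ticks, total_ticks)} from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue
        # Columns: user nice system idle iowait irq softirq steal.
        # Guest columns after these are already included in user/nice.
        ticks = [int(x) for x in fields[1:9]]
        idle = ticks[3] + ticks[4]  # assumption: iowait counts as idle
        counters[fields[0]] = (idle, sum(ticks))
    return counters


def cpu_usage_percent(before, after):
    """Busy percentage per CPU between two parse_proc_stat() samples."""
    usage = {}
    for name, (idle1, total1) in after.items():
        idle0, total0 = before[name]
        elapsed = total1 - total0
        busy = elapsed - (idle1 - idle0)
        usage[name] = 100.0 * busy / elapsed if elapsed > 0 else 0.0
    return usage


# Two made-up samples of a 2-core machine.
SAMPLE_T0 = """cpu  1000 0 500 8000 500 0 0 0 0 0
cpu0 500 0 250 4000 250 0 0 0 0 0
cpu1 500 0 250 4000 250 0 0 0 0 0
intr 12345
"""
SAMPLE_T1 = """cpu  1300 0 600 8900 600 0 0 0 0 0
cpu0 750 0 300 4400 300 0 0 0 0 0
cpu1 550 0 300 4500 300 0 0 0 0 0
intr 12999
"""

usage = cpu_usage_percent(parse_proc_stat(SAMPLE_T0), parse_proc_stat(SAMPLE_T1))
for name in sorted(usage):
    print(name, round(usage[name], 1))
```

The number of `cpuN` keys also doubles as a core count.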
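`free -m`:

                  total        used        free      shared  buff/cache   available
    Mem:           3788        2442         110         168        1236         953
    Swap:             0           0           0

From the man page: used is "Used memory (calculated as total - free - buffers - cache)".
That is how `used` is calculated. `available` is calculated differently: it estimates how much memory can be used without swapping, taking the page cache into account. `vmstat` reports these as well.

Upload / download speed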
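Upstream / downstream traffic
Retransmission rate
Active connection opens
Passive connection opens
Failed connection attempts

For networking, first there is the `ip -s -h link` command:

    RX: bytes  packets  errors  dropped overrun mcast
    38.7G      183M     0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    50.4G      255M     0       0       0       0

That gives inbound and outbound traffic and packets. `ss` can show connections. For speed, is it iftop? Doing what the app does, working on every machine, feels really hard. There are so many of these commands, and you do not necessarily have permission to install them on a machine. Is the retransmission rate computed from `netstat -s`? That works? Then I can do it too..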
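
If the retransmission rate really is derived from those counters, the arithmetic is short. `netstat -s` gets its TCP numbers from `/proc/net/snmp`, so a zero-install variant can read that file directly. The sketch below divides `RetransSegs` by `OutSegs` over the whole uptime, which is one common approximation and only my guess at what the app shows (it may well use deltas between samples). The connection and segment counters in the sample are the ones from this post, while the first four values on the line are typical placeholders.

```python
def parse_tcp_counters(snmp_text):
    """Return the Tcp counters from /proc/net/snmp text as {name: int}."""
    rows = [l.split() for l in snmp_text.splitlines() if l.startswith("Tcp:")]
    if len(rows) < 2:
        raise ValueError("expected a Tcp header line and a Tcp value line")
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def retransmission_rate(counters):
    """Fraction of sent segments that were retransmissions (cumulative)."""
    sent = counters["OutSegs"]
    return counters["RetransSegs"] / sent if sent else 0.0


SAMPLE_SNMP = (
    "Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
    "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts\n"
    "Tcp: 1 200 120000 -1 1867788 60000 279 382854 94 "
    "167725061 256227690 19044 0 381837\n"
)

tcp = parse_tcp_counters(SAMPLE_SNMP)
rate = retransmission_rate(tcp)
print("active opens:", tcp["ActiveOpens"])
print("passive opens:", tcp["PassiveOpens"])
print("failed attempts:", tcp["AttemptFails"])
print("retransmission rate: {:.4%}".format(rate))
```

The same dict covers active opens, passive opens and failed attempts, so those three figures on the page come for free.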
The connection counts also feel computed. It really does seem to be `netstat -st`:

    1867788 active connections openings
    60000 passive connection openings
    279 failed connection attempts
    382854 connection resets received
    94 connections established
    167725061 segments received
    256227690 segments send out
    19044 segments retransmited
    0 bad segments received.
    381837 resets sent

Disk
Volume, mount point
File system
Used and total space
Read/write speed, volume, IOPS, wait
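For a pure front-end app to pull off this much is really impressive. I have mostly only seen these attributes in zabbix; to check them by hand with commands, I reckon I would have to search for some of them first.
Let me review them against the checks the zabbix agent provides.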
| Category | Checks |
|---|---|
| Network | Packets/bytes transferred; errors/dropped packets; collisions |
| CPU | Load average; CPU idle/usage; CPU utilization data per individual process |
| Memory | Free/used memory; swap/pagefile utilization |
| Disk | Space free/used; read and write I/O |
| Service | Process status; process memory usage; service status (ssh, ntp, ldap, smtp, ftp, http, pop, nntp, imap); Windows service status; DNS resolution; TCP connectivity; TCP response time |
| File | File size/time; file exists; checksum; MD5 hash; RegExp search |
| Log | Text log; Windows eventlog |
| Other | System uptime; system time; users connected; performance counter (Windows) |
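CPU

    uptime
    10:12:43 up 356 days, 23:29, 2 users, load average: 0.11, 0.15, 0.17

The three figures are the average load over the past 1, 5 and 15 minutes.
Between 0 and 1: occasional overload is fine, sustained overload is risky. 0.7 counts as a reasonable threshold.
But I took a look at `man` and found I had misunderstood this all along.

    man uptime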
> System load averages is the average number of processes that are either in a runnable or uninterruptable state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in uninterruptable state is waiting for some I/O access, eg waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it means it was idle 75% of the time.
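Load is the average number of processes that are runnable or waiting on I/O, and it is not normalized. A load of 1 keeps a single core fully busy; on a multi-core machine the saturation point is the core count. It is not a hard cap: a higher value just means processes are queuing. So thresholds like 0.7 only make sense per core.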
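
To make the normalization concrete, here is a small sketch that pulls the three averages out of an `uptime` line and divides them by the core count. The core count of 4 is an assumption borrowed from the man page example, not a fact about the machine in this post, and the regex assumes a locale that prints decimals with a dot.

```python
import re


def parse_load_averages(uptime_line):
    """Extract the 1, 5 and 15 minute load averages from `uptime` output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(x) for x in match.groups())


def load_per_core(loads, cores):
    """Normalize load averages so that 1.0 means every core is busy."""
    return tuple(load / cores for load in loads)


UPTIME_LINE = "10:12:43 up 356 days, 23:29, 2 users, load average: 0.11, 0.15, 0.17"
ASSUMED_CORES = 4  # hypothetical; take the real value from nproc or /proc/cpuinfo

loads = parse_load_averages(UPTIME_LINE)
print(loads)
print(load_per_core(loads, ASSUMED_CORES))
```

With this, the man page example falls out directly: a raw load of 1.0 on 4 cores normalizes to 0.25, i.e. idle 75% of the time.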
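For the disk part, installing iostat would probably do. But how do you look without installing anything? xchange also says to install a package… Supposedly the app's author gets the numbers without installing anything. Impressive.
Found an article on 5 tools and another on 15 tools; almost none of them are zero-install.. All I can do is note them here. When I need this I will still try to look at zabbix first. BPF tools have shown up too.
`vmstat -D 1 1` and `vmstat -d 1 1` look like the only ones that are already installed and usable.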
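
`vmstat -d` takes its numbers from `/proc/diskstats`, so a zero-install route to read/write speed and IOPS is probably to sample that file twice and diff the counters. A sketch under that assumption: the field positions follow the kernel's documented layout (sectors are always 512 bytes in this file), while the device name, the sample values and the 2 second interval are invented.

```python
SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats(text):
    """Return {device: cumulative counters} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        stats[f[2]] = {
            "reads": int(f[3]),
            "sectors_read": int(f[5]),
            "writes": int(f[7]),
            "sectors_written": int(f[9]),
            "io_ms": int(f[12]),  # time the device had I/O in flight
        }
    return stats


def disk_rates(before, after, interval_s):
    """Per-device IOPS, throughput and utilization between two samples."""
    rates = {}
    for name, now in after.items():
        prev = before.get(name)
        if prev is None:
            continue  # device appeared between samples
        delta = {k: now[k] - prev[k] for k in now}
        rates[name] = {
            "read_iops": delta["reads"] / interval_s,
            "write_iops": delta["writes"] / interval_s,
            "read_bytes_per_s": delta["sectors_read"] * SECTOR_BYTES / interval_s,
            "write_bytes_per_s": delta["sectors_written"] * SECTOR_BYTES / interval_s,
            "util_percent": 100.0 * delta["io_ms"] / (interval_s * 1000),
        }
    return rates


# Two made-up samples taken 2 seconds apart.
SAMPLE_D0 = "   8       0 sda 1000 0 80000 500 2000 0 160000 900 0 700 1400\n"
SAMPLE_D1 = "   8       0 sda 1100 0 84000 550 2300 0 176000 1000 0 900 1700\n"

rates = disk_rates(parse_diskstats(SAMPLE_D0), parse_diskstats(SAMPLE_D1), 2)
print(rates["sda"])
```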