I came across a nice tool, ServerCat: an iOS app that reads server information directly, with zero agents installed on the server. Interesting.

img_1.png

At a glance, this one screen covers quite a lot of information.

  1. Server name: hostname
  2. OS distribution
    This one is complicated; supposedly there are 8 ways to get it.

    lsb_release -a
    LSB Version:    :core-4.1-amd64:core-4.1-noarch
    Distributor ID: CentOS
    Description:    CentOS Linux release 7.6.1810 (Core)
    Release:        7.6.1810
    Codename:       Core
    

    This is the only one I remember.
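One zero-install way to get the distribution is to read /etc/os-release, which most current distributions ship. The sketch below is only a guess at how such a lookup could work, not how ServerCat does it; `parse_os_release` and the `SAMPLE` text are made up for illustration.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release style file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


# Illustrative sample only, not captured from a real host.
SAMPLE = 'NAME="CentOS Linux"\nVERSION_ID="7"\nPRETTY_NAME="CentOS Linux 7 (Core)"\n'

if __name__ == "__main__":
    try:
        with open("/etc/os-release") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on hosts without the file
    print(parse_os_release(text).get("PRETTY_NAME", "unknown"))
```

Over SSH the same thing amounts to running cat /etc/os-release and parsing the result on the client side.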

  3. 95%: is that CPU usage?

  4. Core count. There are several ways to get this, depending on the distribution.

    1. lscpu
    2. cat /proc/cpuinfo
    3. nproc
  5. CPU idle

  6. Uptime

  7. Memory available

  8. Memory used

  9. Page cache

      top 
        Tasks:  98 total,   1 running,  97 sleeping,   0 stopped,   0 zombie
        %Cpu(s):  0.3 us,  0.2 sy,  0.0 ni, 99.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
        KiB Mem :  3879880 total,   120400 free,  2509480 used,  1250000 buff/cache
        KiB Swap:        0 total,        0 free,        0 used.   967944 avail Mem
    

    Most of these can be read from top.

    us, user    : time running un-niced user processes
    sy, system  : time running kernel processes
    ni, nice    : time running niced user processes
    id, idle    : time spent in the kernel idle handler
    wa, IO-wait : time waiting for I/O completion
    hi : time spent servicing hardware interrupts
    si : time spent servicing software interrupts
    st : time stolen from this vm by the hypervisor
    

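    The main thing is to recognize these odd abbreviations. The precision here is not good enough, though; I need to find out where the app's single-digit values come from. The same goes for its per-core load chart.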
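My guess is that both the overall percentage and the per-core chart can be derived from /proc/stat: sample the tick counters twice and compare the deltas. This is a sketch under that assumption, not a confirmed description of the app. The function names and the `BEFORE`/`AFTER` samples are invented, and counting iowait as idle is my own choice.

```python
def parse_proc_stat(text):
    """Return {cpu_name: (idle_ticks, total_ticks)} from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue
        # Columns: user nice system idle iowait irq softirq steal
        ticks = [int(x) for x in fields[1:9]]
        idle = ticks[3] + ticks[4]  # assumption: treat iowait as idle time
        counters[fields[0]] = (idle, sum(ticks))
    return counters


def cpu_usage_percent(before, after):
    """Busy percentage per CPU between two parse_proc_stat() samples."""
    usage = {}
    for name in after:
        if name not in before:
            continue
        idle_delta = after[name][0] - before[name][0]
        total_delta = after[name][1] - before[name][1]
        if total_delta > 0:
            usage[name] = 100.0 * (total_delta - idle_delta) / total_delta
    return usage


# Made-up samples standing in for two reads of /proc/stat.
BEFORE = "cpu  100 0 50 800 50 0 0 0 0 0\ncpu0 50 0 25 400 25 0 0 0 0 0\n"
AFTER = "cpu  200 0 100 1100 100 0 0 0 0 0\ncpu0 125 0 50 525 50 0 0 0 0 0\n"

if __name__ == "__main__":
    print(cpu_usage_percent(parse_proc_stat(BEFORE), parse_proc_stat(AFTER)))
```

The "cpu" row is the aggregate and the "cpuN" rows are the individual cores, so one pair of reads is enough for both views.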

    free -m
                  total        used        free      shared  buff/cache   available
    Mem:           3788        2442         110         168        1236         953
    Swap:             0           0           0
    

    used: used memory, calculated as total - free - buffers - cache

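    That is how used is calculated. available is computed differently: it estimates how much memory can be handed to new applications without swapping. vmstat reports these figures too.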
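The same numbers can be rebuilt from /proc/meminfo without running free at all. In this sketch the totals match the top output above (3879880 total, 120400 free, 1250000 buff/cache), but the split between Buffers, Cached, and SReclaimable is invented so that it adds up. The helper names are hypothetical, and newer versions of free may compute used differently.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into {field: value in KiB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_summary(m):
    """Apply used = total - free - buffers - cache to parsed meminfo."""
    cache = m["Cached"] + m.get("SReclaimable", 0)
    buff_cache = m["Buffers"] + cache
    return {
        "total": m["MemTotal"],
        "free": m["MemFree"],
        "buff_cache": buff_cache,
        "used": m["MemTotal"] - m["MemFree"] - buff_cache,
        "available": m.get("MemAvailable"),
    }


# Totals follow the top output in this post; the three-way split of
# buff/cache below is made up for illustration.
SAMPLE = """MemTotal:        3879880 kB
MemFree:          120400 kB
MemAvailable:     967944 kB
Buffers:           50000 kB
Cached:          1100000 kB
SReclaimable:     100000 kB
"""

if __name__ == "__main__":
    print(memory_summary(parse_meminfo(SAMPLE)))
```

With these inputs used comes out as 2509480 KiB, the same value top shows.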

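  10. Upload and download speed

  11. Upload and download traffic

  12. Retransmission rate

  13. Active connection opens

  14. Passive connection opens

  15. Failed connection attempts

    For networking, start with the ip -s -h link command.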

     RX: bytes  packets  errors  dropped overrun mcast
    38.7G      183M     0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    50.4G      255M     0       0       0       0
    

    This gives the inbound and outbound traffic and packet counts. ss can show the connections.

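    Is the speed taken from iftop? Making this work on every machine the way the app does seems hard. There are so many commands, and you do not necessarily have permission to install them on a machine.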
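A speed figure does not strictly need iftop. The kernel exposes per-interface byte counters in /proc/net/dev, so two reads a known interval apart give a rate. Whether the app works this way is a guess on my part; `parse_net_dev`, `transfer_rates`, and the sample text are invented for the sketch.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # The two header lines contain no colon, so they are skipped here.
        if not sep or len(fields) < 16:
            continue
        # Column 0 is received bytes, column 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def transfer_rates(before, after, seconds):
    """Bytes per second (rx, tx) per interface between two samples."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        if name in before and seconds > 0:
            rx0, tx0 = before[name]
            rates[name] = ((rx1 - rx0) / seconds, (tx1 - tx0) / seconds)
    return rates


HEADER = (
    "Inter-|   Receive                                    |  Transmit\n"
    " face |bytes packets errs drop fifo frame compressed multicast"
    "|bytes packets errs drop fifo colls carrier compressed\n"
)
# Made-up counters standing in for two reads taken 2 seconds apart.
BEFORE = HEADER + "  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0\n"
AFTER = HEADER + "  eth0: 3000 30 0 0 0 0 0 0 2500 25 0 0 0 0 0 0\n"

if __name__ == "__main__":
    print(transfer_rates(parse_net_dev(BEFORE), parse_net_dev(AFTER), 2.0))
```

The cumulative byte counters are the same ones ip -s link prints, so total traffic falls out of a single read.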

    Is the retransmission rate computed from netstat -s? That works? Then I can do it too.

    The connection counts look computed as well; it really does seem to be netstat -st.

    1867788 active connections openings
    60000 passive connection openings
    279 failed connection attempts
    382854 connection resets received
    94 connections established
    167725061 segments received
    256227690 segments send out
    19044 segments retransmited
    0 bad segments received.
    381837 resets sent
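If the rate really is retransmitted segments divided by segments sent, the arithmetic is short. netstat takes these counters from /proc/net/snmp, so the sketch below parses that file's Tcp rows instead of netstat output. The counter values come from the output above; the first four columns of the sample and the function names are assumptions for illustration.

```python
def parse_tcp_snmp(text):
    """Return the Tcp counters from /proc/net/snmp text as a dict."""
    rows = [line.split() for line in text.splitlines() if line.startswith("Tcp:")]
    if len(rows) < 2:
        return {}
    names, values = rows[0][1:], rows[1][1:]  # header row, then value row
    return {name: int(value) for name, value in zip(names, values)}


def retransmission_rate(tcp):
    """Retransmitted segments as a percentage of segments sent."""
    sent = tcp.get("OutSegs", 0)
    return 100.0 * tcp.get("RetransSegs", 0) / sent if sent else 0.0


# Counter values are from the netstat -st output in this post; the first
# four columns (RtoAlgorithm..MaxConn) are illustrative placeholders.
SAMPLE = (
    "Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
    "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts\n"
    "Tcp: 1 200 120000 -1 1867788 60000 279 382854 94 "
    "167725061 256227690 19044 0 381837\n"
)

if __name__ == "__main__":
    tcp = parse_tcp_snmp(SAMPLE)
    print(tcp["ActiveOpens"], tcp["PassiveOpens"], tcp["AttemptFails"])
    print(f"{retransmission_rate(tcp):.4f}%")
```

These counters are cumulative since boot, so this is a lifetime average; a live rate would need the difference between two samples.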
    
  16. Disks: volumes and mount points

  17. File system

  18. Space used and total

  19. Read/write speed, volume, IOPS, wait

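For a pure front-end app, getting this much out is really impressive. I have mostly only seen these metrics in Zabbix; to check them by hand with commands, I would probably still have to look some of them up.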

As a review, here is the list of checks the Zabbix agent provides.

Network
    Packets/bytes transferred
    Errors/dropped packets
    Collisions
CPU
    Load average
    CPU idle/usage
    CPU utilization data per individual process
Memory
    Free/used memory
    Swap/pagefile utilization
Disk
    Space free/used
    Read and write I/O
Service
    Process status
    Process memory usage
    Service status (ssh, ntp, ldap, smtp, ftp, http, pop, nntp, imap)
    Windows service status
    DNS resolution
    TCP connectivity
    TCP response time
File
    File size/time
    File exists
    Checksum
    MD5 hash
    RegExp search
Log
    Text log
    Windows eventlog
Other
    System uptime
    System time
    Users connected
    Performance counter (Windows)

cpu

uptime
10:12:43 up 356 days, 23:29,  2 users,  load average: 0.11, 0.15, 0.17

The three numbers are the load averages over the past 1, 5, and 15 minutes.

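The explanation I found online seemed fine:

The range is 0 to 1; occasional overload is acceptable, sustained overload is risky, and 0.7 is a reasonable threshold.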

But then I glanced at the man page and found I had misunderstood it all along.

man uptime

       System load averages is the average number of processes that are either in a runnable or uninterruptable state. A process in a
       runnable state is either using the CPU or waiting to use the CPU. A process in uninterruptable state is waiting for some I/O
       access, eg waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the
       number of CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it
       means it was idle 75% of the time.

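Load is the average number of processes that are runnable or waiting on I/O, and it is not normalized. A load of 1 means a single core is fully busy; on a multi-core machine the equivalent figure is the number of cores.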
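To compare load across machines, divide it by the core count yourself. A minimal sketch, assuming the three averages are read from /proc/loadavg; the sample string reuses the uptime numbers above, its last two fields are placeholders, and both function names are made up.

```python
def parse_loadavg(text):
    """Return the 1, 5, and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def load_per_core(load, cores):
    """Normalize a load average by the number of CPU cores."""
    return load / cores


# The first three fields match the uptime output in this post; the
# remaining two are placeholders.
SAMPLE = "0.11 0.15 0.17 1/98 12345"

if __name__ == "__main__":
    one, five, fifteen = parse_loadavg(SAMPLE)
    print(one, five, fifteen)
    # The man page example: load 1 on a 4 CPU system is 25% busy.
    print(load_per_core(1.0, 4))
```

On the machine itself, os.getloadavg() and os.cpu_count() from the Python standard library give the same inputs without any parsing.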

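For disks, installing iostat would probably do. But how do you check without installing anything? xchange also says to install a package… Reportedly the app's author gets this data without installing anything, which is impressive.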

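I found an article covering 5 tools and another covering 15; almost none of them work without installing something. Oh well, I will just note that here. When I need this I will stick to Zabbix where I can. BPF tools have shown up as well.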

vmstat -D 1 1 and vmstat -d 1 1 look like the only ones that work out of the box.
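vmstat -d gets its numbers from /proc/diskstats, so a zero-install client could plausibly read that file directly and compute IOPS and throughput from two samples. The sketch assumes the usual column layout and 512-byte sectors. I have not confirmed this is what the app does, and the names and sample rows are invented.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Return per-device read/write counters from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        # f[2] is the device name; f[3] reads completed, f[5] sectors read,
        # f[7] writes completed, f[9] sectors written.
        stats[f[2]] = {
            "reads": int(f[3]),
            "read_sectors": int(f[5]),
            "writes": int(f[7]),
            "write_sectors": int(f[9]),
        }
    return stats


def disk_rates(before, after, seconds):
    """IOPS and bytes per second per device between two samples."""
    rates = {}
    for name, now in after.items():
        if name not in before or seconds <= 0:
            continue
        old = before[name]
        rates[name] = {
            "read_iops": (now["reads"] - old["reads"]) / seconds,
            "write_iops": (now["writes"] - old["writes"]) / seconds,
            "read_bytes_per_s": (now["read_sectors"] - old["read_sectors"]) * SECTOR_BYTES / seconds,
            "write_bytes_per_s": (now["write_sectors"] - old["write_sectors"]) * SECTOR_BYTES / seconds,
        }
    return rates


# Made-up rows standing in for two reads taken 2 seconds apart.
BEFORE = "   8       0 sda 1000 0 20000 500 2000 0 40000 800 0 900 1300\n"
AFTER = "   8       0 sda 1100 0 22048 550 2300 0 44096 900 0 1000 1450\n"

if __name__ == "__main__":
    print(disk_rates(parse_diskstats(BEFORE), parse_diskstats(AFTER), 2.0))
```

The time-spent columns in the same file (f[6] and f[10], in milliseconds) could be used in the same way to estimate wait per request.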