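Editor's recommendation:
This article covers an overview of monitoring systems, basic resource monitoring, an introduction to Prometheus, its data model, other monitoring tools, and related topics.
This article comes from bubuko and was edited and recommended by Anna of 火龙果软件.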
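1. Overview of Monitoring Systems
Here, "monitoring system" refers specifically to data center monitoring: monitoring and alerting on the hardware and software inside a data center. Enterprise IT architecture is gradually migrating from traditional physical servers to IaaS clouds dominated by virtual machines. However the infrastructure changes, it cannot do without the support of a monitoring system.
Moreover, increasingly complex data center environments place ever higher demands on monitoring systems. They must monitor many kinds of objects, such as containers, distributed storage, SDN networks, distributed systems, and all sorts of applications. They must collect and store large volumes of monitoring data, for example aggregating several TB of data per day. And they must support intelligent analysis, alerting, and early warning based on that data.
Almost every enterprise data center uses some open source or commercial monitoring system. By monitored object, monitoring can be divided into network monitoring, storage monitoring, server monitoring, application monitoring, and so on. Because every aspect of the data center needs to be monitored, the monitoring system has to be comprehensive and act as the data center's "all-seeing eye".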


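2. Basic Resource Monitoring
2.1 Network Monitoring
Network performance monitoring: mainly covers network probing, real-time traffic monitoring (network latency, request volume, success rate), and the collection, aggregation, and analysis of historical data.
Network attack detection: mainly targets attacks on the internal or external network, such as DDoS attacks, and identifies attack behavior by analyzing abnormal traffic.
Device monitoring: mainly covers the various network devices in the data center, including hardware such as routers, firewalls, and switches. Data can be collected through protocols such as SNMP.
2.2 Storage Monitoring
Storage performance monitoring: block storage is usually monitored for read/write throughput, IOPS, read/write latency, and disk usage; file storage is usually monitored for file system inodes, read/write speed, and directory permissions.
Storage system monitoring: different storage systems expose different metrics. For Ceph, for example, you need to monitor the running state of OSDs and MONs, the number of PGs in each state, and cluster IOPS.
Storage device monitoring: for storage built on x86 servers, a collector on each storage node gathers information about disks, SSDs, NICs, and other devices. Commercial storage devices are delivered by vendors as black boxes and usually ship with their own monitoring for running state, performance, and capacity.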
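2.3 Server Monitoring
CPU: overall CPU usage, user-mode percentage, kernel-mode percentage, per-CPU usage, wait queue length, I/O wait percentage, the processes consuming the most CPU, context switch count, cache hit rate, and so on.
Memory: memory used, memory free, the processes using the most memory, swap size, page faults, and so on.
Network I/O: per-NIC outbound traffic, inbound traffic, network latency, packet loss rate, and so on.
Disk I/O: disk read/write throughput, IOPS, disk usage, read/write latency, and so on.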
2.4 Middleware Monitoring
Message middleware: RabbitMQ, Kafka
Web service middleware: Tomcat, Jetty
Cache middleware: Redis, Memcached
Database middleware: MySQL, PostgreSQL
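2.5 Application Monitoring (APM)
APM focuses on monitoring applications: running state, performance, logs, and call chain tracing. Call chain tracing follows an entire request, from the point where the user sends it (usually a browser or application client) to the backend API services, and on through the calls between those API services and their middleware or other components, to build a complete call topology. Beyond that, APM can also monitor the call hierarchy of methods inside a component (Controller --> Service --> DAO) and record the execution time of each function, which provides data to support performance tuning.
Besides Pinpoint, application monitoring tools include Zipkin (open sourced by Twitter), Apache SkyWalking, and CAT (open sourced by Meituan).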
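[Figure: call chain monitoring]
[Figure: comparison of several products]
[Figure: Pinpoint]
Besides intercepting method calls, APM can also intercept TCP and HTTP network requests, and so identify the methods and SQL statements with the longest execution time and the APIs with the highest latency.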
3. Introduction to Prometheus
3.1 What Is Prometheus
Prometheus is an open source framework for system monitoring and alerting. Inspired by Google's Borgmon monitoring system, it was created in 2012 by former Google employees working at SoundCloud, developed as a community open source project, and officially released in 2015. In 2016 Prometheus joined the Cloud Native Computing Foundation as its second hosted project, after Kubernetes.
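3.2 Advantages
Powerful multi-dimensional data model:
Time series are distinguished by a metric name and key-value pairs.
Any metric can carry arbitrary multi-dimensional labels.
The data model is more flexible; names do not have to be forced into dot-separated strings.
Data can be aggregated, cut, and sliced.
Values are double-precision floats, and label values support full Unicode.
Flexible and powerful query language (PromQL): a single query can multiply, add, join, and take quantiles across multiple metrics.
Easy to manage: the Prometheus server is a single binary that works directly on local storage and does not depend on distributed storage.
Efficient: each sample takes only about 3.5 bytes on average, and a single Prometheus server can handle millions of metrics.
Time series are collected with a pull model, which makes local testing easier and prevents misbehaving servers from pushing bad metrics.
Time series can also be pushed by way of the Push Gateway, from which the Prometheus server collects them.
Monitored targets can be obtained through service discovery or static configuration.
Multiple options for visualization and dashboards.
Easy to scale.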
3.3 Components
The Prometheus ecosystem consists of multiple components, many of which are optional:
Prometheus Server: collects and stores time series data.
Client Library: generates metrics for the service being monitored and exposes them to the Prometheus server. When the Prometheus server pulls, the library returns the current state of the metrics directly.
Push Gateway: mainly for short-lived jobs. Because such jobs exist only briefly, they may disappear before Prometheus gets to pull from them. Instead, these jobs push their metrics to the Push Gateway, and the Prometheus server scrapes the gateway. This approach is mainly for service-level metrics; for machine-level metrics, use node exporter.
Exporters: expose metrics from existing third-party services to Prometheus.
Alertmanager: after receiving alerts from the Prometheus server, it deduplicates and groups them, routes them to the matching receiver, and sends the notification. Common receivers include email, PagerDuty, OpsGenie, and webhooks.
Various other tools.
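To make the pull model concrete, here is a minimal sketch of a service exposing its own metrics over HTTP. It uses only the Python standard library rather than an official client library, and the metric name demo_requests_total, the port 8000, and the helper names are assumptions made for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; a real service would use a client library.
REQUESTS_TOTAL = {"GET": 0, "POST": 0}


def render_metrics(counts):
    """Render the counter in the Prometheus text format."""
    lines = [
        "# HELP demo_requests_total Requests handled, by method.",
        "# TYPE demo_requests_total counter",
    ]
    for method, value in sorted(counts.items()):
        lines.append(f'demo_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on http://localhost:<port>/metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. The service never contacts Prometheus itself; it only answers when it is scraped, which is the property the pull model relies on.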
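3.4 Architecture
[Figure: Prometheus architecture diagram]
As the architecture diagram shows, the main Prometheus modules include the Server, Exporters, Pushgateway, PromQL, Alertmanager, and the Web UI.
Roughly, it works like this:
The Prometheus server periodically pulls data from statically configured targets or from targets found through service discovery.
When newly pulled data exceeds the configured in-memory buffer, Prometheus persists the data to disk (or to remote storage, for example in the cloud, if that is configured).
Prometheus can be configured with rules that it evaluates on a schedule; when a condition is met, it pushes an alert to the configured Alertmanager.
When Alertmanager receives alerts, it aggregates, deduplicates, and reduces noise according to its configuration, then sends the notifications.
Data can be queried and aggregated through the API, the Prometheus console, or Grafana.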
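The deduplication and grouping step can be pictured with a small sketch. This is a simplified illustration of the idea, not how Alertmanager is actually implemented; the function name group_alerts and the representation of an alert as a plain label dictionary are assumptions.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Deduplicate alerts by their full label set, then group them.

    alerts   -- list of dicts mapping label name to label value
    group_by -- label names that define a notification group
    """
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        identity = tuple(sorted(labels.items()))
        if identity in seen:  # same alert received again: drop it
            continue
        seen.add(identity)
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)
```

Grouping by alertname, for instance, would turn many per-instance alerts of the same kind into a single notification that lists the affected instances.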
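3.5 When Does It Fit
Prometheus works well for recording purely numeric time series. It fits both machine-centric monitoring and monitoring of highly dynamic service-oriented architectures. In a world of microservices, its support for multi-dimensional data collection and querying is a particular strength. Prometheus is designed for reliability: it is the system you go to during an outage to diagnose problems quickly. Each Prometheus server is standalone and does not depend on network storage or other remote services. When parts of your infrastructure are broken you can still rely on it to locate the fault, and it does not need extensive infrastructure resources to run.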
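3.6 When Does It Not Fit
Prometheus values reliability: even under failure conditions you can always view whatever statistics are available about your system. If you need 100% accuracy, for example for per-request billing, Prometheus is not a good choice, because the data it collects may not be detailed and complete enough. In that case it is best to use another system to collect and analyze the billing data, and Prometheus for the rest of your monitoring.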
4. Data Model
4.1 Data Model
Prometheus stores all collected monitoring data as metrics in its built-in time series database (TSDB). A time series is a stream of timestamped values that belong to the same metric name and the same set of labels. Besides stored time series, Prometheus can also generate temporary derived time series as the result of queries.
Metric names and labels
Every time series is uniquely identified by its metric name and a set of labels (key-value pairs). The metric name describes what is being measured (for example, http_requests_total is the total number of HTTP requests received by the system). Metric names may contain only ASCII letters, digits, underscores, and colons, and must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*.
[info] Note
Colons are reserved for user-defined recording rules. Exporters and directly instrumented targets should not use colons in the metric names they expose.
Labels are what enable Prometheus's powerful multi-dimensional data model: for a given metric name, each distinct combination of labels identifies a particular dimensional instance of that metric (for example, all HTTP requests that used the method POST against the /api/tracks handler). The query language filters and aggregates on these metric names and labels. Changing any label value, including adding or removing a label, creates a new time series.
Label names may contain only ASCII letters, digits, and underscores, and must match the regular expression [a-zA-Z_][a-zA-Z0-9_]*. Label names beginning with __ are reserved for internal use. Label values may contain any Unicode characters.
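The naming rules can be checked mechanically. The sketch below uses the two regular expressions quoted above; the function names are hypothetical and exist only for this illustration.

```python
import re

# Patterns taken from the naming rules described in the text.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_metric_name(name):
    """True if the whole string matches the metric name pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def is_valid_label_name(name):
    """True for names a user may define: pattern matches and no __ prefix."""
    return LABEL_NAME_RE.fullmatch(name) is not None and not name.startswith("__")
```

A name such as job:http_requests:rate5m passes the metric check because colons are allowed there, while __name__ fails the label check because the double underscore prefix is reserved.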
Samples
Each point in a time series is called a sample. A sample consists of three parts:
Metric: the metric name plus the label set describing the sample;
Timestamp: a timestamp with millisecond precision;
Value: a float64 value representing the sample's current value.
Notation
A time series with a given metric name and label set is written as:
<metric name>{<label name>=<label value>, ...}
For example, the time series with the metric name api_http_requests_total and the labels method="POST" and handler="/messages" can be written as:
api_http_requests_total{method="POST", handler="/messages"}
This is the same notation that OpenTSDB uses.
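As a small illustration of this notation and of what "uniquely identified" means, the sketch below formats a series and builds an identity key from the name and label set. Both helper names are hypothetical, and label values are not escaped here.

```python
def format_series(name, labels):
    """Return the <metric name>{<label name>="<label value>", ...} notation."""
    if not labels:
        return name
    # Sort labels so the output does not depend on insertion order.
    pairs = ", ".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{pairs}}}"


def series_id(name, labels):
    """Identity of a series: the metric name plus the unordered label set."""
    return (name, frozenset(labels.items()))
```

Two label dictionaries with the same pairs in a different order yield the same series_id, whereas adding or removing a single label yields a different one, which matches the rule that any label change creates a new time series.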
4.2 Metric Types
The Prometheus client libraries offer four core metric types. These types currently exist only in the client libraries (where each type has its own API) and in the wire protocol. The Prometheus server does not distinguish between them and simply treats all data as untyped time series, although this may change in the future.
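Counter
A cumulative metric whose value only increases (or resets to zero on restart). Typical uses: number of requests, number of completed tasks, number of errors, and so on.
For example, http_requests_total in the Prometheus server is the total number of HTTP requests Prometheus has handled. With functions such as increase() or rate() it is easy to get the growth over any time range (delta() is intended for gauges and does not account for counter resets). This is covered in detail in the PromQL section.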
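The sketch below shows the idea behind computing an increase from raw counter readings while tolerating a restart. It is a simplification for illustration only: the function name is hypothetical, and PromQL's own functions additionally extrapolate to the edges of the time window, which this code does not do.

```python
def counter_increase(values):
    """Total increase across consecutive counter readings.

    A reading lower than its predecessor is treated as a restart from
    zero, so the whole new reading counts as increase.
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total
```

For the readings 10, 15, 3, 8 the drop from 15 to 3 is read as a reset, so the steps contribute 5, 3, and 5.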

Gauge
A plain metric for a value that can go up and down arbitrarily. Typical uses: temperature, number of running goroutines.
For example, go_goroutines in the Prometheus server is the current number of goroutines in Prometheus.

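Histogram
Can be thought of as a bar chart. Typical uses: request duration, response size.
It samples observations, groups them into buckets, and keeps statistics on them.
For example, querying prometheus_http_request_duration_seconds_sum{handler="/api/v1/query", instance="localhost:9090", job="prometheus"} returns the following result:
[Figure: query result]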
Summary
Similar to a histogram. Typical uses: request duration, response size.
Provides the count and sum of observed values.
Provides quantiles, so results can be tracked by percentile.
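To show what a histogram keeps track of, here is a toy implementation with buckets, a running sum, and a count. It is a sketch for illustration, not the client library's class; the class name and the bucket bounds used with it are assumptions.

```python
import bisect


class Histogram:
    """Toy histogram with cumulative buckets, a sum and a count."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One slot per finite bound plus a final +Inf slot.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # bisect_left finds the first bound with value <= bound ("le").
        self.bucket_counts[bisect.bisect_left(self.upper_bounds, value)] += 1
        self.sum += value
        self.count += 1

    def cumulative(self):
        """Return (le, count) pairs where each count includes smaller buckets."""
        pairs, running = [], 0
        for bound, n in zip(self.upper_bounds + [float("inf")], self.bucket_counts):
            running += n
            pairs.append((bound, running))
        return pairs
```

Because the buckets are cumulative, the count reported for the +Inf bucket always equals the total number of observations.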
4.3 Instances and Jobs
In Prometheus, any individually scraped data source (target) is called an instance. A collection of instances of the same type is called a job.
For example, a job with four replicated instances:
- job: api-server
    - instance 1: 1.2.3.4:5670
    - instance 2: 1.2.3.4:5671
    - instance 3: 5.6.7.8:5670
    - instance 4: 5.6.7.8:5671
Automatically generated labels and time series
While scraping, Prometheus automatically attaches labels to the scraped time series to identify the target they came from:
job: The configured job name that the target belongs to.
instance: The <host>:<port> part of the target's URL that was scraped.
If either label is already present in the scraped data, the honor_labels configuration option determines which value wins. See the scrape configuration documentation on the official site for details.
For each instance scrape, Prometheus also stores a sample in the following time series:
up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy (reachable)
up{job="<job-name>", instance="<instance-id>"}: 0 if the scrape failed
scrape_duration_seconds{job="<job-name>", instance="<instance-id>"}: duration of the scrape
scrape_samples_post_metric_relabeling{job="<job-name>", instance="<instance-id>"}: the number of samples remaining after metric relabeling was applied
scrape_samples_scraped{job="<job-name>", instance="<instance-id>"}: the number of samples the target exposed
The up time series is useful for monitoring whether an instance is working.
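How the job and instance labels interact with labels already present in the scraped data can be sketched as follows. The function is a hypothetical model of the documented honor_labels behavior (conflicting scraped labels are renamed with an exported_ prefix when honor_labels is false), not code from Prometheus.

```python
def attach_target_labels(scraped, job, instance, honor_labels=False):
    """Merge the target's job/instance labels into one scraped label set."""
    target = {"job": job, "instance": instance}
    result = dict(scraped)
    for name, value in target.items():
        if name in scraped:
            if honor_labels:
                continue  # keep the scraped value, ignore the target's
            # Preserve the scraped value under an exported_ prefix.
            result["exported_" + name] = scraped[name]
        result[name] = value
    return result
```

With honor_labels left at false, a scraped job="batch" label ends up as exported_job="batch" and the series carries the configured job name instead.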
5. Other Monitoring Tools
The preface briefly explained why we chose Prometheus and the benefits it brought us.
Here we compare it with other monitoring solutions to help readers understand Prometheus better.
Prometheus vs Zabbix
Zabbix is written in C and PHP, Prometheus in Go; overall Prometheus runs somewhat faster.
Zabbix is a traditional host monitoring system, used mainly for physical hosts, switches, networks, and the like. Prometheus is suited not only to host monitoring but also to cloud, SaaS, OpenStack, and container monitoring.
Zabbix has a richer set of plugins for traditional host monitoring.
Zabbix lets you configure many things in its web GUI, whereas Prometheus requires editing configuration files by hand.
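Prometheus vs Graphite
Graphite has fewer features. It focuses on two things, storing time series data and visualizing it; anything else requires additional plugins. Prometheus is a one-stop solution that provides common features such as alerting and trend analysis, along with stronger data storage and query capabilities.
Graphite does better on horizontal scaling and on long-term data retention.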
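Prometheus vs InfluxDB
InfluxDB is an open source time series database, used mainly for storing data; building a monitoring and alerting system on it requires other systems.
InfluxDB does better on horizontal storage scaling and high availability, since at its core it is a database.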
Prometheus vs OpenTSDB
OpenTSDB is a distributed time series database that depends on Hadoop and HBase and can retain data for longer. If you already run Hadoop and HBase, it is a good choice.
Building a monitoring and alerting system on OpenTSDB requires other systems.
Prometheus vs Nagios
Nagios data has no custom labels and no query support; its alerting has no noise reduction or grouping; and it has no data storage, so querying historical state requires a plugin.
Nagios is a monitoring system from the 1990s that suits small clusters or static systems. It is showing its age and lacks many features; by comparison, Prometheus is much more capable.
Prometheus vs Sensu
Broadly speaking, Sensu is an upgraded Nagios that solves many of Nagios's problems. If you are familiar with Nagios, Sensu is a good choice.
Sensu depends on RabbitMQ and Redis and scales better in data storage.
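Summary
Prometheus is a one-stop monitoring and alerting platform with few dependencies and a complete feature set.
Prometheus supports monitoring of clouds and containers, whereas the other systems mainly monitor hosts.
Prometheus has a more expressive query language with more powerful built-in statistical functions.
Prometheus is weaker than InfluxDB, OpenTSDB, and Sensu in storage scalability and durability.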
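6. Exporter
6.1 Text Format
Before discussing exporters, it is worth introducing the Prometheus text exposition format, because an exporter essentially converts the data it collects into this text format and serves it over HTTP.
The text an exporter produces is line oriented (lines end with \n). Empty lines are ignored, and the last line must end with a line feed.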
Comments
A line starting with # is usually a comment.
A line starting with # HELP gives the help text for a metric.
A line starting with # TYPE declares a metric's type: counter, gauge, histogram, summary, or untyped.
Any other # line is an ordinary comment for human readers and is ignored by Prometheus.
Samples
A line that does not start with # is a sample. It usually follows directly after the TYPE line and has this format:
metric_name [
  "{" label_name "=" `"` label_value `"` { "," label_name "=" `"` label_value `"` } [ "," ] "}"
] value [ timestamp ]
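The grammar above can be exercised with a small parser. This is a sketch for the common case, built on regular expressions: it does not unescape backslash sequences inside label values, and the function name parse_sample is hypothetical.

```python
import re

# name, optional {labels}, value, optional integer timestamp
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>.*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+(?P<ts>-?\d+))?\s*$"
)
# One label pair; the value may contain backslash-escaped characters.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse one sample line into (name, labels, value, timestamp_ms)."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels") or ""))
    ts = int(m.group("ts")) if m.group("ts") else None
    return m.group("name"), labels, float(m.group("value")), ts
```

Python's float() accepts spellings such as +Inf and NaN, so those special sample values parse without extra handling.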
A complete example:
[Figure: complete example of the text format]
Note in particular that if a metric named x is a histogram or summary, it must satisfy the following conventions:
The sum of the observations is exposed as x_sum.
The number of observations is exposed as x_count.
Each quantile of a summary is exposed as x{quantile="y"}.
Each bucket count of a histogram is exposed as x_bucket{le="y"}.
A histogram must include x_bucket{le="+Inf"}, whose value equals x_count.
The quantile values of a summary and the le values of a histogram must appear in increasing order.
6.2 Common Queries
Once node_exporter data has been collected, we can use PromQL for queries and monitoring. Below are some of the more common queries.
Note: the queries below use a single node as the example. To see all nodes, simply remove instance="xxx".
CPU usage
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
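The arithmetic behind this query can be worked through by hand. node_cpu_seconds_total{mode="idle"} counts the seconds each core has spent idle, so its per-second rate is the fraction of time that core was idle. The sketch below reproduces the calculation for made-up readings; the function name and its inputs are assumptions for illustration.

```python
def cpu_usage_percent(idle_deltas, interval_seconds):
    """CPU usage from per-core increases of the idle-seconds counter.

    idle_deltas      -- increase of the idle counter per core over the interval
    interval_seconds -- time between the two readings
    """
    # Per-core idle fraction, averaged across cores (the "avg by" step).
    idle_fraction = sum(d / interval_seconds for d in idle_deltas) / len(idle_deltas)
    return 100 - idle_fraction * 100
```

With two cores that were idle for 12 and 9 of the last 15 seconds, the idle fractions are 0.8 and 0.6, the average is 0.7, and the usage comes out at about 30 percent.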
CPU share by mode
avg by (instance, mode) (irate(node_cpu_seconds_total[5m])) * 100
Load average
node_load1{instance="xxx"}   # 1-minute load
node_load5{instance="xxx"}   # 5-minute load
node_load15{instance="xxx"}  # 15-minute load
Memory usage
100 - ((node_memory_MemFree_bytes + node_memory_Cached_bytes + node_memory_Buffers_bytes) / node_memory_MemTotal_bytes) * 100
Disk usage
100 - node_filesystem_free_bytes{instance="xxx", fstype!~"rootfs|selinuxfs|autofs|rpc_pipefs|tmpfs|udev|none|devpts|sysfs|debugfs|fuse.*"} / node_filesystem_size_bytes{instance="xxx", fstype!~"rootfs|selinuxfs|autofs|rpc_pipefs|tmpfs|udev|none|devpts|sysfs|debugfs|fuse.*"} * 100
Alternatively, use {fstype="xxx"} directly to select the file system you want to inspect.
Network I/O
# Inbound bandwidth (Kibit/s)
sum by (instance) (irate(node_network_receive_bytes_total{instance="xxx", device!~"bond.*?|lo"}[5m]) / 128)
# Outbound bandwidth (Kibit/s)
sum by (instance) (irate(node_network_transmit_bytes_total{instance="xxx", device!~"bond.*?|lo"}[5m]) / 128)
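The division by 128 in these queries is a unit conversion. The counters are in bytes, so the rate is in bytes per second; multiplying by 8 gives bits and dividing by 1024 gives Kibit, and 8 / 1024 equals 1 / 128. The helper below, whose name is hypothetical, spells this out.

```python
def bytes_per_second_to_kibit(bytes_per_second):
    """Convert bytes/s to Kibit/s: multiply by 8 bits, divide by 1024."""
    return bytes_per_second * 8 / 1024
```

A rate of 128 bytes per second is therefore exactly 1 Kibit/s.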
NIC packets in/out
# Inbound packets per second
sum by (instance) (rate(node_network_receive_packets_total{instance="xxx", device!="lo"}[5m]))
# Outbound packets per second
sum by (instance) (rate(node_network_transmit_packets_total{instance="xxx", device!="lo"}[5m]))