MySQL: How to monitor MySQL Replication Lag in MilliSeconds With PMM and pt-heartbeat



There could be various requirements at application end which need realtime slaves. Let me introduce a solution to a problem we used to face how can we monitor whether our slaves are real time or they are lagging in Milliseconds. Unfortunately there is no built in feature in MySQL to get Replication Lag in MilliSeconds.

Perhaps there is a tool provisioned in pt-toolkit named as pt-heartbeat. It generates heartbeat events on master and monitoring system can monitor time difference on slave to calculate lag.

How to deploy pt-heartbeat in your environment (Assuming OS as UBUNTU xx.xx):

Step 1: Download pt-heartbeat using below command.
wget http://percona.com/get/pt-heartbeat -P /usr/bin/ && chmod +x /usr/bin/pt-heartbeat 
Step 2: It requires a database where it can create a table. Let's create it on master.
mysql -uUser -pPassword -e "create database if not exists percona;" 
Step 3: Create heartbeat table in percona db.
pt-heartbeat -D percona --create-table --check -uUser -pPassword --master-server-id ServerId 
Step 4: Start pt-heartbeat daemon
pt-heartbeat -D percona --daemonize --update h=127.0.0.1,u=User,p=Password --utc
Step 5: Changes in pmm client to monitor lag in ms.
pmm-admin remove mysql:metrics

pmm-admin add mysql:metrics --user=monitor --password=xxxx -- --collect.heartbeat.database=percona --collect.heartbeat=true 
Step 6: Use Below grafana panel json to add new graph on dashboard
add panel > Graph > Panel Title > More > Panel Json > Update
{

  "aliasColors": {},

  "bars": false,

  "dashLength": 10,

  "dashes": false,

  "datasource": "Prometheus",

  "description": "Requires pt-heartbeat\nThis graph requires mysqld_exporter with -collect.heartbeat=true and a pt-heartbeat daemon running on MySQL masters",

  "fill": 2,

  "gridPos": {

    "h": 7,

    "w": 12,

    "x": 12,

    "y": 3

  },

  "id": 41,

  "legend": {

    "alignAsTable": true,

    "avg": true,

    "current": false,

    "max": true,

    "min": true,

    "show": true,

    "total": false,

    "values": true

  },

  "lines": true,

  "linewidth": 2,

  "links": [],

  "nullPointMode": "null",

  "percentage": false,

  "pointradius": 5,

  "points": false,

  "renderer": "flot",

  "seriesOverrides": [

    {

      "alias": "Lag",

      "color": "#bf1b00"

    }

  ],

  "spaceLength": 10,

  "stack": false,

  "steppedLine": false,

  "targets": [

    {

      "expr": "max_over_time(mysql_heartbeat_now_timestamp_seconds{instance=\"$host\"}[$interval]) - \nmax_over_time(mysql_heartbeat_stored_timestamp_seconds{instance=\"$host\"}[$interval]) or \nmax_over_time(mysql_heartbeat_now_timestamp_seconds{instance=\"$host\"}[5m]) - \nmax_over_time(mysql_heartbeat_stored_timestamp_seconds{instance=\"$host\"}[5m])",

      "format": "time_series",

      "interval": "$interval",

      "intervalFactor": 1,

      "legendFormat": "Lag for server-id={{server_id}}",

      "refId": "A"

    }

  ],

  "thresholds": [],

  "timeFrom": null,

  "timeShift": null,

  "title": "MySQL Heartbeat Replication Delay",

  "tooltip": {

    "shared": true,

    "sort": 0,

    "value_type": "individual"

  },

  "type": "graph",

  "xaxis": {

    "buckets": null,

    "mode": "time",

    "name": null,

    "show": true,

    "values": []

  },

  "yaxes": [

    {

      "format": "short",

      "label": null,

      "logBase": 1,

      "max": null,

      "min": "0",

      "show": true

    },

    {

      "format": "short",

      "label": null,

      "logBase": 1,

      "max": null,

      "min": null,

      "show": false

    }

  ],

  "yaxis": {

    "align": false,

    "alignLevel": null

  }

} 

Comments

  1. I believe this feature is available in PMM 2.0 which is still in beta release. Can you please confirm which version of PMM have been used for the above ?

    ReplyDelete
  2. Does this support group replication ? like innodb cluster ?

    ReplyDelete

Post a Comment

Popular posts from this blog

Shell/Bash Commands Execution from MySQL Client or Stored Procedure/function

How to remove/deregister an instance from PMM?