Monday, September 26, 2016

MySQL shutdown, dirty pages, and innodb_page_cleaners in 5.7

In a busy MySQL environment, shutdown is often delayed for a long time by the InnoDB page cleaner.  On some very busy systems this delay can run to hours, and you will get tired of watching this message in your error log:
                   [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool

Prior to 5.7, all you could do was watch the dirty pages and see if you could force the number down by telling the server to flush pages thusly:
                   set global innodb_max_dirty_pages_pct = 0;

You can then watch the dirty page count drop until it reaches a low point (likely never anywhere near zero on a busy system), at which point you can shut the server down.  Monitor the dirty pages thusly:
                     mysqladmin ext -i10 |grep pages_dirty

But in 5.7 you can now have multiple page cleaner threads (the above trick still works) to help get the server into a state for a quick shutdown, by setting
                     innodb_page_cleaners=n
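
As a hedged example (the numbers are purely illustrative): in 5.7 innodb_page_cleaners is a startup option rather than a dynamic one, and it is capped at innodb_buffer_pool_instances, so the my.cnf entries would look something like:
                     [mysqld]
                     innodb_buffer_pool_instances = 8
                     innodb_page_cleaners = 8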

Like all good things there is no free lunch: if you kill the server rather than waiting for the page cleaner to do its job, you will pay for it at startup with a longer crash recovery.

Enjoy

Sunday, September 25, 2016

Huge databases and Percona XtraDB Cluster

In my latest endeavor I have been working with Percona's XtraDB Cluster product.  All things considered, I like it better than native Oracle MySQL clustering.  It's easy to set up and maintain, and I've found it very resilient to the noisy failures we are all accustomed to.

But one of our challenges has been dealing with slow disks (the dreaded SAN), mediocre networks, and a huge database (~4 TB).  Moving this database around, taking backups, building the cluster, and recovering the cluster have all presented issues.  Here are some observations which might help you, but remember: YMMV.

Avoid SST at all costs
If you lose a node in the cluster and you run out of time to do an incremental rebuild (IST), the node will be completely rebuilt with a full state snapshot transfer.  (As I found out in one unfortunate incident, messing up your my.cnf file can also trigger an SST.)  If your databases are in the gigabyte range, that's probably no big deal.  But in my current world it takes about 2 days to build a 4 TB node (don't ask).  Basically we are doing anything we can to avoid an SST.  Here are two things to do.  One, make your gcache as big as you can afford so a rejoining node can catch up incrementally.  Two, be very, very careful when you stop the node: check your grastate.dat file for a valid, non-negative seqno and save a copy somewhere in case you make a mistake (so you can restore it).
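
For reference, here is a hedged sketch of the two pieces mentioned above (the gcache size and the datadir path are assumptions, not our actual values):
     # my.cnf: a larger gcache gives a downed node more time to rejoin via IST
     wsrep_provider_options = "gcache.size=10G"
     # before and after stopping a node, sanity-check the saved state
     cat /var/lib/mysql/grastate.dat      # a seqno of -1 generally means an SST is coming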

XtraBackup is your savior
If you have to do an SST, then your only real option for a huge db is XtraBackup.  We use the xtrabackup-v2 SST method in the my.cnf file.  At least it does not block the donor node for the whole transfer.  I've also been using XtraBackup directly to build fresh nodes or reconstruct a broken node; rebuilding a node manually this way can be faster than letting SST do it for you.
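
A minimal my.cnf sketch for that (the SST user and password are placeholders you would create and grant yourself):
     [mysqld]
     wsrep_sst_method = xtrabackup-v2
     wsrep_sst_auth   = "sstuser:sstpassword"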

We also tried using mydumper/myloader for building clusters, with the big advantage that a logical reload shrinks our InnoDB data files, but it's just too slow for this database; we're stuck with XtraBackup for the sheer speed.

Flow control will kill you
We are finding in our three-node world that flow control caused by heavy updates on one node can kill you.  Obviously you can work with your developers to break up those 1M-row updates.  (Changing the wsrep options to limit rows or data per transaction does not seem to work well in our environment.)  But sometimes you have nasty DDL to run.  pt-online-schema-change works well (would you believe we ran a pt-osc that took 4 days and 18 hours?  Don't ask).  Be sure you get the latest version of pt-osc and turn on --max-flow-control so you don't slam your other nodes; for the above-mentioned 4+ day pt-osc we used --max-flow-control=0 so that other DML was not affected by our massive DDL.  Thank goodness for pt-osc!
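
For what it's worth, a hedged sketch of the kind of invocation we mean (the database, table, and ALTER are placeholders; check that your pt-osc version supports --max-flow-control):
     pt-online-schema-change \
       --alter "ADD COLUMN new_col INT" \
       --max-flow-control=0 \
       --execute h=localhost,D=mydb,t=mytable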

Asynchronous replication is not recommended
We have three systems right now as we move to PXC: two three-node clusters and an existing master-slave environment.  To try to keep these all in sync (very similar to herding cats) we run asynchronous replication in various configurations.  We've tried master -> cluster 1 -> cluster 2 as well as master -> cluster 1 and master -> cluster 2.  Both have their challenges.  The key is to make sure only one of them is read/write and the others have read_only=1 set.  For auto-increment columns you need to disable PXC's automatic control and manage the settings manually, as sketched below.  I do not recommend this approach, and once we finally retire the old master/slave system I am going to try to combine these two clusters into one across the WAN.
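
A hedged sketch of those manual settings (the increment of 3 assumes three potential writers; the offset must differ on each server):
     # my.cnf on each server in the replication topology
     wsrep_auto_increment_control = OFF   # on the PXC nodes only
     auto_increment_increment     = 3
     auto_increment_offset        = 1     # 1, 2, or 3 depending on the server
     read_only                    = 1     # everywhere except the single writer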

I'll update this as I get more experience but thought I would share some preliminary ideas.


the three DBAs walk into a bar joke....

You've all heard it.

Three DBAs walk into a bar.  They leave because they can't find a table.

Ha ha.  Very good, but what about all the excitement today around NoSQL?  Sure, no table is fine, but what exactly are we using to store our data?

My experience so far with NoSQL components is limited, but from what I see they do not offer the mathematical rigor of a SQL database as envisioned by Date and Codd.  But I do see that for quick key/value lookups they are great.

Friday, March 4, 2016

setting up MySQL master-master replication; the complete checklist

It seems like almost all new MySQL DBAs stumble a bit trying to set up master-master replication.  Here's my (hopefully) foolproof checklist to make it work on the first try.

Given two MySQL instances, which we'll call host1.mydomain.com and host2.mydomain.com, we presume that they are using the standard port 3306 to communicate, that you have a MySQL superuser we'll call root with password rootpw (the name and password could of course be different), and that you can ssh to the hosts.  (Sorry, but Windows is not my forte; however, the commands below should be very similar.)

Commands are in bold.

Begin by checking access on host1.mydomain.com and access from host1.mydomain.com to host2.mydomain.com

  1. ssh host1.mydomain.com
  2. telnet host2.mydomain.com 3306      You should get something like "Connected to host2.mydomain.com".  Press ctrl+] and then type quit to exit.  If it hangs, get someone on your network team to help you figure out why you cannot reach the other host.  In my experience this is 75% of the issues setting up replication. 
  3. which mysql (is the mysql client binary in your path?)  If not, fix your path, or specify the fully qualified location in any mysql command, something like /usr/bin/mysql
  4. Log in to mysql thusly:  mysql -h host1.mydomain.com -u root -p     You will be prompted for the password.
  5. Record your server id:  show global variables like 'server_id';
  6. Make sure you have binary logging with  show binary logs;  If it says you are not using binary logging, add log-bin to your /etc/my.cnf file and bounce mysql. 
  7. Exit from this instance:   exit;
  8. Try to reach host2.mydomain.com from host1.mydomain.com using the mysql client:   mysql -h host2.mydomain.com -u root -p   You will be prompted for the password.  If this fails, figure out why you cannot log in.  Is it a user or password issue?  Are you sure the ports are open between the two hosts?  Is the root user not allowed to log in remotely?  In that case you may need another superuser (check with  select user, host from mysql.user where user='root';  and make sure there is an entry whose host field is a wildcard or names the connecting host).


Now check access on host2.mydomain.com and access to host1.mydomain.com from host2.mydomain.com

  1. ssh host2.mydomain.com
  2. telnet host1.mydomain.com 3306      You should get something like "Connected to host1.mydomain.com".  Press ctrl+] and then type quit to exit.  If it hangs, get someone on your network team to help you figure out why you cannot reach the other host.  In my experience this is 75% of the issues setting up replication. 
  3. which mysql (is the mysql client binary in your path?)  If not, fix your path, or specify the fully qualified location in any mysql command, something like /usr/bin/mysql
  4. Log in to mysql thusly:  mysql -h host2.mydomain.com -u root -p     You will be prompted for the password.
  5. Record your server id:  show global variables like 'server_id';
  6. Make sure you have binary logging with  show binary logs;  If it says you are not using binary logging, add log-bin to your /etc/my.cnf file and bounce mysql.
  7. Exit from this instance:   exit;
  8. Try to reach host1.mydomain.com from host2.mydomain.com using the mysql client:   mysql -h host1.mydomain.com -u root -p   You will be prompted for the password.  If this fails, figure out why you cannot log in.  Is it a user or password issue?  Are you sure the ports are open between the two hosts?  Is the root user not allowed to log in remotely?  In that case you may need another superuser (check with  select user, host from mysql.user where user='root';  and make sure there is an entry whose host field is a wildcard or names the connecting host).
At this point we have proved that host1.mydomain.com and host2.mydomain.com can communicate with each other in both directions.  And we have the server ids, which must be different between the two hosts.  If the server ids are undefined or the same on the two hosts, you must update your /etc/my.cnf configuration file to set a valid number and bounce the mysql instance.  The entry in the /etc/my.cnf file should look something like:
     server_id=900
and the numbers themselves are not important as long as every host participating in a replication setup has a unique number. 
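
Putting the server id and binary logging requirements together, a minimal /etc/my.cnf sketch (the id and log base name are just examples) would be:
     [mysqld]
     server_id = 900
     log-bin   = mysql-bin
Then bounce the instance as noted above.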

Create a replication user on each mysql instance. We use the same user and password on both instances to keep this simple.  Do this on both host1 and host2.
     grant replication slave on *.* to 'repl'@'%' identified by 'password';
     grant replication client on *.* to 'repl'@'%' identified by 'password'; 
I'm using the user repl; you should pick a strong password.  (Purists will argue about allowing access to all schemas from all hosts; you can certainly restrict this, but for simplicity I suggest you get this working first and then tighten the grants.)

As a final check, try logging into each server from the other server using your new repl user and password.  You should be able to connect. 
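
For example, from host1.mydomain.com something like the following (and the mirror-image command from host2.mydomain.com) should connect and return a row:
     mysql -h host2.mydomain.com -u repl -p -e "select 1;"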

Now let's define the first master and slave.  On host1.mydomain.com we find our master log file name and position by doing show master status;   We use that file name and position in the commands below. 

On host2 we define host1.mydomain.com as our master by doing these three commands:
     change master to master_host='host1.mydomain.com',master_user='repl',master_password='password';
     change master to master_log_file='log file name from show master status';
     change master to master_log_pos=number_from_the_show_master_status;
Note the log_pos is not quoted since it's a number. 
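
If you prefer, the same settings can go in a single statement; the file name and position below are placeholders for the actual values from your show master status output:
     change master to
         master_host='host1.mydomain.com',
         master_user='repl',
         master_password='password',
         master_log_file='mysql-bin.000001',
         master_log_pos=154;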

Now start the slave by doing start slave;  Check your values by doing show slave status\G which will show you a vertical listing of all the settings and the current status of the slave.

If all your settings are correct look for 
     Slave_IO_Running: Yes
     Slave_SQL_Running: Yes
which says the replication threads are now running on host2.mydomain.com and receiving data from host1.mydomain.com.  If they are not both Yes, check your mysql error log, which will likely pinpoint the problem.  Most of the time the errors are due to host1.mydomain.com and host2.mydomain.com not being able to talk to one another, binary logging not running, or server ids not being set.  
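
A quick way to keep an eye on just those two lines from the shell (a sketch; run it against the slave side) is something like:
     mysql -h host2.mydomain.com -u root -p -e "show slave status\G" | grep Running: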

Now repeat this process in the other direction.   You will do the show master status command on host2.mydomain.com and you will do the change master and start slave commands on host1.mydomain.com, pointing at host2.mydomain.com as the master.