I ran into an interesting and informative edge case involving asynchronous replication in MySQL and Percona's pt-online-schema_change utility.
In our 5.7 MySQL environment we were adding columns to an 18M 500G table. Unfortunately this not only copies the table to complete the alter it also blocks the world from writing. So we had to come up with another way to manage this change. We have used Percona's pt-osc utility before and have great praise for it. Our setup involved a PXC cluster and a single standalone slave.
pt-osc creates a new table _table_new with foreign keys renamed to _old_name, e.g., an underscore is added to make the fk in the new table unique. But, we had previously run a pt-osc on this table and had swapped the old/new table successfully so we did not check that those fk names had been changed. When we ran a new pt-osc it replicated through to the slave and failed with duplicate key on the create table ddl.
The fix was simple; drop and recreate the foreign keys without the leading _ (use set foreign_key_checks=NO to prevent scanning on the fk rebuild). Worked like a charm and we resumed the slave.
I confess that it was not simple to figure this out and thankfully Percona was around to help me.
Thursday, January 10, 2019
Sunday, January 6, 2019
with percona xtradb, mysqldump can create inconsistent ddl missing auto increment for example!
When running Percona's XtraDB cluster and using pxc_strict_mode=ENFORCNG (the default) lock tables will generate an error.
mysqldump defaults to --opt which includes --lock-tables wherin the error gets thrown. Of course you can temporarily change pxc_strict_mode to DISABLED or PERMISSIVE to get around the error however there can be side affects of this in a live system.
A better approach is to use mysqldump without --opt by specifying --no-opt however this can generate incorrect DDL because of the embedded option (in --opt) of --create-options. The safe method if you want to use --no-opt is to specify the options individually and omit the --lock-tables implied by --opt.
As shown by help
mysqldump defaults to --opt which includes --lock-tables wherin the error gets thrown. Of course you can temporarily change pxc_strict_mode to DISABLED or PERMISSIVE to get around the error however there can be side affects of this in a live system.
A better approach is to use mysqldump without --opt by specifying --no-opt however this can generate incorrect DDL because of the embedded option (in --opt) of --create-options. The safe method if you want to use --no-opt is to specify the options individually and omit the --lock-tables implied by --opt.
As shown by help
--opt Same as --add-drop-table, --add-locks, --create-options,
--quick, --extended-insert, --lock-tables, --set-charset,
and --disable-keys. Enabled by default, disable with
--skip-opt.
So omit --lock-tables and include the others and your ddl will be consistent.
Color me unhappy when I copied a schema from one pxc cluster to another and missed this so that my copied schema omitted auto increment and automatic update of timestamp.
Have fun.
Monday, March 20, 2017
when is free software not free
We all have the luxury of using various open source and free software products. I'm a huge user of PostgreSQL, MySQL, various os/x utilities, etc. I recently started a gig at a government site and have found they have a strange, but understandable, relation to open source software.
I was trying to get them to use a Wiki type product instead of Microsoft's SharePoint. At the same time we were considering moving some databases to Cassandra. In both cases, the requests went through the rather lengthy approval process only to come back rejected.
Why? Because in the government you cannot use or install any open source product unless you have engaged a vendor to support you. Supposedly it's the law. Go figure.
Of course this almost defeats the purpose of open source software, but then if you look at it from the point-of-view of the vendors who make all their money from the government it makes perfect sense. They have pressured someone to get this enacted to give them a steady stream of income. Forget that it may make no sense, it is the rule and so without a contract, without a billing vehicle, and without a vendor there's tons of software we cannot use. It really ties the hands of the government folks who want to do the right thing but are kept from doing so.
I love this job.
I was trying to get them to use a Wiki type product instead of Microsoft's SharePoint. At the same time we were considering moving some databases to Cassandra. In both cases, the requests went through the rather lengthy approval process only to come back rejected.
Why? Because in the government you cannot use or install any open source product unless you have engaged a vendor to support you. Supposedly it's the law. Go figure.
Of course this almost defeats the purpose of open source software, but then if you look at it from the point-of-view of the vendors who make all their money from the government it makes perfect sense. They have pressured someone to get this enacted to give them a steady stream of income. Forget that it may make no sense, it is the rule and so without a contract, without a billing vehicle, and without a vendor there's tons of software we cannot use. It really ties the hands of the government folks who want to do the right thing but are kept from doing so.
I love this job.
Wednesday, March 8, 2017
systemctrl seems to ignore my.cnf no matter where it lives. selinux is in the way!
I recently faced a daunting problem. The systems group turned over a new host to me, mysql was installed but not running, my configuration file was present, and all seemed perfect.
I double checked /etc/my.cnf and tweaked a couple of settings and went to initialize the db and get it running. mysqld --initialize-insecure
worked perfect, my new ibdata and ib_logfile files got created, the log was clean, and I thought whoopee, let's just start this up.
systemctl start mysqld
was my next command.
Whoa!. What's going on here. All the log messages are going to /var/log/messages and mysql is not starting but is cycling up and down up and down. Pretty soon /var/log/messages was full and my bad hair day began.
Well there must be a renegade my.cnf file somewhere. Checking mysqld to see where it thought the files might live I ran
mysqld --help --verbose
and found the lines
Default options are read from the following files in the given order:
/etc/my.cnf /etc/mysql/my.cnf /usr/etc/my.cnf ~/.my.cnf
I double checked /etc/my.cnf and tweaked a couple of settings and went to initialize the db and get it running. mysqld --initialize-insecure
worked perfect, my new ibdata and ib_logfile files got created, the log was clean, and I thought whoopee, let's just start this up.
systemctl start mysqld
was my next command.
Whoa!. What's going on here. All the log messages are going to /var/log/messages and mysql is not starting but is cycling up and down up and down. Pretty soon /var/log/messages was full and my bad hair day began.
Well there must be a renegade my.cnf file somewhere. Checking mysqld to see where it thought the files might live I ran
mysqld --help --verbose
and found the lines
Default options are read from the following files in the given order:
/etc/my.cnf /etc/mysql/my.cnf /usr/etc/my.cnf ~/.my.cnf
and so dutifully went off to see if one of those other files existed and was overwriting my options.
No such luck. The sound you hear is my hair being pulled out. After 72 hours, reinstalling mysql, running initialize multiple time, making my.cnf small with just a few options I was no closer to a solution.
But then (clue music) I remembered having something like this a long time ago. selinux. Checking
/etc/selinux/config I found
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
which is exactly what I would expect EXCEPT that the time stamp was more recent than the latest uptime. Sure enough, the os team had fixed selinux like I asked, but they forgot to reboot the host. selinux was getting in my way and until it was resolved mysql would never start.
Now, after a reboot, all was well, mysql reads my my.cnf and starts up just fine.
Remember, after disabling selinux you must reboot. I surely won't forget to check this again.
Thursday, January 5, 2017
mange the size of slow query logs
The mysql slow query log is great. But if you don't use if often, or as is likely, never look at it until there is an issue, it can grow unwieldy. Fortunately Percona's mysql clone has options to let you have n copies of the slow query log, each of a maximum size, so that it rotates the logs and keeps the size(s) under control.
You probably already have something like this in your my.cnf file
slow_query_log=1
slow_query_log_file=/somewhere/slow_query.log
So now add this to get additional copies and rotate amongst them.
max_slowlog_size=500M
max_slowlog_files=10
Thank you Percona for yet another great feature.
You probably already have something like this in your my.cnf file
slow_query_log=1
slow_query_log_file=/somewhere/slow_query.log
So now add this to get additional copies and rotate amongst them.
max_slowlog_size=500M
max_slowlog_files=10
Thank you Percona for yet another great feature.
Monday, September 26, 2016
Mysql shutdown, dirty pages, and innodb_page_cleaners in 5.7
In a busy Mysql environment, often the shutdown is delayed for a long time by the innodb page cleaner. On some very busy systems this delay can be hours and you will get tired of watching this message in your error log:
[Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
Prior to 5.7 all you could do was watch the dirty pages and see if you could force the number down by setting the sever to flush pages thusly:
set global innodb_max_dirty_pages_pct = 0;
You can then watch the dirty page count drop until it reaches a low point (likely never near zero in a busy system) where you can shut it down by monitoring the dirty pages thusly:
mysqladmin ext -i10 |grep pages_dirty
But in 5.7 you can now have multiple page cleaners (the above trick still works) to help get the server in a state for a quick shutdown by setting
Like all good things there is no free lunch, so if you kill the server rather than waiting for the page cleaner to do it's job, you will have to pay on the startup.
Enjoy
[Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
Prior to 5.7 all you could do was watch the dirty pages and see if you could force the number down by setting the sever to flush pages thusly:
set global innodb_max_dirty_pages_pct = 0;
You can then watch the dirty page count drop until it reaches a low point (likely never near zero in a busy system) where you can shut it down by monitoring the dirty pages thusly:
mysqladmin ext -i10 |grep pages_dirty
But in 5.7 you can now have multiple page cleaners (the above trick still works) to help get the server in a state for a quick shutdown by setting
innodb_page_cleaners=nLike all good things there is no free lunch, so if you kill the server rather than waiting for the page cleaner to do it's job, you will have to pay on the startup.
Enjoy
Sunday, September 25, 2016
Huge databases and Percona Xtradb Cluster
In my latest endeavor I have been working with Percona's Xtradb Cluster product. All things considered I like it better than native Oracle MySQL clustering. It's easy to set up and maintain and I've found it very resilient to the noisy failures we all are accustomed to.
But one of our challenges has been dealing with slow disks (the dreaded SAN), mediocre networks, and a huge database (~4T). Moving this database around, getting backups, building the cluster, recovering the cluster have all presented issues. Here are some observations which might help you but remember ymmv.
Avoid SST at all costs
If you lose a node in the cluster and you run out of time to do an incremental rebuild your node will be completely rebuilt. (As I found out on one unfortunate incident if you mess up your my.cnf file this too can trigger an SST.) So if your databases are in the G size, it's probably no big deal. But in my current world it takes about 2 days to build a 4T node (don't ask). Basically we are doing anything we can do to avoid an SST. Here are two things to do. One, make your gcache as big as you can afford so you can do an incremental build. Two, be very very careful when you stop the node. Check your grastate.dat file for some valid non negative value and save it somewhere in case you make a mistake (so you can restore it).
Xtrabackup is your savior
If you have to do an SST then your only real option for a huge db is xtrabackup. We use xtrabackup_v2 in the my.cnf file. At least it does not lock one of the nodes. I've also been using this to build fresh nodes or reconstruct a broken node. To rebuild a node manually you can do this and maybe it's faster than letting SST do it for you.
We also tried using mydumper/myloader for building clusters with the big advantage that it shrinks our innodb data file but it's just too slow for this database; we're stuck with xtrabackup for the sheer speed.
Flow control will kill you
We are finding in our three node world that flow control due to heavy updates on one node can kill you. Obviously you can work with your developers to break up those 1M row updates. (Changing the wsrep options to limit rows or data does not seem to work well in our environment.) But sometimes you have nasty DDL to run. pt-online-schema-change works well (would you believe we ran a pt-osc that took 4 days and 18 hours. Don't ask.) Be sure you get the latest version of pt-osc and turn on --max-flow-control so you don't slam your other nodes; for the above mentioned 4+ day pt-osc we used --max-flow-control=0 so that other dml was not affected by our massive DDL. Thank goodness for pt-osc!
Asynchronous replication is not recommended
We have three systems right now as we move to PXC. We have two three node clusters and an existing master-slave environment. To try and keep these all in sync (very similar to herding cats) we run asynchronous replication in various configurations. We've tried master -> cluster 1 -> cluster 2 as well as master -> cluster 1 and master -> cluster 2. Both have their challenges. But key is to make sure you only have one that has read/write and the others have read_only=1 set. For auto-increment columns you need to disable the automatic feature of PXC and do the whole thing manually. I do not recommend this approach and once we finally get the old master/slave system out I am going to try and combine these two clusters into one across the WAN.
I'll update this as I get more experience but thought I would share some preliminary ideas.
But one of our challenges has been dealing with slow disks (the dreaded SAN), mediocre networks, and a huge database (~4T). Moving this database around, getting backups, building the cluster, recovering the cluster have all presented issues. Here are some observations which might help you but remember ymmv.
Avoid SST at all costs
If you lose a node in the cluster and you run out of time to do an incremental rebuild your node will be completely rebuilt. (As I found out on one unfortunate incident if you mess up your my.cnf file this too can trigger an SST.) So if your databases are in the G size, it's probably no big deal. But in my current world it takes about 2 days to build a 4T node (don't ask). Basically we are doing anything we can do to avoid an SST. Here are two things to do. One, make your gcache as big as you can afford so you can do an incremental build. Two, be very very careful when you stop the node. Check your grastate.dat file for some valid non negative value and save it somewhere in case you make a mistake (so you can restore it).
Xtrabackup is your savior
If you have to do an SST then your only real option for a huge db is xtrabackup. We use xtrabackup_v2 in the my.cnf file. At least it does not lock one of the nodes. I've also been using this to build fresh nodes or reconstruct a broken node. To rebuild a node manually you can do this and maybe it's faster than letting SST do it for you.
We also tried using mydumper/myloader for building clusters with the big advantage that it shrinks our innodb data file but it's just too slow for this database; we're stuck with xtrabackup for the sheer speed.
Flow control will kill you
We are finding in our three node world that flow control due to heavy updates on one node can kill you. Obviously you can work with your developers to break up those 1M row updates. (Changing the wsrep options to limit rows or data does not seem to work well in our environment.) But sometimes you have nasty DDL to run. pt-online-schema-change works well (would you believe we ran a pt-osc that took 4 days and 18 hours. Don't ask.) Be sure you get the latest version of pt-osc and turn on --max-flow-control so you don't slam your other nodes; for the above mentioned 4+ day pt-osc we used --max-flow-control=0 so that other dml was not affected by our massive DDL. Thank goodness for pt-osc!
Asynchronous replication is not recommended
We have three systems right now as we move to PXC. We have two three node clusters and an existing master-slave environment. To try and keep these all in sync (very similar to herding cats) we run asynchronous replication in various configurations. We've tried master -> cluster 1 -> cluster 2 as well as master -> cluster 1 and master -> cluster 2. Both have their challenges. But key is to make sure you only have one that has read/write and the others have read_only=1 set. For auto-increment columns you need to disable the automatic feature of PXC and do the whole thing manually. I do not recommend this approach and once we finally get the old master/slave system out I am going to try and combine these two clusters into one across the WAN.
I'll update this as I get more experience but thought I would share some preliminary ideas.
Subscribe to:
Posts (Atom)