Friday, March 29, 2019

percona SST fails with broken pipe or no route to host

I ran into a very strange set of issues when trying to build a PXC cluster node using SST.  The SST kept failing either due to "broken pipe" on both the donor and joiner or "no route to host" which was seen frequently on the joiner and less so on the donor.

Lots of head scratching and work with the network team finally figured this one out.  Percona support also played an important role.  We first started digging into the normal port issue.  PXC needs 3306, 4444, 4567, and 4568 open.  For SST the critical port is 4567.  We were on hosts that were located in two data centers and all hosts were in the DMZ.  Because of this, the networking was more complex than normal with two sets of firewalls and NATting going on as well.  I am no network person but thankfully I had good support.

Our first dig into the problem showed we had ports closed and it was unclear on which end it was breaking.  We were able to use tcp dump to finally convince ourselves that traffic could flow across the network on the critical ports.  We did this by running

tcpdump -n -o eth0  > output_file.

We ran this on the joiner with the ip address of the donor and we ran it on the donor with the address of the joiner.  This gives you a view of traffic flow and can confirm that you are making the connection.

After much work by the network team to open firewalls we finally were able to see the traffic flow between the two hosts.  During our investigation it was decided that we wanted to pin down the joiner and donor so that we would always know what hosts to investigate.  Here comes our first major error.  We added the
wsrep_sst_donor= 
on our joiner.  We used the ip address of the donor!  Don't do this.  It turns out that Galera has specifically recommended against using the ip address and either wants the wsrep_node_name or the hostname of the donor!  Once we fixed this we saw success of our SST.  But our woes were not over.

We then turned our attention to another node.  It too failed SST and we started head scratching.  Yes, the ports were open.  Yes, we had the hostname in the donor attribute.  After digging in Mr. Google we found a post that mentioned a failed SST due to a version difference.  Viola!  Our puppet manifest had accidentally put the wrong version of MySQL on one of the hosts and that was the reason for the SST failure.  Once the node was upgraded correctly the SST worked.

Hopefully this will help someone someday.

You can read the full Galera post on why not to use ip address as the donor here:
http://galeracluster.com/documentation-webpages/mysqlwsrepoptions.html#wsrep-sst-donor3



Thursday, January 10, 2019

create table duplicate key! What?

I ran into an interesting and informative edge case involving asynchronous replication in MySQL and Percona's pt-online-schema_change utility.

In our 5.7 MySQL environment we were adding columns to an 18M 500G table.  Unfortunately this not only copies the table to complete the alter it also blocks the world from writing.  So we had to come up with another way to manage this change.  We have used Percona's pt-osc utility before and have great praise for it.  Our setup involved a PXC cluster and a single standalone slave.

pt-osc creates a new table _table_new with foreign keys renamed to _old_name, e.g., an underscore is added to make the fk in the new table unique.  But, we had previously run a pt-osc on this table and had swapped the old/new table successfully so we did not check that those fk names had been changed.  When we ran a new pt-osc it replicated through to the slave and failed with duplicate key on the create table ddl.

The fix was simple; drop and recreate the foreign keys without the leading _ (use set foreign_key_checks=NO to prevent scanning on the fk rebuild).  Worked like a charm and we resumed the slave.

I confess that it was not simple to figure this out and thankfully Percona was around to help me.


Sunday, January 6, 2019

with percona xtradb, mysqldump can create inconsistent ddl missing auto increment for example!

When running Percona's XtraDB cluster and using pxc_strict_mode=ENFORCNG (the default) lock tables will generate an error.

mysqldump defaults to --opt which includes --lock-tables wherin the error gets thrown.  Of course you can temporarily change pxc_strict_mode to DISABLED or PERMISSIVE to get around the error however there can be side affects of this in a live system.

A better approach is to use mysqldump without --opt by specifying --no-opt however this can generate incorrect DDL because of the embedded option (in --opt) of --create-options.  The safe method if you want to use --no-opt is to specify the options individually and omit the --lock-tables implied by --opt.

As shown by help 
--opt               Same as --add-drop-table, --add-locks, --create-options,
                      --quick, --extended-insert, --lock-tables, --set-charset,
                      and --disable-keys. Enabled by default, disable with

                      --skip-opt.

So omit --lock-tables and include the others and your ddl will be consistent.

Color me unhappy when I copied a schema from one pxc cluster to another and missed this so that my copied schema omitted auto increment and automatic update of timestamp.  

Have fun.

Monday, March 20, 2017

when is free software not free

We all have the luxury of using various open source and free software products.  I'm a huge user of PostgreSQL, MySQL, various os/x utilities, etc.  I recently started a gig at a government site and have found they have a strange, but understandable, relation to open source software.

I was trying to get them to use a Wiki type product instead of  Microsoft's SharePoint.  At the same time we were considering moving some databases to Cassandra.  In both cases, the requests went through the rather lengthy approval process only to come back rejected.

Why?  Because in the government you cannot use or install any open source product unless you have engaged a vendor to support you.  Supposedly it's the law.   Go figure.

Of course this almost defeats the purpose of open source software, but then if you look at it from the point-of-view of the vendors who make all their money from the government it makes perfect sense.  They have pressured someone to get this enacted to give them a steady stream of income.  Forget that it may make no sense, it is the rule and so without a contract, without a billing vehicle, and without a vendor there's tons of software we cannot use.  It really ties the hands of the government folks who want to do the right thing but are kept from doing so.

I love this job.

Wednesday, March 8, 2017

systemctrl seems to ignore my.cnf no matter where it lives. selinux is in the way!

I recently faced a daunting problem.  The systems group turned over a new host to me, mysql was installed but not running, my configuration file was present, and all seemed perfect.

I double checked /etc/my.cnf and tweaked a couple of settings and went to initialize the db and get it running.     mysqld --initialize-insecure
worked perfect, my new ibdata and ib_logfile files got created, the log was clean, and I thought whoopee, let's just start this up.
    systemctl start mysqld 
was my next command.

Whoa!.  What's going on here.  All the log messages are going to /var/log/messages and mysql is not starting but is cycling up and down up and down.  Pretty soon /var/log/messages was full and my bad hair day began.

Well there must be a renegade my.cnf file somewhere.  Checking mysqld to see where it thought the files might live I ran
    mysqld --help --verbose 
and found the lines
    Default options are read from the following files in the given order:
    /etc/my.cnf /etc/mysql/my.cnf /usr/etc/my.cnf ~/.my.cnf
and so dutifully went off to see if one of those other files existed and was overwriting my options.  

No such luck.  The sound you hear is my hair being pulled out.  After 72 hours, reinstalling mysql, running initialize multiple time, making my.cnf small with just a few options I was no closer to a solution.  

But then (clue music) I remembered having something like this a long time ago.  selinux.  Checking 
/etc/selinux/config I found
    # This file controls the state of SELinux on the system.
    # SELINUX= can take one of these three values:
    #     enforcing - SELinux security policy is enforced.
    #     permissive - SELinux prints warnings instead of enforcing.
    #     disabled - No SELinux policy is loaded.
    SELINUX=disabled
    # SELINUXTYPE= can take one of three two values:
    #     targeted - Targeted processes are protected,
    #     minimum - Modification of targeted policy. Only selected processes are     protected.
    #     mls - Multi Level Security protection.
    SELINUXTYPE=targeted
which is exactly what I would expect EXCEPT that the time stamp was more recent than the latest uptime.   Sure enough, the os team had fixed selinux like I asked, but they forgot to reboot the host.  selinux was getting in my way and until it was resolved mysql would never start.

Now, after a reboot, all was well,  mysql reads my my.cnf and starts up just fine.  

Remember, after disabling selinux you must reboot.  I surely won't forget to check this again. 

Thursday, January 5, 2017

mange the size of slow query logs

The mysql slow query log is great.  But if you don't use if often, or as is likely, never look at it until there is an issue, it can grow unwieldy.  Fortunately Percona's mysql clone has options to let you have n copies of the slow query log, each of a maximum size, so that it rotates the logs and keeps the size(s) under control.

You probably already have something like this in your my.cnf file
    slow_query_log=1
    slow_query_log_file=/somewhere/slow_query.log

So now add this to get additional copies and rotate amongst them.
    max_slowlog_size=500M
    max_slowlog_files=10

Thank you Percona for yet another great feature.

Monday, September 26, 2016

Mysql shutdown, dirty pages, and innodb_page_cleaners in 5.7

In a busy Mysql environment, often the shutdown is delayed for a long time by the innodb page cleaner.  On some very busy systems this delay can be hours and you will get tired of watching this message in your error log:
                   [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool

Prior to 5.7 all you could do was watch the dirty pages and see if you could force the number down by setting the sever to flush pages thusly:
                   set global innodb_max_dirty_pages_pct = 0;

You can then watch the dirty page count drop until it reaches a low point (likely never near zero in a busy system) where you can shut it down by monitoring the dirty pages thusly:
                     mysqladmin ext -i10 |grep pages_dirty

But in 5.7 you can now have multiple page cleaners (the above trick still works) to help get the server in a state for a quick shutdown by setting
                     innodb_page_cleaners=n

Like all good things there is no free lunch, so if you kill the server rather than waiting for the page cleaner to do it's job, you will have to pay on the startup.

Enjoy