Saturday, April 20, 2019

More tips and a checklist for that elusive Percona XtraDB Cluster (PXC/Galera) SST or IST

SST and IST are the magic sauce of Galera and of Percona's implementation of it in MySQL, but there are dragons lurking that can prevent them from working.  Lots of people seem to run into trouble here, so I created this checklist you can use to try to resolve your SST/IST issues.  At one time or another every one of these items has caused an SST/IST to fail for me.

 In general, when debugging SST/IST issues it is best to use a specific donor for your joiner node so that you can isolate your troubleshooting and testing to a specific set of nodes.

Checklist


  • Are the nodes in the same cluster?  Check wsrep_cluster_name
  • Is the server_id the same for all nodes in the cluster?
  • Is wsrep_node_address correct?  A typo here causes very interesting errors.
  • Is wsrep_node_name correct?  Again, a simple error here causes SST to fail.
  • Is the gcomm:// list in wsrep_cluster_address the same on all nodes?  Triple check this.  Use nslookup on every name to check for correctness.  Do a reverse lookup to make sure each IP address resolves back to the expected FQDN.  
  • Is wsrep_sst_donor an FQDN or node name?  Do not use an IP address in this field.  Refer to the documentation for the ways to code this.  (An example of these settings appears right after this checklist.)
  • Are ports 3306, 4444, 4567, and 4568 open?  This is a biggie so see below for tips on how to test.
  • Are you using xtrabackup_v2 for your wsrep_sst_method?  If not, you should be.  My experiences with rsync, mysqldump, and the deprecated xtrabackup (no _v2) have been poor.  
  • Are xtrabackup versions the same on all nodes?  
  • Are the FQDNs and IP addresses in /etc/hosts correct on all nodes?  Test every name and IP address you see against DNS.
  • Make sure you examine every ERROR in the MySQL error log; sometimes the WARNINGs are also worth a look.  Occasionally an unrelated error is what actually causes the SST/IST to fail.  
  • Is socat working?  (See below on how to use socat to test ports.)
  • Will the MySQL server start at all?  If you have a valid install you can try something simple like mysqld --wsrep-recover to see if MySQL will even start.  If your datadir is invalid, blow it away, reinitialize MySQL with mysqld --initialize-insecure to bootstrap it from your my.cnf, and then try to start it with mysqld --wsrep-recover just to make sure it comes up.  This may uncover other issues with MySQL that are not specific to SST/IST.  It helped me once when an error existed in my.cnf which was not logged to the error log until I ran this standalone test.  
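
Here, for reference, is a minimal sketch of how those wsrep settings might look in my.cnf on one node.  The cluster name, host names, and IP address are made-up placeholders, not values from any real cluster; substitute your own.

    # illustrative wsrep settings for one node -- placeholder values only
    wsrep_cluster_name=my_cluster                 # identical on every node
    wsrep_cluster_address=gcomm://node1.example.com,node2.example.com,node3.example.com
    wsrep_node_name=node3.example.com             # unique to this node
    wsrep_node_address=10.0.0.13                  # this node's own IP
    wsrep_sst_method=xtrabackup_v2
    wsrep_sst_donor=node1.example.com             # node name or FQDN, never an IP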

Port testing

Now about those pesky ports.  The best way to test them is with socat, which is the same utility that SST/IST uses to move data between nodes.  The socat utility establishes a bi-directional tunnel between hosts and should be tested on both hosts.  Substitute your own donor and joiner IP addresses below.  You must test this in both directions to be certain SST/IST will work.

Run this pair of tests for each of ports 4444, 4567, and 4568.  

From the joiner, run this:
    socat TCP4-LISTEN:4444 STDOUT    # this starts a listener on the port

From the donor, run this.  You should see hello on your joiner.
    echo "hello" | socat -t 30 TCP4:<joiner ip>:4444 -    # substitute your joiner's IP address

Now reverse this and do the same with the donor as the listener.  Repeat for all three ports.  If socat works in both directions then you have eliminated ports as an issue, so go back and check the items in the checklist once again.  

For port 3306, use the MySQL client from each host to contact the other host.  Since your joiner is down you may need to start it without joining the cluster in order to ensure that port 3306 is open.  (See the checklist on how to get a bare-bones mysqld setup to run.) 
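
As a rough sketch (the user and host name here are placeholders, not from a real install), the standalone start and the port 3306 check look something like this:

    # on the joiner: prove mysqld itself will start from this my.cnf
    mysqld --initialize-insecure --user=mysql
    mysqld --wsrep-recover --user=mysql

    # from the donor (and again in the other direction): prove port 3306 is reachable
    mysql -h joiner.example.com -P 3306 -u root -p -e "SELECT 1"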




I hope this is useful.

See also this article (somewhat dated and redundant) from Percona.
https://www.percona.com/blog/2014/12/30/diagnosing-sst-errors-with-percona-xtradb-cluster-for-mysql/


Friday, March 29, 2019

percona SST fails with broken pipe or no route to host

I ran into a very strange set of issues when trying to build a PXC cluster node using SST.  The SST kept failing either due to "broken pipe" on both the donor and joiner or "no route to host" which was seen frequently on the joiner and less so on the donor.

Lots of head scratching and work with the network team finally figured this one out.  Percona support also played an important role.  We first started digging into the normal port issue.  PXC needs 3306, 4444, 4567, and 4568 open.  For SST the critical port is 4567.  We were on hosts that were located in two data centers and all hosts were in the DMZ.  Because of this, the networking was more complex than normal with two sets of firewalls and NATting going on as well.  I am no network person but thankfully I had good support.

Our first dig into the problem showed we had ports closed, and it was unclear on which end things were breaking.  We were able to use tcpdump to finally convince ourselves that traffic could flow across the network on the critical ports.  We did this by running

    tcpdump -n -i eth0 host <other node's ip> > output_file

We ran this on the joiner with the IP address of the donor, and on the donor with the address of the joiner.  This gives you a view of the traffic flow and can confirm that you are making the connection.

After much work by the network team to open firewalls we were finally able to see traffic flow between the two hosts.  During our investigation we decided to pin down the joiner and donor so that we would always know which hosts to investigate.  Here comes our first major error.  We added

    wsrep_sst_donor=<donor's ip address>

on our joiner.  We used the IP address of the donor!  Don't do this.  It turns out that Galera specifically recommends against using the IP address and wants either the wsrep_node_name or the hostname of the donor!  Once we fixed this our SST succeeded.  But our woes were not over.
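
What worked for us, roughly, was pointing at the donor by name.  The node name below is a placeholder:

    # my.cnf on the joiner -- illustrative only
    wsrep_sst_donor=node1    # the donor's wsrep_node_name (or hostname), never its IP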

We then turned our attention to another node.  It too failed SST and we started head scratching.  Yes, the ports were open.  Yes, we had the hostname in the donor attribute.  After digging with Mr. Google we found a post that mentioned a failed SST due to a version difference.  Voila!  Our Puppet manifest had accidentally put the wrong version of MySQL on one of the hosts, and that was the reason for the SST failure.  Once the node was upgraded correctly the SST worked.

Hopefully this will help someone someday.

You can read the full Galera post on why not to use an IP address as the donor here:
http://galeracluster.com/documentation-webpages/mysqlwsrepoptions.html#wsrep-sst-donor3



Thursday, January 10, 2019

create table duplicate key! What?

I ran into an interesting and informative edge case involving asynchronous replication in MySQL and Percona's pt-online-schema-change utility.

In our MySQL 5.7 environment we were adding columns to an 18M-row, 500 GB table.  Unfortunately a plain ALTER not only copies the table, it also blocks the world from writing, so we had to come up with another way to manage the change.  We have used Percona's pt-osc utility before and have great praise for it.  Our setup involved a PXC cluster and a single standalone slave.

pt-osc creates a new table, _table_new, with the foreign keys renamed to _old_name, i.e., an underscore is prepended so the FK names in the new table are unique.  But we had previously run pt-osc on this table and swapped the old/new tables successfully, so we did not check that those FK names had already been changed.  When we ran a new pt-osc, it replicated through to the slave and failed with a duplicate key error on the CREATE TABLE DDL.

The fix was simple: drop and recreate the foreign keys without the leading _ (use SET foreign_key_checks=0 to prevent scanning on the FK rebuild).  Worked like a charm, and we resumed the slave.
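
A rough sketch of that fix, using made-up table and constraint names, looks like this:

    SET foreign_key_checks = 0;
    ALTER TABLE child DROP FOREIGN KEY _fk_parent_id;
    ALTER TABLE child ADD CONSTRAINT fk_parent_id
        FOREIGN KEY (parent_id) REFERENCES parent (id);
    SET foreign_key_checks = 1;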

I confess that it was not simple to figure this out and thankfully Percona was around to help me.


Sunday, January 6, 2019

with percona xtradb, mysqldump can create inconsistent ddl missing auto increment for example!

When running Percona XtraDB Cluster with pxc_strict_mode=ENFORCING (the default), LOCK TABLES will generate an error.

mysqldump defaults to --opt, which includes --lock-tables, and that is where the error gets thrown.  Of course you can temporarily change pxc_strict_mode to DISABLED or PERMISSIVE to get around the error, but that can have side effects on a live system.
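
If you do go the temporary route, it is just a pair of statements (sketched here; remember to put it back once the dump is done):

    SET GLOBAL pxc_strict_mode = PERMISSIVE;
    -- run the mysqldump, then restore the default
    SET GLOBAL pxc_strict_mode = ENFORCING;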

A better approach is to run mysqldump without --opt by specifying --skip-opt; however, this by itself can generate incorrect DDL because it also turns off --create-options, which --opt normally includes.  The safe method is to specify the options individually and simply omit the --lock-tables implied by --opt.

As shown by the help output:
    --opt               Same as --add-drop-table, --add-locks, --create-options,
                        --quick, --extended-insert, --lock-tables, --set-charset,
                        and --disable-keys. Enabled by default, disable with
                        --skip-opt.

So include the others, omit --lock-tables, and your DDL will be consistent.
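
For example, something along these lines (the schema name is a placeholder):

    mysqldump --add-drop-table --add-locks --create-options --quick \
              --extended-insert --set-charset --disable-keys --skip-lock-tables \
              mydb > mydb.sql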

Color me unhappy when I copied a schema from one PXC cluster to another, missed this, and ended up with a copied schema missing AUTO_INCREMENT and the automatic update of timestamp columns.  

Have fun.

Monday, March 20, 2017

when is free software not free

We all have the luxury of using various open source and free software products.  I'm a huge user of PostgreSQL, MySQL, various OS X utilities, etc.  I recently started a gig at a government site and found they have a strange, but understandable, relationship with open source software.

I was trying to get them to use a wiki-type product instead of Microsoft's SharePoint.  At the same time we were considering moving some databases to Cassandra.  In both cases, the requests went through the rather lengthy approval process only to come back rejected.

Why?  Because in the government you cannot use or install any open source product unless you have engaged a vendor to support you.  Supposedly it's the law.   Go figure.

Of course this almost defeats the purpose of open source software, but then if you look at it from the point-of-view of the vendors who make all their money from the government it makes perfect sense.  They have pressured someone to get this enacted to give them a steady stream of income.  Forget that it may make no sense, it is the rule and so without a contract, without a billing vehicle, and without a vendor there's tons of software we cannot use.  It really ties the hands of the government folks who want to do the right thing but are kept from doing so.

I love this job.

Wednesday, March 8, 2017

systemctl seems to ignore my.cnf no matter where it lives. selinux is in the way!

I recently faced a daunting problem.  The systems group turned over a new host to me, mysql was installed but not running, my configuration file was present, and all seemed perfect.

I double checked /etc/my.cnf, tweaked a couple of settings, and went to initialize the db and get it running.
    mysqld --initialize-insecure
worked perfectly: my new ibdata and ib_logfile files got created, the log was clean, and I thought whoopee, let's just start this up.
    systemctl start mysqld 
was my next command.

Whoa!  What's going on here?  All the log messages were going to /var/log/messages and mysql was not starting, just cycling up and down, up and down.  Pretty soon /var/log/messages was full and my bad hair day began.
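
When mysqld cycles under systemd like this, a couple of quick checks (sketched here; not something specific to this incident) will usually show why it keeps restarting:

    systemctl status mysqld                        # current state plus the last few log lines
    journalctl -u mysqld --no-pager | tail -50     # recent log output for the unit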

Well, there must be a renegade my.cnf file somewhere.  Checking mysqld to see where it thought the config files might live, I ran
    mysqld --help --verbose 
and found the lines
    Default options are read from the following files in the given order:
    /etc/my.cnf /etc/mysql/my.cnf /usr/etc/my.cnf ~/.my.cnf
and so dutifully went off to see if one of those other files existed and was overriding my options.  

No such luck.  The sound you hear is my hair being pulled out.  After 72 hours of reinstalling mysql, running the initialize multiple times, and shrinking my.cnf down to just a few options, I was no closer to a solution.  

But then (cue music) I remembered hitting something like this a long time ago: selinux.  Checking /etc/selinux/config I found
    # This file controls the state of SELinux on the system.
    # SELINUX= can take one of these three values:
    #     enforcing - SELinux security policy is enforced.
    #     permissive - SELinux prints warnings instead of enforcing.
    #     disabled - No SELinux policy is loaded.
    SELINUX=disabled
    # SELINUXTYPE= can take one of three two values:
    #     targeted - Targeted processes are protected,
    #     minimum - Modification of targeted policy. Only selected processes are protected.
    #     mls - Multi Level Security protection.
    SELINUXTYPE=targeted
which is exactly what I would expect, EXCEPT that the timestamp was more recent than the last boot.  Sure enough, the OS team had fixed selinux like I asked, but they forgot to reboot the host.  selinux was getting in my way, and until that was resolved mysql would never start.
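
The quick way to see whether selinux is really off right now, as opposed to what the config file says it will be after a reboot, is roughly:

    getenforce    # prints Enforcing, Permissive, or Disabled -- the live state
    sestatus      # shows the current mode and the mode from the config file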

Now, after a reboot, all was well: mysql reads my my.cnf and starts up just fine.  

Remember, after disabling selinux you must reboot.  I surely won't forget to check this again. 

Thursday, January 5, 2017

manage the size of slow query logs

The mysql slow query log is great.  But if you don't use it often, or, as is likely, never look at it until there is an issue, it can grow unwieldy.  Fortunately Percona's mysql clone has options that let you keep n copies of the slow query log, each capped at a maximum size, so that it rotates the logs and keeps their total size under control.

You probably already have something like this in your my.cnf file
    slow_query_log=1
    slow_query_log_file=/somewhere/slow_query.log

So now add this to get additional copies and rotate amongst them.
    max_slowlog_size=500M
    max_slowlog_files=10

Thank you Percona for yet another great feature.