Friday, March 4, 2016

setting up MySQL master-master replication; the complete checklist

Seems like almost all new MySQL dbas stumble a bit trying to set up master-master replication.  Here's my (hopefully) fool proof check list to try and make it work on the first try.

Given two mySql instances we'll call them host1.mydomain.com and host2.mydomain.com and we presume they are using the standard port 3306 to communicate, that you have a super mysql user we'll call root with password rootpw (name and password of course could be different), and that you can ssh to the hosts.  (Sorry but Windows is not my forte however the commands below should be very similar.)

Commands are in bold.

begin by checking on access on host1.mydomain.com and access from host1.mydomain.com to host 2.mydomain.com

  1. ssh host1.mydomain.com
  2. telnet host2.mydomain.com 3306      You should get something like "connected".  Then ctrl+v and quit  If it hangs then get someone on your network team to help you figure out why you cannot reach the other host.  In my experience this is 75% of the issues setting up replication. 
  3. which mysql (is the mysql client binary in your path?)  if not fix your path or in any mysql command specify the fully qualified location, something like /usr/bin/mysql
  4. login to mysql thusly  mysql -h host1.mydomain.com -u root -p     you will be prompted for the password
  5. record your server id.  show global variables like 'server_id';
  6. make sure you have binary logging as show binary logs;  If it says you are not using binary logging then add log-bin to your /etc/my.cnf file and bounce mysql. 
  7. exit from this instance   exit;
  8. try to reach host2 from host1.mydomain.com using the mysql client as   mysql -h host2.mydomain.com -u root -p you will be prompted for the password.  If this fails figure out why you cannot login.  Is it a user or password issue?  Are you sure the ports are open between the two hosts?  Is the root user not allowed to do remote logins in which case you may need another super user (select user from mysql.user where user='root'; and there should be a wildcard for host1 and host2 should be in the host name field)


now check on access on host2 and access to host1 from host2

  1. ssh host2.mydomain.com
  2. telnet host1.mydomain.com 3306      You should get something like "connected".  Then ctrl+v and quit  If it hangs then get someone on your network team to help you figure out why you cannot reach the other host.  In my experience this is 75% of the issues setting up replication. 
  3. which mysql (is the mysql client binary in your path?)  if not fix your path or in any mysql command specify the fully qualified location, something like /usr/bin/mysql
  4. login to mysql thusly  mysql -h host2.mydomain.com -u root -p     you will be prompted for the password
  5. record your server id.  show global variables like 'server_id';
  6. make sure you have binary logging as show binary logs;  If it says you are not using binary logging then add log-bin to your /etc/my.cnf file and bounce mysql.
  7. exit from this instance   exit;
  8. try to reach host1.mydomain.com from host2.mydomain.com using the mysql client as   mysql -h host1.mydomain.com -u root -p you will be prompted for the password.  If this fails figure out why you cannot login.  Is it a user or password issue?  Are you sure the ports are open between the two hosts?  Is the root user not allowed to do remote logins in which case you may need another super user (select user from mysql.user where user='root'; and there should be a wildcard for host1 and host2 should be in the host name field)
At this point we have proved that host1.mydomain.com and host2.mydomain.com can communicate to and from each other.  And we have the server ids, which must be different between the two hosts.  If the server ids are undefined or the same for the two hosts you must update your /etc/my.cnf configuration file to set a valid number and bounce the mysql instance.  The entry in the /etc/my.cnf file should look something like:
     server_id=900
and the numbers are not really important as long as all hosts participating in a replication setup have a unique number. 

Create a replication user on each mysql instance. We use the same user and password on both instances to keep this simple.  Do this on both host1 and host2.
     grant replication slave on *.* to 'repl'@'%' identified by 'password';
     grant replication client on *.* to 'repl'@'%' identified by 'password'; 
I'm using the user repl and you should pick a nice password.  (Purists will argue about allowing access to all schemas from all hosts; you can certainly restrict this but for simplicity I suggest you get this working first then make granularity adjustments.)

As a final check, try logging into each server from the other server using your new repl user and password.  You should be able to connect. 

Now let's define the first master and slave.  On host1.mydomain.com we find our master log position and log file name by doing show master status;   We use the file name and file position in the command below. 

On host2 we define host1.mydomain.com as our master by doing these three commands:
     change master to master_host='host1.mydomain.com',master_user='repl',master_password='password';
     change master to master_log_file='log file name from show master status';
     change master to master_log_pos=number_from_the_show_master_status;
Note the log_pos is not quoted since it's a number. 

Now start the slave by doing start slave;  Check your values by doing show master status\G which will show you a tabular list of all the settings and the status of the slave.

If all your settings are correct look for 
     Slave_IO_Running: Yes
     Slave_SQL_Running: Yes
which says the threads for replication are now running on your host1.mydomain.com and receiving data from host2.mydomain.com. If they are not Yes then check your mysql error log which will likely pinpoint the problem. Most of the time the errors are due to host1.mydomain.com and host2.mydomain.com not being able to talk to one another or binary logging not running or server ids not being set.  

Now repeat this process on host2.mydomain.com.   You will do the change master commands on  host2.mydomain.com  and you will do the show master status command on  host1.mydomain.com      













Sunday, August 25, 2013

mysql and golden gate to oracle

While installing Golden Gate for mysql replication I ran into two issues which, while documented, were buried in the books so I didn't see them. Maybe this will help someone.

1. be sure your socket file is in /tmp or you have a sym link from tmp.  Golden gate only looks here so if no socket, no connection.  Now that's just plain wrong.

2. Golden gate needs row based replication not function based.  That means you're stuck with some post 5.1.x release which maybe won't be an issue but forced us to do an in-place upgrade.  Don't ask.

Thursday, May 16, 2013

oracle dataguard reports ORA-01017: invalid username/password; logon denied

Wow, there are a ton of these on the web, so why post one more.  Because we  hit an issue which I did not find and which hopefully will help someone in the future.

In setting up dataguard on our 11g system we got the following errors after

connect /
show configuration

Configuration Status:
ORA-01017: invalid username/password; logon deniedORA-16625: cannot reach database ""DGM-17017: unable to determine configuration status


OK, this seems simple enough.  Our password file must be the problem.  After searching, regenerating the password file, and pulling out our hair we found this in our oracle parameters:


redo_transport_user    string    SYSTEM

which means, that dataguard will use SYSTEM as the connection to the database and not SYS.  Now, that's interesting but it gets more interesting.

If you generate the password file using orapwd it is likely you won't find SYSTEM in the file and that is required for dataguard to work.  In other words, merely creating the password file using orapwd and copying it to your standby site does not guarantee that it will work.

Why?  This is because with 11g a system grant such as

grant sysoper to SYSTEM;

gets written to the password file but if you come along and generate the file using orapwd you will lose this entry.

So, make certain your password file contains the correct user with sysoper privileges for dataguard to work.  You can use this query together with the redo_transport_user parameter to verify all is well.

select * from v$pwfile_users;

If the user in the redo_transport_user  is not in this list and does not have sysoper you won't be able to connect and you will get the ORA-01017 error.

Good luck.

Thursday, July 26, 2012

How to create a histogram from count data using sql

I am often asked to generate summary counts from various tables and frequently asked to "bucketize" the results for the user.  Once you do it the technique is pretty easy, but getting over that first hump can be challenging.  So here's a tutorial from a recent request (this is mysql but the sql is pretty standard).  (If you only have a few thousand rows of data it might be easier just to use the histogram function of Excel but if you have more than some maximum number of rows, this technique will always work.)

1. First get your counts, something like
         select some_key,count(*) from table1 group by some_key;

2. and stick these values into an intermediate table (yes, not required but easier to explain)
         create table kount as select some_key,count(*) as value from table1 group by some_key;

3. now decide what kinds of "buckets" you want for your histogram.  Be sure that you cover your min and max values from the previous count query.  So in my case I'll make buckets that represent orders of magnitude.  You create a table with these buckets and then populate a min and max value for each bucket thusly.
       create table bins (min_value int, max_value int);
       insert bins values(0,10),(10,100),(100,1000),(1000,10000),(10000,100000);

Our resultant buckets look like this:

+-----------+-----------+
| min_value | max_value |
+-----------+-----------+
|         0 |        10 |
|        10 |       100 |
|       100 |      1000 |
|      1000 |     10000 |
|     10000 |    100000 |
+-----------+-----------+

4. Now we do a join and get the counts for each bucket.  
      select b.min_value,b.max_value,count(*) from bins b 
          left outer join kounts a on a.value between b.min_value and b.max_value
          group by b.min_value 
          order by b.min_value

And our histogram values look like the following and can be put into your favorite charting product and made into a pretty graph.

+-----------+-----------+----------+
| min_value | max_value | count(*) |
+-----------+-----------+----------+
|         0 |        10 |  3189118 |
|        10 |       100 |   239142 |
|       100 |      1000 |     9004 |
|      1000 |     10000 |      208 |
|     10000 |    100000 |        4 |
+-----------+-----------+----------+

I hope this is useful to some new sql user.   Enjoy.  


Tuesday, September 29, 2009

operations sanity for your databases

Permit me to put on an operations hat for a moment and provide budding developers with some thoughts on how to keep out of trouble in your database environment. Here are some rules that I've found to be useful:

1. track database changes (versions etc.) in the database.

2. make all database changes backward compatible.

3. avoid triggers.

4. use referential integrity sparingly.

Now I'm sure some eyebrows will be raised by the last two recommendations. After all, how can this possibly be related to the operation of your production database. Let's look at these one at a time.

1. track database changes. I have found that trying to figure out what version, release, etc. is in my database is usually hard. You normally can't just look at a table and say, "oh, yea, that's release 2.01". An easy way to fix this is to incorporate a small table in each database (or schema) that tracks the changes. Here's the table (using a mysql format) I've used with some success.

create table db_history (

id int not null auto_increment,

version varchar(20) not null,

backed_out datetime null,

created datetime not null,

comment varchar(50) null,

primary key (id))

When I install a new release I automatically do an insert into this table to note the release installation. In the event I back it out, then I update the backed_out column. Quick and easy and any time I want to see what's been installed I simply do a select * from db_history. I can even use this table to document non-release type changes that might be made in the database simply by adding a comment.

For those of you using ruby-on-rails, they automatically implement a version table but the data contained there is pretty minimal. Of course you can backtrack from the version information to the sql but even in this environment I've found my table to be of more help.

2. make all database changes backward compatible. If I deploy a new version of my favorite web application, and later decide it wasn't quite up to snuff, then backing it out is usually as easy as moving a symlink or downloading a new tar ball. Unfortunately, this may be impossible or impractical in the case of a database change. For example, changing a column type or dropping a column can easily make your application fail. If the change requires an hour to apply to your large table, then it can easily take that long or longer to back it out. The rules to avoid this issue are pretty simple:

a. new columns and new tables should never break your code.

b. changed columns should be added and referenced by the new name but retain the old column for backward compatability and remove it in the next release.

c. dropped columns should only be removed after the second release which no longer uses them.

d. if a procedure or function has new parameters, in or out, then create a new procedure or function (e.g., foo_v2) and keep the old procedure or function around.

3. avoid triggers. Usually the functionality of a trigger can be duplicated in a procedure. Then when there is a problem, the code is right in front of the dba trying to debug the issue. This is a religious discussion and many people will disagree but I have found at 2:00 AM when things are broke that I can easily overlook a trigger.

4. use referential integrity sparingly. Again, this is a religious argument and some will disagree. But in my experience I have found many cases where an unanticipated table scan due to some RI goes un-noticed during development but haunts you in production. Sometimes RI cannot be avoided but by using it sparingly you can save yourself lots of late night head-scratching wondering why you are suddenly scanning our favorite 1M row table. Usually a check for existance of a key before an insert is all that's needed.

Have fun.

Sunday, July 19, 2009

how to use replication to forward transactions

Normal replication goes from host A to host B. But using the mysql my.cnf directive log_slave_updates you can store and forward transactions from an intermediary log to another host. So in this example, B is a slave of A; B specifies log_slave_updates, and then C as a slave of B gets all transactions from both A and B. Not quite multi-master support.

I have used this when cutting over a new system where existing hosts X & Y are replicating and I want to add a new host that will set up a new pair A & B.

Hope this is useful.

Friday, July 10, 2009

corrupt frm files for innodb tables

I've encountered a new mysql error for many (all) tables when innodb was starved for memory. The error:

Incorrect information in file: '...frm'
and any attempt to repair the table failed.

Much consternation!

But this occurred not because the table was corrupt but because I tried to give too much memory to the innodb buffer pool.

The real culprit in the error log was

InnoDB: Error: cannot allocate 2516598784 bytes of InnoDB: memory with malloc!
InnoDB: by InnoDB 228832092 bytes. Operating system errno: 12

Reducing the innodb-buffer-pool allocation to a more sane level for the box eliminated the apparent table corruption.