By: Gene Liverman user 26 Jul 2016 at 3:43 p.m. CDT

I am following the guide at [https://www.gluu.org/docs/cluster/](https://www.gluu.org/docs/cluster/) and must be missing something because it errors out with `Invalid Credentials`.

```
GLUU.[root@gluu-test ~]# /opt/opendj/bin/dsreplication initialize

>>>> Specify server administration connection parameters for the source server

Directory server hostname or IP address [gluu-test.example.edu]: 10.10.5.129
Directory server administration port number [4444]:
How do you want to trust the server certificate?
    1) Automatically trust
    2) Use a truststore
    3) Manually validate
Enter choice [3]: 1
Global Administrator User ID [admin]:
Password for user 'admin':

>>>> Specify server administration connection parameters for the destination server

Directory server hostname or IP address [gluu-test.example.edu]: 10.10.5.130
Directory server administration port number [4444]:
How do you want to trust the server certificate?
    1) Automatically trust
    2) Use a truststore
    3) Manually validate
Enter choice [3]: 1

You must choose at least one base DN to be initialized.
Initialize base DN o=gluu? (yes / no) [yes]:

Initializing the contents of a base DN removes all the existing contents of that base DN. Do you want to remove the contents of the selected base DNs on server 10.10.5.130:4444 and replace them with the contents of server 10.10.5.129:4444? (yes / no) [yes]:

The provided credentials are not valid in server 10.10.5.129:4444. Details: [LDAP: error code 49 - Invalid Credentials]
The provided credentials are not valid in server 10.10.5.130:4444. Details: [LDAP: error code 49 - Invalid Credentials]
```

I am using the "Admin Pass" that was created with `setup.py`, but have also tried the one it had me make for the replication admin... no luck either way. Prior to replication setup I did replace the certs for Apache and Tomcat via https://www.gluu.org/docs/further-reading/cert-full-update-procedures/#apache-web-server-certificate-update-process with a publicly signed one.

By Aliaksandr Samuseu staff 26 Jul 2016 at 3:53 p.m. CDT

Hi, Gene. We've seen this twice already. It doesn't have anything to do with wrong credentials (though you should rule that out: try a search against Gluu's LDAP directory with the `ldapsearch` tool using your global admin's credentials, it should succeed); it's just a confusing error message. Two suggestions for you (try them one after another, we would like to know ourselves which one helps):

1. When you supply your replication nodes to the script, use IP addresses, not DNS names.
2. Try to sync OpenDJ's certificate between both nodes.

Regards, Alex.
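P.S. To rule out the credentials themselves, something along these lines should succeed (a rough sketch only; port 1636 and `o=gluu` are the Gluu/OpenDJ defaults, and I'm showing a Directory Manager bind here, so swap in whichever account and password you're actually testing):

```
# Bind over LDAPS (OpenDJ's listener is on 1636 in Gluu) and read one entry.
# If this works, the password itself is fine and the dsreplication error
# is about something else.
/opt/opendj/bin/ldapsearch \
  --hostname localhost \
  --port 1636 \
  --useSSL --trustAll \
  --bindDN "cn=directory manager" \
  --bindPassword 'YOUR_PASSWORD' \
  --baseDN "o=gluu" \
  --searchScope base \
  "(objectclass=*)" dn
```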

By Gene Liverman user 27 Jul 2016 at 6:49 a.m. CDT

I was already using the IPs, so that isn't the fix. I also tried syncing `/opt/opendj/config/keystore` and `/opt/opendj/config/keystore.pin`, but that did not help. Also of note: both before and now, the connection to the local server isn't working either, so I doubt it's a cert issue. Is the global admin "admin" with the password that was set by `setup.py` and displayed post-install?

By Aliaksandr Samuseu staff 27 Jul 2016 at 7:12 a.m. CDT

> I also tried syncing the /opt/opendj/config/keystore and /opt/opendj/config/keystore.pin but that did not help

That's not enough. The most important part is to make sure the certificate is also added to the default Java keystore. Since you said you already synced Apache's and Tomcat's certs using that guide, you should know the procedure well already. So, for example, if you decide to use the certs from the 1st node as the source, you need to:

1. Copy the `keystore` and `keystore.pin` files from there to the 2nd node (which you've done already).
2. On the 2nd node, remove the previous OpenDJ certificate from the default Java keystore and add the one you got from the 1st node (see the sketch below).

My guess is that, because all these tools are Java applications, when they try to establish the initial connections for setting up replication they use SSL and fail, because the LDAP server on the other node presents a self-signed certificate they can't verify (the only way to make them trust a self-signed cert is to add it to a trusted cert/key store on the node, such as the default Java keystore).

The strange thing is that I haven't experienced this kind of issue when creating a cluster from Ubuntu-based instances. It seems whatever provides the SSL layer verifies certificates differently on different distros.

> Is the global admin "admin" and the password that was set by setup.py and displayed post-install?

No, those are the credentials for the "cn=directory manager" user, which is unique to each instance (at least until replication starts to function). When you enable replication, it asks you to provide a password for the global admin user; that's an administrator account which can be used to access any LDAP node in your replication topology as an admin. The tool initializes this account for you and assigns it the password you provide there. By default its DN is "cn=admin,cn=Administrators,cn=admin data".
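In concrete terms, step 2 on the 2nd node would look roughly like this (just a sketch; the cacerts path, the `changeit` store password and the `<host>_opendj` alias convention are assumptions based on a stock Gluu install, so check what your keystore actually contains first):

```
# After copying keystore, keystore.pin and /etc/certs/opendj.crt from node 1:

# keytool wants DER; the file in /etc/certs is PEM
openssl x509 -in /etc/certs/opendj.crt -outform der -out /etc/certs/opendj.der

# find the alias of the old OpenDJ cert in the default Java keystore
keytool -list -keystore /usr/java/latest/lib/security/cacerts -storepass changeit | grep -i _opendj

# remove it, then import node 1's cert under the same alias
keytool -delete -alias <host>_opendj -keystore /usr/java/latest/lib/security/cacerts -storepass changeit
keytool -import -trustcacerts -alias <host>_opendj -file /etc/certs/opendj.der \
        -keystore /usr/java/latest/lib/security/cacerts -storepass changeit

# restart services so the new trust material is picked up
/etc/init.d/tomcat restart && /etc/init.d/apache2 restart
```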

By Aliaksandr Samuseu staff 27 Jul 2016 at 8:24 a.m. CDT

One more thing. I'm not sure it's mandatory, but just in case: also copy the OpenDJ certificate that is stored in `/etc/certs` from the 1st node to the 2nd.

By Gene Liverman user 27 Jul 2016 at 1:49 p.m. CDT

This does not seem to have helped, so I am going to roll the VMs back to a snapshot I took just prior to setting up replication and try again. I will keep close notes on exactly what comes next and will post back shortly.

By Aliaksandr Samuseu staff 27 Jul 2016 at 2:16 p.m. CDT

Yes, that will be a better approach at this point. Please try my suggestions regarding OpenDJ's certs too, and feel free to ask for details if any of the steps seem unclear.

By Gene Liverman user 27 Jul 2016 at 2:46 p.m. CDT

Still no luck. Below is a detailed account of what I just did. Going to start from scratch and try setting up replication BEFORE updating any of the webserver certificates.

* Verified I can log into both Gluu servers via the web interface
* Ran the following on host1:

```bash
GLUU.[root@gluu-test ~]# ./ldapGeneralConfigInstall.py
Password for 'cn=Directory Manager':
Setting Global properties...
Setting Default Password Policy properties...
```

```bash
GLUU.[root@gluu-test ~]# ./replicationSetup.py
Create a password for the replication admin:
Enter number of OpenDJ servers: 2
Enter the hostname of server 1: 10.10.5.129
Enter the Directory Manager password for 10.10.5.129:
Enter the hostname of server 2: 10.10.5.130
Enter the Directory Manager password for 10.10.5.130:
Establishing connections ..... Done.
Checking registration information ..... Done.
Configuring Replication port on server 10.10.5.129:4444 ..... Done.
Configuring Replication port on server 10.10.5.130:4444 ..... Done.
Updating replication configuration for baseDN o=gluu on server 10.10.5.129:4444 .....Done.
Updating replication configuration for baseDN o=gluu on server 10.10.5.130:4444 .....Done.
Updating registration configuration on server 10.10.5.129:4444 ..... Done.
Updating registration configuration on server 10.10.5.130:4444 ..... Done.
Updating replication configuration for baseDN cn=schema on server 10.10.5.129:4444 .....Done.
Updating replication configuration for baseDN cn=schema on server 10.10.5.130:4444 .....Done.
Initializing registration information on server 10.10.5.130:4444 with the contents of server 10.10.5.129:4444 .....Done.
Initializing schema on server 10.10.5.130:4444 with the contents of server 10.10.5.129:4444 .....Done.

Replication has been successfully enabled. Note that for replication to work you must initialize the contents of the base DNs that are being replicated (use dsreplication initialize to do so).

See /tmp/opendj-replication-3247646063935294775.log for a detailed log of this operation.

Enabling Replication Complete.
```

* Used IPs for `dsreplication initialize` and it failed.
* Copied these from host1 to host2:
    * /opt/gluu-server-2.4.3/etc/certs/opendj.crt
    * /opt/gluu-server-2.4.3/opt/opendj/config/keystore
    * /opt/gluu-server-2.4.3/opt/opendj/config/keystore.pin
* On host2:

```bash
GLUU.[root@gluu-test ~]# keytool -list -v -keystore /usr/java/latest/lib/security/cacerts -storepass changeit | grep -i '_opendj'
Alias name: gluu-test.example.edu_opendj
GLUU.[root@gluu-test ~]# keytool -delete -alias gluu-test.example.edu_opendj -keystore /usr/java/latest/lib/security/cacerts -storepass changeit
GLUU.[root@gluu-test ~]# openssl x509 -in /etc/certs/opendj.crt -outform der -out /etc/certs/opendj.der
GLUU.[root@gluu-test ~]# keytool -import -alias gluu-test.example.edu_opendj --trustcacerts -file /etc/certs/opendj.der -keystore /usr/java/latest/lib/security/cacerts -storepass changeit
Owner: CN=localhost, O=OpenDJ RSA Self-Signed Certificate
Issuer: CN=localhost, O=OpenDJ RSA Self-Signed Certificate
Serial number: 2c118dcf
Valid from: Tue Jul 26 18:56:28 UTC 2016 until: Mon Jul 21 18:56:28 UTC 2036
Certificate fingerprints:
    MD5:    2B:DD:98:BA:69:FE:BB:8A:8B:92:32:E6:5C:3D:18:97
    SHA1:   2C:53:E9:B0:28:71:80:82:95:77:95:4D:F4:D9:76:21:00:47:22:69
    SHA256: 6B:4E:D9:2B:CB:6D:ED:F5:01:7D:21:53:F4:D1:71:45:E2:24:CB:4C:1B:5C:34:11:11:34:4C:E6:A7:F9:8E:4D
Signature algorithm name: SHA1withRSA
Version: 3
Trust this certificate? [no]: yes
Certificate was added to keystore
GLUU.[root@gluu-test ~]# /etc/init.d/tomcat restart && /etc/init.d/apache2 restart
```

* On host1 I reran `/opt/opendj/bin/dsreplication initialize` and got the exact same failing results.

By Gene Liverman user 28 Jul 2016 at 8:55 a.m. CDT

I started from scratch and it still failed. This time I added the partner host's cert to the default store instead of just copying it over, but still no luck. Here is what I did on that front:

```
# On host2
openssl x509 -in /etc/certs/opendj.crt -outform der -out /etc/certs/opendj-gluu-idp-t02.der
```

Copied `opendj-gluu-idp-t02.der` to host1, then:

```
# On host1
keytool -import -alias gluu-idp-t02_opendj --trustcacerts -file /etc/certs/opendj-gluu-idp-t02.der -keystore /usr/java/latest/lib/security/cacerts -storepass changeit
/etc/init.d/tomcat restart && /etc/init.d/httpd restart
```

Repeated the import steps on host2, then re-ran `/opt/opendj/bin/dsreplication initialize` on host1 with the same failure.

By Gene Liverman user 28 Jul 2016 at 9:15 a.m. CDT

Just for kicks, I also tried adding the partner server's cert to the keystore used by opendj...

```
# on host1:
keytool -import -alias gluu-idp-t02_opendj -file /etc/certs/opendj-gluu-idp-t02.der -keystore /opt/opendj/config/keystore -storepass `cat /opt/opendj/config/keystore.pin`
/etc/init.d/tomcat restart && /etc/init.d/httpd restart
```

```
# on host2:
keytool -import -alias gluu-idp-t01_opendj -file /etc/certs/opendj-gluu-idp-t01.der -keystore /opt/opendj/config/keystore -storepass `cat /opt/opendj/config/keystore.pin`
/etc/init.d/tomcat restart && /etc/init.d/httpd restart
```

By Aliaksandr Samuseu staff 28 Jul 2016 at 9:27 a.m. CDT

Hi, Gene.

> Just for kicks, I also tried adding the partner server's cert to the keystore used by opendj

I don't think that's needed, as it's the auxiliary tools that are hitting problems, not OpenDJ itself. Most likely they use the default keystore.

> This time I added the partner host's cert to the default store instead of just copying it over but still no luck.

Have you removed the previous cert from there? You can check by listing all of the store's contents and grepping for `_opendj`. It's better to follow all the steps to the letter in this case; I remember another user having problems with a cert update task because he left the old cert in there.

By Aliaksandr Samuseu staff 28 Jul 2016 at 9:33 a.m. CDT

So, to clarify things, here is what you need to do:

1. Copy the `keystore` and `keystore.pin` files from the 1st node to the 2nd.
2. Copy `/etc/certs/opendj.crt` from the 1st node to the 2nd.
3. On the 2nd node, remove the previous OpenDJ certificate from the default Java keystore (I guess it will have the alias `gluu-idp-t02_opendj` there) and add the one you got from the 1st node, using the same alias the deleted one had.

By Aliaksandr Samuseu staff 28 Jul 2016 at 9:38 a.m. CDT

Sorry, one more thing: 4) Restart the opendj service on the 2nd node before proceeding to set up replication, or maybe even the whole Gluu service there.

By Aliaksandr Samuseu staff 28 Jul 2016 at 11:47 a.m. CDT

I should have asked this first, actually: our manual cluster guide has a list of ports that must be open on both nodes for it to function. Have you checked whether some of them may be blocked by a firewall? From what I've seen in discussions of this issue on OpenDJ's forum, this error message can erroneously appear under a wide range of different circumstances. It should be interpreted very generally, as "_something_ prevented the tool from establishing the connection".
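A quick way to rule the firewall in or out from each node (just a sketch; 1636, 4444 and 8989 are the LDAPS, admin and replication ports involved here, so substitute the full list from the guide and your peer's IP):

```
# run on node 1 against node 2's IP, then the other way around
PEER=10.10.5.130
for port in 1636 4444 8989; do
    # bash's /dev/tcp opens a plain TCP connection; 3s timeout
    if timeout 3 bash -c "exec 3<>/dev/tcp/$PEER/$port" 2>/dev/null; then
        echo "port $port reachable"
    else
        echo "port $port blocked or closed"
    fi
done
```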

By Gene Liverman user 28 Jul 2016 at 1:38 p.m. CDT

None of that helped and I do not have any firewall rules blocking anything... could this be a CentOS 6 thing? My instance of CentOS is fully patched.

By Aliaksandr Samuseu staff 28 Jul 2016 at 1:40 p.m. CDT

How urgent is it? Do you have any particular deadline? I'm thinking about setting up a test environment similar to yours and checking it for myself.

By Gene Liverman user 28 Jul 2016 at 1:44 p.m. CDT

Go for it. I want to get this up and running sooner rather than later, but I have some time too. FWIW, the main reason I chose CentOS 6 was that csync2 is available via a repo for it... otherwise I would have chosen CentOS 7. Just for curiosity's sake, I am about to try a CentOS 7 setup in Vagrant to see if there is any difference. Thanks!

By Gene Liverman user 28 Jul 2016 at 2:50 p.m. CDT

Same issue on CentOS 7.2.

By Gene Liverman user 28 Jul 2016 at 7:38 p.m. CDT

Oddly, I'm also having the same issue on Ubuntu 14.04. I followed [https://www.gluu.org/docs/cluster/](https://www.gluu.org/docs/cluster/), had the same failure, copied over `/opt/opendj/config/keystore`, `/opt/opendj/config/keystore.pin`, and `/etc/certs/opendj.crt`, converted `opendj.crt` to `opendj.der`, deleted the existing entry for opendj in the default store, and imported via this:

```
keytool -import -alias gluubox.localdomain_opendj --trustcacerts -file /etc/certs/opendj.der -keystore /usr/java/latest/lib/security/cacerts -storepass changeit
```

I then reran `/opt/opendj/bin/dsreplication initialize` using IPs and it still failed. I verified that I can telnet from one host to the other on port 4444, and there is nothing special about these VMs, so I am at a loss.

By Gene Liverman user 29 Jul 2016 at 8:45 a.m. CDT

Just for clarity, and to ensure I wasn't messing things up or skipping a step, I scripted a lot of steps and worked in a single terminal on a fresh Ubuntu setup. Here is what I did in the order it was done. Any scripts referenced in a command are shown just below the step. All this work was done on a pair of Vagrant boxes based on the `bento/ubuntu-14.04` box. `gluubox01` is my primary box, `gluubox02` is my secondary box. The cluster URL is [https://gluubox.localdomain](https://gluubox.localdomain).

On gluubox01 I installed per [https://www.gluu.org/docs/deployment/ubuntu/](https://www.gluu.org/docs/deployment/ubuntu/). I then ran `setup.py`, which resulted in:

```
hostname                        gluubox.localdomain
orgName                         Mine
os                              ubuntu
city                            Sometown
state                           GA
countryCode                     US
support email                   root@localhost
tomcat max ram                  1536
Admin Pass                      password
Install oxAuth                  True
Install oxTrust                 True
Install LDAP                    True
Install Apache 2 web server     True
Install Shibboleth 2 SAML IDP   True
Install Asimba SAML Proxy       False
Install CAS                     True
Install oxAuth RP               False
```

I then logged out of the chroot and executed `/vagrant/export.sh` (shown below):

```bash
#!/bin/bash
cp /opt/gluu-server-2.4.3/install/community-edition-setup/setup.properties.last /vagrant/setup.properties
cp /opt/gluu-server-2.4.3/etc/certs/opendj.crt /vagrant/
openssl x509 -in /opt/gluu-server-2.4.3/etc/certs/opendj.crt -outform der -out /vagrant/opendj.der
cp /opt/gluu-server-2.4.3/opt/opendj/config/keystore /vagrant/
cp /opt/gluu-server-2.4.3/opt/opendj/config/keystore.pin /vagrant/
cp /vagrant/cluster.sh /opt/gluu-server-2.4.3/root/
```

I then logged out of gluubox01, logged into gluubox02, and installed Gluu. I then ran `/vagrant/import.sh` (shown below) and followed the prompts echoed out by it.

```bash
#!/bin/bash
# Copy in the setup properties
cp /vagrant/setup.properties /opt/gluu-server-2.4.3/install/community-edition-setup/setup.properties
/etc/init.d/gluu-server-2.4.3 start
echo 'After logging into the chroot, cd /install/community-edition-setup and run "/setup.py -c -s"'
echo 'This script will continue once you log out of the chroot'
echo
/etc/init.d/gluu-server-2.4.3 login

# post config (run after logging out of the chroot)
/etc/init.d/gluu-server-2.4.3 stop
cp /vagrant/opendj.crt /opt/gluu-server-2.4.3/etc/certs/opendj.crt
cp /vagrant/opendj.der /opt/gluu-server-2.4.3/etc/certs/opendj.der
cp /vagrant/keystore /opt/gluu-server-2.4.3/opt/opendj/config/keystore
cp /vagrant/keystore.pin /opt/gluu-server-2.4.3/opt/opendj/config/keystore.pin
/etc/init.d/gluu-server-2.4.3 start

# cluster name
inst='gluubox.localdomain'
cat > /opt/gluu-server-2.4.3/root/cert-import.sh << EOF
keytool -delete -alias ${inst}_opendj -keystore /usr/java/latest/lib/security/cacerts -storepass changeit
keytool -import -alias ${inst}_opendj --trustcacerts -file /etc/certs/opendj.der -keystore /usr/java/latest/lib/security/cacerts -storepass changeit
/etc/init.d/tomcat restart
/etc/init.d/apache2 restart
EOF
chmod +x /opt/gluu-server-2.4.3/root/cert-import.sh
echo 'run /root/cert-import.sh before returning to box1'
echo
/etc/init.d/gluu-server-2.4.3 login
```

After running `/root/cert-import.sh` inside the chroot I logged out of gluubox02 and went back to gluubox01.
On gluubox01 I logged into the chroot and ran `/root/cluster.sh`, which was copied in when exporting certs (cluster.sh is below):

```bash
#!/bin/bash
wget https://www.gluu.org/docs/cluster/ldapGeneralConfigInstall.py
wget https://www.gluu.org/docs/cluster/replicationSetup.py

# the ip's on the network between the hosts
# these get used during replication setup
echo 'host 1 = 172.28.128.11'
echo 'host 2 = 172.28.128.12'

python ldapGeneralConfigInstall.py
python replicationSetup.py
/opt/opendj/bin/dsreplication initialize
```

The runs of `ldapGeneralConfigInstall.py` and `replicationSetup.py` went fine, but I got the following (again) when running `/opt/opendj/bin/dsreplication initialize`:

```
>>>> Specify server administration connection parameters for the source server

Directory server hostname or IP address [gluubox.localdomain]: 172.28.128.11
Directory server administration port number [4444]:
How do you want to trust the server certificate?
    1) Automatically trust
    2) Use a truststore
    3) Manually validate
Enter choice [3]: 1
Global Administrator User ID [admin]:
Password for user 'admin':

>>>> Specify server administration connection parameters for the destination server

Directory server hostname or IP address [gluubox.localdomain]: 172.28.128.12
Directory server administration port number [4444]:
How do you want to trust the server certificate?
    1) Automatically trust
    2) Use a truststore
    3) Manually validate
Enter choice [3]: 1

You must choose at least one base DN to be initialized.
Initialize base DN o=gluu? (yes / no) [yes]:

Initializing the contents of a base DN removes all the existing contents of that base DN. Do you want to remove the contents of the selected base DNs on server 172.28.128.12:4444 and replace them with the contents of server 172.28.128.11:4444? (yes / no) [yes]:

The provided credentials are not valid in server 172.28.128.11:4444. Details: [LDAP: error code 49 - Invalid Credentials]
The provided credentials are not valid in server 172.28.128.12:4444. Details: [LDAP: error code 49 - Invalid Credentials]
```

By Aliaksandr Samuseu staff 29 Jul 2016 at 2:29 p.m. CDT

Hi, Gene (sorry for the typo earlier). Thanks, that's a good step-by-step now. One thing, though; this has nothing to do with the replication problems, but

> tomcat max ram 1536

isn't enough. Please check our [requirements page](https://www.gluu.org/docs/deployment/).
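For reference, on a plain Tomcat the heap ceiling is normally raised through `CATALINA_OPTS` in `setenv.sh`; whether Gluu's chroot wires it up the same way (or expects the value to be re-entered via setup.py) is an assumption on my part, so treat this purely as an illustration of the kind of change needed:

```
# Hypothetical example only: raise Tomcat's maximum heap to 3 GB.
# In a stock Tomcat this lives in $CATALINA_HOME/bin/setenv.sh; Gluu may
# keep the setting elsewhere, so adapt to your layout.
CATALINA_OPTS="$CATALINA_OPTS -Xms256m -Xmx3072m"
export CATALINA_OPTS
```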

By Gene Liverman user 29 Jul 2016 at 3:37 p.m. CDT

WOW, I can't believe it's that simple! I will try this later this evening to verify. As a side note, it might be worth you all updating the default value in setup.py so that people like me don't run into this.

By Gene Liverman user 29 Jul 2016 at 3:39 p.m. CDT

BTW, is there an easy way to monitor whether or not I need to allocate more RAM to Tomcat? The guide says 3GB for test and 4-6 for prod but I am not sure what that will actually translate to for us.

By Aliaksandr Samuseu staff 29 Jul 2016 at 3:43 p.m. CDT

One more observation: you run

```
/etc/init.d/tomcat restart
/etc/init.d/apache2 restart
```

after importing certificates, but where is `# /etc/init.d/opendj restart`? (I suggested it a bit later than my main post on the required steps; it totally slipped my mind back then, please see [this post](https://support.gluu.org/installation/dsreplication-initialize-fails-3027#at13101).) Have you restarted opendj on the 2nd node the last time you tried this, after you copied the certs but before you attempted to configure replication?

By Aliaksandr Samuseu staff 29 Jul 2016 at 3:51 p.m. CDT

> WOW, I can't believe it's that simple! I will try this later this evening to verify. As a side note, it might be worth you all updating the default value in setup.py so that people like me don't run into this.

Sorry to let you down, but you probably only glanced at my post and missed the point that it won't help with the replication problems :) At least it shouldn't. OpenDJ runs in a separate JVM from the other Gluu components, so these allocations won't affect it. It's just that with this little RAM devoted to Tomcat's heap you will most likely get an empty/error page once you resolve the ongoing issue anyway.

> BTW, is there an easy way to monitor whether or not I need to allocate more RAM to Tomcat? The guide says 3GB for test and 4-6 for prod but I am not sure what that will actually translate to for us.

It's in no way more special than any other JVM. You can monitor Tomcat's and OpenDJ's JVMs the same way you monitor any other JVM. I personally use the `VisualVM` + `jstatd` combo for remote monitoring, and the `jstat` tool from the command line (first you need to find the JVM pids with the `jps` tool).
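For the command-line route, it looks roughly like this (a sketch; run it inside the Gluu chroot where the JVMs actually run):

```
# list the running JVMs with their pids and main classes
jps -lv

# sample heap/GC utilisation of one JVM (e.g. Tomcat's pid) every 5 seconds;
# if the O (old gen) column sits near 100% and GCT keeps climbing,
# the heap is probably too small
jstat -gcutil <TOMCAT_PID> 5000
```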

By Aliaksandr Samuseu staff 29 Jul 2016 at 3:57 p.m. CDT

> it might be worth you all updating the default value in setup.py so that people like me don't run into this

Yes, it's done in the upcoming 2.4.4 package. It's a legacy from the 2.3.x days.

By Gene Liverman user 29 Jul 2016 at 7:20 p.m. CDT

I had not tried restarting opendj, but that did not help either. I did, however, get a ton of messages on restart. See below for the output on host2.

```bash
root@gluubox:~# /etc/init.d/gluu-server-2.4.3 login
gluu-server-2.4.3 is running...
logging in...
Welcome to the Gluu Server!
GLUU.root@gluubox:~# /etc/init.d/opendj restart
Running: /bin/su ldap -c cd /opt/opendj/bin ; /opt/opendj/bin/stop-ds
Stopping Server...
[30/Jul/2016:00:14:22 +0000] category=SYNC severity=WARNING msgID=org.opends.messages.replication.135 msg=Replication server RS(2874) ignoring update 000001563926d6f1303a00000017 for domain "o=gluu" from directory server DS(12346) at 172.28.128.12/172.28.128.12:36586 because its generation ID 2230189 is different to the local generation ID 2184674
[30/Jul/2016:00:14:27 +0000] category=PLUGGABLE severity=NOTICE msgID=org.opends.messages.backend.370 msg=The backend site is now taken offline
[30/Jul/2016:00:14:27 +0000] category=PLUGGABLE severity=NOTICE msgID=org.opends.messages.backend.370 msg=The backend userRoot is now taken offline
[30/Jul/2016:00:14:27 +0000] category=CORE severity=NOTICE msgID=org.opends.messages.core.203 msg=The Directory Server is now stopped
Running: /bin/su ldap -c cd /opt/opendj/bin ; /opt/opendj/bin/start-ds
[30/Jul/2016:00:14:31 +0000] category=CORE severity=NOTICE msgID=org.opends.messages.core.134 msg=Gluu-OpenDJ 3.0.0-gluu (build 20160331045526, revision number ee0b5ef693678ceb4fa0e0794a4387aba2fe84cf) starting up
[30/Jul/2016:00:14:33 +0000] category=UTIL severity=NOTICE msgID=org.opends.messages.runtime.21 msg=Installation Directory: /opt/opendj
[30/Jul/2016:00:14:33 +0000] category=UTIL severity=NOTICE msgID=org.opends.messages.runtime.23 msg=Instance Directory: /opt/opendj
[30/Jul/2016:00:14:33 +0000] category=UTIL severity=NOTICE msgID=org.opends.messages.runtime.17 msg=JVM Information: 1.7.0_95-b00 by Oracle Corporation, 64-bit architecture, 703070208 bytes heap size
[30/Jul/2016:00:14:33 +0000] category=UTIL severity=NOTICE msgID=org.opends.messages.runtime.18 msg=JVM Host: gluubox.localdomain, running Linux 3.13.0-92-generic amd64, 3156021248 bytes physical memory size, number of processors available 2
[30/Jul/2016:00:14:33 +0000] category=UTIL severity=NOTICE msgID=org.opends.messages.runtime.19 msg=JVM Arguments: "-Dorg.opends.server.scriptName=start-ds"
[30/Jul/2016:00:14:34 +0000] category=PLUGGABLE severity=NOTICE msgID=org.opends.messages.backend.513 msg=The database backend userRoot containing 153 entries has started
[30/Jul/2016:00:14:35 +0000] category=PLUGGABLE severity=NOTICE msgID=org.opends.messages.backend.513 msg=The database backend site containing 2 entries has started
[30/Jul/2016:00:14:35 +0000] category=EXTENSIONS severity=NOTICE msgID=org.opends.messages.extension.221 msg=DIGEST-MD5 SASL mechanism using a server fully qualified domain name of: localhost
[30/Jul/2016:00:14:35 +0000] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.204 msg=Replication server RS(2874) started listening for new connections on address 0.0.0.0 port 8989
[30/Jul/2016:00:14:36 +0000] category=SYNC severity=WARNING msgID=org.opends.messages.replication.146 msg=Directory server DS(12346) at 172.28.128.12/172.28.128.12:36640 presented generation ID 2230189 for domain "o=gluu", but the generation ID of this replication server RS(2874) is 2184674. This usually indicates that one or more directory servers in the replication topology have not been initialized with the same data, and re-initialization is required
[30/Jul/2016:00:14:36 +0000] category=SYNC severity=WARNING msgID=org.opends.messages.replication.96 msg=Directory server DS(12346) has connected to replication server RS(2874) for domain "o=gluu" at 172.28.128.12/172.28.128.12:8989, but the generation IDs do not match, indicating that a full re-initialization is required. The local (DS) generation ID is 2230189 and the remote (RS) generation ID is 2184674
[30/Jul/2016:00:14:36 +0000] category=SYNC severity=WARNING msgID=org.opends.messages.replication.136 msg=Replication server RS(2874) not sending update 00000156348b71a96ae800000001 for domain "o=gluu" to directory server DS(12346) at 172.28.128.12/172.28.128.12:36640 because its generation ID 2230189 is different to the local generation ID 2184674
[30/Jul/2016:00:14:36 +0000] category=SYNC severity=WARNING msgID=org.opends.messages.replication.136 msg=Replication server RS(2874) not sending update 00000156348c602a6ae800000002 for domain "o=gluu" to directory server DS(12346) at 172.28.128.12/172.28.128.12:36640 because its generation ID 2230189 is different to the local generation ID 2184674
[30/Jul/2016:00:14:36 +0000] category=SYNC severity=WARNING msgID=org.opends.messages.replication.136 msg=Replication server RS(2874) not sending update 00000156348d4da36ae800000003 for domain "o=gluu" to directory server DS(12346) at 172.28.128.12/172.28.128.12:36640 because its generation ID 2230189 is different to the local generation ID 2184674
[30/Jul/2016:00:14:36 +0000] category=SYNC severity=WARNING msgID=org.opends.messages.replication.136 msg=Replication server RS(2874) not sending update 00000156348e05076ae800000004 for domain "o=gluu" to directory server DS(12346) at 172.28.128.12/172.28.128.12:36640 because its generation ID 2230189 is different to the local generation ID 2184674
```

That gets repeated A LOT!
Here is what's after it:

```bash
2.28.128.12/172.28.128.12:36640 because its generation ID 2230189 is different to the local generation ID 2184674
[30/Jul/2016:00:14:37 +0000] category=SYNC severity=WARNING msgID=org.opends.messages.replication.136 msg=Replication server RS(2874) not sending update 00000156392656ad6ae80000001c for domain "o=gluu" to directory server DS(12346) at 172.28.128.12/172.28.128.12:36640 because its generation ID 2230189 is different to the local generation ID 2184674
[30/Jul/2016:00:14:37 +0000] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.62 msg=Directory server DS(1020) has connected to replication server RS(2874) for domain "cn=schema" at 172.28.128.12/172.28.128.12:8989 with generation ID 8408
[30/Jul/2016:00:14:37 +0000] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.62 msg=Directory server DS(16662) has connected to replication server RS(2874) for domain "cn=admin data" at 172.28.128.12/172.28.128.12:8989 with generation ID 156291
[30/Jul/2016:00:14:37 +0000] category=PROTOCOL severity=NOTICE msgID=org.opends.messages.protocol.276 msg=Started listening for new connections on Administration Connector 0.0.0.0 port 4444
[30/Jul/2016:00:14:37 +0000] category=PROTOCOL severity=NOTICE msgID=org.opends.messages.protocol.276 msg=Started listening for new connections on LDAPS Connection Handler 0.0.0.0 port 1636
[30/Jul/2016:00:14:37 +0000] category=CORE severity=NOTICE msgID=org.opends.messages.core.135 msg=The Directory Server has started successfully
[30/Jul/2016:00:14:37 +0000] category=CORE severity=NOTICE msgID=org.opends.messages.core.139 msg=The Directory Server has sent an alert notification generated by class org.opends.server.core.DirectoryServer (alert type org.opends.server.DirectoryServerStarted, alert ID org.opends.messages.core-135): The Directory Server has started successfully
```

I did miss that the memory thing wouldn't help... no biggie :)

By Simon Devlin user 01 Aug 2016 at 9:58 a.m. CDT

Guys, not sure if I'm on track here or not, but I've been having the same issues with Invalid Credentials. I found that this worked for me; notice the use of the `initialize-all` command as described in the [forgerock docs](https://forgerock.org/opendj/doc/bootstrap/admin-guide/#init-repl-online). It's at least got me to the point where I can view replication status. Haven't had a chance to check into it further yet.

```
/opt/opendj/bin/dsreplication enable --host1 gbnvplapp123.example.com --port1 4444 --bindDN1 "cn=directory manager" --bindPassword1 my_bind_pw --replicationPort1 8989 --host2 gbnvplapp124.example.com --port2 4444 --bindDN2 "cn=directory manager" --bindPassword2 my_bind_pw --replicationPort2 8989 --adminUID admin --adminPassword my_admin_pw --baseDN "o=gluu" -X -n --trustAll

/opt/opendj/bin/dsreplication initialize-all --adminUID admin --adminPassword my_admin_pw --baseDN o=gluu --hostname gbnvplapp123.example.com --port 4444 --trustAll --no-prompt
```

which results in

```
-bash-4.2# /opt/opendj/bin/dsreplication status -h localhost -p 4444 -I admin -w my_admin_pw -X -n

Suffix DN : Server                        : Entries : Replication enabled : DS ID : RS ID : RS Port (1) : M.C. (2) : A.O.M.C. (3) : Security (4)
----------:-------------------------------:---------:---------------------:-------:-------:-------------:----------:--------------:-------------
o=gluu    : gbnvplapp123.example.com:4444 : 153     : true                : 6039  : 24394 : 8989        : 0        :              : false
o=gluu    : gbnvplapp124.example.com:4444 : 153     : true                : 11616 : 28885 : 8989        : 0        :              : false
o=site    : gbnvplapp123.example.com:4444 : 2       :                     :       :       :             :          :              :
o=site    : gbnvplapp124.example.com:4444 : 2       :                     :       :       :             :          :              :

[1] The port used to communicate between the servers whose contents are being replicated.
[2] The number of changes that are still missing on this server (and that have been applied to at least one of the other servers).
[3] Age of oldest missing change: the date on which the oldest change that has not arrived on this server was generated.
[4] Whether the replication communication through the replication port is encrypted or not.
```

These aren't secure connections, but I'll take working replication over TLS at the moment. Hope it's some help.

By Gene Liverman user 08 Aug 2016 at 12:52 p.m. CDT

I am back to looking at this today. I am starting over with the new 2.4.4 version that was recently released and will post back soon. I will also try Simon Devlin's solution if my initial results are unsuccessful.

By Aliaksandr Samuseu staff 08 Aug 2016 at 1:25 p.m. CDT

Hi, Gene. I've been able to verify that the issue exists. I still believe it's some kind of certificate issue, so it's not a bug but an incorrect configuration. But as it's clear that our clustering docs don't work as a step-by-step guide at the moment, it's still our issue anyway. I've tried Simon's advice on the new 2.4.4 package, and it didn't work.

It also needs to be mentioned that OpenDJ uses 3 certificates for SSL connections, one for each of the listeners on ports 1636, 4444 and 8989, and they are all different. What I've discovered is that we only have one of them (the one for port 1636) in the default Java keystore out of the box. As ports 4444 and 8989 are used for all replication-related tasks, this really could be the cause of the issue. It may actually be even more than that, if you have a look at the contents of the `/opt/opendj/config` dir. Here are the related files from there:

1. `keystore` + `keystore.pin`
2. `truststore`
3. `admin-keystore` + `admin-keystore.pin`
4. `admin-truststore`
5. `ads-truststore` + `ads-truststore.pin`

The first two relate to regular secure connections (port 1636), but items 3) and 4) are for port 4444, and item 5) is for port 8989 (replication). What should be tried next:

1. Add the missing certs to the default Java truststores on both nodes.
2. Check what is in these `truststore` files and add the certs that are probably needed there (see the sketch below).

Unfortunately I haven't yet had time to check this properly (my first attempt at adding them to the truststore didn't help, but it wasn't properly planned), so if you have a chance please give it a shot. Here is an article which may help you understand which cert needs to go where (also check the other articles linked from there): [link](https://onemoretech.wordpress.com/2013/02/13/creating-and-installing-a-self-signed-certificate-for-opendj/)
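As for inspecting the stores from item 2, something like this should show what each of them actually contains (just a sketch; that each `.pin` file holds the password of its matching store, and that they are plain JKS keystores, are assumptions based on how `keystore`/`keystore.pin` behave):

```
cd /opt/opendj/config

# stores that ship with a .pin file next to them
for store in keystore admin-keystore ads-truststore; do
    echo "== $store =="
    keytool -list -keystore "$store" -storepass "$(cat "$store.pin")"
done

# the plain truststore files have no .pin alongside them; keytool will prompt
# for a password, or list with a warning if you just press Enter
keytool -list -keystore truststore
keytool -list -keystore admin-truststore
```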

By Gene Liverman user 09 Aug 2016 at 7:50 a.m. CDT

I am glad to hear that this isn't me messing things up! I'll see if I can figure something out on the additional certs but will most likely need your help on that. Thanks again!

By Gene Liverman user 18 Aug 2016 at 10:22 a.m. CDT

Why was this closed? I was under the impression that Aliaksandr Samuseu was working on fixing the documentation or otherwise providing a workable solution so that we could indeed cluster Gluu.

By Mohib Zico Account Admin 18 Aug 2016 at 10:37 a.m. CDT

Community tickets have no SLA, so they can be closed at any time; anyone can still comment even on a closed ticket. If Aliaksandr Samuseu has anything to share, he can comment here even though the ticket is closed.

By Aliaksandr Samuseu staff 18 Aug 2016 at 10:37 a.m. CDT

Hi, Gene. Yes, sorry. I'm currently checking it on a new cluster setup of two 2.4.4 instances. It was probably closed as one of the tickets that hadn't been updated for too long (an internal routine procedure).

By Gene Liverman user 24 Aug 2016 at 9:48 a.m. CDT

Aliaksandr Samuseu - having any luck with this? Just wanted to check in. Thanks!

By Aliaksandr Samuseu staff 24 Aug 2016 at 6:41 p.m. CDT

Hi, Gene. Yes, I got it working. I'm not sure what the cause of the issue is, but the interactive mode of the `dsreplication` tool fails like that whatever I do. Running it non-interactively (console mode) finally worked, though. Here is my final command:

`/opt/opendj/bin/dsreplication initialize-all --hostname 127.0.0.1 --port 4444 -I 'admin' -w 'GLOB_ADMIN_PASS' -b 'o=gluu' --trustAll --no-prompt`

By Aliaksandr Samuseu staff 24 Aug 2016 at 6:47 p.m. CDT

You need to run this on your first node (the one that will also be the source for the initial file system replication). The strange part is that it didn't require any certificate/truststore synchronization for OpenDJ at all; it worked right away.

By Mohib Zico Account Admin 25 Aug 2016 at 4:59 a.m. CDT

My two cents... I can't understand why you guys are facing this problem. I just followed the same doc (I think so) and shipped two clusters last week without facing any issue. If you want, I can do a screencast of my test and share it with you (no ETA though).

By Aliaksandr Samuseu staff 25 Aug 2016 at 10:06 a.m. CDT

Hi, Zico. It's okay. It works in console mode. If Gene confirms, we can just update the docs page and recommend running it this way.

By Aliaksandr Samuseu staff 26 Aug 2016 at 4:58 p.m. CDT

Gene, could you confirm whether it works for you too?

By Gene Liverman user 29 Aug 2016 at 7:53 a.m. CDT

I was out sick some last week but hope to try this out today. I will post back as soon as I know something. Thank you!

By Gene Liverman user 30 Aug 2016 at 6:45 a.m. CDT

Yesterday got away from me but I am testing this right now.

By Gene Liverman user 30 Aug 2016 at 9:43 a.m. CDT

I have done testing in Vagrant and things seem good, with a couple of minor notes:

* I would strongly suggest updating the line in the docs like so:

```
# Current line
/opt/opendj/bin/dsreplication initialize-all --hostname 127.0.0.1 --port 4444 -I 'admin' -w 'GLOB_ADMIN_PASS' -b 'o=gluu' --trustAll --no-prompt

# New line with clarification on the password to use
/opt/opendj/bin/dsreplication initialize-all --hostname 127.0.0.1 --port 4444 -I 'admin' -w 'REPLICATION_ADMIN_PASSWORD' -b 'o=gluu' --trustAll --no-prompt
```

* On the page for cluster setups, just before "Csync2 configuration for host-2", it has you run `csync2 -cvvv -N idp2.gluu.org`, but I believe it should be `csync2 -cvvv -N idp1.gluu.org`.
* On the same page, I suggest updating the cron job to `*/2 * * * * /usr/sbin/csync2 -N idp1.gluu.org -xv 2>/var/log/csync2.log` instead of listing every minute.

Another idea I had: on the page for installing csync2 on CentOS 7, you might want to combine the 3 `yum install` lines for simplicity. It might also be helpful to have a block under the description with all the commands in it, like so:

```
yum -y install epel-release
yum -y groupinstall "Development Tools"
yum -y install librsync-devel gnutls-devel sqlite-devel
cd /root && mkdir building_csync && cd building_csync/
wget http://oss.linbit.com/csync2/csync2-2.0.tar.gz
tar -xz -f ./csync2-2.0.tar.gz && cd csync2-2.0/
./configure --sysconfdir /usr/local/etc/csync2/ && make && make install
```

I am about to try this out on real servers and will post back with my results. If that goes well then I think we can close this out. Thanks again for all the help!

By Gene Liverman user 30 Aug 2016 at 2:47 p.m. CDT

I believe the replication worked, thanks! Now that I have gotten past this part I have hit another wall, which I will deal with at [https://support.gluu.org/other/csync2-received-record-packet-of-unknown-type-83-3169/](https://support.gluu.org/other/csync2-received-record-packet-of-unknown-type-83-3169/). Thanks again for sticking with this!

By Aliaksandr Samuseu staff 30 Aug 2016 at 3:18 p.m. CDT

Hi, Gene. Thanks for the confirmation. This was useful for us too; it's a good test both of the docs and of the solution itself.

> On the page for cluster setups just before "Csync2 configuration for host-2" it has you run csync2 -cvvv -N idp2.gluu.org but I believe it should be csync2 -cvvv -N idp1.gluu.org

No, that part is correct. The `-N` option overrides the result of automatic hostname detection, telling csync2 explicitly which node it should assume it is running on. I believe it uses the hostname as a hint by default, and in the case of a Gluu cluster the hostnames will be the same on both nodes, so it's better to explicitly enforce a specific context when running the tool on each node. I've edited the command example as you suggested, and I'll check whether your other suggestions can be incorporated too. Thank you for your input, Gene.

By Gene Liverman user 30 Aug 2016 at 3:29 p.m. CDT

Awesome, thanks for the feedback!