By Robert Dinga named 29 Jan 2018 at 2:38 p.m. CST

5 Responses
We're currently running a three-node Gluu cluster set up using the instructions in your documentation [here](https://gluu.org/docs/ce/3.1.1/installation-guide/cluster/). We ran into an issue with deadlocking in LDAP, seemingly around the metrics Gluu stores and updates in LDAP, and were directed to update OpenLDAP to 2.4.45 in [this ticket](https://support.gluu.org/single-sign-on/4903/ldap-entry-persistence-exception/). We've done that and are now seeing LDAP delete the appliance node. It then gets replaced with a "glue" objectClass node, and the tree below it remains intact (configuration, federations, trustRelationships, and so forth). Replication does not correct the node after this happens, and it remains in this state indefinitely. This occurs even when there is no activity on the server.

The following logging in oxauth (and similar in identity) pointed us at the issue:

```
2018-01-29 01:15:38,529 DEBUG [oxAuthScheduler_Worker-1] [org.xdi.service.timer.RequestJobListener] (RequestJobListener.java:52) - Bound request started
2018-01-29 01:15:38,530 DEBUG [oxAuthScheduler_Worker-1] [org.xdi.service.timer.TimerJob] (TimerJob.java:34) - Fire timer event [org.xdi.service.cdi.event.ConfigurationEvent] with qualifiers [@org.xdi.service.cdi.event.Scheduled()]
2018-01-29 01:15:38,530 DEBUG [oxAuthScheduler_Worker-1] [org.xdi.service.timer.RequestJobListener] (RequestJobListener.java:62) - Bound request ended
2018-01-29 01:15:38,598 DEBUG [oxAuthScheduler_Worker-2] [org.xdi.service.timer.RequestJobListener] (RequestJobListener.java:52) - Bound request started
2018-01-29 01:15:38,599 DEBUG [oxAuthScheduler_Worker-2] [org.xdi.service.timer.TimerJob] (TimerJob.java:34) - Fire timer event [org.xdi.oxauth.service.cdi.event.AuthConfigurationEvent] with qualifiers [@org.xdi.service.cdi.event.Scheduled()]
2018-01-29 01:15:38,599 DEBUG [oxAuthScheduler_Worker-2] [org.xdi.service.timer.RequestJobListener] (RequestJobListener.java:62) - Bound request ended
2018-01-29 01:15:38,599 TRACE [Thread-18126] [org.xdi.oxauth.service.AppInitializer] (AppInitializer.java:256) - Attempting to use ldapEntryManager: org.gluu.site.ldap.OperationsFacade@8d40f07
2018-01-29 01:15:38,600 ERROR [Thread-18126] [org.xdi.oxauth.service.AppInitializer] (AppInitializer.java:537) - Failed to load appliance entry from Ldap
org.gluu.site.ldap.persistence.exception.EntryPersistenceException: Failed to find entry: inum=@!49B1.16CA.AFD9.4147!0002!3A18.DD8B,ou=appliances,o=gluu
    at org.gluu.site.ldap.persistence.AbstractEntryManager.find(AbstractEntryManager.java:444) ~[oxcore-ldap-3.1.1.Final.jar:?]
    at org.gluu.site.ldap.persistence.AbstractEntryManager.find(AbstractEntryManager.java:381) ~[oxcore-ldap-3.1.1.Final.jar:?]
    at org.gluu.site.ldap.persistence.LdapEntryManager$Proxy$_$$_WeldClientProxy.find(Unknown Source) ~[oxcore-ldap-3.1.1.Final.jar:?]
    at org.xdi.oxauth.service.AppInitializer.loadAppliance(AppInitializer.java:535) [classes/:?]
    at org.xdi.oxauth.service.AppInitializer.loadLdapIdpAuthConfigs(AppInitializer.java:486) [classes/:?]
    at org.xdi.oxauth.service.AppInitializer.loadLdapAuthConfigs(AppInitializer.java:563) [classes/:?]
    at org.xdi.oxauth.service.AppInitializer.reloadConfiguration(AppInitializer.java:257) [classes/:?]
    at org.xdi.oxauth.service.AppInitializer.reloadConfigurationTimerEvent(AppInitializer.java:244) [classes/:?]
    at org.xdi.oxauth.service.AppInitializer$Proxy$_$$_WeldSubclass.reloadConfigurationTimerEvent$$super(Unknown Source) [classes/:?]
    at sun.reflect.GeneratedMethodAccessor153.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
    at org.jboss.weld.interceptor.proxy.TerminalAroundInvokeInvocationContext.proceedInternal(TerminalAroundInvokeInvocationContext.java:51) [weld-core-impl-3.0.0.Final.jar:3.0.0.Final]
    at org.jboss.weld.interceptor.proxy.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:78) [weld-core-impl-3.0.0.Final.jar:3.0.0.Final]
    at org.xdi.service.cdi.async.AsynchronousInterceptor$1.get(AsynchronousInterceptor.java:36) [oxcore-service-3.1.1.Final.jar:?]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) [?:1.8.0_112]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
2018-01-29 01:15:38,605 DEBUG [Thread-18126] [org.xdi.oxauth.service.AppInitializer] (AppInitializer.java:399) - Attempting to destroy ldapAuthEntryManager: org.gluu.site.ldap.persistence.LdapEntryManager@37cab289
2018-01-29 01:15:38,615 DEBUG [Thread-18126] [org.xdi.oxauth.service.AppInitializer] (AppInitializer.java:401) - Destroyed ldapAuthEntryManager: org.gluu.site.ldap.persistence.LdapEntryManager@37cab289
2018-01-29 01:15:38,615 DEBUG [Thread-18126] [org.xdi.oxauth.service.AppInitializer] (AppInitializer.java:363) - Attempting to destroy ldapAuthEntryManager: org.gluu.site.ldap.persistence.LdapEntryManager@37cab289
2018-01-29 01:15:38,616 DEBUG [Thread-18126] [org.xdi.oxauth.service.AppInitializer] (AppInitializer.java:365) - Destroyed ldapAuthEntryManager: org.gluu.site.ldap.persistence.LdapEntryManager@37cab289
2018-01-29 01:15:38,616 INFO [Thread-18126] [org.xdi.oxauth.service.AppInitializer] (AppInitializer.java:371) - Recreated instance ldapAuthEntryManager: []
2018-01-29 01:15:38,619 ERROR [Thread-18126] [org.xdi.oxauth.service.AppInitializer] (AppInitializer.java:537) - Failed to load appliance entry from Ldap
org.gluu.site.ldap.persistence.exception.EntryPersistenceException: Failed to find entry: inum=@!49B1.16CA.AFD9.4147!0002!3A18.DD8B,ou=appliances,o=gluu
    at org.gluu.site.ldap.persistence.LdapEntryManager.find(LdapEntryManager.java:264) ~[oxcore-ldap-3.1.1.Final.jar:?]
    at org.gluu.site.ldap.persistence.AbstractEntryManager.find(AbstractEntryManager.java:444) ~[oxcore-ldap-3.1.1.Final.jar:?]
    at org.gluu.site.ldap.persistence.AbstractEntryManager.find(AbstractEntryManager.java:381) ~[oxcore-ldap-3.1.1.Final.jar:?]
    at org.gluu.site.ldap.persistence.LdapEntryManager$Proxy$_$$_WeldClientProxy.find(Unknown Source) ~[oxcore-ldap-3.1.1.Final.jar:?]
    at org.xdi.oxauth.service.AppInitializer.loadAppliance(AppInitializer.java:535) [classes/:?]
    at org.xdi.oxauth.service.AppInitializer.setDefaultAuthenticationMethod(AppInitializer.java:509) [classes/:?]
    at org.xdi.oxauth.service.AppInitializer.reloadConfiguration(AppInitializer.java:263) [classes/:?]
    at org.xdi.oxauth.service.AppInitializer.reloadConfigurationTimerEvent(AppInitializer.java:244) [classes/:?]
    at org.xdi.oxauth.service.AppInitializer$Proxy$_$$_WeldSubclass.reloadConfigurationTimerEvent$$super(Unknown Source) [classes/:?]
    at sun.reflect.GeneratedMethodAccessor153.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
    at org.jboss.weld.interceptor.proxy.TerminalAroundInvokeInvocationContext.proceedInternal(TerminalAroundInvokeInvocationContext.java:51) [weld-core-impl-3.0.0.Final.jar:3.0.0.Final]
    at org.jboss.weld.interceptor.proxy.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:78) [weld-core-impl-3.0.0.Final.jar:3.0.0.Final]
    at org.xdi.service.cdi.async.AsynchronousInterceptor$1.get(AsynchronousInterceptor.java:36) [oxcore-service-3.1.1.Final.jar:?]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) [?:1.8.0_112]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
```

This repeats every 30 seconds. The previous run, at 01:15:08, had no errors. In ldap.log, I see the following:

```
Jan 29 01:15:08 sso-lab slapd[8769]: slap_queue_csn: queueing 0x7f6eec15bf10 20180129011508.526875Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncrepl_message_to_op: rid=003 tid fb276700
Jan 29 01:15:08 sso-lab slapd[8769]: slap_queue_csn: queueing 0x7f6eec18f8d0 20180129011508.526875Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncprov_matchops: skipping original sid 003
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6eec18f8d0 20180129011508.526875Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6ed811ac40 20180128234339.186916Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncrepl_message_to_op: rid=003 be_add uniqueIdentifier=1516975806076,ou=201801,ou=oxauth,ou=@!49B1.16CA.AFD9.4147!0002!3A18.DD8B,ou=metric,o=@!49B1.16CA.AFD9.4147!0001!5018.C88B,o=gluu (0)
Jan 29 01:15:08 sso-lab slapd[8769]: slap_queue_csn: queueing 0x7f6eec17b060 20180129011508.526875Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncprov_sendresp: to=001, cookie=rid=002,sid=002,csn=20180129011508.526875Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: slap_queue_csn: queueing 0x7f6eec1b3610 20180129011508.526875Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: do_syncrep2: rid=001 cookie=rid=001,sid=001
Jan 29 01:15:08 sso-lab slapd[8769]: slap_queue_csn: queueing 0x7f6ee014d780 20180129011508.526875Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncrepl_message_to_op: rid=001 tid fa274700
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6eec1b3610 20180129011508.526875Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6ed812ddf0 20180128234339.186916Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: do_syncrep2: rid=003 cookie=rid=003,sid=003,csn=20180129011508.538134Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: slap_queue_csn: queueing 0x7f6eec146af0 20180129011508.538134Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncrepl_message_to_op: rid=003 tid fb276700
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6ee417f990 20180129003419.792722Z#000000#001#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncrepl_message_to_op: rid=001 be_add uniqueIdentifier=1516975806076,ou=201801,ou=oxauth,ou=@!49B1.16CA.AFD9.4147!0002!3A18.DD8B,ou=metric,o=@!49B1.16CA.AFD9.4147!0001!5018.C88B,o=gluu (68)
Jan 29 01:15:08 sso-lab slapd[8769]: do_syncrep2: rid=001 delta-sync lost sync on (reqStart=20180129011508.000001Z,cn=accesslog), switching to REFRESH
Jan 29 01:15:08 sso-lab slapd[8769]: slap_queue_csn: queueing 0x7f6eec1809b0 20180129011508.538134Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncprov_matchops: skipping original sid 003
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6eec1809b0 20180129011508.538134Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncprov_sendresp: to=001, cookie=rid=002,sid=002,csn=20180129011508.538134Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6ed810fdf0 20180128234439.185303Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncrepl_message_to_op: rid=003 be_add uniqueIdentifier=1516975806077,ou=201801,ou=oxauth,ou=@!49B1.16CA.AFD9.4147!0002!3A18.DD8B,ou=metric,o=@!49B1.16CA.AFD9.4147!0001!5018.C88B,o=gluu (0)
Jan 29 01:15:08 sso-lab slapd[8769]: slap_queue_csn: queueing 0x7f6eec145de0 20180129011508.538134Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: slap_queue_csn: queueing 0x7f6eec15b750 20180129011508.538134Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6eec15b750 20180129011508.538134Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6ed8127010 20180128234439.185303Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: do_syncrep2: rid=001 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Jan 29 01:15:08 sso-lab slapd[8769]: slap_queue_csn: queueing 0x7f6ee0134b20 20180129011419.835082Z#000000#001#000000
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6ed8127090 20180129003519.793613Z#000000#001#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncrepl_del_nonpresent: rid=001 be_delete inum=@!49B1.16CA.AFD9.4147!0002!3A18.DD8B,ou=appliances,o=gluu (66)
Jan 29 01:15:08 sso-lab slapd[8769]: slap_queue_csn: queueing 0x7f6ee01b89c0 20180129011419.835082Z#000000#001#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncprov_matchops: skipping original sid 001
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6ee01b89c0 20180129011419.835082Z#000000#001#000000
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6ed8108f00 20180129003519.793613Z#000000#001#000000
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6ee4168590 20180129003619.816498Z#000000#001#000000
Jan 29 01:15:08 sso-lab slapd[8769]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT
Jan 29 01:15:08 sso-lab slapd[8769]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20180129011419.835082Z#000000#001#000000;20180129011440.933006Z#000000#002#000000;20180129011439.190719Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncprov_sendresp: to=003, cookie=rid=002,sid=002
Jan 29 01:15:08 sso-lab slapd[8769]: do_syncrep2: rid=001 LDAP_RES_INTERMEDIATE - REFRESH_DELETE
Jan 29 01:15:08 sso-lab slapd[8769]: do_syncrep2: rid=003 cookie=rid=003,sid=003
Jan 29 01:15:08 sso-lab slapd[8769]: slap_queue_csn: queueing 0x7f6ed812a9b0 20180129011419.835082Z#000000#001#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncrepl_message_to_op: rid=003 tid f0ffc700
Jan 29 01:15:08 sso-lab slapd[8769]: syncrepl_op_modify: rid=003 entryCSN too old, ignoring 20180129011419.835082Z#000000#001#000000 (inum=@!49B1.16CA.AFD9.4147!0002!3A18.DD8B,ou=appliances,o=gluu)
Jan 29 01:15:08 sso-lab slapd[8769]: slap_graduate_commit_csn: removing 0x7f6ed811a990 20180128234508.526464Z#000000#003#000000
Jan 29 01:15:08 sso-lab slapd[8769]: syncrepl_message_to_op: rid=003 be_modify inum=@!49B1.16CA.AFD9.4147!0002!3A18.DD8B,ou=appliances,o=gluu (0)
```

One line, about two-thirds of the way down, stands out to me:

```
Jan 29 01:15:08 sso-lab slapd[8769]: syncrepl_del_nonpresent: rid=001 be_delete inum=@!49B1.16CA.AFD9.4147!0002!3A18.DD8B,ou=appliances,o=gluu (66)
```

Google searches around syncrepl_del_nonpresent and the glue objectClass mostly return results from roughly ten years ago, with fixes in OpenLDAP 2.3.x. Let me know if any additional logging or configs would help. Thanks!
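In case it helps anyone comparing notes: a quick way to confirm the appliance entry has been turned into a glue entry, and to see whether the nodes have diverged, is to read the entry and the contextCSN back on each node. This is only a sketch against our environment — the bind DN, LDAPS port 1636, node hostnames, and the appliance inum below are assumptions taken from a default Gluu install and the logs above; adjust them for your setup.

```shell
# Inspect the appliance entry on this node. A replication glue entry
# will show "objectClass: glue" instead of the gluuAppliance classes.
ldapsearch -H ldaps://localhost:1636 -x -D "cn=directory manager" -W \
  -b "inum=@!49B1.16CA.AFD9.4147!0002!3A18.DD8B,ou=appliances,o=gluu" \
  -s base "(objectClass=*)" objectClass entryCSN

# Compare replication state across all three nodes; mismatched
# contextCSN values indicate the servers have drifted apart.
for host in node1 node2 node3; do
  echo "== $host =="
  ldapsearch -H "ldaps://$host:1636" -x -D "cn=directory manager" -W \
    -b "o=gluu" -s base contextCSN
done
```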

By Chris Blanton user 29 Jan 2018 at 3:08 p.m. CST

Robert,

We've noticed the same issue with OpenLDAP replication and are working with Symas to find a fix for the bugs. There's no ETA yet, but we are monitoring the situation closely. Because of this, we are now recommending Gluu's fork of OpenDJ for replication in Gluu 3.1.2.

In addition, we're about to release a new tool called Cluster Manager that simplifies configuring Gluu for HA: LDAP replication, file system replication, monitoring, and more. However, this tool will only be available to organizations with an active Gluu support contract. If you'd like to discuss further, just [schedule a call with us](https://gluu.org/booking).

Thanks, Chris
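For anyone finding this thread later: with the OpenDJ fork, replication is set up with OpenDJ's standard `dsreplication` tooling rather than syncrepl. A rough sketch of enabling replication of `o=gluu` between two nodes, assuming OpenDJ's default admin port 4444 and hypothetical hostnames/passwords — check Gluu's 3.1.2 docs for the authoritative steps:

```shell
# Enable replication of o=gluu between two OpenDJ instances
# (hostnames and passwords are placeholders).
/opt/opendj/bin/dsreplication enable \
  --host1 idp1.example.org --port1 4444 \
  --bindDN1 "cn=directory manager" --bindPassword1 'secret1' \
  --replicationPort1 8989 \
  --host2 idp2.example.org --port2 4444 \
  --bindDN2 "cn=directory manager" --bindPassword2 'secret2' \
  --replicationPort2 8989 \
  --adminUID admin --adminPassword 'adminsecret' \
  --baseDN "o=gluu" --trustAll --no-prompt

# Seed the second node from the first, then verify status.
/opt/opendj/bin/dsreplication initialize-all \
  --hostname idp1.example.org --port 4444 \
  --adminUID admin --adminPassword 'adminsecret' \
  --baseDN "o=gluu" --trustAll --no-prompt
/opt/opendj/bin/dsreplication status \
  --hostname idp1.example.org --port 4444 \
  --adminUID admin --adminPassword 'adminsecret' --trustAll --no-prompt
```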

By Robert Dinga named 29 Jan 2018 at 3:31 p.m. CST

Hi Chris,

Thanks for your response. Is there documentation from Gluu on setting up replication with OpenDJ? The 3.1.1 and 3.1.2 docs reference Symas. Also, would turning off the metrics in Gluu — maxing out metricReporterInterval and turning off updateApplianceStatus — help any? I'm thinking this would cut down on updates to the appliance node.
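For reference, in case anyone tries this mitigation: the two properties named above are edited from oxTrust's JSON Configuration pages (metricReporterInterval in the oxAuth configuration, updateApplianceStatus in the oxTrust configuration). A sketch of the values, shown together here for brevity even though they live in separate configuration documents — the maxed-out interval is illustrative, so verify the keys and units against your version before applying:

```json
{
  "metricReporterInterval": 2147483647,
  "updateApplianceStatus": false
}
```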

By Robert Dinga named 30 Jan 2018 at 9:56 a.m. CST

Also, regarding the fix from Symas for OpenLDAP, is there any guess on the timing - days, weeks, months? We're looking to be production ready soon and are trying to determine if it is worth the time to switch to OpenDJ (and back again) or just wait out the fix. This is also assuming that Gluu will be dropping support for OpenDJ again sometime after this fix. Is that a safe guess or will OpenDJ continue to be supported for some time?

By Chris Blanton user 30 Jan 2018 at 11:45 a.m. CST

Robert,

We're still waiting for Symas to get back to us with a solid fix, and at this time we have no ETA. Regarding OpenDJ, we are going to continue supporting our fork as much as we can into the future. Also, the 3.1.2 docs have been updated.

By Cedric Ferraris user 27 Feb 2018 at 8:14 a.m. CST

Hi Chris,

Any update on a fix from Symas? It's been almost a month since the last post on this ticket.

Thanks