By: Marlon Martínez user 10 Jan 2023 at 6:12 p.m. CST

6 Responses
We are facing some issues with the Shibboleth SAML IDP service on our Gluu QA installation. We followed the post-installation instructions to install the service, and everything was fine until we started experiencing server outages every 4-5 days.

To run some tests, we recently installed a brand-new Gluu Community Server and enabled the Shibboleth SAML IDP at the beginning of the installation. The installation went fine and the server responded very well, with no outages, until we migrated all our users and OpenID clients from our QA environment to this new server to run login tests and see how it worked. Now we are experiencing the same issue we have on our QA server: server outages every 4-5 days.

It may be important to mention that the users were migrated by exporting their LDAP entries from our QA environment and importing them into the new server with Apache Directory Studio; the OpenID clients were created as usual through the oxTrust panel. One thing we noticed is that the FREE MEMORY value decreased after the data migration.

Checking the log files, we found these issues:

`/opt/gluu/jetty/oxauth/logs/oxauth-2022-12-01-2.log`

    2022-12-01 15:26:21,236 ERROR [Thread-67841] [org.gluu.oxauth.service.CleanerTimer] (CleanerTimer.java:217) - Failed to perform clean up.

`/opt/gluu/jetty/oxauth/logs/2022_12_01.jetty.log`

    2022-12-01 21:10:36,214 ERROR [qtp966739377-18] [org.gluu.oxauth.exception.GlobalExceptionHandler] (GlobalExceptionHandler.java:50) - Committed
    2022-12-01 21:10:36,241 ERROR [qtp966739377-18] [org.gluu.oxauth.exception.GlobalExceptionHandler] (GlobalExceptionHandler.java:69) - Can't perform redirect to viewId: /error_service.htm
    2022-12-01 21:11:09,957 ERROR [qtp966739377-18] [org.gluu.oxauth.uma.service.UmaRptService] (UmaRptService.java:121) - Failed to find entry: tknCde=5194b5772258bd4193740f5ad36a564ecbe914fe4d043c09a76f523f04eab239,ou=uma_rpt,ou=tokens,o=gluu
    2022-12-01 21:12:10,049 ERROR [qtp966739377-18] [org.gluu.oxauth.uma.service.UmaRptService] (UmaRptService.java:121) - Failed to find entry: tknCde=5194b5772258bd4193740f5ad36a564ecbe914fe4d043c09a76f523f04eab239,ou=uma_rpt,ou=tokens,o=gluu
    2022-12-01 21:13:10,075 ERROR [qtp966739377-18] [org.gluu.oxauth.uma.service.UmaRptService] (UmaRptService.java:121) - Failed to find entry: tknCde=5194b5772258bd4193740f5ad36a564ecbe914fe4d043c09a76f523f04eab239,ou=uma_rpt,ou=tokens,o=gluu
    2022-12-01 21:14:10,145 ERROR [qtp966739377-19] [org.gluu.oxauth.uma.service.UmaRptService] (UmaRptService.java:121) - Failed to find entry: tknCde=5194b5772258bd4193740f5ad36a564ecbe914fe4d043c09a76f523f04eab239,ou=uma_rpt,ou=tokens,o=gluu
    2022-12-01 21:15:10,223 ERROR [qtp966739377-20] [org.gluu.oxauth.uma.service.UmaRptService] (UmaRptService.java:121) - Failed to find entry: tknCde=5194b5772258bd4193740f5ad36a564ecbe914fe4d043c09a76f523f04eab239,ou=uma_rpt,ou=tokens,o=gluu

`/opt/gluu/jetty/oxauth/logs/2022_12_01.jetty.log`

    2022-12-01 23:35:17,912 INFO [qtp966739377-19] [org.gluu.oxauth.auth.AuthenticationFilter] (AuthenticationFilter.java:473) - JWT authentication failed: {}

`/opt/gluu/jetty/scim/logs/scim.log`

    01-12 15:26:21.212 ERROR [Thread-25778] oxtrust.service.init.ConfigurationFactory ConfigurationFactory.java:50- Failed to load configuration from LDAP

Could you please help us understand what is happening?

Thank you for your support!

Regards,
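When a log excerpt like the one above grows long, it can help to see which components fail most often. The following is a minimal Python sketch, assuming the oxAuth log line layout quoted above (timestamp, level, `[thread]`, `[logger]`); the function name `count_by_logger` is hypothetical and not part of Gluu. The scim.log entry uses a different layout and is deliberately skipped.

```python
import re
from collections import Counter

# Matches the oxAuth log layout quoted above, for example:
# 2022-12-01 21:11:09,957 ERROR [qtp966739377-18] [org.gluu.oxauth.uma.service.UmaRptService] (...)
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<thread>[^\]]+)\] \[(?P<logger>[^\]]+)\]"
)


def count_by_logger(lines, level="ERROR"):
    """Count log lines of the given level per logger (simple class name).

    Lines in other layouts (e.g. the scim.log entry) and stack-trace
    continuation lines do not match and are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("level") == level:
            # "org.gluu.oxauth.uma.service.UmaRptService" -> "UmaRptService"
            counts[m.group("logger").rsplit(".", 1)[-1]] += 1
    return counts
```

Passing an open file object for `2022_12_01.jetty.log` and printing `.most_common()` would, for the excerpt above, put the repeated `UmaRptService` lookups (one per minute for the same token) at the top, followed by `GlobalExceptionHandler`.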

By Mohib Zico staff 10 Jan 2023 at 9:21 p.m. CST

Hi,

> Maybe it is important to let you know that the users were migrated by extracting the users ldap entries from our QA environment and importing them to the new server using Apache Directory Studio,

How about pulling the users' information from the old server into the new server with "Cache Refresh"? Also, what type of deployment is it: a cluster or a single-instance server?

By Marlon Martínez user 11 Jan 2023 at noon CST

Hi Mohib,

Thank you for your reply! Both servers (the previous one and the brand-new one) are single-instance servers with the same resources. We don't know whether our instance resources are insufficient or why we are experiencing these server outages. We have not tried the Cache Refresh method yet.

Regards

By Mohib Zico staff 16 Jan 2023 at 9:27 a.m. CST

Hello Marlon,

> We don't know if our instance resources are insufficient or why are we experiencing these server outages.

I believe you have been using Gluu Server for quite some time, so I didn't bother asking you about your resources. :-) Anyway:

- How much CPU, memory, and disk space do you have?
- What does your user base look like? How many active users do you have?
- How many authentication requests per minute or per second do you get during peak hours?
- Can you please check the value of `cleanServiceInterval`?

By Marlon Martínez user 16 Jan 2023 at 2:13 p.m. CST

Hi Mohib,

Thank you for your response! I should mention that this issue is in our QA environment. We will enable and set up an SSO integration for our PRODUCTION environment in the next few months, which is why we installed the Shibboleth IDP service in our QA environment first to begin our tests. Our main concern is running into this same issue in our PRODUCTION environment once we install the Shibboleth IDP service there.

These are the details of our QA and PRODUCTION environments:

**QA**

- CPU: 2
- Memory: 4GB
- Disk space: 60GB
- Active users: 14 (the users who continually log in for testing)
- Authentication request rate: 5 authentication requests per hour
- `cleanServiceInterval`: oxTrust 300, oxAuth 60
- Swap enabled

**PROD**

- CPU: 4
- Memory: 16GB
- Disk space: 60GB
- Active users: 5,000+
- Authentication request rate: 500+ authentication requests per hour
- `cleanServiceInterval`: oxTrust 300, oxAuth -1
- Swap enabled

Thank you for all your help!

Regards,
Marlon
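The question asked for rates per minute or per second at peak, while the figures above are hourly averages. A quick conversion in Python puts them on the requested scale; the helper names are illustrative only. Hourly averages can hide much higher short-term peaks, so these numbers are a lower bound on peak load, not a measurement of it.

```python
def per_minute(per_hour: float) -> float:
    """Convert an hourly request count to an average per-minute rate."""
    return per_hour / 60


def per_second(per_hour: float) -> float:
    """Convert an hourly request count to an average per-second rate."""
    return per_hour / 3600


# Hourly figures quoted above (QA: 5/h, PROD: 500+/h).
for env, rate in {"QA": 5, "PROD": 500}.items():
    print(f"{env}: {rate}/h ~ {per_minute(rate):.2f}/min ~ {per_second(rate):.3f}/s")
```

For PROD this averages to roughly 8.3 requests per minute (about 0.14 per second).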

By Mohib Zico staff 16 Jan 2023 at 8:59 p.m. CST

Memory and CPU for QA are certainly low. You should increase memory to at least 8 GB of physical memory. The `cleanServiceInterval` property looks good (I am a little skeptical about Prod's value: it's negative, and a negative value means the cache cleaning service is stopped).
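To audit this setting across environments, here is a minimal Python sketch that reads a JSON configuration export and reports whether the cleaner is stopped. It assumes the export is a JSON object with a top-level `cleanServiceInterval` key and that the value is in seconds; both are assumptions, and `cleaner_status` is a hypothetical helper, not a Gluu function. It encodes only the rule stated above (negative means stopped) and does not guess what 0 means.

```python
import json


def cleaner_status(config_json: str) -> str:
    """Summarize the cleanServiceInterval setting in a JSON configuration export.

    Hypothetical helper; assumes a top-level "cleanServiceInterval" key
    whose value is in seconds.
    """
    cfg = json.loads(config_json)
    interval = cfg.get("cleanServiceInterval")
    if interval is None:
        return "cleanServiceInterval not set"
    if interval < 0:
        # Per the reply above: a negative value means the cleaning service is stopped.
        return f"cleaner stopped (cleanServiceInterval={interval})"
    if interval == 0:
        # The thread does not say what 0 means, so do not guess.
        return "cleanServiceInterval=0 (semantics not covered here)"
    return f"cleaner runs every {interval} s"
```

With the values from this thread, QA oxAuth (`60`) and both oxTrust instances (`300`) report a running cleaner, while PROD oxAuth (`-1`) reports it as stopped.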

By Marlon Martínez user 17 Jan 2023 at 4:28 p.m. CST

But do you think the memory and CPU values are the reason for these outages? I think the `cleanServiceInterval` property is negative in our PRODUCTION environment due to performance tuning.
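One way to test whether memory is behind the outages (and the drop in FREE MEMORY noted in the first post) is to record available memory over several days and line it up with the outage times. Below is a minimal Python sketch, assuming a Linux host where the kernel exposes `/proc/meminfo` with `MemTotal`, `MemAvailable`, and `SwapFree`; the function names are hypothetical. It observes operating-system memory only, not the JVM heap of the Gluu services.

```python
import time
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_percent(info: dict) -> float:
    """MemAvailable as a percentage of MemTotal."""
    return 100.0 * info["MemAvailable"] / info["MemTotal"]


def sample(path: str = "/proc/meminfo", interval_s: int = 300, count: int = 3) -> None:
    """Print a timestamped memory reading every interval_s seconds, count times."""
    for i in range(count):
        info = parse_meminfo(Path(path).read_text())
        print(
            time.strftime("%Y-%m-%d %H:%M:%S"),
            f"MemAvailable={info['MemAvailable']} kB",
            f"({available_percent(info):.1f}% of MemTotal)",
            f"SwapFree={info.get('SwapFree', 0)} kB",
            flush=True,
        )
        if i < count - 1:
            time.sleep(interval_s)  # no sleep after the last reading
```

For example, `sample(interval_s=300, count=1440)` takes one reading every 5 minutes for about five days, roughly the observed interval between outages. A steadily shrinking `MemAvailable` together with falling `SwapFree` before each outage would point toward memory pressure; a flat curve would suggest looking elsewhere.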