By: Helen Echebiri user 12 Dec 2024 at 3:58 p.m. CST

3 Responses
We have been experiencing service downtime on the Igree platform over the last couple of days. Even after a complete server restart we get only a brief respite; after a couple of minutes, the issue resurfaces. Also, the API is unresponsive when we attempt to reach the idsandbox oxauth and Token URLs. We would appreciate your review and guidance on the next steps. Attached are logs captured today to aid your review. Please note that the platform is intermittently inaccessible.

By Mohib Zico Account Admin 12 Dec 2024 at 9:09 p.m. CST

Hello Helen, It seems the ticket you created is `public`, so you won't be able to use the "attachment" feature here. Most probably you are not "entitled" as a "Named User" in the Gluu support system. Can you please ask someone from your "Application Support" team to create a new ticket? [ Screenshot attached ]. In the meantime, we are re-checking the log you shared by email.

By Mohib Zico Account Admin 18 Dec 2024 at 11:22 p.m. CST

Hello again,

We have analyzed your logs, and there seem to be a couple of possible causes of this intermittent downtime:

- Infrastructure issues:
  - Resource contention (e.g., memory, CPU, or I/O bottlenecks).
  - Network instability affecting API endpoints or backend services (LDAP, database).
- Cache and session management:
  - Inefficient cache usage, leading to frequent lookups in the database or LDAP directory, causing delays or overload.
- LDAP and persistence:
  - Null LDAP responses and delays in fetching user data could cause failures in authentication and token issuance.
- High request volume:
  - Increased API traffic during specific periods could exacerbate the above issues, causing system-wide instability.

I think the server is facing the [same](https://support.gluu.org/outages/11796/inaccessibility-of-the-gluuigree-application/#at89469) issue again. Can you please share the following:

- What is the output of `dsreplication status`?
- When the server is down, which process is taking the most CPU and memory?
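To help answer the question about which process is using the most CPU and memory, here is a minimal sketch that could be run on a node while the outage is happening. It assumes a Linux host where `ps -eo pid,pcpu,pmem,comm` is available; the function names `parse_ps`, `top_by`, and `snapshot` are hypothetical helpers for illustration, not part of any Gluu tooling.

```python
import subprocess


def parse_ps(output):
    """Parse the text printed by `ps -eo pid,pcpu,pmem,comm` into a list of dicts."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)  # command name may contain spaces
        if len(parts) < 4:
            continue  # ignore malformed or truncated rows
        pid, cpu, mem, comm = parts
        rows.append({"pid": int(pid), "cpu": float(cpu),
                     "mem": float(mem), "comm": comm.strip()})
    return rows


def top_by(rows, key, n=5):
    """Return the n rows with the highest value for key ('cpu' or 'mem')."""
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]


def snapshot():
    """Run ps once and return the parsed process table (assumes procps-style ps)."""
    out = subprocess.run(["ps", "-eo", "pid,pcpu,pmem,comm"],
                         capture_output=True, text=True, check=True).stdout
    return parse_ps(out)
```

Calling `top_by(snapshot(), "cpu")` and `top_by(snapshot(), "mem")` during an outage, and again after a restart, would give two lists that can be pasted into the ticket for comparison.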

By Aliaksandr Samuseu staff 20 Dec 2024 at 8:38 a.m. CST

Hi. We've tried to reach the NIBSS team by email regarding the follow-up call we agreed to have today, but have had no answer so far. Let us know whether you still need to have it today, or when you would like to have it if it has to be another day. As a quick update, we haven't been able to find any related errors in the logs captured during the previous call. The failure we observed back then apparently happened before the request even reached oxAuth. As these failures seem to coincide with connection reset errors appearing in the network traces, our assumption is that they are caused either by network issues or by resource depletion of some sort (the latter makes even more sense, as you mentioned that restarts make the issue disappear for some time). Things like file descriptor limits need to be checked; we also need to double-check that Apache on the Gluu nodes and nginx on the LB node are healthy and able to process HTTP requests (check their logs for any errors for a start).
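One way to check the file descriptor limits mentioned above is to compare a process's open descriptor count with its "Max open files" limit. The sketch below assumes a Linux node with `/proc` mounted and enough privileges to read another process's `/proc/<pid>/fd`. The helper names and the 80% warning threshold are illustrative assumptions, not Gluu recommendations.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' limits from /proc/<pid>/limits text.

    A value of 'unlimited' is returned as None.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            to_int = lambda v: None if v == "unlimited" else int(v)
            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' row found")


def fd_report(pid, warn_ratio=0.8):
    """Compare a process's open descriptors with its soft limit (Linux only).

    warn_ratio is an arbitrary threshold chosen for this sketch.
    """
    with open(f"/proc/{pid}/limits") as fh:
        soft, hard = parse_max_open_files(fh.read())
    in_use = len(os.listdir(f"/proc/{pid}/fd"))  # needs root or the same user
    near_limit = soft is not None and in_use >= warn_ratio * soft
    return {"pid": pid, "in_use": in_use, "soft": soft,
            "hard": hard, "near_limit": near_limit}
```

Running `fd_report(pid)` against the Apache, nginx, and oxAuth process IDs shortly before the issue resurfaces would show whether any of them is approaching its limit.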