Multiple Atlassian services are experiencing issues

Incident Report for Admin Experience

Postmortem

All dates and times below are in UTC unless stated otherwise.

Summary

On May 8, 2026 between 00:22 and 06:08, one of our hosting providers suffered a significant incident in a specific availability zone in prod-east, which led to Atlassian customers experiencing degraded performance and delays to background operations and automation execution.

The incident started on May 8, 2026 at 00:22 and was detected within 4 minutes by automated monitoring systems. Our teams worked to restore core access by 06:08. Final cleanup of backlogged processes and minor issues progressed in stages from there and was completed by 19:15.

IMPACT

The primary infrastructure affected in this incident was the event processing pipeline in the prod-east region, which distributes events between Atlassian services and underpins background operations such as automation execution, search indexing, notifications, and permission synchronisation.

Between 00:22 and 06:08, an infrastructure incident in our hosting provider triggered an ingestion failure in our event processing pipeline.
At 02:50, event ingestion was failed over to an unaffected availability zone, progressively restoring live event flows.
At 06:08, reliability for new ingestion in prod-east recovered to 100%. The remaining work was to drain the accumulated cross-region backlog of messages, which completed by 17:00.
By 18:48, Automation had processed its backlog of events that were created during the impact window.
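
As a worked illustration of the timeline above, the following minimal sketch computes how long after the start of the incident each milestone was reached. The timestamps are taken from this report; the function and variable names are illustrative assumptions, not part of any Atlassian tooling.

```python
from datetime import datetime, timezone


def utc(hhmm: str) -> datetime:
    """Build a UTC datetime on May 8, 2026 from an HH:MM string."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2026, 5, 8, hour, minute, tzinfo=timezone.utc)


# Start of impact, as stated in this report.
START = utc("00:22")

# Milestones from the timeline above (names are illustrative labels).
MILESTONES = {
    "ingestion failed over": utc("02:50"),
    "new ingestion fully recovered": utc("06:08"),
    "cross-region backlog drained": utc("17:00"),
    "Automation backlog processed": utc("18:48"),
}


def elapsed_minutes(milestone: str) -> int:
    """Whole minutes between the start of impact and the given milestone."""
    delta = MILESTONES[milestone] - START
    return int(delta.total_seconds() // 60)


def format_elapsed(minutes: int) -> str:
    """Render a minute count as 'Hh MMm'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h {mins:02d}m"


for name in MILESTONES:
    print(f"{name}: {format_elapsed(elapsed_minutes(name))} after start")
```

Under these figures, live event flows began recovering roughly two and a half hours into the incident, while the backlog work accounted for most of the remaining time.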

Automation

Between 00:22 and 02:50, customers with automation rules triggered by events originating from the prod-east region experienced a significant reduction in rule executions. During this window, event-triggered automation rules were not firing because the events that trigger them were not being delivered. Rule authoring, saving, and rules triggered manually, by schedules, or by webhooks were not affected.

At 02:50, the event processing infrastructure failed over to an unaffected availability zone, restoring delivery of live events to Automation and allowing new event-triggered rules to begin executing normally. However, events generated during the impact window still needed to be replayed before delayed automations could be processed.

Beginning at 08:28, upstream services replayed their queued events in a coordinated sequence, and all replayed events were processed by 18:48. During the replay window, customers may have experienced automation rules executing later than expected, a small number of rules reaching daily processing limits due to compressed replay, and time-sensitive rules not completing as expected if internal timeout thresholds were exceeded.
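
To illustrate why a compressed replay can push a rule into its daily processing limit, the sketch below counts executions per rule while a backlog is replayed in a single pass. It is a simplified model under stated assumptions, not a description of how Automation is implemented: the `replay` function, the `daily_limit` value, and the rule identifiers are all hypothetical, and this report does not state what happens to triggers that exceed a limit, so the sketch simply sets them aside.

```python
from collections import Counter


def replay(events, daily_limit, already_run=None):
    """Split replayed triggers into (executed, over_limit) lists.

    events: rule IDs, one per queued trigger, in replay order.
    daily_limit: hypothetical per-rule cap on executions per day.
    already_run: optional mapping of rule ID to executions already
        counted today before the replay started.
    """
    used = Counter(already_run or {})
    executed, over_limit = [], []
    for rule_id in events:
        if used[rule_id] < daily_limit:
            # Still under the cap: count and execute this trigger.
            used[rule_id] += 1
            executed.append(rule_id)
        else:
            # Cap reached: the trigger is set aside (assumed behaviour).
            over_limit.append(rule_id)
    return executed, over_limit


# Hours of queued triggers arrive within one replay window, so they all
# count against the same day's allowance for each rule.
backlog = ["rule-a", "rule-a", "rule-a", "rule-b"]
executed, over_limit = replay(backlog, daily_limit=2, already_run={"rule-a": 1})
print("executed:", executed)
print("over limit:", over_limit)
```

In this model, a rule that would normally spread its executions across a longer period exhausts its allowance once the queued triggers are delivered together.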

Jira and Jira Service Management

Between 00:22 and 02:50, customers with tenants hosted in the prod-east region experienced disruption to Jira and Jira Service Management event-driven features like automation, along with a short period of elevated errors during infrastructure failover. Core Jira experiences, including issue view, boards, and project navigation, remained available throughout the incident.

Jira event delivery was affected by the primary impact, preventing downstream services from receiving issue lifecycle events. This affected automation rules triggered by Jira events, AI agent orchestration in Jira, notifications for issue updates and transitions, search indexing for newly created or modified issues, and event-driven integrations between Jira and other Atlassian products.

At 02:50, the event processing infrastructure failed over to an unaffected availability zone, restoring delivery of new events. All events generated during the impact window were retained in a recovery queue and required replaying. This began at 08:28 and completed at 12:00. During the replay window, customers may have experienced automation rules executing later than expected, delayed notifications arriving hours after the triggering action, temporary gaps in search results for content created or modified during the impact window, and AI agent workflows not completing as expected where internal timeout thresholds were exceeded.

Confluence

Between 00:22 and 02:50, customers with tenants hosted in the prod-east region experienced disruptions to event-driven services in Confluence. This resulted in delays to search indexing, notifications, automation rule execution, and permission synchronisation.

The underlying event processing infrastructure failed over to an unaffected availability zone, after which live Confluence operations resumed normally. However, events generated during the impact window were queued for replay, and some background services remained delayed until that replay and related validation work completed.

Between 10:14 and 17:00, a bulk replay of all the queued tenant replay tasks was completed to restore data consistency. During and immediately after the replay window, customers may have experienced search results not reflecting content created or modified during the outage, delayed or missing notifications for page and comment activity, automation rules firing later than expected, and brief delays in permission synchronisation for tenants relying on incremental identity sync.

Bitbucket and Pipelines

Between 00:22 and 06:08, customers using Bitbucket and Pipelines experienced failures and degraded functionality across event-driven workflows. Core Git operations, including push, pull, and clone, were not affected and continued to operate normally throughout the incident.

Automatic pipeline triggers initiated by push or pull request events were unavailable during the impact window. Merge queues, custom merge checks, Forge-based triggers, workspace permission changes, and some workspace provisioning flows were also affected. Customers using merge queues were unable to merge pull requests, and some pipeline steps failed because the backlog of queued work caused concurrency limits to be reached.

At approximately 03:57, Pipelines was reconfigured to consume events through an alternative path, restoring automatic pipeline triggering. Merge queues, custom merge checks, Forge triggers, and other affected workflows were progressively restored as the underlying event processing infrastructure recovered. All Bitbucket and Pipelines services were confirmed fully operational by 06:08. After recovery, queued events were reviewed and replayed where safe to restore data consistency for billing, audit logging, and other background processes.

Identity Services

Between 00:22 and 02:50, customers with tenants hosted in the prod-east region experienced delays in the propagation of identity and group membership changes to downstream Atlassian products. Core identity operations, including authentication, login, and direct group management actions, were not affected and continued to function normally throughout the incident.

The impact was limited to asynchronous, event-driven operations that depend on the event processing pipeline. This included delays in delivering group membership and user profile changes to products such as Jira and Confluence, which affected downstream permission synchronisation and crowd sync flows. A small number of SCIM-based identity synchronisation and site provisioning workflows also experienced temporary delays.

After the event processing infrastructure recovered, backed-up identity and group directory events were replayed where required, restoring downstream consistency for affected products. No identity data was lost. Group membership changes, user profile updates, and provisioning-related events that occurred during the impact window were retained and processed after recovery.

REMEDIAL ACTIONS PLAN & NEXT STEPS

We know outages impact your productivity. While our monitoring and recovery processes helped us respond quickly, this incident highlighted opportunities to further strengthen resilience for event-driven services.

We are prioritizing improvements that will:

Enhance failover coverage so critical event processing can recover more smoothly during infrastructure disruptions.
Strengthen recovery handling so replayed events can be processed more quickly.

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

Thanks,

Atlassian Customer Support.

Posted May 20, 2026 - 06:49 UTC

Resolved

On May 8, 2026, some customers utilizing Atlassian products experienced elevated error rates and degraded performance. The issue has now been resolved, and the service is operating normally for all affected customers.

Posted May 08, 2026 - 19:45 UTC

Update

Our services are now fully operational. We continue to replay any events that were missed during the incident, and are making good progress. We will provide a further update once replays are complete. If you experience any ongoing issues, please contact our support team.

We apologise for the disruption and thank you for your patience.

Posted May 08, 2026 - 11:12 UTC

Update

We continue to monitor the situation as services recover. We are currently in the process of clearing the backlog of queued events. We will provide a further update in approximately one hour.

Posted May 08, 2026 - 08:03 UTC

Monitoring

The underlying issue in public infrastructure which affected asynchronous event processing has been mitigated and all the affected services are recovering. We are now working on clearing the backlog of queued events, which means some actions (such as notifications, automation triggers, and data syncs) may be in a degraded state. We will continue to monitor and provide updates as the backlog is cleared.

Posted May 08, 2026 - 05:53 UTC

Update

We are continuing to work with our public cloud provider to mitigate this issue. We are starting to see some recovery in regions outside of Eastern USA; however, users globally may still be experiencing issues with certain product features. These are listed at the bottom of each product page.

Posted May 08, 2026 - 04:10 UTC

Update

Our teams continue to work on mitigating the infrastructure outage from our public cloud provider. We will provide further updates when they are available.

Posted May 08, 2026 - 04:10 UTC

Update

We have identified that the root cause of the issue is related to an infrastructure outage from our public cloud provider. We are working closely with them to mitigate this issue. We will provide further updates when they become available.

Posted May 08, 2026 - 04:10 UTC

Update

Our teams continue to work on mitigating the infrastructure outage from our public cloud provider. We will provide further updates when they are available.

Posted May 08, 2026 - 03:00 UTC

Update

Posted May 08, 2026 - 03:00 UTC

Identified

Posted May 08, 2026 - 02:15 UTC

Investigating

We are experiencing issues with multiple Atlassian products. Our teams are investigating further and more updates will be shared within 1 hour.

Posted May 08, 2026 - 01:31 UTC

This incident affected: User Management, Org Management, Access & Security, Custom Domains, Admin Navigation & Discovery, Billing & Provisioning, Atlassian Intelligence (AI), Compliance & Audit, and API Gateways.