Today's infra outage

Hi *,

a small update on the infra outage today (affecting bugzilla and
askbot mostly, and for some time also other services):

The cause has been identified (gluster's self-heal failed after a
hypervisor went down) and infra is working on restoring the remaining
services.
To avoid creating further inconsistencies, that might still take until
later tonight.

Keep an eye on
https://status.documentfoundation.org/ for further information and updates.

ciao
Christian

Hi *,

a small update on the infra outage today (affecting bugzilla and
askbot mostly, and for some time also other services):

bugzilla and askbot have been restored.
Unfortunately, bugzilla's db had to be restored from backup, so not all
comments/updates to bugs are reflected yet (a window of ~11 hours
between the backup and the beginning of the outage, around 80 updates).
They'll be restored from the mail notifications tomorrow.

ciao
Christian

Hello,

Christian Lohmaier wrote:

a small update on the infra outage today (affecting bugzilla and
askbot mostly, and for some time also other services):

bugzilla and askbot have been restored.
Unfortunately, bugzilla's db had to be restored from backup, so not all
comments/updates to bugs are reflected yet (a window of ~11 hours
between the backup and the beginning of the outage, around 80 updates).
They'll be restored from the mail notifications tomorrow.

thanks a lot for all your work on dealing with the issues. I know that Guilhem had already had very little sleep the night before due to maintenance, and that you just returned from vacation yesterday, so thanks a lot to both of you!

When we're fully back in service, let's sit together and brainstorm about the causes and what we can do about them. There are several ideas on the table, but let's discuss this after both of you have had some sleep and the last cleanup bits are done.

Thanks a lot!
Florian

Hi *,

[…]
Unfortunately, bugzilla's db had to be restored from backup, so not all
comments/updates to bugs are reflected yet (a window of ~11 hours
between the backup and the beginning of the outage, around 80 updates).
They'll be restored from the mail notifications tomorrow.

Update on that: This has now been done as well, except for a handful
of issues that were created during that period (and whose id was
taken by another bug that was filed after bugzilla was restored).
Those are being refiled right now.

ciao
Christian

Hi Cloph,

Christian Lohmaier wrote:

Update on that: This has now been done as well, except for a handful
of issues that were created during that period (and whose id was
taken by another bug that was filed after bugzilla was restored).
Those are being refiled right now.

thanks for the update and the work behind the scenes!

As a follow-up for the community: we have also discussed some first ideas about what the cause of the problems is (i.e. why Gluster got out of sync) and how to address that in the future. We'll send a follow-up to the lists soon and also add it to the infra call's agenda.

Sorry for the inconvenience, and thanks to everyone for their patience!
Florian