Server outage 2023-10-23

Started by Danihel Txechescu, October 23, 2023, 09:30:53 PM

Danihel Txechescu

Azul.

The server suffered an unexpected outage today, 2023-10-23, from about 19:28 to 20:26 Talossan Time, roughly one hour. Only the web server was affected, which currently hosts Wittenberg, the main web space, the Wiki, and other web services. Email remained operational throughout, and the backups are safe.

The cause of the disruption was a hard drive filling up to 100%. This should never happen; it happened because disk usage was not being actively monitored. To prevent such a problem from occurring again, a dedicated monitor on hard disk utilization has now been put in place.
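A disk utilization check of this kind can be quite small. The following is only an illustrative sketch, not the monitor that was actually deployed; the 80%/90% thresholds, the function names, and the checked path are all assumptions.

```python
import shutil


def classify(percent, warn_at=80.0, crit_at=90.0):
    """Map a usage percentage to a status label (thresholds are assumptions)."""
    if percent >= crit_at:
        return "CRITICAL"
    if percent >= warn_at:
        return "WARNING"
    return "OK"


def check_disk(path="/", warn_at=80.0, crit_at=90.0):
    """Return (status, percent_used) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    percent = 100.0 * usage.used / usage.total
    return classify(percent, warn_at, crit_at), percent


if __name__ == "__main__":
    status, percent = check_disk("/")
    print(f"{status}: / is {percent:.1f}% full")
```

Run periodically (for example from a scheduler) and wired to a notification, a check like this would raise a warning well before the disk reaches 100%.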

I apologize for the inconvenience.

Danihel Txechescu
MinTech

Miestră Schivă, UrN

I thank the Minister of Technology for his attention. There is no doubt - and this is backed up by what I've heard from the PermSecBackEnd - that the MinTech is responsibly administering our web presence on his private storage.

The thing is, the MinTech should not be administering our web presence on his private storage. Our national web presence should be administered redundantly on a server to which other members of the Government or the Civil Service can get access in emergencies, and which can be transferred to another administrator seamlessly in case of a change of Government, or Danihel just getting bored.

A whole series of reforms was carried out over the last five years precisely to take administration of Witt, the national webspace, and our domain names out of the hands of private individuals (the King/Hool) and put them in public administration. But as we all know, DoRoyal has disappointed us as a host (we really shouldn't have taken Guy Incognito's recommendation, huh?), and there have been problems transferring the domain names because for some reason cxhn. Perþonest hasn't been talking to the Government.

The TNC have expressed their desire to implement this principle in one particular area - the Database, which should be replaced by something that is not the property of a private citizen. I urge the incoming Government - of whatever ideological persuasion - to apply it consistently, as previous Governments tried to. Put an end to this "half-in-half-out" situation with DoRoyal and find a new, permanent webhost; set up a Database replacement on it; and create redundant structures so that our online presence no longer relies on the goodwill or presence of particular individuals.

PROTECT THE ORGLAW FROM POWER GRABS - NO POLITICISED KING! Vote THE FREE DEMOCRATS OF TALOSSA
¡LADINTSCHIÇETZ-VOI - rogetz-mhe cacsa!
"IS INACTIVITY BAD? I THINK NOT!" - Lord Hooligan

Danihel Txechescu

Quote from: Miestră Schivă, UrN on October 24, 2023, 03:13:16 PM
[...] Our national web presence should be administered redundantly on a server to which other members of the Government or the Civil Service can get access in emergencies, and can be transferred to another administrator seamlessly in case of a change of Government, or Danihel just getting bored. [...]

Just to make this clear, @Sir Lüc has complete access to that server, to the backups, and to instructions on how to replicate the service with any other provider. This is not my personal server; it's a server I happened to set up for Talossa.

If I get run over by a plane or I happen to be in a bus crash, I trust Lüc will be able to do exactly what I did to keep the services running.

Sir Lüc

I should say publicly, as I did to Miestră privately, that Dan has my full confidence and has done an excellent job since taking office. The gray area around DoRoyal and the alternate hosting is not ideal, but I'm sure that can be fixed over the course of the incoming term.
Sir Lüc da Schir, UrB
Directeur Sportif, Gordon Hiatus Support Team

In my free time:
Túischac'h dal Cosă / Speaker of the Cosa
Wittmeister & Permanent Secretary of Backend Admin / Secretar Parmanint per l'Aðmistraziun del Backend
Deputy Scribe of Abbavilla / Distain Grefieir d'Abbavillă

Danihel Txechescu

Quote from: Danihel Txechescu on October 23, 2023, 09:30:53 PM
Azul.

The server underwent an unexpected service discontinuity today, 2023-10-23, at about 19:28 Talossan Time, which lasted for about 1 hour, until 20:26 Talossan Time. The affected services were only the web server, which currently hosts Wittenberg, the main web space, the Wiki, and other web services; email remained operational throughout the entire time; backups are still safe.

The nature of this disruption was a hard drive getting filled up to 100%. This should never happen; why this happened was because it was not actively monitored. To prevent such a problem from occurring again, a dedicated lookup/monitor on hard disk utilization has now been put in place.

I apologize for the inconvenience.

Danihel Txechescu
MinTech

Today the server had exactly the same kind of problem, even with monitoring in place (though the monitoring is not as active during the weekend). The only explanation I can think of is a recent WordPress upgrade that changed how things work with the database service.

The cause of these outages is the system running out of available scratch space because binary logs are being saved aggressively. While these logs are a good safety net, we have other kinds of database backups, so I will have this feature disabled entirely, as it's causing too much pain.
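To get a feel for how much scratch space the binary logs are actually holding, one could total up their file sizes. This is a rough sketch only: the directory `/var/lib/mysql` and the file name pattern `binlog.*` are assumptions based on a typical MySQL-style setup, and the real location depends on how the database server here is configured.

```python
from pathlib import Path


def total_size_bytes(directory, pattern):
    """Sum the sizes of regular files in `directory` whose names match `pattern`."""
    root = Path(directory)
    if not root.is_dir():
        return 0  # nothing to count if the directory is missing
    return sum(p.stat().st_size for p in root.glob(pattern) if p.is_file())


if __name__ == "__main__":
    log_dir = "/var/lib/mysql"   # assumption: hypothetical data directory
    log_pattern = "binlog.*"     # assumption: hypothetical binary log naming
    size = total_size_bytes(log_dir, log_pattern)
    print(f"Binary logs in {log_dir}: {size / 1024**2:.1f} MiB")
```

Comparing this figure against the free space on the disk shows quickly whether the logs are the main consumer.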

Apologies again for this outage. This should not happen again once this feature is disabled. We'll see in two weeks' time.

Danihel Txechescu