Site-Related => News => Topic started by: Spectere on November 23, 2015, 08:25:51 PM

Title: Another Server Outage
Post by: Spectere on November 23, 2015, 08:25:51 PM
Remember how I was planning to migrate to a different distro? Yeah, well, I got motivated to do that again. Why? Because systemd is literally one of the worst things to happen to Linux and I want it out of my life.

So the site went down some time today. Dunno when. Why don't I know? I'll get to that in a sec, don't you worry. What's important is what happened: several services died spontaneously. I tried to restart them. This happened:


What? Uh. Okay. Couldn't get any insight from the logs or anything, so I did what I do every time systemd packs a mental: I rebooted the server (because everyone loved it when Windows 95 required a reboot to do...well, anything...right?).

Then...nothing. No httpd and, more importantly, no sshd. Fortunately, Linode is awesome and has a little thing called Lish that lets me console into the server via SSH. I connect to that, log in, get root, etc etc.

Nothing is started. sshd, httpd, mysqld are all stopped. I don't even have a network interface.


I check the logs. Nothing. The binary logs that systemd insists on using are corrupt and unreadable, so I have no idea what happened. Why does it have binary logs? Because whichever idiot "designed" it has never had to deal with a system failure before. Because they apparently think that the Windows Event Viewer system is super-convenient and shit. Because it's apparently not designed for bloody servers.

So I quickly restart everything. Fortunately, it all comes up. Hopefully "systemctl enable" actually starts working so that I can, y'know, reboot the server without it (and me) having a complete panic attack.
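
For anyone wanting to check this before the next reboot rather than after it, here is a minimal sketch, not what was actually run on the server. It asks `systemctl is-enabled` and `systemctl is-active` about each service and lists the ones that look wrong. The unit names are assumptions based on the services named above, and `boot_report` and `needs_attention` are hypothetical helper names.

```python
import subprocess

# Assumed unit names for the services mentioned above; adjust to the distro.
UNITS = ["sshd.service", "httpd.service", "mysqld.service"]


def systemctl_state(verb, unit):
    """Return the one-word state printed by `systemctl <verb> <unit>`."""
    result = subprocess.run(
        ["systemctl", verb, unit],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    return result.stdout.strip() or "unknown"


def boot_report(units, query=systemctl_state):
    """Map each unit to its (is-enabled, is-active) states."""
    return {u: (query("is-enabled", u), query("is-active", u)) for u in units}


def needs_attention(report):
    """Units that are not plainly 'enabled' or not 'active' right now."""
    return sorted(
        unit
        for unit, (enabled, active) in report.items()
        if enabled != "enabled" or active != "active"
    )


if __name__ == "__main__":
    for unit in needs_attention(boot_report(UNITS)):
        print("check:", unit)
```

The `query` parameter only exists so the logic can be exercised without a live systemd. Note that this flags any state other than plain "enabled" (for example "static"), which may be stricter than needed.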

Way more excitement than I wanted on a Monday evening, but fortunately everything seems to be back in action. But yeah, I'm sick of this bullshit. I've had nothing but problems like these every time I've tried to use or administer a systemd-based distro, so fuck it. It's getting dumped in the trash where it belongs. sysvinit ain't perfect (though stuff like OpenRC helps) but at least it's a proven solution, and when something gets fucked up it's pretty transparent.

More updates to come as I continue with this project.
Title: Re: Another Server Outage
Post by: Bobbias on November 25, 2015, 08:27:55 AM
That sounds like one massive headache.
Title: Re: Another Server Outage
Post by: Zephlar on November 25, 2015, 01:46:49 PM
Christ man. Thanks for working hard to keep the handful of us left afloat!
Title: Re: Another Server Outage
Post by: Spectere on November 25, 2015, 05:59:54 PM
No problemo. :)

And hey, regardless, I still use this server for quite a few things. I definitely need it to be up. When it's not, Ian no happy. And when Ian no happy, ain't nobody happy.

Edit: I've also noticed that the site has been getting slower and less responsive over time. There is literally no explanation for it, as nothing untoward is happening, judging from the Apache and system logs. I'm sure that'll be resolved when I migrate away from shitstormd and Arch.
Title: Re: Another Server Outage
Post by: Spectere on November 30, 2015, 11:05:31 AM
So httpd randomly died last night. Something sent it SIGHUP and it went down like an anvil. Site was down from around midnight (EST) to a few minutes ago.

This has got to be the least stable server platform I've ever used.
Title: Re: Another Server Outage
Post by: Bobbias on December 01, 2015, 10:02:58 AM
I was wondering what was going on. I wondered if maybe you were switching it over to something more stable already.
Title: Re: Another Server Outage
Post by: Sneaky on December 01, 2015, 03:35:42 PM
shit's fallin apart
Title: Re: Another Server Outage
Post by: Spectere on December 02, 2015, 11:14:58 AM
Quote from: Bobbias on December 01, 2015, 10:02:58 AM
I was wondering what was going on. I wondered if maybe you were switching it over to something more stable already.

It's not that simple. I have more running on here than just httpd. Minor stuff, but I'd definitely notice if I broke it. Additionally, I've been seriously considering a switch to nginx and making that jump is going to require additional research and testing. There's also Lachesis/Alice's subdomain and shell/SFTP access to consider, so I need to work with her to make sure that things don't break.

Unfortunately, Linode's distro list contains mostly systemd-infested distributions. The only two that don't "feature" that dysfunctional mess are Gentoo and Slackware. I know Gentoo like the back of my hand, but compiling everything on a low-cost VPS doesn't appeal to me (this one only has 1 core and 1GB of RAM, which is fine for general use, but it's definitely not fine for heavy compilation). I've never used Slackware, so before I even consider migrating I'm going to at least need to teach myself slackpkg.

As mentioned, I'd been planning to migrate to Gentoo before. I actually had most of the site up and running on a separate Gentoo-powered Linode, but to make it viable I would have had to pay at least double the monthly cost (i.e. moving up to the 2 core/2GB node), and I don't see the point in paying $20/month so that updates can be done in a less disruptive fashion.

Additionally, even if I completely ignore the myriad of systemd issues I've had since I spun freyr up, I've learned that using a rolling distribution on a production server just isn't a great idea in the first place, especially one that's as unstable as Arch. I've had a few surprise updates where I wound up having to completely change around configuration files just so that services would start up, and that's something that Gentoo definitely wouldn't fix. Gentoo's "stable" branch doesn't move nearly as fast as Arch's (which is out of control; I don't think I'd even feel comfortable using it on a desktop system), but when they decide that Apache 2.4 is good, they're still going to switch to it. With something like CentOS you can typically just keep on doing security updates until the end of time and won't have to worry about scrambling around and completely revamping /etc/ to keep up with the latest version of things.
Title: Re: Another Server Outage
Post by: Spectere on December 07, 2015, 02:02:12 AM
Another week, another Apache crash. It looks like it was only down for about an hour and a half this time (yeah. "Only"). I just updated Apache, so hopefully the new version has a fix for its nasty tendency to randomly segfault every week, a few seconds after midnight (hurray, I found a pattern?).
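
A quick way to sanity-check a pattern like that is to bucket crash times by weekday and hour. This is only a sketch: the dates follow the two outages described in this thread, but the exact times are assumptions and not real log entries, and `crash_slots` and `recurring` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Illustrative timestamps only: dates from this thread, times assumed.
CRASHES = [
    "2015-11-30 00:00:05",
    "2015-12-07 00:00:04",
]


def crash_slots(stamps):
    """Count crashes per (weekday name, hour of day)."""
    slots = Counter()
    for stamp in stamps:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        slots[(DAYS[when.weekday()], when.hour)] += 1
    return slots


def recurring(stamps, minimum=2):
    """Slots hit at least `minimum` times, most frequent first."""
    return [
        (slot, count)
        for slot, count in crash_slots(stamps).most_common()
        if count >= minimum
    ]


if __name__ == "__main__":
    for (day, hour), count in recurring(CRASHES):
        print("{} around {:02d}:00 -> {} crashes".format(day, hour, count))
```

If one slot keeps coming up, anything scheduled weekly at that time is worth a look.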

Also, Arch is in the process of completely taking a shit, which is probably why I'm experiencing a seemingly abnormal number of systemd anomalies. I have a feeling someone completely busted the MariaDB package, since it absolutely refuses to update because /var/lib/mysql already exists. Well, uh, yeah, of course it bloody does.

I just don't even know.

So yeah, I must emphasize again: don't run servers on bleeding edge rolling distros. You will get cut, burned, and all manners of mutilated. I knew that Arch was very much ahead of the curve in terms of package versions, but I had no idea that so little effort went into ensuring basic service stability. Like I said, I wouldn't even feel safe using it on a desktop system.