Services and sites not restarting on reboot

titi · January 6, 2021, 6:08am

Hello EE community,

New year, same problems: after a system reboot, all sites and services are down.
No containers are listed by the docker container ls command.

This topic was already discussed here:

and there is a bug report on github as well:

I can manually restart services and sites, however, I could not find a solution to have all containers restart on reboot.

Anyone have any idea?

titi · January 14, 2021, 6:57am

Does anybody have any idea on this?
Because this is important to me, I am ready to pay for a solution. Should I create a paid job opportunity for this issue?

freedog96150 · January 14, 2021, 11:05pm

Just sharing my experiences hoping you gain more insight.

My hosting environment: 100% Digital Ocean (different sized instances for different setups).
Droplet 1: $5 (smallest instance) - 5 EE sites all low traffic sites
Droplet 2: $5 (smallest instance) - 1 EE site medium traffic site
Droplet 3: $20 (medium instance) - 3 EE sites heavy traffic sites

Droplet 1 NEVER reboots properly. Any reboot results in sites that do not come back online and containers that do not start properly. Droplet 2 ALWAYS reboots/restarts properly, every single time. Droplet 3 ALWAYS reboots/restarts properly, every single time.

Looking at the logs for Droplet 1, it quickly becomes apparent that the proper startup order is not achieved: sites often attempt to start before either the global database or the nginx-proxy has been properly initialized. Once that happens, the proper labels are never added to the conf files, and no amount of restarting the sites or the services will fix the issue. I have also seen log entries where the database just runs out of memory on those smallest droplets and never recovers, but that mostly shows up as a database connection error.

Droplets 2 and 3 always seem to restart in the proper order. I assume that on Droplet 2, with only a single site, there is no stack of commands running and everything comes up in an orderly fashion. I also assume that on Droplet 3, which has considerably more RAM and CPU, everything just starts fast enough to keep up with site loading.
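
The ordering problem described above could in principle be worked around by waiting for the global containers before any site is started. The following is only a minimal sketch under assumptions: wait_until and container_running are hypothetical helper names, the nginx-proxy container name is a guess (check yours with docker ps -a), and a container reporting "running" does not guarantee that the database inside it is already accepting connections.

```shell
#!/bin/sh
# Sketch: wait for the global containers before starting sites.

# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# True when the named container exists and Docker reports it as running.
container_running() {
    [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# Example (container names are assumptions; list yours with docker ps -a):
#   wait_until 30 2 container_running services_global-db_1 &&
#   wait_until 30 2 container_running services_global-nginx-proxy_1 &&
#   ee site enable sitename.com
```

On a small instance the attempt count and delay would likely need to be generous, since the database is the slowest service to come up.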

My resolution that works every time -

Run sudo ee site disable sitename.com once for each site.
Then cd to the services directory, /opt/easyengine/services, and run:

docker-compose down && docker-compose up -d

Then enable all the sites again by running sudo ee site enable sitename.com for each site.
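
The steps above can be sketched as a small script so they do not have to be typed per site. This is an illustration, not an EasyEngine feature: run, recreate_services and restart_stack are hypothetical names, and DRY_RUN defaults to 1 so that nothing is executed unless you opt in.

```shell
#!/bin/sh
# Sketch of the manual recovery steps. DRY_RUN defaults to 1, so the script
# only prints the commands; set DRY_RUN=0 and run as root to execute them.
DRY_RUN="${DRY_RUN:-1}"
SERVICES_DIR="${SERVICES_DIR:-/opt/easyengine/services}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Take the global services down and bring them back up.
recreate_services() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ cd $SERVICES_DIR && docker-compose down && docker-compose up -d"
    else
        ( cd "$SERVICES_DIR" && docker-compose down && docker-compose up -d )
    fi
}

# Disable every site, recreate the services, then enable every site again.
# Usage: restart_stack SITE [SITE...]
restart_stack() {
    for site in "$@"; do
        run ee site disable "$site" || return 1
    done
    recreate_services || return 1
    for site in "$@"; do
        run ee site enable "$site" || return 1
    done
}

# Run only when site names are passed on the command line.
if [ "$#" -gt 0 ]; then
    restart_stack "$@"
fi
```

Saved under a name of your choice, it would be called with the site names as arguments, first without DRY_RUN=0 to review the printed commands.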

I have read where users repeatedly reboot their servers and get lucky and everything comes back in the proper order and starts. That seems to be a bit like pulling the lever on a slot machine and hoping for triple 7’s. Finding a programmatic way is always more efficient.

I have been able to duplicate this problem on Linode and Vultr accounts using their smallest instances and the same 5 sites as used on my droplet 1. This rules out a Digital Ocean-specific problem. I conclude that this is a resource issue and that the mysql server is probably the culprit, being the one service using the most resources on any server.

I have also noted that there are existing GitHub issues addressing both the configuration of mysql on small-resource servers and ensuring that the depends_on directives are properly followed during startup. I know enough docker-compose to tinker with different settings, but the reality is that I only reboot the server every few months at most and just use the resolution explained above to get sites back up and running.

Perhaps a feature request for something like a sudo ee site reinitialize or similar nomenclature which would verify that all the proxy and database conf files match the containers and restart all if needed will be the answer?

titi · January 20, 2021, 2:30am

Thank you for your input. The “disable/enable each site” approach is how I get my sites up at the moment.
I suspected mysql was (and is) the culprit, but I have no evidence to support my theory.
However, I will do further research to find a proper way of getting all sites up on reboot.
I will also post the solution here when/if found.

freedog96150 · January 22, 2021, 2:28am

Have you tried looking at the individual docker logs for the global-db service on your server? Sometimes those logs can shed light. Otherwise, I find that looking in the server's system logs can also show you what may be happening.

I run docker logs services_global-db_1 on my servers. Yours may have a different container name. If you need to look up container names, just use docker ps -a and you will get a list of all running and stopped containers.

You should see a long list of logged notices. Look for any warnings and definitely at all errors. As I mentioned earlier, most of my errors were the container getting killed when system resources were depleted. Tossing more CPU and RAM at it seems to resolve those issues. I know that there are resource controls in current versions of docker-compose. Not for the faint of heart, but check out https://docs.docker.com/compose/compose-file/compose-file-v2/#cpu-and-other-resources if you are curious.
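
To narrow a long log down to the interesting lines, something like the following may help. It is a rough sketch: filter_problems and check_container are hypothetical names, the search pattern is a guess at what warnings and memory kills look like in the log, and the container name is the one used on my servers.

```shell
#!/bin/sh
# Sketch: pull warnings, errors and memory kills out of a container's log.

# Keep only lines that look like warnings, errors or out-of-memory events.
# "|| true" keeps the exit status clean when nothing matches.
filter_problems() {
    grep -iE 'warn|error|out of memory|killed' || true
}

# Show suspicious log lines for a container, then whether Docker recorded
# an out-of-memory kill for it.
check_container() {
    docker logs "$1" 2>&1 | filter_problems
    docker inspect -f 'OOMKilled={{.State.OOMKilled}}' "$1"
}

# Example (container name taken from the post; yours may differ):
#   check_container services_global-db_1
```

If the last line reports OOMKilled=true, the container was stopped by the kernel for exceeding available memory, which points at resources rather than configuration.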

I would also take a look at the settings in the my.cnf file and tune and optimize for your server. I can only speak to the smallest Digital Ocean droplets but I tend to modify the my.cnf on those smaller instances to the following:

(typically found in /etc/mysql/my.cnf which is /opt/easyengine/services/mariadb/conf/my.cnf on EEv4)

key_buffer = 16K
max_allowed_packet = 1M
thread_stack = 64K
table_cache = 4
sort_buffer = 64K
net_buffer_length = 2K

Note that these settings may not work on other setups, and everything needs to be tested first. Make copies of any files prior to changing them. You should also consider putting any changes in the conf.d directory instead of modifying the original my.cnf. These are just starting points to look for a resolution and should not be marked as a solution. Optimizing mysql servers is a complex task.
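
The backup and conf.d advice above could look roughly like this. It is a sketch under assumptions: backup_conf and write_override are hypothetical helpers, the conf.d location and the 99-lowmem.cnf file name are guesses, and the override only takes effect if the running my.cnf actually includes that directory and it is mounted into the container.

```shell
#!/bin/sh
# Sketch: back up a config file, then write overrides to a separate file.

# Copy FILE to FILE.bak.<timestamp> and print the backup path.
backup_conf() {
    backup="$1.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$1" "$backup" && echo "$backup"
}

# Write stdin to DIR/NAME, creating DIR if needed.
# Usage: write_override DIR NAME < settings
write_override() {
    mkdir -p "$1" && cat > "$1/$2"
}

# Example (the conf.d path and file name are assumptions):
#   conf=/opt/easyengine/services/mariadb/conf
#   backup_conf "$conf/my.cnf"
#   printf '[mysqld]\nmax_allowed_packet = 1M\n' | write_override "$conf/conf.d" 99-lowmem.cnf
```

Keeping the overrides in their own file makes it easy to revert: delete the file and restart the database container.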

system · February 21, 2021, 2:28am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.