Halloween Moodle Panic: Cache Crisis

The last three weeks of October were frightful  due to a series of incidents affecting the Moodle server of my School. Randomly server crashes and a set of strange behaviours using Moodle led me, as admin, to a stressing weeks. That’s way I’m goin to collect all the incidents and write a post called Moodle Panic 🙂

Affected Moodle version: 2.4.6

The next list of incidents ended up with a server freezing. Surprisingly none of them was correlated to peak times in the use of Moodle.

Date of freezing –> Errors showed when trying to access Moodle WEB Server

  • Friday 4th October at 00:00h AM –> Fatal error: $CFG->dataroot is not writable, admin has to fix directory permissions! Exiting.
  • Sunday 6th October at 04:00h AM –> Fatal error: $CFG->dataroot is not writable, admin has to fix directory permissions! Exiting.
  • Wednesday 16th October at 04:00h AM –> Fatal error: $CFG->dataroot is not writable, admin has to fix directory permissions! Exiting.
  • Saturday 26th October at 19:00h PM –> Fatal error: $CFG->dataroot is not writable, admin has to fix directory permissions! Exiting.
  • Monday 29th October at 00:00h AM –> Fatal error: $CFG->dataroot is not writable, admin has to fix directory permissions! Exiting.

All these incidents crashed the server and I had to reboot it manually. As soon as the server rebooted it started to rebuild the raid1 but everything started to work fine again… until the next incident.

Incidents reported by teachers using Moodle at School

  • Monday, 21st October at 18:06h –> A teacher posted a Glossary entry and something went wrong so she got the next error: «Detectado un error de codificación, debe ser corregido por un programador: Failed to unserialise data from file. Either failed to read, or failed to write.«

The error  message appeared at the bottom of her glossary entry but before the set of icons to edit and delete. So the entry couldn’t be deleted and no more entries were displayed because the error stopped to render the full page. I managed to find out the entry id in the database and then I could delete it through the Moodle interface changing manually the id in the URL (Ask me how to do it if you need it) . The problem had been sorted it out  and it worked again. But, as soon as the teacher tried to add the same entry again (the word added was ‘Biodiversity’) it happened again 🙁 . It was weird because if she tried to add ‘Biodiversity1’ instead of ‘Biodiversity’ there was no problem at all. At that moment, I started to suspect about something related to cache so as soon as I ‘purge all caches‘ the problem was sorted out again.

  • On Monday 28th, meanwhile I was (as admin) opening a moodle lesson to my students, I received an unexpected error message: «Detectado un error de codificación, debe ser corregido por un programador: Failed to unserialise data from file. Either failed to read, or failed to write.«.  Although I was able to sort it out clicking again on ‘purge all caches‘ as I had done before for the Glossary entry I realized that  something began to go wrong 🙁 At that moment I though that I knew how to solve it but I didn’t know what was causing the problem –> Stress! 🙁
  • Tuesday 29th: Something similar happened to me again but at that time I couldn’t resolve the problem same way (just clicking on ‘purge all caches’) anymore.  The persisting message «Detectado un error de codificación, debe ser corregido por un programador: Failed to unserialise data from file. Either failed to read, or failed to write» led me to start to worry about the whole integrity of my Moodle site. I was completely lost!
  • I changed my state at that moment from Stress to Panic 🙁 I say Panic because, as I mentioned before, after each reboot the server started to rebuild the raid1 and if something went wrong along the time taken to complete the reconstruction, it could have produced a lost of data.

What did I do

Moodle forums and Moodle tracker are the right places to look for help about Moodle. But I had been searching there for similar issues since some days ago and I hadn’t found similar scenarios. There were some incidents reported about ‘Failed to unserialise data from file’ were related to fresh installations of Moodle  but I was looking there for reports about:

  • Server crashes
  • «Failed to unserialise data from file..» messages
  • «Fatal error: $CFG->dataroot is not writable» messages

After using a 3rd disk to save a, just in case, full backup of both SO and Moodle Site and coming back from Panic to Stress again, I started to explore other options to sort it out.

As I couldn’t ‘purge all caches‘ from where I had done it before I tried a different approach trying to purge from console: user@my_server:/var/www/web1/web/admin# sudo -u www-data /usr/bin/php /var/www/web1/web/admin/cli/purge_caches.php but I failed again and got the same debugging message:

Error code: codingerror
* line 468 of /cache/stores/file/lib.php: coding_exception thrown
* line 371 of /cache/stores/file/lib.php: call to cachestore_file->prep_data_after_read()
* line 295 of /cache/classes/loaders.php: call to cachestore_file->get()
* line 1358 of /cache/classes/loaders.php: call to cache->get()
* line 522 of /lib/dml/mysqli_native_moodle_database.php: call to cache_application->get()
* line 1269 of /lib/dml/mysqli_native_moodle_database.php: call to mysqli_native_moodle_database->get_columns()
* line 1565 of /lib/dml/moodle_database.php: call to mysqli_native_moodle_database->set_field_select()
* line 1302 of /lib/moodlelib.php: call to moodle_database->set_field()
* line 1205 of /lib/outputrequirementslib.php: call to set_config()
* line 1503 of /lib/moodlelib.php: call to js_reset_all_caches()
* line 51 of /admin/cli/purge_caches.php: call to purge_all_caches()

I had read a post on Moodle Forums where Ken Task proposed to manually erase the cache folder at moodledata from console. He said that the folder should be rebuilt again and automatically as soon as the web server started to receive requests. But I was scared about doing this kind of things without being completely sure about the final result. So, I started to look, once again, around the ‘Moodle Admin Menu’ trying to find a way out when I arrived to:

Home ► Site administration ► Plugins ► Caching ► Configuration

From there I found the ‘Configured store instances‘ section and the three links to purge content straight from there. Purging cache stores ended up solving the problem…. until now.  No more server freezes either.

A further investigation on server logs at crashing time (actually the last logs before crashing) showed me nothing in particular at syslog, messages, apache2 logs. But in kern.log I noticed this:

Oct 29 00:00:58 server kernel: [52934.011290] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 frozen
Oct 29 00:00:58 server kernel: [52934.011297] ata1.01: failed command: FLUSH CACHE EXT
Oct 29 00:00:58 server kernel: [52934.011304] ata1.01: cmd ea/00:00:00:00:00/00:00:00:00:00/b0 tag 0
Oct 29 00:00:58 server kernel: [52934.011305]          res 40/00:01:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
Oct 29 00:00:58 server kernel: [52934.011309] ata1.01: status: { DRDY }

Cacti charts running in my server since I installed Moodle some years ago didn’t give any evidence of a failure that could have been causing these issues. I also checked both disks in our raid1 with smarttools long tests and the use of RAM looking for failures with no success. I had also suspected of APC running in my server since only some months ago but I couldn’t find either any evidence of a relationship with it. Today I think that all of this has been caused by Moodle Cache but I don’t know yet how and when. That’s why I’m posting here and reporting to Moodle Forum and Tracker to help to debug.

Thanks for reading and feel free to comment, to disagree or even to correct my improvable English 😉

Viva Moodle!

My posts to Moodle Forums / Tracker:

https://moodle.org/mod/forum/discuss.php?d=240739

https://moodle.org/mod/forum/discuss.php?d=216810

https://tracker.moodle.org/browse/MDL-42673

 

Un comentario

Deja una respuesta

Tu dirección de correo electrónico no será publicada. Los campos obligatorios están marcados con *

Este sitio usa Akismet para reducir el spam. Aprende cómo se procesan los datos de tus comentarios.