Computer maintenance workers at Kyoto University have announced that, due to an apparent bug in software used to back up research data, researchers using the university's Hewlett-Packard Cray supercomputing system, whose storage runs on the Lustre file system, have lost approximately 77 terabytes of data. The team at the university's Institute for Information Management and Communication posted a Failure Information page detailing what is known so far about the data loss.
The team, with the university's Information Department Information Infrastructure Division, Supercomputing, reported that files in /LARGE0 (on the DataDirect ExaScaler storage system) were lost during a system backup procedure. Some in the press have suggested that the problem arose from a faulty script that was supposed to delete only old, unneeded log files. The team noted that approximately 100TB of files were originally thought to have been lost, but that number has since been pared down to 77TB. They also note that the failure occurred on December 16 between 5:50 and 7 p.m. Affected users were immediately notified by email.
The team further notes that approximately 34 million files were lost and that they belonged to 14 known research groups. The team did not release the names of the research groups or say what sort of research they were conducting. They did note that data from another four groups appears to be restorable. Also unclear is whether the research groups that lost their data will be reimbursed for the money spent conducting research on the university's supercomputer system. Such costs are notoriously high, running into the hundreds of dollars per hour of computing time.
Some news outlets are reporting that the backup system was supplied by Hewlett-Packard and that the failure occurred after an HP software update. The same outlets are also reporting that HP has accepted blame for the data loss and is offering to make amends. The team at the university reported that the backup procedure was halted as soon as it became clear that something was awry, and university officials suggest that incremental backup procedures will always be used in the future to prevent data loss.
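For readers curious what a log-cleanup script of the kind described in press reports might look like, here is a minimal Python sketch with two common safeguards. It is purely illustrative and is not the script used at Kyoto University, whose contents have not been published. The function names (`find_old_logs`, `purge_old_logs`), the `*.log` pattern, and the age threshold are all assumptions made for this example. The safeguards are a refusal to run when the target directory is empty or resolves to the filesystem root, and a dry-run default that only reports what would be deleted.

```python
import time
from pathlib import Path


def find_old_logs(log_dir, max_age_days, now=None):
    """Return *.log files directly under log_dir older than max_age_days."""
    # Guard: an empty or unset directory must never fall through to a broad path.
    if not log_dir or not str(log_dir).strip():
        raise ValueError("log_dir is empty; refusing to scan")
    root = Path(log_dir).resolve()
    # Guard: never operate on the filesystem root itself.
    if root == Path(root.anchor):
        raise ValueError("log_dir resolves to the filesystem root; refusing to scan")
    if not root.is_dir():
        raise ValueError(f"{root} is not a directory")
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    # Only regular files matching *.log at the top level are candidates.
    return sorted(
        p for p in root.glob("*.log")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def purge_old_logs(log_dir, max_age_days, dry_run=True, now=None):
    """Delete old logs; with dry_run=True (the default) only report them."""
    victims = find_old_logs(log_dir, max_age_days, now=now)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims
```

Calling `purge_old_logs(some_dir, 30)` returns the list of files that would be removed without touching them. Deletion happens only when `dry_run=False` is passed explicitly, so a misconfigured path is more likely to be noticed before any data is lost.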