TSD Operational Log - Page 8

Published May 11, 2020 10:24 AM

Some compute nodes are down at the moment, causing jobs to be re-queued. We are working to bring them back online.

Published May 8, 2020 3:12 PM

There are currently problems with the file-export feature on data.tsd.usit.no.

Update:

The issue should now be resolved.

Published May 8, 2020 12:27 PM

Currently, some users are not able to log in to TSD via ThinLinc. We are working on a fix.

Published May 7, 2020 11:31 AM

There's a license upgrade issue.

Update: the new licenses have been installed.

Published Apr. 20, 2020 12:32 PM

The machine exporting the /cluster filesystem crashed and was down for around 10 minutes. This caused many of the machines that mount this share to become unresponsive.

All machines which are having trouble mounting the share will be rebooted shortly to fix any issues.

UPDATE:

All machines should now be available again. Let us know if you still have trouble accessing your machines.

Published Apr. 17, 2020 9:38 AM

Due to an issue with the virtualisation infrastructure, some VMs have been unintentionally rebooted. If you are having issues, please let us know. We are investigating and will fix any outstanding problems.

Published Apr. 14, 2020 2:00 PM

We need to perform brief maintenance on the data portal, during which it will not be possible to import or export files.

Published Apr. 7, 2020 12:20 PM

We are investigating issues related to export in TSD projects.

Published Apr. 6, 2020 8:13 AM

Due to a Linux operating system update, some virtual machines are experiencing issues. We are working to solve the problem.

Published Apr. 1, 2020 11:24 AM

While this is ongoing, it will not be possible to reset passwords, or get new QR codes.

Published Mar. 31, 2020 8:54 AM

Since Saturday evening, there has been some instability with the Colossus file system, which affects all submit hosts and running jobs on Colossus.

Update:

The file system was up again at 09.44. All systems should work as normal, but please inform us if you have problems.

Published Mar. 21, 2020 11:39 PM

As a step in the process of getting the new storage into production, we will restrict access to Colossus from Monday, March 23rd to March 27th.

During this time, there will be no access to Colossus or the /cluster storage, including the project areas and software modules. These services will also be inaccessible from TSD's virtual machines.

The HNAS areas ("durable") will not be affected by this downtime.

We will update this page with the progress of our work during the maintenance window.

Published Mar. 20, 2020 9:37 AM

Bio-IT processor software upgraded to v3.5.7.

Published Mar. 12, 2020 12:07 PM

TSD users are currently experiencing issues with the SPSS license. We are working on resolving this, but it might take a couple of working days to fix.

Published Mar. 3, 2020 1:03 PM

UPDATE: This was solved at 13:39 on 2020-03-03.

VMware Horizon is currently down. This means that access to Windows virtual machines through VMware Horizon and https://view.tsd.usit.no is unavailable.

We are working on solving this as soon as possible.
Our apologies for the inconvenience.


-- 
Best regards,
TSD

Published Mar. 2, 2020 3:10 PM

The machine exporting the /cluster filesystem crashed, causing hanging mounts on machines that mount the /cluster file system.

We're working on solving the issue.


Published Feb. 3, 2020 9:58 AM

We are working on fixing an issue affecting ssh between project VMs. While the issue persists, you may experience trouble accessing your Colossus submit host.

Published Jan. 31, 2020 1:44 PM

TSD users cannot log in. We are investigating the cause and working on a fix.

Published Jan. 20, 2020 8:39 AM

We are having trouble with the Colossus NFS export, and are working to solve it.

Published Jan. 15, 2020 9:20 AM

UPDATE: Maintenance is done, and all exports of /cluster should be back to normal as of 12:58, 15-01-2020.

Because the file system has crashed twice so far this week, we will be taking it down again today, 15-01-2020, at 12:00 for quick maintenance.

As a result, /cluster will become unavailable on submit hosts and other project VMs that mount /cluster. However, HPC jobs running on Colossus itself will not be affected.

Published Jan. 13, 2020 3:31 PM

The machine responsible for making /cluster on Colossus available to the project machines in TSD crashed at 15:15 on 13-01-2020.

The services are now back up and running as expected.
For most projects this should not affect regular operations, but it could create problems for projects that frequently access /cluster from their virtual machines.

We are currently checking all projects and working on getting everything back in order for the projects still affected by the outage.
 


Published Jan. 6, 2020 11:30 AM

SPSS is displaying a license expiry warning. Please ignore this message. The problem will be solved soon.

Published Jan. 3, 2020 1:27 PM

We are working to solve the issue.