TSD Operational Log - Page 7

Published Nov. 12, 2020 10:04 AM

We are currently experiencing problems reaching view.tsd.usit.no. This is a load balancer that forwards connections to the actual service.

Until further notice, view1.tsd.usit.no can be used directly.

Update (10:21): The problems are resolved.

Published Nov. 9, 2020 1:25 PM

We're performing emergency maintenance on the cluster storage (ESS). Start time: 13:30. Duration: several hours. Possible interruptions to /cluster/software and /cluster/projects/pXX may occur.

Published Nov. 4, 2020 9:26 AM

We're performing maintenance on the cluster storage (ESS). Start time: 10:00. Duration: several hours. Possible interruptions to /cluster/software and /cluster/projects/pXX may occur.

Update: The maintenance was partially successful. We're considering an additional maintenance window and will notify you well in advance via email.

 

Published Oct. 30, 2020 10:09 AM

We are investigating the cause of a degradation in performance of project storage.

Published Oct. 23, 2020 8:31 PM

The TSD gateway was unstable, causing connections to our login services as well as data uploads to fail. We have failed over to our secondary gateway, and the issue should be resolved.

Published Oct. 20, 2020 12:27 PM

We're experiencing issues with TSD selfservice login using BankID.

Update:

The problem is resolved.

Published Oct. 16, 2020 9:38 AM

External import links are temporarily not working.

Published Oct. 15, 2020 7:37 AM

Due to serious security issues, we must expedite the Windows patching and reboot all Windows VMs in TSD.

For detailed information about the vulnerability, see:

https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/CVE-2020-16898

This vulnerability is outside the control of TSD, and we are patching urgently because it could put our internal security measures at risk.

Published Oct. 14, 2020 9:05 PM

Due to a technical issue, the web services and applications at http://data.tsd.usit.no will be temporarily unavailable starting at 10 PM local time today while we rectify the problem. We expect to have the issue resolved within one hour.

Published Oct. 14, 2020 1:33 PM

User creation is temporarily stopped due to technical problems.

Published Oct. 7, 2020 8:02 AM

We're fixing the /cluster/software NFS share on submit hosts, app nodes and RHEL7 login nodes. In the meantime, you will not be able to submit jobs to Colossus or access the /cluster/software and /cluster/projects mounts.

Published Oct. 6, 2020 1:18 PM

We are working on solving an issue in the consent system, which will require it to be offline for a few hours.

Published Oct. 2, 2020 1:34 PM

We are fixing the issue.

Published Sep. 29, 2020 8:59 AM

Today we are upgrading VMware Horizon. It is not possible to log in to Windows VMs during the upgrade.

Published Sep. 24, 2020 2:08 PM

There was a problem with a service related to changing QR codes, which left users unable to change their QR code between 09:15 and 14:00.

Published Sep. 24, 2020 10:24 AM

Yesterday, between 14:00 and 21:00, many jobs failed to start due to a problem with the scratch file system. These jobs have now been requeued and should start as normal again.

We are still trying to determine the cause. Indications so far are that the file system became full, either in terms of disk space or number of files. If that is the case, jobs using $SCRATCH may have been affected or may even have crashed, so please check your jobs.

Update, 2020-09-27: We have confirmed that it was one or more jobs that filled up $SCRATCH, in the sense that they created too many files. We are setting up monitoring to be able to find out which user's jobs are responsible should it happen again.
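If you want to check whether your own jobs create unusually many files, a minimal sketch along the following lines may help. It is only an illustration, not a TSD tool: the function name `count_files` is hypothetical, and it assumes the directory to inspect is given by the `SCRATCH` environment variable inside a job (falling back to the current directory otherwise).

```python
import os


def count_files(root):
    """Count entries below root.

    Returns (n_files, n_dirs), where n_files counts every
    non-directory entry (regular files, symlinks, etc.) and
    n_dirs counts every subdirectory, at any depth.
    """
    n_files = 0
    n_dirs = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
    return n_files, n_dirs


if __name__ == "__main__":
    # Assumption: $SCRATCH is set inside a running job; otherwise
    # the current directory is inspected instead.
    root = os.environ.get("SCRATCH", ".")
    files, dirs = count_files(root)
    print(f"{root}: {files} files, {dirs} directories")
```

Running this at the end of a job gives a rough idea of how many files the job leaves on the scratch file system.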

 

Published Sep. 17, 2020 10:53 AM

We are fixing issues with Windows login at view.tsd.usit.no.

Published Aug. 17, 2020 11:10 AM

Many Windows hosts ended up in an inaccessible state after automated upgrades over the weekend. We are currently getting the hosts back up, and will make adjustments to prevent this issue from recurring.

Published Aug. 14, 2020 12:22 PM

Due to maintenance on the Colossus compute cluster, the queue system (Slurm) commands (sbatch, squeue, etc.) will be unavailable for a couple of minutes. This will happen a couple of times today. Running jobs on Colossus will not be affected. Nothing else on the VMs, such as access to project areas and software modules, will be affected.

Published Aug. 11, 2020 12:54 PM

Selfservice will be briefly unavailable for maintenance between 13:00 and 14:00 today, 11/08/2020.

Best regards,
TSD Team

Published July 29, 2020 1:04 PM

We are currently having issues related to changing user account passwords in TSD. We're working to resolve this as quickly as possible.

--
Best regards,
TSD

Published July 27, 2020 1:03 PM

We're currently having some trouble with access to the Colossus storage. We're working on solving this as quickly as possible.

Unfortunately, this will cause login problems for some of the machines in projects which are connected to Colossus.

--
Best regards,
TSD

Published June 30, 2020 12:01 PM

Update: Dragen has been upgraded to CentOS7, and licenses have been renewed from August 1, 2020 until July 31, 2022. Access to Dragen has been revoked for all projects except p22. Access can, however, be requested by sending an email to TSD.

We're upgrading Dragen to CentOS7 and installing the new filesystem. We will update the log when it is back online.

Published June 22, 2020 2:15 PM

The maintenance starts at 14:30 and will last no more than 15 minutes.