
TSD Operational Log - Page 6

Published Apr. 28, 2021 12:06 PM

Login to TSD is currently unavailable.
We are working to solve the problem as quickly as possible.
Our apologies for the inconvenience.

-- 
The TSD Team

Published Apr. 27, 2021 2:06 PM

With a few exceptions, all RHEL6 ThinLinc (pxx-tl01-l) machines have now been shut down, as announced in the email sent in February.

A new RHEL8 machine has also been made available to every project. It can be accessed at https://view.tsd.usit.no
Read: /english/services/it/research/sensitive-data/use-tsd/login/index.html#toc8

If for any reason you need to access your RHEL6 machine for a limited time, please contact us: /english/services/it/research/sensitive-data/contact/index.html

Published Apr. 23, 2021 2:40 PM

Update 20:00 April 27: A few submit and login hosts that mount /cluster are experiencing new NFS hangs. Some hosts have been rebooted.

There were NFS hangs on submit and login nodes that mount /cluster.

Published Apr. 23, 2021 12:34 PM

We are performing network maintenance on Thursday 29/4/2021.
We do not expect there to be any interruptions.

Published Apr. 7, 2021 1:26 PM

The cost command, used to query cpu quota usage on Colossus, is currently not working for projects without Sigma2 quota. 

Update: the cost command now displays usage stats for Sigma2 quota, and will display NA and an info message for projects without Sigma2 quota.

Published Mar. 26, 2021 10:51 AM

Starting April 1st, we will introduce the following changes in the distribution of Colossus quotas:

  • We will reduce the Sigma2 pool of resources to 1536 cpu cores, with no gpu nodes. Only TSD-projects with cpu hour quota from Sigma2 can use this pool.
  • We will move the removed resources from the Sigma2 pool to a dedicated resource, called “tsd”, consisting of 288 cpu cores on ordinary compute nodes, plus 128 cpu cores and 4 gpu cards on two gpu nodes.
  • All TSD-projects can use the “tsd” resource by submitting jobs using "--account=pNN_tsd" instead of "--account=pNN". Please see this document for the complete procedure:
    /english/services/it/research/sensitive-data/use-tsd/hpc/dedicated-resources.html
  • There will be a limit of 200,000 cpu hours on the “tsd” resource, as its capacity is limited. However, we may increase this limit in the future.
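
As an illustration of the account switch described above, here is a minimal Python sketch that builds the sbatch command line for either pool and estimates how many cpu hours a job would draw. The project number p11 and the script name job.sh are hypothetical placeholders. The estimate assumes usage is counted as cores multiplied by wall-clock hours, which may differ from the actual accounting on Colossus, so please refer to the linked procedure for the authoritative details.

```python
import shlex

# Limit on the "tsd" resource, as stated in the notice above.
TSD_LIMIT_CPU_HOURS = 200_000


def account_name(project: str, use_tsd_pool: bool) -> str:
    """Return the --account value: pNN_tsd for the "tsd" resource, pNN otherwise."""
    return f"{project}_tsd" if use_tsd_pool else project


def sbatch_command(project: str, script: str, use_tsd_pool: bool = True) -> list[str]:
    """Build the sbatch argument list for a job script."""
    return ["sbatch", f"--account={account_name(project, use_tsd_pool)}", script]


def cpu_hours(cores: int, wall_hours: float) -> float:
    """Estimate usage. Assumption: cpu hours = cores x wall-clock hours."""
    return cores * wall_hours


if __name__ == "__main__":
    # "p11" and "job.sh" are hypothetical examples, not real project data.
    print(shlex.join(sbatch_command("p11", "job.sh")))
    estimate = cpu_hours(cores=16, wall_hours=10)
    print(f"Estimated usage: {estimate:.0f} of {TSD_LIMIT_CPU_HOURS} cpu hours")
```

Passing use_tsd_pool=False yields "--account=p11", which targets the Sigma2 pool and only works for projects that hold a Sigma2 cpu hour quota.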

Published Mar. 18, 2021 8:29 AM

Login through VMware was unavailable for some hours last evening.

Update 21:20: Issue resolved.

The TSD Team

Published Mar. 6, 2021 10:40 PM

IDPorten is having technical problems. Once they are resolved, everything will work normally again.

Published Feb. 9, 2021 8:50 AM

We're experiencing NFS hangs on many Linux hosts mounting /cluster since 5:55 this morning.

It's also affecting /cluster on the Colossus compute nodes. The majority of compute nodes have been rebooted, which may have affected running jobs.

Update 12:00: The submit hosts and Colossus are currently unavailable.

Update 14:00: The issue has been resolved, and we're rebooting the submit hosts now.

Published Feb. 6, 2021 12:02 AM

Due to an outage, login through VMware is currently unavailable.
You should, however, still be able to log in through https://login.tl.tsd.usit.no if your project has a Linux-VM.

We are working on getting things back to normal as quickly as possible.

--
The TSD Team

Published Feb. 3, 2021 12:56 PM

The storage system for project storage (not Colossus) is having performance issues. This is causing instability in file import and export, and some slowness on virtual machines. We are debugging and fixing this.

Published Feb. 3, 2021 9:34 AM

Form registration in the Consent Portal is temporarily suspended due to a service modification. This does not mean that consents are no longer collected. Consents will be delivered to your project as normal. Already registered forms will continue to be available to consenters on the external portal. We expect to resume form registration in a couple of days.

Published Feb. 1, 2021 9:21 AM

We are working on solving an issue with Microsoft Office in TSD, giving this error:

"Microsoft office can't find your license for this application. A repair attempt was unsuccessful or was canceled. Microsoft office will now exit"

We will post an update here once the issue is resolved.

Published Jan. 21, 2021 11:07 AM

Update 14:30: Maintenance is complete, and submit hosts are now being rebooted.

Colossus will have downtime on Thursday 21 January from 12:00 to 14:00 due to a third-party issue.

Colossus and submit hosts will not be available during this time. Any pending jobs will automatically be rescheduled after the downtime.

This message will be updated once the maintenance is complete.

Published Jan. 16, 2021 12:50 PM

Unfortunately, our login service is down, and you will not be able to log in.

We are working on bringing everything back as quickly as we can, and will update this message as we move forward with solving the issue.

-- 
Best regards,
The TSD team

Published Jan. 11, 2021 9:51 PM

Update (21:45) - Maintenance is completed and submit hosts are being rebooted.

Update (16:00) - The hardware upgrade is taking longer than anticipated, and the downtime is extended until further notice.

IBM is performing a hardware replacement on the ESS storage on Monday from 12:00 to 16:00.

Colossus and submit hosts will not be available during this time.

This message will be updated once the maintenance is complete.

Published Dec. 13, 2020 6:47 PM

We are experiencing issues with the Colossus storage system, and have reached out to the vendor for technical support, with the highest priority. Some projects' submit hosts may experience NFS hangs.

Published Dec. 11, 2020 1:16 PM

We're experiencing problems with the ESS storage, affecting /cluster NFS mounts and login to submit hosts and RHEL login nodes.

There might also be interruptions to HPC jobs.

Published Dec. 7, 2020 10:46 AM

We will be changing the certificates for view.tsd.usit.no and view-ous.tsd.usit.no next Monday. A short downtime of no more than 30 minutes is to be expected. This message will be updated once we are done.

Published Dec. 4, 2020 2:29 PM

We will be performing some maintenance on services related to user authentication on Monday.

Logins might be unavailable for a few minutes while this is ongoing, so if you experience issues when trying to log in, please try again a few minutes later.

This message will be updated once the maintenance is complete.

Published Dec. 2, 2020 11:29 AM

We are testing a new login gateway today, Wed 02/12, at 16:00. This might cause temporary interruptions to connections to TSD. The test will not last longer than a few seconds.

Published Nov. 26, 2020 11:03 AM

We have been experiencing NFS hangs on many Linux hosts since 20:00 last night. We're working on a solution, which will involve rebooting the hosts.

Published Nov. 24, 2020 3:59 PM

The maintenance will last no more than 1 hour.

Published Nov. 24, 2020 7:20 AM

We are upgrading our network between 07:00 and 09:00 today, November 24, which will cause disruptions to TSD services. Please do not perform critical work that cannot be saved.

TSD