TSD Operational Log - Page 10

Published Aug. 25, 2019 10:54 AM

Dear TSD User

We are experiencing issues with Windows login and are working to fix them.

Published Aug. 22, 2019 3:18 PM

Dear TSD User

We are experiencing issues with Colossus that are delaying jobs. We are working to fix the problem.

Published Aug. 19, 2019 9:04 AM

As previously announced, the maintenance starts today at 09:00 and will continue throughout the day. Colossus will not be available during this period. The maintenance includes an upgrade of both the network and the NFS export.

Please note that this means the /cluster file system will be unavailable during the maintenance stop, and some of the VMs mounting /cluster might need to be rebooted.

No currently running jobs will be canceled due to the stop, but jobs that cannot finish before 09:00 on Monday will be held in the queue until after the maintenance.

Update:

10:56

We have partially completed the upgrade, and Colossus is ready to use again. Due to a hardware error, we were unable to complete the NFS-export part of the upgrade. We will address this issue later. We also managed to run a command that will prevent crashes similar to the one that happened yesterday.

Published Aug. 18, 2019 8:40 AM

We are having issues with Windows and Linux login, and are working to fix them.

Published Aug. 17, 2019 7:36 PM

Dear TSD User

The NFS export of /cluster to project VMs is currently down. We are diagnosing the issue and working to fix it.

Published Aug. 14, 2019 11:24 AM

Dear TSD users,

The /cluster file system was down between 10:15 and 11:10 due to a crash of one of the file system daemons. The file system is now up again, but many jobs on Colossus have likely crashed in the meantime, so please check your jobs. The VMs mounting /cluster will also have experienced problems.

Things should be back to normal again now, but please don't hesitate to contact us if you're still experiencing problems.

Our apologies for the inconvenience.

-- 
The TSD team.

Published Aug. 12, 2019 8:25 AM

We are experiencing issues with some services, which may lead to users being unable to log in to TSD through VMware Horizon Client. We are investigating the cause of this and working on a fix.

Update:

- https://view.tsd.usit.no/ is up again.

Published Aug. 6, 2019 12:09 PM

Dear TSD User

We discovered that, due to infrastructure issues, the self-service portal's QR code generation did not work as intended from Monday until today at 12:00. If you tried to reset your QR code during this period, we kindly ask you to do so again.

Published July 31, 2019 10:09 AM

We are experiencing issues with some services, which may lead to some users being unable to log in to TSD through ThinLinc. We are investigating the cause of this and working on a fix.

Published July 4, 2019 9:03 AM

TSD's self-service portal will be unavailable for a short period at 09:15 on 2019-07-04. We will update this notice with more information and more precise time frames shortly.

Our apologies for any inconvenience this might cause.
 

Published June 27, 2019 3:16 PM

TSD's self-service portal will be unavailable for a short period at 10:00 on 2019-06-28. We will update this notice with more information and more precise time frames shortly.

Our apologies for any inconvenience this might cause.

-- 
Best regards,
TSD

Published June 25, 2019 9:17 AM

The self-service portal will be unavailable for a short period while the database group is performing an upgrade.

Published June 24, 2019 11:39 AM

Dear TSD User

As planned and announced, we have shut down SFTP data transfers to and from TSD. For data import and export, please use https://data.tsd.usit.no - the new data transfer service works in all major browsers as long as JavaScript and cookies are enabled. If you prefer to use the command line, or need further assistance, please contact our user support.

Published June 20, 2019 9:48 AM

There will be a scheduled minor upgrade of PostgreSQL on 25 June from 08:00 to 09:30.

During this downtime, the applications running PostgreSQL will not work, as we will restart the database in your project. Other services inside TSD will continue working as normal.

 

Published June 18, 2019 1:28 PM

Dear TSD users,

selfservice.tsd.usit.no is currently unavailable. We are working on getting it back up again as quickly as possible.

--
Best regards,
TSD

Published June 17, 2019 8:38 AM

We are experiencing issues with some services, which may lead to some users being unable to log in to TSD. We are investigating the cause of this and working on a fix.

Published June 7, 2019 10:49 AM

The DRAGEN node is now accessible on Colossus and can take Slurm workloads. Please read the updated docs:

/english/services/it/research/sensitive-data/use-tsd/hpc/dragen.html

Abdulrahman @ TSD

Published June 6, 2019 3:57 PM

We are doing maintenance on TSD login from 16:00 until 18:00 today. During the maintenance, new login sessions will not be possible, but active sessions will continue working.

Published June 5, 2019 8:50 PM

The Colossus file system is having issues at the moment, making the cluster unusable. We are working on fixing it.

Update: The file system is up again. The problems started around 16:15 today and lasted until 21:00. During that time, it is likely that jobs on Colossus have crashed, so please check your results. It is also likely that the problems have caused NFS hangs on the Linux VMs that mount /cluster.
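
One possible way to look for affected jobs is to filter Slurm accounting output for jobs that ended abnormally inside the outage window. The Python sketch below is only an illustration and not an official TSD tool: the function names, the choice of states treated as suspect, and the sample rows (made-up job IDs and names) are all assumptions. It relies on the standard sacct options --allocations, --noheader, --parsable2, --starttime, --endtime and --format.

```python
import subprocess
from datetime import datetime

# Assumption: these Slurm states are treated as "did not finish normally".
SUSPECT_STATES = {"FAILED", "NODE_FAIL", "CANCELLED", "TIMEOUT",
                  "OUT_OF_MEMORY", "BOOT_FAIL"}

FIELDS = "JobID,State,Start,End,JobName"


def fetch_sacct(start, end):
    """Run sacct for the current user and return its raw text output.

    start/end are strings such as "2019-06-05T16:15:00".
    Not called in this sketch; use it on a host where sacct is available.
    """
    cmd = ["sacct", "--allocations", "--noheader", "--parsable2",
           "--starttime", start, "--endtime", end, "--format", FIELDS]
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout


def parse_sacct(text):
    """Parse '|'-separated sacct rows in the FIELDS order above."""
    jobs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # JobName is last, so a '|' inside a job name cannot shift fields.
        job_id, state, start, end, name = line.split("|", 4)
        jobs.append({
            "id": job_id,
            # "CANCELLED by 5012" is reduced to "CANCELLED".
            "state": state.split()[0],
            "start": start,
            "end": end,
            "name": name,
        })
    return jobs


def suspect_jobs(jobs, outage_start, outage_end):
    """Return jobs in a suspect state whose end time is inside the window."""
    hits = []
    for job in jobs:
        if job["state"] not in SUSPECT_STATES:
            continue
        try:
            ended = datetime.fromisoformat(job["end"])
        except ValueError:
            continue  # e.g. "Unknown" for jobs without an end time
        if outage_start <= ended <= outage_end:
            hits.append(job)
    return hits


# Made-up sample rows standing in for real sacct output.
SAMPLE = """\
1001|COMPLETED|2019-06-05T12:00:00|2019-06-05T15:30:00|align
1002|FAILED|2019-06-05T14:00:00|2019-06-05T16:20:41|variant-call
1003|CANCELLED by 5012|2019-06-05T15:00:00|2019-06-05T22:10:00|qc
1004|RUNNING|2019-06-05T20:00:00|Unknown|long-run
1005|NODE_FAIL|2019-06-05T10:00:00|2019-06-05T18:05:09|assemble
"""

window = (datetime(2019, 6, 5, 16, 15), datetime(2019, 6, 5, 21, 0))
for job in suspect_jobs(parse_sacct(SAMPLE), *window):
    print(job["id"], job["state"], job["end"], job["name"])
```

On a submit host, SAMPLE could be replaced with the output of fetch_sacct("2019-06-05T16:15:00", "2019-06-05T21:00:00"). A job that Slurm reports as COMPLETED may still have produced incomplete output if it was reading or writing /cluster during the outage, so this filter is a starting point and not a substitute for checking the results themselves.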

Published June 5, 2019 1:19 PM

Dear TSD User

We are having some issues with web-based file export and are working to fix the problem. Import is working as expected.

Published June 5, 2019 9:20 AM

We are experiencing issues with parts of Colossus, which may affect submit hosts. The underlying problem has been fixed, but the symptoms of unresponsive hosts may persist and require a reboot of the VM. Please submit a support case if you experience this problem and we will work to resolve it as soon as possible.

Published June 4, 2019 9:09 AM

We are experiencing issues with some services, which affect users who are using /cluster on their VMs in TSD. We are investigating the cause of this and working on a fix.

Published May 22, 2019 11:52 AM

Dear TSD users,

The proxy for the API-services has stopped working. We're looking into it.

--
The TSD team

Published May 20, 2019 3:56 PM

Dear TSD users!
Colossus is currently running a little slower than usual. We're not quite sure why yet, but we are looking into the issue.

The most likely reason is an overload on the file systems, caused by two things:

  • We added a new storage server (96 TiB) and the system is rebalancing.
  • The rebalancing causes the backup system to detect a LOT of changes, and thus it puts a lot of load on the storage system to back up everything.

The only way around this is downtime and manual balancing, which nobody wants.

Thanks for your continued patience.

-- 
Best regards,
The TSD team

Published May 20, 2019 8:47 AM

We are experiencing issues with some services, which may cause some users to be unable to log in to TSD and to get a 504 Gateway Timeout error. We are investigating the cause of this and working on a fix.