TSD Operational Log - Page 13

Published July 6, 2018 1:00 PM

TSD is inaccessible for the moment. The engineering team is actively working to correct the issue.

TSD@USIT

Published July 4, 2018 12:37 PM

Due to the planned power outage today, we had problems with the NFS export, which affected some jobs on Colossus. The problem is now fixed, and we are sorry for the inconvenience.

TSD@USIT

Published June 25, 2018 10:17 AM

Dear TSD users!

On 25.06.2018 between 13:00 and 14:00 we will perform a short scheduled maintenance of the central file system in TSD. Some of you might experience brief interruptions when accessing your files during this time frame.

-- 
Best regards,
TSD team

Published June 15, 2018 3:10 PM

Dear TSD users,

We regret to inform you that, due to a sudden increase in concurrent Linux logins over the last two weeks, we have reached our limit for concurrent logins.

This means that some of you might not be able to log in to your Linux VMs using the usual method via https://login.tl.tsd.usit.no.

We have ordered more licenses and are eagerly waiting for them to arrive.

In the meantime:

You can log in to your Linux VMs by first logging in to your Windows VM at https://view.tsd.usit.no and then opening the application PuTTY to SSH to your VM. The hostname for your machine has the following format, here using project p11 as an example: p11-tl01-l.tsd.usit.no
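
The hostname format can be sketched in Python as follows. The function name is hypothetical, and the pattern is inferred only from the single p11 example given here, so it may not hold for every project or VM:

```python
import re


def linux_vm_hostname(project: str) -> str:
    """Build a Linux VM hostname following the p11 example from this notice."""
    # Assumption: project identifiers are "p" followed by digits, as in "p11".
    if not re.fullmatch(r"p\d+", project):
        raise ValueError(f"unexpected project name: {project!r}")
    # Assumption: every project uses the same "<project>-tl01-l" pattern as p11.
    return f"{project}-tl01-l.tsd.usit.no"


print(linux_vm_hostname("p11"))  # p11-tl01-l.tsd.usit.no
```

The resulting name is what you would enter as the host in PuTTY.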


-- 
Best regards,
TSD

Published May 14, 2018 11:56 AM

Dear TSD users,

We are again experiencing problems with the FileLock.
We are working hard to find a solution.
Our sincere apologies for the inconvenience.

-- 
Best regards,
TSD-team

Published May 14, 2018 11:28 AM

Some projects could not log in to their Windows machines through View. The incident occurred during the following period:

Start:          2018-05-13 13:00
End:            2018-05-14 10:00

We apologize for the inconvenience.

TSD@USIT

Published May 11, 2018 11:28 AM

Dear TSD users,

We are currently experiencing problems with the FileLock.
We are working hard to find a solution.
Our sincere apologies for the inconvenience.

-- 
Best regards,
TSD-team

Published May 8, 2018 3:18 PM

Dear TSD users,

We regret to inform you that we are currently experiencing problems with the Nettskjema integration for TSD.

Because of this, not all Nettskjema answers will be copied over to your import folder at the moment.

They are still stored securely and encrypted on one end of the chain, and we are working as quickly as we can to figure out what has gone wrong.

Published Apr. 16, 2018 4:03 PM

Lately, the usage of Colossus has increased, and now every node is usually running jobs. This triggered a problem in the queue system, which prevented projects that did not already have a running job from getting their jobs started.

We have now developed a local fix to the queue system so that this no longer happens. The patch was applied today and seems to work well. This means that all projects should now be able to get jobs started.

The most user-visible change is that if you use the command "pending" to view your project's pending jobs, the top jobs should now get an estimated start time (most of the time, at least).

Published Mar. 23, 2018 1:09 PM

We have recently detected problems with file import/export, and are currently working on solving this.

Published Mar. 15, 2018 3:12 PM

Dear TSD user,

The upgrade process is taking a bit longer than expected, and we are now working on the security checks. Because of this, we are extending the maintenance window until 16 March.

We are sorry for the inconvenience.

Published Mar. 13, 2018 2:12 PM

Dear TSD user,

As announced earlier, we will start doing maintenance on the virtual desktop infrastructure in TSD on Wednesday the 14th and Thursday the 15th of March.

This will not be a complete downtime for TSD. We will be working to upgrade the Windows servers and the VMware Horizon login infrastructure. This will be done incrementally over these two days. Please do not schedule any long-running jobs on your Windows servers during this time. The consequence of the maintenance is that it will be impossible to log in to your project's Windows server. This will probably last only a few hours for any given project. Linux VMs will not be affected. Colossus jobs will not be affected either.

At the same time we plan to perform an upgrade to a newer version of PostgreSQL. Projects running this database should expect some downtime on Wednesday.

These changes will help us provide better service to yo...

Published Feb. 12, 2018 1:02 PM

Dear TSD users,

Unfortunately, we had a network outage last night, and this is causing some issues with login and mounting of file systems in TSD at the moment.

We are hard at work resolving the issue as quickly as possible.
Our apologies for the inconvenience.

--
Best regards,
TSD

Published Feb. 7, 2018 1:25 PM

The file system on Colossus crashed at around 13:10 today. We are currently working on solving the problem as quickly as possible.

UPDATE:

The file system went down because a fuse blew. We have investigated and rearranged the cables to make sure that this incident does not happen again. Our apologies for the inconvenience.

Published Feb. 2, 2018 12:14 PM

Currently, jobs on Colossus have problems starting. They seem to hang in the "CONFIGURING" (CF) state, but eventually start after some minutes. We are investigating the problem and will come back with more information when we know more.

We are sorry for the inconvenience.

Update 14:28: The problem is fixed, and the jobs are starting as normal. Thank you for your patience.

Published Jan. 29, 2018 9:17 AM

Dragen is down due to security patching.

Published Jan. 5, 2018 9:40 AM

Dear TSD user

Due to a serious security vulnerability in modern processors, which in practice affects all operating systems, and therefore almost all IT services, we need to perform maintenance on the entire TSD infrastructure. This means that all TSD services will be affected. Please do not start any critical work during this period.

We will try our best to complete the work tomorrow between 09:00 and 17:00, but due to the scope of the work and the short planning horizon, it is possible that parts of TSD will remain under maintenance until Monday 8 January at 17:00.

If you want more information please consult UiO security [1]. This has also been widely covered in Norwegian press [2,3]. Those who want more detail about the vulnerability can refer to more technical explanations [4,5].

Regards

Leon du Toit

[1]...

Published Dec. 28, 2017 9:32 AM

Dear TSD user

We are performing security-related maintenance on the TSD infrastructure on Thursday, 28 December 2017. From 09:00 to 17:00 it will not be possible to log in to TSD or to initiate new data transfers through the file-lock. Active Colossus jobs will not be affected.

We will post information about progress on the 28th on the operational log.

Apologies for the inconvenience.

TSD@USIT

Published Dec. 27, 2017 10:18 AM

We are experiencing an issue with https://view.tsd.usit.no, and users trying to log in to Windows VMs through this method will not be able to get in.

We are sorry for the inconvenience and will fix this as soon as possible.

Published Nov. 29, 2017 11:22 AM

Unfortunately we are seeing some issues with some of our Linux VMs.

We are hard at work looking for the root cause of the issue.

If you are experiencing trouble with your virtual machines, please contact us at tsd-drift@usit.uio.no and we will do what we can to get things back in order as quickly as possible.

Our sincere apologies for any inconvenience.
TSD@USIT

Published Nov. 24, 2017 4:25 PM

Due to a security issue in the latest version of ThinLinc, we must upgrade the ThinLinc infrastructure on 28/11-2017 between 13:00 and 15:00 CET.

During the operation, it will not be possible to log in to the Linux virtual machines. Windows users will instead experience only three short outages, during which the machines will hang for a short while. Processes running on both Linux and Windows VMs might be terminated. Our best advice is therefore to log out of your machine as soon as the maintenance window starts.

Please follow the operation on our operational log.

We apologise for the inconvenience.

TSD@USIT

Published Nov. 20, 2017 1:01 PM

Jobs that would not finish before 28 November will be placed in a pending state and will start automatically after 28 November.

For example, as of now (20 November), a job that asks for a time limit of less than a week should start. Otherwise the job state will show as PD....
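
The rule can be illustrated with a small Python check. This is only a sketch of the arithmetic, not how the queue system itself decides: the function name is hypothetical, and the cutoff is assumed to be 00:00 on 28 November, since this notice gives only the date.

```python
from datetime import datetime, timedelta

# Assumption: the cutoff is taken as 00:00 on 28 November 2017;
# the notice states only the date, not a time of day.
MAINTENANCE_START = datetime(2017, 11, 28)


def can_start_now(now: datetime, time_limit: timedelta) -> bool:
    """Return True if a job started at `now` would end before the cutoff."""
    return now + time_limit < MAINTENANCE_START


now = datetime(2017, 11, 20, 13, 1)
print(can_start_now(now, timedelta(days=6)))   # True: ends 26 November
print(can_start_now(now, timedelta(days=10)))  # False: would run past 28 November
```

Requesting a time limit no longer than your job actually needs makes it more likely to fit before the cutoff.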

Published Nov. 19, 2017 6:00 PM

On Saturday morning (18/11-2017, around 11:00) one of the virtualization clusters in TSD crashed. The failover mechanism automatically moved all the machines to the other clusters, but the machines were rebooted in the process.

Together with the vendor, we will investigate the causes of this failure.

In the meantime we apologize for the inconvenience.

Francesca@TSD

Published Nov. 15, 2017 4:40 PM

One of the /cluster filesystem IO nodes went down at 15:00, leaving parts of the /cluster filesystem unavailable. It was restarted at 16:00, and we are currently checking the Linux VMs for NFS hangs.

EDIT: Our tests did not indicate any NFS-related hangs on the Linux VMs.

Published Oct. 30, 2017 12:46 PM

We will retry the upgrade of the queue system on Colossus today, starting at around 12:45. The queue system commands (squeue, sbatch, etc.) will be unavailable for a while, which we estimate to be about 15 minutes. In the meantime, running jobs will continue as normal.

We do not expect any user visible changes after the upgrade.

Update: The upgrade has been done now, and seems to have gone well.