TSD Operational Log - Page 15

Published Feb. 10, 2017 3:28 PM

The primary jumphost failed, but the failover mechanism took over and moved the system to the secondary one. The infrastructure should be back to normal very soon. We are investigating what caused the failure.

We apologize for the inconvenience.

Published Jan. 30, 2017 9:19 AM

We are experiencing network problems at the moment and the services in TSD are not reachable. We are investigating the cause in order to solve the problem as soon as possible.

We apologize for the inconvenience.

Regards

Nihal@TSD

Published Jan. 26, 2017 9:12 AM

Dear TSD-users!

There is an issue accessing the services on Colossus at the moment.

This may result in:

  • /cluster/projects/pXX being inaccessible
  • Software modules being inaccessible
  • Issues submitting to Slurm

We are working on this issue to get it resolved ASAP.

We apologize for the inconvenience.

Regards,

Nihal @TSD

Published Jan. 25, 2017 9:04 AM

Dear TSD-users!

There is an issue accessing the Colossus storage at the moment. This may result in /cluster/projects/pXX being inaccessible.

We are working on this issue to get it resolved ASAP.

We apologize for the inconvenience.

Regards,

Nihal @TSD

Published Jan. 23, 2017 1:55 PM

Dear TSD-users!

Due to a disk failure on tsd-fx01.tsd.usit.no at 2017-01-20, 22:59, file import in TSD was unavailable until 2017-01-23, 10:56.

The issue has been resolved, and the system is back in production.

We apologize for the inconvenience.

Regards,
Benjamin

Published Dec. 23, 2016 12:06 PM

There is an issue accessing the Colossus storage at the moment. This may result in:

  • /cluster/projects/pXX being inaccessible
  • Software modules being inaccessible
  • Issues submitting to Slurm

We are working on this issue to get it resolved ASAP.

We apologize for the inconvenience.

Regards,

Abdulrahman

Published Dec. 20, 2016 10:04 AM

Due to the network problem we had yesterday morning, the Colossus disk was not properly mounted and exported to the project machines. This causes problems accessing the /cluster/projects partition, mounting modulefiles and running Slurm. Some of the jobs that were running when the network issue occurred might have been affected by the problem.

We are working on the issue now and hope to solve it during the day.

We apologize for the inconvenience.

Francesca

Published Dec. 19, 2016 8:38 AM

Update: The problem is fixed. You can now log in to TSD.

---------------------------------------------------------------------------------------------------

We are experiencing network problems at the moment and TSD is not reachable. We are investigating the cause in order to solve the problem as soon as possible.

We apologize for the inconvenience.

Regards,

Erik

Published Dec. 5, 2016 1:12 PM

Due to a failure in the heating system, the Colossus front end and one of the racks went down this morning (05/12-2016), and the cluster is currently unavailable. We are working to reboot the system.

Jobs that were running on the rack that went down have unavoidably died, while those running on the other rack are most likely still running even though the front end is not available.

We apologize for the inconvenience.

Published Dec. 1, 2016 9:43 AM

Dear TSD users,

There has been a network problem, which has been safely bypassed by our failover mechanism. However, some of the Linux VMs are still mounting the filesystem and are therefore not accessible at the moment. The process will take 2 hours. If you still experience problems with your Linux VM after 11:00 today, please let us know (tsd-drift@usit.uio.no).

We are investigating the cause of the network problem.

We apologize for the inconvenience.

Regards,

Francesca

Published Nov. 14, 2016 3:01 PM

Dear TSD-users,

The service is back to normal now.

We apologize for the inconvenience.

Regards

Nihal @TSD

Published Nov. 14, 2016 12:26 PM

Dear TSD-users,

We are experiencing problems with the Filelock filesystem. The service is functioning but extremely slow. We are investigating the cause and hope to solve the problem as soon as possible.

We apologize for the inconvenience.

Regards,

Francesca@TSD

Published Nov. 4, 2016 1:20 PM

Dear TSD-users,

We need to reboot the machine that handles two-factor authentication. The reboot will happen at 13:30 today and will take around 5 minutes. During the reboot it will not be possible to log in to TSD. Sessions that are already open will remain open.

We apologize for the inconvenience.

Regards,

Francesca@TSD

Published Nov. 4, 2016 12:53 PM

Dear TSD-users,

Users are currently unable to change their password via https://brukerinfo.tsd.usit.no inside TSD.

We apologize for the inconvenience.

Regards,

Nihal D. Perera

Published Nov. 4, 2016 10:06 AM

Dear TSD-linux users,

A security vulnerability has been found in the Linux RHEL6 kernel, and the Linux machines (physical and virtual) in TSD were rebooted during the night. This was absolutely necessary. We are now working to reset the virtual Linux clients that did not reboot properly. This will take up to one hour.

Please follow our operational log for updates.

We apologise for the inconvenience.
Regards,
Francesca@TSD

Published Nov. 1, 2016 4:26 PM

Dear TSD-user

We encountered a bug on our file server a couple of weeks ago, and to resolve it, the quota statistics on HNAS disk usage had to be reset. It will take up to two weeks before you can see the correct quota statistics again. In the meantime the disk is available to you, but there is no management engine to enforce quotas on it. To avoid a severe incident, we kindly ask you, for the next three weeks, to inform us in advance if you need extra disk space (more than 1 TiB), and we will try to accommodate your request. Please do not import or produce more than 1 TiB of data without informing us.

Please note that this message concerns only usage of the HNAS disk, not usage of the Colossus disk.

We apologize for any inconvenience this may cause you.

Regards,

Francesca@TSD

Published Oct. 25, 2016 7:33 PM

Dear TSD-users,

The maintenance finished according to plan today, 25/10, at 16:00. The service is back in production.

Regards,

Francesca

Published Oct. 14, 2016 3:24 PM

Dear TSD-users,

We have observed that old versions of the VMware Horizon View Client are no longer compatible with the TSD infrastructure, and using them might result in login problems (the system hangs upon login). We therefore strongly recommend that you upgrade the Client on your local machine to the latest version. The upgrade takes a few minutes, but you need administrator rights to do it. If you do not have them, please contact your local IT.

Please read the instructions on how to upgrade here:
http://www.uio.no/english/services/it/research/storage/sensitive-data/use-tsd/login/pcoip/Install_VMwareClient_on_Win_Mac_Linux.html

Regards,
Francesca@TSD

Published Oct. 14, 2016 3:22 PM

Dear TSD-users,

There will be a downtime of the TSD infrastructure on Tue. 25/10 between 13:00 and 16:00. During the downtime, we will upgrade the HNAS storage disk and the Colossus cluster, and we will set up a new gateway for the failover mechanism. It will not be possible to log in to any service in TSD at any time during the maintenance stop, and Colossus will be drained. In order not to kill any running jobs, we have put a reservation on the Colossus cluster so that jobs that would run beyond the start of the maintenance will not start immediately, but will instead queue until the maintenance is finished.
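
As a rough illustration of how such a reservation affects scheduling, the sketch below assumes a simplified rule: a job starts only if its requested time limit lets it finish before the reservation begins. The function name, the parameters and the example times are hypothetical and are not taken from the actual Slurm configuration on Colossus.

```python
from datetime import datetime, timedelta


def can_start_now(now, time_limit, reservation_start):
    """Return True if a job started at `now` would end before the reservation begins.

    Simplified assumption: the scheduler looks only at the requested time limit.
    """
    return now + time_limit <= reservation_start


# Hypothetical example: maintenance reservation starting Tue. 25/10 at 13:00.
maintenance_start = datetime(2016, 10, 25, 13, 0)
now = datetime(2016, 10, 24, 9, 0)

short_job = timedelta(hours=12)  # would end 24/10 at 21:00, before the reservation
long_job = timedelta(days=2)     # would end 26/10 at 09:00, inside the reservation

print(can_start_now(now, short_job, maintenance_start))  # True: may start right away
print(can_start_now(now, long_job, maintenance_start))   # False: queues until the reservation is removed
```

In practice this means that requesting a realistic (shorter) time limit may allow a job to run before the maintenance instead of waiting in the queue.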

Regards,
Francesca@TSD

Published Oct. 5, 2016 3:39 PM

Dear TSD users,

On Friday 07/10 between 13:00 and 14:00, there will be a second (and last) update of the VMware View infrastructure. During the maintenance, it will not be possible to log in to the Windows VMs via the Horizon View Client (PCoIP login). Login will instead be possible using the old ssh+RDP mechanism, provided that you have properly logged off ("Log off") from your last PCoIP session.
Please note that the machines will keep running during the maintenance and no processes will be killed.
Users of the Linux VMs are not affected by the maintenance.

Follow our operations on our operational log:
http://www.uio.no/tjenester/it/forskning/sensitiv/log/

This is really the last step before enabling the web-based login protocol to the Windows VMs. You will get more information about it very soon. We are sorry for the short notice, but the need for this maintenance arose late and, for your convenience, we prefer to carry out operations during the autumn holiday week.

Published Oct. 4, 2016 1:21 PM

Dear TSD-users

The issue related to mounting the filesystem is now resolved.

Sorry for the inconvenience.

Nihal @TSD

Published Oct. 4, 2016 9:30 AM

We are experiencing some issues related to the filesystem. This will cause problems for our Linux VM users. Listing files (ls) and "module load" may not work because of this. We are working on a fix; please check this log for updates.

Sorry for the inconvenience.

Nihal @TSD

Published Sep. 30, 2016 6:18 AM

Dear TSD user,

We need to perform some maintenance on the Colossus cluster, and we therefore need to reserve the cluster to avoid jobs running at the time of the maintenance. Since the operation takes little time, we have decided, exceptionally and for the first time, to do it during the weekend. This means that we have reserved the cluster from today (Friday 30/09) at 8:00. Jobs that are already running will keep running. New jobs will not start but will stay in the queue until the reservation is removed. We expect to finish by Sunday at the latest.

Please follow the status of the maintenance on our operational log!

We are working to make Colossus bigger, safer, stronger...

Regards,

Francesca

Published Sep. 26, 2016 3:25 PM

Dear users

Login to Windows via the VMware Horizon client is now available.

Best regards

Nihal D. Perera

Published Sep. 21, 2016 12:17 PM

On Monday 26/09 between 13:00 and 15:00, the VMware View infrastructure will be updated. During the maintenance, it will not be possible to log in to the Windows VMs via the Horizon View Client (PCoIP login). Login will instead be possible using the old ssh+RDP mechanism, provided that you have properly logged off ("Log off") from your last PCoIP session.

Please note that the machines will keep running during the maintenance and no processes will be killed.
Users of the Linux VMs are not affected by the maintenance.

Follow our operations on our operational log:
http://www.uio.no/tjenester/it/forskning/sensitiv/log/

This is the last step before enabling the web-based login protocol to the Windows VMs. You will get more info about it very soon.

Regards,
Francesca