TSD Operational Log - Page 7
We are working on solving an issue with Microsoft Office in TSD, giving this error:
"Microsoft Office can't find your license for this application. A repair attempt was unsuccessful or was canceled. Microsoft Office will now exit."
We will update this message once the issue is resolved.
Update 14:30: Maintenance is complete, and submit hosts are now being rebooted.
Colossus will have downtime on Thursday 21 January from 12:00 to 14:00 due to a third-party issue.
Colossus and submit hosts will not be available during this time. Any pending jobs will automatically be rescheduled after the downtime.
This message will be updated once the maintenance is complete.
Unfortunately our login service is down and you will not be able to log in.
We are working on bringing everything back as quickly as we can, and will update this message as we move forward with solving the issue.
--
Best regards,
The TSD team
Update (21:45) - Maintenance is completed and submit hosts are being rebooted.
Update (16:00) - The hardware upgrade is taking longer than anticipated and has been extended until further notice.
IBM is performing hardware replacement on the ESS storage on Monday from 12:00 to 16:00.
Colossus and submit hosts will not be available during this time.
This message will be updated once the maintenance is complete.
We are experiencing issues with the Colossus storage system, and have reached out to the vendor for technical support, with the highest priority. Some projects' submit hosts may experience NFS hangs.
We're experiencing problems with the ESS storage, affecting /cluster NFS mounts and login to submit hosts and RHEL login nodes.
There might also be interruptions to HPC jobs.
We will be changing the certificates for view.tsd.usit.no and view-ous.tsd.usit.no next Monday. A short downtime of no more than 30 minutes is to be expected. This message will be updated once we are done.
We will be performing some maintenance on services related to user authentication on Monday.
Logins might be unavailable for a few minutes while this is ongoing, so if you experience issues when trying to log in, please try again a few minutes later.
This message will be updated once the maintenance is complete.
We are testing a new login gateway today, Wed 02/12, at 16:00. This might cause temporary interruptions to connections to TSD. The test will not last longer than a few seconds.
We have been experiencing NFS hangs on many Linux hosts since 20:00 last night. We're working on a solution, which will involve rebooting the hosts.
The maintenance will last no more than 1 hour.
We are upgrading our network between 07:00 and 09:00 today, November 24, which will cause disruptions to TSD services. Please do not perform critical work that cannot be saved.
TSD
We are currently experiencing problems connecting to view.tsd.usit.no. This is a load balancer that forwards traffic to the actual service.
Until further notice, view1.tsd.usit.no can be used directly.
Update (10:21): The problems have been resolved.
We're performing emergency maintenance on the cluster storage (ESS). Start time: 13:30. Duration: several hours. Possible interruptions to /cluster/software and /cluster/projects/pXX may occur.
We're performing maintenance on the cluster storage (ESS). Start time: 10:00. Duration: several hours. Possible interruptions to /cluster/software and /cluster/projects/pXX may occur.
Update: The maintenance was partially successful. We're considering an additional maintenance window and will notify you well in advance via email.
We are investigating the cause of a degradation in performance of project storage.
The TSD gateway was unstable, causing connections to our login services as well as data uploads to fail. We have failed over to our secondary gateway, and the issue should be resolved.
We're experiencing issues with TSD selfservice login using BankID.
Update:
The problem is resolved.
External import links are temporarily not working.
Due to serious security issues, we must expedite the Windows patching, and reboot all Windows VMs in TSD.
For detailed information about the vulnerability, please check this site:
https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/CVE-2020-16898
This vulnerability is outside the control of TSD, and we are patching urgently because it could put our internal security measures at risk.
Due to a technical issue, the web services and applications at http://data.tsd.usit.no will be temporarily unavailable from 10pm local time today while we rectify the problem. We expect to have the issue resolved within one hour.
User creation is temporarily stopped due to technical problems.
We're fixing the /cluster/software NFS share on submit hosts, app nodes and RHEL7 login nodes. You will not be able to submit jobs to Colossus or access the /cluster/software and /cluster/project mounts.
We are working on solving an issue in the consent system that will require it to be off for a few hours.