KAGRA Logbook

MIF (General)

takafumi.ushiba - 21:19 Sunday 07 December 2025 (35780)

Sudden reduction of GRX and GRY power from the PSL room

Since the LSC_LOCK guardian had been unable to lock GRX and GRY for more than 3 hours, I requested the DOWN state of the LSC_LOCK guardian.
I briefly checked the cause and found that the GRX and GRY power at the fiber outputs coming from the PSL room dropped suddenly when the last lockloss from the OBSERVATION state happened (fig1).
I could not confirm the reason for the laser power reduction remotely, so the status of the GRX and GRY lasers needs to be checked tomorrow morning.

Images attached to this report

35780_1765109695_35779_1765109413_Screenshot from 2025-12-07 21-09-09.png

Comments to this report:

takahiro.yamamoto - 23:14 Sunday 07 December 2025 (35781)

The k1alspll model suddenly died around 17:40.
This is a sufficient reason why GRX and GRY cannot be locked, but I haven't checked the time order of these troubles or the direction of causality.

Judging from the web screen captures, it does not seem to be one of the well-known issues such as an OS hang-up, a timing glitch, or a Dolphin glitch.
Though I guess it can be recovered by restarting the models on k1als0, it may be better to gather logs and other information before starting recovery work, because this may be a problem we are facing for the first time.

After that, the first thing we can try is to restart the models on k1als0. In this case, no Dolphin treatment is required. If that does not solve the issue, the computer must be rebooted after disabling the Dolphin connection of k1als0, so all SAFE is required. If rebooting the computer does not work either, the cause might be a power-down of the power supply unit for the IO chassis or a malfunction of the IO chassis itself (in the past, a power supply unit suddenly died, and a capacitor on the baseboard of an IO chassis broke due to degradation over time). In that case, we should enter the mine and check each piece of equipment. If it is related to the power supply around the ALS racks, not only the DGS-related equipment but also the analog electronics should be checked.
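
For reference, the escalation order described above can be written down as a small lookup. This is only a sketch of the order of operations in Python; the names `RecoveryStep`, `ESCALATION`, and `next_step` are made up for illustration and are not part of any DGS or Guardian tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryStep:
    action: str
    precondition: str


# Steps transcribed from the paragraph above, mildest first.
ESCALATION = (
    RecoveryStep(
        action="restart models on k1als0",
        precondition="none (no Dolphin treatment required)",
    ),
    RecoveryStep(
        action="reboot k1als0",
        precondition="all SAFE, Dolphin connection of k1als0 disabled",
    ),
    RecoveryStep(
        action="check IO chassis and its power supply unit in the mine",
        precondition="mine entry",
    ),
)


def next_step(failed_attempts: int) -> RecoveryStep:
    """Return the step to try after `failed_attempts` earlier steps have failed."""
    if failed_attempts < 0:
        raise ValueError("failed_attempts must be non-negative")
    # Once the list is exhausted, stay on the last (most invasive) step.
    index = min(failed_attempts, len(ESCALATION) - 1)
    return ESCALATION[index]
```

For example, `next_step(0)` gives the model restart and `next_step(1)` gives the reboot together with its all-SAFE precondition.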

satoru.ikeda - 12:57 Monday 08 December 2025 (35786)

Work related to K-Log#35781.
> The k1alspll model suddenly stopped around 17:40.

[Investigation]
Fig1, 2, 3
k1iopals0: Errors occurred in ADC and DAC.
k1alspll: Errors occurred in FE and ADC.
DC failed transmission with 0xbad.
ADC timing errors were present in dmesg.
[18445221729.386442] k1alspll: ADC TIMEOUT 0 42534 38 42598
Fig4
No particular abnormalities around DAQ.
Fig5
Copy of SDF before k1alspll restart.
No particular errors related to Dolphin either.
No other notable logs were found.
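
When collecting such dmesg lines for later comparison, a small parser may help. The following is a minimal Python sketch that assumes every line has the same layout as the one quoted above; the meaning of the four numbers after "ADC TIMEOUT" is not stated in this report, so they are kept as an uninterpreted tuple. `AdcTimeout` and `parse_adc_timeout` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional, Tuple


class AdcTimeout(NamedTuple):
    timestamp: float          # value inside the leading [...] of the dmesg line
    model: str                # e.g. "k1alspll"
    fields: Tuple[int, ...]   # numbers after "ADC TIMEOUT", meaning not interpreted


_PATTERN = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<model>\w+):\s+ADC TIMEOUT\s+(?P<rest>[\d\s]+)$"
)


def parse_adc_timeout(line: str) -> Optional[AdcTimeout]:
    """Parse one dmesg line; return None if it is not an ADC TIMEOUT line."""
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    numbers = tuple(int(x) for x in match["rest"].split())
    return AdcTimeout(float(match["ts"]), match["model"], numbers)


if __name__ == "__main__":
    sample = "[18445221729.386442] k1alspll: ADC TIMEOUT 0 42534 38 42598"
    print(parse_adc_timeout(sample))
```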

[Recovery Procedure]
Based on the above, recovery was performed following the steps documented by YamaT-san in K-Log#35781.
1. k1alspll Model Restart
As shown in Fig. 6 and 7, k1alspll recovered, but a DAC error remained on k1iopals0.
2. Restart models including the iop model
The iop DAC error disappeared, and it recovered normally.

Images attached to this comment

Non-image files attached to this comment

takahiro.yamamoto - 15:19 Monday 08 December 2025 (35788)

Was the ADC saturation (the 3rd bit on ADC board stat) the common kind, such as the GrPDs frequently saturating during the unlocked state?
Or could some kind of special, digital-looking behavior be seen?

An overflow on the 3rd bit on ADC board stat alone doesn't raise the ADC error bit in STATE_WORD.
A temporary, intermittent error on the 2nd bit on ADC board stat might be what raised the ADC error bit.
But if so, it is strange that no unusual values appeared in the timing-related information such as the CPU meter.

takahiro.yamamoto - 19:29 Monday 08 December 2025 (35792)

Also, was the IPC error bit on k1lsc and k1calcs related only to the Dolphin IPC from/to k1alspll?

takahiro.yamamoto - 20:51 Monday 08 December 2025 (35794)

I heard the concrete situation from Ushiba-kun. This time, Guardian did not detect a lockloss of the PLL, and no Ezca connection error occurred either, because the EPICS IOC stayed alive. If EPICS switches were turned ON/OFF after control of the model was completely lost, they had no effect, and an analog control loop might have stayed closed without control. In that case, the abnormal saturation is no longer so strange (so please ignore my previous questions). There might be no way to investigate such a situation other than taking all signals on two different front-ends by splitting them in analog, because the SDF and the DAQ-ed channels for the PLL had already become unreliable. The only thing we can do is to try restarting the models and rebooting the front-end without overthinking it.
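
Because the EPICS IOC stayed alive, a connection check alone could not reveal that the model had stopped. One conceivable software-side check is whether a value that a running model is expected to update keeps changing. The Python sketch below only shows that idea on a list of already collected samples; which channel would be suitable, and how it is read, are assumptions outside this report, and `looks_frozen` is a hypothetical helper.

```python
from typing import Sequence


def looks_frozen(samples: Sequence[float], min_samples: int = 3) -> bool:
    """Return True if a value that should keep changing is identical in all samples.

    `samples` are successive readings of one value, oldest first. With fewer
    than `min_samples` readings the function does not claim a freeze.
    """
    if len(samples) < min_samples:
        return False
    first = samples[0]
    return all(sample == first for sample in samples)
```

With such a check, `looks_frozen([120.0, 120.0, 120.0])` is true, while readings that still advance, such as `[120.0, 121.0, 122.0]`, give false.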