DGS (General)
takahiro.yamamoto - 6:13 Saturday 21 June 2025 (34320)
Timing synchronization was lost on ITMX and ITMY

At 00:48:27 JST on Jun. 21, timing synchronization was lost on k1ix1 and k1iy1.

The IRIG-B channels normally read 5 or 6 µs, but they now show wildly abnormal values (Fig. 1, 2). Without timing synchronization, global control signals cannot be sent from LSC, ASC, etc. to ITMX and ITMY, so it is currently difficult to recover the lock automatically.
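As a rough illustration of the check described above, the following Python sketch flags a front-end whose IRIG-B reading falls outside the nominal 5 to 6 µs band. The tolerance and the example readings are assumptions made up for illustration; they are not values taken from the actual channels.

```python
# Nominal IRIG-B timing offset from the report: 5 or 6 microseconds.
NOMINAL_IRIGB_US = (5.0, 6.0)


def irigb_in_sync(offset_us, tolerance_us=0.5):
    """Return True if the IRIG-B offset lies within the nominal band.

    tolerance_us is an assumed margin, not a value from the report.
    """
    low, high = NOMINAL_IRIGB_US
    return (low - tolerance_us) <= offset_us <= (high + tolerance_us)


# Hypothetical readings keyed by front-end name (values are made up).
readings = {"k1ix1": 5.4, "k1iy1": 123456.0}

# Front-ends whose reading is outside the nominal band.
lost = sorted(host for host, value in readings.items() if not irigb_in_sync(value))
print(lost)
```

In practice the readings would come from the IRIG-B channels of each front-end; how they are fetched is left out here on purpose.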

I have not yet figured out what happened. Even in the best case, a model restart seems to be required. In the worse case, a computer reboot or the replacement of some hardware (probably the timing slave, the PCIe IRIG-B card, or the IO chassis) might be required. In any case, I need to gather more information.

Please see the ongoing discussion in #emergency on Slack.

Images attached to this report
Comments to this report:
shinji.miyoki - 6:37 Saturday 21 June 2025 (34322)

[YamaT-san's comments on Slack]

The LSC_LOCK guardian can reach ENGAGE_PRFPMI_ASC, which means that the PRFPMI LSC lock succeeds. So this does NOT seem to be a serious hardware problem. ASC cannot be engaged because the ASC signals cannot be sent from the ASC models to the ITM models without timing synchronization.

Also, there are no strange logs on the real-time front-ends for ITMX and ITMY, so the computer hardware does not seem to have a serious problem either. The most likely cause is simply an electrical glitch on the timing signal of either TDS or IRIG-B that knocked the timing out of synchronization.

The first thing we can try is restarting the models. Before doing that:

  1. The OBS_INTENT flag should be retracted.
  2. DOWN must be requested to the LSC_LOCK guardian.
  3. SAFE must be requested to the ITMX and ITMY models.

The models can be restarted by executing /opt/rtcds/userapps/release/cds/common/scripts/restart_iop_model.sh on k1ix1 and k1iy1. Because they are not connected to the Dolphin network, there is nothing to fear.

If they cannot be recovered by the above procedure, power cycles of the real-time front-ends (ICV1 rack), the IO chassis (IXV1 and IYV1 racks), or the IRIG-B chassis (ICV1 rack) might be necessary. The real-time front-ends can be power cycled remotely via the BMC web interface, but we need to enter the mine to power cycle the IO chassis or the IRIG-B chassis. I honestly don't know which power cycle should be done. We can try the real-time front-ends first; the order of the others can be decided after a visual inspection.
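The model-restart part of the procedure above (three preconditions, then the restart script on each front-end) can be sketched as a dry-run checklist in Python. It only builds the list of steps as text and executes nothing. The `state` dictionary and its keys are hypothetical and would be filled in by hand by the operator; only the script path, host names, and state names come from the comment above.

```python
RESTART_SCRIPT = "/opt/rtcds/userapps/release/cds/common/scripts/restart_iop_model.sh"
FRONT_ENDS = ("k1ix1", "k1iy1")


def unmet_preconditions(state):
    """Return the preconditions of the procedure that are not yet met.

    `state` is a hypothetical dict filled in by the operator:
      obs_intent -- True while the OBS_INTENT flag is still raised
      lsc_lock   -- current request of the LSC_LOCK guardian
      itmx, itmy -- current request of the ITMX and ITMY models
    """
    problems = []
    if state["obs_intent"]:
        problems.append("retract OBS_INTENT")
    if state["lsc_lock"] != "DOWN":
        problems.append("request DOWN to LSC_LOCK")
    for optic in ("itmx", "itmy"):
        if state[optic] != "SAFE":
            problems.append(f"request SAFE to {optic.upper()}")
    return problems


def restart_plan(state):
    """Return the restart steps as text, or raise if a precondition is unmet."""
    problems = unmet_preconditions(state)
    if problems:
        raise RuntimeError("not ready: " + "; ".join(problems))
    return [f"on {host}: run {RESTART_SCRIPT}" for host in FRONT_ENDS]


# Example: all three preconditions satisfied, so a plan is produced.
ready = {"obs_intent": False, "lsc_lock": "DOWN", "itmx": "SAFE", "itmy": "SAFE"}
for step in restart_plan(ready):
    print(step)
```

Keeping the sketch as a dry run is deliberate: whether the script takes arguments, and how one logs in to the front-ends, is not stated here.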

I may not be available tomorrow morning, but I will be available in the afternoon.
If a DGS expert is available tomorrow morning, it may be better for them to do the above.

shinji.miyoki - 7:02 Saturday 21 June 2025 (34323)

06:48 Sawada-kun turned off the observing bit. Miyoki set the LSC_LOCK guardian to PRFPMI_LOCKED_WITH_3F.

We will keep this state until the next recovery action.

shinji.miyoki - 10:45 Saturday 21 June 2025 (34324)

[Oshino, Tanaka, Nakagaki, Sawada (on site), Ushiba, Ikeda, Miyoki (remote)]

8:54 Requested the DOWN state to LSC_LOCK.

8:55 Requested the SAFE state to ITMX and ITMY.

Fig. 1-3 show the SDF status. All safe.snap should be accepted.

After several attempts, those in Fig. 4-5 were accepted.

 

We restarted from the IOP 4 times on k1ix1 and once on k1iy1, with no recovery. We have not tried rebooting the machines at this point. Oshino-san and Nakagaki-san are entering the mine to visually check the IRIG-B equipment.

Images attached to this comment
shinji.miyoki - 13:35 Saturday 21 June 2025 (34325)

[Oshino, Tanaka, Nakagaki, Sawada (on site), Ushiba, Miyoki, Ikeda, Miyakawa (remote)]

Oshino-kun found that the power for the fanout and IRIG-B was down, as shown in the photos. The 2.5 A current limiter was active, and the output voltage of the power supply for the IRIG-B and fanout instruments had dropped to ~2 V (Fig. 1, 2).

First, Oshino-kun turned off the power supply and detached both power cables on the instrument side. He then turned the power supply back on and confirmed that the power supply by itself operated normally.

Next, he reconnected the power cables one by one to check whether the fanout and IRIG-B could each be rebooted. Both instruments rebooted. We therefore suspected a short in these power cables, because the same cables have a history of shorting depending on how they are handled.

These power cables are an old version with a metal connector cover; the new power cables are designed with plastic covers. Tanaka-kun found new-version power cables in Shimode-san's room and confirmed their continuity and the absence of cross-talk between the pins.

However, judging from the photo of the back of the power supply, the ring size of the O-connector seemed to be larger than the receiver on the power supply, so the rings had to be shrunk like the old ones to connect them. In the end, they were able to attach the cable connectors to the power supply (Fig. 3, 4).

Finally, the fanout and IRIG-B were rebooted (Fig. 5, 6).

Images attached to this comment
takahiro.yamamoto - 23:36 Sunday 22 June 2025 (34341)
It seems to have occurred again around 20:45.
The DC12V power supply unit might have to be replaced.
shinji.miyoki - 14:00 Monday 23 June 2025 (34346)

[Miyakawa, Oshino, Miyoki, Hayakawa, Ohmae, Yamaguchi, Ushiba(ControlRoom)]

I confirmed that the power supply (PS) was in the same state as last time: the current was limited at ~2.5 A and the output was ~2 V (Fig. 1).
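For reference, a minimal Python sketch of how such a reading could be classified, together with the simple Ohm's-law arithmetic behind it. The 12 V nominal output and the 2.5 A limit come from this thread; the fractional margins are assumptions, and treating the fanout/IRIG-B load as a plain resistance is a simplification.

```python
NOMINAL_V = 12.0        # DC12V supply mentioned in this thread
CURRENT_LIMIT_A = 2.5   # current limit observed on the supply


def supply_state(voltage_v, current_a, v_margin=0.1, i_margin=0.05):
    """Classify a reading as 'normal', 'current-limited' or 'unknown'.

    v_margin and i_margin are assumed fractional tolerances.
    """
    at_limit = current_a >= CURRENT_LIMIT_A * (1 - i_margin)
    sagging = voltage_v < NOMINAL_V * (1 - v_margin)
    if at_limit and sagging:
        return "current-limited"
    if not at_limit and not sagging:
        return "normal"
    return "unknown"


def effective_load_ohm(voltage_v, current_a):
    """Equivalent resistance seen by the supply (Ohm's law)."""
    return voltage_v / current_a


# State observed at the rack: ~2 V at ~2.5 A.
print(supply_state(2.0, 2.5))
print(effective_load_ohm(2.0, 2.5))

# Smallest resistive load the supply could drive at 12 V without limiting.
print(effective_load_ohm(NOMINAL_V, CURRENT_LIMIT_A))
```

Under these assumptions the observed state corresponds to an equivalent load of about 0.8 ohm, well below the 4.8 ohm at which a 12 V output would just reach the 2.5 A limit.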

I also noticed that the inside of the 19-inch rack, which houses the fanout, IRIG-B, HDDs, the PS for fanout/IRIG-B, and several PSs for the DGS systems, was very hot. The front and rear panels, which have mesh holes, were closed. I compared the air flow inside the rack with these panels open and closed. With them open, air flowed from the back of the PSs for the DGS systems and the HDD units toward the rear. With the rear panel closed, I felt only a tiny air flow through it. When I opened the rear panel again, I felt hotter air inside the rack. The temperature outside the rack was ~29C. The PS for fanout/IRIG-B seems to have no internal fan. In addition, one side wall of this PS was almost covered by the side panel of the rack shelf, which also contributes to raising the temperature inside the PS.

I urgently turned off the PS. However, Oshino-kun and Miyakawa-kun had apparently planned to measure the resistance between the 12V output and ground to check whether a short had occurred. Sorry for my hasty action.

We discussed the source of this incident but could not clearly identify it in this situation. Miyakawa-kun noticed that this PS has a trip function: if a trip happens, a red light on the front panel should illuminate. It was not illuminated, so no trip seems to have occurred, even though the current limit was active in this PS. This is an abnormal situation.

Then we discussed possible tests to identify the source of the trouble.

  1. First, turn on the PS again without any changes, to see whether the shorted condition persists.

We confirmed that the PS worked, and the fanout/IRIG-B rebooted without any trouble. In addition, we decided to make space on the closed side of the PS as instructed in the manual (Fig. 2). So what next?

  1. Assuming the hot environment caused this abnormal situation, keep the present instruments and connections and prepare some means of cooling them.
  2. Replace the present PS with another one (not actually new, but used elsewhere), because a suspected instrument should in general be replaced.
  3. Prepare two PSs to power the fanout and IRIG-B individually, and then check their behavior.

We selected option 1.

To cool the inside of the 19-inch rack, we removed the front and rear panels. In addition, I asked Hayakawa-kun's team to relocate the water-cooled cooler to the front of this rack and to put two circulators in front of the cooler to send cool air to the PS and the IRIG-B/fanout units (Fig. 3, 4).

The temperature at each point is shown in Fig. 5 (output of the cooler), Fig. 6 (near the PS for fanout/IRIG-B), and Fig. 7 (fanout/IRIG-B).

We will monitor what happens.

Images attached to this comment
shinji.miyoki - 13:57 Wednesday 25 June 2025 (34360)

We put a web camera in the front room of the IXYV area to view this area and the front of the 19-inch rack housing the IRIG-B, etc.

This camera was connected to port 20 of the PoE hub of the DGS system, as instructed by Oshino-kun.

Images attached to this comment