DGS (General)
takaaki.yokozawa - 5:59 Friday 11 April 2025 (33348)
Troubleshooting the mcf0 machine
This morning, I noticed that several problems had occurred on the K1MCF0 machine.
The status of K1MCF0 is shown in Fig. 1 and Fig. 2.

Based on this manual
https://gwdoc.icrr.u-tokyo.ac.jp/cgi-bin/private/DocDB/ShowDocument?docid=8358
(If this manual is out of date, please let me know.)
I checked the IO chassis (Fig. 3), ran a BURT check, and restarted the models, but the situation didn't change.
Because K1MCF0 is not on the Dolphin network and so should not affect the other real-time PCs, I decided to reboot the real-time PC.
After that, the problem cleared.

If my handling was not correct, please let me know.
Images attached to this report
Comments to this report:
takaaki.yokozawa - 8:18 Saturday 12 April 2025 (33375)
A similar situation happened again (Fig. 1),
so I rebooted k1mcf0.
But the models cannot be recovered now (Fig. 2).

We can log in to the k1mcf0 PC via ssh,
but the following command gives no output:
controls@k1mcf0 ~ 1$ dmesg | grep "ADC cards"
So, can the PC not recognize the IO chassis?
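
The dmesg check above can also be scripted. The following is a minimal Python sketch, assuming only that a healthy front end prints at least one kernel log line containing the string "ADC cards" (the string grepped for above). The function names are hypothetical and are not part of any existing tool.

```python
import subprocess
from typing import List


def find_matching_lines(log_text: str, pattern: str = "ADC cards") -> List[str]:
    """Return the lines of log_text that contain pattern (plain substring match)."""
    return [line for line in log_text.splitlines() if pattern in line]


def io_chassis_cards_reported(log_text: str) -> bool:
    """True if at least one line mentions the pattern, like a non-empty grep result."""
    return bool(find_matching_lines(log_text))


def read_dmesg() -> str:
    """Run dmesg and return its output (may need extra privileges on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout
```

On the front end this would be called as io_chassis_cards_reported(read_dmesg()); a False result corresponds to the empty grep output reported here.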
Images attached to this comment
takahiro.yamamoto - 16:48 Saturday 12 April 2025 (33383)
The real-time front end cannot find any PCIe board in the IO chassis, although the front-end OS seems to boot properly.
So the problem seems to be on the IO chassis side.
The DC24V source for K1MCF should be alive, because K1IMC is alive and the two racks share the power source.
(The breakers are independent, so it is necessary to check whether only one of them has tripped.)

The error messages from when the models died have already been lost, so it is difficult to diagnose the situation in more detail remotely. First, we need to localize the broken point in the mine. If the base board of the IO chassis is broken, replacing the IO chassis should recover the system (I'm not sure where the spares are stored. SK building?). If the HIB host/adapter cards are broken, we may need to use a new type of IO chassis.
takahiro.yamamoto - 12:56 Monday 14 April 2025 (33397)
I did an on-site inspection and couldn't find anything strange.
But MCF could not be recovered, even though I tried a reboot, a cold boot, etc.
I think replacing the HIB card is the first thing we should do.
Remaining works can be done in parallel with IFO works because it's now related to the power sources and IMC SAFE is not required.

-----
Check log
- DC24V is supplied properly from the power source, according to a multimeter check.
- DC3V, 5V, and 12V are supplied properly on the motherboard, according to a multimeter check.
- The Timing Slave seems to be synchronized, according to the indicators on the front panels of the Timing Slave and Fanout.
- The ADC and DAC seem to have started, according to the indicator on each board.
- The HIB card also seems to have no problem, according to the indicator on the board.

Concerns
None of the AI output cables were screwed in, and on some of them the GND-pin side was half unplugged. I'm not sure whether this caused the problem, but this kind of usage can cause trouble with electronics. Note that being "portable" is no reason to leave cables unfixed. Before starting my work, I screwed in all the cables on the AI output. Please make sure that cables are screwed to the front/rear panels.

The speaker seems to emit a popping sound when the IO chassis is switched ON/OFF. (I haven't checked what happens when the AI chassis is switched ON/OFF. It may also pop when the real-time models start up properly, though they cannot be started now.) Anyway, I thought my heart would stop. It might be better for the responsible persons to turn off the speaker amp before maintenance work starts, or to share the procedure for stopping/restarting the speaker amp with the maintainers.
takahiro.yamamoto - 17:31 Monday 14 April 2025 (33405)
I couldn't recover MCF today, though I tried replacing the HIB host and adapter cards (combinations of 2 host cards and 3 adapter cards).
Note that the replacement cards have not been operation-tested on the test stand, so I'm not yet sure whether both the original and the replacement cards are broken or the problem lies elsewhere.

Continuing recovery work in the mine in this situation is just a waste of time. So tomorrow I will find a pair of HIB cards in Mozumi whose operation is confirmed. After that, I will retry the recovery of MCF.
takahiro.yamamoto - 1:48 Wednesday 16 April 2025 (33429)

I checked all combinations of HIB host and adapter cards and found some pairs that work well in the test environment.

The test was done with a V2 computer booted via PXE, an IO chassis (S1706947), and a 3 m copper HIB cable. The MCF issue is that not only the RCG but also the OS cannot find the PCIe cards in the IO chassis. So the test checked whether a PCIe card (Contec BIO1616) could be found with the 'lspci' command.

Because I checked multiple host cards and adapter cards, the results on compatibility between host and adapter cards should be reliable. On the other hand, I used only one IO chassis and a short copper cable. If there is a compatibility issue involving the main board of the IO chassis or the cable length, the results below might not be reproduced in the mine environment. But in my experience, combinations that did not work with a short copper cable never worked with long optical cables either. So the results below should help narrow down the usable combinations. The compatibility table between the serial numbers of the adapter cards (rows, A) and the host cards (columns, H) is as follows.

Unfortunately, there appears to be no simple rule such as a dependency on revision. Also, the two adapter cards in the bottom two rows were not recognized by any of the host cards, so they are likely malfunctioning. Yesterday, I used the host card S/N=120018 and the adapter cards S/N=197215, 201053, and 197223, which were labeled "OK" (I'm not sure who checked them, when, or how). But none of these combinations works on the test stand either.

|---------------+--------+--------+------------+------------+------------+------------+-------|
| A\H           | 120018 | 120016 | QS13491142 | QS13491113 | QS13491022 | QS13491183 | HIB35 |
|---------------+--------+--------+------------+------------+------------+------------+-------|
| 120041 (none) | o      | o      | o          | o          | o          | o          | o     |
| 120319 (none) | o      | o      | o          | o          | o          | o          | o     |
| 120043 (none) | x      | o      | o          | o          | o          | o          | o     |
| 119895 (none) | x      | x      | o          | o          | o          | o          | o     |
|---------------+--------+--------+------------+------------+------------+------------+-------|
| 197215 (B1)   | x      | o      | o          | o          | o          | o          | o     |
| 197224 (B1)   | x      | o      | o          | o          | o          | o          | o     |
|---------------+--------+--------+------------+------------+------------+------------+-------|
| 201053 (B2)   | x      | o      | o          | o          | o          | o          | o     |
| 197223 (B2)   | x      | o      | o          | o          | o          | o          | o     |
|---------------+--------+--------+------------+------------+------------+------------+-------|
| 197219 (B1)   | x      | x      | x          | x          | x          | x          | x     |
| 201050 (B2)   | x      | x      | x          | x          | x          | x          | x     |
|---------------+--------+--------+------------+------------+------------+------------+-------|
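
To make the table easier to query when choosing a pair to bring into the mine, here is a minimal Python sketch that encodes the results above and summarizes them. The data are copied from the table ('o' = recognized, 'x' = not recognized); the variable and function names are hypothetical and chosen only for illustration.

```python
# Host card serial numbers, in the column order of the table above.
HOSTS = ["120018", "120016", "QS13491142", "QS13491113",
         "QS13491022", "QS13491183", "HIB35"]

# Adapter card S/N -> one mark per host card, in HOSTS order.
RESULTS = {
    "120041": "ooooooo",
    "120319": "ooooooo",
    "120043": "xoooooo",
    "119895": "xxooooo",
    "197215": "xoooooo",
    "197224": "xoooooo",
    "201053": "xoooooo",
    "197223": "xoooooo",
    "197219": "xxxxxxx",
    "201050": "xxxxxxx",
}


def working_pairs():
    """List every (adapter, host) pair marked 'o' in the table."""
    return [(adapter, host)
            for adapter, marks in RESULTS.items()
            for host, mark in zip(HOSTS, marks)
            if mark == "o"]


def dead_adapters():
    """Adapter cards that were not recognized with any host card."""
    return [adapter for adapter, marks in RESULTS.items() if "o" not in marks]


def adapters_per_host():
    """For each host card, count the adapter cards it worked with."""
    return {host: sum(marks[i] == "o" for marks in RESULTS.values())
            for i, host in enumerate(HOSTS)}
```

For example, adapters_per_host() shows at a glance that host card 120018 worked with far fewer adapter cards than the others, which matches the failed attempts in the mine with that card.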

satoru.ikeda - 14:29 Wednesday 16 April 2025 (33438)

YamaT-san (Remote), Washimi-san, Yokozawa-san, Nakagaki-san, Ikeda

We have performed recovery work on K1MCF0.

Summary

First, we tried turning on the RTPC, and it booted up normally.
The key difference from the previous K-Log #33397 is as follows:
 We turned off the amplifier (KM750) before powering on the RTPC.

Details

Status before starting work:

IO chassis power was ON.
RTPC was powered OFF, and both the HIB cable and USB keyboard were disconnected.
The PEM amplifier power was ON.

Work was carried out in the following steps:
1. Requested the PEM team to check the cable connections, and no issues were found.
2. The PEM team turned off the amplifier (KM750).
3. Reconnected the cables that had been removed during the previous test:
 Connected the RTPC's HIB cable and USB keyboard.
4. Turned on the RTPC power.
5. After startup, confirmed via dmesg that the RTPC recognized cards such as ADC/DAC.
6. Turned the amplifier power back on (which was turned off in step 2).
7. Tidied up the shaker cables.
8. Performed an injection test using the speaker.
9. Performed an injection test using the shaker.

No issues occurred in any of the above steps.

Next Plan

For now, we will leave the system as it is and monitor it.

If a similar issue occurs in the future and cannot be resolved using the standard recovery procedure, we will test whether powering off the amplifier resolves the issue.
 If the system recovers just by turning off the amplifier, there may be a grounding issue on the amplifier side.

If that doesn’t help, we will consider replacing the HIB card or taking other measures.
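
The plan above amounts to a small decision sequence, which can be written down as a sketch. This is only an illustration of the order of the steps as described in this report; the names are hypothetical and it is not an existing procedure script.

```python
from enum import Enum


class NextStep(Enum):
    KEEP_MONITORING = "system recovered; keep monitoring"
    POWER_OFF_AMPLIFIER = "power off the amplifier, then retry the recovery"
    SUSPECT_AMPLIFIER_GROUNDING = "recovered with amplifier off; suspect grounding on the amplifier side"
    REPLACE_HIB_OR_OTHER = "consider replacing the HIB card or other measures"


def next_step(standard_recovery_worked: bool,
              amplifier_off_tried: bool = False,
              recovered_with_amplifier_off: bool = False) -> NextStep:
    """Return the next action, following the order given in the plan above."""
    if standard_recovery_worked:
        return NextStep.KEEP_MONITORING
    if not amplifier_off_tried:
        return NextStep.POWER_OFF_AMPLIFIER
    if recovered_with_amplifier_off:
        return NextStep.SUSPECT_AMPLIFIER_GROUNDING
    return NextStep.REPLACE_HIB_OR_OTHER
```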

Non-image files attached to this comment