DMG (Data transfer & archiving)
takahiro.yamamoto - 16:35 Monday 04 August 2025 (34741)
Instability on the data transfer
As I reported in klog#34729, data transfer to Kashiwa has been failing frequently since last night. Because the re-send process that runs at 3am every day seems to be working well, the missed data will probably become available around 3am tomorrow. The GWF files listed in the attached text files are not yet available at Kashiwa (they were produced after 3am today).
LL h(t) also has many gaps, so the llhoftNoGap process produced many chopped files, as shown in the attached figure.
There is no report of SINET trouble so far. We also need to investigate possible causes on the KAGRA side (software, server hardware, network equipment, etc.).
In addition, especially for CAL/DET, some processes at Kashiwa that run with a cadence shorter than 1 day (LL h(t), cache, segments, omicron triggers, etc.) must be rerun (some of them might be skippable because the IFO is now in OOOM).
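As a rough illustration of how gaps like the ones reported above could be spotted from a directory listing, here is a minimal sketch. It is not the script used by DMG. It assumes file names of the form <site>-<frame type>-<GPS start>-<duration>.gwf, and the function names and the example listing are hypothetical.

```python
import re

# Assumed file name layout: <site>-<frame type>-<GPS start>-<duration>.gwf
FRAME_RE = re.compile(
    r"^(?P<site>[A-Z]+)-(?P<ftype>[^-]+)-(?P<start>\d+)-(?P<dur>\d+)\.gwf$"
)


def parse_frame_name(name):
    """Return (gps_start, duration) in seconds for one frame file name."""
    m = FRAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected frame file name: {name}")
    return int(m.group("start")), int(m.group("dur"))


def find_gaps(names):
    """Return a list of (gap_start, gap_end) GPS intervals not covered by any file."""
    spans = sorted(parse_frame_name(n) for n in names)
    gaps = []
    covered_until = None  # GPS time up to which data is continuous so far
    for start, dur in spans:
        if covered_until is not None and start > covered_until:
            gaps.append((covered_until, start))
        end = start + dur
        covered_until = end if covered_until is None else max(covered_until, end)
    return gaps


if __name__ == "__main__":
    # Hypothetical listing at the receiving site with one 32 s file missing.
    listing = [
        "K-K1_C-1443735136-32.gwf",
        "K-K1_C-1443735200-32.gwf",
        "K-K1_C-1443735232-32.gwf",
    ]
    print(find_gaps(listing))  # [(1443735168, 1443735200)]
```

Tracking the end of the covered interval, rather than comparing only neighbouring files, keeps the check valid when chopped files of different lengths are mixed in.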
nobuyuki.kanda - 16:58 Monday 04 August 2025 (34742)
It might be caused by heavy load on kagra-dsr-b1 from the OSDF data exchange, so I'll contact Marco Meyer immediately. Please wait.
takahiro.yamamoto - 16:01 Tuesday 05 August 2025 (34752)
After an OSDF process was stopped last night, the situation has stabilized. There have been no missed frames in the last half day, and the missed raw GWF files were filled in by the re-send process around 3am. So we can start reproducing the cache and segment files.
Because the IFO was in out-of-observing mode, reproducing LL h(t) and omicron triggers is not essential. Even if we reproduce them, all we would actually do is provide zero-filled data. Whether they are needed depends on how much difference it makes in convenience for end users, open-data production, and so on. (In principle, missing files are not a problem, but some software does not work well with file gaps.)
takahiro.yamamoto - 12:14 Monday 06 October 2025 (35243)
Since the planned power outage at Kashiwa ended (t ~= 0.6 days in the attached figure), data loss has occurred 203 times on the low-latency stream. The instability of LL data transfer may have returned.
Images attached to this comment
takahiro.yamamoto - 22:19 Monday 06 October 2025 (35254)
According to Marco's investigation, xrootd, the process that was stopped last time, was accidentally launched by the power cycle of the Kashiwa cluster. He stopped it again and also removed it from the systemd service list, so it should not be launched accidentally in the future. Since xrootd was stopped, there have been no missing frames for the last several hours.
By the way, I found that the latest 10000 s was not shown in the figure attached to the previous post, so I have attached the same plot again, including the recent situation.
----- Postscript: I noticed that full and science frames were also missed today. The missed files exist at Kamioka on both K1FW0 and K1FW1. They might be transferred at midnight tonight by the re-send process, but if they are not re-sent, manual transfer will be required.
The missing files are as follows.
[full]
K-K1_C-1443735168-32.gwf
K-K1_C-1443750112-32.gwf
K-K1_C-1443750368-32.gwf
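For reference, the GPS start time embedded in these file names can be converted to wall-clock time with a few lines of Python. This is only a sketch: the function names are hypothetical, and the GPS-UTC offset is hard-coded to 18 s, which has been the value since 2017-01-01 and is assumed to hold for the dates in this report.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
# Assumption: fixed GPS-UTC offset of 18 s (value in effect since 2017-01-01).
LEAP_SECONDS = 18
JST = timezone(timedelta(hours=9), "JST")


def gps_to_utc(gps_seconds):
    """Convert GPS seconds to a timezone-aware UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - LEAP_SECONDS)


def frame_start_jst(name):
    """Start time (JST) of a file named <site>-<type>-<GPS start>-<duration>.gwf."""
    gps_start = int(name.split("-")[2])
    return gps_to_utc(gps_start).astimezone(JST)


if __name__ == "__main__":
    print(frame_start_jst("K-K1_C-1443735168-32.gwf").isoformat())
    # 2025-10-06T06:32:30+09:00
```

Under that assumption, the first missing file starts at 06:32:30 JST on 6 October, which matches the statement that the frames were missed today.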
takahiro.yamamoto - 13:32 Tuesday 07 October 2025 (35262)
The missing full and science frames were filled in by the automated re-send process last night. The remaining issue is the missing LL h(t). We plan to fill it in on the AR frames instead of the LL frames.