KAGRA Logbook

DetChar (General)

shoichi.oshino - 10:05 Wednesday 30 October 2024 (31462)

Tested new version of SummaryPage

[Yuzurihara, Oshino]

We tested the latest version of SummaryPage and confirmed that it works in the KAGRA environment.

Details:

1. Installed the Debian 12 operating system on the test server
2. Installed Mambaforge
3. Downloaded the ligo-summary-pages repository from the LIGO GitLab
4. Installed the ligo-summary-3.10 environment
5. Tested the KAGRA configuration file from the command line
6. Installed HTCondor version 23
7. Submitted the KAGRA condor file

Comments to this report:

hirotaka.yuzurihara - 16:30 Wednesday 30 October 2024 (31465)

Here is a supplemental comment.
The condor job was submitted on the test server yesterday evening. To test the stability of the process, we left it running overnight.
This morning, the condor job was in the held state. Here is the error message from ~/public_html/summary/log/gw_daily_summary.log:

012 (093.000.000) 2024-10-30 07:03:52 Job was held.
Job has gone over cgroup memory limit of 0 megabytes. Peak usage: 0 megabytes. Consider resubmitting with a higher request_memory.
Code 34 Subcode 0

The test server has only 8 GB of memory. Much more memory is needed to run the summary page process stably; for reference, k1sum0 (where the current summary page runs) has 128 GB.
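To check automatically whether an overnight job was held for exceeding its memory limit, the hold reason in the job log can be scanned. Below is a minimal Python sketch, assuming the message is worded exactly as in the log quoted above (other HTCondor versions may phrase it differently). `parse_memory_hold` and `MemoryHold` are hypothetical helpers, not part of gwsumm or HTCondor.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern taken from the single hold message quoted above (an assumption:
# the wording may differ in other HTCondor versions).
_MEMORY_HOLD = re.compile(
    r"cgroup memory limit of (?P<limit>\d+) megabytes\. "
    r"Peak usage: (?P<peak>\d+) megabytes"
)
_HOLD_CODE = re.compile(r"Code (?P<code>\d+) Subcode (?P<subcode>\d+)")


@dataclass
class MemoryHold:
    limit_mb: int
    peak_mb: int
    code: Optional[int]
    subcode: Optional[int]


def parse_memory_hold(log_text: str) -> Optional[MemoryHold]:
    """Return the first memory-limit hold found in a job log, or None."""
    mem = _MEMORY_HOLD.search(log_text)
    if mem is None:
        return None
    # Look for the hold code only after the memory message.
    code = _HOLD_CODE.search(log_text, mem.end())
    return MemoryHold(
        limit_mb=int(mem.group("limit")),
        peak_mb=int(mem.group("peak")),
        code=int(code.group("code")) if code else None,
        subcode=int(code.group("subcode")) if code else None,
    )
```

Running it over gw_daily_summary.log after each overnight test would flag memory holds without having to read the log by hand.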

Alternative test server

We discussed which computers could be used to test the summary page:

k1det0 (384 GB memory, 20 CPUs): This computer is currently used to run Fscan and Hveto.
k1det1 (384 GB memory, 20 CPUs): This computer is currently used to generate segment files and run Gauch
detchar cluster in the SK computer room (128 GB memory): This computer is not yet ready to run the process. We need to set up its computing environment.

k1det0 will probably be used to test the summary page after its hard disk is replaced, subject to discussion with yamaT-san.

Mystery of how to read frame data

On the workstations, the NDS server is specified by an environment variable named NDSSERVER or NDS2SERVER. We did not set these variables when we tested the summary page on the test server, yet the process ran and successfully read the frame data. We should clarify how gwsumm reads the frame data.
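A quick way to confirm what the process actually sees is to report which of these variables are set in its environment. Below is a minimal Python sketch; it assumes the values follow a comma-separated `host[:port]` list format, which is an assumption rather than something confirmed in this entry. `nds_settings` and `parse_host_list` are hypothetical helpers.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

# Variable names as quoted in this entry.
NDS_ENV_VARS = ("NDSSERVER", "NDS2SERVER")

Server = Tuple[str, Optional[int]]


def parse_host_list(value: str) -> List[Server]:
    """Split an (assumed) 'host[:port][,host[:port]...]' string."""
    servers: List[Server] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.partition(":")
        servers.append((host, int(port) if sep and port else None))
    return servers


def nds_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, List[Server]]:
    """Return the NDS variables that are set (and non-empty) in `environ`."""
    return {
        name: parse_host_list(environ[name])
        for name in NDS_ENV_VARS
        if environ.get(name)
    }
```

An empty result on the test server would match the observation that neither variable was set, which points to the data source being configured somewhere other than the environment.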

hirotaka.yuzurihara - 13:28 Thursday 07 November 2024 (31525)

I figured out how gwsumm reads the frame data. The k1global configuration file has an option to set the NDS host and port number, and we could read the frame data because k1nds0 is set correctly in this ini file.
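To check which NDS host and port a given ini file points to, the values can be read back with Python's standard `configparser`. This is a sketch only: the section name `nds` and the option names `host` and `port` are hypothetical placeholders, and the actual keys in the k1global file should be substituted.

```python
import configparser
from typing import Tuple


def read_nds_endpoint(
    ini_text: str,
    section: str = "nds",      # hypothetical section name
    host_key: str = "host",    # hypothetical option name
    port_key: str = "port",    # hypothetical option name
) -> Tuple[str, int]:
    """Return the (host, port) pair configured in ini-style text."""
    # interpolation=None reads values literally, so '%' elsewhere in the
    # file cannot trigger interpolation errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    host = parser.get(section, host_key)
    port = parser.getint(section, port_key)
    return host, port
```

For a file on disk, pass the text from `open(path).read()`; a missing section or option raises `configparser.NoSectionError` or `configparser.NoOptionError`, which makes a misconfigured host easy to spot.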
I also tried to read past frame data by setting k1nds2 and the appropriate port number. When I read 32 channels at the same time, the process failed with the following error:

RuntimeError: Low level daq error occured [22]: Too many channels or too much data requested.
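One possible (untested here) way to stay under the server's limit is to request the channels in smaller groups rather than all 32 at once. Below is a minimal Python sketch of the splitting step; the batch size is an arbitrary assumption, since the actual limit depends on the server and on the amount of data requested, and the fetch call that would consume each batch is left out.

```python
from typing import Iterator, List, Sequence


def batched(channels: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` channel names."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(channels), size):
        yield list(channels[start:start + size])
```

With 32 channels and a batch size of 8, this yields four requests of eight channels each. Note that the total load on the NDS server is unchanged; only the size of each individual request shrinks.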

takahiro.yamamoto - 14:17 Thursday 07 November 2024 (31526)

As far as I remember, the current SummaryPages (k1sum0) do not use k1nds[0-2], because heavy access from k1sum0 to NDS interferes with site commissioning activities that depend on NDS.
Instead, it accesses gwf data via the GWDataFind server and a local NDS2 server on k1ldv0.
To avoid conflicts with commissioning activities, it is better to use the same approach in the new environment.

hirotaka.yuzurihara - 17:56 Thursday 07 November 2024 (31530)

Thank you for the comment. I understand your point.
By setting the environment variable LIGO_DATAFIND_SERVER appropriately, we could read the data via the GWDataFind server! So I think this puts no load on k1nds[01], as with the current setup. In addition, we succeeded in generating the summary page on the detchar cluster (without HTCondor)! Today, Oshino-san and I finished the HTCondor test on the detchar cluster and improved the configuration to use multiple CPUs at the same time.
Tomorrow, we will try running gwsumm through HTCondor with multiple CPUs.
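The submit-side settings involved can be sketched as a small generator for an HTCondor submit description: `request_cpus` for multiple CPUs, `request_memory` (the hold message from the 8 GB test server suggested raising it), and `LIGO_DATAFIND_SERVER` passed through the `environment` command. This is a minimal Python sketch: the executable, arguments, resource numbers and server address are placeholders, not the values in the KAGRA condor file, and `submit_description` is a hypothetical helper.

```python
from typing import Mapping


def submit_description(
    executable: str,
    arguments: str,
    request_cpus: int,
    request_memory_mb: int,
    environment: Mapping[str, str],
    log: str,
) -> str:
    """Build an HTCondor submit description as plain text."""
    if request_cpus < 1 or request_memory_mb < 1:
        raise ValueError("resource requests must be positive")
    # HTCondor's quoted environment syntax: NAME=value pairs separated by
    # spaces; this sketch rejects values containing spaces or quotes.
    for name, value in environment.items():
        if any(c in value for c in " \"'"):
            raise ValueError(f"unsupported character in {name}")
    env = " ".join(f"{k}={v}" for k, v in environment.items())
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        f"request_cpus = {request_cpus}",
        # Without a unit suffix, request_memory is interpreted as megabytes.
        f"request_memory = {request_memory_mb}",
        f'environment = "{env}"',
        f"log = {log}",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

The resulting text can be written to a `.sub` file and passed to `condor_submit`. Setting the variable in the submit file rather than in the login shell keeps the job's data source explicit, which avoids the kind of ambiguity described in the "Mystery of how to read frame data" section.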