DetChar (General)
shoichi.oshino - 10:05 Wednesday 30 October 2024 (31462)
Tested new version of SummaryPage
[Yuzurihara, Oshino]

We tested the latest version of SummaryPage and confirmed that it works in the KAGRA environment.

Details:

1. Installed the Debian 12 operating system on the test server
2. Installed Mambaforge
3. Downloaded the ligo-summary-pages repository from the LIGO GitLab
4. Installed the ligo-summary-3.10 environment
5. Tested the KAGRA configuration file from the command line
6. Installed HTCondor version 23
7. Submitted the KAGRA condor file
Comments to this report:
hirotaka.yuzurihara - 16:30 Wednesday 30 October 2024 (31465)

Here is a supplemental comment.
The condor job was submitted yesterday evening on the test server. To test the stability of the process, we left it running overnight.
This morning, the condor job was in the hold state. Here is the error message in ~/public_html/summary/log/gw_daily_summary.log:

 

012 (093.000.000) 2024-10-30 07:03:52 Job was held.
        Job has gone over cgroup memory limit of 0 megabytes. Peak usage: 0 megabytes.  Consider resubmitting with a higher request_memory.
        Code 34 Subcode 0

 

The test server had only 8 GB of memory. We need much more memory to run the summary page process stably. Note that k1sum0, where the current summary page is running, has 128 GB of memory.
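To track how often the job is held for memory reasons, the hold events can be pulled out of the job log with a short script. The following is only a minimal sketch: it assumes the event layout shown in the message above (event code 012, ISO-style timestamp), and the function name find_memory_holds and the SAMPLE_LOG variable are hypothetical names introduced for illustration. Other HTCondor versions may write the header differently, so the patterns may need adjusting.

```python
import re

# Event header as it appears in the log excerpt above, e.g.
# "012 (093.000.000) 2024-10-30 07:03:52 Job was held."
EVENT_HEADER = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$"
)
# Memory detail line that follows a hold event caused by the cgroup limit.
MEMORY_LINE = re.compile(
    r"cgroup memory limit of (\d+) megabytes\. Peak usage: (\d+) megabytes"
)


def find_memory_holds(log_text):
    """Return one dict per 'Job was held' (012) event found in log_text."""
    holds = []
    current = None  # the hold event whose detail lines we are reading
    for raw in log_text.splitlines():
        line = raw.strip()
        header = EVENT_HEADER.match(line)
        if header:
            code, cluster, proc, stamp, _text = header.groups()
            if code == "012":
                current = {
                    "job": f"{int(cluster)}.{int(proc)}",
                    "time": stamp,
                    "limit_mb": None,
                    "peak_mb": None,
                }
                holds.append(current)
            else:
                current = None  # some other event; stop collecting details
            continue
        if current is None:
            continue
        memory = MEMORY_LINE.search(line)
        if memory:
            current["limit_mb"] = int(memory.group(1))
            current["peak_mb"] = int(memory.group(2))
    return holds


# The hold event quoted in this report, used as sample input.
SAMPLE_LOG = """012 (093.000.000) 2024-10-30 07:03:52 Job was held.
        Job has gone over cgroup memory limit of 0 megabytes. Peak usage: 0 megabytes.  Consider resubmitting with a higher request_memory.
        Code 34 Subcode 0
"""

if __name__ == "__main__":
    for hold in find_memory_holds(SAMPLE_LOG):
        print(hold)
```

Running the same function on the full gw_daily_summary.log would list every hold with its reported limit and peak usage, which may help when comparing servers with different amounts of memory.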

 

Alternative test server

We discussed which computers could be used to test the summary page.

  1. k1det0 (384 GB memory, 20 CPU): This computer is currently used to run Fscan and Hveto.
  2. k1det1 (384 GB memory, 20 CPU): This computer is currently used to generate segment files and run Gauch
  3. detchar cluster in the SK computer room (128 GB memory): This computer is not ready to run the process. We need to set up the computer environment.

k1det0 will probably be used to test the summary page by replacing its hard disk, after we discuss this with yamaT-san.

 

Mystery of how to read frame data

On the workstations, the NDS server is identified by an environment variable named NDSSERVER or NDS2SERVER. We did not set these variables when we tested the summary page on the test server, yet the process ran and succeeded in reading the frame data. We should clarify how gwsumm reads the frame data.
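As a first check, it can help to confirm from inside the Python environment whether either variable is really unset, and what hosts it would point to if it were set. The sketch below assumes the common "host:port,host:port" layout for these variables and falls back to port 31200 when no port is given; both points are assumptions and not taken from the KAGRA setup. The function name nds_hosts_from_env and the preference for NDSSERVER over NDS2SERVER are also hypothetical choices made for illustration.

```python
import os

# Assumed fallback port when an entry has no ":port" part.
DEFAULT_NDS_PORT = 31200


def nds_hosts_from_env(environ=None):
    """Return (variable_name, [(host, port), ...]) for the first variable set.

    Returns (None, []) when neither NDSSERVER nor NDS2SERVER is set.
    """
    environ = os.environ if environ is None else environ
    for name in ("NDSSERVER", "NDS2SERVER"):
        value = environ.get(name, "").strip()
        if not value:
            continue
        hosts = []
        for entry in value.split(","):
            entry = entry.strip()
            if not entry:
                continue
            host, _, port = entry.partition(":")
            hosts.append((host, int(port) if port else DEFAULT_NDS_PORT))
        if hosts:
            return name, hosts
    return None, []


if __name__ == "__main__":
    name, hosts = nds_hosts_from_env()
    if name is None:
        print("Neither NDSSERVER nor NDS2SERVER is set")
    else:
        print(f"{name} -> {hosts}")
```

If this reports that neither variable is set while the data is still read successfully, the host must be coming from somewhere else, for example from a configuration file or a command-line option.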

hirotaka.yuzurihara - 13:28 Thursday 07 November 2024 (31525)

I now understand how gwsumm reads the frame data. The k1global configuration file has an option to set the NDS host and port number. We could read the frame data because k1nds0 is set correctly in this ini file.
On the other hand, I tried to read past frame data by setting k1nds2 and the proper port number. When I read 32 channels at the same time, the process failed with the following error:

RuntimeError: Low level daq error occured [22]: Too many channels or too much data requested.
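One possible workaround, which has not been tried in this report, is to request the channels in smaller batches instead of all 32 at once. The sketch below only shows the batching logic. The helper names chunked and fetch_in_batches are hypothetical, the batch size of 8 is an arbitrary guess, and fetch stands for whatever call actually reads the data (for example a small wrapper around gwpy's TimeSeriesDict.fetch, which is an assumption about how it would be used here). Whether smaller batches avoid the error on k1nds2 would need to be tested.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(channels, fetch, batch_size=8):
    """Call fetch(batch) for each batch and merge the returned mappings.

    fetch is supplied by the caller and must return a dict-like object
    mapping channel name to data for the channels in the batch.
    """
    data = {}
    for batch in chunked(list(channels), batch_size):
        data.update(fetch(batch))
    return data


if __name__ == "__main__":
    # Placeholder channel names and a fake fetch, for illustration only.
    names = [f"K1:TEST-CHANNEL_{i}" for i in range(32)]

    def fake_fetch(batch):
        print(f"requesting {len(batch)} channels")
        return {name: None for name in batch}

    result = fetch_in_batches(names, fake_fetch, batch_size=8)
    print(f"got {len(result)} channels")
```

Batching increases the number of requests, so it would not reduce the load on the NDS server.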

takahiro.yamamoto - 14:17 Thursday 07 November 2024 (31526)
As far as I remember, the current SummaryPages (k1sum0) doesn't use k1nds[0-2], because too many accesses from k1sum0 to NDS interrupt the site commissioning activities that depend on NDS.
It now accesses gwf data via the GWDataFind server + local NDS2 on k1ldv0.
To avoid conflicts with the commissioning activities, it's better to do the same in the new environment.
hirotaka.yuzurihara - 17:56 Thursday 07 November 2024 (31530)

Thank you for the comment. I understand your thoughts.
When I set the environment variable (LIGO_DATAFIND_SERVER) appropriately, we could read the data via the GWDataFind server! So I think there is no longer any CPU load on k1nds[01]. In addition, we succeeded in making the summary page on the detchar cluster (without HTCondor)! Today, Oshino-san and I finished the HTCondor test on the detchar cluster and improved the configuration to use multiple CPUs at the same time.
Tomorrow, we will try running gwsumm using HTCondor with multiple CPUs.

hirotaka.yuzurihara - 17:26 Thursday 18 December 2025 (35900)

I restarted the test of the recent version of the summary page on the detchar cluster (called k1detms0). The gwsumm version is 2.2.7 (the current summary page uses version 1.0.2).
I confirmed that the process is running and generating the summary page properly. I will leave the condor process running to check its stability.

Solving font issue

When I tried to run the recent summary page on k1detms0, I found that the font differed from that of the current summary page and the IGWN summary page. Running lsp-set-variables sets the necessary environment variables (such as MATPLOTLIBRC and XDG_DATA_HOME) properly when we enter the conda environment. After that, the appearance of the web page is the same as the current summary page.
To check the appearance and contents (the contents should be the same as the current summary page, because the same ini files are used), please check ~controls/summary/day/20251218 on k1detms0.

The script will also be helpful the next time we set up an environment for running the summary page. It can create the conda virtual environment and set up the environment variables.
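Before launching a job, a quick check that the font-related variables are set and point to existing paths may save a wrong-font run. This is only a minimal sketch: the function name check_env_paths is hypothetical, and the list of variables contains just the two named above, although lsp-set-variables may set others as well.

```python
import os


def check_env_paths(names, environ=None):
    """Return {name: status}, where status is 'unset', 'missing path' or 'ok'."""
    environ = os.environ if environ is None else environ
    report = {}
    for name in names:
        value = environ.get(name)
        if not value:
            report[name] = "unset"
        elif os.path.exists(value):
            # The value may be a file or a directory; either counts as present.
            report[name] = "ok"
        else:
            report[name] = "missing path"
    return report


if __name__ == "__main__":
    # Only the two variables mentioned in this report are checked here.
    for name, status in check_env_paths(["MATPLOTLIBRC", "XDG_DATA_HOME"]).items():
        print(f"{name}: {status}")
```

Running it inside the activated conda environment should print "ok" for both variables if the setup worked as described.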

Holding the condor process

Yesterday, I saw the process enter the hold state two hours after I submitted the job. So, as future work, I would like to check the stability. k1detms0 has only 32 GB of memory, which might be the issue. (Another idea is to retire k1sum0 and replace it with a new dedicated computer for the summary page.)
