[Oshino, Yuzu]
This is Thursday's report, a continuation of the work described in klog.
It was confirmed that the latest gwsumm can be run in the local environment (i.e. without HTCondor) on the detchar cluster, so it is possible to update from gwsumm 1.0.2 (the old version running on k1sum0) to 2.2.7.
However, we could not run the gwsumm job through HTCondor on the detchar cluster; we need a deeper understanding of HTCondor.
Details
- We successfully got gwsumm working on the test server, so we moved the workspace to the detchar cluster.
- First, we installed the necessary packages and built a mamba environment. Detailed notes on this work are summarised in the wiki.
- As pointed out in klog, we now read the data through the GWDataFind server instead of k1nds. This only required setting the environment variable LIGO_DATAFIND_SERVER in .bashrc.
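For reference, the setting looks like the following; the server address here is a placeholder, not the actual KAGRA datafind host.

```shell
# In ~/.bashrc -- point gwdatafind (and hence gwsumm) at the GWDataFind server.
# The address below is a placeholder; use the actual datafind host.
export LIGO_DATAFIND_SERVER="datafind.example.org:80"

# Quick check that the variable is visible to child processes:
echo "$LIGO_DATAFIND_SERVER"

# Frame files can then be located without k1nds, e.g.:
#   gw_data_find -o K -t <frametype> -s <gps-start> -e <gps-end>
```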
- When we submit a job using gwpy, we want to use multiple CPUs to speed it up with parallel processing. Initially, the HTCondor configuration on the cluster was not mature, so we improved it to manage the CPUs dynamically. A single job can now use up to 16 CPUs.
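As a sketch, dynamic CPU management in HTCondor is usually done with a single partitionable slot that owns all of a node's resources; the following condor_config fragment is illustrative, not a copy of our actual configuration.

```ini
# condor_config on the execute node: one partitionable slot
# owning all CPUs and memory; jobs carve off what they request.
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%,mem=100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
```

A job then asks for its share in the submit description file with `request_cpus = 16` (our current per-job maximum).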
- On the cluster, too, we confirmed that the latest gwsumm (2.2.7) runs in the local environment (without HTCondor), so the update from gwsumm 1.0.2 (running on k1sum0) is feasible.
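For the record, a local (non-HTCondor) run of the updated gwsumm looks roughly like this; the config-file name is a placeholder, and the flags should be checked against `gw_summary --help` for 2.2.7.

```shell
# Build one day of summary pages locally, with 16 parallel processes.
# k1summary.ini is a placeholder for the actual configuration file.
gw_summary day YYYYMMDD \
    --config-file k1summary.ini \
    --multi-process 16 \
    --output-dir ~/public_html/summary
```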
- Our original plan is to consolidate the scattered detchar computational jobs (summary pages, segment production, Gauch, Omicron, Fscan, Hveto, ...) on the detchar cluster in the SK building. This would also reduce the number of IP addresses allocated to the servers.
- We therefore need a job on the detchar cluster that constantly generates summary pages via HTCondor.
- I have tried many things, but the job works on a server where the submit and compute node are the same, like k1sum0 (universe = local), and fails on a machine with separate login and compute nodes, like the detchar cluster (universe = vanilla)...
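A minimal sketch of the difference: in the vanilla universe the job runs on a different machine from the submit node, so its inputs must reach the execute node either through a shared filesystem or through HTCondor's file transfer. The submit description file below is illustrative only (the wrapper script and file names are hypothetical).

```ini
# Illustrative submit description file for the detchar cluster.
universe                = vanilla
executable              = run_gwsumm.sh    # hypothetical wrapper script
getenv                  = True             # pass submit-side environment variables
request_cpus            = 16
# Without a shared home directory, inputs must be shipped explicitly:
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = k1summary.ini
log    = gwsumm.log
output = gwsumm.out
error  = gwsumm.err
queue
```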
- The error message is attached.
Remaining
- To build a computing environment that is easy to work in, the home directory must be shared between the login node and the compute nodes (k1detms0 and k1detcl[012]), as in system-B.
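One common way to share the home directory is an NFS export from the login node, mounted on each compute node. The fragments below are only a sketch; the export options and mount paths would need to match site policy.

```ini
# On k1detms0 (login node), /etc/exports -- export home directories
# to the compute nodes:
/home  k1detcl0(rw,sync) k1detcl1(rw,sync) k1detcl2(rw,sync)

# On each k1detcl[012] node, /etc/fstab -- mount it at boot:
k1detms0:/home  /home  nfs  defaults,_netdev  0  0
```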
- We might need to reconsider the network configuration around the cluster, because I am not sure the current configuration actually helps save IP addresses.
- I need deeper knowledge of HTCondor. It might be a good idea to consult the LIGO computing team.
- I need to investigate the memory usage of the summary page production. If necessary, we will buy more memory.