DetChar (General)
hirotaka.yuzurihara - 16:11 Friday 18 July 2025 (34595)
Duplication in the recent cache files at Kashiwa cluster

[Hido, Yuzurihara]

Hido-san reported duplicated entries in the recent cache files on kmst-2 (Kashiwa cluster). The times of the duplications are summarized in the attached txt file. I checked several things, but I have not identified the root cause. In the meantime, it would be better to implement a countermeasure.

Details

  • According to the condor submission history, the job that produces the cache file finished without errors.
  • I found that many jobs were running around the time of the duplication.
    • To test whether this was the cause, I ran many jobs to occupy the CPU resources and ran the job under the detchar account. The last submitted job ran on the dedicated CPU, so the large number of jobs is not the direct cause.
    • Note that the job that produces the cache file has been running on the dedicated CPU since 2025/05/07 (see klog33698).
  • I regenerated the cache files for the 14359, 14360, and 14366 directories. The cache files that contain the duplications are stored in /home/detchar/work/20250718_HTcondor.
  • After several checks, we will need to regenerate the segment files for these times.
  • As a countermeasure, it would be better to update the script so that it removes duplicated entries from the cache file (for example, using uniq). I will test the script before running it regularly.
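
A minimal sketch of the de-duplication step mentioned above, written in Python rather than with uniq. It is not the actual cache script: the function names dedupe_lines and dedupe_cache_file are hypothetical, and it assumes that a duplicated entry is an exact repeat of a whole line. Unlike uniq, which only drops adjacent repeats, this also removes repeats that are not next to each other, and it keeps the first occurrence in the original order.

```python
from pathlib import Path


def dedupe_lines(lines):
    """Return the non-empty lines with exact repeats removed.

    The first occurrence of each line is kept and the original order
    is preserved. Trailing newlines are stripped before comparison.
    """
    seen = set()
    unique = []
    for line in lines:
        entry = line.rstrip("\n")
        if entry and entry not in seen:
            seen.add(entry)
            unique.append(entry)
    return unique


def dedupe_cache_file(path):
    """Rewrite a cache file in place without repeated entries.

    Returns the number of lines dropped (repeats and blank lines).
    Hypothetical helper: the real script may prefer to write to a
    new file and keep the original for inspection.
    """
    path = Path(path)
    original = path.read_text().splitlines()
    unique = dedupe_lines(original)
    path.write_text("".join(entry + "\n" for entry in unique))
    return len(original) - len(unique)
```

Since the function rewrites the file in place, it would be safer to try it first on a copy of one of the affected cache files.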
Non-image files attached to this report
Comments to this report:
takahiro.yamamoto - 17:28 Friday 18 July 2025 (34596)
Could a delay in the transition from IDLE to RUN in the condor queue be the cause?

Though I don't remember the detailed implementation, the executed script probably decides the time span it should analyze at the beginning of the script, that is, after the job has transitioned from IDLE to RUN. So if the waiting time in IDLE is longer than the job submission interval, and multiple makeCache jobs spooled in the queue transition to RUN at the same time, duplication can occur.
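
The mechanism suggested above can be illustrated with a small Python sketch. It is only a toy model, not the makeCache implementation: the function analysis_span, the 900 s interval, and the times are all made-up values, and the rule "analyze the last complete interval before the reference time" is an assumption.

```python
def analysis_span(reference_time, interval):
    """Return the (start, end) of the last complete interval before reference_time.

    Assumed rule for this sketch: the span ends at the latest multiple
    of `interval` that is not after `reference_time`.
    """
    end = reference_time - (reference_time % interval)
    return (end - interval, end)


INTERVAL = 900                # hypothetical submission interval in seconds
submit_times = [900, 1800]    # two jobs submitted one interval apart
run_time = 2000               # both jobs leave IDLE at the same moment

# Span decided when the job starts running: both jobs pick the same
# span, so the same entries would be written twice.
spans_at_run = [analysis_span(run_time, INTERVAL) for _ in submit_times]

# Span decided from the submission time instead: the spans are distinct.
spans_at_submit = [analysis_span(t, INTERVAL) for t in submit_times]

print(spans_at_run)
print(spans_at_submit)
```

In this toy model, the two jobs that start together get identical spans, while spans tied to the submission times do not overlap. Whether the real script behaves this way needs to be checked against its code.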