DMG (Data system trouble) takahiro.yamamoto - 20:49 Saturday 15 October 2022 (22520)
Replacement of data transfer servers failed [Kanda, YamaT]
We tried to replace the data transfer server with a new system, but it did not work because of a slow NFS write speed.
----- In the current system, k1fw1 saves data files to the disk of hyades-1 via NFS. Because hyades-2 was prepared as part of system-B, the data transfer and storage system at Kashiwa, we tried to change the write destination of k1fw1 from hyades-1 to hyades-2.
However, daqd hung repeatedly when k1fw1 mounted the disk on hyades-2, because the write speed was insufficient. k1fw0 can write a frame file (currently ~400MB per 32-second file) to the disk on hyades-0 within 10 seconds, and k1fw1 can do the same on hyades-1. In contrast, k1fw1 took ~120s to write a frame file to the disk on hyades-2.
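The numbers above can be turned into a quick sanity check. The sketch below is only arithmetic on the figures quoted in this report; it assumes MB means 10^6 bytes and that frame files are written one at a time, so a writer keeps up only if one file is finished before the next 32-second file is due. The function names are illustrative and are not part of daqd.

```python
# Figures quoted in this report. MB is taken as 10**6 bytes (assumption).
FRAME_SIZE_MB = 400.0    # ~400 MB per frame file
FRAME_LENGTH_S = 32.0    # each frame file covers 32 s of data


def write_rate_mb_s(write_time_s: float) -> float:
    """Average write rate implied by the time needed to write one frame file."""
    return FRAME_SIZE_MB / write_time_s


def keeps_up(write_time_s: float) -> bool:
    """True if one file is written before the next one is due.

    Assumes files are written strictly one after another (assumption).
    """
    return write_time_s < FRAME_LENGTH_S


# Minimum sustained rate needed to keep up with data taking.
required_mb_s = FRAME_SIZE_MB / FRAME_LENGTH_S

if __name__ == "__main__":
    print(f"required: {required_mb_s:.1f} MB/s")
    for host, t in (("hyades-0/1 (<=10 s)", 10.0), ("hyades-2 (~120 s)", 120.0)):
        print(f"{host}: {write_rate_mb_s(t):.1f} MB/s, keeps up: {keeps_up(t)}")
```

Under these assumptions, the current servers write at 40 MB/s or faster against a requirement of 12.5 MB/s, while ~120 s per file corresponds to roughly 3.3 MB/s, so the writer falls further behind with every file.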
We tried several settings for the NIC, NFS, and firewall, but the write speed did not improve. The NIC is confirmed to link up at 10Gbps, and the MTU is set properly for jumbo frames. The NFS settings are simply a copy of those on hyades-1 (which are, of course, the same as on hyades-0). Because nothing was changed on the frame writer, the problem does not seem to be in the daqd software. It may be a hardware compatibility problem between the old and new systems.
I will investigate this and try the replacement again on the next maintenance day.
Comments to this report:
takahiro.yamamoto - 20:43 Saturday 22 October 2022 (22628)
I ran some more tests of the new disk server, but the write speed did not improve.
----- Replacing the SFP and/or the optical fiber cable brought no improvement. The write time on the new disk server is still ~120s/file.
I also measured the write time with the cp command. Writing to the NFS region takes ~10s/file, whereas on the current system it takes ~3s. So copying to the new server is also somewhat slower than copying to the current server, but only by a factor of a few. I'm not sure why the daqd process is ~50 times slower on the new disk than on the current system.
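One way to narrow down the gap between cp and daqd might be to time writes with different write patterns on the same NFS mount. The sketch below is a generic test, not daqd's actual write path, which is not known from this report. The helper name `time_write`, the chunk sizes, and the file size are all hypothetical, and the temporary directory would have to be replaced by a directory on the NFS mount under test.

```python
import os
import tempfile
import time


def time_write(path, total_bytes, chunk_bytes, fsync_each_chunk=False):
    """Write total_bytes to path in chunk_bytes pieces and return elapsed seconds.

    fsync_each_chunk=True forces every chunk to the server before continuing,
    which mimics a synchronous write pattern (assumption, for comparison only).
    """
    chunk = b"\0" * chunk_bytes
    written = 0
    start = time.perf_counter()
    # buffering=0 gives unbuffered I/O, so each write() call reaches the OS.
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            written += f.write(chunk[:n])
            if fsync_each_chunk:
                os.fsync(f.fileno())
        os.fsync(f.fileno())  # make sure the data has left the client cache
    return time.perf_counter() - start


if __name__ == "__main__":
    # Replace the temporary directory with a directory on the NFS mount.
    with tempfile.TemporaryDirectory() as target_dir:
        path = os.path.join(target_dir, "write_test.bin")
        size = 4 * 1024 * 1024  # small test size; hypothetical
        for chunk, sync in ((1024 * 1024, False), (4096, False), (4096, True)):
            elapsed = time_write(path, size, chunk, fsync_each_chunk=sync)
            print(f"chunk={chunk} B, fsync each chunk={sync}: {elapsed:.3f} s")
```

If small or synchronous writes turn out to be much slower on hyades-2 than on hyades-1 while large writes are comparable, that would point toward the write pattern rather than raw bandwidth.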
By the way, each trial of these tests takes 30min to 1hr. Half a day is not enough: I can do only a few trials in half a day because there are some interruptions.
nobuyuki.kanda - 20:57 Saturday 22 October 2022 (22630)
Thanks a lot for testing the new server connection.
I'm not sure why daqd is so slow either. Can we check the code that writes the frames? That is, is it certain that it uses 'framewrite', or does it use some C functions?
I'll try to look for a cause in the function/method used for writing to NFS.