Abstract
After we installed two NICs on k1dc0 for the DAQ stream (see also klog#29110), the IPC glitch rate decreased to about once every 1-2 days. Because all remaining glitches occurred on front-end computers connected to the primary NIC, and the amount of data on the primary NIC was considerably larger than that on the secondary NIC, I balanced the data rate between the two NICs.
We will probably need a couple of weeks at the current glitch rate to conclude whether or not the situation has improved.
Details
We installed a secondary NIC on k1dc0 for the DAQ stream in order to spread out the data load in the work of klog#29110. At that time, we did not modify the launch script of mx_stream, so there was an imbalance in the amount of data on the two NICs: 15 of the 25 front-end computers were connected to the primary NIC with a data volume of 28.1 MB/s, while the remaining 10 front-end computers were connected to the secondary NIC with a data volume of 13.6 MB/s. After this update, the glitch rate decreased from between a few and a few tens per day to about once every 1-2 days, so the dual-NIC configuration seemed to have some effect in reducing IPC glitches.
The remaining glitches occurred only on the front-end computers connected to the primary NIC. As mentioned above, both the data rate and the number of front-end computers on the primary NIC were larger than those on the secondary NIC. I therefore guessed that the data rate and/or the number of front-end computers was related to the glitches, and balanced both quantities across the two NICs.
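The balancing idea can be sketched as a simple greedy assignment: take each front-end (largest data rate first) and put it on whichever NIC currently carries less traffic. This is only an illustrative sketch, not the actual procedure used; the host names, the `balance` helper, and the MB/s figures below are all made up for the example.

```shell
# Hypothetical greedy balancer: reads "host rate" pairs on stdin and
# assigns each host, in descending order of rate, to the less-loaded NIC.
balance() {
    sort -k2,2 -rn |
    awk '{ if (a <= b) { a += $2; n = 0 } else { b += $2; n = 1 }
           print $1, "NIC" n }
         END { printf "NIC0=%.1f NIC1=%.1f\n", a, b }'
}

# Illustrative (not real) front-end rates in MB/s:
printf 'k1ex 3.5\nk1ey 3.2\nk1ix 2.8\nk1iy 1.1\n' | balance
```

With these example numbers the two NICs end up at 4.6 and 6.0 MB/s; with the real 25 front-ends the same idea yields the near-even split described below.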
Since the assignment of front-end computers to each NIC is done in /diskless/root/etc/init.d/mx_stream, I changed the way the card number and endpoint number are determined in this script (the original code is commented out). Now 13 and 12 front-end computers are connected to the primary and the secondary NIC, respectively. I also modified the order of the front-ends in /diskless/root/etc/rtsystab to balance the total data rate of each NIC (the old file is kept as rtsystab.20240517). As a result, the data rate is now balanced as well: 21.3 MB/s on the primary NIC and 20.4 MB/s on the secondary. The attachments list the front-end name, data rate, front-end serial number, endpoint number, and card number before and after this work.
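The kind of card/endpoint logic involved can be sketched as follows. This is a hypothetical illustration, not the actual mx_stream init script: the `assign_nic` helper, the 0-based index, and the split point of 13 hosts are assumptions chosen to match the 13/12 split above.

```shell
# Hypothetical sketch: derive a front-end's NIC card number and endpoint
# number from its 0-based position in the rtsystab ordering. The first
# 13 hosts go to card 0 (primary NIC), the rest to card 1 (secondary),
# and the endpoint number restarts from 0 on each card.
assign_nic() {
    idx=$1      # 0-based position of this front-end in rtsystab order
    split=13    # assumed number of hosts on the primary NIC
    if [ "$idx" -lt "$split" ]; then
        card=0
        endpoint=$idx
    else
        card=1
        endpoint=$((idx - split))
    fi
    echo "$card $endpoint"
}

assign_nic 12   # last host on the primary NIC:  card 0, endpoint 12
assign_nic 13   # first host on the secondary:   card 1, endpoint 0
```

Because both the split point and the rtsystab ordering are plain parameters of such a scheme, rebalancing later (e.g. if a front-end's data rate changes) only requires editing those two inputs.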
Because I am not sure whether the cause of the remaining glitches is really the data rate, I do not know whether the situation will actually improve. Considering the current glitch rate, it will take a couple of weeks to draw any kind of conclusion about the effect of this work.