DGS (General)
takahiro.yamamoto - 2:09 Saturday 18 May 2024 (29571)
Balancing the DAQ data rate of the two NICs on k1dc0

Abstract

After we installed two NICs on k1dc0 for the DAQ stream (see also klog#29110), the IPC glitch rate decreased to once per 1-2 days.
Because all remaining glitches occurred on front-end computers connected to the primary NIC, and the amount of data on the primary NIC was considerably larger than that on the secondary NIC, I balanced the data rate between the two NICs.
At the current glitch rate, we will probably need a couple of weeks to conclude whether the situation has improved.

Details

We had installed the secondary NIC on k1dc0 for the DAQ stream in the work of klog#29110, in order to spread the amount of data over two cards. At that time, we didn't modify the launch script of mx_stream, so the data were unevenly split between the two NICs: fifteen of the 25 front-end computers were connected to the primary NIC with a total data rate of 28.1 MB/s, while the remaining 10 front-end computers were connected to the secondary NIC with a total data rate of 13.6 MB/s.

After that update, the glitch rate decreased from a few to a few tens per day down to once per 1-2 days, so the dual-NIC configuration seemed to have some effect in reducing IPC glitches.

The remaining glitches occurred only on the front-end computers connected to the primary NIC. As mentioned above, both the data rate and the number of front-end computers on the primary NIC were larger than those on the secondary NIC. So I guessed that the data rate and/or the number of front-end computers were related to the glitches, and balanced them between the two NICs.

Since the assignment of front-end computers to each NIC is done in /diskless/root/etc/init.d/mx_stream, I changed the way the card number and endpoint number are determined in this script (the original code is commented out). Now, 13 and 12 front-end computers are connected to the primary and the secondary NIC, respectively. I also modified the order of front-ends in /diskless/root/etc/rtsystab in order to balance the total data rate of each NIC (the old file is kept as rtsystab.20240517). As a result, the data rate is now balanced at 21.3 MB/s on the primary and 20.4 MB/s on the secondary. The attachments list the front-end name, data rate, serial number of the front-end, endpoint number, and card number before and after this work.
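
To illustrate the balancing idea (this is only a sketch, not the actual code in mx_stream, and the host names and data rates in it are hypothetical placeholders), the assignment can be thought of as a greedy split of the front-ends by data rate, where the card number selects the NIC and the endpoint number is the front-end's index on that card:

# A minimal sketch of the balancing idea (not the actual mx_stream code;
# host names and rates below are hypothetical placeholders): greedily
# split the front-ends between two NICs so that each NIC carries roughly
# the same total data rate, then number the endpoints per card.

frontend_rates = {  # MB/s, hypothetical example values
    "k1fe01": 4.2, "k1fe02": 3.1, "k1fe03": 2.8,
    "k1fe04": 2.5, "k1fe05": 1.9, "k1fe06": 1.3,
}

cards = {0: [], 1: []}       # card number (NIC) -> assigned front-ends
totals = {0: 0.0, 1: 0.0}    # running data-rate total per NIC

# Assign the heaviest front-ends first, always to the currently lighter NIC.
for host, rate in sorted(frontend_rates.items(), key=lambda kv: -kv[1]):
    card = min(totals, key=totals.get)
    cards[card].append(host)
    totals[card] += rate

# The endpoint number is simply the front-end's index on its card,
# analogous to what the launch script derives from the host list.
for card, hosts in cards.items():
    for endpoint, host in enumerate(hosts):
        print(f"{host}: card={card} endpoint={endpoint} ({frontend_rates[host]} MB/s)")
print("totals per NIC [MB/s]:", totals)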

Because I'm not sure whether the cause of the remaining glitches is really the data rate, I don't know yet whether the situation has improved. Considering the current glitch rate, it will take a couple of weeks to draw any kind of conclusion about the effect of this work.
Non-image files attached to this report
Comments to this report:
takahiro.yamamoto - 12:08 Monday 10 June 2024 (29803)

Abstract

Because we had model updates on Jun. 7th (see also klog#29795), I made an interim report on IPC glitches from May 17th (klog#29571) to Jun. 7th.
In these three weeks, there were 3 IPC glitches.
The glitch rate seems to have improved from once per 1-2 days to once per week by balancing the data rate on the two NICs.
During this check, I found that the data rate of K1ASC decreased due to the update in klog#29795.
Some DQ channels are currently not recorded properly because of a mistake in the model modification.

Details

In order to improve the IPC glitch situation, we had installed the secondary NIC (klog#29110) and balanced the data rate on the two NICs (klog#29571). As reported in klog#29571, the glitch rate decreased from a few per day to once per 1-2 days after installing the secondary NIC. Though we have only 3 weeks of data since balancing the data rate, I checked the recent situation because a model update was done last Friday.

In the recent 3 weeks, we had 3 IPC glitches, as follows.
[2024-05-27 07:36:56]: K1IMC0, K1OMC0, K1EY0
[2024-05-28 11:07:32]: K1IMC0, K1PR0, K1IX1, K1EY0
[2024-06-01 12:42:54]: K1IMC0, K1PR0, K1BS

I'm not sure we can conclude that the latest glitch rate is once per week with only 3 weeks of data, but it does seem to have improved from the previous situation (once per 1-2 days). I'll continue to check the situation after the model updates.

By the way, after the model update last Friday, the data rate of K1ASC decreased from 4312 kB/s to 3693 kB/s, even though many DQ channels were added to the k1asc model according to klog#29445. This seems to have been caused by the DAQ channel block in the model file being changed from plain text to HTML format, probably via the right-click menu, as follows:
controls@k1ctr4:/opt/rtcds/userapps/release$ grep '>#DAQ Channels<' */k1/models/k1asc.mdl
"; text-indent:0px; line-height:100%;\"><span style=\" font-size:14px;\">#DAQ Channels</span></p>\n<p style=\"-q"
I haven't checked which channels are missing from the current DAQ list. In any case, many channels (~700 kB/s, i.e. ~20% of the ASC-related channels) were dropped from the DAQ list. To fix this, all DQ channel blocks in the k1asc model must be checked, the broken blocks must be repaired, and then the k1asc model must be rebuilt and restarted.
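
As a rough way to spot the broken blocks (a sketch only; the glob pattern mirrors the grep command above, and the "<span" / "text-indent" markers are assumptions based on its output), one could scan the model file for DAQ-channel annotations that contain HTML markup:

import glob

# Sketch: flag model-file lines where a "#DAQ Channels" annotation has
# been converted to HTML instead of staying plain text. The path pattern
# follows the grep command above; the HTML markers are assumptions.
for path in glob.glob("/opt/rtcds/userapps/release/*/k1/models/k1asc.mdl"):
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if "#DAQ Channels" in line and ("<span" in line or "text-indent" in line):
                print(f"{path}:{lineno}: HTML-formatted DAQ block: {line.strip()[:80]}")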