VAC (General)
takahiro.yamamoto - 3:44 Friday 14 February 2025 (32674)
Improvement for error handling of vacuum DAQ
As I reported in klog#32669, the vacuum DAQ sometimes records values that are far too large.
I implemented a stricter check of the return values from the CC-10 devices to avoid this issue.
This update has been deployed on all RaspPi units, and this error is now handled properly.
The error rate itself has not improved; to reduce it, the interval at which vacuum.sh (which is used for making the plot)
is executed apparently needs to be managed precisely.

-----
By checking the log of the vacuum DAQ, I found that the overly large values can be distinguished from proper ones. In the normal case, the CC-10 returns ['\x02', '0', 'S', a, b, c, d, '\r'] or ['\x02', '0', 'S', '1', '\r', '\x02', '0', 'S', a, b, c, d, '\r']. The difference between these two formats seems to come from a difference in firmware version. The pressure value is calculated as (a + 0.1b) x 10^{(2c-1)d} Pa. The original code took the 2nd-5th characters from the back of the return value in order to accommodate both formats.
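The decoding described above can be sketched as follows. This is only an illustration, not the actual DAQ code: the function name `decode_pressure` and the sample digits are hypothetical, and it assumes that a, b, c, d arrive as ASCII digit characters with c being 0 or 1 (so that the factor 2c-1 selects the sign of the exponent).

```python
def decode_pressure(reply):
    """Decode a CC-10 reply (list of single characters) into a pressure in Pa.

    Assumption: a, b, c, d are ASCII digits and c is 0 or 1.
    """
    # Take the 2nd-5th characters from the back, i.e. a, b, c, d.
    # Counting from the back works for both the short and the long format.
    a, b, c, d = (int(ch) for ch in reply[-5:-1])
    # (2c - 1) is -1 for c = 0 and +1 for c = 1.
    return (a + 0.1 * b) * 10.0 ** ((2 * c - 1) * d)


# Hypothetical sample replies in the two normal formats (digits are made up).
short_reply = ['\x02', '0', 'S', '3', '2', '0', '5', '\r']
long_reply = ['\x02', '0', 'S', '1', '\r'] + short_reply

print(decode_pressure(short_reply))  # roughly 3.2e-5 Pa
print(decode_pressure(long_reply))   # same digits, so the same value
```

Indexing from the back is what makes a single code path work for both firmware formats.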

In the error case, the CC-10 returns a different (but very similar) format such as ['\x02', 'S', a, b, c, d, '\r']. According to the CC-10 manual, no such return value is defined, so some kind of fatal error seems to have occurred. In any case, I found that checking the 7th character from the back of the return value tells whether the return value has the correct format: it is '0' in both normal formats, but '\x02' in the error format above. The new DAQ code implements this strict format check, so the vacuum DAQ can now handle replies in this wrong format.
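A minimal sketch of such a format check is shown below. It is not the deployed DAQ code: the function name `is_valid_reply` and the sample replies are hypothetical, and the length guard is an addition that only protects against replies too short to index.

```python
def is_valid_reply(reply):
    """Return True if the 7th character from the back is '0'.

    In both normal formats that position holds '0'; in the error format
    ['\\x02', 'S', a, b, c, d, '\\r'] it holds '\\x02' instead.
    """
    # Guard (an assumption, not from the report): a reply shorter than
    # 7 characters cannot be checked and is treated as invalid.
    if len(reply) < 7:
        return False
    return reply[-7] == '0'


# Hypothetical sample replies (digits are made up).
normal_short = ['\x02', '0', 'S', '3', '2', '0', '5', '\r']
normal_long = ['\x02', '0', 'S', '1', '\r'] + normal_short
error_reply = ['\x02', 'S', '3', '2', '0', '5', '\r']

for reply in (normal_short, normal_long, error_reply):
    print(is_valid_reply(reply))
```

A caller could then record a non-zero error code instead of a pressure value whenever the check fails.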

Although errors are now handled more strictly, the error rate has not improved. These errors seem to be caused by a conflict in serial communication between two processes: one is this DAQ process, and the other is vacuum.sh, which is used to make the web plot. The DAQ code is now executed with a cadence of precisely 60 s (it actually runs at the 15 s mark of every minute). On the other hand, vacuum.sh is implemented as an infinite loop with a sleep command. By my measurement, vacuum.sh is executed with a cadence of 60.1-60.8 s. Its execution time (interval) is therefore not anchored, and over long-term operation we cannot avoid the two processes accessing the serial port at the same time. In order to reduce the error rate itself, we need to manage the execution interval of vacuum.sh more precisely.
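To see why an unanchored loop must eventually collide with the DAQ, the drift can be worked out as below. This is a rough estimate under a simplifying assumption that is not in the measurement itself: the loop period is taken as constant at either end of the measured 60.1-60.8 s range, whereas the real period varies from cycle to cycle. The function name is hypothetical.

```python
DAQ_PERIOD = 60.0  # seconds; the DAQ runs once per minute


def cycles_per_full_drift(loop_period, reference_period=DAQ_PERIOD):
    """Loop iterations needed for the loop's start time to slide
    through one full reference period (i.e. past every possible phase)."""
    drift_per_cycle = loop_period - reference_period
    return reference_period / drift_per_cycle


for period in (60.1, 60.8):
    cycles = cycles_per_full_drift(period)
    hours = cycles * period / 3600.0
    print(f"period {period} s: ~{cycles:.0f} cycles, ~{hours:.1f} h per full sweep")
```

Under this assumption the loop sweeps across every phase of the minute, including the DAQ's 15 s slot, roughly every 1.3 to 10 hours. The estimate says nothing about how many errors occur during each overlap.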
Comments to this report:
takahiro.yamamoto - 22:08 Friday 14 February 2025 (32689)

Abstract

As part of the series of work on improving the vacuum DAQ, I set up time synchronization on the RaspPi.
(The timestamps in the log files were off from the correct time by a few years, which made log investigation difficult.)
The direct cause of the wrong system time was the DNS settings, not NTP.
After fixing the DNS settings, the RaspPi system is synchronized by systemd-timesyncd.

Details

At first, I tried to install a package for synchronizing the system time, but the RaspPi couldn't access the standard repositories. After some checks, I found that this was caused by wrong DNS settings. In 2020, a large update of the ICRR network was done, and the addresses of the DNS servers in the ICRR LAN were also changed at that time, but the list of nameservers on the RaspPi still held the old addresses. Because the nameservers are managed by resolvconf, I set name_servers=x.x.x.x in /etc/resolvconf.conf and restarted the NIC with ifdown eth0 && ifup eth0. After that, the list of nameservers in /etc/resolv.conf was updated to the correct servers and I was able to access the standard repositories.

Next, I planned to install ntpd or chronyd, but I found that the timestamps in the logs had already become correct and that systemd-timesyncd was running. Probably, the RaspPi had been synchronized properly since around 2018(?), when this system was first installed, and had simply been left unsynchronized since the network change in 2020. Anyway, the RaspPi system is now synchronized, and log-file analysis will be easier.
takahiro.yamamoto - 1:19 Tuesday 18 February 2025 (32715)

Abstract

As part of the series of work on improving the vacuum DAQ, I improved the behavior of the process for making the web plot and succeeded in removing the serial communication errors in the DAQ process.
Fig.1 shows the error rate before (left-hand side of the crosshair) and after (right-hand side) this improvement.
The vertical axis shows the error code, so 0 means no error and a non-zero value means that some kind of error is present.
Though there were several errors per hour before this work, there has been no error in the 4 hours since the improvement.
 

Details

As I reported in klog#32674, the DAQ process and the process for making the web plot (vacuum.sh) conflicted when accessing the serial port because the execution timing of vacuum.sh was unmanaged. So I modified the script for making the web plot. In the old implementation, the serial port was accessed from an infinite while loop with the sleep function, which makes the execution timing ambiguous. To improve this situation, I anchored the execution timing of this script to the system time by using crontab.
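The difference between a fixed sleep and a clock-anchored schedule can be illustrated with the sketch below. It is not the crontab-based solution that was deployed, only the same idea written as a loop: the function names are hypothetical, and the 45 s offset is an arbitrary example value chosen to stay away from the DAQ's 15 s slot, not the actual schedule.

```python
import time


def seconds_until_next_slot(now, period=60.0, offset=0.0):
    """Seconds from `now` (epoch seconds) until the next time t
    satisfying t % period == offset."""
    remaining = (offset - now) % period
    # If we are exactly on a slot, wait for the following one.
    return remaining if remaining > 0 else period


def run_anchored(task, iterations, period=60.0, offset=45.0):
    """Run `task` at fixed wall-clock slots instead of after a fixed sleep.

    The sleep is recomputed from the system clock on every cycle, so the
    task's own run time does not accumulate into a drift.
    """
    for _ in range(iterations):
        time.sleep(seconds_until_next_slot(time.time(), period, offset))
        task()


# The wait depends only on the current clock reading:
print(seconds_until_next_slot(120.0, offset=45.0))
print(seconds_until_next_slot(130.5, offset=45.0))
```

With a plain `sleep 60` in a loop, each cycle takes 60 s plus the run time of the task, which is where a cadence like 60.1-60.8 s comes from.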

In the past, the RaspPi sometimes became very slow to respond due to overload, so I also changed the way the script accesses the server application for the CC-10 to the same method as the DAQ process, instead of using ssh access. (Because the RaspPi cannot use AES-NI, frequent SSH access is very inefficient.) This change reduced the execution time to about 1/4 (~16 s -> ~4 s), which should be sufficient for periodic execution even with a cadence of less than a minute.

Finally, I noticed that the server on which vacuum.sh is executed was not synchronized with the reference time, so I started ntpd and configured it to start automatically when the OS boots. Because this server also had outdated DNS information, as in klog#32689, ntpd was not able to access the time server. So I set the latest DNS information with NetworkManager.

Images attached to this comment