Entry time: Sat Mar 29 11:26:56 2025
> 16th May 08:00 - 12:00 shift
> Author: MS
>
> 08:00
>
> FEE64 module aida09 global clocks failed, 6
> Clock status test result: Passed 15, Failed 1
>
> Understand status as follows
> Status bit 3 : firmware PLL that creates clocks from external clock not locked
> Status bit 2 : always logic '1'
> Status bit 1 : LMK3200(2) PLL and clock distribution chip not locked to external clock
> Status bit 0 : LMK3200(1) PLL and clock distribution chip not locked to external clock
> If all these bits are not set then the operation of the firmware is unreliable
>
> FEE64 module aida09 failed
> Calibration test result: Passed 15, Failed 1
>
> If any modules fail calibration , check the clock status and open the FADC Align and Control browser page to rerun calibration for that module
>
> Base Current Difference
> aida01 fault 0xf294 : 0xf296 : 2
> aida02 fault 0xd8ec : 0xd8ee : 2
> aida03 fault 0xf001 : 0xf003 : 2
> aida04 fault 0xd992 : 0xd994 : 2
> aida05 fault 0x714c : 0x7163 : 23
> aida06 fault 0x5a49 : 0x5a4a : 1
> aida07 fault 0x5aca : 0x5acb : 1
> aida08 fault 0xb92e : 0xb92f : 1
> White Rabbit error counter test result: Passed 8, Failed 8
>
> Understand the status reports as follows:-
> Status bit 3 : White Rabbit decoder detected an error in the received data
> Status bit 2 : Firmware registered WR error, no reload of Timestamp
> Status bit 0 : White Rabbit decoder reports uncertain of Timestamp information from WR
>
> Base Current Difference
> aida05 fault 0x0 : 0xa : 10
> aida12 fault 0x0 : 0x3 : 3
> aida13 fault 0x0 : 0x4d : 77
> FPGA Timestamp error counter test result: Passed 13, Failed 3
> If any of these counts are reported as in error
> The ASIC readout system has detected a timeslip.
> That is the timestamp read from the time FIFO is not younger than the last
>
> Returned 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> Mem(KB) : 4 8 16 32 64 128 256 512 1k 2k 4k
> aida01 : 27 7 2 3 2 3 2 4 2 3 7 : 40228
> aida02 : 3 8 3 2 2 2 2 4 2 3 7 : 39996
> aida03 : 23 9 6 1 0 3 2 4 2 3 7 : 40100
> aida04 : 34 27 17 7 2 4 3 3 2 2 7 : 38608
> aida05 : 19 9 5 2 2 2 3 3 2 3 7 : 39844
> aida06 : 5 5 2 1 0 4 2 4 2 3 7 : 40060
> aida07 : 28 2 10 3 2 4 1 4 2 3 7 : 40192
> aida08 : 19 5 5 1 2 5 1 3 2 3 7 : 39652
> aida09 : 20 8 3 3 1 3 2 3 2 3 7 : 39648
> aida10 : 27 10 0 2 2 2 2 3 1 4 7 : 40572
> aida11 : 3 3 2 1 2 4 2 3 2 3 7 : 39652
> aida12 : 16 7 11 2 3 4 2 2 2 3 7 : 39464
> aida13 : 14 10 4 3 1 5 2 2 2 3 7 : 39400
> aida14 : 21 9 10 1 1 2 2 3 1 4 7 : 40604
> aida15 : 19 8 2 3 0 4 3 2 2 3 7 : 39436
> aida16 : 24 5 8 3 3 4 2 2 2 3 7 : 39464
>
> *** Timestamp elapsed time: 225.065 s
> FEE elapsed dead time(s) elapsed idle time(s)
> 0 0.038 0.000
> 1 9.479 0.000
> 2 0.195 0.000
> 3 5.921 0.000
> 4 0.000 11.742
> 5 0.036 0.000
> 6 0.013 0.000
> 7 0.498 0.000
> 8 0.436 0.000
> 9 0.000 107.300
> 10 2.787 0.000
> 11 0.905 0.000
> 12 0.831 0.000
> 13 0.000 55.939
> 14 0.080 0.000
> 15 0.267 0.000
> 16 0.000 0.000
> 17 0.000 0.000
> 18 0.000 0.000
> 19 0.000 0.000
> 20 0.000 0.000
> 21 0.000 0.000
> 22 0.000 0.000
> 23 0.000 0.000
> 24 0.000 0.000
> 25 0.000 0.000
> 26 0.000 0.000
> 27 0.000 0.000
> 28 0.000 0.000
> 29 0.000 0.000
> 30 0.000 0.000
> 31 0.000 0.000
> 32 0.000 0.000
>
> 10:00
>
> FEE64 module aida06 global clocks failed, 6
> FEE64 module aida09 global clocks failed, 6
> Clock status test result: Passed 14, Failed 2
>
> Understand status as follows
> Status bit 3 : firmware PLL that creates clocks from external clock not locked
> Status bit 2 : always logic '1'
> Status bit 1 : LMK3200(2) PLL and clock distribution chip not locked to external clock
> Status bit 0 : LMK3200(1) PLL and clock distribution chip not locked to external clock
> If all these bits are not set then the operation of the firmware is unreliable
>
> FEE64 module aida09 failed
> Calibration test result: Passed 15, Failed 1
>
> If any modules fail calibration , check the clock status and open the FADC Align and Control browser page to rerun calibration for that module
>
> Base Current Difference
> aida01 fault 0xf294 : 0xf296 : 2
> aida02 fault 0xd8ec : 0xd8ee : 2
> aida03 fault 0xf001 : 0xf003 : 2
> aida04 fault 0xd992 : 0xd994 : 2
> aida05 fault 0x714c : 0x7166 : 26
> aida06 fault 0x5a49 : 0x5a4a : 1
> aida07 fault 0x5aca : 0x5acb : 1
> aida08 fault 0xb92e : 0xb92f : 1
> White Rabbit error counter test result: Passed 8, Failed 8
>
> Understand the status reports as follows:-
> Status bit 3 : White Rabbit decoder detected an error in the received data
> Status bit 2 : Firmware registered WR error, no reload of Timestamp
> Status bit 0 : White Rabbit decoder reports uncertain of Timestamp information from WR
>
> Base Current Difference
> aida05 fault 0x0 : 0xa : 10
> aida12 fault 0x0 : 0x3 : 3
> aida13 fault 0x0 : 0x4d : 77
> FPGA Timestamp error counter test result: Passed 13, Failed 3
> If any of these counts are reported as in error
> The ASIC readout system has detected a timeslip.
> That is the timestamp read from the time FIFO is not younger than the last
>
> Returned 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> Mem(KB) : 4 8 16 32 64 128 256 512 1k 2k 4k
> aida01 : 19 7 6 2 2 3 2 3 2 3 7 : 39716
> aida02 : 8 4 3 2 2 3 2 3 2 3 7 : 39600
> aida03 : 23 11 5 1 0 4 2 3 1 4 7 : 40740
> aida04 : 42 25 16 7 2 4 4 3 2 2 7 : 38864
> aida05 : 24 5 5 1 2 2 3 3 1 4 7 : 40824
> aida06 : 11 4 4 1 1 4 2 4 2 3 7 : 40172
> aida07 : 21 8 5 1 2 3 2 4 2 3 7 : 40196
> aida08 : 15 5 6 1 1 4 2 4 2 3 7 : 40228
> aida09 : 15 10 4 1 2 3 2 3 2 3 7 : 39660
> aida10 : 23 8 2 2 2 2 2 3 1 4 7 : 40572
> aida11 : 6 4 4 2 2 4 2 3 2 3 7 : 39736
> aida12 : 18 8 8 1 4 4 3 2 2 3 7 : 39720
> aida13 : 23 6 2 3 2 5 2 2 2 3 7 : 39436
> aida14 : 29 3 11 1 1 2 2 3 1 4 7 : 40604
> aida15 : 29 7 2 2 0 4 3 3 2 3 7 : 39948
> aida16 : 2 3 6 2 3 4 2 2 2 3 7 : 39296
>
> *** Timestamp elapsed time: 225.065 s
> FEE elapsed dead time(s) elapsed idle time(s)
> 0 0.038 0.000
> 1 9.479 0.000
> 2 0.195 0.000
> 3 5.921 0.000
> 4 0.000 11.742
> 5 0.036 0.000
> 6 0.013 0.000
> 7 0.498 0.000
> 8 0.436 0.000
> 9 0.000 107.300
> 10 2.787 0.000
> 11 0.905 0.000
> 12 0.831 0.000
> 13 0.000 55.939
> 14 0.080 0.000
> 15 0.267 0.000
> 16 0.000 0.000
> 17 0.000 0.000
> 18 0.000 0.000
> 19 0.000 0.000
> 20 0.000 0.000
> 21 0.000 0.000
> 22 0.000 0.000
> 23 0.000 0.000
> 24 0.000 0.000
> 25 0.000 0.000
> 26 0.000 0.000
> 27 0.000 0.000
> 28 0.000 0.000
> 29 0.000 0.000
> 30 0.000 0.000
> 31 0.000 0.000
> 32 0.000 0.000
>
> 12:00-16:00
> 16th May 12:00 - 16:00 shift
> Author: JS
>
> 11:57 Taking over from Magda. Running full checks.
> usbec ok. Max ~1700 Hz 1MHz on DSSD1, DSSD ~ 75%
> Current ok 06.410 uA 006.835 uA
> Stats good 1Statistics aidas-gsi(6).png
> Temps ok 1Temperature and status scan aidas-gsi(6).png
> Analysis ok R7_385. Dead time FEE1 a little high 6%
>
> PAUSE: 166 RESUME: 166
>
> *** Timestamp elapsed time: 196.305 s
> FEE elapsed dead time(s) elapsed idle time(s)
> 0 0.044 0.000
> 1 12.405 0.000
> 2 0.807 0.000
> 3 7.260 0.000
> 4 0.014 0.000
> 5 0.458 0.000
> 6 0.027 0.000
> 7 0.497 0.000
> 8 0.723 0.000
> 9 0.000 88.857
> 10 6.189 0.000
> 11 1.443 0.000
> 12 0.474 0.000
> 13 0.000 35.565
> 14 0.000 0.000
> 15 0.147 0.000
> 16 0.000 0.000
> 17 0.000 0.000
> 18 0.000 0.000
> 19 0.000 0.000
> 20 0.000 0.000
> 21 0.000 0.000
> 22 0.000 0.000
> 23 0.000 0.000
> 24 0.000 0.000
> 25 0.000 0.000
> 26 0.000 0.000
> 27 0.000 0.000
> 28 0.000 0.000
> 29 0.000 0.000
> 30 0.000 0.000
> 31 0.000 0.000
> 32 0.000 0.000
>
> FEE64 module aida09 global clocks failed, 6
> Clock status test result: Passed 15, Failed 1
>
> FEE64 module aida09 failed
> Calibration test result: Passed 15, Failed 1
>
> Base Current Difference
> aida01 fault 0xf294 : 0xf296 : 2
> aida02 fault 0xd8ec : 0xd8ee : 2
> aida03 fault 0xf001 : 0xf003 : 2
> aida04 fault 0xd992 : 0xd994 : 2
> aida05 fault 0x714c : 0x716e : 34
> aida06 fault 0x5a49 : 0x5a4a : 1
> aida07 fault 0x5aca : 0x5acb : 1
> aida08 fault 0xb92e : 0xb92f : 1
> White Rabbit error counter test result: Passed 8, Failed 8
>
> Base Current Difference
> aida05 fault 0x0 : 0xa : 10
> aida12 fault 0x0 : 0x3 : 3
> aida13 fault 0x0 : 0x4d : 77
> FPGA Timestamp error counter test result: Passed 13, Failed 3
>
> Returned 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> Mem(KB) : 4 8 16 32 64 128 256 512 1k 2k 4k
> aida01 : 20 7 4 2 2 3 2 3 1 4 7 : 40712
> aida02 : 25 10 7 1 1 4 1 3 2 3 7 : 39556
> aida03 : 22 10 5 1 0 4 2 4 2 3 7 : 40216
> aida04 : 40 24 17 7 2 5 3 3 2 2 7 : 38736
> aida05 : 4 8 3 0 1 3 3 3 1 4 7 : 40768
> aida06 : 21 6 6 2 2 3 2 4 2 3 7 : 40228
> aida07 : 25 4 6 3 2 3 2 4 2 3 7 : 40260
> aida08 : 19 9 4 1 1 4 2 4 2 3 7 : 40244
> aida09 : 16 9 4 2 2 3 2 3 2 3 7 : 39688
> aida10 : 19 10 5 1 2 2 2 4 1 4 7 : 41100
> aida11 : 21 10 2 1 2 3 3 3 2 3 7 : 39908
> aida12 : 25 7 7 2 2 3 2 3 2 3 7 : 39756
> aida13 : 13 7 3 4 2 5 2 3 2 3 7 : 39964
> aida14 : 21 7 9 2 1 2 2 4 1 4 7 : 41116
> aida15 : 23 6 2 3 0 4 3 3 2 3 7 : 39948
> aida16 : 9 11 10 2 3 4 2 2 2 3 7 : 39452
>
> Tom says the aida09 clock failure is ok, as its status value is "6".
> The large White Rabbit difference for FEE5 (aida05) is known and has been judged to be ok; a post-run investigation will be undertaken.
>
> 12:35 -
> usbec ok.
> Current ok
> Stats good
> Temps ok
> Analysis ok R7_395. Dead time FEE1 still high 6.6%
>
> 13:33 -
> usbec ok. Max implants ~ 1.8kHz
> Current ok 006.850 uA 007.250 uA
> Stats - aida11 running low, < 5k (was ~20k overnight)
> Temps ok
> Analysis R7_415. Dead time FEE1 & FEE10 high 10%
>
> 14:00
> usbec ok - ucesb1.png
> Current ok - bias1.png
> Stats - aida11 low
>
> 14:20 An AIDA FEE rebooted itself. A power cycle was performed. Upon reboot we are seeing extremely large amounts of noise in the FEEs: the waveforms show very large 100 kHz pickup. This has resulted in 50% dead time in many FEEs, including the p+n side.
>
> 15:28 Because of extremely high rates across all FEEs, we have decided to do a power cycle. Before restarting the FEEs we will give them a couple of minutes to cool.
>
> 18:00 Since the start of the shift we have been trying to recover the system from a large increase in noise following the crash at ~14:00.
> During this time NH entered the area, inspected the system and also grounded the AIDA snout. This gave some improvement in the noise.
> The rates are still slightly above where they were before the crash but now appear stable. To counteract the dead time in the n+n strips we have raised the threshold to 0x64 for ASIC4 in all FEEs.
> We are now running with around 10% dead time on FEE4 and less elsewhere for n+n. For p+n most FEEs have zero dead time, apart from FEE11, which is still noisy.
> While we were trying to recover the system, screenshots of the waveforms were taken. They show that the 100 kHz noise was very apparent, particularly in the n+n strips.
>
> 18:08 System wide checks all ok - bar some ADC but waveforms disabled
> Statistics ok - 210516_1809_Stats
> Temperatures ok - 210516_1809_Temp
> Bias and leakage ok - 210516_1810_Bias
>
> 18:37 Performed an ASIC check and now the rates have dropped in all n+n strips. Currently very small amounts of dead time.
> 18:40 Realised this was because the ASIC check raised the threshold of all strips on the n+n side to 0x64.
>
> 19:25 Removed S452 from 1e2.... drive. Before removing, checked with Nic that it was backed up to Lustre, and also verified this ourselves.
> Now have around 4.2TB left, which will provide around 80 hours of writing.
>
> 20:16 We noticed iptraf was using around 30% CPU. We investigated whether it had any effect on the dead time; from what we have seen, it has not.
>
> 20:57 System wide checks. Clock ok
>
> Base Current Difference
> aida05 fault 0x1552 : 0x1556 : 4
> White Rabbit error counter test result: Passed 15, Failed 1
>
> Understand the status reports as follows:-
> Status bit 3 : White Rabbit decoder detected an error in the received data
> Status bit 2 : Firmware registered WR error, no reload of Timestamp
> Status bit 0 : White Rabbit decoder reports uncertain of Timestamp information from WR
>
> Base Current Difference
> aida05 fault 0x0 : 0x1 : 1
> FPGA Timestamp error counter test result: Passed 15, Failed 1
> If any of these counts are reported as in error
> The ASIC readout system has detected a timeslip.
> That is the timestamp read from the time FIFO is not younger than the last
>
> Statistics ok - 210516_2056_Stats
> Temp ok - 210516_2058_Temp
> Bias and leakage current ok - 210516_2058_Bias
>
> 23:16 System wide checks:
> Clock still ok
>
> Base Current Difference
> aida05 fault 0x1552 : 0x155a : 8
> White Rabbit error counter test result: Passed 15, Failed 1
>
> Understand the status reports as follows:-
> Status bit 3 : White Rabbit decoder detected an error in the received data
> Status bit 2 : Firmware registered WR error, no reload of Timestamp
> Status bit 0 : White Rabbit decoder reports uncertain of Timestamp information from WR
>
> Base Current Difference
> aida05 fault 0x0 : 0x2 : 2
> FPGA Timestamp error counter test result: Passed 15, Failed 1
> If any of these counts are reported as in error
> The ASIC readout system has detected a timeslip.
> That is the timestamp read from the time FIFO is not younger than the last
>
> Statistics - 210516_2315_Stats
> Temperature - 210516_2316_Temp
> Bias and leakage current ok - 210516_231
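The clock check reports one status number per FEE64 (for example "global clocks failed, 6"). A minimal sketch, assuming that number is a plain integer whose bits map onto the legend printed by the check, decodes it: 6 is binary 0110, so bits 2 and 1 are set. The bit texts are copied from the log; `decode_clock_status` is a hypothetical helper for illustration, not part of the AIDA software, and it says nothing about whether a given value is acceptable.

```python
from __future__ import annotations

# Bit meanings copied from the "Understand status as follows" legend
# printed by the clock status check.
CLOCK_STATUS_BITS = {
    3: "firmware PLL that creates clocks from external clock not locked",
    2: "always logic '1'",
    1: "LMK3200(2) PLL and clock distribution chip not locked to external clock",
    0: "LMK3200(1) PLL and clock distribution chip not locked to external clock",
}


def decode_clock_status(value: int) -> dict[int, str]:
    """Hypothetical helper: map each set bit of `value` to its legend text."""
    return {bit: text for bit, text in CLOCK_STATUS_BITS.items() if value & (1 << bit)}


if __name__ == "__main__":
    # aida06 and aida09 reported "global clocks failed, 6" (binary 0110).
    for bit, text in decode_clock_status(6).items():
        print(f"bit {bit}: {text}")
```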
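In the White Rabbit and FPGA Timestamp error counter checks, the Difference column is the current hex reading minus the base reading, written in decimal: for aida05 the White Rabbit counter goes from 0x714c to 0x7163, 0x7166 and 0x716e over the morning checks, giving 23, 26 and 34, while the other modules stay constant. The sketch below reproduces that arithmetic; `counter_difference` is a hypothetical name, and it assumes the counter has not wrapped around between the two readings.

```python
def counter_difference(base: str, current: str) -> int:
    """Hypothetical helper: decimal difference between two hex counter readings.

    Assumes the counter has not wrapped around between the two readings.
    """
    # int(..., 16) accepts an optional "0x" prefix.
    return int(current, 16) - int(base, 16)


if __name__ == "__main__":
    # aida05 White Rabbit error counter at the 08:00, 10:00 and 11:57 checks.
    base = "0x714c"
    for current in ("0x7163", "0x7166", "0x716e"):
        print(current, counter_difference(base, current))
```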
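In the Mem(KB) tables, the last number on each row equals the sum over the columns of count × column size in KB; for aida01 at 08:00, 27×4 + 7×8 + 2×16 + 3×32 + 2×64 + 3×128 + 2×256 + 4×512 + 2×1024 + 3×2048 + 7×4096 = 40228. The log does not say what the blocks represent, so this sketch only checks the arithmetic, reading the 1k/2k/4k headings as 1024/2048/4096 KB (an assumption). `parse_mem_row` and `total_kb` are hypothetical helpers.

```python
from __future__ import annotations

# Column headings of the Mem(KB) table (4 8 16 ... 1k 2k 4k), in KB.
# Reading 1k/2k/4k as 1024/2048/4096 is an assumption.
BLOCK_SIZES_KB = [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]


def parse_mem_row(row: str) -> tuple[str, list[int], int]:
    """Split a row such as 'aida01 : 27 7 ... 7 : 40228' into its three fields."""
    name, counts, total = (field.strip() for field in row.split(":"))
    return name, [int(n) for n in counts.split()], int(total)


def total_kb(counts: list[int]) -> int:
    """Sum of count x column size over all columns of one row."""
    if len(counts) != len(BLOCK_SIZES_KB):
        raise ValueError(f"expected {len(BLOCK_SIZES_KB)} counts, got {len(counts)}")
    return sum(n * size for n, size in zip(counts, BLOCK_SIZES_KB))


if __name__ == "__main__":
    rows = [
        "aida01 : 27 7 2 3 2 3 2 4 2 3 7 : 40228",   # 08:00 check
        "aida04 : 34 27 17 7 2 4 3 3 2 2 7 : 38608",  # 08:00 check
        "aida10 : 19 10 5 1 2 2 2 4 1 4 7 : 41100",   # 11:57 check
    ]
    for row in rows:
        name, counts, reported = parse_mem_row(row)
        print(name, total_kb(counts), reported, total_kb(counts) == reported)
```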
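The FEE1 dead-time figure noted at 11:57 (about 6%) is consistent with dividing the FEE 1 dead time by the timestamp elapsed time of that check: 12.405 s / 196.305 s ≈ 6.3%. This sketch assumes that is how the percentage is formed and that row index 1 of the table is the FEE called FEE1 in the notes; `dead_time_percent` is a hypothetical helper.

```python
def dead_time_percent(dead_s: float, elapsed_s: float) -> float:
    """Dead time as a percentage of the timestamp elapsed time (assumed definition)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return 100.0 * dead_s / elapsed_s


if __name__ == "__main__":
    elapsed_s = 196.305  # "*** Timestamp elapsed time" at the 11:57 check
    dead_s = {1: 12.405, 3: 7.260, 10: 6.189}  # selected rows of that table
    for fee, dead in dead_s.items():
        print(f"FEE {fee}: {dead_time_percent(dead, elapsed_s):.1f}%")
```

The same calculation on the 08:00 table gives about 4.2% for FEE 1 (9.479 s / 225.065 s).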
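The 19:25 estimate (around 4.2TB left, around 80 hours of writing) implies an average write rate of roughly 14.6 MB/s, or about 52.5 GB per hour. The sketch assumes decimal units (1 TB = 10^6 MB); both function names are hypothetical.

```python
def implied_rate_mb_per_s(free_tb: float, hours: float) -> float:
    """Average write rate (MB/s) that would fill `free_tb` in `hours` (decimal units)."""
    return free_tb * 1e6 / (hours * 3600.0)


def hours_remaining(free_tb: float, rate_mb_per_s: float) -> float:
    """Hours until `free_tb` is full at a constant `rate_mb_per_s`."""
    return free_tb * 1e6 / rate_mb_per_s / 3600.0


if __name__ == "__main__":
    rate = implied_rate_mb_per_s(4.2, 80.0)
    print(f"{rate:.1f} MB/s, {rate * 3600 / 1000:.1f} GB/h")
```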