DESPEC
Entry time: Wed May 19 09:09:27 2021
16th May 08:00 - 12:00 shift
Author: MS

08:00
FEE64 module aida09 global clocks failed, 6
Clock status test result: Passed 15, Failed 1
Understand status as follows
Status bit 3 : firmware PLL that creates clocks from external clock not locked
Status bit 2 : always logic '1'
Status bit 1 : LMK3200(2) PLL and clock distribution chip not locked to external clock
Status bit 0 : LMK3200(1) PLL and clock distribution chip not locked to external clock
If all these bits are not set then the operation of the firmware is unreliable
FEE64 module aida09 failed
Calibration test result: Passed 15, Failed 1
If any modules fail calibration , check the clock status and open the FADC Align and Control browser page to rerun calibration for that module
Base Current Difference
aida01 fault 0xf294 : 0xf296 : 2
aida02 fault 0xd8ec : 0xd8ee : 2
aida03 fault 0xf001 : 0xf003 : 2
aida04 fault 0xd992 : 0xd994 : 2
aida05 fault 0x714c : 0x7163 : 23
aida06 fault 0x5a49 : 0x5a4a : 1
aida07 fault 0x5aca : 0x5acb : 1
aida08 fault 0xb92e : 0xb92f : 1
White Rabbit error counter test result: Passed 8, Failed 8
Understand the status reports as follows:-
Status bit 3 : White Rabbit decoder detected an error in the received data
Status bit 2 : Firmware registered WR error, no reload of Timestamp
Status bit 0 : White Rabbit decoder reports uncertain of Timestamp information from WR
Base Current Difference
aida05 fault 0x0 : 0xa : 10
aida12 fault 0x0 : 0x3 : 3
aida13 fault 0x0 : 0x4d : 77
FPGA Timestamp error counter test result: Passed 13, Failed 3
If any of these counts are reported as in error The ASIC readout system has detected a timeslip.
That is the timestamp read from the time FIFO is not younger than the last
Returned 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Mem(KB) : 4 8 16 32 64 128 256 512 1k 2k 4k
aida01 : 27 7 2 3 2 3 2 4 2 3 7 : 40228
aida02 : 3 8 3 2 2 2 2 4 2 3 7 : 39996
aida03 : 23 9 6 1 0 3 2 4 2 3 7 : 40100
aida04 : 34 27 17 7 2 4 3 3 2 2 7 : 38608
aida05 : 19 9 5 2 2 2 3 3 2 3 7 : 39844
aida06 : 5 5 2 1 0 4 2 4 2 3 7 : 40060
aida07 : 28 2 10 3 2 4 1 4 2 3 7 : 40192
aida08 : 19 5 5 1 2 5 1 3 2 3 7 : 39652
aida09 : 20 8 3 3 1 3 2 3 2 3 7 : 39648
aida10 : 27 10 0 2 2 2 2 3 1 4 7 : 40572
aida11 : 3 3 2 1 2 4 2 3 2 3 7 : 39652
aida12 : 16 7 11 2 3 4 2 2 2 3 7 : 39464
aida13 : 14 10 4 3 1 5 2 2 2 3 7 : 39400
aida14 : 21 9 10 1 1 2 2 3 1 4 7 : 40604
aida15 : 19 8 2 3 0 4 3 2 2 3 7 : 39436
aida16 : 24 5 8 3 3 4 2 2 2 3 7 : 39464
*** Timestamp elapsed time: 225.065 s
FEE  elapsed dead time(s)  elapsed idle time(s)
 0     0.038     0.000
 1     9.479     0.000
 2     0.195     0.000
 3     5.921     0.000
 4     0.000    11.742
 5     0.036     0.000
 6     0.013     0.000
 7     0.498     0.000
 8     0.436     0.000
 9     0.000   107.300
10     2.787     0.000
11     0.905     0.000
12     0.831     0.000
13     0.000    55.939
14     0.080     0.000
15     0.267     0.000
16     0.000     0.000
17     0.000     0.000
18     0.000     0.000
19     0.000     0.000
20     0.000     0.000
21     0.000     0.000
22     0.000     0.000
23     0.000     0.000
24     0.000     0.000
25     0.000     0.000
26     0.000     0.000
27     0.000     0.000
28     0.000     0.000
29     0.000     0.000
30     0.000     0.000
31     0.000     0.000
32     0.000     0.000

10:00
FEE64 module aida06 global clocks failed, 6
FEE64 module aida09 global clocks failed, 6
Clock status test result: Passed 14, Failed 2
Understand status as follows
Status bit 3 : firmware PLL that creates clocks from external clock not locked
Status bit 2 : always logic '1'
Status bit 1 : LMK3200(2) PLL and clock distribution chip not locked to external clock
Status bit 0 : LMK3200(1) PLL and clock distribution chip not locked to external clock
If all these bits are not set then the operation of the firmware is unreliable
FEE64 module aida09 failed
Calibration test result: Passed 15, Failed 1
If any modules fail calibration , check the clock status and open the FADC Align and Control browser page to rerun calibration for that module
Base Current Difference
aida01 fault 0xf294 : 0xf296 : 2
aida02 fault 0xd8ec : 0xd8ee : 2
aida03 fault 0xf001 : 0xf003 : 2
aida04 fault 0xd992 : 0xd994 : 2
aida05 fault 0x714c : 0x7166 : 26
aida06 fault 0x5a49 : 0x5a4a : 1
aida07 fault 0x5aca : 0x5acb : 1
aida08 fault 0xb92e : 0xb92f : 1
White Rabbit error counter test result: Passed 8, Failed 8
Understand the status reports as follows:-
Status bit 3 : White Rabbit decoder detected an error in the received data
Status bit 2 : Firmware registered WR error, no reload of Timestamp
Status bit 0 : White Rabbit decoder reports uncertain of Timestamp information from WR
Base Current Difference
aida05 fault 0x0 : 0xa : 10
aida12 fault 0x0 : 0x3 : 3
aida13 fault 0x0 : 0x4d : 77
FPGA Timestamp error counter test result: Passed 13, Failed 3
If any of these counts are reported as in error The ASIC readout system has detected a timeslip.
That is the timestamp read from the time FIFO is not younger than the last
Returned 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Mem(KB) : 4 8 16 32 64 128 256 512 1k 2k 4k
aida01 : 19 7 6 2 2 3 2 3 2 3 7 : 39716
aida02 : 8 4 3 2 2 3 2 3 2 3 7 : 39600
aida03 : 23 11 5 1 0 4 2 3 1 4 7 : 40740
aida04 : 42 25 16 7 2 4 4 3 2 2 7 : 38864
aida05 : 24 5 5 1 2 2 3 3 1 4 7 : 40824
aida06 : 11 4 4 1 1 4 2 4 2 3 7 : 40172
aida07 : 21 8 5 1 2 3 2 4 2 3 7 : 40196
aida08 : 15 5 6 1 1 4 2 4 2 3 7 : 40228
aida09 : 15 10 4 1 2 3 2 3 2 3 7 : 39660
aida10 : 23 8 2 2 2 2 2 3 1 4 7 : 40572
aida11 : 6 4 4 2 2 4 2 3 2 3 7 : 39736
aida12 : 18 8 8 1 4 4 3 2 2 3 7 : 39720
aida13 : 23 6 2 3 2 5 2 2 2 3 7 : 39436
aida14 : 29 3 11 1 1 2 2 3 1 4 7 : 40604
aida15 : 29 7 2 2 0 4 3 3 2 3 7 : 39948
aida16 : 2 3 6 2 3 4 2 2 2 3 7 : 39296
*** Timestamp elapsed time: 225.065 s
FEE  elapsed dead time(s)  elapsed idle time(s)
 0     0.038     0.000
 1     9.479     0.000
 2     0.195     0.000
 3     5.921     0.000
 4     0.000    11.742
 5     0.036     0.000
 6     0.013     0.000
 7     0.498     0.000
 8     0.436     0.000
 9     0.000   107.300
10     2.787     0.000
11     0.905     0.000
12     0.831     0.000
13     0.000    55.939
14     0.080     0.000
15     0.267     0.000
16     0.000     0.000
17     0.000     0.000
18     0.000     0.000
19     0.000     0.000
20     0.000     0.000
21     0.000     0.000
22     0.000     0.000
23     0.000     0.000
24     0.000     0.000
25     0.000     0.000
26     0.000     0.000
27     0.000     0.000
28     0.000     0.000
29     0.000     0.000
30     0.000     0.000
31     0.000     0.000
32     0.000     0.000

16th May 12:00 - 16:00 shift
Author: JS

11:57 Taking over from Magda. Running full checks.
ucesb ok.
Max ~1700 Hz 1MHz on DSSD1, DSSD ~ 75%
Current ok 06.410 uA 006.835 uA
Stats good - 1Statistics_aidas-gsi(6).png
Temps ok - 1Temperature_and_status_scan_aidas-gsi(6).png
Analysis ok R7_385.
Dead time FEE1 a little high, 6%
PAUSE: 166 RESUME: 166
*** Timestamp elapsed time: 196.305 s
FEE  elapsed dead time(s)  elapsed idle time(s)
 0     0.044     0.000
 1    12.405     0.000
 2     0.807     0.000
 3     7.260     0.000
 4     0.014     0.000
 5     0.458     0.000
 6     0.027     0.000
 7     0.497     0.000
 8     0.723     0.000
 9     0.000    88.857
10     6.189     0.000
11     1.443     0.000
12     0.474     0.000
13     0.000    35.565
14     0.000     0.000
15     0.147     0.000
16     0.000     0.000
17     0.000     0.000
18     0.000     0.000
19     0.000     0.000
20     0.000     0.000
21     0.000     0.000
22     0.000     0.000
23     0.000     0.000
24     0.000     0.000
25     0.000     0.000
26     0.000     0.000
27     0.000     0.000
28     0.000     0.000
29     0.000     0.000
30     0.000     0.000
31     0.000     0.000
32     0.000     0.000
FEE64 module aida09 global clocks failed, 6
Clock status test result: Passed 15, Failed 1
FEE64 module aida09 failed
Calibration test result: Passed 15, Failed 1
Base Current Difference
aida01 fault 0xf294 : 0xf296 : 2
aida02 fault 0xd8ec : 0xd8ee : 2
aida03 fault 0xf001 : 0xf003 : 2
aida04 fault 0xd992 : 0xd994 : 2
aida05 fault 0x714c : 0x716e : 34
aida06 fault 0x5a49 : 0x5a4a : 1
aida07 fault 0x5aca : 0x5acb : 1
aida08 fault 0xb92e : 0xb92f : 1
White Rabbit error counter test result: Passed 8, Failed 8
Base Current Difference
aida05 fault 0x0 : 0xa : 10
aida12 fault 0x0 : 0x3 : 3
aida13 fault 0x0 : 0x4d : 77
FPGA Timestamp error counter test result: Passed 13, Failed 3
Returned 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Mem(KB) : 4 8 16 32 64 128 256 512 1k 2k 4k
aida01 : 20 7 4 2 2 3 2 3 1 4 7 : 40712
aida02 : 25 10 7 1 1 4 1 3 2 3 7 : 39556
aida03 : 22 10 5 1 0 4 2 4 2 3 7 : 40216
aida04 : 40 24 17 7 2 5 3 3 2 2 7 : 38736
aida05 : 4 8 3 0 1 3 3 3 1 4 7 : 40768
aida06 : 21 6 6 2 2 3 2 4 2 3 7 : 40228
aida07 : 25 4 6 3 2 3 2 4 2 3 7 : 40260
aida08 : 19 9 4 1 1 4 2 4 2 3 7 : 40244
aida09 : 16 9 4 2 2 3 2 3 2 3 7 : 39688
aida10 : 19 10 5 1 2 2 2 4 1 4 7 : 41100
aida11 : 21 10 2 1 2 3 3 3 2 3 7 : 39908
aida12 : 25 7 7 2 2 3 2 3 2 3 7 : 39756
aida13 : 13 7 3 4 2 5 2 3 2 3 7 : 39964
aida14 : 21 7 9 2 1 2 2 4 1 4 7 : 41116
aida15 : 23 6 2 3 0 4 3 3 2 3 7 : 39948
aida16 : 9 11 10 2 3 4 2 2 2 3 7 : 39452

Tom says the Aida09 clock fail is ok as its status is "6". The large White Rabbit difference for FEE5 is known and has been determined to be ok; a post-run investigation will be undertaken.

12:35
ucesb ok.
Current ok
Stats good
Temps ok
Analysis ok R7_395.
Dead time FEE1 still high, 6.6%

13:33
ucesb ok.
Max implants ~ 1.8kHz
Current ok 006.850 uA 007.250 uA
Stats - Aida11 running low, < 5k (was ~20k overnight)
Temps ok
Analysis R7_415.
Dead time FEE1 & FEE10 high, 10%

14:00
ucesb ok - ucesb1.png
Current ok - Bias1.png
Stats - Aida11 low

14:20 AIDA FEE rebooted itself. A power cycle was performed. Upon reboot we are seeing extremely large amounts of noise in the FEEs. The waveforms show very large 100 kHz pick-up in the FEEs, which has resulted in 50% dead time in many FEEs, including the p+n side.

15:28 Because of extremely high rates across all FEEs, we have decided to do a power cycle. Before restarting the FEEs we will give them a couple of minutes to cool.

18:00 Since the start of the shift we have been trying to recover the system from a large increase in noise following the crash at ~14:00. During this time NH entered the area, inspected the system and grounded the AIDA snout. This gave some improvement in the noise. The rates are still slightly above where they were before the crash but now appear stable. To counteract the dead time in the n+n strips we have raised the threshold to 0x64 for ASIC4 in all FEEs. We are now running with around 10% dead time on FEE4 and less elsewhere for n+n. For p+n most FEEs have zero dead time, apart from FEE11, which is still noisy. While we were trying to recover the system, screenshots were taken of the waveforms. In these the 100 kHz noise was very apparent, particularly in the n+n strips.
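The clock status value reported for aida09 ("global clocks failed, 6") packs the four flags listed in the check output into one number. A minimal Python sketch, assuming the value is a plain integer whose bit n carries the "Status bit n" description printed by the check, lists which flags are set: 6 is binary 0110, so bits 2 and 1 are set. The names `CLOCK_STATUS_BITS`, `set_bits` and `describe` are hypothetical, and the sketch only decodes bits; it does not decide whether a value is acceptable (Tom's note is what classes 6 as ok for aida09).

```python
# Bit descriptions copied from the FEE64 clock status check output in this log.
# Treating the reported value as an integer bit field is an assumption.
CLOCK_STATUS_BITS = {
    3: "firmware PLL that creates clocks from external clock not locked",
    2: "always logic '1'",
    1: "LMK3200(2) PLL and clock distribution chip not locked to external clock",
    0: "LMK3200(1) PLL and clock distribution chip not locked to external clock",
}


def set_bits(status: int, width: int = 4) -> list[int]:
    """Return the indices of the bits set in `status`, highest bit first."""
    return [bit for bit in range(width - 1, -1, -1) if (status >> bit) & 1]


def describe(status: int, table: dict[int, str]) -> list[str]:
    """Pair each set bit with its description; bits with no description are skipped."""
    return [f"Status bit {bit} : {table[bit]}" for bit in set_bits(status) if bit in table]


if __name__ == "__main__":
    # aida09 reported "global clocks failed, 6"
    for line in describe(6, CLOCK_STATUS_BITS):
        print(line)
```

The same `describe` helper could be given the White Rabbit status descriptions (bits 3, 2 and 0) as a second table.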
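In each "Base Current Difference" row, the last column is the current counter value minus the base value, written in decimal. For aida05, 0x716e - 0x714c = 0x22 = 34. A short Python sketch recomputes that difference from a pasted row, which makes it easy to follow the growth of the aida05 counter listed just before the White Rabbit error counter result (23 at 08:00, 26 at 10:00, 34 around 12:00). The regular expression and function name are hypothetical and written only against the row format shown in this log.

```python
import re

# Matches rows such as "aida05 fault 0x714c : 0x716e : 34" from the check output.
FAULT_ROW = re.compile(
    r"(aida\d+)\s+fault\s+(0x[0-9a-fA-F]+)\s*:\s*(0x[0-9a-fA-F]+)\s*:\s*(\d+)"
)


def parse_fault_row(line: str) -> tuple[str, int, int]:
    """Return (module, current - base, reported difference) for one row."""
    match = FAULT_ROW.search(line)
    if match is None:
        raise ValueError(f"not a fault row: {line!r}")
    module, base, current, reported = match.groups()
    # Assumes the counter did not wrap between the base and current readings.
    return module, int(current, 16) - int(base, 16), int(reported)


if __name__ == "__main__":
    # aida05 rows from the 08:00, 10:00 and ~12:00 checks
    for row in (
        "aida05 fault 0x714c : 0x7163 : 23",
        "aida05 fault 0x714c : 0x7166 : 26",
        "aida05 fault 0x714c : 0x716e : 34",
    ):
        module, delta, reported = parse_fault_row(row)
        print(module, delta, "matches" if delta == reported else "MISMATCH")
```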
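The Mem(KB) tables appear to be histograms. In the rows checked here, the number after the final colon equals the sum of each count multiplied by its column size, with 1k, 2k and 4k read as 1024, 2048 and 4096 KB. For aida01 at 08:00: 27×4 + 7×8 + 2×16 + 3×32 + 2×64 + 3×128 + 2×256 + 4×512 + 2×1024 + 3×2048 + 7×4096 = 40228. That reading, and any guess that the counts are free memory blocks, are assumptions; the log does not say what the columns count. A minimal Python sketch that checks the relation for a pasted row (function names hypothetical):

```python
# Column headings of the Mem(KB) tables in this log; 1k/2k/4k taken as 1024/2048/4096 KB.
BLOCK_SIZES_KB = [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]


def parse_mem_row(line: str) -> tuple[str, list[int], int]:
    """Split 'aida01 : 27 7 ... 7 : 40228' into (module, counts, reported total)."""
    module, rest = line.split(":", 1)
    counts_text, total_text = rest.rsplit(":", 1)
    counts = [int(value) for value in counts_text.split()]
    if len(counts) != len(BLOCK_SIZES_KB):
        raise ValueError(f"expected {len(BLOCK_SIZES_KB)} counts, got {len(counts)}")
    return module.strip(), counts, int(total_text)


def total_kb(counts: list[int]) -> int:
    """Sum of count x block size over all columns."""
    return sum(n * size for n, size in zip(counts, BLOCK_SIZES_KB))


if __name__ == "__main__":
    for row in (
        "aida01 : 27 7 2 3 2 3 2 4 2 3 7 : 40228",
        "aida04 : 34 27 17 7 2 4 3 3 2 2 7 : 38608",
        "aida16 : 9 11 10 2 3 4 2 2 2 3 7 : 39452",
    ):
        module, counts, reported = parse_mem_row(row)
        print(module, total_kb(counts), reported)
```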

18:08 System wide checks all ok - bar some ADC, but waveforms disabled
Statistics ok - 210516_1809_Stats
Temperatures ok - 210516_1809_Temp
Bias and leakage ok - 210516_1810_Bias

18:37 Performed an ASIC check and now the rates have dropped in all n+n strips. Currently very small amounts of dead time.

18:40 Realised this was because the ASIC check raised the threshold of all strips on the n+n side to 0x64.

19:25 Removed S452 from 1e2.... drive. Before removing, checked with Nic that it was backed up to Lustre, and also verified this for ourselves. Now have around 4.2TB left, which will provide around 80 hours of writing.

20:16 We noticed iptraf was using around 30% CPU. We investigated whether it had any effect on the dead time; from what we have seen it has not.

20:57 System wide checks.
Clock ok
Base Current Difference
aida05 fault 0x1552 : 0x1556 : 4
White Rabbit error counter test result: Passed 15, Failed 1
Understand the status reports as follows:-
Status bit 3 : White Rabbit decoder detected an error in the received data
Status bit 2 : Firmware registered WR error, no reload of Timestamp
Status bit 0 : White Rabbit decoder reports uncertain of Timestamp information from WR
Base Current Difference
aida05 fault 0x0 : 0x1 : 1
FPGA Timestamp error counter test result: Passed 15, Failed 1
If any of these counts are reported as in error The ASIC readout system has detected a timeslip.
That is the timestamp read from the time FIFO is not younger than the last
Statistics ok - 210516_2056_Stats
Temp ok - 210516_2058_Temp
Bias and leakage current ok - 210516_2058_Bias

23:16 System wide checks:
Clock still ok
Base Current Difference
aida05 fault 0x1552 : 0x155a : 8
White Rabbit error counter test result: Passed 15, Failed 1
Understand the status reports as follows:-
Status bit 3 : White Rabbit decoder detected an error in the received data
Status bit 2 : Firmware registered WR error, no reload of Timestamp
Status bit 0 : White Rabbit decoder reports uncertain of Timestamp information from WR
Base Current Difference
aida05 fault 0x0 : 0x2 : 2
FPGA Timestamp error counter test result: Passed 15, Failed 1
If any of these counts are reported as in error The ASIC readout system has detected a timeslip.
That is the timestamp read from the time FIFO is not younger than the last
Statistics - 210516_2315_Stats
Temperature - 210516_2316_Temp
Bias and leakage current ok - 210516_231
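The dead-time percentages quoted in the checks (for example "Dead time FEE1 a little high, 6%") follow from dividing a row's elapsed dead time by the timestamp elapsed time: 12.405 s / 196.305 s ≈ 6.3% for the row labelled 1 in the ~12:00 table. A minimal Python sketch parses rows in the "FEE dead idle" layout of those tables and flags the ones above a cut. The 5% cut and the function names are hypothetical choices for illustration, not a threshold used by the shift crew.

```python
def parse_dead_time_rows(text: str) -> dict[int, tuple[float, float]]:
    """Parse 'FEE dead idle' rows, one per line, into {fee: (dead_s, idle_s)}."""
    rows: dict[int, tuple[float, float]] = {}
    for line in text.strip().splitlines():
        fee, dead, idle = line.split()
        rows[int(fee)] = (float(dead), float(idle))
    return rows


def flag_high_dead_time(
    rows: dict[int, tuple[float, float]], elapsed_s: float, threshold_pct: float = 5.0
) -> dict[int, float]:
    """Return {fee: dead-time %} for rows whose dead time exceeds threshold_pct."""
    flagged = {}
    for fee, (dead_s, _idle_s) in rows.items():
        percent = 100.0 * dead_s / elapsed_s
        if percent > threshold_pct:
            flagged[fee] = round(percent, 2)
    return flagged


# A few rows from the ~12:00 table (Timestamp elapsed time: 196.305 s).
SAMPLE = """
 0     0.044     0.000
 1    12.405     0.000
 3     7.260     0.000
10     6.189     0.000
"""

if __name__ == "__main__":
    print(flag_high_dead_time(parse_dead_time_rows(SAMPLE), 196.305))  # {1: 6.32}
```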
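The 19:25 estimate links free disk space to writing time through an average data rate. Assuming decimal terabytes and a constant rate, 4.2 TB lasting about 80 hours corresponds to roughly 14.6 MB/s, or about 52.5 GB per hour. The log does not state the rate, so this sketch only works backwards from the two quoted numbers; the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600.0


def implied_write_rate(free_bytes: float, hours: float) -> float:
    """Average write rate (bytes/s) that would fill free_bytes in `hours`."""
    return free_bytes / (hours * SECONDS_PER_HOUR)


def hours_remaining(free_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours until free_bytes is used up at a constant write rate."""
    return free_bytes / rate_bytes_per_s / SECONDS_PER_HOUR


if __name__ == "__main__":
    free = 4.2e12  # "around 4.2TB left", read as decimal terabytes (assumption)
    rate = implied_write_rate(free, 80.0)
    print(f"{rate / 1e6:.1f} MB/s")  # 14.6 MB/s
```

With a measured write rate, `hours_remaining` gives the forward estimate.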
Attachment 1:
Screenshot_2021-05-16_Statistics_aidas-gsi(4).png
Original size: 2144x1035
Attachment 2:
Screenshot_2021-05-16_Statistics_aidas-gsi(5).png
Original size: 2144x1035
Attachment 3:
Screenshot_2021-05-16_Temperature_and_status_scan_aidas-gsi(4).png
Original size: 2144x1035
Attachment 4:
Screenshot_2021-05-16_Temperature_and_status_scan_aidas-gsi(5).png
Original size: 2144x1035
Attachment 5:
1Statistics_aidas-gsi(6).png
Original size: 1905x568
Attachment 6:
1Temperature_and_status_scan_aidas-gsi(6).png
Original size: 1905x584
Attachment 7:
Bias1.png
Original size: 502x344
Attachment 8:
210516_0857_Temp.png
Original size: 1917x578
Attachment 9:
210516_1554_Layout8.png
Original size: 1896x862
Attachment 10:
210516_1555_Layout7.png
Original size: 1893x869
Attachment 11:
210516_1600_Layout1.png
Original size: 1896x585
Attachment 12:
210516_1601_Stats.png
Original size: 1913x532
Attachment 13:
210516_1605_1606.png
Original size: 1917x511
Attachment 14:
210516_1808_Stats.png
Original size: 1915x535
Attachment 15:
210516_1809_Temp.png
Original size: 1911x572
Attachment 16:
210516_1810_Bias.png
Original size: 496x332
Attachment 17:
210516_2056_Stats.png
Original size: 1910x531
Attachment 18:
210516_2058_Bias.png
Original size: 499x340
Attachment 19:
210516_2313_Stats.png
Original size: 1919x518
Attachment 20:
210516_2315_Temp.png
Original size: 1914x583
Attachment 21:
210516_2316_Bias.png
Original size: 491x339
ELOG V3.1.4-unknown