Message ID: 299     Entry time: Fri May 7 13:11:22 2021
Author: NH 
Subject: Friday 7th May 
14:11 - Alpha has been running most of the morning

Just saw the rate to tape spike to 6 MB/s...
Stop output to tape and look:
ASIC check & load... all rates except aida04 back to 0
Start output to tape
Back to ~300 KB/s

R6_49 will be affected by this.
Others seem OK

System wide check failures:

              Base      Current   Difference
aida01 fault  0xb405 :  0xb406 :  1
aida02 fault  0xefc7 :  0xefc8 :  1
aida03 fault  0xdaab :  0xdaac :  1
aida04 fault  0x8f7c :  0x8f7d :  1
aida05 fault  0xb5bd :  0xb5be :  1
aida06 fault  0xeff1 :  0xeff2 :  1
aida07 fault  0x8f57 :  0x8f58 :  1
aida08 fault  0xbef7 :  0xbef8 :  1
White Rabbit error counter test result: Passed 8, Failed 8
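
The Difference column in these check tables is Current minus Base for each hex counter. A minimal sketch of that arithmetic, assuming the line layout shown in this entry; the regular expression and the function name `parse_fault_line` are hypothetical and not part of the AIDA software:

```python
import re

# Assumed line layout, taken from the check output in this entry:
#   "aida01 fault   0xb405 :   0xb406 :   1"
LINE_RE = re.compile(
    r"(aida\d+)\s+fault\s+(0x[0-9a-fA-F]+)\s*:\s*(0x[0-9a-fA-F]+)\s*:\s*(\d+)"
)

def parse_fault_line(line):
    """Return (fee, base, current, reported_diff), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    fee, base, current, diff = m.groups()
    return fee, int(base, 16), int(current, 16), int(diff)

# Two lines copied from this entry.
sample = [
    "aida01 fault \t 0xb405 : \t 0xb406 : \t 1  ",
    "aida08 fault \t 0xbef7 : \t 0xbef8 : \t 1  ",
]

for line in sample:
    fee, base, current, reported = parse_fault_line(line)
    # The reported difference should equal current - base.
    print(fee, current - base, current - base == reported)
```

A difference of 1 on every FEE from aida01 to aida08 means each counter incremented exactly once since the baseline was taken.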

Understand the status reports as follows:-
Status bit 3 : White Rabbit decoder detected an error in the received data
Status bit 2 : Firmware registered WR error, no reload of Timestamp
Status bit 0 : White Rabbit decoder reports uncertain of Timestamp information from WR
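
The three status bits listed above can be decoded with a simple bit mask. This is only a sketch: it assumes the status word is available as an integer (how it is read out is not described in this entry), and `decode_wr_status` is a hypothetical helper name.

```python
# Bit meanings as quoted in the check output of this entry.
WR_STATUS_BITS = {
    3: "White Rabbit decoder detected an error in the received data",
    2: "Firmware registered WR error, no reload of Timestamp",
    0: "White Rabbit decoder reports uncertain of Timestamp information from WR",
}

def decode_wr_status(status):
    """List the descriptions of all documented bits that are set in `status`."""
    return [
        text
        for bit, text in sorted(WR_STATUS_BITS.items())
        if status & (1 << bit)
    ]

# Hypothetical status value with bits 3 and 0 set (0b1001); not taken from the log.
for text in decode_wr_status(0b1001):
    print(text)
```

Bits that are not documented in the check output (for example bit 1) are ignored by this sketch.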


	
              Base   Current   Difference
aida12 fault  0x0 :  0x200 :   512
aida13 fault  0x0 :  0x61 :    97
FPGA Timestamp error counter test result: Passed 14, Failed 2
If any of these counts are reported as in error
The ASIC readout system has detected a timeslip.
That is the timestamp read from the time FIFO is not younger than the last
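
The timeslip condition can be illustrated with a short check. The explanation above is cut off in the tool output, so this sketch assumes that "not younger than the last" means a timestamp that is not strictly later than the one read before it; `count_timeslips` is a hypothetical name and this is not the firmware's actual logic.

```python
def count_timeslips(timestamps):
    """Count entries whose timestamp is not strictly greater than the previous one.

    Assumption: a "timeslip" is any timestamp that fails to advance
    relative to the one read immediately before it.
    """
    slips = 0
    previous = None
    for ts in timestamps:
        if previous is not None and ts <= previous:
            slips += 1
        previous = ts
    return slips

# Hypothetical timestamp sequence: 105 goes backwards, the second 120 repeats.
print(count_timeslips([100, 110, 105, 120, 120]))
```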


The FPGA errors seem to come on FEEs with no WR errors or clock errors (including Lock/PLL page)

Temps & Rates OK

Will now stop Alpha run and check noise situation

Run stopped at R6_49

Switching to nostorage (NOTAPE/R2)

Set ASICs to threshold 0xa, Rates OK
Rate to Tape/MBS ca. 9 MB/s

No change to situation since yesterday... good sign

During MBS test aida13 rebooted... crash log from ttyUSB5 attached

Later noticed a connector half off on aida05; reseated it and, after some time, noise levels were OK again.
Cables on top are very sensitive, possibly indicative of issues inside(?). We avoid touching AIDA for now :)

A power cycle of all FEEs was performed after aida13's reboot to make sure things are good.
All systems check OK
Attachment 1: elog_locks.png  57 kB  Uploaded Fri May 7 14:17:57 2021
Attachment 2: elog_temps.png  77 kB  Uploaded Fri May 7 14:19:10 2021
Attachment 3: elog_rates.png  82 kB  Uploaded Fri May 7 14:19:15 2021
Attachment 4: elog_rates1.png  41 kB  Uploaded Fri May 7 14:21:44 2021
Attachment 5: aida13.txt  4 kB  Uploaded Fri May 7 19:48:43 2021
06/19:25:39|Data Acquisition Statistics counters now cleared
06/19:25:55|Clear Statistics (1)
06/19:25:55|------------[ cut here ]------------
07/15:23:58|kernel BUG at mm/slab.c:2974!
07/15:23:58|Oops: Exception in kernel mode, sig: 5 [#1]
07/15:23:58|PREEMPT Xilinx Virtex440
07/15:23:59|Modules linked in: aidamem xdriver xh_spidev_register
07/15:23:59|NIP: c009211c LR: c00920b0 CTR: 00000007
07/15:23:59|REGS: c0391bf0 TRAP: 0700   Not tainted  (2.6.31)
07/15:23:59|MSR: 00021000 <ME,CE>  CR: 24022048  XER: 00000000
07/15:23:59|TASK = c036e318[0] 'swapper' THREAD: c0390000
07/15:23:59|GPR00: 00000001 c0391ca0 c036e318 c680daf0 c694003c 0000000a c6940020 00000009
07/15:23:59|GPR08: 0000001b c680dae0 00000cf0 c680dae0 24008042 00005aa8 c03a0000 00000020
07/15:23:59|GPR16: c0390000 c03a069c c03a0000 c038c384 c038cc18 00000020 00000000 00200200
07/15:23:59|GPR24: 00100100 c0390000 00000000 c680dae8 c680dae0 c680ae00 00000006 c680e400
07/15:23:59|NIP [c009211c] cache_alloc_refill+0x130/0x608
07/15:23:59|LR [c00920b0] cache_alloc_refill+0xc4/0x608
07/15:23:59|Call Trace:
07/15:23:59|[c0391ca0] [c00920b0] cache_alloc_refill+0xc4/0x608 (unreliable)
07/15:23:59|[c0391d00] [c00927d8] kmem_cache_alloc+0xc4/0xcc
07/15:23:59|[c0391d20] [c0042420] __sigqueue_alloc+0x50/0xb8
07/15:23:59|[c0391d40] [c0042938] __send_signal+0x78/0x260
07/15:23:59|[c0391d70] [c0042f78] group_send_sig_info+0x70/0x9c
07/15:24:00|[c0391da0] [c00438a8] kill_pid_info+0x48/0x8c
07/15:24:00|[c0391dc0] [c0038e8c] it_real_fn+0x1c/0x30
07/15:24:00|[c0391dd0] [c0050c40] hrtimer_run_queues+0x184/0x240
07/15:24:00|[c0391e30] [c0040ba8] run_local_timers+0x10/0x2c
07/15:24:00|[c0391e40] [c0040bf4] update_process_times+0x30/0x70
07/15:24:00|[c0391e60] [c005a000] tick_periodic+0x34/0xe8
07/15:24:00|[c0391e70] [c005a0d4] tick_handle_periodic+0x20/0x120
07/15:24:00|[c0391eb0] [c000af70] timer_interrupt+0xa4/0x10c
07/15:24:00|[c0391ed0] [c000e9c4] ret_from_except+0x0/0x18
07/15:24:00|[c0391f90] [c0006fac] cpu_idle+0xcc/0xdc
07/15:24:00|[c0391fb0] [c000172c] rest_init+0x70/0x84
07/15:24:00|[c0391fc0] [c0341854] start_kernel+0x230/0x2ac
07/15:24:00|[c0391ff0] [c0000204] skpinv+0x194/0x1d0
07/15:24:00|Instruction dump:
07/15:24:00|2f1e0000 409900f4 387c0010 3b7c0008 80dc0000 7f9c3000 419e014c 81060010
07/15:24:00|801d001c 7c004010 38000000 7c000114 <0f000000> 81260010 801d001c 7f804840
07/15:24:00|Kernel panic - not syncing: Fatal exception in interrupt
07/15:24:00|Call Trace:
07/15:24:00|[c0391a40] [c0005de8] show_stack+0x44/0x16c (unreliable)
07/15:24:00|[c0391a80] [c00345bc] panic+0x94/0x168
07/15:24:01|[c0391ad0] [c000bd44] die+0x178/0x18c
07/15:24:01|[c0391af0] [c000c000] _exception+0x164/0x1b4
07/15:24:01|[c0391be0] [c000e978] ret_from_except_full+0x0/0x4c
07/15:24:01|[c0391ca0] [c00920b0] cache_alloc_refill+0xc4/0x608
07/15:24:01|[c0391d00] [c00927d8] kmem_cache_alloc+0xc4/0xcc
07/15:24:01|[c0391d20] [c0042420] __sigqueue_alloc+0x50/0xb8
07/15:24:01|[c0391d40] [c0042938] __send_signal+0x78/0x260
07/15:24:01|[c0391d70] [c0042f78] group_send_sig_info+0x70/0x9c
07/15:24:01|[c0391da0] [c00438a8] kill_pid_info+0x48/0x8c
07/15:24:01|[c0391dc0] [c0038e8c] it_real_fn+0x1c/0x30
07/15:24:01|[c0391dd0] [c0050c40] hrtimer_run_queues+0x184/0x240
07/15:24:01|[c0391e30] [c0040ba8] run_local_timers+0x10/0x2c
07/15:24:01|[c0391e40] [c0040bf4] update_process_times+0x30/0x70
07/15:24:01|[c0391e60] [c005a000] tick_periodic+0x34/0xe8
07/15:24:01|[c0391e70] [c005a0d4] tick_handle_periodic+0x20/0x120
07/15:24:01|[c0391eb0] [c000af70] timer_interrupt+0xa4/0x10c
07/15:24:01|[c0391ed0] [c000e9c4] ret_from_except+0x0/0x18
07/15:24:01|[c0391f90] [c0006fac] cpu_idle+0xcc/0xdc
07/15:24:01|[c0391fb0] [c000172c] rest_init+0x70/0x84
07/15:24:01|[c0391fc0] [c0341854] start_kernel+0x230/0x2ac
07/15:24:02|[c0391ff0] [c0000204] skpinv+0x194/0x1d0
07/15:24:02|Rebooting in 180 seconds..
07/15:27:02|ISOL Version 1.00 Date 9th January 2017
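
The key facts in a console capture like aida13.txt (the BUG location, the faulting symbol and the panic message) can be pulled out mechanically when comparing crashes across FEEs. A rough sketch, assuming the `DD/HH:MM:SS|` logger prefix seen in this attachment; `summarise_oops` is a hypothetical helper, not part of any AIDA or kernel tooling:

```python
import re

def summarise_oops(log_text):
    """Extract BUG location, faulting symbol and panic message from a console log."""
    summary = {}
    for raw in log_text.splitlines():
        # Drop the assumed "DD/HH:MM:SS|" logger prefix; lines without "|" become empty.
        _, _, msg = raw.partition("|")
        # Tolerate caret-notation carriage returns if present in the capture.
        msg = msg.replace("^M", "").strip()

        m = re.match(r"kernel BUG at (\S+?):(\d+)!", msg)
        if m:
            summary["bug_file"] = m.group(1)
            summary["bug_line"] = int(m.group(2))

        # Matches "NIP [addr] symbol+0x..." but not the "NIP: addr LR: ..." register line.
        m = re.match(r"NIP \[[0-9a-f]+\] (\w+)\+", msg)
        if m:
            summary["nip_symbol"] = m.group(1)

        if msg.startswith("Kernel panic"):
            summary["panic"] = msg
    return summary

# Three lines copied from the attachment above.
sample = "\n".join([
    "07/15:23:58|kernel BUG at mm/slab.c:2974!",
    "07/15:23:59|NIP [c009211c] cache_alloc_refill+0x130/0x608",
    "07/15:24:00|Kernel panic - not syncing: Fatal exception in interrupt",
])
print(summarise_oops(sample))
```

For this crash the summary points at `mm/slab.c` line 2974 with the faulting instruction in `cache_alloc_refill`, reached from the timer interrupt path shown in the call trace.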