DESPEC |
 |
|
Message ID: 299
Entry time: Fri May 7 13:11:22 2021
|
Author: |
NH |
Subject: |
Friday 7th May |
|
|
14:11 - Alpha has been running most of the morning
Just saw the rate to tape spike to 6 MB/s...
Stop output to tape and look:
ASIC check & load... all rates except aida04 back to 0
Start output to tape
Back to ~300 KB/s
R6_49 will be affected by this.
Others seem OK
System wide check failures:
Base Current Difference
aida01 fault 0xb405 : 0xb406 : 1
aida02 fault 0xefc7 : 0xefc8 : 1
aida03 fault 0xdaab : 0xdaac : 1
aida04 fault 0x8f7c : 0x8f7d : 1
aida05 fault 0xb5bd : 0xb5be : 1
aida06 fault 0xeff1 : 0xeff2 : 1
aida07 fault 0x8f57 : 0x8f58 : 1
aida08 fault 0xbef7 : 0xbef8 : 1
White Rabbit error counter test result: Passed 8, Failed 8
Understand the status reports as follows:-
Status bit 3 : White Rabbit decoder detected an error in the received data
Status bit 2 : Firmware registered WR error, no reload of Timestamp
Status bit 0 : White Rabbit decoder reports uncertain of Timestamp information from WR
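For offline cross-checking of the check output above, a minimal Python sketch (not the AIDA check code itself). It parses the "Base Current Difference" lines and confirms that the reported difference equals current minus base, and it lists which of the documented status bits are set in a status word. The function names and the idea of passing the status as a plain integer are assumptions for illustration; counter wrap-around is not handled.

```python
import re

# Matches check-output lines such as "aida01 fault 0xb405 : 0xb406 : 1"
# (FEE name, base counter, current counter, reported difference).
FAULT_RE = re.compile(
    r"^(aida\d+)\s+fault\s+(0x[0-9a-fA-F]+)\s*:\s*(0x[0-9a-fA-F]+)\s*:\s*(\d+)$"
)

# Bit meanings copied from the status-report text in this entry.
WR_STATUS_BITS = {
    3: "White Rabbit decoder detected an error in the received data",
    2: "Firmware registered WR error, no reload of Timestamp",
    0: "White Rabbit decoder reports uncertain of Timestamp information from WR",
}


def parse_fault_line(line):
    """Return a dict for one fault line, or None if the line is not a fault line."""
    m = FAULT_RE.match(line.strip())
    if m is None:
        return None
    base = int(m.group(2), 16)
    current = int(m.group(3), 16)
    diff = int(m.group(4))
    return {
        "fee": m.group(1),
        "base": base,
        "current": current,
        "diff": diff,
        # Assumption: no counter wrap-around between base and current.
        "consistent": current - base == diff,
    }


def decode_wr_status(status):
    """Return the documented bit numbers that are set in an integer status word."""
    return [bit for bit in sorted(WR_STATUS_BITS) if status & (1 << bit)]


if __name__ == "__main__":
    for text in ("aida01 fault 0xb405 : 0xb406 : 1", "aida12 fault 0x0 : 0x200 : 512"):
        print(parse_fault_line(text))
    for bit in decode_wr_status(0b1001):  # hypothetical status word
        print(bit, WR_STATUS_BITS[bit])
```

Lines that are not fault lines return None, so the helper can be run over the whole check output.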
Base Current Difference
aida12 fault 0x0 : 0x200 : 512
aida13 fault 0x0 : 0x61 : 97
FPGA Timestamp error counter test result: Passed 14, Failed 2
If any of these counts are reported as in error
The ASIC readout system has detected a timeslip.
That is the timestamp read from the time FIFO is not younger than the last
The FPGA errors seem to come on FEEs with no WR errors or clock errors (including Lock/PLL page)
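To illustrate what the timeslip counter is counting, a small Python sketch under one assumption: "not younger than the last" is read here as "less than or equal to the previous timestamp". The actual comparison in the readout firmware may differ, and the function name is made up for this note.

```python
def count_timeslips(timestamps):
    """Count timestamps that do not advance beyond the previous one.

    Assumption: a timeslip is any timestamp <= its predecessor.
    """
    slips = 0
    last = None
    for ts in timestamps:
        if last is not None and ts <= last:
            slips += 1
        last = ts
    return slips


if __name__ == "__main__":
    # Hypothetical timestamps: 150 goes backwards, the second 300 repeats.
    print(count_timeslips([100, 200, 150, 300, 300]))
```

The comparison is always made against the immediately preceding timestamp, so a single backwards jump counts as one slip.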
Temps & Rates OK
Will now stop Alpha run and check noise situation
Run stopped at R6_49
Switching to nostorage (NOTAPE/R2)
Set ASICs to threshold 0xa, Rates OK
Rate to Tape/MBS ca. 9 MB/s
No change to situation since yesterday... good sign
During MBS test aida13 rebooted... crash log from ttyUSB5 attached
It was later noticed that a connector was half off on aida05; it was reseated and after some time noise levels were OK again...
Cables on top are very sensitive, maybe indicative of issues inside(?). We avoid touching AIDA for now :)
A power cycle of all FEEs was performed after aida13's reboot to make sure things are good.
All systems check OK |
|
06/19:25:39|Data Acquisition Statistics counters now cleared
06/19:25:55|Clear Statistics (1)
06/19:25:55|------------[ cut here ]------------
07/15:23:58|kernel BUG at mm/slab.c:2974!
07/15:23:58|Oops: Exception in kernel mode, sig: 5 [#1]
07/15:23:58|PREEMPT Xilinx Virtex440
07/15:23:59|Modules linked in: aidamem xdriver xh_spidev_register
07/15:23:59|NIP: c009211c LR: c00920b0 CTR: 00000007
07/15:23:59|REGS: c0391bf0 TRAP: 0700 Not tainted (2.6.31)
07/15:23:59|MSR: 00021000 <ME,CE> CR: 24022048 XER: 00000000
07/15:23:59|TASK = c036e318[0] 'swapper' THREAD: c0390000
07/15:23:59|GPR00: 00000001 c0391ca0 c036e318 c680daf0 c694003c 0000000a c6940020 00000009
07/15:23:59|GPR08: 0000001b c680dae0 00000cf0 c680dae0 24008042 00005aa8 c03a0000 00000020
07/15:23:59|GPR16: c0390000 c03a069c c03a0000 c038c384 c038cc18 00000020 00000000 00200200
07/15:23:59|GPR24: 00100100 c0390000 00000000 c680dae8 c680dae0 c680ae00 00000006 c680e400
07/15:23:59|NIP [c009211c] cache_alloc_refill+0x130/0x608
07/15:23:59|LR [c00920b0] cache_alloc_refill+0xc4/0x608
07/15:23:59|Call Trace:
07/15:23:59|[c0391ca0] [c00920b0] cache_alloc_refill+0xc4/0x608 (unreliable)
07/15:23:59|[c0391d00] [c00927d8] kmem_cache_alloc+0xc4/0xcc
07/15:23:59|[c0391d20] [c0042420] __sigqueue_alloc+0x50/0xb8
07/15:23:59|[c0391d40] [c0042938] __send_signal+0x78/0x260
07/15:23:59|[c0391d70] [c0042f78] group_send_sig_info+0x70/0x9c
07/15:24:00|[c0391da0] [c00438a8] kill_pid_info+0x48/0x8c
07/15:24:00|[c0391dc0] [c0038e8c] it_real_fn+0x1c/0x30
07/15:24:00|[c0391dd0] [c0050c40] hrtimer_run_queues+0x184/0x240
07/15:24:00|[c0391e30] [c0040ba8] run_local_timers+0x10/0x2c
07/15:24:00|[c0391e40] [c0040bf4] update_process_times+0x30/0x70
07/15:24:00|[c0391e60] [c005a000] tick_periodic+0x34/0xe8
07/15:24:00|[c0391e70] [c005a0d4] tick_handle_periodic+0x20/0x120
07/15:24:00|[c0391eb0] [c000af70] timer_interrupt+0xa4/0x10c
07/15:24:00|[c0391ed0] [c000e9c4] ret_from_except+0x0/0x18
07/15:24:00|[c0391f90] [c0006fac] cpu_idle+0xcc/0xdc
07/15:24:00|[c0391fb0] [c000172c] rest_init+0x70/0x84
07/15:24:00|[c0391fc0] [c0341854] start_kernel+0x230/0x2ac
07/15:24:00|[c0391ff0] [c0000204] skpinv+0x194/0x1d0
07/15:24:00|Instruction dump:
07/15:24:00|2f1e0000 409900f4 387c0010 3b7c0008 80dc0000 7f9c3000 419e014c 81060010
07/15:24:00|801d001c 7c004010 38000000 7c000114 <0f000000> 81260010 801d001c 7f804840
07/15:24:00|Kernel panic - not syncing: Fatal exception in interrupt
07/15:24:00|Call Trace:
07/15:24:00|[c0391a40] [c0005de8] show_stack+0x44/0x16c (unreliable)
07/15:24:00|[c0391a80] [c00345bc] panic+0x94/0x168
07/15:24:01|[c0391ad0] [c000bd44] die+0x178/0x18c
07/15:24:01|[c0391af0] [c000c000] _exception+0x164/0x1b4
07/15:24:01|[c0391be0] [c000e978] ret_from_except_full+0x0/0x4c
07/15:24:01|[c0391ca0] [c00920b0] cache_alloc_refill+0xc4/0x608
07/15:24:01|[c0391d00] [c00927d8] kmem_cache_alloc+0xc4/0xcc
07/15:24:01|[c0391d20] [c0042420] __sigqueue_alloc+0x50/0xb8
07/15:24:01|[c0391d40] [c0042938] __send_signal+0x78/0x260
07/15:24:01|[c0391d70] [c0042f78] group_send_sig_info+0x70/0x9c
07/15:24:01|[c0391da0] [c00438a8] kill_pid_info+0x48/0x8c
07/15:24:01|[c0391dc0] [c0038e8c] it_real_fn+0x1c/0x30
07/15:24:01|[c0391dd0] [c0050c40] hrtimer_run_queues+0x184/0x240
07/15:24:01|[c0391e30] [c0040ba8] run_local_timers+0x10/0x2c
07/15:24:01|[c0391e40] [c0040bf4] update_process_times+0x30/0x70
07/15:24:01|[c0391e60] [c005a000] tick_periodic+0x34/0xe8
07/15:24:01|[c0391e70] [c005a0d4] tick_handle_periodic+0x20/0x120
07/15:24:01|[c0391eb0] [c000af70] timer_interrupt+0xa4/0x10c
07/15:24:01|[c0391ed0] [c000e9c4] ret_from_except+0x0/0x18
07/15:24:01|[c0391f90] [c0006fac] cpu_idle+0xcc/0xdc
07/15:24:01|[c0391fb0] [c000172c] rest_init+0x70/0x84
07/15:24:01|[c0391fc0] [c0341854] start_kernel+0x230/0x2ac
07/15:24:02|[c0391ff0] [c0000204] skpinv+0x194/0x1d0
07/15:24:02|Rebooting in 180 seconds..
07/15:27:02|ISOL Version 1.00 Date 9th January 2017
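For comparing crash logs between FEEs, a rough Python sketch that splits console lines of the form DD/HH:MM:SS|message and pulls the function names out of the first call trace. It also drops carriage-return markers (a real CR or a literal ^M) if the capture contains them. The helper names are invented for this note, and the line format is inferred only from the log above.

```python
import re

# "07/15:23:58|kernel BUG at mm/slab.c:2974!" -> day, time, message
LINE_RE = re.compile(r"^(\d{2})/(\d{2}:\d{2}:\d{2})\|(.*)$")
# "[c0391d00] [c00927d8] kmem_cache_alloc+0xc4/0xcc" -> symbol name
FRAME_RE = re.compile(
    r"^\[[0-9a-f]{8}\] \[[0-9a-f]{8}\] ([A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_line(raw):
    """Return (day, time, message) for one console line, or None if it does not match."""
    cleaned = raw.replace("\r", "").replace("^M", "").rstrip()
    m = LINE_RE.match(cleaned)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)


def first_call_trace(lines):
    """Return the symbol names of the first 'Call Trace:' block, in order."""
    symbols = []
    in_trace = False
    for raw in lines:
        parsed = parse_line(raw)
        if parsed is None:
            continue
        message = parsed[2]
        if not in_trace:
            if message == "Call Trace:":
                in_trace = True
            continue
        frame = FRAME_RE.match(message)
        if frame is None:
            break  # first non-frame line ends the trace
        symbols.append(frame.group(1))
    return symbols


if __name__ == "__main__":
    sample = [
        "07/15:23:58|kernel BUG at mm/slab.c:2974!",
        "07/15:23:59|Call Trace:",
        "07/15:23:59|[c0391ca0] [c00920b0] cache_alloc_refill+0xc4/0x608 (unreliable)",
        "07/15:23:59|[c0391d00] [c00927d8] kmem_cache_alloc+0xc4/0xcc",
        "07/15:24:00|Instruction dump:",
    ]
    print(first_call_trace(sample))
```

Only the first trace is returned because the panic handler prints a second trace that repeats the same frames.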
|