Message ID: 759     Entry time: Fri Nov 2 01:56:00 2018
Author: TD 
Subject: [How To] Recover from unable to connect to nnaida ...  
If you obtain the 'unable to connect' from one, or more, of the FEE64s try the
following procedures *before* power-cycling/rebooting all of the FEE64s:

1) 'unable to connect' whilst the DAQ is running using the Merger
   
   https://elog.ph.ed.ac.uk/AIDA/303

2) determine whether multiple FEE64s are unable to connect

aidas1> ping nnaida1
PING nnaida1 (10.1.1.1) 56(84) bytes of data.
64 bytes from nnaida1 (10.1.1.1): icmp_seq=1 ttl=64 time=2.81 ms
64 bytes from nnaida1 (10.1.1.1): icmp_seq=2 ttl=64 time=2.81 ms
64 bytes from nnaida1 (10.1.1.1): icmp_seq=3 ttl=64 time=2.80 ms
^C
--- nnaida1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2029ms
rtt min/avg/max/mdev = 2.807/2.812/2.817/0.061 ms
aidas1> ping nnaida2
PING nnaida2 (10.1.1.2) 56(84) bytes of data.
64 bytes from nnaida2 (10.1.1.2): icmp_seq=1 ttl=64 time=2.87 ms
64 bytes from nnaida2 (10.1.1.2): icmp_seq=2 ttl=64 time=2.81 ms
64 bytes from nnaida2 (10.1.1.2): icmp_seq=3 ttl=64 time=2.83 ms
64 bytes from nnaida2 (10.1.1.2): icmp_seq=4 ttl=64 time=2.81 ms
^C
--- nnaida2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3350ms
rtt min/avg/max/mdev = 2.810/2.833/2.874/0.069 ms

  :
  :
   
  etc

If you find that you are unable to ping a group (or groups) of 8x FEE64s
it is probable that one or more fuses have failed in the USB-controlled
AC mains relay. It will be necessary to replace the fuse(s) and perform a
cold start of the FEE64s.

 https://elog.ph.ed.ac.uk/AIDA/418
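
The ping check in step 2 can be scripted rather than typed host by host. The
following is only a sketch: the function name check_fees, the PING_CMD variable
and the FEE64 count passed as an argument are assumptions for illustration and
not part of the AIDA software. It assumes the hosts are named nnaida1, nnaida2,
... as in the session above and that the Linux ping options -c (count) and -W
(timeout in seconds) are available.

```shell
# Hypothetical helper: ping each FEE64 once and print the names of those
# that do not answer. PING_CMD can be overridden (e.g. for testing).
PING_CMD=${PING_CMD:-ping}

check_fees() {
    n_fee=$1          # number of FEE64s to check (assumption: set per setup)
    failed=""
    i=1
    while [ "$i" -le "$n_fee" ]; do
        # one packet, 2 s timeout; discard the normal ping output
        if ! $PING_CMD -c 1 -W 2 "nnaida$i" > /dev/null 2>&1; then
            failed="$failed nnaida$i"
        fi
        i=$((i + 1))
    done
    # strip the leading space; prints an empty line if all FEE64s answered
    echo "${failed# }"
}

# Usage (24 is only an example count):
#   check_fees 24
```

If the list that comes back contains a whole group of 8x FEE64s, suspect the
relay fuse(s) as described above; if it is empty, continue with step 3.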

3) If you are able to ping all FEE64s, telnet to the FEE64 you are unable to
connect to, login as root, and issue a reboot command

aidas1> telnet nnaida2
Trying 10.1.1.2...
Connected to nnaida2.
Escape character is '^]'.

Linux 2.6.31 (localhost) (08:23 on Thursday, 01 November 2018)

login: root
Password: 
Last login: Mon May 23 16:02:30 from myserver
-bash-3.2# ls
a.out  ld_aidamem.csh  xaida
-bash-3.2# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Oct28 ?        00:00:07 init [3]  
root         2     0  0 Oct28 ?        00:00:00 [kthreadd]
root         3     2  0 Oct28 ?        00:00:00 [ksoftirqd/0]
root         4     2  0 Oct28 ?        00:00:00 [watchdog/0]
root         5     2  0 Oct28 ?        00:00:00 [events/0]
root         6     2  0 Oct28 ?        00:00:00 [khelper]
root         7     2  0 Oct28 ?        00:00:00 [async/mgr]
root         8     2  0 Oct28 ?        00:00:00 [kblockd/0]
root         9     2  0 Oct28 ?        00:00:00 [kseriod]
root        10     2  0 Oct28 ?        00:00:00 [khungtaskd]
root        11     2  0 Oct28 ?        00:00:00 [pdflush]
root        12     2  0 Oct28 ?        00:00:00 [pdflush]
root        13     2  0 Oct28 ?        00:00:00 [kswapd0]
root        14     2  0 Oct28 ?        00:00:00 [aio/0]
root        15     2  0 Oct28 ?        00:00:00 [nfsiod]
root        20     2  0 Oct28 ?        00:00:00 [81400400.hd-xps]
root        21     2  0 Oct28 ?        00:00:00 [81400000.xps-sp]
root        22     2  0 Oct28 ?        00:00:00 [kpsmoused]
root        25     2  0 Oct28 ?        00:00:00 [rpciod/0]
root        54     1  0 Oct28 ?        00:00:00 /sbin/udevd -d
root       226     1  0 Oct28 ?        00:00:00 syslogd -m 0
root       229     1  0 Oct28 ?        00:00:00 klogd -x
root       259     1  0 Oct28 ?        00:00:00 rpcbind
root       275     1  0 Oct28 ?        00:00:00 xinetd -stayalive -pidfile /var/
root       377     1 98 Oct28 ?        3-22:18:16 ./AidaExecV8
root       392     1  0 Oct28 ttyS0    00:00:00 /sbin/mingetty --noclear console
root       404   275  0 08:23 ?        00:00:00 in.telnetd: myserver
root       405   404  0 08:23 ?        00:00:00 login -- root            
root       406   405  1 08:23 ttyp0    00:00:00 -bash
root       427   406  6 08:23 ttyp0    00:00:00 ps -ef
-bash-3.2# reboot

Broadcast message from root (ttyp0) (Thu Nov  1 08:25:20 2018):

The system is going down for reboot NOW!
-bash-3.2# Connection closed by foreign host.

Wait 5 minutes for the filesystem to be mounted and the FEE64 boot sequence
to complete. 
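
While waiting, it can be useful to see when the rebooted FEE64 is back on the
network. The sketch below polls with ping until the host answers or a timeout
expires. The function name wait_for_host, the PING_CMD variable and the default
values (300 s timeout, 10 s between attempts) are assumptions for illustration.
A ping reply only shows that the network interface is up; it does not show that
the boot sequence has completed, so the 5 minute wait still applies.

```shell
# Hypothetical helper: poll a host with ping until it answers or a timeout
# is reached. PING_CMD can be overridden (e.g. for testing).
PING_CMD=${PING_CMD:-ping}

wait_for_host() {
    host=$1
    timeout_s=${2:-300}     # give up after this many seconds of waiting
    interval_s=${3:-10}     # pause between ping attempts
    waited=0
    while [ "$waited" -le "$timeout_s" ]; do
        # one packet, 2 s timeout; discard the normal ping output
        if $PING_CMD -c 1 -W 2 "$host" > /dev/null 2>&1; then
            echo "$host answered after about ${waited}s"
            return 0
        fi
        sleep "$interval_s"
        waited=$((waited + interval_s))
    done
    echo "$host did not answer within ${timeout_s}s" >&2
    return 1
}

# Usage:
#   wait_for_host nnaida2
```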

Switch to Desktop 1, re-select the 'Data Acquisition Run Control' tab and select
Update. The status of the FEE64 you were unable to connect to should now be 'undefined'.

Follow cold start sequence steps 7-10 and 15

 https://elog.ph.ed.ac.uk/AIDA/418