Friday, April 19, 2024

Exadata: Usage of scripts, with locations and descriptions

Below are Exadata utilities with their locations and usage, covering the storage servers (cells), compute nodes (database servers), InfiniBand fabric and switches, networking, hardware details, image information, and log file locations.


Utility path | Usage/Comments

InfiniBand
Some of these tools can be found in /opt/oracle.SupportTools/ibdiagtools on cells and database servers. Also see the InfiniBand Triage wiki page.
/opt/oracle.SupportTools/ibdiagtools/infinicheck
/opt/oracle.SupportTools/ibdiagtools/verify-topology
ibqueryerrors
/usr/bin/ibdiagnet | Detecting fabric issues
/usr/sbin/ibaddr | Examining HCA state and GUIDs
/usr/sbin/ibcheckerrors | Detecting fabric issues
/usr/sbin/ibcheckerrs | Detecting fabric issues
/usr/sbin/ibcheckstate | Detecting fabric issues
/usr/sbin/ibcheckwidth | Detecting fabric issues
/usr/sbin/ibclearcounters | Resetting counters when detecting fabric issues
/usr/sbin/ibclearerrors | Resetting counters when detecting fabric issues
/usr/sbin/ibdatacounters | Not directly used; perfquery is used instead
/usr/sbin/ibdatacounts | Not directly used; perfquery is used instead
/usr/sbin/ibhosts | Listing cells and DB nodes
/usr/sbin/iblinkinfo.pl | Obtaining the fabric topology
/usr/sbin/ibnetdiscover | Obtaining the fabric topology
/usr/sbin/ibnodes | Listing cells, DB nodes, and switches
/usr/sbin/ibping | Checking IB-level connectivity
/usr/sbin/ibportstate | Testing port failure and disabling bad links
/usr/sbin/ibqueryerrors.pl | Detecting fabric issues
/usr/sbin/ibstat | Examining HCA state and GUIDs
/usr/sbin/ibstatus | Examining HCA state and GUIDs
/usr/sbin/ibswitches | Listing IB switch names
/usr/sbin/ibtracert | Examining IB routes
/usr/sbin/perfquery | Computing throughput and detecting fabric errors
/usr/sbin/saquery | Not directly used
/usr/sbin/set_nodedesc.sh | Setting the HCA node description based on node type
/usr/sbin/sminfo | Determining the location of the master SM
/usr/sbin/smpdump | Not directly used
/usr/sbin/smpquery | Not directly used
/usr/sbin/vendstat | Not directly used
/usr/bin/ibv_devices | Listing local HCAs
/usr/bin/ibv_devinfo | Listing details of local HCAs
/usr/bin/ibv_rc_pingpong | Determining whether the HCA is working
/usr/bin/ibv_srq_pingpong | Determining whether the HCA is working
/usr/bin/ibv_uc_pingpong | Determining whether the HCA is working
/usr/bin/ibv_ud_pingpong | Determining whether the HCA is working
/usr/bin/mstflint | Burning new HCA firmware or obtaining the current firmware version
/usr/bin/ib_rdma_bw | Computing IB-level statistics for troubleshooting
/usr/bin/ib_rdma_lat | Computing IB-level statistics for troubleshooting
/usr/bin/ib_read_bw | Computing IB-level statistics for troubleshooting
/usr/bin/ib_read_lat | Computing IB-level statistics for troubleshooting
/usr/bin/ib_send_bw | Computing IB-level statistics for troubleshooting
/usr/bin/ib_send_lat | Computing IB-level statistics for troubleshooting
/usr/bin/ib_write_bw | Computing IB-level statistics for troubleshooting
/usr/bin/ib_write_lat | Computing IB-level statistics for troubleshooting
/usr/bin/qperf | Computing throughput for the RDS, TCP, and SDP protocols
/sbin/ifconfig | Determining the configuration and status of network interfaces
/usr/bin/ib-bond | Determining the active slave interface for bond0
/usr/bin/rds-gen | Not directly used
/usr/bin/rds-info | Examining RDS state
/usr/bin/rds-ping | Determining RDS connectivity
/usr/bin/rds-sink | Not directly used
/usr/bin/rds-stress | Profiling RDS performance
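
The ibstat row above is used to examine HCA state. As a minimal sketch, assuming Python 3.7+ and the usual infiniband-diags ibstat layout (a CA '<name>' line, then Port N: blocks containing State: and Physical state: lines), the following flags any local HCA port that is not Active and LinkUp. The function names are hypothetical helpers, not part of any Oracle tool.

```python
import re
import subprocess
from typing import Dict, List, Optional, Tuple

PortKey = Tuple[str, int]  # (HCA name, port number)


def parse_ibstat(text: str) -> Dict[PortKey, Dict[str, str]]:
    """Collect the State and Physical state lines for every HCA port.

    Assumption: ibstat prints "CA '<name>'", then "Port N:" blocks.
    """
    ports: Dict[PortKey, Dict[str, str]] = {}
    ca: Optional[str] = None
    key: Optional[PortKey] = None
    for raw in text.splitlines():
        line = raw.strip()
        ca_match = re.match(r"^CA '([^']+)'$", line)
        if ca_match:
            ca, key = ca_match.group(1), None  # new HCA, no port selected yet
            continue
        port_match = re.match(r"^Port (\d+):$", line)
        if port_match and ca is not None:
            key = (ca, int(port_match.group(1)))
            ports[key] = {}
            continue
        if key is not None:
            name, sep, value = line.partition(":")
            if sep and name in ("State", "Physical state"):
                ports[key][name] = value.strip()
    return ports


def unhealthy_ports(ports: Dict[PortKey, Dict[str, str]]) -> List[PortKey]:
    """Ports that are not both State=Active and Physical state=LinkUp."""
    return sorted(
        key for key, fields in ports.items()
        if fields.get("State") != "Active"
        or fields.get("Physical state") != "LinkUp"
    )


def check_local_hcas() -> List[PortKey]:
    """Run ibstat on this node and return the suspect ports."""
    out = subprocess.run(["ibstat"], capture_output=True, text=True,
                         check=True).stdout
    return unhealthy_ports(parse_ibstat(out))
```

On a cell or database server, check_local_hcas() returns an empty list when every port reports Active and LinkUp; any port it returns is a candidate for the fabric checks listed above.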

Imaging and versions
These tools report imaging status and information, as well as installed versions.
imagehistory
imageinfo | On database servers, available only in version 11.2.1.3 and later
/opt/oracle.cellos/CheckHWnFWProfile | Only applicable on cells. With the -d option, it displays the versions found; without options, it reports any mismatches against known correct values.
/opt/oracle.SupportTools/CheckSWProfile.sh | Only applicable on cells. Without options, it displays any mismatches against known good configurations.
collectlogs.sh | Collects logs from OneCommand deployments
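
The imageinfo note above depends on comparing image versions. Dotted versions only compare correctly component by component as integers, not as strings. A minimal Python sketch follows; at_least and image_version are hypothetical helpers, and the regular expression assumes imageinfo prints a line containing "image version:", which is an assumption rather than a documented format.

```python
import re
from itertools import zip_longest
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """"11.2.3.2.1.130302" -> (11, 2, 3, 2, 1, 130302)."""
    return tuple(int(part) for part in version.split("."))


def at_least(version: str, minimum: str = "11.2.1.3") -> bool:
    """True if version >= minimum, comparing each component numerically.

    The shorter version is padded with zeros, so "11.2.1.3" equals "11.2.1.3.0".
    """
    for have, need in zip_longest(version_tuple(version),
                                  version_tuple(minimum), fillvalue=0):
        if have != need:
            return have > need
    return True


def image_version(imageinfo_output: str) -> Optional[str]:
    """Pull the version out of imageinfo output.

    Assumption: a line such as "Image version: ..." or
    "Active image version: ..." is present.
    """
    m = re.search(r"image version:\s*([\d.]+)", imageinfo_output, re.IGNORECASE)
    return m.group(1) if m else None
```

For example, at_least("11.2.3.2.1") is True and at_least("11.2.0.3") is False.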

Networking
cat /proc/net/bonding/bond*
cat /sys/class/net/eth?/operstate
cat /sys/class/net/bond*/operstate
ifconfig
ethtool <interface_name> | Reports information about the interface, such as its link mode capabilities
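
The bonding and operstate files listed above are plain text, so they are easy to check programmatically. A minimal sketch, assuming Python 3.7+ and the standard Linux bonding driver layout of /proc/net/bonding/<bond> (a Currently Active Slave line, then one Slave Interface block per slave with its own MII Status line). It reports the active slave and any slave whose MII status is not up. BondStatus, the helper names, and the default bond0 argument are illustrative.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class BondStatus:
    active: Optional[str] = None      # "Currently Active Slave"
    mii: Optional[str] = None         # MII Status of the bond itself
    slaves: Dict[str, Optional[str]] = field(default_factory=dict)  # slave -> MII Status
    operstate: Optional[str] = None   # from /sys/class/net/<bond>/operstate


def parse_bonding(text: str) -> BondStatus:
    """Parse the contents of /proc/net/bonding/<bond>."""
    status = BondStatus()
    current: Optional[str] = None  # slave block currently being read
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines separate the blocks
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            status.active = value
        elif key == "Slave Interface":
            current = value
            status.slaves[current] = None
        elif key == "MII Status":
            if current is None:
                status.mii = value  # before any slave block: the bond itself
            else:
                status.slaves[current] = value
    return status


def down_slaves(status: BondStatus) -> List[str]:
    """Slaves whose MII status is anything other than "up"."""
    return sorted(name for name, mii in status.slaves.items() if mii != "up")


def read_bond(bond: str = "bond0") -> BondStatus:
    """Read the live bonding status plus the bond's operstate."""
    status = parse_bonding(Path("/proc/net/bonding", bond).read_text())
    status.operstate = Path("/sys/class/net", bond, "operstate").read_text().strip()
    return status
```

On a node, down_slaves(read_bond()) returning an empty list means every slave of bond0 reports an MII status of up.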

Log files on both database servers and cells
/var/log/messages | Older versions of this file are automatically renamed messages.<number>, where 1 is the most recent.
dmesg | Command that displays the kernel message log
/var/log/cellos/validations.log
/var/log/cellos/validations/*log
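
Because rotated copies are numbered with 1 as the most recent, a plain alphabetical sort puts messages.10 before messages.2. A small sketch that lists the files newest first by sorting on the numeric suffix. It assumes uncompressed messages.<number> names, as described above, and ignores anything else (for example date-suffixed or .gz files). The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List

ROTATED = re.compile(r"messages\.(\d+)")


def messages_newest_first(names: Iterable[str]) -> List[str]:
    """Order messages, messages.1, messages.2, ... from newest to oldest."""
    ranked = []
    for name in names:
        if name == "messages":
            ranked.append((0, name))  # the live file is the newest
        else:
            match = ROTATED.fullmatch(name)
            if match:
                ranked.append((int(match.group(1)), name))
    return [name for _, name in sorted(ranked)]


def list_messages_logs(log_dir: str = "/var/log") -> List[Path]:
    """Full paths of the messages files in log_dir, newest first."""
    directory = Path(log_dir)
    names = (p.name for p in directory.iterdir())
    return [directory / name for name in messages_newest_first(names)]
```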

Log files on cells
$ADR_BASE/diag/asm/cell/<hostname>/trace/alert.log | The cell's alert log. The cell's trace files are in the same directory as alert.log.

Log files on database servers
$ORACLE_BASE/diag/asm/+asm/<instname>/trace/alert_<instname>.log | ASM alert log file
$ORACLE_BASE/diag/rdbms/<dbname>/<instname>/trace/alert_<instname>.log | Database alert log; there is one for each running database, and more than one database may be running
/u01/app/11.2.0/grid/log/<hostname>/alert<hostname>.log | Grid Infrastructure alert log file. This log is relatively high-level and will often lead you to one of the logs in the next entry.
/u01/app/11.2.0/grid/log/<hostname>/{cssd,crsd,diskmon}/*.log | Log files for the CSSD, CRSD, and diskmon processes. These processes are the most likely to have issues and will reveal most problems.
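
Since each database instance has its own alert log, it can help to enumerate them from the directory layout above rather than typing each path. A minimal sketch using glob. It assumes the $ORACLE_BASE/diag layout shown in the table and keeps only files named alert_<instname>.log inside the matching <instname> directory. The function names are hypothetical.

```python
import glob
import os
from typing import List


def _alert_logs(parent_pattern: str) -> List[str]:
    """Find <parent>/<instname>/trace/alert_<instname>.log under a glob pattern."""
    found = []
    for path in glob.glob(os.path.join(parent_pattern, "*", "trace", "alert_*.log")):
        instname = path.split(os.sep)[-3]  # .../<instname>/trace/alert_<instname>.log
        if os.path.basename(path) == f"alert_{instname}.log":
            found.append(path)
    return sorted(found)


def db_alert_logs(oracle_base: str) -> List[str]:
    """Alert logs for every database instance under diag/rdbms/<dbname>/<instname>."""
    return _alert_logs(os.path.join(oracle_base, "diag", "rdbms", "*"))


def asm_alert_logs(oracle_base: str) -> List[str]:
    """ASM alert logs under diag/asm/+asm/<instname>."""
    return _alert_logs(os.path.join(oracle_base, "diag", "asm", "+asm"))
```

For example, db_alert_logs(os.environ["ORACLE_BASE"]) lists one alert log per instance found on the node.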

InfiniBand switches
sminfo | Shows the switch currently acting as master subnet manager in the fabric; there should be exactly one, regardless of how many switches are present in the fabric
ibswitches | Lists all IB switches in the fabric
showunhealthy | Shows any unhealthy sensors
env_test | Lists all the data from the switch's environmental sensors
nm2version | Shows the version the switch is currently running
getfanspeed | Shows the speed of the switch's internal fans; useful if showunhealthy indicates a problem with one of the fans
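
sminfo reports the master subnet manager by LID, and ibswitches lists each switch with its LID, so the two outputs can be joined to name the switch running the master SM. A sketch, assuming Python 3.7+ and the infiniband-diags output formats shown in the code comments (these formats are assumptions and can vary between versions). The helper names are hypothetical.

```python
import re
import subprocess
from typing import Dict, Optional

# Assumed sminfo output (one line):
#   sminfo: sm lid 3 sm guid 0x21283a8516a0a0, activity count 120 priority 5 state 3 SMINFO_MASTER
SMINFO_RE = re.compile(
    r"sm lid (\d+) sm guid (0x[0-9a-fA-F]+), activity count \d+ "
    r"priority (\d+) state \d+ (\w+)"
)
# Assumed ibswitches output (one line per switch):
#   Switch : 0x0021283a8516a0a0 ports 36 "<description>" enhanced port 0 lid 3 lmc 0
SWITCH_RE = re.compile(
    r'^Switch\s*:\s*(0x[0-9a-fA-F]+)\s+ports\s+\d+\s+"([^"]*)".*?\blid\s+(\d+)',
    re.MULTILINE,
)


def parse_sminfo(text: str) -> Optional[Dict[str, object]]:
    """Extract LID, GUID, priority, and state name from sminfo output."""
    m = SMINFO_RE.search(text)
    if m is None:
        return None
    return {"lid": int(m.group(1)), "guid": m.group(2),
            "priority": int(m.group(3)), "state": m.group(4)}


def switches_by_lid(text: str) -> Dict[int, str]:
    """Map each switch's LID to its node description."""
    return {int(m.group(3)): m.group(2) for m in SWITCH_RE.finditer(text)}


def master_switch(sminfo_text: str, ibswitches_text: str) -> Optional[str]:
    """Description of the switch whose LID matches the master SM, if any."""
    sm = parse_sminfo(sminfo_text)
    if sm is None or sm["state"] != "SMINFO_MASTER":
        return None
    return switches_by_lid(ibswitches_text).get(sm["lid"])


def _run(cmd):
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def find_master_switch() -> Optional[str]:
    """Run sminfo and ibswitches on this node and join their outputs."""
    return master_switch(_run(["sminfo"]), _run(["ibswitches"]))
```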

Cell software commands (cellcli and friends)
The list and alter commands below are run from within cellcli; the remaining commands are run from the operating system shell.
list cell detail
list alerthistory
list celldisk detail
list griddisk detail
list lun detail
list physicaldisk detail
list flashcache detail
list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome
alter cell validate configuration
adrci | Run show incident at the adrci prompt
mdadm --misc --detail /dev/md* | Gives an overview of the state of the RAID devices on the storage cell
cat /proc/mdstat | Shows the status of the md devices
/usr/local/bin/ipconf -verify
mdadm -Q --detail /dev/md? | Shows state information for a particular metadevice
<GRID_HOME>/bin/kfod disks=all | Lists disks available from the DB node for ASM use (run on the DB node)
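
A common use of the list griddisk attributes command above is to confirm that every grid disk reports asmdeactivationoutcome as Yes before a cell is taken offline. A minimal sketch, assuming Python 3.7+ and that cellcli prints one whitespace-separated row per grid disk in the requested attribute order. The last column is kept whole because non-Yes outcomes can be multi-word messages. The function names are hypothetical.

```python
import subprocess
from typing import Dict, List

FIELDS = ["name", "status", "asmmodestatus", "asmdeactivationoutcome"]


def parse_griddisks(text: str) -> List[Dict[str, str]]:
    """One dict per grid disk row; the final column may contain spaces."""
    rows = []
    for line in text.splitlines():
        cols = line.split(None, len(FIELDS) - 1)  # split into at most 4 columns
        if len(cols) == len(FIELDS):
            rows.append(dict(zip(FIELDS, cols)))
    return rows


def not_safe_to_deactivate(rows: List[Dict[str, str]]) -> List[str]:
    """Grid disks whose asmdeactivationoutcome is anything other than Yes."""
    return [r["name"] for r in rows if r["asmdeactivationoutcome"] != "Yes"]


def list_griddisks() -> List[Dict[str, str]]:
    """Run cellcli on a storage cell and parse the result."""
    cmd = ["cellcli", "-e", "list griddisk attributes " + ",".join(FIELDS)]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_griddisks(out)
```

On a cell, not_safe_to_deactivate(list_griddisks()) returning an empty list means every grid disk reported Yes.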

Hardware
These commands query hardware status. Unless otherwise noted, they apply to both cells and database servers.
ipmitool sel list | Lists the system event log; it sometimes shows hardware events that aren't seen elsewhere
ipmitool sunoem cli 'show /SYS' | Shows the system serial number and fault_state (the overall fault state; this is not necessarily a rollup, so a fault may exist at the component level)
/opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -a0 | Displays all adapter information
/opt/MegaRAID/MegaCli/MegaCli64 -FwTermLog -dsply -a0 | Displays the controller's log
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -a0 | Gets battery status
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuProperties -a0 | Gets battery properties
/opt/MegaRAID/MegaCli/MegaCli64 -LDinfo -Lall -aALL | Look for WriteThrough in the Current Cache Policy; if write-back caching is disabled, performance may be affected. This information is easier to get with cellcli -e list lun attributes name,lunWriteCacheMode,status
/opt/MegaRAID/MegaCli/MegaCli64 -LDPdInfo -aAll | Helpful when investigating a predictive failure
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0 | The Inquiry Data contains the drive firmware, but decoding the string to get the firmware version requires special instructions beyond the scope of this page. Instead, check list physicaldisk attributes physicalFirmware in cellcli for the drive firmware version.
lspci [-v [-v [-v]]] | Lists PCI devices. Each additional -v adds more detail.
lsscsi | Especially helpful on cells. Flash cards show up as MARVELL devices, and there should be 16 flash devices listed. If not, a card is missing or not visible to the OS.
/opt/oracle.cellos/scripts_aura.sh | Lists the flash disks as seen by the cell software
/opt/oracle.SupportTools/sundiag.sh | Gathers many diagnostic command outputs and important log files for analysis of storage cell and disk issues
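
The lsscsi note above gives a concrete check: 16 MARVELL flash devices on a cell. A minimal sketch that counts lsscsi lines containing MARVELL and compares the total with an expected value. The default of 16 comes from the note above and applies to the hardware it describes, so adjust it for other cell models. It assumes Python 3.7+, and the function names are hypothetical.

```python
import subprocess
from typing import Tuple


def count_flash_devices(lsscsi_output: str, marker: str = "MARVELL") -> int:
    """Count lsscsi rows whose vendor/model text contains the marker."""
    return sum(1 for line in lsscsi_output.splitlines() if marker in line)


def check_flash(expected: int = 16) -> Tuple[bool, int]:
    """Run lsscsi and return (count matches expected, count found)."""
    out = subprocess.run(["lsscsi"], capture_output=True, text=True,
                         check=True).stdout
    found = count_flash_devices(out)
    return found == expected, found
```

If check_flash() returns (False, n), compare n with the expected count and look for a missing card or one not visible to the OS.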