Gathering BaseDB Info for Support Requests

database oci May 14, 2024

When working with BaseDB in OCI, sometimes things go wrong. Having opened multiple Support Requests (SRs), I've noticed a pattern in what Support asks for when troubleshooting. I've documented these asks, and I now make a point of gathering them before I even open an SR. Sometimes I'm in luck: the process reveals the issue and I can skip the SR. Other times I still need Support, but I've given them a head start.

Here are a few things I look at when dealing with BaseDB issues. For more information you can review the dbcli reference docs.

Version

What are the versions of the dcs tools and services? Do they seem dated? Do they match other BaseDB systems I may have? Do they need to be updated?


# Check version with rpm
sudo rpm -qa|egrep 'dcs|exa|dbaas'

# Update
sudo /opt/oracle/dcs/bin/cliadm update-dbcli
sudo /opt/oracle/dcs/bin/cliadm list-jobs | tail
sudo rpm -qa|egrep 'dcs|exa|dbaas'
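
To answer the "do they match my other systems?" question, it can help to save a sorted package list on each host and diff the files. Here is a minimal sketch; `dcs_packages` is a helper name I made up for illustration, and it reads `rpm -qa` output from stdin using the same pattern as the check above.

```shell
# Filter `rpm -qa` output (read from stdin) down to the dcs/exa/dbaas packages
# and sort it, so that the lists from two hosts line up when diffed.
dcs_packages() {
  grep -E 'dcs|exa|dbaas' | sort
}
```

On each system I would run something like `sudo rpm -qa | dcs_packages > /tmp/$(hostname)-dcs.txt`, copy the files to one place, and `diff` them.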

Jobs

Which recent dcs jobs are related to my issue? Did they fail? Is there any useful info in the Verbose output?

sudo -s

/opt/oracle/dcs/bin/dbcli list-jobs
/opt/oracle/dcs/bin/dbcli list-jobs | grep Failure

JOB_ID=f55c65fd-62d4-4900-bd92-d517ecc18981 # ID of DCS Job
/opt/oracle/dcs/bin/dbcli describe-job -i $JOB_ID
/opt/oracle/dcs/bin/dbcli describe-job -i $JOB_ID -l Verbose
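
Copying the job ID by hand gets old quickly. The sketch below pulls the IDs out of the failed lines instead. It assumes only that each job line contains a UUID-style ID and the word Failure, as in the grep above. `failed_job_ids` is a hypothetical helper name, not part of dbcli.

```shell
# Read `dbcli list-jobs` output from stdin and print the UUID found on
# every line that mentions Failure, one ID per line.
failed_job_ids() {
  grep Failure | grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}'
}
```

For example, `JOB_ID=$(/opt/oracle/dcs/bin/dbcli list-jobs | failed_job_ids | tail -n 1)` would pick the last failed job listed, which can then be passed to `describe-job` as before.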

Logs

After reviewing the jobs, what do their logs say? Is there anything in the admin or agent logs? What about the job-specific logs? Are there any ORA- errors? Any RMAN logs?

sudo -s

# Common Logs
cd /opt/oracle/dcs/log
ll ./dcs-admin*.log
ll ./dcs-agent*.log
ll ./dcs-agent-debug*.log
ll ./dcsagent-stderr*.log

# Job Logs
JOB_ID=f55c65fd-62d4-4900-bd92-d517ecc18981 # ID of DCS Job
cd /opt/oracle/dcs/log
ll ./jobs/$JOB_ID.log
grep ORA- ./jobs/$JOB_ID.log # Find ORA errors

# RMAN Logs
cd /opt/oracle/dcs/log
ll -tr ./$(hostname)/rman/bkup/*/
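
When a log is full of ORA- errors, a count per error code is often easier to scan than the raw grep output. This is a small sketch using standard GNU tools; `ora_summary` is a name I chose for illustration.

```shell
# Count each distinct ORA- code across the given log files,
# printing the most frequent code first.
ora_summary() {
  grep -ohE 'ORA-[0-9]+' "$@" | sort | uniq -c | sort -rn
}
```

From `/opt/oracle/dcs/log`, I would call it as `ora_summary ./jobs/$JOB_ID.log`, or pass several log files at once.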

DCS Info Package

This is a little script I use to package up my research in a zip file. It is a handy way to collect all of this information and share it with someone else, like a Support Engineer.

# Run this to find a failed job to investigate
sudo -s
/opt/oracle/dcs/bin/dbcli list-jobs | grep Failure

# Save the Job ID as a variable
JOB_ID=0920dd87-3b70-4c58-8e9a-d87d85231950 # ID of Failed Job

# Run this script to collect DCS Info in a temp directory and zip file
DCS_HOME=/opt/oracle/dcs
INFODIR=$(mktemp -d -t dcsinfo.XXXX)

# DCS tools info
echo "DCS component versions"   >> ${INFODIR?}/dcstools.info
echo "========================" >> ${INFODIR?}/dcstools.info
sudo rpm -qa|egrep 'dcs'        >> ${INFODIR?}/dcstools.info
echo -e >> ${INFODIR?}/dcstools.info

echo "DCS services status"      >> ${INFODIR?}/dcstools.info
echo "========================" >> ${INFODIR?}/dcstools.info
systemctl status initdcsadmin   >> ${INFODIR?}/dcstools.info
systemctl status initdcsagent   >> ${INFODIR?}/dcstools.info

cp ${DCS_HOME?}/log/dcs-admin.* ${INFODIR?}/
cp ${DCS_HOME?}/log/dcs-agent.* ${INFODIR?}/
cp ${DCS_HOME?}/log/dcs-agent-debug.* ${INFODIR?}/

# dbcli jobs info
${DCS_HOME?}/bin/dbcli list-jobs   >> ${INFODIR?}/dbcli-list-jobs.all
${DCS_HOME?}/bin/dbcli list-jobs | grep Failure >> ${INFODIR?}/dbcli-list-jobs.failed

# dbcli describe job
cp ${DCS_HOME?}/log/jobs/${JOB_ID?}.log ${INFODIR?}/
${DCS_HOME?}/bin/dbcli describe-job -l Verbose -i ${JOB_ID?} >> ${INFODIR?}/dbcli-describe-job.${JOB_ID?}
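RMAN_LOG=$(${DCS_HOME?}/bin/dbcli describe-job -l Verbose -i ${JOB_ID?} | grep "Log File" | cut -c12-)
# Skip the copy when the job output has no "Log File" line
if [ -n "${RMAN_LOG}" ]; then
  RMAN_LOG=${RMAN_LOG::-2} # drop the last two characters from the extracted path
  cp ${RMAN_LOG?} ${INFODIR?}/
fi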

# zip up DCS Info
zip -r ${INFODIR?}/dcsinfo.zip ${INFODIR?}

# Complete
echo "DCS Info was collected in ${INFODIR}"
echo -e 
ls -la "${INFODIR}"

Conclusion

Reviewing this information on a BaseDB system that is having issues is a quick way to see if anything obvious jumps out at you. If nothing stands out, you at least have a lot of information to share in your SR to give it a head start. For more troublesome issues, I still tend to see some back and forth during information gathering, but every little bit helps reduce the time from issue to resolution!