Some general troubleshooting notes for when you get this far into these scripts.
This log can be useful:
/var/log/cs_data_handler_is.bash.log
After a lot of stopping and starting the services (CPotlpagentCli.sh) during testing, or after accidentally creating an infinite loop in your custom script, all of the Skyline services can end up hung. They will not gather any data, even after a reboot, and this will be in the log file:
Unable to acquire script lock: /tmp/cs_data_handler_is.bash.lock
Just delete that lock directory and restart the services again.
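If this keeps happening while testing, a small guard can clear the lock only when it looks abandoned. This is a minimal sketch, assuming the lock is the plain directory named in the error (which is what deleting the directory suggests) and that a healthy run never holds it longer than a chosen number of minutes. The function name clear_stale_lock and the 30-minute default are hypothetical and not part of Skyline. It only removes the lock, so restart the services afterwards the same way as before.

```shell
# Hypothetical helper (not part of Skyline): remove a lock directory left behind
# by a hung or killed run, but only if it is older than max_age_min minutes.
# Assumption: the lock is a plain directory, as the error and the fix above suggest.
clear_stale_lock() {
    local lock_dir="$1"
    local max_age_min="${2:-30}"   # assumed upper bound for a healthy run

    # Nothing to clean up if the lock is not there.
    if [ ! -d "$lock_dir" ]; then
        echo "no lock at $lock_dir"
        return 0
    fi

    # With -maxdepth 0, find tests only the lock itself and prints its path
    # if it was last modified more than max_age_min minutes ago.
    if [ -n "$(find "$lock_dir" -maxdepth 0 -mmin +"$max_age_min")" ]; then
        rm -rf -- "$lock_dir"
        echo "removed stale lock $lock_dir"
        return 0
    fi

    echo "lock $lock_dir is recent; leaving it in place"
    return 1
}

# Example, using the lock path from the log message; restart the services afterwards:
# clear_stale_lock /tmp/cs_data_handler_is.bash.lock 30
```

The age check is there so the helper does not pull the lock out from under a run that is legitimately in progress; pick a threshold comfortably above your longest normal collection.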
Log rotation doesn't seem to be working on my MLMs for /opt/CPotlpAgent/otlp_agent.log. It hit 200MB during this troubleshooting and just stopped writing new entries to the file. Other MLMs that I haven't touched are also at 200MB and haven't written anything new to the file in months, so that appears to be a hard limit. If you are not getting logs in that file, check its size.
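A quick way to check is to compare the file's size against the point where writes stopped. This is a sketch only: check_log_size is a hypothetical helper, and the 200 MB default (treated here as 200 MiB) is simply the size observed above, not a documented limit.

```shell
# Hypothetical helper: report whether a log file has reached the size at which
# the agent appeared to stop writing to it (assumed 200 MiB, per the note above).
# Returns 0 if below the limit, 1 if at or above it, 2 if the file is missing.
check_log_size() {
    local log_file="$1"
    local limit_mb="${2:-200}"
    local limit_bytes=$(( limit_mb * 1024 * 1024 ))
    local size_bytes

    if [ ! -f "$log_file" ]; then
        echo "missing: $log_file"
        return 2
    fi

    # wc -c is portable; arithmetic expansion drops any padding wc adds.
    size_bytes=$(( $(wc -c < "$log_file") ))

    if [ "$size_bytes" -ge "$limit_bytes" ]; then
        echo "$log_file is $size_bytes bytes (>= ${limit_mb}MB); writes may have stopped"
        return 1
    fi

    echo "$log_file is $size_bytes bytes"
    return 0
}

# Example:
# check_log_size /opt/CPotlpAgent/otlp_agent.log 200
```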
I think the solution to my problem is to load the Check Point MDS profile in the Skyline scripts/processors:
/opt/CPmds-R81.20/scripts/MDSprofile.sh
I've added that line to my custom script and to "/opt/CPotlpAgent/cs_data_handler_is.bash" using a couple of different syntaxes, and it can never seem to find mdsenv to run.
Directly running a new instance of bash, as in the script above, works at the CLI. I think that's because it picks up the profile of the logged-in user, and maybe it inherits that somehow when it is added to Skyline as the logged-in user. Once the Skyline services are restarted "normally", though, it fails, and if you dig into the log files you get these super easy-to-read error messages. Somebody got a little overzealous with the "remove whitespace" function....
"ts=2025-05-06T15:54:07.363Z caller=level.go:63 ts=2025-05-06T15:54:07.363Z caller=level.go:63 level=info msg="Collector: /home/admin/mlm_total_logginghas disabled due to: " Script:/var/log/CPotlpAgent/backup/scripts/mlm_total_logging.shchangethestatetodisableddueto:TheCommand:/bin/bash,Error:Error:exitstatus1,Stderr:bash:cannotsetterminalprocessgroup(139635):Inappropriateioctlfordevicebash:nojobcontrolinthisshellbash:mdsenv:commandnotfounderror:syntaxerror,unexpectedLITERAL,expecting'}'01compileerror;terminated=(MISSING)"
It ran like this for several hours after I restarted it per the SK for custom metrics, but after a reboot I started getting this message and it stopped working.
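For what it's worth, the "cannot set terminal process group" and "no job control in this shell" lines are what bash prints when it is started as an interactive shell (for example with -i) without a terminal, so the collector's bash is not getting a normal login environment. One option is to load the profile explicitly inside the script and fail with a readable message if mdsenv is still missing. This is a minimal sketch, not a verified fix: load_profile is a hypothetical helper, and it assumes that sourcing MDSprofile.sh is what defines mdsenv (as a function or an alias) on your version. If mdsenv turns out to be an alias, non-interactive bash does not expand aliases unless expand_aliases is set, which the sketch turns on.

```shell
# Non-interactive bash ignores aliases unless expand_aliases is set before they
# are defined and before the line that uses them is read. Guarded so the snippet
# is harmless under shells other than bash.
if [ -n "$BASH_VERSION" ]; then
    shopt -s expand_aliases
fi

# Hypothetical helper: source a profile script and confirm it defined a command,
# instead of failing later with "command not found".
load_profile() {
    local profile="$1"
    local required_cmd="$2"

    if [ ! -r "$profile" ]; then
        echo "cannot read $profile" >&2
        return 1
    fi

    # shellcheck disable=SC1090
    . "$profile"

    if ! type "$required_cmd" >/dev/null 2>&1; then
        echo "$required_cmd is not defined after sourcing $profile" >&2
        return 1
    fi
    return 0
}

# Example near the top of the custom collector script (path from above; adjust
# to your version), before any line that calls mdsenv:
# load_profile /opt/CPmds-R81.20/scripts/MDSprofile.sh mdsenv || exit 1
# mdsenv
```

If this still reports mdsenv as undefined, the profile probably depends on something only present in a login session, which would be worth including in the TAC case.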
So at this point I just need to take most of this thread and put it in an actual TAC case.