by steve-myers » Fri Jul 20, 2012 5:26 pm
Here is a terminology issue. "... is not taking CPU" is usually interpreted as not using any CPU time. I had to read through most of the post to realize what was really meant was, " ... not getting any CPU time." The trouble is this may not be the reality, either. All we know for sure here is something is happening that causes the job to take much longer than expected to run.
Many things can cause this to happen, and Mr. Sample's analysis is incomplete.
The very first thing a performance analyst must do is examine the JESMSGLG datasets for a job that ran in the normally expected time and for the job that took longer than expected.
Start with the job start time. Did both jobs start when scheduled? A job that needs about three hours to run and starts an hour later than expected is going to end an hour later than expected. No performance analyst in the world can fix this problem!
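To make the arithmetic concrete, here is a rough Python sketch that separates a late start from a genuinely longer run. The function name and the times are made up for illustration. It assumes you copy the scheduled start, actual start and end times by hand from your scheduler and the job log, and that the job does not run past midnight.

```python
from datetime import datetime

def split_delay(scheduled, started, ended, normal_minutes):
    """Return (start delay, extra run time), both in minutes.

    Times are "HH:MM" strings typed in by hand; all on the same day.
    """
    fmt = "%H:%M"
    sched = datetime.strptime(scheduled, fmt)
    start = datetime.strptime(started, fmt)
    end = datetime.strptime(ended, fmt)
    start_delay = (start - sched).total_seconds() / 60
    run_minutes = (end - start).total_seconds() / 60
    return start_delay, run_minutes - normal_minutes

# Hypothetical three-hour job scheduled for 01:00 that started at 02:00.
start_delay, extra_run = split_delay("01:00", "02:00", "05:00", 180)
print(f"started {start_delay:.0f} min late, ran {extra_run:.0f} min longer than normal")
```

If the whole delay shows up in the first number and the second is close to zero, the job itself ran normally and the question belongs to scheduling, not performance.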
Next, see if there is an environmental issue. Did the job have to wait for datasets? This can be quickly determined from the JESMSGLG datasets. Does the job require manually mounted tape volumes? If it does, were the tapes mounted promptly? For that matter, were there any other environmental issues, such as an allocation recovery?
Next, review the step times, both the elapsed time and the CPU time. A step that uses much more CPU time than it did in the normal run requires analysis. If there was more real work for the step to do, that also requires analysis, but it is beyond anything we can do here.
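The step comparison can be sketched the same way. This is only an illustration: the step names, the numbers and the 1.5x threshold are all invented, and the (elapsed minutes, CPU minutes) pairs are assumed to be copied by hand from the step statistics of the normal run and the slow run.

```python
def flag_steps(baseline, slow, ratio=1.5):
    """Compare per-step (elapsed, cpu) minutes between two runs.

    Returns a list of (step name, finding) for steps that stand out.
    """
    findings = []
    for step, (base_elapsed, base_cpu) in baseline.items():
        if step not in slow:
            continue  # step did not run in the slow job; check that separately
        slow_elapsed, slow_cpu = slow[step]
        if slow_cpu > ratio * base_cpu:
            findings.append((step, "more CPU used"))
        elif slow_elapsed > ratio * base_elapsed:
            findings.append((step, "more elapsed, similar CPU"))
    return findings

# Hypothetical figures: step name -> (elapsed minutes, CPU minutes)
normal_run = {"STEP010": (20.0, 5.0), "STEP020": (160.0, 40.0)}
slow_run = {"STEP010": (21.0, 5.1), "STEP020": (300.0, 41.0)}

for step, finding in flag_steps(normal_run, slow_run):
    print(step, finding)
```

A step flagged "more CPU used" is the case described above. A step flagged "more elapsed, similar CPU" did about the same work but spent longer waiting for something, which points back at the environmental questions.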
You must do all of these checks yourself, and you should do them before you take the issue to the performance group.