HPC Frequently Asked Questions

R&D Department (June 8, 2011)

My job is rejected at submission

  • Check the resource requirement string and the run time limit
  • The job was submitted to a queue or host that you are not authorized to use
  • The requested soft limits exceed the queue's hard limits
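
As a minimal sketch of the run time limit check, the helpers below compare a requested run limit with a queue's hard limit before submission. Both function names are hypothetical. The sketch assumes the limit is written in the "[hour:]minute" form accepted by bsub -W, and that you have read the queue's hard limit yourself, for example from the output of bqueues -l <queue>.

```shell
#!/bin/sh
# Hypothetical helpers, not part of LSF.

# Convert a run limit written as "[hour:]minute" into minutes.
runlimit_minutes() {
    printf '%s\n' "$1" |
        awk -F: '{ if (NF == 2) print $1 * 60 + $2; else print $1 + 0 }'
}

# Succeed (return 0) if the requested limit does not exceed the hard limit.
# $1 = requested run limit, $2 = queue hard limit, both as "[hour:]minute".
fits_queue_limit() {
    requested=$(runlimit_minutes "$1")
    hard=$(runlimit_minutes "$2")
    [ "$requested" -le "$hard" ]
}
```

For example, fits_queue_limit 90 2:00 succeeds because 90 minutes is within a 120-minute limit, while fits_queue_limit 3:00 2:00 fails, which is the kind of request a queue would be expected to reject.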

Why does my job stay pending (PEND)?

  • bhist -l <jobid> shows the job's history
  • bjobs -lp shows the reasons the job is pending
  • bjobs -u all lists the jobs of all users
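
The commands above can be combined into a small wrapper. This is only a sketch: check_pending is a hypothetical helper name, and it assumes the LSF environment has already been loaded so that bjobs and bhist are on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: collect the usual diagnostics for one pending job.
check_pending() {
    jobid=$1
    if [ -z "$jobid" ]; then
        echo "usage: check_pending <jobid>" >&2
        return 2
    fi
    if ! command -v bjobs >/dev/null 2>&1; then
        echo "bjobs not found: load the LSF environment first" >&2
        return 127
    fi
    bjobs -lp "$jobid"   # long listing with the pending reasons
    bhist -l "$jobid"    # history of the job
}
```

Run it as check_pending <jobid>, replacing <jobid> with the ID that bsub reported at submission.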

Possible causes include:

  • Has the user requested unrealistic resources, for example more memory than any host has?
  • The resource requirements may be too stringent
  • Is the user's ID valid on the execution host(s)?
  • The user may have requested exclusive execution
  • If FCFS (first-come, first-served) scheduling is used, the user's job may be last in the queue
  • If fairshare scheduling is used, the user may have exhausted their fairshare allocation

Why did my job exit abnormally?

  • bjobs -l shows the exit code; common values are:
  • 126 – Command invoked cannot execute
  • 127 – Command not found
  • 130 – Script terminated by Control-C
  • If the job was submitted with the "-o" option, check the output file for the reason

Other possible causes include:

  • Check resource limits on queues
  • Check that the application and its data files are accessible from the execution host(s)
  • Is an application license available from the execution host?
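
The exit codes listed in this section can be decoded with a small helper. This is a sketch that assumes the common POSIX shell convention: 126 for a command that cannot be executed, 127 for a command that was not found, and values above 128 meaning 128 plus the number of the signal that terminated the job. The function name describe_exit is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: translate a job exit code into a short description.
describe_exit() {
    code=$1
    case $code in
        ''|*[!0-9]*)
            echo "not a numeric exit code" >&2
            return 2
            ;;
        0)   echo "success" ;;
        126) echo "command invoked cannot execute" ;;
        127) echo "command not found" ;;
        *)
            if [ "$code" -gt 128 ] && [ "$code" -le 255 ]; then
                # 128 + N: the job was terminated by signal N
                echo "terminated by signal $((code - 128))"
            else
                echo "application-defined exit code $code"
            fi
            ;;
    esac
}
```

For example, describe_exit 130 reports signal 2, which is the interrupt signal sent by Control-C. Any other value below 128 is defined by the application itself, so consult its documentation.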