Summary of essential SLURM commands and options


Command list


* View nodes/partitions: scontrol show nodes
* Submit a job: sbatch
* Interactive session: srun
* Cancel a job: scancel
* View running jobs: sstat, squeue, or scontrol show job
* View finished jobs: sacct
* List available partitions and nodes: sinfo
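
The commands above can be combined into a short submit-and-monitor session. The following is a minimal sketch, not a site-specific recipe: the script name job.sh, the job name demo and the resource values are assumptions chosen for illustration. The submission step only runs if sbatch is available on the machine.

```shell
# Write a minimal batch script (file name and resource values are illustrative).
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=demo_%j.out
hostname
EOF

if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints only the job ID (optionally followed by ";cluster").
    if job_id=$(sbatch --parsable job.sh); then
        squeue -j "${job_id%%;*}"        # show the job while it is pending/running
        # scancel "${job_id%%;*}"        # uncomment to cancel the job
    fi
else
    echo "sbatch not found: job.sh was written but not submitted"
fi
```

Once the job has finished, sacct -j with the same job ID reports its accounting record.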

 


SLURM options and variables

 

sbatch options (SGE equivalent in parentheses):

--dependency=[state:job_id] #job dependency (-hold_jid)
--nodelist=[nodes] #host preference (-l hostname)
--array=[array_spec] #job arrays (-t)
--begin=[datetime] #begin time (-a)
--exclusive or --oversubscribe (formerly --share) #resource sharing (-l exclusive)
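
As a sketch of how these options fit together on one command line, the snippet below only builds and prints the sbatch invocation instead of running it. The job ID 12345, the node name node01 and the script name job.sh are hypothetical placeholders.

```shell
# Hypothetical values, for illustration only.
first_job=12345      # ID of a job that must finish successfully first
script=job.sh        # batch script to submit

# afterok: start only after job $first_job has completed with exit code 0
# 1-10: ten array tasks; now+1hour: do not start before one hour from now
cmd="sbatch --dependency=afterok:${first_job} --array=1-10 --begin=now+1hour --nodelist=node01 --exclusive ${script}"

# Dry run: print the command rather than executing it.
echo "$cmd"
```

On a real cluster the first job ID would typically be captured from sbatch --parsable rather than hard-coded.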

 

$SLURM_JOBID #jobID ($JOB_ID)
$SLURM_SUBMIT_DIR #submit directory ($SGE_O_WORKDIR)
$SLURM_SUBMIT_HOST #submit host ($SGE_O_HOST)
$SLURM_NODELIST #node list ($PE_HOSTFILE)
$SLURM_ARRAY_TASK_ID #job array index ($SGE_TASK_ID)
$SLURM_NNODES #number of nodes (#SBATCH -N)
$SLURM_NTASKS #number of tasks (#SBATCH -n)
$SLURM_NTASKS_PER_NODE #tasks per node (#SBATCH --ntasks-per-node)
$SLURM_CPUS_PER_TASK #CPUs per task (#SBATCH -c)
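
Inside a job script these variables are read like any other environment variable. The sketch below uses default values so that it also runs outside SLURM; the function name describe_job and the input_N.txt naming scheme are assumptions made for the example.

```shell
# Print the job context; the ${VAR:-default} form keeps this runnable outside SLURM.
describe_job() {
    echo "job id:        ${SLURM_JOBID:-none}"
    echo "submitted from ${SLURM_SUBMIT_HOST:-unknown}:${SLURM_SUBMIT_DIR:-unknown}"
    echo "nodes:         ${SLURM_NODELIST:-none} (count: ${SLURM_NNODES:-0})"
    echo "tasks:         ${SLURM_NTASKS:-1}, cpus per task: ${SLURM_CPUS_PER_TASK:-1}"
    # Hypothetical convention: each array task processes its own input file.
    echo "input file:    input_${SLURM_ARRAY_TASK_ID:-0}.txt"
}

describe_job
```

In a job array submitted with --array, each task sees a different $SLURM_ARRAY_TASK_ID and therefore selects a different input file.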

 


From SGE to SLURM

qdel -> scancel

qstat -> squeue or scontrol show job

qacct -> sacct

qlogin -> srun

qmon -> sview
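
For users with SGE habits, the mapping above could be wrapped in a small lookup helper. This is only a sketch: the function name sge_to_slurm is hypothetical, and it covers only the commands listed in this section.

```shell
# Return the SLURM counterpart of an SGE command (only the pairs listed above).
sge_to_slurm() {
    case "$1" in
        qdel)   echo "scancel" ;;
        qstat)  echo "squeue" ;;
        qacct)  echo "sacct" ;;
        qlogin) echo "srun" ;;
        qmon)   echo "sview" ;;
        *)      echo "no mapping listed for: $1" >&2; return 1 ;;
    esac
}

# Print the whole table.
for cmd in qdel qstat qacct qlogin qmon; do
    printf '%-7s -> %s\n' "$cmd" "$(sge_to_slurm "$cmd")"
done
```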


sacct output format fields (as listed by sacct --helpformat)

Account             AdminComment        AllocCPUS           AllocGRES          
AllocNodes          AllocTRES           AssocID             AveCPU             
AveCPUFreq          AveDiskRead         AveDiskWrite        AvePages           
AveRSS              AveVMSize           BlockID             Cluster            
Comment             Constraints         ConsumedEnergy      ConsumedEnergyRaw  
CPUTime             CPUTimeRAW          DBIndex             DerivedExitCode    
Elapsed             ElapsedRaw          Eligible            End                
ExitCode            Flags               GID                 Group              
JobID               JobIDRaw            JobName             Layout             
MaxDiskRead         MaxDiskReadNode     MaxDiskReadTask     MaxDiskWrite       
MaxDiskWriteNode    MaxDiskWriteTask    MaxPages            MaxPagesNode       
MaxPagesTask        MaxRSS              MaxRSSNode          MaxRSSTask         
MaxVMSize           MaxVMSizeNode       MaxVMSizeTask       McsLabel           
MinCPU              MinCPUNode          MinCPUTask          NCPUS              
NNodes              NodeList            NTasks              Priority           
Partition           QOS                 QOSRAW              Reason             
ReqCPUFreq          ReqCPUFreqMin       ReqCPUFreqMax       ReqCPUFreqGov      
ReqCPUS             ReqGRES             ReqMem              ReqNodes           
ReqTRES             Reservation         ReservationId       Reserved           
ResvCPU             ResvCPURAW          Start               State              
Submit              Suspended           SystemCPU           SystemComment      
Timelimit           TimelimitRaw        TotalCPU            TRESUsageInAve     
TRESUsageInMax      TRESUsageInMaxNode  TRESUsageInMaxTask  TRESUsageInMin     
TRESUsageInMinNode  TRESUsageInMinTask  TRESUsageInTot      TRESUsageOutAve    
TRESUsageOutMax     TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin    
TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot     UID                
User                UserCPU             WCKey               WCKeyID
WorkDir
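
Any of the fields above can be selected with the --format option of sacct. The following sketch picks a handful of them; the job ID 12345 is a hypothetical placeholder, and the query is only executed if sacct is installed.

```shell
# Comma-separated subset of the fields listed above.
fields="JobID,JobName,Partition,State,Elapsed,MaxRSS,ExitCode"
job_id=12345   # hypothetical job ID

if command -v sacct >/dev/null 2>&1; then
    sacct -j "$job_id" --format="$fields" || echo "sacct query failed" >&2
else
    # No SLURM here: show the command that would be run.
    echo "sacct -j $job_id --format=$fields"
fi
```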