Using device 3 (rank 3, local rank 3, local size 4) : Tesla K80
Using device 1 (rank 1, local rank 1, local size 4) : Tesla K80
Using device 0 (rank 0, local rank 0, local size 4) : Tesla K80
Using device 2 (rank 2, local rank 2, local size 4) : Tesla K80
 running on    4 total cores
 distrk:  each k-point on    4 cores,    1 groups
 distr:  one band on    1 cores,    4 groups
 using from now: INCAR     
  
 *******************************************************************************
  You are running the GPU port of VASP! When publishing results obtained with
  this version, please cite:
   - M. Hacene et al., http://dx.doi.org/10.1002/jcc.23096
   - M. Hutchinson and M. Widom, http://dx.doi.org/10.1016/j.cpc.2012.02.017
  
  in addition to the usual required citations (see manual).
  
  GPU developers: A. Anciaux-Sedrakian, C. Angerer, and M. Hutchinson.
 *******************************************************************************
  
 vasp.6.1.1 19Jun20 (build Jun 25 2020 02:42:34) complex                        
  
 MD_VERSION_INFO: Compiled 2020-06-25T09:42:34-UTC in devlin.sd.materialsdesign.com:/home/medea2/data/build/wwolf/vasp6.1.1/13539/x86_64/src/src/build/gpu from svn 13539
 
 This VASP executable licensed from Materials Design, Inc.
 
 POSCAR found :  2 types and     241 ions
 LDA part: xc-table for Pade appr. of Perdew
  
 WARNING: The GPU port of VASP has been extensively
 tested for: ALGO=Normal, Fast, and VeryFast.
 Other algorithms may produce incorrect results or
 yield suboptimal performance. Handle with care!
  
 POSCAR, INCAR and KPOINTS ok, starting setup
creating 32 CUDA streams...
creating 32 CUDA streams...
creating 32 CUDA streams...
creating 32 CUFFT plans with grid size 160 x 150 x 150...
creating 32 CUFFT plans with grid size 160 x 150 x 150...
creating 32 CUFFT plans with grid size 160 x 150 x 150...
creating 32 CUDA streams...
creating 32 CUFFT plans with grid size 160 x 150 x 150...
Device Memory Info:
Total: 11441.2 MB
Free: 16.6 MB
Used: 11424.6 MB
Requested: 16.9 MB

CUDA Error in cuda_mem.cu, line 181: out of memory
 Failed to allocate device memory!
Device Memory Info:
Total: 11441.2 MB
Free: 16.3 MB
Used: 11424.9 MB
Requested: 16.9 MB

CUDA Error in cuda_mem.cu, line 181: out of memory
 Failed to allocate device memory!
*****************************
Error running VASP parallel with MPI

#!/bin/bash
cd "/home/user/MD/TaskServer/Tasks/172.16.0.48-32000-task10540"
export PATH="/home/user/MD/Linux-x86_64/IntelMPI5/bin:$PATH"
export LD_LIBRARY_PATH="/home/user/MD/Linux-x86_64/IntelMPI5/lib:/home/user/MD/TaskServer/Tools/vasp-gpu6.1.1/Linux-x86_64:$LD_LIBRARY_PATH"
"/home/user/MD/Linux-x86_64/IntelMPI5/bin/mpirun" -r ssh  -np 4 "/home/user/MD/TaskServer/Tools/vasp-gpu6.1.1/Linux-x86_64/vasp_gpu"

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
vasp_gpu           000000000539AE74  Unknown               Unknown  Unknown
libpthread-2.22.s  00007F1F7A333C70  Unknown               Unknown  Unknown
vasp_gpu           000000000536BF07  Unknown               Unknown  Unknown
vasp_gpu           000000000068E885  Unknown               Unknown  Unknown
vasp_gpu           000000000178A3FC  Unknown               Unknown  Unknown
vasp_gpu           000000000043FC9E  Unknown               Unknown  Unknown
libc-2.22.so       00007F1F6CAED725  __libc_start_main     Unknown  Unknown
vasp_gpu           000000000043FB29  Unknown               Unknown  Unknown
*****************************