# Using SecretLab GPU Servers
## GPU Servers
* SGPUW1: `172.30.200.1` (3x RTX 6000 Ada 48GB, 768GB RAM, 8TB SSD, Xubuntu 22.04 LTS)
* SGPUW2: `172.30.200.2` (1x RTX 4090 24GB, 1x RTX 4080 16GB, 1x Quadro RTX 8000 48GB, 192GB RAM, 8TB SSD, Xubuntu 22.04 LTS)
## Connect to SecretLab VPN
SGPUW1 and SGPUW2 are only accessible via SecretLab VPN. Download the SecretLab VPN profile from Nextcloud, install an OpenVPN client on your device, import the profile into the client, and connect to SecretLab VPN before attempting to access the GPU servers.
Note: SecretLab VPN does *not* tunnel your Internet traffic; only traffic to the SecretLab VPN subnet is routed through it.
If RDP or SSH fails to connect, first check whether SecretLab VPN is connected!
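
One quick way to check the VPN before trying SSH or RDP is to test whether the servers respond at all. The snippet below is a minimal sketch for macOS or Linux: `check_gpu_host` is a hypothetical helper name, and it assumes the servers answer ICMP ping, which this document does not confirm.

```shell
# Hypothetical helper: report whether a GPU server answers a single ping.
# Assumes the servers respond to ICMP; if they do not, a failure here
# does not necessarily mean the VPN is down.
check_gpu_host() {
    if ping -c 1 "$1" >/dev/null 2>&1; then
        echo "$1 reachable"
    else
        echo "$1 unreachable: check that SecretLab VPN is connected"
    fi
}

# Usage (run on your own machine, not on the server):
#   check_gpu_host 172.30.200.1
#   check_gpu_host 172.30.200.2
```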
## Connect to SecretLab GPU Servers (SGPUW1, SGPUW2)
To connect to a GPU server, take the following steps:
1. Open a terminal and SSH into the server. For example: `$ ssh <username>@172.30.200.1`.
2. Enter your assigned password when prompted. You can type it or paste it in.
3. (Optional) Run `$ nvidia-smi` to see the number and types of GPUs on the server.
4. Run `$ genv devices` to see the GPUs attached to existing genv sessions. *You cannot use GPUs already in use!*
5. If there are free GPUs, activate a new genv session and attach a GPU to it: `$ genv activate` followed by `$ genv attach --count 1`.

Now you are ready to run code on the GPU!
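
The genv steps above can be collected into a small shell function to run on the server after logging in. This is only a sketch and not part of genv: `attach_one_gpu` is a hypothetical name, and it assumes `genv` is already set up in your login shell so that `genv activate` takes effect in the current shell.

```shell
# Hypothetical helper, to be run on the GPU server after SSH-ing in.
# It does not check for free GPUs itself: read the `genv devices` output
# it prints, and detach if you attached by mistake.
attach_one_gpu() {
    genv devices             # show GPUs attached to existing genv sessions
    genv activate            # start a genv session in this shell
    genv attach --count 1    # attach one GPU to the session
}

# Usage:
#   attach_one_gpu
```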
## Resource Management
On SGPUW1, each user may have at most 1 GPU attached across their genv session(s). Any additional GPUs are automatically detached by genv. SGPUW2 has no such restriction.
On SGPUW1, all GPUs are detached at 9:01 AM Central time, so users must re-attach a GPU to their genv session at or after 9:02 AM. Processes running on the GPU at that time are killed and must be restarted. Start long-running tasks early enough to finish before 9:01 AM, or wait until after the next 9:02 AM. SGPUW2 has no such restriction.
SGPUW2 has different GPU models with varying capabilities. To attach a specific GPU to your genv session, run `$ genv activate` followed by `$ genv attach --index <number>`. For example, `$ genv attach --index 1` attaches the RTX 4090.
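
Because SGPUW1 detaches all GPUs at 9:01 AM Central, it can help to know how much time is left before starting a long job. The sketch below computes the minutes until the next 9:01 AM from a Central-time hour and minute. `minutes_until_reset` is a hypothetical helper, and the `America/Chicago` time zone name is an assumption about what "Central time" means here.

```shell
# Hypothetical helper: minutes from HH:MM (Central time) until the next 9:01 AM.
minutes_until_reset() {
    hour=${1#0}      # strip a leading zero so "08" is not read as octal
    minute=${2#0}
    now=$(( hour * 60 + minute ))
    reset=$(( 9 * 60 + 1 ))
    echo $(( (reset - now + 1440) % 1440 ))
}

# Usage: pass the current Central-time hour and minute.
minutes_until_reset $(TZ=America/Chicago date '+%H %M')
```

The result is approximate on the two days a year when daylight saving time changes, since the calculation assumes a 24-hour day.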
If you run `ls` after logging in through the terminal for the first time, your home directory may be empty because your directories have not been created yet.
To resolve this, log in to the server once over RDP (Microsoft Remote Desktop). Open Remote Desktop Connection.
When it asks for the computer, enter the server's IP address (172.30.200.1) and click Connect.
You will then be prompted for your credentials. Enter your assigned username and password and click `'OK'`.
Logging in this way creates your directories automatically. To confirm, open a terminal in the remote desktop session and run `ls` to see your directories. You can then log out and close the remote desktop.
## Photo walkthrough
<img width="660" alt="Screenshot 2023-10-04 at 2 27 04 PM" src="https://github.com/Shejeebhomee/GPUs/assets/15135729/d778c5de-79a7-4874-aa38-a36198c62356">
## Notes
* [x] Make sure you are connected to the VPN.
* [x] Activate your conda environment before starting a GPU (genv) session.
## License