A section for the community to share useful commands and tools used for troubleshooting.
The nunet capacity command has three options that are very helpful when using the device management service (DMS) on compute provider machines:
--full (or -f): Displays the maximum capacity of resources on the device. This corresponds to the CPU and RAM configuration of the machine.
--available (or -a): Displays the resources currently available on the device. It shows how many resources are available before or after onboarding.
--onboarded (or -o): Displays the resources that have already been allocated to NuNet. It means the machine has already been onboarded.
To display the resources in a more readable format, add the --pretty option to the command.
For example:
nunet capacity --onboarded --pretty
Note: The previous command, available, which older versions of DMS used instead of capacity (the updated name), has been deprecated and removed.
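To compare the three views side by side, the options above can be run in a loop. This is a minimal sketch: the function name show_all_capacity is made up for illustration, and it assumes that --pretty can be combined with each of the three options (the example above only shows it with --onboarded).

```shell
# Print all three capacity views one after another.
# Assumption: --pretty works with --full and --available as well as
# with --onboarded; only the latter combination is shown in this guide.
show_all_capacity() {
    for opt in --full --available --onboarded; do
        echo "== nunet capacity $opt =="
        nunet capacity "$opt" --pretty
    done
}
```

After pasting the function into a terminal on a machine with DMS installed, run show_all_capacity.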
From time to time you might want to check which version of a particular piece of software you have installed.
sudo apt info nunet-dms
sudo apt info service-provider-dashboard
sudo apt info management-dashboard
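If you only need the version numbers, for example to paste into a bug report, the Version field can be pulled out of the apt info output. The sketch below is one possible way to do that; the function names are hypothetical, and it reports whatever version apt shows for each package.

```shell
# Print the "Version:" field from `apt info`-style text read on stdin.
apt_version_field() {
    awk -F': ' '/^Version:/ { print $2; exit }'
}

# Report the version apt shows for each NuNet package named above.
# apt prints a warning on stderr that its CLI output is not a stable
# interface; that warning is discarded here.
nunet_versions() {
    for pkg in nunet-dms service-provider-dashboard management-dashboard; do
        ver="$(apt info "$pkg" 2>/dev/null | apt_version_field)"
        echo "$pkg: ${ver:-not found}"
    done
}
```

Running nunet_versions prints one line per package, with "not found" for any package apt does not know about.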
As NuNet uses systemd, you can use the stop, start and restart commands to control the services.
sudo systemctl stop nunet-dms
sudo systemctl start nunet-dms
sudo systemctl restart nunet-dms
sudo systemctl restart service-provider-dashboard
Use the journalctl command to view the DMS log. The -f switch keeps showing new lines as they are written; press ctrl + c to quit. You can copy and paste the log output into chats or bug reports. You can also use the -n option to limit the output, for example -n 100 to show only the last 100 lines or -n 1000 to show the last 1000.
sudo journalctl -f -u nunet-dms
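For a bug report it is often easier to save the last lines of the log to a file than to copy them from the terminal. The following is a small sketch of that idea; the function name save_dms_log and the default file name are hypothetical, while -n, -u and --no-pager are standard journalctl options.

```shell
# Save the last N lines of the DMS log to a file.
# $1: number of lines (default 1000)
# $2: output file (default nunet-dms.log in the current directory)
save_dms_log() {
    lines="${1:-1000}"
    out="${2:-nunet-dms.log}"
    # --no-pager makes journalctl write plain output instead of paging.
    sudo journalctl --no-pager -n "$lines" -u nunet-dms > "$out"
}
```

For example, save_dms_log 100 dms-last-100.log writes the last 100 lines to dms-last-100.log.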
If you know what you are looking for, you can filter the log for certain events by piping the output with | into grep.
sudo journalctl -f -u nunet-dms | grep "peer"
You can also filter out specific lines using grep -v. This shows everything except lines containing the string you give at the end.
sudo journalctl -f -u nunet-dms | grep -v "traces export"
You can use this when testing the Service Provider Dashboard (SPD) to filter the log, which contains all the debug data. Some of the debug data is useful, but there are quite a few things we can ignore while testing this specific part.
sudo journalctl -f -n 10000 -u nunet-dms | grep -v "traces export" | grep -v "dial backoff" | grep -v "UpdateDHT Create Stream error:" | grep -v "Attempting to Send DHT Update to:" | grep -v "Sending DHT update to" | grep -v "dht update from:"
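The chain of grep -v pipes above can also be written as a single grep with several patterns, which is easier to extend. This is a sketch with a hypothetical function name, filter_dms_noise; the patterns are the same six strings used in the command above.

```shell
# Drop the noisy DHT and tracing lines from DMS log text read on stdin.
# -v inverts the match, -F treats each pattern as a fixed string, and
# each -e adds one pattern, so one grep replaces the chain of pipes.
filter_dms_noise() {
    grep -v -F \
        -e "traces export" \
        -e "dial backoff" \
        -e "UpdateDHT Create Stream error:" \
        -e "Attempting to Send DHT Update to:" \
        -e "Sending DHT update to" \
        -e "dht update from:"
}
```

It is used at the end of the pipe, for example: sudo journalctl -f -n 10000 -u nunet-dms | filter_dms_noise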
From time to time you may want to see more information than the standard logging shows. You can do this by updating the service file that nunet-dms uses to run. Run this command to open the service file. Be careful not to edit anything else in the file.
sudo nano /etc/systemd/system/nunet-dms.service
Once the file is open, use the cursor keys to navigate to the [Service] section, paste this line into the file, and save it (press ctrl + x, then y, then enter).
Environment=NUNET_DEBUG=true
Once you have added the line, reload systemd using the command below, then restart the DMS service with sudo systemctl restart nunet-dms.
sudo systemctl daemon-reload
Now when you check the log there will be much more data in it. Do not leave debugging switched on all the time, as it uses extra system resources. To disable it, remove the line you added, then reload systemd and restart the DMS service again.
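If you prefer not to edit the file by hand, the same change can be prepared with a small script and reviewed before it is applied. The sketch below is only an illustration: the function name with_debug_env is hypothetical, and it prints a modified copy rather than touching the real service file.

```shell
# Print a copy of a systemd unit file with the debug line added directly
# after the [Service] header. The input file itself is not modified.
# If the line is already present, the file is printed unchanged.
with_debug_env() {
    unit_file="$1"
    debug_line="Environment=NUNET_DEBUG=true"
    if grep -q -x -F "$debug_line" "$unit_file"; then
        cat "$unit_file"
    else
        awk -v line="$debug_line" '
            { print }
            $0 == "[Service]" { print line }
        ' "$unit_file"
    fi
}
```

For example, with_debug_env /etc/systemd/system/nunet-dms.service > /tmp/nunet-dms.service writes the modified copy to /tmp, where it can be compared with the original before replacing it. The reload and restart steps are still needed afterwards.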
From time to time you might want to remove the configuration, for example if you want to test installing and onboarding from scratch. After you uninstall the applications using the apt command, a directory containing your onboarding info and config is left behind. To remove that info and start again, run the following.
cd /etc/nunet/
sudo rm nunet.db
sudo rm metadataV2.json
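If there is any chance you will want the old onboarding state back, the two files can be moved aside instead of deleted. This is a sketch only: the function name backup_nunet_config and the default backup location are hypothetical choices, not NuNet conventions.

```shell
# Move the NuNet onboarding files out of the config directory instead of
# deleting them, so the old state can be restored later if needed.
# $1: config directory (default /etc/nunet)
# $2: backup directory (default is a hypothetical folder in $HOME)
backup_nunet_config() {
    conf_dir="${1:-/etc/nunet}"
    backup_dir="${2:-$HOME/nunet-config-backup}"
    mkdir -p "$backup_dir" || return 1
    for f in nunet.db metadataV2.json; do
        if [ -f "$conf_dir/$f" ]; then
            mv "$conf_dir/$f" "$backup_dir/$f" || return 1
            echo "moved $conf_dir/$f to $backup_dir/$f"
        fi
    done
}
```

Because /etc/nunet is normally owned by root, the function has to be run from a root shell, and in that case it is safer to pass both directories explicitly.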
You are almost certainly going to need to look at the developer tools in your web browser to check what's happening behind the scenes. Here are the ways you can get to the developer tools in various browsers.
Google Chrome, Firefox, Brave: hold down ctrl + shift + i
When the developer tools window opens, click the Console tab.
To enable systemd in WSL, open the WSL config file:
sudo nano /etc/wsl.conf
Paste this into the file and save it (press ctrl + x, then y, then enter):
[boot]
systemd=true
Open a PowerShell terminal and type the following command:
wsl --shutdown
(Your WSL terminal will close.) Wait 30 seconds to ensure the machine has completely shut down, then open a new WSL session. Systemd should now be running.
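If systemd still does not seem to be running, it is worth confirming that the setting really ended up in the [boot] section of the file. A minimal sketch of such a check, using a hypothetical function name, could look like this:

```shell
# Succeed (exit status 0) if the given wsl.conf enables systemd in its
# [boot] section, otherwise fail with a non-zero status.
# $1: path to the config file (default /etc/wsl.conf)
wsl_conf_has_systemd() {
    awk '
        { sub(/[ \t\r]+$/, "") }          # strip trailing whitespace
        /^\[/ { section = $0 }            # remember the current section
        section == "[boot]" && /^[ \t]*systemd[ \t]*=[ \t]*true$/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${1:-/etc/wsl.conf}"
}
```

For example: wsl_conf_has_systemd && echo "systemd is enabled in wsl.conf". Inside the new WSL session, ps -p 1 -o comm= is expected to print systemd when it is running as the init process.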
You may want to check whether your connection is behind symmetric NAT, as that can affect how many peers you can connect to and how quickly you connect to them. If you have a web browser on your machine, open this URL and the tool will tell you immediately: https://tomchen.github.io/symmetric-nat-test/
Sometimes you may want to completely wipe a WSL instance. Uninstalling the app and reinstalling it does not remove the system's disk, so when you reboot you still retain the old settings. This is by design, so that you don't accidentally lose data. However, sometimes you just want to blow it away and start again.
wsl --shutdown
wsl --list
The list command tells you which WSL instances are registered. To unregister and WIPE a WSL installation, use the following command. It runs immediately without prompting or asking whether you are sure, so be careful with it and don't unregister an instance that contains data you want to keep.
wsl --unregister Ubuntu-20.04
If a job deployment fails, a compute-side error on NuNet can be identified by the following piece of log in the system/gist. If you see it alongside the error, please report it to the dev team:
Traceback (most recent call last):
File "prepare_ml.py", line 19, in <module>
prepare_ml.py is what prepares and deploys the job inside a NuNet ML container.
If a submitted job fails, a user-end error on NuNet can be identified by the following piece of log, alongside the error, in the system/gist:
Traceback (most recent call last):
File "ml_job.py", line 19, in <module>
ml_job.py is the job that runs inside a NuNet ML container. It is based on an ML model URL that was submitted through the Service Provider Dashboard (SPD).
If you are a tester, you can try a different URL that runs a separate model. If you are a developer/researcher, you need to debug the code behind your ML/computational model URL, update it, and submit again.
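The distinction between the two kinds of error can be checked mechanically when you have the log saved as text. The sketch below uses a hypothetical function name and simply looks for which of the two script names appears in the traceback; it does not inspect the error itself.

```shell
# Classify a failed job's log (read on stdin) by which script appears in
# the Python traceback. Prints one of: compute-side, user-end, unknown.
classify_job_error() {
    log="$(cat)"
    if printf '%s\n' "$log" | grep -q -F 'File "prepare_ml.py"'; then
        echo "compute-side"
    elif printf '%s\n' "$log" | grep -q -F 'File "ml_job.py"'; then
        echo "user-end"
    else
        echo "unknown"
    fi
}
```

For example, classify_job_error < job.log, where job.log is a placeholder for wherever you saved the log. A compute-side result is the case to report to the dev team.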