Has anyone gotten TensorBoard to work with Open OnDemand? It’s the visualization interface for TensorFlow. I have a staffer who’d like to use it for classes, and I noticed it’s not on the list of apps that have been ported so far.
TIA - Susan Litzinger
Yes, it’s still in development but mostly working: it allocates a node, starts TensorBoard with the log directory you specify, and lets you attach to the TensorBoard web server. There is currently a small problem where the link given after the node is allocated does not browse to the TensorBoard server correctly, but you can still connect if you enter servername:port in the browser manually. If you would like a copy, I can post the current draft with this caveat.
I would love to see the draft.
Current version is here:
@lcapps thanks for sharing!
I very much appreciate your offering up your version of TensorBoard for Open OnDemand. However, I get hung up on step 1. The directions tell me to download this file:
but gitlab-master.nvidia.com no longer exists, and I have googled every way I can think of for something like NVIDIA & cluw, and nothing is coming up. Do you have an alternate location for the file? Thanks in advance.
Louis - we were able to get it working on our OnDemand instance. I’ll be interested in hearing whether you’re able to get the website to come up eventually, rather than having to connect to the node using an SSH tunnel. Thanks for sharing!
@dsajdak and @lcapps we just asked one of our interns to look into setting up Tensorboard as a web service. If we have success we’ll post an example of how we did it like the RStudio and Jupyter examples.
@dsajdak and @rodgers.355, this is great to hear. I haven’t had a chance to figure out why the web service link doesn’t work correctly, so it will be good to get it working.
Easiest way to do this is to use the jupyter notebook and jupyter-tensorboard python app. TensorBoard shows up as a kernel in the JN pulldown.
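For anyone trying this route, the setup looks roughly like the following. This is a sketch, not a verified recipe: the package names come from PyPI, and the `enable` step is the one documented in the jupyter-tensorboard README, which may vary by version.

```shell
# Install Jupyter Notebook plus the jupyter-tensorboard integration
pip install notebook jupyter-tensorboard

# Register the server extension for the current user
# (the enable step documented by jupyter-tensorboard)
jupyter tensorboard enable --user
```

After this, TensorBoard should appear as an option inside the Jupyter Notebook interface, as described above.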
We just recently pushed our Tensorboard OnDemand app: https://github.com/stanford-rc/sh_ood-apps/tree/master/sh_tensorboard
It’s based on a native installation of Tensorflow, loadable through a module system (we use Lmod). We don’t use containers for that, but that part can easily be customized.
One particularly interesting feature of that Tensorboard OnDemand app (for me at least! :)) is that it implements an authenticating reverse proxy. Because Tensorboard doesn’t provide any kind of authentication mechanism for its web interface, on a shared environment, anybody knowing the hostname and port number of a running Tensorboard instance can connect to it.
To mitigate this, we implemented an authentication mechanism that basically sets a browser cookie in the OnDemand interactive app page (the “Connect to Tensorboard” button does this) which is then checked by the authenticating reverse proxy that controls access to the Tensorboard web interface. Without that cookie, access to the Tensorboard web interface is refused. And if the cookie is ever lost, users can re-create it by visiting the “My Interactive Sessions” page and clicking the “Connect” button again.
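The cookie-gating idea can be illustrated with a minimal, self-contained sketch. To be clear, this is not the Stanford app’s actual Twisted code: the cookie name, secret, and handler below are made up for illustration, and the real proxy forwards allowed requests to the TensorBoard backend rather than answering them itself.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical cookie name and secret, for illustration only;
# the real app would generate its own per-session token.
COOKIE_NAME = "tb_auth"
COOKIE_SECRET = "s3cret-token"

class AuthGateHandler(BaseHTTPRequestHandler):
    """Refuse any request that lacks the expected auth cookie.

    A real reverse proxy would forward allowed requests to the
    TensorBoard backend; here a 200 response stands in for that.
    """

    def do_GET(self):
        cookies = self.headers.get("Cookie", "")
        if f"{COOKIE_NAME}={COOKIE_SECRET}" in cookies:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"proxied to TensorBoard")
        else:
            self.send_response(403)
            self.end_headers()
            self.wfile.write(b"Forbidden")

    def log_message(self, *args):
        pass  # keep the demo quiet
```

In the real app, the “Connect to Tensorboard” button is what plants the cookie in the browser; any request arriving at the proxy without it gets the 403 path.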
It’s been running in production for some time now on our Sherlock cluster and seems to be working fine for us.
If anyone wants to give it a try, please don’t hesitate to let us know how it goes!
That is a really cool simple solution for an authenticating reverse proxy. We did a similar approach with https://github.com/OSC/bc_osc_example_shiny for launching a Shiny app, but used OpenResty (NGINX) via Singularity and started the Shiny app listening on a Unix socket. The result was far more complex.
This is looking good, thanks. However, I tried to use the twisted pip/conda packages as an alternative to installing RPMs, but I had errors with:

```
>>> from twisted.web.error import ForbiddenResource
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'ForbiddenResource'
```
I noticed that ErrorPage, NoResource, and ForbiddenResource in twisted.web.error have been deprecated since Twisted 9.0 and were removed in Twisted Web 12 (in 2012), so they will be problematic with recent versions.
First, thanks for sharing the TensorBoard OOD app. I am porting it to our clusters per a user request. The app is now working under the YCRC OOD environment, except for one problem: the call graph display area in the ‘GRAPHS’ tab is not showing properly. It is far too small and cannot be enlarged. However, if I log onto the node where the TensorBoard server is running and view it locally on that node, the ‘GRAPHS’ tab displays properly.
I am attaching the two screen shots to show the difference. Not sure if you have ever seen the same problem.
Figure 1: the ‘bad’ graph as viewed from OOD
Figure 2: the ‘good’ graph as viewed from the compute node where the TensorBoard server is running
The app uses an older version of Twisted, which is provided with Python 2.
I resolved this using a virtual environment with Python 2. First create a directory called tensorboard/template. Then create a virtual environment in tensorboard/template/lib/.venv. Activate the virtual env and then pip install twisted. Now we need to use this virtual env in before.sh.erb. Simply add this line at the beginning of
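In shell form, the steps above can be sketched as follows (assuming virtualenv and a python2 interpreter are available on the node; adjust paths to your app checkout):

```shell
# Recreate the Python 2 virtual env described above
mkdir -p tensorboard/template/lib
virtualenv -p python2 tensorboard/template/lib/.venv

# Activate it and install Twisted into it
source tensorboard/template/lib/.venv/bin/activate
pip install twisted
```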
@kilian, I’m trying this out right now (using OOD 2.0.13) and am having a few issues getting connected through the web interface. I just receive a 403 message saying “Forbidden Resource: Sorry, resource is forbidden”.
It’s quite possible that my changing some of the code caused this. I was having an issue similar to the one @sbutcher had with the ‘ForbiddenResource’ import, which I worked around by modifying bin/authrevproxy.py:

```python
#from twisted.web.error import ForbiddenResource
from twisted.web.resource import Resource, ForbiddenResource
```
When I launch the job through OnDemand, the authrevproxy starts up now, and TensorBoard also starts, so that part seems good to go. I have TensorBoard running on a 127.0.0.1 port (and I can connect to it from the local system). There’s a proxy port as well, and it is listening. The output log doesn’t show anything significant to me: the servers are listening on their ports, there are some CUDA warnings (which I’m not worried about), and the last line is the TensorBoard startup message.
The URL that’s created by ondemand after the app launches is in the format:
I don’t know where to find any other errors as to what’s throwing the forbidden resource. Does anyone have an idea?
I fixed this by swapping the authproxy.py from the OSC app with the one from the Stanford app.
Here is my working version, tweak as needed: GitHub - mjbludwig/tensorboard_ood
I still get forbidden resource. I’m not sure I’ll spend much more time on it. There’s probably some other issue buried somewhere that I can’t find. Thanks though.