Creating an interactive transcriber

I have developed a transcription tool using WhisperX on the ORCA cluster that I want to make available to others in my research unit, but I can’t generalize it enough for a non-technical audience. It requires generating SSH keys, using the terminal, and manually installing WhisperX and FFmpeg, which an inexperienced user will not be able to do on their own. I think an Open OnDemand interface would be the way to make it accessible.

Basically, I need help packaging FFmpeg and WhisperX so that a user can invoke them without manually installing them in their ORCA environment. Then I want to create a front end where a user uploads a file. Once the file is uploaded, it kicks off a Slurm job that transcribes it, deletes the file once it has been transcribed, and then notifies the user that the transcription files are ready for download.

Would anyone be able to point me in the right direction to get this started?

My colleague Michael has developed a Streamlit app around WhisperX, and we are serving it in our own OnDemand installation. His work on the app is on GitHub at michaelcoe/offline-transcription-app. The web app implements the faster-whisper package (https://github.com/SYSTRAN/faster-whisper) and models from the vosk package (https://github.com/alphacep/vosk-api/tree/master).

If you need more help turning it into an OnDemand app, I am sure we can share the bare bones of it.

I would love that. I am just at a loss as to how to get started. It looks like Michael’s app is about the same as the one I developed, where a file is uploaded through a web app and then the transcription is kicked off on the cluster. The only difference is that mine is mediated by a watchdog process that catches the upload and runs the job, whereas his runs the command to start the job once the file has been uploaded.

Is there an environment or documentation for creating the front end that would be hosted in OnDemand?

I think you might be looking for the "Tutorials: Passenger Apps" page in the Open OnDemand 4.1.0 documentation. We support Python, JavaScript, and Ruby apps, so you have a lot of choice as far as what you use for the frontend code. If you already have a script to do the grunt work of the transcription, then hooking into that should be fairly straightforward from any of those three.
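To give a sense of scale, here is a minimal sketch of what the Python side of a Passenger app could look like, assuming the usual Passenger convention of a `passenger_wsgi.py` file that exposes a WSGI callable named `application`. The form field `audio_path` and the "would transcribe" response are placeholders I made up for illustration; the marked spot is where your existing script would be called.

```python
# passenger_wsgi.py: Passenger looks for a WSGI callable named `application`.
from html import escape
from urllib.parse import parse_qs

# Hypothetical form: a single text field holding the path to the audio file.
FORM = (
    "<form method='post'>"
    "Path to audio file: <input name='audio_path'> "
    "<button>Transcribe</button></form>"
)


def application(environ, start_response):
    """Show the form on GET; on POST, echo the requested path above the form."""
    body = FORM
    if environ.get("REQUEST_METHOD") == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        fields = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        audio_path = fields.get("audio_path", [""])[0]
        # Placeholder hook: call the existing transcription script or
        # submit a Slurm job here instead of just echoing the path.
        body = f"<p>Would transcribe: {escape(audio_path)}</p>" + FORM
    payload = body.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

A framework such as Flask would replace most of this by hand-written request handling; the plain WSGI version is only meant to show how little is needed to get a page on screen.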

While building a full app around it will allow you to expand the app in the future, a simpler approach is to make it a batch connect application: the user uploads their file to the filesystem with OnDemand, then enters the path in the app form. That way OnDemand handles the frontend and Slurm interaction, and you only need to supply the job script.
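As a rough sketch of the job script itself, the Python below renders a Slurm batch script that transcribes one file, deletes the upload only if the transcription succeeded, and asks Slurm to email the user when the job ends. The function names, the job name, and the bare `whisperx <file> --output_dir <dir>` invocation are my assumptions, not your actual setup, and resource requests (GPU, time, partition) are left out because they are site-specific.

```python
import shlex
import subprocess


def build_job_script(audio_path, output_dir, email):
    """Return the text of a Slurm batch script for one transcription."""
    audio = shlex.quote(audio_path)  # protect paths containing spaces
    out = shlex.quote(output_dir)
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=transcribe",
        "#SBATCH --mail-type=END,FAIL",   # Slurm emails the user at job end
        f"#SBATCH --mail-user={email}",
        "set -euo pipefail",              # stop before rm if whisperx fails
        f"whisperx {audio} --output_dir {out}",
        f"rm -f -- {audio}",              # delete the upload once transcribed
    ]
    return "\n".join(lines) + "\n"


def submit(script):
    """Pipe the script to sbatch on stdin and return the job ID."""
    result = subprocess.run(
        ["sbatch", "--parsable"], input=script,
        capture_output=True, text=True, check=True,
    )
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    return result.stdout.strip().split(";")[0]
```

In a batch connect app OnDemand submits the script for you, so only the rendered script body matters there; `submit` would be relevant if you go the Passenger app route and launch jobs from your own code.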

If you need a way to share a package that lots of people will be using (so individuals don’t have to install their own copy), you might look into Lmod or an equivalent (your system likely already has one), as it provides an easy way to load shared packages in your scripts.
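If the job is launched from Python rather than from a shell script, one way to pick up shared modules is to wrap the command in a login shell, since `module` is a shell function and not a standalone program. This is only a sketch: the module names `ffmpeg` and `whisperx` are hypothetical and depend on what your admins publish, and it assumes that a login shell on your cluster initializes Lmod, which is common but not guaranteed.

```python
import shlex


def with_modules(modules, argv):
    """Wrap a command so it runs after `module load` in a login shell."""
    load = "module load " + shlex.join(modules)
    command = shlex.join(argv)  # quote each argument safely
    # bash -l starts a login shell, which typically defines `module`.
    return ["bash", "-lc", f"{load} && {command}"]
```

For example, `subprocess.run(with_modules(["ffmpeg", "whisperx"], ["whisperx", "--help"]), check=True)` would load both modules and then run the command. Inside a plain job script the equivalent is just a `module load` line before the `whisperx` call.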

I’d be happy to assist with any of the above if you run into issues!