environment:
- OOD 4.0.8 unpopulated dev system
- Hyper-V VM
- SELinux disabled
- cgroups v2 enabled
- no upload-limit mods in /etc/ood/config/apps/dashboard/env
I’m seeing an issue that seems very similar to this March 2024 report about large uploads failing with OOD 3.0.1.
However, my “large” uploads are only ~1.5 GB, and the problem seems isolated to uploads to our /scratch fs (which is an NFS export of a bare-metal Lustre mount to our Vast system).
About the error:
- filemanager’s upload dialog eventually says “Upload failed” after sitting at 100% for a long time
- the browser’s console shows 4 POST errors, suggesting it made 4 attempts (probably why it sat at 100%)
- the uploaded file always seems to arrive intact despite the failure claim, as verified in a shell with md5sum
- a 1572605045 byte file seems to always generate this failure
- a 1474317229 byte file fails only intermittently: sometimes the filemanager says “Upload failed” and the browser’s console shows 4 POST errors, but other times the console shows fewer than 4 POST errors and the filemanager acts as though no error occurred
- uploading to our /home fs does not fail
(I don’t know how relevant it is, but this Jan 2025 report also mentions a similar quasi-failure-quasi-success behavior related to a Lustre fs.)
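For anyone who wants to repeat the md5sum check mentioned above, here is a minimal sketch of the comparison. The function name `same_md5` is only an illustration, not something OOD provides, and both arguments are placeholder paths.

```shell
# Compare the MD5 digest of a local source file with the uploaded copy.
# same_md5 is a hypothetical helper name; both arguments are placeholder paths.
# Returns 0 if the digests match, 1 if they differ, 2 if a file is unreadable.
same_md5() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Read via stdin so the output is "<digest>  -" regardless of the file name.
    src_sum=$(md5sum < "$1" | cut -d' ' -f1)
    dst_sum=$(md5sum < "$2" | cut -d' ' -f1)
    [ "$src_sum" = "$dst_sum" ]
}
```

It could be called as `same_md5 ./bigfile.bin /scratch/jtb49/bigfile.bin && echo match`, where `bigfile.bin` is a made-up file name.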
The nginx/PUN log doesn’t seem to show any errors. Apache’s error log shows the proxied request timing out, but not much more.
[root@ondemand-dev jtb49]# tail -4 /var/log/httpd/ondemand-dev.hpc.nau.edu_error_ssl.log
[Tue Feb 03 12:12:27.160370 2026] [proxy_http:error] [pid 956647:tid 140345941669632] (70007)The timeout specified has expired: [client 10.15.138.10:64727] AH01102: error reading status line from remote server httpd-UDS:0, referer: https://ondemand-dev.hpc.nau.edu/pun/sys/dashboard/files/fs/scratch/jtb49
[Tue Feb 03 12:12:27.160530 2026] [proxy:error] [pid 956647:tid 140345941669632] [client 10.15.138.10:64727] AH00898: Error reading from remote server returned by /pun/sys/dashboard/files/upload/fs, referer: https://ondemand-dev.hpc.nau.edu/pun/sys/dashboard/files/fs/scratch/jtb49
[Tue Feb 03 12:13:44.915813 2026] [proxy_http:error] [pid 956647:tid 140345950062336] (70007)The timeout specified has expired: [client 10.15.138.10:64727] AH01102: error reading status line from remote server httpd-UDS:0, referer: https://ondemand-dev.hpc.nau.edu/pun/sys/dashboard/files/fs/scratch/jtb49
[Tue Feb 03 12:13:44.915974 2026] [proxy:error] [pid 956647:tid 140345950062336] [client 10.15.138.10:64727] AH00898: Error reading from remote server returned by /pun/sys/dashboard/files/upload/fs, referer: https://ondemand-dev.hpc.nau.edu/pun/sys/dashboard/files/fs/scratch/jtb49
