Opened 3 weeks ago

Last modified 10 days ago

#147 new defect

survexport doesn't flush csv files on windows

Reported by: Philip Schuchardt Owned by: Olly Betts
Priority: minor Milestone:
Component: aven Version: 1.4.19
Keywords: Cc:

Description

Problem: On Windows the exported CSV can appear “missing” or partially written immediately after survexport exits because the MinGW-compiled binary (used in the release under C:/Program Files (x86)/Survex) leaves its FILE* buffers cached and doesn’t FlushFileBuffers?. The next task in CaveWhere? races ahead and sees a zero/partial-length file even though the process has closed. Native MSVC builds don’t exhibit this because the MSVC CRT already flushes at fclose(). Full csv file write can take more than 500ms after survexport is closed with the official build.

Expected: When survexport exits it should FlushFileBuffers? (or open with write-through) so downstream tasks see the final file quickly.

Diagnosis: The official release build lacks the explicit flush; the MSVC-built version does not show the race.

Impact: Export/CSV parsing tasks occasionally fail or read incomplete data on Windows installations using the standard Survex release. I have code in CaveWhere? to reduce this issue for the official release, but I'm shipping CaveWhere? with MSVC build that doesn't' have this flushing problem. This is pretty low priority, unless people are using survexport in some data processing pipeline on windows.

I'm not sure if cavern has this same problem, but I've only observed it in survexport.

Change History (6)

comment:1 Changed 3 weeks ago by Philip Schuchardt

It's likely that it's also in cavern too, but I was using MSVC build of cavern so that's probably why I didn't see it.

comment:2 Changed 11 days ago by Olly Betts

FWIW, mingw uses one of Microsoft's CRTs (in particular our build currently uses UCRT) so it's a bit surprising that the mingw-build behaves differently if this is really being done by fclose() there...

I'm happy to add a suitable call, but FlushFileBuffers() is Microsoft-specific and seems to take a HANDLE so not very friendly to portable code - I'm definitely not keen to rewrite all the code to work in terms of Microsoft proprietary IO functions rather than standard C/C++/POSIX ones. The POSIX equivalent is probably fsync()/fdatasync() but it doesn't look like that's available in mingw.

comment:3 Changed 10 days ago by Olly Betts

Does adding _commit(fileno(file_handle)); before fclose(file_handle); help? It seems _commit() is the most direct Microsoft equivalent to fsync() (and more useful here as it takes a file descriptor).

I'd be much happier understanding why this only affects mingw though.

comment:4 Changed 10 days ago by Philip Schuchardt

I haven't actually built survex with mingw. I've just confirmed that it is an issue with the release version of survex. Do you still cross compile it under linux? Do you have a link to instructions or git action that builds it. If so, I can try to reproduce the issue on mingw in a development environment and see if _commit helps.

comment:5 Changed 10 days ago by Olly Betts

It's no longer cross-compiled - the huge number of dependencies that using GDAL pulled in made that pretty much unfeasible as they'd all also need to be cross-compiled first (or a way found to make use of packaged versions).

It's now build by the msys2-ucrt64 action in .github/workflows/main.yml.

comment:6 Changed 10 days ago by Olly Betts

Another thought - if your MSVC build isn't using UCRT, this could be a UCRT vs MSVCRT difference.

I picked UCRT as it's the most modern Microsoft CRT and has been the mingw default for more than 3 years (https://www.msys2.org/news/#2022-10-29-changing-the-default-environment-from-mingw64-to-ucrt64) but if changing the build to use msystem: mingw64 solves this that's an option.

Note: See TracTickets for help on using tickets.