Current version: par2cmdline-0.4-tbb-20150503
par2cmdline 0.4 with Intel Threading Building Blocks 4.3 Update 1
This is the standard CPU-only version and is considered stable. An experimental CPU/GPU version is available further down this page.
These are pre-built command-line executables. They can be run from a USB thumb drive without installing anything on your computer.
Download: Source code (GPLv2) [325kB 20150503]
Download: GNU/Linux 64-bit computers [238kB 20141125]
Download: Mac OS X 10.5 32-bit and 64-bit Intel Macs [393kB 20141125]
Download: Windows XP/Vista/7/8 32-bit PCs [260kB 20150503]
Download: Windows Vista/7/8 64-bit PCs [288kB 20150503]
Warning: a 64-bit capable CPU is not sufficient to use this binary. You must also have a 64-bit version of Windows installed.
The changes in the 20150503 version are:
- in the Windows version, it was not possible to repair files larger than 2GB because of a 32-bit signed integer (sign-extension) bug in the handling of file offsets. Fixed by always treating file offsets as 64-bit unsigned integers in both the 32-bit and 64-bit versions of the program. This problem only affected the Windows version, so there are no new binary builds of the Mac or Linux versions.
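The release note does not show the offending code. As a hypothetical illustration of this class of bug (the function names below are made up, not taken from par2cmdline), here is how a 64-bit file offset assembled from two 32-bit halves goes wrong when the low half is held in a signed 32-bit integer: any low half of 2^31 (2GB) or more is negative, and widening it sign-extends it into the upper 32 bits.

```cpp
#include <cstdint>

// BUGGY (hypothetical): the low 32 bits of the offset are stored in a signed
// int32_t. Values >= 2^31 are negative, and converting a negative int32_t to
// uint64_t sign-extends it, so the OR sets all 32 high bits.
std::uint64_t combineBuggy(std::uint32_t high, std::int32_t low)
{
    return (static_cast<std::uint64_t>(high) << 32) | static_cast<std::uint64_t>(low);
}

// FIXED (hypothetical): both halves are unsigned, so the low half is
// zero-extended and the result is the intended 64-bit unsigned offset.
std::uint64_t combineFixed(std::uint32_t high, std::uint32_t low)
{
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```

With a 3GB offset (low half 0xC0000000, high half 0), the buggy version yields 0xFFFFFFFFC0000000 instead of 3221225472, which is why files under 2GB were unaffected.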
The changes in the 20141125 version are:
- when creating parity files, the main packet was not always written to the parity files when they were processed concurrently, because it was appended to the list of packets to output using a non-thread-safe data container. This bug would manifest when a large number of source files were being processed. Fixed by using a thread-safe data container.
- when creating parity files, the "Opening: <file>" message is now displayed only for the first n source files, where n defaults to 200. This restriction was added so that creating parity files for a large number of source files does not cause excessive console scrolling, which would in turn slow down processing. Use the new -z<n> command line switch to set a different limit, or -z0 to specify no limit.
- verification of extra files is now performed concurrently if requested (previously they were always verified serially).
- the -t parameter can now include a positive integer value to restrict the number of logical CPUs used to process data.
- in the Windows version, the program's CPU scheduling priority can now be specified using the -p parameter.
- the heap became fragmented during the verification of data files because the checksum data buffer was allocated and deallocated for each file verified, which resulted in the program's memory footprint (aka its "working set") steadily increasing during the verification phase. This would result in the 32-bit Windows version failing to verify large data sets because it could not allocate verification data buffers. To solve this, the checksum data buffer is no longer allocated and deallocated for each file verified. Instead, a pool of checksum objects is created and that pool of objects is then used and re-used for verifying data files. The size of the pool matches the number of logical CPUs which the program is asked to use. This change benefits all versions of the program because by reducing heap fragmentation, larger data sets can be processed using less virtual memory.
- numerous small code changes were made to remove unnecessary string copying. Such redundant copying would further fragment the heap as well as use up memory for temporary strings which did not need to be allocated in the first place.
- updated to Intel TBB 4.3 Update 1
- removed use of MAX_PATH or other fixed-size path buffers to avoid buffer overflow errors
- the program failed to build under newer C++ standard libraries because they no longer provide std::auto_ptr<T>. Fixed by either using std::unique_ptr<T> (if available) or by providing our own version of std::auto_ptr<T>.
- the 32-bit Mac OS X version now requires 10.5 or later
- stopped building the FreeBSD version, for two reasons: the FreeBSD ports system can now build the par2 program and the TBB library without any changes to the sources of either, and it isn't possible to build a "portable" version of the program, in the sense that the TBB library cannot be in the same directory as the par2 executable. It must be installed into /usr/lib/, and that is a job best left to the FreeBSD ports system.
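The main-packet fix says only that a thread-safe data container is now used for the list of packets to output; it does not say which one. The following sketch shows the general idea using std::mutex from the C++ standard library rather than whatever TBB container the program may actually use. Packet, PacketList and appendConcurrently are hypothetical names.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a par2 packet; the real program's types differ.
struct Packet { int id; };

// A minimal thread-safe list of packets to output.
class PacketList {
public:
    void append(const Packet& p)
    {
        std::lock_guard<std::mutex> lock(mutex_);  // serialise concurrent appends
        packets_.push_back(p);
    }
    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return packets_.size();
    }
private:
    mutable std::mutex mutex_;
    std::vector<Packet> packets_;
};

// Appends `perThread` packets from each of `threads` threads at the same time
// and returns how many packets ended up in the list.
std::size_t appendConcurrently(int threads, int perThread)
{
    PacketList list;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&list, t, perThread] {
            for (int i = 0; i < perThread; ++i)
                list.append(Packet{t * perThread + i});
        });
    for (auto& w : workers) w.join();
    return list.size();
}
```

Without the lock, concurrent push_back calls on the same std::vector are a data race and appended items can be lost, which matches the kind of failure described for the main packet.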
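The checksum-buffer change amounts to an object pool: allocate a fixed number of buffers once, then hand them out and take them back instead of allocating and freeing one per file. A minimal sketch under that reading follows; ChecksumBuffer and ChecksumPool are hypothetical names, and the program's real checksum objects hold more state than a byte buffer.

```cpp
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical checksum buffer: allocated once, then reused for many files.
struct ChecksumBuffer {
    explicit ChecksumBuffer(std::size_t bytes) : data(bytes) {}
    std::vector<unsigned char> data;
};

// Fixed-size pool created up front (e.g. one buffer per logical CPU in use).
class ChecksumPool {
public:
    ChecksumPool(std::size_t count, std::size_t bufferBytes)
    {
        for (std::size_t i = 0; i < count; ++i)
            free_.push_back(std::make_unique<ChecksumBuffer>(bufferBytes));
    }
    // Blocks until a buffer is free, then hands it out.
    std::unique_ptr<ChecksumBuffer> acquire()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !free_.empty(); });
        std::unique_ptr<ChecksumBuffer> buf = std::move(free_.back());
        free_.pop_back();
        return buf;
    }
    // Returns a buffer to the pool instead of freeing it.
    void release(std::unique_ptr<ChecksumBuffer> buf)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(std::move(buf));
        }
        cv_.notify_one();
    }
    std::size_t available() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }
private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    std::vector<std::unique_ptr<ChecksumBuffer>> free_;
};
```

Because the buffers are never freed during verification, the heap is not repeatedly carved up by per-file allocations. The pool size would be the number of logical CPUs the program is asked to use; std::thread::hardware_concurrency() is one possible default, though that is an assumption, as the note does not say how the count is obtained.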
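The -z<n> rule for "Opening: <file>" messages can be stated as a one-line predicate. The function name is hypothetical and the file index is assumed to be zero-based; only the default of 200 and the meaning of -z0 come from the release note.

```cpp
#include <cstddef>

// Should the "Opening: <file>" line be shown for the source file at
// zero-based position `index`? `limit` is n from -z<n>; 0 means no limit.
bool shouldShowOpening(std::size_t index, std::size_t limit = 200)
{
    return limit == 0 || index < limit;
}
```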
This is the experimental CPU/GPU version. It may be useless to you. It is considered obsolete.
Download: [316kB] Source code (GPLv2). See CPU-only version (the source files are the same).
The changes in the 20090203 version are:
- fixed a bug which affected the Linux and Mac versions whereby repairs would fail if the file being repaired was short or had one or two bad blocks (because the async write to the file's last byte was failing).
- on Windows, the program now stores directory paths in par2 files using '/' as the path separator instead of '\' (as per the Par 2.0 specification document). Note: directory paths are stored only when the '-d' switch is used.
- merged the sources from the CPU-only and CPU/GPU versions so that both versions now build from the same set of source files using different 'configure' options (Mac, Linux, FreeBSD) or project files (Windows). See the source distribution's README_FIRST.txt for building instructions.
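The path-separator change is a simple character substitution when directory paths are written into a par2 file. A minimal sketch, assuming the path is already relative (toPar2Path is a hypothetical name; the program may also need to handle drive letters and other Windows-specific prefixes, which are not shown):

```cpp
#include <algorithm>
#include <string>

// Converts a Windows-style relative path to the '/'-separated form that the
// Par 2.0 specification uses for paths stored in par2 files.
std::string toPar2Path(std::string path)
{
    std::replace(path.begin(), path.end(), '\\', '/');
    return path;
}
```

Paths that already use '/' pass through unchanged, so the same function is harmless on Mac and Linux.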
About the NVIDIA CUDA version
There is no guarantee that this program will perform correctly. It may not create or repair data files correctly due to unknown bugs in the program code. Although it has been tested on test data and worked correctly on those files, it may not work on your files, since the GPU code is new and may contain unknown bugs. Caveat emptor.
This version of the par2 program has been modified to utilise NVIDIA CUDA 2.0 technology, which enables it to process data using the processor (GPU) on certain video cards. Most of the processing is still performed by the computer's CPU but some will be offloaded to the video card's GPU. The amount of offloading depends on how much speed/power the GPU has. After processing all of the data for par2 creation or par2 repair, the program will display, as a percentage, how much of the processing was done by the GPU (or whether the GPU was not available for use).
There are two factors which determine how much processing the GPU can provide:
- the amount of video card memory. Some of the memory is used for the video display, and how much is partly determined by the operating system: if the OS/video driver performs drawing acceleration using extra video memory, less memory is available for CUDA use. For example, on a 128MB video card running Mac OS X 10.5, only about 22MB was available for use by CUDA applications, so if the parity data totalled more than 22MB, only a portion of it could be processed by the GPU. Of course this is only an example and your system will probably have a different amount of memory available for CUDA use. Because of this OS usage, a video card with at least 256MB of video memory is recommended for Mac OS X. For Windows XP, a video card with at least 128MB is recommended, and for Windows Vista, at least 256MB.
- the video card's speed, which depends on both the GPU's speed and the video memory's bandwidth. The GPU's speed depends on both its clock rate and the number of stream processors it has: for example, a GeForce 8600 GT has 32 stream processors, compared to 128 for a 9800 GTX. Memory bandwidth depends on both the width of the data path between the GPU and its memory (for example, at the same memory clock rate, a 64-bit wide data bus transfers data half as quickly as a 128-bit wide data bus) and the clock rate of the video memory: the higher the clock rate, the faster the GPU can move data to and from video memory, which in turn affects how fast it processes data.
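Both factors can be put into rough numbers. The following is a simplified, hypothetical sketch: it ignores driver and kernel overheads, the function names are made up, and the figures used in the checks (about 22MB of free CUDA memory and 524288-byte parity blocks) are borrowed from the examples on this page.

```cpp
#include <cstdint>

// How many parity blocks of `blockBytes` fit in the video memory that is
// left for CUDA use (whole blocks only).
std::uint64_t blocksThatFit(std::uint64_t availableBytes, std::uint64_t blockBytes)
{
    return blockBytes == 0 ? 0 : availableBytes / blockBytes;
}

// Peak memory bandwidth in bytes per second: bytes moved per transfer
// (bus width in bits / 8) multiplied by transfers per second.
double peakBandwidthBytesPerSec(unsigned busWidthBits, double transfersPerSec)
{
    return (busWidthBits / 8.0) * transfersPerSec;
}
```

With about 22MB free, only 44 blocks of 524288 bytes fit on the card at once, and at the same transfer rate a 128-bit bus has twice the peak bandwidth of a 64-bit bus.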
Requires a "Compute Capability 1.1" device, which is any GeForce 200 series card, any GeForce 9 series card, and most GeForce 8 series cards EXCEPT the first-generation cards such as the 8800 Ultra, 8800 GTX and 8800 GTS, and certain Tesla and Quadro cards (search the web for "Compute Capability 1.0" devices). Devices that support only Compute Capability 1.0 cannot be used. Cards such as the 8400, 8500, 8600, 8800 GS, 8800 GT, 8800M GTS (mobile), and 8800M GTX (mobile) can be used.
Mobile variants will also work, for example, 8600 refers to both the desktop and mobile versions such as 8600 GT (desktop) and 8600M GT (mobile).
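The requirement reduces to a version comparison on the device's compute capability (major.minor). In a CUDA program those two numbers come from the major and minor fields of the cudaDeviceProp structure filled in by cudaGetDeviceProperties(); the sketch below takes them as plain integers so it compiles without the CUDA toolkit, and supportsPar2Cuda is a hypothetical name.

```cpp
// True if a device with compute capability major.minor meets the
// "1.1 or later" requirement; 1.0-only devices are rejected.
bool supportsPar2Cuda(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 1);
}
```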
The CUDA runtime/toolkit may need to be downloaded and installed by you because NVIDIA do not permit redistribution of it with third party executables. If you need to install the runtime, please search for "NVIDIA CUDA toolkit" in your favourite search engine.
On Windows, it appears that the CUDA runtime/toolkit ships with recent video card driver software from NVIDIA. You can verify this by checking for it at this path: "C:\Windows\system32\nvcuda.dll".
On Mac OS X 10.5, check for the driver at this path: "/System/Library/Extensions/CUDA.kext", and for the runtime library at this path: "/usr/local/cuda/lib/libcudart.dylib". Mac OS X users will probably need to download and install the CUDA runtime/toolkit. Be aware that the default install options for the CUDA runtime/toolkit do not install the required CUDA driver, so you need to perform a custom install of the runtime/toolkit and be sure to check the checkbox for "CUDA.kext".
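If you prefer to script the checks above rather than look in Explorer or Finder, a small sketch using C++17 std::filesystem will do; it assumes a modern compiler, which the 2009-era systems described here may not have. missingPaths is a hypothetical helper, and the paths you pass are the ones listed above for your OS.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Returns the entries of `paths` that do not exist on this machine.
std::vector<std::string> missingPaths(const std::vector<std::string>& paths)
{
    std::vector<std::string> missing;
    for (const auto& p : paths) {
        std::error_code ec;  // non-throwing overload: any error counts as missing
        if (!std::filesystem::exists(p, ec))
            missing.push_back(p);
    }
    return missing;
}
```

For example, on Mac OS X, missingPaths({"/System/Library/Extensions/CUDA.kext", "/usr/local/cuda/lib/libcudart.dylib"}) returns whichever of the driver and runtime library is absent.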
- only available as a 32-bit executable for Windows XP and later, and Intel Mac OS X 10.5.2 and later. Due to time constraints, other systems such as GNU/Linux are not available at this time. You are most welcome to modify/build/test it for other systems if you feel up to the challenge :)
- "low end" GPUs are "slow", i.e., they do not contribute much of the processing. For example, to create 128MB (256 blocks of 524288 bytes) of parity data on a 128MB 8600M GT in a Core 2 Duo 2.2GHz machine, about 2% of the workload was offloaded to the GPU. For the same 128MB of parity data, a 256MB 8600M GT in a Core 2 Duo 2.4GHz machine offloaded about 5% of the workload to the GPU (mainly because having more memory allowed more data to be processed on the video card). It is expected that "high end" video cards will offload an even larger share of the work, but without access to such a video card (yes, some of us can't splurge on that top-of-the-line video card!), it's mere speculation as to what sort of performance will occur. :) Maybe someone will send an email with some answers :)
- sometimes the CUDA runtime reports little or no available memory on the video card for use by programs, which will result in this version not being able to use the GPU for processing. This problem is probably related to video display acceleration by the OS, in which case, closing windows and/or applications will probably free up video memory. It may, however, require a reboot to reset the video card (you should do this only as a last resort).
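As a check on the figures in the "low end" GPU example, the parity data size works out as follows:

```latex
256 \times 524288\ \text{bytes} = 2^{8} \times 2^{19}\ \text{bytes} = 2^{27}\ \text{bytes} = 134217728\ \text{bytes} = 128\ \text{MiB}
```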
See the "--- About the NVIDIA CUDA version ---" section in the distribution's README_FIRST.txt file for more information such as source code building instructions/requirements.
Copyright 2007-2009, chuchusoft.com