
NULLs: What're they all about?

Supplier: Manufacturing Software
22 April, 2008

Long-time ManuSoft users will probably already know that NULLs in your ManuSoft data files are a problem, but you may not know exactly what they are, why they are a problem, or how they get there. This entry is a quick discussion of the topic.

So, firstly, why "NULL" in all capitals? Because we are referring to a particular 'control code' defined by the ASCII character set (the ASCII standard itself abbreviates it to "NUL"). All modern PCs use character sets built on ASCII, which defines how the letters, numbers and symbols on your keyboard are stored as numbers in memory, and also defines a number of non-displaying characters, like "CR" (Carriage Return), "LF" (Line Feed) and one called "NULL".

These non-displaying 'control codes' take up the first 32 positions in the ASCII table, numbered from 0 to 31, and NULL takes position 0.

ManuSoft has always attempted to avoid all control codes in its data files, for several reasons. Avoiding the NULL character is particularly important because in C/C++, the programming language ManuSoft is written in, the NULL character has a special meaning: it marks the end of a string of characters.

So, say ManuSoft wants to read some information, like the standard wording for a Purchase Order, from your WORK.DAT file. In the C++ code we open the WORK.DAT file, position the read pointer at the particular location where that standard wording is saved, and issue a command to read in a block of 600 characters (the maximum length of the standard wording text).

If, for whatever reason, there was a NULL character immediately after the 200th character, then only the first 200 characters would actually be used, because C/C++ string handling stops at the first NULL it meets. All the text you had typed in beyond that point, up to position 600, would be ignored and would not get printed.
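
To make the truncation concrete, here is a minimal sketch. It is not ManuSoft's actual code: the helper names `make_block_with_null` and `visible_length` are invented for illustration, and the 600-character block is simply filled with 'x' characters. It shows how code that treats a block as a C string sees only the text before the first NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Builds a 600-byte block of 'x' characters with a single NULL placed
// immediately after the first `text_length` characters (illustrative data).
std::vector<char> make_block_with_null(std::size_t text_length) {
    assert(text_length < 600);
    std::vector<char> block(600, 'x');
    block[text_length] = '\0';
    return block;
}

// Length of the text as C string handling sees it: everything up to, but not
// including, the first NULL. If there is no NULL, the whole block counts.
std::size_t visible_length(const std::vector<char>& block) {
    const void* hit = std::memchr(block.data(), '\0', block.size());
    if (hit == nullptr) {
        return block.size();
    }
    return static_cast<std::size_t>(static_cast<const char*>(hit) - block.data());
}
```

With a NULL directly after the 200th character, `visible_length(make_block_with_null(200))` returns 200 even though all 600 bytes are sitting in memory. The remaining bytes are still there, they are just never looked at.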

Now that example is not particularly dramatic, but consider other possibilities, like a NULL in the middle of a stock record, so that the stock code and description read and update just fine but the quantity in stock never seems to change, and you can see what sorts of problems can appear. Sometimes quite inexplicable behaviour can be traced back to a NULL in an odd spot.

So how do NULLs appear in your files? ManuSoft obviously tries to avoid creating them itself wherever possible, but some further facts about the behaviour of C/C++ and Windows make them crop up from time to time. Firstly, there's the problem of buffers.

When ManuSoft wants to create a brand new stock record, for example, we initialize a 1024 character "buffer" in memory. We would then normally fill that buffer with exactly 1024 characters before writing the contents to the end of the file.

But if for some reason the buffer was only filled with 500 characters then position 501 would contain a NULL (to mark the end of the string), and that NULL could then be written to the file along with the rest of the buffer.
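
The buffer problem can be sketched as follows. This is an illustration only, under two assumptions that the text above does not state: that the buffer starts out zero-filled, and that the fix is to pad unused space with spaces. The names `build_record_unpadded`, `build_record_padded` and `count_nulls` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

constexpr std::size_t kRecordSize = 1024;  // record size used in the text above

// Copies `text` into a zero-filled buffer as a C string. The terminating NULL
// (and, in this sketch, the zero fill after it) stays in the buffer, so writing
// all 1024 bytes to disk would put NULLs in the file.
std::vector<char> build_record_unpadded(const std::string& text) {
    assert(text.size() < kRecordSize);
    std::vector<char> buffer(kRecordSize, '\0');
    std::memcpy(buffer.data(), text.c_str(), text.size() + 1);  // +1 copies the NULL
    return buffer;
}

// Same record, but the unused space is padded with spaces, so no NULL is
// ever written.
std::vector<char> build_record_padded(const std::string& text) {
    assert(text.size() <= kRecordSize);
    std::vector<char> buffer(kRecordSize, ' ');
    std::memcpy(buffer.data(), text.data(), text.size());
    return buffer;
}

// Counts the NULL bytes in a buffer.
std::size_t count_nulls(const std::vector<char>& bytes) {
    return static_cast<std::size_t>(std::count(bytes.begin(), bytes.end(), '\0'));
}
```

With a 500-character text, the unpadded record has its first NULL at position 501 (index 500), matching the description above, while the padded record contains no NULLs at all.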

A second main cause is a file being expanded when it is written to. If a file is 50,000 bytes long and a C/C++ program asks Windows to write some characters to the file at position 55,000, then characters 50,001 to 54,999 in the file will end up being NULLs.
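
The file-expansion effect is easy to reproduce. The sketch below uses standard C++ file streams rather than anything specific to ManuSoft, and the function names and the file name in the usage note are made up for the example. It relies on the operating system zero-filling the gap, which POSIX systems guarantee and which is also the typical behaviour on Windows file systems such as NTFS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Creates a file of `initial_size` 'x' bytes, then writes `text` starting at
// the zero-based `offset`, which may lie beyond the current end of the file.
void write_past_end(const std::string& path, std::size_t initial_size,
                    std::size_t offset, const std::string& text) {
    {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        out << std::string(initial_size, 'x');
    }  // closing the stream flushes the initial contents to disk

    std::fstream file(path, std::ios::binary | std::ios::in | std::ios::out);
    assert(file.is_open());
    file.seekp(static_cast<std::streamoff>(offset), std::ios::beg);
    file.write(text.data(), static_cast<std::streamsize>(text.size()));
}

// Reads the whole file back into memory.
std::vector<char> read_all(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Counts the NULL bytes in a buffer.
std::size_t count_nulls(const std::vector<char>& bytes) {
    return static_cast<std::size_t>(std::count(bytes.begin(), bytes.end(), '\0'));
}
```

Calling `write_past_end("gap_demo.dat", 50000, 54999, "DATA")` starts the write at the 55,000th character (zero-based offset 54,999). Reading the file back with `read_all` then shows 4,999 NULLs, filling characters 50,001 to 54,999 exactly as described above.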

And then there are just the regular glitches that can happen on any computer network given enough traffic and enough time. Because the NULL character is the "default" value for unallocated memory and the like, if anything goes wrong while information is being transferred inside a computer or around a network, there's a good chance a NULL could end up in the data somewhere.

I said earlier that ManuSoft has always tried to avoid control characters in its data files, but over time we've had to compromise our position on this. The first compromise came when we made the ManuSoft .DAT files compatible with ODBC access, back in Version 6.1.

To allow the Microsoft ODBC Text Driver to read our .DAT files we had to insert the control characters "CR" and "LF" at the end of every record in the database, as this is the Microsoft DOS/Windows standard for marking the "end of a line". The next big change was when we converted to using the .DBF file format in Version 6.4 of ManuSoft.

The .DBF file format defines a "header" that must appear at the start of every file, before the actual data. This header information includes many NULLs and other control codes. But despite these changes over the years the actual data itself inside our data files continues to exclude all control characters.

Because NULLs in particular are a problem in our data files we created a utility called "FixNULLs" which you can use to clean up your data files, changing all NULL characters to the harmless "space" character instead. This utility used to be a very ordinary looking "DOS mode" program which you had to run at the command prompt.
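
The core of such a clean-up is simple. The sketch below is not the FixNULLs source, just an illustration of the idea on a block of bytes already loaded into memory. The function name `replace_nulls_with_spaces` and its optional `start` parameter are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Replaces every NULL byte from index `start` onward with a space and returns
// how many bytes were changed. Bytes before `start` are left untouched.
std::size_t replace_nulls_with_spaces(std::vector<char>& bytes,
                                      std::size_t start = 0) {
    assert(start <= bytes.size());
    std::size_t replaced = 0;
    for (std::size_t i = start; i < bytes.size(); ++i) {
        if (bytes[i] == '\0') {
            bytes[i] = ' ';
            ++replaced;
        }
    }
    return replaced;
}
```

Because each NULL is swapped for exactly one space, the file length and the position of every record stay the same, which is what makes this kind of repair low-risk.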

After the change to the .DBF file format, this utility was updated to have a Windows interface and to be a lot more user-friendly. You can queue up a number of files to be checked (by browsing for them, or by dragging and dropping them from Windows Explorer), and when run the program gives more feedback and automatically skips "safe" NULL characters in the header section of any .DBF files.
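
One way a tool could skip the header is sketched below. How FixNULLs itself decides which NULLs are "safe" is not described here, so this is only an assumption: it relies on the standard dBASE layout, in which bytes 8 and 9 of a .DBF file hold the header length as a 16-bit little-endian number. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reads the header length from bytes 8 and 9 of a .DBF image (16-bit,
// little-endian). If the data is too short to hold that field, or the stored
// length is larger than the file, the whole file is treated as header so
// that nothing gets modified.
std::size_t dbf_header_length(const std::vector<unsigned char>& file) {
    if (file.size() < 10) {
        return file.size();
    }
    const std::size_t length = static_cast<std::size_t>(file[8]) |
                               (static_cast<std::size_t>(file[9]) << 8);
    return std::min(length, file.size());
}

// Replaces NULLs with spaces in the record area only, leaving the header
// bytes exactly as they were. Returns the number of bytes changed.
std::size_t fix_nulls_after_header(std::vector<unsigned char>& file) {
    std::size_t replaced = 0;
    for (std::size_t i = dbf_header_length(file); i < file.size(); ++i) {
        if (file[i] == 0) {
            file[i] = ' ';
            ++replaced;
        }
    }
    return replaced;
}
```

For a file whose header length field says 32, only bytes from offset 32 onward are examined, so the NULLs that the .DBF header legitimately contains are never touched.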

The FixNULLs program can be downloaded from the Miscellaneous Files section of the support web site. It would be very hard for you to do any unwitting damage to your data files using this utility, but we would generally recommend you only use it after consulting with ManuSoft Support personnel.