Tuesday, January 13, 1998
The Year 2000 Problem
The Millennial Bug has been much in the news lately, but this entry is prompted by a letter to the editor appearing in this morning's Providence Journal-Bulletin. The author, reputed to be a math and computer science major at Harvard University, attempted to deflate the year 2000 problem (Y2K as it is known in the industry) as being a non-problem. Now I would agree with anyone who stated that there has been a lot of hype about this problem, some of which has been spread by consulting companies eager to make a quick buck; nevertheless, the problem is quite real. Some companies may need to make a few minor changes to their software; others may need to make extensive revisions. (One major part of the problem, of course, is the task of analyzing all of your software systems to determine the extent of the problem at your site.)
This young man, proving once again that a little knowledge is a dangerous thing, states that there is no problem at all. He knows that computers really deal with ones and zeroes and that numbers up to 127 can be expressed with just seven binary digits, so that 99 and 100 can both be expressed with the same number of bits. He assumes, based on this fact, that he has proven that the millennial change will not require any additional space and is, therefore, a non-problem.
It is rather distressing that this young man, supposedly a student at what is alleged to be one of our finest universities, and one majoring in computer studies at that, can be so ignorant of both the history of computer applications and the current state of affairs. How can he have failed to read any of the numerous studies in the professional literature or in the relevant computer industry trade publications? Indeed, even the popular news magazines (such as Time) and major newspapers (such as the Journal-Bulletin) have published articles giving sufficient information about the problem to show that it is not a question of binary arithmetic.
The data processing field predates computers. More than a century has passed since Hollerith developed the punch card. When your data is stored on punched cards, you are sparing in your use of space. When computers came into commercial use, they may have filled a room, but they had less memory and less computing power than an inexpensive scientific calculator you can pick up in a discount store. Even in the 1960’s a “mainframe” computer may have had 32K of main memory. (Compare that with typical desktop or laptop computers today that might have a thousand times that amount of memory.) It was quite common in the fifties and sixties to design record layouts that used only two digits to store the year field for a date. After all, that is the common short way of expressing a date. The truly important point is not how much main memory was used, but how much storage space the data required. Data was stored on punched cards and on magnetic tape. Each character of data stored takes one byte. If you use two digits instead of four to express the year portion of a date, you save two column positions on a punch card (which typically had 80 positions available) and you save two bytes in each record you write to tape. If you have 500,000 records in a file, each with one date, you save one million bytes of storage. If you have more records and more date fields, then you save even more storage. When disk storage became available, it was quite expensive; all the more reason to save space.
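The arithmetic behind that saving can be sketched in a few lines of Python. This is only an illustration: the function name and the `date_fields_per_record` parameter are my own assumptions, and the figures are the ones quoted above (one byte per character, two digits dropped per year).

```python
# Bytes saved per date by storing a two-digit year instead of a four-digit one,
# assuming one byte per stored character as described above.
BYTES_SAVED_PER_DATE = 4 - 2


def bytes_saved(records: int, date_fields_per_record: int = 1) -> int:
    """Total storage saved across a file (hypothetical helper for illustration)."""
    return records * date_fields_per_record * BYTES_SAVED_PER_DATE


print(bytes_saved(500_000))      # 1000000
print(bytes_saved(500_000, 3))   # 3000000
```

With three date fields per record, the same 500,000-record file saves three million bytes, which was a meaningful amount of tape or disk at the time.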
And why, in the 1960’s and 1970’s, would anyone really be concerned about the year 2000?
Okay, in 1979 I was involved in the design and implementation of a new on-line student record system for a university and we did use four digit years, but we had the advantage of developing all new software and converting the data from old “flat” sequential tape file format to VSAM files. We were aware of the year 2000 and had the luxury of designing for it. Most programmers were faced with maintaining compatibility with older file formats or with older software. Sure, they could have done something, but there has always been a major impediment to intelligent design: management. Systems could have coped with the date problem in the seventies or eighties or early nineties, but proper design and coding would have meant additional time, which means additional cost. Upper management would never allow that: spending money now for a problem so far in the future. And what information systems professional would be so foolish as to put his career on the line by pointing out executive shortsightedness? (Especially considering the comment, which I have heard quite often, “By the time 2000 gets here, I won’t be in this job anyway, so why put my neck on the chopping block now?”)
Ah, but eventually chickens do come home to roost and the year 2000 is quite close now. Many two digit date fields will cause no problem at all. Others will cause major problems. How do you tell one from the other? You have to examine every date field. You have to look at every program. You have to examine the code in those programs, thousands of lines of code, patched and modified over the years by multiple programmers, and determine how each date is used. You have to understand the interdependencies of all of your date fields in all of your programs in all of your systems. After performing that analysis, you can begin to decide how to make changes to avoid problems. You have to write code to allow the computer to decide how to handle the dates. Oh, you will change them all to four digit fields? Okay, now you have to convert and rewrite all of your files. Oh, and change every program that uses those files. And every program that uses data passed on by those other programs. Oh, and your data file conversion routines... they’ll have to know whether the date should begin with 19 or 20 because your file might already have 21st century dates (this license expires in..., this mortgage is paid up in..., etc.).
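One common way a conversion routine can choose between 19 and 20 is a "pivot year" (often called windowing). The sketch below is an assumption on my part, not a description of any particular shop's method: the function name, the pivot values, and the sample fields are all hypothetical, and the right pivot has to be decided field by field after the analysis described above.

```python
def expand_year(yy: int, pivot: int) -> int:
    """Expand a two-digit year to four digits using a pivot (windowing).

    Two-digit years below the pivot are read as 20xx; the rest as 19xx.
    The pivot is an assumption that must be chosen per field.
    """
    if not 0 <= yy <= 99:
        raise ValueError("two-digit year must be between 0 and 99")
    return 2000 + yy if yy < pivot else 1900 + yy


# A license-expiration field might reasonably hold 21st century dates.
print(expand_year(3, pivot=50))    # 2003
print(expand_year(98, pivot=50))   # 1998

# A birth-year field for existing records holds only 20th century dates,
# so a pivot of 0 sends every value to 19xx.
print(expand_year(3, pivot=0))     # 1903
```

The same stored value, 03, comes out as 2003 in one field and 1903 in another, which is why no single rule can be applied blindly to every file.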
And, of course, the year 2000 will not wait until after midnight on December 31, 1999. If you have a system that does sales or production forecasting that projects eighteen months ahead, your Y2K crisis will hit you in the second week of this coming summer.
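To see when a forward-looking system first touches the year 2000, subtract its forecasting horizon from January 1, 2000. This is a minimal sketch using only the Python standard library; the helper name `subtract_months` is my own, and the eighteen-month horizon is the one from the example above.

```python
from datetime import date


def subtract_months(d: date, months: int) -> date:
    """Step a date back by whole months (hypothetical helper).

    The day of the month is kept unchanged, which is safe here because
    the sketch only ever uses the first of the month.
    """
    index = d.year * 12 + (d.month - 1) - months
    return date(index // 12, index % 12 + 1, d.day)


rollover = date(2000, 1, 1)
first_affected_run = subtract_months(rollover, 18)
print(first_affected_run)  # 1998-07-01
```

A forecast run on or after that date projects into 2000, so the two-digit year appears in its output long before the calendar itself turns over.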
The question is not one of adding 1 to 99 and getting 100. The question, rather, is what year is represented by 00 or 01 or 43? Is that 1900 or 2000, 1901 or 2001, 1943 or 2043? Is a payment dated 01/01/00 right on time or is it one hundred years past due? Sort these records by date. Duhhhh.... People can usually make sense of a date. If you see an expiration date on a can of soup that says “Best if used by 04/01/00” you know that you can safely put that item in your shopping cart. Based on context, you can decide that it means April of 2000 and not April of 1900. A computer cannot know that unless a programmer has written code to handle that. That is the year 2000 problem.
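The sorting problem is easy to demonstrate. The sketch below assumes dates stored as text in YYMMDD order, a typical two-digit-year layout; the sample values are made up for illustration.

```python
# Hypothetical payment dates stored as YYMMDD text (two-digit year).
two_digit = ["991231", "000101", "980615"]

# The same three dates stored as YYYYMMDD text (four-digit year).
four_digit = ["19991231", "20000101", "19980615"]

print(sorted(two_digit))   # ['000101', '980615', '991231']
print(sorted(four_digit))  # ['19980615', '19991231', '20000101']
```

With two-digit years, January 1, 2000 sorts ahead of everything from 1998 and 1999. With four-digit years the order is chronological.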
And although COBOL programs are capable of working with binary data and performing binary arithmetic, that has absolutely nothing to do with the Y2K problem. It is simply a question of whether or not the year has been stored as a four digit field. And, although a majority of the business programming that has to be modified was written in COBOL, it is not a COBOL problem; it is a data problem. Programs written in PL/I (or any other language) have the same problem. The roots of the problem, using a two digit representation of the year field, antedate COBOL (which only dates from the early sixties).
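The difference between the binary view and the stored-field view can be shown side by side. This is a sketch under stated assumptions: the function `next_year_field` is hypothetical, and it models a fixed-width two-character year by discarding any digit that does not fit in the field.

```python
# Binary view: 99 and 100 both fit in seven bits, as the letter writer says.
print((99).bit_length(), (100).bit_length())  # 7 7


def next_year_field(yy_field: str) -> str:
    """Advance a two-character year field by one (hypothetical helper).

    The field has room for only two decimal digits, so the carry into
    the hundreds place is dropped, modeled here with a modulo.
    """
    return f"{(int(yy_field) + 1) % 100:02d}"


print(next_year_field("98"))  # 99
print(next_year_field("99"))  # 00
```

The bit count is beside the point. The field holds two decimal characters, and after 99 it holds 00, with nothing in the data to say which century is meant.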
The Y2K problem is real and is serious, but right now I am just amazed at the ignorance displayed by this letter writer. Indeed, since he goes on to stridently admonish businesses that they should stop and investigate before throwing away fortunes, I can only characterize it as arrogant ignorance. I would expect that any student involved with computer studies at any post-secondary institution would have a reasonable understanding of the Y2K problem; that a math and computer major at Harvard would be so ignorant is absolutely appalling.