Approximation to numbers

Although it may not seem so to the beginner, it is important to examine ways in which numbers are represented.

  1. Number representation

    We humans normally represent a number in decimal (base 10) form, although modern computers use binary (base 2) and hexadecimal (base 16) forms. Numerical calculations usually involve numbers that cannot be represented exactly by a finite number of digits. For instance, the arithmetical operation of division often gives a number which does not terminate; the decimal (base 10) representation of 2/3 is one example. Even a number such as 0.1, which terminates in decimal form, would not terminate if expressed in binary form. There are also the irrational numbers such as π, which do not terminate. In order to carry out a numerical calculation involving such numbers, we are forced to approximate them by a representation involving a finite number of significant digits (S). For practical reasons (for example, the size of the back of an envelope or the `storage' available in a machine), this number is usually quite small. Typically, a `single precision' number on a computer has an accuracy of only about 6 or 7 decimal digits (cf. below).
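    The non-terminating binary form of 0.1 is easy to observe; as a quick illustrative sketch in Python (not part of the original text):

    ```python
    # 0.1 and 0.2 have no finite binary representation, so each is stored
    # as a nearby approximation; the round-off makes the comparison fail.
    print(0.1 + 0.2 == 0.3)   # False
    print(f"{0.1:.20f}")      # shows the stored binary approximation of 0.1
    ```

    The printed digits beyond the first one are the round-off error introduced by the finite binary representation.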

    To five significant digits (5S), 2/3 is represented by 0.66667, π by 3.1416, and √2 by 1.4142. None of these is an exact representation, but all are correct to within half a unit of the fifth significant digit. Numbers should normally be presented in this sense, correct to the number of digits given.

    If the numbers to be represented are very large or very small, it is convenient to write them in floating point notation (for example, the speed of light 2.99792 x 10^8 m/s, or the electronic charge 1.6022 x 10^-19 coulomb). As indicated, we separate the significant digits (the mantissa) from the power of ten (the exponent); the form in which the exponent is chosen so that the magnitude of the mantissa is less than 10, but not less than 1, is referred to as the scientific notation.
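    The mantissa/exponent split described above can be sketched as follows (an illustrative helper, not part of the original text):

    ```python
    import math

    def scientific(x):
        """Split x into (mantissa, exponent) with 1 <= |mantissa| < 10."""
        if x == 0:
            return 0.0, 0
        exponent = math.floor(math.log10(abs(x)))
        mantissa = x / 10**exponent
        return mantissa, exponent

    m, e = scientific(299792458.0)   # speed of light in m/s
    print(f"{m:.5f} x 10^{e}")       # 2.99792 x 10^8
    m, e = scientific(1.6022e-19)    # electronic charge in coulombs
    print(f"{m:.4f} x 10^{e}")       # 1.6022 x 10^-19
    ```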

    In 1985, the Institute of Electrical and Electronics Engineers published a standard for binary floating point arithmetic. This standard, known as the IEEE Standard 754, has been widely adopted (it is very common on workstations used for scientific computation). The standard specifies a format for `single precision' numbers and a format for `double precision' numbers. The single precision format allows 32 binary digits (known as bits) for a floating point number, with 23 of these bits allocated to the mantissa. In the double precision format the values are 64 and 52 bits, respectively. On conversion from binary to decimal, it turns out that any IEEE Standard 754 single precision number has an accuracy of about six or seven decimal digits, and a double precision number an accuracy of about 15 or 16 decimal digits.
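    The difference between the two precisions can be seen by storing a number in single precision and reading it back; a sketch using Python's standard `struct` module (Python floats themselves are IEEE 754 double precision):

    ```python
    import struct

    def to_single(x):
        """Round-trip x through IEEE 754 single precision (32-bit) storage."""
        return struct.unpack('f', struct.pack('f', x))[0]

    x = 2 / 3
    print(f"double: {x:.17f}")             # about 15-16 digits are accurate
    print(f"single: {to_single(x):.17f}")  # only about 6-7 digits are accurate
    ```

    The single precision value agrees with 2/3 only in its first six or seven significant digits, as the text states.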

  2. Round-off error

    The simplest way of reducing the number of significant digits in the representation of a number is merely to ignore the unwanted digits. This procedure, known as chopping, was used by many early computers. A more common and better procedure is rounding, which involves adding 5 to the first unwanted digit and then chopping. For example, π chopped to four decimal places (4D) is 3.1415, but it is 3.1416 when rounded; the representation 3.1416 is correct to five significant digits (5S). The error involved in the reduction of the number of digits is called round-off error. Since π is 3.14159..., we note that chopping has introduced much more round-off error than rounding.
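    The two procedures can be sketched directly from their definitions (illustrative helpers, assuming non-negative x; not part of the original text):

    ```python
    import math

    def chop(x, d):
        """Chop x to d decimal places: simply discard the unwanted digits."""
        return math.trunc(x * 10**d) / 10**d

    def round_half_up(x, d):
        """Round x to d decimal places: add 5 to the first unwanted
        digit (i.e. 0.5 in the last kept place), then chop."""
        return math.trunc(x * 10**d + 0.5) / 10**d

    print(chop(math.pi, 4))           # 3.1415
    print(round_half_up(math.pi, 4))  # 3.1416
    ```

    Applied to π, chopping leaves an error of about 0.000093, while rounding leaves only about 0.000007, in line with the remark above.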

  3. Truncation error

    Numerical results are often obtained by truncating an infinite series or iterative process (cf. STEP 5). Whereas round-off error can be reduced by working to more significant digits, truncation errors can be reduced by retaining more terms in the series or more steps in the iteration; this, of course, involves extra work (and perhaps expense!).
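    As an illustration of the trade-off described above, the series for e^x can be truncated after a chosen number of terms (a sketch, not from the original text; the exponential series itself is standard):

    ```python
    import math

    def exp_series(x, n_terms):
        """Approximate e^x by truncating its Taylor series after n_terms terms."""
        total, term = 0.0, 1.0
        for k in range(n_terms):
            total += term
            term *= x / (k + 1)   # next term: x^(k+1)/(k+1)!
        return total

    for n in (3, 6, 10):
        approx = exp_series(1.0, n)
        print(n, approx, abs(math.e - approx))  # truncation error shrinks as n grows
    ```

    Retaining more terms reduces the truncation error, at the cost of the extra work of computing them.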

  4. Mistakes

    In the language of Numerical Analysis, a mistake (or blunder) is not an error! A mistake is due to fallibility (usually human, not machine). Mistakes may be trivial, with little or no effect on the accuracy of the calculation, or they may be so serious as to render the calculated results quite wrong. There are three things which may help to eliminate mistakes:
    1. care;
    2. checks, avoiding repetition;
    3. knowledge of the common sources of mistakes.
    Common mistakes include: transposition of digits (for example, reading 6238 as 6328); misreading of repeated digits (for example, reading 62238 as 62338); misreading of tables (for example, referring to a wrong line or a wrong column); incorrectly positioning a decimal point; overlooking signs (especially near sign changes).

  5. Examples

    The following examples illustrate rounding to four decimal places (4D):

    The following example illustrates rounding to four significant digits (4S):
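    The worked examples themselves are not reproduced here, but rounding to 4D and to 4S can be sketched with sample values (illustrative helpers; the sample numbers are taken from this section and its exercises):

    ```python
    import math

    def round_dp(x, d):
        """Round x to d decimal places."""
        return round(x, d)

    def round_sig(x, s):
        """Round x to s significant digits."""
        return round(x, s - 1 - math.floor(math.log10(abs(x))))

    print(round_dp(3.14159, 4))      # 3.1416  (4D)
    print(round_sig(34.78219, 4))    # 34.78   (4S)
    print(round_sig(0.0347821, 4))   # 0.03478 (4S)
    ```

    Note that 4D and 4S coincide only for numbers whose magnitude lies between 0.1 and 1; elsewhere they keep different digits.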


  1. What may limit the accuracy of a number in a calculation?
  2. What is the convention adopted in rounding?
  3. How can mistakes be avoided?


  1. What are the floating point representations of the following numbers: 12.345, 0.80059, 296.844, 0.00519?
  2. For each of the following numbers: 34.78219, 3.478219, 0.3478219, 0.03478219,
    1. chop to three significant digits (3S),
    2. chop to three decimal places (3D),
    3. round to three significant digits (3S),
    4. round to three decimal places (3D).

  3. For the number

    5/3 = 1.66666...

    determine the magnitude of the round-off error when it is represented by a number obtained from the decimal form by:

    1. chopping to 3S,
    2. chopping to 3D,
    3. rounding to 3S,
    4. rounding to 3D.