What Is the Largest Value That Can Be Stored in One Byte



7.4 Computer Memory

Given that we are always going to store our information on a computer, it makes sense for us to find out a little bit about how that information is stored. How does a computer store the letter `A' on a hard drive? What about the value $\frac{1}{3}$ ?

It is useful to know how data is stored on a computer because this will allow us to reason about the amount of space that will be required to store a data set, which in turn will allow us to decide what software or hardware we will need to be able to work with a data set, and to decide upon an appropriate storage format. In order to access a data set correctly, it can also be useful to know how data has been stored; for example, there are many ways that a simple number could be stored. We will also look at some important limitations on how well information can be stored on a computer.

7.4.1 Bits, bytes, and words


[Image: cropped photograph of a CD surface]

The surface of a CD magnified many times to show the pits in the surface that encode data. 7.2

The most fundamental unit of computer memory is the bit. A bit can be a tiny magnetic region on a hard disk, a tiny dent in the reflective material on a CD or DVD, or a tiny transistor on a memory stick. Whatever the physical implementation, the important thing to know about a bit is that, like a switch, it can only take one of two values: it is either "on" or "off".

[Diagram: bits, bytes, and words]

A collection of 8 bits is called a byte and (on the majority of computers today) a collection of 4 bytes, or 32 bits, is called a word. Each individual data value in a data set is normally stored using one or more bytes of memory, but at the lowest level, any data stored on a computer is just a large collection of bits. For example, the first 256 bits (32 bytes) of the electronic format of this book are shown below. At the lowest level, a data set is just a series of zeroes and ones like this.

00100101 01010000 01000100 01000110 00101101 00110001 00101110 00110100 00001010 00110101 00100000 00110000 00100000 01101111 01100010 01101010 00001010 00111100 00111100 00100000 00101111 01010011 00100000 00101111 01000111 01101111 01010100 01101111 00100000 00101111 01000100 00100000      

The number of bytes and words used for an individual data value will vary depending on the storage format, the operating system, and even the computer hardware, but in many cases, a single letter or character of text takes up one byte and an integer, or whole number, takes up one word. A real or decimal number takes up one or two words depending on how it is stored.

For example, the text "hello" would take up 5 bytes of storage, one per character. The text "12345" would also require 5 bytes. The integer 12,345 would take up 4 bytes (1 word), as would the integers 1 and 12,345,678. The real number 123.45 would take up 4 or 8 bytes, as would the values 0.00012345 and 12345000.0.
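These sizes can be checked in Python; this is just a sketch, using the `struct` module's standard size codes so that the widths do not depend on the platform.

```python
import struct

# One byte per ASCII character.
print(len("hello".encode("ascii")))   # 5
print(len("12345".encode("ascii")))   # 5

# Standard-size codes: "=i" is a 4-byte integer (one word),
# "=f" a 4-byte real, "=d" an 8-byte real.
print(struct.calcsize("=i"))          # 4
print(struct.calcsize("=f"))          # 4
print(struct.calcsize("=d"))          # 8
```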

7.4.2 Binary, octal, and hexadecimal

A piece of computer memory can be represented by a series of 0's and 1's, with one digit for each bit of memory; the value 1 represents an "on" bit and a 0 represents an "off" bit. This notation is described as binary form. For example, below is a single byte of memory that contains the letter `A' (ASCII code 65; binary 1000001).

01000001      

A single word of memory contains 32 bits, so it requires 32 digits to represent a word in binary form. A more convenient notation is octal, where each digit represents a value from 0 to 7. Each octal digit is the equivalent of three binary digits, so a byte of memory can be represented by 3 octal digits.

Binary values are pretty easy to spot, but octal values are much harder to distinguish from normal decimal values, so when writing octal values, it is common to precede the digits by a special character, such as a leading `0'.

As an example of octal form, the binary code for the character `A' splits into triplets of binary digits (from the right) like this: 01 000 001. So the octal digits are 101, commonly written 0101 to emphasize the fact that these are octal digits.

An even more efficient way to represent memory is hexadecimal form. Here, each digit represents a value between 0 and 15, with values greater than 9 replaced with the characters a to f. A single hexadecimal digit corresponds to 4 bits, so each byte of memory requires only 2 hexadecimal digits. As with octal, it is common to precede hexadecimal digits with a special character, e.g., 0x or #. The binary form for the character `A' splits into two quadruplets: 0100 0001. The hexadecimal digits are 41, commonly written 0x41 or #41.
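All three notations for the character `A' can be checked with Python's built-in number formatting, as a quick sketch:

```python
# The character 'A' has code 65; show its binary, octal, and
# hexadecimal forms, as discussed above.
code = ord("A")                # 65
print(format(code, "08b"))     # 01000001  (one byte of binary)
print(format(code, "o"))       # 101       (octal, written 0101)
print(format(code, "02x"))     # 41        (hexadecimal, written 0x41)
```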

Another standard practice is to write hexadecimal representations as pairs of digits, corresponding to a single byte, separated by spaces. For example, the memory storage for the text "just testing" (12 bytes) could be represented as follows:

6a 75 73 74 20 74 65 73 74 69 6e 67      

When displaying a block of computer memory, another standard practice is to present three columns of information: the left column presents an offset, a number indicating which byte is shown first on the row; the middle column shows the actual memory contents, typically in hexadecimal form; and the right column shows an interpretation of the memory contents (either characters, or numeric values). For example, the text "just testing" is shown below complete with offset and character display columns.

0  :  6a 75 73 74 20 74 65 73 74 69 6e 67  |  just testing      

We will use this format for displaying raw blocks of memory throughout this section.
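A minimal version of this three-column display can be produced with a few lines of Python. This is only a sketch, not the tool used to produce the displays in this section.

```python
def hexdump(data, width=12):
    """Display bytes as: offset : hex values | printable characters."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        textpart = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{start:2}  :  {hexpart:<{width * 3 - 1}}  |  {textpart}")
    return "\n".join(lines)

print(hexdump(b"just testing"))
#  0  :  6a 75 73 74 20 74 65 73 74 69 6e 67  |  just testing
```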


7.4.3 Numbers

Recall that the most basic unit of memory, the bit, has two possible states, "on" or "off". If we used one bit to store a number, we could use each different state to represent a different number. For example, a bit could be used to represent the numbers 0, when the bit is off, and 1, when the bit is on.

We will need to store numbers much larger than 1; to do that we need more bits.

If we use two bits together to store a number, each bit has two possible states, so there are four possible combined states: both bits off, first bit off and second bit on, first bit on and second bit off, or both bits on. Again using each state to represent a different number, we could store four numbers using two bits: 0, 1, 2, and 3.

The settings for a series of bits are typically written using a 0 for off and a 1 for on. For example, the four possible states for two bits are 00, 01, 10, 11. This representation is called binary notation.

In general, if we use $k$ bits, each bit has two possible states, and the bits combined can represent $2^k$ possible states, so with $k$ bits, we could represent the numbers 0, 1, 2 up to $2^k - 1$.
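The general rule is easy to verify for the bit counts used in this section:

```python
# k bits give 2**k states, enough for the integers 0 to 2**k - 1.
for k in (1, 2, 8, 32):
    print(k, "bits: 0 to", 2**k - 1)
# 1 bit covers 0..1, 2 bits 0..3, a byte 0..255,
# and a 32-bit word 0..4294967295
```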

7.4.3.1 Integers

Integers are commonly stored using a word of memory, which is 4 bytes or 32 bits, so integers from 0 up to 4,294,967,295 ($2^{32} - 1$) can be stored. Below are the integers 1 to 5 stored as four-byte values (each row represents one integer).

 0  :  00000001 00000000 00000000 00000000  |  1
 4  :  00000010 00000000 00000000 00000000  |  2
 8  :  00000011 00000000 00000000 00000000  |  3
12  :  00000100 00000000 00000000 00000000  |  4
16  :  00000101 00000000 00000000 00000000  |  5

This may look a little strange; within each byte (each block of 8 bits), the bits are written from right to left like we are used to in normal decimal notation, but the bytes themselves are written left to right! It turns out that the computer does not mind which order the bytes are used (as long as we tell the computer what the order is) and most software uses this left to right order for bytes. 7.3

Two problems should immediately be apparent: this does not allow for negative values, and very large integers, $2^{32}$ or greater, cannot be stored in a word of memory.

In practice, the first problem is solved by sacrificing one bit to indicate whether the number is positive or negative, so the range becomes -2,147,483,648 to 2,147,483,647 ($-2^{31}$ to $2^{31} - 1$ in the usual two's complement scheme).

The second problem, that we cannot store very large integers, is an inherent limit to storing information on a computer (in finite memory) and is worth remembering when working with very large values. Solutions include: using more memory to store integers, e.g., two words per integer, which uses up more memory, so is less memory-efficient; storing integers as real numbers, which can introduce inaccuracies (see below); or using arbitrary-precision arithmetic, which uses as much memory per integer as is needed, but makes calculations with the values slower.

Depending on the computer language, it may also be possible to specify that only positive (unsigned) integers are required (i.e., reclaim the sign bit), in order to gain a greater upper limit. Conversely, if only very small integer values are needed, it may be possible to use a smaller number of bytes or even to work with only a couple of bits (less than a byte).
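These limits, and the unsigned and arbitrary-precision alternatives, can be demonstrated in Python, whose `struct` module enforces the 32-bit ranges while Python's own integers are arbitrary-precision:

```python
import struct

# A signed 32-bit integer must lie in [-2**31, 2**31 - 1];
# struct refuses values outside that range.
struct.pack("=i", 2**31 - 1)           # largest signed value: fits
try:
    struct.pack("=i", 2**31)           # one too large
except struct.error as err:
    print("overflow:", err)

# Reclaiming the sign bit ("=I", unsigned) raises the limit to 2**32 - 1.
struct.pack("=I", 2**32 - 1)

# Python's own integers use arbitrary-precision arithmetic, trading
# speed and memory for unlimited range.
print(2**64)  # 18446744073709551616
```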


7.4.3.2 Real numbers

Real numbers (and rationals) are much harder to store digitally than integers.

Recall that $k$ bits can represent $2^k$ different states. For integers, the first state can represent 0, the second state can represent 1, the third state can represent 2, and so on. We can only go as high as the integer $2^k - 1$, but at least we know that we can account for all of the integers up to that point.

Unfortunately, we cannot do the same thing for reals. We could say that the first state represents 0, but what does the second state represent? 0.1? 0.01? 0.00000001? Suppose we chose 0.01, so the first state represents 0, the second state represents 0.01, the third state represents 0.02, and so on. We can now only go as high as $0.01 \times (2^k - 1)$, and we have missed all of the numbers between 0.01 and 0.02 (and all of the numbers between 0.02 and 0.03, and infinitely many others).

This is another important limitation of storing information on a computer: there is a limit to the precision that we can achieve when we store real numbers. Most real values cannot be stored exactly on a computer. Examples of this problem include not only exotic values such as transcendental numbers (e.g., $\pi$ and e), but also very simple everyday values such as $\frac{1}{3}$ or even 0.1. This is not as dreadful as it sounds, because even if the exact value cannot be stored, a value very very close to the true value can be stored. For example, if we use eight bytes to store a real number then we can store the distance of the earth from the sun to the nearest millimetre. So for practical purposes this is usually not an issue.

The limitation on numerical accuracy rarely has an effect on stored values because it is very hard to obtain a scientific measurement with this level of precision. However, when performing many calculations, even tiny errors in stored values can accumulate and result in significant problems. We will revisit this issue in Chapter 11. Solutions to storing real values with full precision include: using even more memory per value, particularly in working memory (e.g., 80 bits instead of 64), and using arbitrary-precision arithmetic.
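The accumulation of tiny errors is easy to see in any language with standard floating-point arithmetic. For example, in Python:

```python
# 0.1 cannot be stored exactly in binary floating point, so tiny
# errors accumulate over repeated calculations.
total = 0.0
for _ in range(10):
    total += 0.1
print(total)         # 0.9999999999999999
print(total == 1.0)  # False
```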

A real number is stored as a floating-point number, which means that it is stored as two values: a mantissa, $m$, and an exponent, $e$, in the form $m \times 2^e$. When a single word is used to store a real number, a typical arrangement 7.4 uses 8 bits for the exponent and 23 bits for the mantissa (plus one bit to indicate the sign of the number).

[Diagram: floating-point storage]

The exponent mostly dictates the range of possible values. Eight bits allows for a range of integers from -127 to 127, which means that it is possible to store numbers as small as $10^{-39}$ ($2^{-127}$) and as large as $10^{38}$ ($2^{127}$). 7.5

The mantissa dictates the precision with which values can be represented. The issue here is not the magnitude of a value (whether it is very large or very small), but the amount of precision that can be represented. With 23 bits, it is possible to represent $2^{23}$ different real values, which is a lot of values, but still leaves a lot of gaps. For example, if we are dealing with values in the range 0 to 1, we can take steps of $\frac{1}{2^{23}} \approx 0.0000001$, which means that we cannot represent any of the values between 0.0000001 and 0.0000002. In other words, we cannot distinguish between numbers that differ from each other by less than 0.0000001. If we deal with values in the range 0 to 10,000,000, we can only take steps of $\frac{10,000,000}{2^{23}} \approx 1$, so we cannot distinguish between values that differ from each other by less than 1.

Below are the real values 1.0 to 5.0 stored as four-byte values (each row represents one real value). Recall that the bytes are ordered from left to right, so the most important byte (containing the sign bit and most of the exponent) is the one on the right. The first bit of the byte second from the right is the last bit of the mantissa.

 0  :  00000000 00000000 10000000 00111111  |  1
 4  :  00000000 00000000 00000000 01000000  |  2
 8  :  00000000 00000000 01000000 01000000  |  3
12  :  00000000 00000000 10000000 01000000  |  4
16  :  00000000 00000000 10100000 01000000  |  5

For example, the exponent for the first value is 0111111 1, which is 127. These exponents are "biased" by 127, so to get the final exponent we subtract 127 to get 0. The mantissa has an implicit value of 1 plus, for bit $i$, the value $2^{-i}$. In this case, the entire mantissa is zero, so the mantissa is just the (implicit) value 1. The final value is $2^0 \times 1 = 1$.

For the last value, the exponent is 1000000 1, which is 129, less 127 is 2. The mantissa is 01 followed by 21 zeroes, which represents a value of (implicit) $1 + 2^{-2} = 1.25$. The final value is $2^2 \times 1.25 = 5$.
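The worked example for the value 5 can be checked by unpacking the sign, exponent, and mantissa fields of the 32-bit representation; this is a sketch in Python:

```python
import struct

# Pack 5.0 as a big-endian 32-bit float and pull out the IEEE fields.
bits = int.from_bytes(struct.pack(">f", 5.0), "big")
sign = bits >> 31                      # 0 (positive)
exponent = (bits >> 23) & 0xff         # 129; subtract the bias 127 -> 2
mantissa = (bits & 0x7fffff) / 2**23   # 0.25; add the implicit 1 -> 1.25

print(sign, exponent - 127, 1 + mantissa)    # 0 2 1.25
print(2**(exponent - 127) * (1 + mantissa))  # 5.0
```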

When real numbers are stored using two words instead of one, the range of possible values and the precision of stored values increases enormously, but there are still limits.


7.4.4 Case study: Network traffic

The central IT department of the University of Auckland collects network traffic data. Measurements were made on each packet of data that passed through a certain location on the network. These measurements included the time at which the packet reached the network location and the size of the packet.

The time measurements are the time elapsed, in seconds, since January $1^{\rm st}$ 1970 and the measurements are extremely accurate, being recorded to the nearest 10,000 ${}^{\rm th}$ of a second. Over time, this has resulted in numbers that are both very large (there are 31,536,000 seconds in a year) and very precise. Figure 7.2 shows several lines of the data stored as plain text.

Figure 7.2: Several lines of network packet data as a plain text file. The number on the left is the number of seconds since January 1 ${}^{\rm st}$ 1970 and the number on the right is the size of the packet (in bytes).

1156748010.47817   60
1156748010.47865 1254
1156748010.47878 1514
1156748010.4789  1494
1156748010.47892  114
1156748010.47891 1514
1156748010.47903 1394
1156748010.47903 1514
1156748010.47905   60
1156748010.47929   60
...

By the middle of 2007, the measurements were approaching the limits of precision for floating point values.

The data were analysed in a system that used 8 bytes per floating point number (i.e., 64-bit floating-point values). The IEEE standard for 64-bit or "double-precision" floating-point values uses 52 bits for the mantissa. This allows for approximately 7.6 $2^{52}$ different real values. In the range 0 to 1, this allows for values that differ by as little as $\frac{1}{2^{52}} \approx 0.0000000000000002$, but when the numbers are very large, for example on the order of 1,000,000,000, it is only possible to store values that differ by $1,000,000,000 \times \frac{1}{2^{52}} \approx 0.0000002$. In other words, double-precision floating-point values can be stored with up to only 16 significant digits.
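Python's `math.ulp` reports the gap between a double-precision value and the next representable value, which confirms that the step size grows with magnitude:

```python
import math

# The gap between adjacent doubles near 1 is 2**-52 (about 2e-16);
# near 1,000,000,000 it has grown to 2**-23 (about 1.2e-7).
print(math.ulp(1.0))              # 2.220446049250313e-16
print(math.ulp(1_000_000_000.0))  # 1.1920928955078125e-07
```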

The time measurements for the network packets differ by as little as 0.00001 seconds. Put another way, the measurements have 15 significant digits, which means that it is possible to store them with full precision as 64-bit floating-point values, but only just.

Furthermore, with values so close to the limits of precision, arithmetic performed on these values can become inaccurate. This story is taken up again in Section 11.5.14.

7.4.5 Text

Text is stored on a computer by first converting each character to an integer and then storing the integer. For example, to store the letter `A', we will actually store the number 65; `B' is 66, `C' is 67, and so on.

A letter is usually stored using a single byte (8 bits). Each letter of the alphabet is assigned an integer number and that number is stored. For example, the letter `A' is the number 65, which looks like this in binary format: 01000001. The text "hello" (104, 101, 108, 108, 111) would look like this: 01101000 01100101 01101100 01101100 01101111
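The character-to-integer mapping is exposed directly in Python via `ord` and `chr`, so the example above is easy to reproduce:

```python
# Each character is stored as its integer code; "hello" becomes
# the byte values 104, 101, 108, 108, 111.
codes = [ord(c) for c in "hello"]
print(codes)                                  # [104, 101, 108, 108, 111]
print(" ".join(format(c, "08b") for c in codes))
# 01101000 01100101 01101100 01101100 01101111
print("".join(chr(c) for c in codes))         # hello
```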

The conversion of letters to numbers is called an encoding. The encoding used in the examples above is called ASCII 7.7 and is fine for storing (American) English text. Other languages require other encodings in order to allow non-English characters, such as `ö'.

ASCII only uses 7 of the 8 bits in a byte, so a number of other encodings are just extensions of ASCII where any number of the form 0xxxxxxx matches the ASCII encoding and the numbers of the form 1xxxxxxx specify different characters for a specific set of languages. Some common encodings of this form are the ISO 8859 family of encodings, such as ISO-8859-1 or Latin-1 for West European languages, and ISO-8859-2 or Latin-2 for East European languages.

Even using all 8 bits of a byte, it is only possible to encode 256 ($2^8$) different characters. Several Asian and Middle-Eastern countries have written languages that use several thousand different characters (e.g., Japanese Kanji ideographs). In order to store text in these languages, it is necessary to use a multi-byte encoding scheme where more than one byte is used to store each character.

UNICODE is an attempt to provide an encoding for all of the characters in all of the languages of the World. Every character has its own number, often written in the form U+xxxxxx. For example, the letter `A' is U+000041 7.8 and the letter `ö' is U+0000F6. UNICODE encodes many thousands of characters, so requires more than one byte to store each character. On Windows, UNICODE text will typically use 2 bytes per character; on Linux, the number of bytes will vary depending on which characters are stored (if the text is only ASCII it will only take one byte per character).

For example, the text "just testing" is shown below saved via Microsoft's Notepad in three different encodings: ASCII, UNICODE, and UTF-8.

0  :  6a 75 73 74 20 74 65 73 74 69 6e 67  |  just testing      

The ASCII format contains exactly one byte per character. The fourth byte is the binary code for the decimal value 116, which is the ASCII code for the letter `t'. We can see this byte pattern several more times, wherever there is a `t' in the text.

 0  :  ff fe 6a 00 75 00 73 00 74 00 20 00  |  ..j.u.s.t. .
12  :  74 00 65 00 73 00 74 00 69 00 6e 00  |  t.e.s.t.i.n.
24  :  67 00                                |  g.

The UNICODE format differs from the ASCII format in two ways. For every byte in the ASCII file, there are now two bytes, one containing the binary code we saw before followed by a byte containing all zeroes. There are also two additional bytes at the start. These are called a byte order mark (BOM) and indicate the order (endianness) of the two bytes that make up each letter in the text.

 0  :  ef bb bf 6a 75 73 74 20 74 65 73 74  |  ...just test
12  :  69 6e 67                             |  ing

The UTF-8 format is mostly the same as the ASCII format; each letter has only one byte, with the same binary code as before because these are all common English letters. The difference is that there are three bytes at the start to act as a BOM. 7.9
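These byte patterns can be reproduced in Python; the `utf-8-sig` codec adds the `ef bb bf` marker, and prefixing a `ff fe` BOM to little-endian UTF-16 matches what Notepad calls "Unicode". This is a sketch, not the exact files Notepad writes.

```python
text = "just testing"

ascii_bytes = text.encode("ascii")                     # one byte per letter
utf16_bytes = b"\xff\xfe" + text.encode("utf-16-le")   # BOM + 2 bytes/letter
utf8_bytes = text.encode("utf-8-sig")                  # ef bb bf BOM + ASCII

print(ascii_bytes.hex(" "))  # 6a 75 73 74 20 74 65 73 74 69 6e 67
print(utf16_bytes.hex(" "))  # ff fe 6a 00 75 00 73 00 74 00 20 00 ...
print(utf8_bytes.hex(" "))   # ef bb bf 6a 75 73 74 20 74 65 73 74 ...
```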

7.4.6 Data with units or labels

When storing values with a known range, it can be useful to take advantage of that knowledge. For example, suppose we want to store information on gender. There are (usually) only two possible values: male and female. One way to store this information would be as text: "male" and "female". However, that approach would take up at least four to six bytes per observation. We could do better by storing the information as an integer, with 1 representing male and 2 representing female, thereby using as little as one byte per observation. We could do even better by using just a single bit per observation, with "on" representing male and "off" representing female.

On the other hand, storing "male" is much less likely to lead to confusion than storing 1 or setting a bit to "on"; it is much easier to remember or intuit that "male" corresponds to male. This leads us to an ideal solution where only a number is stored, but the encoding relating "male" to 1 is also stored.
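This idea, compact integer codes plus a single stored mapping back to labels, can be sketched in a few lines of Python (the variable names here are hypothetical):

```python
# One copy of the encoding, plus one small integer per observation.
encoding = {1: "male", 2: "female"}
observations = [1, 1, 2, 1, 2]            # compact storage

# Recover the human-readable labels when needed.
labels = [encoding[obs] for obs in observations]
print(labels)  # ['male', 'male', 'female', 'male', 'female']
```

This is essentially what statistical software does with categorical data (e.g., factors in R).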

7.4.6.1 Dates

Dates are commonly stored as either text, such as February 1 2006, or as a number, for example, the number of days since 1970. A number of complications arise due to a variety of factors:

language and culture
One problem with storing dates as text is that the format can differ between countries. For example, the second month of the year is called February in English-speaking countries, but something else in other countries. A more subtle and dangerous problem arises when dates are written in formats like this: 01/03/06. In some countries, that is the first of March 2006, but in other countries it is the third of January 2006.
time zones
Dates (a particular day) are commonly distinguished from datetimes, which specify not only a particular day, but also the hour, second, and even fractions of a second within that day. Datetimes are more complicated to work with because they depend on location; mid-day on the first of March 2006 happens at different times in different countries (in different time zones). Daylight saving only makes things worse.
changing calendars
The current international standard for expressing the date is the Gregorian calendar. Issues can arise because events may be recorded using a different calendar (e.g., the Islamic calendar or the Chinese calendar) or events may have occurred prior to the existence of the Gregorian calendar (pre 16th century).
The important point is that we need to think about how we store dates, how much accuracy we should retain, and we must ensure that we store dates in an unambiguous way (for example, including a time zone or a locale). We will return to this issue later when we discuss the merits of different standard storage formats.
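As a sketch of unambiguous storage, Python's `datetime` module can attach an explicit offset and print an ISO 8601 string, where the year-month-day order and the time zone are both unmistakable (the +13:00 offset for New Zealand daylight time is assumed for illustration):

```python
from datetime import datetime, timezone, timedelta

# Mid-day on the first of March 2006, with an explicit UTC offset.
nzdt = timezone(timedelta(hours=13))   # assumed NZ daylight-time offset
moment = datetime(2006, 3, 1, 12, 0, tzinfo=nzdt)

print(moment.isoformat())                            # 2006-03-01T12:00:00+13:00
print(moment.astimezone(timezone.utc).isoformat())   # 2006-02-28T23:00:00+00:00
```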

7.4.6.2 Money

There are two major issues with storing monetary values. The first is that the currency should be recorded; NZ$1.00 is very different from US$1.00. This issue applies of course to any value with a unit, such as temperature, weight, distances, etc.

The second issue with storing monetary values is that values need to be recorded exactly. Typically, we want to keep values to exactly two decimal places at all times. This is sometimes solved by using fixed-point representations of numbers rather than floating-point; the problems of lack of precision do not disappear, but they become predictable so that they can be dealt with in a rational fashion (e.g., rounding schemes).
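Python's `decimal` module illustrates the exact-representation approach: decimal values are stored exactly and the rounding scheme is chosen explicitly, unlike binary floating point.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent 0.1 or 0.2 exactly...
print(0.1 + 0.2 == 0.3)                                 # False
# ...but Decimal stores decimal digits exactly.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# Rounding to two decimal places is explicit and predictable.
price = Decimal("1.10") * 3
print(price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 3.30
```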

7.4.7 Binary values

In the standard examples we have seen so far (text and numbers), a single letter or number has been stored in one or more bytes. These are good general solutions; for example, if we want to store a number, but we do not know how big or small the number is going to be, then the standard storage methods will allow us to store pretty much whatever number turns up. Another way to put it is that if we use standard storage formats then we do not have to think too hard.

It is also true that computers are designed and optimised, right down to the hardware, for these standard formats, so it usually makes sense to stick with the mainstream solution. In other words, if we use standard storage formats then we do not have to work too hard.

All values stored electronically can be described as binary values because everything is ultimately stored using one or more bits; the value can be written as a series of 0's and 1's. However, we will distinguish between the very standard storage formats that we have seen so far, and less common formats which make use of a computer byte in more unusual ways, or even use only fractional parts of a byte.

An example of a binary format is a common solution that is used for storing colour data. Colours are often specified as a triplet of red, green, and blue intensities. For example, the colour (bright) "red" is as much red as possible, no green, and no blue. We could represent the amount of each colour as a number, say, from 0 to 1, which would mean that a single colour value would require at least 3 words (12 bytes) of storage.

A much more efficient way to store a colour value is to use just a single byte for each intensity. This allows 256 ($2^8$) different levels of red, 256 levels of blue, and 256 levels of green, for an overall total of more than 16 million different colour specifications. Given the limitations on the human visual system's ability to distinguish between colours, this is more than enough different colours. 7.10 Rather than using 3 bytes per colour, often an entire word (4 bytes) is used, with the extra byte available to encode a level of translucency for the colour. Then the colour "red" (as much red as possible, no green and no blue) could be represented like this:

00 ff 00 00      
or
00000000 11111111 00000000 00000000      
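Packing one byte per channel into a single word is straightforward in Python. The channel order used here (translucency, red, green, blue) is an assumption made to match the bytes shown above; real image formats vary in how they order the channels.

```python
# Pack a colour as one byte per channel in a single 4-byte word.
# Channel order (translucency, red, green, blue) is assumed for
# illustration only; real formats differ.
def pack_colour(alpha, red, green, blue):
    return bytes([alpha, red, green, blue])

bright_red = pack_colour(0x00, 0xff, 0x00, 0x00)
print(bright_red.hex(" "))  # 00 ff 00 00
print(len(bright_red))      # 4 (one word)
```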

7.4.8 Memory for processing versus memory for storage

Paul Murrell

This document is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.


Source: https://statmath.wu.ac.at/courses/data-analysis/itdtHTML/node55.html
