
Question--something I've always idly wondered: Is a byte ALWAYS 8 bits?


G+_Sean Miller

Question--something I've always idly wondered: Is a byte ALWAYS 8 bits? On a 64-bit operating system, shouldn't a byte be 64 bits?

I have always thought of a byte as a single character, and more bits allow for more characters, like drawing with a 64-crayon box instead of a 16-crayon box. If it's in terms of the size of your alphabet, UTF-8 is a 32-bit encoding scheme: it takes 32 bits to represent a single character. So if it's based on the size of the alphabet, then a byte would be 32 bits for UTF-8 and 7 bits for ASCII.

Is a byte = 8 bits an outdated idea?


Paul Hutchinson I think you and Michael Hagberg are talking about different things. You seem to be talking about the WORD and DWORD programming definitions, while Michael is talking about computer architecture.

 

In computer architecture, a word is the "largest natural size for arithmetic", which is the size of the registers. This is typically the size of the data bus, but doesn't have to be. When we refer to a 64-bit OS, we're referring to the size of the registers and memory addresses, which usually matches the width of the data bus. The more data we can transfer at once, the more efficient the data transaction is.

 

On ARM Cortex-M, the word size is 32 bits. On MSP430 it is 16 bits. On PIC/Atmel, it is 8 bits.
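
To make that concrete, here is a small C sketch you can build for any of those targets; the sizes in the comment are just typical values, and the printout depends entirely on the compiler and architecture:

#include <stdio.h>

int main(void)
{
    /* The "natural" sizes the compiler uses for this target. Typical
       results: 2-byte int on MSP430, 4-byte int on ARM Cortex-M,
       8-byte pointers on a 64-bit desktop OS. */
    printf("sizeof(int)    = %zu bytes\n", sizeof(int));
    printf("sizeof(long)   = %zu bytes\n", sizeof(long));
    printf("sizeof(void *) = %zu bytes\n", sizeof(void *));
    return 0;
}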

 

I'm not 100% sure whether WORD and DWORD are actually part of the C standard; the standard itself refers to int, long int, long long, etc. instead. I'll have to look in my copy of the "C Reference Manual" (Harbison and Steele) when I get to work.

 

Speaking of work, I need to stop typing and get out of here! Cheers!


Paul Hutchinson If you want to be really confused, look back at the history of computing. This stuff evolved organically, and there were no standards. Computers were expensive, so every bit counted.

 

Modern computers use 8-bit bytes. It didn't use to be that way! You could have 4-bit bytes, 6-bit bytes, 7-bit bytes, or whatever else you could dream up.
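
Even today the C standard only promises that a byte is at least 8 bits. A minimal sketch that checks what your own platform uses (CHAR_BIT comes from the standard limits.h header):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_BIT is the number of bits in a byte on this implementation.
       The C standard guarantees CHAR_BIT >= 8; on mainstream hardware
       it is exactly 8, but some DSPs report 16 or 32. */
    printf("This platform has %d-bit bytes\n", CHAR_BIT);
    return 0;
}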

 

I think the strangest device I've come across (that is still relatively modern) is a 4-bit microcontroller used to control an LCD for a watch.


Sean Miller Also, looking back at the comments, I think we all glossed over that bit at the end of your question about byte encoding.

 

UTF-8 is actually not fixed at 32 bits; it is variable length. Each character uses 1 to 4 bytes. This is done for backwards compatibility with 7-bit ASCII, and to allow larger character sets as needed.
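
As an illustration, a minimal C sketch (only the lead-byte patterns, with no validation of continuation bytes or code-point ranges) showing how the first byte of a UTF-8 sequence announces how many bytes follow:

#include <stdio.h>

/* Return how many bytes a UTF-8 sequence occupies, judged from its lead
   byte, or 0 if the byte cannot start a sequence. */
static int utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)           return 1; /* 0xxxxxxx: plain 7-bit ASCII   */
    if ((lead & 0xE0) == 0xC0) return 2; /* 110xxxxx: 2-byte sequence     */
    if ((lead & 0xF0) == 0xE0) return 3; /* 1110xxxx: 3-byte sequence     */
    if ((lead & 0xF8) == 0xF0) return 4; /* 11110xxx: 4-byte sequence     */
    return 0;                            /* continuation byte or invalid  */
}

int main(void)
{
    /* Lead bytes of 1-, 2-, 3- and 4-byte sequences:
       'A' (U+0041), U+00E9, U+20AC, U+1F600. */
    unsigned char leads[] = { 0x41, 0xC3, 0xE2, 0xF0 };
    for (int i = 0; i < 4; i++)
        printf("lead byte 0x%02X -> %d byte(s)\n",
               leads[i], utf8_sequence_length(leads[i]));
    return 0;
}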

 

However, whether a character is always stored in a fixed 4-byte slot (even when only 1 byte is needed to represent it) is, of course, application dependent. I wouldn't be surprised if applications just use 4-byte characters for everything for simplicity of implementation.


#huh

Nibble = half-byte

https://en.m.wikipedia.org/wiki/Nibble

 

In computing, a nibble (occasionally nybble or nyble to match the spelling of byte) is a four-bit aggregation, or half an octet. It is also known as half-byte or tetrade. In a networking or telecommunication context, the nibble is often called a semi-octet, quadbit, or quartet. A nibble has sixteen (2^4) possible values. A nibble can be represented by a single hexadecimal digit and called a hex digit.
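
A quick C illustration of the half-byte idea: split a byte into its two nibbles, each of which maps to exactly one hex digit:

#include <stdio.h>

int main(void)
{
    unsigned char byte = 0xA7;

    /* A nibble is 4 bits, so one byte holds two of them. */
    unsigned char high = (byte >> 4) & 0x0F;  /* upper nibble: 0xA */
    unsigned char low  = byte & 0x0F;         /* lower nibble: 0x7 */

    /* Each nibble is exactly one hexadecimal digit. */
    printf("0x%02X -> high nibble %X, low nibble %X\n", byte, high, low);
    return 0;
}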


I don't mean to pick nits here, but to be absolutely clear we need to specify whether we are talking about data types from a specific language, or computer architecture.

 

If you are using a Windows machine and a C/C++ compiler, then the type definitions documented at the link below (which define WORD, DWORD, etc.) are applicable:

 

docs.microsoft.com - Windows Data Types | Microsoft Docs

 

This would align with the sizes specified by Paul Hutchinson.
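
For what it's worth, a minimal sketch of what those documented sizes look like in code; this assumes a Windows toolchain with windows.h and C11 static assertions, and it won't build elsewhere:

/* Only builds with a Windows toolchain; windows.h supplies the WORD and
   DWORD typedefs, and _Static_assert needs C11. */
#include <windows.h>

/* Per that documentation, these widths are fixed: they do not grow to
   64 bits just because the OS is 64-bit. */
_Static_assert(sizeof(WORD)  == 2, "WORD is 16 bits");
_Static_assert(sizeof(DWORD) == 4, "DWORD is 32 bits");

int main(void)
{
    return 0;
}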

 

This differs from the computer architecture term "word", which, as I explained previously, could be anything, but is probably 8, 16, 32, or 64 bits. If you are on a desktop, it is probably 32 or 64. If you are on a microcontroller, it could be any of them.

 

The possible ambiguity of word, dword, etc., is the reason I don't use these definitions when writing code. I use the standard types instead (stdint.h): uint8_t, uint16_t, etc.

 

It is common for me to write microcontroller code that sends data to a windows app. If I use the standard type definitions in stdint.h, I don't have to worry as much about portability. I can share serial packet processing code, and it will just work.
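
For example, a minimal sketch of the idea (the packet fields here are made up for illustration, not taken from any real project): serialise the header byte by byte with fixed-width types, so the wire format is identical on the microcontroller and the desktop regardless of each compiler's native int size or struct padding.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet header, invented for this example. */
typedef struct {
    uint8_t  command;
    uint16_t length;
    uint32_t payload_crc;
} packet_header_t;

/* Serialise the header into a byte buffer, least significant byte first.
   Because every field has a fixed width, the output is always 7 bytes,
   independent of struct padding on either platform. */
static size_t pack_header(const packet_header_t *h, uint8_t *buf)
{
    size_t i = 0;
    buf[i++] = h->command;
    buf[i++] = (uint8_t)(h->length & 0xFF);
    buf[i++] = (uint8_t)(h->length >> 8);
    buf[i++] = (uint8_t)(h->payload_crc & 0xFF);
    buf[i++] = (uint8_t)((h->payload_crc >> 8) & 0xFF);
    buf[i++] = (uint8_t)((h->payload_crc >> 16) & 0xFF);
    buf[i++] = (uint8_t)(h->payload_crc >> 24);
    return i;
}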


