Bootstrap

Why is a byte 8 bits and not something else? Some interesting discussions

Note: machine-translated, not proofread.

WHY IS A BYTE 8 BITS? OR IS IT?

Computer History Vignettes

- By Bob Bemer

I recently received an e-mail from one Zeno Luiz Iensen Nadal, a worker for Siemens in Brazil. He asked “My Algorythms teacher asked me and my colleagues ‘Why a byte has eight bits?’ Is there a technical answer for that?”
最近,我收到了西门子巴西分公司员工泽诺•路易斯•伊藤森•纳达尔(Zeno Luiz Iensen Nadal)的一封电子邮件。他问道:“我的算法老师问我和我的同事‘为什么一个字节有八个比特?’”这有一个技术上的答案吗?”

Of course I could not resist a reply to someone named Zeno, after that teacher of ancient times. Some people copied on the reply thought it a useful document, so (having done the hard work already) I add it to my site as a further bite of history.
当然,我无法抗拒对一个名叫芝诺的人的回复,以古代的那位老师的名字命名。一些在回复上复制的人认为这是一个有用的文档,所以(已经完成了艰苦的工作)我将其添加到我的网站上,作为历史的进一步补充。

I am way behind in my work, but I just cannot resist trying to answer your question on why a “byte” has eight bits.
我的工作进度落后了,但我忍不住要回答你关于为什么一个“字节”有8位的问题。

The answer is that some do, and some don’t. But that takes explaining, as follows:
答案是,有些人有,有些人没有。但这需要解释如下:

If computers worked entirely in binary (and some did a long time ago), and did nothing but calculations with binary numbers, there would be no bytes.
如果计算机完全以二进制方式工作(有些计算机很久以前就这样做了),并且只用二进制数进行计算,那么就不会有字节了。

But to use and manipulate character information we must have encodings for those symbols. And much of this was already known from punch card days.
但是要使用和操作字符信息,我们必须对这些符号进行编码。其中大部分从打孔卡时代就已经知道了。

The punch card of IBM (others existed) had 12 rows and 80 columns. Each column was assigned to a symbol, a term I use here although they have fancier names nowadays because computers have been used in so many new ways.
IBM 的打孔卡(存在其他)有 12 行和 80 列。每一列都被分配到一个符号,我在这里使用这个术语,尽管它们现在有更漂亮的名称,因为计算机已经以许多新的方式使用。

The rows, going down, starting from the top, were 12-11-0-1-2-3-4-5-6-7-8-9. A punch in the 0 to 9 rows signified the digits 0-9. A group of columns could be called a “field”, and a number in such a field could carry a plus sign for the number (an additional punch in top row 12 of the units position of the number), or a minus sign (an additional punch in row 11 just under that).
从上到下,列的顺序是 12-11-0-1-2-3-4-5-6-7-8-9。在 0 到 9 行中的一个打孔表示数字 0-9。一组列可以称为 “字段”,并且这样的字段中的数字可以带有加号(数字的个位数位置的顶部第 12 行中的额外打孔),或减号(刚好位于其下的第 11 行中的额外打孔)。

Then they started to need alphabets. This was accomplished by adding the 12 punch to the digits 1-9 to make letters A through I, the 11 punch to make letters J through R. For S through Z they added the 0 punch to the digits 2 through 9 (the 0-1 combination was skipped – 3x9=27, but the English alphabet has only 26 letters). The 12, 11, and 0 punches were called “zones”, and you’ll notice them today lurking in the high-order 4 bits. Remember that this was much prior to binary representations of those same characters.
接着他们开始需要字母。这是通过将 12 号打孔添加到数字 1-9 中,以生成字母 A 到 I,11 号打孔用于生成字母 J 到 R。对于 S 到 Z,他们添加了 0 号打孔到数字 2 到 9(0-1 组合被跳过 ——3x9=27,但英语字母表只有 26 个字母)。12、11 和 0 号打孔被称为 “区域”,你会发现它们今天仍然潜藏在高位的 4 位中。请记住,这发生在字符的二进制表示之前。
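The zone-plus-digit scheme described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the text; the function name `card_punches` and the tuple-of-rows representation are assumptions made for this example, not part of any historical specification.

```python
import string


def card_punches(ch: str) -> tuple:
    """Return the rows punched in one card column for a character.

    Hypothetical helper: follows the rules given in the text above
    (digits, uppercase letters, and the bare zone punches for + and -).
    """
    if ch in string.digits:
        return (int(ch),)                      # digits: one punch in rows 0-9
    if "A" <= ch <= "I":
        return (12, ord(ch) - ord("A") + 1)    # zone 12 plus digit 1-9
    if "J" <= ch <= "R":
        return (11, ord(ch) - ord("J") + 1)    # zone 11 plus digit 1-9
    if "S" <= ch <= "Z":
        return (0, ord(ch) - ord("S") + 2)     # zone 0 plus digit 2-9 (0-1 skipped)
    if ch == "+":
        return (12,)                           # zone 12 alone, per the text
    if ch == "-":
        return (11,)                           # zone 11 alone, per the text
    raise ValueError(f"no punch combination defined here for {ch!r}")


for ch in "AIJRSZ7+-":
    print(ch, card_punches(ch))
```

Three zones times nine digit rows would give 27 combinations, which is why one of them (0-1) could be left unused for a 26-letter alphabet.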

The first bonus was that the 12 and 11 punches without any 0-9 punch gave us the characters + and -. But no other punctuation was represented then, not even a period (dot, full stop) in IBM or telecommunication equipment. One can see this in early telegrams, where one said “I MISS YOU STOP COME HOME STOP”. “STOP” stood for the period the machine did not have.
第一个好处是,只有 12 和 11 号打孔而没有 0-9 号打孔给了我们字符 + 和 -。但当时没有表示其他标点符号,甚至在 IBM 或电信设备中也没有句号(点号,句号)。可以在早期的电报中看到这一点,人们会说 “我想念你 停止 回家 停止”。 “停止” 代表了机器上没有的句号。

Then punctuation and other marks had combinations of punches assigned, but there had to be 3 punches in a column to do this. In most cases the third punch was an extra “8”.
然后标点符号和其他符号有了分配的打孔组合,但是必须有三个打孔在一列中才能实现这一点。在大多数情况下,第三个打孔是额外的 “8”。

In this way, with 10 digits, 26 alphabetic, and 11 others, IBM got to 47 characters. UNIVAC, with different punch cards (round holes, not rectangles, and 90 columns, not 80) got to about 54. But most of these were commercial characters. When FORTRAN came along, they needed, for example, a “divide” symbol, and an “=” symbol, and others not in the commercial set. So they had to use an alternate set of rules for scientific and mathematical work. A set of FORTRAN cards would cause havoc in payroll!
这样一来,IBM 通过 10 个数字、26 个字母和 11 个其他字符,可以表示 47 个字符。而 UNIVAC 则采用不同的打孔卡(圆孔而非矩形孔,列数为 90 而非 80),可以表示大约 54 个字符。但其中大多数是商业字符。当 FORTRAN 出现时,他们需要一个 “divide” 符号和一个 “=” 符号,以及其他商业系列中没有的符号。因此,他们不得不使用一套替代的规则来进行科学和数学工作。如果一组 FORTRAN 卡被用在薪资处理中,将会引起混乱!

With many early computers these punch cards were used as input and output, and inasmuch as the total number of characters representable did not exceed 64, why not use just 6 bits each to represent them? The same applied to 6-track punched tape for teletypes.
在许多早期的计算机中,这些穿孔卡被用作输入和输出,既然可表示的字符总数不超过 64 个,为什么不只使用每个 6 位来表示它们呢?这同样适用于电传打字机的 6 轨穿孔胶带。

In this period I came to work for IBM, and saw all the confusion caused by the 64-character limitation. Especially when we started to think about word processing, which would require both upper and lower case. Add 26 lower case letters to 47 existing, and one got 73 – 9 more than 6 bits could represent.
在这段时间里,我来到 IBM 工作,看到了 64 个字符的限制所引起的所有混乱。特别是当我们开始考虑文字处理时,这需要大写和小写。在现有的 47 个字母中加上 26 个小写字母,得到 73 个字母 —— 比 6 位多 9 个。
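The arithmetic behind the 6-bit limit can be checked directly. The sketch below is only an illustration; `bits_needed` is a hypothetical helper name, and the character counts are the ones quoted in the text.

```python
def bits_needed(n_symbols: int) -> int:
    """Smallest number of bits that gives each of n_symbols a distinct code."""
    return max(1, (n_symbols - 1).bit_length())


commercial = 10 + 26 + 11           # digits + uppercase letters + other marks = 47
with_lowercase = commercial + 26    # adding lower case gives 73

print(bits_needed(commercial))       # 6
print(bits_needed(with_lowercase))   # 7
print(with_lowercase - 2 ** 6)       # 9 characters more than 6 bits can represent
```

Seven bits would have been enough for 73 characters; the jump to eight is the "powers of 2" argument made later in the essay.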

I even made a proposal (in view of STRETCH, the very first computer I know of with an 8-bit byte) that would extend the number of punch card character codes to 256 [1]. Some folks took it seriously. I thought of it as a spoof.
我甚至提出了一个建议(鉴于 STRETCH,这是我所知道的第一台具有 8 位字节的计算机),将穿孔卡字符代码的数量扩展到 256 个 [1]。有些人认真对待它。我以为这是一种恶搞。

So some folks started thinking about 7-bit characters, but this was ridiculous. With IBM’s STRETCH computer as background, handling 64-bit words divisible into groups of 8 (I designed the character set for it, under the guidance of Dr. Werner Buchholz, the man who DID coin the term “byte” for an 8-bit grouping) [2], it seemed reasonable to make a universal 8-bit character set, handling up to 256. In those days my mantra was “powers of 2 are magic”. And so the group I headed developed and justified such a proposal [3].
所以有些人开始考虑 7 位字符,但这太荒谬了。以 IBM 的 STRETCH 计算机为背景,处理可分成 8 组的 64 个字符的单词(我在 Werner Buchholz 博士的指导下为它设计了字符集,他确实为 8 位分组创造了术语 “字节”)。[2] 制作一个通用的 8 位字符集似乎是合理的,最多可以处理 256 个字符。在那些日子里,我的口头禅是 “2 的幂是魔法”。因此,我领导的小组制定并证明了这样的建议 [3]。

That was a little too much progress when presented to the standards group that was to formalize ASCII, so they stopped short for the moment with a 7-bit set, or else an 8-bit set with the upper half left for future work.
当提交给将要正式化 ASCII 的标准小组时,这有点太大了,所以他们暂时停止了 7 位集,或者一个 8 位集,上半部分留给未来的工作。

The IBM 360 used 8-bit characters, although not ASCII directly. Thus Buchholz’s “byte” caught on everywhere. I myself did not like the name for many reasons. The design had 8 bits moving around in parallel. But then came a new IBM part, with 9 bits for self-checking, both inside the CPU and in the tape drives. I exposed this 9-bit byte to the press in 1973. But long before that, when I headed software operations for Cie. Bull in France in 1965-66, I insisted that “byte” be deprecated in favor of “octet”.
IBM 360 使用 8 位字符,但不是直接使用 ASCII。因此,布赫霍尔茨的 “字节” 无处不在。出于多种原因,我自己不喜欢这个名字。该设计有 8 位并行移动。但随后出现了一个新的 IBM 部件,具有 9 位用于自检,无论是在 CPU 内部还是在磁带驱动器中。我在 1973 年向媒体公开了这个 9 位字节。但早在那之前,当我在 1965-66 年领导法国 Cie.Bull 的软件运营时,我坚持要弃用 “字节”,转而使用 “八位字节”。

You can notice that my preference then is now the preferred term. It is justified by new communications methods that can carry 16, 32, 64, and even 128 bits in parallel. But some foolish people now refer to a “16-bit byte” because of this parallel transfer, which is visible in the UNICODE set. I’m not sure, but maybe this should be called a “hextet”.
您可以注意到,我的偏好现在是首选术语。新的通信方法可以并行携带 16、32、64 甚至 128 位。但是一些愚蠢的人现在提到 “16 位字节”,因为这种并行传输在 UNICODE 集中是可见的。我不确定,但也许这应该被称为 “六重奏”。

But you will notice that I am still correct. Powers of 2 are still magic!
但你会注意到我仍然是对的。2 的幂仍然是魔法!

REFERENCES 引用

  1. R.W.Bemer, “A proposal for a generalized card code of 256 characters”,
    “256 个字符的通用卡代码提案”
    Commun. ACM 2, No. 9, 19-23, 1959 Sep
    – Computing Reviews 00025
    Early public hint of 8-bit bytes to come.
    8 位字节的早期公开暗示即将到来。

  2. R.W.Bemer, W.Buchholz, “An extended character set standard”,
    “扩展字符集标准”
    IBM Tech. Pub. TR00.18000.705, 1960 Jan, rev. TR00.721, 1960 Jun
    – Computing Reviews 00813

  3. R.W.Bemer, H.J.Smith, Jr., F.A.Williams,
    “Design of an improved transmission/data processing code”,
    “设计改进的传输 / 数据处理代码”
    Commun. ACM 4, No. 5, 212-217, 225, 1961 May
    – Computer Abstracts 61-1920
    ASCII in its original form.
    ASCII 的原始形式。


篇外 一些有趣的讨论

Etymology of “byte” “byte” 的词源

I’m interested in the origin of the word byte. Although it is a ubiquitous word in computer science, it seems no one can point out its origin. (I’ve been searching the web for a long time, but without coming up with an authoritative answer.) Wikipedia says it was coined from bite but respelled byte to avoid accidental mutation to bit. But why did Werner Buchholz choose bite and not some other word?
我对字节这个词的起源很感兴趣。虽然它在计算机科学中是一个无处不在的词,但似乎没有人能指出它的起源。(我在网上搜索了很长时间,但没有想出一个权威的答案。 维基百科说它是从 bite 创造出来的,但为了避免突变为 bit,它被重新拼写为 byte。 但是,为什么 Weiner Buchholz 选择咬人,而不是其他词呢?

edited Aug 1, 2013 at 13:23 tchrist asked Aug 1, 2013 at 13:20

Andrew’s answer below is correct, but it should be mentioned that “bit” is the past tense of “bite” and also means “a small thing.” “Bit” to mean a Binary Digit was used by Shannon for information theory and was well-established by the time (1962) Buchholz wrote/edited the document which Wikipedia footnotes. – horatio Commented Aug 1, 2013 at 14:38
安德鲁在下面的回答是正确的,但应该提到的是,“bit” 是 “bite” 的过去式,也有 “一件小事” 的意思。“位” 表示二进制数字被香农用于信息论,并在 Bucholz 撰写 / 编辑维基百科脚注的文档时(1962 年)已经确立。

Just to clarify how “binary digit” corresponds to “bit”, it is one of those funny kind of acronyms which use interior and terminal as well as initial letters. BInary digiT. – Cyberherbalist Commented Aug 1, 2013 at 15:58
只是为了澄清 “二进制数字” 如何对应于 “位”,它是使用内部和终端以及首字母的有趣首字母缩略词之一。

@horatio Andrew’s answer plus your comment will be perfect. – Robert Fan Commented Aug 1, 2013 at 22:17
@horatio 安德鲁的回答加上您的评论将是完美的。

@Cyberherbalist, is “bit” not a portmanteau rather than an acronym? – Frank H. Commented Aug 1, 2013 at 22:36
@Cyberherbalist,“bit” 不是合成词而不是首字母缩略词吗?

@FrankH. Yes! That’s the word, “portmanteau”! A “funny kind of acronym”. Commented Aug 1, 2013 at 23:30
@FrankH。是的!这就是 “portmanteau” 这个词!一个 “有趣的首字母缩略词”。

The term byte implies a chunk of something; whenever I hear the word, I picture someone taking a bite out of a sandwich. That chunk of sandwich is equivalent to the unit of digital information represented by a byte. To extend this metaphor, half a byte is called a nibble or nybble. I would imagine that nibbling a sandwich would result in a smaller amount of food than biting a sandwich.
edited Dec 12, 2013 at 22:02
anotherdave answered Aug 1, 2013 at 13:29 Andrew Ng
“字节” 一词意味着一大块东西 —— 每当我听到这个词时,我都会想象有人从三明治中咬一口。这块三明治相当于一个字节表示的数字信息单位。为了扩展这个比喻,半字节被称为半字节或半字节。我想,啃一个三明治比咬一个三明治会产生更少的食物量。

  • And half a nibble is… a crumb! – Amory Commented Aug 1, 2013 at 13:37
    半口是… 面包屑!

  • And two bytes is a chomp! – Andrew Ng Commented Aug 1, 2013 at 13:41
    两个字节简直太棒了!

  • Or should it be crymb and chymp? – TrevorD Commented Aug 1, 2013 at 14:09
    还是应该是 crymb 和 chymp?

  • I’m sorry if I was unclear - I meant that the sandwich is simply a bunch of digital information. A byte of the sandwich is a chunk of that information. A nibble would be a smaller chunk. Commented Aug 1, 2013 at 14:52
    如果我不清楚,我很抱歉 - 我的意思是三明治只是一堆数字信息。三明治的一个字节是该信息的块。一口将是较小的块。

  • @Andrew Ng. I know this is english.stackexchange, but two bytes is a short. – awiebe Commented Sep 3, 2018 at 11:13
    @Andrew NG。我知道这是 english.stackexchange ,但两个字节很短。

We’ll never know unless we hear from the man himself, but the following might be of interest: “Origins of the term ‘BYTE’”. It was written by Bob Bemer, who worked with Werner Buchholz at IBM. I think the explanation is simply that Werner Buchholz came up with bite as a tongue-in-cheek collective noun for a group of bits, then changed the spelling to byte to avoid confusion.
edited Sep 3, 2018 at 12:03 awiebe answered Aug 1, 2013 at 15:11
除非我们听到这个人本人的消息,否则我们永远不会知道,但以下内容可能会引起人们的兴趣: 术语 “BYTE” 的起源 它是由 Bob Bemer 编写的,他在 IBM 与 Werner Buchholz 一起工作。 我认为解释很简单,Werner Buchholz 想出了 bite 作为一组位的诙谐集体名词,然后将拼写更改为 byte 以避免混淆。

字节为什么是 8 位的历史的相关讨论

What is the history of why bytes are eight bits?


What were the historical forces at work, the tradeoffs to make, in deciding to use groups of eight bits as the fundamental unit? There were machines, once upon a time, using other word sizes. But today, for non-eight-bitness, you must look to museum pieces, specialized chips for embedded applications, and DSPs. How did the byte evolve out of the chaos and creativity of the early days of computer design? I can imagine that fewer bits would be ineffective for handling enough data to make computing feasible, while too many would have led to expensive hardware. Were other influences in play? Why did these forces balance out to eight bits? (BTW, if I could time travel, I’d go back to when the byte was declared to be 8 bits, and convince everyone to make it 12 bits, bribing them with some early 21st century trinkets.)
在决定使用八位组作为基本单位时,是什么历史力量在起作用,需要做出哪些权衡? 曾几何时,有些机器使用其他字大小。但今天,对于非八位,您必须关注博物馆的展品、用于嵌入式应用的专用芯片和 DSP。字节是如何从计算机设计早期的混乱和创造力中演变而来的? 我可以想象,更少的比特对于处理足够的数据以使计算变得可行是无效的,而太多的比特会导致昂贵的硬件。是否有其他影响在起作用?为什么这些力会平衡到八位? (顺便说一句,如果我能穿越时空,我会回到字节被宣布为 8 位的时候,并说服每个人将其设为 12 位,用一些 21 世纪初的小饰品贿赂他们。
edited Sep 7, 2022 at 5:19 asked Nov 16, 2011 at 19:48

So why would you prefer 12 bits to 8? – Frustrated With Forms Designer Commented Nov 16, 2011 at 19:59
那么,为什么您更喜欢 12 位而不是 8 位呢?

Is the last sentence in jest? A 12-bit byte would be inconvenient because it’s not a power of 2. – Rob Commented Nov 16, 2011 at 20:00
最后一句话是在开玩笑吗?12 位字节会很不方便,因为它不是 2 的幂。

Memory and registers weren’t so cheap back then, so 8 bits was a good compromise, compared to 6 or 9 (fractions of a 36-bit word). Also, address calculations are a heck of a lot simpler with powers of 2, and that counts when you’re making logic out of raw transistors in little cans. – Mike Dunlavey Commented Nov 16, 2011 at 20:05
内存和寄存器在当时并不便宜,所以 8 位是一个很好的折衷方案,而 6 位或 9 位(36 位字的一小部分)。此外,地址计算要简单得多,幂为 2,当您用小罐子中的原始晶体管制作逻辑时,这很重要。

Using word sizes that were powers of 2 was not so important in the “early days”. The DEC-10 had a 36 bit word, and the CDC 6000 series had 60 bit words, and index registers with 18 bits. – Jay Elston Commented Nov 17, 2011 at 18:53
在 “早期”,使用 2 的幂字大小并不那么重要。DEC-10 有一个 36 位字,CDC 6000 系列有一个 60 位字,索引寄存器有 18 位。

Answers

A lot of really early work was done with 5-bit Baudot codes, but those quickly became quite limiting (only 32 possible characters, so basically only upper-case letters, and a few punctuation marks, but not enough “space” for digits).
许多早期的工作都是使用 5 位 baudot 代码完成的,但这些代码很快就变得非常有限(只有 32 个可能的字符,所以基本上只有大写字母和一些标点符号,但没有足够的数字 “空间”)。

From there, quite a few machines went to 6-bit characters. This was still pretty inadequate though – if you wanted upper- and lower-case (English) letters and digits, that left only two more characters for punctuation, so most still had only one case of letters in a character set.
从那时起,相当多的机器转向了 6 位字符。不过,这仍然不够 —— 如果你想要大写和小写(英文)字母和数字,那就只剩下两个字符用于标点符号,所以大多数字符集中仍然只有一个字母大小写。

ASCII defined a 7-bit character set. That was “good enough” for a lot of uses for a long time, and has formed the basis of most newer character sets as well (ISO 646, ISO 8859, Unicode, ISO 10646, etc.)
ASCII 定义了 7 位字符集。这在很长一段时间内对于许多用途来说已经 “足够好” 了,并且也构成了大多数较新的字符集(ISO 646、ISO 8859、Unicode、ISO 10646 等)的基础。

Binary computers motivate designers to make sizes powers of two. Since the “standard” character set required 7 bits anyway, it wasn’t much of a stretch to add one more bit to get a power of 2 (and by then, storage was becoming cheap enough that “wasting” a bit for most characters was more acceptable as well).
二进制计算机激励设计人员将尺寸设置为 2 的幂。由于 “标准” 字符集无论如何都需要 7 位,因此再添加一位以获得 2 的幂并不是什么难事(到那时,存储变得足够便宜,以至于对于大多数字符来说 “浪费” 一点也更容易接受)。

Since then, character sets have moved to 16 and 32 bits, but most mainstream computers are largely based on the original IBM PC.
从那时起,字符集已经转移到 16 位和 32 位,但大多数主流计算机主要基于原始的 IBM PC。

Then again, enough of the market is sufficiently satisfied with 8-bit characters that even if the PC hadn’t come to its current level of dominance, I’m not sure everybody would do everything with larger characters anyway. I should also add that the market has changed quite a bit. In the current market, the character size is defined less by the hardware than the software.
话又说回来,市场上有足够多的人对 8 位字符感到满意,即使 PC 没有达到目前的主导地位,我不确定每个人都会用更大的字符做任何事情。 我还应该补充一点,市场已经发生了很大变化。在当前市场中,字符大小由硬件定义的比软件定义的要少。

Windows, Java, etc., moved to 16-bit characters long ago. Now, the hindrance in supporting 16- or 32-bit characters is only minimally from the difficulties inherent in 16- or 32-bit characters themselves, and largely from the difficulty of supporting i18n in general. In ASCII (for example) detecting whether a letter is upper or lower case, or converting between the two, is incredibly trivial. In full Unicode/ISO 10646, it’s basically indescribably complex (to the point that the standards don’t even try – they give tables, not descriptions).
Windows、Java 等很久以前就转向了 16 位字符。 现在,支持 16 位或 32 位字符的障碍只是 16 位或 32 位字符本身固有的困难,而主要是由于一般支持 i18n 的困难。例如,在 ASCII 中,检测字母是大写还是小写,或者在两者之间进行转换,都非常简单。在完整的 Unicode/ISO 10646 中,它基本上是难以形容的复杂(以至于标准甚至没有尝试 - 他们给出的是表格,而不是描述)。
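To illustrate the remark above about ASCII case handling being trivial: in ASCII the upper- and lower-case forms of a letter differ only in one bit (value 0x20), so conversion is a single XOR. The sketch below is a minimal illustration; the helper names are assumptions made for this example. The last lines show one well-known case where full Unicode case mapping is not a simple per-character bit flip.

```python
CASE_BIT = 0x20  # in ASCII, ord("a") - ord("A") == 32


def is_ascii_upper(ch: str) -> bool:
    """True if ch is one of the 26 ASCII capital letters."""
    return "A" <= ch <= "Z"


def toggle_ascii_case(ch: str) -> str:
    """Flip the case of an ASCII letter by toggling one bit; leave others alone."""
    if ch.isascii() and ch.isalpha():
        return chr(ord(ch) ^ CASE_BIT)
    return ch


print(toggle_ascii_case("a"))   # A
print(toggle_ascii_case("Q"))   # q
print(toggle_ascii_case("7"))   # 7 (unchanged)

# Outside ASCII a table lookup is needed, and the result may even change length:
print("\u00df".upper())         # the German sharp s uppercases to "SS"
```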

Then you add in the fact that for some languages/character sets, even the basic idea of upper/lower case doesn’t apply. Then you add in the fact that even displaying characters in some of those is much more complex still. That’s all sufficiently complex that the vast majority of software doesn’t even try. The situation is slowly improving, but slowly is the operative word.
edited Aug 20, 2019 at 0:22 user53019 answered Nov 16, 2011 at 20:13
然后你补充一个事实,即对于某些语言 / 字符集,甚至大写 / 小写的基本思想也不适用。然后你补充说,即使在其中一些中显示字符也要复杂得多。 这一切都足够复杂,绝大多数软件甚至没有尝试过。情况正在慢慢改善,但缓慢是有效的词。

Jerry Coffin: I thought I read somewhere that 8 came from 7-bit ASCII plus a validation bit, which was needed because the early transmission protocols were not as lossless as the designers wanted 😃. – Loki Astari Commented Nov 16, 2011 at 22:42
Jerry Coffin: 我记得我在哪里看到过,8 位来源于 7 位 ASCII 加上一个校验位,因为当时的传输协议并不像设计者想要的那样完全无损。

@LokiAstari, Yes, it’s called a parity bit, and can be used for crude forms of error detection or recovery. Wikipedia: Parity bit – user Commented Nov 17, 2011 at 12:53
@LokiAstari,是的,它被称为奇偶校验位,可用于粗略形式的错误检测或恢复。

Not sure what the IBM PC has to do with this. “8 bit per byte” was already standard in the CP/M era (<1980), which started on the 8080 CPU (a predecessor of the 8086/8 of the IBM PC era) – MSalters Commented Nov 17, 2011 at 15:11
不确定 IBM PC 与此有什么关系。“每字节 8 位” 在 CP/M 时代(<1980 年)已经是标准,它始于 8080 CPU(IBM PC 时代 8086/8 的前身)

@MSalters: Primarily that it has (arguably) “stunted” the evolution of hardware. No, 8-bits/byte wasn’t new with the PC, but until then, most architectures were replaced every few years. The PC has largely stopped that, and taken an architecture that wasn’t even particularly progressive when it was new, and preserved it for decades. – Jerry Coffin Commented Nov 17, 2011 at 15:16
@MSalters:主要是它(可以说)“阻碍” 了硬件的发展。不,8 位 / 字节对 PC 来说并不新鲜,但在此之前,大多数架构每隔几年就会更换一次。PC 在很大程度上已经停止了这一点,并采用了一种在新的时候甚至不是特别进步的架构,并保留了几十年。

@DeadMG: Current character sets aren’t 16 or 32 bits, nor do Java and Windows use such. The current character set is Unicode, which needs 21 bits to map directly. Current software uses encodings based on 8 (UTF-8), 16 (UTF-16) or 32 (UTF-32) bit code units, combining multiple code units to form a single code point where necessary, but those bit sizes are a consequence of the hardware, not of the character set. – Sebastian Redl Commented Aug 14, 2016 at 22:45
当前的字符集不是 16 位或 32 位,Java 和 Windows 也不使用这样的字符集。当前的字符集是 Unicode,需要 21 位才能直接映射。当前软件使用基于 8 位 (UTF-8)、16 位 (UTF-16) 或 32 位代码单元的编码,在必要时将多个代码单元组合成单个代码点,但这些位大小是硬件的结果,而不是字符集的结果。
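The distinction between code points and code units in the comment above can be made concrete with Python's built-in codecs. This is a small sketch; `code_units` is a hypothetical helper name, and the sample characters are chosen only for illustration.

```python
def code_units(ch: str) -> tuple:
    """Number of code units needed for one code point in UTF-8, UTF-16, UTF-32."""
    utf8 = len(ch.encode("utf-8"))             # 8-bit code units
    utf16 = len(ch.encode("utf-16-le")) // 2   # 16-bit code units
    utf32 = len(ch.encode("utf-32-le")) // 4   # 32-bit code units
    return (utf8, utf16, utf32)


samples = ["A", "\u00e9", "\u65e5", "\U0001F600"]  # Latin, accented, CJK, emoji
for ch in samples:
    print(f"U+{ord(ch):06X}", code_units(ch))

# The largest Unicode code point, U+10FFFF, fits in 21 bits.
print((0x10FFFF).bit_length())
```

Only UTF-32 uses exactly one code unit per code point; UTF-8 and UTF-16 combine several units when a single one is not wide enough.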

Seven bits for ASCII information, and one for error-detecting parity.
edited Nov 17, 2011 at 19:31 Jay Elston answered Nov 16, 2011 at 20:03
7 位用于 ASCII 信息,1 位用于错误检测奇偶校验。
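A parity bit of the kind mentioned in these answers can be sketched as follows. This is an illustrative even-parity scheme with the parity bit placed in the high-order position; the function names and the choice of even parity are assumptions for this example, since the thread does not specify them.

```python
def add_even_parity(code7: int) -> int:
    """Pack a 7-bit code into 8 bits so the total number of 1 bits is even."""
    if not 0 <= code7 < 128:
        raise ValueError("expected a 7-bit value")
    parity = bin(code7).count("1") % 2     # 1 if the 7-bit code has an odd number of 1s
    return (parity << 7) | code7           # parity goes in the eighth (top) bit


def parity_ok(byte: int) -> bool:
    """True if the 8-bit value still has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))           # "C" is 0x43, which has three 1 bits
print(hex(sent), parity_ok(sent))          # 0xc3 True

corrupted = sent ^ 0x04                    # flip a single bit in transit
print(hex(corrupted), parity_ok(corrupted))  # 0xc7 False
```

A single parity bit detects any odd number of flipped bits but cannot say which bit changed, and it misses errors that flip an even number of bits.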

7bits for ASCII and one extra bit that has been used for all sorts of things – Martin Beckett Commented Nov 17, 2011 at 6:58
7 位用于 ASCII 和一个额外的位,用于各种事情

Parity was very important when dealing with early memory. Even after moving to 8 bit data bytes, there were memory chips with 9 bits to allow for parity checking. – Jim C Commented Nov 17, 2011 at 13:13
在处理早期记忆时,奇偶校验非常重要。即使在移动到 8 位数据字节之后,也有 9 位的存储芯片允许奇偶校验。

This is an interesting assertion. Is there any historical data to support the idea? – david Commented Jan 17, 2013 at 8:28
这是一个有趣的断言。是否有任何历史数据支持这个想法?

Take a look at Wikipedia page on 8-bit architecture. Although character sets could have been 5-, 6-, then 7-bit, underlying CPU/memory bus architecture always used powers of 2.
看看关于 8 位架构的维基百科页面。尽管字符集可以是 5 位、6 位和 7 位,但底层 CPU / 内存总线架构始终使用 2 的幂。

The very first microprocessor (around the 1970s) had a 4-bit bus, which means one instruction could move 4 bits of data between external memory and the CPU. Then, with the release of the 8080 processor, the 8-bit architecture became popular, and that gave rise to the x86 assembly instruction set, which is still used to this day.

最早的微处理器(1970 年代左右)具有 4 位总线,这意味着一条指令可以在外部存储器和 CPU 之间移动 4 位数据。 然后随着 8080 处理器的发布,8 位架构开始流行,这就是 x86 汇编指令集的开端,该指令集甚至一直沿用到现在。

If I had to guess, the byte came from these early processors, when the mainstream public began accepting and playing with PCs and 8 bits was considered the standard size of a single unit of data. Since then bus size has kept doubling, but it has always remained a power of 2 (i.e. 16, 32 and now 64 bits). Actually, I’m sure the internals of today’s buses are much more complicated than simply 64 parallel wires, but the current mainstream CPU architecture is 64-bit.
如果我不得不猜测,字节来自这些早期的处理器,主流公众开始接受和玩 PC,而 8 位被认为是单个数据单元的标准大小。 从那时起,总线大小翻了一番,但它始终保持 2 的幂(即 16 位、32 位和现在的 64 位)实际上,我敢肯定今天总线的内部结构比简单的 64 根并行线要复杂得多,但目前主流的 CPU 架构是 64 位。

I would assume that by always doubling (instead of growing 50%) it was easier to make new hardware that coexists with existing applications and other legacy components. So for example when they went from 8 bits to 16, each instruction could now move 2 bytes instead of 1, so you save yourself one clock cycle but the end result is the same.
我认为,通过始终翻倍(而不是增长 50%),更容易制造与现有应用程序和其他遗留组件共存的新硬件。例如,当它们从 8 位变为 16 位时,每条指令现在可以移动 2 个字节而不是 1 个字节,因此您可以节省一个时钟周期,但最终结果是一样的。

However, if you went from an 8-bit to a 12-bit architecture, you’d end up breaking the original data into halves, and managing that could become annoying. These are just guesses; I’m not really a hardware expert.
edited Aug 11, 2016 at 21:06 DXM answered Nov 17, 2011 at 2:37

但是,如果您从 8 位架构转向 12 位架构,您最终会将原始数据分成两半,并且管理它可能会变得很烦人。这些只是猜测,我不是真正的硬件专家。

“Very first CPU (around 1970s) …”. You need to do some reading on the history of computing!! The very first CPU for a von Neumann architecture computer was built during World War II … or before (depending on whose version of history you believe.) – Stephen C Commented Nov 17, 2011 at 3:20
“第一个 CPU(大约 1970 年代)…”。你需要做一些关于计算历史的阅读!!冯・诺依曼架构计算机的第一个 CPU 是在二战期间建造的… 或之前(取决于你相信谁的历史版本。

A solution to the “pre-electron” computer is to say modern computer or I suppose the electron computer. Even today you could build a mechanical computer. It wasn’t until we started to use electron fields to our advantage did we build a micro-processor. – Ramhound Commented Nov 17, 2011 at 13:37
“对于‘前电子’计算机的解决方案是说现代计算机或者我想说是电子计算机。即使在今天,你也可以制造一个机械计算机。直到我们开始利用电子场来发展,我们才构建了微处理器。”

The 8-bit byte and 16-bit word size used by the PDP series may also have been a factor in the popularity of 8-bit bytes. Commented Nov 17, 2011 at 18:56
PDP 系列使用的 8 位字节和 16 位字大小也可能是 8 位字节普及的一个因素。

A byte has been variously (at least) 1, 4, 6, 7, 8, 9, 12, 18, 20 and possibly 36 bits, depending on what computer you are looking at. I am taking “byte” here to mean “smallest addressable unit of memory”, rather than using any sort of text-centric interpretation. (For example, the Saturn CPU, a 64-bit CPU used in the popular HP48SX/GX calculator line addresses memory in nibbles – 4-bits.)
字节曾经有多种长度(至少),包括1、4、6、7、8、9、12、18、20,甚至36位,具体取决于所使用的计算机。我这里所说的“字节”是指“内存中最小的可寻址单元”,而不是基于文本的解释。(例如,Saturn CPU,即HP48SX/GX计算器系列中使用的64位CPU,将内存地址为4位的nibble。)

The 20-bit bytes were extremely common in the “IAS machines”, in the 50s. 6, 12, 18 (and maybe 36) were quite popular in a variety of architectures in the 60s, 70s and to some degree 80s. In the end, having a nice correspondence between “powers of 2” and “bits in an addressable unit” seems to have won out.
20位字节在50年代的“IAS机器”中非常常见。6、12、18(可能还有36)位字节在60年代、70年代和80年代的一些架构中非常流行。最终,二进制幂和可寻址单元之间的对应关系似乎占了胜出。
edited Aug 14, 2016 at 3:21 Michaelangel007 answered Sep 23, 2013 at 14:55

And never 10 bits? All I could find with Google is some recent video processors are 10 bits. – wobmene Commented Jan 2, 2014 at 0:07
而且从来没有 10 位?我在谷歌上能找到的只是一些最近的视频处理器是 10 位的。

@khrf It’s possible, I just can’t recall any architecture that had it (I mostly considered general-purpose computers). – Vatine Commented Jan 2, 2014 at 12:08
@khrf 有可能,我只是想不起任何具有它的架构(我主要考虑的是通用计算机)。

Yes, I consider general-purpose computers too. It’s strange because I imagine how nice it would be with 10-bits-byte to know that you can address 1 kilobyte with 1 byte, 1 megabyte with 2 bytes, etc. Of course, it’s just a caprice of comfort.
Commented Jan 6, 2014 at 15:03
是的,我也考虑通用计算机。这很奇怪,因为我想象如果知道您可以用 1 个字节寻址 1 KB、用 1 个字节寻址 1 兆字节和 2 个字节等,那该有多好。当然,这只是对舒适的任性。

First a bit of clarification: Octets (8-bit units) are not really a fundamental unit in modern computer architectures. At least not any more fundamental than other powers of two - 2, 4, 16, 32, 64, 128 etc. Octets were the fundamental unit for 8-bit processors (hence the name!), but modern architectures typically work with larger bit-sets internally. E.g. the x86_64 has 64 bit integer registers and 80 bit floating point registers. RAM is read and written in 64-bit chunks, and the processor just uses a bit of magic to make it look like you can address individual 8-bit bytes.
首先澄清一下:八位字节(8 位单位)并不是现代计算机体系结构中真正的基本单位。至少不比两个的其他幂更基本 ——2、4、16、32、64、128 等。八位字节是 8 位处理器的基本单元(因此得名!),但现代架构通常在内部使用更大的位集。例如,x86_64 具有 64 位整数寄存器和 80 位浮点寄存器。RAM 以 64 位块的形式读取和写入,处理器只是使用一点魔法使其看起来像可以寻址单个 8 位字节。

For older architectures, “byte” indicated the size of the data bus, and as the original question states, a lot of different bus sizes existed (4, 5, 6, 8, 12 etc.). But since 1993 a byte has been defined as 8 bits, in order to have a standardized SI unit for data sizes.
对于较旧的架构,“字节” 表示数据总线的大小,正如最初的问题所述,存在许多不同的总线大小(4、5、6、8、12 等)。但自 1993 年以来,字节被定义为 8 位,以便为数据大小提供标准化的 SI 单位。

Hence the meaning of “byte” has changed from being an architecture-dependent unit to an architecture-independent standardized unit. So these days, bytes are the standard unit for addressing and quantifying data, but not really fundamental otherwise. The octet became the de-facto standard for storage primarily due to concerns about storing text. For storing text you ideally want one byte to store one character. Two factors were important:
因此,“字节”的含义已经从架构相关的单位变成了架构无关的标准化单位。如今,字节是寻址和量化数据的标准单位,但在其他方面并不是真正的基本单位。八位组(octet)单元由于存储文本的考虑而成为事实上的标准。理想情况下,存储文本需要一个字节来存储一个字符。两个因素至关重要。有两个重要因素:

Having units which are powers of two (2, 4, 8, 16, 32 etc.) is more convenient when designing digital systems.
在设计数字系统时,拥有两个幂(2、4、8、16、32 等)的单位更方便。

8-bit is enough to store a single character in the ASCII character set (with room to spare for extending the character set to support say Cyrillic).
8 位足以在 ASCII 字符集中存储单个字符(留出空间来扩展字符集以支持西里尔文)。

Of course 8-bits are not enough to support all scripts - something like Japanese requires at least 16 bits (and for what it is worth, Unicode is 21 bits), but at that point in time bits were expensive and most digital text were in the ASCII range anyway.
当然,8 位并不足以支持所有的文字系统 - 像日文这样的语言至少需要 16 位(值得一提的是,Unicode 是 21 位),但在当时,位数昂贵,并且大多数数字文本都在 ASCII 范围内。

These days, text is typically stored in variable-width encodings like UTF-8, and with things like Unicode combining characters, the “one byte equals one character” assumption has long been a thing of the past. Today the byte is really just the standard for historical reasons.
edited Jul 30, 2018 at 15:24 JacquesB answered Jul 30, 2018 at 15:17
当今,文本通常以像 UTF-8 这样的可变宽度编码存储,并且随着 Unicode 组合字符的出现,“一个字节等于一个字符” 的观念早已成为历史。如今,字节仅仅因为历史原因而成为标准。

“the processor just uses a bit of magic to make it look like you can address individual 8-bit bytes.”
处理器只是使用一点魔法,使其看起来好像你可以寻址单个 8 位字节。

It’s not that simple: you can’t just have the processor turn an 8-bit operation into a 64-bit read-modify-write operation, because there might be more than one processor in the system and because memory-mapped IO devices may respond differently to different sizes of write.
“事情并不那么简单,处理器不能仅仅将一个 8 位操作转换为 64 位的读 - 修改 - 写操作,因为系统中可能有多个处理器,并且内存映射的 IO 设备对不同大小的写操作可能有不同的响应。”

– Peter Green Commented Sep 8, 2022 at 20:50
