When I first began to listen to and work with MP3 files, I used some shell scripts and a Linux program, Jason Carter's id3edit, to read and write ID3v1.1 tags. In the mid-2000s, I wrote my id3_util package and my tag311 program, which incorporated the functions of my shell scripts and also ran on Windows.
The small fixed sizes of the fields in ID3v1.1 tags were limiting, especially for my burgeoning collection of Beatles covers. To automatically generate the HTML cover listings, I had to essentially embed a music database in a couple of awk(1) scripts. ID3v2 was the obvious next step. I studied the ID3v2.3 and ID3v2.4 specifications, took notes, and, in anticipation, implemented my utf_util package in 2009 to convert between the UTF-16 and UTF-8 wide-character formats used in ID3v2.3 and ID3v2.4. Almost 15 years later, in 2023, I finally added ID3v2.3/v2.4 support to my software. Yay! To do so, I became intimately familiar with the specifications:
ID3v2.4 Main Structure and Native Frames Specifications
While researching an arcane now-forgotten detail of ID3 tags, I happened upon Jeff Atwood's 2006 Coding Horror blog post, "A Spec-tacular Failure". Back in the day, I had Coding Horror bookmarked and I enjoyed his posts. This post, however, given my now intimate familiarity with the ID3v2.3 and ID3v2.4 specs, struck me as a mean-spirited and ill-informed attack on ID3 tags and the specs' authors. The blog post comments were largely in a similar vein, although there was some pushback from others, thankfully.
Keeping in mind that, in 2006, Jeff (Wikipedia) and his commenters were writing much closer to the publication dates of the ID3v2.3 (1999) and ID3v2.4 (2001) specs than I am now, I have to disagree with their criticisms. But first a bit of history ...
Important Note: Although the ID3 standards were hosted on a now-defunct id3.org domain, there is/was no formal "ID3 Organization". The standards were designed and documented by a community of MP3 enthusiasts.
Eric Kemp AKA NamkraD is widely credited with inventing the ID3 tag (now called ID3v1.0) in the mid-1990s. The 128-byte tag contains basic information about a track and is stored at the end of an MP3 file. It is difficult to find any details about Eric Kemp other than that he created ID3v1, but I did find this insightful snippet by Kemp himself on a Japanese website. (I corrected several typos since I wasn't sure if they were in Kemp's original.)
MP3 ID3 Tag Protocol Definition
Note: I do not 'own' nor want to own this tag format, it's something I created for myself that other people attached to. In this regard, this information is for free of use for other playlist makers who want real info with their songs and a standard specification. In other words you do not have to ask me permission to use this information. I believe in freedom of information, in this way it promotes growth of other quality playlist makers and continues to 'up' the ante for what a playlist maker should have with every release from other great playlist makers. (Just remember ID3 was the first! =), even though it did suck!)
MP3 ID3 Tag 1.0 -- 128 bytes
(c) 1996, 1997 NamkraD (Eric Kemp)
Seek end of file, then -128, if first 3 bytes='TAG' then it has info layed out as follows...
0..2 == 'TAG' (3 Bytes)
3..32 == Song Name (30 bytes)
33..62 == Artist (30 Bytes)
63..92 == Album Name (30 Bytes)
93..96 == Year (4 Bytes)
97..126 == Comment (30 Bytes)
127 == 1 Byte Song Genre Identifier
(get the listing of song Genre Types to know what byte = what)
New Genres will be added when Studio 3 v1.1 is released!
Please send mail, comments and suggestions to NamkraD/Eric Kemp. This site is best viewed using Netscape Navigator 3.x+
—NamkraD/Eric Kemp, "MP3 ID3 Tag Protocol Definition", part of Takeshi Yoneki's "Concerning the confusion in the handling of Japanese characters in iTunes or SoundJAM ID3 tags" (Japanese, English translation).
An earlier version of Kemp's program can be downloaded from the Internet Archive: Studio 3 v1.0, Win 95/NT (c) 1996. (The ZIP file consists of the setup files for installing the application on Windows 95 or Windows NT, which are unfortunately useless on more recent versions of Windows! It is possible to run Windows 95 in a virtual machine; I'll have to try it out.)
ID3v1.0 tags lacked track numbers and Michael Mutschler (who would also play a role in the design of ID3v2) thought of shortening the comment field by 2 characters to make room for a 1-byte (1-254) track number. This version, ID3v1.1, is the most widely used ID3v1 tag.
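The ID3v1.0 layout quoted above, together with the ID3v1.1 track-number change, can be sketched in a few lines of Python. This is only an illustration, not how tag311 or any other program does it: the function names are hypothetical, Latin-1 decoding is an assumption (ID3v1 does not name a character set), and the "zero byte at offset 125" test is the usual convention for telling an ID3v1.1 tag from an ID3v1.0 tag with a full 30-character comment.

```python
from typing import Optional


def parse_id3v1(tail: bytes) -> Optional[dict]:
    """Parse the last 128 bytes of a file as an ID3v1.0 or ID3v1.1 tag."""
    if len(tail) != 128 or tail[0:3] != b"TAG":
        return None

    def text(field: bytes) -> str:
        # Fixed-width fields are padded with NULs (sometimes spaces).
        # Latin-1 is an assumption; the tag itself does not say.
        return field.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    tag = {
        "title": text(tail[3:33]),
        "artist": text(tail[33:63]),
        "album": text(tail[63:93]),
        "year": text(tail[93:97]),
        "genre": tail[127],
        "track": None,
    }
    # ID3v1.1: a zero byte at offset 125 followed by a non-zero byte at
    # offset 126 means the comment is 28 bytes and byte 126 is the track.
    if tail[125] == 0 and tail[126] != 0:
        tag["comment"] = text(tail[97:125])
        tag["track"] = tail[126]
    else:
        tag["comment"] = text(tail[97:127])
    return tag


def pad(value: str, width: int) -> bytes:
    """Truncate or NUL-pad a string to a fixed-width field."""
    return value.encode("latin-1")[:width].ljust(width, b"\x00")


# A made-up ID3v1.1 tag: 28-byte comment, zero byte, track 13, genre 255.
demo = (b"TAG" + pad("Yesterday", 30) + pad("The Beatles", 30)
        + pad("Help!", 30) + pad("1965", 4) + pad("cover", 28)
        + b"\x00" + bytes([13]) + bytes([255]))
print(parse_id3v1(demo))
```

With a real file, you would seek to 128 bytes before the end, read 128 bytes, and pass them to `parse_id3v1`.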
BirdCage Software proposed and implemented an ID3v1.2 tag that added a 128-byte extension before the 128-byte ID3v1.1 tag. The extension part is almost identical in format with the ID3v1.1 tag and provides additional space for the overflow from truncated fields in the ID3v1.1 tag. For example, if a song title is 45 characters long, you would store the first 30 characters in the ID3v1.1 tag and the remaining 15 characters in the ID3v1.2 extension. This was a good idea as a stopgap measure and it maintained compatibility with ID3v1.1, but the publication date (based on the copyright dates of 2002-2003) postdated even the 2001 date of the ID3v2.4 spec. As a result, ID3v1.2 probably only saw use in BirdCage's own audio software.
The aforementioned Michael Mutschler was also the author of the MP3-Info Windows shell extension, an early program that displayed information about MP3 files and allowed the editing of ID3 tags. Martin Nilsson, an MP3 enthusiast in Sweden and a user of MP3-Info, began communicating with Mutschler via email about ways to improve ID3 tags. The result was ID3v2 tags. Nilsson tells the story:
When I moved to Linköping and connected to the vast Internet in 1997, MP3s was just in the rise. Sure, I've seen (heard) MP3s before, but being connected with 10MBit/s made it possible to actually send and receive them. ... One of the more unusual programs I found was MP3Ext, an MS Windows extension that allowed me to know more about an MP3 file by just right click it. It used some home-made technology called ID3 to store some extra information in the file.
The program, written by a German named Michael Mutschler, was also translated into several other languages, and on his web site Michael asked for contributions. I thought that contributing with a Swedish translation was the least I could do, so I created a new locale file and sent it to him. I don't exactly remember why, but by some reason we begun mailing each other discussing how the ID3 format and its limitations. The mails became longer and I begun collecting our findings into a text file and called it ID3v2, riding on the hype of IPv6 (Where did it go? Both IPv6 and the hype?). After some considerable amount of work, with the help of Johan Sundström and Andreas Sigfridsson, I had made a first draft of the ID3v2 "standard" and published on the net (on this server actually, www.lysator.liu.se/id3v2) and begun getting attention.
... The goal with ID3v2 has always been that it should be a proper RFC, and now it has the quality needed, in my opinion.
—Martin Nilsson, "ID3v2", c. 2000 or 2001?
The "ID3 tag version 2" specification, formatted as an RFC (Request for Comments), was published as an informal standard on March 26, 1998. ID3v2 never did become an actual RFC, although it certainly succeeded beyond Nilsson et al's wildest dreams.
This initial version of ID3v2 later became known as ID3v2.2. Its file name was id3v2-00.txt and I have not found why they didn't simply call it ID3v2.0 or ID3v2.1. I don't know how popular ID3v2.2 was, but, as of Jeff Atwood's 2006 blog post, iTunes was still using ID3v2.2. (Writing ID3v2.2 tags; I see that iTunes could read the later versions of tags, but not write them.)
Right from the start, ID3v2.2 had everything: variable-length text strings, Unicode support, and images (as well as general binary/other objects). An ID3v2 tag consists of a tag header and one or more frames. Each frame consists of a frame header and one value along with attributes of the value. For example, an attached picture ("PIC") frame has the binary image data (the value) plus the image type (e.g., "PNG" or "JPG") needed to interpret the data. ID3v2.2 frames have 3-character frame IDs like "PIC" and 3-byte frame sizes, the latter limiting frames to 16 MiB in size:
Version | Year | Tag Size | Maximum | Frame IDs | Frame Size | Maximum | Unicode Encodings | Unsynchronization |
---|---|---|---|---|---|---|---|---|
ID3v2.2 | 1998 | 28 bits¹ | 256 MiB | 3 chars | 24 bits | 16 MiB | UCS-2² | Tag level |
ID3v2.3 | 1999 | 28 bits¹ | 256 MiB | 4 chars | 32 bits | 4 GiB | UCS-2 with BOM | Tag level |
ID3v2.4 | 2001 | 28 bits¹ | 256 MiB | 4 chars | 28 bits¹ | 256 MiB | UTF-16 with BOM, UTF-16BE, UTF-8 | Frame level |

¹ synchsafe. ² BOM not mentioned.
The ID3v2.3 standard followed less than a year after ID3v2.2, being published on February 3, 1999. The tag header was the same, but with a new version value and two additional flags. The most visible change was that ID3v2.2's 3-character frame IDs (e.g., "TAL") had been expanded to 4 characters (e.g., "TALB"). The frame size field was also expanded from 3 to 4 bytes and 2 bytes of flags were added, thus wiping out any backwards compatibility. I agree with Dale in the Coding Horror comments that these changes warranted a new major version number (ID3v3.0). However, it is just a name and with ID3v2.3 following so closely on the heels of ID3v2.2, Nilsson et al probably didn't foresee ID3v2.2 being adopted and still in use by a major music application for years to come.
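To make the incompatibility concrete, here is a minimal sketch of reading one frame header in each of the two layouts described above. The function name and return shape are my own for illustration. ID3v2.4 is deliberately left out because its frame size is a synchsafe integer rather than a plain 32-bit value.

```python
import struct


def parse_frame_header(data: bytes, minor_version: int):
    """Return (frame_id, body_size, flags, header_length) for one frame."""
    if minor_version == 2:
        # ID3v2.2: 3-character ID + 3-byte big-endian size, no flags.
        frame_id = data[0:3].decode("ascii")
        size = int.from_bytes(data[3:6], "big")
        return frame_id, size, 0, 6
    if minor_version == 3:
        # ID3v2.3: 4-character ID + 4-byte big-endian size + 2 flag bytes.
        frame_id = data[0:4].decode("ascii")
        size, flags = struct.unpack(">IH", data[4:10])
        return frame_id, size, flags, 10
    raise ValueError("only ID3v2.2 and ID3v2.3 are sketched here")


print(parse_frame_header(b"TAL\x00\x00\x05", 2))
print(parse_frame_header(b"TALB\x00\x00\x00\x05\x00\x00", 3))
```

A reader that only knows the 6-byte ID3v2.2 header has no way to step over a 10-byte ID3v2.3 header, which is why the minor version in the tag header has to be checked before any frame is touched.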
Strictly speaking, per their specs, ID3v2.2 and ID3v2.3 do not fully support Unicode. UCS-2 is a single 2-byte (16-bit) encoding of the 64K possible characters in Unicode's Basic Multilingual Plane (BMP), characters in the range U+0000..U+FFFF. (U+xxxx is the common notation for the hexadecimal value of a Unicode character.) The now-obsolete UCS-2 could not encode characters beyond the BMP in the range U+010000..U+10FFFF (the other 16 non-Basic planes).
The ID3v2.4 standard was published two years after ID3v2.3's in 2001. (Main Structure and Native Frames) The most significant differences with ID3v2.3 are that (i) a Text Information frame may have multiple string values separated by the text encoding's NUL character and (ii) the text encoding $01 was changed from UCS-2 to UTF-16 and new text encodings UTF-16BE (big-endian without a BOM) and UTF-8 were added.
UTF-16 is a variable-length 2-byte encoding of all the Unicode characters (U+000000..U+10FFFF). For characters in the BMP, UCS-2 and UTF-16 have identical, single 2-byte codes. 2K characters in the BMP, U+D800..U+DFFF, are reserved for surrogate pairs of 2-byte codes, used by UTF-16 to encode characters beyond the BMP, again U+010000..U+10FFFF. So a UCS-2 encoding, being limited to the BMP, is always 2 bytes, whereas a UTF-16 encoding for any Unicode character may be 2 or 4 bytes. Since both are 16-bit encodings, I don't know if ID3 applications gradually switched to writing UTF-16 strings in ID3v2.2 and ID3v2.3 tags, or not. (Yes, I realize UCS-2 was already obsolete before the ID3v2 specs were written, but, at the time, UCS-2 was entrenched in, for example, operating systems such as Windows and development tools such as Java.)
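The surrogate-pair arithmetic is short enough to show. The sketch below (the function name is hypothetical) turns one Unicode code point into its UTF-16 code units: one unit for a BMP character, which is also its UCS-2 code, and a high/low surrogate pair for anything beyond the BMP.

```python
from typing import List


def utf16_code_units(code_point: int) -> List[int]:
    """Return the 16-bit UTF-16 code unit(s) for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if code_point <= 0xFFFF:
        # BMP character: a single unit, identical to its UCS-2 code.
        return [code_point]
    # Beyond the BMP: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


print([hex(u) for u in utf16_code_units(0x00E9)])   # é, inside the BMP
print([hex(u) for u in utf16_code_units(0x1F3B5)])  # musical note, beyond the BMP
```

A strict UCS-2 reader would see the two surrogate units as two unknown characters rather than as one character.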
Less significant changes were also made. Unsynchronization in ID3v2.4 is done at the frame level, not at the tag level; this necessitated making the frame size field a 28-bit synchsafe integer. Unlike earlier ID3v2 tags, ID3v2.4 tags can be stored at the beginning and/or end of an audio file. A tag at the end of the file must have a footer, a copy of the tag header but with "ID3" replaced by "3DI". (If the tag is stored at both ends of the file, the beginning tag is required to have a SEEK frame, which would seem to limit the audio file size to 4 GiB.)
How did the ID3v2 standards fare in the years following their publication?
Throughout the 2000s, I mostly worked with ID3v1.1 tags, although my tag311 program would report on the presence of ID3v2 tags. Not being an iTunes user, I rarely saw ID3v2.2 tags. ID3v2.3 tags were frequent and ID3v2.4 tags were infrequent. From about 2010 to 2023, prior to adding ID3v2.3/2.4 support to tag311, I continued using tag311 to write the basic track information in ID3v1.1 tags and I used Florian Heidenreich's "universal tag editor", Mp3tag, to convert these tags to ID3v2.3 and to add album covers. Again, almost no ID3v2.2 tags, infrequent ID3v2.4 tags, and lots of ID3v2.3 tags. That's my anecdotal evidence!
A 2012 question, "What ID3 Tag Version does iTunes use?", in the Apple discussions forum had the answer that iTunes 10.6.0.40 (released in 2012) wrote ID3v2.2 tags:
This is the same version of ID3 that iTunes has been using for a long time. It is a little surprising, since most of the industry, such as Amazon MP3, moved to v2.3 years ago. iTunes will of course play them just fine.
iTunes does have the ability to convert ID3 tags to a different version, up to v2.4 ...
—ed2345, March 25, 2012.
This was 13 years after the publication of the ID3v2.3 standard and 6 years after the Coding Horror blog post. A little understandable, but not really excusable. iTunes was based on an MP3 player, SoundJam MP, whose initial release was prior to the publication of the ID3v2.3 standard. Apple acquired SoundJam MP in 2000 and released the rebranded, reengineered iTunes in January 2001, nearly two years after the February 1999 publication of the ID3v2.3 standard.
Notwithstanding Apple's long embrace of ID3v2.2, it is pretty clear that ID3v2.3 is the world's choice for tagging MP3 files. ID3v2.4's additional text encodings and multiple string values in Text Information frames were not compelling enough and ID3v2.4 has never caught on.
Was ID3v2 "A Spec-tacular Failure" as Jeff Atwood said in 2006?
Before digging into the Coding Horror complaints, I think we should first take a look at the state of computing when the ID3v2 standards were developed. This will help us understand why Nilsson et al made the bespoke design choices they did instead of simply using JSON. (That's a joke.) The design probably came together in late 1997 or early 1998, culminating in the publication of the ID3v2.2 standard in March 1998.
In 1996, I was transferred back to our company's headquarters and assigned a brand new PC with Windows 95, an enormous 1-GB disk drive, and a typical amount of RAM (megabytes or tens of megabytes?); work did have high-speed internet. At home I had a PC-XT with MS-DOS, a 10- or 20-MB disk drive, 640K of RAM, and a 2400-baud modem used for calling up computers at my previous job site. A couple of years later, I purchased a used 486 computer running Windows 95 from work, got a higher-speed modem, and signed up for dial-up internet access from a local ISP. In 1998, I discovered and became a fan (and purchaser!) of the Opera 3.21 web browser, famous for its Windows installation program fitting on a 1.44-MB 3.5-inch floppy disk.
My work and home setups were not unusual and having high-speed internet at work was something not available to many consumers. In other words, the ID3v2 designers were working within a constrained computing environment where reducing byte counts was important in both network transmissions and disk storage. Granted, minimizing the size of an insignificant fraction (the ID3 tag) of an audio file is possibly a false economy.
The contemporaneous ID3 documentation at the time of the 2006 blog post can be perused at the archived ID3.org web site.
And now to the Coding Horror blog post! (Bold text and italics in the quotes are such in the original.)
One of the first big warning signs is this list of ID3 "offenders" on the UltraID3Lib site. It reads like a who's who of music applications: iTunes, WinAmp, Windows Media Player. If the applications that ship with the operating system can't get ID3 tags right, clearly something is wrong.
—Jeff Atwood, "A Spec-tacular Failure", August 4, 2006.
Clearly something is wrong, but it's not the ID3 spec. Two of the three ID3-related bugs described on Mitchell S. Honnert's UltraID3Lib web page have to do with Winamp's incorrect implementation of Comment (COMM) frames. All 3 versions of ID3 tags have the same, simple COMM frame structure, the frame header followed by 4 fields:
4.11. Comments
...
There may be more than one comment frame in each tag, but only one with the same language and content descriptor.
<Header for 'Comment', ID: "COMM">
Text encoding           $xx
Language                $xx xx xx
Short content descrip.  <text string according to encoding> $00 (00)
The actual text         <full text string according to encoding>
—Martin Nilsson, "ID3 Tag Version 2.3.0", February 3, 1999.
The first bug is that Winamp doesn't fill in the language field, leaving it with 3 random binary bytes. The second bug has to do with the fact that ID3v2 allows multiple COMM frames as long as the language/description combinations differ. For example, you can have an English "Biography" comment and a French "Biography" comment, but you can't have two Italian "Biography" comments. Both of these bugs are implementation errors. Yes, perhaps the programmers misunderstood the spec, but I'm not sure how Martin Nilsson could have reworded the spec to make it any clearer than it already was. (I'm not picking on the Winamp programmers; it happens and I've certainly made similar errors myself in my career.)
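To show how little there is to a COMM frame, here is a rough sketch of parsing the frame body, that is, the four fields after the frame header. The names are my own and this is not taken from Winamp, UltraID3Lib, or tag311. The encoding table assumes the ID3v2.3 values $00 (ISO-8859-1) and $01 (Unicode with a BOM) plus the two encodings ID3v2.4 added.

```python
# Text encoding byte -> (Python codec, terminator width in bytes).
ENCODINGS = {
    0: ("latin-1", 1),    # ISO-8859-1, terminated by $00
    1: ("utf-16", 2),     # 16-bit Unicode with BOM, terminated by $00 00
    2: ("utf-16-be", 2),  # ID3v2.4 only
    3: ("utf-8", 1),      # ID3v2.4 only
}


def split_terminated(data: bytes, width: int):
    """Split at the first terminator, stepping in units of `width` bytes."""
    pos = 0
    while pos + width <= len(data):
        if data[pos:pos + width] == b"\x00" * width:
            return data[:pos], data[pos + width:]
        pos += width
    return data, b""


def parse_comm(body: bytes) -> dict:
    """Parse a COMM frame body: encoding, language, description, text."""
    codec, width = ENCODINGS[body[0]]
    language = body[1:4].decode("latin-1")
    raw_description, raw_text = split_terminated(body[4:], width)
    return {
        "language": language,
        "description": raw_description.decode(codec),
        "text": raw_text.decode(codec),
    }


def comm_key(comment: dict) -> tuple:
    """Only one COMM frame per (language, description) pair is allowed."""
    return (comment["language"], comment["description"])


english = parse_comm(b"\x00eng" + b"Biography\x00" + b"Born in Liverpool.")
french = parse_comm(b"\x01fra" + "Biography".encode("utf-16") + b"\x00\x00"
                    + "Né à Liverpool.".encode("utf-16"))
print(comm_key(english), comm_key(french))
```

The two keys differ only in the language, which is exactly the English/French "Biography" case. A writer that leaves three random bytes in the language field makes that key unpredictable.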
The third ID3-related bug described on the UltraID3Lib page is that Apple/iTunes invented a non-standard Compilation (TCMP) frame. They should have used a user-defined Text Information frame (TXXX) or, as Honnert suggested, a Private (PRIV) frame. The ID3 designers kept a list of iTunes custom tags, not just TCMP, and other iTunes issues. I am apparently not alone in thinking that Apple developers seem to go out of their way to do things in non-standard ways.
Jeff twice includes Windows Media Player in the group of major media players that get ID3 tags wrong. However, the only mention of Windows Media Player on the UltraID3Lib page is when Honnert praises it for using the intended PRIV tag for custom data, unlike iTunes. (Several problems with Windows Media Player were listed on the ID3v2 Compliance Issues page.)
Ironically, later in the blog post, Jeff substitutes the two flawed implementations of ID3 for the standards themselves:
Since the ID3 spec is so deficient, I've been using the behavior of popular applications as a de-facto spec. In other words, I test to see how WinAmp behaves when editing ID3 tags ...
I also test to see how iTunes behaves when editing ID3 tags ...
Warts and all, the practical implementations of ID3 tags in popular applications like WinAmp and iTunes trump anything that's written in the formal ID3 spec.
—Jeff Atwood, "A Spec-tacular Failure", August 4, 2006.
That is perhaps an acceptable short-term tactic for working with or around an application bug. You should also report the bug to the application developers, as Honnert did when he submitted two reports for the two bugs he found to the Winamp project. I realize that there might not be an effective communications channel to an application's developers (as was the case with Honnert's unacknowledged bug reports) and, of course, good luck convincing Apple to stop inventing new frame types when existing frame types would be sufficient for its purposes!
Sometimes you must comply with and code to a flawed de facto standard, in which case I recommend coding to the correct standard and providing a configuration option (preferably run-time) to override it with the flawed behavior in the hopeful expectation of that behavior eventually being fixed.
This happened to me at a company I once worked for. A used-car-salesman idiot of an employee convinced management to switch the flagship product's middleware to CORBA. (A possibly defensible choice on grounds other than the costly bill of goods he sold management.) Our company contracted with a third-party company to supply and maintain a copy of the TAO CORBA implementation. Like everyone else, I used TAO for the official software, but I also wrote my own lightweight CORBA implementation. (Suffice to say, I had good reasons for doing so.) I used my implementation to easily write a number of programs that exhaustively tested our CORBA interfaces. I discovered that TAO was not correctly specifying the length of wide-character strings in accordance with the then-current GIOP version, so my programs had a command-line option to match or not match TAO's incorrect implementation. (Our TAO programs happily communicated with each other using the flawed implementation on both sides of their interfaces.)
I found on the TAO forum that this was a known problem that was supposedly fixed in later releases, but, for some reason, the updates we received from the third-party company during the 4-5 years I worked on the project never had that fix. And it wasn't because I coincidentally had an error in my implementation. A Tcl/Tk group in our company needed to talk to our CORBA servers and wanted to use an independent [incr Tcl] implementation of CORBA. Knowing [incr Tcl] and GIOP, I was able to tell them exactly how to break that implementation's wide-character marshaling code in order to talk to our TAO-based servers and the developer who made the change as I watched was able to immediately request/read data from one of our servers. Of course, I recommended that they visibly document this crutch apart from the code to ensure that, in the future, someone would remember to remove it when TAO was finally fixed!
And now let's begin looking at Coding Horror's list of ID3 problems:
The spec shows how but rarely explains why. For example, frame sizes are stored as 4-byte "syncsafe integers" where the 8th bit of every byte is zeroed. Why would you store size in such an annoying, unintuitive format? Who knows; the spec doesn't explain. You just grit your teeth and do it.
—Jeff Atwood, "A Spec-tacular Failure", August 4, 2006.
This is a semi-fair criticism of the ID3 specs, but should specs explain why? The ISO C Standard specifies the language, but the Rationale is a separate document if I remember correctly. Still, the ID3v2 specs are explicitly "Informal standard"s, so explaining the whys and wherefores of the more complex items would not have been out of place.
The ID3v2.2 and ID3v2.3 specs do begin their "ID3v2 Overview" sections with a discussion of two design goals, the first being to make ID3 tags invisible to MP3 decoders by eliminating false MP3 syncsignals embedded in the tags. Encoding a 28-bit integer in a 4-byte value for this purpose is used for the size field in the tag header in all three versions of ID3v2 tags:
ID3v2.2 spec, "ID3v2 Header":
The reason to use 28 bits (representing up to 256MB) for size description is that we don't want to run out of space here.
(What ?!)
ID3v2.3 spec, "ID3v2 Header":
Only 28 bits (representing up to 256MB) are used in the size description to avoid the introduction of 'false syncsignals'.
ID3v2.4 spec, "ID3v2 Header":
The ID3v2 tag size is stored as a 32 bit synchsafe integer (section 6.2), making a total of 28 effective bits (representing up to 256MB).
—Martin Nilsson.
The ID3v2.4 spec dropped the discussion of design goals in "ID3v2 Overview" and introduced the term synchsafe integer for the 28-bit sizes which, because unsynchronization was now at the frame level, now included the tag size and frame size fields. Section 6.2 (referenced above) of the ID3v2.4 spec explains the purpose and format of the synchsafe 32-bit encoding of 28-bit integers. (Also because of frame-level unsynchronization, the CRC-32 value stored in an ID3v2.4 tag's extended header is encoded as a 5-byte, 35-bit synchsafe integer.)
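For anyone who finds the format annoying, the whole synchsafe scheme is a few lines of code: keep the most significant bit of every byte zero and pack the value 7 bits at a time. The function names below are hypothetical and this is a sketch, not a reference implementation.

```python
def synchsafe_encode(value: int, length: int = 4) -> bytes:
    """Pack `value` into `length` bytes, 7 bits per byte, high bit always 0."""
    if not 0 <= value < 1 << (7 * length):
        raise ValueError("value does not fit in %d synchsafe bytes" % length)
    return bytes((value >> (7 * shift)) & 0x7F
                 for shift in range(length - 1, -1, -1))


def synchsafe_decode(data: bytes) -> int:
    """Reverse of synchsafe_encode; rejects bytes with the high bit set."""
    value = 0
    for byte in data:
        if byte & 0x80:
            raise ValueError("not a synchsafe integer")
        value = (value << 7) | byte
    return value


print(synchsafe_encode(257).hex())            # a small tag size in 4 bytes
print(synchsafe_encode(2**28 - 1).hex())      # largest 28-bit size
print(synchsafe_encode(0xFFFFFFFF, 5).hex())  # a 32-bit CRC needs 5 bytes
print(synchsafe_decode(bytes.fromhex("00000201")))
```

Because no byte ever has its high bit set, an encoded size can never contain a 0xFF byte, so it can never start a false sync code.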
My note: What is a syncsignal? An MP3/MPEG audio stream is a sequence of audio frames, distinct from ID3 tags and frames. At the beginning of each MPEG frame is a 32-bit frame header. The frame header starts with an 11-bit frame sync code of all one bits. That's the 8 bits in the first byte of the header and the 3 most significant bits of the second byte: 11111111 111bbbbb. An MPEG audio decoder scans the stream looking for the 11-bit sync code; when the decoder finds one, it starts decoding the audio frame. Or attempts to if it mistakenly locks onto a false sync code in a non-unsynchronized ID3 tag! ID3 synchsafe integers and unsynchronization work by disrupting (using two different techniques) the 3 most significant bits in the byte following a 0xFF byte.
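The second technique, unsynchronization, can be sketched the same way. The idea as I understand it from the specs is to insert a zero byte after any 0xFF that would otherwise be followed by a byte whose top 3 bits are set, and also after any 0xFF already followed by 0x00 so that decoding is unambiguous. The function names are my own. Padding a trailing 0xFF is a conservative choice in this sketch, since the next byte is unknown.

```python
def has_false_sync(data: bytes) -> bool:
    """True if any 0xFF is followed by a byte with its top 3 bits set."""
    return any(data[i] == 0xFF and data[i + 1] >= 0xE0
               for i in range(len(data) - 1))


def unsynchronise(data: bytes) -> bytes:
    """Insert 0x00 after 0xFF wherever a false sync could otherwise appear."""
    out = bytearray()
    for i, byte in enumerate(data):
        out.append(byte)
        if byte == 0xFF:
            following = data[i + 1] if i + 1 < len(data) else None
            # None: trailing 0xFF, padded here as a precaution.
            if following is None or following >= 0xE0 or following == 0x00:
                out.append(0x00)
    return bytes(out)


def resynchronise(data: bytes) -> bytes:
    """Undo unsynchronise(): drop the 0x00 that follows each 0xFF."""
    out = bytearray()
    i = 0
    while i < len(data):
        out.append(data[i])
        if data[i] == 0xFF and i + 1 < len(data) and data[i + 1] == 0x00:
            i += 1  # skip the inserted zero byte
        i += 1
    return bytes(out)


raw = bytes([0x12, 0xFF, 0xE3, 0xFF, 0x00, 0xFF])
safe = unsynchronise(raw)
print(has_false_sync(raw), has_false_sync(safe), resynchronise(safe) == raw)
```

This costs one extra byte per offending 0xFF, which matters mostly for binary frames such as embedded images.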
I found most of the material in the ID3v2 specs fairly self-explanatory. I did research some items to verify or correct my understanding. One esoteric ID3 capability — "grouping of otherwise unrelated frames" — had me asking "Why?" The frame overview sections of the ID3v2.3 and ID3v2.4 standards specify a flag that, if set, indicates a group identification byte ("symbol") is appended to the frame header. To what end?
Later, I came across the separate and associated Group Identification Registration (GRID) frame, which the standards suggest be used for the signing of selected frames within a tag. The GRID frame seems to map a group ID symbol to an owner ID (e.g., a URL) and a buffer of arbitrary private data. ID3v2.4 added an additional, optional Signature (SIGN) frame that is purposely a GRID frame without the owner ID. My inference that the GRID and SIGN frames represent a one-to-one, symbol-to-data mapping doesn't quite square with the spec's confusing-to-me descriptions of when multiple frames are allowed. For example, there may be multiple SIGN frames, but "no two may be identical". That seems to imply that there can be two SIGN frames for the same group ID, with different signatures (the private data). How does an MP3/ID3 application choose between one signature and the other? (A similar situation arises with the optional encryption method byte/symbol appended to a frame header and its associated Encryption Method Registration or ENCR frame.) Okay, I seem to be making Jeff's case for him! 🙂
Giving it some more thought ... I'm not familiar with the mechanics of signing and verifying signatures, but I suppose that a given group ID could have multiple signing "authorities" or forms of verification. If verification with one doesn't work (or the authority is down), an application could try one of the others. And matching with Jeff's experience recounted in the next complaint, I've never seen GRID, SIGN, or ENCR frames in the wild.
The vast majority of the things described in the spec do not appear in any MP3 files that I can find or create. There are 70+ possible frame types, but I've only seen a dozen or so in practice. And what about encryption? Compression? CRC checks? Footers? Extended headers? Never seen 'em. And I probably never will. But I still have to parse through pages and pages of detailed text about these extremely rare features.
—Jeff Atwood, "A Spec-tacular Failure", August 4, 2006.
Of course you haven't seen encrypted ID3 tags. You're not on the global need-to-know list and we don't send out encrypted secret recordings to just anyone! 😉
This was a fair observation about frame types in 2006 and is still a fair observation nearly 18 years later. I would be hesitant to draw conclusions about the presence or absence of the other features. That depends on the ID3 tag reader or editor you're using. My tag311 program silently handles unsynchronization, CRC checks, and extended headers. If a CRC check fails, tag311 will report that, but otherwise I'm not made aware of the presence of these features in an MP3 track's ID3 tag. My program doesn't handle encryption or compression; I wonder if other ID3 programs do. Footers will only be seen in ID3v2.4 tags written at the end of an MP3 file. That is probably extremely rare and also requires your ID3 application to check for appended ID3v2.4 tags. (tag311 doesn't.)
As to the numerous, apparently unused frame types, it might help to look at the ID3v2 standards as partly "spec-ulative" (!) in nature. The ID3v2 tags were designed by an MP3 enthusiast, Martin Nilsson, with input from other MP3 enthusiasts. As a result, I think they included not only information/frames that would be needed or useful immediately, but also information/frames that might be needed or useful in the future. For example, there are a number of frame types in ID3v2.3 solely intended for or partially useful to commercial distributors of MP3s:
From the list above, I think I've only ever seen UFID and TPUB frames, and those rarely. Perhaps the integration of MP3s into the commercial music industry didn't unfold quite as expected in a time of rapid technological change. I don't advocate removing the "unused" frame types, particularly not based on my purely anecdotal evidence of lack of use.
A real-world example: General Encapsulated Object (GEOB) frames are basically repurposed Attached Picture (APIC) frames, holding arbitrary binary data objects instead of images. I've only encountered GEOB frames once in the wild. An MP3 file downloaded from SoundCloud had a strange (to me!) assortment of ID3 frame types and not just GEOB. Based on the descriptions in the GEOB frames, I figured out these frames are written and used by Serato DJ products and seem to be tables (?) of markers and cue points, plus some others. And my search for "Serato" frames turned up forum questions about importing Serato ID3 tags into other DJ software, so I imagine the other DJ packages might have their own conventions for encoding DJ-specific information in ID3 tags. The lesson: don't delete frame types from the ID3v2 standards just because we casual music fans never see them in our run-of-the-mill music listening!
The spec has ridiculous enumerations. Check out the 147 possible values of the music genre byte. The existing 147 categories seem to be chosen completely at random ... And it isn't just the genre tag; one of the possible picture types for the attached picture tag "APIC" is-- and I swear I'm not making this up-- "A bright coloured fish" ($11). At some point you feel like you're wasting your time by enumerating insanity.
—Jeff Atwood, "A Spec-tacular Failure", August 4, 2006.
The ID3v1 music genres are zero-based, so there were 148 possible values. The original list of genres covered 0 (Blues) through 79 (Hard Rock). Winamp gradually (or in batches) added new genres to the list. At least by June 1998 (based on these MP3 Manager notes), Winamp had added genres 80 (Folk) through 147 (Synthpop). By 2010, according to Wikipedia, the list was extended further with the addition of 148 (Christmas) through 191 (Psybient). The ID3v2.2 and ID3v2.3 specs list the genres through 125 (Dance Hall) in Appendix A. The ID3v2.4 spec dropped back to the original 0..79. Again, those are ID3v1 genres and they are labeled as such in the ID3v2 specs. ID3v2.2 and ID3v2.3 suggest a convention for reusing the ID3v1 genres and all three specs recommend simply making up your own genres.
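As an illustration of that convention, my reading of the ID3v2.3 content type (TCON) description is that a genre string may start with one or more ID3v1 genre numbers in parentheses, optionally followed by free-text refinement, with "((" standing for a literal opening parenthesis. The sketch below follows that reading only. The function name is hypothetical, the genre table holds just the handful of genres named in this post, and the special "(RX)" and "(CR)" references are not handled.

```python
import re
from typing import List

# Only the ID3v1 genres named in this post; the full list is much longer.
ID3V1_GENRES = {0: "Blues", 79: "Hard Rock", 80: "Folk",
                125: "Dance Hall", 147: "Synthpop"}


def parse_v23_genre(value: str) -> List[str]:
    """Expand leading "(nn)" ID3v1 references; keep any trailing free text."""
    names = []
    rest = value
    while True:
        match = re.match(r"\((\d+)\)", rest)
        if not match:
            break
        number = int(match.group(1))
        names.append(ID3V1_GENRES.get(number, "genre #%d" % number))
        rest = rest[match.end():]
    if rest.startswith("(("):
        rest = rest[1:]  # "((" is an escaped literal "("
    if rest:
        names.append(rest)
    return names


print(parse_v23_genre("(0)"))
print(parse_v23_genre("(79)(80)Skiffle"))
print(parse_v23_genre("Merseybeat"))
```

The last call shows the "make up your own" case: a string with no parenthesized number is simply a free-text genre.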
Assigning a music genre or genres to an audio segment (in general, not just MP3s) is a difficult and subjective task. ID3v1's list of 192 genres pales in comparison to the list of nearly 6,300 genres of finer granularity compiled by Glenn McDonald using statistics culled from Spotify. No one is happy with ID3v1's often wacky and wild genres, but it's the closest thing to a standard available to MP3 taggers. I imagine music and audio scholars have come up with various (countless?) schemes for categorizing audio tracks, but I'm not aware of anything taking the MP3 world by storm.
As to the APIC picture type, "A bright coloured fish", when someone puts in great effort — on their own time and at their own expense — to design and document a new method of tagging MP3 files and decides to inject a little whimsy into said document, I chuckle at the whimsy and say, "Thank you".
No examples are provided. Consider the comment frame. This is a relatively complex frame; it supports multiple languages and different encodings. It also supports multiple comments per frame with descriptive labels for each one. And yet it only merits a paragraph in the frames specification, with no examples of usage whatsoever. Would it kill them to provide a couple examples of how a comment should actually look?
—Jeff Atwood, "A Spec-tacular Failure", August 4, 2006.
As I pointed out earlier, comment (COMM) frames are not complicated. Every ID3 frame type has what I call its core value. A Text Information frame has a text string, a play counter (PCNT) has a binary counter, an attached picture (APIC) has the image data, and so on.
For most frame types, ID3 only allows one instance of the type in a tag. However, certain types can appear multiple times and have additional attributes used to distinguish among the multiple instances in a tag. User-defined Text Information (TXXX) and URL Link (WXXX) frames have an added description attribute. I was going to present the attached picture (APIC) frame as another example since it adds 3 attributes to the image data — MIME type, picture type, and description — and the combination of the 3 could be used as the discriminant. For instance, you might have two front album covers (type $03), one a PNG image and the other a JPEG image. On a Beatles track, you would have 4 artist images ($08): "John", "Paul", "George", and "Ringo". But I just noticed that all three ID3v2 specs use only the description to distinguish between instances. That is odd. (And in checking my own code, written over 6 months ago, I see I mistakenly used the picture type and description, but not the MIME type. I am or was odd!)
A comment frame, at its heart, is basically a Text Information frame with a text string and two added attributes: language and description. From my earlier example, you can have an English "Biography" comment and a French "Biography" comment, but not two Italian "Biography" comments. I suppose examples in the specs might be nice, but I made up the multilingual biography comments off the top of my head based on the specification of COMM frames. A language is a language, a description is a string, and a text string is a text string.
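The uniqueness rule for comment frames can be sketched in a few lines. This is only an illustration under stated assumptions: the `Comment` class and `add_comment` function are hypothetical names, the tag is modeled as a plain dictionary, and the language codes are ISO-639-2 codes chosen for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Comment:
    language: str     # three-letter ISO-639-2 code, e.g. "eng"
    description: str  # short content descriptor
    text: str         # the comment itself (the frame's core value)

def add_comment(tag, comment):
    """Add a comment to a tag, enforcing one frame per (language, description)."""
    key = (comment.language, comment.description)
    if key in tag:
        raise ValueError(f"duplicate COMM frame for {key}")
    tag[key] = comment

tag = {}
add_comment(tag, Comment("eng", "Biography", "Born in Liverpool ..."))
add_comment(tag, Comment("fra", "Biography", "Né à Liverpool ..."))
add_comment(tag, Comment("ita", "Biography", "Nato a Liverpool ..."))
# A second Italian "Biography" comment would raise ValueError here.
```

Keying on the pair rather than on the description alone is what lets the English and French biographies coexist.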
Related items are not together. The comment frame has two lookups in its header: language and text encoding. There is absolutely no reference at all to these lookup tables in the comment frame description. You have to "just know" that the main ID3 spec defines all languages with three character ISO-639-2 language codes, and that there are four possible text encodings from 00 to 03, with different rules for null termination. It'd be awfully difficult to write a comment tag [sic] reader without this information, yet it's nowhere to be found in the description of the comment tag [sic].
—Jeff Atwood, "A Spec-tacular Failure", August 4, 2006.
The language and text encoding fields are covered in the different versions' "ID3v2 Frame Overview" sections. Language codes are used in the TLAN, USLT, SYLT, COMM, and USER frames and the text encoding byte is used in a majority of frame types. ID3v2.4 frames have four possible text encodings; ID3v2.2 and ID3v2.3 support only two encodings. Things like languages and text encodings should not be detailed in the individual frame specifications (a document maintenance nightmare), but a parenthetical "(See Section 3.x, 'ID3v2 frame(s) overview')" would be helpful.
I fail to see the wisdom in trying to write a comment frame reader based solely on the isolated comment frame specification. If you're a senior developer, you should know better. And if you're a senior developer assigning this task to a junior developer, you should still know better. In both cases, before designing or writing code, I would expect the implementer to familiarize themselves with (i) the ID3v2 spec, especially the tag and frame overviews, and (ii) the software framework (e.g., an ID3 library) of which the comment reader will be a part.
Are ID3v2 tags complicated?
The ID3 spec is doubly frustrating because it makes a simple topic difficult. ID3 tags are just not that complicated. The spec makes me feel like an idiot for not being able to get this stuff right. What's the matter? Can't you read the spec?
—Jeff Atwood, "A Spec-tacular Failure", August 4, 2006.
I too was annoyed by the complexity of ID3v2 tags, both in past years and again when preparing to actually implement the tags in 2023.
To make matters worse, I've suffered from severe, treatment-resistant depression for almost 30 years. With depression, the smallest tasks seem overwhelming and it is difficult to make decisions, in this case design decisions. I had to force myself to write the code at various steps and I'm not happy with my implementation. Someone without depression would have had an easier time and would have produced a better, more coherent implementation in much less time.
My main initial gripe was that I wish the ID3 tags had a simpler, more regular structure. Perhaps key-value pairs where values could be textual and binary. It wasn't my original thought because I hadn't looked into it yet, but something like APE, I suppose. Or Vorbis Comments, although user certuna notes on Reddit that "Vorbis Comments have their own problems, mostly that it's a complete free-for-all, nothing is defined."
A secondary gripe was one I have generally with binary file formats intended for multiple computing platforms. There were and are many existing methods of serializing common data types; you don't need to reinvent the wheel. At the time the ID3v2 standards were written, the ones I was most familiar with were Sun's XDR, CORBA's CDR, and ASN.1's BER. (I've only used XDR and CDR.)
What I found as I was implementing ID3v2 tags was that they do have a regular structure and that there are relatively few data types, all with easily encoded representations. For example, the lookup table in my software that maps frame types to encoding/decoding functions has 99 entries.
That's more than the spec's 70+ frame types because the table includes "predefined" user-defined Text Information and URL Link frames such as the 14 MusicBrainz-specific user-defined Text Information frames. The lookup table is based on the Tag Field Mappings Table of Florian Heidenreich's "universal tag editor", Mp3tag. Mp3tag is an amazing, mature, and sophisticated Windows GUI ID3 tag editor. I've used it myself for many years and, like many others in online forums, I highly recommend it. (I'm more of a CLI person than a GUI person and, after adding ID3v2.3/2.4 support to my tag311 program, I've used Mp3tag somewhat less in the past few months.)
Of those 99 entries, 55 are Text Information frames and 19 are user-defined Text Information frames (TXXX), for a total of 74 out of 99. Now the pseudocode for decoding both regular and user-defined Text Information frames is very simple:
Decode text-encoding byte
If (user-defined)
    Decode description string
Decode the text value
If (ID3v2.4)
    Decode additional text values if any
To encode these frames, substitute "Encode" for "Decode".
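The decoding steps for Text Information frames might look something like the following in Python. This is a sketch under stated assumptions, not the tag311 implementation: the names `ENCODINGS`, `split_terminated`, and `decode_text_frame` are hypothetical, the input is assumed to be the frame body with the frame header already removed, and error handling is omitted.

```python
# Text-encoding byte -> (Python codec, string terminator).
# ID3v2.3 defines only $00 and $01; $02 and $03 were added in ID3v2.4.
ENCODINGS = {
    0: ("latin-1", b"\x00"),        # ISO-8859-1
    1: ("utf-16", b"\x00\x00"),     # UTF-16 with byte-order mark
    2: ("utf-16-be", b"\x00\x00"),  # UTF-16BE without byte-order mark
    3: ("utf-8", b"\x00"),          # UTF-8
}

def split_terminated(data, terminator):
    """Split at the first terminator; return (string bytes, remaining bytes)."""
    step = len(terminator)
    # Step by the terminator width so a two-byte terminator is only
    # recognized on a character boundary.
    for i in range(0, len(data) - step + 1, step):
        if data[i:i + step] == terminator:
            return data[:i], data[i + step:]
    return data, b""

def decode_text_frame(body, user_defined=False, version=4):
    """Decode a regular or user-defined (TXXX) Text Information frame body."""
    codec, terminator = ENCODINGS[body[0]]   # decode text-encoding byte
    data = body[1:]
    description = None
    if user_defined:                         # decode description string
        raw, data = split_terminated(data, terminator)
        description = raw.decode(codec)
    values = []
    while data:                              # decode the text value(s)
        raw, data = split_terminated(data, terminator)
        values.append(raw.decode(codec))
        if version < 4:                      # only ID3v2.4 allows a list
            break
    return description, values
```

For instance, an ID3v2.4 body of `b"\x03Artist A\x00Artist B"` decodes to two values, while the same bytes read as ID3v2.3 yield only the first.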
The lookup table has 10 entries for regular and user-defined URL Link frames. These are also easy to decode (or encode), the only difference with Text Information frames being that the URL string is always ISO-8859-1:
If (user-defined) {
    Decode text-encoding byte
    Decode description string
}
Decode the URL
Comment frames are glorified, user-defined Text Information frames with an added language code. Attached Picture (APIC) and General Encapsulated Object (GEOB) frames are basic frames containing binary blobs. My tag311 program simply passes through some of the less common and more complicated frame types as-is; e.g., the MPEG Location Lookup Table (MLLT) frame which holds a variable-length table of deviation pairs and the previously mentioned Commercial frame (COMR) which has various individual text fields and a logo image in the frame.
Lastly, Jeff finishes up his blog post with a rant from Linus Torvalds about the uselessness of specifications and then this final paragraph:
Specs, if they're well-written, can be useful. But they probably won't be. The best functional spec you'll ever have is the behavior of real applications.
—Jeff Atwood, "A Spec-tacular Failure", August 4, 2006.
The Torvalds quote appeared earlier in Jeff's 2005 post, "Dysfunctional Specifications", which links to Joel Spolsky's three-part "Painless Functional Specifications". Part 2, "What's a Spec?", defines a functional spec as a user-oriented description of how a product works and a technical spec as a description of the product's internal implementation.
Now I don't feel qualified to knowledgeably contribute to discussions on the types and quality of specifications, but, in the case of ID3 tags, I think Jeff's focus on functional specs is a red herring (hat-tip to Coding Horror commenter RegB for suggesting the connotation of "A bright coloured fish"!). And I would say the same if the post focused on technical specs.
The informal ID3v2 standards document a file/stream format for ID3v2 tags, i.e., a very precise definition of the bit- and byte-layout of a tag. In my experience, this would be called something like an "Interface Control Document" (ICD). Yes, a name is just a name, but "ICD" is not a magnet for adjectives such as "functional" and "technical". A couple of commenters mentioned a desire for a reference implementation. No! Data has been written to and read from files in particular formats since the dawn of the electronic computer age. (Longer than that if you count punch-card tabulators!) This is not rocket science. Certain ancillary information needs to be stored in an audio track; here's how we will lay it out. All the back and forth about specifications seems beside the point when all you need to say is: "The first 3 bytes of a tag are 'ID3'; the next byte is the major version number, the next byte is the minor version number, the next byte is flags, etc."
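To make the byte-layout point concrete, here is a minimal sketch of reading the 10-byte ID3v2 tag header. The function name `parse_tag_header` and the dictionary it returns are hypothetical choices for this example; the specs call the two version bytes the major version and the revision number, and the size is a 4-byte "synchsafe" integer that uses only the low 7 bits of each byte.

```python
def parse_tag_header(header):
    """Parse the 10-byte header found at the start of an ID3v2 tag."""
    if len(header) < 10 or header[:3] != b"ID3":
        raise ValueError("not an ID3v2 tag header")
    major, revision, flags = header[3], header[4], header[5]
    size_bytes = header[6:10]
    # The high bit of every size byte must be clear (synchsafe integer).
    if any(byte & 0x80 for byte in size_bytes):
        raise ValueError("tag size is not a synchsafe integer")
    size = 0
    for byte in size_bytes:
        size = (size << 7) | byte   # 7 significant bits per byte
    return {"major": major, "revision": revision, "flags": flags, "size": size}
```

Four 7-bit groups give 28 bits, so the largest size the header can express is 2^28 - 1 bytes, just under 256 MiB.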
Are the ID3v2 standards perfect? No. If I could, would I go back in time and change them? I'm not so sure. (Okay, I would maybe make UTF-8 the Unicode encoding from the start and I would eliminate unsynchronization or at least make a definitive choice on its tag- or frame-level application.) The limitations of ID3v1 tags were well known. Martin Nilsson and his small community of MP3 enthusiasts stepped up to the plate (as we say in the U.S.) and designed three standards in quick succession that provided a relatively complete, but still extensible framework for tagging MP3 files with metadata. The middle standard, ID3v2.3, is still the standard 25 years later.
Martin Nilsson had hoped that ID3v2 would become an internet standard and therefore wrote the ID3v2 standards in the RFC format. Ultimately, that did not happen. (I can find no record of it being submitted for consideration; a draft ID3v1 standard by Nilsson was submitted, but it doesn't appear to have progressed any further.) At that point, I think someone should have written a more expansive document such as a reference manual that would have addressed, among other things, some of the questions and concerns that Jeff Atwood raised. I would not have placed that burden on Martin Nilsson as he had already done more than enough and he surely had other personal and professional projects vying for his interest and time. Perhaps one of the companies making hardware- or software-based MP3 players?! (I don't mean Apple. My familiarity with iTunes was limited to manually loading MP3 albums onto a hand-me-down iPod and manually deleting albums from the iPod, but I gather that Apple would like MP3s to quietly disappear.)
'Paul and Ringo violently trying to persuade John and George to accept their "New Age" theology. After many attempts; they were finally successful.' (Source: Conjecture News)
Note: I'm not a classical music expert, but I found this subject interesting and instructive. I played violin when I was young and I regularly listen to violin concertos, but my musical tastes generally tend towards blues and rock.
When I said that ID3v2 gave us a "relatively complete" framework, I mean that ID3v2 handles maybe 90% of use cases for tagging audio tracks. The other 10% are hard, essentially being the music genre problem writ large. One particularly prominent issue is labeling a track with a multi-level hierarchy of titles. ID3v2 does a good job of supporting the common album/song scheme:
Album/Movie/Show Title (TALB)
⤷ Part of a Set (TPOS) (e.g., CD 1 of a 3-CD set: "1" or "1/3")
⤷ Title/Songname/Content Description (TIT2)
⤷ Subtitle/Description Refinement (TIT3)
Not included above is the odd Content Group Description (TIT1) frame. It is intended for "a larger category of sounds/music" according to the ID3v2 spec, but the purpose of the frame is not made clear. The frame ID, TIT1, and possibly the word "Group" imply a more general title than the song title. And, I guess, less general than the album title? I don't think I've seen TIT1, TIT3, and TPOS frames very often if at all. Since I usually delete the existing tags in MP3s and retag them in line with my own minimalistic scheme, my cursory glances at the old tags likely overlook any instances I do see.
As shown above, ID3v2 does handle the common two-level hierarchy of an album with multiple tracks. It provides partial support for an album broken into multiple discs, the numeric convention for the TPOS frame not fully accounting for named discs; e.g., "CD1 - The Early Years", "CD2 - The Middle Years", etc. More open-ended support for multiple levels in titles could have been achieved if ID3v2 had used TIT1 for the track title and TIT2..TIT9 for successively more general titles. That would have created other problems though and it might require an examination of the distinction between the formal multi-level title of a track and what I call its "physical packaging" multi-level title. For example, Mozart's Violin Concerto No. 5/II. Adagio (second movement) may appear without the first and third movements as track 7 on a mix tape/CD, Music to Relax By.
Although the TIT1 frame is already somewhat subjective in nature, iTunes originally used it in an even more subjective role, grouping, where a group is not necessarily a more general title intended for identification purposes, but is a user-chosen, arbitrary string for the user's grouping purposes. (Note that grouping here has nothing to do with the digital-signature-related group ID in frame headers mentioned earlier.)
In 2016, iTunes enhanced support for tagging classical music by:
Moving grouping information from TIT1 to a new Grouping (GRP1) frame
Using TIT1 for the music work (e.g., "Violin Concerto No. 5 in A")
Adding a new Movement Number/Count (MVIN) frame (e.g., "2/3")
Adding a new Movement Name (MVNM) frame (e.g., "Adagio")
Although I'm not keen on Apple creating new frame types, I must admit that psychologically (though not logically!) a frame type such as MVNM feels more standardized than a TXXX:description convention such as TXXX:"Movement Name".
Discussions of these iTunes changes often reference an additional Show Work & Movement (TXXX:SHOWMOVEMENT) frame whose presence or absence controls whether or not iTunes displays the TIT1 and movement titles/numbers. It's not clear to me if iTunes actually writes these frames or if this is just a flag stored in the user's iTunes database.
The following discussions and document provide some insight into the tagging issues for classical music:
Mp3tag Forum (August 2016): work name, name and movement
MusicBrainz Picard Forum (November 2018): Movements saved in TXXX id3v2.4 frames instead of MVNM ones
Proposal by Sophist-UK (May 2017): Picard Multi-Level Work Tags
A couple of idle unrelated thoughts ...
General Encapsulated Object (GEOB) frames are used to embed arbitrary binary data objects in an ID3 tag. If you take an Attached Picture (APIC) frame, change the 'APIC' ID to 'GEOB', and replace the picture type with a filename, you'll have a GEOB frame. A use for GEOB frames that first sprang to my mind when reading the ID3v2 spec was storing a PDF file of an album's liner notes or booklet in the MP3s. And I've seen PDF music books with embedded MP3 files. So you could potentially have an MP3 file embedded in a PDF file embedded in another MP3 file's ID3 tag — it's ID3 tags all the way down ... until you hit the 256-MiB limit for ID3 tags!
I came across an imaginative presentation of an alternate universe filled with music and ID3v2 tags, one in which the ID3v2 tags have a broader connection to the outside world using capabilities already designed into the tags. Lars Wikman's 2022 blog post, "What ID3v2 Could Have Been", starts with the Play Counter (PCNT) frame and moves onto the Popularimeter (POPM) frame, an expanded play counter with a rating value and an email address. Since there can be multiple POPM frames with different email addresses, an MP3 file passed from user to user around the world can build up a history of who has listened to it. Of course, that is in an alternate universe. In this universe, the pre-spam-onslaught ID3v2 idea of adding people's email addresses to a publicly traded file now seems rather quaint!
Wikman also looks at the Commercial (COMR), Ownership (OWNE), Terms of Use (USER), and lyrics frames. The paragraph below is about the Event Timing Codes (ETCO) frame which holds a sequence of timed events:
... This is about a simpler time, where people saw the wild possibilities of music on computers and when people cared about files, damnit. This is about some of the most interesting and entertaining things I've run across while reading the spec ...
The Event codes frame seems like fun. The event codes can be used for a number of different things, controlling lights, setting of[f] explosives, whatever the player wants to interpret them as. There are some specific ones like start of song, bridge, end of song and a bunch of other musically related ones but you can specify a number of custom ones as well.
—Lars Wikman, "What ID3v2 Could Have Been", Underjord Blog, June 7, 2022.
The ID3v2 specs do actually mention setting off explosives as an application of the ETCO frame! Paul and Linda McCartney's song, "Rock Show" (YouTube), immediately pops into my head!
I accidentally found this 1998 change log for the initial ID3v2 spec (later known as ID3v2.2) in the source code for Michael Mutschler's MP3-Info Windows shell extension. I assume the log was kept by Martin Nilsson and it gives some additional insight into the evolving design of the ID3v2 tags. Also see Nilsson's comments in the annotated ID3v2.2 spec.
Changes
980227 Made a slight change in the newline representation since this is already defined in Unicode.
980222
- Added the event 'unwanted noise' in "ETC".
- Made some changes in the "LLT" frame. More changes are needed here, I know.
980221 Today (night) I've included several ideas mailed in to me as well as a few of my own. As I'm now very, very tired I'll probably have to bug-fix the text tomorrow (day).
- In order to have the 'correct' delays between the songs in a playlist I created the new "TDY" frame as well as made some changes and additions in the "ETC" frame.
- To protect the files from nasty, truncated filenames the "TOF" frame is added.
- The picture type "CD, lable side" is added.
- The "LLT" frame is added for better timing precision of the file.
- I've added on which filetypes ID3v2 applies in the "ID3v2 overview".
- Frames that there might be more than one of in a tag must have different content descriptors.
980218
- Even more small changes (mostly rephrasing).
- Added some suggested picture types.
- Changed the header again because 35-bit size descriptor is not only overkill, it's also not practical to code. The descriptor is now "ID3" instead of "MP3" so it always tells the truth.
980216 Well, I wasn't completely satisfied with my new header, and I was obviously not the only one, so I changed it again. I also made some changes in the neighborhood, so section 2 - 3.2 is now revised. The only really revolutionary change, besides the new header, is that any padding after the frames must have the value $00.
980213 Since it's desired that no synchronization is introduced in the ID3v2 tag header, I've inserted three bytes in the size descriptor. This significantly simplifies ID3v2 identification since there is no need to 'reverse unsychronize' the header.
980215
- Made a lot of small changes that nobody will notice.
- Made some revisions in the 'Reverb' frame.
980213 The 'unsychronization scheme' is altered to also remove 11-bit synchronizations, such as MPEG 2.5.
980211
- Added some information to the 'Encrypted meta frame' and 'Audio encryption' frames to clarify.
- Made some small changes to most frames, but none changes the way the frames are used or implemented.
980209
- Added a table of contents (which resulted in new paragraph numbers).
- Removed 'Security considerations' since there isn't really anything that applies to this.
- Looked over the use of " and ' and changed it according to 'Conventions in this document'.
- Made some corrections in the UMI and IPL frames.
- Changed the RVA frame to avoid distortion.
- Thought a bit about broadcasted streaming MPEG and implemented 'Recommended buffer size' (which I probably will revise).
- Read about Fraunhofers MMP (Multimedia Protection Protocol) and implemented the 'Encrypted meta frame' and 'Audio encryption'.
- Read a little more about MMP and implemented the TPB (publisher) frame.
- Made some addition in 'ID3v2 frames overview' to ease ID3 expansions.
- Added 'Time change' as an event.
980207 Corrected the length descriptor in the "RVA" frame.
980204
- Revised the 'Relative volume adjustment' so it now handles both channels separately, now also serving as balance adjustment.
- Revised and expanded 'References'.
- Added Language descriptor (according to ISO-639-1 and ISO-639-2) to the full-text frames (ULT, SLT & COM).
980130 Made some clarifications and corrections in the sections abstract, 2.1, 2.2 and in the description for the "TDA" and "TIM" frames.
980129
- Made some changes in the way 'Event timing codes' uses padding for long time gaps.
- Modified 'Synced tempo codes' for two byte time description and reserved 0 BPM and 1 BPM for exact timing, making it possible to sync music with different times between each beat. The accuracy is +/- 40ms, at worst.
- Added the 'movement/part name' content descriptor in the 'Synchronized text' frame, making it possible to name the different parts of a continuous piece of music such as classical music and mixes.
- Additions to the 'Content type' frame so that it is possible to make references to the old list of genres.
- The 'TMN' frame is removed and its functions is incorporated in the 'Media type' frame in the same way as numerical references in the 'Content type' frame.
- Made some changes in 'ID3v2 frames overview' (2.2) allowing the frames to be placed in any order within the tag.
- Made several small changes in the 'Attached picture' frame. You'll probably not notice any difference.
- Modified 'Synchronized text' for two-byte time description and made some clarifications.
- All questions and issues '-->' are removed from the document and transferred to the 'Problems to solve' page.
- Changed 'Synchronized text' (SYT) to 'Synchronized lyric/text' (SLT) for consistency.
9801xx A lot. I don't think there's anything that has survived from the initial draft during the process.
980110 The initial draft was written (It might have been a day or two earlier).
—Martin Nilsson (?), "Changes", 1998. From id3v2/changes.html in the source code for Michael Mutschler's MP3-Info Windows shell extension.