Consider HTML. It is a shameful mix of presentational elements (div, span, i, b, pre, em, strong, p), semantic elements (kbd), user controls (input), and instructions on how to process actions (form, label). There is no certainty at all about what writing a given tag actually means. If one wanted to go truly semantic, it could be done in the following way, keeping as much of the existing infrastructure as possible. A common web page becomes just an XML document containing only the data significant for that page. It carries an xml-stylesheet processing instruction, and the linked XSLT contains several (intersecting) levels of mapping from those data into low-level markup. CSS, ids, and classes should never appear at any stage of this desugaring. The possible levels are (in order of increasing abstraction):
- Give the browser instructions on how to render the page and build the UI (tags: span, screen, article, grid, row, cell, padding, spring, ellipsis; attributes: link, xlink).
- Describe what kind of information is shown (kbd, paragraph, sentence, term, em, header, list, person).
- Describe what kind of UI is shown (spoiler, splitter, tree, datagrid, editor, toolbar, navbar).
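As a concrete sketch of the idea (the file names and tag vocabulary here are illustrative, not any real standard): the page itself carries only its data plus a stylesheet reference, and the linked XSLT lowers the semantic tags into presentational markup.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="page.xsl"?>
<article>
  <header>Semantic pages</header>
  <paragraph>Press <kbd>Ctrl+S</kbd> to save.</paragraph>
</article>
```

```xml
<!-- page.xsl: one level of the mapping, semantic tags -> low-level markup -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="article">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>
  <xsl:template match="header">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>
  <xsl:template match="paragraph">
    <p><xsl:apply-templates/></p>
  </xsl:template>
  <xsl:template match="kbd">
    <kbd><xsl:apply-templates/></kbd>
  </xsl:template>
</xsl:stylesheet>
```

Note that no class, id, or inline style survives into the source document; the entire presentational layer lives in the stylesheet.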
There is simply no way to build the Semantic Web until we start writing it in a consistently semantic way.
The problem with this approach is that one somehow has to manage dynamic content on the page, while the exposed DOM is that of the already-transformed XML, not of the source document. There was some movement towards standardising DOM access to the source XML, but it never gained enough support.
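One workable compromise today, sketched below under the assumption of a browser environment (the file names are hypothetical), is to fetch the source XML yourself and transform it with the standard XSLTProcessor API. Since the script then owns the source document, dynamic updates can be made to the semantic model and the view re-rendered.

```javascript
// Browser-only sketch: keep the semantic source XML as the model
// and re-run the XSLT whenever it changes.
async function loadXml(url) {
  const text = await (await fetch(url)).text();
  return new DOMParser().parseFromString(text, "application/xml");
}

async function render() {
  const source = await loadXml("page.xml"); // semantic data: the real model
  const xslt   = await loadXml("page.xsl"); // mapping to low-level markup

  const proc = new XSLTProcessor();
  proc.importStylesheet(xslt);
  // The transformed fragment is a throwaway view; `source` stays authoritative.
  document.body.replaceChildren(proc.transformToFragment(source, document));
  return source; // mutate this document and re-transform for dynamic content
}
```

The price is that the transform runs in script rather than via the xml-stylesheet processing instruction, but in exchange the untouched semantic DOM remains accessible.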
If we were ever ready to leave all that HTML/CSS/JS stuff behind as a nightmare, there are even better approaches. We could finally build the web on extensible binary protocols (such as protobuf). That would solve all the problems of minification at once: there would not be a single CSS class name, whitespace character in HTML, or identifier in JS left to strip.
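A hypothetical sketch of what such a binary page format might look like (this schema is purely illustrative, not an existing protocol): every semantic tag becomes a numbered field in a typed tree, so the wire format carries no tag names, class names, or whitespace at all.

```protobuf
syntax = "proto3";

// Illustrative only: a semantic page as a typed binary tree.
message Node {
  enum Kind {
    KIND_UNSPECIFIED = 0;
    HEADER = 1;
    PARAGRAPH = 2;
    KBD = 3;
    LIST = 4;
  }
  Kind kind = 1;              // encoded as a varint, not a tag name
  string text = 2;            // leaf text content, if any
  repeated Node children = 3; // nested semantic structure
}
```

Extensibility comes for free from the field-numbering scheme: old clients simply skip fields they do not recognise, much as browsers ignore unknown tags today.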