HTML / XHTML / WML / XML Validator
The charset encoding of XML and XHTML documents can be detected from the
following sources:
HTTP-Header: Content-Type: application/xhtml+xml; charset=ISO-8859-1
1: <?xml version="1.0" encoding="ISO-8859-1"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>no error</title>
8: </head>
9: <body/>
10: </html>
| No error. The HTTP-Header and the XML-Declaration state the same charset encoding, and the HTTP-Header charset encoding is used. |
HTTP-Header: Content-Type: text/xml
1: <!DOCTYPE root [
2: <!ELEMENT foo (#PCDATA)>
3: <!ELEMENT root (foo)>
4: ]>
5: <root>
6: <foo>foo</foo>
7: </root>
| If no charset encoding can be detected, the validator falls back to US-ASCII. |
HTTP-Header: Content-Type: application/xhtml+xml
1: EF BB BF<?xml version="1.0"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>BOM-Charset</title>
8: </head>
9: <body>äöüÄÖÜß</body>
10: </html>
| The charset encoding can only be identified from the Byte Order Mark (BOM), because no other charset encoding statement exists. |
HTTP-Header: Content-Type: application/xhtml+xml
1: <?xml version="1.0"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>automatic</title>
8: </head>
9: <body>äöüÄÖÜß</body>
10: </html>
| This UTF-16 encoded document states no charset encoding in the XML-Declaration, and the HTTP-Header submits none either. Because UTF-16 can be identified from the binary pattern of the document's bytes, the correct charset encoding is still used. |
HTTP-Header: Content-Type: application/xhtml+xml
1: <?xml version="1.0" encoding="UTF-8"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>XML-Charset</title>
8: </head>
9: <body>äöüÄÖÜß</body>
10: </html>
| A charset encoding statement (UTF-8) exists only in the XML-Declaration; the HTTP-Header submits none. The XML-Declaration charset encoding (UTF-8) is therefore used. |
HTTP-Header: Content-Type: application/xhtml+xml
1: <?xml version="1.0"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>no XML-Declaration</title>
8: </head>
9: <body>äöüÄÖÜß</body>
10: </html>
| If no charset encoding can be found in an XHTML document, Validome falls back to UTF-8. |
HTTP-Header: Content-Type: application/xhtml+xml
1: <?xml version="1.0" encoding="ISO-8859-1"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
5: <head>
6: <title>XML-Charset</title>
7: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
8: </head>
9: <body>äöüÄÖÜß</body>
10: </html>
| If the XML-Declaration charset encoding differs from the Meta charset encoding, the difference should be reported. |
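
The in-document sources shown above (BOM, binary pattern, XML-Declaration) can be sketched in Python. This is only an illustration under stated assumptions, not Validome's implementation: the function names `sniff_bom`, `sniff_pattern` and `declared_encoding` are hypothetical, only UTF-8 and UTF-16 are covered, and the XML-Declaration is read only when the document bytes are ASCII-compatible.

```python
import re

# Byte Order Marks for UTF-8 and UTF-16 (UTF-32 is left out of this sketch).
BOMS = [
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

# How the first two characters "<?" look in UTF-16 when no BOM is present.
PATTERNS = [
    (b"\x00\x3c\x00\x3f", "UTF-16BE"),
    (b"\x3c\x00\x3f\x00", "UTF-16LE"),
]

# encoding="..." inside an XML declaration at the very start of the data.
XML_DECL = re.compile(
    rb"^<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None."""
    for signature, name in BOMS:
        if data.startswith(signature):
            return name
    return None


def sniff_pattern(data: bytes):
    """Return the encoding implied by the binary pattern of '<?', or None."""
    for signature, name in PATTERNS:
        if data.startswith(signature):
            return name
    return None


def declared_encoding(data: bytes):
    """Return the encoding stated in the XML declaration, or None."""
    match = XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__":
    utf16_doc = '<?xml version="1.0"?><html/>'.encode("utf-16-be")
    print(sniff_bom(utf16_doc), sniff_pattern(utf16_doc))
    print(declared_encoding(b'<?xml version="1.0" encoding="UTF-8"?><html/>'))
```

A caller would try these in turn and fall back to a default only when all of them return `None`.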
The following examples show UTF-32 encoded documents.
HTTP-Header: Content-Type: application/xhtml+xml; charset=UTF-32
1: 00 00 FE FF<?xml version="1.0" encoding="UTF-32"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>UTF-32 (1234)</title>
8: </head>
9: <body>äöüÄÖÜß</body>
10: </html>
| Validome can also handle UTF-32 encoded documents. Every UTF-32 encoded character consists of four bytes, and four different byte orders are possible per document. This document and the three that follow demonstrate these byte orders. 1. Byte Order 1234 |
HTTP-Header: Content-Type: application/xhtml+xml; charset=UTF-32
1: FF FE 00 00<?xml version="1.0" encoding="UTF-32"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>UTF-32 (4321)</title>
8: </head>
9: <body>äöüÄÖÜß</body>
10: </html>
| 2. Byte Order 4321 | ||||||||||
HTTP-Header: Content-Type: application/xhtml+xml; charset=UTF-32
1: 00 00 FF FE<?xml version="1.0" encoding="UTF-32"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>UTF-32 (2143)</title>
8: </head>
9: <body>äöüÄÖÜß</body>
10: </html>
| 3. Byte Order 2143 | ||||||||||
HTTP-Header: Content-Type: application/xhtml+xml; charset=UTF-32
1: FE FF 00 00<?xml version="1.0" encoding="UTF-32"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>UTF-32 (3412)</title>
8: </head>
9: <body>äöüÄÖÜß</body>
10: </html>
| 4. Byte Order 3412 | ||||||||||
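
The four byte orders can be told apart by the first four bytes of the document. The following Python sketch maps each BOM shown above to its byte order and reorders the bytes so that the standard `utf-32-be` codec can decode them. The names `UTF32_BOMS`, `utf32_byte_order` and `decode_utf32` are hypothetical, and the sketch assumes the document starts with a BOM.

```python
# The four UTF-32 BOM layouts from the examples, keyed by their leading bytes.
UTF32_BOMS = {
    bytes.fromhex("0000FEFF"): "1234",  # big-endian
    bytes.fromhex("FFFE0000"): "4321",  # little-endian
    bytes.fromhex("0000FFFE"): "2143",
    bytes.fromhex("FEFF0000"): "3412",
}


def utf32_byte_order(data: bytes):
    """Return '1234', '4321', '2143' or '3412', or None without a UTF-32 BOM."""
    return UTF32_BOMS.get(data[:4])


def decode_utf32(data: bytes) -> str:
    """Decode BOM-prefixed UTF-32 data in any of the four byte orders."""
    order = utf32_byte_order(data)
    if order is None:
        raise ValueError("no UTF-32 byte order mark found")
    if len(data) % 4:
        raise ValueError("UTF-32 data must be a multiple of four bytes")
    big_endian = bytearray(len(data))
    for position, digit in enumerate(order):
        # The byte stored at offset `position` of every four-byte unit is
        # byte number `digit` of the big-endian (1234) representation.
        big_endian[int(digit) - 1::4] = data[position::4]
    # The first decoded character is the BOM itself, so drop it.
    return big_endian.decode("utf-32-be")[1:]


if __name__ == "__main__":
    sample = ("\ufeff" + "äöüÄÖÜß").encode("utf-32-le")
    print(utf32_byte_order(sample), decode_utf32(sample))
```

Reordering into big-endian form keeps the sketch short, because Python's built-in codecs only cover the 1234 and 4321 orders.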
The following example documents show conflicts between possible charset sources.
HTTP-Header: Content-Type: application/xhtml+xml
1: FF FE<?xml version="1.0" encoding="UTF-8"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>BOM != XML</title>
8: </head>
9: <body>äöüÄÖÜß</body>
10: </html>
| The BOM charset encoding (UTF-16 in this case) differs from the XML-Declaration charset encoding (UTF-8). The BOM charset encoding has to be used. |
HTTP-Header: Content-Type: application/xhtml+xml
1: EF BB BF<?xml version="1.0" encoding="UTF-16"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>BOM != XML-AUTO</title>
8: </head>
9: <body />
10: </html>
| This document is UTF-16 encoded, but the Byte Order Mark (BOM) specifies UTF-8. Because the document has to be validated as UTF-8, error messages should be reported. |
HTTP-Header: Content-Type: application/xhtml+xml
1: <?xml version="1.0" encoding="UTF-8"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>XML != XML-Auto</title>
8: </head>
9: <body />
10: </html>
| The charset encoding statement of the XML-Declaration specifies a UTF-8 encoded document. In fact the document is UTF-16 encoded, so error messages should be reported. |
HTTP-Header: Content-Type: application/xhtml+xml; charset=UTF-8
1: <?xml version="1.0" encoding="ISO-8859-1"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>HTTP != XML</title>
8: </head>
9: <body>äöüÄÖÜß</body>
10: </html>
| This document is UTF-8 encoded, and the HTTP-Header charset encoding statement specifies the correct encoding. The XML-Declaration charset encoding statement, however, specifies ISO-8859-1. The HTTP-Header charset encoding (UTF-8) has to be used, but a notice about this conflict has to be reported. |
HTTP-Header: Content-Type: application/xhtml+xml; charset=ISO-8859-1
1: <?xml version="1.0"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>HTTP != AUTO</title>
8: </head>
9: <body>äöüÄÖÜß</body>
10: </html>
| The HTTP-Header charset encoding (ISO-8859-1) differs from the actual encoding (UTF-16). Because ISO-8859-1 (from the HTTP-Header) has to be used, error messages have to be reported. |
HTTP-Header: Content-Type: application/xhtml+xml; charset=ISO-8859-1
1: EF BB BF<?xml version="1.0"?>
2: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
3: "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
4:
5: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
6: <head>
7: <title>HTTP != BOM</title>
8: </head>
9: <body />
10: </html>
| The Byte Order Mark (BOM) specifies UTF-8, while the HTTP-Header charset encoding statement specifies ISO-8859-1. The BOM is correct, but the HTTP-Header charset has to be used, so error messages should be reported. |
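
The conflict examples suggest an order of precedence: HTTP-Header first, then BOM, then XML-Declaration, then the binary pattern, and finally a fallback. The Python sketch below encodes that order as inferred from the examples; it is not Validome's actual code. The function `choose_encoding` and its parameters are hypothetical, names are compared as plain case-insensitive strings (so "UTF-16" and "UTF-16LE" would count as different), and using US-ASCII for every media type other than application/xhtml+xml is an assumption generalized from the text/xml example.

```python
from typing import List, Optional, Tuple


def choose_encoding(
    media_type: str,
    http: Optional[str] = None,
    bom: Optional[str] = None,
    xml_decl: Optional[str] = None,
    sniffed: Optional[str] = None,
) -> Tuple[str, List[str]]:
    """Pick the charset to validate with and list notices about conflicts."""
    # Sources in the order of precedence inferred from the examples.
    sources = [
        ("HTTP-Header", http),
        ("BOM", bom),
        ("XML-Declaration", xml_decl),
        ("binary pattern", sniffed),
    ]
    present = [(label, value.upper()) for label, value in sources if value]

    if not present:
        # Assumption: UTF-8 for XHTML, US-ASCII for everything else.
        fallback = "UTF-8" if media_type == "application/xhtml+xml" else "US-ASCII"
        return fallback, [f"no charset encoding found, falling back to {fallback}"]

    winner_label, winner = present[0]
    notices = [
        f"{label} states {value}, but {winner_label} ({winner}) is used"
        for label, value in present[1:]
        if value != winner
    ]
    return winner, notices


if __name__ == "__main__":
    # The "HTTP != XML" example: UTF-8 in the header, ISO-8859-1 declared.
    print(choose_encoding("application/xhtml+xml", http="UTF-8", xml_decl="ISO-8859-1"))
```

Returning the notices alongside the chosen charset lets a caller validate with the winning encoding and still report the conflict.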