
Page-encoding specified in XML prolog (UTF-8) is different from that specified in page directive (utf-8)

 After migrating a web application from Tomcat-5.0.28 to Tomcat-6.0.16, requesting a page failed with this error:

    org.apache.jasper.JasperException:
    /default/header.jsp(1,1) Page-encoding specified in XML prolog (UTF-8) is different from that specified in page directive (utf-8)


    Could the encoding check in Tomcat 6 be at fault? The same page worked without problems under Tomcat 5.


    Opening header.jsp showed the following directive:

    <%@ page language="java"  pageEncoding="utf-8"%>

 

    Change the lowercase "utf-8" to uppercase "UTF-8", so that it matches the name in the XML prolog exactly:

    <%@ page language="java"  pageEncoding="UTF-8"%>


    After saving the file and requesting the page again, everything worked.
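
    The post does not say why a change of case makes a difference. One plausible explanation, offered here purely as an assumption, is that this Tomcat release compared the two encoding names as plain strings rather than as charsets. The sketch below is not Tomcat or Jasper code, and the class and method names are hypothetical; it only shows how the two kinds of comparison disagree on "UTF-8" versus "utf-8":

```java
import java.nio.charset.Charset;

// Hypothetical illustration only; this is NOT Tomcat/Jasper source code.
final class EncodingNameCheck {

    // Case-sensitive string comparison: "UTF-8" and "utf-8" count as different.
    static boolean sameByString(String a, String b) {
        return a.equals(b);
    }

    // Charset comparison: Charset.forName resolves names case-insensitively,
    // and two charsets are equal when their canonical names match.
    static boolean sameByCharset(String a, String b) {
        return Charset.forName(a).equals(Charset.forName(b));
    }

    public static void main(String[] args) {
        System.out.println(sameByString("UTF-8", "utf-8"));  // false
        System.out.println(sameByCharset("UTF-8", "utf-8")); // true
    }
}
```

    If that assumption holds, writing both names with identical case sidesteps the check, which matches the behaviour reported above.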
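Further reading: a discussion of the causes of garbled Chinese text in JSP pages on Tomcat and how to fix it, summarized from material found online.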

The JSP 2.0 page directive has two relevant attributes, contentType and pageEncoding.
The SCWCD Exam Study Kit describes them as follows:
The contentType attribute specifies the MIME type and character encoding of the
output. The default value of the MIME type is text/html; the default value of the
character encoding is ISO-8859-1. The MIME type and character encoding are
separated by a semicolon, as shown here:

 

<%@ page contentType="text/html;charset=ISO-8859-1"%>

This is equivalent to writing the following line in a servlet:

response.setContentType("text/html;charset=ISO-8859-1");

The pageEncoding attribute specifies the character encoding of the JSP page. The
default value is ISO-8859-1. The following line illustrates the syntax:


<%@ page pageEncoding="ISO-8859-1" %>

Below is how the JSP 2.0 Spec describes contentType and pageEncoding:
contentType
Defines the MIME type and the character encoding for the
response of the JSP page, and is also used in determining the
character encoding of the JSP page.
Values are either of the form “TYPE” or “TYPE;charset=CHARSET”
with an optional white space after the “;”.
“TYPE” is a MIME type, see the IANA registry at
http://www.iana.org/assignments/media-types/index.html
for useful values. “CHARSET”, if present, must be the IANA name for
a character encoding.
The default value for “TYPE” is “text/html” for JSP pages in
standard syntax, or “text/xml” for JSP documents in XML
syntax. If “CHARSET” is not specified, the response
character encoding is determined as described in
Section JSP.4.2, “Response Character Encoding”.
See Chapter JSP.4 for complete details on character
encodings.

pageEncoding

Describes the character encoding for the JSP page. The value
is of the form “CHARSET”, which must be the IANA name
for a character encoding. For JSP pages in standard syntax,
the character encoding for the JSP page is the charset given
by the pageEncoding attribute if it is present, otherwise the
charset given by the contentType attribute if it is present,
otherwise “ISO-8859-1”.
For JSP documents in XML syntax, the character encoding
for the JSP page is determined as described in section 4.3.3
and appendix F.1 of the XML specification. The pageEncoding
attribute is not needed for such documents. It is a
translation-time error if a document names different
encodings in its XML prolog / text declaration and in the
pageEncoding attribute. The corresponding JSP
configuration element is page-encoding (see
Section JSP.3.3.4, “Declaring Page Encodings”).
See Chapter JSP.4 for complete details on character
encodings.
For JSP pages in standard syntax, the page character encoding is determined
from the following sources:

A JSP configuration element page-encoding value whose URL pattern matches
the page.

The pageEncoding attribute of the page directive of the page. It is a translation-
time error to name different encodings in the pageEncoding attribute of
the page directive of a JSP page and in a JSP configuration element whose
URL pattern matches the page.

The charset value of the contentType attribute of the page directive. This is
used to determine the page character encoding if neither a JSP configuration
element page-encoding nor the pageEncoding attribute are provided.

If none of the above is provided, ISO-8859-1 is used as the default character
encoding.
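
The lookup order quoted above can be sketched in a few lines of Java. This is a minimal illustration, not container code: the class PageEncodingResolver, its method names, and the regular expression used to pull the charset out of contentType are all hypothetical, and comparing names as charsets (ignoring case) is an assumption of this sketch.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the lookup order quoted from the spec.
final class PageEncodingResolver {

    // Extracts the value after "charset=" from a contentType string.
    private static final Pattern CHARSET = Pattern.compile(
            ";\\s*charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Each argument may be null when the corresponding source is absent.
    static String resolve(String configPageEncoding,
                          String pageEncodingAttr,
                          String contentType) {
        if (configPageEncoding != null && pageEncodingAttr != null
                && !sameCharset(configPageEncoding, pageEncodingAttr)) {
            // The spec treats this mismatch as a translation-time error.
            throw new IllegalStateException("page-encoding (" + configPageEncoding
                    + ") differs from pageEncoding (" + pageEncodingAttr + ")");
        }
        if (configPageEncoding != null) return configPageEncoding;
        if (pageEncodingAttr != null) return pageEncodingAttr;
        if (contentType != null) {
            Matcher m = CHARSET.matcher(contentType);
            if (m.find()) return m.group(1);
        }
        return "ISO-8859-1"; // default when nothing is declared
    }

    // Assumption of this sketch: names are compared as charsets, ignoring case.
    private static boolean sameCharset(String a, String b) {
        return Charset.forName(a).equals(Charset.forName(b));
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null, "text/html; charset=utf-8")); // utf-8
        System.out.println(resolve(null, null, "text/html"));                // ISO-8859-1
    }
}
```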
On the difference between contentType and pageEncoding, and tips for configuring Chinese JSP pages:

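contentType -- specifies the encoding of the page content that the browser (the client) finally receives.
This is the "Character encoding" setting in Mozilla, or "Encoding" in IE6. For example, the JSPtw Forum uses Big5 as its contentType.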

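pageEncoding -- specifies the encoding in which the JSP file itself was written.
If you write JSP files with Notepad on Win98 or ME, the encoding is almost always Big5 or gb2312. With Notepad on Win2k or WinXP
you can choose the encoding when saving, including ANSI (Big5/GB2312), UTF-8, or Unicode (which in Notepad means UTF-16).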

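A JSP passes through three stages, and its text is re-encoded along the way.
Stage one uses pageEncoding, stage two goes from utf-8 to utf-8, and stage three is the page that Tomcat sends out, which uses contentType.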

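Stage one is the "translation" by JSPC from JSP to Java source (.java). JSPC reads the JSP file according to the pageEncoding setting.
The result is that a JSP written in the declared pageEncoding (utf-8, Big5, gb2312) is translated into Java source (.java) that is always utf-8.
If pageEncoding is set wrongly, or is not set at all (the default is ISO8859-1), the Chinese text is already garbled at this stage.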
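
The stage-one failure can be reproduced outside a container. The sketch below is only an illustration (the class StageOneDecoding and its readPage method are hypothetical names, not part of JSPC): it takes the bytes of a file saved as UTF-8 and decodes them once with the right encoding and once with the ISO-8859-1 default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the "read the JSP file" step of stage one.
final class StageOneDecoding {

    // Decodes the raw bytes of a JSP file using the declared page encoding.
    static String readPage(byte[] fileBytes, Charset declaredEncoding) {
        return new String(fileBytes, declaredEncoding);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character word for "Chinese", written
        // with escapes so that this source file itself stays pure ASCII.
        String original = "\u4e2d\u6587";
        byte[] savedAsUtf8 = original.getBytes(StandardCharsets.UTF_8);

        String correct = readPage(savedAsUtf8, StandardCharsets.UTF_8);
        String garbled = readPage(savedAsUtf8, StandardCharsets.ISO_8859_1);

        System.out.println(correct.equals(original)); // true
        System.out.println(garbled.length());         // 6: one char per byte
    }
}
```

Once the text has been decoded with the wrong charset, every later stage works on the garbled characters, so no contentType setting can repair the output.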

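Stage two is the compilation by JAVAC from Java source to Java bytecode. Whatever encoding the JSP was written in (utf-8, Big5, gb2312),
the output of stage one is always Java source encoded as utf-8.
JAVAC reads that Java source as utf-8 and compiles it into a binary (.class) whose string constants are stored in a utf-8 form (strictly, the "modified UTF-8" of the class file format).
This is how the Java Virtual Machine specification requires constant strings to be represented in Java bytecode.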
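
The byte layout of such a string constant can be approximated with DataOutputStream.writeUTF, which the JDK documents as writing a 2-byte length followed by modified UTF-8. The sketch below is an illustration under that assumption, not a class file parser, and the class name ModifiedUtf8Demo is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of the length-prefixed modified UTF-8 string layout.
final class ModifiedUtf8Demo {

    // Number of bytes writeUTF emits: a 2-byte length prefix followed by
    // the string encoded as modified UTF-8.
    static int writeUtfSize(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587";
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 6
        System.out.println(writeUtfSize(text));                           // 8 = 2 + 6
        // U+0000 takes two bytes in modified UTF-8, one in standard UTF-8.
        System.out.println(writeUtfSize("\u0000"));                       // 4 = 2 + 2
    }
}
```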

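Stage three is Tomcat (or another application container) loading and executing the Java bytecode produced in stage two. Its output is what the browser (the client)
sees. This is where the contentType parameter, carried unnoticed through stages one and two, takes effect (see stage one).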

response.setContentType("text/html; charset=utf-8");

The output can be utf-8, Big5, or gb2312, depending on the contentType setting of the JSP.

<%@ page session="false" pageEncoding="big5" contentType="text/html; charset=utf-8" %>

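Also, pageEncoding and contentType both default to ISO8859-1, and if you set only one of them, the other follows it (at least in Tomcat 4.1.27).
This is not guaranteed, however; it depends on how each JSPC implementation handles it. The fact that pageEncoding need not equal contentType is an advantage when developing and serving JSP pages in CJKV (Asian) scripts
(for example, pageEncoding=Big5 with contentType=utf-8).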
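
A page whose pageEncoding differs from its contentType charset is in effect transcoded: decoded with one charset and encoded with another. The sketch below illustrates that idea only; the class PageToResponse and its transcode method are hypothetical, and UTF-16BE stands in for Big5 because Big5 is not among the charsets every JVM is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of pageEncoding != contentType charset.
final class PageToResponse {

    // Decode with the page encoding (stage one), then encode with the
    // response charset taken from contentType (stage three).
    static byte[] transcode(byte[] pageBytes, Charset pageEncoding, Charset responseCharset) {
        String text = new String(pageBytes, pageEncoding);
        return text.getBytes(responseCharset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587";
        // Assumption: UTF-16BE plays the role of the page encoding here.
        byte[] page = text.getBytes(StandardCharsets.UTF_16BE);
        byte[] response = transcode(page, StandardCharsets.UTF_16BE, StandardCharsets.UTF_8);

        System.out.println(page.length);     // 4
        System.out.println(response.length); // 6
        System.out.println(new String(response, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

The text survives the conversion as long as each side is decoded with the charset it was actually written in.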

A simple fix is to add the following line at the top of both the including file and every included file:

<%@ page contentType="text/html;charset=GB2312" language="java" %>

Here is an example, main.jsp:

<%@ page contentType="text/html;charset=GB2312" language="java" %>
<html>
<head><title>測試頁</title></head>
<body>
<%@ include file="hello.jsp" %>
<b><p align="center"><font color="#ff0000">主頁中的表:</font></p></b>
<br>
<table width="98%" height="20" border="0" cellpadding="0" align="center" bgcolor="#99CCCC" cellspacing="0">
<tr>
<td align="center" valign="middle"><font color="white">
Reposted from: http://yk1987.blog.hexun.com.tw/64083290_d.html

 
