#897 closed defect (wontfix)
two incorrect characters instead of one correct in PDF output
Reported by: | anonymous | Owned by: | Alec Thomas |
---|---|---|---|
Priority: | normal | Component: | PageToPdfPlugin |
Severity: | normal | Keywords: | UTF-8 |
Cc: | | Trac Release: | 0.10 |
Description
Hi
I've checked out this plugin from the Subversion repository and it can't handle UTF-8 encoded pages. It generates two incorrect characters instead of one correct character. I've read previous posts on this topic and saw that it had been fixed, but it does not work for me. Thanks.
Attachments (0)
Change History (12)
comment:1 follow-up: 2 Changed 18 years ago by
What is your default_charset in trac.ini?
comment:2 Changed 18 years ago by
Replying to coderanger:
What is your default_charset in trac.ini?
Hi. My trac.ini contains:
    [trac]
    default_charset = UTF-8

    [pagetopdf]
    charset = UTF-8
comment:4 Changed 18 years ago by
Replying to coderanger:
I think that should be utf-8 (note the lower case).
Unfortunately, it doesn't work with lowercase, either.
Environment:
- CentOS 4.3 linux
- htmldoc 1.8.27
- trac-0.10
- Python 2.3.4
The text is in Hungarian, with accented characters. The Trac wiki itself displays it correctly.
comment:6 Changed 18 years ago by
Replying to coderanger:
What encoding are you actually using for the text?
I'm not sure I understand your question correctly... What do you mean? I use utf-8 as the default_charset in trac.ini, and UTF-8 is the default on my Linux box. The wiki pages are stored as UTF-8 text in Trac:
    [root@dev tmp]# trac-admin /opt/trac/dia wiki export TestPage test
    [root@dev tmp]# file test
    test: UTF-8 Unicode text, with CRLF line terminators
    [root@dev tmp]#
comment:7 follow-up: 8 Changed 18 years ago by
Trac uses Unicode strings internally, but this doesn't mean your browser is actually sending UTF8. Not sure how you check this on a Linux box, though I would hope it takes the system charset.
comment:8 follow-up: 9 Changed 18 years ago by
Replying to coderanger:
Trac uses Unicode strings internally, but this doesn't mean your browser is actually sending UTF8. Not sure how you check this on a Linux box, though I would hope it takes the system charset.
UTF-8 is the default on Linux boxes. HTMLDOC converts HTML to PDF; Trac, I think, renders the wiki page to HTML and hands it to HTMLDOC. The client's charset doesn't affect this process, as far as I know.
pagetopdf.py fragment:
    hfile, hfilename = mkstemp('tracpdf')
    codepage = self.env.config.get('trac', 'default_charset', 0)
    page = wiki_to_html(source, self.env, req).encode(codepage)
    page = re.sub('<img src="(?!\w+://)',
                  '<img src="%s://%s:%d' % (req.scheme, req.server_name, req.server_port),
                  page)
    os.write(hfile, '<html><body>' + page + '</body></html>')
    os.close(hfile)
Trac logs this:
    2006-11-12 16:47:04,174 Trac[pagetopdf] DEBUG: --right 1.5cm --bottom 1.5cm --webpage --top 1.5cm --format pdf14 --size A4 --charset utf-8 --left 1.5cm
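For illustration only, a minimal Python sketch (not taken from the plugin) of what appears to be happening; treating cp1252 as the 8-bit charset HTMLDOC falls back to is an assumption:

    # Sketch only: one accented Hungarian letter becomes a two-byte UTF-8
    # sequence; read back through an 8-bit charset (cp1252 here, as an
    # assumption about how htmldoc interprets the bytes), it shows up as
    # two separate characters.
    text = 'ő'                               # U+0151, a single character
    utf8_bytes = text.encode('utf-8')        # b'\xc5\x91', two bytes
    mojibake = utf8_bytes.decode('cp1252')   # 'Å' followed by a quotation mark
    print(len(text), len(mojibake))          # prints: 1 2
    print(mojibake)                          # two characters instead of one

That would match the symptom in the ticket title: two incorrect characters in the PDF where the wiki shows one.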
comment:9 Changed 18 years ago by
UTF-8 is the default on Linux boxes. HTMLDOC converts HTML to PDF; Trac, I think, renders the wiki page to HTML and hands it to HTMLDOC. The client's charset doesn't affect this process, as far as I know.
I changed the code to test another encoding (ISO-8859-2):

    page = wiki_to_html(source, self.env, req).encode('iso-8859-2')

and

    htmldoc_args = {'webpage': None,
                    'format': 'pdf14',
                    'left': '1.5cm',
                    'right': '1.5cm',
                    'top': '1.5cm',
                    'bottom': '1.5cm',
                    'charset': '8859-2'}
I left default_charset as utf-8, since I want UTF-8 on my wiki; only the PDF generation uses the Latin-2 encoding.
This way it works fine for ISO Latin-2 accented characters (UTF-8 would be better, but it will do for now). So HTMLDOC can't handle UTF-8, yet it reportedly works for others somehow?
Well, this is a workaround, but the limitation being worked around is in HTMLDOC, not in Trac.
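For reference, a standalone sketch of the transcoding step behind this workaround; the sample text and the errors='replace' behaviour are illustrative assumptions, not what the plugin actually does:

    # Sketch only: transcode the rendered wiki HTML from Unicode to a
    # charset HTMLDOC understands before writing the temp file;
    # errors='replace' substitutes '?' for anything ISO-8859-2 lacks
    # instead of raising UnicodeEncodeError.
    html = '<html><body>árvíztűrő tükörfúrógép</body></html>'  # sample Hungarian text
    pdf_charset = 'iso-8859-2'   # matches the --charset 8859-2 argument passed to htmldoc
    page = html.encode(pdf_charset, errors='replace')
    print(page)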
comment:10 Changed 18 years ago by
Resolution: | → wontfix |
---|---|
Status: | new → closed |
UTF-8 is not supported by htmldoc. You must use one of the supported encodings.
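If one of the 8-bit encodings has to be chosen anyway, a quick standalone check (a sketch, not plugin code) of whether a page's text is representable in the chosen charset:

    # Sketch: report whether Unicode text fits the 8-bit charset chosen
    # for htmldoc (ISO-8859-2 here, following the workaround above).
    def fits_charset(text, charset='iso-8859-2'):
        try:
            text.encode(charset)
            return True
        except UnicodeEncodeError:
            return False

    print(fits_charset('árvíztűrő tükörfúrógép'))  # True: all characters exist in Latin-2
    print(fits_charset('Ω'))                       # False: Greek omega is not in Latin-2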
comment:12 Changed 14 years ago by
Keywords: | UTF-8 added; utf8 removed |
---|---|