As I hinted in an earlier post about my move to WordPress, one of the reasons for that decision was to help me learn PHP. I am well on my way: I have already built one client site with PHP (simple server-side stuff, so it was easy), and I am now converting my business site (webfocusdesign.com), which is slightly more complex but still easy. The main reason for that move is that I want to convert the blogs there to WordPress (two blogs, because I need one in French and one in English) and move the site to Media Temple (mt). So far, the process has been… frustrating…
Fighting With Charset and Encoding in PHP
Coming from ColdFusion, I must say I’m used to having an easier time getting things done, and to simple, obvious things just working out of the box. For a good part of last night and part of this morning, I’ve been fighting to get French accented characters to display correctly in the PHP version of webfocusdesign.com. Before going further, you need to know that the reason I use CF or PHP even on the simplest sites is includes, for code reuse. For example, I control all the code in a document’s <head> from one include file, and I set page-specific data like titles, meta descriptions and meta keywords through variables defined on each page.
Now what does this have to do with displaying accented characters? Well, I usually place the <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> tag in that head include. Then, in ColdFusion, I add the <cfprocessingdirective pageencoding="utf-8"> tag as the first element on every page, and it just works. The meta tag doesn’t actually have any effect, but if I remove the cfprocessingdirective, the accented characters are garbled; when I put it back, they display correctly. Direct correlation, working just as expected.
So in PHP I tried the same strategy. I used the same include file (including the Content-Type meta tag), changed the few bits of CF code to their PHP equivalents, and placed the PHP header('Content-Type: text/html; charset=utf-8'); call as the first line of code on my page (right after the opening <?php, of course). It didn’t work… or rather, it half worked. All the plain HTML text that came from other includes (footer, repeating side-column elements, etc.) displayed correctly. But the HTML text on the page itself (which didn’t come from an include, wasn’t output by a PHP command and didn’t come from a database) showed garbled accented characters. I was stumped (and still am).
Google was absolutely no help when I searched for PHP charset and encoding issues as pretty much all the info I found on the matter dealt with database output or PHP string manipulation commands and getting those to display as UTF-8. But it was my pure HTML text that displayed wrong. I was getting REALLY frustrated.
Then I thought of comparing the code with my just-published client site (100% French, with no display issues) and realized that, for reasons I can’t remember since I coded the basic templates months ago, I’d placed the <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> tag directly on the page (not in an include), right after the opening <head> tag. That is the EXACT SAME SPOT it ends up in anyway, since it’s the first line of code in my include, and the processed code you see with View Source in the browser looks exactly the same. Then I had a lightbulb moment and looked for encoding issues related to Dreamweaver, since it’s my primary coding environment. Here is what I found; if it can help someone else, my frustration won’t have been in vain…
The Solution?
On the files that work correctly (Content-Type meta tag directly in the file, not in my head include), if I go to Modify > Page Properties in Dreamweaver and look at the Title/Encoding section, the encoding is set to Unicode (UTF-8). For pages with the meta tag in the include, Dreamweaver doesn’t see the tag, so it defaults to “Western European” (or at least something other than UTF-8).
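To make “Western European” versus “Unicode (UTF-8)” concrete, here is a small sketch of what the two settings would write to disk for the same accented word. It assumes PHP 7 or later with the mbstring extension, and that Dreamweaver’s Western European setting corresponds to Windows-1252 (ISO-8859-1 stores these particular letters the same way).

```php
<?php
// The word "été" as it would sit on disk under two different
// "Save as" encodings (requires the mbstring extension).
$utf8   = "\u{E9}t\u{E9}";                                      // UTF-8 file
$cp1252 = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');  // "Western European" file

echo bin2hex($utf8), PHP_EOL;    // c3a974c3a9  (é = two bytes)
echo bin2hex($cp1252), PHP_EOL;  // e974e9      (é = one byte)

// header('Content-Type: text/html; charset=utf-8') and the meta tag only
// label the response; PHP sends literal HTML in a .php file byte for byte.
// The one-byte é is not valid UTF-8, so the browser cannot decode it.
var_dump(mb_check_encoding($utf8, 'UTF-8'));    // bool(true)
var_dump(mb_check_encoding($cp1252, 'UTF-8'));  // bool(false)
```

This would also explain why the included files displayed fine: each file keeps whatever bytes it was saved with, and `include` passes them through unchanged, so a single response can mix UTF-8 text from the includes with single-byte characters from the page itself.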
So my guess (thinking as I write) is that Dreamweaver saves the file with a different encoding, and since the server sends the file’s bytes as they are, the accented characters display wrong no matter what the meta tag or headers say. But what still has me perplexed is that if I do the same test on the ColdFusion version of the site’s index file, Dreamweaver sees it as UTF-8 even though the meta tag is “hidden” in an include. My guess here is that I had saved that file a few times before moving the code to an include while I was first developing that site. I usually build the first “template” of a new site without includes but, for the PHP version, it was just a matter of converting to PHP, so most of the work was already done.
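If other pages in a site were saved the same way, a small script could list them. This is a hypothetical sketch: the function names `findNonUtf8Files` and `convertCp1252FileToUtf8` and the default extension list are illustrative choices, not part of PHP or Dreamweaver. It assumes mbstring is enabled and that any file failing the UTF-8 check was saved as Windows-1252.

```php
<?php
// Hypothetical helpers (names are illustrative): list files that are not
// valid UTF-8, and re-save a file as UTF-8 assuming it is Windows-1252.
// Requires the mbstring extension.

function findNonUtf8Files(string $dir, array $extensions = ['php', 'html', 'inc']): array
{
    $bad = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Only look at regular files with one of the listed extensions.
        if (!$file->isFile()
            || !in_array(strtolower($file->getExtension()), $extensions, true)) {
            continue;
        }
        $bytes = file_get_contents($file->getPathname());
        // Pure-ASCII files pass this check, which is fine: ASCII is valid UTF-8.
        if ($bytes !== false && !mb_check_encoding($bytes, 'UTF-8')) {
            $bad[] = $file->getPathname();
        }
    }
    sort($bad);
    return $bad;
}

function convertCp1252FileToUtf8(string $path): void
{
    // Assumption: the file was saved as Windows-1252 ("Western European").
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    file_put_contents($path, mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252'));
}
```

For example, `foreach (findNonUtf8Files(__DIR__) as $path) { echo $path, PHP_EOL; }` run from the site root would print the suspect pages. Listing and converting are kept separate so the list can be reviewed first, since a file saved in some other single-byte encoding would be mis-converted by the Windows-1252 assumption.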
If anyone can shed some more light on this, please feel free!
Good guess!
Dw does indeed save a file that does *not* specify an encoding (with a META tag) using the default encoding set in Edit > Preferences > New Document > Default Encoding.
It’s also good to know that if a file has no encoding META tag, Dw will open it according to this same setting (*Use when opening existing files that don’t specify an encoding*), and will then save it with that encoding too!
Also, if the file has a “hidden” encoding specified (for example, a META tag inside an HTML comment), Dw will open and/or save the file according to that “hidden” META tag. This can help you (or not) when working with files that cannot contain a visible encoding META tag and that you would not want opened as UTF-8…
Hope this helps a bit! 🙂