linux - Python 2.7 unicode confusion again


I've read this:

Setting the correct encoding when piping stdout in Python

and I'm trying to stick to the rule of thumb: "Always use Unicode internally. Decode what you receive, and encode what you send."

So here's my main file:

    # coding: utf-8

    import os
    import sys

    from myplugin import myplugin

    if __name__ == '__main__':
        c = myplugin()
        a = unicode(open('myfile.txt').read().decode('utf8'))
        print(c.generate(a).encode('utf8'))

What's getting on my nerves is that:

  • I read in a UTF-8 file, so I decode it.
  • Then I force-convert it to unicode, which gives unicode(open('myfile.txt').read().decode('utf8')).
  • Then I try to output it to a terminal.
  • On my Linux shell I need to re-encode it to UTF-8, and I guess this is normal: I'm working on a unicode string the whole time, so to output it I have to re-encode it in UTF-8 (correct me if I'm wrong here).
  • When I run it with PyCharm under Windows, it's UTF-8 encoded twice, which gives me things like agrÃ©able, dÃ©jÃ. If I remove encode('utf8') (which changes the last line to print(c.generate(a))), then it works with PyCharm but doesn't work anymore on Linux, where I get: 'ascii' codec can't encode character u'\xe9' in position blabla. You know the problem.
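The garbling in the last bullet can be reproduced in isolation. This is a minimal sketch of one plausible reading, assuming the console takes the UTF-8 bytes produced by encode('utf8') and decodes them as windows-1252. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
text = u'agr\xe9able'                # the word "agreable" with e-acute, as unicode

# What .encode('utf8') hands to print: e-acute becomes the two bytes C3 A9.
utf8_bytes = text.encode('utf-8')

# Assumption: the console decodes those bytes as windows-1252, not UTF-8,
# so each of the two bytes is shown as its own character.
garbled = utf8_bytes.decode('windows-1252')

# Undoing the wrong decode recovers the original text.
repaired = garbled.encode('windows-1252').decode('utf-8')
```

Under that assumption, the text is encoded once and then decoded with the wrong codec, so the output looks doubly encoded.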

If I try in the command line:

  • Linux/shell over SSH: import sys; sys.stdout.encoding gives 'UTF-8'
  • Linux/shell in my code: import sys; sys.stdout.encoding gives None. WTF??
  • Windows/PyCharm: import sys; sys.stdout.encoding gives 'windows-1252'

What is the best way to write code that works in both environments?

Your philosophy is correct, but you're overcomplicating things and making your code brittle.

Open files in text mode (with io.open in Python 2) and they will be decoded to Unicode for you automatically. Print without encoding: print is supposed to work out the correct encoding.
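As a rough illustration of text mode doing the decoding, here is a minimal sketch. The temporary file and its contents are made up for the example. io.open behaves the same way in Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical sample file, created only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

with io.open(path, 'wt', encoding='utf-8') as f:
    f.write(u'd\xe9j\xe0 vu')        # unicode in, UTF-8 bytes on disk

with io.open(path, 'rt', encoding='utf-8') as f:
    a = f.read()                      # UTF-8 bytes on disk, unicode out

with io.open(path, 'rb') as f:
    raw = f.read()                    # binary mode: raw bytes, no decoding
```

Here a is already a unicode string, so it needs no unicode(...) or .decode('utf8') call. Only the binary-mode read returns encoded bytes.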

If your Linux environment isn't set up correctly, set PYTHONIOENCODING=UTF-8 in your Linux environment variables (export PYTHONIOENCODING=UTF-8) to fix any issues during print. The variable name is case-sensitive and must be upper case. You should also consider setting your locale to a UTF-8 variant such as en_GB.UTF-8, to avoid having to define PYTHONIOENCODING.
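To see the effect of the variable without changing your shell, here is a minimal sketch that starts a child interpreter with PYTHONIOENCODING set and asks it what sys.stdout.encoding is. The child's stdout is a pipe, not a terminal, which is the situation where Python 2 otherwise reports None. The probe string and variable names are illustrative only.

```python
import os
import subprocess
import sys

# One-liner run by the child: report the encoding it chose for stdout.
probe = 'import sys; sys.stdout.write(str(sys.stdout.encoding))'

# Copy the current environment and add the override for the child only.
env = dict(os.environ, PYTHONIOENCODING='utf-8')

# check_output captures the child's stdout through a pipe, so the child
# is not attached to a terminal.
reported = subprocess.check_output([sys.executable, '-c', probe], env=env)
reported = reported.decode('ascii').strip()
```

With the override in place, the child should report the requested encoding even though its output is piped. A plain print of a unicode string then no longer falls back to the 'ascii' codec.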

PyCharm should work without modification.

Your code should look like this:

    import os
    import sys
    import io

    from myplugin import myplugin

    if __name__ == '__main__':
        c = myplugin()
        # 't' (text mode) is the default
        with io.open('myfile.txt', 'rt', encoding='utf-8') as myfile:
            # a is a unicode string
            a = myfile.read()

        result = c.generate(a)
        print result

If you're using Python 3.x, drop import io and the io. prefix from io.open(), and call print as a function: print(result).

