
(-)unicode.orig/./debian/control (-1 / +2 lines)
@@ Lines 2-13 @@ Source: unicode
 Section: utils
 Priority: optional
 Maintainer: Radovan Garabík <garabik@kassiopeia.juls.savba.sk>
-Build-Depends: debhelper (>= 4), dh-python
+Build-Depends: debhelper (>= 4), dh-python, python3
 Standards-Version: 4.3.0

 Package: unicode
 Architecture: all
 Depends: ${misc:Depends}, ${python3:Depends}
 Suggests: bzip2
+Recommends: unicode-data
 Description: display unicode character properties
  unicode is a simple command line utility that displays
(-)unicode.orig/./debian/copyright (-1 / +1 lines)
@@ Lines 7-11 @@ The sources and package can be downloade
 http://kassiopeia.juls.savba.sk/~garabik/software/unicode/


-Copyright: © 2003-2016 Radovan Garabík <garabik @ kassiopeia.juls.savba.sk>
+Copyright: © 2003-2022 Radovan Garabík <garabik @ kassiopeia.juls.savba.sk>
 released under GPL v3, see /usr/share/common-licenses/GPL
(-)unicode.orig/./debian/changelog (+15 lines)
@@ Lines 1-3 @@
+unicode (2.9-1) unstable; urgency=low
+
+  * better protection against changed/corrupted data files (closes: #932846)
+
+ -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Fri, 03 Jun 2022 16:09:26 +0200
+
+unicode (2.8-1) unstable; urgency=low
+
+  * display ASCII table (either traditional or the EU–UK Trade and Cooperation
+    Agreement version)
+  * tidy up manpage (closes: #972047) (closes: #972063)
+  * fix decoding paracode arguments (closes: #939196)
+
+ -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Wed, 30 Dec 2020 17:13:32 +0100
+
 unicode (2.7-1) unstable; urgency=low

   * add East Asian width
(-)unicode.orig/./paracode.1 (-19 / +20 lines)
@@ Lines 4-49 @@
 paracode \- command line Unicode conversion tool
 .SH SYNOPSIS
 .B paracode
-.RI [ -t tables ] 
+.RB [ \-t
+.IR tables ]
 string
 .SH DESCRIPTION
 This manual page documents the
 .B paracode
 command.
 .PP
-\fBparacode\fP exploits the full power of the Unicode standard to convert the text
-into visually similar stream of glyphs, while using completely different codepoints.
-It is an excellent didactic tool demonstrating the principles and advanced use of
-the Unicode standard.
+\fBparacode\fP exploits the full power of the Unicode standard to convert
+the text into visually similar stream of glyphs, while using completely
+different codepoints.
+It is an excellent didactic tool demonstrating the principles and advanced
+use of the Unicode standard.
 .PP
 \fBparacode\fP is a command line tool working as
 a filter, reading standard input in UTF-8 encoding and writing to
 standard output.
-
+.
 .SH OPTIONS
 .TP
 .BI \-t tables
-.BI \-\-tables
+.BI \-\-tables tables

 Use given list of conversion tables, separated by a plus sign.

 Special name 'all' selects all the tables.

-Note that selecting 'other', 'cyrillic_plus' and 'cherokee' tables (and 'all') 
+Note that selecting 'other', 'cyrillic_plus' and 'cherokee' tables (and 'all')
 makes use of rather esoteric characters, and not all fonts contain them.

-
 Special table 'mirror' uses quite different character substitution,
 is not selected automatically with 'all' and does not work well
 with anything except plain ascii alphabetical characters.

 Example:

-paracode -t cyrillic+greek+cherokee
+paracode \-t cyrillic+greek+cherokee

-paracode -t cherokee  <input >output
+paracode \-t cherokee  <input >output

-paracode -r -t mirror  <input >output
+paracode \-r \-t mirror  <input >output


@@ Lines 60-75 @@ other
 cherokee

 all
-
+.
 .TP
-.BI \-r
+.B \-r

-Display text in reverse order after conversion, best used together with -t mirror.
-
+Display text in reverse order after conversion,
+best used together with \-t mirror.
+.
 .SH SEE ALSO
-iconv(1)
-
-
+.BR iconv (1)
+.
 .SH AUTHOR
 Radovan Garab\('ik <garabik @ kassiopeia.juls.savba.sk>

(-)unicode.orig/./unicode.1 (-61 / +77 lines)
@@ Lines 4-10 @@
 unicode \- command line unicode database query tool
 .SH SYNOPSIS
 .B unicode
-.RI [ options ] 
+.RI [ options ]
 string
 .SH DESCRIPTION
 This manual page documents the
@@ Lines 15-90 @@ command.

 .SH OPTIONS
 .TP
-.BI \-h 
-.BI \-\-help 
+.B \-h
+.B \-\-help

 Show help and exit.

 .TP
-.BI \-x
-.BI \-\-hexadecimal
+.B \-x
+.B \-\-hexadecimal

-Assume 
+Assume
 .I string
-to be a hexadecimal number 
+to be a hexadecimal number

 .TP
-.BI \-d
-.BI \-\-decimal
+.B \-d
+.B \-\-decimal

-Assume 
+Assume
 .I string
-to be a decimal number 
+to be a decimal number

 .TP
-.BI \-o
-.BI \-\-octal
+.B \-o
+.B \-\-octal

-Assume 
+Assume
 .I string
-to be an octal number 
+to be an octal number

 .TP
-.BI \-b
-.BI \-\-binary
+.B \-b
+.B \-\-binary

-Assume 
+Assume
 .I string
-to be a binary number 
+to be a binary number

 .TP
-.BI \-r
-.BI \-\-regexp
+.B \-r
+.B \-\-regexp

-Assume 
+Assume
 .I string
 to be a regular expression

 .TP
-.BI \-s
-.BI \-\-string
+.B \-s
+.B \-\-string

-Assume 
+Assume
 .I string
 to be a sequence of characters

 .TP
-.BI \-a
-.BI \-\-auto
+.B \-a
+.B \-\-auto

 Try to guess type of
 .I string
 from one of the above (default)

 .TP
-.BI \-mMAXCOUNT
-.BI \-\-max=MAXCOUNT
+.BI \-m MAXCOUNT
+.BI \-\-max= MAXCOUNT

 Maximal number of codepoints to display, default: 20; use 0 for unlimited

 .TP
-.BI \-iCHARSET
-.BI \-\-io=IOCHARSET
+.BI \-i CHARSET
+.BI \-\-io= IOCHARSET

 I/O character set. For maximal pleasure, run \fBunicode\fP on UTF-8
 capable terminal and specify IOCHARSET to be UTF-8. \fBunicode\fP
@@ Lines 92-99 @@ tries to guess this value from your loca
 locale, you should not need to specify it.

 .TP
-.BI \-\-fcp=CHARSET
-.BI \-\-fromcp=CHARSET
+.BI \-\-fcp= CHARSET
+.BI \-\-fromcp= CHARSET

 Convert numerical arguments from this encoding, default: no conversion.
 Multibyte encodings are supported. This is ignored for non-numerical
@@ Lines 101-119 @@ arguments.


 .TP
-.BI \-cADDCHARSET
-.BI \-\-charset\-add=ADDCHARSET
+.BI \-c ADDCHARSET
+.BI \-\-charset\-add= ADDCHARSET

 Show hexadecimal reprezentation of displayed characters in this additional charset.

 .TP
-.BI \-CUSE_COLOUR
-.BI \-\-colour=USE_COLOUR
+.BI \-C USE_COLOUR
+.BI \-\-colour= USE_COLOUR

 USE_COLOUR is one of
-.I on
-.I off
-.I auto
+.B on
+.B off
+.B auto

 .B \-\-colour=on
 will use ANSI colour codes to colourise the output
@@ Lines 121-170 @@ will use ANSI colour codes to colourise
 .B \-\-colour=off
 won't use colours.

-.B \-\-colour=auto 
+.B \-\-colour=auto
 will test if standard output is a tty, and use colours only when it is.

-.BI \-\-color
+.B \-\-color
 is a synonym of
-.BI \-\-colour
+.B \-\-colour

 .TP
-.BI \-v
-.BI \-\-verbose
+.B \-v
+.B \-\-verbose

 Be more verbose about displayed characters, e.g. display Unihan information, if available.

 .TP
-.BI \-w
-.BI \-\-wikipedia
+.B \-w
+.B \-\-wikipedia

 Spawn browser pointing to English Wikipedia entry about the character.

 .TP
-.BI \-\-wt
-.BI \-\-wiktionary
+.B \-\-wt
+.B \-\-wiktionary

 Spawn browser pointing to English Wiktionary entry about the character.

 .TP
-.BI \-\-brief
+.B \-\-brief

 Display character information in brief format

 .TP
-.BI \-\-format=fmt
+.BI \-\-format= fmt

 Use your own format for character information display. See the README for details.


 .TP
-.BI \-\-list
+.B \-\-list

 List (approximately) all known encodings.

+.TP
+.B \-\-download
+
+Try to download UnicodeData.txt into ~/.unicode/
+
+.TP
+.B \-\-ascii
+
+Display ASCII table
+
+.TP
+.B \-\-brexit\-ascii
+.B \-\-brexit
+
+Display ASCII table (EU–UK Trade and Cooperation Agreement 2020 version)
+
 .SH USAGE

-\fBunicode\fP tries to guess the type of an argument. In particular, 
+\fBunicode\fP tries to guess the type of an argument. In particular,
 if the arguments looks like a valid hexadecimal representation of a
 Unicode codepoint, it will be considered to be such. Using

@@ Lines 174-180 @@ will display information about U+FACE CJ
 and it will not search for 'face' in character descriptions \- for the latter,
 use:

-\fBunicode\fP -r face
+\fBunicode\fP \-r face


 For example, you can use any of the following to display information
@@ Lines 191-216 @@ about U+00E1 LATIN SMALL LETTER A WITH

 You can specify a range of characters as argumets, \fBunicode\fP will
 show these characters in nice tabular format, aligned to 256-byte boundaries.
-Use two dots ".." to indicate the range, e.g. 
+Use two dots ".." to indicate the range, e.g.

 \fBunicode\fP 0450..0520

 will display the whole cyrillic and hebrew blocks (characters from U+0400 to U+05FF)

-\fBunicode\fP 0400.. 
+\fBunicode\fP 0400..

 will display just characters from U+0400 up to U+04FF

-Use --fromcp to query codepoints from other encodings:
+Use \-\-fromcp to query codepoints from other encodings:

-\fBunicode\fP --fromcp cp1250 -d 200
+\fBunicode\fP \-\-fromcp cp1250 \-d 200

 Multibyte encodings are supported:
-\fBunicode\fP --fromcp big5 -x aff3
+\fBunicode\fP \-\-fromcp big5 \-x aff3

 and multi-char strings are supported, too:

-\fBunicode\fP --fromcp utf-8 -x c599c3adc5a5
+\fBunicode\fP \-\-fromcp utf-8 \-x c599c3adc5a5

 .SH BUGS
 Tabular format does not deal well with full-width, combining, control
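Note: the last example is easy to verify independently; the hex string c599c3adc5a5 is simply the UTF-8 encoding of three accented Latin letters, so the command looks those codepoints up (a quick Python check, not part of the man page):

    # what `unicode --fromcp utf-8 -x c599c3adc5a5` resolves the bytes to
    s = bytes.fromhex('c599c3adc5a5').decode('utf-8')
    for ch in s:
        print('U+{:04X} {}'.format(ord(ch), ch))
    # U+0159 ř   U+00ED í   U+0165 ť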
(-)unicode.orig/./unicode (-12 / +125 lines)
@@ Lines 1-9 @@
 #!/usr/bin/python3

-from __future__ import unicode_literals
+from __future__ import unicode_literals, print_function

-import os, glob, sys, unicodedata, locale, gzip, re, traceback, encodings, io, codecs
+import os, glob, sys, unicodedata, locale, gzip, re, traceback, encodings, io, codecs, shutil
 import webbrowser, textwrap, struct

 #from pprint import pprint

 # bz2 was introduced in 2.3, but we want this to work even if for some
@@ Lines 31-36 @@ if PY3:
     import subprocess as cmd
     from urllib.parse import quote as urlquote
     import io
+    from urllib.request import urlopen

     def out(*args):
         "pring args, converting them to output charset"
@@ Lines 50-55 @@ else: # python2
     import commands as cmd

     from urllib import quote as urlquote
+    from urllib import urlopen

     def out(*args):
         "pring args, converting them to output charset"
@@ Lines 66-72 @@ else: # python2

 from optparse import OptionParser

-VERSION='2.7'
+VERSION='2.9'


 # list of terminals that support bidi
@@ Lines 230-238 @@ def get_unicode_blocks_descriptions():
     for line in f:
         if line.startswith('#') or ';' not in line or '..' not in line:
             continue
-        ran, desc = line.split(';')
+        spl = line.split(';', 1)
+        ran, desc = spl
         desc = desc.strip()
-        low, high = ran.split('..')
+        low, high = ran.split('..', 1)
         low = int(low, 16)
         high = int(high, 16)
         unicodeblocks[ (low,high) ] = desc
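Note: this hunk is one of the "protection against changed/corrupted data files" fixes mentioned in the changelog; with an explicit maxsplit, a stray separator inside a Blocks.txt line can no longer make the two-name unpacking raise ValueError. A small illustration with a hypothetical corrupted line (not from the patch):

    line = '0370..03FF; Greek and Coptic; stray trailing field'
    ran, desc = line.split(';', 1)        # at most 2 fields, unpacking always succeeds
    low, high = ran.split('..', 1)
    print(int(low, 16), int(high, 16), desc.strip())
    # 880 1023 Greek and Coptic; stray trailing field
    # plain line.split(';') would have produced 3 fields and raised ValueError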
@@ Lines 256-262 @@ def get_unicode_properties(ch):
         proplist = ['codepoint', 'name', 'category', 'combining', 'bidi', 'decomposition', 'dummy', 'digit_value', 'numeric_value', 'mirrored', 'unicode1name', 'iso_comment', 'uppercase', 'lowercase', 'titlecase']
         for i, prop in enumerate(proplist):
             if prop!='dummy':
-                properties[prop] = fields[i]
+                if i<len(fields):
+                    properties[prop] = fields[i]
         if properties['lowercase']:
             properties['lowercase'] = chr(int(properties['lowercase'], 16))
         if properties['uppercase']:
@@ Lines 330-338 @@ def get_unihan_properties_internal(ch):
             line = l.strip()
             if not line:
                 continue
-            char, key, value = line.strip().split('\t')
+            spl = line.strip().split('\t')
+            if len(spl) != 3:
+                continue
+            char, key, value = spl
             if int(char[2:], 16) == ch:
-                properties[key] = value.decode('utf-8')
+                properties[key] = value
             elif int(char[2:], 16)>ch:
                 break
     return properties
@@ Lines 412-417 @@ def OpenGzip(fname):
         fo = codecs.getreader('utf-8')(fo)
         return fo

+def get_unicode_cur_version():
+    # return current version of the Unicode standard, hardwired for now
+    return '14.0.0'
+
+def get_unicodedata_url():
+    unicode_version = get_unicode_cur_version()
+    url = 'http://www.unicode.org/Public/{}/ucd/UnicodeData.txt'.format(unicode_version)
+    return url
+
+def download_unicodedata():
+    url = get_unicodedata_url()
+    out('Downloading UnicodeData.txt from ', url, '\n')
+    HomeDir = os.path.expanduser('~/.unicode')
+    HomeUnicodeData = os.path.join(HomeDir, "UnicodeData.txt.gz")
+
+    # we want to minimize the chance of leaving a corrupted file around
+    tmp_file = HomeUnicodeData+'.tmp'
+    try:
+        if not os.path.exists(HomeDir):
+            os.makedirs(HomeDir)
+        response = urlopen(url)
+        r = response.getcode()
+        if r != 200:
+            # this is handled automatically in python3, the exception will be raised by urlopen
+            raise IOError('HTTP response code '+str(r))
+        if os.path.exists(HomeUnicodeData):
+            out(HomeUnicodeData, ' already exists, but downloading as requested\n')
+        out('downloading...')
+        shutil.copyfileobj(response, gzip.open(tmp_file, 'wb'))
+        shutil.move(tmp_file, HomeUnicodeData)
+        out(HomeUnicodeData, ' downloaded\n')
+    finally:
+        if os.path.exists(tmp_file):
+            os.remove(tmp_file)
+
 def GrepInNames(pattern, prefill_cache=False):
     f = None
     for name in UnicodeDataFileNames:
@@ Lines 428-437 @@ def GrepInNames(pattern, prefill_cache=F
 Cannot find UnicodeData.txt, please place it into
 /usr/share/unidata/UnicodeData.txt,
 /usr/share/unicode/UnicodeData.txt, ~/.unicode/ or current
-working directory (optionally you can gzip it).
+working directory (optionally you can gzip, bzip2 or xz it).
 Without the file, searching will be much slower.

-""" )
+You can download the file from {} (or replace {} with current Unicode version); or run {} --download
+
+""".format(get_unicodedata_url(), get_unicode_cur_version(), sys.argv[0]))

     if prefill_cache:
         if f:
@@ Lines 635-641 @@ def print_characters(clist, maxcount, fo
         if maxcount:
             counter += 1
         if counter > options.maxcount:
-            out("\nToo many characters to display, more than %s, use --max 0 (or other value) option to change it\n" % options.maxcount)
+            sys.stdout.flush()
+            sys.stderr.write("\nToo many characters to display, more than %s, use --max 0 (or other value) option to change it\n" % options.maxcount)
             return
         properties = get_unicode_properties(c)
         ordc = ord(c)
@@ Lines 809-814 @@ def is_range(s, typ):
 def unescape(s):
     return s.replace(r'\n', '\n')

+ascii_cc_names = ('NUL', 'SOH', 'STX', 'ETX', 'EOT', 'ENQ', 'ACK', 'BEL', 'BS', 'HT', 'LF', 'VT', 'FF', 'CR', 'SO', 'SI', 'DLE', 'DC1', 'DC2', 'DC3', 'DC4', 'NAK', 'SYN', 'ETB', 'CAN', 'EM', 'SUB', 'ESC', 'FS', 'GS', 'RS', 'US')
+
+def display_ascii_table():
+    print('Dec Hex    Dec Hex    Dec Hex  Dec Hex  Dec Hex  Dec Hex   Dec Hex   Dec Hex')
+    for row in range(0, 16):
+        for col in range(0, 8):
+            cp = 16*col+row
+            ch = chr(cp) if 32<=cp else ascii_cc_names[cp]
+            ch = 'DEL' if cp==127 else ch
+            frm = '{:3d} {:02X} {:2s}'
+            if cp < 32:
+                frm = '{:3d} {:02X} {:4s}'
+            elif cp >= 96:
+                frm = '{:4d} {:02X} {:2s}'
+            cell = frm.format(cp, cp, ch)
+            print(cell, end='')
+        print()
+
+brexit_ascii_diffs = {
+ 30: ' ',
+ 31: ' ',
+ 34: "'",
+123: '{}{',
+125: '}}',
+127: ' ',
+128: ' ',
+129: ' ',
+        }
+
+def display_brexit_ascii_table():
+    print(' + | 0    1    2    3    4    5    6    7    8    9')
+    print('---+-----------------------------------------------')
+    for row in range(30, 130, 10):
+        print('{:3d}'.format(row), end='|')
+        for col in range(0, 10):
+            cp = col+row
+            ch = brexit_ascii_diffs.get(cp, chr(cp))
+            cell = ' {:3s} '.format(ch)
+            print(cell, end='')
+        print()
+
+
 format_string_default = '''{yellow}{bold}U+{ordc:04X} {name}{default}
 {green}UTF-8:{default} {utf8} {green}UTF-16BE:{default} {utf16be} {green}Decimal:{default} {decimal} {green}Octal:{default} {octal}{opt_additional}
 {pchar}{opt_flipcase}{opt_uppercase}{opt_lowercase}
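Note: the traditional table is printed column-major; cp = 16*col+row walks eight columns of sixteen rows, so every ASCII codepoint 0-127 appears exactly once. A quick sanity check of that loop arithmetic (not part of the patch):

    cps = [16*col + row for row in range(16) for col in range(8)]
    assert sorted(cps) == list(range(128))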
@@ Lines 880-889 @@ def main():
           action="store", dest="format_string", type="string",
           default=format_string_default,
           help="formatting string")
-    parser.add_option("--brief", "--terse",
+    parser.add_option("--brief", "--terse", "--br",
           action="store_const", dest="format_string",
           const='{pchar} U+{ordc:04X} {name}\n',
           help="Brief format")
+    parser.add_option("--download",
+          action="store_const", dest="download_unicodedata",
+          const=True,
+          help="Try to download UnicodeData.txt")
+    parser.add_option("--ascii",
+          action="store_const", dest="ascii_table",
+          const=True,
+          help="Display ASCII table")
+    parser.add_option("--brexit-ascii", "--brexit",
+          action="store_const", dest="brexit_ascii_table",
+          const=True,
+          help="Display ASCII table (EU-UK Trade and Cooperation Agreement version)")

     global options
     (options, arguments) = parser.parse_args()
@@ Lines 899-904 @@ def main():
         print (textwrap.fill(' '.join(all_encodings)))
         sys.exit()

+    if options.ascii_table:
+        display_ascii_table()
+        sys.exit()
+
+    if options.brexit_ascii_table:
+        display_brexit_ascii_table()
+        sys.exit()
+
+    if options.download_unicodedata:
+        download_unicodedata()
+        sys.exit()
+
     if len(arguments)==0:
         parser.print_help()
         sys.exit()
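Note: these dispatch checks lean on optparse's defaults; an option declared with action="store_const" and no explicit default leaves its dest set to None when the flag is absent, so the plain truthiness tests above are sufficient. A minimal sketch of the same pattern with a hypothetical --demo flag:

    from optparse import OptionParser

    parser = OptionParser()
    # declared like --ascii/--download above: const=True, no default given
    parser.add_option("--demo", action="store_const", dest="demo", const=True)
    opts, _ = parser.parse_args([])           # flag absent
    print(opts.demo)                          # None (falsy)
    opts, _ = parser.parse_args(["--demo"])   # flag present
    print(opts.demo)                          # True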
(-)unicode.orig/./setup.py (-1 / +1 lines)
@@ Lines 8-14 @@ os.chdir(os.path.abspath(os.path.dirname


 setup(name='unicode',
-      version='2.7',
+      version='2.8',
       scripts=['unicode', 'paracode'],
 #      entry_points={'console_scripts': [
 #          'unicode = unicode:main',
(-)unicode.orig/./paracode (-1 / +1 lines)
@@ Lines 201-207 @@ def main():
     (options, args) = parser.parse_args()

     if args:
-        to_convert = ' '.join(args).decode('utf-8')
+        to_convert = decode(' '.join(args), 'utf-8')
     else:
         to_convert = None

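Note: the replacement calls a decode() helper defined elsewhere in the paracode script (not visible in this hunk); the point of #939196 is that on Python 3 the joined argument string is already text and has no .decode() method. Under that assumption, the helper could look roughly like this (a sketch, not the script's actual code):

    def decode(s, encoding='utf-8'):
        # Python 2 passes command line arguments as byte strings that need
        # decoding; Python 3 already gives us text
        if isinstance(s, bytes):
            return s.decode(encoding)
        return s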
(-)unicode.orig/./README (-1 / +1 lines)
@@ Lines 4-10 @@ To use unicode utility, you need:
  - python >=2.6 (str format() method is needed), preferrably wide
    unicode build, however, python3 is recommended
  - python optparse library (part of since python2.3)
- - UnicodeData.txt file (http://www.unicode.org/Public) which
+ - UnicodeData.txt file (http://www.unicode.org/Public/13.0.0/ucd/UnicodeData.txt; or replace 13.0.0 with current Unicode version) which
    you should put into /usr/share/unicode/, ~/.unicode/ or current
    working directory.
     - apt-get install unicode-data  # Debian
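With this patch applied the file can also be fetched without the Debian package, for example (assuming network access; paths as in the README, version number as hardwired in the patch):

    - unicode --download    # stores a gzipped copy under ~/.unicode/
    - wget -P ~/.unicode/ http://www.unicode.org/Public/14.0.0/ucd/UnicodeData.txt   # manual alternative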
