Sybase NNTP forums - End Of Life (EOL)

The NNTP forums from Sybase - forums.sybase.com - are now closed.

All new questions should be directed to the appropriate forum at the SAP Community Network (SCN).

Individual products have links to the respective forums on SCN, or you can go to SCN and search for your product in the search box (upper right corner) to find your specific developer center.

non utf8 characters

5 posts in General Discussion Last posting was on 2013-01-15 13:50:56.0Z
djmodica Posted on 2012-12-27 21:02:27.0Z
Sender: 396b.50dcb61c.1804289383@sybase.com
From: djmodica
Newsgroups: sybase.public.ase.general
Subject: non utf8 characters
X-Mailer: WebNews to Mail Gateway v1.1t
Message-ID: <50dcb763.39be.1681692777@sybase.com>
NNTP-Posting-Host: 172.20.134.41
X-Original-NNTP-Posting-Host: 172.20.134.41
Date: 27 Dec 2012 13:02:27 -0800
X-Trace: forums-1-dub 1356642147 172.20.134.41 (27 Dec 2012 13:02:27 -0800)
X-Original-Trace: 27 Dec 2012 13:02:27 -0800, 172.20.134.41
Lines: 6
Path: forums-1-dub!not-for-mail
Xref: forums-1-dub sybase.public.ase.general:31619
Article PK: 1158641

hi folks. ASE 15.0.3 esd 1. i have a table with some
varchar columns. some of these columns have non-utf8
characters in them, some printable some not, that i need to
get rid of. how could i do this ?

thanks


Bret Halford Posted on 2012-12-27 22:41:12.0Z
From: Bret Halford <bret.halford@sap.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20121026 Thunderbird/16.0.2
MIME-Version: 1.0
Newsgroups: sybase.public.ase.general
Subject: Re: non utf8 characters
References: <50dcb763.39be.1681692777@sybase.com>
In-Reply-To: <50dcb763.39be.1681692777@sybase.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
NNTP-Posting-Host: vip152.sybase.com
X-Original-NNTP-Posting-Host: vip152.sybase.com
Message-ID: <50dcce88$1@forums-1-dub>
Date: 27 Dec 2012 14:41:12 -0800
X-Trace: forums-1-dub 1356648072 172.20.134.152 (27 Dec 2012 14:41:12 -0800)
X-Original-Trace: 27 Dec 2012 14:41:12 -0800, vip152.sybase.com
Lines: 24
Path: forums-1-dub!not-for-mail
Xref: forums-1-dub sybase.public.ase.general:31620
Article PK: 1158642


On 12/27/2012 2:02 PM, djmodica wrote:
> hi folks. ASE 15.0.3 esd 1. i have a table with some
> varchar columns. some of these columns have non-utf8
> characters in them, some printable some not, that i need to
> get rid of. how could i do this ?
>
> thanks
>

So your server's default character set is utf8?

Could you give a few examples of the text and the raw binary
data behind it? i.e.
select <column>, convert(binary(<n>, <column>) from <table> where...

How are you currently identifying the invalid characters -
is some error being raised by some operations, or just
by looking at the data?

--
Bret Halford
Support Architect, ASE Tactical Support Team, AGS Primary Support
Sybase, Inc., an SAP Company
385 Interlocken Crescent, Suite 300, Broomfield, Colorado, 80021


djmodica Posted on 2013-01-04 15:03:24.0Z
Sender: 2a52.50e6dc2f.1804289383@sybase.com
From: djmodica
Newsgroups: sybase.public.ase.general
Subject: Re: non utf8 characters
X-Mailer: WebNews to Mail Gateway v1.1t
Message-ID: <50e6ef3c.32aa.1681692777@sybase.com>
References: <50dcce88$1@forums-1-dub>
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
NNTP-Posting-Host: 172.20.134.41
X-Original-NNTP-Posting-Host: 172.20.134.41
Date: 4 Jan 2013 07:03:24 -0800
X-Trace: forums-1-dub 1357311804 172.20.134.41 (4 Jan 2013 07:03:24 -0800)
X-Original-Trace: 4 Jan 2013 07:03:24 -0800, 172.20.134.41
Lines: 46
Path: forums-1-dub!not-for-mail
Xref: forums-1-dub sybase.public.ase.general:31626
Article PK: 1158740


> On 12/27/2012 2:02 PM, djmodica wrote:
> > hi folks. ASE 15.0.3 esd 1. i have a table with some
> > varchar columns. some of these columns have non-utf8
> > characters in them, some printable some not, that i need
> > to get rid of. how could i do this ?
> >
> > thanks
> >
>
> So your server's default character set is utf8?
>
> Could you give a few examples of the text and the raw
> binary data behind it? i.e.
> select <column>, convert(binary(<n>, <column>) from
> <table> where...
>
> How are you currently identifying the invalid characters -
> is some error being raised by some operations, or just
> by looking at the data?
>
> --
> Bret Halford
> Support Architect, ASE Tactical Support Team, AGS Primary
> Support Sybase, Inc., an SAP Company
> 385 Interlocken Crescent, Suite 300, Broomfield, Colorado,
> 80021

my server's default character set is the default out of the
box.
when i query the table using DBArtisan on windows XP the
column value is:

Drägerwerk AG & Co. 68.63 warrants April 2015

when i use isql on solaris the value is:

Dr\344gerwerk AG & Co. 68.63 warrants April 2015

the way i found this was i tried to import the data into
MongoDB

from ASE and i got an error "non-UTF8 data".

is this enough info ? thanks

david


Bret Halford Posted on 2013-01-04 17:11:33.0Z
From: Bret Halford <bret.halford@sap.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Thunderbird/17.0
MIME-Version: 1.0
Newsgroups: sybase.public.ase.general
Subject: Re: non utf8 characters
References: <50dcce88$1@forums-1-dub> <50e6ef3c.32aa.1681692777@sybase.com>
In-Reply-To: <50e6ef3c.32aa.1681692777@sybase.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
NNTP-Posting-Host: vip152.sybase.com
X-Original-NNTP-Posting-Host: vip152.sybase.com
Message-ID: <50e70d45$1@forums-1-dub>
Date: 4 Jan 2013 09:11:33 -0800
X-Trace: forums-1-dub 1357319493 172.20.134.152 (4 Jan 2013 09:11:33 -0800)
X-Original-Trace: 4 Jan 2013 09:11:33 -0800, vip152.sybase.com
Lines: 104
Path: forums-1-dub!not-for-mail
Xref: forums-1-dub sybase.public.ase.general:31627
Article PK: 1158749


On 1/4/2013 8:03 AM, djmodica wrote:
>> On 12/27/2012 2:02 PM, djmodica wrote:
>>> hi folks. ASE 15.0.3 esd 1. i have a table with some
>>> varchar columns. some of these columns have non-utf8
>>> characters in them, some printable some not, that i need
>>> to get rid of. how could i do this ?
>>>
>>> thanks
>>>
>>
>> So your server's default character set is utf8?
>>
>> Could you give a few examples of the text and the raw
>> binary data behind it? i.e.
>> select <column>, convert(binary(<n>, <column>) from
>> <table> where...
>>
>> How are you currently identifying the invalid characters -
>> is some error being raised by some operations, or just
>> by looking at the data?
>>
>> --
>> Bret Halford
>> Support Architect, ASE Tactical Support Team, AGS Primary
>> Support Sybase, Inc., an SAP Company
>> 385 Interlocken Crescent, Suite 300, Broomfield, Colorado,
>> 80021
>
> my server's default character set is the default out of the
> box.
> when i query the table using DBArtisan on windows XP the
> column value is:
>
> Dr?gerwerk AG & Co. 68.63 warrants April 2015
>
> when i use isql on solaris the value is:
>
> Dr\344gerwerk AG & Co. 68.63 warrants April 2015
>
> the way i found this was i tried to import the data into
> MongoDB
>
> from ASE and i got an error "non-UTF8 data".
>
> is this enough info ? thanks
>
> david
>

Hi David,

Maybe enough, maybe not.

Exactly what command / parameters did you use to extract the data?

One possibility is that your server is configured for a non-utf8
character set and all you have to do is extract it in a way that
converts the values to utf8. One way to do that is with the bcp
utility specifying a "-Jutf8" parameter. So you might see if
mongoDB likes the output file created like this:

bcp <database>..<table> out <file> -U<login> -P<password> -c -Jutf8

If that doesn't do it, we will need some more information.

The ASE's default character set out of the box depends on
the platform (utf8 for Linux, iso_i for Solaris,
cp850 for Windows, roman8 for HPUX) Best thing to do is get
the actual current configuration value from
sp_configure "default character set id"

Another way to determine it is to look in the ASE errorlog,
the character set and sort order information is printed
during each boot sequence. It looks like this (from my Solaris
ASE configured for iso_1 character set):

00:0000:00000:00001:2013/01/01 13:25:04.29 server ASE's default unicode
sort order is 'binary'.
00:0000:00000:00001:2013/01/01 13:25:04.29 server ASE's default sort
order is:
00:0000:00000:00001:2013/01/01 13:25:04.29 server 'bin_iso_1' (ID
= 50)
00:0000:00000:00001:2013/01/01 13:25:04.29 server on top of default
character set:
00:0000:00000:00001:2013/01/01 13:25:04.29 server 'iso_1' (ID = 1).



I'd still also like to see the binary representation of that "Dr?gerwerk
AG & Co." value on the server, so please get

select <column>, convert(binary(40), <column>)
from <tablename>
where <filter to get this "Dr?gerwerk" row>


-bret


--
Bret Halford
Support Architect, ASE Tactical Support Team, AGS Primary Support
Sybase, Inc., an SAP Company
385 Interlocken Crescent, Suite 300, Broomfield, Colorado, 80021


djmodica Posted on 2013-01-15 13:50:56.0Z
Sender: 3d6b.50f5529e.1804289383@sybase.com
From: djmodica
Newsgroups: sybase.public.ase.general
Subject: Re: non utf8 characters
X-Mailer: WebNews to Mail Gateway v1.1t
Message-ID: <50f55ec0.3f0f.1681692777@sybase.com>
References: <50e70d45$1@forums-1-dub>
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
NNTP-Posting-Host: 172.20.134.41
X-Original-NNTP-Posting-Host: 172.20.134.41
Date: 15 Jan 2013 05:50:56 -0800
X-Trace: forums-1-dub 1358257856 172.20.134.41 (15 Jan 2013 05:50:56 -0800)
X-Original-Trace: 15 Jan 2013 05:50:56 -0800, 172.20.134.41
Lines: 144
Path: forums-1-dub!not-for-mail
Xref: forums-1-dub sybase.public.ase.general:31650
Article PK: 1159017


> On 1/4/2013 8:03 AM, djmodica wrote:
> >> On 12/27/2012 2:02 PM, djmodica wrote:
> >>> hi folks. ASE 15.0.3 esd 1. i have a table with some
> >>> varchar columns. some of these columns have non-utf8
> >>> characters in them, some printable some not, that i
> need >>> to get rid of. how could i do this ?
> >>>
> >>> thanks
> >>>
> >>
> >> So your server's default character set is utf8?
> >>
> >> Could you give a few examples of the text and the raw
> >> binary data behind it? i.e.
> >> select <column>, convert(binary(<n>, <column>) from
> >> <table> where...
> >>
> >> How are you currently identifying the invalid
> characters - >> is some error being raised by some
> operations, or just >> by looking at the data?
> >>
> >> --
> >> Bret Halford
> >> Support Architect, ASE Tactical Support Team, AGS
> Primary >> Support Sybase, Inc., an SAP Company
> >> 385 Interlocken Crescent, Suite 300, Broomfield,
> Colorado, >> 80021
> >
> > my server's default character set is the default out of
> > the box.
> > when i query the table using DBArtisan on windows XP the
> > column value is:
> >
> > Drägerwerk AG & Co. 68.63 warrants April 2015
> >
> > when i use isql on solaris the value is:
> >
> > Dr\344gerwerk AG & Co. 68.63 warrants April 2015
> >
> > the way i found this was i tried to import the data into
> > MongoDB
> >
> > from ASE and i got an error "non-UTF8 data".
> >
> > is this enough info ? thanks
> >
> > david
> >
>
> Hi David,
>
> Maybe enough, maybe not.
>
> Exactly what command / parameters did you use to extract
> the data?
>
> One possibility is that your server is configured for a
> non-utf8 character set and all you have to do is extract
> it in a way that converts the values to utf8. One way to
> do that is with the bcp utility specifying a "-Jutf8"
> parameter. So you might see if mongoDB likes the output
> file created like this:
>
> bcp <database>..<table> out <file> -U<login> -P<password>
> -c -Jutf8
>
> If that doesn't do it, we will need some more information.
>
> The ASE's default character set out of the box depends on
> the platform (utf8 for Linux, iso_i for Solaris,
> cp850 for Windows, roman8 for HPUX) Best thing to do is
> get the actual current configuration value from
> sp_configure "default character set id"
>
> Another way to determine it is to look in the ASE errorlog
> , the character set and sort order information is printed
> during each boot sequence. It looks like this (from my
> Solaris ASE configured for iso_1 character set):
>
> 00:0000:00000:00001:2013/01/01 13:25:04.29 server ASE's
> default unicode sort order is 'binary'.
> 00:0000:00000:00001:2013/01/01 13:25:04.29 server ASE's
> default sort order is:
> 00:0000:00000:00001:2013/01/01 13:25:04.29 server
> 'bin_iso_1' (ID = 50)
> 00:0000:00000:00001:2013/01/01 13:25:04.29 server on top
> of default character set:
> 00:0000:00000:00001:2013/01/01 13:25:04.29 server
> 'iso_1' (ID = 1).
>
>
>
> I'd still also like to see the binary representation of
> that "Drägerwerk AG & Co." value on the server, so
please
> get
>
> select <column>, convert(binary(40), <column>)
> from <tablename>
> where <filter to get this "Drägerwerk" row>
>
>
> -bret
>
>
> --
> Bret Halford
> Support Architect, ASE Tactical Support Team, AGS Primary
> Support Sybase, Inc., an SAP Company
> 385 Interlocken Crescent, Suite 300, Broomfield, Colorado,
> 80021

bret,

thank you for your response !
the boot sequence is:

00:00000:00001:2012/08/26 09:32:04.74 server ASE's default
unicode sort order is 'binary'.
00:00000:00001:2012/08/26 09:32:04.74 server ASE's default
sort order is:
00:00000:00001:2012/08/26 09:32:04.74 server 'bin_iso_1'
(ID = 50)
00:00000:00001:2012/08/26 09:32:04.74 server on top of
default character set:
00:00000:00001:2012/08/26 09:32:04.74 server 'iso_1' (ID
= 1).

when i do a select in LINUX i get this:

Drägerwerk AG & Co. 68.63 warrants April 2015

when i use the bcp you suggested i get this:

Drägerwerk AG & Co. 68.63 warrants April 2015

the output from the query is as follows:

Drägerwerk AG & Co. 68.63 warrants April
2015 4472E46765727765726B204147202620436F2E2036382E36332077617272616E747320417072696C

thanks

david