[Discuss] The dangers of ASCII mode

Noxwizard
Support Team Leader
Posts: 10338
Joined: Mon Jun 27, 2005 8:41 pm
Location: Texas, USA
Name: Patrick Webster

[Discuss] The dangers of ASCII mode

Post by Noxwizard » Tue Aug 23, 2011 1:17 am

This topic is for discussion of the blog post "The dangers of ASCII mode".

Does your FTP client do this by default too? Have you been burned by this "feature" before? Share your comments. Please note that this discussion is not about why you should or should not use FTP or why client X is better than client Y.
[Support Template] - [Read Before Posting] - [phpBB Knowledge Base]
Do not contact me for private support, please share the question in our forums.

callumacrae
Former Team Member
Posts: 2662
Joined: Tue Feb 12, 2008 12:28 pm
Location: London, UK
Name: Callum Macrae

Re: [Discuss] The dangers of ASCII mode

Post by callumacrae » Tue Aug 23, 2011 12:32 pm

Nice post. I wasn't aware of this and haven't been affected by it (I only transfer from OS X to Linux and back, and try to avoid FTP anyway), but will watch out in the future :-)
macr.ae = my website. you probably won't like it.

eyoungren
Registered User
Posts: 29
Joined: Sun Feb 13, 2011 9:05 am

Re: [Discuss] The dangers of ASCII mode

Post by eyoungren » Tue Aug 23, 2011 2:22 pm

For OSX, the default for Fetch and Cyberduck is Automatic. Don't know what it is for Transmit. Cyberduck seems to handle automatic much better than Fetch, but I've switched it to Binary anyway.

BTW, I do most of my upload/download via SFTP. Cyberduck does not seem to have any switches other than the type of security transfer.

Does this apply to SFTP?

callumacrae
Former Team Member
Posts: 2662
Joined: Tue Feb 12, 2008 12:28 pm
Location: London, UK
Name: Callum Macrae

Re: [Discuss] The dangers of ASCII mode

Post by callumacrae » Tue Aug 23, 2011 2:37 pm

eyoungren wrote:For OSX, the default for Fetch and Cyberduck is Automatic. Don't know what it is for Transmit. Cyberduck seems to handle automatic much better than Fetch, but I've switched it to Binary anyway.
OS X and Linux both use the same line endings, so I don't think you will have a problem unless you're transferring to a Windoze server.
eyoungren wrote:Does this apply to SFTP?
Yes.

Dog Cow
Registered User
Posts: 2494
Joined: Fri Jan 28, 2005 12:14 am

Re: [Discuss] The dangers of ASCII mode

Post by Dog Cow » Tue Aug 23, 2011 3:30 pm

Noxwizard wrote: Does your FTP client do this by default too?
No, I use the ftp command in Mac OS X, and it defaults to binary. In fact, when I know that I'm just uploading ascii files, I type ascii to switch to that mode.


230 OK. Current restricted directory is /
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> ascii
200 TYPE is now ASCII
ftp>
Have you been burned by this "feature" before?
No.

eyoungren wrote:Cyberduck does not seem to have any switches other than the type of security transfer.

Does this apply to SFTP?
SFTP operates over an SSH connection. It makes no discrimination between ascii or binary modes.
Last edited by Dog Cow on Tue Aug 23, 2011 3:35 pm, edited 1 time in total.
Moof!
Mac GUI Vault: Retro Apple II & Macintosh computing archive.
Inside Allerton book | Mac GUI | Mac 512K Blog

updown
Registered User
Posts: 542
Joined: Sat Jan 05, 2008 6:53 am

Re: [Discuss] The dangers of ASCII mode

Post by updown » Tue Aug 23, 2011 5:54 pm

By the way, this basic ASCII problem is not limited to FileZilla. The "Super Flexible File Synchronizer", for example, also uses ASCII transfer for files without extensions unless that is changed first in the settings.

This seems to be a kind of "religious" problem, depending on what each developer believes in. There is no absolute right or wrong, and I doubt that there will be a single standard as long as the CRLF/LF differences between systems exist.

callumacrae
Former Team Member
Posts: 2662
Joined: Tue Feb 12, 2008 12:28 pm
Location: London, UK
Name: Callum Macrae

Re: [Discuss] The dangers of ASCII mode

Post by callumacrae » Tue Aug 23, 2011 6:43 pm

updown wrote:This seems to be some kind of "religious" problem, depending on what each developer believes in. There is no absolute right or wrong, and I doubt that there will be a unique standard, as long as the CRLF-LF-differences themselves exist due to the different systems.
There is absolutely a wrong method. One method destroys stuff, the other method doesn't.

Oleg
Former Team Member
Posts: 1221
Joined: Sat Jan 30, 2010 4:42 pm
Location: NYC

Re: [Discuss] The dangers of ASCII mode

Post by Oleg » Wed Aug 24, 2011 12:35 am

How an FTP client can assume that a completely unknown file is going to be pure ASCII is beyond me.
This is actually easy to answer by reading RFC 959:
3.1.1.1. ASCII TYPE

This is the default type and must be accepted by all FTP
implementations.
Curiously enough,
3.1.1.3. IMAGE TYPE

Image type is intended for the efficient storage and
retrieval of files and for the transfer of binary data. It
is recommended that this type be accepted by all FTP
implementations.
So, using binary mode by default will render an implementation non-compliant with the specification, and possibly non-interoperable with other implementations.
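As a rough illustration of what "default type" means here, this is a minimal sketch of the per-session state an RFC 959 server keeps. The `FtpSession` class and its `handle` method are hypothetical and model only the TYPE command; the default of "A" and the switch to "I" (IMAGE) follow the sections quoted above, and 200, 502 and 504 are the generic reply codes RFC 959 defines.

```python
class FtpSession:
    """Hypothetical model of the representation type an RFC 959 server tracks."""

    def __init__(self) -> None:
        # RFC 959 3.1.1.1: ASCII ("A") is the default until the client sends TYPE.
        self.type_code = "A"

    def handle(self, line: str) -> str:
        """Process one command line; only TYPE A and TYPE I are modeled."""
        parts = line.strip().split()
        if not parts or parts[0].upper() != "TYPE":
            return "502 Command not implemented."
        if len(parts) >= 2 and parts[1].upper() in ("A", "I"):
            # Any format argument (e.g. the "N" in "TYPE A N") is ignored here.
            self.type_code = parts[1].upper()
            return "200 Type set to " + self.type_code + "."
        return "504 Command not implemented for that parameter."


session = FtpSession()
print(session.type_code)         # A
print(session.handle("TYPE I"))  # 200 Type set to I.
```

In this model the default only governs transfers made before the first TYPE command of the session.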
Participate in phpBB development: Get involved | Issue tracker | Report a bug | Development board | Development IRC chat (irc://chat.freenode.net/phpbb-dev)
My stuff: mindlinkgame.com

Pony99CA
Registered User
Posts: 4783
Joined: Thu Sep 30, 2004 3:13 pm
Location: Hollister, CA
Name: Steve

Re: [Discuss] The dangers of ASCII mode

Post by Pony99CA » Wed Aug 24, 2011 1:28 am

NoxWizard wrote:Recovering the files is not possible at this point. The only option is to change the transfer mode and download them again if you still have access to the originals. “But can’t I just re-upload the files and let the line endings get converted back?” Unfortunately, no. Existing Line Feeds would have had a Carriage Return added in front of them, and existing CRLFs would have been left alone. Uploading it would convert both the new and the previous CRLFs to LF. The file is lost.
I wanted to explore that a bit. I was under the impression that going from Unix to Windows would convert every LF (including those in CRLF pairs already) to CRLF. If that were the case, reuploading as ASCII would be OK (subject to the following) as any existing CRLF pairs that got translated to CRCRLF sequences as part of the DOS conversion would just get converted back to CRLF.
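The difference between the two descriptions can be checked with a minimal sketch. `smart_to_crlf` adds a CR only before a bare LF, as the blog post describes; `naive_to_crlf` converts every LF, as I assumed above; `crlf_to_lf` models the re-upload. All three names are hypothetical, and real clients may behave differently.

```python
def smart_to_crlf(data: bytes) -> bytes:
    """Hypothetical: add CR before each bare LF, leave existing CRLF pairs alone."""
    out = bytearray()
    prev = None
    for b in data:
        if b == 0x0A and prev != 0x0D:
            out.append(0x0D)
        out.append(b)
        prev = b
    return bytes(out)


def naive_to_crlf(data: bytes) -> bytes:
    """Hypothetical: turn every LF into CRLF, even one already preceded by CR."""
    return data.replace(b"\n", b"\r\n")


def crlf_to_lf(data: bytes) -> bytes:
    """Hypothetical re-upload: every CRLF pair becomes a single LF."""
    return data.replace(b"\r\n", b"\n")


# Two different originals...
unix_line, dos_line = b"x\ny", b"x\r\ny"
# ...become identical under the blog's conversion, so a re-upload cannot tell them apart:
print(smart_to_crlf(unix_line) == smart_to_crlf(dos_line))  # True
# Under the naive conversion the extra CR survives and the re-upload undoes it:
print(crlf_to_lf(naive_to_crlf(dos_line)) == dos_line)      # True
```

Because `smart_to_crlf` maps two different inputs to the same output, no later conversion can know which one you started with; the naive version keeps the extra CR, which is what makes a CRLF-to-LF pass an exact inverse.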

I thought the real problem would be the DOS End-of-File character. If ASCII mode recognizes those, a binary file that happened to contain '1A'X would get truncated when converting to ASCII. That would be truly unrecoverable.
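The truncation concern can be sketched with a hypothetical text reader that stops at the first Ctrl+Z (0x1A), the CP/M and DOS end-of-file convention. `read_text_dos_style` is an assumption for illustration only; it is not taken from any FTP client.

```python
CTRL_Z = 0x1A  # CP/M and DOS end-of-file marker


def read_text_dos_style(data: bytes) -> bytes:
    """Hypothetical reader: keep only the bytes before the first Ctrl+Z."""
    end = data.find(bytes([CTRL_Z]))
    return data if end == -1 else data[:end]


payload = bytes([0x01, 0x02, 0x1A, 0x46, 0x54, 0x50])
truncated = read_text_dos_style(payload)  # b"\x01\x02": everything from 0x1A on is gone
```

Unlike a line-ending change, nothing in the truncated output records what followed the marker, so no later conversion could restore it.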

Steve
Silicon Valley Pocket PC (http://www.svpocketpc.com)
Creator of manage_bots and spoof_user (ask me)
Need hosting for a small forum with full cPanel & MySQL access? Contact me or PM me.

Pony99CA
Registered User
Posts: 4783
Joined: Thu Sep 30, 2004 3:13 pm
Location: Hollister, CA
Name: Steve

Re: [Discuss] The dangers of ASCII mode

Post by Pony99CA » Wed Aug 24, 2011 1:41 am

Oleg wrote:
How an FTP client can assume that a completely unknown file is going to be pure ASCII is beyond me.
This is actually easy to answer by reading RFC 959[...]
Interestingly, that RFC defined end-of-line as:
End-of-Line

The end-of-line sequence defines the separation of printing
lines. The sequence is Carriage Return, followed by Line Feed.
So it looks like UNIX got it wrong. :D Interpreting the character names (Carriage Return and Line Feed) literally, a carriage return without a line feed should allow overtyping (on a printing terminal; it's more difficult with a monitor :D) and a line feed without a carriage return should start printing on the next line in the next horizontal character position (ragged text).
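The literal reading of the two control characters can be illustrated with a toy simulation of a printing terminal. The `teletype` function is hypothetical: it treats CR as "return to column 0" and LF as "advance one line, keep the column", and it keeps only the last character typed in a cell, whereas real paper would show both.

```python
def teletype(stream: str, width: int = 40) -> list[str]:
    """Hypothetical printing terminal where CR and LF are separate motions."""
    page = [[" "] * width]
    row, col = 0, 0
    for ch in stream:
        if ch == "\r":
            col = 0  # carriage return: back to the left margin, same line
        elif ch == "\n":
            row += 1  # line feed: advance the paper, column unchanged
            if row == len(page):
                page.append([" "] * width)
        else:
            if col < width:
                page[row][col] = ch  # overtyping keeps only the last character here
            col += 1
    return ["".join(line).rstrip() for line in page]


print(teletype("ab\ncd"))    # ['ab', '  cd']  (LF without CR: ragged text)
print(teletype("ab\r\ncd"))  # ['ab', 'cd']
print(teletype("abc\rX"))    # ['Xbc']  (CR without LF: overtyping)
```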

Steve

Noxwizard
Support Team Leader
Support Team Leader
Posts: 10338
Joined: Mon Jun 27, 2005 8:41 pm
Location: Texas, USA
Name: Patrick Webster

Re: [Discuss] The dangers of ASCII mode

Post by Noxwizard » Wed Aug 24, 2011 3:38 am

Oleg wrote:
How an FTP client can assume that a completely unknown file is going to be pure ASCII is beyond me.
This is actually easy to answer by reading RFC 959:
3.1.1.1. ASCII TYPE

This is the default type and must be accepted by all FTP
implementations.
Yes, I've read it, but that's really only a valid assumption if the client has decided to implement that single data type. If you're offering both and a mode called "Automatic" that is supposed to do things magically without breaking files, it should try to be intelligent and not cause data corruption. When in doubt, no transformations should be applied. If the server or client only offer one of the modes, then there is no choice but to treat it as ASCII.
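One way an "Automatic" mode could follow the "when in doubt, no transformations" rule is sketched below. The extension list, the NUL-byte check and the `choose_transfer_type` name are all assumptions for illustration, not how any particular client works; the return values are the RFC 959 type codes "A" (ASCII) and "I" (IMAGE, i.e. binary).

```python
import os

# Hypothetical whitelist; real clients ship their own (usually configurable) lists.
TEXT_EXTENSIONS = {".txt", ".php", ".html", ".css", ".js", ".xml"}


def choose_transfer_type(filename: str, head: bytes = b"") -> str:
    """Return "A" only for a known text extension; anything else gets "I".

    `head` is an optional sample of the file's first bytes. A NUL byte
    is treated as a sign of binary data even when the extension looks textual.
    """
    ext = os.path.splitext(filename)[1].lower()
    if ext in TEXT_EXTENSIONS and b"\x00" not in head:
        return "A"
    return "I"  # unknown or missing extension: transfer the bytes untouched


print(choose_transfer_type("style.css"))   # A
print(choose_transfer_type("2_5f1c0e9a"))  # I (hypothetical extensionless name)
```

With this design an extensionless file, such as a phpBB attachment stored without its extension, falls through to binary instead of being treated as text.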

Pony99CA wrote:I wanted to explore that a bit. I was under the impression that going from Unix to Windows would convert every LF (including those in CRLF pairs already) to CRLF. If that were the case, reuploading as ASCII would be OK (subject to the following) as any existing CRLF pairs that got translated to CRCRLF sequences as part of the DOS conversion would just get converted back to CRLF.

I thought the real problem would be the DOS End-of-File character. If ASCII mode recognizes those, a binary file that happened to contain '1A'X would get truncated when converting to ASCII. That would be truly unrecoverable.

Steve
You have tried to make that recommendation before and were told the same thing. Try it yourself, it only takes a minute. Additionally, there is no EOF character embedded in files in Windows. By your own article, it is a concept and not an actual character. Anyway, the transformations only apply to line endings:
3.4. TRANSMISSION MODES wrote:For the purpose of standardized transfer, the sending host will
translate its internal end of line or end of record denotation
into the representation prescribed by the transfer mode and file
structure, and the receiving host will perform the inverse
translation to its internal denotation.
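The translation the RFC describes can be sketched as two byte-level substitutions, assuming the only difference between hosts is the end-of-line sequence (LF on Unix, CRLF on Windows) and that the sender converts every local line ending. `send_ascii` and `receive_ascii` are hypothetical names.

```python
NETWORK_EOL = b"\r\n"  # RFC 959 end-of-line for ASCII type
UNIX, WINDOWS = b"\n", b"\r\n"


def send_ascii(data: bytes, local_eol: bytes) -> bytes:
    """Hypothetical sender: rewrite the host's end-of-line sequence as CRLF."""
    return data.replace(local_eol, NETWORK_EOL)


def receive_ascii(wire: bytes, local_eol: bytes) -> bytes:
    """Hypothetical receiver: the inverse translation, CRLF back to the host's sequence."""
    return wire.replace(NETWORK_EOL, local_eol)


text = b"one\ntwo\n"
on_windows = receive_ascii(send_ascii(text, UNIX), WINDOWS)  # b"one\r\ntwo\r\n"
```

For plain text this round-trips cleanly. For binary data every 0x0A byte is rewritten as well, which is the corruption the blog post warns about.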
Pony99CA wrote:So it looks like UNIX got it wrong. :D
This is how the FTP RFC defines it for its own use, not how everyone else should use it.

Oleg
Former Team Member
Posts: 1221
Joined: Sat Jan 30, 2010 4:42 pm
Location: NYC

Re: [Discuss] The dangers of ASCII mode

Post by Oleg » Wed Aug 24, 2011 3:40 am

Pony99CA wrote: So it looks like UNIX got it wrong.
A number of (popular) protocols (including HTTP and SMTP) define "end of line" as CR+LF. I don't know why this is.

Keep in mind that unix predates such protocols (unix began around 1970).

Oleg
Former Team Member
Posts: 1221
Joined: Sat Jan 30, 2010 4:42 pm
Location: NYC

Re: [Discuss] The dangers of ASCII mode

Post by Oleg » Wed Aug 24, 2011 3:50 am

Noxwizard wrote: Yes, I've read it, but that's really only a valid assumption if the client has decided to implement that single data type. If you're offering both and a mode called "Automatic" that is supposed to do things magically without breaking files, it should try to be intelligent and not cause data corruption. When in doubt, no transformations should be applied. If the server or client only offer one of the modes, then there is no choice but to treat it as ASCII.
I agree that a client is free to use any mode it is able to negotiate with a server.

My previous post went a little too far into protocol details. The first half of it was an explanation for why ascii would be the default. The second part was only relevant to client-server communication, not to what mode client chooses among the available ones.

With respect to what the default should be, maybe the developers don't want to break things for people who are happily transferring extensionless files in ASCII mode. If so, feedback from users that binary is the more sensible default can help change the developers' minds.

Pony99CA
Registered User
Posts: 4783
Joined: Thu Sep 30, 2004 3:13 pm
Location: Hollister, CA
Name: Steve

Re: [Discuss] The dangers of ASCII mode

Post by Pony99CA » Wed Aug 24, 2011 8:59 am

Noxwizard wrote:
Pony99CA wrote:I wanted to explore that a bit. I was under the impression that going from Unix to Windows would convert every LF (including those in CRLF pairs already) to CRLF. If that were the case, reuploading as ASCII would be OK (subject to the following) as any existing CRLF pairs that got translated to CRCRLF sequences as part of the DOS conversion would just get converted back to CRLF.

I thought the real problem would be the DOS End-of-File character. If ASCII mode recognizes those, a binary file that happened to contain '1A'X would get truncated when converting to ASCII. That would be truly unrecoverable.
You have tried to make that recommendation before and were told the same thing. Try it yourself, it only takes a minute.
First, my post here was not a "recommendation"; I was seeking clarification that CRLF in a UNIX file did not get translated to CRCRLF during an ASCII transfer to Windows.

Second, the post that you cited did not say "the same thing" at all. He merely stated that it didn't work without giving any real details about exactly what failed.

Anyway, I took your advice and tried it. I created the following file in a hex editor and saved it as ftp_test.txt:


01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0D 0A
01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0D 0A
1A 1A 1A 46 54 50
I uploaded in binary and downloaded in binary as a control and verified that the contents were identical after the download. I then downloaded again as ASCII and got the following:


01 02 03 04 05 06 07 08 09 0d 0a 0b 0c 0d 0e 0d
0d 0a 01 02 03 04 05 06 07 08 09 0d 0a 0b 0c 0d
0e 0d 0d 0a 1a 1a 1a 46 54 50
So, as I expected, all instances of LF (0a) were converted to CRLF (0d0a) as shown by the 0d0d0a groups. If that's the correct behavior, the file could possibly be recovered by converting all instances of CRLF to just LF.
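The ASCII download can be reproduced with a short sketch. `ascii_download` models what FTP Genius appeared to do in this test (every LF, even one already preceded by CR, becomes CRLF) and `recover` is the proposed fix; both names are hypothetical and the model is inferred from this single test.

```python
# The test file, byte for byte.
ORIGINAL = bytes.fromhex(
    "01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0D 0A "
    "01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0D 0A "
    "1A 1A 1A 46 54 50"
)


def ascii_download(data: bytes) -> bytes:
    """Hypothetical model of the observed download: every LF becomes CRLF."""
    return data.replace(b"\n", b"\r\n")


def recover(data: bytes) -> bytes:
    """Hypothetical name for the proposed fix: every CRLF goes back to LF."""
    return data.replace(b"\r\n", b"\n")


downloaded = ascii_download(ORIGINAL)
print(downloaded.hex(" "))               # same bytes as the ASCII download dump above
print(recover(downloaded) == ORIGINAL)   # True
```

The recovery works here only because this client converted every LF. A client that leaves existing CRLF pairs alone, as the blog post describes, would not leave enough information to undo the change.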

I then uploaded in ASCII and downloaded in binary and got the following:


01 02 03 04 05 06 07 08 09 0a 0b 0c 0e 0a 01 02
03 04 05 06 07 08 09 0a 0b 0c 0e 0a 1a 1a 1a 46
54 50
That's a bit more interesting. It appears that not only did CRLF pairs get converted to LF, but solo CR characters were deleted completely! (Notice there are no 0d characters in the file any longer.)

That was using the FTP Genius program, so I don't know if removing single CR instances is a bug or part of the standard. If that's standard, then uploading again in ASCII and downloading in binary won't fix the ASCII download.

Of course, uploading in ASCII to a Unix system and downloading back to Windows as ASCII will (almost) always fail. Any CRLF character pairs in the file will get converted to LF on the upload, but all LF characters (not just those in CRLF pairs, apparently) will be converted to CRLF pairs on the download.
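The upload result and the failed ASCII round trip can be modeled the same way. `ascii_upload` is a hypothetical model inferred from this one test with FTP Genius: CRLF pairs become LF and any remaining CR is dropped. Whether dropping lone CRs is standard behavior is an open question, as noted above.

```python
# The test file, byte for byte.
ORIGINAL = bytes.fromhex(
    "01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0D 0A "
    "01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0D 0A "
    "1A 1A 1A 46 54 50"
)


def ascii_upload(data: bytes) -> bytes:
    """Hypothetical model of the observed upload: CRLF -> LF, then lone CRs dropped."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"")


def ascii_download(data: bytes) -> bytes:
    """Hypothetical model of the observed download: every LF -> CRLF."""
    return data.replace(b"\n", b"\r\n")


uploaded = ascii_upload(ORIGINAL)
round_trip = ascii_download(uploaded)
print(round_trip == ORIGINAL)                        # False
# Information is lost on upload: different inputs give the same result.
print(ascii_upload(b"a\rb") == ascii_upload(b"ab"))  # True
```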
Noxwizard wrote:Additionally, there is no EOF character embedded in files in Windows. By your own article, it is a concept and not an actual character.
Did we read the same article? The one that I linked to clearly stated that DOS used Ctrl+Z as an End-of-File character (for compatibility with CP/M and to enable file input and terminal input to be handled identically). Maybe Windows no longer uses Ctrl+Z, but that's a different discussion.

If FTP doesn't worry about Ctrl+Z (FTP Genius didn't seem to), that's great.
Noxwizard wrote:
Pony99CA wrote:So it looks like UNIX got it wrong. :D
This is how the FTP RFC defines it for its own use, not how everyone else should use it.
Remind me next time to put more smilies in there to indicate when something is intended as a joke. :D :D :D :D

Steve

Lumpy Burgertushie
Registered User
Posts: 66324
Joined: Mon May 02, 2005 3:11 am

Re: [Discuss] The dangers of ASCII mode

Post by Lumpy Burgertushie » Wed Aug 24, 2011 10:20 pm

To me the argument about which is right or wrong is moot.

The point is that the phpBB developers decided to change things so that attachments do not have a file extension.

Most FTP programs apparently default to ASCII for files with no extension, and therefore the files get corrupted.

Something was said earlier about letting the developers of FTP software know about this; well, I could say the same about the phpBB developers.

Why create a problem that can only be solved if all the different FTP programs get changed, when we all know that is not going to happen?

Why not simply change phpBB so that the problem is not there?


roberrt
I'm baaaaaccckkkk. still doing work on donation basis. PM your needs.

Premium phpBB 3.2 Styles by PlanetStyles.net

If a tree falls in the forest and nobody is there, does it make a sound?
