A simple ->utf-8 converter

This is an archive of the phpBB 2.0.x convertors forum. Support for phpBB2 has now ended.

Post by GanQuan »

I got tired of looking for a GBK -> UTF-8 converter, so I wrote a simple one in Java myself. It converts a text file (or a batch of text files) from your default charset to UTF-8. It's a little rough, but fortunately it solved my problem.
It works only when the input file(s) are encoded in your default charset; otherwise the output will be unreadable, and you might get some runtime exceptions. Converting from any charset to any other would be possible if I knew how to detect the original charset of a file, so please tell me if you know how to do that. There is also a less clever way to do the same thing: add a new argument that lets the user specify the source charset, and use that charset in the code:

Code:

InputStreamReader isr = new InputStreamReader(fis, user_specified_charset);

Besides UTF-8, you can specify any other Java-supported charset as the target charset.
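
On the detection question: there is no fully reliable way to tell a file's charset from its bytes, but a common heuristic is to try strict decoding against a short list of candidate charsets and take the first one that decodes without errors. A minimal sketch of that idea, assuming Java 7 or later; the class name CharsetGuess and its methods are made up for illustration and are not part of UTF8Conv:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public final class CharsetGuess {

    // Returns true if the bytes decode in the given charset without any
    // malformed or unmappable sequences.
    public static boolean decodesCleanly(byte[] data, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Returns the first candidate that decodes cleanly, or null if none does.
    // Order matters: list strict encodings (UTF-8) before permissive ones.
    public static Charset guess(byte[] data, String... candidates) {
        for (String name : candidates) {
            Charset cs = Charset.forName(name);
            if (decodesCleanly(data, cs)) {
                return cs;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: java CharsetGuess file candidate [candidate ...]");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        String[] candidates = Arrays.copyOfRange(args, 1, args.length);
        System.out.println(guess(data, candidates));
    }
}
```

Something like "java CharsetGuess some_file.txt UTF-8 GBK" would print the first candidate that fits. A match is only a guess: single-byte charsets such as ISO-8859-1 accept any byte sequence, so they only make sense as the last candidate.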

I'm a total newbie in Java, and this was written in a rush, so some of the code may not be very efficient. If you think it's silly to post such a simple program here, please just ignore me :)

Here is the usage (assuming you already know how to compile a Java source file):

Code:

java UTF8Conv path [charset]

path must be a valid path to a file or a directory. The conversion is applied to that file, or to all files in the directory. It won't overwrite the original file(s); it creates new one(s) with the same name plus ".new".
charset is the desired target charset; the default is UTF-8.
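
The usage above could be extended with the source-charset argument suggested earlier. A hedged sketch, assuming Java 7 or later; CharsetConvSketch, convert and the argument order are hypothetical and not what UTF8Conv does. It copies in chunks rather than reading the whole file into memory, and keeps the ".new" naming:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class CharsetConvSketch {

    // Copies src to "<src>.new", decoding with `from` and encoding with `to`.
    // The original file is left untouched.
    public static Path convert(Path src, Charset from, Charset to) throws IOException {
        Path dst = Paths.get(src.toString() + ".new");
        try (BufferedReader in = Files.newBufferedReader(src, from);
             BufferedWriter out = Files.newBufferedWriter(dst, to)) {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return dst;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2 || args.length > 3) {
            System.out.println("Usage: java CharsetConvSketch path source_charset [target_charset]");
            return;
        }
        Charset from = Charset.forName(args[1]);
        Charset to = args.length > 2 ? Charset.forName(args[2]) : StandardCharsets.UTF_8;
        System.out.println("Wrote " + convert(Paths.get(args[0]), from, to));
    }
}
```

Files.newBufferedReader decodes strictly, so input that does not match the given source charset fails with an IOException instead of silently producing unreadable output.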


Here is the source, have fun with it :D

Code:

/*
 * Created on 2004-8-13
 * UTF8Conv.java:
 * 
 * @author GanQuan
 * @email [email protected]
 *
 * Simply convert file(s) charset
 *
 * Usage:
 * java UTF8Conv file_name/or/dir [charset]
 *     file_name/or/dir: name of the file or dir of the files which you want to convert.
 *     [charset]:        the charset you want to convert to. "utf-8" is the default.
 *     The postfix ".new" is appended to the original file name for each output file.
 */

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.UnsupportedCharsetException;

public final class UTF8Conv {
    
    private static String fileName = "";
    private static String postfix = ".new";   // any thing you like
    private static String charset = "UTF8";   // charset of the target file
    
    private static void writeOutput() {

        File path = new File(fileName);
        if (!path.isFile() && !path.isDirectory()) {
            System.out.println("Invalid file name or path!");
            return;
        }

        // Resolve the target charset once, before any output file is created.
        // Charset.forName is what throws UnsupportedCharsetException; the
        // String-based OutputStreamWriter constructor throws
        // UnsupportedEncodingException (an IOException) instead.
        java.nio.charset.Charset target;
        try {
            target = java.nio.charset.Charset.forName(charset);
        } catch (UnsupportedCharsetException | java.nio.charset.IllegalCharsetNameException e) {
            System.out.println("Unsupported charset: " + charset);
            return;
        }

        // Treat a single file as a one-element list; listFiles() can return null.
        File[] files = path.isFile() ? new File[] { path } : path.listFiles();
        if (files == null) {
            System.out.println("Cannot list directory: " + path);
            return;
        }
        for (File file : files) {
            if (!file.isFile()) {
                continue;   // skip subdirectories
            }
            // try-with-resources closes the writer even if the write fails
            try (Writer out = new OutputStreamWriter(
                    new FileOutputStream(file.getAbsolutePath() + postfix), target)) {
                out.write(readInput(file));
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        System.out.println("Done");
    }
    
    private static String readInput(File file) {

        StringBuilder buff = new StringBuilder();
        // No charset is passed to InputStreamReader, so the file is decoded
        // with the JVM's default charset. try-with-resources closes the
        // reader even if reading fails.
        try (Reader in = new BufferedReader(new InputStreamReader(new FileInputStream(file)))) {
            int ch;
            while ((ch = in.read()) > -1) {
                buff.append((char) ch);
            }
        } catch (FileNotFoundException fe) {
            System.out.println("File Not Found!");
        } catch (IOException e) {
            e.printStackTrace();
        }
        return buff.toString();
    }
    
    public static void main(String[] args) {
        
        if (args.length < 1 || args.length > 2) {
            printHelp();
        } else {
            fileName = args[0];
            charset = args.length > 1 ? args[1] : charset;
            writeOutput();
        }
    }
    
    private static void printHelp() {
        System.out.println("Usage:");
        System.out.println("java UTF8Conv file_name/or/dir [charset]");
        System.out.println("file_name/or/dir: \n\tname of the file or dir of the files which you want to convert.");
        System.out.println("[charset]: \n\tthe charset you want to convert to. \"utf-8\" is the default.");
    }
}

Wow, the 2004 Olympics are about to open :D

Post by GanQuan »

Seems nobody is interested :cry:

Post by pichirichi »

GanQuan wrote: Seems nobody is interested :cry:


I was looking for something like that. Too bad I found it after I'd converted the files with an editor :-(