When working with CSV files, character encoding is a topic you cannot ignore. By understanding the differences between UTF-8 and Shift_JIS, you can prevent most garbled-text issues before they happen. This guide covers the fundamentals of character encoding and practical advice for choosing the right one for your CSVs.
What Is Character Encoding?
How Computers Handle Text
Computers process all data internally as numbers (sequences of 0s and 1s). Character encoding is the ruleset that maps specific numbers to specific characters.
For example, the Japanese character "あ" is represented as the 3-byte sequence "E3 81 82" in UTF-8, but as the 2-byte sequence "82 A0" in Shift_JIS. Because the same character maps to different numbers in different encodings, reading a file with the wrong encoding produces garbled text.
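The byte values above can be checked directly. The following is a minimal Python sketch, assuming Python 3.8 or later and its built-in `utf-8` and `shift_jis` codecs; the sample strings are illustrative only.

```python
text = "あ"

utf8_bytes = text.encode("utf-8")
sjis_bytes = text.encode("shift_jis")

print(utf8_bytes.hex(" "))  # e3 81 82
print(sjis_bytes.hex(" "))  # 82 a0

# Reading UTF-8 bytes as if they were Shift_JIS yields different characters.
# errors="replace" substitutes a placeholder for byte sequences that are
# invalid in the assumed encoding instead of raising an exception.
greeting = "こんにちは"
garbled = greeting.encode("utf-8").decode("shift_jis", errors="replace")
print(garbled != greeting)  # True
```

The same bytes on disk produce different text depending on which mapping the reader applies, which is all that garbled text is.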
Major Character Encodings
| Encoding | Full Name | Year | Key Feature |
|---|---|---|---|
| UTF-8 | Unicode Transformation Format - 8bit | 1993 | International standard covering all world scripts |
| Shift_JIS | Shift Japanese Industrial Standards | 1982 | Encoding designed specifically for Japanese |
| EUC-JP | Extended Unix Code for Japanese | 1985 | Japanese encoding for Unix/Linux systems |
| ISO-2022-JP | — | 1993 | Japanese encoding for email |
In today's web and business environments, UTF-8 and Shift_JIS are the two encodings you will encounter most often.
UTF-8 vs. Shift_JIS: Key Differences
Comparison Table
| Attribute | UTF-8 | Shift_JIS |
|---|---|---|
| Character coverage | ~150,000 (nearly all world scripts) | ~7,000 (primarily Japanese) |
| Bytes per Japanese character | 3 bytes (typical) | 2 bytes (typical) |
| Bytes per alphanumeric character | 1 byte | 1 byte |
| Web adoption rate | Over 98% | Minimal |
| Excel on Windows | Supported with BOM | Supported by default |
| Mac / Linux | Default | Requires conversion |
| Programming languages | Standard | Legacy support |
UTF-8 Characteristics
UTF-8 is an encoding of Unicode, so it can represent every Unicode character, which covers virtually all writing systems in modern use. Over 98% of websites use UTF-8, making it the de facto international standard.
Alphanumeric characters use 1 byte, while most Japanese characters use 3 bytes, so Japanese-heavy files are somewhat larger than their Shift_JIS counterparts. In exchange, UTF-8 handles multilingual data (Chinese, Korean, emoji, etc.) without issue.
Shift_JIS Characteristics
Shift_JIS was developed in 1982 by Microsoft and ASCII Corporation specifically for Japanese text processing. It remains widely used in Excel on Windows and legacy business systems in Japan.
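Most Japanese characters use only 2 bytes, making Japanese-heavy files more compact than UTF-8. However, standard Shift_JIS does not define characters such as "①", "髙", and "﨑"; these are vendor extensions (platform-dependent characters) that only variants such as CP932 can encode. Shift_JIS also cannot represent most non-Japanese scripts, such as Korean hangul, or emoji.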
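To make the size difference concrete, here is a small Python sketch that encodes one hypothetical CSV row both ways. The row content is an assumption chosen for illustration; real files will vary with their mix of Japanese and ASCII text.

```python
# One illustrative CSV row: 6 kanji plus 8 ASCII characters (including commas).
sample = "東京都,渋谷区,ABC123"

utf8_size = len(sample.encode("utf-8"))       # 6 * 3 + 8 bytes
sjis_size = len(sample.encode("shift_jis"))   # 6 * 2 + 8 bytes

print(utf8_size)  # 26
print(sjis_size)  # 20
```

The ASCII portion costs the same in both encodings, so the gap narrows as the proportion of alphanumeric data grows.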
What Is a BOM (Byte Order Mark)?
A BOM is a short byte sequence at the very beginning of a file that software can use to identify the file's encoding. The UTF-8 BOM consists of 3 bytes: "EF BB BF".
| Type | Leading Bytes | Behavior in Excel |
|---|---|---|
| UTF-8 (no BOM) | None | Often read as Shift_JIS → garbled text |
| UTF-8 (with BOM) | EF BB BF | Correctly recognized as UTF-8 |
| Shift_JIS | None | Read correctly by default |
For CSVs intended for Excel, choose "UTF-8 with BOM" or "Shift_JIS" for safe results.
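As one way to produce a "UTF-8 with BOM" file, the sketch below uses Python's standard `utf-8-sig` codec, which writes the three BOM bytes automatically. The helper names `write_csv_for_excel` and `has_utf8_bom`, the file name, and the sample rows are hypothetical.

```python
import codecs
import csv
import tempfile
from pathlib import Path


def write_csv_for_excel(path, rows):
    """Write rows as UTF-8 with a leading BOM via the "utf-8-sig" codec."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


def has_utf8_bom(path):
    """Return True if the file starts with the bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


out_path = Path(tempfile.mkdtemp()) / "example.csv"
write_csv_for_excel(out_path, [["名前", "都市"], ["田中", "東京"]])
print(has_utf8_bom(out_path))  # True
```

Passing `encoding="cp932"` instead would be the corresponding way to write the Shift_JIS variant that Windows uses.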
Free Tool
CSV Encoding Converter
Fix CSV character encoding issues. Convert between Shift_JIS and UTF-8 to resolve garbled text in Excel.
Try it now →
How to Choose the Right Encoding
When to Use Shift_JIS
- Opening CSVs directly in Excel on Windows
- Interfacing with a business system (payroll, accounting software, etc.) that requires Shift_JIS
- A trading partner has specified Shift_JIS as the delivery format
When to Use UTF-8
- Uploading to web services (most accept UTF-8)
- Processing data with Python, JavaScript, or other modern languages
- Working with multilingual data (Chinese, Korean, etc.)
- Using Google Sheets or macOS
When in Doubt, Default to UTF-8
If you are creating a new CSV from scratch, UTF-8 is the recommended default. It covers the widest range of characters and is supported almost everywhere. When you need Shift_JIS for a specific purpose, you can always convert with a tool.
Frequently Asked Questions (FAQ)
Should I use UTF-8 or Shift_JIS?
UTF-8 is generally recommended. However, if you need to open the CSV directly in Excel on Windows, use Shift_JIS or UTF-8 with BOM, since Excel may misread BOM-less UTF-8. Use sakutto's CSV Encoding Converter to switch between UTF-8 and Shift_JIS whenever needed.
How can I check a CSV file's encoding?
On Windows, open the file in Notepad: the encoding ("UTF-8" or "ANSI" for Shift_JIS) is shown in the bottom-right status bar. On Mac, run file -I filename.csv in the terminal (on Linux, the equivalent flag is lowercase: file -i filename.csv). sakutto's CSV Viewer also auto-detects encoding.
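If you prefer to check from a script, a rough heuristic is to look for a BOM, then try a strict UTF-8 decode, then fall back to CP932. The sketch below assumes the file is one of those three; the function name `guess_csv_encoding` is hypothetical, and this is a guess rather than a reliable detector.

```python
import codecs


def guess_csv_encoding(data: bytes) -> str:
    """Guess among UTF-8 with BOM, plain UTF-8, and CP932 (heuristic only)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Strict decoding fails quickly on most Shift_JIS byte sequences.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("cp932")
        return "cp932"
    except UnicodeDecodeError:
        return "unknown"


print(guess_csv_encoding("あ".encode("utf-8")))      # utf-8
print(guess_csv_encoding("あ".encode("cp932")))      # cp932
print(guess_csv_encoding("あ".encode("utf-8-sig")))  # utf-8-sig
```

Pure ASCII data is valid in all three encodings, so the function reports it as `utf-8`; that is harmless because the bytes decode identically either way.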
What is the difference between UTF-8 with BOM and without BOM?
A BOM (Byte Order Mark) is a 3-byte marker (EF BB BF) at the start of the file. Excel uses this to identify UTF-8 encoding. With a BOM, Excel displays the file correctly. Without a BOM, Excel may misread it as Shift_JIS and display garbled text. For programmatic use, BOM-less UTF-8 is generally preferred.
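One reason BOM-less UTF-8 is easier for programs: a BOM that is not stripped ends up inside the first field. The Python sketch below illustrates this with hypothetical in-memory data, comparing the standard `utf-8` codec (keeps the BOM as the character U+FEFF) with `utf-8-sig` (removes it).

```python
import csv
import io

# Bytes of a small CSV file saved as UTF-8 with BOM (illustrative data).
raw = "name,city\r\nTanaka,Tokyo\r\n".encode("utf-8-sig")

# Decoding as plain UTF-8 leaves U+FEFF glued to the first column name.
plain_header = next(csv.reader(io.StringIO(raw.decode("utf-8"))))

# Decoding as utf-8-sig strips the BOM if one is present.
sig_header = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))

print(plain_header[0] == "name")  # False
print(sig_header[0] == "name")    # True
```

Since `utf-8-sig` also reads BOM-less files correctly, it is a reasonable choice when a script must accept both kinds of input.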
Are Shift_JIS and CP932 (Windows-31J) the same?
Strictly speaking, no. CP932 is Microsoft's extension of Shift_JIS with additional symbols like "①②③" and "Ⅰ Ⅱ Ⅲ". What Windows calls "Shift_JIS" is almost always CP932 in practice.
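Python exposes the two as separate codecs, `shift_jis` and `cp932`, which makes the difference easy to observe. The sketch below is an illustration under that assumption; other languages and tools may treat the name "Shift_JIS" as an alias for CP932. The helper `can_encode` is hypothetical.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Each line prints: character, encodable in shift_jis, encodable in cp932.
for ch in "あ①髙﨑":
    print(ch, can_encode(ch, "shift_jis"), can_encode(ch, "cp932"))
```

When reading files produced on Windows, choosing `cp932` rather than `shift_jis` avoids decode errors on these extension characters.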
Can encoding conversion corrupt my data?
Converting from Shift_JIS to UTF-8 is normally lossless, because Unicode includes every character that Shift_JIS can represent, provided the file is decoded with the correct variant (Shift_JIS vs. CP932). Converting from UTF-8 to Shift_JIS, however, may replace characters that Shift_JIS cannot represent (emoji, certain kanji) with "?". sakutto's CSV Encoding Converter runs entirely in your browser, so your files are never sent to any server.
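The asymmetry can be reproduced in a few lines of Python (3.9 or later for the type hints). This is a sketch using the standard `cp932` codec with `errors="replace"`; the helper `to_cp932` and the sample text are hypothetical, and a real converter may handle unmappable characters differently.

```python
def to_cp932(text: str) -> tuple[bytes, list[str]]:
    """Encode text as CP932 and report which characters could not be kept."""
    lost = []
    for ch in text:
        try:
            ch.encode("cp932")
        except UnicodeEncodeError:
            lost.append(ch)
    # errors="replace" writes "?" in place of each unencodable character.
    return text.encode("cp932", errors="replace"), lost


encoded, lost = to_cp932("寿司🍣")
print(encoded.decode("cp932"))  # 寿司?
print(lost)                     # ['🍣']

# The other direction round-trips: CP932 bytes -> str -> UTF-8 bytes -> str.
original = "寿司"
restored = original.encode("cp932").decode("cp932").encode("utf-8").decode("utf-8")
print(restored == original)     # True
```

Collecting the lost characters before converting gives you a chance to substitute them deliberately instead of discovering "?" in the output later.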
Summary
UTF-8 is the international standard with broad character coverage, while Shift_JIS offers strong compatibility with Excel on Windows for Japanese-language data. Garbled text in Japanese CSV files is most often caused by a mismatch between these two encodings. By choosing the right encoding for your use case, and converting when necessary, you can avoid most encoding problems.