UTF-8 vs. Shift_JIS: Understanding CSV Character Encoding

Tags: UTF-8, Shift_JIS, character encoding, CSV, encoding conversion

When working with CSV files, character encoding is a topic you cannot ignore. By understanding the differences between UTF-8 and Shift_JIS, you can prevent most garbled-text issues before they happen. This guide covers the fundamentals of character encoding and practical advice for choosing the right one for your CSVs.

What Is Character Encoding?

How Computers Handle Text

Computers process all data internally as numbers (sequences of 0s and 1s). Character encoding is the ruleset that maps specific numbers to specific characters.

For example, the Japanese character "あ" is represented as the 3-byte sequence "E3 81 82" in UTF-8, but as the 2-byte sequence "82 A0" in Shift_JIS. Because the same character maps to different numbers in different encodings, reading a file with the wrong encoding produces garbled text.
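
The byte values above can be checked directly. The following is a minimal Python sketch; the helper name hex_bytes is made up for illustration, and cp932 is used for the mismatched read on the assumption that it is the codec Japanese Windows applies in practice.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated uppercase hex."""
    return text.encode(encoding).hex(" ").upper()

utf8_hex = hex_bytes("あ", "utf-8")       # "E3 81 82"
sjis_hex = hex_bytes("あ", "shift_jis")   # "82 A0"
print(utf8_hex, "|", sjis_hex)

# Decoding UTF-8 bytes with a Shift_JIS-family codec yields garbled text.
# errors="replace" keeps the demo from raising on undecodable bytes.
garbled = "あ".encode("utf-8").decode("cp932", errors="replace")
print(garbled != "あ")  # True: the original character is not recovered
```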

Major Character Encodings

| Encoding | Full Name | Year | Key Feature |
|---|---|---|---|
| UTF-8 | Unicode Transformation Format - 8bit | 1993 | International standard covering all world scripts |
| Shift_JIS | Shift Japanese Industrial Standards | 1982 | Encoding designed specifically for Japanese |
| EUC-JP | Extended Unix Code for Japanese | 1985 | Japanese encoding for Unix/Linux systems |
| ISO-2022-JP | - | 1993 | Japanese encoding for email |

In today's web and business environments, UTF-8 and Shift_JIS are the two encodings you will encounter most often.

UTF-8 vs. Shift_JIS: Key Differences

Comparison Table

| Attribute | UTF-8 | Shift_JIS |
|---|---|---|
| Character coverage | ~140,000 (all world scripts) | ~7,000 (primarily Japanese) |
| Bytes per Japanese character | 3 bytes | 2 bytes |
| Bytes per alphanumeric character | 1 byte | 1 byte |
| Web adoption rate | Over 98% | Minimal |
| Excel on Windows | Supported with BOM | Supported by default |
| Mac / Linux | Default | Requires conversion |
| Programming languages | Standard | Legacy support |

UTF-8 Characteristics

UTF-8 is a Unicode encoding that can represent every character defined in Unicode, which spans virtually all writing systems in use worldwide. Over 98% of websites use UTF-8, making it the de facto international standard.

Alphanumeric characters use 1 byte, while most Japanese characters use 3 bytes, so the Japanese portion of a file can be up to 1.5 times the size of its Shift_JIS counterpart. In exchange, UTF-8 handles multilingual data (Chinese, Korean, emoji, etc.) without issue.
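
To make the size difference concrete, here is a small Python sketch that counts encoded bytes for a few sample strings. The helper name encoded_size and the sample strings are illustrative choices, not part of any tool described here.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))

# Print: sample, size in UTF-8, size in Shift_JIS
for sample in ("abc", "日本語", "日本語abc"):
    print(sample, encoded_size(sample, "utf-8"), encoded_size(sample, "shift_jis"))
# abc 3 3
# 日本語 9 6
# 日本語abc 12 9
```

For a typical CSV that mixes digits, commas, and Japanese text, the overall difference is smaller than the 3:2 ratio of the Japanese characters alone.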

Shift_JIS Characteristics

Shift_JIS was developed in 1982 by Microsoft and ASCII Corporation specifically for Japanese text processing. It remains widely used in Excel on Windows and legacy business systems in Japan.

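Most Japanese characters use only 2 bytes, making Japanese-heavy files more compact than UTF-8. However, standard Shift_JIS does not define certain characters such as "①", "髙", and "﨑". These platform-dependent characters exist only in vendor extensions such as Microsoft's CP932, so they can be lost when a file moves between systems. Shift_JIS also cannot represent scripts outside its Japanese character set, such as Korean Hangul or emoji.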
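
One way to see which characters are at risk is to try encoding them. Python's standard library ships separate shift_jis (the strict JIS standard) and cp932 (Microsoft's variant) codecs, so the sketch below compares both. The helper name can_encode is hypothetical, and other tools or languages may draw the line between the two encodings differently.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Print: character, encodable in shift_jis, in cp932, in utf-8
for ch in ("あ", "①", "髙", "﨑", "한", "😀"):
    print(ch, can_encode(ch, "shift_jis"), can_encode(ch, "cp932"), can_encode(ch, "utf-8"))
```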

What Is a BOM (Byte Order Mark)?

A BOM is a short byte sequence at the beginning of a file that tells software which Unicode encoding is used. The UTF-8 BOM consists of 3 bytes: "EF BB BF".

| Type | Leading Bytes | Behavior in Excel |
|---|---|---|
| UTF-8 (no BOM) | None | Read as Shift_JIS → garbled text |
| UTF-8 (with BOM) | EF BB BF | Correctly recognized as UTF-8 |
| Shift_JIS | None | Read correctly by default |

For CSVs intended for Excel, choose "UTF-8 with BOM" or "Shift_JIS" for safe results.
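
If the CSV is generated by a script, the BOM can be added at write time. The following is a minimal Python sketch that relies on the standard utf-8-sig codec; the function names csv_bytes_for_excel and has_utf8_bom and the sample rows are made up for illustration.

```python
import codecs
import csv
import io

def csv_bytes_for_excel(rows) -> bytes:
    """Serialize rows to CSV bytes encoded as UTF-8 with a leading BOM."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # "utf-8-sig" prepends the 3-byte BOM (EF BB BF) when encoding.
    return buffer.getvalue().encode("utf-8-sig")

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)

data = csv_bytes_for_excel([["名前", "金額"], ["髙橋", "1000"]])
print(has_utf8_bom(data))                         # True
print(data[:3].hex(" ").upper())                  # EF BB BF
# Decoding with "utf-8-sig" strips the BOM again for programmatic use.
print(data.decode("utf-8-sig").splitlines()[0])   # 名前,金額
```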

Free Tool

CSV Encoding Converter

Fix CSV character encoding issues. Convert between Shift_JIS and UTF-8 to resolve garbled text in Excel.

Try it now →

How to Choose the Right Encoding

When to Use Shift_JIS

  • Opening CSVs directly in Excel on Windows
  • Interfacing with a business system (payroll, accounting software, etc.) that requires Shift_JIS
  • A trading partner has specified Shift_JIS as the delivery format

When to Use UTF-8

  • Uploading to web services (most accept UTF-8)
  • Processing data with Python, JavaScript, or other modern languages
  • Working with multilingual data (Chinese, Korean, etc.)
  • Using Google Sheets or macOS

When in Doubt, Default to UTF-8

If you are creating a new CSV from scratch, UTF-8 is the recommended default. It covers the widest range of characters and has universal support. When you need Shift_JIS for a specific purpose, you can always convert with a tool.
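
For scripted workflows, the conversion itself is only a few lines. The sketch below is one possible approach in Python, not how any particular converter tool works. The function name convert_csv_encoding and its defaults are assumptions: cp932 as the source (typical for files saved by Excel on Windows) and utf-8-sig as the target so that Excel still opens the result.

```python
from pathlib import Path

def convert_csv_encoding(src, dst, src_encoding="cp932", dst_encoding="utf-8-sig"):
    """Re-encode a text file from src_encoding to dst_encoding.

    Decoding and encoding are strict: a byte or character that does not
    fit raises an error instead of being silently replaced.
    """
    # Work on raw bytes so line endings pass through unchanged.
    text = Path(src).read_bytes().decode(src_encoding)
    Path(dst).write_bytes(text.encode(dst_encoding))

# Example call (file names are placeholders):
# convert_csv_encoding("export_sjis.csv", "export_utf8.csv")
```

Swapping the two encoding arguments converts in the other direction, in which case strict mode raises an error on characters the target cannot represent.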

Frequently Asked Questions (FAQ)

Should I use UTF-8 or Shift_JIS?

UTF-8 is generally recommended. However, if you need to open the CSV directly in Excel on Windows, Shift_JIS (or UTF-8 with BOM) is more reliable. Use sakutto's CSV Encoding Converter to switch between UTF-8 and Shift_JIS whenever needed.

How can I check a CSV file's encoding?

On Windows, open the file in Notepad: the encoding ("UTF-8" or "ANSI" for Shift_JIS) is shown in the bottom-right status bar. On Mac, run file -I filename.csv in the terminal (on Linux, the equivalent is file -i filename.csv). sakutto's CSV Viewer also auto-detects encoding.
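
The same check can be approximated in code. The following Python sketch is a simple heuristic limited to the two encodings discussed here, not a general-purpose detector: it looks for a BOM, then tries strict UTF-8, then falls back to cp932. The function name guess_csv_encoding is hypothetical.

```python
import codecs

def guess_csv_encoding(data: bytes) -> str:
    """Guess between UTF-8 (with or without BOM) and CP932 for raw file bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Shift_JIS text containing Japanese rarely forms valid UTF-8.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("cp932")
        return "cp932"
    except UnicodeDecodeError:
        return "unknown"

print(guess_csv_encoding("名前,金額".encode("utf-8")))      # utf-8
print(guess_csv_encoding("名前,金額".encode("cp932")))      # cp932
print(guess_csv_encoding("名前,金額".encode("utf-8-sig")))  # utf-8-sig
```

Pure ASCII files are reported as utf-8, which is harmless because ASCII bytes are identical in both encodings.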

What is the difference between UTF-8 with BOM and without BOM?

A BOM (Byte Order Mark) is a 3-byte marker (EF BB BF) at the start of the file. Excel uses this to identify UTF-8 encoding. With a BOM, Excel displays the file correctly. Without a BOM, Excel may misread it as Shift_JIS and display garbled text. For programmatic use, BOM-less UTF-8 is generally preferred.

Are Shift_JIS and CP932 (Windows-31J) the same?

Strictly speaking, no. CP932 is Microsoft's extension of Shift_JIS with additional symbols like "①②③" and "Ⅰ Ⅱ Ⅲ". What Windows calls "Shift_JIS" is almost always CP932 in practice.

Can encoding conversion corrupt my data?

Converting from Shift_JIS to UTF-8 is lossless as long as the file is decoded with the correct variant (usually CP932 for files created on Windows), because Unicode covers every Shift_JIS character. However, converting from UTF-8 to Shift_JIS may replace characters that Shift_JIS cannot represent (emoji, certain kanji) with "?". sakutto's CSV Encoding Converter runs entirely in your browser, so your files are never sent to any server.
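
Before converting to Shift_JIS in a script, it can help to list the characters that would be lost. The following is a minimal Python sketch that uses cp932 as the target; the helper name unencodable_chars and the sample text are made up for illustration.

```python
def unencodable_chars(text: str, encoding: str = "cp932") -> list[str]:
    """Return the distinct characters in `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing

text = "売上😀,髙橋,한국"
print(unencodable_chars(text))  # ['😀', '한', '국']

# errors="replace" substitutes "?" for each character that does not fit.
lossy = text.encode("cp932", errors="replace").decode("cp932")
print(lossy)  # 売上?,髙橋,??
```

An empty list means the text survives the round trip through that encoding unchanged.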

Summary

UTF-8 is the international standard with broad character coverage, while Shift_JIS offers strong compatibility with Excel on Windows for Japanese-language data. Garbled text in CSV files is almost always caused by reading a file with a different encoding than the one it was saved in. By choosing the right encoding for your use case, and converting when necessary, you can avoid most encoding problems.

Related Tools

Free Tool

CSV to Excel Converter

Convert CSV files to Excel (.xlsx) format. No character encoding issues, with auto column width.

Try it now

Free Tool

CSV Viewer

View CSV files as readable tables in your browser. Auto-detects encoding with sort and filter support.

Try it now
