Database Administration
mysql character-set aurora
Updated Wed, 07 Sep 2022 23:06:20 GMT

Question marks instead of Hebrew characters after MySQL dump and import


I tried to migrate my AWS RDS instance from Aurora to MySQL. I created a dump file from the Aurora instance and imported it into the MySQL instance.

Both instances have the same character set settings:

mysql> show variables like '%char%';
+--------------------------+-------------------------------------------+
| Variable_name            | Value                                     |
+--------------------------+-------------------------------------------+
| character_set_client     | utf8mb4                                   |
| character_set_connection | utf8mb4                                   |
| character_set_database   | latin1                                    |
| character_set_filesystem | binary                                    |
| character_set_results    | utf8mb4                                   |
| character_set_server     | latin1                                    |
| character_set_system     | utf8                                      |
| character_sets_dir       | /rdsdbbin/mysql-5.7.26.R1/share/charsets/ |
+--------------------------+-------------------------------------------+

In the original DB, the Hebrew letters appear as black diamonds (they are converted to Hebrew on the app side). After the migration to the new MySQL instance, I see question marks instead of black diamonds.

What could be the issue?




Solution

I think the problem is that the dump file was exported with an ANSI character set. To produce the dump in UTF8, run your mysqldump command with the option --default-character-set=utf8 and check the results. (AFAIK the same mysqldump options work for Aurora as well.)

PS: After your comments, I dug into the problem a bit more. It appears to come from the mismatch between character_set_system (utf8), character_set_server (latin1) and character_set_client (utf8mb4). Based on this link to documentation, such a mismatch can cause problems on data input and may produce badly formatted output. You can test whether this is the root cause by changing the character sets of the different components.
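To make the two possible failure modes concrete, here is a minimal sketch. It uses Python codecs as a stand-in for the server's character set conversion, which is an assumption: MySQL's latin1 is not byte-for-byte identical to Python's latin-1, and the sample word is made up. It only illustrates why a question mark is unrecoverable while mislabeled bytes usually are not.

```python
# Illustrative only: Python codecs approximate MySQL's character set conversion.
hebrew = "שלום"  # hypothetical sample value

# Case 1: converting to a character set that has no Hebrew letters.
# Each unrepresentable character becomes '?', and the original is lost.
lossy = hebrew.encode("latin-1", errors="replace")
print(lossy)  # b'????'

# Case 2: valid UTF-8 bytes that are merely labeled as latin1.
utf8_bytes = hebrew.encode("utf-8")
mislabeled = utf8_bytes.decode("latin-1")  # looks like garbage when displayed

# No information was lost in case 2, so the mislabeling can be undone.
recovered = mislabeled.encode("latin-1").decode("utf-8")
print(recovered == hebrew)  # True
```

If the dump file already contains literal question marks, the loss happened while dumping (case 1), and no setting on the import side can bring the letters back.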





Comments (5)

  • +0 – Thanks. I've done that procedure (dump and import) so many times before and never had this problem. What could be the difference now? Is this related to AWS? — Jan 26, 2020 at 10:32  
  • +0 – It should be; check with AWS to get their input. I've been working with UTF8 throughout my entire database career, and your issue is simply a character set conversion that happens in the background. Also, check some data inside your dump file to see whether it contains Hebrew or question marks. — Jan 26, 2020 at 10:38  
  • +0 – Hi. I've just tried to dump the data with the flag --default-character-set=utf8 and got the same result with the question marks. Any other ideas? — Jan 26, 2020 at 17:15  
  • +0 – Change all of the character sets within your database to utf8. Also, check that the server you are taking the dump from supports utf8. Did you check the data within the dump file itself? — Jan 27, 2020 at 05:03  
  • +0 – I've changed all the character sets to utf8. It didn't help. What do you mean by "check that the server you are taking the dump from supports utf8"? I'm taking the dump on a Linux server, which obviously supports utf8. The data in the dump file contains question marks. — Jan 28, 2020 at 17:09  
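
The "check the data inside the dump file" step from the comments can be sketched as a small script. This is a rough heuristic under stated assumptions, not a definitive diagnostic: the function name classify_dump_text is hypothetical, and it assumes the dump is meant to be UTF-8 and that the text of interest is Hebrew.

```python
def classify_dump_text(raw: bytes) -> str:
    """Roughly classify bytes from a dump file.

    Returns 'hebrew', 'mojibake', 'question_marks' or 'unknown'.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"  # not valid UTF-8, so nothing can be concluded here

    # Real Hebrew code points (U+0590 to U+05FF) survived the dump.
    if any("\u0590" <= ch <= "\u05ff" for ch in text):
        return "hebrew"

    # Hebrew letters in UTF-8 start with byte 0xD7; when those bytes are read
    # as latin1 and re-encoded, every letter starts with '×' (U+00D7).
    if "\u00d7" in text:
        return "mojibake"

    # Runs of literal '?' suggest a lossy conversion during the dump.
    if "??" in text:
        return "question_marks"

    return "unknown"
```

To use it, read one INSERT line from the dump file in binary mode and pass the bytes to the function. A result of "question_marks" means the letters were already lost when the dump was written, whereas "mojibake" means the bytes are probably still recoverable.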

