fix: character issues with umlauts #471

Skyliife · 2025-04-18T12:22:27Z

link to issue: #470

- Wrap incoming HTML in charset.NewReader before goquery parsing
- Ensures ISO‑8859‑1 (and other legacy) input is normalized to UTF‑8
- Prevents “mojibake” (e.g. “Ã¤” instead of “ä”)
- Updated TestWorldAntica to simulate Latin‑1 input and verify correct umlaut decoding
- Added Antica.html for parsing character Näurin

Closes #470
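
To make the failure mode concrete, here is a minimal standard-library sketch of why undecoded Latin‑1 input breaks and how the “Ã¤” mojibake arises. It is not the code in this PR, which relies on charset.NewReader; `latin1ToUTF8` is a hypothetical helper written only for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// latin1ToUTF8 is a hypothetical helper, not part of this PR.
// ISO-8859-1 maps every byte to the Unicode code point with the same value,
// so decoding is a plain byte-to-rune widening.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

func main() {
	// "Näurin" as an ISO-8859-1 byte sequence: 'ä' is the single byte 0xE4.
	raw := []byte{'N', 0xE4, 'u', 'r', 'i', 'n'}

	// Without decoding, these bytes are not valid UTF-8.
	fmt.Println(utf8.Valid(raw))

	// Decoding first preserves the umlaut.
	fmt.Println(latin1ToUTF8(raw))

	// The opposite mistake yields mojibake: the two UTF-8 bytes of "ä"
	// (0xC3 0xA4) read as Latin-1 become two separate characters.
	fmt.Println(latin1ToUTF8([]byte("ä")))
}
```

The last call reproduces the “Ã¤” symptom from the issue, which is what appears when bytes are decoded with the wrong charset assumption.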

link to issue: TibiaData#470

- Wrap incoming HTML in charset.NewReader before goquery parsing
- Ensures ISO‑8859‑1 (and other legacy) input is normalized to UTF‑8
- Prevents “mojibake” (e.g. “Ã¤” instead of “ä”)
- Updated TestWorldAntica to simulate Latin‑1 input and verify correct umlaut decoding
- Added Antica.html for parsing character Näurin

Closes TibiaData#470

- fix for character endpoint
- Replace custom TibiaDataConvertEncodingtoUTF8 with golang.org/x/net/html/charset.NewReader
- Use the actual Content‑Type header from Tibia.com to normalize response bytes into UTF‑8
- Remove resIo/resIo2 steps and feed the UTF‑8 reader directly into goquery

- cleanup

sonarqubecloud · 2025-04-18T14:51:07Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Skyliife · 2025-04-18T14:54:01Z

@tobiasehlert I’ve updated the HTML collector to use charset.NewReader with the real Content-Type header instead of our custom converter, so incoming pages should now be normalized to proper UTF‑8 and preserve umlauts (e.g. “Näurin”). I’m not very familiar with all the Go idioms here, so I’d really appreciate it if someone could double-check my changes.
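
For reviewers, a rough standard-library sketch of the header-driven part of this idea: reading the charset parameter out of a Content-Type value. In the PR itself the header string is passed to charset.NewReader, which does its own detection, so `charsetFromContentType` is a hypothetical helper and the header values below are assumed examples, not captured Tibia.com responses.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// charsetFromContentType is a hypothetical helper, not part of this PR.
// It returns the lowercased charset parameter of a Content-Type header value,
// falling back to "utf-8" when the header is missing, malformed, or has no charset.
func charsetFromContentType(header string) string {
	_, params, err := mime.ParseMediaType(header)
	if err != nil {
		return "utf-8"
	}
	if cs, ok := params["charset"]; ok && cs != "" {
		return strings.ToLower(cs)
	}
	return "utf-8"
}

func main() {
	// Assumed example header values.
	fmt.Println(charsetFromContentType("text/html; charset=ISO-8859-1"))
	fmt.Println(charsetFromContentType("text/html"))
	fmt.Println(charsetFromContentType(""))
}
```

Defaulting to UTF‑8 is a choice made for this sketch only; a real decoder may also inspect the document body (for example a meta charset tag) before settling on an encoding.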

Skyliife marked this pull request as ready for review April 18, 2025 12:23

Skyliife added 2 commits April 18, 2025 16:47

fix: proper decode legacy encodings in HTML collector for Umlauts

b82bd71

- cleanup
