fix: character issues with umlauts #471

Open: wants to merge 3 commits into base: main

Skyliife

link to issue: #470

  • Wrap incoming HTML in charset.NewReader before goquery parsing
  • Ensures ISO‑8859‑1 (and other legacy) input is normalized to UTF‑8
  • Prevents “mojibake” (e.g. “Ã¤” instead of “ä”)
  • Updated TestWorldAntica to simulate Latin‑1 input and verify correct Umlaut decoding
  • Added Antica.html for parsing character Näurin

Closes #470
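
To illustrate the failure mode in isolation, here is a minimal standard-library sketch. It is not the code in this PR, which relies on charset.NewReader; `latin1ToUTF8` is a hypothetical helper written only for this example. It shows that the ISO-8859-1 bytes for "Näurin" are not valid UTF-8 on their own, and that decoding them byte by byte recovers the umlaut.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// latin1ToUTF8 is a hypothetical helper (not part of the PR).
// In ISO-8859-1 every byte value equals the Unicode code point it
// represents, so widening each byte to a rune decodes the text.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes) // Go encodes the runes as UTF-8
}

func main() {
	// "Näurin" as ISO-8859-1: 'ä' is the single byte 0xE4.
	raw := []byte{'N', 0xE4, 'u', 'r', 'i', 'n'}

	fmt.Println(utf8.Valid(raw))   // false: a lone 0xE4 is not valid UTF-8
	fmt.Println(latin1ToUTF8(raw)) // Näurin
}
```

A hand-rolled decoder like this only covers Latin-1; a general decoder that picks the encoding from the response metadata handles other legacy charsets as well.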

Skyliife marked this pull request as ready for review on April 18, 2025, 12:23
- fix for character endpoint
- Replace custom TibiaDataConvertEncodingtoUTF8 with golang.org/x/net/html/charset.NewReader
- Use the actual Content‑Type header from Tibia.com to normalize response bytes into UTF‑8
- Remove resIo/resIo2 steps and feed the UTF‑8 reader directly into goquery
@Skyliife (author)

@tobiasehlert I’ve updated the HTML collector to use charset.NewReader with the real Content-Type header instead of our custom converter, so incoming pages should now be normalized to proper UTF‑8 and preserve umlauts (e.g. “Näurin”). I’m not very familiar with all the Go idioms here, so I’d really appreciate it if someone could double-check my changes.

Successfully merging this pull request may close these issues.

bug: character issues with umlauts