German Extension
Tue, 25 Feb 2025 00:00:00 GMT
Lately I've been working on a German popup dictionary extension. That is, a German version of the Zhongwen dictionary.
I've learned a lot. Before this, I had never worked with browser extensions. I dove deep into the source code of Zhongwen and learned that extensions have "front" and "back" scripts, AKA content.js and background.js. As the names imply, background scripts do not have access to the DOM, and they live separately from the content script, which is loaded directly into whatever page the user opens. Thus, the two have to communicate with each other via messages. Flashbacks to my days working with web workers.
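A minimal sketch of that message passing might look like the following. It is not Zhongwen's actual code: the message shape `{ type: "lookup", word }`, the `handleMessage` helper, and the one-word `demoDictionary` are all made up for illustration. Only `chrome.runtime.onMessage` and `chrome.runtime.sendMessage` are real extension APIs.

```javascript
// background.js side: a plain function, so the lookup logic can run outside a browser too.
// The { type: "lookup", word } message shape is an assumption for this sketch.
function handleMessage(message, dictionary) {
  if (message && message.type === "lookup" && typeof message.word === "string") {
    const entry = dictionary.get(message.word.toLowerCase());
    return { found: entry !== undefined, entry: entry === undefined ? null : entry };
  }
  return { found: false, entry: null };
}

// Tiny stand-in dictionary; a real one would be loaded from the data files.
const demoDictionary = new Map([["haus", "house"]]);

// Register the listener only when actually running inside an extension.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
    sendResponse(handleMessage(message, demoDictionary));
  });
}

// content.js side would then ask the background script for the hovered word:
// chrome.runtime.sendMessage({ type: "lookup", word: "Haus" }, (response) => {
//   /* render response.entry in the popup */
// });
```

The content script never touches the dictionary directly. It only sends a word and renders whatever comes back.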
Besides learning how to work with web extensions for the first time, what's been challenging is the dictionary data itself.
You see, Zhongwen uses CC-CEDICT, which is a complete Chinese-English dictionary dataset. It's there and readily available. But for German, I don't yet know of a similar project.
However, in all the time I've spent learning German, I've been relying on two sources for lookups. First, the Cambridge dictionary for a quick search and a polished UI. Second, Wiktionary for a more comprehensive read.
So I looked for a dataset from Wiktionary. Wiktionary does offer data dumps, but they are too raw, and I knew that parsing them and extracting just what I need for this project would be a huge pain in the ass. Luckily, there's Kaikki, which extracts and cleans the data from those dumps and makes it readily available for lazy folks like me to download. Thank you Kaikki.
The data still needs some modification to work with what I have in mind though. Plus, the German-English dictionary data alone is more than 900MB! Who would even want to download almost 1GB of data just to look up some German words? So there I went, pruning the data down to what is actually needed. Granted, it is still 85MB as of writing, but I believe that's already a huge improvement over 900MB. (pls let me have a moment of glory, i will come back to minimize it even further)
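The pruning step could be sketched roughly like this. Kaikki entries are JSON objects with fields such as `word`, `pos`, and `senses` (each sense carrying `glosses`), alongside much bulkier ones like `sounds` and `etymology_text`. Which fields are worth keeping is an assumption here, and the `pruneEntry` and `pruneJsonl` helpers and the sample entry are made up for illustration.

```javascript
// Keep only what a popup needs: the word, its part of speech, and the glosses.
// The choice of fields is an assumption, not a fixed recipe.
function pruneEntry(entry) {
  const glosses = (entry.senses || []).flatMap((sense) => sense.glosses || []);
  return { word: entry.word, pos: entry.pos, glosses };
}

// Prune a whole JSONL string, skipping blank lines and entries without any gloss.
function pruneJsonl(jsonl) {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => pruneEntry(JSON.parse(line)))
    .filter((entry) => entry.glosses.length > 0)
    .map((entry) => JSON.stringify(entry))
    .join("\n");
}

// Made-up sample entry in the general shape of a Kaikki line.
const sampleLine = JSON.stringify({
  word: "Haus",
  pos: "noun",
  senses: [{ glosses: ["house"], examples: [{ text: "(omitted)" }] }],
  sounds: [{ ipa: "(omitted)" }],
  etymology_text: "(omitted)",
});

const prunedLine = pruneJsonl(sampleLine);
// prunedLine is '{"word":"Haus","pos":"noun","glosses":["house"]}'
```

For the real 900MB file, the lines would have to be streamed one at a time instead of being read into a single string, but the per-entry logic stays the same.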
And there's the index, a fascinating thing. See, Zhongwen splits its dictionary into the actual data and an index file. The index stores each word along with the offset into the data file where you can actually find the description of that word. This way, a query only needs to go through the much shorter index file, and then jump to the right spot in the data file to get the description. Smart.
So I've gone and done something similar. But because my dictionary data is in the form of JSONL (each line is a separate JSON object), my index file points to a line offset instead of a character offset.
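A line-based index along those lines can be sketched as below. This is an illustration under assumptions, not the extension's actual format: the `buildIndex` and `lookup` helpers are hypothetical, the index lives in a `Map` rather than a file, and keys are lowercased so that "Essen" and "essen" land under the same key.

```javascript
// Build an index from a JSONL string: lowercased word -> list of line numbers.
// A word can appear on several lines (for example as a noun and as a verb).
function buildIndex(jsonl) {
  const index = new Map();
  jsonl.split("\n").forEach((line, lineNumber) => {
    if (line.trim() === "") return;
    const key = JSON.parse(line).word.toLowerCase();
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(lineNumber);
  });
  return index;
}

// Look a word up: consult the index first, then parse only the lines it points to.
function lookup(word, index, lines) {
  const lineNumbers = index.get(word.toLowerCase()) || [];
  return lineNumbers.map((n) => JSON.parse(lines[n]));
}

// Made-up data file with three entries, one per line.
const dataFile = [
  '{"word":"Haus","pos":"noun","glosses":["house"]}',
  '{"word":"essen","pos":"verb","glosses":["to eat"]}',
  '{"word":"Essen","pos":"noun","glosses":["food, meal"]}',
].join("\n");

const wordIndex = buildIndex(dataFile);
const dataLines = dataFile.split("\n");
const hits = lookup("essen", wordIndex, dataLines);
// hits holds two entries: the verb from line 1 and the noun from line 2
```

Only the lines named by the index are ever parsed, which is the whole point of keeping the index separate from the data.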
Anyway, it serves a similar purpose, and in the end I've got a working dictionary extension. I even got a logo and everything. Problem? The Chrome Web Store does not accept extensions with manifest version 2 anymore. And upgrading to version 3 means a whole host of changes that I would need more time to conquer. So for now, the extension is only available on my own computer while I'm reading up on extension service workers and the chrome.storage API. It's a whole new world.
Just had a small teeny silly dispute over a shower thought today. I asked my friend: wouldn't it be nice to host a huge image, say 5GB, on a website and trick a friend into opening it? Wouldn't their entire data plan be used up? They said it was not possible. Anyway, the answer seems to be that it is technically possible, with some huge limitations:
- The image would put a huge strain on the server, and the server would have to be able to handle the load.
- You can pass that responsibility to, say, Amazon S3. But then you would have to pay for the data transfer. And it would be a lot.
- The client would have to be able to download the entire 5GB image. This means that the client would need a data allowance of at least 5GB.
- Chances are the image would be sent in chunks, so the client may decide to stop the download at any time. This means that the client would not use up their entire data plan immediately.
- A huge image is sus. Even if the client is naive, the browser still has checks in place to prevent such things from happening.
So all in all, not a worthwhile prank. Maybe we should just stick with hosting a slightly larger image to crash the browser of those who use old phones.