Latest attack on PyPI users shows crooks are only getting better

NOT THE PYPI PACKAGE YOU’RE LOOKING FOR
The code found in the malicious packages closely resembled legit offerings.

Dan Goodin – Feb 14, 2023 11:37 pm UTC

More than 400 malicious packages were recently uploaded to PyPI (Python Package Index), the official code repository for the Python programming language, in the latest indication that the targeting of software developers using this form of attack isn't a passing fad.

All 451 packages found recently by security firm Phylum contained almost identical malicious payloads and were uploaded in bursts that came in quick succession. Once installed, the packages create a malicious JavaScript extension that loads each time a browser is opened on the infected device, a trick that gives the malware persistence over reboots.

The JavaScript monitors the infected developer's clipboard for any cryptocurrency addresses that may be copied to it. When an address is found, the malware replaces it with an address belonging to the attacker. The objective: intercept payments the developer intended to make to a different party.

In November, Phylum identified dozens of packages, downloaded hundreds of times, that used highly encoded JavaScript to surreptitiously do the same thing. Specifically, it:

- Created a textarea on the page
- Pasted any clipboard contents to it
- Used a series of regular expressions to search for common cryptocurrency address formats
- Replaced any identified addresses with the attacker-controlled addresses in the previously created textarea
- Copied the textarea to the clipboard

"If at any point a compromised developer copies a wallet address, the malicious package will replace the address with an attacker-controlled address," Phylum Chief Technical Officer Louis Lang wrote in the November post. "This surreptitious find/replace will cause the end user to inadvertently send their funds to the attacker."

New obfuscation method

Besides vastly increasing the number of malicious packages uploaded, the latest campaign also uses a significantly different way to cover its tracks. Whereas the packages disclosed in November used encoding to conceal the behavior of the JavaScript, the new packages write function and variable identifiers in what appear to be random 16-bit combinations of Chinese language ideographs found in the following table:

Unicode code point | Ideograph | Definition
0x4eba | 人 | man; people; mankind; someone else
0x5200 | 刀 | knife; old coin; measure
0x53e3 | 口 | mouth; open end; entrance, gate
0x5973 | 女 | woman, girl; feminine
0x5b50 | 子 | child; fruit, seed of
0x5c71 | 山 | mountain, hill, peak
0x65e5 | 日 | sun; day; daytime
0x6708 | 月 | moon; month
0x6728 | 木 | tree; wood, lumber; wooden
0x6c34 | 水 | water, liquid, lotion, juice
0x76ee | 目 | eye; look, see; division, topic
0x99ac | 馬 | horse; surname
0x9a6c | 马 | horse; surname
0x9ce5 | 鳥 | bird
0x9e1f | 鸟 | bird

Using this table, the line of code

    ''.join(map(getattr(__builtins__, oct.__str__()[-3 << 0] + hex.__str__()[-1 << 2] + copyright.__str__()[4 << 0]), [(((1 << 4) - 1) << 3) - 1, ((((3 << 2) + 1)) << 3) + 1, (7 << 4) - (1 << 1), ((((3 << 2) + 1)) << 2) - 1, (((3 << 3) + 1) << 1)]))

retrieves the built-in function chr and maps it over the list of integers [119, 105, 110, 51, 50]. The line then joins the resulting characters into the string 'win32'.

Phylum researchers explained:

We can see a series of these kinds of calls oct.__str__()[-3 << 0]. The [-3 << 0] evaluates to [-3] and oct.__str__() evaluates to the string '<built-in function oct>'. Using Python's index operator [] on a string with a -3 will grab the 3rd character from the end of the string, in this case '<built-in function oct>'[-3] will evaluate to 'c'. Continuing with this on the other 2 here gives us 'c' + 'h' + 'r' and simply evaluating the complex bitwise arithmetic tacked on to the end leaves us with:

    ''.join(map(getattr(__builtins__, 'c' + 'h' + 'r'), [119, 105, 110, 51, 50]))

The getattr(__builtins__, 'c' + 'h' + 'r') just gives us the built-in function chr and then it maps chr to the list of ints [119, 105, 110, 51, 50] and then joins it all together into a string ultimately giving us 'win32'. This technique is continued throughout the entirety of the code.
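The walkthrough the researchers give can be reproduced step by step. The following is a minimal sketch, not Phylum's tooling: the variable names are made up for illustration, it looks the function up through the standard builtins module rather than __builtins__ (which is not a module in every context), and it reads the third letter from the literal word "Copyright" because the copyright name is only available when Python's site module has been loaded.

```python
import builtins

# Step 1: each slice picks one character out of a built-in's text form.
first = oct.__str__()[-3 << 0]    # '<built-in function oct>'[-3]
second = hex.__str__()[-1 << 2]   # '<built-in function hex>'[-4]
# Assumption for portability: the original indexes copyright.__str__(),
# whose text begins with the word "Copyright"; the literal is used here.
third = "Copyright"[4 << 0]       # index 4 of "Copyright"
func_name = first + second + third

# Step 2: the shift-and-add expressions are ordinary integer arithmetic.
codes = [
    (((1 << 4) - 1) << 3) - 1,
    ((((3 << 2) + 1)) << 3) + 1,
    (7 << 4) - (1 << 1),
    ((((3 << 2) + 1)) << 2) - 1,
    (((3 << 3) + 1) << 1),
]

# Step 3: look the function up by its reconstructed name and apply it.
decoder = getattr(builtins, func_name)
decoded = "".join(map(decoder, codes))

print(func_name, codes, decoded)
```

Evaluating each piece separately like this is the same "observe what the code does" approach the researchers describe: nothing in the expression is hidden once Python has computed it.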

While giving the appearance of highly obfuscated code, the technique is ultimately easy to defeat, the researchers said, simply by observing what the code does when it runs.

The latest batch of malicious packages attempts to capitalize on typos developers make when downloading one of these legitimate packages: bitcoinlib, ccxt, cryptocompare, cryptofeed, freqtrade, selenium, solana, vyper, websockets, yfinance, pandas, matplotlib, aiohttp, beautifulsoup, tensorflow, scrapy, colorama, scikit-learn, pytorch, pygame, and pyinstaller.

Packages that target the legitimate vyper package, for instance, used 13 names that omitted or duplicated a single character or transposed two characters of the correct name: yper, vper, vyer, vype, vvyper, vyyper, vypper, vypeer, vyperr, yvper, vpyer, vyepr, and vypre.

"This technique is trivially easy to automate with a script (we leave this as an exercise for the reader), and as the length of the name of the legitimate package increases, so do the possible typosquats," the researchers wrote. "For example, our system detected 38 typosquats of the cryptocompare package published nearly simultaneously by the user named pinigin.9494."
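For defenders, enumerating the same variations is a quick way to see which look-alike names to watch for before installing a package. The sketch below is an assumption about how such a script might look, not the attacker's or Phylum's code; the function name typo_candidates is hypothetical, and it applies only the three rules named above (omit one character, duplicate one character, transpose two adjacent characters).

```python
def typo_candidates(name: str) -> set[str]:
    """Return look-alike names one simple typo away from `name`."""
    candidates = set()
    for i in range(len(name)):
        # Omit the character at position i.
        candidates.add(name[:i] + name[i + 1:])
        # Duplicate the character at position i.
        candidates.add(name[:i] + name[i] + name[i:])
    for i in range(len(name) - 1):
        # Transpose the adjacent characters at positions i and i + 1.
        candidates.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Drop the legitimate name itself and the empty string, if produced.
    candidates.discard(name)
    candidates.discard("")
    return candidates


vyper_typos = typo_candidates("vyper")
cryptocompare_typos = typo_candidates("cryptocompare")
print(len(vyper_typos), len(cryptocompare_typos))
```

Under these three rules, vyper yields 14 candidates: the 13 names listed above plus vypr. The same rules yield 38 candidates for cryptocompare, which is the number of typosquats the researchers reported for that package, although the source does not say which rules the attacker actually used.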

Further reading: How a college student tricked 17k coders into running his sketchy script

The availability of malicious packages in legitimate code repositories that closely resemble the names of legitimate packages dates back to at least 2016 when a college student uploaded 214 booby-trapped packages to the PyPI, RubyGems, and NPM repositories that contained slightly modified names of legitimate packages. The result: The imposter code was executed more than 45,000 times on more than 17,000 separate domains, and more than half were given all-powerful administrative rights. So-called typosquatting attacks have flourished ever since.

The names of all 451 malicious packages the Phylum researchers found are included in the blog post. It's not a bad idea for anyone who intended to download one of the legitimate packages targeted to double-check that they didn't inadvertently obtain a malicious doppelganger.
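One way to run that double-check is to compare the packages installed in a Python environment against a list of known-bad names. The following is a rough sketch under stated assumptions: the helper names are hypothetical, and SAMPLE_BLOCKLIST holds only the vyper look-alikes quoted in this article as a stand-in for the full list of 451 names in the Phylum post, which is not reproduced here.

```python
import re
from importlib import metadata

# Stand-in data: only the vyper look-alikes named in this article.
SAMPLE_BLOCKLIST = [
    "yper", "vper", "vyer", "vype", "vvyper", "vyyper", "vypper",
    "vypeer", "vyperr", "yvper", "vpyer", "vyepr", "vypre",
]


def normalize(name: str) -> str:
    """Normalize a project name the way PyPI compares names (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def flag_installed(installed, blocklist):
    """Return the installed names that match an entry in the blocklist."""
    blocked = {normalize(entry) for entry in blocklist}
    return sorted(name for name in installed if normalize(name) in blocked)


def installed_distribution_names():
    """List the names of distributions visible to this interpreter."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with missing or broken metadata
            names.append(name)
    return names


suspicious = flag_installed(installed_distribution_names(), SAMPLE_BLOCKLIST)
print(suspicious)
```

An empty result only means none of the sample names are installed in the environment the script runs in; each virtual environment would need its own check, ideally against the complete published list.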