Want to join in? Respond to our weekly writing prompts, open to everyone.
from
Fabled Entry
An episode of the palace Where the sky took up rain For Cooks and a dream The cougar that waits To cross by the stream Was an early river Taking chance
We in Ontario Take timing breaks to exist And enjoy open play For Hammond At the cenotaph Laurie at the gate Solemn news, Was war And I printed last Summer For a wear of resistance Typing rain And hearing doughnuts The simplest mood But afraid of existence- For its afterwards Laying on a table Being fed As time goes up
And so by dawn I work carefully But to know an amend I like peace And am on the phone
from
Xylophone
David sat here By the rooftop, looking South A prayer for the first responders They were German and feeling well Six pairs of lungs today A solemn bit of Earth being turned A thousand trillion Euros for keep Kids on notice- There was a war and an accident Three years for better days A stink for redemption But the peers in line- We're not our best We invest in freedom And finding our renew The Earth's project And just at last An attempted standing Will see the coup And bear on our Sun In perfect hiding For his law- The one of the land And only day In his life To recover- Unarmed And likely injured For poetic frost
from
jolek78's blog
3:00 AM. Another one of those nights where my brain decided sleep was overrated. After my usual nocturnal walk through the streets of a remote Scottish town (where even a fox observed me with that "humans are weird" look), I sat back down at my server. Just a quick scan of my RSS feeds, I told myself, then I can start work. When...
We backed up Spotify (metadata and music files). It's distributed in bulk torrents (~300TB), grouped by popularity. This release includes the largest publicly available music metadata database with 256 million tracks and 186 million unique ISRCs. It's the world's first "preservation archive" for music which is fully open (meaning it can easily be mirrored by anyone with enough disk space), with 86 million music files, representing around 99.6% of listens.
The news came from Anna's Archive, the world's largest pirate library, which had just scraped Spotify's entire catalog. Not just metadata, but also the audio files. 86 million tracks, 300 terabytes. I stopped to reread those numbers, then thought: holy shit, how big is this thing?
And so, while the rest of the world slept, I started digging. This is one of those stories that needs to be told: a story weaving together hacker idealism, technology, billions of dollars in AI training data, and an ethical paradox few want to truly confront.
November 3, 2022. The FBI seized the domains of Z-Library, one of the world's largest pirate libraries. Two alleged operators were arrested in Argentina. The community panicked: Z-Library served millions of students, researchers, and readers. And suddenly, everything vanished.
But someone was prepared. A group called PiLiMi (Pirate Library Mirror) had spent years creating complete backups of the shadow libraries. LibGen, Z-Library, Sci-Hub. Everything. When Z-Library fell, these backups were ready. But there was a problem: petabytes of unusable data with no way to search them.
Enter Anna Archivist (a pseudonym, probably a collective), who understood something fundamental: preserving data is useless if it's not accessible. Days after Z-Library's seizure, Anna's Archive was online with a meta-search engine aggregating all shadow library catalogs, making them searchable and, crucially, virtually impossible to censor.
December 2025:
To put this in perspective: the sum of all academic knowledge produced by humanity, plus a gigantic slice of world literary production, plus now music. All indexed, searchable, downloadable. Free. And virtually impossible to shut down.
Remember Napster? Centralized servers, one lawsuit, shut down in a day. BitTorrent learned from that and decentralized everything. But Anna's Archive goes further, combining layers of resilience that make it practically immortal:
Distributed Frontend: Multiple domain mirrors (.li, .se, .org, .gs), Tor hidden service, Progressive Web App that works offline. Block one, others continue.
Distributed Database: Elasticsearch + PostgreSQL + public API. Anyone can download the entire database and host their own instance. No central server to attack.
Distributed Files: This is the genius part. Anna's Archive hosts almost nothing directly. Instead:
The result: the user downloads via normal HTTP, but the content comes from a decentralized network. You can't shut down IPFS. You can't stop BitTorrent. You can block gateways, but hundreds exist and anyone can create new ones.
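As a rough illustration of that gateway redundancy, here is a minimal Python sketch (the CID and the gateway list are hypothetical examples; real downloads would use identifiers published in the archive's metadata, and gateway availability varies):

```python
import requests

# Hypothetical content identifier (CID); a real download would use the CID
# published in the torrent/IPFS metadata.
CID = "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"

# A few public IPFS gateways (examples only; availability changes over time).
GATEWAYS = [
    "https://ipfs.io/ipfs/",
    "https://dweb.link/ipfs/",
    "https://cloudflare-ipfs.com/ipfs/",
]

def fetch_from_any_gateway(cid: str) -> bytes:
    """Try each gateway in turn; blocking one just shifts traffic to the next."""
    for gateway in GATEWAYS:
        try:
            response = requests.get(gateway + cid, timeout=30)
            if response.ok:
                return response.content
        except requests.RequestException:
            continue  # gateway down or blocked; try the next one
    raise RuntimeError("No gateway reachable")

data = fetch_from_any_gateway(CID)
print(f"Fetched {len(data)} bytes")
```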
OpSec: Domains registered via privacy-focused Icelandic registrar, bulletproof hosting in non-cooperative jurisdictions, Bitcoin payments, PGP-encrypted communications, zero personal information.
The only way to stop Anna's Archive would be to shut down the internet. Or convince every single seeder to stop. Good luck.
And here's where it gets disturbing.
February 2025. Documents from Kadrey v. Meta are unsealed: a class action by authors against Meta for using their pirated books to train Llama AI models. Internal emails reveal a shocking timeline:
October 2022 – Melanie Kambadur, Senior Research Manager:
I don't think we should use pirated material. I really need to draw a line there.
Eleonora Presani, Meta employee:
Using pirated material should be beyond our ethical threshold. SciHub, ResearchGate, LibGen are basically like PirateBay... they're distributing content that is protected by copyright and they're infringing it.
January 2023 – Meeting with Mark Zuckerberg present:
[Zuckerberg] wants to move this stuff forward, and we need to find a way to unblock all this.
April 2023 – Nikolay Bashlykov, Meta engineer:
Using Meta IP addresses to load through torrents pirate content... torrenting from a corporate laptop doesn't feel right.
2023-2024: The Operation
Meta downloaded:
Method: BitTorrent client on separate infrastructure, VPN to obscure origin, active seeding to other peers. Result: 197,000 copyrighted books integrated into Llama training data.
Judge Vince Chhabria (Northern District California) applied the four-factor fair use test. The decision is legally fascinating and ethically disturbing.
Factor 1 – Transformative Use: Meta wins decisively. The judge ruled AI training is "spectacularly transformative", fundamentally different from human reading. The purpose isn't to express the content but to learn statistical relationships between words.
Factor 2 – Nature of Work: Neutral. Creative fiction gets more copyright protection than factual works, but this didn't tip the scales either way.
Factor 3 – Amount Used: Meta wins. Even though they used entire books, the judge found this necessary for training. You can't cherry-pick sentences and expect an AI to learn language patterns.
Factor 4 – Market Effect: This is where the judge's discomfort shows through:
Generative AI has the potential to flood the market with endless amounts of images, songs, articles, books... So by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way.
He sees the problem clearly. AI trained on copyrighted works will compete with and potentially destroy the market for those very works. But the plaintiffs couldn't prove specific economic harm with hard data.
The final ruling: "Given the state of the record, the Court has no choice but to grant summary judgment." Meta wins on these specific facts. But the judge adds a critical caveat: "In most cases, training LLMs on copyrighted works without permission is likely infringing and not fair use."
Meta didn't win because what they did was legitimate. They won because the authors' lawyers didn't build a strong enough evidentiary case. It's a technical legal victory that sidesteps the ethical question entirely.
The precedent this sets is chilling: AI companies can pirate with relative impunity if they have good lawyers and plaintiffs can't prove specific damages.
Scenario A (legal):
Scenario B (what they did):
Meta's savings: $45-95 million
And now every AI company knows: download from Anna's Archive, risk a lawsuit with weak evidence, save tens of millions.
Anna's Archive also revealed they provide "SFTP bulk access to approximately 30 companies", primarily Chinese LLM startups and data brokers, who contribute money or data. DeepSeek publicly admitted using Anna's Archive data for training. No consequences in Chinese jurisdiction.
There's a ghost here. His name is Aaron Swartz, and his story illuminates everything wrong with how we treat information access.
2011: Aaron, 24, brilliant programmer, Reddit co-founder, and information freedom activist, connected to MIT's network and downloaded 4.8 million academic papers from JSTOR. His intent was to make publicly-funded research freely available. He wasn't enriching himself. He was acting on principle.
The response was swift and brutal. Federal prosecutors threw the book at him: 13 felony charges, a maximum penalty of 50 years in prison and $1 million in fines. For downloading academic papers. The prosecution was led by U.S. Attorney Carmen Ortiz, who declared that "stealing is stealing, whether you use a computer command or a crowbar."
The pressure was immense. Aaron faced financial ruin, decades in prison, complete destruction of his life. In January 2013, at age 26, he hanged himself. His family and partner blamed the aggressive prosecution. The internet mourned a brilliant mind and passionate advocate crushed by prosecutorial overreach.
Now consider the parallel:
Aaron Swartz: 4.8 million papers → federal persecution, suicide at 26
Meta: 162 TB (~162 million papers) → wins in court, saves $95 million
Aaron was an individual acting on idealistic principles about information freedom. Meta is a trillion-dollar corporation acting on profit motives. Aaron faced the full weight of federal prosecution. Meta faced a civil lawsuit they successfully defended with their massive legal team.
The system punishes idealism and rewards profit. The disparity isn't just unjust; it reveals something fundamental about who gets to break rules and who doesn't.
Anna's Archive claims to fight publishing monopolies and inequality in access to knowledge. But the reality:
Who benefits most?
Resources needed to benefit:
Only big tech can afford this. The result:
But what about students in the Global South?
This is where the story gets complicated, because the benefits are real and they matter immensely.
Consider a medical student in India. Her family earns about $400/month. A single medical textbook costs $300-500. She needs fifteen of them. The math is impossible. Her options: don't graduate, or Anna's Archive. She chose the latter and completed her degree. She's now a practicing physician.
Or take a PhD researcher in South Africa studying climate change impacts. The critical papers for his dissertation are behind Elsevier's paywall at $35 each. He needs twenty papers minimum: $700 his university can't afford. Without Sci-Hub (accessible through Anna's Archive), his dissertation would have been impossible. He completed it, published findings that inform local climate policy.
An art history teacher in Argentina wanted to enrich her curriculum with Renaissance art analysis. The books she needed weren't available in local libraries. Importing them? Prohibitive between shipping costs and customs. Anna's Archive gave her access to rare texts that transformed her teaching.
The data backs this up: literature review times for researchers in developing countries reduced 60-80%. Citation patterns show researchers in Nigeria, Bangladesh, Ecuador now cite contemporary research at parity with Harvard and Oxford. Publications from developing countries have increased. Methodological quality has improved. International collaborations have expanded.
This matters. This changes lives. This is not hypothetical.
The problem is: both things are simultaneously true.
But Meta downloaded more data in one week than all Indian students download in a year. How do we square that?
To understand why Anna's Archive exists and why it's grown so explosively, you need to understand how fundamentally broken academic publishing has become.
Here's the perverse cycle:
Today, over 70% of academic papers sit behind paywalls. Access costs $35-50 per paper for individuals, or $10,000-100,000+ per year for institutional subscriptions. Universities in developing countries simply cannot afford these subscriptions. Neither can most universities in developed countries; Harvard famously called journal subscription costs "fiscally unsustainable" in 2012.
The system extracts free labor from researchers, locks up publicly-funded research behind paywalls, charges exorbitant fees to access it, and funnels enormous profits to publishers who add relatively little value. Academic institutions create the knowledge, do the quality control, and then pay again to access their own work.
Sci-Hub and Anna's Archive didn't emerge from nowhere. They're responses to a genuinely broken system. The question is whether they're the right response, and who ultimately benefits most from that response.
Anna's Archive can't discriminate because:
IPFS and BitTorrent are magnificent tools for resisting censorship. But resistance to censorship also means resistance to ethical control. You can't have one without the other.
The system is structurally designed to be unkillable. Which also means it's structurally designed to serve whoever has the resources to benefit most.
December 2025: Anna's Archive announced they'd scraped Spotify. The same preservation narrative, the same pattern. 256 million tracks, 86 million audio files, 300TB available to anyone with the infrastructure to use it.
"This Spotify scrape is our humble attempt to start such a 'preservation archive' for music," they wrote. The justification mirrors the books argument: Spotify loses licenses, music disappears; platform risk if Spotify fails; regional blocks prevent access; long tail poorly preserved.
All true. But who downloads 300TB of music? Not the kid in Malawi who just wants to listen to his favorite artist. ByteDance, training the next AI music generator. Startups building Spotify competitors. The same companies with compute budgets in the tens of millions.
Anna's Archive is pivoting from text to multimedia, and each escalation follows a predictable pattern:
With each escalation:
And the international precedent is already being set. Japan's AI Minister (January 2025) stated explicitly: "AI companies in Japan can use whatever they want for AI training... whether it is content obtained from illegal sites or otherwise."
The message from governments: pirate freely if it serves AI supremacy. We're in a race to the bottom where copyright becomes meaningless for AI training, and the companies with the most resources benefit most.
I started from that sleepless night, 256 million songs in an RSS feed, and ended up here with more questions than answers.
Anna's Archive is a technological marvel: IPFS, BitTorrent, distributed databases creating something genuinely uncensorable. It's also a lifeline for millions of students and researchers locked out of knowledge by an exploitative publishing system. And simultaneously, it's the largest intellectual property expropriation operation in history, saving corporations hundreds of millions while creators receive nothing.
All of these things are true at once. This isn't a simple story with heroes and villains.
The academic publishing system is genuinely broken. Researchers create knowledge for free, review it for free, then their institutions must pay exorbitant fees to access it while publishers extract 35-40% profit margins. This system deserves to be disrupted.
But Anna's Archive isn't disrupting it equitably. The architecture that makes it uncensorable also makes it impossible to distinguish between a student in Lagos accessing a textbook and Meta downloading 162TB for AI training. You can't have selective resistance to censorship; it's all or nothing.
Aaron Swartz died fighting for information freedom with idealistic principles. Meta achieves the same result with corporate profit motives and walks away victorious. The system rewards power and punishes principle.
Can this be fixed? Copyright reform moves at the speed of politics: years, decades. Compulsory licensing for AI training? Just beginning to be discussed. Open access mandates? Facing massive publisher resistance. Meanwhile, Anna's Archive operates at the speed of software, and data flows freely to those with $100M compute clusters.
The question isn't whether Anna's Archive will be stopped (it won't be; that's the point of the architecture). The question is what world we're building where the same technology that liberates a medical student in India also bankrolls Meta's AI ambitions, and we can't separate one from the other.
I don't have answers. I have a functioning IPFS node, a Tor relay, and the uncomfortable knowledge that every byte I help distribute might be saving a researcher's career or training someone's proprietary AI model. Probably both.
Free for everyone. The problem is that "everyone" has very different resources to benefit from that freedom.
Now, if you'll excuse me, I'm going to check how much bandwidth my nodes are using. And reflect on whether participation is complicity or resistance. Maybe it's both. Maybe that's the point.
#AnnaArchive #AI #Copyright #AaronSwartz #Meta #AcademicPublishing #IPFS #InformationFreedom
from Unvarnished diary of a lill Japanese mouse
JOURNAL
29 December 2025
Granny and grandpa have gone to bed, so we have the inn all to ourselves. We settled in around the hearth, lit three sticks of wood and a candle, and we're quietly warming up some sake. What a treat! Suddenly we're back in the time of the shoguns, except for the cellphone screen; I'm going to switch it off, it doesn't fit the scene at all. We're happy, we'd like it to last like this for a thousand years; we know very well it's fleeting, so we make the most of it, we bathe in it.
from
Olhar Convexo
WARNING: This text contains material that may be unsuitable for some readers.
Who would we be without our pleasures?
Well, consider the four basic pleasures every human being has. Every human being has the desire to have sex, to drink, to eat, and to sleep. After all, these are the desires life requires: "reproduction, thirst, hunger and sleep."
In this text, the issue to be addressed is a different one. It is the overstimulation that ends up causing problems in a specific part of the brain called the prefrontal cortex. That part is responsible for the control of attention.
Like any drug, the addiction, or rather the dependence, can become an illness, depending on how severe it is.
We have "allowed ourselves" to create an illness: addiction to phone use, known as nomophobia.
Like any dependence, nomophobia is being treated by medicine as an illness, and that is indeed what should happen.
(Note: when the law banning phone use in schools was applied, my view was that we would end up with nomophobic teenagers everywhere. And that is exactly what happened.)
Why do I bring this subject up in the middle of the four basic human desires?
Because it is the most pronounced one in today's society. And from it comes immediacy. From it also come restlessness and ADHD (Attention Deficit Hyperactivity Disorder).
There is a portion of the revolutionary generation (now middle-aged) that believes we are "over-diagnosing", especially ADHD, but also ASD (Autism Spectrum Disorder).
The revolutionary generation did not live through what is lived today by the most affected generation, the .com generation.
The .com generation experienced peaks of dependence from excessive phone use; it experienced peaks of technology use in general; it experienced, in large part, and still experiences, the overstimulation that short videos, Instagram reels, and TikTok provide.
The act of swiping up to watch one kitten video after another is an illness! Especially because it is not two or three videos, it is 400 in a row, and the young person does not realise the algorithm has already led them to watch 398 more videos than they intended.
Something new: today there are soap operas, I repeat, SOAP OPERAS, in reels format.
These soap operas are designed to create more immediacy and more dependence.
They have NO pauses between lines of dialogue, because the .com generation would not stand the wait and would skip the video.
These soap operas are designed for addiction.
(Not that ordinary soap operas aren't.)
But the addictive potential is extreme.
The health of the .com generation is seen as fragile, but this generation has its own problems, problems designed to affect it.
There is indeed a debate in the scientific community about whether we may be making too many diagnoses without proper criteria, but at the same time more people are gaining access to specialist doctors, and to information, which has become essential for questioning "over-diagnosis". The conclusion? More people are exposed to problems caused by the phone, and a huge number of people have gained access to medical care, making the number of diagnoses grow exponentially. But the fact is, we are sicker than at any other time.
These days, the pinnacle is no longer overcoming dependence on some drug.
The pinnacle is overcoming dependence on phone use.
Rio de Janeiro,
29 December 2025.
SOURCE: https://pubmed.ncbi.nlm.nih.gov/35253285/
from An Open Letter
I started a new workout routine, no longer doing my own but using a PPLUL from the app I use. And holy shit, that leg day beat the fuck out of me. I feel good again. I think I miss that intensity and level of pain, and overcoming it helps so much. Wanting to quit and cut the workout short, but not actually doing it, helps me a lot.
from
Justina Revolution
I am quite fast on my feet and I am very, very dextrous in my foot placement. My sparring partners have always marveled at how easily I can traverse distances and remain just out of reach of their strikes.
I credit this to my Fut Gar training. I practice four short stepping drills that enable me to absolutely focus on dropping my weight and delivering power from every possible stance and position.
These Butterfly Drills contain the true essence of Southern Shaolin and have enhanced my fighting capabilities by forcing me to endlessly drill real positions over and over until finally I cannot get them wrong.
It improved my kickboxing and grappling abilities by enabling me to be stable even in the most awkward positions.
from
The EuropeâChina Monitor

Because internships involve work and work-related activities, they are treated as employment under Chinese immigration law. According to official Chinese government sources, lawful work in China requires a Z visa, a Foreigner's Work Permit, and a work-type residence permit. This would automatically exclude those on the following visas from legally undertaking an internship in China:
The F visa is a non-commercial visa for foreigners entering China for exchanges, visits, or study tours. As it does not include work authorisation or permit income-generating activities, it cannot be used for employment or internships, which by nature involve work and remuneration. According to the Beijing authorities, a person holding an F visa cannot obtain an Employment Permit (Electronic Social Security Card) or a work-type residence permit, both of which are mandatory for work in China.
The China Business Visa (M visa) is issued to foreigners for commercial and trade activities, such as visiting clients, attending trade fairs, and meeting business partners, not for employment. In addition, on most China business (M) visas, holders can stay in China for a limited period (often 30–120 days per visit), and if longer continuous time in the country is needed, travelers may need to exit and re-enter or apply for an extension. The M visa cannot be directly converted to a residence permit in China.
The China L Visa (Tourist Visa) is for foreigners visiting China for sightseeing, tourism, or visiting friends/relatives, allowing for short stays (often 30-90 days).
The X1 visa permits study exceeding 180 days, while the X2 visa is limited to study periods of less than 180 days. In some regions of China, students may apply for permission to engage in part-time work; however, this involves additional administrative requirements.
Such permission can only be applied for after arrival in China and requires formal approval and official documentation from the university or college, the employer, immigration and, in some cases, the municipal authorities. Another potential issue is that not all educational institutions or employers are authorised or willing to support applications for part-time work or internships.
The Z visa is widely regarded as the gold standard for internships and work-related activities in China. The key benefits of holding a Z visa for an internship include the following:

The China International Leadership Programme offers applicants Z visa sponsorship, a work-type residence permit, and an Employment Permit (Electronic Social Security Card), allowing participants to legally work and receive remuneration in China.
In addition, the programme's HSK-aligned Mandarin language lessons and immersion are delivered as work-related activities, supporting participants in carrying out the teaching and internship components more effectively and enabling clear, professional communication within a Chinese working environment.
https://youtu.be/3NL2hWs6XT0?si=WRNlK14cwlmeaoqN
© 2025 Europe China Monitor News Team
from
wystswolf

The consequences of a touched eyeball are that you can run, but you cannot hide.
For Jehovah will show mercy to Jacob, and he will again choose Israel. He will settle them in their land, and the foreign residents will join them and attach themselves to the house of Jacob.
And peoples will take them and bring them to their own place, and the house of Israel will possess them as male and female servants in Jehovahâs land; and they will be the captors of those who held them captive, and they will have in subjection those who were forcing them to work.
In the day when Jehovah gives you rest from your pain and from your turmoil and from the hard slavery imposed on you, you will recite this proverb against the king of Babylon:
How the one forcing others to work has met his end! How the oppression has ended!
Jehovah has broken the rod of the wicked, the staff of the rulers, the one furiously striking peoples with unceasing blows, the one angrily subduing nations with relentless persecution.
The whole earth now rests, free of disturbance. People cry out for joy.
Even the juniper trees rejoice over you, along with the cedars of Lebanon. They say, "Ever since you have fallen, no woodcutter comes up against us."
Even the Grave underneath is stirred up to meet you when you come. Because of you, it awakens those powerless in death, all the oppressive leaders of the earth. It makes all the kings of the nations rise from their thrones.
All of them speak up and say to you: "Have you also become weak like us? Have you become like us?
Down to the Grave your pride has been brought, the sound of your stringed instruments. Maggots are spread beneath you as a bed, and worms are your covering."
How you have fallen from heaven, O shining one, son of the dawn! How you have been cut down to the earth, you who vanquished nations!
You said in your heart, "I will ascend to the heavens. Above the stars of God I will lift up my throne, and I will sit down on the mountain of meeting, in the remotest parts of the north. I will go up above the tops of the clouds; I will make myself resemble the Most High."
Instead, you will be brought down to the Grave, to the remotest parts of the pit.
Those seeing you will stare at you; they will closely examine you, saying: "Is this the man who was shaking the earth, who made kingdoms tremble, who made the inhabited earth like the wilderness and overthrew its cities, who refused to let his prisoners go home?"
All other kings of the nations, yes, all of them, lie down in glory, each one in his own tomb.
But you are discarded without a grave, like a detested sprout, clothed with the slain who were stabbed with the sword, who go down to the stones of a pit, like a carcass trampled underfoot.
You will not join them in a grave, for you destroyed your own land, you killed your own people. The offspring of evildoers will never again be named.
Prepare a slaughtering block for his sons because of the guilt of their forefathers, so that they will not rise up and take over the earth and fill the land with their cities.
I will rise up against them. And I will wipe out from Babylon name and remnant and descendants and posterity.
And I will make her a possession of porcupines and a region of marshes, and I will sweep her with the broom of annihilation.
Jehovah of armies has sworn: "Just as I have intended, so it will occur, and just as I have decided, that is what will come true.
I will crush the Assyrian in my land, and I will trample him on my mountains. His yoke will be removed from them, and his load will be removed from their shoulder."
This is what has been decided against all the earth, and this is the hand that is stretched out against all the nations.
For Jehovah of armies has decided, and who can thwart it? His hand is stretched out, and who can turn it back?
In the year that King Ahaz died, this pronouncement was made:
Do not rejoice, Philistia, any of you, just because the staff of the one striking you has been broken. For from the root of the serpent will come a poisonous snake, and its offspring will be a flying fiery snake.
While the firstborn of the lowly feed and the poor lie down in security, I will put your root to death with famine, and what is left of you will be killed.
Wail, O gate! Cry out, O city! All of you will lose heart, O Philistia! For a smoke is coming from the north, and there are no stragglers in his ranks.
How should they answer the messengers of the nation? That Jehovah has laid the foundation of Zion, and that the lowly ones of his people will take refuge in her.
Because it has been devastated in a night, Ar of Moab has been silenced. Because it has been devastated in a night, Kir of Moab has been silenced.
He has gone up to the House and to Dibon, to the high places to weep. Moab wails over Nebo and over Medeba. Every head is shaved bald, every beard is clipped.
In its streets they have put on sackcloth. On their roofs and in their public squares they all wail; they go down weeping.
Heshbon and Elealeh cry out; their voice is heard as far as Jahaz. That is why the armed men of Moab keep shouting. He is trembling.
My heart cries out over Moab. Its fugitives have fled as far as Zoar and Eglath-shelishiyah. On the ascent of Luhith they weep as they go up; on the way to Horonaim they cry out over the catastrophe.
For the waters of Nimrim are desolate; the green grass has dried up, the grass is gone and nothing green is left.
That is why they are carrying away what is left of their stores and their riches; they are crossing the valley of poplars.
For the outcry echoes throughout the territory of Moab. The wailing reaches to Eglaiim; the wailing reaches to Beer-elim.
For the waters of Dimon are full of blood, and I have more in store for Dimon: a lion for those of Moab who escape and for those remaining in the land.
Send a ram to the ruler of the land, from Sela through the wilderness to the mountain of the daughter of Zion.
Like a bird chased away from its nest, so the daughters of Moab will be at the fords of Arnon.
Offer counsel, carry out the decision. Make your shadow at high noon like the night. Conceal the dispersed and do not betray those fleeing.
May my dispersed ones reside in you, O Moab. Become a place of concealment to them because of the destroyer. The oppressor will reach his end, the destruction will come to an end, and those trampling others down will perish from the earth.
Then a throne will be firmly established in loyal love. The one who sits on it in the tent of David will be faithful; he will judge fairly and will swiftly execute righteousness.
We have heard about the pride of Moab (he is very proud), his haughtiness and his pride and his fury; but his empty talk will come to nothing.
So Moab will wail for Moab; they will all wail. Those who are stricken will moan for the raisin cakes of Kir-hareseth.
For the terraces of Heshbon have withered, the vine of Sibmah. The rulers of the nations have trampled its bright-red branches; they had reached as far as Jazer; they had extended into the wilderness. Its shoots had spread out and gone as far as the sea.
That is why I will weep over the vine of Sibmah as I weep for Jazer. With my tears I will drench you, O Heshbon and Elealeh, because the shouting over your summer fruit and your harvest has ended.
Rejoicing and joyfulness have been taken away from the orchard, and there are no songs of joy or shouting in the vineyards. The treader no longer treads out wine in the presses, for I have caused the shouting to cease.
That is why deep within me I am boisterous over Moab, like the strumming of a harp, and my innermost being over Kir-hareseth.
Even when Moab wears himself out on the high place and goes to pray in his sanctuary, he will accomplish nothing.
This is the word that Jehovah previously spoke concerning Moab.
And now Jehovah says: "Within three years, like the years of a hired worker, the glory of Moab will be disgraced with much tumult of every sort, and those who remain will be very few and insignificant."
from
Justina Revolution
I did my 5 phase routine with Loosening, Cosmos Palm, Silk Reeling, and Swimming Dragon Baguazhang. This was so good as the sun rose behind me. I am increasing my power, my flexibility, my meditative abilities, and my body, mind, and spirit senses.
Weaving energy around my body, spreading my awareness from horizon to horizon. Generating stillness in both limited and unlimited forms. This is glorious. I am generating a world of benefits and my evolution, the activation of my DNA upgrades all beings in the multiverse.
There is no separation. It's all one thing. I did the Monroe guided portal meditation last night. I know this energy of the portal. It is Akasha and I am joined with all beings in that beautiful pregnant void.
The Void is not emptiness or annihilation. It is the pregnant field from whence all things arise and to which all things return. This is my reality. As solid and true as my fist. Nothing is ever gone. Nothing is ever lost. There is no past and no future because there is no time. There is no loss because there is no space. Nothing can come to you or leave you. It is all here right now in this very moment.
from digital ash
I guess the lazy dog jumped over the quick brown fox for once…
from Stefan Angrick
This post is part of a four-part summary of Google's Machine Learning Crash Course. For context, check out this post. This fourth module covers critical considerations when building and deploying ML models in the real world, including productionisation best practices, automation, and responsible engineering.
The model is only a small part of real-world production ML systems. It often represents only 5% or less of the total codebase in the system.

Source: Production ML systems | Machine Learning | Google for Developers
Machine learning models can be trained statically (once) or dynamically (continuously).
| | Static training (offline training) | Dynamic training (online training) |
|---|---|---|
| Advantages | Simpler. You only need to develop and test the model once. | More adaptable. Keeps up with changes in data patterns, providing more accurate predictions. |
| Disadvantages | Sometimes stale. Can become outdated if data patterns change, requiring data monitoring. | More work. You must build, test, and release a new product continuously. |
Choosing between static and dynamic training depends on the specific dataset and how frequently it changes.
Monitoring input data is essential for both static and dynamic training to ensure reliable predictions.
Source: Production ML systems: Static versus dynamic training | Machine Learning | Google for Developers
Inference involves using a trained model to make predictions on unlabelled examples, and it can be done as follows:
Static inference (offline inference, batch inference) generates predictions in advance and caches them, which suits scenarios where prediction speed is critical.
Dynamic inference (online inference, real-time inference) generates predictions on demand, offering flexibility for diverse inputs.
| | Static inference (offline inference, batch inference) | Dynamic inference (online inference, real-time inference) |
|---|---|---|
| Advantages | No need to worry about cost of inference; allows post-verification of predictions before pushing | Can infer a prediction on any new item as it comes in |
| Disadvantages | Limited ability to handle uncommon inputs | Compute-intensive and latency-sensitive; monitoring needs are intensive |
Choosing between static and dynamic inference depends on factors such as model complexity, desired prediction speed, and the nature of the input data.
Static inference is advantageous when cost and prediction verification are prioritised, while dynamic inference excels in handling diverse, real-time predictions.
Source: Production ML systems: Static versus dynamic inference | Machine Learning | Google for Developers
Feature engineering can be performed before or during model training, each with its own advantages and disadvantages.
Source: Production ML systems: When to transform data? | Machine Learning | Google for Developers
Deploying a machine learning model involves validating data, features, model versions, serving infrastructure, and pipeline integration.
Reproducible model training involves deterministic seeding, fixed initialisation order, averaging multiple runs, and using version control.
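As a minimal sketch of the deterministic-seeding point (plain NumPy with a toy gradient-descent model, not any particular framework's seed utilities):

```python
import numpy as np

def train_linear_model(X, y, seed=42, epochs=100, lr=0.1):
    """Tiny gradient-descent linear model; the seed fixes weight initialisation."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])   # deterministic given the seed
    b = 0.0
    for _ in range(epochs):
        pred = X @ w + b
        grad_w = 2 * X.T @ (pred - y) / len(y)
        grad_b = 2 * np.mean(pred - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Two runs with the same seed produce bit-identical parameters.
w1, b1 = train_linear_model(X, y, seed=42)
w2, b2 = train_linear_model(X, y, seed=42)
assert np.array_equal(w1, w2) and b1 == b2
```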
Integration tests ensure that different components of the ML pipeline work together seamlessly and should run continuously and for new model or software versions.
Before serving a new model, validate its quality by checking for sudden and gradual degradations against previous versions and fixed thresholds.
Ensure model-infrastructure compatibility by staging the model in a sandboxed server environment to avoid dependency conflicts.
Source: Production ML systems: Deployment testing | Machine Learning | Google for Developers
ML pipeline monitoring involves validating data (using data schemas) and features (using unit tests), tracking real-world metrics, and addressing potential biases in data slices.
Monitoring training-serving skew, label leakage, model age, and numerical stability is crucial for maintaining pipeline health and model performance.
Live model quality testing uses methods such as human labelling and statistical analysis to ensure ongoing model effectiveness in real-world scenarios.
Implementing proper randomisation through deterministic data generation enables reproducible experiments and consistent analysis.
Maintaining invariant hashing ensures that data splits remain consistent across experiments, contributing to reliable analysis and model evaluation.
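One common way to get that invariance, sketched below under the assumption that each example has a stable ID, is to hash the ID and bucket on the hash rather than drawing random numbers:

```python
import hashlib

def split_for(example_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign an example to 'train' or 'test' from a stable ID.

    Hashing the ID (rather than using a random draw) means the split never
    changes across pipeline runs, new data only adds examples, and experiments
    stay comparable.
    """
    digest = hashlib.sha256(example_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "test" if bucket < test_fraction * 100 else "train"

print(split_for("user_12345"))   # same answer on every run
print(split_for("user_67890"))
```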
Source: Production ML systems: Monitoring pipelines | Machine Learning | Google for Developers
Continuously monitor models in production to evaluate feature importance and potentially remove unnecessary features, ensuring prediction quality and resource efficiency.
Data reliability is crucial. Consider data source stability, potential changes in upstream data processes, and the creation of local data copies to control versioning and mitigate risks.
Be aware of feedback loops, where a model's predictions influence future input data, potentially leading to unexpected behaviour or biased outcomes, especially in interconnected systems.
Source: Production ML systems: Questions to ask | Machine Learning | Google for Developers
AutoML automates tasks in the machine learning workflow, such as data engineering (feature selection and engineering), training (algorithm selection and hyperparameter tuning), and analysis, making model building faster and easier.

While manual training involves writing code and iteratively adjusting it, AutoML reduces repetitive work and the need for specialised skills.
Source: Automated Machine Learning (AutoML) | Google for Developers
Benefits:
Limitations:
Large amounts of data are generally required for AutoML, although specialised systems using transfer learning (taking a model trained on one task and adapting its learned representations to a different but related task) can reduce this requirement.
AutoML suits teams with limited ML experience or those seeking productivity gains without customisation needs. Custom (manual) training suits cases where model quality and customisation matter most.
Source: AutoML: Benefits and limitations | Machine Learning | Google for Developers
AutoML tools fall into two categories:
The AutoML workflow follows steps similar to traditional machine learning, including problem definition, data gathering, preparation, model development, evaluation, and potential retraining.
Data preparation is crucial for AutoML and involves labelling, cleaning and formatting data, and applying feature transformations.
No-code AutoML tools guide users through model development with steps such as data import, analysis, refinement, and configuration of run parameters before starting the automated training process.
Source: AutoML: Getting started | Machine Learning | Google for Developers
Before putting a model into production, it is critical to audit training data and evaluate predictions for bias.
Source: Fairness | Machine Learning | Google for Developers
Machine learning models can be susceptible to bias due to human involvement in data selection and curation.
Understanding common human biases is crucial for mitigating their impact on model predictions.
Types of bias include reporting bias, historical bias, automation bias, selection bias, coverage bias, non-response bias, sampling bias, group attribution bias (in-group bias and out-group homogeneity bias), implicit bias, confirmation bias, and experimenter's bias, among others.
Source: Fairness: Types of bias | Machine Learning | Google for Developers
Missing or unexpected feature values in a dataset can indicate potential sources of bias.
Data skew, where certain groups are under- or over-represented, can introduce bias and should be addressed.
Evaluating model performance by subgroup ensures fairness and equal performance across different characteristics.
Source: Fairness: Identifying bias | Machine Learning | Google for Developers
Machine learning engineers use two primary strategies to mitigate bias in models:
Augmenting training data involves collecting additional data to address missing, incorrect, or skewed data, but it can be infeasible due to data availability or resource constraints.
Adjusting the model's loss function involves using fairness-aware optimisation functions rather than the common default log loss.
The TensorFlow Model Remediation Library provides optimisation functions, such as MinDiff and Counterfactual Logit Pairing, designed to penalise errors in a fairness-aware manner.
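As a conceptual illustration only (this is not the Model Remediation API), a fairness-aware objective can be sketched as ordinary log loss plus a penalty on the gap in average predicted score between groups:

```python
import numpy as np

def fairness_aware_loss(y_true, y_pred, group, lam=1.0):
    """Log loss plus a penalty on the gap in mean predicted score between groups.

    Illustrative stand-in for a fairness-aware objective: `group` is a 0/1
    array marking group membership, and `lam` trades accuracy against the
    parity penalty.
    """
    eps = 1e-7
    y_pred = np.clip(y_pred, eps, 1 - eps)
    log_loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    gap = abs(y_pred[group == 1].mean() - y_pred[group == 0].mean())
    return log_loss + lam * gap

y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([0.9, 0.2, 0.7, 0.6, 0.3, 0.4])
group  = np.array([1, 1, 1, 0, 0, 0])
print(fairness_aware_loss(y_true, y_pred, group))
```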
Source: Fairness: Mitigating bias | Machine Learning | Google for Developers
Aggregate model performance metrics such as precision, recall, and accuracy can hide biases against minority groups.
Fairness in model evaluation involves ensuring equitable outcomes across different demographic groups.
Fairness metrics can help assess model predictions for bias.
Candidate pool of 100 students: 80 students belong to the majority group (blue), and 20 students belong to the minority group (orange):

Source: Fairness: Evaluating for bias | Machine Learning | Google for Developers
Demographic parity aims to ensure equal acceptance rates for majority and minority groups, regardless of individual qualifications.
Both the majority (blue) and minority (orange) groups have an acceptance rate of 20%:

While demographic parity promotes equal representation, it can overlook differences in individual qualifications within each group, potentially leading to unfair outcomes.
Qualified students in both groups are shaded in green, and qualified students who were rejected are marked with an X:

Majority acceptance rate = Qualified majority accepted / Qualified majority = 16/35 ≈ 46%
Minority acceptance rate = Qualified minority accepted / Qualified minority = 4/15 ≈ 27%
When the distribution of a preferred label ("qualified") differs substantially between groups, demographic parity may not be the most appropriate fairness metric.
There may be additional benefits/drawbacks of demographic parity not discussed here that are also worth considering.
Source: Fairness: Demographic parity | Machine Learning | Google for Developers
Equality of opportunity focuses on ensuring that qualified individuals have an equal chance of acceptance, regardless of demographic group.
Qualified students in both groups are shaded in green:

Majority acceptance rate = Qualified majority accepted / Qualified majority = 14/35 = 40%
Minority acceptance rate = Qualified minority accepted / Qualified minority = 6/15 = 40%
Equality of opportunity has limitations, including reliance on a clearly defined preferred label and challenges in settings that lack demographic data.
It is possible for a model to satisfy both demographic parity and equality of opportunity under specific conditions where positive prediction rates and true positive rates align across groups.
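A small sketch tying the two metrics to the counts used in the examples above:

```python
def demographic_parity_rates(accepted, total):
    """Acceptance rate per group, irrespective of qualification."""
    return {g: accepted[g] / total[g] for g in total}

def equality_of_opportunity_rates(qualified_accepted, qualified_total):
    """Acceptance rate among qualified candidates per group (true positive rate)."""
    return {g: qualified_accepted[g] / qualified_total[g] for g in qualified_total}

# Figures from the examples above.
print(demographic_parity_rates(
    accepted={"majority": 16, "minority": 4},
    total={"majority": 80, "minority": 20}))            # both 0.20

print(equality_of_opportunity_rates(
    qualified_accepted={"majority": 14, "minority": 6},
    qualified_total={"majority": 35, "minority": 15}))  # both 0.40
```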
Source: Fairness: Equality of opportunity | Machine Learning | Google for Developers
Counterfactual fairness evaluates fairness by comparing predictions for similar individuals who differ only in a sensitive attribute such as demographic group.
This metric is particularly useful when datasets lack complete demographic information for most examples but contain it for a subset.
Candidate pool, with demographic group membership unknown for most candidates (icons shaded in grey):

Counterfactual fairness may not capture broader systemic biases across subgroups. Other fairness metrics, such as demographic parity and equality of opportunity, provide a more holistic view but may require complete demographic data.
Summary
Selecting the appropriate fairness metric depends on the specific application and desired outcome, with no single "right" metric universally applicable.
For example, if the goal is to achieve equal representation, demographic parity may be the optimal metric. If the goal is to achieve equal opportunity, equality of opportunity may be the best metric.
Some definitions of fairness are mutually incompatible.
Source: Fairness: Counterfactual fairness | Machine Learning | Google for Developers
from Stefan Angrick
This post is part of a four-part summary of Google's Machine Learning Crash Course. For context, check out this post. This third module covers advanced ML model architectures.
Neural networks are a model architecture designed to automatically identify non-linear patterns in data, eliminating the need for manual feature cross experimentation.
Source: Neural networks | Machine Learning | Google for Developers
In neural network terminology, additional layers between the input layer and the output layer are called hidden layers, and the nodes in these layers are called neurons.

Source: Neural networks: Nodes and hidden layers | Machine Learning | Google for Developers
Each neuron in a neural network performs the following two-step action: it computes a weighted sum of its input values plus a bias, then passes that sum through a non-linear activation function to produce its output.
Common activation functions include sigmoid, tanh, and ReLU.
The sigmoid function maps input x to an output value between 0 and 1:
$$
F(x) = \frac{1}{1 + e^{-x}}
$$

The tanh function (short for "hyperbolic tangent") maps input x to an output value between -1 and 1:
$$
F(x) = \tanh{(x)}
$$

The rectified linear unit activation function (or ReLU, for short) applies a simple rule: if the input is negative, the output is 0; otherwise, the output equals the input.
ReLU often outperforms sigmoid and tanh because it reduces vanishing gradient issues and requires less computation.
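A minimal NumPy sketch of the three activation functions discussed above:

```python
import numpy as np

def sigmoid(x):
    """Maps any input to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Maps any input to (-1, 1)."""
    return np.tanh(x)

def relu(x):
    """Returns x if x > 0, otherwise 0."""
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x), sep="\n")
```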

A neural network consists of nodes organised into an input layer, one or more hidden layers, and an output layer, together with the weights and biases connecting them and the activation functions that introduce non-linearity.
Source: Neural networks: Activation functions | Machine Learning | Google for Developers
Backpropagation is the primary training algorithm for neural networks. It calculates how much each weight and bias in the network contributed to the overall prediction error by applying the chain rule of calculus. It works backwards from the output layer to tell the gradient descent algorithm which equations to adjust to reduce loss.
In practice, this involves a forward pass, where the network makes a prediction and the loss function measures the error, followed by a backward pass that propagates that error back through the layers to compute gradients for each parameter.
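To make the forward/backward pattern concrete, here is a toy sketch (a single sigmoid unit with log loss, trained by gradient descent; not the course's code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # toy binary labels

w, b, lr = np.zeros(2), 0.0, 0.5

for step in range(500):
    # Forward pass: prediction and log loss.
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

    # Backward pass: the chain rule gives dloss/dw and dloss/db for sigmoid + log loss.
    grad_z = (p - y) / len(y)
    grad_w = X.T @ grad_z
    grad_b = grad_z.sum()

    # Gradient descent update.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"final loss: {loss:.3f}")
```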
Best practices for neural network training:
Source: Neural Networks: Training using backpropagation | Machine Learning | Google for Developers
Multi-class classification models predict from multiple possibilities (binary classification models predict just two).
Multi-class classification can be achieved through two main approaches:
One-vs.-all uses multiple binary classifiers, one for each possible outcome, to determine the probability of each class independently.

This approach is fairly reasonable when the total number of classes is small.
We can create a more efficient one-vs.-all model with a deep neural network in which each output node represents a different class.

Note that the probabilities do not sum to 1. With a one-vs.-all approach, the probability of each binary set of outcomes is determined independently of all the other sets (the sigmoid function is applied to each output node independently).
One-vs.-one (softmax) predicts probabilities of each class relative to all other classes, ensuring all probabilities sum to 1 using the softmax function in the output layer. It assigns decimal probabilities to each class such that all probabilities add up to 1.0. This additional constraint helps training converge more quickly.
Note that the softmax layer must have the same number of nodes as the output layer.

The softmax formula extends logistic regression to multiple classes: $$ p(y = j|\textbf{x}) = \frac{e^{(\textbf{w}_j^{T}\textbf{x} + b_j)}}{\sum_{k\in K} e^{(\textbf{w}_k^{T}\textbf{x} + b_k)}} $$
Full softmax is fairly cheap when the number of classes is small but can become computationally expensive with many classes.
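A short, numerically stable NumPy sketch of the softmax formula above:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    z = logits - np.max(logits)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())   # probabilities sum to 1.0
```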
Candidate sampling offers an alternative for increased efficiency. It computes probabilities for all positive labels but only a random sample of negative labels. For example, if we are interested in determining whether an input image is a beagle or a bloodhound, we do not have to provide probabilities for every non-dog example.
One label versus many labels
Softmax assumes that each example is a member of exactly one class. Some examples, however, can simultaneously be a member of multiple classes. For multi-label problems, use multiple independent logistic regressions instead.
Example: To classify dog breeds from images, including mixed-breed dogs, use one-vs.-all, since it predicts each breed independently and can assign high probabilities to multiple breeds, unlike softmax, which forces probabilities to sum to 1.
Source: Neural networks: Multi-class classification | Machine Learning | Google for Developers
Embeddings are lower-dimensional representations of sparse data that address problems associated with one-hot encodings.
A one-hot encoded feature âmealâ of 5,000 popular meal items:

This representation of data has several problems:
Embeddings, lower-dimensional representations of sparse data, address these issues.
Source: Embeddings | Machine Learning | Google for Developers
Embeddings are low-dimensional representations of high-dimensional data, often used to capture semantic relationships between items.
Embeddings place similar items closer together in the embedding space, allowing for efficient machine learning on large datasets.
Example of a 1D embedding of a sparse feature vector representing meal items:

2D embedding:

3D embedding:

Distances in the embedding space represent relative similarity between items.
Real-world embeddings can encode complex relationships, such as those between countries and their capitals, allowing models to detect patterns.
In practice, embedding spaces have many more than three dimensions, although far fewer than the original data, and the meaning of individual dimensions is often unclear.
Embeddings usually are task-specific, but one task with broad applicability is predicting the context of a word.
Static embeddings like word2vec represent all meanings of a word with a single point, which can be a limitation in some cases. When each word or data point has a single embedding vector, this is called a static embedding.
word2vec can refer both to an algorithm for obtaining static word embeddings and to a set of word vectors that were pre-trained with that algorithm.
Source: Embeddings: Embedding space and static embeddings | Machine Learning | Google for Developers
Embeddings can be created using dimensionality reduction techniques such as PCA or by training them as part of a neural network.
Training an embedding within a neural network allows customisation for specific tasks, where the embedding layer learns optimal weights to represent data in a lower-dimensional space, but it may take longer than training the embedding separately.
In general, you can create a hidden layer of size d in your neural network that is designated as the embedding layer, where d represents both the number of nodes in the hidden layer and the number of dimensions in the embedding space.
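A minimal Keras sketch of this setup (vocabulary size and embedding dimension are illustrative choices, and the downstream task is a placeholder):

```python
import tensorflow as tf

vocab_size = 5000   # e.g. 5,000 meal items, as in the one-hot example above
embedding_dim = 8   # d: nodes in the embedding layer = dimensions of the space

model = tf.keras.Sequential([
    # Maps each integer item ID to a learned dense vector of length d.
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # placeholder downstream task
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Build with a dummy batch of item-ID sequences, then inspect the embedding matrix.
model(tf.constant([[3, 17, 42]]))
embedding_matrix = model.layers[0].get_weights()[0]
print(embedding_matrix.shape)   # (5000, 8)
```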

Word embeddings, such as word2vec, leverage the distributional hypothesis to map semantically similar words to geometrically close vectors. However, such static word embeddings have limitations because they assign a single representation per word.
Contextual embeddings offer multiple representations based on context. For example, "orange" would have a different embedding for every unique sentence containing the word in the dataset (as it could be used as a colour or a fruit).
Contextual embeddings encode positional information, while static embeddings do not. Because contextual embeddings incorporate positional information, one token can have multiple contextual embedding vectors. Static embeddings allow only a single representation of each token.
Methods for creating contextual embeddings include ELMo, BERT, and transformer models with a self-attention layer.
Source: Embeddings: Obtaining embeddings | Machine Learning | Google for Developers
A language model estimates the probability of a token or sequence of tokens given surrounding text, enabling tasks such as text generation, translation, and summarisation.
Tokens, the atomic units of language modelling, represent words, subwords, or characters and are crucial for understanding and processing language.
Example: "unwatched" would be split into three tokens: un (the prefix), watch (the root), ed (the suffix).
N-grams are ordered sequences of words used to build language models, where N is the number of words in the sequence.
Short N-grams capture too little information, while very long N-grams fail to generalise due to insufficient repeated examples in training data (sparsity issues).
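A quick sketch of building a toy bigram model by counting N-grams (the sentence is made up):

```python
from collections import Counter

text = "the cat sat on the mat and the cat slept"
tokens = text.split()

def ngrams(tokens, n):
    """All ordered sequences of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams = Counter(ngrams(tokens, 2))
unigrams = Counter(ngrams(tokens, 1))

# P(next = "sat" | previous = "cat") under a simple bigram model.
p = bigrams[("cat", "sat")] / unigrams[("cat",)]
print(p)   # 0.5: "cat" is followed by "sat" once and by "slept" once
```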
Recurrent neural networks improve on N-grams by processing sequences token by token and learning which past information to retain or discard, allowing them to model longer dependencies across sentences and gain more context.
Model performance depends on training data size and diversity.
While recurrent neural networks improve context understanding compared to N-grams, they have limitations, paving the way for the emergence of large language models that evaluate the whole context simultaneously.
Source: Large language models | Machine Learning | Google for Developers
Large language models (LLMs) predict sequences of tokens and outperform previous models because they use far more parameters and exploit much wider context.
Transformers form the dominant architecture for LLMs and typically combine an encoder that converts input text into an intermediate representation with a decoder that generates output text, for example translating between languages.

Partial transformers
Encoder-only models focus on representation learning and embeddings (which may serve as input for another system), while decoder-only models specialise in generating long sequences such as dialogue or text continuations.
Self-attention allows the model to weigh the importance of different words in relation to each other, enhancing context understanding.
Example: âThe animal didn't cross the street because it was too tired.â
The self-attention mechanism determines the relevance of each nearby word to the pronoun "it". In the course's visualisation, the bluer the connecting line, the more important that word is to the pronoun; "animal" is weighted far more heavily than "street", so the model resolves "it" to the animal.

Multi-head multi-layer self-attention
Each self-attention layer contains multiple self-attention heads. The output of a layer is a mathematical operation (such as a weighted average or dot product) of the outputs of the different heads.
A complete transformer model stacks multiple self-attention layers. The output from one layer becomes the input for the next, allowing the model to build increasingly complex representations, from basic syntax to more nuanced concepts.
Self-attention is computationally expensive: because every token attends to every other token, its cost grows quadratically with sequence length, making it an $$O(N^2)$$ problem.
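A minimal NumPy sketch of (single-head) scaled dot-product self-attention with made-up dimensions; the N×N attention matrix is exactly what makes the cost grow quadratically with the number of tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

N, D = 6, 16                      # hypothetical: 6 tokens, 16-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(N, D))       # token embeddings

# Learned projections (random here) for queries, keys, and values.
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(D)     # (N, N): every token attends to every other token
weights = softmax(scores)         # each row sums to 1
output = weights @ V              # (N, D): context-aware representations
print(weights.shape, output.shape)
```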
LLMs are trained using masked predictions on massive datasets, enabling them to learn patterns and generate text based on probabilities. You probably will never train an LLM from scratch.
Instruction tuning can improve an LLM's ability to follow instructions.
Why transformers are so large
This course generally recommends building models with a smaller number of parameters, but research shows that transformers with more parameters consistently achieve better performance.
Text generation
LLMs generate text by repeatedly predicting the most probable next token, effectively acting as highly powerful autocomplete systems. You can think of a user's question to an LLM as the âgivenâ sentence followed by a masked response.
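A toy sketch (not a real LLM) of the "repeatedly predict the next token" loop; `next_token_probs` is a made-up stand-in for a trained model, and the decoding shown is simple greedy selection.

```python
import random

random.seed(0)
vocab = ["the", "cat", "sat", "on", "mat", "<eos>"]

def next_token_probs(context):
    """Stand-in for a trained model: returns made-up probabilities."""
    weights = [random.random() for _ in vocab]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(vocab, weights)}

tokens = ["the"]
while tokens[-1] != "<eos>" and len(tokens) < 10:
    probs = next_token_probs(tokens)
    tokens.append(max(probs, key=probs.get))   # greedy: pick the most probable token

print(" ".join(tokens))
```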
Benefits and problems
While LLMs offer benefits such as generating clear, fluent text for a wide range of tasks, they also present challenges, including high training and inference costs, confidently stated but incorrect outputs (hallucinations), and biases inherited from training data.
Source: LLMs: What's a large language model? | Machine Learning | Google for Developers
General-purpose LLMs, also known as foundation LLMs, base LLMs, or pre-trained LLMs, are pre-trained on vast amounts of text, enabling them to understand language structure and generate creative content, but they act as platforms rather than complete solutions for tasks such as classification or regression.
Fine-tuning updates the parameters of a pre-trained model to improve its prediction quality on a specialised task.
Distillation aims to reduce model size, typically at the cost of some prediction quality.
Prompt engineering allows users to customise an LLM's output by providing examples or instructions within the prompt, leveraging the model's existing pattern-recognition abilities without changing its parameters.
One-shot, few-shot, and zero-shot prompting differ by how many examples the prompt provides, with more examples usually improving reliability by giving clearer context.
Prompt engineering does not alter the model's parameters. Prompts leverage the pattern-recognition abilities of the existing LLM.
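A hypothetical illustration (as plain Python strings) of zero-shot versus few-shot prompting; nothing here changes the model's parameters, the extra examples simply give it a clearer pattern to follow.

```python
zero_shot = (
    "Classify the sentiment of this review as positive or negative:\n"
    "'The battery died after two days.'"
)

few_shot = """Classify the sentiment of each review as positive or negative.

Review: 'Absolutely loved it, would buy again.'
Sentiment: positive

Review: 'Arrived broken and support never replied.'
Sentiment: negative

Review: 'The battery died after two days.'
Sentiment:"""

print(few_shot)
```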
Offline inference pre-computes and caches LLM predictions for tasks where real-time response is not critical, saving resources and enabling the use of larger models.
Responsible use of LLMs requires awareness that models inherit biases from their training and distillation data.
Source: LLMs: Fine-tuning, distillation, and prompt engineering | Machine Learning | Google for Developers
from Stefan Angrick
This post is part of a four-part summary of Google's Machine Learning Crash Course. For context, check out this post. This second module covers fundamental techniques and best practices for working with machine learning data.
Numerical data: Integers or floating-point values that behave like numbers. They are additive, countable, ordered, and so on. Examples include temperature, weight, or the number of deer wintering in a nature preserve.
Source: Working with numerical data | Machine Learning | Google for Developers
A machine learning model ingests data through floating-point arrays called feature vectors, which are derived from dataset features. Feature vectors often utilise processed or transformed values instead of raw dataset values to enhance model learning.
Example of a feature vector: [0.13, 0.47]
Feature engineering is the process of converting raw data into suitable representations for the model. Common techniques include normalisation (scaling numerical values into a standard range) and binning, or bucketing (converting numerical values into buckets of ranges).
Non-numerical data like strings must be converted into numerical values for use in feature vectors.
Before creating feature vectors, it is crucial to analyse the numerical data, for example with basic statistics and visualisations, to detect anomalies and patterns and to identify potential issues early.
Outliers, values significantly distant from others, should be identified and handled appropriately.
A dataset probably contains outliers when, for example, the maximum or minimum lies far from the nearest quartile, or the mean differs substantially from the median.
Source: Numerical data: First steps | Machine Learning | Google for Developers
Data normalisation is crucial for enhancing machine learning model performance by scaling features to a similar range. It is also recommended to normalise a single numeric feature that covers a wide range (for example, city population).
Normalisation has the following benefits:
- Helps models converge more quickly during training.
- Helps models infer better predictions.
- Helps avoid the "NaN trap" when feature values are very large.
- Helps the model learn appropriate weights for each feature.
| Normalisation technique | Formula | When to use |
|---|---|---|
| Linear scaling | $$x'=\frac{x-x_\text{min}}{x_\text{max}-x_\text{min}}$$ | When the feature is roughly uniformly distributed across its range; flat-shaped |
| Z-score scaling | $$x' = (x-\mu)/\sigma$$ | When the feature is normally distributed (peak close to the mean); bell-shaped |
| Log scaling | $$x'=\ln(x)$$ | When the feature distribution is heavily skewed, with a long tail on one side; heavy-tail-shaped |
| Clipping | If $$x > \text{max}$$, set $$x'=\text{max}$$; if $$x < \text{min}$$, set $$x' = \text{min}$$ | When the feature contains extreme outliers |
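A NumPy sketch (not from the course) applying the four techniques in the table above to a made-up feature column with one extreme value.

```python
import numpy as np

x = np.array([12.0, 15.0, 18.0, 22.0, 30.0, 4000.0])   # made-up feature with one extreme value

linear  = (x - x.min()) / (x.max() - x.min())           # linear scaling to [0, 1]
z_score = (x - x.mean()) / x.std()                      # z-score scaling
logged  = np.log(x)                                     # log scaling (values must be > 0)
clipped = np.clip(x, a_min=None, a_max=100.0)           # clip extreme outliers at a chosen max

print(np.round(z_score, 2))
```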
Source: Numerical data: Normalization | Machine Learning | Google for Developers
Binning (bucketing) is a feature engineering technique used to group numerical data into categories (bins). In many cases, this turns numerical data into categorical data.
For example, if a feature X has values ranging from 15 to 425, we can apply binning to represent X as a feature vector divided into specific intervals:
| Bin number | Range | Feature vector |
|---|---|---|
| 1 | 15-34 | [1.0, 0.0, 0.0, 0.0, 0.0] |
| 2 | 35-117 | [0.0, 1.0, 0.0, 0.0, 0.0] |
| 3 | 118-279 | [0.0, 0.0, 1.0, 0.0, 0.0] |
| 4 | 280-392 | [0.0, 0.0, 0.0, 1.0, 0.0] |
| 5 | 393-425 | [0.0, 0.0, 0.0, 0.0, 1.0] |
Even though X is a single column in the dataset, binning causes a model to treat X as five separate features. Therefore, the model learns separate weights for each bin.
Binning offers an alternative to scaling or clipping and is particularly useful for handling outliers and improving model performance on non-linear data.
When to use: Binning works well when the overall linear relationship between the feature and the label is weak or nonexistent, or when feature values cluster into "clumps".
Example: Number of shoppers versus temperature. By binning them, the model learns separate weights for each bin.

While creating multiple bins is possible, it is generally recommended to avoid an excessive number, as this can lead to insufficient training examples per bin and increased feature dimensionality.
Quantile bucketing is a specific binning technique that ensures each bin contains a roughly equal number of examples, which can be particularly useful for datasets with skewed distributions.
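A pandas sketch (with made-up values) contrasting fixed-range binning, in the spirit of the table above, with quantile bucketing; `pd.get_dummies` then turns the bin label into the one-hot style feature vector shown earlier.

```python
import pandas as pd

x = pd.Series([15, 20, 33, 50, 118, 150, 280, 300, 410, 425])

# Fixed-range bins (edges chosen to mirror the table above).
fixed = pd.cut(x, bins=[14, 34, 117, 279, 392, 425], labels=[1, 2, 3, 4, 5])

# Quantile bucketing: each bin gets roughly the same number of examples.
quantile = pd.qcut(x, q=5, labels=False)

print(pd.get_dummies(fixed, prefix="bin").head())
print(quantile.tolist())
```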

Source: Numerical data: Binning | Machine Learning | Google for Developers
| Problem category | Example |
|---|---|
| Omitted values | A census taker fails to record a resident's age |
| Duplicate examples | A server uploads the same logs twice |
| Out-of-range feature values | A human accidentally types an extra digit |
| Bad labels | A human evaluator mislabels a picture of an oak tree as a maple |
You can use programs or scripts to identify and handle data issues such as omitted values, duplicates, and out-of-range feature values by removing or correcting them.
Source: Numerical data: Scrubbing | Machine Learning | Google for Developers
Source: Numerical data: Qualities of good numerical features | Machine Learning | Google for Developers
Synthetic features, such as polynomial transforms, enable linear models to represent non-linear relationships by introducing new features based on existing ones.
By incorporating synthetic features, linear regression models can effectively separate data points that are not linearly separable, using curves instead of straight lines. For example, adding the synthetic feature $$x^2$$ lets a linear model separate two classes with the curve $$y = x^2$$.
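A minimal scikit-learn sketch (toy data, not from the course) of adding a synthetic $$x^2$$ feature so that a linear classifier can fit a boundary that is curved in the original feature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = (x[:, 0] ** 2 > 4).astype(int)        # toy label: not linearly separable in x alone

# Synthetic features: [x, x^2]
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)

model = LogisticRegression(max_iter=1000).fit(X_poly, y)
print(model.score(X_poly, y))             # near 1.0: the boundary x^2 = 4 separates the classes
```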

Feature crosses, a related concept for categorical data, synthesise new features by combining existing features, further enhancing model flexibility.
Source: Numerical data: Polynomial transforms | Machine Learning | Google for Developers
Categorical data has a specific set of possible values. Examples include species of animals, names of streets, whether or not an email is spam, and binned numbers.
Categorical data can include numbers that behave like categories. An example is postal codes.
Encoding means converting categorical or other data to numerical vectors that a model can train on.
Preprocessing includes converting non-numerical data, such as strings, to floating-point values.
Source: Working with categorical data | Machine Learning | Google for Developers
Machine learning models require numerical input; therefore, categorical data such as strings must be converted to numerical representations.
The term dimension is a synonym for the number of elements in a feature vector. Some categorical features are low dimensional. For example:
| Feature name | # of categories | Sample categories |
|---|---|---|
| snowed_today | 2 | True, False |
| skill_level | 3 | Beginner, Practitioner, Expert |
| season | 4 | Winter, Spring, Summer, Autumn |
| dayofweek | 7 | Monday, Tuesday, Wednesday |
| planet | 8 | Mercury, Venus, Earth |
| car_colour | 8 | Red, Orange, Blue, Yellow |
When a categorical feature has a low number of possible categories, you can encode it as a vocabulary. This treats each category as a separate feature, allowing the model to learn distinct weights for each during training.
One-hot encoding transforms categorical values into numerical vectors (arrays) of N elements, where N is the number of categories. Exactly one of the elements in a one-hot vector has the value 1.0; all the remaining elements have the value 0.0.
| Feature | Red | Orange | Blue | Yellow | Green | Black | Purple | Brown |
|---|---|---|---|---|---|---|---|---|
| âRedâ | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| âOrangeâ | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| âBlueâ | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| âYellowâ | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| âGreenâ | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| âBlackâ | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| âPurpleâ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| âBrownâ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
It is the one-hot vector, not the string or the index number, that gets passed to the feature vector. The model learns a separate weight for each element of the feature vector.
The end-to-end process to map categories to feature vectors:

In a true one-hot encoding, only one element has the value 1.0. In a variant known as multi-hot encoding, multiple values can be 1.0.
A feature whose values are predominantly zero (or empty) is termed a sparse feature.
Sparse representation efficiently stores one-hot encoded data by only recording the position of the '1' value to reduce memory usage.
Notice that the sparse representation consumes far less memory. Importantly, the model must train on the one-hot vector, not the sparse representation.
The sparse representation of a multi-hot encoding stores the positions of all the non-zero elements. For example, the sparse representation of a car that is both âBlueâ and âBlackâ is 2, 5.
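A small plain-Python sketch (mirroring the colour table above) of mapping a vocabulary to one-hot vectors and to the corresponding sparse representation, which stores only the positions of the non-zero elements.

```python
vocab = ["Red", "Orange", "Blue", "Yellow", "Green", "Black", "Purple", "Brown"]
index = {colour: i for i, colour in enumerate(vocab)}

def one_hot(colour):
    vec = [0.0] * len(vocab)
    vec[index[colour]] = 1.0
    return vec

print(one_hot("Blue"))        # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(index["Blue"])          # sparse representation: just the position, 2

# Multi-hot: a car that is both Blue and Black -> sparse representation [2, 5]
print(sorted(index[c] for c in ["Blue", "Black"]))
```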
Categorical features can have outliers. If âcar_colourâ includes rare values such as âMauveâ or âAvocadoâ, you can group them into one out-of-vocabulary (OOV) category. All rare colours go into this single bucket, and the model learns one weight for it.
For high-dimensional categorical features with many categories, one-hot encoding might be inefficient, and embeddings or hashing (also called the hashing trick) are recommended.
Source: Categorical data: Vocabulary and one-hot encoding | Machine Learning | Google for Developers
Categorical data quality hinges on how categories are defined and labelled, impacting data reliability.
Human-labelled data, known as âgold labelsâ, is generally preferred for training due to its higher quality, but it is essential to check for human errors and biases.
Machine-labelled data, or âsilver labelsâ, can introduce biases or inaccuracies, necessitating careful quality checks and awareness of potential common-sense violations.
High dimensionality in categorical data increases training complexity and costs, leading to techniques such as embeddings for dimensionality reduction.
Source: Categorical data: Common issues | Machine Learning | Google for Developers
Feature crosses are created by combining two or more categorical or bucketed features to capture interactions and non-linearities within a dataset.
For example, consider a leaf dataset with two categorical features: leaf edge {Smooth, Toothed, Lobed} and leaf arrangement {Opposite, Alternate}.
The feature cross, or Cartesian product, of these two features would be:
{Smooth_Opposite, Smooth_Alternate, Toothed_Opposite, Toothed_Alternate, Lobed_Opposite, Lobed_Alternate}
For example, if a leaf has a lobed edge and an alternate arrangement, the feature-cross vector will have a value of 1 for âLobed_Alternateâ, and a value of 0 for all other terms:
{0, 0, 0, 0, 0, 1}
This dataset could be used to classify leaves by tree species, since these characteristics do not vary within a species.
Feature crosses are somewhat analogous to polynomial transforms.
Feature crosses can be particularly effective when guided by domain expertise. It is often possible, though computationally expensive, to use neural networks to automatically find and apply useful feature combinations during training.
Overuse of feature crosses with sparse features should be avoided, as it can lead to excessive sparsity in the resulting feature set. For example, if feature A is a 100-element sparse feature and feature B is a 200-element sparse feature, a feature cross of A and B yields a 20,000-element sparse feature.
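A plain-Python sketch of building the leaf feature cross from the two vocabularies above and one-hot encoding a lobed, alternate leaf.

```python
from itertools import product

edges = ["Smooth", "Toothed", "Lobed"]
arrangements = ["Opposite", "Alternate"]

# Cartesian product of the two vocabularies: 3 x 2 = 6 crossed categories.
cross_vocab = [f"{e}_{a}" for e, a in product(edges, arrangements)]
print(cross_vocab)

leaf = ("Lobed", "Alternate")
crossed = [1 if f"{leaf[0]}_{leaf[1]}" == c else 0 for c in cross_vocab]
print(crossed)     # [0, 0, 0, 0, 0, 1]
```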
Source: Categorical data: Feature crosses | Machine Learning | Google for Developers
Source: Datasets, generalization, and overfitting | Machine Learning | Google for Developers
A machine learning model's performance is heavily reliant on the quality and quantity of the dataset it is trained on, with larger, high-quality datasets generally leading to better results.
Datasets can contain various data types, including numerical, categorical, text, multimedia, and embedding vectors, each requiring specific handling for optimal model training.
Common causes of unreliable data in datasets include omitted values, duplicate examples, bad labels, and bad feature values.
Maintaining data quality involves addressing issues such as label errors, noisy features, and proper filtering to ensure the reliability of the dataset for accurate predictions.
Incomplete examples with missing feature values should be handled by either deletion or imputation to avoid negatively impacting model training.
When imputing missing values, use reliable methods such as mean/median imputation and consider adding an indicator column to signal imputed values to the model. For example, alongside temperature include âtemperature_is_imputedâ. This lets the model learn to trust real observations more than imputed ones.
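A pandas sketch (with a made-up `temperature` column) of mean imputation plus an indicator column that tells the model which values were imputed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"temperature": [21.0, 19.5, np.nan, 23.1, np.nan]})

# Flag which rows were imputed, then fill missing values with the column mean.
df["temperature_is_imputed"] = df["temperature"].isna()
df["temperature"] = df["temperature"].fillna(df["temperature"].mean())

print(df)
```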
Source: Datasets: Data characteristics | Machine Learning | Google for Developers
Direct labels are generally preferred but often unavailable.
Use a proxy label when no direct label exists or when the direct concept resists easy numeric representation. Carefully evaluate proxy labels to ensure they are a suitable approximation.
Human-generated labels, while offering flexibility and nuanced understanding, can be expensive to produce and prone to errors, requiring careful quality control.
Models can train on a mix of automated and human-generated labels, but an extra set of human labels often adds complexity without sufficient benefit.
Source: Datasets: Labels | Machine Learning | Google for Developers
Imbalanced datasets occur when one label (majority class) is significantly more frequent than another (minority class), potentially hindering model training on the minority class.
Note: Accuracy is usually a poor metric for assessing a model trained on a class-imbalanced dataset.
A highly imbalanced floral dataset containing far more sunflowers (200) than roses (2):

During training, a model should learn two things: what each class looks like, and how common each class is.
Standard training conflates these two goals. In contrast, a two-step technique of downsampling and upweighting the majority class separates these two goals, enabling the model to achieve both.
Step 1: Downsample the majority class by training on only a small fraction of majority class examples, which makes an imbalanced dataset more balanced during training and increases the chance that each batch contains enough minority examples.
For example, with a class-imbalanced dataset consisting of 99% majority class and 1% minority class examples, we could downsample the majority class by a factor of 25 to create a more balanced training set (80% majority class and 20% minority class).
Downsampling the majority class by a factor of 25:

Step 2: Upweight the downsampled majority class by the same factor used for downsampling, so each majority class error counts proportionally more during training. This corrects the artificial class distribution and bias introduced by downsampling, because the training data no longer reflects real-world frequencies.
Continuing the example from above, we must upweight the majority class by a factor of 25. That is, when the model makes an error on a majority-class example, treat the loss as if it were 25 errors (multiply the regular loss by 25).
Upweighting the majority class by a factor of 25:

Experiment with different downsampling and upweighting factors just as you would experiment with other hyperparameters.
Benefits of this technique include a better model (the resultant model knows what each class looks like and how common each class is) and faster convergence.
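A pandas sketch (made-up binary dataset, not from the course) of the two-step technique: keep one in 25 majority-class examples, then give the kept examples a weight of 25. Example weights like these can usually be passed to a training API's sample-weight argument.

```python
import pandas as pd

# Made-up class-imbalanced dataset: label 0 is the majority class (99% vs 1%).
df = pd.DataFrame({"label": [0] * 9900 + [1] * 100})

factor = 25
majority = df[df["label"] == 0].sample(frac=1 / factor, random_state=0)  # downsample
minority = df[df["label"] == 1]

train = pd.concat([majority, minority])
train["weight"] = train["label"].map({0: factor, 1: 1})   # upweight the downsampled class

print(train["label"].value_counts(normalize=True).round(2))  # roughly 0.8 / 0.2
```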
Source: Datasets: Class-imbalanced datasets | Machine Learning | Google for Developers
Machine learning models should be tested against unseen data.
It is recommended to split the dataset into three subsets: training, validation, and test sets.

The validation set is used for initial testing during training (to determine hyperparameter tweaks, add, remove, or transform features, and so on), and the test set is used for final evaluation.

The validation and test sets can âwear outâ with repeated use. For this reason, it is a good idea to collect more data to ârefreshâ the test and validation sets.
A good test set is large enough to yield statistically significant results, representative of the dataset as a whole (and of the data the model will see in the real world), and free of examples duplicated in the training set.
In theory, the validation set and test set should contain the same number of examples, or nearly so.
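A scikit-learn sketch of splitting a made-up dataset into training, validation, and test sets (roughly 70/15/15 here) by calling `train_test_split` twice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)   # made-up features
y = np.arange(1000)                  # made-up labels

# First carve off the test set, then split the remainder into train and validation.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.15, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.15 / 0.85, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # roughly 700, 150, 150
```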
Source: Datasets: Dividing the original dataset | Machine Learning | Google for Developers
Machine learning models require all data, including features such as street names, to be transformed into numerical (floating-point) representations for training.
Normalisation improves model training by converting existing floating-point features to a constrained range.
When dealing with large datasets, select a subset of examples for training. When possible, select the subset that is most relevant to your model's predictions. Safeguard privacy by omitting examples containing personally identifiable information.
Source: Datasets: Transforming data | Machine Learning | Google for Developers
Generalisation refers to a model's ability to perform well on new, unseen data.
Source: Generalization | Machine Learning | Google for Developers
Overfitting means creating a model that matches the training set so closely that the model fails to make correct predictions on new data.
Generalisation is the opposite of overfitting. That is, a model that generalises well makes good predictions on new data.
An overfit model is analogous to an invention that performs well in the lab but is worthless in the real world. An underfit model is like a product that does not even do well in the lab.
Overfitting can be detected by observing diverging loss curves for the training and validation sets on a generalisation curve (a graph that shows two or more loss curves). A generalisation curve for a well-fit model shows two loss curves with similar shapes.
Common causes of overfitting include a training set that does not adequately represent real-world data and a model that is too complex.
Dataset conditions for good generalisation include: examples are independently and identically distributed; the dataset is stationary (does not change much over time); and all partitions (training, validation, test) are drawn from the same distribution.
Source: Overfitting | Machine Learning | Google for Developers
Simpler models often generalise better to new data than complex models, even if they perform slightly worse on training data.
Occam's Razor favours simpler explanations and models.
Model training should minimise both loss and complexity for optimal performance on new data. $$ \text{minimise}(\text{loss + complexity}) $$
Unfortunately, loss and complexity are typically inversely related. As complexity increases, loss decreases. As complexity decreases, loss increases.
Regularisation techniques help prevent overfitting by penalising model complexity during training.
Source: Overfitting: Model complexity | Machine Learning | Google for Developers
L2 regularisation is a popular regularisation metric to reduce model complexity and prevent overfitting. It uses the following formula: $$ L_2 \text{ regularisation} = w^2_1 + w^2_2 + \ldots + w^2_n $$
It penalises especially large weights.
L2 regularisation encourages weights towards 0, but never pushes them all the way to zero.
A regularisation rate (lambda) controls the strength of regularisation. $$ \text{minimise}(\text{loss} + \lambda \text{ complexity}) $$
Tuning is required to find the ideal regularisation rate.
Early stopping is an alternative regularisation method that involves ending training before the model fully converges to prevent overfitting. It usually increases training loss but decreases test loss. It is a quick but rarely optimal form of regularisation.
Learning rate and regularisation rate tend to pull weights in opposite directions. A high learning rate often pulls weights away from zero, while a high regularisation rate pulls weights towards zero. The goal is to find the equilibrium.
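A NumPy sketch of the regularised objective described above: the data loss plus lambda times the L2 penalty (the sum of squared weights). The weights and loss value below are made up.

```python
import numpy as np

def l2_penalty(weights):
    return np.sum(weights ** 2)

def regularised_loss(data_loss, weights, lam):
    # minimise(loss + lambda * complexity)
    return data_loss + lam * l2_penalty(weights)

w = np.array([0.2, -1.5, 3.0, 0.0])
print(l2_penalty(w))                       # 0.04 + 2.25 + 9.0 = 11.29
print(regularised_loss(0.8, w, lam=0.1))   # 0.8 + 0.1 * 11.29 = 1.929
```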
Source: Overfitting: L2 regularization | Machine Learning | Google for Developers
An ideal loss curve drops steeply at first and then flattens out as the model converges:

To improve an oscillating loss curve: reduce the learning rate, and check the training data for bad or anomalous examples.

Possible reasons for a loss curve with a sharp jump include anomalous input data, such as a batch containing extreme outliers or NaN values.

Test loss diverges from training loss when the model starts overfitting the training set; reducing model complexity or adding regularisation helps.

The loss curve gets stuck (plateaus) when the training data contains long runs of repetitive or poorly shuffled examples; shuffling the training data usually helps.

Source: Overfitting: Interpreting loss curves | Machine Learning | Google for Developers
from Stefan Angrick
This post is part of a four-part summary of Google's Machine Learning Crash Course. For context, check out this post. This first module covers the fundamentals of building regression and classification models.
The linear regression model uses an equation $$ y' = b + w_1x_1 + w_2x_2 + \ldots $$ to represent the relationship between features and the label.
The label y and the features x are given; the bias b and the weights w are learned during training by minimising the difference between predicted and actual values.
Source: Linear regression | Machine Learning | Google for Developers
Loss is a numerical value indicating the difference between a model's predictions and the actual values.
The goal of model training is to minimise loss, bringing it as close to zero as possible.
| Loss type | Definition | Equation |
|---|---|---|
| L1 loss | The sum of the absolute values of the difference between the predicted values and the actual values. | $$\sum |\text{actual value}-\text{predicted value}|$$ |
| Mean absolute error (MAE) | The average of L1 losses across a set of N examples. | $$\frac{1}{N}\sum |\text{actual value}-\text{predicted value}|$$ |
| L2 loss | The sum of the squared difference between the predicted values and the actual values. | $$\sum (\text{actual value}-\text{predicted value})^2$$ |
| Mean squared error (MSE) | The average of L2 losses across a set of N examples. | $$\frac{1}{N}\sum (\text{actual value}-\text{predicted value})^2$$ |
The most common methods for calculating loss are Mean Absolute Error (MAE) and Mean Squared Error (MSE), which differ in their sensitivity to outliers.
A model trained with MSE is pulled closer to the outliers, at the cost of sitting further from most of the other data points.

A model trained with MAE is farther from the outliers but closer to most of the other data points.
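A NumPy sketch (made-up numbers) computing MAE and MSE on predictions that include one large outlier, showing how much more heavily MSE penalises it.

```python
import numpy as np

actual    = np.array([10.0, 12.0, 11.0, 50.0])   # the last value is an outlier
predicted = np.array([10.5, 11.5, 11.0, 15.0])

errors = actual - predicted
mae = np.mean(np.abs(errors))      # (0.5 + 0.5 + 0 + 35) / 4 = 9.0
mse = np.mean(errors ** 2)         # (0.25 + 0.25 + 0 + 1225) / 4 = 306.375
print(mae, mse)
```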

Source: Linear regression: Loss | Machine Learning | Google for Developers
Gradient descent is an iterative optimisation algorithm used to find the best weights and bias for a linear regression model by minimising the loss function.
A model is considered to have converged when further iterations do not significantly reduce the loss, indicating it has found the weights and bias that produce the lowest possible loss.
Loss curves visually represent the model's progress during training, showing how the loss decreases over iterations and helping to identify convergence.
Linear models have convex loss functions, ensuring that gradient descent will always find the global minimum, resulting in the best possible model for the given data.
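A NumPy sketch (not from the course) of gradient descent for a one-feature linear regression with MSE loss; the made-up data follows y = 2x + 1 plus noise, and the loop recovers roughly those values.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)   # true relationship: y = 2x + 1

w, b = 0.0, 0.0
learning_rate = 0.01

for _ in range(2000):
    y_pred = w * x + b
    error = y_pred - y
    # Gradients of MSE with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))   # close to 2.0 and 1.0
```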
Source: Linear regression: Gradient descent | Google for Developers
Hyperparameters, such as learning rate, batch size, and epochs, are external configurations that influence the training process of a machine learning model.
The learning rate determines the step size during gradient descent, impacting the speed and stability of convergence.
Batch size dictates the number of training examples processed before updating model parameters, influencing training speed and noise.
Model trained with SGD:

Model trained with mini-batch SGD:

Epochs represent the number of times the entire training dataset is used during training, affecting model performance and training time.
Source: Linear regression: Hyperparameters | Machine Learning | Google for Developers
Logistic regression is a model used to predict the probability of an outcome, unlike linear regression which predicts continuous numerical values.
Logistic regression models output probabilities, which can be used directly or converted to binary categories.
Source: Logistic Regression | Machine Learning | Google for Developers
A logistic regression model uses a linear equation and the sigmoid function to calculate the probability of an event.
The sigmoid function ensures the output of logistic regression is always between 0 and 1, representing a probability.
$$
f(x) = \frac{1}{1 + e^{-x}}
$$

Linear component of a logistic regression model: $$ z = b + w_1 x_1 + w_2 x_2 + \ldots + w_N x_N $$ To obtain the logistic regression prediction, the z value is then passed to the sigmoid function, yielding a value (a probability) between 0 and 1: $$ y' = \frac{1}{1+e^{-z}} $$
z is referred to as the log-odds because if you solve the sigmoid function for z you get: $$ z = \log\left(\frac{y}{1-y}\right) $$ This is the log of the ratio of the probabilities of the two possible outcomes: y and 1 - y.
When the linear equation becomes input to the sigmoid function, it bends the straight line into an s-shape.

Logistic regression models are trained similarly to linear regression models but use Log Loss instead of squared loss and require regularisation.
Log Loss is used in logistic regression because the sigmoid's rate of change is not constant and its output approaches 0 and 1 only asymptotically; squared loss would need ever-finer precision to capture errors in those ranges, whereas Log Loss handles them appropriately.
The Log Loss equation returns the logarithm of the magnitude of the change, rather than just the distance from data to prediction. Log Loss is calculated as follows: $$ \text{Log Loss} = \sum_{(x,y)\in D} -y\log(y') - (1-y)\log(1-y') $$
Regularisation, such as L2 regularisation or early stopping, is crucial in logistic regression to prevent overfitting (due to the model's asymptotic nature) and improve generalisation.
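A NumPy sketch of the prediction path described above (linear part z, then the sigmoid) and of Log Loss, using made-up weights, features, and labels.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1 - eps)     # avoid log(0)
    return np.sum(-y * np.log(y_pred) - (1 - y) * np.log(1 - y_pred))

# Linear part z = b + w1*x1 + w2*x2 for made-up weights and features.
b, w = -1.0, np.array([0.8, 0.5])
x = np.array([[1.0, 2.0], [3.0, 0.5], [0.2, 0.1]])
z = b + x @ w
y_pred = sigmoid(z)            # probabilities between 0 and 1
y_true = np.array([1, 1, 0])

print(np.round(y_pred, 3), round(log_loss(y_true, y_pred), 3))
```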
Source: Logistic regression: Loss and regularization | Machine Learning | Google for Developers
Logistic regression models can be converted into binary classification models for predicting categories instead of probabilities.
Source: Classification | Machine Learning | Google for Developers
To convert the raw output from a logistic regression model into binary classification (positive and negative class), you need a classification threshold.
Confusion matrix
|  | Actual positive | Actual negative |
|---|---|---|
| Predicted positive | True positive (TP) | False positive (FP) |
| Predicted negative | False negative (FN) | True negative (TN) |
Total of each row = all predicted positives (TP + FP) and all predicted negatives (FN + TN).
Total of each column = all actual positives (TP + FN) and all actual negatives (FP + TN).
When we increase the classification threshold, both TP and FP decrease, and both TN and FN increase.
Source: Thresholds and the confusion matrix | Machine Learning | Google for Developers
Accuracy, Recall, Precision, and related metrics are all calculated at a single classification threshold value.
Accuracy is the proportion of all classifications that were correct. $$ \text{Accuracy} = \frac{\text{correct classifications}}{\text{total classifications}} = \frac{TP+TN}{TP+TN+FP+FN} $$
Recall, or true positive rate, is the proportion of all actual positives that were classified correctly as positives. Also known as probability of detection. $$ \text{Recall (or TPR)} = \frac{\text{correctly classified actual positives}}{\text{all actual positives}} = \frac{TP}{TP+FN} $$
False positive rate is the proportion of all actual negatives that were classified incorrectly as positives. Also known as probability of a false alarm. $$ \text{FPR} = \frac{\text{incorrectly classified actual negatives}}{\text{all actual negatives}}=\frac{FP}{FP+TN} $$
Precision is the proportion of all the model's positive classifications that are actually positive. $$ \text{Precision} = \frac{\text{correctly classified actual positives}}{\text{everything classified as positive}}=\frac{TP}{TP+FP} $$
Precision and Recall often show an inverse relationship.
F1 score is the harmonic mean of Precision and Recall. $$ \text{F1} = 2 * \frac{\text{precision} * \text{recall}}{\text{precision} + \text{recall}} = \frac{2TP}{2TP + FP + FN} $$
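A small sketch computing the metrics above from raw confusion-matrix counts (made-up numbers).

```python
tp, fp, fn, tn = 30, 10, 20, 940   # made-up confusion-matrix counts

accuracy  = (tp + tn) / (tp + tn + fp + fn)
recall    = tp / (tp + fn)               # true positive rate
fpr       = fp / (fp + tn)               # false positive rate
precision = tp / (tp + fp)
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, recall, precision, round(f1, 3))   # 0.97, 0.6, 0.75, 0.667
```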
ROC and AUC evaluate a model's quality across all possible thresholds.
The ROC (receiver operating characteristic) curve plots the true positive rate (TPR) against the false positive rate (FPR) at different thresholds. A perfect model's curve passes through (0, 1), while a random guesser's forms a diagonal line from (0, 0) to (1, 1).
AUC, or area under the curve, represents the probability that the model will rank a randomly chosen positive example higher than a negative example. A perfect model has AUC = 1.0, while a random model has AUC = 0.5.
ROC and AUC of a hypothetical perfect model (AUC = 1.0) and for completely random guesses (AUC = 0.5):


ROC and AUC are effective when class distributions are balanced. For imbalanced data, precision-recall curves (PRCs) can be more informative.

A higher AUC generally indicates a better-performing model.
ROC and AUC of two hypothetical models; the first curve (AUC = 0.65) represents the better of the two models:

Threshold choice depends on the cost of false positives versus false negatives. The most relevant thresholds are those closest to (0,1) on the ROC curve. For costly false positives, a conservative threshold (like A in the chart below) is better. For costly false negatives, a more sensitive threshold (like C) is preferable. If costs are roughly equivalent, a threshold in the middle (like B) may be best.
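A scikit-learn sketch of computing a ROC curve and AUC from made-up labels and scores.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true   = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_scores = np.array([0.1, 0.2, 0.3, 0.35, 0.4, 0.7, 0.8, 0.45, 0.9, 0.05])

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(roc_auc_score(y_true, y_scores))   # 1.0 would be perfect, 0.5 random guessing
```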

Source: Classification: ROC and AUC | Machine Learning | Google for Developers
Prediction bias measures the difference between the average of a model's predictions and the average of the true labels in the data. For example, if 5% of emails in the dataset are spam, a model without prediction bias should also predict about 5% as spam. A large mismatch between these averages indicates potential problems.
Prediction bias can be caused by biases or noise in the training data, overly strong regularisation, or bugs in the training pipeline.
Source: Classification: Prediction bias | Machine Learning | Google for Developers
Multi-class classification extends binary classification to cases with more than two classes.
If each example belongs to only one class, the problem can be broken down into a series of binary classifications. For instance, with three classes (A, B, C), you could first separate C from A+B, then distinguish A from B within the A+B group.
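A plain-Python sketch of the decomposition described above: relabel a made-up three-class problem into two binary problems.

```python
labels = ["A", "B", "C", "A", "C", "B", "A"]

# Step 1: binary problem "C versus not-C".
step1 = [1 if label == "C" else 0 for label in labels]

# Step 2: among the non-C examples, binary problem "A versus B".
step2 = [1 if label == "A" else 0 for label in labels if label != "C"]

print(step1)   # [0, 0, 1, 0, 1, 0, 0]
print(step2)   # [1, 0, 1, 0, 1]
```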
Source: Classification: Multi-class classification | Machine Learning | Google for Developers
from
Bloc de notas
I don't know if you remember, or if you're moving so fast that by now you don't care what happened / what's gone is gone, and so you really did learn something from me / a little about how to live