AI and Code Quality: Faster, more churn, more duplication, less refactoring
Plus UAVs and the Future of War
Heya futurists!
I think I'm doing a better job of picking up the tempo and getting these messages out faster. The pace of developments, particularly in AI, continues to be fast. Let's jump in.
AI
1. New research on AI and code quality. The people who did the study last year showing AI increases "churn" did a follow-up study this year.
If you were thinking, maybe AI models getting smarter and smarter means that AI-generated code is getting better and needs to be fixed less, well...
In this study, the researchers analyzed 211 million lines of code from more than 100 open-source repositories on GitHub. Code "churn" -- defined as changing a line of code within 2 weeks of adding it -- increased from 3.31% in 2021 (the year before ChatGPT came out) to 4.50% in 2022, 5.67% in 2023, and 6.87% in 2024.
If a "churned" line is changed yet again, it is not counted twice -- the "churn" number counts only the first change to a line of code within 2 weeks of it being added.
The theory is that the higher the "churn" rate, the more likely the added code was bad to start with. The churn rate appears to be rising as more and more developers use AI to generate more and more code, rather than falling as AI models get better, as you might have thought (or hoped).
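To make the metric concrete, here is a toy churn estimator of my own -- a rough approximation of the idea as described, not GitClear's actual methodology. It counts a line as "churned" when a commit deletes or rewrites a line that, per git blame at the parent commit, was written less than 14 days earlier:

    import re
    import subprocess
    from datetime import datetime, timedelta, timezone

    REPO = "."                      # path to the repository to analyze
    WINDOW = timedelta(days=14)     # the study's 2-week churn window

    def git(*args):
        return subprocess.run(["git", "-C", REPO, *args],
                              capture_output=True, text=True, check=True).stdout

    def line_written_at(commit, path, lineno):
        # When was this line of `path`, as of `commit`, last written?
        out = git("blame", "-L", f"{lineno},{lineno}", "--line-porcelain",
                  commit, "--", path)
        epoch = int(re.search(r"^author-time (\d+)", out, re.M).group(1))
        return datetime.fromtimestamp(epoch, tz=timezone.utc)

    churned = changed = 0
    for sha in git("rev-list", "--no-merges", "HEAD").split()[:200]:
        when = datetime.fromtimestamp(int(git("show", "-s", "--format=%ct", sha)),
                                      tz=timezone.utc)
        path = None
        # --unified=0 so every hunk header gives exact changed-line ranges
        for line in git("show", "--format=", "--unified=0", sha).splitlines():
            if line.startswith("--- a/"):
                path = line[6:]
            elif line.startswith("@@") and path:
                # "@@ -start,count +start,count @@"; the "-" side is the parent
                m = re.match(r"@@ -(\d+)(?:,(\d+))?", line)
                start, count = int(m.group(1)), int(m.group(2) or "1")
                for n in range(start, start + count):
                    changed += 1
                    try:
                        if when - line_written_at(sha + "^", path, n) < WINDOW:
                            churned += 1
                    except subprocess.CalledProcessError:
                        pass    # e.g., the first commit has no parent
    if changed:
        print(f"churn: {churned / changed:.2%} of changed lines")

(Note this only looks at lines deleted or rewritten, so the denominator is narrower than GitClear's "changed lines"; it's meant to show the mechanics, not reproduce their numbers.)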
In addition to the churn rate, this study measured lines added, lines deleted, lines updated, lines moved, lines "copy/pasted", and lines "find/replaced". I put "copy/pasted" in quotes because what is likely happening is that the AI system is generating the same (or very similar) lines of code over and over. The researchers developed a system to detect the same exact sequence of 5 or more lines of code appearing over and over. "Find/replaced" is defined as "a pattern of code change where the same string is removed from 3+ locations and substituted with consistent replacement content."
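The paper doesn't publish its detector, but the core idea is simple enough to sketch. Here is a minimal version of my own (the 5-line window follows the description above; the *.py glob is just for illustration): hash every sliding window of 5 lines and report any window that shows up more than once:

    import hashlib
    from collections import defaultdict
    from pathlib import Path

    WINDOW = 5                 # the paper's threshold: 5+ identical lines
    seen = defaultdict(list)   # hash of a 5-line block -> [(file, line), ...]

    for path in Path(".").rglob("*.py"):
        lines = [l.strip() for l in path.read_text(errors="ignore").splitlines()]
        for i in range(len(lines) - WINDOW + 1):
            block = "\n".join(lines[i:i + WINDOW])
            if block.strip():  # ignore windows that are entirely blank
                digest = hashlib.sha1(block.encode()).hexdigest()
                seen[digest].append((str(path), i + 1))

    for locations in seen.values():
        if len(locations) > 1:
            print("duplicated 5-line block at:", locations)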
Lines added went from 40.86% (of changed lines, not of total lines in a codebase) in 2021 to 49.32% in 2024. Deleted went from 21.17% to 23.00%. Updated went from 5.57% to 6.33%. Moved went from 15.88% to 3.10%. Copy/pasted went from 10.64% to 13.90%. Find/replaced went from 4.11% to 4.35%.
Most striking in all this is that "moved" went down so much, while "added" increased quite a bit and "copy/pasted" went up, too.
What this means is that developers are doing dramatically less restructuring of their codebases, which in the industry is called "refactoring". Developers are doing more and more code adding, and less and less refactoring.
The researchers discuss "the scourge of duplicated code blocks."
"'So what?' an executive may wonder. So long as the code works, does it really matter that it might be repeated? Both first principles and prior research suggests that it does matter."
"From first principles, it is evident that repeated blocks impose the burden of deciding: should a cloned block change be propagated to its matching siblings? When a conscientious developer deems it prudent to propagate their change, their scope of concern is multiplied by the number of duplicated blocks."
"Instead of focusing on the specific Jira they had been assigned, the developer must now tack-on the effort of understanding every system in the repo that duplicates the code block they are changing."
By "specific Jira", I think they mean specific issue in Jira, a bug tracking, issue tracking, and general agile project management software product from Atlassian. It's probably the one the researchers themselves use, which is why they got into the habit of saying "specific Jira" while forgetting when writing this paper that there are lots of bug tracking systems and not everyone uses Jira (I don't). Let's continue...
"Likewise for every developer who reviews the pull request that changes code from disparate domains. They, too, must acquaint themselves with every context where a duplicated block was changed, to render opinion on whether the change in context seems prone to defect."
"Next, each changed code domain must be tested. Picking the method to test these 'prerequisite sub-tasks' saps the mental stamina of the developer. And because the impromptu chore of 'updating duplicate blocks' is rarely-if-ever budgeted into the team's original Story Point estimate, duplication is a significant obstacle to keeping on schedule."
"Story Point" is more Jira-specific terminology. The idea is you estimate the time it takes to implement a change (new feature or bug fix) to software in "story point" units instead of calendar time, because if the developer has to do tasks unrelated to the software change (emails, meetings, etc), then the calendar time won't be meaningful. For non-Jira people, the point being made here is that a developer is budgeted a certain amount of time to implement a change, and the unexpected discovery that the code they are changing is duplicated all over the place causes the time budget to likewise get unexpectedly exceeded.
"As a developer falls further behind their target completion time, their morale is prone to drop further. From the developer's perspective, this task that had seemed so simple, has now ballooned into systems where they might have little-or-no-familiarity. All this burden, without any observable benefit to teammates or executives."
"'Exploring the Impact of Code Clones on Deep Learning Software' is a 2023 paper by Ran Mo, Yao Zhang et. al that seeks to quantify the impact of code clones in software that utilizes deep learning. With regard to the prevalence of duplicated code, the researchers find that '[Deep Learning projects exhibit] about 16.3% of code fragments encounter clones, which is almost twice as large as traditional projects.' By analyzing 3,113 pairs of co-changed code lines, the researchers observe that 57.1% of all co-changed clones are involved in bugs."
"The Citations offer a multitude of research that has shown clones in traditional software systems prone to cause bugs."
So, the bottom line is: AI enables developers to produce new code substantially faster at the cost of long-term maintainability. Code is added ~20% faster, but "churn" is ~50% higher, code duplication is ~30% higher, and refactoring is ~80% lower.
Bill Harding, the lead researcher, is interviewed in a video. You have to fork over your email address to his company (GitClear) to get the actual paper. (Which I did. I have a copy of it -- that's where I got the numbers above.)
"This was the first year that we switched over from having more moved and refactored code to more copy pasted code. 2024 was I guess a dubious milestone in that regard, and that seems very consequential for teams that are going to have a long-lived repo."
https://www.gitclear.com/ai_assistant_code_quality_2025_research
2. Click-through rates on Google go way down when AI Overviews are shown -- but go up when a particular brand is mentioned in the AI Overview and that brand has a link (paid or not). Not only that, but paid click-through rates are declining over time with or without AI Overviews. The unpaid "organic" links, however, haven't seen a decline, and may actually get more clicks when there is no AI Overview.
https://searchengineland.com/google-organic-paid-ctr-down-451619
3. "DeepClaude: Harness the power of DeepSeek R1's reasoning and Claude's creativity and code generation capabilities with a unified API and chat interface."
Weird. How does one combine two LLMs like this?
"Why R1 + Claude?"
"DeepSeek R1's CoT trace demonstrates deep reasoning to the point of an LLM experiencing 'metacognition' - correcting itself, thinking about edge cases, and performing quasi Monte Carlo Tree Search in natural language."
"However, R1 lacks in code generation, creativity, and conversational skills. Claude 3.5 Sonnet excels in these areas, making it the perfect complement. DeepClaude combines both models to provide: R1's exceptional reasoning and problem-solving capabilities, Claude's superior code generation and creativity, fast streaming responses in a single API call, and complete control with your own API keys."
4. "Agent Mode" for GitHub Copilot.
The video demos a project with a website that has a page listing races, and the presenter asks GitHub Copilot to add the ability to search for a race by name. To do this, it has to change service code, server-side code, UI code, and test code. "Agent Mode" reasons, and iterates on its reasoning, to perform the task. Once the code is updated, Copilot prompts the user to re-run the tests. They fail, because they don't cover the new functionality -- which Copilot detects, so it updates the tests. The process repeats until all tests pass. (A toy sketch of this loop follows.)
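To be clear, I have no idea how Copilot implements this internally, but the generate-test-iterate loop itself is easy to picture. A toy sketch, with the LLM call stubbed out as a hypothetical placeholder:

    import subprocess

    def run_tests():
        # Run the project's test suite; return (passed?, output for the LLM).
        p = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return p.returncode == 0, p.stdout + p.stderr

    def propose_edit(task, feedback):
        # Hypothetical placeholder for the LLM call that rewrites the files,
        # given the task description and the latest test output.
        pass

    task = "add the ability to search for a race by name"
    feedback = ""
    for attempt in range(1, 6):       # bound the number of iterations
        propose_edit(task, feedback)
        passed, feedback = run_tests()
        if passed:
            print(f"all tests pass after {attempt} attempt(s)")
            break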
Copilot is subsequently asked to associate races with fundraisers. This time Copilot is given a prompt file (in markdown format) that tells it everything it is supposed to do.
For those of you who would rather read than watch a video, go to:
https://github.blog/news-insights/product-news/github-copilot-the-agent-awakens/
5. DeepMind claims that AlphaGeometry 2, together with AlphaProof, has "solved four out of six problems from this year's International Mathematical Olympiad (IMO), achieving the same level as a silver medalist in the competition for the first time."
https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/
6. "Ex-Google, Apple engineers launch unconditionally open source Oumi AI platform that could help to build the next DeepSeek."
"What neither DeepSeek nor Llama enables, however, is full unconditional access to all the model code, including weights as well as training data. Without all that information, developers can still work with the open model but they don't have all the necessary tools and insights to understand how it really works and more importantly how to build an entirely new model. That's a challenge that a new startup led by former Google and Apple AI veterans aims to solve."
"Launching today, Oumi is backed by an alliance of 13 leading research universities including Princeton, Stanford, MIT, UC Berkeley, University of Oxford, University of Cambridge, University of Waterloo and Carnegie Mellon. Oumi's founders raised $10 million, a modest seed round they say meets their needs. While major players like OpenAI contemplate $500 billion investments in massive data centers through projects like Stargate, Oumi is taking a radically different approach. The platform provides researchers and developers with a complete toolkit for building, evaluating and deploying foundation models."
The $10 million makes me wonder if this has a chance of working. But, let's continue. (Lots of quotes follow.)
"Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models -- from data preparation and training to evaluation and deployment. Whether you're developing on a laptop, launching large scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need."
"With Oumi, you can: Train and fine-tune models from 10M to 405B parameters using state-of-the-art techniques (SFT, LoRA, QLoRA, DPO, and more), work with both text and multimodal models (Llama, DeepSeek, Qwen, Phi, and others), synthesize and curate training data with LLM judges, deploy models efficiently with popular inference engines (vLLM, SGLang), evaluate models comprehensively across standard benchmarks, run anywhere - from laptops to clusters to clouds (AWS, Azure, GCP, Lambda, and more), and integrate with both open models and commercial APIs (OpenAI, Anthropic, Vertex AI, Together, Parasail, ...).
"All with one consistent API, production-grade reliability, and all the flexibility you need for research."
"Here are some of the key features that make Oumi stand out:"
"Zero Boilerplate: Get started in minutes with ready-to-use recipes for popular models and workflows. No need to write training loops or data pipelines,"
"Enterprise-Grade: Built and validated by teams training models at scale,"
"Research Ready: Perfect for ML research with easily reproducible experiments, and flexible interfaces for customizing each component,"
"Broad Model Support: Works with most popular model architectures - from tiny models to the largest ones, text-only to multimodal,"
"State-Of-The-Art (SOTA) Performance: Native support for distributed training techniques (FSDP, DDP) and optimized inference engines (vLLM, SGLang),"
"Community First: 100% open source with an active community. No vendor lock-in, no strings attached."
Start here:
https://oumi.ai/docs/en/latest/index.html
7. "Google's updated, public AI ethics policy removes its promise that it won't use the technology to pursue applications for weapons and surveillance."
https://www.cnn.com/2025/02/04/business/google-ai-weapons-surveillance/index.html
Google's responsible AI policy is here:
https://ai.google/responsibility/principles/
8. YCGPT: Your Y Combinator Advisor. For those of you who are starting startups and thinking of applying to YC.
https://www.buildthatidea.com/ycgpt
9. LLPlayer is a media player for language learning, with AI-generated subtitles (powered by OpenAI Whisper), dual subtitles (two languages shown simultaneously), real-time OCR and translation (powered by Google Translate and DeepL), and word lookup (click on any word in a subtitle).
I'll try this when I get some time. If you beat me to it, let me know how it goes.
LLPlayer is unrelated to LL Cool J (as far as I know).
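If you'd rather poke at the subtitle-generation piece by itself, the openai-whisper package (which is presumably what "powered by OpenAI Whisper" refers to) transcribes a media file in a few lines. The file name here is a placeholder:

    import whisper  # pip install openai-whisper (also requires ffmpeg)

    model = whisper.load_model("base")        # tiny/base/small/medium/large
    result = model.transcribe("lesson.mp4")   # audio is extracted via ffmpeg
    for seg in result["segments"]:
        print(f"[{seg['start']:7.2f} -> {seg['end']:7.2f}] {seg['text'].strip()}")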
10. Find stuff in aerial images.
GeoDeep is "a fast, easy to use, lightweight Python library for AI object detection and semantic segmentation in geospatial rasters (GeoTIFFs), with pre-built models included."
See the example images of where it puts bounding boxes on cars, swimming pools, and tennis courts, and draws an outline that follows roads. It has a "detect" system that outputs class labels with confidence scores (class labels like "small vehicle" or "tennis court", confidence scores like 0.838), and a "segment" system that gives you outlines of roads and buildings.
https://github.com/uav4geo/GeoDeep
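Going by the README, usage looks roughly like this -- the GeoTIFF filename is a placeholder, and you should check the repo for the current API:

    from geodeep import detect  # pip install geodeep

    # Run the bundled car-detection model on a georeferenced orthophoto.
    bboxes, scores, classes = detect("orthophoto.tif", "cars")
    for box, score, cls in zip(bboxes, scores, classes):
        print(cls, f"confidence {score:.3f}", box)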
Cybersecurity
11. Physical unclonable function (PUF) technology. This article is from last November, but I only just saw it, and I'm sharing it anyway because I only just now discovered this technology exists.
In cryptography, you can do something called "challenge-response": you generate some random input (the challenge), and the device combines it with a secret key using some algorithm to generate an output (the response), which you can check to verify the device is authentic. This relies on the challenger and the hardware device both having the shared secret, while attackers do not.
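Here's a bare-bones sketch of the idea using an HMAC, which is one common way to build a challenge-response (real devices vary):

    import hashlib
    import hmac
    import secrets

    SHARED_KEY = b"secret key provisioned into the device"

    def device_respond(challenge):
        # The device mixes the challenge with its secret key.
        return hmac.new(SHARED_KEY, challenge, hashlib.sha256).digest()

    # Verifier side: issue a fresh random challenge, check the response.
    challenge = secrets.token_bytes(16)
    response = device_respond(challenge)   # would normally cross the wire
    expected = hmac.new(SHARED_KEY, challenge, hashlib.sha256).digest()
    print("authentic" if hmac.compare_digest(response, expected) else "reject")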
If an attacker gets their hands on the physical device, though, they can copy it. If the key is stored in ROM (read-only memory), the attacker can simply read it out of the memory and then make unlimited copies of the device by assembling the same components with a ROM containing the same key.
What physical unclonable function (PUF) technology does is render this practically impossible. PUFs exploit random physical variations introduced during semiconductor manufacturing that are unpredictable and uncontrollable. As such, it is infeasible to manufacture a copy, even for an attacker with access to the same semiconductor manufacturing equipment as the original manufacturer.
"Due to deep submicron manufacturing process variations, every transistor in an IC has slightly different physical properties. These variations lead to small but measurable differences in electronic properties, such as transistor threshold voltages and gain factor. Since these process variations are not fully controllable during manufacturing, these physical device properties cannot be copied or cloned."
"By utilizing these inherent variations, PUFs are very valuable for use as a unique identifier for any given IC. They do this through circuitry within the IC that converts the tiny variations into a digital pattern of 0s and 1s, which is unique for that specific chip and is repeatable over time. This pattern is a 'silicon fingerprint,' comparable to its human biometric counterpart."
The article is about a particular type of PUF called an SRAM PUF. Synopsys, a company that makes software used for designing chips, has a circuit block you can combine with an SRAM design before manufacturing (at the "intellectual property" -- IP -- stage) to get the chip to generate a "root key" when the device is first started up.
https://www.eetimes.eu/synopsys-pufs-create-cryptographic-keys-that-never-get-stored/
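To get some intuition for how noisy power-up values can become a stable key, here's a toy simulation -- purely illustrative, nothing like Synopsys's actual IP, which uses proper error-correcting "helper data" rather than the simple majority voting below:

    import hashlib
    import random

    CELLS = 256
    # Manufacturing variation: each SRAM cell gets a fixed random tendency
    # to power up as 1. This is the part no attacker can reproduce.
    bias = [random.random() for _ in range(CELLS)]

    def power_up():
        # Each readout is the cell's tendency plus a little thermal noise.
        return [1 if b + random.gauss(0, 0.05) > 0.5 else 0 for b in bias]

    # Majority-vote several readouts to cancel the noise...
    readouts = [power_up() for _ in range(15)]
    fingerprint = bytes(
        1 if sum(r[i] for r in readouts) > len(readouts) // 2 else 0
        for i in range(CELLS)
    )
    # ...then hash the "silicon fingerprint" into a root key.
    print("root key:", hashlib.sha256(fingerprint).hexdigest())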
Demographics
12. There's a city in China you've never heard of that has 36 million people -- the largest in the world. It's not Shanghai or Beijing, it's Chongqing... or so this YouTuber (Drew Binsky) says.
Well, citypopulation.de says Chongqing is number 43, right after... eh, do I really have to list out 42 cities? It would be much simpler if it was just a handful. Oh, well, let's do this. Chongqing is at number 43, at 10.9 million, right after: Changsha (11.0 million), Hyderabad (11.4 million), Tianjin (11.5 million), Paris (11.5 million), Lima (11.8 million), Wuhan (12.2 million), Rio de Janeiro (12.5 million), Chennai (12.6 million), Xi'an (12.8 million), Ho Chi Minh City (Saigon) (13.9 million), Hangzhou (13.9 million), Bengaluru (Bangalore) (14.2 million), Lahore (14.5 million), Johannesburg (14.6 million), Xiamen (14.9 million), London (14.9 million), Kinshasa (15.6 million), Istanbul (15.9 million), Tehran (16.5 million), Buenos Aires (16.7 million), Los Angeles (17.2 million), Chengdu (17.3 million), Osaka (17.7 million), Kolkata (Calcutta) (17.7 million), Moskva (Moscow) (19.1 million), Lagos (20.7 million), Karachi (20.9 million), Krung Thep (Bangkok) (21.2 million), Beijing (21.2 million), New York (22.0 million), São Paulo (22.1 million), Dhaka (22.5 million), Al-Qahirah (Cairo) (22.5 million), Seoul (25.1 million), Ciudad de México (Mexico City) (25.1 million), Mumbai (27.1 million), Manila (27.2 million), Jakarta (29.2 million), Delhi (34.6 million), Shanghai (40.8 million), Tokyo (41.0 million), and (drum roll please...) Guangzhou (70.1 million).
If you've never heard of Guangzhou, you probably have heard of Shenzhen. Well, what happened is a bunch of cities in the Northern Pearl River Delta in China grew to the point where they basically all merged together into a single metropolitan area. These include Shenzhen, Dongguan, Foshan, Huizhou, Jiangmen, Zhongshan, and, yes, Guangzhou, which, for some reason, became the name for the whole agglomeration, rather than Shenzhen.
It seems to me that the reason he thinks Chongqing has more people than Guangzhou is that Chongqing is *denser*. The subjective feeling of "population" probably correlates mostly with population density. According to the numbers, Los Angeles has more people, at 17.2 million, but I've been to Los Angeles (it's the largest city I've been to), and LA is the definition of "suburban sprawl". It's a place where you can drive for 3.5 hours in one direction and see nothing but houses and shopping malls. I've never been to New York, the largest city in the US, or Chicago, the 3rd largest, but from videos it doesn't seem like they go "3D" and stack people on top of each other to the degree Chongqing does.
They say "5D" but I don't know where they're getting 4th and 5th dimensions -- it looks 3D to me.
I found a list of cities by population density, but, weirdly, when I look at a bunch of the top places on Google Street View (Port-au-Prince, Giza, Manila, Mandaluyong, Malé, Dhaka, Bnei Brak, Kolkata, Kathmandu), they look less crowded than Chongqing does in the video. Well, I suppose there's always the possibility that the statistics on citypopulation.de are wrong? And maybe Chongqing really is the world's biggest city, like the YouTuber insists? How are all these people counted, anyway?
Citypopulation.de list of urban agglomerations:
https://citypopulation.de/en/world/agglomerations/
List of cities by population density. Unlike the numbers above, it goes by municipal boundaries, not the total agglomeration.
https://en.wikipedia.org/wiki/List_of_cities_proper_by_population_density
While we're on the subject of China: VPNs in China don't get you access to Gmail or Google Maps or any other Google services, or most "Western" services, according to Elina Bakunova, aka "Eli From Russia". That's unlike Russia, which blocks lots of US-based internet services but doesn't bother to stop people from getting around the blocks using VPNs. Apparently China has cracked down hard on VPNs, so it is very hard in China to get a VPN that will give you access to the outside.
Also, she says Gmail is illegal in China. She was able to get access to information in her Gmail by contacting friends in Russia through VKontakte, a Russian social network like Facebook that is apparently not blocked in either Russia or China (and is also not blocked here -- so you can communicate with people in Russia and China with it from here -- if you don't mind your communications being monitored by the Russian government).
Aeronautics
13. Boom Supersonic is making a plane called Overture that flies supersonically but doesn't send a sonic boom to the ground. Obviously they have a sense of humor, calling the company that makes a supersonic plane with no sonic boom "Boom Supersonic".
"In Mach cutoff, the sonic boom refracts in the atmosphere and never reaches the ground. Exact speeds and altitude vary based on atmospheric conditions."
"Overture's advanced autopilot will continuously optimize speed for Boomless Cruise based on real-time atmospheric conditions. Boomless Cruise is possible at speeds up to Mach 1.3, with typical speed between Mach 1.1 and 1.2."
"Boomless Cruise leverages well-known Mach cutoff physics, where a sonic boom refracts upward due to temperature and wind gradients affecting the local speed of sound. This is similar to how light bends when passing through a glass of water. By flying at a sufficiently high altitude at an appropriate speed for current atmospheric conditions, Overture ensures that its sonic boom never reaches the ground."
"Unlike 'low boom' designs, Boomless Cruise does not require extensive aircraft shaping. Instead, it relies on the ability to break the sound barrier at a high enough altitude where the boom refracts harmlessly away from the ground."
"On Overture, Boomless Cruise is specifically enabled by the Symphony engines. These engines feature enhanced transonic performance compared to commercially derived engines, allowing Overture to efficiently transition to supersonic speeds at altitudes above 30,000 feet."
"Additionally, Boomless Cruise is enabled by an advanced autopilot which uses current weather conditions and software algorithms to automatically select the optimal speed for Boomless Cruise."
"Boomless and low boom are distinct concepts. In Boomless Cruise, Overture's sonic boom does not reach the ground. However, at its full Mach 1.7 speed over water, Overture will generate a sonic boom."
"Overture is designed to be most efficient at high subsonic speeds of Mach 0.94 and supersonic speed of Mach 1.7. As aerodynamic drag increases closer to speed of sound, there is a modest increase in fuel burn at low supersonic speeds. Even with this drag penalty, Overture still has plenty of range to fly the longest transcontinental routes, such as Vancouver-Miami."
Latest updates on X:
https://x.com/bscholl/status/1888939430833975765
Absurdity
14. A young "technologist" known online as "Big Balls" who works for Elon Musk's "so-called" Department of Government Efficiency (DOGE) briefly worked at Path Network, a network monitoring firm known for hiring reformed criminal hackers (including Eric Taylor, also known as "Cosmo the God," allegedly a member of the hacker group UGNazis, and Matthew Flannery, an Australian convicted hacker allegedly a member of the hacker group LulzSec), allegedly solicited a cyberattack-for-hire service -- or not, maybe that was someone else using his Telegram handle -- founded a company called Tesla.Sexy LLC, which offers a service called Helfie, which is an AI bot for Discord servers targeting the Russian market, and another called faster.pw, that is currently inactive but an archived version shows the service provided "multiple encrypted cross-border networks" in Chinese, and worked at Elon Musk's Neuralink brain implant startup before joining DOGE, enabling him to circumvent the requirement for a security clearance -- at the age of 19. What movie am I watching?
https://www.wired.com/story/edward-coristine-tesla-sexy-path-networks-doge/
Continuing the story, Brian Krebs of Krebs on Security says:
"Here's the real story behind why Edward Coristine only worked at Path for a few months. He was fired after the founder of Path, Marshal Webb, accused him of making it known that one of Path's employees was Curtis Gervais, a serial swatter from Canada who was convicted of perpetrating dozens of swattings and bomb threats -- including at least two attempts on our home in 2014. [BTW the aforementioned Eric Taylor was convicted of a separate (successful) swatting against our home in 2013."
(Open bracket not closed in original.)
"In the screenshot here, we can see Webb replying to a message from Gervais stating that 'Edward has been terminated for leaking internal information to the competitors.'"
https://infosec.exchange/@briankrebs/113957683483583881
Fun Stuff
15. YouDJ. I guess this has been around for a while, but I never saw it until today. This is pretty awesome -- and pretty impressive for something that runs entirely in a browser. Anyone can be a DJ; just go through the tutorial boxes.
Get the Unitree G1 robot to dance to your tunes.
UAVs & the Future of War
16. Drones are responsible for 90% of battlefield fatalities, according to this Ukraine war veteran. That's not a calculated statistic -- it's just what he estimates from his battlefield experience. So if you've heard that artillery is king: that may have been true of wars in the past, and maybe the Ukraine war initially looked like it was becoming "trench warfare" similar to previous wars, but no, artillery is not king. Drones are. The Ukraine war is a drone war.
The drones can kill people directly, or they can call in artillery. You always have to think about concealment directly above you, not just along what you think is the line of sight to the enemy, because you can be spotted by a surveillance drone.
The drones have infrared cameras, which make it extremely hard to sneak around. If one member of your team turns on a flashlight at night, you're all likely to get killed. If one member of your team pulls out a phone with Face ID turned on at night -- Face ID works by projecting infrared light -- you're all likely to get killed. To infrared cameras, these things light you up like a Christmas tree.