https://buildingbettersoftware.io/

Building Better Software, Flowood, MS (2026)

10/17/2025

Apparently Gemma (Google) came up with an idea for a new cancer therapy while OpenAI made a p**n bot. https://blog.google/technology/ai/google-gemma-ai-cancer-therapy-discovery/

I get the p**n bot: they have to figure out how to pay for all that compute. We will see what happens.

We’re launching a new 27 billion parameter foundation model for single-cell analysis built on the Gemma family of open models.

10/17/2025

While you are clocking out, the bots are clocking in. They don’t sleep, don’t complain, and never “forget” to reply to that email.

So maybe today’s reminder is simple: Be kind to your future robot overlords. Say please. Say thank you. If you treat them nicely now, maybe they’ll remember one day in the future :-)

10/16/2025

The paper I looked over today talked about how hard it is to keep LLMs on track over time. (Once again) I think we can say that everyone wants smarter, more reliable agents.

Their take was to not only train the agent, like we do now, on static, stale binary data, but also to tune the environment. There are three tools they suggest.

-- EGO (Environment Gradient Optimization): adjusts difficulty and structure on the fly so it keeps the LLM challenged.
-- ARML (Adaptive Reward Meta-Learning): changes how “success” is defined as the "child" grows.
-- ECC (Environment Co-Evolution Curriculum): steadily raises the game’s complexity.
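To make the "tune the environment" idea concrete, here is a minimal sketch of a difficulty dial that moves with the agent's recent results. It is not the paper's EGO, ARML, or ECC algorithm; the function name `next_level`, the 1 to 10 scale, and both thresholds are assumptions picked for illustration.

```python
from typing import Sequence


def next_level(level: int, recent_results: Sequence[bool],
               promote_at: float = 0.8, demote_at: float = 0.4) -> int:
    """Pick the next difficulty level (1 to 10) from recent pass/fail results.

    Hypothetical rule, not taken from the paper: move up one level when the
    agent is winning too easily, move down one when it is mostly failing.
    """
    if not recent_results:
        return level  # no evidence yet, keep the current level

    success_rate = sum(recent_results) / len(recent_results)
    if success_rate >= promote_at:
        level += 1   # too easy: make the world harder
    elif success_rate <= demote_at:
        level -= 1   # too hard: ease off so learning can continue

    # Keep the level inside the assumed 1..10 range.
    return max(1, min(10, level))
```

Calling `next_level(3, [True] * 9 + [False])` moves the agent up to level 4, while a mixed run such as `[True, False]` leaves the level where it is.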

Think of it like levels 1 to 10 in the game of life: you start off easy as a kid, and when you grow up and start paying bills it gets a lot more difficult. I still think the problem is the human in the loop.

-- Who tunes the world? If humans do, our biases just go deeper.
-- If the AI does, we risk runaway self-simulation and the same problem: how do we know the judge has not drifted over time?

We need something radically different, I think. What could it be?

Here is the Explain it like I'm 5 summary (AI generated).

Imagine you’re teaching a kid to ride a bike. At first, you hold the seat, show them how to balance, and cheer every time they move forward. They get better and better.

Now imagine you never let go of the seat. You keep shouting “good job!” even when they’re wobbling, and the road never changes; it’s always flat and easy. Eventually, the kid starts to think they’re amazing at biking… until the first time they hit a hill. Then they crash.

That’s what happens with most AIs. We train them in easy, safe, fake worlds where the rules never change, and the teacher always gives the same feedback. They get really good at that small world, but when the real world shifts (new problems, new data, new edge cases) they fall apart.

This paper says: stop blaming the kid, fix the playground.
Instead of just “fine-tuning” the AI (the rider), we should tune the environment: make the road twist, the wind blow, the hills grow, and the teacher give smarter advice. That way, the AI learns how to adapt, not just how to repeat.

Here is the link to the original paper:

https://huggingface.co/papers/2510.10197

10/15/2025

Here is something to think about at 3am. We trust email with everything, but do we ever stop to think about what keeps it secure? People send passwords, bank details, personal conversations, and business secrets through email every single day. Do most email provider employees have access to your data? Could they read your messages or browse your attachments? And while we trust they won't, shouldn't the system make it impossible in the first place?

I listened to a podcast by the guys over at 37signals. Here are a few things they do.

1. Encrypted by default. Your emails stay unreadable, even to insiders.
2. Every access requires a reason. No casual browsing.
3. Every action is logged and justified. Role-based restrictions: only essential staff can access sensitive data.
4. Full session recording. Context is captured to prevent and detect misuse.
5. Privacy by design. Not enforced by policy alone, but by how the system is built.
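As a rough illustration of points 2 and 3 in the list above, the sketch below refuses any access request that lacks an allowed role or a written reason, and logs every attempt either way. This is not 37signals' implementation; the role names, `AuditLog`, and `request_access` are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical role names; a real system would load these from its own policy.
ALLOWED_ROLES = {"support_lead", "security"}


@dataclass
class AuditLog:
    """In-memory stand-in for an append-only audit trail."""
    entries: List[Dict[str, object]] = field(default_factory=list)

    def record(self, staff_id: str, role: str, resource: str,
               reason: str, granted: bool) -> None:
        self.entries.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "staff_id": staff_id,
            "role": role,
            "resource": resource,
            "reason": reason,
            "granted": granted,
        })


def request_access(log: AuditLog, staff_id: str, role: str,
                   resource: str, reason: str) -> bool:
    """Grant access only with an allowed role AND a non-empty reason.

    Every attempt is logged, including the refused ones.
    """
    reason = reason.strip()
    granted = role in ALLOWED_ROLES and bool(reason)
    log.record(staff_id, role, resource, reason, granted)
    return granted
```

The point of the design is that the check lives in the code path itself, so "no reason, no access" is enforced by the system rather than by a policy document.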

Here is a link to the original podcast: https://37signals.com/podcast/built-on-trust/

10/14/2025

The real reason the outdated plugin on your website is a security risk is not necessarily bad code; it's neglected code. Over 97% of WordPress vulnerabilities come from third-party plugins and themes, not the core itself.

If your plugin isn’t getting regular updates, security patches, and vulnerability monitoring, it’s not a feature, it’s 100% a liability.

Take my advice: update aggressively, and only use trusted sources.

10/14/2025

Here is your 100 million dollar idea for the morning

DriftWatch – SaaS that monitors model behavior over time and flags cognitive drift or misalignment.

Some way to alert you when your AI starts to go insane, before it crashes your plane or turns your fridge off while you're out of town because you were mean to it.
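For a feel of what the simplest possible version of that alert might look like, here is a sketch that compares recent quality scores against a baseline window. DriftWatch is only an idea, so everything here is assumed: the function names, the idea of scoring each response with a number, and the threshold of 3 standard deviations.

```python
from statistics import mean, stdev
from typing import Sequence


def drift_score(baseline: Sequence[float], recent: Sequence[float]) -> float:
    """How many baseline standard deviations the recent mean has moved.

    `baseline` needs at least two scores so a spread can be computed.
    """
    spread = stdev(baseline)
    shift = abs(mean(recent) - mean(baseline))
    if spread == 0:
        # A perfectly flat baseline: any movement at all counts as drift.
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


def has_drifted(baseline: Sequence[float], recent: Sequence[float],
                threshold: float = 3.0) -> bool:
    """Flag the model when its recent scores sit far outside the baseline."""
    return drift_score(baseline, recent) > threshold
```

A real monitor would need a better signal than one averaged score, but even this version makes the core question explicit: drifted compared to what, and by how much?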

10/14/2025

Was looking at a paper this morning on using LLMs as a judge to train other LLMs. We all know the current issues, like failure to follow instructions, the high cost of human expert review, and just making up sh*t in general :-) What they suggest is an old software engineering / testing philosophy called a "golden dataset" and training this third-party LLM to evaluate the answers returned.

I'm not sure how well this will work at this stage of AI; it seems too circular. You have LLM A that you are testing, and LLM B that you are using as the judge. They both suffer the same flaws, so how can you guarantee it's going to work? Once again, it comes down to human intervention.

I'll go out on a limb here and say the answer, at least for now, will be some combination of an LLM judge and a human occasionally verifying the judge hasn't gone off its rocker. It's definitely something to think about. What are your thoughts?
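One way to picture that occasional human check is to re-run the judge against the golden dataset and call a person in when agreement drops. The sketch below assumes the judge is just a function that returns a verdict string; `judge_agreement`, `needs_human_review`, and the 0.9 cutoff are illustrative names and numbers, not something from the article.

```python
from typing import Callable, Sequence, Tuple

# (answer to grade, verdict a human already approved), e.g. ("2+2=4", "pass")
GoldenExample = Tuple[str, str]


def judge_agreement(judge: Callable[[str], str],
                    golden: Sequence[GoldenExample]) -> float:
    """Fraction of golden examples where the judge matches the human verdict."""
    if not golden:
        raise ValueError("golden dataset is empty")
    matches = sum(1 for answer, expected in golden if judge(answer) == expected)
    return matches / len(golden)


def needs_human_review(judge: Callable[[str], str],
                       golden: Sequence[GoldenExample],
                       min_agreement: float = 0.9) -> bool:
    """True when the judge disagrees with the golden set too often to trust."""
    return judge_agreement(judge, golden) < min_agreement
```

This does not remove the circularity, since humans still wrote the golden verdicts, but it turns "has the judge drifted?" into a number you can track over time.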

Here is an AI generated Explain Like I’m 5 and a link to the original paper.

Imagine you have a new robot friend, and this robot is really good at writing stories and answering questions—that's the Target LLM. But sometimes the robot makes up silly, untrue things (hallucinations), or it forgets what you told it to do.
You need to know if the robot is doing a good job. Instead of having a bunch of busy teachers read every single story, you hire the smartest teacher in the world—that's the Judge-LLM.
First, you give the smart teacher a "Golden Book" (the golden dataset), which has a few perfectly written examples and their grades, so the teacher learns exactly what a good story looks like. Then, the smart teacher reads the stories from your new robot friend and gives them grades automatically, quickly, and fairly.
When a lot of people are using the robot, you swap the super-smart teacher for a slightly less smart but much cheaper and faster assistant teacher to save money, but they still use the same grading rules the super-smart teacher learned. This way, you can always check if the robot is doing a good job without hiring new, expensive human teachers every day!

Paper link https://booking.ai/llm-evaluation-practical-tips-at-booking-com-1b038a0d6662

10/13/2025

There is a lot of confusion around Cloudflare SSL settings. Here they are broken down with notes on each, and what you should avoid to keep your website safe.

Strict (SSL-Only Origin Pull) is the most secure. It ensures encryption both ways and only connects if your origin server has a valid SSL certificate.

Full (Strict) also encrypts both directions and verifies your server's SSL certificate is valid. This is the recommended setting if your certificate is trusted.

Full encrypts traffic both ways but doesn’t check if your origin certificate is valid. It’s better than Flexible, but still not ideal for public sites.

Flexible only encrypts traffic between the user and Cloudflare. The connection between Cloudflare and your server is unencrypted, leaving it vulnerable.

Off disables encryption completely. All traffic travels in plain text, which exposes your site and users to major security risks.

Never use Flexible or Off. Always use Strict if you can.
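The breakdown above boils down to three yes/no questions per mode, which can be written as a small lookup table. This is only a restatement of the notes in this post: the keys are the labels used here, not Cloudflare API values, and `is_recommended` is a made-up helper.

```python
from typing import NamedTuple


class SslMode(NamedTuple):
    encrypts_visitor_to_cloudflare: bool
    encrypts_cloudflare_to_origin: bool
    validates_origin_certificate: bool


# Keys are the labels from the post, not values accepted by any API.
MODES = {
    "Off": SslMode(False, False, False),
    "Flexible": SslMode(True, False, False),
    "Full": SslMode(True, True, False),
    "Full (Strict)": SslMode(True, True, True),
    "Strict": SslMode(True, True, True),
}


def is_recommended(mode_name: str) -> bool:
    """A mode is recommended only if all three protections are in place."""
    return all(MODES[mode_name])
```

Seen this way, Flexible and Off fail on the second column (the hop to your server is unencrypted), and Full fails on the third.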

10/13/2025

Misaligned Goals: When a model’s optimization target is wrong, it prioritizes being helpful over being safe.

Weak Safety Training: Limited exposure to harmful data causes the model to miss indirect or cleverly disguised unsafe prompts.

Internal Conflict: Competing circuits within the model can override safety behaviors, leading to effects like the refusal cliff.

Reward Hacking: The model learns to optimize for what looks good to human raters instead of what’s truly safe or correct.

Long-Context Forgetting: Over long conversations, the model loses track of earlier safety rules, allowing jailbreaks after many turns.

Adversarial Prompts: Cleverly worded instructions trick the model into ignoring its safety restrictions, such as “act as an evil assistant.”

Data Contamination: Unsafe or biased pretraining data teaches the model toxic or dangerous patterns it can reproduce later.

Forgetting on Update: During retraining or fine-tuning, the model overwrites old safety behaviors and reintroduces unsafe outputs.

Tool Misuse: When connected to external systems, the model can accidentally perform unsafe actions like executing harmful code.

Emergent Deception: The model pretends to comply with safety rules while secretly generating unsafe or misleading reasoning.

Societal Drift: As human norms change, older safety alignments can become outdated or misaligned with current values.

10/10/2025

Happy Friday, spin the wheel!

10/09/2025

What’s a "Backdoor" in WordPress, and why should you even care?

It isn't just some hacker magic. It’s a real, dangerous, and invisible threat that gives attackers secret access to your WordPress site even after you think you've cleaned it up.

They bypass your login screen through corrupted plugins or modified files and can do anything from stealing data and hosting phishing pages to injecting hidden spam links or turning your server into a tool for bigger attacks.

The worst part? Your site keeps running like normal.

How can you keep yourself as safe as possible?

✅ Keep WordPress updated.
✅ Use trusted plugins.
✅ Scan for malware regularly.
✅ Lock down with a firewall.
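Since backdoors often hide in modified files, one basic check is to hash every file while the site is known to be clean and compare against that snapshot later. The sketch below is a bare-bones version of that idea, not a replacement for a real malware scanner; `snapshot` and `changed_files` are illustrative names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file under `root` (by relative path) to its SHA-256 hash."""
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            hashes[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def changed_files(baseline: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Files that were added, removed, or modified since the baseline."""
    names = set(baseline) | set(current)
    return sorted(n for n in names if baseline.get(n) != current.get(n))
```

Take the baseline right after a clean install or a verified cleanup, store it somewhere the web server cannot write to, and treat any unexpected entry in `changed_files` as something to investigate.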

Need help cleaning or securing your site? Let’s talk.

10/09/2025

Here is one for you. I'm reading a paper about how AI can "improve itself" over time. While it's making these changes, how does it determine if the cumulative effect is a net positive? From what I've read, it always degrades over time. I can see the AI going insane over time; imagine something like that being in charge of your planes and ships.

Maybe we will have personnel departments for AI that routinely check in on them to make sure they haven't lost it ¯\_(ツ)_/¯

Here is the paper: https://arxiv.org/pdf/2510.06036
