
Anthropic sues Pentagon ⚖️, Siri delays Apple products 🖥️, Claude Code Review 👨‍💻

TLDR

Together With WorkOS

TLDR 2026-03-10

How to Test AI Agents That Never Produce the Same Output Twice (Sponsor)

Same input. Same prompt. Different output. That's the reality of testing AI agents that write code, and most teams are shipping without solving it.

Nick Nisi from WorkOS tackled this by building eval systems for two AI tools: npx workos, a CLI agent that installs AuthKit into your project, and WorkOS's agent skills that power LLM responses about SSO, directory sync, and RBAC.

The post covers how to test against real project structures, score output that's different every time, and catch when your agent makes up methods that don't exist.
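One way to picture the idea, as a rough sketch rather than the approach the post describes: instead of comparing generated code to a fixed expected string, an eval can check properties of the output, such as whether it parses and whether it only calls methods that actually exist. The `client` object name and the `KNOWN_METHODS` allowlist below are hypothetical, chosen only for illustration.

```python
import ast

# Hypothetical allowlist: methods that a made-up `client` object really exposes.
KNOWN_METHODS = {"get_user", "list_users"}


def unknown_method_calls(source: str, obj_name: str = "client") -> set[str]:
    """Return methods called on `obj_name` that are not in KNOWN_METHODS."""
    unknown = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == obj_name
            and node.func.attr not in KNOWN_METHODS
        ):
            unknown.add(node.func.attr)
    return unknown


def score(source: str) -> float:
    """Score on properties, not exact text: 1.0 if the code parses and
    calls only known methods on `client`, otherwise 0.0."""
    try:
        return 0.0 if unknown_method_calls(source) else 1.0
    except SyntaxError:
        return 0.0
```

Two differently worded outputs that both parse and stick to known methods get the same score, while an output that invents a method such as `client.fetch_everything()` is flagged.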

Learn more about evals →
📱

Big Tech & Startups

Anthropic Sues Pentagon Over 'Supply Chain Risk' Label (4 minute read)

Anthropic has filed two lawsuits against the Department of War to reverse the Pentagon's decision to label it a 'supply chain risk'. One lawsuit is in the US District Court for the Northern District of California, and the other is in the US Court of Appeals for the District of Columbia Circuit. The supply chain risk designation is typically applied to firms that are deemed a major national security risk and has never been used on an American company. Anthropic has offered to continue negotiating with the Pentagon and has also offered to help move the Pentagon off its technology and onto another AI system. In recent weeks, OpenAI and xAI have signed agreements to provide technology on the Department of War's classified systems.

Apple Postpones Smart Home Display Launch as It Waits for New AI and Siri (4 minute read)

Apple's smart home display has been delayed until later this year. The project was first scheduled to launch in the spring of 2025, but it was postponed to let the company finish work on a new Siri digital assistant. It was then scheduled to be released this month, but the new Siri is still not ready. The display is designed to be a central AI hub for the home that can show personalized data, such as calendar appointments, reminders, notes, and more.
🚀

Science & Futuristic Technology

China leads the humanoid robot race — but the US still has a shot (6 minute read)

China's humanoid robots now dominate the market with over 90% of global sales and thousands of units shipped last year. Tesla's Optimus robots won't be ready for launch until at least next year. While Chinese vendors are further ahead on production scale, US companies are strong on the technical side, especially in hardware and software. By the time humanoid robot startups build up their production bases, they will be ready for large-scale deployment.

How to Design Antibodies (29 minute read)

BoltzGen is the leading open-source approach for computational antibody design. It uses the permissive MIT license, so it can be used commercially by anyone. This guide walks through the full process of designing an antibody from home using BoltzGen. The process involves choosing a target, preparing a target structure, running a design campaign, filtering candidates, and experimentally validating the results.
💻

Programming, Design & Data Science

✂️ Cut QA Cycles From Hours to Minutes With Automated Testing (Sponsor)

If slow QA cycles are holding your team back from releasing faster, try QA Wolf.

Their fully managed, AI-native service delivers 80% automated E2E test coverage in weeks and helps teams ship 5× faster by cutting QA cycles from hours to minutes.

⭐ Rated 4.8/5 on G2.

Schedule a demo to learn more →

Code Review, a new feature for Claude Code (1 minute read)

Code Review is a new feature for Claude Code that dispatches a team of agents on every PR to catch bugs. Built for depth, not speed, the system is now in research preview for Team and Enterprise. Anthropic runs Code Review on nearly every PR. The tool doesn't approve PRs, but it closes the gap so reviewers can cover what's shipping. Reviews are billed on token usage, and admins have several ways to control spend and usage.

Perhaps not Boring Technology after all (2 minute read)

The recurring concern that large language models will push technology choices toward the tools best represented in their training data, making it harder for new tools to break through the noise, no longer really holds up. New models have context lengths large enough to consume a lot of documentation before they start working on a problem. Most agents work just fine in existing codebases that use libraries or tools too private or too new to feature in the training data. Developers are still free to choose whatever tools they want and are not restricted to the ones LLMs are most familiar with.
🎁

Miscellaneous

After falling far behind the rest of industry, Blue Origin creates new stock option plan (8 minute read)

When Jeff Bezos launched Blue Origin, he knew that the company would not meet investors' expectations for return on investment over a typical investing horizon. Decades later, the company is still not operationally profitable, though it has recently made impressive strides and seen financial returns from the sale of engines and commercial launches. To continue its growth and attract top talent, Blue Origin will begin granting stock options to employees this spring. The new program is structured to provide opportunities for liquidity events that will enable employees to convert vested stock options into realized value. More details about the program will be released during a company-wide meeting on April 17.

Amazon tells FCC to bin SpaceX's million-satellite datacenter dream (2 minute read)

Amazon has criticized SpaceX's application for permission to launch a fleet of orbital datacenter satellites as incomplete, speculative, and unrealistic. It wants regulators to reject the filing, which it calls a placeholder rather than a complete application under the Commission's rules. Amazon also raised concerns about satellite interference, along with environmental objections. Analysts say that SpaceX's plan to put datacenters in space is 'peak insanity', as running spaceborne facilities would be uneconomical and could never satisfy terrestrial demand for compute power.

Quick Links

Former Meta AI Chief's Start-Up Is Valued at $3.5 Billion (3 minute read)

Yann LeCun's Advanced Machine Intelligence Labs is only a month old and employs just 12 people.

Ghostty 1.3.0 (30 minute read)

Ghostty 1.3.0 is a significant release that includes hundreds of improvements, bug fixes, and performance optimizations across all platforms.

Bluesky CEO Jay Graber steps down (2 minute read)

Graber, who will be replaced by Toni Schneider as interim CEO, will transition to a new role as chief innovation officer.

Video Conferencing with Postgres (7 minute read)

SpacetimeDB recently open-sourced a way for people to make video calls over a database.

10x is the new floor (3 minute read)

AI amplifies people with agency and curiosity.

The human.json Protocol (12 minute read)

human.json is a lightweight protocol for humans to assert authorship of their site content and vouch for the humanity of others.


Want to advertise in TLDR? 📰

If your company is interested in reaching an audience of tech executives, decision-makers and engineers, you may want to advertise with us.

Want to work at TLDR? 💼

Apply here, create your own role or send a friend's resume to jobs@tldr.tech and get $1k if we hire them! TLDR is one of Inc.'s Best Bootstrapped businesses of 2025.

If you have any comments or feedback, just respond to this email!

Thanks for reading,
Dan Ni & Stephen Flanders


