- AI agent discovers 20 optimizations in 2 days on one GPU.
- Baseline RV32IM achieves 2.23 CoreMark/MHz, 301 iter/s.
- 53 symbolic BMC checks validate the core's logic; the agent eliminates the baseline's 22% random bus stalls.
FeSens launched Auto-Architecture as a Show HN on Hacker News. The tool applies Andrej Karpathy's autoresearch loop to refine RISC-V CPU designs, operating on a single GPU and targeting RV32IM cores.
A nanochat coding agent drives the loop. It discovers 20 training-time optimizations over two days, modifying a 5-stage RV32IM core written in SystemVerilog, as detailed in FeSens' GitHub documentation.
Karpathy Loop Generates Precise CPU Mutations
The autoresearch loop takes its inspiration from Andrej Karpathy, the former OpenAI researcher, and his nanoGPT repository. The agent mutates the code of the RV32IM in-order core, and training finishes on one GPU in 48 hours.
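The shape of such a loop is simple to sketch. The Python below is an illustrative sketch only, not FeSens' actual code; the callables `mutate_rtl`, `passes_formal`, and `fitness` are hypothetical stand-ins for the agent's mutation step, the formal gate, and the benchmark score:

```python
def autoresearch_loop(seed_rtl, n_generations, mutate_rtl, passes_formal, fitness):
    """Hypothetical sketch of a Karpathy-style autoresearch loop:
    mutate the design, gate on formal checks, keep the fitter variant."""
    best_rtl, best_score = seed_rtl, fitness(seed_rtl)
    for _ in range(n_generations):
        candidate = mutate_rtl(best_rtl)      # agent proposes a code change
        if not passes_formal(candidate):      # e.g. the 53 BMC checks must pass
            continue                          # reject unsound mutations outright
        score = fitness(candidate)            # e.g. Fmax x CoreMark iter/cycle
        if score > best_score:
            best_rtl, best_score = candidate, score
    return best_rtl, best_score
```

The formal gate runs before fitness evaluation on purpose: a mutation that breaks correctness is discarded no matter how fast it would be.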
The pipeline spans fetch, decode, execute, memory, and writeback stages. RISC-V formal verification performs 53 symbolic BMC checks using riscv-formal tools. These checks confirm decode logic, traps, instruction ordering, liveness, and M-extension functionality.
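Bounded model checking (BMC) unrolls a design for a fixed number of cycles and verifies that a property holds in every state reachable within that bound. The toy Python example below illustrates the idea on a 2-bit counter standing in for a pipeline; it is a conceptual sketch, not riscv-formal itself:

```python
from itertools import product

def bmc_check(step, prop, init_state, num_inputs, depth):
    """Toy bounded model check: exhaustively explore all input
    sequences up to `depth` cycles and assert `prop` on every state."""
    frontier = {init_state}
    for _ in range(depth):
        nxt = set()
        for state, inp in product(frontier, range(num_inputs)):
            s2 = step(state, inp)
            if not prop(s2):
                return False, s2              # counterexample state found
            nxt.add(s2)
        frontier = nxt
    return True, None

# Example property: a 2-bit counter must never exceed 3.
step = lambda s, i: (s + i) % 4               # increment by input, wrap at 4
ok, cex = bmc_check(step, lambda s: s <= 3, init_state=0, num_inputs=2, depth=8)
```

Real tools like riscv-formal replace the brute-force enumeration with a SAT/SMT solver, which is what makes 53 such checks tractable on a full pipeline.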
Verilator cosimulation against a Python instruction-set simulator (ISS) shows that the baseline injects 22% random bus stalls, according to FeSens' GitHub docs. The agent removes these stalls entirely.
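Conceptually, the cosimulation compares the RTL's retirement trace against the golden ISS in lockstep and measures how many cycles the bus spends stalled. A hedged Python sketch (the trace format and field names here are invented for illustration):

```python
def stall_rate(trace):
    """Fraction of cycles flagged as bus stalls in a cycle-by-cycle trace.
    Each entry is a dict like {"cycle": 7, "stall": True, "pc": 0x80000010}."""
    stalled = sum(1 for t in trace if t["stall"])
    return stalled / len(trace) if trace else 0.0

def cosim_matches(rtl_retired, iss_retired):
    """Lockstep check: the RTL must retire the same (pc, rd, value)
    tuples, in the same order, as the instruction-set simulator."""
    return list(rtl_retired) == list(iss_retired)
```

A mismatch in the retirement streams flags a functional bug; a high stall rate flags a performance target, which is exactly what the agent optimized away.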
Benchmarks Show Clear Performance Uplifts
FeSens' GitHub docs list the baseline RV32IM at 2.23 CoreMark/MHz, delivering 301 iterations per second. The figure follows the VexRiscv methodology: full no-cache setup, 2K data size, -O3 optimization, and ~22% bus backpressure.
The SpinalHDL VexRiscv project provides a human-designed benchmark of 2.57 CoreMark/MHz. Auto-Architecture uses nextpnr for place-and-route on Gowin GW2A-LV18 FPGA. It evaluates three seeds to compute median Fmax × CoreMark iter/cycle fitness scores.
Optimized agent designs cut random stalls. Fitness prioritizes throughput under hardware constraints.
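The fitness score described above (the median over three place-and-route seeds of Fmax times CoreMark iterations per cycle) can be sketched directly. The Fmax values below are made-up illustrative numbers, not measured results:

```python
from statistics import median

def fitness(fmax_mhz_per_seed, iters_per_cycle):
    """Median over P&R seeds of Fmax (MHz) x CoreMark iterations/cycle.
    With Fmax in MHz, the product is iterations per microsecond, so
    multiplying by 1e6 gives CoreMark iterations per second."""
    return median(f * iters_per_cycle for f in fmax_mhz_per_seed)

# Hypothetical: 3 seeds around 130 MHz; 2.23 CoreMark/MHz = 2.23e-6 iter/cycle.
score = fitness([132.0, 135.0, 130.0], iters_per_cycle=2.23e-6)
```

Taking the median across seeds makes the score robust to one unlucky place-and-route run, so the agent is not rewarded for timing noise.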
| Metric       | Baseline RV32IM | VexRiscv Benchmark |
|--------------|-----------------|--------------------|
| CoreMark/MHz | 2.23            | 2.57               |
| Iter/s       | 301             | N/A                |
| Bus stalls   | 22%             | N/A                |
Data from FeSens' GitHub project docs.
RISC-V Cores Drive Fintech and Edge AI Efficiency
GPU shortages increase demand for optimized CPUs. AI inference runs best on edge devices. Fintech companies adopt RISC-V for secure blockchain validators and DeFi oracles.
Efficient cores lower data center power costs. Bitcoin traded at $76,963 on CoinGecko as of October 10, 2024, with a $1.54 trillion market cap. Optimized designs support high-throughput crypto networks and ICO infrastructure.
Auto-Architecture extends to more complex cores, and FeSens welcomes GitHub contributions. A tournament mode pits agent variants against each other.
Fintech Adopts Low-Power CPUs for DeFi Oracles
Fintech platforms deploy low-power CPUs in DeFi oracles. These devices fetch real-time prices for smart contracts. RV32IM cores cut latency versus x86 options.
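As an illustration of the oracle workload, a node might aggregate prices from several feeds and report a median, which tolerates a single bad or offline source. A minimal sketch (the feed names and prices are made up):

```python
from statistics import median

def aggregate_price(feeds):
    """Median of reported prices; skips feeds that failed to respond (None).
    The median is robust to one outlier or one offline source."""
    prices = [p for p in feeds.values() if p is not None]
    if not prices:
        raise ValueError("no live price feeds")
    return median(prices)

# Hypothetical feed snapshot for one ETH/USD oracle round.
quote = aggregate_price({"feed_a": 2410.5, "feed_b": 2411.0, "feed_c": None})
```

This aggregation step is small and branch-light, which is why a lean in-order RV32IM core can serve it at low power.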
Data centers hit power limits by 2026. The EU's MiCA regulations require efficient crypto setups since January 2024, per the official EU MiCA page. Auto-Architecture delivers verified efficiency gains.
nextpnr handles place-and-route for the Gowin FPGA flow, keeping results comparable with the VexRiscv methodology.
Implications for ICOs and Digital Assets
Initial Coin Offerings (ICOs) and token launches demand reliable hardware. RISC-V cores enable custom validators that process transactions faster. With Bitcoin's rally, efficient CPUs reduce operational costs for projects launching on Ethereum or Solana.
DeFi protocols like Aave and Uniswap rely on oracles from Chainlink. Low-power RISC-V could shrink their energy footprint, in the same spirit as Auto-Architecture's 22% stall reduction, aligning with sustainability goals in crypto.
On-chain data from Etherscan shows DeFi TVL at $80 billion as of October 2024. Optimized hardware scales these networks without ballooning power use.
Out-of-Order Designs and Multi-GPU Future
Upcoming versions target out-of-order cores. Multi-GPU setups may yield 100+ optimizations. Enhanced RISC-V formal tools bolster proofs.
Fintech needs custom silicon for cybersecurity and ICO validation. Auto-Architecture advances efficient AI hardware. GitHub leaderboards track agent improvements.
This method impacts digital asset infrastructure and open-source hardware. FeSens' approach sets a new standard for AI-driven chip design in fintech.
Frequently Asked Questions
What is Auto-Architecture?
Auto-Architecture uses Karpathy's autoresearch loop for CPU optimization. A nanochat coding agent finds 20 training-time tweaks for RV32IM cores in SystemVerilog.
How does Karpathy's loop work here?
The loop runs on one GPU for two days. nanochat mutates the code with a CoreMark/MHz-based fitness, and nextpnr evaluates three place-and-route seeds on the Gowin FPGA.
What benchmarks does it achieve?
The RV32IM baseline scores 2.23 CoreMark/MHz at 301 iter/s, against VexRiscv's human-designed 2.57. Verilator cosimulation identifies the baseline's 22% random bus stalls, which the agent then removes.
Why for efficient computing?
It runs 53 BMC checks via riscv-formal. Fintech uses it for low-power RISC-V in blockchain validators amid rising BTC prices.