Two RAG wins in one year: LiveRAG and MMU-RAG

I still find this a bit surreal to write: our team had two big RAG competition results in one year.

First, we won the SIGIR 2025 LiveRAG Challenge. Later, at NeurIPS 2025 MMU-RAG, our system won the Best Dynamic Evaluation award in the Open Source category for the text-to-text track, as described in our MMU-RAG system paper.

As someone who spends a lot of time thinking about retrieval, evidence, and how LLM systems actually behave in realistic settings, this meant a lot to me. These competitions are not toy demos. They are messy, time-constrained, high-pressure evaluations where a system has to actually work.

The two results

1) SIGIR 2025 LiveRAG Challenge

On the official LiveRAG challenge page, the winners are listed as:

First place: RMIT-ADMS — Kun Ran, Shuoqi Sun, Khoi Nguyen Dinh Anh, Damiano Spina, Oleg Zendel

The same page says that on the live challenge day, teams had to answer a stream of unseen questions under a two-hour time limit, and 25 teams returned valid answers.

As the ADM+S announcement said, the competition drew 70 teams from 27 countries, and during the live event teams had to answer 500 never-before-seen questions using the same AI model and dataset.

That combination is exactly why this win feels special to me. It was not just about looking good on an offline benchmark. It was about building a system that could hold up under pressure.

2) NeurIPS 2025 MMU-RAG

After LiveRAG, we kept going.

Our later system, R2RAG (Routing-to-RAG), is described in our paper RMIT-ADM+S at the MMU-RAG NeurIPS 2025 Competition. In that paper, we describe it as an award-winning system for the Text-to-Text track, and note that it won the Best Dynamic Evaluation award in the Open Source category.

As the ADM+S announcement on MMU-RAG noted, this was the team’s second big RAG competition result that year. It also mentioned something I find very telling: 81 teams registered, but only 8 submitted a fully working system.

That statistic says a lot. In competitions like this, ideas matter, but engineering discipline matters just as much.

What we built

One thing I really like about this story is that the second system clearly grew out of the first.

LiveRAG: G-RAG / GRAG

In our LiveRAG paper, we described Generation-Retrieval-Augmented Generation (GRAG).

The core idea was simple, but effective:

  • generate a hypothetical answer first,
  • use that generated answer alongside the original question during retrieval,
  • then apply LLM-based pointwise re-ranking before final answer generation.
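The three steps above can be sketched as a minimal pipeline. To be clear, this is an illustrative sketch under my own assumptions, not our competition code: `generate_draft`, the caller-supplied `llm` and `retrieve` callables, and the scoring prompt are all hypothetical stand-ins.

```python
# Illustrative GRAG-style pipeline: draft an answer first, retrieve with the
# question plus the draft, then re-rank passages pointwise with the LLM.
# `llm` and `retrieve` are caller-supplied stand-ins, not real APIs.

def generate_draft(question, llm):
    # Step 1: generate a hypothetical answer, used only as a retrieval signal.
    return llm(f"Answer briefly: {question}")

def grag_answer(question, llm, retrieve, k=5):
    draft = generate_draft(question, llm)
    # Step 2: retrieve using the original question alongside the draft.
    passages = retrieve(f"{question} {draft}")
    # Step 3: pointwise re-ranking -- score each passage independently,
    # then keep the top-k as evidence.
    scored = sorted(
        ((float(llm(f"Score 0-1: does this support '{question}'? {p}")), p)
         for p in passages),
        reverse=True,
    )
    evidence = [p for _, p in scored[:k]]
    # Final answer generation grounded in the selected evidence.
    return llm(f"Question: {question}\nEvidence: {evidence}\nAnswer:")
```

The point of the sketch is the ordering: the draft answer never reaches the user; it only enriches the retrieval query before re-ranking and final generation.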

As Damiano’s publication page put it, the submitted system achieved the highest Borda score, based on aggregated manual evaluation of Coverage, Relatedness, and Quality, and ranked first in the SIGIR 2025 LiveRAG Challenge.

MMU-RAG: R2RAG

In MMU-RAG, we pushed the ideas further with Routing-to-RAG (R2RAG).

In the MMU-RAG paper, we describe R2RAG as a research-focused RAG architecture composed of lightweight components that dynamically adapt retrieval strategy based on:

  • inferred query complexity, and
  • evidence sufficiency.
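As a rough illustration of the routing idea (not the paper’s actual implementation), the control flow might look like this; the word-count heuristic, the retriever names, and the sufficiency callback are all assumptions of mine:

```python
# Illustrative routing-to-RAG loop: choose a retrieval strategy from a crude
# query-complexity signal, then keep retrieving until an evidence-sufficiency
# check passes. Thresholds and names are assumptions, not the paper's.

def infer_complexity(question):
    # Crude proxy: long or multi-part questions count as complex.
    multi_part = "?" in question[:-1]  # a '?' before the final character
    return "complex" if len(question.split()) > 12 or multi_part else "simple"

def route_and_retrieve(question, retrievers, is_sufficient, max_rounds=3):
    # Route: complex queries get a multi-hop retriever, simple ones a single shot.
    strategy = "multi_hop" if infer_complexity(question) == "complex" else "single_shot"
    evidence = []
    for _ in range(max_rounds):
        evidence += retrievers[strategy](question, evidence)
        if is_sufficient(question, evidence):
            break  # stop early once the evidence looks sufficient
    return strategy, evidence
```

In a real system both signals would be learned or LLM-based rather than heuristics like these; the sketch only shows why lightweight routing keeps the expensive multi-hop path off the common case.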

What I especially like about this design is that it is not just about throwing a larger model at the problem. The paper explicitly says that the system uses smaller LLMs and can operate on a single consumer-grade GPU, while still supporting complex research tasks.

That matters to me. Efficient systems are not just cheaper. They are usually easier to reason about, easier to iterate on, and easier to reproduce.

Why this matters to me

A lot of RAG discussion online gets trapped in shallow comparisons: which model, which embedding, which vector database, which benchmark number.

But these competition experiences kept bringing me back to the same things:

  • retrieval quality matters;
  • evidence use matters;
  • evaluation design matters;
  • latency and robustness matter;
  • and careful system design can beat brute force.

I am especially proud that these systems came out of a research environment where we care about evidence-backed answers, reproducibility, and understanding why something works.

Huge thanks

These results were absolutely team efforts.

For LiveRAG, the official challenge page lists the winning team as:

  • Kun Ran
  • Shuoqi Sun
  • Khoi Nguyen Dinh Anh
  • Damiano Spina
  • Oleg Zendel

For MMU-RAG, the paper lists the authors as:

  • Kun Ran
  • Marwah Alaofi
  • Danula Hettiachchi
  • Chenglong Ma
  • Khoi Nguyen Dinh Anh
  • Khoi Vo Nguyen
  • Sachin Pathiyan Cherumanal
  • Lida Rashidi
  • Falk Scholer
  • Damiano Spina
  • Shuoqi Sun
  • Oleg Zendel

I feel very lucky to have worked with such a strong team.

What’s next

Winning is fun, but to me the more interesting part is what comes after.

The real goal is not to collect trophies. It is to build RAG systems that are genuinely useful for research and real-world information seeking:

  • better at deciding when to retrieve,
  • better at deciding what evidence is enough,
  • better at being faithful to sources,
  • and better at operating under realistic constraints.

That is the direction I want to keep pushing.


Apple Shortcuts to boost your reading performance

Background

Lately, I have been delving into papers from a new field, and I often find the sentences challenging to comprehend. Instead of searching for clarification online, which disrupts my flow and hinders my ability to follow the author’s thought process, I have been trying alternative strategies.

Proposed Shortcuts

Along the way, I developed several shortcuts that use LLMs. These shortcuts help me explain concepts, answer my questions, and proofread my writing.

⚠️ Is my API key safe? Read the README of html-gpt: https://github.com/rankun203/html-gpt.

You will need an OpenAI API key to use these shortcuts. Alternatively, you can change the shortcut config to point to another free model.

Example usage

(Screenshot: example usage of the shortcuts)

Original Groq Shortcuts

Change logs

  • 09/04/2024: updated models to gpt-4-turbo-preview
  • 06/04/2024: added more sophisticated GPT-4 for tasks, streaming output
  • 18/03/2024: added logging, storing all requests and responses, system prompts, into an HTML file, stored in iCloud Shortcuts folder
  • 17/03/2024: initial version with Groq

Privacy Policy for Readnow Safari Extension

Effective date: June 14, 2023

Kun Ran (“we,” “our,” or “us”) respects your privacy and is committed to protecting it through this Privacy Policy. This Privacy Policy governs your access to and use of the Readnow Safari Extension, including any content, functionality, and services offered on or through it (the “Extension”).

Please read this policy carefully to understand our policies and practices regarding your information and how we will treat it. By using or accessing the Extension, you agree to this Privacy Policy.


Give everything you have and watch the ripples

Give everything you have and watch the ripples.

I actually woke up from the heat… It has been a while since I was back in Chengdu. Jimmy just sent me the 3D model and scene renders of his company’s “new space”, and I love it; I’ll go check the construction progress tomorrow. The overall fit-out his company has done already looks great to me (okay, I’m just envious of his desk full of drones and 3D-printed robotic arms; what tech enthusiast wouldn’t be 😂), but I know he wants more and better: “more”, more! What a richly human word 😝

When he described the design thinking of his old company back in the US, I could tell he cares a lot about details and the overall user experience. Treating a project as an experience can itself be a design point, and back then, even in the US, he was among the earlier people in tech companies to think this way. “Because we wanna give our clients a better experience when they come to us”, “they ended up working at our office more than their own, because it just, feels good” (I’ve mostly forgotten the exact words, but that was the gist). Why do they win your project? Because from the moment you step into their office you are inside their experience; if it doesn’t move you, they haven’t done their job well 😂😂

I recently worked out of adidas for a while and experienced the kind of environment he described: huge, extravagant 4K screens in the meeting rooms, and devices within the same eco-system connecting to each other for the best on-site and remote meeting experience. But compared with adidas, which has money to spare, the more interesting question is how well you can do the work when money is limited. Between finite resources and infinite desire, that is where design shines 🤹

He is without doubt an excellent businessman; most of his time and energy is poured into what he builds. I remember when we started working together in 2019, he had just finished the architecture design and foundational development for a front-end/back-end collaborative system, and then, alongside our project, he and his wife also started this current company. That passion for doing things at 200% is what moves me. Thinking about it, I have many friends like this around me: Haishan, Kaleo, Sol, Trista; some polish their craft, some cultivate their lives. I’m moved, truly moved 🥳🥳 Like the line at the top, which I saw recently and love, I feel that having friends like this gives me strength too.

I’ve only listed a few tech/art friends I’ve been spending the most time with recently; my other friends each have their own strengths 😜 If I get the chance, I’d really like to post a Moment like this for every one of them 😂😂

SSH to overseas server too slow?

UPDATE (2018-05-03):

I ended up using a BandWagon VPS as a high-volume traffic proxy with an AEAD encryption method, plus another faster one as a backup, instead of all this SSH forwarding. The SSH setup was unstable: it fell over only a month later and was never fast again.


The original post:

Use China mainland cloud server as a hop.

ssh -v -N -L 2222:remote.oversea:2222  cloud.mainland

Then:

ssh -p 2222 root@localhost

Or even mount as a local folder:

sshfs -p 2222 root@localhost:/var/www/html ~/docker/nhweb -oauto_cache,reconnect,defer_permissions,noappledouble,negative_vncache,volname=nhweb

A few sweet ssh features to improve your development experience.

Expose local port to internet with a delegate:

# Remote server
sudo vi /etc/ssh/sshd_config
# Append below content:
# GatewayPorts yes

# Local
ssh -v -N -R *:9090:localhost:8080 cloud.mainland

Then head to http://cloud.mainland:9090 to see the results.


Enable Google Drive file sync, on a Mac, in China

UPDATE (2018-05-03):

Another option is to use Surge for Mac; it just works, fantastically well!


Use Polipo to convert a socks connection into a http proxy.

brew install polipo

Once you have Polipo installed, configure it to work properly with a polipo.config file:

socksParentProxy = "localhost:8089"
socksProxyType = socks5

proxyAddress = "::0" # both IPv4 and IPv6
# allowedClients = 127.0.0.1, 192.168.1.1/255

pmmFirstSize = 16384
pmmSize = 8192

And start polipo.

polipo -c ./polipo.config

Then configure Web Proxy (HTTP) & Secure Web Proxy (HTTPS) to 127.0.0.1:8123 (Settings -> Network -> Advanced -> Proxies -> Web Proxy (HTTP) -> OK -> Apply).

Google Drive will then sync each of your changes on the fly.

For convenience, use our Bash tool :)

Bash, OS X: pSet - a CLI, help you manage your OSX network settings.

Browse the Internet from China

Updates:

  • 2016-05-27 14:25:20
    • Docker enthusiast? (7MB/s)

Let’s talk about network. Chinese version.

ssh -v -N -C -D 8089 -o ServerAliveInterval=60 -o ServerAliveCountMax=2048 rankun.org

One step further.

Before long, you will see a lot of error messages like this:

debug1: Connection to port 8089 forwarding to socks port 0 requested.
debug1: channel 24: new [dynamic-tcpip]
debug1: channel 24: free: dynamic-tcpip, nchannels 35
debug1: Connection to port 8089 forwarding to socks port 0 requested.
debug1: channel 24: new [dynamic-tcpip]
debug1: channel 24: free: dynamic-tcpip, nchannels 35

This means you have just lost the connection to the remote server. You can use autossh to monitor the tunnel and restart it automatically.

autossh -M 2000 -v -N -C -D 8089 -o ServerAliveInterval=60 -o ServerAliveCountMax=2048 rkus.rankun.org

3-tier forwarding

Idea: shadowsocks + port forwarding (ssh tunnel)

# Start a ssserver in a server outside of China (here: listen on oversea:993)
ssserver -c /path/to/config.json

# Setup a Chinese Cloud server, connect to that ssserver (rkus.json pointing to oversea:993 and listen on cloud:993)
sslocal -c rkus.json

# Finally, connect to cloud server at local (socks on local:8089, local forwarding to cloud:993)
ssh -v -C -N -L 8089:localhost:993 cloud

# If you want the local ssh port forwarding to be auto restart, try autossh
autossh -M 2000 -v -C -N -L 8089:localhost:993 -o ServerAliveInterval=60 -o ServerAliveCountMax=2048 sax.mindfine.com

The result?

Might surprise you :-)

OK, attached the video ;)

Step closer to development:

~/docker/nhweb on  master ⌚ 1:08:21
$ git push -uf origin master
Counting objects: 5993, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (5698/5698), done.
Writing objects: 100% (5993/5993), 29.00 MiB | 5.17 MiB/s, done.
Total 5993 (delta 627), reused 1422 (delta 120)
To https://youdar@bitbucket.org/youdarnet/nanhai-wp.git
+ 1b34252...dc448a1 master -> master (forced update)
Branch master set up to track remote branch master from origin.

Pushing code to Bitbucket (normally 10-20 KB/s…) can reach up to 6 MB/s.

Test environment: Sichuan Province, Great Wall Broadband…

Notes:

  • You will need two servers, or at least ¥50 in your pocket to rent one.
  • Some connections are bad enough that you will need the AUTOSSH_POLL environment variable to force autossh to check the connection health more frequently. Let’s check every 5 seconds!
    export AUTOSSH_POLL=5 &&  autossh -M 2000 -v -g -C -N -L 8089:localhost:993 -o ServerAliveInterval=60 -o ServerAliveCountMax=2048 hz.youdar.net

Thank you for reading,
Regards,
Youdar

Ultimate docker on OSX collection - mapping docker files to local

Yes, you’ve heard, and perhaps looked around here, here, or maybe here: it’s different from Linux, where a simple volume maps straight to your local machine. In the OS X world, Docker runs inside a VirtualBox VM…

So file permissions become a problem: you can’t simply change a file’s group or permissions either. Let’s fix it.

The idea is like this:

  • Everything in Docker operates as normal.
  • Create a new Docker container that serves an SSH endpoint.
  • Connect to that SSH endpoint to get access to the file system inside Docker.
  • Going deeper, map the files of the Docker volume to the local file system.

Let’s get started.

Run your containers as normal.

Go to your docker-compose directory, edit the file docker-compose.yml.

sshd:
  image: 'krlmlr/debian-ssh'
  ports:
    - '2222:22'
  environment:
    - SSH_KEY=ssh-rsa AAAAxxxx_ssh_rsa_pub_content rankun203@gmail.com
  volumes_from:
    - data
  working_dir: /var/www/html

Run it

docker-compose up sshd

Mount the docker file system to your local file system:

sshfs -p 2222 root@youdar.dev:/var/www/html ~/docker/nhweb -oauto_cache,reconnect,defer_permissions,noappledouble,negative_vncache,volname=nhweb

Notes:

  • volname=nhweb is the directory name of the mount point.
  • ~/docker/nhweb is the mount point.
  • These options are selected for friendlier behavior and fewer errors.

Now your local folder ~/docker/nhweb is just the same as the one inside Docker, but with the right access rights.

Notes:

  • All files you create will be owned by the root user inside Docker.
  • If you want a more limited user with sudo permission instead, set one up inside the container.
