nareshkarthigeyan
Transformer Chess
Setting up a Hugging Face account for the model (which I should have done sooner)
Jun 21, 2026
In the last part, I bragged about how baking my fanless M3 MacBook Air for 6 hours straight gave my transformer engine enough grid-coordinate awareness to draw against a lobotomized Stockfish... But I didn't tell you about the absolute disaster that happened right before it.
I wanted to compare the current model, trained on over 1k games, against the old phase 1 model trained on ~200 games. But I couldn't. Because the file didn't exist.
In my infinite wisdom of hacking together training loops, I had completely forgotten to wire up an automated, robust checkpoint-saving system that backed up to a reliable cloud ledger!!! The toy model's weights were just gone... vaporized into volatile smoke the second the terminal session ended. Because I saved each iteration to the SAME .pt file, every new run just overwrote checkpoint.pt.
I realized: I need to start saving more, saving often, and saving somewhere that isn't just a fragile local directory.
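A minimal sketch of what "saving more, saving often" could look like: every save gets its own step-numbered filename, and only the newest few are kept. The names checkpoint_path, prune_old_checkpoints, the checkpoints/ directory and keep_last are hypothetical choices for illustration, and the torch.save call appears only as a comment because the real training loop isn't shown here.

```python
from pathlib import Path


def checkpoint_path(directory, step):
    """Build a unique, sortable filename for a given training step."""
    return Path(directory) / f"checkpoint_step{step:07d}.pt"


def prune_old_checkpoints(directory, keep_last=3):
    """Delete all but the newest `keep_last` checkpoints; return the survivors."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    # Zero-padded step numbers make alphabetical order match training order
    # (as long as the step count stays within 7 digits).
    checkpoints = sorted(Path(directory).glob("checkpoint_step*.pt"))
    for old in checkpoints[:-keep_last]:
        old.unlink()
    return checkpoints[-keep_last:]


# Inside a training loop (assumed, not shown), one save might look like:
#   path = checkpoint_path("checkpoints", step)
#   path.parent.mkdir(parents=True, exist_ok=True)
#   torch.save(model.state_dict(), path)
#   prune_old_checkpoints("checkpoints", keep_last=3)
```

With unique names, a new save can never clobber an older one, and pruning keeps the disk from filling up with half-gigabyte files.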
The 100MB Wall in GitHub
My first instinct was obvious: "I'll just push everything to GitHub." I'm a developer, I know Git, it's comfortable. I'll just write my weights out to a local file called checkpoint.pt, commit it, and ship it.
Easy, right? NAHHHHH.
GitHub immediately slapped me with a catastrophic file size error: it rejects any single file over 100 MB. Standard Git is built to track elegant lines of text, not a massive 482-megabyte binary slab of PyTorch neural layer weights. Even trying to force it via standard Git Large File Storage (LFS) command line hooks felt like dragging a boulder through molasses: network drops, local indexing hangs, and cryptic terminal rejections.
Worse, I wanted to open-source my code for recruiters and friends to look at. Nobody browsing a clean code repo wants their git clone command to stall for twenty minutes while downloading half a gigabyte of compiled weight vectors. I needed a setup where GitHub could hold my clean, lightweight source code, while a dedicated ML platform managed my heavy asset versioning. That’s when I finally decided to settle on Hugging Face Hub.
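To catch this kind of rejection before pushing, a small pre-flight scan can list every file above GitHub's per-file limit. This is only a sketch: find_oversized_files and GITHUB_FILE_LIMIT_BYTES are hypothetical names, and it walks the whole working directory rather than asking Git which files are actually staged.

```python
import os

# GitHub blocks pushes containing any single file larger than 100 MiB.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def find_oversized_files(root=".", limit_bytes=GITHUB_FILE_LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs above the limit, largest first."""
    oversized = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's internal object store; it is never pushed as plain files.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # avoid errors on broken symlinks
            size = os.path.getsize(path)
            if size > limit_bytes:
                oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)


# Example usage (prints nothing if every file is under the limit):
#   for path, size in find_oversized_files("."):
#       print(f"{size / (1024 * 1024):8.2f} MB  {path}")
```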
Hugging Face: The Blueprint
Hugging Face is basically GitHub, but engineered exclusively for machine learning models. Under the hood, every repository is a Git LFS container optimized to handle massive weight matrices without breaking a sweat. Most open-source models are available on HF in one form or another. So I wondered: why not mine, too?
The dream architectural strategy looked like this:
                           ┌───────────────────┐
                           │ Local Workstation │
                           └─────────┬─────────┘
                                     │
            ┌────────────────────────┴────────────────────────┐
            ▼                                                 ▼
  git push github main                                git push hf main
┌───────────────────────┐                         ┌───────────────────────┐
│        GitHub         │                         │   Hugging Face Hub    │
│   (Code Only repo)    │                         │ (Code + Weights File) │
└───────────────────────┘                         └───────────────────────┘
To make this dual-repository workflow seamless without accidentally blowing up my GitHub limits, I had to configure my files to treat the remotes completely differently:
1. The Stealth Mode .gitignore
First, I updated my local .gitignore file to explicitly block standard Git from tracking any heavy model signatures or media playbacks:
# Ignore model weights for standard Git/GitHub pushes
*.pt
*.pth
*.safetensors
*.bin
checkpoint/
outputs/
# Ignore the 135MB of match recording GIFs and MP4s (that I use in these blogs) killing my bandwidth:
*.gif
*.mp4
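As a rough way to sanity-check which paths those rules catch, here is a small sketch that mimics them in Python. It is a simplification, not Git's real matching engine: it only handles the two kinds of pattern used above (basename globs like *.pt and directory names like checkpoint/), and is_ignored, FILE_PATTERNS and DIR_NAMES are hypothetical names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Mirrors the .gitignore entries above (simplified, assumed equivalent).
FILE_PATTERNS = ["*.pt", "*.pth", "*.safetensors", "*.bin", "*.gif", "*.mp4"]
DIR_NAMES = ["checkpoint", "outputs"]


def is_ignored(path):
    """Approximate whether a repo-relative path is hidden from standard Git."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    # A rule like "checkpoint/" ignores everything under any such directory.
    if any(part in DIR_NAMES for part in parts[:-1]):
        return True
    # A rule like "*.pt" is matched against the file name at any depth.
    return any(fnmatchcase(parts[-1], pattern) for pattern in FILE_PATTERNS)


# Example usage:
#   is_ignored("checkpoint.pt")   -> True  (never reaches GitHub)
#   is_ignored("src/model.py")    -> False (tracked as normal code)
```

For the authoritative answer on a real repo, git check-ignore -v <path> reports which rule matched.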
2. Linking the Split Remotes
Next, I added both destinations into my terminal environment as unique tracking remotes:
git remote add github https://github.com/nareshkarthigeyan/transformer-chess.git
git remote add hf https://huggingface.co/nareshkarthigeyan/intuition1
Reconciling Divergent Histories
When I tried to push my initial architecture to Hugging Face via the command line (git push -u hf main), the ecosystem blew up in my face with a terrifying error:
! [rejected] main -> main (fetch first)
error: failed to push some refs to 'https://huggingface.co/nareshkarthigeyan/intuition1'
Bruh. Because I had created the repository on the Hugging Face web interface first, the platform had automatically initialized its own base README.md and systemic .gitattributes layout. My local machine didn't know these files existed, and Git refused to push because it feared I would overwrite the cloud data.
To reconcile the divergent branches and force them to play nice together, I had to fetch the remote history, allow unrelated timelines to merge, and execute a forced override:
# Pull the remote configurations down and force a historical merge
git pull hf main --allow-unrelated-histories
# Push the synchronized local history up to the hub
git push -u hf main --force
main -> main (forced update). Success! The code files were live on the cloud UI. But when I checked the file list... the weights still weren't there.
The Phantom File Phenomenon
Because checkpoint.pt was grayed out in my VS Code sidebar (due to being gitignored), standard git push completely skipped it. I tried running a generic repository sync script I wrote, and it printed a beautiful success log, but Hugging Face kept stubbornly reporting Files changed (0).
The script was just pushing empty folder metadata. The 393MB file was sitting locally on my MacBook Air, totally ghosted by the automation loop.
To bypass standard Git's configuration limits completely, I abandoned the terminal wrappers and wrote an isolated single-file stream using the official huggingface_hub Python SDK. The API cuts right through the .gitignore boundary to stream files directly over a secure HTTPS upload block.
Here is the script (upload_to_hf.py):
import os

from huggingface_hub import HfApi


def push_weights_directly():
    api = HfApi()
    filename = "checkpoint.pt"
    repo_id = "nareshkarthigeyan/intuition1"

    # Verify the local file is actually there and has data
    if not os.path.exists(filename):
        print(f"Error: Can't find '{filename}' locally!")
        print("Available files:", os.listdir("."))
        return

    size_mb = os.path.getsize(filename) / (1024 * 1024)
    print(f"Verified local file: {filename} ({size_mb:.2f} MB)")
    print("Initializing direct API stream to Hugging Face Hub...")

    # Force upload via direct SDK request
    url = api.upload_file(
        path_or_fileobj=filename,
        path_in_repo=filename,
        repo_id=repo_id,
        commit_message="Forcing programmatic model weight checkpoint upload via SDK API",
    )
    print(f"Upload absolute! View it live here: {url}")


if __name__ == "__main__":
    push_weights_directly()
I ran it: python upload_to_hf.py. The terminal paused, the upload pipeline saturated my network bandwidth, and boom - a 393.12 MB verified asset was officially planted on the Hugging Face servers.
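Since a success log had already fooled me once, it is worth double-checking what the Hub actually holds. The sketch below compares a list of expected files against the repo's file listing. missing_from_remote and check_hub_repo are hypothetical helper names; HfApi.list_repo_files is the SDK call that returns the file names in a repo, and it is imported lazily so the comparison helper works on its own.

```python
def missing_from_remote(expected_files, remote_files):
    """Return the expected files that do not appear in the remote listing."""
    remote = set(remote_files)
    return [name for name in expected_files if name not in remote]


def check_hub_repo(repo_id, expected_files):
    """Ask the Hub for the repo's file list and report what is missing."""
    # Lazy import: only needed when actually talking to Hugging Face.
    from huggingface_hub import HfApi

    remote_files = HfApi().list_repo_files(repo_id)
    return missing_from_remote(expected_files, remote_files)


# Example usage (requires network access and the huggingface_hub package):
#   missing = check_hub_repo("nareshkarthigeyan/intuition1", ["checkpoint.pt"])
#   print("All good!" if not missing else f"Still missing: {missing}")
```

An empty list means every expected file is really on the server, not just in a log message.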
What This Means For The Future (Cloud Automation!)
Now that the data sync architecture is completely locked down, I’ve broken out of local compute limitations. I no longer need to worry about losing my progress or melting my laptop's aluminum frame.
The pipeline is perfectly set up for true Cloud Training Automation. By hooking this Hugging Face asset hub up to a serverless compute backend like Modal Labs or remote GPU containers, I can spin up an enterprise-grade NVIDIA L4 or A10G GPU in the cloud on demand. (But will I tho??? Hopefully the free GPU tiers are enough.)
The future workflow is:
- Fire up a remote cloud script that pulls my code structure.
- Train the chess transformer on millions of games from massive public datasets instead of my tiny 500k toy sample.
- Call api.upload_file() inside the remote script to push the shiny new checkpoint.pt straight back to Hugging Face automatically.
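The three steps above can be sketched as one round trip: pull the latest checkpoint, train, push the result back. This is an outline under assumptions, not the finished pipeline. run_training_round, hub_download and hub_upload are hypothetical names, train_fn stands in for a training loop that isn't written yet, and the download and upload steps are passed in as functions so the flow can be tried with stand-ins before touching the network. hf_hub_download and HfApi.upload_file are the real SDK calls being wrapped.

```python
def run_training_round(train_fn, download_fn, upload_fn,
                       repo_id="nareshkarthigeyan/intuition1",
                       filename="checkpoint.pt"):
    """Download the latest checkpoint, train from it, upload the new one."""
    local_path = download_fn(repo_id, filename)
    # train_fn is assumed to return the path of the checkpoint it wrote.
    new_path = train_fn(local_path)
    return upload_fn(new_path, repo_id, filename)


def hub_download(repo_id, filename):
    """Fetch one file from the Hub and return its local cache path."""
    from huggingface_hub import hf_hub_download  # lazy: needs the SDK

    return hf_hub_download(repo_id=repo_id, filename=filename)


def hub_upload(local_path, repo_id, filename):
    """Push one local file to the Hub under the given name."""
    from huggingface_hub import HfApi  # lazy: needs the SDK

    return HfApi().upload_file(
        path_or_fileobj=local_path,
        path_in_repo=filename,
        repo_id=repo_id,
        commit_message="Upload checkpoint from remote training run",
    )


# Example usage on a remote machine (my_train_fn is hypothetical):
#   run_training_round(my_train_fn, hub_download, hub_upload)
```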
No more lost data, no more fanless thermal throttling. We are officially ready for the big leagues.
Follow the repository adjustments live on GitHub and check out the active model assets directly on Hugging Face!