Something happened to me a couple of weeks ago that gave me pause. I found a bright green leaf in the middle of my driveway. This might not sound like a notable event, but it's the end of winter, and all the trees around me are barren, awaiting spring to create their buds.

[Image: green leaf in winter]

To me, this looks like a full, summertime leaf, maybe from a maple tree. It really disoriented me; I had to stop what I was doing and ponder what was going on. Is reality breaking? Is the simulation throwing exceptions? Is the increase in compute usage in our world (due to increases in usage of AI and crypto) causing our host reality to lag out and add sprites in the wrong places? These are the thoughts that flooded my brain.
I've been thinking about that leaf for a while. I finally realized I live in the 21st century and can identify the leaf with my phone, and maybe that would offer a clue to my mystery. And yeah, it's Common Ivy, an evergreen plant. Mystery solved.
While this was a silly, and to me quite funny, little freak-out, it has me wondering what things other people have stumbled across in the world that seemed so completely out of place that they could only be explained by some non-natural means. I'm sure this happens to everyone at some point in their lives.
I just learned about a new open-source tool from Microsoft called MarkItDown.
MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.
This seems similar to pandoc, but instead of being able to take any formatted document type and convert it to any other type, it only outputs Markdown. It can be used as a standalone CLI tool or as a Python library.
I'm particularly interested in converting HTML to Markdown, so I can take public documentation online and turn it into a Markdown file that can be more effectively consumed by LLMs. I was playing around with this idea last week during a hackathon, where I wanted to take the online query language specification for WIQL and turn it into a compact prompt, so the LLM can more reliably create WIQL queries for me.
To get the HTML for the web page, I use Simon Willison's tool shot-scraper to dump the page's HTML, then pipe it into markitdown:
shot-scraper html https://learn.microsoft.com/en-us/azure/devops/boards/queries/wiql-syntax | markitdown > wiql.md
This produces a file called wiql.md (link to gist with unmodified output). It's certainly not perfect: the first 300 lines (out of around 1,000) are not related to the documentation and are just extra HTML that isn't needed. This could probably be mitigated by passing an element selector to shot-scraper so it doesn't dump the unrelated parts of the page. But it's not hard to delete those lines manually, and then the final result is pretty good; it looks fairly similar to the original web page.
Edit: here is the one-liner to dump only the relevant part of the page. You have to wrap the output of shot-scraper in an <html> tag so markitdown can infer the input type.
echo "<html>$(shot-scraper html https://learn.microsoft.com/en-us/azure/devops/boards/queries/wiql-syntax -s .content)</html>" | markitdown -o wiql.md
[Image: side-by-side comparison]

MarkItDown also supports plugins, so you can extend it to support other file formats. I've only played around with this a little bit, but I think it will be handy to have a quick and easy way to convert more documents to Markdown. I'm particularly interested in the pdf and docx input types as well.
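If you'd rather call it from Python than the CLI, the library usage is roughly this (a minimal sketch based on the project's README; the file names are placeholders, and formats like PDF need the optional extras, e.g. `pip install 'markitdown[all]'`):

```python
from markitdown import MarkItDown

# Convert a local file (HTML, docx, pdf, etc.) to Markdown.
md = MarkItDown()
result = md.convert("wiql.html")  # placeholder path

# The converted Markdown is exposed as text on the result object.
with open("wiql.md", "w", encoding="utf-8") as f:
    f.write(result.text_content)
```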
A friend sent me this paper about a security vulnerability in China's Great Firewall. The hack is quite interesting and worth reading about, but this quote is what stuck with me:
Wallbleed exemplifies that the harm censorship middleboxes impose on Internet users is even beyond their obvious infringement of freedom of expression. When implemented poorly, it also imposes severe privacy and confidentiality risks to Internet users.
Spent my free time adding image support to the blog today. I'm using an Azure Storage account, since that's what I know, and I didn't feel like diving into S3 buckets right now, although I probably should at some point.
Once again, .NET Aspire proved its usefulness. I kicked off the changes by adding the Azure Storage hosting and client integrations via Aspire, then added a Minimal API POST endpoint so I had a place to upload images to. GitHub Copilot is very useful for these kinds of tasks: I can write a comment block describing what I want the endpoint to do and all the edge cases I think it should handle, then hit Tab and get 90% of what I want. I go through and tweak the rest, and then I'm testing almost immediately.
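For the curious, the wiring looks roughly like the sketch below. This is a simplified version, not my exact code; the resource names ("storage", "blobs"), the "images" container, the project name, and the /api/images route are just placeholders.

```csharp
// AppHost project (uses the Aspire.Hosting.Azure.Storage package):
// register Azure Storage and hand the blob resource to the web project.
var builder = DistributedApplication.CreateBuilder(args);

var blobs = builder.AddAzureStorage("storage")
                   .RunAsEmulator()      // run against Azurite locally
                   .AddBlobs("blobs");

builder.AddProject<Projects.Web>("web")  // Projects.Web is a placeholder
       .WithReference(blobs);

builder.Build().Run();
```

```csharp
// Web project (uses the Aspire.Azure.Storage.Blobs client integration,
// which registers a BlobServiceClient in DI).
using Azure.Storage.Blobs;

var builder = WebApplication.CreateBuilder(args);
builder.AddAzureBlobClient("blobs");

var app = builder.Build();

// Minimal API POST endpoint for image uploads.
app.MapPost("/api/images", async (IFormFile file, BlobServiceClient blobService) =>
{
    var container = blobService.GetBlobContainerClient("images");
    await container.CreateIfNotExistsAsync();

    var blobName = $"{Guid.NewGuid()}{Path.GetExtension(file.FileName)}";
    await container.UploadBlobAsync(blobName, file.OpenReadStream());

    return Results.Created($"/images/{blobName}", blobName);
})
.DisableAntiforgery(); // IFormFile binding triggers antiforgery validation in .NET 8+

app.Run();
```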
The last piece of the puzzle was hooking up my new endpoint to the Trix editor. Fortunately, the Trix dev team provided a nice JavaScript file showing how to hook up the event listeners and use an XHR object to post the image back to the server. Considering my lack of experience with JavaScript, I was ecstatic to get this working in only a few iterations.
I edited my last post to add a picture from my bike ride, so I know it's working, but I better add one to this post too :)
I broke my wrist back in January when I fell skiing down a run early in the season. Snow conditions were pretty bad and the moguls on the run had really built up; the resort hadn't been able to groom the runs yet. I made the dumbest move possible and, on my first run of the day, decided to go down a black diamond. Classic move on my part.
I did a turn to go around a mogul and didn't realize how big the drop-off was. I lost my balance and fell down it onto hard-packed snow. I broke the fall with my left hand and immediately knew something was wrong, but didn't consider that I'd broken anything... I got back up and continued to ski down. I did fall one other time, but it was a typical lose-your-balance kind of fall. The only issue was getting back up, which is when I realized I couldn't put any weight on my left hand. That's when I knew I was cooked. I managed to ski the rest of the way down to the bottom and found the nearest ski patrol. They put my hand in a splint, and that's when I saw the massive swelling. I knew my day, if not my season, was over.
The last two months have been tough; I had been really looking forward to skiing and getting better this season. I also haven't been able to bike much. But I'm now close to fully recovered: the bone has healed, and I got the all-clear to start biking again. Today was my first day back on the bike in two months, my biggest gap between rides in years. What a day it was, a beautiful "fake spring" kind of day.
Here's to looking forward to many more rides soon.
I'm very excited about this release of .NET Aspire. It's an absolute game changer for me, specifically for dev work on this blog.
This release has a bunch of great stuff, but I'm actually only interested in the fix for this issue. I really love the `dotnet watch run` command, which performs a hot reload of your application when you make changes to the source files. Before this fix, sockets would not be freed up in a timely manner when the process ended, so on reload the app would try to rebind to the still-occupied ports and fail. That meant my workflow was:

1. Make a change.
2. Kill the running process.
3. Wait 10-15 seconds.
4. Run `dotnet run` again.

Sometimes I would have to rinse and repeat steps 3 and 4 if I hadn't waited long enough.
Now, I run `dotnet watch run`, make changes to my code (.cs, .css, and .razor files), and all I have to do to see the new changes is refresh my browser page. It's like magic. My dev loop has gone from ~30 seconds to see changes reflected in the browser to near-instant (the time it takes me to press F5 in Safari).
I've been very happy with .NET Aspire so far. But because I deploy to Heroku and not Azure, I can't take advantage of any of the deployment features. A side project I've been thinking about is building a Heroku deployer based on a .NET Aspire deployment manifest.
I guess Amazon is removing the ability to download your ebooks, so my wife asked me to help her download all hers. She has 1,600(ish) books in her library, so manually going through and clicking download on them was an obvious no-go.
We found a neat script (via Reddit) to do the clicking and downloading automatically and have been running it for the last two hours. I'm seeing some JavaScript errors being thrown in the console, but it seems to be working.
I was hoping that this tool would just use my existing Claude plan, but no, of course you actually pay for the tokens it uses. I'm sure this was a very conscious decision, as this tool uses A LOT of tokens right now. I mean, it's breathtaking.

The first thing I did was load it up on my link blog codebase and run the /init command to generate a readme file for the codebase. I immediately ran the /cost command to see how much that operation cost. Thirty cents. That may not sound like much, but given how small my codebase is, I was expecting it to be only a few cents.

I then gave it a very specific task: add validation to my admin post authoring form. I gave it a fair bit of instruction, as the docs recommend treating the tool like a software engineer you would delegate a task to. So I gave it hints as to how to find validation rules and all that, then sent it off. It ran for something like two minutes making the change, prompting me for permission to perform tool actions (e.g., run some bash commands, run the build, etc.). After a total of 10 minutes of use, I was up to $1.50 in spend, the code did not build, and I realized that the tool call to build the code was broken. Edit: it turns out PowerShell is not officially supported yet; you must use bash or zsh to launch claude.
I'm still excited about this tool and will keep playing around with it. I'll probably have to reload my Anthropic wallet with more credits soon, as it is expensive, but so far it seems like a really cool concept, and I hope they keep improving it and driving down the cost.
“Vendoring” software is a technique where you copy the source of another project directly into your own project.
I love this idea; so many times you need to debug into your dependencies to figure out an issue or why things aren't working the way you expected.
I mainly develop in C# and .NET and am fortunate to work at Microsoft, so I've done this many times in the past: grab the source for a dependency, copy it into my code, update the references, and start debugging. This is always a temporary step; I don't actually check in the dependent code. I don't do it that often anymore, since you can disable the "Just My Code" debug option, and if your dependencies publish symbols (most do, in my experience), you can just hit F11 and step into library code from NuGet packages.
In the future, I may take up the vendoring approach for web frontends, where licenses allow. If nothing else, it gives me peace of mind that the dependency won't just disappear from the CDN.
You can now find me on Bluesky with an updated handle. My interest in social media has been pretty low in the last few months, but I do find it fun linking myself all over the internet.