We recently sat down with a local Canadian blogger who runs “Thoughts From My Life” for an interview about Artaygo, our history and process, and the future of AI Art.
Below is a copy of the original article ...
I recently spoke with the owners of Artaygo. The company offers one-of-a-kind canvas art pieces that are generated using artificial intelligence. It’s a unique product offering for customers who would like original pieces at more affordable prices. There are multiple themes available, and the selection continues to grow.
The following is a question and answer style interview about the company and how the artwork process works.
Q: Why the name “Artaygo”? Where did it come from?
The choice of the name Artaygo is a mix of branding potential, ease of pronunciation, availability, and linkage to the content. Artaygo is primarily a play on words combining “Art” and “AI” in a way that is easy to pronounce. We started with a brainstorming session writing down a few dozen words that might be relevant – words like AI, art, artificial, machine, gallery, generative and so on, and from there played around with variations of them.
There are quite a few sites that help with generating business names, and they were helpful for getting ideas for prefixes and suffixes. But at the end of the day it came down to spending a weekend with Excel open, generating all kinds of prefixed and suffixed versions and picking favorites.
Q: What is your interest in art historically?
Although our personal background is pretty focused on finance & capital markets, we’ve done a lot of work in the past in Photoshop and graphic design, and just experimenting with other visual arts packages, including 3ds Max, Maya and Blender.
As a homeowner it also became more topical to think about home decorating, framed canvas and other wall art. We have a mix of art types at home, and I’ve often wondered if there was an interesting middle ground between basic printed wall art and more expensive original oil paintings.
Q: When did you get the idea to create this site and what were your motivations?
Our introduction to AI / Deep Learning / Machine learning was in late 2017 and we became very familiar with a lot of the different algorithms that existed back then. Initially it was more out of general curiosity as a bit of a sci-fi nerd. The general concept of AI seemed so far-out versus anything we had seen previously. Just the idea that you can create a generalized architecture in which a computer can ‘think’ for itself and solve a variety of problems without much human input is completely amazing. And having watched this space now for several years, it’s incredible to see the innovation that’s been happening – much of it just trial and error because there isn’t a lot of academic theory behind it yet.
In terms of motivations, we really like art, graphic design, and AI personally, and with a business background, what better way to put that together than by putting computers to work creating art? I also think it’s a great service that benefits people, because while everyone aspires to have their own original hand-painted artwork, realistically you’re not going to furnish your first home with multiple pieces of art at $2,000-$10,000+ each. So for folks getting into their first home – there wasn’t really a way to decorate a house with something completely original, one-of-a-kind, with high quality materials and at a reasonable price. So it’s great to be a part of offering a solution to that problem.
Q: I would love some details on how this works if it doesn’t give away any secrets. What is the algorithm behind the art generation?
The core principle is something called a generative adversarial network (GAN). So you actually create two neural networks which fight against each other. Initially, the generator network really just makes colorful noise – just random guesses for every pixel in an image, while the discriminator network looks at the fake noisy pictures, and looks at real pictures, and assigns a probability of the images being real or fake. Initially, neither the generator nor the discriminator has any idea what it’s doing, and both are pretty inaccurate. Then the results are scored, and the generator is rewarded for fooling the discriminator, and the discriminator is rewarded for correctly determining if an image is real or fake. So this process then repeats and the act of rewarding the networks causes them to improve over time.
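The generator-versus-discriminator loop described above can be sketched on a deliberately tiny problem. This is a minimal illustration, not Artaygo's model: instead of images, the "real data" is a single number drawn from a bell curve, the generator is one linear function of random noise, and the discriminator is one logistic unit. All names here (`train_toy_gan`, `real_mean`, the parameters `a`, `b`, `w`, `c`) are hypothetical and chosen only for this example.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function, returns a value in [0, 1]."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=500, lr=0.01, real_mean=4.0, seed=0):
    """Train a 1-D toy GAN and return its parameters (a, b, w, c)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: fake = a * noise + b
    w, c = 0.0, 0.0  # discriminator: P(real) = sigmoid(w * x + c)

    for _ in range(steps):
        real = rng.gauss(real_mean, 1.0)  # one "real" sample
        noise = rng.gauss(0.0, 1.0)       # random input to the generator
        fake = a * noise + b              # one generated sample

        # Discriminator step: reward it for scoring real high and fake low,
        # i.e. gradient ascent on log D(real) + log(1 - D(fake)).
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w += lr * ((1.0 - d_real) * real - d_fake * fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: reward it for fooling the discriminator,
        # i.e. gradient ascent on log D(fake).
        d_fake = sigmoid(w * fake + c)
        grad_fake = (1.0 - d_fake) * w    # d log D(fake) / d fake
        a += lr * grad_fake * noise
        b += lr * grad_fake

    return a, b, w, c


if __name__ == "__main__":
    a, b, w, c = train_toy_gan()
    print("generator parameters:", a, b)
    print("discriminator parameters:", w, c)
```

Real image GANs replace both one-line models with deep neural networks and rely on a framework to compute the gradients, but the alternating reward structure is the same.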
Q: If it is being trained, how does it find art to train itself with?
We need to provide the GAN with copies of what it’s trying to emulate – and the broader the subject matter we provide, the broader range of outputs we produce. So if you want the GAN to produce portraits of people – it won’t learn to do that if you provide it pictures of landscapes. On the other hand, if you provide it pictures of all art ever produced, the model will suffer from numerous challenges including taking a very long time to train, having a tendency to ‘cheat’ by just drawing one type of art successfully and ignoring other types, among other issues. So we have to act as a teacher who guides the student to learn from a reasonably curated set of examples.
However once you have a trained artist, you can ‘transfer’ its learnings and begin learning a different subject matter more quickly. So if you had taught an AI artist landscape painting, you can take a copy of that artist and tell it to train on portraits, and it would adapt to be a portrait painter much more quickly than if it started from scratch.
Q: What computer hardware is being utilized to create the artwork? Is it local or cloud hosted?
All machine learning these days is being done on graphics cards, primarily made by NVIDIA, because they’re really effective at the type of math required for training neural nets. Your typical Intel or AMD CPU is extremely fast at doing single tasks (or 8-16 tasks if we’re talking typical CPUs these days with multiple cores). Compare that to graphics cards which are designed exclusively to do a smaller amount of math across millions of pixels in unison, and repeated at least 60 times per second. That graphics card structure is much more aligned to how neural networks behave, with millions of connections that need to be updated simultaneously during the training process. So we’ve been fortunate enough to get a very good NVIDIA card and run a lot of work locally, but it’s also required a lot of effort in managing memory and using a few hacks to live within memory and time constraints.
We have been thinking about migrating to Amazon AWS for a couple of reasons. First, running it locally means it fully occupies the local machine – and especially as we approach the summer – it starts to heat up a room pretty fast! Also, going on AWS would let us get quicker turnaround time on experimentation and simply not need to think as much about the trade-offs of complexity and training speed.
Q: How long does it take to train and how many pieces of art will it train with? This probably translates into an average time per piece of training art?
Training and art generation are similar to what you might expect from a human artist, just accelerated. You might think about a human artist training for years to become a master, at which point each work of art is quite good, and then takes an order of magnitude less time to paint each new masterpiece than the cumulative training time up to that point. Similarly AI artists have a large up-front training time that can take days or weeks, at which point the artist is quite good and can produce new art relatively quickly. In terms of training materials, you can theoretically achieve ‘interesting’ results when you train on a very small dataset of dozens of images – sometimes an output with unusual visual artifacts or ‘stuttering’ can make for an appealing effect. But generally you want at least a dataset in the ‘low thousands’ of images and at the extreme upper end, maybe 50 to 100 thousand images. But that upper limit keeps coming down as new techniques and architectures develop.
One of the recent discoveries by researchers is that if you use aggressive augmentation of the images, it works almost as well as having extra original images. So for example you can flip an image horizontally, zoom the image, slightly rotate or tilt the image, or adjust its color saturation slightly – and doing these small adjustments greatly improves the ability of GANs to produce a variety of output without needing an abundance of input materials to learn from.
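To make the augmentation idea concrete, here is a minimal sketch of three of the adjustments mentioned above: a horizontal flip, a zoom (done as a center crop, with the resize back to full size left out), and a saturation tweak. It assumes an image is a plain list of rows of `(r, g, b)` tuples with values from 0 to 255; that representation and the function names are assumptions made for this example, and a real pipeline would normally use an image library instead. Small rotations are omitted to keep the sketch short.

```python
import random


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in img]


def center_zoom(img, margin):
    """Zoom in by cropping `margin` pixels from every side (no resize)."""
    return [row[margin:len(row) - margin]
            for row in img[margin:len(img) - margin]]


def adjust_saturation(img, factor):
    """Scale color saturation: 0.0 gives gray, 1.0 leaves the image as is."""
    def clamp(v):
        return max(0, min(255, int(round(v))))

    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            gray = (r + g + b) / 3.0
            new_row.append(tuple(clamp(gray + factor * (ch - gray))
                                 for ch in (r, g, b)))
        out.append(new_row)
    return out


def augment(img, rng):
    """Return a randomly altered copy of a training image."""
    out = img
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    return adjust_saturation(out, rng.uniform(0.8, 1.2))


if __name__ == "__main__":
    tiny = [[(255, 0, 0), (0, 255, 0)],
            [(0, 0, 255), (255, 255, 255)]]
    print(augment(tiny, random.Random(0)))
```

Each call to `augment` gives the network a slightly different view of the same picture, which is how a small dataset can be stretched further.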
Q: What were some of the biggest hurdles in developing it?
There are tons of hurdles that we didn’t expect! Whether coding, getting good data, seeing unexpected results, and even getting online with print suppliers. One issue that’s a bit more tech-oriented is simply the trade-off between resolution and compute time. So running a basic GAN at 256×256 pixels for 12 hours might produce great results. But to generate that at 512×512 – twice the width and height actually takes 4x longer to train – 48 hours. Then going to higher resolutions takes even more time – so to get the quality of that original 256×256 image at 2048×2048 would take 768 hours (about 32 days) – or so you would think.
You can imagine the frustration when you think everything is going to work and 8 days into training you check the results and every image looks identical – a phenomenon called GAN collapse (more commonly known as mode collapse). That happens when the networks get ‘stuck’ and only generate a single type of content. Working around these kinds of issues where you spend a lot of time waiting before knowing if you’ll have success or not – that’s probably the hardest hurdle.
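The resolution arithmetic in this answer can be written down as a one-line estimate. This sketch assumes, as the answer's figures do, that training time grows in proportion to the number of pixels, so doubling both width and height quadruples the time. The function name is hypothetical, and as the answer hints, real training runs do not always follow this naive estimate.

```python
def estimated_training_hours(base_hours, base_res, target_res):
    """Naive estimate: training time scales with the pixel count."""
    scale = (target_res / base_res) ** 2
    return base_hours * scale


if __name__ == "__main__":
    for res in (256, 512, 1024, 2048):
        hours = estimated_training_hours(12, 256, res)
        print(f"{res}x{res}: {hours:.0f} hours ({hours / 24:.1f} days)")
```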
Q: Do you have any noteworthy improvements you want to make to the algorithm?
There isn’t anything architecture-wise that is mission critical, so any improvements are very ‘on the margins’ and a bit academic. But SOTA (State of the Art) is always changing, so we could completely overhaul the architecture if there was a compelling enough reason. There are many public GAN codebases which occasionally implement new techniques, so we kind of watch to see what is new & interesting there to see if we can implement similar concepts. Most GANs today rely on an image recognition architecture called convolution, which allows the AI to detect primitive shapes and stack them into more complicated features. So what gets us the most excited is stepping outside of convolution into some alternate architectures called Transformers (which are built on a mechanism called attention) and Diffusion Models, but they aren’t surpassing convolutional approaches just yet. So the old adage holds true, if it ain’t broke, don’t fix it!
Q: How long did it take you to develop the app and output your first “usable” piece of artwork?
It took a few months to get things up and running where content looked decent. So much code and so many research papers are open source that you can get basic items going quickly, but implementing some of the newer features like style mapping and augmentation takes more work. But the very first piece of content ended up not looking as great as we hoped when printed to canvas, so we went back to the drawing board and found solutions to get final output that looked good. I still have a copy of that early framed canvas hanging in my office as a reminder of where things started.
Q: Of the artwork generated, does a human look through for ones that look interesting? What percentage are rejected? What are the reasons for rejection? General appearance does not look good, doesn’t fit the theme, too similar to other art pieces, etc?
Yes, we look through and are rejecting anything that looks subjectively ‘bad’. It’s maybe 10% or so that get rejected. So for example in the training data you might have a black and white image, which we thought was useful for capturing the foliage of a different tree style. But some small percentage of output images may appear partially black & white, and just don’t align to the collection very well. Sometimes the unusual artifacts that get generated are actually really cool (see a blog post about the hidden secrets of AI Art here) but sometimes they’re more obviously ‘mistakes’. The longer we train the models, the lower the rejection rate. But it’s sometimes easier and more enjoyable to just take a look at the artwork.
Q: Does a person create the titles for each art piece?
It’s a mix – initially we created most of the titles by human means, and pulled in the expert assistance of my daughter! In one of our recent collections, called the Alleys of Old Europe, we made a simple GAN to generate names of European cities and used that to give every image a name belonging to a totally fictional city. So they have names like Midleshannon, Prejek, Afragliano for example, which to the best of our knowledge are not actual places, but sound like they’d fit right into someplace in England, Croatia and Italy. Certainly as we move forward, we plan to implement other systems that name the works as well.
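For readers curious how a machine can invent plausible place names, here is a minimal sketch. Note the swap: the interview says a simple GAN was used, whereas this example uses a much simpler technique, a character-level Markov chain, because it fits in a few lines. The training list is a handful of real city names picked only for illustration, and every function name here is hypothetical.

```python
import random
from collections import defaultdict

START, END = "^", "$"


def build_model(names, order=2):
    """Map each run of `order` characters to the characters seen after it."""
    model = defaultdict(list)
    for name in names:
        padded = START * order + name.lower() + END
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model


def generate_name(model, rng, order=2, max_len=12):
    """Walk the chain one character at a time until END or max_len."""
    state = START * order
    letters = []
    while len(letters) < max_len:
        nxt = rng.choice(model[state])
        if nxt == END:
            break
        letters.append(nxt)
        state = state[1:] + nxt
    return "".join(letters).capitalize()


if __name__ == "__main__":
    cities = ["Manchester", "Liverpool", "Dubrovnik", "Zagreb",
              "Napoli", "Milano", "Firenze", "Leicester"]
    model = build_model(cities)
    rng = random.Random(42)
    print([generate_name(model, rng) for _ in range(5)])
```

With such a small training list the output will often repeat real names or fragments of them; a longer list gives more variety.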
Q: Could a user ever “influence” the algorithm to generate more custom art?
In theory yes, and there are a couple of ways to achieve that. One approach we’ve seen in other GAN repositories is to label the training data, so while the GAN is trained on a large body of work, you have more control in terms of generating content which is true to the label. So you could train it on a collection of cat, dog and horse images, but then only ask to have horses generated. Another approach is using a technique called neural style transfer, where you can transfer the visual style of one image onto another – so you can project the style of Van Gogh’s Starry Night onto a picture of your backyard. There are a few other techniques that can be adapted from other GANs as well.
Q: You say it is one-of-a-kind, how do you guarantee that? The art is taken down and not available for purchase once someone does buy it?
That’s right – although the buyer has full control over what size they want the art produced at, once a single print is purchased, it is no longer available for sale again at any size. As for guaranteeing uniqueness – depending on the model we’re using, the initial ‘seed’ of an image is based on a 256 to 1,024 digit random code, so the odds of seeing an identical work are astronomically small (i.e. you might say there are 10^256 to 10^1024 possible inputs compared to 10^80 atoms in the universe). Furthermore, as we train and update the models over time, the same random digit sequence won’t produce quite the same output either, even if you re-used the same seed.
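The size of that seed space is easy to check, since Python handles arbitrarily large integers. This sketch treats the seed exactly as the answer describes it, a string of random decimal digits, which is a simplification (the real input to a model may be stored differently). The 10^80 atom count is the figure quoted in the answer, and the function names are hypothetical.

```python
import secrets

ATOMS_IN_UNIVERSE = 10 ** 80  # figure quoted in the interview


def random_seed_digits(n_digits):
    """Draw a random decimal code of the given length."""
    return "".join(secrets.choice("0123456789") for _ in range(n_digits))


def seed_space_size(n_digits):
    """Number of distinct decimal codes of the given length."""
    return 10 ** n_digits


if __name__ == "__main__":
    seed = random_seed_digits(256)
    print("example seed starts with:", seed[:16])
    ratio = seed_space_size(256) // ATOMS_IN_UNIVERSE
    print("seeds per atom has", len(str(ratio)) - 1, "zeros")
```

Even at the low end of 256 digits, there are 10^176 possible seeds for every atom in that estimate.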
Q: Does the purchaser also get the digital file?
At this moment, no, but we might consider expanding that offering. There is also a lot of excitement about NFTs – non-fungible tokens, where buyers receive a digital copy of the art along with blockchain-certified proof of ownership. We’re not sure if our target market tilts in that direction – physical ownership has its perks! But it’s certainly something we’d be open to exploring.
Q: Do you retain the original file as well in case a reprint is needed?
We don’t. We do retain the files during the refund period in the event a customer doesn’t want to keep their piece, and also to ensure that if the art is damaged during delivery we can offer a replacement. However after the return window closes, we delete the high-res files and just keep the low resolution versions for marketing purposes, and to have a visual record of what’s been sold. While it might be nice to have a ‘backup’ with us, we felt that customers having confidence in the uniqueness of their product was preferred.
Q: What other themes can we expect upcoming?
Within the broader impressionism school of art, we’re looking at still life, portraits, and potentially doing more narrow themes – impressionist “fields” versus impressionist “mountains” or “cityscapes”. We also have quite a few ideas in photorealism, but those require a bit more field work to gather original private content to implement. I think there is a lot of potential in combining styles and content that have never co-existed – just off the top of my head, maybe doing a series of sports cars with Japanese sumi-e style brush strokes.
Q: Any plans on other product offerings? Whether it is different sizings or other mediums.
In terms of sizing we can technically offer any size – so while we have a pretty good range on the website, in theory a customer can reach out and have something different produced. Moving to different aspect ratios is something that’s in the works as well. There are some upper limits on size with current technology – so if you’re looking for 300 DPI images at 36 inches x 36 inches you’re getting images at 10,800 pixels per side, and even small increases start to sharply increase computing requirements. We’ve also considered alternate finishing options, like acrylics, but for now we’re keeping the offerings relatively simple.
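The print-size arithmetic in the last answer works out as follows. This is a minimal sketch with hypothetical function names: pixels per side are simply inches times DPI, and the total pixel count (a rough stand-in for how much the computer has to generate) grows with the square of the side length.

```python
def pixels_per_side(inches, dpi=300):
    """Pixels needed along one edge of a print."""
    return inches * dpi


def total_pixels(width_in, height_in, dpi=300):
    """Total pixels in the full image."""
    return pixels_per_side(width_in, dpi) * pixels_per_side(height_in, dpi)


if __name__ == "__main__":
    for side in (24, 36, 40):
        print(f"{side} in: {pixels_per_side(side):,} px per side, "
              f"{total_pixels(side, side):,} px total")
```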