Ideation:
Given my thesis deadlines and the open-ended nature of the final project, I did my best to have fun with this one, incorporate what made sense, and let it come together organically. There are many fascinating facets to generative art with AI, but there just wasn't enough time this semester to study all of them and get into the nitty-gritty. For me, what made the most sense was to use KNN and machine learning for gesture tracking, creating human-computer interaction through kinetic movement. It's not super novel or the flashiest approach, but it fits the type of work I do and am comfortable with.
I knew I wanted to incorporate pixel manipulation and audio, but I didn't want to generate audio or music; I'm just not familiar enough with audio to get a worthwhile end product. Instead, I could do pixel manipulation driven by audio reactivity. This isn't a new concept for me: I had done a similar project in one of our earlier classes, but this would be the first time I incorporated gestural interaction and a live video feed as a texture.
Tech Stack:
- Three.js
- ML5
- KNN
- Web Audio API
- WebGL
Process:
The very first task was to find a way to use the live webcam feed as a video texture. Three.js has a built-in VideoTexture, which at first I thought would be useful. After playing around with it for a while I found it extremely limiting: it's really made for flat planes and projections, and as far as I'm aware there's no way to access or modify the pixel data live. After scrapping that idea I explored other ways of using the webcam as a texture and found some helpful examples on the web (see References) where the video stream is passed into shaders as a texture.
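A minimal sketch of that approach, under some assumptions about my setup: grab the webcam with getUserMedia, wrap it in a THREE.VideoTexture, and hand it to a custom ShaderMaterial as a uniform instead of using it as a plain material map. Uniform names like uWebcam are just illustrative.

```js
import * as THREE from 'three';

// hidden <video> element that receives the webcam stream
const video = document.createElement('video');
video.autoplay = true;
video.muted = true;
video.playsInline = true;

navigator.mediaDevices.getUserMedia({ video: true }).then((stream) => {
  video.srcObject = stream;
  video.play();
});

// wrap the live video in a texture Three.js keeps updating each frame
const videoTexture = new THREE.VideoTexture(video);
videoTexture.minFilter = THREE.LinearFilter;
videoTexture.magFilter = THREE.LinearFilter;

const material = new THREE.ShaderMaterial({
  uniforms: {
    uWebcam: { value: videoTexture }, // sampled in the fragment shader
    uTime: { value: 0.0 },
    uNoiseAmp: { value: 0.0 },        // driven by audio later on
  },
  vertexShader: /* glsl */ `
    varying vec2 vUv;
    void main() {
      vUv = uv;
      gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0);
    }
  `,
  fragmentShader: /* glsl */ `
    uniform sampler2D uWebcam;
    varying vec2 vUv;
    void main() {
      gl_FragColor = texture2D(uWebcam, vUv); // webcam pixels on the mesh
    }
  `,
});
```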
After I got the video texture working I began adding some basic noise distortion based on Ashima Arts' simplex noise algorithms, just to get a rough pixel-manipulation effect going. I thought I would have time to revisit this and play with the shaders more, but I never got around to it. I quickly found that RGB and noise distortions don't look great on a cube, since it has very few vertices to move around. Giving up on a clean representation of the webcam image, I opted for a sphere / icosahedron, which yielded far more vertices to mold and shape.
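Roughly the displacement idea, as a sketch: an icosahedron gives plenty of vertices, and the vertex shader pushes each one along its normal by a noise value. The cheap hash below is only a stand-in for the Ashima Arts simplex noise I actually used; uTime and uNoiseAmp are uniforms updated from JS.

```js
const geometry = new THREE.IcosahedronGeometry(1, 5); // far more vertices than a cube

const vertexShader = /* glsl */ `
  uniform float uTime;
  uniform float uNoiseAmp;
  varying vec2 vUv;

  // stand-in noise; the real project drops Ashima Arts' snoise() in here
  float hash(vec3 p) {
    return fract(sin(dot(p, vec3(12.9898, 78.233, 45.164))) * 43758.5453);
  }

  void main() {
    vUv = uv;
    float n = hash(position + uTime * 0.3);             // per-vertex noise value
    vec3 displaced = position + normal * n * uNoiseAmp; // push along the normal
    gl_Position = projectionMatrix * modelViewMatrix * vec4(displaced, 1.0);
  }
`;
```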
The next step was to add audio. I had the option of using Tone.js, but I'm not super familiar with it, so I cannibalized some of my own code from a past audio visualizer and integrated it into the new object-oriented structure I was working with. I wasn't interested in creating a full application, more a funky, weird experience, so I didn't bother with drag-and-drop features or options to switch songs; if you don't like the song, sucks for you.
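A sketch of the audio side, assuming a single hard-coded track (no drag-and-drop): a THREE.Audio hooked to an AudioListener, loaded once, and wired to an analyser for later. The file path is a placeholder.

```js
const listener = new THREE.AudioListener();
camera.add(listener);

const sound = new THREE.Audio(listener);
const audioLoader = new THREE.AudioLoader();

// 'track.mp3' stands in for whatever song the sketch ships with
audioLoader.load('track.mp3', (buffer) => {
  sound.setBuffer(buffer);
  sound.setLoop(true);
  sound.play();
});

const analyser = new THREE.AudioAnalyser(sound, 512); // 512-bin FFT
```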
From there I used the built-in Three.js AudioAnalyser to get frequency data, plus a custom algorithm I had written last year to get average frequency / amplitude, and passed the data into my shaders every frame during the render loop. Realizing I had yet to add any machine learning to the project, I scanned my old projects where I used KNN and handtrack.js. I was interested in trying Teachable Machine and image classification, so I trained a basic model, uploaded it to the cloud, and tried to import it into my project. I got some errors about layer density, and after some time on Stack Overflow I realized I was out of my depth; I didn't understand core TensorFlow.js well enough to work out what was going wrong, so I ditched it for KNN. I was a little scared at first because my entire project was set up as a single object, and I was worried about the length and spaghetti logic that would result.
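How the analyser data reaches the shaders each frame, roughly: average the frequency bins (a simple stand-in for the custom amplitude logic mentioned above; Three.js also has a built-in getAverageFrequency()) and write the result into a uniform before rendering.

```js
const clock = new THREE.Clock();

function animate() {
  requestAnimationFrame(animate);

  const data = analyser.getFrequencyData();      // Uint8Array of frequency bins
  let sum = 0;
  for (let i = 0; i < data.length; i++) sum += data[i];
  const avg = sum / data.length / 255.0;         // normalise to 0..1

  material.uniforms.uNoiseAmp.value = avg * 2.0; // drive the vertex displacement
  material.uniforms.uTime.value = clock.getElapsedTime();

  renderer.render(scene, camera);
}
animate();
```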
For the sake of not destroying my project I went with an intentionally poor but hacky solution: I included the KNN scripts outside my object as normal JS functions. Instead of interfacing with KNN and machine learning directly from the Three.js sketch, I would write to the DOM and retrieve information from the DOM. Very hacky, I know. I essentially wrote the classification results to innerHTML from the KNN side, and then in my render loop looked for updates to that innerHTML and parsed the values as floats to pass to my shaders.
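The admittedly hacky DOM bridge, sketched under some assumptions (the element id, label names, and value formatting here are illustrative, not my exact markup):

```js
// <div id="knn-result" style="display:none"></div> sits somewhere in the page

// KNN side (plain script, outside the Three.js object): ml5's classify callback
// dumps its label and confidence into the hidden element
function gotResults(err, result) {
  if (err) return console.error(err);
  const label = result.label;                          // e.g. 'hands-up'
  const conf = result.confidencesByLabel[label] || 0;  // 0..1
  document.getElementById('knn-result').innerHTML = `${label},${conf}`;
}

// Three.js side, inside the render loop: read the element back out and
// parse the confidence into a uniform
const raw = document.getElementById('knn-result').innerHTML;
if (raw) {
  const [label, conf] = raw.split(',');
  material.uniforms.uGesture.value = parseFloat(conf);
}
```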


The rest was just tweaking different visuals and parameters until I got something remotely interesting to see and experience. When you hold your hands up in the sketch, the visuals should change!
UPDATES UPDATES UPDATES:
JUST KIDDING: I decided last minute to filter and warp the audio based on the KNN label to give things more pizzazz. Depending on the label, the audio is conditionally warped with a biquad filter and extra distortion is added to the other sphere :)
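A rough sketch of that last-minute change, assuming the audio objects from earlier and a hypothetical second mesh called otherSphere; the label name and cutoff values are arbitrary.

```js
// build a lowpass biquad from the same AudioContext Three.js uses
const audioCtx = listener.context;
const biquad = audioCtx.createBiquadFilter();
biquad.type = 'lowpass';

function applyGesture(label) {
  if (label === 'hands-up') {            // whatever label the KNN was trained on
    biquad.frequency.value = 400;        // muffle the track
    sound.setFilter(biquad);
    otherSphere.material.uniforms.uNoiseAmp.value += 0.5; // extra warp on sphere #2
  } else {
    sound.setFilter(null);               // back to the dry signal
  }
}
```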
Live Site:
References:
https://sbcode.net/threejs/webcam/