Importing Products from an API into Umbraco
d
Hey Umbraco Community, I am currently working on importing products into Umbraco. Essentially, I need to make an API call (which returns JSON data) to fetch some base information for products (such as SKU, Name, Price). Once I've obtained the data, I want to automatically generate Product nodes under my general container node for Products, and populate some of their properties with the data that I retrieve from the API. Additionally, I also want to update Products at scheduled intervals (for example, weekly). What I really wanted to ask you is: what would be the best approach for this? What types of classes should I create to streamline this workflow? I found this article, which seems to address my requirements. However, given that the article is 6 years old, I wanted to check with you in the community if there is a more modern way of doing this. Thanks in advance! https://skrift.io/issues/importing-external-data-as-content-in-umbraco/
p
Hey @dalle3430 It's great that you're considering importing programmatically; it's a really valuable technique if you have loads of items, especially if they already exist in another system or in JSON. The blog is great, and I'm pretty certain it's all still valid. The main bit you'll need is the ContentService for creating the nodes and populating the data. https://docs.umbraco.com/umbraco-cms/reference/management/services/contentservice/create-content-programmatically
How much effort you put into 'streamlining the workflow' will really depend on whether this is a 'one off' initial import, or if you plan to use this as an import feature going forward. If it's one hit, then focus on just getting the job done with a disposable API endpoint, ideally in a form you can easily test. If this is an ongoing feature, then maybe consider creating a 'Content App' or similar that allows the import to be managed via the backoffice. https://docs.umbraco.com/umbraco-cms/extending/content-apps
Actually, I can see you've said you want to "update Products at scheduled intervals". It may be worth thinking about whether you want that to be a Scheduled Task (or Azure Function, or some other web-based trigger) that hits an API at a given time (as opposed to a Content App). Either way, if it's not a one off, make sure you make this more rock solid than a 'one off' import. I'd recommend abstracting the imported data file/HTTP request at the earliest point, then getting it into some sort of intermediary ProductImportService that works with the Umbraco IContentService. Then try and get some unit tests around the mapping so you can test the given scenarios and possible fail points.
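One way to make the mapping testable is to keep the JSON parsing free of both HTTP and Umbraco. The following is only a minimal sketch: it assumes the API returns a JSON array whose items have `sku`, `name` and `price` fields, and the names `ExternalProduct` and `ProductFeedParser` are hypothetical, not taken from the thread or from Umbraco.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical DTO for one product from the external API (field names are assumptions).
public sealed class ExternalProduct
{
    public string Sku { get; set; } = "";
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
}

public static class ProductFeedParser
{
    // Parses the raw feed. Bad items are reported in 'errors' and skipped,
    // so one broken record does not abort the whole import.
    // Throws if the JSON is malformed or the root is not an array.
    public static List<ExternalProduct> Parse(string json, List<string> errors)
    {
        var products = new List<ExternalProduct>();
        using JsonDocument doc = JsonDocument.Parse(json);

        int index = 0;
        foreach (JsonElement item in doc.RootElement.EnumerateArray())
        {
            index++;
            try
            {
                var sku = item.GetProperty("sku").GetString();
                if (string.IsNullOrWhiteSpace(sku))
                {
                    errors.Add($"Item {index}: missing SKU");
                    continue;
                }

                products.Add(new ExternalProduct
                {
                    Sku = sku.Trim(),
                    Name = item.GetProperty("name").GetString() ?? "",
                    Price = item.GetProperty("price").GetDecimal()
                });
            }
            catch (Exception ex) when (ex is InvalidOperationException
                                       or KeyNotFoundException
                                       or FormatException)
            {
                // For example a missing field, or a number delivered as text.
                errors.Add($"Item {index}: {ex.Message}");
            }
        }

        return products;
    }
}
```

A ProductImportService could then take the parsed list and the IContentService as inputs, so the parser can be unit tested with plain strings and no Umbraco instance.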
r
Everything Adam(@User) said is spot on. Getting and "cleaning" the data and ensuring it's right is going to be your key issue. I would also add a tonne of exception reporting so you can see where failures, and therefore issues with the data, occur. It will save you time in the long run, as there will inevitably be issues where a number is in a text field, etc.
d
Hi @Prenders Thanks for the quick and informative reply! To clarify some details, I want to import the products to get the creation of products inside of Umbraco going. Then I will have an admin-only section/dashboard within the backoffice. This section will include a setting for the frequency of updating products, presented as a dropdown with options such as daily, weekly, and monthly. If the option of updating frequency has been set to weekly, I want to automatically trigger the function of updating the products once a week. However, I also want the flexibility to manually update/sync the products by clicking a button in that same dashboard/section.
r
A small db table which allows you to store the script and some time variables that some other mechanism can read to fire off your API sounds like the way to go. Whether it's a Hangfire job, a scheduled task or an Azure Function depends on your setup and so forth. https://docs.hangfire.io/en/latest/background-methods/calling-methods-in-background.html https://learn.microsoft.com/en-us/answers/questions/695987/how-to-call-a-rest-api-from-an-azure-function cc // @Prenders
p
Yes! Was just going to suggest Hangfire. The approach will most likely need a combination of API and Content App (or custom section for the manual importing). If you have it so both use a common ProductImportService then you can share the core logic across both. The arguably trickier bit is having your scheduled import time be variable. With my Product Owner hat on, I'd be asking if that is essential given that they will have the option for manual importing. If it is essential, then messaging & background service tools like Hangfire as @Ravi suggested could be useful. Also one called Wolverine. https://www.hangfire.io/ https://blog.jetbrains.com/dotnet/2023/05/09/dotnet-background-services/ https://wolverine.netlify.app/
Just be aware... adding in such tools does make this whole thing exponentially more complex.
d
Alright, this has definitely cleared some things up but also raised new questions regarding how complex we want to make this solution. Thank you both for your quick and informative replies! 🙂
p
No probs... good luck with it @dalle3430. I'm always in favour of trying to simplify the approach wherever possible before committing to code. Let me know if you hit any stumbling blocks. Always happy to help 🙂
s
I'd check how often these products change. If you're planning to store very volatile data, then it might be best to have a custom table and a custom route hijacking controller that serves them, rather than trying to store these as Umbraco nodes. If you need to enhance these products with Umbraco content, of course, then you will need to create nodes. https://docs.umbraco.com/umbraco-cms/reference/routing/custom-controllers
p
I was thinking about this while on an evening run - as one does 🤓 Another potential suggestion to help with the variable import schedule could be to have a cron job for each (daily, weekly, monthly) which all run regardless and the first thing each does is check if it should/can import. That would avoid having to implement something more complex with Hangfire/Wolverine... as much as it would be ace to play around with that tech, your team might not like the estimate 😅 @SiempreSteve 's suggestion is also much better if you don't need to adorn them with more bespoke Umbraco properties.
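The "check if it should/can import" step can be a small pure function. The sketch below is a variation on the idea above rather than exactly what was described: it assumes a single trigger that fires often (for example daily) and compares the configured frequency with the time of the last import. `ImportFrequency`, `ImportSchedule` and `IsDue` are hypothetical names, and where the last run time is stored is left open.

```csharp
using System;

// Hypothetical setting, matching the dropdown options mentioned in the thread.
public enum ImportFrequency { Daily, Weekly, Monthly }

public static class ImportSchedule
{
    // Returns true when the configured interval has elapsed since the last import.
    // lastRunUtc is null when no import has run yet, in which case an import is due.
    public static bool IsDue(ImportFrequency configured, DateTime? lastRunUtc, DateTime nowUtc)
    {
        if (lastRunUtc is null) return true;

        DateTime nextRun = configured switch
        {
            ImportFrequency.Daily => lastRunUtc.Value.AddDays(1),
            ImportFrequency.Weekly => lastRunUtc.Value.AddDays(7),
            ImportFrequency.Monthly => lastRunUtc.Value.AddMonths(1),
            _ => throw new ArgumentOutOfRangeException(nameof(configured))
        };

        return nowUtc >= nextRun;
    }
}
```

Because the function takes the current time as a parameter, it can be unit tested without waiting for a real schedule. Real triggers rarely fire at exactly the same second each time, so a small tolerance may be needed in practice.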
d
Well @Prenders, running does clear the mind! Unfortunately, I do need to adorn them with more Umbraco properties, considering that the API only provides a few properties that serve as base information. So it seems I will need to create nodes... I'll try to see what I can do with recurring background jobs using IRecurringBackgroundJob; what's your opinion on them regarding this matter? Otherwise I'll have a look at cron jobs and/or Hangfire/Wolverine.
d
Hi @dalle3430 Just my two cents on this one. I've written a lot of importers in combination with Umbraco.
1. I usually create a new class library for all importing logic so I can easily create unit/integration tests (more on this later).
2. About making HTTP requests: make sure you use the HttpClient Factory (https://learn.microsoft.com/en-us/dotnet/architecture/microservices/implement-resilient-applications/use-httpclientfactory-to-implement-resilient-http-requests). This also has options for resilience.
3. We usually use Hangfire for scheduling (do not forget to use the Hangfire Console logger for friendly logging): https://github.com/pieceofsummer/Hangfire.Console
4. If you do not have any experience with Hangfire, you might also want to try Coravel (https://github.com/jamesmh/coravel).
5. The most annoying issue I always run into is the testing part. As far as I know, Umbraco has no easy option for integration testing against an existing database. I recently opened an issue for this: https://github.com/umbraco/UmbracoDocs/issues/5834#issuecomment-1931943039 There might be a way, but I never got it to work with an existing database, and the documentation is not really clear on that subject (in my opinion). Happy to hear if someone got this working!
6. I agree with @Prenders that for the mapping part you should create unit tests. I always like to use AutoMapper for this.
7. Definitely use the IContentService for updating the content. Don't forget to check if a content item is already published and act on that. I once made the error of just always saving the content, which resulted in hundreds of history items (as my scheduled task ran several times per day), haha! That eventually resulted in a very slow Umbraco backoffice.
8. Keep in mind that the ContentService does not use caching (as the Umbraco helper does), so play smart with that!
d
Hey again, just a quick question. If I access a node in the following way:
var container = _publishedContentQuery.Content(containerNodeId);
Is it then possible to access unpublished nodes in the following way?
var existingProduct = container.Children.FirstOrDefault(x => x.Value<string>("productSKU") == externalProduct.Sku);
What I'm essentially doing is iterating over each externalProduct in externalProducts (which is the result of the products I fetch from my API) and checking if there is a child node of my Products container node that has the same SKU as the externalProduct. In that case, the product already exists. However, existingProduct is always null. I have a mild suspicion that it might be because the Product nodes are not published, but merely saved. Is there any way to perform such a query with unpublished nodes? And if that's not the problem, what do you think it might be?
a
As described in my article, it may be better to use the content service instead. Then you also already have the IContent in case you need to update or delete any products.
I just skimmed through the article, and it looks close to the approach I'm still using, although I'm usually using Umbraco's hosted services instead of Hangfire now.
d
Thanks for the quick reply @Anders Bjerner! In that case, I'll get the Products container by using ContentService.GetById(containerNodeId). How would I then query its children in a similar way to what I did above? I played around with GetPagedChildren() but didn't manage to get anything out of it; maybe I misused the function.
a
IIRC the first page for GetPagedChildren is 0, not 1 (or it's the other way around)
I have something like this for a job importer:
Dictionary<int, IContent> existing = new();

foreach (IContent content in _contentService.GetPagedChildren(parent.Id, 0, int.MaxValue, out long _)) {

    if (content.ContentType.Alias != feed.ContentTypeAlias) continue;

    int jobId = content.GetValue<int>(settings.IdProperty.Alias);
    if (jobId == 0) continue;

    existing[jobId] = content;

}
The dictionary allows fast lookups, so it's great for mapping IDs/SKUs to IContent instances. Here I've used existing[jobId] = content; so conflicts are handled somewhat at random (a later item silently overwrites an earlier one with the same ID). Ideally there shouldn't be jobs/products/whatever with the same ID/SKU, but from experience, I can tell that this does happen. Mostly because someone messed up the source feed 😁 I'm sometimes using existing.Add(jobId, content); instead, which will trigger an exception if two content items have the same ID. Then it might be easier to see when something is wrong.
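A possible middle ground between silently overwriting and throwing is to keep the first item and record the duplicates for the import log. This is only a sketch under assumptions: `SkuIndex` and `Build` are hypothetical names, the helper is generic so it does not depend on Umbraco, and treating SKUs as case-insensitive is an assumption that may not hold for every feed.

```csharp
using System;
using System.Collections.Generic;

public static class SkuIndex
{
    // Builds a SKU -> item lookup. The first item with a given SKU wins;
    // every later duplicate is added to 'duplicates' so it can be reported.
    public static Dictionary<string, T> Build<T>(
        IEnumerable<T> items, Func<T, string> getSku, List<string> duplicates)
    {
        // Assumption: SKUs that differ only by casing are the same product.
        var index = new Dictionary<string, T>(StringComparer.OrdinalIgnoreCase);

        foreach (T item in items)
        {
            string sku = getSku(item);
            if (string.IsNullOrWhiteSpace(sku)) continue; // nothing to index

            // TryAdd returns false instead of throwing when the key already exists.
            if (!index.TryAdd(sku, item)) duplicates.Add(sku);
        }

        return index;
    }
}
```

With Umbraco content, the key selector would presumably be something like c => c.GetValue<string>("productSKU"), and each external product can then be matched with index.TryGetValue(externalProduct.Sku, out var existing).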
d
Alright, thanks a lot. I’ll try this out when I get back to work tomorrow and I’ll let you know how it goes 🙂
d
@dalle3430 I do not have Discord Nitro so I have to chop my code example, but this is basically the way I do it:
public void UpdateProductGroup(IPublishedContent root, ExternalProductGroup externalProductGroup, List<ExternalProduct> externalProducts, List<Language> umbracoLanguages)
{
    const string docTypeProduct = "Product";
    const string docTypeProductGroup = "productGroup";

    var productGroup = _contentService.
                GetPagedDescendants(root.Id, 0, int.MaxValue, out long totalrecordsRoot)
                .FirstOrDefault(s => s.GetValue<string>("productGroupSKU") == externalProductGroup.Code 
                                && s.ContentType.Alias == docTypeProductGroup);

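    // Fall back to an empty list so the loops below do not throw when no languages are passed in
    List<string> languages = umbracoLanguages?.Select(s => s.IsoCode).ToList() ?? new List<string>();

    if (productGroup != null)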
    {
        foreach (var item in externalProducts)
        {
            bool shouldUpdate = false;
            var productPage = _contentService.
                GetPagedChildren(productGroup.Id, 0, int.MaxValue, out long totalrecords)
                .FirstOrDefault(s => s.GetValue<string>("productSKU") == item.Code);

            IContent productContentPage = null;

            if (productPage == null)
            {
                productContentPage = _contentService.Create(item.Name, productGroup.Id, docTypeProduct);

                productContentPage.CreateDate = DateTime.Now;
            }
            else
            {
                productContentPage = _contentService.GetById(productPage.Id);
            }

            foreach (var lang in languages)
            {
                if (productContentPage.GetCultureName(lang) != item.Name)
                {
                    productContentPage.SetCultureName(item.Name, lang);
                    shouldUpdate = true;
                }

                if (String.IsNullOrEmpty(productContentPage.GetValue<string>("productDescription", culture: lang)) && 
                    !String.IsNullOrEmpty(item.Type_NL) && (lang.ToLower() == "nl-nl" || lang.ToLower() == "nl-be"))
                {
                    productContentPage.SetValue("productDescription", item.Type_NL, lang);
                    shouldUpdate = true;
                }
            }

            if (productContentPage.GetValue<string>("productSKU") != item.Code)
            {
                productContentPage.SetValue("productSKU", item.Code);
                shouldUpdate = true;
            }

            try
            {
                // Always save and publish newly created pages in all cultures
                if (productContentPage.Id == 0)
                {
                    _contentService.SaveAndPublish(productContentPage, raiseEvents: false);
                }
                else
                {
                    if (shouldUpdate)
                    {
                        _contentService.Save(productContentPage, raiseEvents: false);

                        // For existing ones, check if they need to be published
                        foreach (var lang in languages)
                        {
                            if (productContentPage.IsCulturePublished(lang))
                            {
                                _contentService.SaveAndPublish(productContentPage, culture: lang, raiseEvents: false);
                            }
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                _logger?.Error<AddonImporter>(ex, $"An error occurred while updating a product-group");
            }
        }
    }
}
Note that I also have to create the content in multiple languages, so this makes the code slightly more complicated. You might also want to restructure the code for better testability and readability.
Also note that this is using the Umbraco 8 APIs.
a
@Domitnator I'd imagine doing this for each product in the external source might cause the import to take longer:
var productPage = _contentService.
                GetPagedChildren(productGroup.Id, 0, 1, out long totalrecords)
                .FirstOrDefault(s => s.GetValue<string>("productSKU") == item.Code);
Also, is the limit set to 1? Does it find the correct product then?
d
There are certainly some things that can be improved in this code when it comes to performance. I just wanted to show a basic outline! It would be better to write one more complex query to fetch all the content items you need at once (instead of retrieving 1 in a loop). About the pageSize/limit: funny, in my original code I also use int.MaxValue... but when reviewing the code before I posted it here, I figured this could be 1, but you have me doubting! Haha.
@Anders Bjerner I just checked...it should be int.MaxValue indeed! I just updated the example
a
I had a hunch the limit was wrong in the example, but had to look an extra time or two 😁
Anyways, the most recent import I did is here: https://github.com/limbo-works/Limbo.Umbraco.Signatur/blob/v10/main/src/Limbo.Umbraco.Signatur/Services/SignaturJobsService.cs It's a package, although there probably aren't many outside of our company that are going to use it. But we may be using it for multiple clients, so it makes sense to create it as a package. It's also the first iteration of the package, so there are probably also things that could be improved 😮 It should also still follow the principles described in my article.
d
Quick update @Anders Bjerner: Indeed, contentService rather than publishedContent was the right approach. By first fetching the children of the container node:
var containerChildren = _contentService.GetPagedChildren(containerNodeId, 0, int.MaxValue, out long _);
And then when iterating over each externalProduct from the API call, I manage to find existing products like this:
var existingProduct = containerChildren.FirstOrDefault(x => x.GetValue<string>("productSKU") == externalProduct.Sku);
a
The last line can also be optimized. Essentially, for each product you're iterating the list of existing products. This is what I explained with O(N * M) in my article. By building the dictionary first, and then using that dictionary for lookups, it should be something close to O(N + M) instead. If you have lots of products, this could optimize performance a lot. If you don't have that many products, it may not matter that much. Reason:
- List or T[] or similar data structure ➡ lookup is O(N), as the worst case is that you have to iterate through the entire list
- Dictionary ➡ lookup is O(1) 😍
d
Alright, I'll give the Dictionary a closer look! Performance optimization is a good idea 🙂