Backing up your knowledge base with the Help Center API

35 Comments

  • Russur

    Is there any way, perhaps via the API, to download the images as well?

  • Charles Nadeau

    Hi Russur,

    There's no image API, but once you've downloaded the articles on your system, a number of Python libraries and techniques can let you read the image URLs in the files and make requests to download them. I like BeautifulSoup for parsing HTML, and Requests to make HTTP requests. You can do a Google search for other options.

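    Me, I'd write a script that opened each file and used BeautifulSoup to get the image URLs:

    from bs4 import BeautifulSoup

    # html_source holds the contents of one downloaded article file
    tree = BeautifulSoup(html_source, 'html.parser')
    images = tree.find_all('img')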
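    If you'd rather not install BeautifulSoup, roughly the same extraction can be sketched with the html.parser module in Python's standard library. This is an alternative technique, not what the backup script uses, and ImageSrcCollector and image_sources are illustrative names:

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # tag names arrive lowercased; attrs is a list of (name, value) pairs
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


def image_sources(html_source):
    """Return the src of each <img> in an HTML string, in document order."""
    collector = ImageSrcCollector()
    collector.feed(html_source)
    collector.close()
    return collector.sources


sample = '<p>Intro</p><img src="https://example.com/a.png"><img alt="no src">'
print(image_sources(sample))  # ['https://example.com/a.png']
```

    HTMLParser doesn't build a tree, so it suits simple jobs like collecting one attribute; for anything more involved, BeautifulSoup is easier to work with.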

    Then I'd grab the src attribute in each img tag and use it to request the image file from the server with the Requests library:

    for image in images:
        src = image.get('src', '')
        # skip anything that isn't an absolute http(s) URL
        if not src.startswith('http'):
            continue
        response = session.get(src, stream=True)

    Note: I'm checking that the src starts with http so that it's a URL I can request as-is.
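    Checking for http skips relative image paths altogether. If some of your articles reference images with relative src values, one option is to resolve them against your Help Center's address with urllib.parse before deciding whether to request them. A minimal sketch; BASE_URL is a placeholder and request_url is a hypothetical helper:

```python
from urllib.parse import urljoin, urlparse

# Placeholder: replace with your own Help Center address
BASE_URL = 'https://your_subdomain.zendesk.com'


def request_url(src, base_url=BASE_URL):
    """Return an absolute http(s) URL for an <img> src, or None to skip it."""
    # relative paths like /hc/... are resolved against base_url;
    # absolute URLs come back unchanged
    absolute = urljoin(base_url, src)
    # data: URIs and other non-web schemes can't be fetched with session.get
    if urlparse(absolute).scheme in ('http', 'https'):
        return absolute
    return None


print(request_url('/hc/article_attachments/123/logo.png'))
# https://your_subdomain.zendesk.com/hc/article_attachments/123/logo.png
print(request_url('data:image/png;base64,iVBORw0KGgo='))  # None
```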

    At this point I have a response for the image. Because of stream=True, its content is downloaded as I read it rather than all at once. Next, I'd grab the file name from the src attribute and write the image to a file:

    import os

    file_name = src.split('/')[-1]

    # file_path is the folder where you want the images saved
    with open(os.path.join(file_path, file_name), mode='wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
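    One caveat with src.split('/')[-1]: if an image URL carries a query string, it ends up in the file name. A sketch that parses the URL first (file_name_from_url is a hypothetical helper):

```python
import posixpath
from urllib.parse import unquote, urlparse


def file_name_from_url(src, default='image'):
    """Return the last path segment of a URL as a file name."""
    # urlparse separates the path from any ?query or #fragment
    path = urlparse(src).path
    # decode percent-escapes such as %20, then replace any decoded slash
    # so the result stays a single file name
    name = unquote(posixpath.basename(path)).replace('/', '_')
    # URLs ending in "/" have no last segment; fall back to a default
    return name or default


print(file_name_from_url('https://example.com/hc/article_attachments/42/My%20Logo.png?w=300'))
# My Logo.png
```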

    One thing to be careful about: some web servers reject requests that don't look like they come from a browser. So I'd set a User-Agent header to make my requests look like they're coming from one:

    import requests

    session = requests.Session()
    session.auth = ('your_email', 'your_pwd')
    session.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0'})

    Hope this helps.

     

  • Russur

    That worked great! Here is the code I added. It goes above this line in the script:

            endpoint = data['next_page']

    # begin included code to search and pull out images
    tree = BeautifulSoup(article['body'], 'html.parser')
    images = tree.find_all('img')

    for image in images:
        src = image.get('src', '')
        if not src.startswith('http'):
            continue
        response = session.get(src, stream=True)

        # prefix the file name with the article ID and the folder Zendesk stored the image in
        file_name = src.split('/')[-1]
        image_dir = src.split('/')[-2]
        file_name = str(article['id']) + '_' + image_dir + '_' + file_name
        with open(os.path.join(backup_path, file_name), mode='wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

    # End of included code

     

    I also added this import near the top of the script:

    from bs4 import BeautifulSoup

    This gets each image along with the name of the directory Zendesk created for it. I'll probably update it to get the section ID and name too, and maybe recreate the directory structure. Thanks!
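    Russur's idea of recreating the directory structure could be sketched by saving each image into a folder named after its article's section. This assumes the article dicts returned by the API include a section_id key; image_path is a hypothetical helper that keeps the same file-name scheme as the code above:

```python
import os


def image_path(backup_path, article, src):
    """Return backup_path/<section_id>/<article_id>_<image dir>_<file name>.

    `article` is assumed to be one article dict from the Help Center API,
    with 'id' and 'section_id' keys; `src` is an absolute image URL.
    """
    parts = src.rstrip('/').split('/')
    file_name = '{}_{}_{}'.format(article['id'], parts[-2], parts[-1])
    return os.path.join(backup_path, str(article['section_id']), file_name)


path = image_path('backup', {'id': 7, 'section_id': 115},
                  'https://example.com/hc/article_attachments/999/pic.png')
print(path)
```

    Before opening the file for writing, os.makedirs(os.path.dirname(path), exist_ok=True) would create the section folder if it doesn't exist yet.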

     

  • David Conway

    Every time I run the make_backup.py script with Python 3, I get the following:

    Failed to retrieve articles with error 401

     

  • Charles Nadeau

    Hi David, a 401 points to a problem with the authentication credentials. Can you double-check the Zendesk email and password you entered on line 7 under "Code complete" above?

    The other thing to check is to see if your Zendesk is configured to allow passwords for API requests. In the admin interface, click the Admin button (gear icon) in the lower-left, then Channels > API. At the bottom of the page there should be a checkbox to enable password access.

     

  • David Conway

    The password access checkbox is checked. My credentials are fine, since I can sign in to Zendesk and manage articles, so the API must not like something else. We use Gmail accounts to sign in to Zendesk. Could this be the problem?

  • Charles Nadeau

    That could be the problem. Because you're authenticated with a Google password, your Zendesk profile might not have a Zendesk password. If you're an admin, you should be able to add one yourself. See Resetting user passwords. Use the Set option instead of the Reset one.

  • David Conway

    That did the trick. Everything seems to be working fine now. Thanks.

  • David Conway

    When I run the make_backup.py program, I get a bunch of HTML files, one per article. When I open one of the HTML files, I see the content, including any attachments. How do I import these HTML files when I need to? Is there a curl command that will do it? I have all the information I need: the title, the ID of the section the article belongs to, the HTML file to be imported, the article ID, and its position. So there's a scripted way to get the articles out, but is there a scripted way to get them back in?

  • Charles Nadeau

     

    You'll need a script to parse the content of each HTML file, convert it to JSON, and post the data to HC. cURL is probably not the most efficient tool for this if you have more than a handful of articles.

    In Python, you can use the BeautifulSoup library to parse the content. One technique is described in Add the article translations, which is part of a larger tutorial on publishing localized articles on Help Center.
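    As a rough sketch of the parse-and-convert step, here is a version that uses a regular expression instead of BeautifulSoup. It assumes each backup file starts with the title in an <h1> element followed by the article body (check your own files), and it builds a translation object with title and body fields for the article translations endpoint. A regex like this only works for a fixed, known layout; for anything less predictable, a real HTML parser is the safer choice. TITLE_PATTERN and translation_payload are hypothetical names:

```python
import json
import re

# Assumed layout: '<h1>Title</h1>' followed by the article body
TITLE_PATTERN = re.compile(r'\s*<h1>(.*?)</h1>', re.DOTALL | re.IGNORECASE)


def translation_payload(html):
    """Split a backed-up article into title and body and return the JSON to send."""
    match = TITLE_PATTERN.match(html)
    if match is None:
        raise ValueError('file does not start with an <h1> title')
    title = match.group(1).strip()
    body = html[match.end():].strip()
    return json.dumps({'translation': {'title': title, 'body': body}})


print(translation_payload('<h1>Getting started</h1><p>Welcome!</p>'))
# {"translation": {"title": "Getting started", "body": "<p>Welcome!</p>"}}
```

    The resulting string would be sent in a PUT request to the translations endpoint of the matching article and locale.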

     

  • Charles Nadeau

    @Roland, @Adam, @David - 

    I added a section to the tutorial on restoring backed up articles. See Restoring articles above. Because it will necessarily overwrite Help Center content, please use it at your own risk.

     

  • Tami Settergren

    Great article! As a tech writer, not a programmer, I really appreciate having such a useful script explained so clearly. It'll help me automate the updating of my translated articles.

    I've been trying to modify the restore script to create new articles in sections, but haven't gotten it to work yet. I think the problem is with the endpoint, which I don't know how to modify to specify a section. Here's what I have so far (it returns a 404 error):

    endpoint = '/api/v2/help_center/sections/{section}/articles/{id}/translations/{loc}.json'.format(id=article, section=section, loc=language.lower())

    Where {section} is defined as section = '115001738987' in the settings. I'm guessing at the syntax of the endpoint. What am I doing wrong?

    An earlier comment refers to this article...:

    https://support.zendesk.com/hc/en-us/articles/203691406-Automating-your-first-localization-handoff-Help-Center-#post

    ...which might have my solution, but for some reason I'm not authorized to access that page.

     

  • Charles Nadeau

    Thanks @Tami.

    Since you're creating articles, you'll want to use the Create Article endpoint:

    https://developer.zendesk.com/rest_api/docs/help_center/articles#create-article

    The translations endpoint in the restore script is generally used to update the content of an existing article in a specific language, or add a translation of an article.
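    For Tami's case, a minimal sketch of building a Create Article request might look like this. It assumes the POST /api/v2/help_center/sections/{section_id}/articles.json form of the endpoint and an article object with title, body, and locale. Depending on your account, the endpoint may also require other fields (for example permission_group_id and user_segment_id), so check the reference above. create_article_request is a hypothetical helper:

```python
def create_article_request(section_id, locale, title, body):
    """Return the endpoint path and JSON-ready payload for a new article.

    The endpoint names only the section: a new article has no ID yet,
    so there is no {id} or /translations/ segment.
    """
    endpoint = '/api/v2/help_center/sections/{section}/articles.json'.format(section=section_id)
    payload = {'article': {'title': title, 'body': body, 'locale': locale.lower()}}
    return endpoint, payload


endpoint, payload = create_article_request('115001738987', 'en-US', 'New article', '<p>Hello</p>')
print(endpoint)
# /api/v2/help_center/sections/115001738987/articles.json
```

    With the session from the backup script, the request would then be something like session.post('https://your_subdomain.zendesk.com' + endpoint, json=payload), where the subdomain is a placeholder.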

    The correct url to the localization article is https://help.zendesk.com/hc/en-us/articles/229489108#post. I updated it in the comments above.

     

  • Howie Paul

    Hi, 

    Long time listener, first time Python utiliser!

    I've found this guide very handy and easy to follow, so thanks!

    The backup works lovely, but my purpose for it is to restore the content to our sandbox.

    You've provided a restore script, but only for the case where the articles already exist. Can you help a guy out if they don't?

    Thanks

    Howie

  • Joey

    Hi Howie-

    The resources would have to be re-created in the sandbox, as IDs don't carry over from your primary account to a sandbox. That said, I generally advise against using production data in a sandboxed environment. May I ask what your goals are here?

  • Howie Paul

    Hi Joseph, thanks for your response.

    My goals for the use of the sandbox, then:

    I have created a new theme for the Help Center which I don't want to make public, but I do want to send it round to gauge people's opinions. The best way I found to do this is by using the sandbox.

    Given the high level of customisation I employ, I would need my documents imported too, to give the full feel of the new theme.

    Hope that makes sense.

    Howie

  • Joey

    Hey Howie-

    Thanks for the explanation. The long and short of it is that all these resources will need to be recreated in the sandbox, generating their own unique IDs for Category, Section, Article, attachment, etc. Does this make sense? These values are not carried over from production.
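    As a rough illustration of what recreating articles involves, each article's production section_id has to be swapped for the ID its section received in the sandbox. A minimal sketch, assuming you maintain that mapping by hand as a dict (the IDs below are made up, and sandbox_section_id is a hypothetical helper):

```python
# Made-up example: production section ID -> ID of the recreated sandbox section
SECTION_ID_MAP = {
    200000001: 900000011,
    200000002: 900000012,
}


def sandbox_section_id(article, section_map=SECTION_ID_MAP):
    """Return the sandbox section ID for a production article dict.

    Raises KeyError naming the section if it hasn't been mapped yet, so an
    unmapped article stops the run instead of being created in the wrong place.
    """
    production_id = article['section_id']
    if production_id not in section_map:
        raise KeyError('no sandbox section mapped for production section {}'.format(production_id))
    return section_map[production_id]


print(sandbox_section_id({'id': 1, 'section_id': 200000002}))  # 900000012
```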

  • ccaux

    Hi, I am trying to do exactly the same thing as Howie (exactly same use case). 

    I have started a reference doc to map IDs for Category and Sections and have created those in my Sandbox. 

    Is there a script available to recreate all the articles? I understand that the newly created articles will have new IDs.

    Alternatively, is there a recommended process for major overhauls of the Help Centre (new IA, new design, etc.)?

     

  • Bryan - Community Manager

    Hey there @ccaux. As this article's code is basically open-source, perhaps someone else in the community has added this feature and will share. Other than that, the article was really for demo purposes and the code is not actively updated.

    As for customizing Help Center (HC), the support.zendesk.com/hc community has a lot more content and activity around HC customizations and ideas. There are also knowledge base articles there on the subject. The developer.zendesk.com site is more focused on APIs, the Apps framework, and Embeddables (widgets and Mobile SDKs). I know this is a delayed response, but I hope it helps all the same.

  • Travis Abdelhamed

    When you create the log file with the provided code, the file has a blank row between every article when opened on Windows. To eliminate the blank rows, add the newline parameter to the following line:

    with open(os.path.join(backup_path, '_log.csv'), mode='wt', newline='', encoding='utf-8') as f:
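    The fix works because the csv module writes its own line endings. A minimal sketch of the log-writing step with the parameter in place; write_log is a hypothetical wrapper and the header row is illustrative, so match it to whatever your script logs:

```python
import csv
import os


def write_log(backup_path, rows):
    """Write the backup log rows to _log.csv in backup_path."""
    # csv.writer ends every row with '\r\n' itself. Without newline='', text
    # mode on Windows translates the '\n' again, giving '\r\r\n', which
    # spreadsheet apps show as an extra blank row.
    with open(os.path.join(backup_path, '_log.csv'), mode='wt', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(('File', 'Title', 'Author ID'))  # illustrative header
        writer.writerows(rows)
```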

     

     

  • Nikolaus Thumma

    Hey Charles, I'm running into an auth issue when trying to connect through the Python script. Does this not work if SSO is enabled?

    Thank you

  • Charles Nadeau

    Hi Jasper,

    There are 3 authentication options for using the API:

    • basic auth credentials using a Zendesk username & password
    • a Zendesk API token
    • an OAuth token created using a Zendesk API client

    For details, see https://developer.zendesk.com/rest_api/docs/support/introduction#security-and-authentication.
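    To make the three options concrete, here is a sketch of the Authorization header each one produces, using only the standard library. It relies on Zendesk's convention that API-token requests use {email}/token as the username; auth_headers and basic_auth are hypothetical helpers. If you sign in only through SSO and have no Zendesk password, the token or OAuth options may let you avoid needing one:

```python
import base64


def basic_auth(username, secret):
    """Return a Basic Authorization header value for username:secret."""
    raw = '{}:{}'.format(username, secret).encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode('ascii')


def auth_headers(email, password=None, api_token=None, oauth_token=None):
    """Return the Authorization header for whichever credential is supplied."""
    if oauth_token:
        # OAuth access tokens are sent as bearer tokens
        return {'Authorization': 'Bearer ' + oauth_token}
    if api_token:
        # API tokens use "{email}/token" as the username and the token as the password
        return {'Authorization': basic_auth(email + '/token', api_token)}
    if password:
        # plain basic auth with the Zendesk email and password
        return {'Authorization': basic_auth(email, password)}
    raise ValueError('supply a password, an API token, or an OAuth token')


print(auth_headers('you@example.com', api_token='your_api_token'))
```

    With Requests you rarely build these by hand: session.auth = ('you@example.com/token', 'your_api_token') produces the same Basic header as the token option.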

     

     

  • Dan Derks

    Hi Charles!

    Wonderful script, thank you -- we're all backed up.

    Quick note: my machine has both Python 2 and Python 3, so the pip commands defaulted to my Python 2 installation. Perhaps it would help others to use pip3 explicitly:

    $ pip3 install beautifulsoup4
    $ pip3 install lxml

    Re: creating an article with the restore script, could you please walk a novice through which lines would need to be changed and what additional info is necessary?

    Thank you!

  • Nicolas

    Hi Charles,

    Thanks for the great script.

    I am facing an issue, though: the script doesn't retrieve all articles.

    In my case, I manage to get 59 articles back while the Guide Admin Portal mentions 111 published, 4 drafts and 34 archived.

    I have checked that:

    - the superuser credentials I use give me full access on the front end, including all user-segmented sections;

    - all articles are written in the same language.

    Only articles visible to everyone were retrieved, not the ones in the user-segmented sections.

    Any idea about how to solve this?

    Cheers,

    Nicolas

  • Russell Dunn

    Thank you for the script!

    Echoing a few earlier comments however, I'm also struggling to retrieve all articles. I get 155 articles exported, but have 173 published articles. I am an administrator.

    We do have some articles that are only visible to Agents/managers, as per Nicolas' comment above, would that be the problem? How then would I also retrieve these articles?

  • Devan - Community Manager

    Hello Russell,

    So this usually has something to do with the HC view permissions of the person making the API request (as identified by the credentials used to authenticate the request). If the person making the API request can't view certain articles in the HC (as specified by the view permissions in Guide), then the API won't return those articles to the person.

    Hope this helps clarify things. Please let us know if there's anything else we can assist with.

  • Russell Dunn

    Well, yes, that's what I thought. When I said administrator, I meant Manager, which is the correct term in Guide and the role my account has. I'm also passing in my credentials, not running anonymously.

  • Devan - Community Manager

    Hello Russell,

    The only other thing that might apply in this instance is the API credentials being used. Are you using your own email/password credentials or an API token with your email? If using an API token, the request would be restricted to the permissions of the admin who **created** the API token initially, not the person making the request, regardless of whether the requester is a Guide Manager.

    If that's not the issue, the next step would be to examine the 18 articles that aren't returned and look for a common denominator.
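    One way to hunt for that common denominator is to compare the IDs the script saved against a fuller list of articles, then tally a few fields across the missing ones. A minimal sketch; it assumes you can get the fuller list as article dicts from somewhere (for example by running the same request with an account that sees everything), and that the dicts carry the section_id, user_segment_id, and draft fields. missing_articles and common_traits are hypothetical helpers:

```python
from collections import Counter


def missing_articles(expected, backed_up_ids):
    """Return the articles in `expected` whose IDs aren't in `backed_up_ids`."""
    return [article for article in expected if article['id'] not in backed_up_ids]


def common_traits(articles, fields=('section_id', 'user_segment_id', 'draft')):
    """Count how often each value of each field occurs among `articles`."""
    return {field: Counter(article.get(field) for article in articles) for field in fields}


# made-up sample data
expected = [
    {'id': 1, 'section_id': 10, 'user_segment_id': None, 'draft': False},
    {'id': 2, 'section_id': 20, 'user_segment_id': 5, 'draft': False},
    {'id': 3, 'section_id': 30, 'user_segment_id': 5, 'draft': False},
]
gone = missing_articles(expected, backed_up_ids={1})
print(common_traits(gone)['user_segment_id'])  # Counter({5: 2})
```

    Here the two missing articles sit in different sections but share a user segment, which would point at view permissions.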

  • Thomas Crowley

    If I wanted to scrape attachments other than images (like PDF, DOCX, etc.) how would I do that? Can I modify the image part of the script or would I need something quite different?

  • Bryan - Community Manager

    Hi Thomas. If you go through a ticket's comments (using GET /api/v2/tickets/{ticket_id}/comments.json), any comment with attachments includes an attachments array, and each attachment in it has a content_url property. You can then use that URL to retrieve the file.

    You can check out the general REST API attachment reference at: https://developer.zendesk.com/rest_api/docs/support/attachments

    These APIs don't filter by attachment type, however, so you would have to do that yourself.
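    For files linked from Help Center articles rather than ticket comments, Thomas's idea of adapting the image code can work: collect the href of each <a> tag instead of the src of each <img>, then keep only the extensions you want. A sketch with the standard library; the extension list is an assumption, LinkCollector and attachment_links are hypothetical names, and links that don't end in a file extension won't be caught:

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse

# File types to keep (an assumption; adjust to your needs)
WANTED_EXTENSIONS = {'.pdf', '.docx', '.xlsx', '.zip'}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.hrefs.append(href)


def attachment_links(html_source, extensions=WANTED_EXTENSIONS):
    """Return the linked URLs whose path ends in one of the wanted extensions."""
    collector = LinkCollector()
    collector.feed(html_source)
    collector.close()
    wanted = []
    for href in collector.hrefs:
        # ignore any ?query or #fragment and compare extensions case-insensitively
        extension = posixpath.splitext(urlparse(href).path)[1].lower()
        if extension in extensions:
            wanted.append(href)
    return wanted


body = ('<p><a href="https://example.com/files/guide.PDF?dl=1">Guide</a> '
        '<a href="/hc/en-us/articles/1">Other</a> <a href="notes.docx">Notes</a></p>')
print(attachment_links(body))
```

    Each URL found this way could then be downloaded with the same session.get(..., stream=True) loop used for images.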

    Does this help?
